Conversation
pckroon
left a comment
There was a problem hiding this comment.
Very nice! I do have some comments, but you probably saw those coming ;)
Also have a look at https://github.com/marrink-lab/cartographer/blob/master/ecfp.py for inspiration
| nonzero = np.array(feature_list, dtype=int) % size | ||
| fp_array[nonzero] = 1 |
There was a problem hiding this comment.
I don't quite see how this affects the length of the feature_list
There was a problem hiding this comment.
not sure I understand the question.
There was a problem hiding this comment.
The goals is to fold feature_list into a binary array of a fixed size, no? Maybe I'm just unclear what feature_list is
| subgraph = set(graph.nodes[node]['subgraph']) | ||
| feat_list_iter = [(idx, graph.nodes[node][label])] | ||
| for neigh in graph.neighbors(node): | ||
| order = graph.edges[(node, neigh)].get('order', 1) |
There was a problem hiding this comment.
'order' needs to be a function argument. Maybe even a list of bond-features to consider. Maybe create initial bond-invariants as well
| feat_list_iter.append((order, graph.nodes[neigh][label])) | ||
| subgraph = subgraph | graph.nodes[neigh]['subgraph'] | ||
| graph.nodes[node]['subgraph'] = subgraph | ||
| feat_list_iter = sorted(feat_list_iter) |
There was a problem hiding this comment.
Is this correct? Shouldn't the central node be the first one, and then sort only the neighbours? Actually, you now sort by bond order...
There was a problem hiding this comment.
Oh, better idea! Don't make a list of tuples, but a set of tuples. Hashing a set will deal with the fact that the order doesn't matter
| subgraph = subgraph | graph.nodes[neigh]['subgraph'] | ||
| graph.nodes[node]['subgraph'] = subgraph | ||
| feat_list_iter = sorted(feat_list_iter) | ||
| feat_tuple_iter = tuple(feat for sub_feat in feat_list_iter for feat in sub_feat) |
| new_hash = hash(feat_tuple_iter) | ||
| graph.nodes[node]['fp_invariant'] = new_hash | ||
| if frozenset(subgraph) not in substructures: | ||
| feature_list.append(new_hash) | ||
| substructures.add(frozenset(subgraph)) |
There was a problem hiding this comment.
You don't use the substructures, do you?
Instead of just doing new_hash = hash(feat_tuple_iter), first try to lookup the hash for this fragment in your substructures. Substructures should then be a dict of frozenset(subgraph) -> hash
There was a problem hiding this comment.
You do because the hashes will be different depending on which atom you start even if they generate the same substructure
There was a problem hiding this comment.
Right. I missed that you append to feature_list here, only if your subgraph is new
| feat_list.append(graph.nodes[node][feat]) | ||
| _hash = hash(tuple(feat_list)) | ||
| graph.nodes[node]['fp_invariant'] = _hash | ||
| graph.nodes[node]['subgraph'] = frozenset(graph.edges(node)) |
There was a problem hiding this comment.
I would just store node indices. It's going to be an induced subgraph anyway
| def annotate_rings(graph): | ||
| """ | ||
| Find all nodes that are part of a ring and | ||
| set the ring attribute to True otherwise | ||
| it is set to False. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| graph: :class:`networkx.Graph` | ||
|
|
||
| Returns | ||
| ------- | ||
| :class:`networkx.Graph` | ||
| """ | ||
| nx.set_node_attributes(graph, 0, 'ring') | ||
| for cycle in nx.cycle_basis(graph): | ||
| for node in cycle: | ||
| graph.nodes[node]['ring'] = 1 | ||
| return graph |
There was a problem hiding this comment.
Make the attribute name an argument
| def _reduced_valency(graph): | ||
| """ | ||
| Compute the reduced valency for fingerprint calculations. | ||
| The reduced valency is given as the valency minus the | ||
| hydrogen atoms. | ||
| """ | ||
| red_valency = defaultdict(int) | ||
| for e1, e2, order in graph.edges(data='order'): | ||
| red_valency[e1] += order | ||
| red_valency[e2] += order | ||
| nx.set_node_attributes(graph, red_valency, 'red_valency') | ||
| return graph |
There was a problem hiding this comment.
Make red_valency and order arguments
| for node in graph.nodes: | ||
| element = graph.nodes[node]['element'] | ||
| graph.nodes[node]['atomic_num'] = PTE[element.capitalize()]['AtomicNumber'] | ||
| graph.nodes[node]['mass'] = PTE[element.capitalize()]['AtomicMass'] |
There was a problem hiding this comment.
SMILES can contain isotope information
| """ | ||
| graph = _prepare_atomic_molecule(graph) | ||
| _assign_invariants(graph, node_feats) | ||
| feature_list = _aggregate_features(graph, cycles=4) |
There was a problem hiding this comment.
iterations needs to be an argument
Implementation of a fingerprint following a tutorial from here: https://chemicbook.com/2021/03/25/a-beginners-guide-for-understanding-extended-connectivity-fingerprints.html
Tests and comparison to rdkit is still missing.