Skip to content

extended connectivity fingerprint#54

Open
fgrunewald wants to merge 1 commit intomasterfrom
cheminfo
Open

extended connectivity fingerprint#54
fgrunewald wants to merge 1 commit intomasterfrom
cheminfo

Conversation

@fgrunewald
Copy link
Copy Markdown
Collaborator

Implementation of a fingerprint following a tutorial from here: https://chemicbook.com/2021/03/25/a-beginners-guide-for-understanding-extended-connectivity-fingerprints.html

Tests and comparison to rdkit is still missing.

Copy link
Copy Markdown
Owner

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I do have some comments, but you probably saw those coming ;)
Also have a look at https://github.com/marrink-lab/cartographer/blob/master/ecfp.py for inspiration

Comment on lines +44 to +45
nonzero = np.array(feature_list, dtype=int) % size
fp_array[nonzero] = 1
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite see how this affects the length of the feature_list

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the question.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goals is to fold feature_list into a binary array of a fixed size, no? Maybe I'm just unclear what feature_list is

subgraph = set(graph.nodes[node]['subgraph'])
feat_list_iter = [(idx, graph.nodes[node][label])]
for neigh in graph.neighbors(node):
order = graph.edges[(node, neigh)].get('order', 1)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'order' needs to be a function argument. Maybe even a list of bond-features to consider. Maybe create initial bond-invariants as well

feat_list_iter.append((order, graph.nodes[neigh][label]))
subgraph = subgraph | graph.nodes[neigh]['subgraph']
graph.nodes[node]['subgraph'] = subgraph
feat_list_iter = sorted(feat_list_iter)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Shouldn't the central node be the first one, and then sort only the neighbours? Actually, you now sort by bond order...

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, better idea! Don't make a list of tuples, but a set of tuples. Hashing a set will deal with the fact that the order doesn't matter

subgraph = subgraph | graph.nodes[neigh]['subgraph']
graph.nodes[node]['subgraph'] = subgraph
feat_list_iter = sorted(feat_list_iter)
feat_tuple_iter = tuple(feat for sub_feat in feat_list_iter for feat in sub_feat)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wut?

Comment on lines +81 to +85
new_hash = hash(feat_tuple_iter)
graph.nodes[node]['fp_invariant'] = new_hash
if frozenset(subgraph) not in substructures:
feature_list.append(new_hash)
substructures.add(frozenset(subgraph))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't use the substructures, do you?
Instead of just doing new_hash = hash(feat_tuple_iter), first try to lookup the hash for this fragment in your substructures. Substructures should then be a dict of frozenset(subgraph) -> hash

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do because the hashes will be different depending on which atom you start even if they generate the same substructure

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I missed that you append to feature_list here, only if your subgraph is new

feat_list.append(graph.nodes[node][feat])
_hash = hash(tuple(feat_list))
graph.nodes[node]['fp_invariant'] = _hash
graph.nodes[node]['subgraph'] = frozenset(graph.edges(node))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just store node indices. It's going to be an induced subgraph anyway

Comment on lines +123 to +141
def annotate_rings(graph):
"""
Find all nodes that are part of a ring and
set the ring attribute to True otherwise
it is set to False.

Parameters
----------
graph: :class:`networkx.Graph`

Returns
-------
:class:`networkx.Graph`
"""
nx.set_node_attributes(graph, 0, 'ring')
for cycle in nx.cycle_basis(graph):
for node in cycle:
graph.nodes[node]['ring'] = 1
return graph
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the attribute name an argument

Comment on lines +110 to +121
def _reduced_valency(graph):
"""
Compute the reduced valency for fingerprint calculations.
The reduced valency is given as the valency minus the
hydrogen atoms.
"""
red_valency = defaultdict(int)
for e1, e2, order in graph.edges(data='order'):
red_valency[e1] += order
red_valency[e2] += order
nx.set_node_attributes(graph, red_valency, 'red_valency')
return graph
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make red_valency and order arguments

for node in graph.nodes:
element = graph.nodes[node]['element']
graph.nodes[node]['atomic_num'] = PTE[element.capitalize()]['AtomicNumber']
graph.nodes[node]['mass'] = PTE[element.capitalize()]['AtomicMass']
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMILES can contain isotope information

"""
graph = _prepare_atomic_molecule(graph)
_assign_invariants(graph, node_feats)
feature_list = _aggregate_features(graph, cycles=4)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterations needs to be an argument

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants