[Question] Does DrugEx v3 support R-group enumeration with specified attachment points? #25

@guosijia

Question

We are trying to perform R-group enumeration using DrugEx v3 and would like to clarify whether DrugEx supports specifying attachment points for fragment-based molecule generation.

Background

We understand that DrugEx v3 can generate molecules based on input fragments, but we're uncertain about its capability to handle specific attachment points for R-group enumeration tasks.

Current Approach and Issues

Approach 1: Using fragments with dummy atoms ([*:1])

Input scaffold:

O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3)[*:1])nc2c1)C1CC1

Command used:

python -m drugex.dataset \
    -b ${base_dir} \
    -i scaffold_with_star.tsv \
    -mc SMILES \
    -o rgroup_data \
    -mt graph \
    -s  

Issue: The generated graph data file (rgroup_data_graph.txt) contains only column headers but no actual data (empty matrix), preventing further molecule generation.

File content example:

C0 C1 C2 C3 C4 ... C399
(no data rows)
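
To confirm this symptom without opening the file by hand, a minimal sketch follows. It assumes the graph file is plain text with a single header line and whitespace-separated columns (a guess based on the excerpt above). `count_data_rows` is a hypothetical helper written for this issue, not part of DrugEx.

```python
def count_data_rows(lines):
    """Return (n_columns, n_data_rows) for text with one header line.

    `lines` is any iterable of strings, e.g. an open file handle.
    Assumes whitespace-separated columns; blank lines are ignored.
    """
    iterator = iter(lines)
    header = next(iterator, "")
    n_columns = len(header.split())  # split on tabs or spaces
    n_rows = sum(1 for line in iterator if line.strip())
    return n_columns, n_rows


# Usage with the output of the command above (adjust the path as needed):
# with open("rgroup_data_graph.txt") as handle:
#     print(count_data_rows(handle))
```

A result with a non-zero column count and zero data rows matches what we see: the header is written, but no molecule was encoded.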

Approach 2: Removing dummy atoms from fragments

Modified scaffold:

O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3))nc2c1)C1CC1

Issue: While this approach generates valid graph data and molecules successfully, most generated molecules do not grow from our intended attachment point. The model seems to modify the scaffold at random positions rather than the specific location where the [*:1] was originally placed.
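
For reference, the modified scaffold in Approach 2 is just the Approach 1 string with the labeled dummy atom deleted. The sketch below reproduces that step and lists the labels that are lost. It is string-level only and not chemistry-aware, and `attachment_labels` and `strip_attachment_points` are hypothetical helper names, not DrugEx functions. Deleting the token is only reasonable when the dummy atom is terminal and not the sole content of a branch: a dummy in the middle of a chain would fuse its neighbours, and `([*:1])` would leave an empty `()`.

```python
import re

# Matches labeled dummy atoms such as [*:1] or [*:12].
ATTACHMENT_RE = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(smiles):
    """Return the attachment-point labels found in a SMILES string, in order."""
    return [int(label) for label in ATTACHMENT_RE.findall(smiles)]


def strip_attachment_points(smiles):
    """Delete labeled dummy atoms from the string (no valence or ring checks)."""
    return ATTACHMENT_RE.sub("", smiles)


scaffold = "O=C(CCC(F)(F)F)NC(c1cnn2cc(C(C3CCC(F)(F)CC3)[*:1])nc2c1)C1CC1"
print(attachment_labels(scaffold))
print(strip_attachment_points(scaffold))
```

After stripping, nothing in the string records where [*:1] was, which is consistent with the model not preferring that position.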

Questions

  1. Does DrugEx v3 natively support R-group enumeration with specified attachment points?

  2. Is there a correct way to handle dummy atoms ([*:1], [*:2], etc.) in DrugEx input fragments?

  3. If attachment point specification is not directly supported, what would be the recommended workflow for R-group enumeration tasks?

  4. Are there any plans to support explicit attachment point specification in future versions?

Expected Behavior

We would like to:

  • Input a scaffold with clearly marked attachment points (e.g., [*:1])
  • Generate molecules that grow specifically from these marked positions
  • Maintain the core scaffold structure while only modifying the R-groups at specified locations

Environment

  • DrugEx version: v3.4.5
  • Python version: 3.8+
  • Operating System: Ubuntu

Any guidance or clarification would be greatly appreciated!
