
AOP-tk wrappers for download, chemical identification + matching + normalization, PDF parsing, relationship finding #728

Open
rdurnik wants to merge 54 commits into RECETOX:master from rdurnik:new_wrappers

rdurnik commented on Jan 29, 2026

AOP-tk wrappers for download, chemical identification + matching + normalization, PDF parsing, relationship finding

rdurnik requested a review from hechth on January 29, 2026 09:55
rdurnik changed the title from "AOP-tk wrappers for download, chemical identification + matching, PDF parsing, relationship finding" to "AOP-tk wrappers for download, chemical identification + matching + normalization, PDF parsing, relationship finding" on Jan 29, 2026
Member

This could be rewritten as a direct Galaxy tool wrapper using the following syntax:

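import csv

# Spacy is assumed to be imported from AOP-tk; the import path is not shown here.
# The Galaxy placeholders are quoted so they render as valid Python string literals.
with open("$input_file", "r") as f_in, open("$output_file", "w") as f_out:
    f_out.write("id\ttext\tchemicals\n")
    for row in csv.DictReader(f_in, delimiter="\t"):
        chemicals = Spacy().find_chemical(row["text"])
        # Deduplicate the names and sort them so the output order is stable.
        chemicals_str = (
            "|".join(sorted({chem.name for chem in chemicals})) if chemicals else ""
        )
        f_out.write(f"{row['id']}\t{row['text']}\t{chemicals_str}\n")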
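
For trying the same loop outside Galaxy, a minimal self-contained sketch might look like the following. It makes several assumptions: `annotate_chemicals` and `toy_finder` are hypothetical names, `toy_finder` only stands in for `Spacy().find_chemical` and returns plain strings rather than objects with a `.name` attribute, and ordinary file paths replace the `$input_file` and `$output_file` placeholders. It also swaps the manual f-string writes for `csv.writer`, so a tab or quote inside `text` is escaped instead of breaking the column layout.

```python
import csv
from typing import Callable, Iterable


def annotate_chemicals(
    input_path: str,
    output_path: str,
    find_chemicals: Callable[[str], Iterable[str]],
) -> None:
    """Copy an id/text TSV and append a pipe-separated `chemicals` column."""
    with open(input_path, newline="") as f_in, open(output_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in, delimiter="\t")
        writer = csv.writer(f_out, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "text", "chemicals"])
        for row in reader:
            # Deduplicate and sort so the column is stable across runs.
            names = sorted(set(find_chemicals(row["text"])))
            writer.writerow([row["id"], row["text"], "|".join(names)])


def toy_finder(text: str) -> list[str]:
    """Hypothetical stand-in for the real chemical recognizer."""
    known = ("benzene", "toluene")
    return [name for name in known if name in text.lower()]
```

Passing the recognizer in as a callable keeps the file handling testable without loading a spaCy model; in the wrapper itself the callable would wrap the real recognizer and map each result to its `name`.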
