@rtraborn
Collaborator

rtraborn commented Dec 8, 2025

Hi @dkoslicki and team!
I implemented the sylph coverage model (Shaw and Yu, 2024) in yacht, on a branch I named superyacht just for fun.
This is a draft that I'm still testing, so the usual caveats apply. A few notes:

  • The coverage code is in a function called cov_calc, which calculates lambda and ANI as specified in the sylph paper.
  • For engineering reasons, I decided to put cov_calc inside get_exclusive_hashes, since that function provides the signature objects needed for the calculations.
  • Because of this, I am passing the output of cov_calc, a pandas DataFrame, along with hypothesis_recovery. There are probably good ways to integrate this, and I'll give it some more thought.
  • Also, I decided not to incorporate the output of cov_calc more deeply into hypothesis_recovery for now. I have some ideas on the best approach that we could discuss if you'd like. I thought it would be best to share this new branch while I look into it further.
  • internal_superyacht_test.py is just a script I have been using to test the new branch and can be ignored; I'll remove it once we move towards publication.
  • I plan to rewrite the way the AdjustStatusLambda enum is instantiated in more idiomatic Python this week. It should be a relatively quick fix.
  • I did not incorporate the taxonomic reassignment/winner_map routine from sylph, but it's something I would like to add.
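
For readers unfamiliar with the sylph approach, here is a minimal sketch of the kind of calculation described in the first bullet. It is not the code on the branch. It assumes k-mer multiplicities follow a Poisson distribution with mean lambda, estimates lambda from the ratio of k-mers seen twice to k-mers seen once, and divides the naive containment by the probability of observing a k-mer at least once before converting to ANI. The function names `ratio_lambda` and `adjusted_ani`, their arguments, and the default `k=31` are illustrative assumptions.

```python
import math
from collections import Counter


def ratio_lambda(multiplicities):
    """Estimate Poisson coverage (lambda) from k-mer multiplicities.

    Under a Poisson model, P(X=2) / P(X=1) = lambda / 2, so
    lambda is approximately 2 * N2 / N1 (hypothetical helper).
    """
    counts = Counter(multiplicities)
    n1, n2 = counts.get(1, 0), counts.get(2, 0)
    if n1 == 0 or n2 == 0:
        return None  # not enough information to estimate lambda
    return 2.0 * n2 / n1


def adjusted_ani(n_shared, n_genome, lam=None, k=31):
    """Convert containment to ANI, optionally correcting for low coverage.

    With coverage lambda, a genome k-mer is observed at least once with
    probability 1 - exp(-lambda), so the naive containment is divided
    by that factor (and capped at 1) before taking the k-th root.
    """
    containment = n_shared / n_genome
    if lam is not None and lam > 0:
        containment = min(1.0, containment / (1.0 - math.exp(-lam)))
    return containment ** (1.0 / k)


lam = ratio_lambda([1, 1, 1, 1, 2, 2, 3])
naive = adjusted_ani(6, 10)
corrected = adjusted_ani(6, 10, lam)
print(f"lambda={lam:.2f} naive ANI={naive:.4f} corrected ANI={corrected:.4f}")
```

At low coverage the corrected ANI is higher than the naive one, because missing k-mers are partly attributed to shallow sequencing rather than to sequence divergence.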

I'm going to do more testing this week on additional datasets. Happy to discuss here or via email/video!

@rtraborn
Collaborator Author

rtraborn commented Jan 2, 2026

After some more testing, I just pushed some additional updates to this branch.

  • Fixed a typo: corrected to logger.warning in cov_calc.py
  • Added missing scipy.special.gamma import to utils.py
  • Fixed a few bugs I discovered in binary_search_lambda()
  • Replaced print statements with logger.info() for consistency
  • Consolidated duplicate constants into utils.py
  • Removed a few unused local variables from hypothesis_recovery_src.py

@sonarqubecloud

sonarqubecloud bot commented Jan 6, 2026

Quality Gate passed

Issues
19 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@rtraborn
Collaborator Author

rtraborn commented Jan 7, 2026

A small update: with last night's commit I made the promised change to the AdjustStatusLambda enum in cov_calc to make it more idiomatic Python. I think it looks cleaner. Thanks to @standage for flagging this!
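
As a rough illustration of what "idiomatic" can mean for a Python enum, here is a generic sketch. It does not reproduce the enum on the branch: the class name `CoverageStatus`, its members, and the `describe` helper are all hypothetical. It shows members declared with `auto()` and compared by identity rather than by string or integer value.

```python
from enum import Enum, auto


class CoverageStatus(Enum):
    """Hypothetical status for a coverage estimate (names are assumptions)."""

    LOW = auto()        # too little data to estimate lambda
    HIGH = auto()       # coverage high enough that no correction is needed
    ESTIMATED = auto()  # lambda was estimated and can be used


def describe(status, lam=None):
    """Return a short label; enum members are compared with `is`."""
    if status is CoverageStatus.ESTIMATED:
        return f"lambda={lam:.2f}"
    return status.name.lower()


print(describe(CoverageStatus.ESTIMATED, 1.5))
print(describe(CoverageStatus.LOW))
```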
