Optimize rdf histogram #5104

rhowardstone · 2025-09-04T14:30:46Z

Hi all,

I'm totally new to this community, and not very experienced on GitHub either, so I apologize if I'm making some faux pas here. That said, I have been getting some astounding results from "nanny coding" (I take a much more structured, verifiable approach than one could reasonably deem 'vibes') with the Claude Code CLI tool (only ever Opus 4.1; I have Max).

I heard your histogramming function was quite slow (#3435), so I ('we'?) spent a few hours trying to understand and improve it. I'm not sure it entirely 'fixes' that issue, but it does appear to help! My PR#5103 says the "codecov" check is failing: by my understanding, this is simply because Numba isn't enabled in the test environment?

I'm happy to respond to any feedback you are kind enough to take the time to provide, given your knowledge of the codebase (code changes are.. I won't say "easy", but.. they're not the bottleneck anymore; understanding is)! Or, on the other hand, to "beat it", if you don't take kindly to AI-assisted PRs in this community! My apologies, if so!

If you're curious, here is a template that has been able to produce some rather impressive results, rather quickly, when combined with post-agent verification, general watchfulness, skepticism, and a red-flag approach to the phrase 'simpler solution'. A version of this was used for my PR:
CLAUDE.md
(you are welcome to use this however you like -- or, to laugh at me for trying!)

Thank you for your time,
-Rye

orbeckst · 2025-10-29T20:22:15Z

Maintainer

Hi @rhowardstone, thank you for your PR #5128 (where I commented). I feel you are owed a slightly longer response, and the discussions forum is a good place for that.

We are currently deliberating how to handle PRs that are primarily AI-generated. In a way, your PR and communication are a good example of what a "good" (non-slop) contribution can look like: you're communicating clearly how the code was generated, but you're not outsourcing the communication to an LLM.

My personal opinion at the moment: you're identifying the "understanding" part as a limitation, and this gets at one part of the problem. If we as open source developers take time to review and bring in our personal expertise, then we also hope to cultivate a relationship with other developers who may eventually contribute more to the project. Keeping an open source project going is a constant struggle. We are really not interested in providing prompts for someone else to feed to an LLM.

Another concern is that LLMs may generate code from sources that are license-incompatible with the target; the generated code may also violate licenses by not properly attributing its sources. For instance, if your generated code is really coming from a GPL-licensed code base, then we could not include it in MDAnalysis because it would immediately enforce the GPL on our LGPL code. We're not the only open source project facing these questions; see, for instance, a Draft SPEC on options for use of AI in project contributions, scientific-python/summit-2025#35.

Comments welcome :-).
