Conversation
The previous NOTICE.txt was inherited from the broader Apache Arrow project and listed third-party components (SFrame, DyND, LLVM, google-lint/cpplint, mman-win32, LevelDB, CMake, multibuild, Ibis, Dremio, Google Guava, Apache Kudu, Apache ORC) that only exist in the C++, Python, or Java implementations. None of these have any code incorporated in the Rust crates.

Meanwhile, the following actually-incorporated third-party code was not listed:

- chronoutil (MIT, Oliver Margetts): arrow-array/src/delta.rs is copied verbatim from the chronoutil crate with its MIT header.
- compact-thrift (Apache 2.0, Jörn Horstmann): parquet/src/parquet_macros.rs macros are adapted from this project.
- parity-common/uint (Apache 2.0 / MIT): arrow-buffer/src/bigint/div.rs division algorithm is heavily inspired by this crate.
- simdjson (Apache 2.0): arrow-json/src/reader/tape.rs JSON tape representation is inspired by simdjson's tape design.

The Cargo.toml license field (Apache-2.0) remains correct: MIT is compatible with Apache-2.0 inclusion (and the MIT notice is retained in delta.rs), and all other incorporated code is Apache-2.0 licensed.

Prompted by Fedora packaging review at https://bugzilla.redhat.com/show_bug.cgi?id=2456991 about NOTICE.txt over-declaring licenses for components not present in the distributed crate.

Generated-by: Claude Opus 4.6 (Anthropic)
arrow-array/src/delta.rs is copied verbatim from the chronoutil crate (Copyright 2020-2023 Oliver Margetts) under the MIT license. The file retains its MIT header, but the crate metadata did not reflect this.

- Add arrow-array/LICENSE-MIT from the upstream chronoutil project
- Update arrow-array license to "Apache-2.0 AND MIT"
- Override the workspace include to ship LICENSE-MIT with the crate

Generated-by: Claude Opus 4.6 (Anthropic)
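As a rough illustration of the metadata change this commit describes, a fragment of arrow-array/Cargo.toml might look like the sketch below. Only the license expression and the LICENSE-MIT file name come from the commit message; the other `include` patterns are hypothetical placeholders and are not taken from the actual workspace configuration.

```toml
# Fragment only: other [package] fields (version, edition, ...) are omitted.
[package]
name = "arrow-array"

# SPDX expression stating that the crate contains both Apache-2.0 and
# MIT licensed code (delta.rs keeps its MIT header).
license = "Apache-2.0 AND MIT"

# Setting `include` here replaces any workspace-inherited value, so every
# file that should ship must be listed again. The patterns other than
# LICENSE-MIT are assumptions made for illustration.
include = [
    "src/**/*.rs",
    "Cargo.toml",
    "LICENSE.txt",
    "NOTICE.txt",
    "LICENSE-MIT",
]
```

Cargo treats an explicit `include` list as exhaustive, which is why a license file added to the crate directory would not be packaged unless it is named in the list.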
Note: LICENSE-MIT has the year in which the file was forked, so it ends in 2022 while the latest upstream repo now says 2023. This matches what's in the file header.
This product includes software developed by Hewlett-Packard:
(c) Copyright [2014-2015] Hewlett-Packard Development Company, L.P
This product includes software inspired by the simdjson project (Apache 2.0)
* https://github.com/simdjson/simdjson
Not sure if it actually uses code/the code was inspired from simdjson (or only the approach)?
Yeah, this one can probably be dropped.
https://github.com/wesm/feather
This product includes software from the compact-thrift project (Apache 2.0)
* Copyright Jörn Horstmann
* https://github.com/jhorstmann/compact-thrift
This was contributed by @jhorstmann so does it need this addition?
Technically I think @etseidl used the code from @jhorstmann 's repo as part of
So it probably doesn't hurt to have this in here 🤔
I included a heavily modified version of @jhorstmann's code, but it's still a derivative work, so I agree with @alamb that it doesn't hurt to include
My main contribution was probably the idea and prototype to use rust macros, with the goal to contribute that code. Then @etseidl did all the hard work of actually integrating that idea into the arrow-rs codebase.
I would be fine with leaving this out of the NOTICE file since there is no code that could be considered a direct copy from that repo. I'm happy with the shout-out I got in the blog post about faster parquet parsing :)
I think it is OK to leave this shout-out in the notice. You can be forever famous (to a very select group of people)
I had Gemini audit the new claims in
alamb
left a comment
Thank you @michel-slm -- this is pretty great
May I ask how you did the audit (what tool)? The list looks good to me
I also had Gemini double-check, and I was able to find the relevant PRs and issues for these new claims
I think the original NOTICE.txt is left over from when this code was split from the apache/arrow repo, so this is a nice cleanup
Thank you @michel-slm
I use Claude Opus (I declared it in the individual commits, but not in the PR itself; I hope that's enough). Should I address the two questions from @Dandandan too, @alamb? Thanks
Code that is only "inspired by" another project probably should not be listed.

Also update LICENSE-MIT to actually match the years in the file header, not the latest from the upstream repo.

Signed-off-by: Michel Lind <salimma@fedoraproject.org>
Yeah -- that is great. Nothing else is needed from my perspective.
I defer to @Dandandan. I don't have a strong opinion either way nor do I really know how much
The comment says "inspired by", and since simdjson is written in C++ we probably didn't use the code 🤔 But on the other hand, giving credit to be on the safe side is probably OK too
It was strictly inspired by the approach of having a two-pass decoder using a tape, so I don't think it needs a license attribution.
@Dandandan I think my last push addressed your feedback, please let me know if you would like any further changes. Thanks!
alamb
left a comment
I think it looks great -- thanks everyone
We can make a follow-on PR if we need to make additional changes
Which issue does this PR close?
Rationale for this change
I am going to ship arrow crates in Fedora as a dependency for another tool, and we did a license audit as part of the review process
What changes are included in this PR?
Are these changes tested?
N/A, metadata change only
Are there any user-facing changes?
No