Fedora license audit #9704

Merged
alamb merged 3 commits into apache:main from michel-slm:fedora-license-audit
Apr 15, 2026

@michel-slm
Contributor

Which issue does this PR close?

Rationale for this change

I am going to ship the arrow crates in Fedora as a dependency for another tool, and we did a license audit as part of the review process.

What changes are included in this PR?

  • Updates NOTICE.txt to reflect third party code actually shipped in the repo
  • Updates Cargo.toml in arrow-array because it actually ships MIT code as well

Are these changes tested?

N/A, metadata change only

Are there any user-facing changes?

No

The previous NOTICE.txt was inherited from the broader Apache Arrow
project and listed third-party components (SFrame, DyND, LLVM,
google-lint/cpplint, mman-win32, LevelDB, CMake, multibuild, Ibis,
Dremio, Google Guava, Apache Kudu, Apache ORC) that only exist in
the C++, Python, or Java implementations. None of these have any
code incorporated in the Rust crates.

Meanwhile, the following actually-incorporated third-party code was
not listed:

- chronoutil (MIT, Oliver Margetts): arrow-array/src/delta.rs is
  copied verbatim from the chronoutil crate with its MIT header.
- compact-thrift (Apache 2.0, Jörn Horstmann):
  parquet/src/parquet_macros.rs macros are adapted from this project.
- parity-common/uint (Apache 2.0 / MIT): arrow-buffer/src/bigint/div.rs
  division algorithm is heavily inspired by this crate.
- simdjson (Apache 2.0): arrow-json/src/reader/tape.rs JSON tape
  representation is inspired by simdjson's tape design.

The Cargo.toml license field (Apache-2.0) remains correct: MIT is
compatible with Apache-2.0 inclusion (and the MIT notice is retained
in delta.rs), and all other incorporated code is Apache-2.0 licensed.

Prompted by Fedora packaging review at
https://bugzilla.redhat.com/show_bug.cgi?id=2456991 about NOTICE.txt
over-declaring licenses for components not present in the distributed
crate.

Generated-by: Claude Opus 4.6 (Anthropic)

arrow-array/src/delta.rs is copied verbatim from the chronoutil crate
(Copyright 2020-2023 Oliver Margetts) under the MIT license. The file
retains its MIT header, but the crate metadata did not reflect this.

- Add arrow-array/LICENSE-MIT from the upstream chronoutil project
- Update arrow-array license to "Apache-2.0 AND MIT"
- Override the workspace include to ship LICENSE-MIT with the crate

Generated-by: Claude Opus 4.6 (Anthropic)
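
For readers unfamiliar with Cargo packaging metadata, the three changes listed in this commit message could look roughly like the fragment below. This is a sketch, not the merged diff: the `license` value is taken from the commit message, while the exact entries in the `include` list are assumptions made for illustration (the real workspace list may differ), and all other package fields are omitted.

```toml
# arrow-array/Cargo.toml (illustrative fragment; other fields omitted)
[package]
name = "arrow-array"

# SPDX expression: both licenses apply, because src/delta.rs carries MIT code.
license = "Apache-2.0 AND MIT"

# Hypothetical file list that replaces the inherited workspace `include`
# so that LICENSE-MIT is packaged alongside the sources.
include = [
    "src/**/*.rs",
    "Cargo.toml",
    "LICENSE.txt",
    "NOTICE.txt",
    "LICENSE-MIT",
]
```

With a layout like this, running `cargo package --list -p arrow-array` is one way to check that `LICENSE-MIT` ends up in the published crate.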
@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 13, 2026
@michel-slm
Contributor Author

Note: LICENSE-MIT has the year in which the file was forked, so its copyright range ends in 2022, while the latest upstream repo now says 2023. This matches what's in the file header.

Comment thread NOTICE.txt Outdated
This product includes software developed by Hewlett-Packard:
(c) Copyright [2014-2015] Hewlett-Packard Development Company, L.P
This product includes software inspired by the simdjson project (Apache 2.0)
* https://github.com/simdjson/simdjson
Contributor

Not sure if it actually uses code/the code was inspired from simdjson (or only the approach)?

Contributor Author

Yeah, this one can probably be dropped.

Comment thread NOTICE.txt
https://github.com/wesm/feather
This product includes software from the compact-thrift project (Apache 2.0)
* Copyright Jörn Horstmann
* https://github.com/jhorstmann/compact-thrift
Contributor

This was contributed by @jhorstmann so does it need this addition?

Contributor

Technically I think @etseidl used the code from @jhorstmann's repo as part of

So it probably doesn't hurt to have this in here 🤔

Contributor Author

Looks like it's actually contributed by @etseidl in #8530

I'm honestly not sure, let me know if this should be removed and I'll remove it. The license is also Apache anyway.

Contributor

let's see what @jhorstmann has to say

Contributor

I included a heavily modified version of @jhorstmann's code, but it's still a derivative work, so I agree with @alamb that it doesn't hurt to include

Contributor

My main contribution was probably the idea and prototype to use rust macros, with the goal to contribute that code. Then @etseidl did all the hard work of actually integrating that idea into the arrow-rs codebase.

I would be fine to leave this out of the NOTICE file since there is no code that could be considered a direct copy from that repo. I'm happy with the shout-out I got in the blog post about faster parquet parsing :)

Contributor

I think it is ok to leave this shout out in the notice. You can be forever famous (to a very select group of people)

@alamb
Contributor

alamb commented Apr 13, 2026

I had Gemini audit the new claims in NOTICE.txt and verified that they reflect the code currently incorporated in the repository. Below is the mapping of each claim to its corresponding implementation and the PR that introduced it:

| Full Claim Text (from NOTICE.txt) | Affected Code | Introduced by |
|---|---|---|
| chronoutil (MIT)<br>Copyright (c) 2020-2022 Oliver Margetts<br>https://github.com/olliemath/chronoutil | arrow-array/src/delta.rs | #2031: Add support for adding intervals to dates |
| compact-thrift (Apache 2.0)<br>Copyright Jörn Horstmann<br>https://github.com/jhorstmann/compact-thrift | parquet/src/parquet_macros.rs | #8530: Use custom thrift parser for parquet metadata (phase 1 of Thrift remodel) |
| parity-common uint (Apache 2.0 / MIT)<br>Copyright Parity Technologies<br>https://github.com/paritytech/parity-common | arrow-buffer/src/bigint/div.rs | #4663: Faster i256 Division (2-100x) |
| simdjson (Apache 2.0)<br>https://github.com/simdjson/simdjson | arrow-json/src/reader/tape.rs | #3479: Add Raw JSON Reader (~2.5x faster) |

Contributor

@alamb alamb left a comment

Thank you @michel-slm -- this is pretty great

May I ask how you did the audit (what tool)? The list looks good to me

I also had Gemini double check and I was able to find the relevant PRs and issues for these new claims

The original NOTICE.txt, I think, is left over from when this code was split from the apache/arrow repo, and this is a nice cleanup

Thank you @michel-slm

@michel-slm
Contributor Author

> Thank you @michel-slm -- this is pretty great
>
> May I ask how you did the audit (what tool)? The list looks good to me
>
> I also had gemini double check and I was able to find the relevant PRs and issues for these new claims
>
> The original NOTICE.txt i think is left over from when this code was split from the apache/arrow repo and it is a nice cleanup
>
> Thank you @michel-slm

I use Claude Opus (I declared it in the individual commits, but not in the PR itself, hope that's enough)

Should I address the two questions from @Dandandan too, @alamb?

Thanks

Code that was only "inspired by" another project probably should not be listed.

Also update LICENSE-MIT to actually match the years in the file header,
not the latest from the upstream repo

Signed-off-by: Michel Lind <salimma@fedoraproject.org>

@alamb
Contributor

alamb commented Apr 13, 2026

> I use Claude Opus (I declared it in the individual commits, but not in the PR itself, hope that's enough)

Yeah -- that is great. Nothing else is needed from my perspective.

> Should I address the two questions from @Dandandan too, @alamb ?

I defer to @Dandandan. I don't have a strong opinion either way nor do I really know how much

> Not sure if it actually uses code/the code was inspired from simdjson (or only the approach)?

The comment says "inspired by" and since simdjson is written in C++ we probably didn't use the code 🤔 But on the other hand giving credit on the safe side is probably ok too

@tustvold
Contributor

> The comment says "inspired by" and since simdjson is written in C++ we probably didn't use the code 🤔 But on the other hand giving credit on the safe side is probably ok too

It was strictly inspired by the approach of having a two-pass decoder using a tape, I don't think it needs a license attribution.

@michel-slm
Contributor Author

@Dandandan I think my last push addressed your feedback, please let me know if you would like any further changes. Thanks!

Contributor

@alamb alamb left a comment

I think it looks great -- thanks everyone

@alamb alamb merged commit b946165 into apache:main Apr 15, 2026
27 checks passed
@alamb
Contributor

alamb commented Apr 15, 2026

We can make a follow on PR if we need to make additional changes

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

NOTICE.txt is inaccurate

7 participants