Put dataplatform examples in standard docs #12317
|
Latest documentation preview deployed successfully.
Note: This comment is updated whenever you push a commit. |
|
Your changes cannot be automatically cherry-picked to You should remove the |
Pull request overview
This PR integrates dataplatform examples into the standard documentation by adding snippet tests that run against a local open-source server with checked-in test datasets from the DROID robotics dataset.
Key changes:
- Adds four new Python example snippets demonstrating dataplatform query operations (dataframe operations, image queries, time alignment, and view operations)
- Checks in test RRD files from the DROID dataset to enable testing without external dependencies
- Modifies the snippet testing infrastructure to handle examples that don't generate RRD output files
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/assets/rrd/video_sample/simple_frame_data.rrd | Adds Git LFS tracked video sample data for testing |
| tests/assets/rrd/sample_5/*.rrd | Adds Git LFS tracked DROID dataset episodes for testing |
| tests/assets/rrd/sample_5/README.md | Documents the DROID dataset source and licensing |
| tests/assets/rrd/*.rrd | Adds Git LFS tracked DROID dataset files to root test directory |
| pixi.toml | Adds pandas dependency required for dataplatform examples |
| docs/snippets/snippets.toml | Configures new dataplatform examples to skip RRD generation checks and exclude from certain language implementations |
| docs/snippets/compare_snippet_output.py | Removes RRD validation checks to support examples that don't generate output files |
| docs/snippets/all/howto/*.py | Adds four new Python example files demonstrating dataplatform query operations |
| docs/content/howto/cloud/*.md | Adds documentation pages for the new dataplatform examples |
| docs/content/howto/cloud.md | Adds cloud section redirect to documentation |
| docs/snippets/INDEX.md | Updates snippet index to include new dataplatform examples |
| crates/build/re_types_builder/src/lib.rs | Fixes orphan file handling to preserve specific redirect files during code generation |
|
Web viewer built successfully.
View image diff on kitdiff. Note: This comment is updated whenever you push a commit. |
60f8f75 to
3e0be95
Compare
|
Moving this back to draft because we are exploring shuffling our docs a bit. Right now this added |
|
Ugh preview doesn't load. Missing a link. Back to do not merge but otherwise this should be ok |
651474d to
0ff8b79
Compare
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
a13e3d1 to
ddf26f8
Compare
|
Moved this back to do not merge since the shuffle got delayed but is soon |
abey79
left a comment
Great addition overall. I did find some rough edges here and there, plus some proposals to improve the snippets. I'll let you be the final judge, as I don't want perfection to get in the way of iterative improvement.
| CATALOG_URL = server.address() | ||
| client = rr.catalog.CatalogClient(CATALOG_URL) |
Any reason you are not using `client = server.client()`?
Yes, because this approach is the more general one. For instance, if you are running a long-running local server or connecting to the cloud, the diff is smaller.
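To make the "smaller diff" argument concrete, here is a minimal sketch using stand-in classes rather than the real rerun API. `FakeServer`, `FakeClient`, `open_catalog`, and the URL value are all hypothetical and exist only for illustration. The point is that code written against a URL changes in a single place when the server is no longer the one started by the snippet.

```python
from dataclasses import dataclass


@dataclass
class FakeClient:
    """Hypothetical stand-in for a catalog client; it only remembers its URL."""

    url: str


class FakeServer:
    """Hypothetical stand-in for a local server started by the snippet."""

    def address(self) -> str:
        # Made-up address, used only for this illustration.
        return "http://localhost:51234"

    def client(self) -> FakeClient:
        # Shortcut that is only available when you hold the server object.
        return FakeClient(self.address())


def open_catalog(url: str) -> FakeClient:
    # URL-based construction works the same for any server, local or remote.
    return FakeClient(url)


server = FakeServer()
CATALOG_URL = server.address()  # replace with another URL; nothing else changes
client = open_catalog(CATALOG_URL)
```

In this toy version both paths yield an equivalent client, but only the URL-based one carries over unchanged to a server the snippet did not start itself.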
| CATALOG_URL = server.address() | ||
| client = rr.catalog.CatalogClient(CATALOG_URL) | ||
| dataset = client.get_dataset(name="sample_dataset") | ||
| observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time") |
This line deserves a comment since there is no explanation around it in the doc page.
| observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time") | |
| # Obtain the observations as `datafusion.DataFrame`. This is cheap because DataFusion's dataframes are lazy query plans, aka no data is downloaded or processed yet. | |
| observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time") |
Hmm, I'd rather NOT have the explanation in the how-to. We definitely need to add the explanation, but it feels like it belongs in a different section.
| In order to more narrowly specify relevant content for further dataframe operations you first generate a view. | ||
| This view can filter on episode, time, column name etc. | ||
| This example shows specific instances highlighting these capabilities. | ||
|
|
Would be nice to have a sentence to help solidify what we mean by "view" exactly. That is, the fact that views apply to the dataset before it is "reshaped" into a dataframe-looking thing by the `reader()` operation.
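A toy sketch of that distinction, in case it helps with the wording. `ToyView` is a hypothetical stand-in and not the rerun API; it only borrows the method names `filter_contents` and `reader` from the snippet under review. Filtering returns another view over the same raw records, and only `reader()` turns the selected records into table-like rows.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ToyView:
    """Hypothetical stand-in for a dataset view (not the rerun API)."""

    records: tuple  # raw per-entity records, not yet tabular
    patterns: tuple = ("/**",)

    def filter_contents(self, patterns):
        # Narrowing returns another view over the same raw records.
        return ToyView(self.records, tuple(patterns))

    def reader(self, index):
        # Only this step reshapes the selected records into table-like rows.
        keep = [
            r for r in self.records
            if any(fnmatchcase(r["entity"], p) for p in self.patterns)
        ]
        return sorted(
            ({index: r[index], "entity": r["entity"], "value": r["value"]} for r in keep),
            key=lambda row: row[index],
        )


dataset = ToyView(records=(
    {"entity": "/observation/joint", "real_time": 2.0, "value": 0.5},
    {"entity": "/action/gripper", "real_time": 1.0, "value": 1.0},
    {"entity": "/observation/camera", "real_time": 1.0, "value": 7.0},
))
view = dataset.filter_contents(["/observation/**"])  # still a view
observations = view.reader(index="real_time")  # now a list of row dicts
```

Here `view` is still a `ToyView`, while `observations` is a plain list of rows ordered by the chosen index. How the real implementation represents views internally is not something this sketch claims to show.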
Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
| @@ -0,0 +1,44 @@ | |||
| --- | |||
| title: Common Dataframe Operations | |||
Hmm, I tried to split out different things to do with dataframes (joins, aggregations, etc.), but without context it feels meaningless? I think Gijs has some follow-ups on all this stuff that he is holding until this lands.
Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
|
Annoying fun fact: if you mark a conversation as resolved, it will no longer let you apply suggestions from the GUI |
…n into nick/dataplatform_examples
### Related

RR-3088

Most of this is just moving the files from our private repo where we initially debugged these. https://github.com/rerun-io/dataplatform-examples

### What

* Takes our standard set of dataplatform examples and runs them as snippets using the open source server
* Checks in small datasets to compare with
* Don't yell about snippets generating empty RRDs. Our snippet comparison is super painful. Will file an issue to clean this up later. For now, add rust/cpp to don't run and add all languages to don't compare, and also add files to backwards check to run but don't verify against rrd 😢

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
Related
RR-3088
Most of this is just moving the files from our private repo where we initially debugged these. https://github.com/rerun-io/dataplatform-examples
What