@ntjohnson1 (Member) commented Jan 5, 2026

Related

RR-3088

Most of this is just moving the files from our private repo where we initially debugged these. https://github.com/rerun-io/dataplatform-examples

What

  • Takes our standard set of dataplatform examples and runs them as snippets using the open source server
    • Checks in small datasets to compare with
  • Don't yell about snippets generating empty RRDs. Our snippet comparison is super painful. Will file an issue to clean this up later. For now, add rust/cpp to don't run and add all languages to don't compare, and also add files to backwards check to run but don't verify against rrd 😢

@ntjohnson1 added the labels 📖 documentation (Improvements or additions to documentation), exclude from changelog (PRs with this won't show up in CHANGELOG.md), and deploy docs (Once this PR is merged to main, the resulting commit will be cherry-picked to docs-latest) Jan 5, 2026
@ntjohnson1 requested a review from Copilot January 5, 2026 20:55
github-actions bot commented Jan 5, 2026

Latest documentation preview deployed successfully.

| Commit | Link |
| --- | --- |
| aca1b5f | https://landing-7sjmmqptn-rerun.vercel.app/docs |

Note: This comment is updated whenever you push a commit.

github-actions bot commented Jan 5, 2026

Your changes cannot be automatically cherry-picked to docs-latest.

You should remove the deploy docs label and perform the cherry-pick manually after merging.

Copilot AI (Contributor) left a comment

Pull request overview

This PR integrates dataplatform examples into the standard documentation by adding snippet tests that run against a local open-source server with checked-in test datasets from the DROID robotics dataset.

Key changes:

  • Adds four new Python example snippets demonstrating dataplatform query operations (dataframe operations, image queries, time alignment, and view operations)
  • Checks in test RRD files from the DROID dataset to enable testing without external dependencies
  • Modifies the snippet testing infrastructure to handle examples that don't generate RRD output files

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/assets/rrd/video_sample/simple_frame_data.rrd` | Adds Git LFS tracked video sample data for testing |
| `tests/assets/rrd/sample_5/*.rrd` | Adds Git LFS tracked DROID dataset episodes for testing |
| `tests/assets/rrd/sample_5/README.md` | Documents the DROID dataset source and licensing |
| `tests/assets/rrd/*.rrd` | Adds Git LFS tracked DROID dataset files to root test directory |
| `pixi.toml` | Adds pandas dependency required for dataplatform examples |
| `docs/snippets/snippets.toml` | Configures new dataplatform examples to skip RRD generation checks and exclude from certain language implementations |
| `docs/snippets/compare_snippet_output.py` | Removes RRD validation checks to support examples that don't generate output files |
| `docs/snippets/all/howto/*.py` | Adds four new Python example files demonstrating dataplatform query operations |
| `docs/content/howto/cloud/*.md` | Adds documentation pages for the new dataplatform examples |
| `docs/content/howto/cloud.md` | Adds cloud section redirect to documentation |
| `docs/snippets/INDEX.md` | Updates snippet index to include new dataplatform examples |
| `crates/build/re_types_builder/src/lib.rs` | Fixes orphan file handling to preserve specific redirect files during code generation |

github-actions bot commented Jan 5, 2026

Web viewer built successfully.

| Commit | Link | Manifest |
| --- | --- | --- |
| aca1b5f | https://rerun.io/viewer/pr/12317 | +nightly +main |

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

@ntjohnson1 force-pushed the nick/dataplatform_examples branch from 60f8f75 to 3e0be95 January 6, 2026 13:19
@ntjohnson1 removed the deploy docs label Jan 6, 2026
@ntjohnson1 marked this pull request as ready for review January 6, 2026 13:30
@ntjohnson1 added the do-not-merge (Do not merge this PR) label Jan 6, 2026
@ntjohnson1 marked this pull request as draft January 6, 2026 16:08
@ntjohnson1 (Member, Author)

Moving this back to draft because we are exploring shuffling our docs a bit. Right now this adds a `cloud` section, which is a bad name, but there is no reason to figure out a better one if we are about to reorganize.

@ntjohnson1 removed the do-not-merge label Jan 7, 2026
@ntjohnson1 marked this pull request as ready for review January 7, 2026 14:44
@ntjohnson1 added the do-not-merge label Jan 7, 2026
@ntjohnson1 (Member, Author)

Ugh preview doesn't load. Missing a link. Back to do not merge but otherwise this should be ok

@ntjohnson1 removed the do-not-merge label Jan 7, 2026
@ntjohnson1 force-pushed the nick/dataplatform_examples branch from 651474d to 0ff8b79 January 8, 2026 19:26
@ntjohnson1 force-pushed the nick/dataplatform_examples branch from a13e3d1 to ddf26f8 January 8, 2026 21:53
@ntjohnson1 added the do-not-merge label Jan 9, 2026
@ntjohnson1 (Member, Author)

Moved this back to do not merge since the shuffle got delayed but is coming soon.

@ntjohnson1 removed the do-not-merge label Jan 9, 2026

@abey79 (Member) left a comment

Great addition overall. I did find some rough edges here and there, plus some proposals to improve the snippets. I'll let you be the final judge, as I don't want perfection to get in the way of iterative improvement.

Comment on lines +15 to +16
CATALOG_URL = server.address()
client = rr.catalog.CatalogClient(CATALOG_URL)
Member

Any reason you are not using `client = server.client()`?

Member Author

Yes, because this approach is the more general one. For instance, if you are running a long-running local server or connecting to the cloud, the diff is smaller.
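
To make the "smaller diff" argument concrete, here is a minimal sketch that is not part of the PR. The `resolve_catalog_url` helper, the `CATALOG_URL_OVERRIDE` variable name, and the placeholder addresses are all hypothetical; the only idea taken from the snippet under review is that the client is built from a URL, so switching targets only changes where that URL comes from.

```python
import os


def resolve_catalog_url(local_address, env=None):
    """Pick the catalog URL to connect to (hypothetical helper).

    Prefers an explicitly configured URL, for example a long-running local
    server or a cloud deployment, and otherwise falls back to the address
    of a throwaway local server.
    """
    env = os.environ if env is None else env
    # CATALOG_URL_OVERRIDE is an assumed name, not a real Rerun setting.
    return env.get("CATALOG_URL_OVERRIDE") or local_address


# Placeholder addresses; real ones would come from the server or deployment.
LOCAL = "placeholder://local-test-server"
SHARED = "placeholder://shared-deployment"

# Nothing configured: the local test server address is used.
url_for_tests = resolve_catalog_url(LOCAL, env={})

# Override configured: only the URL changes, the client construction
# that follows would stay exactly the same.
url_for_shared = resolve_catalog_url(LOCAL, env={"CATALOG_URL_OVERRIDE": SHARED})
```

With this shape, the line that constructs the client from the URL is identical in both cases, which is presumably the "smaller diff" being referred to.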

CATALOG_URL = server.address()
client = rr.catalog.CatalogClient(CATALOG_URL)
dataset = client.get_dataset(name="sample_dataset")
observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time")
Member

This line deserves a comment since there is no explanation around it in the doc page.

Suggested change
observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time")
# Obtain the observations as `datafusion.DataFrame`. This is cheap because DataFusion's dataframes are lazy query plans, aka no data is downloaded or processed yet.
observations = dataset.filter_contents(["/observation/**"]).reader(index="real_time")
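
For readers unfamiliar with the "lazy query plan" idea in the suggested comment, a toy illustration may help. This is a stdlib-only sketch under stated assumptions: `LazyFrame` and `load_rows` are hypothetical names, and none of this is the DataFusion or Rerun API. It only shows the general pattern of recording operations first and touching the data when results are requested.

```python
class LazyFrame:
    """Toy stand-in for a lazy dataframe: records steps, runs them on collect()."""

    def __init__(self, load, steps=()):
        self._load = load            # callable that fetches the rows
        self._steps = tuple(steps)   # recorded filter predicates

    def filter(self, predicate):
        # Building the plan is cheap: no rows are loaded here.
        return LazyFrame(self._load, self._steps + (predicate,))

    def collect(self):
        # Only now is the data fetched and the recorded plan executed.
        rows = self._load()
        for predicate in self._steps:
            rows = [row for row in rows if predicate(row)]
        return rows


load_calls = []


def load_rows():
    """Pretend data source that records every time it is actually read."""
    load_calls.append("loaded")
    return [{"t": 0, "x": 1.0}, {"t": 1, "x": 2.5}, {"t": 2, "x": 4.0}]


frame = LazyFrame(load_rows).filter(lambda row: row["x"] > 2.0)
loads_before_collect = len(load_calls)  # the plan exists, nothing was read yet
result = frame.collect()                # the single read happens here
```

The point of the sketch is that constructing `frame` does no work; the cost is paid once, at `collect()`.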

Member Author

Hmm I'd rather NOT have the explanation in the how to. We definitely need to add the explanation but that feels like it fits a different section

In order to more narrowly specify relevant content for further dataframe operations you first generate a view.
This view can filter on episode, time, column name etc.
This example shows specific instances highlighting these capabilities.

Member

It would be nice to have a sentence that solidifies what we mean by "view" exactly, i.e. the fact that views apply to the dataset before it is "reshaped" into a dataframe-looking thing by the `reader()` operation.
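
A toy model of that distinction, as a sketch only. The `ToyDataset` class, its data layout, and the use of `fnmatch` to approximate the `/observation/**` pattern are assumptions for illustration; only the method names `filter_contents` and `reader(index=...)` come from the snippet under review, and the real implementations will differ in detail.

```python
from fnmatch import fnmatch


class ToyDataset:
    """Hypothetical dataset: entity path -> list of (index value, payload)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def filter_contents(self, patterns):
        # A "view": still a dataset, just restricted to matching entity paths.
        kept = {
            path: samples
            for path, samples in self._columns.items()
            if any(fnmatch(path, pattern) for pattern in patterns)
        }
        return ToyDataset(kept)

    def reader(self, index):
        # The reshaping step: turn the (filtered) dataset into table-like rows,
        # one row per index value, one column per entity path.
        rows = {}
        for path, samples in self._columns.items():
            for value, payload in samples:
                rows.setdefault(value, {index: value})[path] = payload
        return [rows[value] for value in sorted(rows)]


dataset = ToyDataset({
    "/observation/joint": [(0, 0.1), (1, 0.2)],
    "/action/gripper": [(0, "open"), (1, "closed")],
})

view = dataset.filter_contents(["/observation/**"])  # filter first ...
table = view.reader(index="real_time")               # ... then reshape
```

In this model the view never sees rows or columns; it narrows what the dataset contains, and `reader()` is the only step that produces something dataframe-shaped.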

Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
@@ -0,0 +1,44 @@
---
title: Common Dataframe Operations
Member Author

Hmm I tried to split out different things to do with dataframes (joins, aggregations, etc). But without context it feels meaningless? I think Gijs has some follow ups on all this stuff that he is waiting until this lands to add.

ntjohnson1 and others added 2 commits January 12, 2026 11:50
Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
@ntjohnson1 (Member, Author)

Annoying fun fact: If you mark a conversation as resolved it will no longer let you apply suggestions from the gui

@ntjohnson1 merged commit 5aeb0a6 into main Jan 12, 2026
44 checks passed
@ntjohnson1 deleted the nick/dataplatform_examples branch January 12, 2026 20:37
ntjohnson1 added a commit that referenced this pull request Jan 12, 2026
### Related
RR-3088

Most of this is just moving the files from our private repo where we
initially debugged these.
https://github.com/rerun-io/dataplatform-examples

### What
* Takes our standard set of dataplatform examples and runs them as
snippets using the open source server
   * Checks in small datasets to compare with
* Don't yell about snippets generating empty RRDs. Our snippet
comparison is super painful. Will file an issue to clean this up later.
For now, add rust/cpp to don't run and add all languages to don't
compare, and also add files to backwards check to run but don't verify
against rrd 😢

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Antoine Beyeler <49431240+abey79@users.noreply.github.com>
ntjohnson1 added a commit that referenced this pull request Jan 14, 2026
Wumpf pushed a commit that referenced this pull request Jan 15, 2026
Labels

📖 documentation: Improvements or additions to documentation
exclude from changelog: PRs with this won't show up in CHANGELOG.md
