
Add cache mechanism for rosidl code generation #934

Open
otamachan wants to merge 4 commits into ros2:rolling from otamachan:introduce-cache

Conversation

@otamachan

Description

Add a cache mechanism for rosidl code generation to speed up rebuilds when IDL files and templates have not changed.
This caches two stages of the rosidl pipeline (see #931 for motivation and benchmarks):

  1. IDL parsing results (rosidl_parser/parser.py): Caches the IdlFile object returned by parse_idl_file().
  2. Generated code files (rosidl_pycommon/__init__.py): Caches the output files produced by each generator (e.g., rosidl_generator_py, rosidl_generator_cpp).

The cache is opt-in and disabled by default. It is only activated when ROSIDL_CACHE_DIR or ROSIDL_CACHE_CONFIG is set.
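As a sketch, the opt-in check could look like the following (`cache_enabled` is a hypothetical helper name, not necessarily the actual function in this PR):

```python
import os

def cache_enabled() -> bool:
    # The cache is opt-in: it activates only when one of the two
    # environment variables is present in the build environment.
    return bool(os.environ.get('ROSIDL_CACHE_DIR')
                or os.environ.get('ROSIDL_CACHE_CONFIG'))
```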

Is this a user-facing behavior change?

Users can set the ROSIDL_CACHE_DIR environment variable to enable caching of rosidl code generation, which can speed up incremental builds.

Did you use Generative AI?

Claude Opus 4.6 was used for code review feedback.

Additional Information

Signed-off-by: Tamaki Nishino <otamachan@gmail.com>
@wjwwood wjwwood self-assigned this Feb 12, 2026
@mjcarroll mjcarroll self-requested a review February 14, 2026 14:56
@mjcarroll
Member

Do we also want to consider using the generator version as part of the cache key? This probably wouldn't affect most people, but for those of us who also make changes in the rosidl_generator_cpp layer or lower, we would want to make sure that changes there invalidate the cache.

@otamachan
Author

Thanks for the feedback! I agree that including the generator version in the cache key would make it more robust.

I see two possible approaches to retrieve the generator version via importlib.metadata:

Option 1: Use generator_name from the arguments filename
Since generator_name is already extracted from the arguments filename, we can use importlib.metadata.version(generator_name) directly. Simple, but the assumption that the arguments filename matches the Python package name is a convention, not enforced (e.g. hardcoded here).

Option 2: Resolve the caller module from the call stack
Use inspect to identify the module that called generate_files() and retrieve its version. This doesn't rely on the filename convention.

Does either approach sound reasonable?
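Roughly, the two options could be sketched as below. The function names and the `'unknown'` fallback policy are hypothetical, just to make the comparison concrete:

```python
from importlib import metadata
import inspect

def generator_version_from_name(generator_name: str) -> str:
    """Option 1: trust the convention that the arguments filename
    matches the installed Python distribution name."""
    try:
        return metadata.version(generator_name)
    except metadata.PackageNotFoundError:
        # Convention violated; fall back to a sentinel so the cache
        # key is still well-defined (one possible policy).
        return 'unknown'

def generator_version_from_caller() -> str:
    """Option 2: resolve the module that invoked this function
    (e.g. the generator that called generate_files())."""
    caller_frame = inspect.stack()[1].frame
    module = inspect.getmodule(caller_frame)
    if module is None:
        return 'unknown'
    top_level = module.__name__.split('.')[0]
    try:
        return metadata.version(top_level)
    except metadata.PackageNotFoundError:
        return 'unknown'
```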

    return None
try:
    with open(cache_file, 'rb') as f:
        return pickle.load(f)


Are we worried at all about unpickling being unsafe?

Author


Thanks for the feedback! To address the unpickling safety concern, we could use JSON serialization instead of pickle here. rosidl_parser.definition contains only simple data classes with primitive attributes, so by embedding a __class__ field during serialization we can implement safe deserialization without touching definition.py.

Here's the approach:

import json
import rosidl_parser.definition as _def

# Allow-list of classes that may be reconstructed: every class
# reachable from rosidl_parser.definition.
_CLASSES = {
    name: obj for name in dir(_def)
    if isinstance(obj := getattr(_def, name), type)
}

def _decode(data):
    if isinstance(data, dict):
        cls_name = data.get('__class__')
        entries = {k: _decode(v) for k, v in data.items() if k != '__class__'}
        if cls_name:
            # Rebuild the instance without calling __init__, then
            # restore its attributes from the decoded entries.
            obj = _CLASSES[cls_name].__new__(_CLASSES[cls_name])
            obj.__dict__.update(entries)
            return obj
        return entries
    if isinstance(data, list):
        return [_decode(v) for v in data]
    return data

Since json.loads only produces built-in Python types, and class reconstruction is limited to the known classes from rosidl_parser.definition, this will be safer than unpickling.
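For completeness, a matching encoder could follow the same shape. The sketch below uses a stand-in Point class instead of the real rosidl_parser.definition classes, and repeats a minimal _decode so the round trip is self-contained:

```python
import json

class Point:
    """Stand-in for a simple data class from rosidl_parser.definition."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Allow-list of reconstructible classes (the real code would build
# this from rosidl_parser.definition instead).
_CLASSES = {'Point': Point}

def _encode(obj):
    # Known classes become a dict of their attributes plus a
    # __class__ tag; lists recurse; everything else passes through.
    if type(obj).__name__ in _CLASSES:
        entries = {k: _encode(v) for k, v in vars(obj).items()}
        entries['__class__'] = type(obj).__name__
        return entries
    if isinstance(obj, (list, tuple)):
        return [_encode(v) for v in obj]
    return obj

def _decode(data):
    # Minimal copy of the decoder above, so this sketch runs standalone.
    if isinstance(data, dict):
        cls_name = data.get('__class__')
        entries = {k: _decode(v) for k, v in data.items() if k != '__class__'}
        if cls_name:
            cls = _CLASSES[cls_name]
            obj = cls.__new__(cls)
            obj.__dict__.update(entries)
            return obj
        return entries
    if isinstance(data, list):
        return [_decode(v) for v in data]
    return data

# Round trip through JSON text, as a cache file would.
restored = _decode(json.loads(json.dumps(_encode(Point(1, 2)))))
```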

Member

@wjwwood wjwwood left a comment


Thanks for putting this forward. A performance improvement on this scale would be a huge improvement.

I'm torn about whether this is the right way to implement the caching. I think the pros of this approach are that it's non-invasive and gives the user control over how much storage is used and where it's located, à la ccache.

This is good, but I can't help but think that it would be even better if we could preserve and distribute these cache files, so that users can benefit from the caching without explicitly setting it up locally. However, there are some unknowns (to me) that might make my idea impractical.

I'd like to know, if you or anyone else has an idea:

  • How big are these files
    • I saw you have a default ~1GB max size, is that arbitrary or based on experience?
    • How would switching to json from pickle impact this, if at all?
  • Does the caching take into account the source code that parses the interface files?
    • Put another way, would changing the code that parses rosidl files invalidate the cache?
    • Seems like "no" from a cursory look, but maybe I missed something

If the files are relatively small, and we can somehow work the hash of the parsing code into the cache key, then I think we could do something like:

  • Check paths in each AMENT_PREFIX_PATH for cached files first
    • Cache miss might be due to no cache files found, a change in the rosidl parser invalidates it, or a local change (e.g. someone sudo-edited a .msg file in /opt)
  • If not found, look in the build folder for the current package for cached files
  • Finally if neither of those have a cache hit, parse the file and store it in the build folder
  • Then afterwards, install cache files for interface files of the current package so they can be found by future packages.

This would move this idea towards building/installing precompiled headers or something akin to .pyc files, rather than a model that more closely resembles ccache (in my opinion).
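The lookup order described above might be sketched like this (the cache file layout, directory names, and helper name are all hypothetical):

```python
import os
from pathlib import Path

def find_cached(cache_key: str, build_dir: str):
    """Return the path of a cached file for cache_key, or None on a miss."""
    name = f'{cache_key}.json'  # hypothetical cache filename scheme
    # 1. Installed caches from dependency packages, via AMENT_PREFIX_PATH.
    for prefix in os.environ.get('AMENT_PREFIX_PATH', '').split(os.pathsep):
        if not prefix:
            continue
        candidate = Path(prefix) / 'share' / 'rosidl_cache' / name
        if candidate.exists():
            return candidate
    # 2. The current package's build folder.
    candidate = Path(build_dir) / 'rosidl_cache' / name
    if candidate.exists():
        return candidate
    # 3. Cache miss: the caller parses the IDL file, stores the result
    #    in the build folder, and installs it for future packages.
    return None
```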

If there's no appetite to do this, then this PR as-is is certainly better than what we have now. I can imagine pushback, because what I described would be more complicated, requiring CMake changes and wiring those installed files into the command-line tool so it can find them.

But I'm interested to hear what others think of what I've said here.

@wjwwood
Member

wjwwood commented Feb 19, 2026

One other thought, which is more of a "yes and...": if we can get such a big win by caching the IDL parsing (which itself requires cache-hit checking and then parsing the cache file), is there perhaps some low-hanging fruit in the IDL parsing code that could speed that process up considerably as well? It's already useful to know that this part of the process is so expensive, so thanks again @otamachan.

@wjwwood
Member

wjwwood commented Feb 19, 2026

To answer my own question, I see now from the original issue (#931 (comment)) that the cache size (for one package?) is about 72MB, which is bigger than I expected, to be honest. Too big, I would say, to distribute the cache without more thought.

I'm curious though, based on your comment #934 (comment), whether the JSON approach would be a lot smaller, since you're theoretically storing a lot less than blindly pickling the whole data structure.

@otamachan
Author

otamachan commented Feb 20, 2026

Thank you for the thoughtful feedback and for considering alternative caching strategies; I really appreciate it.

Cache file size:

The 72MB reported in the original issue is for px4_msgs, a package with over 200 message files. That total includes caches for all generators, not just IDL parsing. The idl_parse cache itself is relatively small, roughly 2KB to 16KB per message.

$ du --max-depth=1 . -h
2.9M    ./rosidl_typesupport_c
7.0M    ./rosidl_typesupport_introspection_cpp
2.2M    ./idl_parse
11M     ./rosidl_generator_cpp
8.3M    ./rosidl_generator_py
7.2M    ./rosidl_typesupport_introspection_c
9.1M    ./rosidl_typesupport_fastrtps_cpp
3.0M    ./rosidl_typesupport_cpp
8.5M    ./rosidl_typesupport_fastrtps_c
14M     ./rosidl_generator_c
72M     .

Default max size (~1GB):

This is an arbitrary value, loosely based on ccache's default of 5GB. I chose a smaller default since rosidl cache files are generally much smaller than compiled object files.

JSON vs pickle size impact:

I haven't measured this yet but plan to. Since the idl_parse cache is already small (2.2MB for 200+ messages), I don't expect a significant difference either way.

Parser source code in cache key:

You're right — currently the cache key only includes the IDL file content, so changes to the parser itself would not invalidate the cache. As discussed in #934 (comment), I think including the parser package version in the cache key (similar to what was suggested for generators) would be a good way to address this.

On the ccache vs install/distribute approach

I think the .pyc-like approach of storing cache files in the build/install space and looking them up via AMENT_PREFIX_PATH is a good idea; it would allow users to benefit from caching without any explicit setup.

A few things I'd like to share for consideration:

  • CI use case: For CI workflows that build everything from source, the ccache-like approach is straightforward — I just persist and restore a single cache directory between runs. With the install-based approach, CI would need to cache the install space for each package, which adds a bit more complexity.
  • Generator caching: I'd also like to cache the results of code generators (rosidl_generator_c, rosidl_generator_cpp, etc.), not just the IDL parsing. The ccache-like approach handles both uniformly.
  • Future-proofing: If a faster parser or generator is developed in the future, the caching layer may become unnecessary or need changes. The ccache-like approach is less invasive (Python-only, no CMake changes), which makes it easier to adopt now and remove later if needed.

I'd be happy to hear your thoughts on this.

@otamachan
Author

otamachan commented Feb 22, 2026

Added two commits on top of the existing PR:

  • 482886d Replace pickle with JSON for cache serialization
  • 47707c4 Include package versions in cache key for automatic invalidation on upgrades

Benchmark results (px4_msgs, -j8, ccache + rosidl cache):

  • Build time: pickle 4.4s → JSON 3.9s (comparable)
  • Cache size: pickle 72M → JSON 75M (+4%)

