Skip to content

Refactor xcom API to use shared serialisation constants#64148

Merged
amoghrajesh merged 11 commits intoapache:mainfrom
astronomer:shared-serialization
Mar 27, 2026
Merged

Refactor xcom API to use shared serialisation constants#64148
amoghrajesh merged 11 commits intoapache:mainfrom
astronomer:shared-serialization

Conversation

@amoghrajesh
Copy link
Contributor

@amoghrajesh amoghrajesh commented Mar 24, 2026

I wasn't anticipating that we would need this initially, but turns out we lose abilities of XcomEncoder / XcomDecoder if we do not use them as it was removed in: #58900. The ability lost here is disallowing the internal keys like _classname, _types etc.

However, we cannot go back to the old way of importing XcomEncoder again because it internally imports airflow.sdk.serde and we do not want to bring back the dependency of sdk in core that we worked hard to remove.

The ideal solution to this would be to move serialization format constants (classname, __type, __var, etc.)
to a new shared library and consume those across airflow-core and task-sdk. This also brings back the validation in XCom API to reject reserved serialization keys in user-provided data and update read path to avoid unnecessary deserialization.

After changes this is how it looks
image


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@amoghrajesh amoghrajesh force-pushed the shared-serialization branch from ee023d0 to b924f6f Compare March 24, 2026 09:54
@uranusjr
Copy link
Member

I wonder if it’d be easier to just use pre-commit hooks to keep two files in core and sdk in sync instead of a shared module

@amoghrajesh
Copy link
Contributor Author

I wonder if it’d be easier to just use pre-commit hooks to keep two files in core and sdk in sync instead of a shared module

Easier, yes, but I am not highly in favour of that @uranusjr, the shared library seems to be a much more maintainable approach even for such small utilities. With prek hook here, refactors etc could lose the source of truth making it harder to track and maintain over time leading to a drift risk.

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, otherwise this makes sense to me.

@amoghrajesh amoghrajesh force-pushed the shared-serialization branch from e721a59 to 643c514 Compare March 26, 2026 06:30
Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@amoghrajesh
Copy link
Contributor Author

The failures here are unrelated, like this:

  Provider yandex is not excluded.
  Ignoring existing lockfile due to change in resolution mode: `highest` vs. `lowest-direct`
  warning: The direct dependency `setuptools` is unpinned. Consider setting a lower bound when using `--resolution lowest` or `--resolution lowest-direct` to avoid using outdated versions.
  warning: The direct dependency `pyspark` is unpinned. Consider setting a lower bound when using `--resolution lowest` or `--resolution lowest-direct` to avoid using outdated versions.
  warning: The direct dependency `pydantic-ai-slim` is unpinned. Consider setting a lower bound when using `--resolution lowest` or `--resolution lowest-direct` to avoid using outdated versions.
  warning: The direct dependency `pytest` is unpinned. Consider setting a lower bound when using `--resolution lowest` or `--resolution lowest-direct` to avoid using outdated versions.
  Resolved 928 packages in 8.26s
     Building apache-airflow-providers-yandex @ file:///opt/airflow/providers/yandex
  Downloading yandexcloud (5.4MiB)
        Built apache-airflow-providers-yandex @ file:///opt/airflow/providers/yandex
   Downloaded yandexcloud
    × Failed to download and build `grpcio-tools==1.71.2`
    ├─▶ Failed to acquire lock on the distribution cache
    ├─▶ Could not acquire lock
    ╰─▶ Timeout (300s) when waiting for lock on
        `/root/.cache/uv/sdists-v9/pypi/grpcio-tools/1.71.2` at
        `/root/.cache/uv/sdists-v9/pypi/grpcio-tools/1.71.2/.lock`, is another
        uv process running? You can set `UV_LOCK_TIMEOUT` to increase the
        timeout.
    help: `grpcio-tools` (v1.71.2) was included because
          `apache-airflow-providers-yandex` (v4.4.1) depends on `yandexcloud`
          (v0.376.0) which depends on `grpcio-tools`

Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well, thanks for the improvement.
Non-blocking nits.

@amoghrajesh
Copy link
Contributor Author

Testing for confidence before merge

  1. Write path is protected now
image
  1. Read path

If I add a xcom to the table that has a classpath in it to mimic that it was allowed earlier, this is what table has

3,push_xcom_task,-1,return_value,xcom_example,manual__2026-03-27T05:43:24.337091+00:00,"{""__data__"": {""uri"": ""abcd"", ""name"": ""Hello, XCom!"", ""extra"": {}, ""group"": ""asset"", ""watchers"": []}, ""__version__"": 1, ""__classname__"": ""airflow.sdk.definitions.asset.Asset""}",2026-03-27 05:43:25.625647 +00:00

And this is what the get xcom API with deserialize=true will return

image

amoghrajesh and others added 2 commits March 27, 2026 11:19
Co-authored-by: Jason(Zhe-You) Liu <68415893+jason810496@users.noreply.github.com>
Updated test docstring to clarify purpose.
@amoghrajesh
Copy link
Contributor Author

Okay unrelated failures. Merging this.

@amoghrajesh amoghrajesh merged commit 0ed72b4 into apache:main Mar 27, 2026
277 of 280 checks passed
@amoghrajesh amoghrajesh deleted the shared-serialization branch March 27, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API area:task-sdk

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants