Pass tenant context to downstream queriers in distributed mode #8722

Open
ViciousEagle03 wants to merge 4 commits into thanos-io:main from ViciousEagle03:fix-query-tenant-8632
@ViciousEagle03

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

This PR fixes a bug in distributed query mode where the THANOS-TENANT context was not included in the outgoing gRPC metadata during the fanout to downstream query nodes.

While the StoreAPI (proxy.go) correctly propagates the tenant, the QueryAPI (remote_engine.go) was leaving the outgoing gRPC metadata empty. This PR adds the missing logic to remote_engine.go by extracting the tenant from the local context and passing it to metadata.AppendToOutgoingContext before executing the instant and range queries.

Note: I am pretty new to the Thanos codebase, so I am not sure if a test needs to be added to remote_engine_test.go since this is more like a bug in the distributed query mode. Happy to add one if it is required.

Verification

  • Validated manually by running a local distributed topology (Upstream/Downstream/Sidecar) with the --query.tenant-header flag enabled.
    Screenshots of the debug output before and after the patch follow.

Before the Patch:
As seen in the debug output below, just before the Global Querier makes the gRPC call to the downstream node, the THANOS-TENANT key is missing from the outgoing gRPC metadata.

[Screenshot: BeforePatch]

After the Patch:

The tenant ID is packed into the outgoing gRPC metadata envelope before the network call is executed.
[Screenshot: AfterPatch]

Reproduction Steps:
I had Prometheus scrape itself for metrics by running ./prometheus --config.file=prometheus.yml.
Then I spun up the following topology:

# Starting Thanos Sidecar
./.bin/thanos sidecar \
    --tsdb.path=../prometheus-2.51.0.linux-amd64/data \
    --prometheus.url=http://localhost:9090 \
    --http-address=0.0.0.0:10906 \
    --grpc-address=0.0.0.0:10905
# Starting the Downstream Querier
./.bin/thanos query \
    --http-address="0.0.0.0:10903" \
    --grpc-address="0.0.0.0:10904" \
    --query.tenant-header="X-Scope-OrgID" \
    --endpoint="127.0.0.1:10905"
# Starting the Upstream Global Querier
./.bin/thanos query \
    --http-address="0.0.0.0:10902" \
    --grpc-address="0.0.0.0:10901" \
    --query.mode=distributed \
    --query.tenant-header="X-Scope-OrgID" \
    --endpoint="127.0.0.1:10904"
  • Passed make test-local.

Fixes: 8632

Member

@GiedriusS GiedriusS left a comment

Could we add a test for this case? We try to do that to ensure that this behavior will not change accidentally in the future.

@ViciousEagle03
Author

ViciousEagle03 commented Mar 17, 2026

> Could we add a test for this case? We try to do that to ensure that this behavior will not change accidentally in the future.

Done, I've added a test case for this fix. Let me know if the approach looks right to you or if it needs any tweaks.


if tenant := ctx.Value(tenancy.TenantKey); tenant != nil {
	if tenantStr, ok := tenant.(string); ok {
		qctx = metadata.AppendToOutgoingContext(qctx, tenancy.DefaultTenantHeader, tenantStr)
Member

I also realized that this is a bit tricky - the downstream might have a different "tenant header" setting set and then this won't work, right? Seems like we should move the "tenant" field into querypb.QueryRequest and then read it in the handler to make this foolproof. What do you think?

Author

@ViciousEagle03 ViciousEagle03 Mar 23, 2026

Thanks for the review. Apologies for the delay. I was unwell, so it took me some time to get myself up and running again.

You are correct to point out that we should be using a protobuf field for the tenant ID to keep it robust.
To sum up my recent commit:

  • Added the tenant field to querypb.QueryRequest and querypb.QueryRangeRequest.
  • Updated the Query handler in pkg/api/query/grpc.go to read the tenant from the protobuf message and inject it into the local context.
  • Updated the test in remote_engine_test.go.
  • Rebased my branch onto the latest main.

…d query mode

Signed-off-by: Piyush Sharma <piyushsharma04321@gmail.com>
Signed-off-by: Piyush Sharma <piyushsharma04321@gmail.com>
Signed-off-by: Piyush Sharma <piyushsharma04321@gmail.com>
…er mismatch

Signed-off-by: Piyush Sharma <piyushsharma04321@gmail.com>
@ViciousEagle03 ViciousEagle03 force-pushed the fix-query-tenant-8632 branch from ab608b0 to 1fca96f Compare March 23, 2026 07:46
@ViciousEagle03
Author

ViciousEagle03 commented Mar 24, 2026

I've also rebased my branch onto main. It seems to have inherited some CI failures that were most likely introduced when PR #8720 was merged into main. Could you confirm?

Development

Successfully merging this pull request may close these issues.

query: tenant header not forwarded on distributed Query gRPC fanout

2 participants