perf: Parallelize DynamoDB batch reads in sync online_read #6024

Open
abhijeet-dhumal wants to merge 3 commits into feast-dev:master from abhijeet-dhumal:perf/parallel-dynamodb-batch-reads
Conversation

@abhijeet-dhumal
Contributor

@abhijeet-dhumal abhijeet-dhumal commented Feb 25, 2026

Summary

Execute DynamoDB BatchGetItem requests in parallel using ThreadPoolExecutor instead of sequentially. This significantly reduces latency when reading features for many entities that span multiple batches.

Changes

  • Pre-split entity IDs into batches upfront
  • Use ThreadPoolExecutor to execute batch requests concurrently
  • Skip parallelization for single batch (no overhead)
  • Merge results in original order after parallel fetch
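The pre-split step above can be sketched as follows (`split_into_batches` is a hypothetical helper name for illustration; the actual code lives in dynamodb.py):

```python
from itertools import islice
from typing import Iterable, List


def split_into_batches(entity_ids: Iterable[str], batch_size: int) -> List[List[str]]:
    """Materialize all batches upfront so they can be fetched concurrently."""
    it = iter(entity_ids)
    batches = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        batches.append(batch)
    return batches
```

Because the batches are built in order, merging their responses back in list order preserves the original entity order.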

Expected Behavior

For multiple batches, DynamoDB BatchGetItem requests should execute in parallel, reducing total latency from N × network_latency to approximately 1 × network_latency.
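As a rough latency model (an illustration, not measured numbers; batches run in waves of at most `max_workers` at a time):

```python
import math


def estimated_latency_ms(n_batches: int, per_call_ms: float, max_workers: int) -> float:
    """Total latency when batches execute in waves of at most max_workers."""
    waves = math.ceil(n_batches / max_workers)
    return waves * per_call_ms


# 5 batches at ~20 ms per call
sequential = estimated_latency_ms(5, 20.0, max_workers=1)   # 100.0 ms
parallel = estimated_latency_ms(5, 20.0, max_workers=10)    # 20.0 ms
```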

Current Behavior

The sync online_read method executes batch requests sequentially in a while loop:

while True:
    batch = list(itertools.islice(entity_ids_iter, batch_size))
    if len(batch) == 0:
        break
    response = dynamodb_resource.batch_get_item(RequestItems=...)  # Sequential!
    result.extend(batch_result)  # batch_result parsed from response (parsing elided)

For 500 entities with batch_size=100, this makes 5 sequential network calls.

Steps to Reproduce

  1. Configure DynamoDB online store with batch_size=100
  2. Call get_online_features for 500 entities
  3. Profile network latency - observe 5 sequential calls

Specifications

  • Version: 0.47.0+
  • Platform: All
  • Subsystem: sdk/python/feast/infra/online_stores/dynamodb.py

Performance Impact

For 500 entities with batch_size=100 (5 batches):

  • Before: 5 sequential network calls = 50-150ms
  • After: 5 parallel network calls = 10-30ms
  • Estimated savings: 40-120ms for large entity sets

Possible Solution

Already implemented in this PR using ThreadPoolExecutor:

with ThreadPoolExecutor(max_workers=min(len(batches), batch_size)) as executor:
    responses = list(executor.map(fetch_batch, batches))
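A self-contained sketch of the overall pattern, with a stubbed `fetch_batch` standing in for the real DynamoDB call (`fetch_all` is a hypothetical name). `executor.map` returns results in submission order, which gives the order-preserving merge for free:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(batches: List[List[str]], fetch_batch: Callable[[List[str]], Dict]) -> List[Dict]:
    """Fetch every batch, in parallel when there is more than one."""
    if len(batches) <= 1:
        # Single batch: skip the executor to avoid thread overhead
        return [fetch_batch(b) for b in batches]
    max_workers = min(len(batches), 10)  # cap per the PR's diff
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order batches were submitted
        return list(executor.map(fetch_batch, batches))
```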

Related

  • RHOAIENG-46061 (60ms p99 SLA target for online feature serving)
  • Note: The async path (online_read_async) already uses asyncio.gather() for parallel execution


@abhijeet-dhumal abhijeet-dhumal requested a review from a team as a code owner February 25, 2026 14:47
@abhijeet-dhumal abhijeet-dhumal changed the title perf: parallelize DynamoDB batch reads in sync online_read perf: Parallelize DynamoDB batch reads in sync online_read Feb 25, 2026
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
@abhijeet-dhumal abhijeet-dhumal force-pushed the perf/parallel-dynamodb-batch-reads branch from 2a3e9b8 to f75c446 Compare February 25, 2026 14:49
_get_table_name(online_config, config, table)
)
table_name = _get_table_name(online_config, config, table)
table_instance = dynamodb_resource.Table(table_name)
Member

table_instance is only used in the single-batch path, and seems unnecessary since table_name can be used instead of table_instance.name


# Use ThreadPoolExecutor for parallel I/O
# Cap at 10 workers to avoid excessive thread creation
max_workers = min(len(batches), 10)
Member

I think this should be configurable (max_read_workers) in online store configs

# Execute batch requests in parallel for multiple batches
# Note: boto3 resources are NOT thread-safe, so we create a new resource per thread
def fetch_batch(batch: List[str]) -> Dict[str, Any]:
    thread_resource = _initialize_dynamodb_resource(
Member

I think here we are creating a new session on each thread. Instead we can share a single _dynamodb_client across all threads.

def fetch_batch(batch: List[str]) -> Dict[str, Any]:
    batch_entity_ids = self._to_client_batch_get_payload(
        online_config, table_name, batch
    )
    return dynamodb_client.batch_get_item(RequestItems=batch_entity_ids)

https://docs.aws.amazon.com/boto3/latest/guide/clients.html#multithreading-or-multiprocessing-with-clients
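Per the linked boto3 guidance, low-level clients are safe to share across threads (unlike resources and sessions). A minimal sketch of the shared-client pattern, using a stub class in place of a real boto3 client (the request payload shape is illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class StubDynamoDBClient:
    """Stand-in for a boto3 DynamoDB client, which is thread-safe to share."""

    def batch_get_item(self, RequestItems: Dict) -> Dict:
        # Echo the request back; a real client would call DynamoDB here
        return {"Responses": RequestItems}


shared_client = StubDynamoDBClient()  # created once, shared by all worker threads


def fetch_batch(batch: List[str]) -> Dict:
    payload = {"my_table": {"Keys": batch}}  # illustrative payload shape
    return shared_client.batch_get_item(RequestItems=payload)


batches = [["a", "b"], ["c", "d"]]
with ThreadPoolExecutor(max_workers=2) as executor:
    responses = list(executor.map(fetch_batch, batches))
```

Sharing one client avoids the per-thread session setup cost that creating a new resource in each `fetch_batch` call would incur.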

