perf: Parallelize DynamoDB batch reads in sync online_read #6024
Open
abhijeet-dhumal wants to merge 3 commits into feast-dev:master from
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
2a3e9b8 to f75c446
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
…B reads
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
ntkathole reviewed Feb 25, 2026
```python
    _get_table_name(online_config, config, table)
)
table_name = _get_table_name(online_config, config, table)
table_instance = dynamodb_resource.Table(table_name)
```
Member
`table_instance` is only used in the single-batch path and seems unnecessary, since `table_name` can be used instead of `table_instance.name`.
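A minimal sketch of a single-batch read keyed only by `table_name`, assuming a low-level boto3 DynamoDB client and a string hash key named `entity_id`. `build_batch_get_request` and `read_single_batch` are hypothetical helpers for illustration, not Feast's API.

```python
from typing import Any, Dict, List


def build_batch_get_request(
    table_name: str, entity_ids: List[str], consistent_read: bool = False
) -> Dict[str, Any]:
    """Build a BatchGetItem RequestItems payload keyed by the table name.

    Assumption: items are keyed by a string attribute named "entity_id".
    The low-level client API expects typed attribute values ({"S": ...}).
    """
    return {
        table_name: {
            "Keys": [{"entity_id": {"S": entity_id}} for entity_id in entity_ids],
            "ConsistentRead": consistent_read,
        }
    }


def read_single_batch(
    client: Any, table_name: str, entity_ids: List[str]
) -> List[Dict[str, Any]]:
    """Single-batch path: only the table name is needed, no Table resource."""
    response = client.batch_get_item(
        RequestItems=build_batch_get_request(table_name, entity_ids)
    )
    # BatchGetItem groups results by table name, so table_name is also the lookup key.
    return response.get("Responses", {}).get(table_name, [])
```

Because the response is already keyed by table name, nothing in this path needs a `Table` object.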
ntkathole reviewed Feb 25, 2026
```python
# Use ThreadPoolExecutor for parallel I/O
# Cap at 10 workers to avoid excessive thread creation
max_workers = min(len(batches), 10)
```
Member
I think this should be configurable (`max_read_workers`) in the online store config.
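A minimal sketch of what a configurable cap could look like. `max_read_workers` is the name proposed in this review; the `DynamoDBReadSettings` class and its default of 10 (mirroring the hard-coded cap) are hypothetical. Feast's online store configs are pydantic models, so the real field would be declared there; a dataclass keeps this sketch dependency-free.

```python
from dataclasses import dataclass


@dataclass
class DynamoDBReadSettings:
    """Hypothetical stand-in for the DynamoDB online store config section."""

    # Proposed option: upper bound on threads used for parallel BatchGetItem calls.
    max_read_workers: int = 10

    def __post_init__(self) -> None:
        if self.max_read_workers < 1:
            raise ValueError("max_read_workers must be >= 1")

    def workers_for(self, num_batches: int) -> int:
        # Never start more threads than there are batches, nor more than the cap.
        # max(1, ...) guards against ThreadPoolExecutor(max_workers=0), which raises.
        return max(1, min(num_batches, self.max_read_workers))
```

`max_workers = min(len(batches), 10)` would then become something like `settings.workers_for(len(batches))`, leaving the default behavior unchanged while letting deployments tune thread usage.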
ntkathole reviewed Feb 25, 2026
```python
# Execute batch requests in parallel for multiple batches
# Note: boto3 resources are NOT thread-safe, so we create a new resource per thread
def fetch_batch(batch: List[str]) -> Dict[str, Any]:
    thread_resource = _initialize_dynamodb_resource(
```
Member
I think we are creating a new session on each thread here. Instead, we can share a single `_dynamodb_client` across all threads:
```python
def fetch_batch(batch: List[str]) -> Dict[str, Any]:
    batch_entity_ids = self._to_client_batch_get_payload(
        online_config, table_name, batch
    )
    return dynamodb_client.batch_get_item(RequestItems=batch_entity_ids)
```
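A self-contained sketch of the suggested pattern: one shared client, one `BatchGetItem` per batch, fanned out over a `ThreadPoolExecutor`. boto3 documents low-level clients as thread-safe (unlike resources and sessions), which is what makes sharing a single client reasonable. `parallel_batch_get`, `chunk`, and the `entity_id` key layout are hypothetical, not Feast's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def chunk(ids: List[str], batch_size: int) -> List[List[str]]:
    """Split entity ids into BatchGetItem-sized batches (DynamoDB allows up to 100 keys)."""
    return [ids[i : i + batch_size] for i in range(0, len(ids), batch_size)]


def parallel_batch_get(
    client: Any,
    table_name: str,
    entity_ids: List[str],
    batch_size: int = 100,
    max_workers: int = 10,
) -> List[Dict[str, Any]]:
    """Issue one BatchGetItem per batch concurrently through a shared client."""
    batches = chunk(entity_ids, batch_size)
    if not batches:
        return []

    def fetch_batch(batch: List[str]) -> Dict[str, Any]:
        # Assumption: string hash key named "entity_id" (low-level typed value).
        request = {table_name: {"Keys": [{"entity_id": {"S": eid}} for eid in batch]}}
        return client.batch_get_item(RequestItems=request)

    workers = min(len(batches), max_workers)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # map() yields results in submission order, so responses line up with batches.
        return list(executor.map(fetch_batch, batches))
```

This sketch does not retry `UnprocessedKeys`; a production version would need to handle them per batch, just as a sequential loop would.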
Summary
Execute DynamoDB BatchGetItem requests in parallel using ThreadPoolExecutor instead of sequentially. This significantly reduces latency when reading features for many entities that span multiple batches.
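A back-of-the-envelope model of where the saving comes from, assuming each `BatchGetItem` costs roughly one network round trip and ignoring throttling and `UnprocessedKeys` retries. The helper names are hypothetical; this counts round trips and is not a benchmark.

```python
import math


def sequential_round_trips(num_entities: int, batch_size: int) -> int:
    # One BatchGetItem call per batch, issued one after another.
    return math.ceil(num_entities / batch_size)


def parallel_round_trips(num_entities: int, batch_size: int, max_workers: int) -> int:
    # Batches run max_workers at a time, so wall-clock cost is the number of "waves".
    batches = math.ceil(num_entities / batch_size)
    return math.ceil(batches / max_workers)
```

For 500 entities with `batch_size=100`, this gives 5 sequential round trips versus 1 parallel wave with a cap of 10 workers; at 2,500 entities the cap starts to matter, giving 25 versus 3.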
Changes
- Use `ThreadPoolExecutor` to execute batch requests concurrently

Expected Behavior
For multiple batches, DynamoDB `BatchGetItem` requests should execute in parallel, reducing total latency from N × network_latency to approximately 1 × network_latency, as long as the number of batches does not exceed the thread pool's worker cap.

Current Behavior
The sync `online_read` method executes batch requests sequentially in a while loop:

For 500 entities with `batch_size=100`, this makes 5 sequential network calls.

Steps to Reproduce
- `batch_size=100`
- `get_online_features` for 500 entities

Specifications
`sdk/python/feast/infra/online_stores/dynamodb.py`

Performance Impact
For 500 entities with batch_size=100 (5 batches):
Possible Solution
Already implemented in this PR using `ThreadPoolExecutor`:

Related
`online_read_async` already uses `asyncio.gather()` for parallel execution.
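For comparison, a minimal sketch of the `asyncio.gather()` fan-out pattern the async path is described as using. `gather_batches` and the injected `fetch` coroutine are hypothetical and are not Feast's `online_read_async` implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def gather_batches(
    fetch: Callable[[List[str]], Awaitable[Dict[str, Any]]],
    batches: List[List[str]],
) -> List[Dict[str, Any]]:
    """Run one awaitable fetch per batch concurrently on the event loop.

    asyncio.gather returns results in the order the awaitables were passed,
    so the output lines up with the input batches.
    """
    return list(await asyncio.gather(*(fetch(batch) for batch in batches)))
```

Both paths issue all batches concurrently; the difference is that the sync path bounds concurrency with the thread pool's `max_workers`, whereas `asyncio.gather()` has no built-in cap (an `asyncio.Semaphore` is the usual way to add one).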