perf: Remove redundant entity key serialization in online_read for SQLite/MySQL/Snowflake #6005

@abhijeet-dhumal

Description

Problem

In online_read() for SQLite, MySQL, and Snowflake online stores, entity keys are serialized twice:

  1. First to build the SQL WHERE clause
  2. Again when iterating over the results, to look up each key in the grouped rows dict

The second pass recomputes bytes that are already available from the first, so it doubles the number of serialize_entity_key() calls for no benefit.

Affected Code

SQLite (sdk/python/feast/infra/online_stores/sqlite.py):

# Lines 247-253: First serialization
serialized_entity_keys = [
    serialize_entity_key(entity_key, entity_key_serialization_version=...)
    for entity_key in entity_keys
]

# Lines 266-270: REDUNDANT second serialization
for entity_key in entity_keys:
    entity_key_bin = serialize_entity_key(entity_key, ...)  # Already computed above!
    res = rows.get(entity_key_bin, [])

Same pattern exists in:

  • mysql_online_store/mysql.py (lines 207-208)
  • snowflake.py (lines 200-201)

Impact

For a request with N entities:

  • Current: 2N serialize_entity_key() calls
  • Fixed: N serialize_entity_key() calls

At 500 entities, this eliminates 500 redundant serialization calls per feature view read.
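
The call counts above can be checked with a small self-contained sketch. Note that `fake_serialize`, `read_current`, `read_fixed`, and `count_calls` are hypothetical names introduced only for illustration; `fake_serialize` is a stand-in that counts invocations and is not Feast's `serialize_entity_key`.

```python
from collections import Counter

calls = Counter()


def fake_serialize(entity_key: str) -> bytes:
    """Hypothetical stand-in for serialize_entity_key; counts invocations."""
    calls["serialize"] += 1
    return entity_key.encode("utf-8")


def read_current(entity_keys, rows):
    # Pattern described in this issue: serialize once for the query...
    serialized = [fake_serialize(k) for k in entity_keys]
    _ = serialized  # would be bound into the SQL WHERE clause
    # ...and once more for every lookup in the grouped rows dict.
    return [rows.get(fake_serialize(k), []) for k in entity_keys]


def read_fixed(entity_keys, rows):
    # Serialize once and reuse the same bytes for the lookup.
    serialized = [fake_serialize(k) for k in entity_keys]
    return [rows.get(key_bin, []) for key_bin in serialized]


def count_calls(read_fn, n):
    """Run read_fn over n synthetic entities; return (call count, result)."""
    calls.clear()
    keys = [f"driver_{i}" for i in range(n)]
    rows = {k.encode("utf-8"): [("feature", i)] for i, k in enumerate(keys)}
    result = read_fn(keys, rows)
    return calls["serialize"], result
```

Calling `count_calls(read_current, 500)` and `count_calls(read_fixed, 500)` should report 1000 and 500 serializations respectively, with identical results. How much wall-clock time this saves depends on the cost of the real serializer, which this sketch does not measure.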

Proposed Fix

Reuse the already-computed serialized_entity_keys list:

serialized_entity_keys = [
    serialize_entity_key(entity_key, entity_key_serialization_version=...)
    for entity_key in entity_keys
]
# ... execute query ...

for i, entity_key in enumerate(entity_keys):
    entity_key_bin = serialized_entity_keys[i]  # Reuse instead of re-serialize
    # ... rest of logic ...
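
As a rough end-to-end illustration of the same idea, here is a minimal sketch against an in-memory SQLite database. It assumes a simplified `features` table and a hypothetical `fake_serialize` helper; neither matches Feast's actual schema or serializer. It pairs each entity key with its precomputed bytes via `zip`, which is equivalent to indexing with `enumerate`.

```python
import sqlite3
from collections import defaultdict


def fake_serialize(entity_key: str) -> bytes:
    """Hypothetical stand-in for serialize_entity_key."""
    return entity_key.encode("utf-8")


def online_read_sketch(conn, entity_keys):
    # Serialize each entity key exactly once.
    serialized_entity_keys = [fake_serialize(k) for k in entity_keys]
    if not serialized_entity_keys:
        return []

    # Use the serialized keys to build the WHERE clause.
    placeholders = ",".join("?" for _ in serialized_entity_keys)
    cursor = conn.execute(
        "SELECT entity_key, feature_name, value FROM features "
        f"WHERE entity_key IN ({placeholders})",
        serialized_entity_keys,
    )

    # Group result rows by their serialized entity key.
    rows = defaultdict(list)
    for key_bin, feature_name, value in cursor:
        rows[key_bin].append((feature_name, value))

    # Reuse the same serialized keys for the lookup instead of re-serializing.
    results = []
    for entity_key, entity_key_bin in zip(entity_keys, serialized_entity_keys):
        results.append((entity_key, dict(rows.get(entity_key_bin, []))))
    return results


# Assumed toy schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (entity_key BLOB, feature_name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [
        (fake_serialize("driver_1"), "trips", 12.0),
        (fake_serialize("driver_1"), "rating", 4.5),
        (fake_serialize("driver_2"), "trips", 3.0),
    ],
)
print(online_read_sketch(conn, ["driver_1", "driver_3"]))
```

Results come back in the order of the requested entity keys, and a key with no stored rows (`driver_3` here) yields an empty dict rather than being dropped.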
