Codex/primary blocking bzpopmin by bootjp · Pull Request #372 · bootjp/elastickv

bootjp · 2026-03-20T11:11:42Z

No description provided.

gemini-code-assist · 2026-03-20T11:12:10Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the operational resilience and Redis compatibility of the ElasticKV system. It provides a crucial migration path for legacy Raft storage, ensuring data integrity during upgrades. Furthermore, it refines the Redis proxy's behavior by correctly handling blocking commands with dynamic timeouts and implementing intelligent caching for Lua scripts, which is vital for dual-write scenarios. The changes also include enhancements to the Jepsen testing framework and a more robust Pub/Sub shadowing mechanism, collectively contributing to a more stable, maintainable, and performant distributed key-value store.

Highlights

Raft Storage Migration: Introduced a new command-line tool and updated documentation to facilitate the migration of Raft logs and stable state from legacy BoltDB files to the more modern Pebble storage format. This ensures smoother upgrades and better data management.
Redis Blocking Commands Handling: Enhanced the Redis proxy to correctly identify and apply appropriate timeouts for various Redis blocking commands (e.g., BZPOPMIN, BLPOP, XREAD). This prevents proxy timeouts from prematurely interrupting long-running blocking operations.
Redis Script Caching: Implemented a caching mechanism for Lua scripts within the Redis proxy. This allows the proxy to remember scripts sent via EVAL and automatically fall back to EVAL if a subsequent EVALSHA command fails due to the script not being present on a secondary Redis instance.
Jepsen Test Robustness: Improved the Jepsen test suite by adding a fail-on-invalid! function. This ensures that the test runner explicitly fails if Jepsen's analysis determines the history of operations to be invalid, making test outcomes more reliable.
Shadow Pub/Sub Refactoring: Refactored the shadow Pub/Sub divergence detection mechanism to manage unmatched secondary messages directly within each shadowPubSub instance, rather than using a global buffer. This improves memory management, simplifies concurrency control, and enhances the accuracy of divergence reporting.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant set of features, including a migration tool for legacy BoltDB Raft storage to Pebble, comprehensive documentation, and a robust rolling update script. It also enhances the proxy by adding support for blocking commands with dynamic timeouts and improving Lua script handling with caching and NOSCRIPT error fallbacks. The changes are well-tested and substantially improve the system's operational capabilities. My review includes a few suggestions to enhance code clarity and reduce duplication in the new migration logic and helper scripts.

I am having trouble creating individual review comments. Click here to see my feedback.

internal/raftstore/migrate.go (144-158)

The functions copyLegacyStable and copyLegacyLogs both contain boilerplate code for opening a read-only bbolt transaction and iterating over a bucket. This can be extracted into a helper function to reduce duplication and make the migration logic clearer. For example:

func forEachInBucket(db *bbolt.DB, bucketName []byte, fn func(k, v []byte) error) error {
    return errors.WithStack(db.View(func(tx *bbolt.Tx) error {
        bucket := tx.Bucket(bucketName)
        if bucket == nil {
            return errors.Newf("legacy bucket %q not found", string(bucketName))
        }
        return bucket.ForEach(fn)
    }))
}

Using this helper would simplify both copyLegacyStable and copyLegacyLogs.

References

Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

proxy/script_cache.go (105-114)

The stringArg function's default case is a bit inefficient as it creates a new slice []any{arg} just to call argsToBytes. This can be simplified for better readability and performance by handling the default case with fmt.Sprintf directly and moving the strings.ToLower call to the end.

func stringArg(arg any) string {
	var s string
	switch v := arg.(type) {
	case []byte:
		s = string(v)
	case string:
		s = v
	default:
		s = fmt.Sprintf("%v", v)
	}
	return strings.ToLower(s)
}
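As a quick check of the suggested version, the following standalone sketch runs it against the three input kinds it handles. The sample inputs are illustrative, not taken from the PR. Note that lower-casing suits SHA1 digests (as in the `evalFallbackArgs` lookup) but would be wrong for script bodies, so this helper should presumably stay limited to digest and command-name arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// stringArg normalizes a command argument to a lower-case string.
// []byte and string are converted directly; anything else goes through %v.
func stringArg(arg any) string {
	var s string
	switch v := arg.(type) {
	case []byte:
		s = string(v)
	case string:
		s = v
	default:
		s = fmt.Sprintf("%v", v)
	}
	return strings.ToLower(s)
}

func main() {
	fmt.Println(stringArg([]byte("ABC123"))) // []byte input
	fmt.Println(stringArg("EvalSha"))        // string input
	fmt.Println(stringArg(42))               // fallback via %v
}
```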

scripts/rolling-update.sh (386-393)

The functions grpc_healthy and peer_grpc_healthy have very similar logic for checking TCP port availability. You can combine them into a single function that accepts the host as an argument to reduce code duplication and improve maintainability.

You could replace both with a single function:

is_grpc_healthy() {
  local host="$1"
  bash -lc "exec 3<>/dev/tcp/${host}/${RAFT_PORT}" 2>/dev/null
}

Then you can replace calls to grpc_healthy with is_grpc_healthy "$NODE_HOST" and peer_grpc_healthy "$some_host" with is_grpc_healthy "$some_host".

Copilot

Pull request overview

Adds proxy-side support to better handle Redis blocking commands and script replication in dual-write mode.

Changes:

Add per-command timeout handling for blocking commands (e.g., BZPOPMIN, BLMOVE, XREAD BLOCK) via a timeout-aware backend path.
Add a lightweight script cache to enable EVALSHA → EVAL fallback on secondary when it returns NOSCRIPT.
Force the go-redis client to use RESP2 for upstream connections, with tests covering these behaviors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`proxy/script_cache.go`	Introduces script digesting, caching, `NOSCRIPT` detection, and `EVALSHA` fallback argument construction.
`proxy/blocking.go`	Adds blocking-command timeout parsing utilities and a timeout-capable backend interface.
`proxy/dualwrite.go`	Uses per-call timeout override for blocking commands; remembers scripts after script commands; retries secondary writes on `NOSCRIPT` with fallback args.
`proxy/backend.go`	Sets upstream protocol to RESP2 and adds `DoWithTimeout` implementation on `RedisBackend`.
`proxy/proxy_test.go`	Adds unit tests for blocking timeout parsing, timeout-aware backend usage, RESP2 enforcement, and `EVALSHA` fallback behavior.

Copilot · 2026-03-20T11:30:38Z

proxy/script_cache.go

+func (d *DualWriter) evalFallbackArgs(cmd string, iArgs []any) ([]any, bool) {
+	upper := strings.ToUpper(cmd)
+	if upper != cmdEvalSHA && upper != "EVALSHA_RO" {
+		return nil, false
+	}
+	if len(iArgs) < minEvalSHAArgs {
+		return nil, false
+	}
+
+	sha := stringArg(iArgs[1])
+	script, ok := d.lookupScript(sha)
+	if !ok {
+		return nil, false
+	}
+
+	fallback := make([]any, len(iArgs))
+	fallback[0] = []byte(cmdEval)
+	fallback[1] = []byte(script)
+	copy(fallback[2:], iArgs[2:])
+	return fallback, true


evalFallbackArgs treats EVALSHA_RO the same as EVALSHA but always falls back to EVAL. For read-only scripts this changes semantics (and may fail on servers enforcing read-only script execution). Preserve the original command by falling back to EVAL_RO when the incoming command is EVALSHA_RO (and similarly keep EVAL for EVALSHA).
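One way to apply this suggestion is to derive the fallback command from the incoming one instead of hard-coding `EVAL`. The sketch below is a simplified, standalone variant: it uses `[]string` and a plain map rather than the PR's `[]any` and `DualWriter` cache, `fallbackCommand` is a hypothetical helper, and the minimum argument count of 3 (command, digest, numkeys) is an assumption since the value of `minEvalSHAArgs` is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// fallbackCommand maps a by-digest script command to its by-source
// counterpart, keeping the read-only variant read-only.
func fallbackCommand(cmd string) (string, bool) {
	switch strings.ToUpper(cmd) {
	case "EVALSHA":
		return "EVAL", true
	case "EVALSHA_RO":
		return "EVAL_RO", true
	default:
		return "", false
	}
}

// evalFallbackArgs rewrites [EVALSHA(_RO), sha, numkeys, ...] into
// [EVAL(_RO), script, numkeys, ...] using a digest-to-source lookup.
func evalFallbackArgs(args []string, scripts map[string]string) ([]string, bool) {
	if len(args) < 3 { // assumed minimum: command, digest, numkeys
		return nil, false
	}
	cmd, ok := fallbackCommand(args[0])
	if !ok {
		return nil, false
	}
	script, ok := scripts[strings.ToLower(args[1])]
	if !ok {
		return nil, false
	}
	out := make([]string, len(args))
	out[0] = cmd
	out[1] = script
	copy(out[2:], args[2:])
	return out, true
}

func main() {
	scripts := map[string]string{"abc": "return redis.call('GET', KEYS[1])"}
	fmt.Println(evalFallbackArgs([]string{"EVALSHA_RO", "ABC", "1", "k"}, scripts))
	fmt.Println(evalFallbackArgs([]string{"EVALSHA", "abc", "1", "k"}, scripts))
}
```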

Copilot · 2026-03-20T11:30:38Z

proxy/dualwrite.go

 	result := d.primary.Do(ctx, iArgs...)
 	resp, err := result.Result()
 	d.metrics.CommandDuration.WithLabelValues(cmd, d.primary.Name()).Observe(time.Since(start).Seconds())
+	d.rememberScript(cmd, args)



rememberScript(cmd, args) is called before checking err from the primary. This means a failed SCRIPT FLUSH would still clear the local script cache, creating an inconsistent state where the primary still has scripts cached but the proxy can no longer fall back from EVALSHA to EVAL for secondary writes. Update the flow so cache mutations (especially SCRIPT FLUSH) only happen when the primary successfully applied the command (e.g., call rememberScript after the error return, or pass err into rememberScript and skip clearScripts() on failure).

Copilot · 2026-03-20T11:30:39Z

proxy/script_cache.go

+func (d *DualWriter) storeScript(script string) {
+	sha := scriptSHA(script)
+
+	d.scriptMu.Lock()
+	defer d.scriptMu.Unlock()
+	d.scripts[sha] = script
+}


The script cache stores full script bodies in an unbounded map (d.scripts[sha] = script) with no size cap or eviction. In a long-running proxy, clients that generate many unique scripts (or large scripts) can cause unbounded memory growth. Consider adding a maximum entry count and eviction strategy (e.g., LRU), and/or limiting stored script size to reduce OOM risk.
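A bounded cache along the lines suggested here could be built on `container/list` from the standard library. The sketch below is one possible shape, not the eviction code that was eventually merged: `boundedScripts`, its methods, and the entry-count limit are illustrative assumptions, and it caps only the number of entries, not the size of each script.

```go
package main

import (
	"container/list"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sync"
)

type entry struct{ sha, script string }

// boundedScripts is an LRU cache of script bodies keyed by SHA1 digest.
type boundedScripts struct {
	mu    sync.Mutex
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // digest -> element holding *entry
}

func newBoundedScripts(limit int) *boundedScripts {
	if limit < 1 {
		limit = 1
	}
	return &boundedScripts{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// scriptSHA returns the lower-case hex SHA1 digest of the script source.
func scriptSHA(script string) string {
	sum := sha1.Sum([]byte(script))
	return hex.EncodeToString(sum[:])
}

// store records the script and evicts least recently used entries
// until the cache is back within its limit.
func (c *boundedScripts) store(script string) string {
	sha := scriptSHA(script)
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[sha]; ok {
		c.order.MoveToFront(el)
		return sha
	}
	c.items[sha] = c.order.PushFront(&entry{sha: sha, script: script})
	for c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).sha)
	}
	return sha
}

// lookup returns the script for a digest and marks it as recently used.
func (c *boundedScripts) lookup(sha string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[sha]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).script, true
}

func main() {
	c := newBoundedScripts(2)
	a := c.store("return 1")
	b := c.store("return 2")
	c.lookup(a) // touch a so b becomes the eviction candidate
	c.store("return 3")
	_, okA := c.lookup(a)
	_, okB := c.lookup(b)
	fmt.Println(okA, okB)
}
```

An evicted script simply makes the `EVALSHA` fallback unavailable for that digest, so the cost of a small limit is a surfaced `NOSCRIPT` on the secondary rather than incorrect behavior.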

bootjp · 2026-03-20T11:39:00Z

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot · 2026-03-20T11:39:07Z

@bootjp I've opened a new pull request, #373, to work on those changes. Once the pull request is ready, I'll request review from you.

bootjp added 2 commits March 20, 2026 19:52

proxy: honor blocking command timeouts

b751fd1

proxy: reduce secondary replay mismatches

7f3f917

bootjp changed the base branch from main to feature/rolling-update March 20, 2026 11:12

gemini-code-assist bot reviewed Mar 20, 2026

View reviewed changes

bootjp requested a review from Copilot March 20, 2026 11:26

Copilot started reviewing on behalf of bootjp March 20, 2026 11:27 View session

Base automatically changed from feature/rolling-update to main March 20, 2026 11:28

Copilot AI reviewed Mar 20, 2026

View reviewed changes

Initial plan

969c7b7

Copilot AI mentioned this pull request Mar 20, 2026

proxy: fix EVALSHA_RO fallback, guard script cache on primary error, bounded eviction #373

Merged

bootjp and others added 4 commits March 20, 2026 20:47

Raise gRPC message size limits

e8d532c

proxy: fix EVALSHA_RO fallback, guard script cache on primary error, …

f26bcd3

…add bounded eviction Co-authored-by: bootjp <1306365+bootjp@users.noreply.github.com>

Merge pull request #373 from bootjp/copilot/sub-pr-372

055fa86

proxy: fix EVALSHA_RO fallback, guard script cache on primary error, bounded eviction

Merge branch 'main' into codex/primary-blocking-bzpopmin

201bda9

bootjp enabled auto-merge March 20, 2026 11:52

Merge pull request #374 from bootjp/hot-fix/grpc-resourceexhausted

d12b3b6

Hot fix/grpc resourceexhausted

bootjp merged commit 0f2a871 into main Mar 20, 2026
8 checks passed

bootjp deleted the codex/primary-blocking-bzpopmin branch March 20, 2026 11:55
