Conversation

@guygir guygir commented Dec 15, 2025

The PVC Evictor is a multi-process Kubernetes deployment designed to automatically manage disk space on PVCs used for vLLM KV cache offloading. It monitors disk usage and automatically deletes cold cache files when storage thresholds are exceeded, ensuring continuous operation of vLLM workloads without manual intervention.

  • Follow-up optimization plans are tracked in Issue #218 (PVC Evictor optimizations).
  • Once complete benchmark results on llm-d are available, I'll post them as a comment here.
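To make the description above concrete, here is a minimal single-process sketch of the core idea. The names, paths, and thresholds are illustrative (they mirror the defaults discussed later in this PR: 85% cleanup threshold, 70% target, 60-minute hot-file window); the actual implementation uses the multi-process design described below.

```python
import os
import shutil
import time

MOUNT = "/kv-cache"            # assumed PVC mount path
CLEANUP_THRESHOLD = 85.0       # start deleting above this usage %
TARGET_THRESHOLD = 70.0        # stop deleting below this usage %
COLD_AGE_SECONDS = 60 * 60     # files not accessed for an hour are considered "cold"

def disk_usage_percent(path: str) -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total * 100.0

def evict_cold_files(root: str) -> None:
    """Delete the coldest files (oldest access time) until usage drops to the target."""
    files = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            try:
                files.append((os.stat(full).st_atime, full))
            except OSError:
                continue  # file may have been removed concurrently
    now = time.time()
    for atime, full in sorted(files):            # coldest first
        if disk_usage_percent(root) <= TARGET_THRESHOLD:
            break
        if now - atime < COLD_AGE_SECONDS:
            continue                              # still "hot", skip
        try:
            os.remove(full)
        except OSError:
            pass

if disk_usage_percent(MOUNT) >= CLEANUP_THRESHOLD:
    evict_cold_files(MOUNT)
```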

@kfirtoledo kfirtoledo removed the request for review from elevran December 15, 2025 18:03

guygir commented Dec 16, 2025

Follow-up optimization plans are tracked in Issue #218.
Currently working on addressing Kfir's feedback.

guygir commented Dec 18, 2025

The commit above follows @kfirtoledo's feedback.
The changes maintain all existing functionality; they are mostly about maintainability and design practices.

  • Code Structure: Split pvc_evictor.py into modular components: config.py for configuration management, processes/ directory for crawler/activator/deleter processes, utils/ for system utilities (mostly logging), and evictor.py as the main process.

  • Deployment: Moved to Docker image deployment (currently stored at ghcr.io/guygir/pvc-evictor:latest). Simplified deploy.sh script. Changed all oc commands to kubectl.

  • Code Quality: Renamed variables, moved non-critical logs to DEBUG level, added inline design decision comments and renamed processes.

  • Logic: Simplified cache path structure and exception handling. Removed overkill fallbacks.

  • Documentation: Updated README.md and QUICK_START.md to reflect (hopefully) all changes.

@orozery orozery left a comment

Code looks very good overall!
General thoughts:

  1. Currently the crawler assumes a specific directory structure. Any reason why not remove this dependency?
  2. Each process logs its own periodic outputs. I think it's cleaner to aggregate all stats to the main evictor process and have it do the periodic outputs.

guygir commented Jan 4, 2026

Code looks very good overall! General thoughts:

  1. Currently the crawler assumes a specific directory structure. Any reason why not remove this dependency?
  2. Each process logs its own periodic outputs. I think it's cleaner to aggregate all stats to the main evictor process and have it do the periodic outputs.

Thanks!

  1. The hardcoded structure was mainly for efficiency (the offload connector uses a known structure) and safety (to avoid deleting unwanted files in the folder - this concern may be overkill...). It could be that flexibility is more important.
  2. I think it's a good idea. The original design was due to process isolation, and to make sure that important logs could be sent immediately without coordination overhead, but this does lead to messier logs. Aggregated logging would be cleaner (maybe apart from crucial warning/error logs - but even that separation may be unnecessary); see the sketch below.
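For illustration only, a rough sketch of what aggregated logging could look like: per-process stats funneled through a shared queue and emitted periodically by the main evictor process. Names and intervals here are assumptions, not the PR's actual code.

```python
import multiprocessing as mp
import queue
import time

def crawler(stats_q) -> None:
    # Instead of logging directly, each worker pushes counters to the main process.
    files_seen = 0
    while True:
        files_seen += 1                      # stand-in for real crawl work
        stats_q.put({"proc": "crawler", "files_seen": files_seen})
        time.sleep(1)

def main() -> None:
    stats_q = mp.Queue()
    mp.Process(target=crawler, args=(stats_q,), daemon=True).start()

    totals: dict[str, int] = {}
    last_emit = time.time()
    while True:
        try:
            stat = stats_q.get(timeout=0.5)
            totals[stat["proc"]] = stat["files_seen"]
        except queue.Empty:
            pass
        if time.time() - last_emit >= 30:    # one periodic, aggregated log line
            print(f"[evictor] stats: {totals}")
            last_emit = time.time()

if __name__ == "__main__":
    main()
```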

Collaborator

I'm a bit concerned about the maintainability of deploy.sh long-term. While bash is okay for simple cases, at 100+ lines it becomes unreadable and difficult to debug.

Can we decouple this into a helm chart? It would allow us to define defaults cleanly in values.yaml and let users inject overrides via --set or -f values.yaml without needing custom parsing logic.

Collaborator Author

Thanks for the suggestion, I agree - this makes more sense.
The first implementation was more of a PoC; Helm should fit better within the llm-d framework and will improve consistency and maintainability.

Copilot AI left a comment

Pull request overview

This PR introduces the PVC Evictor, a multi-process Kubernetes deployment that automatically manages disk space on PVCs used for vLLM KV cache offloading. The system monitors disk usage and deletes cold cache files when storage thresholds are exceeded, implementing a hot/cold cache eviction strategy based on file access time.

Changes:

  • Implements an N+2 process architecture with configurable crawler processes (1, 2, 4, 8, or 16), an activator process for monitoring, and a deleter process for batch file deletion
  • Uses multiprocessing with Queue and Event primitives for inter-process communication and coordination (a minimal sketch follows this list)
  • Provides automated deployment via deploy.sh script with auto-detection of namespace-specific security contexts
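A condensed sketch of the N+2 layout described above. Function bodies are elided; the real process targets live in the processes/ package, and the names and argument lists here are assumptions.

```python
import multiprocessing as mp

def crawler(worker_id, file_queue, stop_event): ...
def activator(deletion_event, stop_event): ...
def deleter(file_queue, deletion_event, stop_event): ...

def spawn(num_crawlers: int = 8) -> list[mp.Process]:
    file_queue = mp.Queue(maxsize=10_000)   # crawlers -> deleter
    deletion_event = mp.Event()             # activator toggles deletion ON/OFF
    stop_event = mp.Event()                 # graceful shutdown signal

    procs = [
        mp.Process(target=crawler, args=(i, file_queue, stop_event))
        for i in range(num_crawlers)        # N crawler processes
    ]
    procs.append(mp.Process(target=activator, args=(deletion_event, stop_event)))
    procs.append(mp.Process(target=deleter, args=(file_queue, deletion_event, stop_event)))
    for p in procs:
        p.start()
    return procs
```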

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 16 comments.

Summary per file:

| File | Description |
|------|-------------|
| config.py | Configuration management using environment variables with dataclass-based Config |
| evictor.py | Main coordinator spawning N+2 processes with graceful shutdown handling |
| utils/system.py | Disk usage monitoring via statvfs() and logging setup utilities |
| utils/__init__.py | Empty initialization module for utils package |
| processes/crawler.py | Crawler processes that discover and queue cache files with hex-based partitioning |
| processes/activator.py | Activator process monitoring disk usage and controlling deletion triggers |
| processes/deleter.py | Deleter process performing batch file deletions using xargs |
| processes/__init__.py | Empty initialization module for processes package |
| deployment_evictor.yaml | Kubernetes deployment manifest with configurable environment variables |
| deploy.sh | Bash deployment script with auto-detection of security context values |
| Dockerfile | Container image definition using Python 3.12-slim base |
| README.md | Comprehensive documentation covering architecture, configuration, and deployment |
| QUICK_START.md | Quick start guide with deployment examples |

"file_access_time_threshold_minutes": self.config.file_access_time_threshold_minutes,
}

self.logger.info("PVC Cleanup Service v4 (10-Process Architecture) initialized")

Copilot AI Jan 12, 2026

The comment on line 74 states "10-Process Architecture" but this is misleading since the number of processes is actually N+2 where N is configurable (1, 2, 4, 8, or 16). With the default of 8 crawlers, this would be 10 processes, but the architecture is not fixed at 10 processes. This should be updated to reflect the dynamic nature of the process count.

Suggested change:
- self.logger.info("PVC Cleanup Service v4 (10-Process Architecture) initialized")
+ self.logger.info(
+     f"PVC Cleanup Service v4 (N+2-Process Architecture: "
+     f"{config.num_crawler_processes + 2} total processes) initialized"
+ )

should_process_partial = (
    current_batch and (
        (time_since_last_check >= partial_batch_timeout and len(current_batch) > 0) or

Copilot AI Jan 12, 2026

The redundant condition "len(current_batch) > 0" in line 202 is unnecessary since line 201 already checks "current_batch" which is falsy when empty. The additional length check adds no value and reduces code clarity.

Suggested change:
- (time_since_last_check >= partial_batch_timeout and len(current_batch) > 0) or
+ (time_since_last_check >= partial_batch_timeout) or

Comment on lines 59 to 60
except:
    pass

Copilot AI Jan 12, 2026

Bare except clause on line 59 silently catches all exceptions without logging. If there's an issue with the deletion event check or queue status, it will be silently ignored. Consider either being more specific about which exceptions to catch or at least logging that an exception occurred.

Suggested change:
- except:
-     pass
+ except Exception as e:
+     logger.warning(
+         f"Queue status check failed: {e}"
+     )

Comment on lines 264 to 266
if int(time.time()) % 30 == 0: # Every 30 seconds
    logger.debug("Deletion OFF - waiting for trigger")
time.sleep(0.5)

Copilot AI Jan 12, 2026

The timing-based periodic check on line 264 using int(time.time()) % 30 == 0 may log multiple times or miss the logging window entirely. If the loop iterates rapidly, it could log multiple times within the same second. If it takes longer than 1 second between iterations, it might skip the exact second where the modulo equals 0. Consider using a last_log_time variable to track when the message was last logged, similar to the heartbeat_interval pattern used elsewhere in the code.
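For reference, the time-since-last-log pattern the comment suggests looks roughly like this (variable names are illustrative):

```python
import logging
import time

logger = logging.getLogger("deleter")
LOG_INTERVAL_SECONDS = 30.0
last_log_time = 0.0

while True:  # stand-in for the deleter's wait loop
    now = time.time()
    if now - last_log_time >= LOG_INTERVAL_SECONDS:
        # Emitted at most once per interval, regardless of how fast or slow the loop runs.
        logger.debug("Deletion OFF - waiting for trigger")
        last_log_time = now
    time.sleep(0.5)
```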

Comment on lines 53 to 60
if int(current_time) % 10 == 0: # Every 10 seconds
    try:
        # Try to get queue size (might fail if queue is in different process)
        logger.debug(
            f"Queue status check (deletion={'ON' if deletion_event.is_set() else 'OFF'})"
        )
    except:
        pass

Copilot AI Jan 12, 2026

The timing-based periodic check on line 53 using int(current_time) % 10 == 0 has the same issue as found in the deleter process. This could log multiple times per second if the loop is fast, or skip the logging window entirely if iterations take longer than 1 second. Consider using a time-since-last-log pattern instead.

Comment on lines 194 to 203
queue_empty = False
try:
    queue_empty = file_queue.empty()
except Exception:
    pass

should_process_partial = (
    current_batch and (
        (time_since_last_check >= partial_batch_timeout and len(current_batch) > 0) or
        queue_empty

Copilot AI Jan 12, 2026

The logic to check if the queue is empty uses a try/except block and sets queue_empty to False by default. However, if an exception occurs, queue_empty remains False, which means the code will assume the queue is NOT empty when in fact the state is unknown. This could lead to partial batches not being processed when they should be. Consider either setting queue_empty to True on exception (conservative approach to process batches), or handling the exception case differently.
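A minimal sketch of the conservative approach suggested here, treating an unknown queue state as empty so pending partial batches still get flushed (the helper name is hypothetical):

```python
def is_queue_empty(file_queue) -> bool:
    """Conservative check: treat an unreadable queue state as empty."""
    try:
        return file_queue.empty()
    except Exception:
        # State is unknown; assuming "empty" lets pending partial batches flush.
        return True
```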

Comment on lines 236 to 247
def log_timing(event_type: str, duration_ms: float, **kwargs):
    """Log timing event."""
    try:
        unix_timestamp = time.time()
        extra_fields = ",".join(f"{k}={v}" for k, v in kwargs.items())
        log_line = f"TIMING_{event_type}:{unix_timestamp:.3f},{duration_ms:.3f}"
        if extra_fields:
            log_line += f",{extra_fields}"
        logger.debug(log_line)
    except Exception:
        pass


Copilot AI Jan 12, 2026

Variable log_timing is not used.

Suggested change (removes the unused function):
- def log_timing(event_type: str, duration_ms: float, **kwargs):
-     """Log timing event."""
-     try:
-         unix_timestamp = time.time()
-         extra_fields = ",".join(f"{k}={v}" for k, v in kwargs.items())
-         log_line = f"TIMING_{event_type}:{unix_timestamp:.3f},{duration_ms:.3f}"
-         if extra_fields:
-             log_line += f",{extra_fields}"
-         logger.debug(log_line)
-     except Exception:
-         pass

        if extra_fields:
            log_line += f",{extra_fields}"
        logger.debug(log_line)
    except Exception:

Copilot AI Jan 12, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:
- except Exception:
+ except Exception:
+     # Best-effort timing log: ignore any logging errors to avoid impacting crawler behavior.

Comment on lines 124 to 125
except Exception:
    pass

Copilot AI Jan 12, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:
- except Exception:
-     pass
+ except Exception as e:
+     # Never let timing/logging failures affect core deletion logic,
+     # but emit a debug message so issues can be diagnosed if needed.
+     logger.debug(f"Failed to log timing event '{event_type}': {e}", exc_info=True)

Comment on lines 107 to 108
except Exception:
    pass

Copilot AI Jan 12, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:
- except Exception:
-     pass
+ except OSError as exc:
+     # Continue retrying, but log the error to aid diagnostics.
+     self.logger.warning(
+         "Error while checking PVC mount path '%s': %s",
+         self.config.pvc_mount_path,
+         exc,
+     )

@@ -0,0 +1,98 @@
# Quick Start Guide - PVC Evictor

## Prerequisites
Collaborator

Start with a description of the quick start

1. OpenShift/Kubernetes cluster access
2. `kubectl` CLI installed
3. PVC exists and is bound
4. Appropriate RBAC permissions
Collaborator

What do you mean by appropriate? And why do you need it?

@@ -0,0 +1,228 @@
#!/bin/bash
Collaborator

Can we convert it to a Python script with arguments, defaults, and required items?

**Arguments:**
- `pvc-name`: Name of the PVC to manage - **Required** (first positional argument)
- `--namespace=<namespace>`: Kubernetes namespace - **Optional** (auto-detected from `kubectl config` context if not provided)
- `--fsgroup=<fsgroup>`: Filesystem group ID - **Optional but Recommended** (auto-detected from existing pods/deployments if not provided)
Collaborator

Why do we need `fsgroup`, `selinux-level`, `runasuser`? Can you also explain how you can auto-detect them when there can be multiple deployments?


## Quick Deployment

### Option 1: Using deploy.sh (Recommended)
Collaborator

where is option 2?
Add an option using yaml with the image configuration

value: "/tmp/evictor_all_logs.txt"

volumeMounts:
- name: kv-cache-storage
Collaborator

This should also be defined by the user; maybe we can use Helm


resources:
  requests:
    cpu: 1000m # Base CPU for processes
Collaborator

replace with 1

cpu: 4000m # Allow more CPU for parallel processing (adjust based on NUM_CRAWLER_PROCESSES)
memory: 2Gi # Allow more memory for parallel processing

# Health checks
Collaborator

Try to remove what is not necessary

cache_directory: str # Subdirectory within PVC containing cache files (default: kv/model-cache/models)
dry_run: bool # If true, simulate deletion without actually deleting files (default: false)
log_level: str # Logging verbosity: DEBUG, INFO, WARNING, ERROR (default: INFO)
timing_file_path: str # Path for timing analysis file (default: /tmp/timing_analysis.txt, reserved for future use)
Collaborator

Why do we need it?

file_access_time_threshold_minutes: (
    float # Skip files accessed within this time (default: 60.0 minutes)
)
log_file_path: Optional[str] = None # Optional file path to write logs to
Collaborator

Not sure if we need it

guygir commented Jan 22, 2026

@kfirtoledo All comments addressed and fixes applied.

@sagearc sagearc left a comment

Great job getting this all working!

A quick note on process: This PR is massive (~3k lines). In the future, it would be great to break this down into smaller, stackable PRs so we can review the logic more carefully.

Comment on lines 1 to 62
"""Configuration management for PVC Evictor."""

import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
"""Configuration loaded from environment variables.
Environment variables are used instead of CLI arguments.
"""

pvc_mount_path: str # Mount path of PVC in pod (default: /kv-cache)
cleanup_threshold: float # Disk usage % to trigger deletion (default: 85.0)
target_threshold: float # Disk usage % to stop deletion (default: 70.0)
cache_directory: str # Subdirectory within PVC containing cache files (default: kv/model-cache/models)
dry_run: bool # If true, simulate deletion without actually deleting files (default: false)
log_level: str # Logging verbosity: DEBUG, INFO, WARNING, ERROR (default: INFO)
# timing_file_path: Currently not actively used but kept for backward compatibility and future extensibility
timing_file_path: str # Path for timing analysis file (default: /tmp/timing_analysis.txt)
num_crawler_processes: int # P1-PN (default: 8, valid: 1, 2, 4, 8, 16)
logger_interval: float # P9 monitoring interval (default: 0.5s)
file_queue_maxsize: int # Max items in file queue (default: 10000)
file_queue_min_size: (
int # Min queue size to maintain when deletion OFF (default: 1000)
)
deletion_batch_size: int # Files per deletion batch (default: 100)
file_access_time_threshold_minutes: (
float # Skip files accessed within this time (default: 60.0 minutes)
)
aggregated_logging: bool # Enable aggregated logging in main process (default: true)
aggregated_logging_interval: float # Interval for aggregated log output in seconds (default: 30.0)
cache_structure_mode: str # Cache structure mode: "file_mapper" (default) or "vllm" (legacy)
# log_file_path: Optional file logging for persistent log storage and debugging
log_file_path: Optional[str] = None # Optional file path to write logs to (default: None, stdout only)

@classmethod
def from_env(cls) -> "Config":
"""Load configuration from environment variables."""
return cls(
pvc_mount_path=os.getenv("PVC_MOUNT_PATH", "/kv-cache"),
cleanup_threshold=float(os.getenv("CLEANUP_THRESHOLD", "85.0")),
target_threshold=float(os.getenv("TARGET_THRESHOLD", "70.0")),
cache_directory=os.getenv("CACHE_DIRECTORY", "kv/model-cache/models"),
dry_run=os.getenv("DRY_RUN", "false").lower() == "true",
log_level=os.getenv("LOG_LEVEL", "INFO"),
timing_file_path=os.getenv("TIMING_FILE_PATH", "/tmp/timing_analysis.txt"),
num_crawler_processes=int(os.getenv("NUM_CRAWLER_PROCESSES", "8")),
logger_interval=float(os.getenv("LOGGER_INTERVAL_SECONDS", "0.5")),
file_queue_maxsize=int(os.getenv("FILE_QUEUE_MAXSIZE", "10000")),
file_queue_min_size=int(os.getenv("FILE_QUEUE_MIN_SIZE", "1000")),
deletion_batch_size=int(os.getenv("DELETION_BATCH_SIZE", "100")),
log_file_path=os.getenv("LOG_FILE_PATH", None),
file_access_time_threshold_minutes=float(
os.getenv("FILE_ACCESS_TIME_THRESHOLD_MINUTES", "60.0")
),
aggregated_logging=os.getenv("AGGREGATED_LOGGING", "true").lower() == "true",
aggregated_logging_interval=float(os.getenv("AGGREGATED_LOGGING_INTERVAL", "30.0")),
cache_structure_mode=os.getenv("CACHE_STRUCTURE_MODE", "file_mapper"),
)

Copy link
Collaborator

@sagearc sagearc Jan 22, 2026

Where is config.Config being used?

Collaborator Author

Config is imported in evictor.py and used to initialize PVCEvictor using these settings, keeping the configuration logic separate from the main code.
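Roughly, the wiring described here looks like the sketch below. Only `Config.from_env()` comes from the code above; the `PVCEvictor` stub and its `run()` method name are assumptions for illustration.

```python
from config import Config  # the dataclass shown above

class PVCEvictor:
    """Stub standing in for the real class defined in evictor.py."""
    def __init__(self, config: Config) -> None:
        self.config = config

    def run(self) -> None:  # hypothetical entry-point name
        ...

if __name__ == "__main__":
    evictor = PVCEvictor(Config.from_env())  # settings come only from environment variables
    evictor.run()
```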

For complete Helm documentation, see [helm/README.md](helm/README.md).
### Option 2: Using deploy.sh (Legacy - Automated Script)
Collaborator

I noticed this script is being introduced in this PR but is already labeled as 'Legacy.' We generally shouldn't merge code that is deprecated upon arrival. Since this file wasn't in the codebase previously, let's remove it entirely and stick to the Helm chart as the single source of truth.

Collaborator Author

Will do - awaiting final confirmation from Kfir regarding the file_mapper.py structure, and then the legacy code will be removed completely.

@kfirtoledo kfirtoledo left a comment

Looks good. Some minor fixes:


| Parameter | Description | Default |
|-----------|-------------|---------|
| `image.repository` | Container image repository | `ghcr.io/guygir/pvc-evictor` |
Collaborator

Can we open in quay for now, a more general place?

Collaborator Author

Agreed that the image should not remain in my personal ghcr.io account. Will update all image references once the hosting location is decided.


*Figure: PVC Evictor multi-process architecture showing N crawler processes, activator, deleter, and IPC mechanisms (Queue and Events)*

### N+2 Process Design
Collaborator

Maybe combine this with the architecture and process details - I don't want the README to be too long.

Collaborator Author

Made it more concise.


### Hot Cache Configuration

#### `FILE_ACCESS_TIME_THRESHOLD_MINUTES`
Collaborator

Imagine an extreme case where the cache fills the disk quickly, even though the files are within FILE_ACCESS_TIME_THRESHOLD_MINUTES. Should the evictor force delete files if the size exceeds a "hard threshold" such as 95%?

What's the behavior today if this happens? Failing to write new cache files?

@guygir guygir Feb 8, 2026

Yes - failing to write new cache files is the behavior today. I've updated the documentation to address this.

For the initial release, the soft threshold approach is designed for large storage deployments where this edge case is unlikely.

Issue #218 tracks optimizations for the PVC Evictor that will handle this edge case by moving from a binary cache model to a continuum, so this issue won't occur.
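Purely for illustration, a hypothetical hard-threshold override layered on top of the current soft-threshold logic (not part of this PR; issue #218 tracks the actual follow-up):

```python
HARD_THRESHOLD = 95.0  # hypothetical emergency limit, not in the current code

def file_is_deletable(file_age_minutes: float, disk_usage_percent: float,
                      access_threshold_minutes: float = 60.0) -> bool:
    """Return True if a cache file may be deleted under this hypothetical policy."""
    if disk_usage_percent >= HARD_THRESHOLD:
        return True  # emergency: evict even files still considered "hot"
    return file_age_minutes >= access_threshold_minutes
```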


## Prerequisites

1. OpenShift/Kubernetes cluster access
Collaborator

Is there anything specific for Openshift, or does this generally work for any k8s cluster?

Collaborator Author

This should work on any k8s cluster. OpenShift is mentioned because it’s what we use in our test environment. The examples use kubectl to show it works everywhere.

3. Helm 3.0+ installed (for Helm deployment)
4. PVC exists and is bound
5. Appropriate RBAC permissions to create deployments
6. **Docker image available** - The evictor uses `ghcr.io/guygir/pvc-evictor:latest`
Collaborator

should we host this image in llm-d?

Collaborator Author

Agreed that the image should not remain in my personal ghcr.io account. Will update all image references once the hosting location is decided.

log_level: str # Logging verbosity: DEBUG, INFO, WARNING, ERROR (default: INFO)
# timing_file_path: Currently not actively used but kept for backward compatibility and future extensibility
timing_file_path: str # Path for timing analysis file (default: /tmp/timing_analysis.txt)
num_crawler_processes: int # P1-PN (default: 8, valid: 1, 2, 4, 8, 16)
Collaborator

Why does this need to be a power of 2? Can this be, say, 6?

Collaborator Author

A power-of-2 worker count is recommended for best performance, not required.
The crawler partitions work into 16 hex buckets. With power-of-2 worker counts, the buckets are (±) evenly distributed; other counts still work but may cause uneven load.

Updated the documentation to clarify this.
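To make the bucket math concrete, an illustrative sketch (not the exact crawler code, and round-robin assignment is an assumption): with 16 hex prefixes, power-of-2 worker counts divide the buckets evenly, while other counts leave some workers with one extra bucket.

```python
def buckets_for_worker(worker_id: int, num_workers: int) -> list[str]:
    """Assign the 16 hex prefixes (0-f) round-robin across crawler workers."""
    hex_buckets = "0123456789abcdef"
    return [h for i, h in enumerate(hex_buckets) if i % num_workers == worker_id]

# 8 workers -> 2 buckets each; 6 workers -> four workers get 3 buckets, two get 2.
print(buckets_for_worker(0, 8))   # ['0', '8']
print(buckets_for_worker(0, 6))   # ['0', '6', 'c']
```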

For complete Helm documentation, see [helm/README.md](helm/README.md).
### Option 2: Using deploy.sh (Legacy - Automated Script)
Collaborator

This is a new tool right? What's the "legacy" here?

I suggest removing this option given the yaml and helm options. Unless there is a specific reason

Collaborator Author

Now that llmd_fs_backend is integrated, there is only one standard approach, so legacy code is no longer needed and has been fully removed.

- Default: `8`
- Valid values: 1, 2, 4, 8, 16
- More crawlers = faster file discovery on large directories
- Recommendation: Use 8-16 for multi-TB volumes, 1-4 for smaller volumes
Collaborator

do you have some benchmark data for these recommended values?

Collaborator Author

These values are based on empirical testing during development, not formal benchmarks, and are meant as starting points.
Optimal settings depend on storage performance, file characteristics, etc.
Updated the documentation to mark them as recommended starting values that should be tuned per workload.
Proper benchmarking is tracked in issue #218 .

# kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.securityContext}'
# kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].securityContext}'
securityContext:
fsGroup: <your_fsgroup> # REQUIRED: Set to your namespace's fsGroup. Will try to auto-detect if not provided.
Collaborator

Should this match the security context of the vllm pods?

Collaborator Author

Yes, the security context must match the vllm pods. The PVC is written by vllm with a specific UID/GID, and the evictor needs the same UID/GID to read and delete files. Mismatches will cause permission denied errors.

…bugs, refactor stats sending and update documentation.
@vMaroon vMaroon requested review from liu-cong and sagearc February 8, 2026 19:22

guygir commented Feb 8, 2026

All feedback has been addressed. The only remaining item is deciding the official image hosting location.
