Description
Problem
Blazegraph containers started via pyomnigraph have poor SPARQL query performance due to insufficient JVM heap memory allocation. The default Docker container likely uses only 512MB-1GB heap, which is inadequate for complex queries on large datasets like GOV genealogy data.
Complex SPARQL queries (especially those with property paths like (gp:isPartOf/gp:ref){1,10}) time out or run very slowly on local dockerized Blazegraph instances.
Root Cause
The BlazegraphConfig.get_docker_run_command() method in omnigraph/servers/blazegraph.py (lines 37-44) does not set JVM heap parameters via the JAVA_OPTS environment variable.
Proposed Solution
Add JVM memory configuration support to the Blazegraph docker run command:
docker_run_command = (
    f"docker run -d --name {self.container_name} "
    f"-e BLAZEGRAPH_UID={os.getuid()} "
    f"-e BLAZEGRAPH_GID={os.getgid()} "
    f"-e JAVA_OPTS='-Xmx4g -Xms4g' "  # Add JVM heap settings
    f"-p {self.port}:8080 "
    f"-v {data_dir}/RWStore.properties:/RWStore.properties "
    f"-v {data_dir}:/data "
    f"{self.image}"
)
Recommendations
- Make it configurable: Add optional java_opts or heap_size field to ServerConfig
- Sensible defaults: Use at least 4GB heap by default for production use
- Consider additional optimizations:
  - Add query hint settings: -Dcom.bigdata.btree.writeRetentionQueue.capacity=4000
  - Add branching factor: -Dcom.bigdata.btree.BTree.branchingFactor=128
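The recommendations above could be wired together roughly as follows. This is only a sketch: HeapSettings, resolve_java_opts and java_opts_flag are hypothetical names, not existing pyomnigraph API, and it assumes that an explicit java_opts should take precedence over heap_size, with 4g as the fallback.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapSettings:
    """Hypothetical stand-in for the fields proposed for ServerConfig."""

    heap_size: Optional[str] = None  # e.g. "6g"
    java_opts: Optional[str] = None  # full override, e.g. "-Xmx6g -Xms6g -D..."
    default_heap: str = "4g"         # assumed default, per the recommendation

    def resolve_java_opts(self) -> str:
        """Return the JAVA_OPTS value; java_opts wins over heap_size (assumption)."""
        if self.java_opts:
            return self.java_opts
        heap = self.heap_size or self.default_heap
        # Accept only JVM-style sizes such as 512m or 6g.
        if not re.fullmatch(r"\d+[kKmMgG]?", heap):
            raise ValueError(f"invalid heap size: {heap!r}")
        return f"-Xmx{heap} -Xms{heap}"


def java_opts_flag(settings: HeapSettings) -> str:
    """Build the '-e JAVA_OPTS=...' fragment for the docker run command."""
    # shlex.quote keeps the space-separated options as a single shell word.
    return f"-e JAVA_OPTS={shlex.quote(settings.resolve_java_opts())} "
```

With something like this in place, the hard-coded JAVA_OPTS line in get_docker_run_command() could be replaced by the result of java_opts_flag(...), so users who set neither field still get the 4g default.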
Configuration example
Allow users to specify in servers.yaml:
blazegraph:
  server: "blazegraph"
  heap_size: "6g" # or
  java_opts: "-Xmx6g -Xms6g -Dcom.bigdata.btree.writeRetentionQueue.capacity=4000"
Impact
A larger heap should significantly improve query performance for users working with large RDF datasets and complex SPARQL queries.