GemmaSense is a Hybrid Vision-Language System that solves the "Semantic Grounding" problem in robotics. It bridges the gap between fast, local edge inference and deep, cloud-based visual reasoning.
GemmaSense follows a rigorous three-phase lifecycle designed for robust robotic deployment:
- Phase 1: Training - Local dataset collection, automated labeling via PaliGemma 2, and high-speed LoRA fine-tuning for environment-specific adaptation.
- Phase 2: Testing - Visual evaluation and confidence scoring. The system routes each query between the Local Edge (low latency) and the Cloud Brain (Gemini 3.0, deep reasoning) based on uncertainty thresholds.
- Phase 3: Deployment - Live hybrid inference stream feeding into an interactive Semantic Grounding Q&A loop, ultimately generating high-precision robotic action maps.
GemmaSense operates in two distinct modes to ensure the robot is never "lost" in a new environment:
- Local Edge Mode (PaliGemma 2): Quantized 3B model running on-device for known environments. Low latency, zero-shot detection.
- Cloud Brain Mode (Gemini 3.0): High-level reasoning for environment shifts. Triggered when local confidence is low. Deep contextual understanding and recursive grounding.
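A minimal version of that hand-off can be written in a few lines. The sketch below is purely illustrative: `route_query`, the model objects, and the 0.65 threshold are assumptions for illustration, not the shipped backend code.

# Illustrative routing sketch -- the helper names and threshold are assumptions,
# not the actual backend implementation.
CONFIDENCE_THRESHOLD = 0.65  # assumed cutoff between edge and cloud

def route_query(frame, local_model, cloud_model):
    # Fast path: quantized PaliGemma 2 running on-device
    detections, confidence = local_model.detect(frame)
    if confidence >= CONFIDENCE_THRESHOLD:
        return detections, "local_edge"
    # Low local confidence: escalate to Gemini for deep contextual reasoning
    return cloud_model.ground(frame, hints=detections), "cloud_brain"

The threshold is the main knob: raising it sends more frames to the cloud for deeper reasoning at the cost of latency.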
- Semantic Grounding: Detects objects and simultaneously queries the user for their "state" (e.g., Is that soldering iron hot? Is that glass beaker fragile?). A minimal sketch of this Q&A loop appears right after this list.
- Context Transfer: Allows a robot trained for a "Kitchen" to understand a "Workshop" by leveraging high-level cloud reasoning.
- LoRA Support: Includes a built-in training engine to fine-tune local models on your own specialized datasets.
- Interactive UI: A futuristic HUD for human-in-the-loop verification of robotic world models.
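As a concrete illustration of that grounding loop, the snippet below reuses the `GemmaSenseNode` API from the integration example further down; looping over candidate objects and building a `world_model` dict are assumptions added here for illustration.

# Conceptual Q&A loop -- only ground_objects, safety_warning and recommended_inquiry
# come from the integration example below; everything else is illustrative.
from gemma_sense_node import GemmaSenseNode
from PIL import Image

node = GemmaSenseNode()
frame = Image.open("robot_view.jpg")

world_model = {}
for obj in ["soldering iron", "glass beaker"]:
    context = node.ground_objects(frame, target_object=obj)
    if context['safety_warning']:
        # Ask the human for the object's state before the robot acts on it.
        answer = input(f"{context['recommended_inquiry']} ")
        world_model[obj] = {"state": answer, "verified_by_human": True}
    else:
        world_model[obj] = {"state": "nominal", "verified_by_human": False}

print(world_model)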
git clone https://github.com/pgeedh/GemmaSense
cd GemmaSense
# Setup Python environment
pip install -r requirements.txt
# (Optional) Add your Gemini API Key for Cloud Mode
export GEMINI_API_KEY="your_key_here"
# Start the engine
chmod +x start.sh
./start.sh
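Once the key is exported, Cloud Brain requests typically go through the official google-generativeai client. The snippet below only sketches that pattern; the model id, prompt, and wiring are placeholders rather than the backend's exact calls.

# Sketch of how GEMINI_API_KEY could be consumed for a Cloud Brain query.
# The model id below is a placeholder, not necessarily what the backend targets.
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model id

frame = Image.open("robot_view.jpg")
response = model.generate_content(
    ["List every object in this scene and flag anything fragile or hot.", frame]
)
print(response.text)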
To tune the model for your specific kitchen or lab:
- Place 10-20 images in `dataset/images/`.
- Run `python3 auto_label.py` to bootstrap descriptions.
- Run `python3 train_engine.py` to generate your local LoRA adapter.
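For orientation, one common way to attach a LoRA adapter to PaliGemma 2 uses Hugging Face transformers and peft, as sketched below; the checkpoint id, rank, and target modules are assumptions, and `train_engine.py` itself may do this differently with its native PyTorch loop.

# Illustrative LoRA setup -- checkpoint, rank and target modules are assumptions;
# see train_engine.py for the actual training loop.
import torch
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor
from peft import LoraConfig, get_peft_model

MODEL_ID = "google/paligemma2-3b-pt-224"  # assumed base checkpoint

processor = PaliGemmaProcessor.from_pretrained(MODEL_ID)
model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections only.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# A standard PyTorch loop over (image, caption) pairs from dataset/images/ then
# trains only the adapter weights and saves them for local inference:
# model.save_pretrained("adapters/my_environment")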
GemmaSense is designed to be plugged directly into high-level robotic control stacks such as LeRobot. It acts as a "Cognitive Filter" that sits between the low-level camera feed and the motion policy execution.
- Initialize the Node: Import the `GemmaSenseNode` into your LeRobot policy script.
- Context Injection: Before the arm executes a `pick` or `place` command, pass the current camera frame to the node.
- Semantic Verification: The node analyzes the object (e.g., "glass") for contextual safety (is it fragile? is it full?).
- Human-in-the-Loop (HITL): If uncertainty exists, the system triggers an inquiry.
- Motion Execution: The robot only proceeds once the semantic context is grounded and verified.
from gemma_sense_node import GemmaSenseNode
from PIL import Image
# Initialize the engine
node = GemmaSenseNode()
# Capture frame from your robot's camera
frame = Image.open("robot_view.jpg")
# Get contextual grounding before executing a move
context = node.ground_objects(frame, target_object="glass")
if context['safety_warning']:
    print(f"ROBOT HALT: {context['recommended_inquiry']}")
    # Trigger voice prompt or wait for user confirmation
else:
    # Execute SO-101 movement policy
    pass

Repository layout:
- `backend/`: FastAPI server managing the PaliGemma/Gemini hybrid logic (sketched below).
- `frontend/`: Real-time dashboard for visual grounding.
- `train_engine.py`: Native PyTorch loop for high-speed local fine-tuning.
- `auto_label.py`: Utility to auto-generate training data using pre-trained models.
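To show how that backend might expose the hybrid logic, here is a rough FastAPI sketch; the `/ground` route and payload shape are hypothetical, not the actual `backend/` API.

# Hypothetical endpoint sketch -- route name and payload are illustrative only.
import io
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image
from gemma_sense_node import GemmaSenseNode

app = FastAPI()
node = GemmaSenseNode()

@app.post("/ground")
async def ground(image: UploadFile = File(...), target_object: str = Form("unknown")):
    # Decode the uploaded camera frame and run hybrid grounding on it.
    frame = Image.open(io.BytesIO(await image.read()))
    return node.ground_objects(frame, target_object=target_object)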
“Bridging the gap between seeing and understanding, from the edge to the cloud.”


