# Geo-Vision-GPT

AI-powered geospatial intelligence platform that combines computer vision and large language models to analyze, interpret, and reason over satellite and aerial imagery.
- Overview
- Key Features
- System Architecture
- Tech Stack
- Project Structure
- Getting Started
- Usage Guide
- Supported Use Cases
- API Reference
- Environment Variables
- Contributing
- Roadmap
- License
## Overview
Geo-Vision-GPT bridges the gap between raw geospatial imagery and actionable intelligence by leveraging multi-modal large language models. Users can upload satellite images, aerial photographs, or geospatial rasters, and ask natural language questions — the system will visually interpret the content, extract spatial features, and return structured, human-readable insights.
Whether you're monitoring land cover changes, detecting infrastructure, analyzing disaster zones, or understanding urban expansion — Geo-Vision-GPT makes spatial reasoning accessible to domain experts and non-technical users alike.
## Key Features
| Feature | Description |
|---|---|
| Multi-modal Image Understanding | Upload satellite/aerial images and ask questions in plain English |
| GPT-4 Vision Integration | Uses OpenAI's vision-capable models to reason over geospatial imagery |
| Spatial Feature Extraction | Detects land use, terrain features, water bodies, buildings, and roads |
| Change Detection Prompting | Compare two time-series images and identify spatial changes |
| Natural Language GIS | Query geographic attributes without writing GIS code |
| Batch Processing | Process multiple images via API or CLI for pipeline integration |
| Exportable Insights | Output results as JSON, CSV, or GeoJSON for downstream use |
| Streamlit Frontend | Interactive web interface for drag-and-drop image analysis |
## System Architecture
The system follows a layered architecture separating the user interface, orchestration logic, AI backbone, and geospatial tooling:
```
┌─────────────────────────────────────────────────────┐
│                   USER INTERFACE                    │
│            Streamlit Web App / REST API             │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│                 ORCHESTRATION LAYER                 │
│   Query Parser → Prompt Builder → Response Parser   │
└─────────┬────────────────────────────────┬──────────┘
          │                                │
┌─────────▼──────────┐           ┌─────────▼──────────┐
│    AI BACKBONE     │           │  GEOSPATIAL TOOLS  │
│  GPT-4 Vision API  │           │  Rasterio / GDAL   │
│  LangChain Agent   │           │ GeoPandas / Shapely│
│  Prompt Templates  │           │ OpenStreetMap API  │
└─────────┬──────────┘           └─────────┬──────────┘
          │                                │
┌─────────▼────────────────────────────────▼──────────┐
│                     DATA LAYER                      │
│     Image Storage │ Vector Data │ Result Cache      │
└─────────────────────────────────────────────────────┘
```
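In code, the orchestration layer boils down to three small steps wrapped around the vision-model call. A minimal illustrative sketch (the function names below are hypothetical and do not mirror the actual `core/` modules):

```python
# Illustrative sketch of the orchestration flow. The helpers below are
# hypothetical stand-ins for core/prompt_builder.py and core/response_parser.py.

def parse_query(user_query: str) -> dict:
    """Classify the question so the right prompt template can be chosen."""
    task = "change_detection" if "changed" in user_query.lower() else "analysis"
    return {"task": task, "query": user_query}

def build_prompt(parsed: dict) -> str:
    """Wrap the user question in a geospatial-analysis instruction."""
    return (
        "You are a geospatial analyst. Examine the attached imagery and "
        f"answer: {parsed['query']} Respond with concise, structured findings."
    )

def parse_response(raw_text: str) -> dict:
    """Extract a structured record from the model's free-text answer."""
    return {"analysis": raw_text.strip(), "detected_features": []}

# The GPT-4 Vision call itself would sit between build_prompt and
# parse_response (see core/vision_client.py).
parsed = parse_query("What type of land cover is visible?")
prompt = build_prompt(parsed)
```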
## Tech Stack

### Core AI

| Library | Version | Role |
|---|---|---|
| `openai` | ≥1.0 | GPT-4 Vision API access |
| `langchain` | ≥0.2 | Prompt orchestration, agent chaining |
| `Pillow` | ≥9.0 | Image preprocessing and manipulation |
### Geospatial

| Library | Version | Role |
|---|---|---|
| `rasterio` | ≥1.3 | Reading/writing geospatial rasters (GeoTIFF) |
| `geopandas` | ≥0.13 | Vector data handling |
| `shapely` | ≥2.0 | Geometry operations |
| `pyproj` | ≥3.5 | CRS transformations |
| `folium` | ≥0.14 | Interactive map rendering |
### Web & API

| Library | Version | Role |
|---|---|---|
| `streamlit` | ≥1.30 | Web interface |
| `fastapi` | ≥0.100 | REST API backend |
| `uvicorn` | ≥0.23 | ASGI server |
### Infrastructure

| Library | Version | Role |
|---|---|---|
| `boto3` | ≥1.28 | AWS S3 image storage (optional) |
| `redis` | ≥4.0 | Response caching |
| `python-dotenv` | ≥1.0 | Environment management |
## Project Structure

```
Geo-Vision-GPT/
│
├── app/
│   ├── main.py                # Streamlit entrypoint
│   ├── api.py                 # FastAPI REST endpoint
│   └── pages/
│       ├── analyze.py         # Single image analysis page
│       ├── compare.py         # Change detection page
│       └── batch.py           # Batch processing page
│
├── core/
│   ├── agent.py               # LangChain agent orchestrator
│   ├── prompt_builder.py      # Domain-specific prompt templates
│   ├── vision_client.py       # OpenAI GPT-4V API wrapper
│   └── response_parser.py     # Structured output extraction
│
├── geo/
│   ├── image_loader.py        # Rasterio-based image loader
│   ├── preprocessor.py        # Tiling, normalization, band selection
│   ├── feature_extractor.py   # Spatial feature detection utilities
│   └── exporter.py            # GeoJSON / CSV export
│
├── data/
│   ├── sample_images/         # Example satellite images
│   └── outputs/               # Analysis output results
│
├── tests/
│   ├── test_vision_client.py
│   ├── test_geo_preprocessor.py
│   └── test_agent.py
│
├── architecture.excalidraw    # System architecture diagram
├── .env.example               # Example environment config
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── README.md
```
## Getting Started

### Prerequisites

- Python 3.9+
- An OpenAI API key with access to `gpt-4-vision-preview` or `gpt-4o`
- GDAL system dependency (for rasterio)
- git

Note on GDAL: GDAL must be installed at the OS level before installing rasterio.

- Ubuntu/Debian: `sudo apt-get install gdal-bin libgdal-dev`
- macOS: `brew install gdal`
- Windows: Use OSGeo4W or Conda
### Installation

1. Clone the repository

   ```bash
   git clone https://github.com/Shreyashio/Geo-Vision-GPT.git
   cd Geo-Vision-GPT
   ```

2. Create and activate a virtual environment

   ```bash
   python -m venv venv
   source venv/bin/activate   # Linux/macOS
   # OR
   venv\Scripts\activate      # Windows
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

### Configuration

Copy the example environment file and fill in your credentials:

```bash
cp .env.example .env
```

Open `.env` and configure:
```env
# Required
OPENAI_API_KEY=sk-...your-openai-api-key...

# Optional — model selection
OPENAI_MODEL=gpt-4o   # Default: gpt-4-vision-preview

# Optional — Redis cache
REDIS_URL=redis://localhost:6379

# Optional — AWS S3 for image storage
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
S3_BUCKET_NAME=geo-vision-gpt-images

# Optional — app config
MAX_IMAGE_SIZE_MB=20
TILE_SIZE=512
APP_PORT=8501
```

### Running the App

Option A — Streamlit UI (recommended for local use)
```bash
streamlit run app/main.py
```

Then open http://localhost:8501 in your browser.
Option B — FastAPI REST API
```bash
uvicorn app.api:app --reload --port 8000
```

API documentation is auto-generated at http://localhost:8000/docs.
Option C — Docker Compose

```bash
docker-compose up --build
```

This spins up the Streamlit UI, FastAPI backend, and Redis cache together.
## Usage Guide
- Navigate to the Analyze page
- Upload a satellite image (supported: `.tif`, `.tiff`, `.png`, `.jpg`, `.jp2`)
- Type your natural language question in the prompt box, e.g.:
- "What type of land cover is visible in this image?"
- "Are there any water bodies or flooded areas present?"
- "Count the approximate number of buildings visible."
- Click Analyze — results appear within seconds
- Optionally export results as JSON or GeoJSON
```python
import requests

with open("sample.tif", "rb") as f:
    response = requests.post(
        "http://localhost:8000/analyze",
        files={"image": f},
        data={"query": "Describe the land use in this image"},
    )

print(response.json())
```

Sample Response:
```json
{
  "status": "success",
  "query": "Describe the land use in this image",
  "analysis": "The image shows a predominantly agricultural area with rectangular field parcels. There is a small settlement cluster in the northeast quadrant. A river meander is visible along the western edge, with riparian vegetation.",
  "detected_features": ["agriculture", "settlement", "river", "vegetation"],
  "confidence": 0.91,
  "model": "gpt-4o",
  "processing_time_ms": 1847
}
```

Batch processing via the CLI:

```bash
python -m geo.batch_analyze \
  --input-dir ./data/sample_images \
  --query "Identify land use type and any infrastructure" \
  --output ./data/outputs/results.jsonl
```

## Supported Use Cases

### Land Cover Classification

Ask the model to identify and describe different land cover types — forests, agriculture, urban areas, water bodies, barren land — directly from imagery without running a dedicated ML classification pipeline.
### Infrastructure Detection

Detect roads, buildings, bridges, airports, and industrial facilities. Useful for urban planning assessments, post-disaster surveys, and construction monitoring.
### Change Detection

Upload a before/after image pair and prompt the model to identify what has changed — deforestation, flood extent, urban sprawl, or infrastructure damage.
### Environmental Monitoring

Analyze vegetation health indicators, identify burned areas, monitor coastline erosion, or assess wetland coverage from multispectral imagery.
### Disaster Response

Rapidly assess satellite imagery after a natural disaster to identify affected areas, damaged infrastructure, and potential rescue zones using natural language queries.
### Agricultural Analysis

Detect crop types, estimate field parcel boundaries, identify irrigation patterns, and flag anomalies like drought stress or pest damage zones.
## API Reference

### `POST /analyze`

Analyze a single geospatial image.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `image` | file | ✅ | Image file (`.tif`, `.png`, `.jpg`) |
| `query` | string | ✅ | Natural language question |
| `model` | string | ❌ | Override model (default: `gpt-4o`) |
| `export_format` | string | ❌ | `json` or `geojson` |
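The exact shape of the server's `geojson` export is not documented here. As an illustration of the structure downstream GIS tools expect, here is a hypothetical client-side helper that wraps an `/analyze` JSON response (schema as in the sample response above) into a GeoJSON `FeatureCollection`:

```python
# Hypothetical helper, not part of geo/exporter.py. Geometries are null
# because the /analyze response shown above carries no coordinates; a
# real export would attach extracted geometries.

def response_to_geojson(result: dict) -> dict:
    """Wrap detected features into a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": None,
            "properties": {
                "label": label,
                "confidence": result.get("confidence"),
            },
        }
        for label in result.get("detected_features", [])
    ]
    return {
        "type": "FeatureCollection",
        "features": features,
        # Foreign members are permitted by the GeoJSON spec (RFC 7946).
        "analysis": result.get("analysis"),
        "model": result.get("model"),
    }
```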
### Change Detection Endpoint

Compare two images for change detection.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `image_before` | file | ✅ | Earlier image |
| `image_after` | file | ✅ | Later image |
| `query` | string | ✅ | e.g., "What has changed between these two images?" |
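A client call for change detection might look like the following sketch. The `/compare` route and port are assumptions inferred from the parameter table and the local setup above; check `app/api.py` for the actual path:

```python
import requests

def compare_images(before_path, after_path, query,
                   url="http://localhost:8000/compare"):
    """POST a before/after image pair for change detection.

    The /compare route is an assumption based on the parameter table;
    adjust to the route actually defined in app/api.py.
    """
    with open(before_path, "rb") as before, open(after_path, "rb") as after:
        response = requests.post(
            url,
            files={"image_before": before, "image_after": after},
            data={"query": query},
        )
    response.raise_for_status()
    return response.json()
```

For example, `compare_images("2023.tif", "2024.tif", "What has changed between these two images?")` returns the parsed JSON body.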
### Health Check

Returns API health status and model availability.
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | ✅ | — | Your OpenAI API key |
| `OPENAI_MODEL` | ❌ | `gpt-4o` | Vision model to use |
| `REDIS_URL` | ❌ | None | Redis connection for caching |
| `AWS_ACCESS_KEY_ID` | ❌ | None | AWS credential for S3 |
| `AWS_SECRET_ACCESS_KEY` | ❌ | None | AWS credential for S3 |
| `S3_BUCKET_NAME` | ❌ | None | S3 bucket for image storage |
| `MAX_IMAGE_SIZE_MB` | ❌ | 20 | Max upload size in MB |
| `TILE_SIZE` | ❌ | 512 | Tile size for large image splitting |
| `APP_PORT` | ❌ | 8501 | Streamlit server port |
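In the app, these variables would typically be populated from `.env` by python-dotenv's `load_dotenv()` before being read. A minimal sketch of the read-with-defaults pattern, using the defaults from the table above (the `load_config` helper is hypothetical, not the project's actual config module):

```python
import os

def load_config(env=None):
    """Collect app settings with the defaults listed in the table above.

    Pass a mapping for testing; defaults to os.environ. In the real app,
    python-dotenv's load_dotenv() would populate os.environ first.
    """
    env = os.environ if env is None else env
    if "OPENAI_API_KEY" not in env:
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "openai_api_key": env["OPENAI_API_KEY"],
        "openai_model": env.get("OPENAI_MODEL", "gpt-4o"),
        "redis_url": env.get("REDIS_URL"),       # None disables caching
        "s3_bucket": env.get("S3_BUCKET_NAME"),  # None disables S3 storage
        "max_image_size_mb": int(env.get("MAX_IMAGE_SIZE_MB", "20")),
        "tile_size": int(env.get("TILE_SIZE", "512")),
        "app_port": int(env.get("APP_PORT", "8501")),
    }
```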
## Running Tests

```bash
# Run all tests
pytest tests/ -v

# Run with coverage report
pytest tests/ --cov=core --cov=geo --cov-report=html
```

## Contributing

Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature-name`
3. Make your changes and add tests
4. Ensure all tests pass: `pytest tests/`
5. Commit with a clear message: `git commit -m "feat: add support for GeoTIFF multi-band export"`
6. Push to your branch: `git push origin feature/your-feature-name`
7. Open a Pull Request against `main`
Please follow the Conventional Commits format for commit messages.
## Roadmap
- Multi-band analysis — Support NIR, SWIR, and thermal band reasoning
- SAM integration — Use Segment Anything Model for pixel-level segmentation before GPT reasoning
- Time-series analysis — Multi-image temporal reasoning over a sequence of dates
- GIS tool integration — Native QGIS plugin for in-app use
- Fine-tuned model — Domain-adapted vision model on geospatial annotation datasets
- Geolocation inference — Estimate image geographic location from visual cues
- 3D terrain understanding — Integrate DEM (Digital Elevation Model) data alongside imagery
## License

This project is licensed under the MIT License — see the LICENSE file for details.
## Acknowledgements
- OpenAI GPT-4 Vision for multi-modal reasoning
- LangChain for agent orchestration
- Rasterio and GDAL for geospatial I/O
- Streamlit for rapid UI development
- The open geospatial community for datasets and tooling inspiration
Built with love for the geospatial AI community 🌍
