ImageAI

Version 0.33.0 | A comprehensive desktop application and CLI for multi-provider AI image generation, video creation, and professional layout design.

Python 3.12+ License: MIT


Overview

ImageAI is a powerful, cross-platform desktop application that provides a unified interface for AI-powered image and video generation. Whether you're creating social media content, generating video projects from lyrics, designing custom fonts, or building character animations, ImageAI brings together multiple AI providers in one elegant interface.

Key Highlights

  • 🎨 Multi-Provider Support - Google Gemini, OpenAI DALL·E, Stability AI, Local Stable Diffusion, and more
  • 🎬 Video Projects - Create AI-powered videos from lyrics with MIDI synchronization
  • 🖼️ Reference Images - Transform existing images with style transfer and positioning controls
  • 📐 Professional Layouts - Publication layout engine for books and documents
  • 🎭 Character Animator - Convert images into Adobe Character Animator puppets
  • 🔤 Font Generator - Create custom fonts from alphabet images
  • 🚀 AI Upscaling - Real-ESRGAN integration for high-quality image enhancement
  • 💻 Dual Interface - Modern GUI and powerful CLI for automation

Features

Image Generation

  • Multi-Provider Support

    • Google Gemini (Gemini 2.5 Flash Image, Nano Banana Pro 4K)
    • OpenAI (GPT Image 1.5, DALL·E 3, DALL·E 2)
    • Stability AI (Stable Diffusion XL, SD 2.1, SD 1.6)
    • Local Stable Diffusion (run models locally, GPU recommended)
    • Midjourney integration
    • Ollama local model support
  • Advanced Controls

    • Aspect ratio selection (1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
    • Custom aspect ratios (e.g., "16:10" or decimal "1.6")
    • Resolution control with provider-aware limits
    • Batch generation (1-4 variations)
    • Real-time cost estimation
    • Quality settings (standard/HD)
  • Reference Image System

    • Upload reference images for style guidance
    • Style options: Natural blend, blurred edges, in circle, in frame, as background
    • Position controls: Auto, corners, center, edges
    • Multi-reference support (Google Imagen 3)
  • AI-Powered Tools

    • Prompt enhancement using LLMs (GPT-5, Claude, Gemini)
    • Reference image analysis and description generation
    • Semantic search for prompt building
    • Template system with placeholder substitution
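The custom aspect-ratio input mentioned above accepts either a "W:H" pair (e.g. "16:10") or a plain decimal (e.g. "1.6"). A minimal helper for normalizing both forms into a single ratio might look like this (an illustrative sketch; `parse_aspect_ratio` is a hypothetical name, not ImageAI's actual code):

```python
def parse_aspect_ratio(value: str) -> float:
    """Normalize an aspect-ratio string into a width/height float.

    Accepts "W:H" form ("16:10") or a bare decimal ("1.6").
    Raises ValueError for malformed input, via float().
    """
    value = value.strip()
    if ":" in value:
        width, height = value.split(":", 1)
        return float(width) / float(height)
    return float(value)
```

With this, "16:10" and "1.6" resolve to the same ratio, so downstream resolution logic only ever deals with one numeric representation.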

Video Projects

  • Lyric-to-Video Pipeline

    • Convert lyrics/text into storyboard scenes
    • AI-powered prompt generation from lyrics
    • MIDI synchronization for precise timing
    • Scene-by-scene image generation
    • Multiple render engines (FFmpeg slideshow, Google Veo 3.1)
  • Advanced Features

    • Version control with time-travel restore
    • Ken Burns effects and transitions
    • Audio track integration with volume/fade controls
    • Frame-accurate timing with MIDI support
    • Batch scene processing
    • Instrumental gap detection

Additional Tools

  • Character Animator Puppet Creator

    • AI body segmentation using MediaPipe and SAM 2
    • Cloud AI-powered viseme generation (14 mouth shapes)
    • Eye blink state generation
    • Export to PSD or SVG with Adobe-compatible naming
  • Font Generator

    • Automatic character segmentation
    • Vector tracing with configurable smoothing
    • Font metrics calculation
    • Export to TTF and OTF formats
    • Real-time preview
  • Layout Engine

    • Professional publication layouts
    • Template management system
    • Export presets for various formats

Installation

Prerequisites

  • Python 3.12+ (3.12.x recommended)
  • pip (Python package manager)
  • Git (optional, for cloning the repository)

Quick Install

  1. Clone or download the repository:

    git clone https://github.com/lelandg/ImageAI.git
    cd ImageAI
  2. Create a virtual environment:

    # Windows (PowerShell)
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    
    # macOS/Linux
    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
  4. Launch the application:

    python main.py

Optional Dependencies

Some features require additional dependencies that can be installed on demand:

  • Local Stable Diffusion: Uncomment lines in requirements.txt or install via GUI
  • AI Upscaling: Installed automatically via GUI when needed (includes PyTorch)
  • Ollama: Install separately: pip install ollama

For detailed installation instructions, see Docs/ImageAI-Installation-Guide.md.


Quick Start

GUI Mode (Default)

Simply run without arguments to launch the graphical interface:

python main.py

The GUI provides:

  • Image Tab: Generate images with full control over all parameters
  • Video Tab: Create video projects from lyrics
  • Templates Tab: Browse and use prompt templates
  • Settings Tab: Configure API keys and preferences
  • History Tab: View and manage generated images
  • Help Tab: Access documentation and guides

CLI Mode

Generate images from the command line:

# Show help
python main.py -h

# Test API key
python main.py --provider google -t

# Generate an image
python main.py --provider google -p "a beautiful sunset over mountains" -o output.png

# Use OpenAI DALL·E 3
python main.py --provider openai -m dall-e-3 -p "a futuristic city" -o city.png

# Generate multiple variations
python main.py -p "a cat wearing sunglasses" -n 4 -o cat.png

API Key Configuration

  1. Via GUI: Settings tab → Enter API keys → Save & Test
  2. Via CLI: Use -k flag or --api-key-file option
  3. Via Environment: Set GOOGLE_API_KEY, OPENAI_API_KEY, etc.


Usage Examples

Image Generation

Basic generation:

python main.py -p "a serene Japanese garden with cherry blossoms"

With specific provider and model:

python main.py --provider openai -m dall-e-3 --quality hd -p "a cyberpunk cityscape at night"

Custom resolution:

python main.py -p "a landscape" --size 1792x1024

Batch generation:

python main.py -p "a fantasy castle" -n 4 -o castle.png

Video Projects

  1. Open the Video tab in the GUI
  2. Create a new project or load an existing one
  3. Paste lyrics or load from file
  4. Configure timing (manual or MIDI sync)
  5. Generate storyboard scenes
  6. Enhance prompts with AI (optional)
  7. Generate images for each scene
  8. Render video (FFmpeg slideshow or Veo 3.1)
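Conceptually, steps 3-5 above turn raw lyrics into a timed storyboard before any images are generated. A minimal sketch of that first stage, assuming a fixed per-line duration when no MIDI sync is available (the `Scene` shape and `build_storyboard` name are illustrative, not ImageAI's internal API):

```python
from dataclasses import dataclass

@dataclass
class Scene:
    lyric: str
    start: float      # seconds into the video
    duration: float   # seconds on screen
    prompt: str = ""  # filled in later by AI prompt generation

def build_storyboard(lyrics: str, seconds_per_line: float = 4.0) -> list[Scene]:
    """Split lyrics into evenly timed scenes.

    MIDI synchronization would replace the uniform timing here with
    note-accurate start times for each line.
    """
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    return [
        Scene(lyric=line, start=i * seconds_per_line, duration=seconds_per_line)
        for i, line in enumerate(lines)
    ]
```

Each `Scene` then becomes one image-generation job (step 7) and one slideshow segment at render time (step 8).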

Reference Images

  1. In the Image tab, click "Reference Image" button
  2. Select an image file
  3. Choose style and position options
  4. Enter your prompt
  5. Generate - the AI will blend your reference with the prompt

Configuration

Storage Locations

Configuration files are stored in platform-specific directories:

  • Windows: %APPDATA%\ImageAI\
  • macOS: ~/Library/Application Support/ImageAI/
  • Linux: ~/.config/ImageAI/
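The platform-specific paths above follow the usual OS conventions, so locating the config directory programmatically reduces to a platform switch. A sketch of that lookup (illustrative only; ImageAI's own resolution logic may differ, e.g. in how it honors `XDG_CONFIG_HOME`):

```python
import os
import sys
from pathlib import Path

def config_dir(app_name: str = "ImageAI") -> Path:
    """Return the platform-specific configuration directory for the app."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # Linux and other Unix: respect XDG if set, else ~/.config
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / app_name
```

This is the directory where `config.json`, `history.json`, and the `logs/` folder described below live.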

Configuration Files

  • config.json - API keys and application settings
  • history.json - Generation history
  • prompt_builder_history.json - Prompt builder history
  • logs/ - Application logs

Security

  • API keys are stored securely (encrypted when possible)
  • Keys never leave your local machine
  • Optional keyring integration for enhanced security
  • Google Cloud ADC support for enterprise deployments

Supported Providers & Models

Google Gemini

  • gemini-3-pro-image-preview - Gemini 3 Pro Image (Nano Banana Pro) - 4K
  • gemini-2.5-flash-image - Gemini 2.5 Flash Image (Nano Banana) - Default
  • imagen-4.0-generate-001 - Imagen 4 (Best Quality) - Vertex AI only
  • imagen-3.0-generate-002 - Imagen 3 (General Purpose) - Vertex AI only

Authentication: API key or Google Cloud ADC

OpenAI

  • gpt-image-1.5 - GPT Image 1.5 (Latest)
  • gpt-image-1 - GPT Image 1
  • gpt-image-1-mini - GPT Image 1 Mini (Fast)
  • dall-e-3 - DALL·E 3
  • dall-e-2 - DALL·E 2

Authentication: API key

Stability AI

  • stable-diffusion-xl-1024-v1-0 - Stable Diffusion XL 1.0
  • stable-diffusion-xl-beta-v2-2-2 - Stable Diffusion XL Beta
  • stable-diffusion-512-v2-1 - Stable Diffusion 2.1
  • stable-diffusion-v1-6 - Stable Diffusion 1.6

Authentication: API key

Local Stable Diffusion

  • stabilityai/stable-diffusion-xl-base-1.0 - SDXL Base 1.0
  • segmind/SSD-1B - SSD-1B (Fast SDXL)
  • stabilityai/stable-diffusion-2-1 - Stable Diffusion 2.1
  • runwayml/stable-diffusion-v1-5 - Stable Diffusion 1.5
  • CompVis/stable-diffusion-v1-4 - Stable Diffusion 1.4

Authentication: None (runs locally)

Video Providers

  • Google Veo 3.1 - veo-3.1-generate-001 (frames-to-video)
  • Google Veo 3.0 - veo-3.0-generate-001 (single-frame animation)
  • FFmpeg Slideshow - Local rendering with Ken Burns effects

Keyboard Shortcuts

Global Shortcuts

  • Ctrl+Enter - Generate image (from anywhere)
  • Ctrl+S - Save current image
  • Ctrl+Shift+C - Copy image to clipboard
  • Ctrl+F - Find in prompt field
  • F1 - Open help
  • Alt+G - Generate button
  • Alt+S - Save button
  • Alt+P - Prompt Builder

Dialog Shortcuts

  • Ctrl+Enter - Submit dialog
  • Escape - Close dialog
  • Ctrl+E - Edit mode (in Ask dialog)

Project Structure

ImageAI/
├── main.py                 # Entry point (CLI & GUI routing)
├── core/                   # Core functionality
│   ├── config.py          # Configuration management
│   ├── constants.py        # App constants and version
│   ├── logging_config.py  # Logging setup
│   └── video/              # Video project system
├── providers/              # AI provider implementations
│   ├── base.py            # Base provider interface
│   ├── google.py          # Google Gemini/Imagen
│   ├── openai.py          # OpenAI DALL·E
│   ├── stability.py       # Stability AI
│   ├── local_sd.py        # Local Stable Diffusion
│   └── video/             # Video providers (Veo)
├── gui/                    # Graphical interface
│   ├── main_window.py     # Main window and tabs
│   ├── video/             # Video project UI
│   ├── layout/            # Layout engine UI
│   └── character_animator/ # Character Animator tools
├── cli/                    # Command-line interface
│   ├── parser.py          # Argument parsing
│   └── runner.py          # CLI execution
├── templates/             # Template definitions
├── data/                  # JSON resources (prompts, presets)
├── Docs/                  # Documentation
└── requirements.txt       # Python dependencies

For detailed code navigation, see Docs/CodeMap.md.
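The layout above shows each AI backend implemented against a common interface in providers/base.py. A plausible shape for such an interface (purely illustrative; the actual classes and signatures in providers/base.py may differ) is an abstract base class that each provider module subclasses:

```python
from abc import ABC, abstractmethod

class ImageProvider(ABC):
    """Hypothetical base interface: one subclass per backend (Google, OpenAI, ...)."""

    name: str = "base"

    @abstractmethod
    def generate(self, prompt: str, *, model: str, count: int = 1) -> list[bytes]:
        """Return `count` generated images as raw image bytes."""

class EchoProvider(ImageProvider):
    """Stand-in backend used here only to demonstrate the plumbing."""

    name = "echo"

    def generate(self, prompt: str, *, model: str, count: int = 1) -> list[bytes]:
        # A real provider would call its API here; we just echo the prompt.
        return [prompt.encode("utf-8")] * count
```

This pattern keeps the CLI and GUI provider-agnostic: both sides dispatch on a provider name and call the same `generate` entry point.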


Troubleshooting

Common Issues

"Module not found" errors:

  • Ensure virtual environment is activated
  • Reinstall dependencies: pip install -r requirements.txt

API key errors:

  • Verify API key is correct in Settings tab
  • Check API key has proper permissions
  • For Google Cloud, ensure ADC is configured if using gcloud auth

GUI won't launch:

  • Install PySide6: pip install PySide6
  • Check display server (Linux/WSL may need X11 forwarding)

Video generation fails:

  • Ensure FFmpeg is installed (via imageio-ffmpeg)
  • Check audio file format is supported
  • Verify MIDI file is valid (if using MIDI sync)

Local SD not working:

  • Install PyTorch with CUDA support for GPU acceleration
  • Check GPU drivers are up to date
  • Verify model files are downloaded

Debug Files

On exit, ImageAI automatically copies debug files to the project root:

  • ./imageai_current.log - Most recent log file
  • ./imageai_current_project.json - Last loaded project

Check these files for detailed error information.


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Test thoroughly
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Code Style

  • Follow PEP 8 for Python code
  • Use type hints where appropriate
  • Add docstrings for public functions/classes
  • Update documentation for new features

License

This project is licensed under the MIT License - see the LICENSE file for details.


Credits

Author: Leland Green
Email: contact@lelandgreen.com
Copyright: © 2025 Leland Green

Acknowledgments

  • Google Gemini API team
  • OpenAI for DALL·E
  • Stability AI for Stable Diffusion
  • All contributors and beta testers

Changelog

See CHANGELOG.md for a complete list of changes.

Recent Highlights

v0.33.0 (2026-01-28)

  • Font Generator with AI glyph identification and generation
  • Enhanced character segmentation and positioning

v0.32.0 (2026-01-24)

  • Character Animator Puppet Creator wizard
  • Font Generator with vector tracing

v0.31.0 (2025-12-05)

  • Gemini 3 Pro LLM support
  • File memory in Ask About Files dialog

v0.30.0 (2025-11-27)

  • Claude Opus 4.5 support
  • Enhanced reference image system

Roadmap

See the Plans/ directory for upcoming features and development roadmap.

Planned Features

  • Enhanced video generation with more providers
  • Advanced layout templates
  • Community template sharing
  • Plugin system for custom providers
  • Web interface option

Links

  • Documentation: See Docs/ directory

Made with ❤️ for the AI art community
