# ImageAI

Version 0.33.0 | A comprehensive desktop application and CLI for multi-provider AI image generation, video creation, and professional layout design.
ImageAI is a powerful, cross-platform desktop application that provides a unified interface for AI-powered image and video generation. Whether you're creating social media content, generating video projects from lyrics, designing custom fonts, or building character animations, ImageAI brings together multiple AI providers in one elegant interface.
- 🎨 Multi-Provider Support - Google Gemini, OpenAI DALL·E, Stability AI, Local Stable Diffusion, and more
- 🎬 Video Projects - Create AI-powered videos from lyrics with MIDI synchronization
- 🖼️ Reference Images - Transform existing images with style transfer and positioning controls
- 📐 Professional Layouts - Publication layout engine for books and documents
- 🎭 Character Animator - Convert images into Adobe Character Animator puppets
- 🔤 Font Generator - Create custom fonts from alphabet images
- 🚀 AI Upscaling - Real-ESRGAN integration for high-quality image enhancement
- 💻 Dual Interface - Modern GUI and powerful CLI for automation
### Multi-Provider Support
- Google Gemini (Gemini 2.5 Flash Image, Nano Banana Pro 4K)
- OpenAI (GPT Image 1.5, DALL·E 3, DALL·E 2)
- Stability AI (Stable Diffusion XL, SD 2.1, SD 1.6)
- Local Stable Diffusion (run models locally, GPU recommended)
- Midjourney integration
- Ollama local model support
### Advanced Controls
- Aspect ratio selection (1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- Custom aspect ratios (e.g., "16:10" or decimal "1.6")
- Resolution control with provider-aware limits
- Batch generation (1-4 variations)
- Real-time cost estimation
- Quality settings (standard/HD)
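To illustrate how a custom aspect-ratio string and a provider resolution limit might interact, here is a minimal sketch. The function names and the 1024-pixel cap are illustrative assumptions, not ImageAI's actual implementation:

```python
def parse_aspect_ratio(value: str) -> float:
    """Parse '16:10' or a decimal string like '1.6' into a width/height ratio."""
    if ":" in value:
        w, h = value.split(":", 1)
        return float(w) / float(h)
    return float(value)

def fit_to_limit(ratio: float, max_side: int = 1024) -> tuple[int, int]:
    """Compute integer dimensions for a ratio, capping the longer side."""
    if ratio >= 1.0:
        return max_side, round(max_side / ratio)
    return round(max_side * ratio), max_side
```

For example, `fit_to_limit(parse_aspect_ratio("16:10"))` yields dimensions whose longer side is 1024; a real provider backend would also round to its supported size grid.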
### Reference Image System
- Upload reference images for style guidance
- Style options: Natural blend, blurred edges, in circle, in frame, as background
- Position controls: Auto, corners, center, edges
- Multi-reference support (Google Imagen 3)
### AI-Powered Tools
- Prompt enhancement using LLMs (GPT-5, Claude, Gemini)
- Reference image analysis and description generation
- Semantic search for prompt building
- Template system with placeholder substitution
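Placeholder substitution of the kind the template system describes can be sketched with the standard library's `string.Template`. The `$name` placeholder syntax and the function name here are assumptions for illustration; ImageAI's templates may use a different syntax:

```python
from string import Template

def fill_template(template: str, values: dict) -> str:
    """Substitute $name placeholders; unknown placeholders are left intact."""
    return Template(template).safe_substitute(values)
```

Using `safe_substitute` (rather than `substitute`) means a template with an unfilled placeholder still renders instead of raising an error, which is usually what you want in an interactive prompt builder.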
### Lyric-to-Video Pipeline
- Convert lyrics/text into storyboard scenes
- AI-powered prompt generation from lyrics
- MIDI synchronization for precise timing
- Scene-by-scene image generation
- Multiple render engines (FFmpeg slideshow, Google Veo 3.1)
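The storyboard step above can be pictured as turning lyric lines into timed scenes. This sketch simply distributes lines evenly over the track length; the `Scene` type and function are hypothetical, and in ImageAI the MIDI sync step would refine these timings:

```python
from dataclasses import dataclass

@dataclass
class Scene:
    text: str
    start: float     # seconds from the start of the track
    duration: float  # seconds this scene stays on screen

def lyrics_to_scenes(lyrics: str, total_seconds: float) -> list[Scene]:
    """Split non-empty lyric lines into evenly timed storyboard scenes."""
    lines = [ln.strip() for ln in lyrics.splitlines() if ln.strip()]
    per = total_seconds / len(lines)
    return [Scene(text=ln, start=i * per, duration=per) for i, ln in enumerate(lines)]
```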
### Advanced Features
- Version control with time-travel restore
- Ken Burns effects and transitions
- Audio track integration with volume/fade controls
- Frame-accurate timing with MIDI support
- Batch scene processing
- Instrumental gap detection
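Instrumental gap detection can be reduced to a simple scan over note spans: any silence between one note's offset and the next note's onset that exceeds a threshold is a gap worth giving its own scene. This is a hedged sketch, not ImageAI's detector; the threshold default is arbitrary:

```python
def find_gaps(note_times: list[tuple[float, float]], min_gap: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) silences between (onset, offset) pairs longer than min_gap seconds."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(note_times, note_times[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps
```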
### Character Animator Puppet Creator
- AI body segmentation using MediaPipe and SAM 2
- Cloud AI-powered viseme generation (14 mouth shapes)
- Eye blink state generation
- Export to PSD or SVG with Adobe-compatible naming
### Font Generator
- Automatic character segmentation
- Vector tracing with configurable smoothing
- Font metrics calculation
- Export to TTF and OTF formats
- Real-time preview
### Layout Engine
- Professional publication layouts
- Template management system
- Export presets for various formats
### Requirements

- Python 3.12+ (recommended: Python 3.12.x)
- pip (Python package manager)
- Git (optional, for cloning the repository)
### Installation

1. Clone or download the repository:

   ```bash
   git clone https://github.com/lelandg/ImageAI.git
   cd ImageAI
   ```

2. Create a virtual environment:

   ```bash
   # Windows (PowerShell)
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1

   # macOS/Linux
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

4. Launch the application:

   ```bash
   python main.py
   ```
Some features require additional dependencies that can be installed on demand:

- Local Stable Diffusion: Uncomment the relevant lines in `requirements.txt` or install via the GUI
- AI Upscaling: Installed automatically via the GUI when needed (includes PyTorch)
- Ollama: Install separately: `pip install ollama`

For detailed installation instructions, see `Docs/ImageAI-Installation-Guide.md`.
Simply run without arguments to launch the graphical interface:

```bash
python main.py
```

The GUI provides:
- Image Tab: Generate images with full control over all parameters
- Video Tab: Create video projects from lyrics
- Templates Tab: Browse and use prompt templates
- Settings Tab: Configure API keys and preferences
- History Tab: View and manage generated images
- Help Tab: Access documentation and guides
Generate images from the command line:

```bash
# Show help
python main.py -h

# Test API key
python main.py --provider google -t

# Generate an image
python main.py --provider google -p "a beautiful sunset over mountains" -o output.png

# Use OpenAI DALL·E 3
python main.py --provider openai -m dall-e-3 -p "a futuristic city" -o city.png

# Generate multiple variations
python main.py -p "a cat wearing sunglasses" -n 4 -o cat.png
```

API keys can be configured in several ways:

- Via GUI: Settings tab → Enter API keys → Save & Test
- Via CLI: Use the `-k` flag or the `--api-key-file` option
- Via Environment: Set `GOOGLE_API_KEY`, `OPENAI_API_KEY`, etc.
Get API Keys:
- Google Gemini: https://aistudio.google.com/apikey
- OpenAI: https://platform.openai.com/api-keys
- Stability AI: https://platform.stability.ai/account/keys
Basic generation:

```bash
python main.py -p "a serene Japanese garden with cherry blossoms"
```

With a specific provider and model:

```bash
python main.py --provider openai -m dall-e-3 --quality hd -p "a cyberpunk cityscape at night"
```

Custom resolution:

```bash
python main.py -p "a landscape" --size 1792x1024
```

Batch generation:

```bash
python main.py -p "a fantasy castle" -n 4 -o castle.png
```

To create a video project:

1. Open the Video tab in the GUI
2. Create a new project or load an existing one
3. Paste lyrics or load from file
4. Configure timing (manual or MIDI sync)
5. Generate storyboard scenes
6. Enhance prompts with AI (optional)
7. Generate images for each scene
8. Render the video (FFmpeg slideshow or Veo 3.1)
To use a reference image:

1. In the Image tab, click the "Reference Image" button
2. Select an image file
3. Choose style and position options
4. Enter your prompt
5. Generate - the AI will blend your reference with the prompt
Configuration files are stored in platform-specific directories:

- Windows: `%APPDATA%\ImageAI\`
- macOS: `~/Library/Application Support/ImageAI/`
- Linux: `~/.config/ImageAI/`

Key files:

- `config.json` - API keys and application settings
- `history.json` - Generation history
- `prompt_builder_history.json` - Prompt builder history
- `logs/` - Application logs
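The platform lookup above can be sketched in a few lines of standard-library Python. This mirrors the directory table, but the function itself is illustrative; ImageAI's actual resolution logic may differ (e.g. it may use a library such as `platformdirs`):

```python
import os
import sys
from pathlib import Path

def config_dir(app: str = "ImageAI") -> Path:
    """Return the platform-specific configuration directory for the app."""
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / app          # %APPDATA%\ImageAI\
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / app
    # Linux and other POSIX: honor XDG_CONFIG_HOME if set
    return Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config")) / app
```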
- API keys are stored securely (encrypted when possible)
- Keys never leave your local machine
- Optional keyring integration for enhanced security
- Google Cloud ADC support for enterprise deployments
## Supported Models

### Google Gemini

- `gemini-3-pro-image-preview` - Gemini 3 Pro Image (Nano Banana Pro) - 4K
- `gemini-2.5-flash-image` - Gemini 2.5 Flash Image (Nano Banana) - Default
- `imagen-4.0-generate-001` - Imagen 4 (Best Quality) - Vertex AI only
- `imagen-3.0-generate-002` - Imagen 3 (General Purpose) - Vertex AI only

Authentication: API key or Google Cloud ADC
### OpenAI

- `gpt-image-1.5` - GPT Image 1.5 (Latest)
- `gpt-image-1` - GPT Image 1
- `gpt-image-1-mini` - GPT Image 1 Mini (Fast)
- `dall-e-3` - DALL·E 3
- `dall-e-2` - DALL·E 2

Authentication: API key
### Stability AI

- `stable-diffusion-xl-1024-v1-0` - Stable Diffusion XL 1.0
- `stable-diffusion-xl-beta-v2-2-2` - Stable Diffusion XL Beta
- `stable-diffusion-512-v2-1` - Stable Diffusion 2.1
- `stable-diffusion-v1-6` - Stable Diffusion 1.6

Authentication: API key
### Local Stable Diffusion

- `stabilityai/stable-diffusion-xl-base-1.0` - SDXL Base 1.0
- `segmind/SSD-1B` - SSD-1B (Fast SDXL)
- `stabilityai/stable-diffusion-2-1` - Stable Diffusion 2.1
- `runwayml/stable-diffusion-v1-5` - Stable Diffusion 1.5
- `CompVis/stable-diffusion-v1-4` - Stable Diffusion 1.4

Authentication: None (runs locally)
### Video Generation

- Google Veo 3.1 - `veo-3.1-generate-001` (frames-to-video)
- Google Veo 3.0 - `veo-3.0-generate-001` (single-frame animation)
- FFmpeg Slideshow - Local rendering with Ken Burns effects
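ImageAI's FFmpeg slideshow renderer isn't shown here, but as a rough sketch, a basic slideshow can be driven by FFmpeg's concat demuxer. The list-file `file`/`duration` directives and the flags below are standard FFmpeg; the function names, defaults, and the omission of Ken Burns pan/zoom are illustrative simplifications:

```python
def concat_script(images: list[str], seconds_per_image: float) -> str:
    """Build a concat-demuxer list file: each image shown for a fixed duration."""
    lines = []
    for img in images:
        lines.append(f"file '{img}'")
        lines.append(f"duration {seconds_per_image}")
    # Concat demuxer quirk: the last file must be repeated so its duration applies.
    lines.append(f"file '{images[-1]}'")
    return "\n".join(lines)

def slideshow_command(list_file: str, output: str) -> list[str]:
    """Assemble the ffmpeg invocation for the concat list (no pan/zoom effects)."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-vsync", "vfr", "-pix_fmt", "yuv420p", output]
```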
## Keyboard Shortcuts

### Main Window

- `Ctrl+Enter` - Generate image (from anywhere)
- `Ctrl+S` - Save current image
- `Ctrl+Shift+C` - Copy image to clipboard
- `Ctrl+F` - Find in prompt field
- `F1` - Open help
- `Alt+G` - Generate button
- `Alt+S` - Save button
- `Alt+P` - Prompt Builder

### Dialogs

- `Ctrl+Enter` - Submit dialog
- `Escape` - Close dialog
- `Ctrl+E` - Edit mode (in Ask dialog)
```
ImageAI/
├── main.py                 # Entry point (CLI & GUI routing)
├── core/                   # Core functionality
│   ├── config.py           # Configuration management
│   ├── constants.py        # App constants and version
│   ├── logging_config.py   # Logging setup
│   └── video/              # Video project system
├── providers/              # AI provider implementations
│   ├── base.py             # Base provider interface
│   ├── google.py           # Google Gemini/Imagen
│   ├── openai.py           # OpenAI DALL·E
│   ├── stability.py        # Stability AI
│   ├── local_sd.py         # Local Stable Diffusion
│   └── video/              # Video providers (Veo)
├── gui/                    # Graphical interface
│   ├── main_window.py      # Main window and tabs
│   ├── video/              # Video project UI
│   ├── layout/             # Layout engine UI
│   └── character_animator/ # Character Animator tools
├── cli/                    # Command-line interface
│   ├── parser.py           # Argument parsing
│   └── runner.py           # CLI execution
├── templates/              # Template definitions
├── data/                   # JSON resources (prompts, presets)
├── Docs/                   # Documentation
└── requirements.txt        # Python dependencies
```

For detailed code navigation, see `Docs/CodeMap.md`.
## Troubleshooting

"Module not found" errors:
- Ensure the virtual environment is activated
- Reinstall dependencies: `pip install -r requirements.txt`

API key errors:
- Verify the API key is correct in the Settings tab
- Check the API key has proper permissions
- For Google Cloud, ensure ADC is configured if using `gcloud auth`

GUI won't launch:
- Install PySide6: `pip install PySide6`
- Check the display server (Linux/WSL may need X11 forwarding)

Video generation fails:
- Ensure FFmpeg is installed (bundled via `imageio-ffmpeg`)
- Check the audio file format is supported
- Verify the MIDI file is valid (if using MIDI sync)

Local SD not working:
- Install PyTorch with CUDA support for GPU acceleration
- Check GPU drivers are up to date
- Verify model files are downloaded
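For the Local SD checks above, a quick diagnostic along these lines can tell you whether PyTorch is installed and sees a GPU. The function name is illustrative; it only relies on the standard `torch.cuda.is_available()` API and degrades gracefully when PyTorch is absent:

```python
import importlib.util

def cuda_status() -> str:
    """Report whether PyTorch is installed and, if so, whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return "pytorch-missing"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu-only"
```

A result of `"cpu-only"` with an NVIDIA GPU present usually means a CPU-only PyTorch build or outdated drivers.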
On exit, ImageAI automatically copies debug files to the project root:
- `./imageai_current.log` - Most recent log file
- `./imageai_current_project.json` - Last loaded project
Check these files for detailed error information.
- Documentation: See the `Docs/` directory for detailed guides
- Issues: Report bugs on GitHub Issues
- Discord: Join the Chameleon Labs Discord
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Test thoroughly
5. Commit: `git commit -m 'Add amazing feature'`
6. Push: `git push origin feature/amazing-feature`
7. Open a Pull Request
- Follow PEP 8 for Python code
- Use type hints where appropriate
- Add docstrings for public functions/classes
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
Author: Leland Green
Email: contact@lelandgreen.com
Copyright: © 2025 Leland Green
- Google Gemini API team
- OpenAI for DALL·E
- Stability AI for Stable Diffusion
- All contributors and beta testers
See CHANGELOG.md for a complete list of changes.
v0.33.0 (2026-01-28)
- Font Generator with AI glyph identification and generation
- Enhanced character segmentation and positioning
v0.32.0 (2026-01-24)
- Character Animator Puppet Creator wizard
- Font Generator with vector tracing
v0.31.0 (2025-12-05)
- Gemini 3 Pro LLM support
- File memory in Ask About Files dialog
v0.30.0 (2025-11-27)
- Claude Opus 4.5 support
- Enhanced reference image system
See the `Plans/` directory for upcoming features and the development roadmap.
- Enhanced video generation with more providers
- Advanced layout templates
- Community template sharing
- Plugin system for custom providers
- Web interface option
- Documentation: See the `Docs/` directory
Made with ❤️ for the AI art community