ImageAI

Version 0.33.0 | A comprehensive desktop application and CLI for multi-provider AI image generation, video creation, and professional layout design.

Python 3.12+ License: MIT


Overview

ImageAI is a powerful, cross-platform desktop application that provides a unified interface for AI-powered image and video generation. Whether you're creating social media content, generating video projects from lyrics, designing custom fonts, or building character animations, ImageAI brings together multiple AI providers in one elegant interface.

Key Highlights

  • 🎨 Multi-Provider Support - Google Gemini, OpenAI DALL·E, Stability AI, Local Stable Diffusion, and more
  • 🎬 Video Projects - Create AI-powered videos from lyrics with MIDI synchronization
  • 🖼️ Reference Images - Transform existing images with style transfer and positioning controls
  • 📐 Professional Layouts - Publication layout engine for books and documents
  • 🎭 Character Animator - Convert images into Adobe Character Animator puppets
  • 🔤 Font Generator - Create custom fonts from alphabet images
  • 🚀 AI Upscaling - Real-ESRGAN integration for high-quality image enhancement
  • 💻 Dual Interface - Modern GUI and powerful CLI for automation

Features

Image Generation

  • Multi-Provider Support

    • Google Gemini (Gemini 2.5 Flash Image, Nano Banana Pro 4K)
    • OpenAI (GPT Image 1.5, DALL·E 3, DALL·E 2)
    • Stability AI (Stable Diffusion XL, SD 2.1, SD 1.6)
    • Local Stable Diffusion (run models locally, GPU recommended)
    • Midjourney integration
    • Ollama local model support
  • Advanced Controls

    • Aspect ratio selection (1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
    • Custom aspect ratios (e.g., "16:10" or decimal "1.6")
    • Resolution control with provider-aware limits
    • Batch generation (1-4 variations)
    • Real-time cost estimation
    • Quality settings (standard/HD)
  • Reference Image System

    • Upload reference images for style guidance
    • Style options: Natural blend, blurred edges, in circle, in frame, as background
    • Position controls: Auto, corners, center, edges
    • Multi-reference support (Google Imagen 3)
  • AI-Powered Tools

    • Prompt enhancement using LLMs (GPT-5, Claude, Gemini)
    • Reference image analysis and description generation
    • Semantic search for prompt building
    • Template system with placeholder substitution
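The custom aspect-ratio input mentioned above accepts either a "W:H" pair (e.g. "16:10") or a plain decimal (e.g. "1.6"). A minimal helper for normalizing both forms into a single ratio might look like this (an illustrative sketch; `parse_aspect_ratio` is a hypothetical name, not ImageAI's actual code):

```python
def parse_aspect_ratio(value: str) -> float:
    """Normalize an aspect-ratio string into a width/height float.

    Accepts "W:H" form ("16:10") or a bare decimal ("1.6").
    Raises ValueError for malformed input, via float().
    """
    value = value.strip()
    if ":" in value:
        width, height = value.split(":", 1)
        return float(width) / float(height)
    return float(value)
```

With this, "16:10" and "1.6" resolve to the same ratio, so downstream resolution logic only ever deals with one numeric representation.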

Video Projects

  • Lyric-to-Video Pipeline

    • Convert lyrics/text into storyboard scenes
    • AI-powered prompt generation from lyrics
    • MIDI synchronization for precise timing
    • Scene-by-scene image generation
    • Multiple render engines (FFmpeg slideshow, Google Veo 3.1)
  • Advanced Features

    • Version control with time-travel restore
    • Ken Burns effects and transitions
    • Audio track integration with volume/fade controls
    • Frame-accurate timing with MIDI support
    • Batch scene processing
    • Instrumental gap detection

Additional Tools

  • Character Animator Puppet Creator

    • AI body segmentation using MediaPipe and SAM 2
    • Cloud AI-powered viseme generation (14 mouth shapes)
    • Eye blink state generation
    • Export to PSD or SVG with Adobe-compatible naming
  • Font Generator

    • Automatic character segmentation
    • Vector tracing with configurable smoothing
    • Font metrics calculation
    • Export to TTF and OTF formats
    • Real-time preview
  • Layout Engine

    • Professional publication layouts
    • Template management system
    • Export presets for various formats

Installation

Prerequisites

  • Python 3.12+ (3.12.x recommended)
  • pip (Python package manager)
  • Git (optional, for cloning the repository)

Quick Install

  1. Clone or download the repository:

    git clone https://github.com/lelandg/ImageAI.git
    cd ImageAI
  2. Create a virtual environment:

    # Windows (PowerShell)
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    
    # macOS/Linux
    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
  4. Launch the application:

    python main.py

Optional Dependencies

Some features require additional dependencies that can be installed on demand:

  • Local Stable Diffusion: Uncomment lines in requirements.txt or install via GUI
  • AI Upscaling: Installed automatically via GUI when needed (includes PyTorch)
  • Ollama: Install separately: pip install ollama

For detailed installation instructions, see Docs/ImageAI-Installation-Guide.md.


Quick Start

GUI Mode (Default)

Simply run without arguments to launch the graphical interface:

python main.py

The GUI provides:

  • Image Tab: Generate images with full control over all parameters
  • Video Tab: Create video projects from lyrics
  • Templates Tab: Browse and use prompt templates
  • Settings Tab: Configure API keys and preferences
  • History Tab: View and manage generated images
  • Help Tab: Access documentation and guides

CLI Mode

Generate images from the command line:

# Show help
python main.py -h

# Test API key
python main.py --provider google -t

# Generate an image
python main.py --provider google -p "a beautiful sunset over mountains" -o output.png

# Use OpenAI DALL·E 3
python main.py --provider openai -m dall-e-3 -p "a futuristic city" -o city.png

# Generate multiple variations
python main.py -p "a cat wearing sunglasses" -n 4 -o cat.png

API Key Configuration

  1. Via GUI: Settings tab → Enter API keys → Save & Test
  2. Via CLI: Use -k flag or --api-key-file option
  3. Via Environment: Set GOOGLE_API_KEY, OPENAI_API_KEY, etc.


Usage Examples

Image Generation

Basic generation:

python main.py -p "a serene Japanese garden with cherry blossoms"

With specific provider and model:

python main.py --provider openai -m dall-e-3 --quality hd -p "a cyberpunk cityscape at night"

Custom resolution:

python main.py -p "a landscape" --size 1792x1024

Batch generation:

python main.py -p "a fantasy castle" -n 4 -o castle.png

Video Projects

  1. Open the Video tab in the GUI
  2. Create a new project or load an existing one
  3. Paste lyrics or load from file
  4. Configure timing (manual or MIDI sync)
  5. Generate storyboard scenes
  6. Enhance prompts with AI (optional)
  7. Generate images for each scene
  8. Render video (FFmpeg slideshow or Veo 3.1)
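Conceptually, steps 3-5 above turn raw lyrics into a timed storyboard before any images are generated. A minimal sketch of that first stage, assuming a fixed per-line duration when no MIDI sync is available (the `Scene` shape and `build_storyboard` name are illustrative, not ImageAI's internal API):

```python
from dataclasses import dataclass

@dataclass
class Scene:
    lyric: str
    start: float      # seconds into the video
    duration: float   # seconds on screen
    prompt: str = ""  # filled in later by AI prompt generation

def build_storyboard(lyrics: str, seconds_per_line: float = 4.0) -> list[Scene]:
    """Split lyrics into evenly timed scenes.

    MIDI synchronization would replace the uniform timing here with
    note-accurate start times for each line.
    """
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    return [
        Scene(lyric=line, start=i * seconds_per_line, duration=seconds_per_line)
        for i, line in enumerate(lines)
    ]
```

Each `Scene` then becomes one image-generation job (step 7) and one slideshow segment at render time (step 8).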

Reference Images

  1. In the Image tab, click "Reference Image" button
  2. Select an image file
  3. Choose style and position options
  4. Enter your prompt
  5. Generate - the AI will blend your reference with the prompt

Configuration

Storage Locations

Configuration files are stored in platform-specific directories:

  • Windows: %APPDATA%\ImageAI\
  • macOS: ~/Library/Application Support/ImageAI/
  • Linux: ~/.config/ImageAI/
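The platform-specific paths above follow the usual OS conventions, so locating the config directory programmatically reduces to a platform switch. A sketch of that lookup (illustrative only; ImageAI's own resolution logic may differ, e.g. in how it honors `XDG_CONFIG_HOME`):

```python
import os
import sys
from pathlib import Path

def config_dir(app_name: str = "ImageAI") -> Path:
    """Return the platform-specific configuration directory for the app."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # Linux and other Unix: respect XDG if set, else ~/.config
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / app_name
```

This is the directory where `config.json`, `history.json`, and the `logs/` folder described below live.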

Configuration Files

  • config.json - API keys and application settings
  • history.json - Generation history
  • prompt_builder_history.json - Prompt builder history
  • logs/ - Application logs

Security

  • API keys are stored securely (encrypted when possible)
  • Keys never leave your local machine
  • Optional keyring integration for enhanced security
  • Google Cloud ADC support for enterprise deployments

Supported Providers & Models

Google Gemini

  • gemini-3-pro-image-preview - Gemini 3 Pro Image (Nano Banana Pro) - 4K
  • gemini-2.5-flash-image - Gemini 2.5 Flash Image (Nano Banana) - Default
  • imagen-4.0-generate-001 - Imagen 4 (Best Quality) - Vertex AI only
  • imagen-3.0-generate-002 - Imagen 3 (General Purpose) - Vertex AI only

Authentication: API key or Google Cloud ADC

OpenAI

  • gpt-image-1.5 - GPT Image 1.5 (Latest)
  • gpt-image-1 - GPT Image 1
  • gpt-image-1-mini - GPT Image 1 Mini (Fast)
  • dall-e-3 - DALL·E 3
  • dall-e-2 - DALL·E 2

Authentication: API key

Stability AI

  • stable-diffusion-xl-1024-v1-0 - Stable Diffusion XL 1.0
  • stable-diffusion-xl-beta-v2-2-2 - Stable Diffusion XL Beta
  • stable-diffusion-512-v2-1 - Stable Diffusion 2.1
  • stable-diffusion-v1-6 - Stable Diffusion 1.6

Authentication: API key

Local Stable Diffusion

  • stabilityai/stable-diffusion-xl-base-1.0 - SDXL Base 1.0
  • segmind/SSD-1B - SSD-1B (Fast SDXL)
  • stabilityai/stable-diffusion-2-1 - Stable Diffusion 2.1
  • runwayml/stable-diffusion-v1-5 - Stable Diffusion 1.5
  • CompVis/stable-diffusion-v1-4 - Stable Diffusion 1.4

Authentication: None (runs locally)

Video Providers

  • Google Veo 3.1 - veo-3.1-generate-001 (frames-to-video)
  • Google Veo 3.0 - veo-3.0-generate-001 (single-frame animation)
  • FFmpeg Slideshow - Local rendering with Ken Burns effects

Keyboard Shortcuts

Global Shortcuts

  • Ctrl+Enter - Generate image (from anywhere)
  • Ctrl+S - Save current image
  • Ctrl+Shift+C - Copy image to clipboard
  • Ctrl+F - Find in prompt field
  • F1 - Open help
  • Alt+G - Generate button
  • Alt+S - Save button
  • Alt+P - Prompt Builder

Dialog Shortcuts

  • Ctrl+Enter - Submit dialog
  • Escape - Close dialog
  • Ctrl+E - Edit mode (in Ask dialog)

Project Structure

ImageAI/
├── main.py                 # Entry point (CLI & GUI routing)
├── core/                   # Core functionality
│   ├── config.py          # Configuration management
│   ├── constants.py        # App constants and version
│   ├── logging_config.py  # Logging setup
│   └── video/              # Video project system
├── providers/              # AI provider implementations
│   ├── base.py            # Base provider interface
│   ├── google.py          # Google Gemini/Imagen
│   ├── openai.py          # OpenAI DALL·E
│   ├── stability.py       # Stability AI
│   ├── local_sd.py        # Local Stable Diffusion
│   └── video/             # Video providers (Veo)
├── gui/                    # Graphical interface
│   ├── main_window.py     # Main window and tabs
│   ├── video/             # Video project UI
│   ├── layout/            # Layout engine UI
│   └── character_animator/ # Character Animator tools
├── cli/                    # Command-line interface
│   ├── parser.py          # Argument parsing
│   └── runner.py          # CLI execution
├── templates/             # Template definitions
├── data/                  # JSON resources (prompts, presets)
├── Docs/                  # Documentation
└── requirements.txt       # Python dependencies

For detailed code navigation, see Docs/CodeMap.md.
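The layout above shows each AI backend implemented against a common interface in providers/base.py. A plausible shape for such an interface (purely illustrative; the actual classes and signatures in providers/base.py may differ) is an abstract base class that each provider module subclasses:

```python
from abc import ABC, abstractmethod

class ImageProvider(ABC):
    """Hypothetical base interface: one subclass per backend (Google, OpenAI, ...)."""

    name: str = "base"

    @abstractmethod
    def generate(self, prompt: str, *, model: str, count: int = 1) -> list[bytes]:
        """Return `count` generated images as raw image bytes."""

class EchoProvider(ImageProvider):
    """Stand-in backend used here only to demonstrate the plumbing."""

    name = "echo"

    def generate(self, prompt: str, *, model: str, count: int = 1) -> list[bytes]:
        # A real provider would call its API here; we just echo the prompt.
        return [prompt.encode("utf-8")] * count
```

This pattern keeps the CLI and GUI provider-agnostic: both sides dispatch on a provider name and call the same `generate` entry point.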


Troubleshooting

Common Issues

"Module not found" errors:

  • Ensure virtual environment is activated
  • Reinstall dependencies: pip install -r requirements.txt

API key errors:

  • Verify API key is correct in Settings tab
  • Check API key has proper permissions
  • For Google Cloud, ensure ADC is configured if using gcloud auth

GUI won't launch:

  • Install PySide6: pip install PySide6
  • Check display server (Linux/WSL may need X11 forwarding)

Video generation fails:

  • Ensure FFmpeg is installed (via imageio-ffmpeg)
  • Check audio file format is supported
  • Verify MIDI file is valid (if using MIDI sync)

Local SD not working:

  • Install PyTorch with CUDA support for GPU acceleration
  • Check GPU drivers are up to date
  • Verify model files are downloaded

Debug Files

On exit, ImageAI automatically copies debug files to the project root:

  • ./imageai_current.log - Most recent log file
  • ./imageai_current_project.json - Last loaded project

Check these files for detailed error information.


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Test thoroughly
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Code Style

  • Follow PEP 8 for Python code
  • Use type hints where appropriate
  • Add docstrings for public functions/classes
  • Update documentation for new features

License

This project is licensed under the MIT License - see the LICENSE file for details.


Credits

Author: Leland Green
Email: contact@lelandgreen.com
Copyright: © 2025 Leland Green

Acknowledgments

  • Google Gemini API team
  • OpenAI for DALL·E
  • Stability AI for Stable Diffusion
  • All contributors and beta testers

Changelog

See CHANGELOG.md for a complete list of changes.

Recent Highlights

v0.33.0 (2026-01-28)

  • Font Generator with AI glyph identification and generation
  • Enhanced character segmentation and positioning

v0.32.0 (2026-01-24)

  • Character Animator Puppet Creator wizard
  • Font Generator with vector tracing

v0.31.0 (2025-12-05)

  • Gemini 3 Pro LLM support
  • File memory in Ask About Files dialog

v0.30.0 (2025-11-27)

  • Claude Opus 4.5 support
  • Enhanced reference image system

Roadmap

See the Plans/ directory for upcoming features and development roadmap.

Planned Features

  • Enhanced video generation with more providers
  • Advanced layout templates
  • Community template sharing
  • Plugin system for custom providers
  • Web interface option

Links

  • Documentation: See Docs/ directory

Made with ❤️ for the AI art community
