ANUcybernetics/panic-tda

PANIC-TDA

An Elixir tool for computing "runs" of text-to-image and image-to-text models (with outputs fed recursively back in as inputs) and analysing the resulting text-image-text-image trajectories using topological data analysis.

If you've got a sufficiently capable rig you can use this tool to:

  1. specify text-to-image and image-to-text generative AI models in a "network" (a cycling list of models)
  2. starting from a specified initial prompt, feed the output of each model in as the input of the next, creating a "run" of model invocations
  3. embed each output into a high-dimensional embedding space using one or more embedding models
  4. compute persistence diagrams and cluster them to identify topological structure in the trajectories
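
Steps 1 and 2 can be illustrated with a minimal Python sketch. This is not the project's Elixir code: the `dummy_t2i` and `dummy_i2t` functions and the `run` helper are hypothetical stand-ins that only show how a cycling network turns one prompt into a trajectory.

```python
from itertools import cycle, islice


def dummy_t2i(text: str) -> str:
    # Hypothetical stand-in for a text-to-image model: returns a fake "image".
    return f"image({text})"


def dummy_i2t(image: str) -> str:
    # Hypothetical stand-in for an image-to-text model: returns a fake caption.
    return f"caption({image})"


def run(network, prompt, max_length):
    """Cycle through the network, feeding each output in as the next input."""
    outputs = []
    current = prompt
    for model in islice(cycle(network), max_length):
        current = model(current)
        outputs.append(current)
    return outputs


trajectory = run([dummy_t2i, dummy_i2t], "a red apple", max_length=4)
for step, output in enumerate(trajectory):
    print(step, output)
```

Each element of `trajectory` is one model invocation's output, so a run with `max_length` of 4 on a two-model network alternates image, text, image, text.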

The results of all the above computations are stored in a local SQLite database for further analysis.
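
To give a feel for what a persistence diagram records (step 4 above), here is a small Python sketch of the simplest case: 0-dimensional persistence of a Vietoris-Rips filtration on a point cloud. It is an illustration under stated assumptions, not the method this tool uses; the `h0_persistence` function is hypothetical, and it takes the death scale of a component to be the length of the edge that merges it (some conventions use half that length). Higher-dimensional features need a dedicated TDA library.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the death scales of connected components, in increasing order.

    Every component is born at scale 0. One component dies each time an
    edge joins two previously separate components. The single component
    that never dies is not included.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Three nearby points and one far-away point (made-up data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
deaths = h0_persistence(points)
print(deaths)
```

In this toy example two components die early (the tight cluster merging) and one dies much later (the distant point joining), which is the kind of gap that signals cluster structure in a trajectory's embeddings.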

This tool was initially motivated by the PANIC! art installation (first exhibited 2022); see DESIGN.md for more details. Watching PANIC! in action, there is clearly some structure to the trajectories that the genAI model outputs "trace out". This tool is an attempt to quantify and understand that structure (see Why? below).

Requirements

  • mise for managing Erlang/Elixir versions (see mise.toml)
  • a GPU which supports CUDA (for running the genAI and embedding models)
  • SQLite3

Installation

# install Erlang & Elixir via mise
mise install

# fetch deps and set up the database
mise exec -- mix setup

Use

Experiments are configured via JSON files and run with Mix tasks. Here's an example configuration:

{
  "network": ["SD35Medium", "Moondream"],
  "prompts": ["a red apple"],
  "embedding_models": ["Nomic"],
  "max_length": 100,
  "num_runs": 4
}

Fields:

  • network: a list of models that cycle (T2I -> I2T -> T2I -> ...)
  • prompts: initial text inputs; each prompt creates num_runs runs
  • embedding_models: models used in the embeddings stage
  • max_length: number of model invocations per run
  • num_runs (optional, default 1): how many runs to create per prompt
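
As a quick check on how these fields combine, the following Python sketch works out how many runs and model invocations the example configuration implies. The `summarise` helper is hypothetical and not part of the tool; it only applies the rules stated above (each prompt creates `num_runs` runs, each run has `max_length` invocations, and `num_runs` defaults to 1).

```python
import json

config_text = """
{
  "network": ["SD35Medium", "Moondream"],
  "prompts": ["a red apple"],
  "embedding_models": ["Nomic"],
  "max_length": 100,
  "num_runs": 4
}
"""


def summarise(config):
    """Return (total runs, total model invocations) for a config dict."""
    num_runs = config.get("num_runs", 1)  # optional field, default 1
    total_runs = len(config["prompts"]) * num_runs
    total_invocations = total_runs * config["max_length"]
    return total_runs, total_invocations


config = json.loads(config_text)
total_runs, total_invocations = summarise(config)
print(total_runs, total_invocations)  # 4 runs, 400 invocations
```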

Then, to run the experiment:

# run an experiment
mise exec -- mix experiment.run config/my_experiment.json

# check the status of an experiment (by ID prefix)
mise exec -- mix experiment.status abc123

# list all experiments
mise exec -- mix experiment.list

# resume an interrupted experiment
mise exec -- mix experiment.resume abc123

Available models

| Type | Models |
| --- | --- |
| text-to-image | SD35Medium, Flux2Klein, Flux2Dev, ZImageTurbo, QwenImage, HunyuanImage, GLMImage |
| image-to-text | Moondream, Qwen25VL, Gemma3n, Pixtral, LLaMA32Vision, Florence2 |
| text embedding | STSBMpnet, STSBRoberta, STSBDistilRoberta, Nomic, JinaClip, Qwen3Embed |
| image embedding | NomicVision, JinaClipVision |
| dummy (testing) | DummyT2I, DummyI2T, DummyT2I2, DummyI2T2, DummyText, DummyText2, DummyVision, DummyVision2 |

Approximate run times

Measured on a single NVIDIA RTX 4090 with NF4 quantisation where applicable. Times include model loading/swapping overhead.

| Model | Single | Batch of 3 (per image) |
| --- | --- | --- |
| Text-to-image | | |
| SD35Medium | ~9s | ~3s |
| ZImageTurbo | ~8s | ~6s |
| Flux2Klein | ~20s | ~7s |
| GLMImage | ~44s | ~28s |
| QwenImage | ~46s | ~23s |
| Flux2Dev | ~100s | ~75s |
| HunyuanImage | ~124s | ~109s |
| Image-to-text | | |
| Moondream | ~4s | ~3s |
| Qwen25VL | ~12s | ~5s |
| Gemma3n | ~16s | ~6s |
| LLaMA32Vision | ~17s | ~8s |
| Pixtral | ~19s | ~8s |
| Florence2 | TBD | TBD |
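
These figures can be turned into a rough per-run estimate. The Python sketch below assumes that per-invocation times simply add up, that the network alternates strictly between its models, and that the "batch of 3" per-image figure applies to every step when runs are batched; none of this is guaranteed by the tool, and the `estimate_seconds` helper is hypothetical.

```python
# Approximate seconds per invocation, taken from the table above.
SINGLE = {"SD35Medium": 9, "Moondream": 4}
BATCH_OF_3 = {"SD35Medium": 3, "Moondream": 3}


def estimate_seconds(network, max_length, seconds_per_invocation):
    """Sum per-invocation times over a run that cycles through the network."""
    return sum(
        seconds_per_invocation[network[step % len(network)]]
        for step in range(max_length)
    )


network = ["SD35Medium", "Moondream"]
single = estimate_seconds(network, 100, SINGLE)       # 50 * 9 + 50 * 4
batched = estimate_seconds(network, 100, BATCH_OF_3)  # 50 * 3 + 50 * 3
print(single, batched)  # 650 300
```

Under these assumptions a 100-invocation run of the example network takes roughly 11 minutes unbatched, or about 5 minutes per run when batched.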

The design space of models is vast, spanning both fundamentally different architectures and many different finetunes of the same base models. This project asks questions about both: are different architectures more likely to diverge (in their long-term trajectories) than finetunes of the same model? Or is there no particular pattern there?

Testing

Tests use ExUnit with dummy models (no GPU required):

mise exec -- mix test

GPU smoke tests (all real model combinations) are tagged :gpu and excluded by default:

mise exec -- mix test --include gpu

For further info, see the design doc.

Why?

At the School of Cybernetics we love thinking about the way that feedback loops (and the connections between things) define the behaviour of the systems in which we live, work and create. That interest sits behind the design of PANIC! as a tool for making (and breaking!) networks of hosted generative AI models.

Anyone who's played with (or watched others play with) PANIC! has probably had one of the questions below cross their mind at some point.

One goal in building PANIC is to provide answers to these questions that are both quantifiable and satisfying (i.e. they feel like they represent deeper truths about the process).

how did it get here from that initial prompt?

  • was it predictable that it would end up here?
  • how sensitive is it to the input, i.e. would it still have ended up here with a slightly different prompt?

is it stuck?

  • the text/images it's generating now seem to be "semantically stable"; will it ever move on to a different thing?
  • can we predict in advance which initial prompts lead to a "stuck" trajectory?

has it done this before?

  • how similar is this run's trajectory to previous runs?
  • what determines whether they'll be similar? the initial prompt, or something else?

which parts of the system have the biggest impact on what happens?

  • does a certain genAI model "dominate" the behaviour of the network? or is the prompt more important? or is it an emergent property of the interactions between all models in the network?

Authors

Ben Swift wrote the code, and Sunyeon Hong is the mastermind behind the TDA stuff.

Licence

MIT
