FFT-Net (Hybrid Transformer with Frequency Processing)
A minimal research model that mixes tokens in the frequency domain using rotary position embeddings and learnable spectral filters.
- ComplexRoPE rotary positional encoding in the complex domain.
- Learnable frequency filtering via NeuralFourierOperator.
- FFTNet blocks that combine FFT/IFFT with MLP layers.
- Scripts for training, distillation, evaluation, and model management.
- Weight I/O using
safetensorswith complex parameter support.
See FEATURES.md for a more detailed list.
[input tokens]
↓
[embedding layer]
↓
[ComplexRoPE]
↓
[FFTNetBlock]
├─ FFT → NeuralFourierOperator
└─ IFFT → MLP
↓
[output logits]
The stack of FFTNetBlocks mixes information globally in the frequency domain before projecting back to token space.
-
(Optional) create and activate a virtual environment.
-
Install the Python dependencies:
pip install -r requirements.txt
-
Run the test suite to verify the installation:
pytest
-
Try the demo script:
python scripts/main.py
python scripts/train_fresh.py --data-path path/to/corpus.txt --epochs 1Set --split-seed to reproduce the train/validation split across runs.
python fftnet_infer.py --model trained --prompt "the quick"- Configuration: JSON and YAML files in
config/define model shapes and module wiring (e.g.,config/fiftynet_config.json). - Datasets: Provide a plain text file via
--data-path; it is loaded withTextFileDatasetinfftnet/data.py. - Scripts: Training, evaluation, and model management utilities live in
the
scripts/directory.
See ROADMAP.md for planned milestones and TASKS.md for current progress.
Authors: William Mabery, with contributions from Kai (OpenAI)
We propose FiftyNet, a hybrid transformer architecture that encodes token sequences not as static points in high-dimensional space, but as dynamic, reconstructable signals in frequency space. Leveraging complex rotary positional encoding (Complex RoPE), discrete Fourier transforms (DFT), and learnable Neural Fourier Operator (NFO) layers, FiftyNet captures semantic, syntactic, and temporal structure simultaneously. This approach enables greater information density, signal-level reconstruction, and a fundamentally new paradigm for embedding cognition and memory as waveform dynamics.
Large Language Models (LLMs) such as GPT, LLaMA, and Mixtral represent language in high-dimensional vector spaces. These embeddings encode meaning geometrically via distance, direction, and linear projections. However, such models lack explicit mechanisms to encode rhythm, phase, or structural time — properties central to sound, signal processing, and potentially cognition.
Inspired by signal theory and neurosymbolic reasoning, FiftyNet introduces a fundamentally different embedding and modeling paradigm: language as frequency.
We hypothesize that:
"Language is not just symbolic or spatial — it is resonant."
In this view:
- Tokens are not just positions in space, but samples in time.
- Meaning is not just vector direction, but waveform shape.
- Memory is not just stored structure, but stored signal.
FiftyNet is composed of the following stages:
Tokens are embedded into a complex-valued vector space. Positional information is applied using log-scaled phase rotation, turning each embedding into a modulated waveform across dimensions.
Each dimension now acts as a frequency band, encoding:
- Amplitude (importance)
- Phase (timing)
- Frequency (recurrence/pattern)
Instead of traditional self-attention, the model performs a discrete Fourier transform (DFT) on the token sequence, globally mixing positional signals in the frequency domain.
This produces a spectral signature for the sequence, where:
- Low frequencies capture general structure
- High frequencies encode fine-grained shifts
Learnable filters are applied in frequency space, analogous to convolutional filters in vision models. These operations can:
- Enhance or suppress specific frequencies
- Modulate phase and amplitude
- Implement attention-like operations in spectral space
The model can reconstruct the waveform back into token space via inverse FFT, preserving temporal and structural integrity.
Standard MLP blocks operate on the waveform-encoded embeddings, allowing downstream tasks such as next-token prediction, classification, or alignment.
In traditional LLMs:
- A token’s meaning is encoded as a static vector
- Positional encodings are added or multiplied
In FiftyNet:
- A token’s embedding is treated as a waveform
- The full sequence is a composite signal
- The signal can be analyzed, stored, and reconstructed via Fourier methods
Example: If each token has d=4096 dimensions, this represents a 4096-channel signal, each with its own oscillation and meaning — like a synthesizer across time.
| Property | Traditional Transformer | FiftyNet |
|---|---|---|
| Positional Encoding | Learned or Rotary | Log-scaled phase rotation |
| Structure Representation | Implicit via attention | Explicit via waveform shape |
| Long-Term Dependency | Costly to capture | Naturally encoded in low freq |
| Reconstruction | Approximate via vectors | Signal-accurate via iFFT |
| Multisensory Embedding | Complex and disjoint | Unified signal representation |
| Information Density | Depends on token count | Compressible via frequency |
FiftyNet is designed for compatibility with existing frameworks, including:
- PyTorch / TorchScript for tensor computation
- FFT/IFFT via
torch.fft.fftandifft - Complex number support for RoPE
- NVIDIA 5090 GPU support for real-time training/inference
Training can optionally begin:
- From scratch on token data (Wikipedia, C4, etc.)
- By “retuning” existing models via spectral transformation
By treating meaning as resonance, and embedding as signal, we open new directions for:
- Memory as waveform snapshots (time-aware)
- Reasoning as frequency mixing and harmonics
- Perception as unified signal space (text, audio, vision, etc.)
- Emotion encoding as modulations of base signals
- AGI as a resonant, multi-frequency cognitive field
We posit that this approach offers a pathway toward more holistic, efficient, and integrated general intelligence.
- Integration with multisensory input (e.g., audio, camera, sensor streams)
- Training with vectorized waveform storage
- Waveform-based interpretability and introspection
- Distributed spectral training on heterogeneous GPU clusters
- Real-time signal memory and retrieval (Fourier Memory Bank)
FiftyNet is open for experimentation and academic research.
License: MIT (to be defined by author)
This work builds upon foundational ideas in:
- Signal Processing
- Positional Embeddings (RoPE, Complex RoPE)
- FNet and Fourier Neural Operators
- Neurosymbolic cognition and vector semantics
Special thanks to Cody (Codex) and OpenAI tools for accelerating implementation.