Commit e0d1bbe: [README]

1 parent c88a85a


78 files changed: +729 −5664 lines

.pre-commit-config.yaml

Lines changed: 0 additions & 18 deletions
This file was deleted.

.readthedocs.yml

Lines changed: 0 additions & 13 deletions
This file was deleted.

Dockerfile

Lines changed: 0 additions & 25 deletions
This file was deleted.

Makefile

Lines changed: 0 additions & 22 deletions
This file was deleted.

README.md

Lines changed: 132 additions & 38 deletions
# OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

[![PyPI version](https://badge.fury.io/py/omegavit.svg)](https://badge.fury.io/py/omegavit)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/Agora-Lab-AI/OmegaViT/workflows/build/badge.svg)](https://github.com/Agora-Lab-AI/OmegaViT/actions)
[![Documentation Status](https://readthedocs.org/projects/omegavit/badge/?version=latest)](https://omegavit.readthedocs.io/en/latest/?badge=latest)

OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and mixture of experts to achieve superior performance across various computer vision tasks. The model can process images of any resolution while maintaining computational efficiency.

## Key Features

- **Flexible Resolution Processing**: Handles arbitrary input image sizes through adaptive patch embedding
- **Multi-Query Attention (MQA)**: Reduces computational complexity while maintaining model expressiveness
- **Rotary Embeddings**: Enables better modeling of relative positions and spatial relationships
- **State Space Models (SSM)**: Integrates efficient sequence modeling in every third layer
- **Mixture of Experts (MoE)**: Implements conditional computation for enhanced model capacity
- **Comprehensive Logging**: Built-in loguru integration for detailed execution tracking
- **Shape-Aware Design**: Continuous tensor shape tracking for reliable processing
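The rotary embeddings listed above can be sketched as the standard RoPE transform: each position rotates pairs of feature channels by a position-dependent angle. This is a minimal illustration, not the package's own implementation; the function names and `base` default are assumptions.

```python
import torch

def rotate_half(x):
    # Pair channel i with channel i + d/2 and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(x, base=10000.0):
    # x: (batch, heads, seq, head_dim) with an even head_dim
    _, _, n, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    t = torch.arange(n).float()
    freqs = torch.einsum("i,j->ij", t, inv_freq)  # (seq, d/2) angles
    emb = torch.cat((freqs, freqs), dim=-1)       # (seq, d)
    # Per-position rotation; position 0 is left unchanged (angle 0)
    return x * emb.cos() + rotate_half(x) * emb.sin()
```

Because each step is a pure rotation of channel pairs, token norms are preserved while relative positions become recoverable from dot products.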

## Architecture

```mermaid
flowchart TB
    subgraph Input
        img[Input Image]
    end

    subgraph PatchEmbed[Flexible Patch Embedding]
        conv[Convolution]
        norm1[LayerNorm]
        conv --> norm1
    end

    subgraph TransformerBlocks[Transformer Blocks x12]
        subgraph Block1[Block n]
            direction TB
            mqa[Multi-Query Attention]
            ln1[LayerNorm]
            moe1[Mixture of Experts]
            ln2[LayerNorm]
            ln1 --> mqa --> ln2 --> moe1
        end

        subgraph Block2[Block n+1]
            direction TB
            mqa2[Multi-Query Attention]
            ln3[LayerNorm]
            moe2[Mixture of Experts]
            ln4[LayerNorm]
            ln3 --> mqa2 --> ln4 --> moe2
        end

        subgraph Block3[Block n+2 SSM]
            direction TB
            ssm[State Space Model]
            ln5[LayerNorm]
            moe3[Mixture of Experts]
            ln6[LayerNorm]
            ln5 --> ssm --> ln6 --> moe3
        end
    end

    subgraph Output
        gap[Global Average Pooling]
        classifier[Classification Head]
    end

    img --> PatchEmbed --> TransformerBlocks --> gap --> classifier
```
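The "Flexible Patch Embedding" stage in the diagram can be sketched as a strided convolution followed by LayerNorm; because the convolution accepts any spatial size divisible by the patch size, the number of tokens simply varies with the input resolution. The class name and defaults below are illustrative assumptions, not the package's API.

```python
import torch
import torch.nn as nn

class FlexiblePatchEmbed(nn.Module):
    """Hypothetical sketch: tokenize an image of any resolution."""

    def __init__(self, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        # Strided conv = non-overlapping patch projection
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.proj(x)                  # (B, dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        return self.norm(x)
```

For example, a 224x224 input yields 14x14 = 196 tokens, while a 160x96 input yields 10x6 = 60 tokens from the same module.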

## Multi-Query Attention Detail

```mermaid
flowchart LR
    input[Input Features]

    subgraph MQA[Multi-Query Attention]
        direction TB
        q[Q Linear]
        k[K Linear]
        v[V Linear]
        rotary[Rotary Embeddings]
        attn[Attention Weights]

        input --> q & k & v
        q & k --> rotary
        rotary --> attn
        attn --> v
    end

    MQA --> output[Output Features]
```
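The key idea in the diagram, that many query heads share a single key/value projection, can be sketched as follows. This is a minimal illustration of multi-query attention (rotary embeddings omitted for brevity); the class name and layout are assumptions, not the package's implementation.

```python
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """Sketch: h query heads attend over one shared K/V head."""

    def __init__(self, dim, num_heads=12):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.dh = num_heads, dim // num_heads
        self.q = nn.Linear(dim, dim)            # per-head queries
        self.kv = nn.Linear(dim, 2 * self.dh)   # single shared K/V head
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, _ = x.shape
        q = self.q(x).view(b, n, self.h, self.dh).transpose(1, 2)  # (b, h, n, dh)
        k, v = self.kv(x).chunk(2, dim=-1)                         # (b, n, dh) each
        k, v = k.unsqueeze(1), v.unsqueeze(1)                      # broadcast over heads
        attn = (q @ k.transpose(-2, -1)) / self.dh ** 0.5          # (b, h, n, n)
        y = attn.softmax(dim=-1) @ v                               # (b, h, n, dh)
        return self.out(y.transpose(1, 2).reshape(b, n, -1))
```

Sharing K/V shrinks the KV projections and cache by a factor of `num_heads`, which is the source of the efficiency gain claimed above.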

## Installation

```bash
pip install omegavit
```

## Quick Start

```python
import torch
from omegavit import create_advanced_vit

# Create model
model = create_advanced_vit(num_classes=1000)

# Example forward pass
batch_size = 8
x = torch.randn(batch_size, 3, 224, 224)
output = model(x)
print(f"Output shape: {output.shape}")  # [8, 1000]
```
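The state-space blocks used in every third layer can be sketched as a diagonal linear recurrence over the token sequence: each channel carries a small hidden state that is decayed and updated at every step. This is a generic SSM sketch under assumed shapes, not the omegavit implementation.

```python
import torch

def ssm_scan(u, a, b, c):
    """Minimal diagonal state-space recurrence (hypothetical sketch).

    h_t = a * h_{t-1} + b * u_t;  y_t = (c * h_t).sum(-1)
    u: (seq, dim) tokens; a, b, c: (dim, state) parameters.
    """
    seq, dim = u.shape
    h = torch.zeros(dim, a.shape[-1])
    ys = []
    for t in range(seq):
        h = a * h + b * u[t][:, None]  # decay and inject the new token
        ys.append((c * h).sum(-1))     # read out one value per channel
    return torch.stack(ys)             # (seq, dim)
```

The sequential loop is shown for clarity; practical SSM layers compute the same recurrence with a parallel scan or convolution.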

## Model Configurations

| Parameter | Default | Description |
|-----------|---------|-------------|
| hidden_size | 768 | Dimension of transformer layers |
| num_attention_heads | 12 | Number of attention heads |
| num_experts | 8 | Number of expert networks in MoE |
| expert_capacity | 32 | Tokens per expert in MoE |
| num_layers | 12 | Number of transformer blocks |
| patch_size | 16 | Size of image patches |
| ssm_state_size | 16 | Hidden state size in SSM |
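The `num_experts` routing in the table can be sketched as a top-1 mixture of experts: a linear router scores every token and each token is processed only by its best-scoring expert MLP. This is an illustrative sketch; the class name is hypothetical and the `expert_capacity` limit is omitted for brevity.

```python
import torch
import torch.nn as nn

class TopOneMoE(nn.Module):
    """Sketch: route each token to its top-1 expert MLP."""

    def __init__(self, dim=64, num_experts=8, hidden=128):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)  # (tokens, num_experts)
        weight, idx = gates.max(dim=-1)         # top-1 gate value and expert id
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():                      # run each expert only on its tokens
                out[mask] = weight[mask, None] * expert(x[mask])
        return out
```

Because only one expert runs per token, capacity grows with `num_experts` while per-token compute stays roughly constant, which is the conditional-computation benefit noted in the feature list.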

## Performance

*Note: Benchmarks coming soon*

## Citation

If you use OmegaViT in your research, please cite:

```bibtex
@article{omegavit2024,
  title={OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts},
  author={Agora Lab},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Contributing

We welcome contributions! Please see our [contributing guidelines](CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

Special thanks to the Agora Lab AI team and the open-source community for their valuable contributions and feedback.

agorabanner.png

-194 KB
Binary file not shown.

docs/.DS_Store

-8 KB
Binary file not shown.

docs/applications/customer_support.md

Lines changed: 0 additions & 42 deletions
This file was deleted.

docs/applications/enterprise.md

Whitespace-only changes.

docs/applications/marketing_agencies.md

Lines changed: 0 additions & 64 deletions
This file was deleted.

0 commit comments