This project implements and analyses multiple deep learning architectures to predict the popularity of Super Mario Maker 2 levels. By leveraging multi-modal data, including level thumbnails, metadata (tags, difficulty, etc.), text (titles, descriptions), and object sequences, the system aims to classify levels as "Popular" or not.
The project explores three distinct multi-modal architectures, representing different approaches to the problem.
- Source: code/efficientnet.ipynb
- Architecture: A robust multi-modal network fusing three branches:
- Visual Branch: Uses EfficientNet-B2 (pretrained on ImageNet) as a feature extractor for level thumbnails.
- NLP Branch: A 1D Convolutional Neural Network (CNN) processes tokenised level titles to capture local semantic patterns.
- Metadata Branch: A Multi-Layer Perceptron (MLP) encodes categorical features like difficulty and tags.
- Fusion: Features from all branches are combined using a Residual Fusion Block (concatenation followed by a residual connection) before classification.
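The residual fusion step can be sketched in a few lines of NumPy. The branch sizes (128 visual, 64 text, 32 metadata) and the random weight matrix are illustrative stand-ins, not the notebook's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (illustrative) branch output sizes; the real notebook may differ.
vision_feats = rng.standard_normal(128)  # EfficientNet-B2 features (projected)
text_feats = rng.standard_normal(64)     # 1D-CNN title features
meta_feats = rng.standard_normal(32)     # MLP metadata features

def residual_fusion(vision, text, meta):
    """Concatenate the branches, apply a dense layer, and add a skip connection."""
    fused = np.concatenate([vision, text, meta])              # (224,)
    w = rng.standard_normal((fused.size, fused.size)) * 0.01  # stand-in for learned weights
    projected = np.maximum(w @ fused, 0.0)                    # dense + ReLU
    return fused + projected                                  # residual connection

out = residual_fusion(vision_feats, text_feats, meta_feats)
print(out.shape)  # (224,)
```

The skip connection lets the classifier see both the raw concatenated features and their learned interaction, which is the usual motivation for a residual fusion block.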
- Source: code/transformer.ipynb
- Architecture: A hybrid architecture incorporating sequence modelling:
- Visual Branch: Uses EfficientNet-B0 (pretrained) for visual feature extraction.
- Sequential Branch: A Transformer Encoder (2 layers, 4 heads, 64 dim) processes the sequence of game objects (blocks, enemies, items), sorted by X-coordinate, to understand the level's structural design.
- Metadata Branch: MLP for encoding context tags and categorical data.
- Fusion: Concatenation of features into a 1408-dimensional vector (1280 vision + 64 sequence + 64 metadata) before final classification.
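A minimal sketch of the sequence ordering and the fusion layout described above; the object tuples are placeholders and the zero vectors exist only to verify the 1408-dimensional concatenation:

```python
import numpy as np

# Level objects as (x, y, object_id); sorting by X mirrors reading the level left-to-right,
# which is the order the Transformer encoder consumes.
objects = [(120, 32, 7), (16, 64, 2), (48, 32, 5)]
sequence = sorted(objects, key=lambda obj: obj[0])

# Per-branch feature sizes from the description: 1280 + 64 + 64 = 1408.
vision = np.zeros(1280)  # EfficientNet-B0 pooled features
seq = np.zeros(64)       # Transformer encoder summary (e.g. pooled tokens)
meta = np.zeros(64)      # Metadata MLP output
fused = np.concatenate([vision, seq, meta])
print(fused.shape)  # (1408,)
```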
- Source: code/ensemble.ipynb
- Architecture: An ensemble approach distinct from the others: it uses no pre-trained weights and trains purely on domain data.
- Vision Stream ("Eye"): A custom 5-block CNN with Swish activation and RandomContrast augmentation, trained from scratch on 256x144 thumbnails.
- Context Stream ("Brain"): A wide MLP processing 92 tabular features combined with text embeddings. It includes engineered features for "Psychological Patterns" (e.g., begging in descriptions) and "Cheat Detection" (e.g., hidden dev exits).
- Strategy: Predictions are formed via a Performance-Weighted Average, where the contribution of the "Eye" and "Brain" models is weighted based on their respective validation accuracy.
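The performance-weighted average amounts to normalising the two validation accuracies into blending weights. A sketch with hypothetical accuracy figures (not the models' reported scores):

```python
def weighted_ensemble(p_eye, p_brain, acc_eye, acc_brain):
    """Blend probabilities in proportion to each model's validation accuracy."""
    total = acc_eye + acc_brain
    return (acc_eye / total) * p_eye + (acc_brain / total) * p_brain

# Hypothetical numbers: the CNN scored 0.70 on validation, the MLP 0.80.
p = weighted_ensemble(p_eye=0.6, p_brain=0.9, acc_eye=0.70, acc_brain=0.80)
print(round(p, 3))  # 0.76
```

Weighting by validation accuracy keeps the stronger stream dominant without discarding the weaker one entirely.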
superMarioMaker/
├── code/
│ ├── efficientnet.ipynb # Training notebook for the fine-tuned EfficientNet
│ ├── transformer.ipynb # Training notebook for Hybrid Transformer
│ ├── ensemble.ipynb # Training notebook for Multi-modal Ensemble approach
│ ├── models-analysis.ipynb # Comparison and error analysis of all models
│ ├── find_best_worst.py # Scans test dataset for best/worst rated levels using the same logic as the demo
│ ├── demo.py # Script for running inference on single levels
│ └── preprocessing/
│ ├── preproccesing - ensemble + efficientnet.ipynb # Data prep: Renders images and extracts tabular metadata for ENS/EFF models
│ └── preprocessing - transformer.ipynb # Data prep: Parses binary to extract object sequences for Transformer model
├── models/
│ ├── context_metadata.keras # Ensemble: Weights for the tabular/metadata branch
│ ├── vision_img.keras # Ensemble: Weights for the visual (CNN) branch
│ ├── efficientnet.pth # Weights for the standalone EfficientNet (PyTorch)
│ └── transformer.pth # Weights for the Hybrid Transformer (PyTorch)
├── results/
│ ├── demo_images/ # Preloaded level images for testing the demo script
│ ├── efficientnet/ # Training artefacts (CM, plots, vocab) for EfficientNet
│ ├── ensemble/ # Training artefacts (history, scaler, plots) for Ensemble
│ ├── transformer/ # Training artefacts and results for Transformer
│ ├── demo_screenshot_example # Screenshot demonstrating demo usage
│ └── final_scores.csv # Predictions for all models on the 10k test set (using simplified demo logic)
├── presentation.pdf # Project presentation
└── report.pdf # Project report
- Python 3.8+
- PyTorch (for EfficientNet & Transformer)
- TensorFlow (for Ensemble)
- protobuf==3.20.3
The project relies on specific datasets hosted on Kaggle:
- Test Dataset (10k Levels): testdatasetsmm2-10k
- Contains 10,000 levels used for final testing and evaluation.
- Inference Results: inference
- Contains model predictions on the 10,000 test images.
- Transformer Preprocessed Data: supermar
- Dataset preprocessed specifically for the Hybrid Transformer model.
- General Preprocessed Data: superm
- Dataset preprocessed for the EfficientNet and Ensemble models (non-transformer).
Data preparation is split by model requirement.
- For Ensemble & EfficientNet: Run code/preprocessing/preprocessing - ensemble + efficientnet.ipynb.
- For Transformer: Run code/preprocessing/preprocessing - transformer.ipynb.
Open the respective Jupyter notebook in code/ (efficientnet.ipynb, transformer.ipynb, or ensemble.ipynb) and run all cells. Ensure DATA_DIR points to your processed dataset location.
To test the model on a new level, use the demo.py script:
python code/demo.py
Performance Note: The demo.py script is a simplified pipeline: it predicts popularity using only the image, the title, and 4 other features. The full training and analysis pipelines additionally parse the actual level files to locate every enemy and block; because the demo skips this step for speed, its results may differ slightly from those in the report.