See the implemented Dynamic and Kinematic Models here...
This repository contains a complete pipeline for training a deep neural network to control an Autonomous Underwater Vehicle (AUV). The network learns to imitate the behavior of a computationally expensive nonlinear Model Predictive Controller (NL-MPC), enabling real-time, high-performance control.
The project demonstrates a successful workflow from data generation and preprocessing to model training, optimization, and evaluation, culminating in a model with 97.6% R² performance on a challenging, unseen test set.
This project underwent a systematic optimization process that dramatically improved performance from the initial baseline.
- Initial Problem: The baseline model achieved a respectable 77.3% R² score but failed to learn the dynamics of specific thrusters, with R² scores as low as 0.25 for Thruster 3.
- Core Improvements:
  - Corrected Data Handling: Implemented strict scenario-based splitting to eliminate data leakage between training and test sets, ensuring honest performance metrics.
  - Intelligent Feature Engineering: Reduced the reference trajectory input from 492 features to just 16. Instead of the full path, the model now receives 4 key future waypoints (`[x, y, z, yaw]`), making the input more concise and focused; this dramatically improved learning efficiency (see the sketch after this list).
  - Increased Model Capacity: Enhanced the network architecture to better capture complex, non-linear dynamics.
  - Robust Loss Function: Switched from `MSELoss` to `HuberLoss` to make training less sensitive to outlier thruster commands.
- Final Result: The optimized model achieves an overall R² of 0.9762, with even the weakest thrusters now performing excellently (e.g., Thruster 3 R² improved from 0.25 to 0.94).
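As a rough illustration of the waypoint down-sampling, here is a minimal sketch. It assumes (hypothetically) that `[x, y, z, yaw]` occupy the first 4 columns of the 12-feature reference state; the README does not specify the column layout.

```python
import numpy as np

def downsample_reference(ref_traj, waypoint_steps=(10, 20, 30, 40)):
    """Reduce a full (41, 12) reference trajectory to 16 features.

    Keeps 4 key future waypoints and, per waypoint, the 4 features
    [x, y, z, yaw]. Assumes those are the first 4 columns of the
    12-feature reference state (a hypothetical layout).
    """
    waypoints = ref_traj[list(waypoint_steps), :4]  # shape (4, 4)
    return waypoints.reshape(-1)                    # shape (16,)

# Example: a dummy full trajectory of 41 timesteps x 12 features
compact = downsample_reference(np.zeros((41, 12)))
print(compact.shape)  # (16,)
```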
| Metric | fossen_net | fossen_net_0 | fossen_net_1 | fossen_net_2 | fossen_net_3 |
|---|---|---|---|---|---|
| R² (Overall) | 0.7735 | 0.9762 | 0.9908 | 0.9914 | 0.9926 |
| Thruster 3 R² | 0.2516 | 0.9428 | 0.9781 | 0.9802 | 0.9835 |
| Thruster 4 R² | 0.5913 | 0.9528 | 0.9839 | 0.9850 | 0.9870 |
| Training Platform | Local | Local | Amazon SageMaker | Local | Amazon SageMaker |
| Data Size | ~20k | ~180k | ~420k | ~420k | ~760k |
| Input Size | 24 | 501 | 501 | 34 | 34 |
| Scenario Count | 2 | 2 | 2 | 2 | 7 |
- Inputs:
  - Current State (9 features): The AUV's current state `[u, v, w, p, q, r, phi, theta, psi]` (velocities and orientations), excluding absolute world position.
  - Reference Trajectory (16 features): A down-sampled, relative representation of the future path. Instead of the full 492-feature trajectory, the model is given 4 key future waypoints (from timesteps 10, 20, 30, and 40), each with 4 features (`[x, y, z, yaw]`). This provides crucial path information in a much more compact format.
- Detailed Architecture:
  - State Processing Branch: A series of fully connected layers (`Linear(9, 64) -> Linear(64, 32)`) with `BatchNorm1d` for stable learning and `LeakyReLU` activation functions.
  - Trajectory Processing Branch: The 16 trajectory features are reshaped into a sequence of 4 timesteps (the 4 key waypoints) with 4 features each, as shown below. A two-layer `LSTM` with a hidden size of 128 processes this sequence. The output from the final timestep is passed through a `Linear(128, 64)` layer.
  - Combined Branch: The outputs from the state and trajectory branches are concatenated. This combined feature vector is processed by a deeper series of `Linear` layers (`Linear(96, 256) -> Linear(256, 128) -> Linear(128, 64) -> Linear(64, 8)`) to produce the final 8 thruster commands. `BatchNorm1d` is used here as well.
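For reference, a minimal sketch of the reshape the trajectory branch performs on the 16 flat features (tensor names are illustrative):

```python
import torch

traj_flat = torch.randn(32, 16)      # batch of 16 flat trajectory features
traj_seq = traj_flat.view(-1, 4, 4)  # -> (batch, 4 waypoints, 4 features) fed to the LSTM
```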
- `control_test/`: Legacy and experimental approaches for AUV control.
- `data/`: Stores HDF5 datasets generated by the data generation process.
- `Model/`:
  - `train.ipynb`: The primary Jupyter Notebook for training the `FossenNet` model.
  - `test/`: A C++ application for running high-performance inference using trained TorchScript models.
FossenNet is a multi-branch neural network designed to process vehicle state and a reference trajectory to predict optimal thruster commands.
- Inputs:
  - Current State (9 features): The AUV's current state `[u, v, w, p, q, r, phi, theta, psi]` (velocities and orientations), excluding absolute world position.
  - Reference Trajectory (492 features): The desired path over a future horizon, provided as 41 timesteps of 12 state features each. The path is pre-processed to be relative to the AUV's current position.
- Detailed Architecture (a PyTorch sketch follows this list):
  - State Processing Branch: A series of fully connected layers (`Linear(9, 64) -> Linear(64, 32)`) with `BatchNorm1d` for stable learning and `LeakyReLU` activation functions.
  - Trajectory Processing Branch: A two-layer `LSTM` with a hidden size of 128 and internal dropout (0.2) processes the time-series trajectory. The output from the final timestep is passed through a `Linear(128, 64)` layer.
  - Combined Branch: The outputs from the state and trajectory branches are concatenated. This combined feature vector is processed by a deeper series of `Linear` layers (`Linear(96, 256) -> Linear(256, 128) -> Linear(128, 64) -> Linear(64, 8)`) to produce the final 8 thruster commands. `BatchNorm1d` is used here as well.
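Below is a minimal PyTorch sketch of this architecture as described above. The exact ordering of `BatchNorm1d` and `LeakyReLU` within each block, the constructor arguments, and the assumption that the trajectory arrives flattened are guesses where the text does not pin them down; the optimized variant corresponds to `traj_timesteps=4, traj_features=4`.

```python
import torch
import torch.nn as nn

class FossenNet(nn.Module):
    """Multi-branch network: state MLP + trajectory LSTM -> 8 thruster commands.

    Sketch reconstructed from the description above; block ordering and
    constructor arguments are assumptions.
    """

    def __init__(self, traj_timesteps=41, traj_features=12):
        super().__init__()
        self.traj_timesteps = traj_timesteps
        self.traj_features = traj_features
        # State branch: Linear(9, 64) -> Linear(64, 32)
        self.state_branch = nn.Sequential(
            nn.Linear(9, 64), nn.BatchNorm1d(64), nn.LeakyReLU(),
            nn.Linear(64, 32), nn.BatchNorm1d(32), nn.LeakyReLU(),
        )
        # Trajectory branch: 2-layer LSTM, hidden size 128, dropout 0.2
        self.lstm = nn.LSTM(input_size=traj_features, hidden_size=128,
                            num_layers=2, dropout=0.2, batch_first=True)
        self.traj_fc = nn.Linear(128, 64)
        # Combined branch: 32 + 64 = 96 features in, 8 thruster commands out
        self.head = nn.Sequential(
            nn.Linear(96, 256), nn.BatchNorm1d(256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.LeakyReLU(),
            nn.Linear(128, 64), nn.BatchNorm1d(64), nn.LeakyReLU(),
            nn.Linear(64, 8),
        )

    def forward(self, state, traj):
        # state: (batch, 9); traj: (batch, timesteps * features), flattened
        s = self.state_branch(state)
        seq = traj.view(-1, self.traj_timesteps, self.traj_features)
        lstm_out, _ = self.lstm(seq)
        t = self.traj_fc(lstm_out[:, -1, :])  # keep only the final timestep
        return self.head(torch.cat([s, t], dim=1))

# Smoke test with the 492-feature variant (41 timesteps x 12 features)
out = FossenNet()(torch.randn(4, 9), torch.randn(4, 492))
print(out.shape)  # torch.Size([4, 8])
```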
The training dataset is generated by running the NL-MPC controller across a variety of scenarios.
- Dataset Schema: Each data point includes the current state, the full reference path, and the optimal thruster command (`u_opt`) calculated by the NL-MPC.
- Format: Data is stored in HDF5 format (`data.h5`).
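A quick way to inspect the generated file with `h5py`; only `u_opt` is named in the schema above, so the key used below is a hypothetical placeholder and should be checked against the real layout:

```python
import h5py

# Inspect the generated dataset. Only `u_opt` is named in the schema
# above; treat the key used below as a hypothetical placeholder and
# check the real layout with visit().
with h5py.File('data.h5', 'r') as f:
    f.visit(print)         # print every group/dataset name in the file
    u_opt = f['u_opt'][:]  # optimal thruster commands from the NL-MPC
    print(u_opt.shape)
```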
The data for this project was generated in the cloud to leverage powerful GPU resources.
- AWS Setup:
  - EC2 Instance: `g4dn.xlarge`
  - AMI: Ubuntu 24.04 Deep Learning OSS AMI
- Docker Container: A public Docker image contains the data generation environment.
To replicate the data generation process:
- Pull the Docker image: `docker pull elymsyr/auv_generate`
- Run the container with GPU access (see help with `-h`): `docker run --gpus all -it elymsyr/auv_generate`
- Once the process completes, copy the data from the container to your host machine: `docker cp <container_id>:/app/build/data.h5 <your_local_path>/data.h5`
- Prepare Data: Generate the dataset using the Docker instructions above or your own method.
- Train the Model: Open and run the `Model/train.ipynb` notebook. The notebook handles:
  - Preprocessing: Loading data and applying `StandardScaler` normalization (fit only on the training set). Crucially, it performs scenario-aware splitting to prevent data leakage (see the sketch after this list).
  - Training: Implements the `FossenNet` model and a training loop using `HuberLoss`, the `Adam` optimizer, and a `ReduceLROnPlateau` learning rate scheduler.
  - Evaluation: Provides detailed metrics (R², MAE, MSE) and visualizations to assess model performance on the test set.
- Saving Models for Deployment:
  - During training, the best model weights are saved to `best_model_huber_adam.pth` whenever validation loss improves. This file should be used for analysis and further Python-based work.
  - For deployment, first load the state dictionary from the `.pth` file into the model, then create a TorchScript version:

    ```python
    # Load best weights
    model = FossenNet()
    model.load_state_dict(torch.load('best_model_huber_adam.pth'))
    model.eval()

    # Create and save scripted model for deployment
    scripted_model = torch.jit.script(model)
    scripted_model.save('fossen_net_scripted_BEST.pt')
    ```
- Real-Time Inference: Use the C++ application in `Model/test/` with the generated `fossen_net_scripted_BEST.pt` file for real-time, low-latency control.
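A minimal sketch of the scenario-aware split, scaler fitting, and loss/optimizer/scheduler setup described in the training step, using placeholder data and assumed hyperparameter values (the actual values live in `Model/train.ipynb`):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

# Placeholder data with illustrative shapes: 9 state + 16 trajectory
# features in, 8 thruster commands out.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25)).astype(np.float32)
y = rng.normal(size=(1000, 8)).astype(np.float32)
scenario_ids = rng.integers(0, 7, size=1000)  # which scenario each sample came from

# Scenario-aware split: hold out whole scenarios so no trajectory
# leaks across the train/test boundary.
test_mask = np.isin(scenario_ids, [5, 6])
X_train, X_test = X[~test_mask], X[test_mask]
y_train, y_test = y[~test_mask], y[test_mask]

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Loss, optimizer, and scheduler as described above (lr and patience are
# assumed values; the model is a stand-in for FossenNet sketched earlier).
model = nn.Linear(25, 8)
criterion = nn.HuberLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)
```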
- Python 3.11
- PyTorch
- scikit-learn
- h5py
- matplotlib
- numpy
- Docker (for data generation)
