Encoder-Decoder Network on the Shuttle Dataset
Université de Sherbrooke
This project implements an autoencoder neural network dedicated to anomaly detection on the Shuttle dataset. The model learns to reconstruct only normal data in order to identify deviations during the test phase.
The NASA Shuttle dataset contains readings from 9 onboard sensors that monitor the vehicle's operational state during flight. When a sensor develops a fault, its output deviates from the patterns observed under normal conditions. A neural network trained only on normal data learns to accurately reconstruct healthy sensor readings. Any input it cannot reconstruct well becomes a candidate anomaly, turning a manual inspection problem into an automated, always-on detection pipeline.
The network adopts a symmetric "hourglass" structure to extract the essential features from the signals.
ReLU activations after each linear layer introduce the non-linearity needed to capture complex relationships in the data. Mathematically, ReLU computes f(x) = max(0, x), setting any negative pre-activation to zero and passing positive values through unchanged. This yields non-negative, often sparse internal representations at each hidden layer. The output layer has no activation, which allows unconstrained reconstruction of continuous values.
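The forward pass of such an hourglass network can be sketched in plain NumPy. This is an illustration only, not the project's code: the hidden width of 8, the random initialization, and the helper names `init_layer` and `forward` are assumptions. Only the 9 inputs, the latent size of 6, the ReLU after each hidden linear layer, and the absence of an output activation come from the description above.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one linear layer (hypothetical init)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, encoder, decoder):
    """Encode then decode; ReLU follows every linear layer except the last one."""
    h = x
    for w, b in encoder:
        h = relu(h @ w + b)
    latent = h
    for i, (w, b) in enumerate(decoder):
        h = h @ w + b
        if i < len(decoder) - 1:  # no activation on the output layer
            h = relu(h)
    return latent, h

rng = np.random.default_rng(0)
# 9 -> 8 -> 6 -> 8 -> 9; the hidden width of 8 is a placeholder, not a reported value.
encoder = [init_layer(rng, 9, 8), init_layer(rng, 8, 6)]
decoder = [init_layer(rng, 6, 8), init_layer(rng, 8, 9)]

batch = rng.normal(size=(4, 9))
latent, recon = forward(batch, encoder, decoder)
print(latent.shape, recon.shape)  # (4, 6) (4, 9)
```

The latent code is non-negative because it passes through ReLU, while the reconstruction can take any real value.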
Figure 1: Encoder-Decoder network architecture diagram.
The eigenvalues of the data covariance matrix measure how much variance each principal direction captures. When computed for the Shuttle dataset, 6 values are significantly larger than the remaining 3, meaning 6 directions in the sensor space contain almost all the meaningful signal. Setting k=6 gives the autoencoder a bottleneck just wide enough to retain those 6 informative directions, discarding noise while preserving the structure needed to detect deviations.
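A minimal sketch of this kind of spectral analysis is shown below, assuming the eigenvalues are those of the feature covariance matrix. The data here is synthetic (9 features generated from 6 underlying directions plus small noise), standing in for the real Shuttle readings. The function names are hypothetical.

```python
import numpy as np

def covariance_eigenvalues(x):
    """Eigenvalues of the feature covariance matrix, sorted largest first."""
    cov = np.cov(x, rowvar=False)       # features are columns
    eigvals = np.linalg.eigvalsh(cov)   # ascending order for symmetric matrices
    return eigvals[::-1]

def explained_variance_ratio(eigvals):
    """Fraction of total variance carried by each eigen-direction."""
    return eigvals / eigvals.sum()

# Synthetic stand-in for the sensor data: 6 informative directions, 9 features.
rng = np.random.default_rng(0)
sources = rng.normal(size=(2000, 6))
mixing = rng.normal(size=(6, 9))
x = sources @ mixing + 0.01 * rng.normal(size=(2000, 9))

eigvals = covariance_eigenvalues(x)
ratio = explained_variance_ratio(eigvals)
print(np.round(ratio, 4))
print("variance captured by the 6 largest:", ratio[:6].sum())
```

On real data the features would normally be standardized first, so that a sensor with a large numeric range does not dominate the spectrum.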
The choice of k=6 is justified by this spectral analysis, in which the 6 largest eigenvalues clearly separate from the 3 smallest.
Figure 2: MSE loss evolution (K=6) showing model stabilization.
Convergence by epoch 76 of 80 suggests the model extracted most of the learnable structure from the normal class within the training budget, without over-training. A loss plateau reached before the end of the schedule is consistent with k=6 being well-matched to the intrinsic dimensionality of the data. Choosing the latent dimension by eigenvalue analysis rather than grid search avoided retraining the model for every candidate k, at a fraction of the compute cost.
Final classification relies on the reconstruction error (MSE) of each sample. A threshold on this error separates "normal" from "anomalous": samples whose error exceeds the threshold are flagged as anomalous.
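As a rough illustration, the per-sample error and the thresholding step might look like the sketch below. The function names and the toy arrays are made up for the example. The 0.03 threshold is the value discussed in the histogram analysis, and labelling anomalies as 1 is an assumption.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-sample mean squared error across features."""
    return np.mean((x - x_hat) ** 2, axis=1)

def classify(errors, threshold):
    """Return 1 for anomalous (error above threshold), 0 for normal."""
    return (errors > threshold).astype(int)

# Toy inputs and reconstructions: the first sample is rebuilt well, the second is not.
x = np.array([[0.0, 1.0, 2.0],
              [0.0, 1.0, 2.0]])
x_hat = np.array([[0.1, 1.0, 2.0],
                  [1.0, 1.0, 2.0]])

errors = reconstruction_error(x, x_hat)
labels = classify(errors, threshold=0.03)
print(errors, labels)
```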
F-Measure formula used for evaluation.
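For reference, the standard balanced F-measure (F1) is presumably the form meant here, since the analysis below refers to an F1-score curve. It is defined from the counts of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

Which class counts as "positive" is a convention; the formula is the same either way.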
Full distribution of reconstruction errors: normal data vs. anomalies.
Figure 3: Zoomed view, separation between normal and anomalous distributions.
The clear gap between normal and anomalous reconstruction error distributions means threshold selection carries low risk. Overlapping distributions would force a precision-recall tradeoff with no clean decision boundary. Training exclusively on normal data creates this separation: the network never learns to reconstruct anomalies, so anomalous inputs consistently produce higher errors.
In other words, normal sensor readings cluster below a reconstruction error of roughly 0.03, while anomalous readings consistently fall well above that value. The threshold of 0.03 sits in the gap visible in the zoomed histogram, making it the natural decision boundary between the two distributions. This value was then used as the reference threshold for the F1-score curve in the figure below: the curve peaks at that point, confirming that 0.03 gives the best balance between missed anomalies and false alarms under the F-measure and is the optimal operating threshold for this model.
Figure 4: F-measure and Accuracy evolution as a function of the selected threshold.
A sharp F-measure peak pins down a single best threshold, but it also means the score falls off quickly on either side, so the threshold should be calibrated carefully before deployment. A broad, flat peak would be more tolerant of small calibration errors and of distribution shift. The position of the peak follows from the distributional separation shown in the histograms above.
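A threshold sweep of this kind can be sketched as follows. The error values are synthetic and deliberately well separated, the anomalous class is assumed to be the positive class, and the function names are hypothetical. None of this reproduces the project's actual curve.

```python
import numpy as np

def f_measure(y_true, y_pred):
    """Balanced F-measure (F1) with anomalies (label 1) as the positive class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def sweep_thresholds(errors, y_true, thresholds):
    """Evaluate F-measure and accuracy at each threshold; return the best one."""
    scores, accs = [], []
    for t in thresholds:
        y_pred = (errors > t).astype(int)
        scores.append(f_measure(y_true, y_pred))
        accs.append(np.mean(y_pred == y_true))
    scores, accs = np.array(scores), np.array(accs)
    best = int(np.argmax(scores))  # first threshold reaching the maximum
    return thresholds[best], scores, accs

# Synthetic reconstruction errors: normal below 0.02, anomalous above 0.05.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.uniform(0.0, 0.02, size=500),
                         rng.uniform(0.05, 0.5, size=100)])
y_true = np.concatenate([np.zeros(500, dtype=int), np.ones(100, dtype=int)])

thresholds = np.linspace(0.0, 0.1, 101)
best_t, scores, accs = sweep_thresholds(errors, y_true, thresholds)
print("best threshold:", best_t, "F-measure:", scores.max())
```

With fully separated synthetic errors, every threshold inside the gap scores equally well; on real data the overlap between the two distributions is what shapes the peak.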
An F-score of 0.9912 with 98.62% accuracy means the model misclassifies fewer than 2 samples in 100. For a sensor anomaly detection system, false negatives carry higher operational risk than false alarms, and this result keeps both rates low. Normalizing the test set using training statistics rather than the test distribution prevented anomalies from distorting the normalization scale, which likely contributed to this result.
Figure 5: Final anomaly detection results on the Shuttle dataset.
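The normalization point made above can be illustrated with a short sketch. It assumes z-score standardization, but the source does not state which scaling was used. The data is synthetic and the helper names are hypothetical.

```python
import numpy as np

def fit_standardizer(x_train, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training set only."""
    mean = x_train.mean(axis=0)
    std = np.maximum(x_train.std(axis=0), eps)  # guard against constant features
    return mean, std

def apply_standardizer(x, mean, std):
    """Scale any split with the statistics learned from the training set."""
    return (x - mean) / std

rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 9))      # normal data only
x_test = np.vstack([rng.normal(5.0, 2.0, size=(50, 9)),       # normal test samples
                    rng.normal(50.0, 2.0, size=(5, 9))])      # synthetic anomalies

mean, std = fit_standardizer(x_train)
z_train = apply_standardizer(x_train, mean, std)
z_test = apply_standardizer(x_test, mean, std)
print("mean scaled value of the anomalous rows:", z_test[-5:].mean())
```

Because the statistics come from normal training data, the anomalous rows stay far from zero after scaling instead of pulling the mean and variance toward themselves.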
This project demonstrates that a lightweight autoencoder with a latent space of dimension 6 is sufficient to effectively detect anomalies in the Shuttle dataset. The reconstruction approach, which trains only on normal data and then thresholds the reconstruction MSE to classify, proves robust and interpretable.