Anomaly Detection with an Autoencoder

Encoder-Decoder Network on the Shuttle Dataset

Université de Sherbrooke

1. Hyperparameter Selection

This project implements an autoencoder neural network dedicated to anomaly detection on the Shuttle dataset. The model learns to reconstruct only normal data in order to identify deviations during the test phase.

Parameter selection is based on a statistical analysis of the data to ensure optimal compression.

Selected Parameters
  • Latent Space (k = 6): Based on the eigenvalue analysis below, where 6 dominant eigenvalues are observed
  • Learning Rate: 0.0015 — ensures smooth loss stabilization
  • Optimizer: Adam — efficient convergence
  • Epochs: 80 — convergence plateau reached around epoch 76

Eigenvalue Analysis

The choice of k=6 for the latent space is justified by the following spectral analysis, where 6 eigenvalues clearly stand out:

λ₁ = 7.9204 × 10⁻⁴  |  λ₂ = 1.2390 × 10⁻⁴  |  λ₃ = 2.3841 × 10⁻³
λ₄ = 6.2564 × 10⁻¹  |  λ₅ = 9.8503 × 10⁻¹  |  λ₆ = 1.0004
λ₇ = 1.0245  |  λ₈ = 1.7436  |  λ₉ = 3.6172
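The spectral analysis above can be sketched as follows. This is a minimal illustration, not the project's actual script: `X_train` is a random stand-in for the normalized class-1 sensor vectors, and the 5% cutoff heuristic is an assumption, not the criterion used in the report.

```python
import numpy as np

# Stand-in for the normalized 9-dimensional normal (class 1) training
# samples; replace with the real Shuttle feature matrix.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 9))

# Eigenvalues of the covariance matrix, returned in ascending order.
cov = np.cov(X_train, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)

# Count the eigenvalues that dominate the spectrum; a simple heuristic
# (assumed here) keeps those above a small fraction of the largest one.
threshold = 0.05 * eigenvalues[-1]
k = int(np.sum(eigenvalues > threshold))
print(f"Suggested latent dimension k = {k}")
```

On the real Shuttle data, the three smallest eigenvalues are orders of magnitude below the rest, which is what motivates k = 6.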

Technology Stack

Python 3.x
PyTorch
NumPy
Matplotlib
Scikit-learn

MSE Loss Curve

Figure 1: MSE loss evolution (k = 6) showing model stabilization.

2. Network Architecture

The network adopts a symmetric "hourglass" structure to extract the essential features from the signals.

Architecture Specifications
  • Input/Output dimensions: 1×9 → 1×9
  • Encoder: Three progressive linear layers (9 → 8 → 7 → 6)
  • Activations: ReLU after each linear layer
  • Latent Space: Compressed dimension of 6
  • Decoder: Mirror structure of the encoder (6 → 7 → 8 → 9)
  • Final output: No activation — faithful reconstruction of real values

ReLU layers after each linear layer introduce the non-linearity needed to capture complex relationships in the data. The absence of output activation allows unconstrained reconstruction of continuous values.
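The architecture described above can be sketched in PyTorch as follows. The class name `ShuttleAutoencoder` is illustrative; the layer sizes and activations follow the specifications listed in this section.

```python
import torch
import torch.nn as nn

class ShuttleAutoencoder(nn.Module):
    """Symmetric hourglass autoencoder: 9 -> 8 -> 7 -> 6 -> 7 -> 8 -> 9."""

    def __init__(self):
        super().__init__()
        # Encoder: three progressive linear layers, ReLU after each.
        self.encoder = nn.Sequential(
            nn.Linear(9, 8), nn.ReLU(),
            nn.Linear(8, 7), nn.ReLU(),
            nn.Linear(7, 6), nn.ReLU(),
        )
        # Decoder mirrors the encoder; no activation on the final layer,
        # so the network reconstructs unconstrained real-valued readings.
        self.decoder = nn.Sequential(
            nn.Linear(6, 7), nn.ReLU(),
            nn.Linear(7, 8), nn.ReLU(),
            nn.Linear(8, 9),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```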


Figure 2: Encoder-Decoder network architecture diagram.

3. Training Strategy

The model is trained exclusively on normal data, so anomalies are never seen during training and stand out at test time through their high reconstruction error.

Training Parameters
  • Batch Size: 256 — maximizes computation speed without hurting generalization
  • Training: Exclusively on normal class data (class 1)
  • Loss Function: Mean Squared Error (MSE)

Preprocessing & Normalization
  • Preprocessing: Removal of invalid data and label separation to keep only the 9 sensor dimensions
  • Normalization: Computed only on valid training data (class 1)
  • Test set: Normalized using the training set's mean and standard deviation to prevent anomalies from biasing the scale
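The normalization and training strategy above can be sketched as follows. This is a simplified stand-in, assuming random data in place of the Shuttle features and a placeholder model instead of the full 9 → 6 → 9 architecture; the hyperparameters (Adam, lr = 0.0015, batch size 256, 80 epochs, MSE loss) are the ones selected in this report.

```python
import torch
import torch.nn as nn

# Stand-ins for the real data: class-1 training samples and a mixed test set.
X_normal = torch.randn(2048, 9)
X_test = torch.randn(512, 9)

# Normalization statistics come from the normal training data alone,
# so anomalies in the test set cannot bias the scale.
mean, std = X_normal.mean(dim=0), X_normal.std(dim=0)
X_normal = (X_normal - mean) / std
X_test = (X_test - mean) / std

# Placeholder autoencoder; substitute the full symmetric architecture.
model = nn.Sequential(nn.Linear(9, 6), nn.ReLU(), nn.Linear(6, 9))
optimizer = torch.optim.Adam(model.parameters(), lr=0.0015)
criterion = nn.MSELoss()
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X_normal), batch_size=256, shuffle=True
)

for epoch in range(80):  # convergence plateau observed around epoch 76
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), batch)  # reconstruction error
        loss.backward()
        optimizer.step()
```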

4. Optimal Threshold Determination

Final classification relies on computing the reconstruction error (MSE). A threshold is determined to separate "normal" from "anomalous".

Determination Method
  • Objective: Find the ideal trade-off minimizing False Negatives (FN) and False Positives (FP)
  • Visual analysis: Loss histograms to identify the separation between the two distributions
  • Evaluation: Validated by the F-measure (F1-score) curve as a function of the threshold

F-Measure Formula

F1 = 2 · (Precision · Recall) / (Precision + Recall)

Formula used for evaluation, where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
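The threshold search described above can be sketched as a sweep over candidate thresholds, picking the one that maximizes the F-measure. The synthetic error distributions below are illustrative stand-ins for the real per-sample reconstruction errors; the helper `f1_at_threshold` is not from the original code.

```python
import numpy as np

# Hypothetical per-sample reconstruction errors and true labels
# (1 = anomaly, 0 = normal); replace with the real test-set values.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.normal(0.5, 0.2, 900),   # normal data
                         rng.normal(3.0, 0.8, 100)])  # anomalies
labels = np.concatenate([np.zeros(900), np.ones(100)])

def f1_at_threshold(errors, labels, thr):
    """F-measure when samples with error above `thr` are flagged anomalous."""
    pred = (errors > thr).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Sweep candidate thresholds and keep the one maximizing the F-measure,
# which jointly penalizes false positives and false negatives.
thresholds = np.linspace(errors.min(), errors.max(), 200)
scores = [f1_at_threshold(errors, labels, t) for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print(f"Best threshold: {best:.3f}, F1 = {max(scores):.4f}")
```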

Reconstruction Loss Histograms

Loss Histogram

Figure 3: Full distribution of reconstruction errors — normal data vs anomalies.

Loss Histogram (zoomed)

Figure 4: Zoomed view — separation between normal and anomalous distributions.

F-Measure & Accuracy Curve

F-Measure Curve

Figure 5: F-measure and Accuracy evolution as a function of the selected threshold.

5. Results

Final performance demonstrates the model's high precision on this dataset.

  • F1-Score: 0.9912
  • Accuracy: 98.62% on the test set
  • Final Test Loss (MSE): 2.5372
Model Results

Final anomaly detection results on the Shuttle dataset.

6. Conclusion

Project Summary

This project demonstrates that a lightweight autoencoder with a latent space of dimension 6 is sufficient to effectively detect anomalies in the Shuttle dataset. The reconstruction approach — training only on normal data then thresholding the MSE error — proves to be robust and interpretable.

Key Highlights

  • F-score of 0.9912 — near-perfect performance on this dataset
  • Accuracy of 98.62% — excellent generalization
  • Minimalist symmetric architecture — 9 → 6 → 9 with ReLU
  • Stable convergence reached by epoch 76 out of 80
  • Optimal threshold determined by F-measure analysis, minimizing FP and FN

Future Improvements

  • Variational Autoencoder (VAE): Probabilistic latent space modeling for better generalization
  • Automatic threshold search: Threshold optimization via cross-validation
  • Other datasets: Test robustness on KDD Cup, MNIST-Anomaly, etc.
  • LSTM Autoencoder: Leverage the temporal dimension of sensor data

Resources & Links

Associated Documents
  • Dataset: UCI Shuttle Dataset — 9 sensor features, normal/anomaly classes
  • Source Code: Model, training and evaluation scripts on GitHub