Convolutional Neural Networks in Image Processing | GRO721
Université de Sherbrooke | February 25, 2025
This project is a proof of concept (PoC) for a baggage screening system design company. The objective is to determine the feasibility of developing commercial software based on deep learning techniques to automate the baggage examination process.
The mandate is to develop a system capable of classifying, detecting, and segmenting simple geometric shapes (circle, triangle, cross) in grayscale images. This task is a simplified model of the real problem of object detection in airport scanner images.
The classification model identifies the presence or absence of each shape in an image. Inspired by AlexNet, the architecture consists of four segments (Conv + ReLU + MaxPool) for progressive feature extraction, followed by two fully-connected layers.
The use of Sigmoid rather than Softmax is justified because each shape is independent (an image can contain 0, 1, 2 or 3 different shapes).
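The classifier described above can be sketched as follows. This is a hypothetical reconstruction, not the report's exact network: the channel widths, the 64×64 input size, and the hidden layer of 64 units are assumptions; only the overall structure (four Conv + ReLU + MaxPool segments, two fully-connected layers, one independent sigmoid output per shape) comes from the text.

```python
import torch
import torch.nn as nn

class ShapeClassifier(nn.Module):
    """AlexNet-inspired multi-label classifier (sizes are assumptions)."""
    def __init__(self, in_size=64, num_shapes=3):
        super().__init__()
        chans = [1, 16, 32, 64, 128]  # grayscale input; widths assumed
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        feat = in_size // 2 ** 4          # four MaxPool(2) halvings
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * feat * feat, 64),
            nn.ReLU(),
            nn.Linear(64, num_shapes))    # one logit per shape

    def forward(self, x):
        return self.head(self.features(x))

# Sigmoid, not Softmax: each shape gets an independent probability,
# so an image can score high for 0, 1, 2, or all 3 shapes at once.
model = ShapeClassifier()
logits = model(torch.zeros(1, 1, 64, 64))
probs = torch.sigmoid(logits)             # shape (1, 3), each value in [0, 1]
```

In training, this pairing is usually expressed with `nn.BCEWithLogitsLoss`, which applies the sigmoid internally for numerical stability.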
The detection model generates bounding boxes around identified shapes. Inspired by YOLO, the network consists of three convolution segments with BatchNorm and LeakyReLU, producing an output of dimension (1, 3, 7), i.e. a 3×7 grid of predictions per image.
Adding BatchNorm improves gradient stability and accelerates convergence. LeakyReLU (slope 0.1) prevents the "dying ReLU" problem.
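A minimal sketch of one such convolution segment and the grid-shaped output follows. The channel widths, input size, and pooling head are assumptions; what comes from the text is the Conv + BatchNorm + LeakyReLU(0.1) pattern, the three segments, and the (1, 3, 7) output.

```python
import torch
import torch.nn as nn

def conv_segment(c_in, c_out):
    """One YOLO-style segment: Conv + BatchNorm + LeakyReLU + downsampling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),   # normalizes activations, stabilizing gradients
        nn.LeakyReLU(0.1),       # small negative slope avoids "dying ReLU"
        nn.MaxPool2d(2))

class ShapeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_segment(1, 16),
            conv_segment(16, 32),
            conv_segment(32, 64))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 3 * 7))  # 3 cells x 7 values (e.g. box, confidence, classes)

    def forward(self, x):
        return self.head(self.backbone(x)).view(-1, 3, 7)

out = ShapeDetector()(torch.zeros(1, 1, 64, 64))  # shape (1, 3, 7)
```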
The segmentation model classifies each pixel according to its class (circle, triangle, cross, or background). The U-Net architecture uses an encoder to extract features and a decoder to restore the full-resolution class map, with skip-connections to preserve spatial information.
Skip-connections are crucial for recovering spatial information lost during MaxPool. They enable precise reconstruction of shape contours.
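The skip-connection mechanism can be illustrated with a single encoder/decoder level: the encoder's full-resolution features are concatenated with the upsampled decoder features before the final convolution, so fine contour detail bypasses the lossy MaxPool. Channel widths here are assumptions; the real U-Net has several such levels.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level U-Net sketch showing the skip-connection (widths assumed)."""
    def __init__(self, num_classes=4):  # circle, triangle, cross, background
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        # 16 skip channels + 16 upsampled channels are concatenated
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        skip = self.enc(x)                      # full-resolution features
        mid = self.mid(self.down(skip))         # downsampled bottleneck path
        up = self.up(mid)                       # back to full resolution
        merged = torch.cat([skip, up], dim=1)   # the skip-connection
        return self.out(self.dec(merged))

out = TinyUNet()(torch.zeros(1, 1, 32, 32))     # shape (1, 4, 32, 32)
```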
Data is loaded via a ConveyorSimulator class inheriting from torch.utils.data.Dataset. The split used is 90/5/5 (train/validation/test) instead of the standard 70/15/15, maximizing training data for this small dataset.
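A hypothetical skeleton of this setup is shown below. The internal storage (a list of image/target pairs) and the dummy data are assumptions; the `Dataset` inheritance and the 90/5/5 split come from the text.

```python
import torch
from torch.utils.data import Dataset, random_split

class ConveyorSimulator(Dataset):
    """Sketch of the dataset class; real loading logic is not shown."""
    def __init__(self, samples):
        self.samples = samples  # assumed: list of (image_tensor, target) pairs

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

# 90/5/5 split instead of the standard 70/15/15
data = ConveyorSimulator([(torch.zeros(1, 64, 64), 0) for _ in range(100)])
n = len(data)
n_train, n_val = int(0.90 * n), int(0.05 * n)
train_set, val_set, test_set = random_split(
    data, [n_train, n_val, n - n_train - n_val])
```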
| Issue | Symptom | Solution |
|---|---|---|
| Overfitting | Training loss decreases while validation loss plateaus | Early stopping, learning-rate adjustment, data augmentation |
| Class imbalance | Some classes are less frequent, biasing the model | Class weighting in the loss function |
| Validation instability | Significant fluctuations during validation | Reduced learning rate (0.001), BatchNorm in layers |
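The class weighting mentioned above is commonly implemented by weighting each class inversely to its frequency and passing the weights to the loss function. The counts below are made-up illustration values, not the project's actual class distribution.

```python
import torch
import torch.nn as nn

# Assumed pixel/instance counts per class: circle, triangle, cross, background
counts = torch.tensor([500., 300., 100., 2000.])

# Inverse-frequency weighting: rare classes get proportionally larger weights
weights = counts.sum() / (len(counts) * counts)

# The weighted loss penalizes errors on rare classes more heavily
criterion = nn.CrossEntropyLoss(weight=weights)
```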
The classification model shows solid performance with 92.5% accuracy. The training curve shows a progressive decrease in loss, while accuracy increases steadily. Fluctuations in validation may be caused by natural variations in the dataset.
Tests on images show that the model recognizes shapes correctly. The main error cases correspond to overlapping shapes or partially visible shapes. Performance can be improved through data augmentation (random rotations, zooms).
Detection achieves an mAP of 78.3%, showing good ability to locate and classify objects. However, fluctuations are visible on the validation set, suggesting potential instability and overfitting. Some bounding boxes are not perfectly aligned with the actual objects.
Results show some error cases: imperfect box alignment, shape classification errors, multiple detections of the same object. Improvements could come from revising the loss function or more aggressive data augmentation (random rotations).
Semantic segmentation achieves an average IoU of 79.1%, demonstrating the U-Net model's ability to accurately segment shapes at the pixel level. The architecture with skip-connections proves very effective for preserving contours. Training curves show stable convergence.
Segmentation results show excellent correspondence between predictions and ground truth. The model accurately segments shape contours, even in overlapping cases. Training for 17-25 epochs proves optimal; beyond 25 epochs the model saturates without further improvement.
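The IoU metric used to score segmentation can be sketched as follows: for each class, the intersection of predicted and ground-truth pixel masks divided by their union, averaged over the classes present. This is a generic illustration, not the project's exact evaluation code.

```python
def iou_per_class(pred, target, num_classes):
    """Mean intersection-over-union; pred/target are flat per-pixel label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:                      # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# A perfect prediction yields a mean IoU of 1.0
assert iou_per_class([0, 1, 2, 3], [0, 1, 2, 3], 4) == 1.0
```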
| Task | Key Metric | Value | Parameters (used / budget) | Inference Time |
|---|---|---|---|---|
| Classification | Accuracy | 92.5% | ~180 K / 200 K | ~2-3 ms |
| Detection | mAP | 78.3% | ~390 K / 400 K | ~4-5 ms |
| Segmentation | IoU | 79.1% | ~994 K / 1000 K | ~8-10 ms |
Classification (7.5% error): Errors mainly on images with 2-3 overlapping or partially visible shapes. Model sometimes confuses boundaries between two shapes.
Detection (21.7% error): Box alignment, shape classification errors, multiple detections. Likely caused by too high a learning rate or suboptimal loss function.
Segmentation (20.9% error): Mainly on fine contours or overlaps. Excellent IoU for well-separated shapes, less good at boundaries.
The three models demonstrate a good balance between performance and computational efficiency:
Total inference time to process an image (classification → detection → segmentation) would be approximately 14-18 ms, acceptable for an airport scanner application requiring fast processing but not real-time (<100 ms acceptable).
This project validated the feasibility of a proof of concept to automate scanner image processing through convolutional neural networks. The three developed architectures (classification, detection, segmentation) demonstrate that optimized models can achieve good performance even with strict resource constraints.
Team 6: Andrei Corduneanu (cora5428), Marek Théoret (them0901)
Course: GRO-720 - Artificial Neural Networks