Robust CNNs for Sign Language

Advanced Image Classification with Data Augmentation and CNN Architectures

GitHub URL: Code Base
Tech Stack: Python, TensorFlow, Keras, NumPy, Scikit-learn, Google Cloud Storage
Model Architectures: Multiple CNN variants (A, B, C, D) with baseline and augmented training
Data Processing: Custom data augmentation pipeline, batch processing, image transformations
Performance Metrics: Loss tracking, accuracy measurements, batch-wise evaluation
Training Infrastructure: Google Colab, GPU acceleration, cloud storage integration

Project Overview

This project implements an image classification system for sign language images using convolutional neural networks (CNNs) and a custom data augmentation pipeline. Four CNN architectures (A, B, C, D) are each trained with and without augmentation, so the effect of each data transformation on model performance can be measured directly.

Key Features

Advanced Data Augmentation Pipeline
- Affine transformations with customizable parameters
- Scale adjustments (0.9x to 1.1x range)
- Positional shifts with precise offset control
- Contrast modifications (±10 to ±30 range)
- Brightness adjustments (±0.4 range)
- Batch-wise augmentation with controllable transformation percentages
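
The ranges above can be read as sampling bounds for per-image parameters. The following is a minimal sketch, not the project's code. It assumes images are float arrays scaled to [0, 1], treats brightness as an additive offset, and uses a hypothetical `fraction` argument as the controllable transformation percentage.

```python
import numpy as np

def augment_brightness(batch, max_delta=0.4, fraction=0.5, rng=None):
    """Add a random brightness offset to a random subset of a batch.

    batch:     float array of shape (N, H, W) or (N, H, W, C), values in [0, 1]
    max_delta: offsets are drawn uniformly from [-max_delta, +max_delta]
    fraction:  share of the images in the batch that get transformed
    Returns the augmented copy and the indices of the images that were changed.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = batch.astype(np.float32, copy=True)
    n = out.shape[0]
    n_aug = int(round(fraction * n))
    # Pick distinct images so that no image is offset twice
    chosen = rng.choice(n, size=n_aug, replace=False)
    deltas = rng.uniform(-max_delta, max_delta, size=n_aug).astype(np.float32)
    # Reshape to (n_aug, 1, 1, ...) so one offset broadcasts over each image
    out[chosen] += deltas.reshape((-1,) + (1,) * (out.ndim - 1))
    np.clip(out, 0.0, 1.0, out=out)
    return out, chosen
```

Returning the chosen indices makes it easy to check which images in a batch were left untouched.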

Model Architecture Suite
- CNN Model A: Baseline accuracy ~99.97% on non-augmented data
- CNN Model B: Enhanced performance with augmentation resistance
- CNN Model C: Optimized for affine transformation handling
- CNN Model D: Specialized for brightness and contrast variations
- Each model trained with both augmented and non-augmented datasets

Training Infrastructure
- Custom BatchIter implementation for efficient data handling
- Transform sets for controlled data augmentation
- Early stopping and learning rate reduction strategies
- Automated model checkpointing and restoration
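
A batch iterator of the kind listed above could look like the sketch below. The project's `BatchIter` is not shown here, so the constructor arguments and the per-epoch shuffling are assumptions made for illustration.

```python
import numpy as np

class BatchIter:
    """Yield (images, labels) mini-batches, reshuffling on every pass."""

    def __init__(self, images, labels, batch_size=32, shuffle=True, seed=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Number of batches per epoch (ceiling division); the last may be smaller
        return -(-len(self.images) // self.batch_size)

    def __iter__(self):
        order = np.arange(len(self.images))
        if self.shuffle:
            self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            # Index images and labels with the same indices to keep them paired
            yield self.images[idx], self.labels[idx]
```

Because `__iter__` builds a fresh index order each time, the same object can be reused across epochs with a plain `for images, labels in batch_iter:` loop.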

Technical Implementation

The system employs a modular architecture with separate transformation sets for different augmentation types. The TransformSet class enables dynamic application of transformations with configurable parameters. Each CNN model variant is trained using both augmented and baseline datasets to establish performance benchmarks and evaluate augmentation effectiveness.
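
To make the idea of a configurable transformation set concrete, here is a hypothetical sketch. The real `TransformSet` interface is not given above, so the `add`/`apply` methods, the `fraction` parameter, and the two helper transforms are all assumptions.

```python
import numpy as np

class TransformSet:
    """Hold named transforms and apply them to a share of each batch."""

    def __init__(self, fraction=0.5, seed=None):
        self.fraction = fraction
        self.transforms = []  # list of (name, function, keyword arguments)
        self.rng = np.random.default_rng(seed)

    def add(self, name, fn, **params):
        self.transforms.append((name, fn, params))
        return self  # returning self allows chained add() calls

    def apply(self, batch):
        out = np.array(batch, dtype=np.float32, copy=True)
        if not self.transforms:
            return out
        n_aug = int(round(self.fraction * len(out)))
        chosen = self.rng.choice(len(out), size=n_aug, replace=False)
        for i in chosen:
            # Each selected image gets one randomly chosen transform
            _, fn, params = self.transforms[self.rng.integers(len(self.transforms))]
            out[i] = fn(out[i], **params)
        return out


def shift(image, dx=0, dy=0):
    """Translate by whole pixels (wraps around at the edges, for brevity)."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))


def brighten(image, delta=0.0):
    """Add a constant offset and clip back to the [0, 1] range."""
    return np.clip(image + delta, 0.0, 1.0)


# Usage: transform 25% of a batch of eight random 6x6 images
batch = np.random.default_rng(1).random((8, 6, 6)).astype(np.float32)
ts = (TransformSet(fraction=0.25, seed=0)
      .add("shift", shift, dx=2, dy=-1)
      .add("brighten", brighten, delta=0.2))
augmented = ts.apply(batch)
```

Keeping the parameters alongside each function means one set can be defined per augmentation type and swapped in without touching the training loop.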

Data Augmentation Details

- Affine Transformations: Implements precise coordinate mapping with 3-point correspondence
- Scale Modifications: Supports both uniform and non-uniform scaling operations
- Positional Adjustments: Enables fine-grained control over image positioning
- Intensity Modifications: Implements contrast and brightness adjustments with normalization
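
Three point correspondences are enough to determine a 2D affine map, since the map has six unknowns and each point pair supplies two equations. The sketch below solves for the 2x3 matrix with NumPy. The function names are illustrative and are not taken from the project. The result has the same 2x3 layout that OpenCV's `cv2.getAffineTransform` returns.

```python
import numpy as np

def affine_from_points(src, dst):
    """Return the 2x3 matrix A with dst_i = A @ [x_i, y_i, 1] for three points.

    Raises numpy.linalg.LinAlgError if the source points are collinear.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if src.shape != (3, 2) or dst.shape != (3, 2):
        raise ValueError("src and dst must each contain three (x, y) points")
    # Rows are [x_i, y_i, 1]; solving M @ A.T = dst gives both rows of A at once
    M = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(M, dst).T


def apply_affine(A, points):
    """Map an array of (x, y) points through the 2x3 affine matrix A."""
    points = np.asarray(points, dtype=np.float64)
    return points @ A[:, :2].T + A[:, 2]


# Example: uniform 1.1x scale combined with a shift of (+2, -1)
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2.0, -1.0), (3.1, -1.0), (2.0, 0.1)]
A = affine_from_points(src, dst)
```

Non-uniform scaling fits the same form: it only changes the two diagonal entries of the left 2x2 block independently.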

Performance Analysis

Baseline Models
- High accuracy on clean data (99.97%)
- Degraded performance on transformed inputs
- Consistent behavior across validation sets

Augmented Models
- Improved robustness to transformations
- Maintained accuracy on clean data
- Better generalization to unseen variations
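
One way to quantify the comparison above is to report accuracy on clean inputs next to accuracy under each transformation. The sketch below is an assumed evaluation helper, and the threshold "classifier" is a toy stand-in for a trained CNN. It is chosen only because it makes the drop on brightened inputs easy to see.

```python
import numpy as np

def accuracy(predict_fn, images, labels):
    """Fraction of images whose predicted class matches the label."""
    return float(np.mean(predict_fn(images) == labels))


def robustness_report(predict_fn, images, labels, transforms):
    """Accuracy on clean inputs and under each named transform."""
    report = {"clean": accuracy(predict_fn, images, labels)}
    for name, fn in transforms.items():
        report[name] = accuracy(predict_fn, fn(images), labels)
    return report


# Toy data: class 0 images are dark (0.2), class 1 images are bright (0.6)
images = np.concatenate([np.full((5, 4, 4), 0.2), np.full((5, 4, 4), 0.6)])
labels = np.array([0] * 5 + [1] * 5)

def toy_predict(x):
    # Stand-in classifier: calls an image class 1 if its mean exceeds 0.4
    return (x.mean(axis=(1, 2)) > 0.4).astype(int)

report = robustness_report(
    toy_predict, images, labels,
    {"brightness+0.3": lambda x: np.clip(x + 0.3, 0.0, 1.0)},
)
# report == {"clean": 1.0, "brightness+0.3": 0.5}
```

With a Keras classifier, `predict_fn` could be something like `lambda x: model.predict(x).argmax(axis=1)`, assuming the model outputs one score per class.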

Training Process

Training uses early stopping, learning rate reduction, and automated model checkpointing. It runs with GPU acceleration on Google Colab and reads its data from cloud storage. Each model variant is then evaluated across the different transformation types.
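
In Keras, these three training controls are typically expressed as callbacks. The configuration below is a sketch. The monitored metric, patience values, reduction factor, and checkpoint path are placeholder assumptions, not the settings used in this project.

```python
import tensorflow as tf

def build_callbacks(checkpoint_path="checkpoints/model_a.keras"):
    """Return callbacks for early stopping, LR reduction, and checkpointing."""
    return [
        # Stop once validation loss stalls and roll back to the best weights
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True),
        # Halve the learning rate after two epochs without improvement
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
        # Save the model to disk only when validation loss improves
        tf.keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_path, monitor="val_loss", save_best_only=True),
    ]
```

The list would be passed to training as `model.fit(..., validation_data=(x_val, y_val), callbacks=build_callbacks())`. A validation set is needed because all three callbacks monitor `val_loss`.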

Future Enhancements

- Integration of additional transformation types
- Implementation of adaptive augmentation strategies
- Enhanced performance metrics and visualization tools
- Automated hyperparameter optimization
- Extended model architecture exploration