Stroke Risk Modeling: Convolutional Networks and XGBoost with REST API
- GitHub URL: Code Base
- Tech Stack: Python, Keras, FastAPI, XGBoost, Scikit-learn, Pandas, Seaborn, NumPy, TensorFlow, Random Forest, Naive Bayes
- Machine Learning Models: XGBoost Classifier, Neural Network, Gaussian Naive Bayes
- Data Processing: RandomOverSampler, LabelEncoder, Train-Test Split
- Visualization: Seaborn, Matplotlib
- Deployment: FastAPI, Uvicorn, Pydantic
Project Overview
This healthcare analytics project implements a machine learning pipeline for stroke prediction that trains and compares several models. The system processes patient health records, including age, BMI, glucose levels, and lifestyle factors, to predict stroke risk. The best model, an XGBoost classifier, reaches 97.91% accuracy.
Key Features
- Data Preprocessing Pipeline
  - Automated handling of categorical variables using LabelEncoder
  - Null value management and data cleaning
  - Implementation of RandomOverSampler to address class imbalance
  - Feature engineering and standardization
- Model Implementation
  - XGBoost Classifier with GridSearchCV optimization (97.91% accuracy)
  - Deep Neural Network with custom architecture (79% accuracy)
  - Gaussian Naive Bayes implementation (74.29% accuracy)
  - Comprehensive model evaluation using confusion matrices and accuracy metrics
- API Development
  - RESTful API implementation using FastAPI
  - Pydantic models for request validation
  - Real-time prediction endpoint with JSON response
  - Automated model loading and initialization
Technical Implementation
The project trains and compares several machine learning models, with XGBoost selected as the primary classifier because it achieved the highest accuracy. The neural network uses three dense layers with ReLU activation functions. The XGBoost hyperparameters are tuned with GridSearchCV over parameters including learning rate, max depth, and alpha.
Model Performance
- XGBoost Classifier: 97.91% accuracy with optimized hyperparameters
- Neural Network: 79% accuracy, trained with a mean squared error (MSE) loss
- Naive Bayes: 74.29% accuracy as baseline comparison
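For reference, accuracy figures like those above can be derived from a confusion matrix. The sketch below uses a Gaussian Naive Bayes baseline on synthetic data, so its numbers are unrelated to the results reported here. The dataset and split are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for the stroke dataset.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_test, y_pred)
# Accuracy equals the diagonal of the confusion matrix divided by the total.
accuracy_from_cm = (tn + tp) / cm.sum()
print(cm)
print(round(accuracy, 4))
```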
API Integration
The FastAPI implementation exposes a prediction endpoint that accepts JSON payloads containing patient data. The API validates requests automatically with Pydantic models, handles errors, and serves predictions from a model loaded at initialization. The system is containerized and can be deployed to cloud platforms.
Future Enhancements
- Integration of additional biomarkers and health indicators
- Implementation of real-time model retraining pipeline
- Enhanced visualization dashboard for risk factor analysis
- Expanded API functionality with batch prediction capabilities