BMI Predictor is a production-ready Machine Learning pipeline that predicts BMI categories from three basic inputs (weight, height, and gender) using a Random Forest Classifier. The model achieves 100% accuracy on the 98-sample held-out test set, and the project includes experiment tracking with MLflow and Docker containerization for easy deployment.
- 6 BMI Categories: From "Extremely Weak" to "Extreme Obesity"
- Interactive Dashboard: Real-time predictions with visual gauges
- Experiment Tracking: Full MLflow integration for hyperparameter tuning
- Production Ready: Docker containerization with health checks
- CI/CD Automation: GitHub Actions pipeline with Docker registry
- End-to-end processing from ETL to Model Training
- Random Forest Classifier with hyperparameter optimization
- Data cleaning, outlier detection, and feature engineering
- 100% accuracy on 98-sample test set
- Streamlit-based web UI for real-time predictions
- Visual gauges and color-coded health categories
- Responsive design with sidebar controls
┌─────────────────────────────────────────┐
│ ⚖️ BMI Health Dashboard │
│ │
│ [User Input Panel] [Results Card] │
│ - Gender: [Male] ┌──────────────┐ │
│ - Weight: [70 kg] │ Normal │ │
│ - Height: [170 cm] │ BMI: 24.2 │ │
│ └──────────────┘ │
│ [Calculate Button] [Health Gauge] │
└─────────────────────────────────────────┘
- Hyperparameter Tracking: GridSearchCV with 12 parameter combinations
- Metrics Logging: Accuracy, precision, recall, F1 per class
- Artifact Management: Model files (.pkl) and confusion matrices
- Model Registry: Versioned models with MLflow model registry
- Multi-stage build for optimized image size (~150MB)
- Health checks and automatic restart policies
- Environment variable support for flexible deployment
- Docker Compose for local development
bmi-predictor/
├── 📁 dashboard/ # Streamlit web application
│ └── app.py # Main dashboard entry point
├── 📁 data/ # Datasets
│ ├── bmi.csv # Raw data
│ └── bmi_cleaned.csv # Preprocessed data (486 samples)
├── 📁 models/ # Trained models
│ ├── *.joblib # scikit-learn pipelines
│ └── models_exported/
│ └── bmi_model.pkl # Production model
├── 📁 notebooks/ # Jupyter notebooks (workflow)
│ ├── 01-ETL.ipynb # Extract, Transform, Load
│ ├── 02-EDA.ipynb # Exploratory Data Analysis
│ ├── 03-Training.ipynb # Model training with MLflow
│ └── 04-Testing.ipynb # Model validation
├── 📁 scripts/ # CI/CD automation (NEW)
│ ├── train_with_mlflow.py # Automated training script
│ └── utils/
│ └── mlflow_utils.py # MLflow helper functions
├── 📁 mlruns/ # MLflow tracking data (auto-generated)
├── 📁 mlruns_artifacts/ # MLflow artifacts (auto-generated)
├── 🐳 Dockerfile # Docker image definition
├── 🐳 docker-compose.yml # Docker Compose configuration
├── 📄 .dockerignore # Docker build exclusions
├── 📄 requirements.txt # Python dependencies
├── 📄 AGENTS.md # AI agent guidelines
└── 📄 README.md # This file
┌─────────────────────────────────────────────────────────────┐
│ User Interface │
│ (Streamlit Dashboard - Port 8501) │
└─────────────────────┬───────────────────────────────────────┘
│ HTTP Requests
▼
┌─────────────────────────────────────────────────────────────┐
│ Docker Container │
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
│ │ Streamlit App │───▶│ Model Inference │ │
│ │ (dashboard/) │ │ - Random Forest Classifier │ │
│ └─────────────────┘ │ - Input: Height, Weight, │ │
│ │ Gender │ │
│ │ - Output: BMI Category (0-5) │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼ Training/Experimentation
┌─────────────────────────────────────────────────────────────┐
│ MLflow Tracking Server │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ │
│ │ Hyperparameters│ │ Metrics │ │ Artifacts │ │
│ │ - n_estimators │ │ - Accuracy │ │ - .pkl models│ │
│ │ - max_depth │ │ - Precision │ │ - Confusion │ │
│ │ - GridSearchCV │ │ - Recall/F1 │ │ Matrix │ │
│ └─────────────────┘ └─────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Choose your preferred deployment method:
# 1. Activate virtual environment
source venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. Launch the dashboard
streamlit run dashboard/app.py
# Access at: http://localhost:8501

# Using Docker Compose (recommended)
docker-compose up --build
# Or using Docker directly
docker build -t bmi-predictor .
docker run -p 8501:8501 bmi-predictor
# Access at: http://localhost:8501

# Run automated training with full tracking
python scripts/train_with_mlflow.py
# View results in MLflow UI
mlflow ui --backend-store-uri file://$(pwd)/mlruns
# Access at: http://localhost:5000

| Component | Details |
|---|---|
| Hyperparameters | n_estimators, max_depth, min_samples_split, random_state |
| Cross-Validation | 5-fold GridSearchCV with 12 parameter combinations |
| Metrics | Accuracy, Precision, Recall, F1 (macro & weighted) |
| Artifacts | best_model.pkl, confusion_matrix.png |
| Model Registry | Versioned models: bmi-predictor-rf |
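
To make the "12 parameter combinations" figure concrete, here is a minimal sketch of how a grid expands into candidate settings. The specific values in `param_grid` are assumptions picked only so the grid has 3 x 2 x 2 = 12 entries; the real search space lives in `scripts/train_with_mlflow.py` and may differ.

```python
from itertools import product

# Hypothetical search space: these values are assumptions chosen only so that
# the grid has 12 combinations (3 x 2 x 2), matching the count in the table.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}
CV_FOLDS = 5  # 5-fold cross-validation, as stated in the table

# Expand the grid into one dict per candidate, the same way a grid search
# enumerates every combination of the listed values.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each candidate is fitted once per fold (the final refit is not counted here).
total_fits = len(combinations) * CV_FOLDS

print(f"{len(combinations)} combinations x {CV_FOLDS} folds = {total_fits} fits")
```

With 5 folds, a 12-candidate grid means 60 cross-validation fits, which is cheap for a dataset of this size.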
$ python scripts/train_with_mlflow.py
- Docker Engine 20.10+
- Docker Compose 2.0+ (optional)
# Quick start with Docker Compose
docker-compose up --build
# Manual Docker build
docker build -t bmi-predictor .
docker run -d \
--name bmi-predictor \
-p 8501:8501 \
--restart unless-stopped \
bmi-predictor

| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `/app/models/models_exported/bmi_model.pkl` | Path to model file |
| `STREAMLIT_SERVER_ADDRESS` | `0.0.0.0` | Server bind address |
| `STREAMLIT_SERVER_PORT` | `8501` | Server port |
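
As an illustration of how these variables could be consumed, the sketch below reads each one with its documented default. `load_settings` is a hypothetical helper, not a function from this repository; in practice Streamlit picks up the `STREAMLIT_SERVER_*` variables on its own, so application code would most likely only need `MODEL_PATH`.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "MODEL_PATH": "/app/models/models_exported/bmi_model.pkl",
    "STREAMLIT_SERVER_ADDRESS": "0.0.0.0",
    "STREAMLIT_SERVER_PORT": "8501",
}


def load_settings(environ=None):
    """Hypothetical helper: resolve each variable, falling back to its default."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Environment values are strings, so convert the port explicitly.
    settings["STREAMLIT_SERVER_PORT"] = int(settings["STREAMLIT_SERVER_PORT"])
    return settings


# Example: override only the model path, e.g. when mounting a volume.
settings = load_settings({"MODEL_PATH": "/mnt/models/bmi_model.pkl"})
print(settings)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.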
- Health Checks: Automatic container health monitoring
- Volume Mounting: Easy model updates without rebuild
- Restart Policy: `unless-stopped` for production stability
- Port Mapping: Host 8501 → Container 8501
This project uses GitHub Actions for continuous integration and deployment.
| Stage | Description | Trigger |
|---|---|---|
| Test | Run pytest, validate model with test suite | Push/PR to any branch |
| Build | Build Docker image, vulnerability scan with Trivy | Push/PR to any branch |
| Deploy | Push image to GitHub Container Registry (ghcr.io) | Push to main or master |
- ✅ Automated Testing: Runs `pytest test/` on every commit
- 📤 Docker Image Building: Multi-stage build with Buildx
- 🔍 Security Scanning: Trivy vulnerability scanner for critical/high CVEs
- 📊 Artifact Upload: Test results and reports
- 🏷️ Multi-tag Support: `latest`, branch names, commit SHA, and PR tags
- 💾 Build Caching: GitHub Actions cache for faster builds
Images are automatically pushed to ghcr.io:
# Pull the latest image
docker pull ghcr.io/simon-ramirez28/bmi-predictor:latest
# Run the container
docker run -p 8501:8501 ghcr.io/simon-ramirez28/bmi-predictor:latest

The workflow is defined in `.github/workflows/ci-cd.yml` and runs on:
- Push to `main`, `master`, or `develop` branches
- Pull requests to `main`, `master`, or `develop` branches
- 📥 ETL (`01-ETL.ipynb`)
  - Load raw data from `data/bmi.csv`
  - Clean duplicates (11 removed) and outliers (3 removed)
  - Calculate BMI values and encode gender
  - Export: `data/bmi_cleaned.csv` (486 samples)
- 📈 EDA (`02-EDA.ipynb`)
  - Statistical analysis and visualizations
  - Distribution analysis by BMI category
  - Correlation matrices and pair plots
- 🎯 Training (`03-Training.ipynb`)
  - Train/test split (80/20) with stratification
  - Feature scaling with StandardScaler
  - GridSearchCV hyperparameter tuning
  - MLflow experiment tracking integration
- ✅ Testing (`04-Testing.ipynb`)
  - Model validation on holdout set
  - Confusion matrix analysis
  - Classification report generation
- 🔬 MLflow Tracking (`scripts/train_with_mlflow.py`)
  - Automated training pipeline
  - Hyperparameter logging
  - Model artifact management
  - Version control with MLflow registry
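
The ETL step above (drop duplicates, compute BMI, encode gender) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: the sample rows are made up, the `Female`/`Male` to 0/1 encoding is assumed, and the outlier rule is omitted because the README does not specify it.

```python
# Assumed encoding; the notebook may map genders differently.
GENDER_CODES = {"Female": 0, "Male": 1}


def bmi_value(weight_kg, height_cm):
    """Standard BMI formula: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def clean_rows(rows):
    """Drop exact duplicate rows, then add BMI_Value and encode Gender."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["Gender"], row["Height"], row["Weight"], row["Index"])
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({
            "Gender": GENDER_CODES[row["Gender"]],
            "Height": row["Height"],
            "Weight": row["Weight"],
            "BMI_Value": round(bmi_value(row["Weight"], row["Height"]), 1),
            "Index": row["Index"],
        })
    return cleaned


# Made-up sample rows for illustration only (not taken from data/bmi.csv).
raw = [
    {"Gender": "Male", "Height": 170, "Weight": 70, "Index": 2},
    {"Gender": "Male", "Height": 170, "Weight": 70, "Index": 2},  # duplicate
    {"Gender": "Female", "Height": 160, "Weight": 50, "Index": 2},
]
cleaned = clean_rows(raw)
for row in cleaned:
    print(row)
```

The first sample row (70 kg, 170 cm) yields a BMI of 24.2, the same value shown in the dashboard mock-up.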
- Accuracy: 100%
- Precision: 1.00 (macro avg)
- Recall: 1.00 (macro avg)
- F1-Score: 1.00 (macro avg)
| BMI Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 - Extremely Weak | 1.00 | 1.00 | 1.00 | 2 |
| 1 - Weak | 1.00 | 1.00 | 1.00 | 4 |
| 2 - Normal | 1.00 | 1.00 | 1.00 | 14 |
| 3 - Overweight | 1.00 | 1.00 | 1.00 | 13 |
| 4 - Obesity | 1.00 | 1.00 | 1.00 | 26 |
| 5 - Extreme Obesity | 1.00 | 1.00 | 1.00 | 39 |
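
For readers unfamiliar with the two averaging modes reported by the pipeline, the sketch below recomputes the macro and weighted F1 from the table. The numbers are copied from the table; the variable names are illustrative.

```python
# Per-class F1 and support copied from the table above (classes 0-5).
f1_scores = [1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
support = [2, 4, 14, 13, 26, 39]

total = sum(support)

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(f1_scores, support)) / total

print(f"test samples: {total}")
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
print(f"smallest class support: {min(support)}")
```

The two averages coincide here only because every class scores 1.00. Note that the smallest class has just 2 test samples, so its per-class figures rest on very little data.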
This project follows AI agent guidelines defined in AGENTS.md. Key conventions:
- Code Style: 4-space indentation, snake_case naming
- Logging: Use `logging` module with emoji indicators
- Paths: Use `os.path.join()` for cross-platform compatibility
- Reproducibility: Always set `RANDOM_STATE = 42`
See AGENTS.md for complete guidelines.
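
A short sketch of what these conventions might look like together. The logger name, the helper `build_model_path`, and the choice of emoji are hypothetical; only the conventions themselves and the model location come from this README.

```python
import logging
import os

# Reproducibility convention: a single module-level seed.
RANDOM_STATE = 42

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("bmi_predictor")  # hypothetical logger name


def build_model_path(base_dir="."):
    """Hypothetical helper: build the exported model path portably."""
    return os.path.join(base_dir, "models", "models_exported", "bmi_model.pkl")


model_path = build_model_path()
logger.info("📦 Model path: %s", model_path)
logger.info("🎲 Using RANDOM_STATE=%d", RANDOM_STATE)
```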
- Total Samples: 500 (raw) → 486 (cleaned)
- Features: Height (cm), Weight (kg), Gender, BMI_Value
- Target: 6 BMI categories (Index 0-5)
- Train/Test Split: 388 / 98 samples (80/20)
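
The 388 / 98 split follows from the 486 cleaned samples if the 20% test share is rounded up, which appears to match how scikit-learn's `train_test_split` resolves a fractional `test_size`. A minimal check of that arithmetic, with illustrative constant names:

```python
import math

N_SAMPLES = 486       # cleaned dataset size
TEST_FRACTION = 0.2   # 80/20 split

# 486 * 0.2 = 97.2, which is rounded up to a whole number of test samples.
n_test = math.ceil(N_SAMPLES * TEST_FRACTION)
n_train = N_SAMPLES - n_test

print(f"train: {n_train}, test: {n_test}")
```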
- Python: 3.11+
- ML: scikit-learn, pandas, numpy
- Dashboard: Streamlit, Plotly
- Experiment Tracking: MLflow 2.10+
- Containerization: Docker, Docker Compose
- Visualization: Matplotlib, Seaborn
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
This project is licensed under the MIT License - see the LICENSE file for details.
- BMI dataset from Kaggle
- Built with Streamlit and scikit-learn
- Experiment tracking powered by MLflow
For issues or questions:
- Check `AGENTS.md` for development guidelines
- Review the GitHub Issues page