CFD is a modular, high-performance fraud detection system for identifying malicious financial transactions. Built on the PaySim dataset, it processes millions of transaction logs to flag fraud in real time. It uses ensemble learning (class-weighted Random Forest and Gradient Boosting) to handle the extreme class imbalance between legitimate and fraudulent transactions.
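A standard way to counter this imbalance is inverse-frequency class weighting. Each class c receives the weight n_samples / (n_classes * count_c), which is the formula scikit-learn documents for `class_weight="balanced"`. The sketch below assumes CFD uses this scheme, which is not confirmed here, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Hypothetical helper mirroring scikit-learn's class_weight="balanced" formula,
    so that rare fraud cases contribute as much total weight as legitimate ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


if __name__ == "__main__":
    # Toy labels: 998 legitimate (0) and 2 fraudulent (1) transactions.
    y = [0] * 998 + [1] * 2
    weights = balanced_class_weights(y)
    print(weights[1])  # the minority (fraud) class gets weight 250.0
```

In scikit-learn, `RandomForestClassifier(class_weight="balanced")` applies the same weighting internally. `GradientBoostingClassifier` has no `class_weight` parameter, so the weights are typically passed per sample through `fit(X, y, sample_weight=...)`.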
Tip: For a technical breakdown, see the System Architecture Guide. For a business perspective on value, usage, impact, and reliability, see the Executive Summary.
- Advanced AI Modeling: Uses class-weighted Random Forest and Gradient Boosting classifiers.
- Enterprise Architecture: Fully modular design with separate data ingestion, feature engineering, and evaluation layers.
- Audit Compliance: Comprehensive JSON-based audit logging for every system action.
- Automated Reporting: Generates performance metrics (ROC-AUC, confusion matrix) automatically.
- Production Ready: Fast train and predict workflows through a command-line interface.
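The actual log format produced by `src/utils.py` is not documented here. The sketch below shows one way to implement JSON-based audit logging with Python's standard `logging` module, writing one JSON object per line. The logger name `cfd.audit` and the field names `timestamp`, `level`, `action`, and `details` are illustrative assumptions.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "action": record.getMessage(),
        }
        # Optional structured payload, passed as logger.info(..., extra={"details": {...}}).
        details = getattr(record, "details", None)
        if details is not None:
            entry["details"] = details
        return json.dumps(entry)


def get_audit_logger(name: str = "cfd.audit", stream: TextIO = sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the root logger's handlers
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    audit = get_audit_logger()
    audit.info("model_trained", extra={"details": {"model_type": "rf"}})
```

Writing one JSON object per line keeps each audit entry independently parseable. This suits append-only log files and line-oriented tools.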
The current release uses a Random Forest classifier trained on 6.3 million transactions. Results on the test set:
| Metric | Score | Notes |
|---|---|---|
| ROC-AUC | 0.999 | Excellent discrimination capability |
| Precision | 1.00 | Rounded to two decimals; effectively no false positives on the test set |
| Recall | 1.00 | Rounded to two decimals; effectively all fraud detected on the test set |
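The definitions behind these metrics are shown in the dependency-free sketch below. In practice, `sklearn.metrics` provides `confusion_matrix`, `precision_score`, `recall_score`, and `roc_auc_score` for the same quantities. The function names here are illustrative and not taken from `src/evaluation.py`. The 0.5 decision threshold is an assumption. Note that precision and recall depend on that threshold, whereas ROC-AUC is computed from the raw fraud scores across all thresholds.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels where 1 = fraud."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Undefined cases return 0.0."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random fraud case outscores a random legitimate one (ties count 0.5).

    O(P * N) pairwise version for clarity only; use a rank-based method on large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 0]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2]              # toy fraud probabilities
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(confusion_counts(y_true, y_pred))  # (3, 0, 1, 1)
    print(precision_recall(y_true, y_pred))  # (1.0, 0.5)
    print(roc_auc(y_true, scores))           # 0.8333333333333334
```

In this toy run, precision is perfect at the chosen threshold even though one fraud case is missed. This is why precision, recall, and ROC-AUC are reported separately.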
Important: This model is trained on synthetic data (PaySim), where fraud patterns are often deterministic (e.g., specific account-emptying rules). Near-perfect scores such as the 0.999 ROC-AUC are expected on this dataset but would likely be lower (roughly 0.95 or higher) in a noisy, real-world environment. We have verified that these scores are not caused by target leakage (see Feature Analysis).
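The sketch below expresses one such deterministic pattern directly from PaySim's public column names (`type`, `amount`, `oldbalanceOrg`, `newbalanceOrig`). It computes two signals: whether a transaction drains the origin account, and the gap between the expected and recorded post-transaction balance. The `Transaction` class and the helper names are hypothetical and may not match `src/features.py`.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Field names follow the public PaySim CSV columns; other columns are omitted.
    type: str
    amount: float
    oldbalanceOrg: float
    newbalanceOrig: float


def balance_error_orig(tx: Transaction) -> float:
    """Gap between recorded and expected origin balance; 0 when new = old - amount."""
    return tx.newbalanceOrig + tx.amount - tx.oldbalanceOrg


def empties_origin(tx: Transaction, tol: float = 1e-6) -> bool:
    """True when the transaction drains a non-zero origin balance completely."""
    return (
        tx.oldbalanceOrg > 0
        and abs(tx.amount - tx.oldbalanceOrg) <= tol
        and tx.newbalanceOrig <= tol
    )


if __name__ == "__main__":
    # Toy records, not taken from the dataset.
    drained = Transaction("TRANSFER", 181.0, 181.0, 0.0)
    payment = Transaction("PAYMENT", 9839.64, 170136.0, 160296.36)
    print(empties_origin(drained), empties_origin(payment))  # True False
```

Both signals derive from balance columns rather than from the `isFraud` label. Strong separation from features like these therefore reflects the simulator's rules rather than label information.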
The trained model is available at models/fraud_model.pkl.
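The serialization format is not documented here. The sketch below assumes the file was written with Python's standard `pickle` module. If the model was saved with `joblib`, which is common for scikit-learn estimators, `joblib.load(path)` is the equivalent call. `load_model` is a hypothetical helper.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def load_model(path: Union[str, Path] = "models/fraud_model.pkl") -> Any:
    """Deserialize a trained model saved with pickle.

    Only load files from a trusted source: unpickling can execute arbitrary code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Round-trip demo with a stand-in object; a real run would load the trained classifier.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / "demo_model.pkl"
        with open(demo_path, "wb") as fh:
            pickle.dump({"model_type": "rf"}, fh)
        print(load_model(demo_path))
```

For a scikit-learn classifier, `model.predict_proba(X)[:, 1]` would then give per-transaction fraud probabilities. This assumes fraud is encoded as class 1.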
```
credit_card_fraud/
├── data/                  # Dataset storage
├── docs/                  # System Documentation
│   └── SYSTEM_OVERVIEW.md
├── src/                   # Core Logic
│   ├── data_loader.py     # Ingestion & Cleaning
│   ├── features.py        # Feature Engineering
│   ├── model.py           # Model Definition
│   ├── evaluation.py      # Reporting
│   └── utils.py           # Logging
├── tests/                 # Unit Tests
├── logs/                  # Audit Logs
├── models/                # Serialized Models
└── main.py                # CLI Entry Point
```

```bash
# Clone the repository
git clone https://github.com/arpahls/cfd.git
cd cfd

# Set up the environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Train the Model

```bash
python main.py --mode train --model_type rf
```

Run Predictions

```bash
python main.py --mode predict --model_path models/fraud_model.pkl
```

Run Tests

```bash
pytest tests/
```

For details on our unit vs. integration testing strategy, see the Testing Guide.
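The sketch below shows how a command-line interface accepting the flags used above (`--mode`, `--model_type`, `--model_path`) could be defined with `argparse`. The defaults, help texts, and placeholder branches are assumptions, not the contents of `main.py`. Only `rf` is confirmed as a `--model_type` value.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CFD fraud detection CLI (illustrative sketch)")
    parser.add_argument("--mode", choices=["train", "predict"], required=True,
                        help="Train a new model or score transactions with a saved one")
    parser.add_argument("--model_type", default="rf",
                        help="Classifier to train; 'rf' = Random Forest (other values assumed)")
    parser.add_argument("--model_path", default="models/fraud_model.pkl",
                        help="Path of the serialized model to write or read (assumed default)")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.mode == "train":
        # Placeholder: the real entry point would call into src/ (loading, features, model).
        print(f"Training model_type={args.model_type}, saving to {args.model_path}")
    else:
        # Placeholder: load the model from args.model_path and score transactions.
        print(f"Predicting with {args.model_path}")
    return 0
```

In a script, `main()` would typically be invoked through an `if __name__ == "__main__": raise SystemExit(main())` guard. Restricting `--mode` with `choices` makes argparse reject typos with a usage message instead of failing later.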
This project is licensed under the MIT License - see the LICENSE file for details.

