Machine learning prototype for detecting fraudulent insurance claims.
Live demo: fraud-detection-demo.symfa.com
Based on the 2023 Travelers NESS Statathon Kaggle competition, this project aims to build a robust fraud detection system for insurance claims using machine learning. Fraudulent claims cost the insurance industry billions of dollars annually, so accurate detection is crucial for keeping premiums affordable and operations efficient.
The goal is to build a predictive model that identifies potentially fraudulent claims from claim, policyholder, and vehicle characteristics. This binary classification task helps insurance companies:
- Reduce financial losses from fraudulent claims
- Streamline the claims investigation process
- Allocate investigation resources more efficiently
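To make the resource-allocation point concrete, here is a minimal sketch of how per-claim fraud scores could drive an investigation queue. It assumes the model outputs a fraud probability per claim and that investigators can review a fixed number of claims; `ScoredClaim`, `triage`, and `precision_recall` are hypothetical names, not part of this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredClaim:
    claim_number: str
    fraud_probability: float  # assumed model output in [0, 1]


def triage(claims: list[ScoredClaim], budget: int) -> list[ScoredClaim]:
    """Return the `budget` highest-scoring claims, most suspicious first."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(claims, key=lambda c: c.fraud_probability, reverse=True)
    return ranked[:budget]


def precision_recall(flagged: list[str], fraudulent: set[str]) -> tuple[float, float]:
    """Precision and recall of the flagged claim numbers against known fraud labels."""
    hits = sum(1 for number in flagged if number in fraudulent)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(fraudulent) if fraudulent else 0.0
    return precision, recall


if __name__ == "__main__":
    scored = [
        ScoredClaim("C1", 0.92),
        ScoredClaim("C2", 0.15),
        ScoredClaim("C3", 0.67),
        ScoredClaim("C4", 0.05),
    ]
    queue = triage(scored, budget=2)
    flagged = [c.claim_number for c in queue]
    print(flagged)  # ['C1', 'C3']
    print(precision_recall(flagged, {"C1", "C2"}))  # (0.5, 0.5)
```

Precision tells you how much investigator time lands on real fraud; recall tells you how much of the fraud a fixed budget catches. On an imbalanced target, both say more than raw accuracy.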
Project structure:

```
fraud-detection/
├── backend/                    # Python backend (uv workspace member)
│   ├── src/fraud_detection/    # FastAPI application
│   │   ├── __init__.py
│   │   └── main.py             # API endpoints
│   ├── models/                 # Trained ML model artifacts
│   ├── notebooks/              # Jupyter notebooks (EDA, experiments)
│   ├── scripts/                # Training & preprocessing scripts
│   ├── data/                   # Datasets
│   │   └── source.csv
│   └── pyproject.toml          # Backend dependencies
│
├── frontend/                   # Next.js frontend
│   ├── src/app/
│   │   ├── layout.js
│   │   ├── page.js
│   │   └── globals.css
│   └── package.json
│
├── pyproject.toml              # uv workspace definition
├── uv.lock                     # Lockfile
├── .pre-commit-config.yaml     # Code quality hooks
└── README.md
```
The dataset contains insurance claim records with the following features:

Driver and policyholder:

| Feature | Description |
|---|---|
| `age_of_driver` | Age of the driver |
| `gender` | Gender of the driver (M/F) |
| `marital_status` | Marital status indicator |
| `annual_income` | Annual income of the policyholder |
| `high_education_ind` | Higher education indicator |
| `living_status` | Living status (Own/Rent) |
| `zip_code` | ZIP code of the policyholder |

Claim:

| Feature | Description |
|---|---|
| `claim_number` | Unique claim identifier |
| `claim_date` | Date of the claim |
| `claim_day_of_week` | Day of the week when the claim was filed |
| `accident_site` | Location type of the accident |
| `past_num_of_claims` | Number of past claims |
| `witness_present_ind` | Whether a witness was present |
| `liab_prct` | Liability percentage |
| `channel` | Claim submission channel |
| `policy_report_filed_ind` | Whether a policy report was filed |
| `claim_est_payout` | Estimated claim payout amount |

Vehicle:

| Feature | Description |
|---|---|
| `age_of_vehicle` | Age of the vehicle |
| `vehicle_category` | Category of the vehicle |
| `vehicle_price` | Price of the vehicle |
| `vehicle_color` | Color of the vehicle |
| `vehicle_weight` | Weight of the vehicle |
| `safty_rating` | Safety rating of the vehicle |

Target:

| Feature | Description |
|---|---|
| `fraud` | Target (1 = Fraudulent, 0 = Legitimate) |
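Fraud labels are usually rare, so checking the class balance is a sensible first step before choosing metrics. Here is a minimal standard-library sketch, assuming `backend/data/source.csv` has a header row with the column names above and a 0/1 `fraud` column. Skipping rows with a blank or unexpected label is an assumption about how missing targets might appear; `fraud_rate` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def fraud_rate(handle: TextIO) -> tuple[int, float]:
    """Count labelled rows and the share labelled fraudulent (fraud == "1")."""
    counts: Counter[str] = Counter()
    for row in csv.DictReader(handle):
        label = (row.get("fraud") or "").strip()
        if label in ("0", "1"):  # assumption: blank/other labels mean "unlabelled"
            counts[label] += 1
    total = counts["0"] + counts["1"]
    return total, (counts["1"] / total if total else 0.0)


if __name__ == "__main__":
    sample = io.StringIO("claim_number,fraud\n1,0\n2,1\n3,0\n4,\n")
    print(fraud_rate(sample))  # (3, 0.3333333333333333)
```

Against the real file, run it from the repository root with `open("backend/data/source.csv", newline="")` as the handle.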

Tech stack:

Backend:

- Python 3.13+
- FastAPI - modern, high-performance web framework
- Pydantic - data validation
- uvicorn - ASGI server

Frontend:

- Next.js 16 - React framework with SSR
- TypeScript - type-safe JavaScript
- Tailwind CSS 4 - utility-first CSS framework
- React 19

Data and ML:

- pandas - data manipulation
- scikit-learn - machine learning (planned)

Tooling:

- uv - fast Python package manager
- pre-commit - Git hooks for code quality
- ruff - linter and formatter
- mypy - static type checker
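Since scikit-learn is listed as planned, the following is only a hypothetical baseline, not the project's model: a preprocessing pipeline plus class-weighted logistic regression over the columns documented above. The column grouping is an assumption. The 0/1 indicator columns are treated as numeric, and `claim_number`, `claim_date`, and the high-cardinality `zip_code` are left out; `safty_rating` keeps the dataset's spelling.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column grouping based on the feature tables in this README.
NUMERIC = [
    "age_of_driver", "marital_status", "high_education_ind", "annual_income",
    "past_num_of_claims", "witness_present_ind", "liab_prct",
    "policy_report_filed_ind", "claim_est_payout", "age_of_vehicle",
    "vehicle_price", "vehicle_weight", "safty_rating",
]
CATEGORICAL = [
    "gender", "living_status", "claim_day_of_week", "accident_site",
    "channel", "vehicle_category", "vehicle_color",
]


def build_baseline() -> Pipeline:
    """Impute, then scale numeric / one-hot encode categorical, then classify."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer(
        [("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)]
    )
    # class_weight="balanced" weights each class inversely to its frequency,
    # so the (presumably rare) fraud class is not drowned out.
    return Pipeline(
        [
            ("prep", preprocess),
            ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
        ]
    )
```

With the CSV loaded into a pandas DataFrame `df`, fitting would be `build_baseline().fit(df[NUMERIC + CATEGORICAL], df["fraud"])`, and `predict_proba` then yields per-claim fraud probabilities.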

Prerequisites:

- Python 3.13+
- Node.js 20.9+ (the minimum supported by Next.js 16)
- pnpm (fast, disk-efficient Node.js package manager)
- uv (recommended for Python)
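Before installing, a small standard-library check can confirm the interpreter version and that the command-line tools are on `PATH`. `preflight` is a hypothetical helper, not a script shipped with the repository; it only checks that `uv`, `node`, and `pnpm` are present, not their versions.

```python
import shutil
import sys

REQUIRED_TOOLS = ("uv", "node", "pnpm")


def preflight(min_python: tuple[int, int] = (3, 13)) -> dict[str, bool]:
    """Report whether the Python version is new enough and each CLI is on PATH."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in REQUIRED_TOOLS:
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:<7} {'ok' if ok else 'missing'}")
```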

Installation:

1. Clone the repository:
   ```bash
   git clone https://github.com/Symfa-Inc/fraud-detection.git
   cd fraud-detection
   ```
2. Install Python dependencies:
   ```bash
   uv sync
   ```
3. Install frontend dependencies:
   ```bash
   cd frontend
   pnpm install
   ```

Running locally:

Backend (FastAPI):

```bash
uv run uvicorn fraud_detection.main:app --reload
```

The API will be available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

Frontend (Next.js), from the repository root:

```bash
cd frontend
pnpm dev
```

The frontend will be available at http://localhost:3000.
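With both servers running, a quick standard-library check that the default URLs respond might look like this. It is a hypothetical convenience script, not part of the repository, and it only tests reachability.

```python
import http.client
import urllib.request

# Default local URLs from this README; adjust if you changed the ports.
ENDPOINTS = {
    "backend docs": "http://localhost:8000/docs",
    "frontend": "http://localhost:3000",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers; HTTP errors and connection failures count as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name}: {'up' if is_up(url) else 'down'} ({url})")
```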