Full-stack fraud detection prototype using machine learning to identify fraudulent insurance claims

Symfa-Inc/fraud-detection

🕵️ Insurance Claim Fraud Detection

Python 3.13 · TypeScript · FastAPI · Next.js · scikit-learn · Docker

Machine learning prototype for detecting fraudulent insurance claims.

🔗 Live Demo: fraud-detection-demo.symfa.com

📋 Overview

This prototype detects fraudulent insurance claims with machine learning and is based on the 2023 Travelers NESS Statathon Kaggle competition. Fraudulent claims cost the insurance industry billions of dollars annually, so accurate detection matters for keeping premiums affordable and operations efficient.

🎯 Problem Statement

The goal is to build a predictive model that flags potentially fraudulent insurance claims from claim, policyholder, and vehicle characteristics. This binary classification task helps insurance companies:

  • Reduce financial losses from fraudulent claims
  • Streamline the claims investigation process
  • Allocate investigation resources more efficiently
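Because the task is binary classification, a model's output can be compared with known labels in terms of precision (how many flagged claims were truly fraudulent) and recall (how many fraudulent claims were flagged). The sketch below computes both in plain Python. The labels and predictions are invented for illustration and do not come from the project, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute precision and recall for the positive class (1 = fraudulent)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels (1 = fraudulent, 0 = legitimate) and model predictions.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision sends investigators after legitimate claims, while low recall lets fraudulent claims through, so which of the two to favor depends on how scarce investigation resources are.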

📁 Project Structure

fraud-detection/
├── backend/                        # 🐍 Python Backend (UV workspace member)
│   ├── src/fraud_detection/        # FastAPI application
│   │   ├── __init__.py
│   │   └── main.py                 # API endpoints
│   ├── models/                     # Trained ML model artifacts
│   ├── notebooks/                  # Jupyter notebooks (EDA, experiments)
│   ├── scripts/                    # Training & preprocessing scripts
│   ├── data/                       # Datasets
│   │   └── source.csv
│   └── pyproject.toml              # Backend dependencies
│
├── frontend/                       # ⚛️ Next.js Frontend
│   ├── src/app/
│   │   ├── layout.js
│   │   ├── page.js
│   │   └── globals.css
│   └── package.json
│
├── pyproject.toml                  # UV workspace definition
├── uv.lock                         # Lockfile
├── .pre-commit-config.yaml         # Code quality hooks
└── README.md

📊 Dataset

The dataset contains insurance claim records with the following features:

Driver Demographics

| Feature | Description |
| --- | --- |
| age_of_driver | Age of the driver |
| gender | Gender of the driver (M/F) |
| marital_status | Marital status indicator |
| annual_income | Annual income of the policyholder |
| high_education_ind | Higher education indicator |
| living_status | Living status (Own/Rent) |
| zip_code | ZIP code of the policyholder |

Claim Information

| Feature | Description |
| --- | --- |
| claim_number | Unique claim identifier |
| claim_date | Date of the claim |
| claim_day_of_week | Day of the week when the claim was filed |
| accident_site | Location type of the accident |
| past_num_of_claims | Number of past claims |
| witness_present_ind | Whether a witness was present |
| liab_prct | Liability percentage |
| channel | Claim submission channel |
| policy_report_filed_ind | Whether a policy report was filed |
| claim_est_payout | Estimated claim payout amount |

Vehicle Information

| Feature | Description |
| --- | --- |
| age_of_vehicle | Age of the vehicle |
| vehicle_category | Category of the vehicle |
| vehicle_price | Price of the vehicle |
| vehicle_color | Color of the vehicle |
| vehicle_weight | Weight of the vehicle |
| safty_rating | Safety rating of the vehicle |

Target Variable

| Feature | Description |
| --- | --- |
| fraud | Target (1 = Fraudulent, 0 = Legitimate) |
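As a rough illustration of how these columns might be read, the sketch below parses a few rows with Python's standard `csv` module and computes the share of fraudulent claims. The sample rows are invented, only a subset of the columns is shown, and the column types (for instance `int` for `fraud`) are assumptions rather than something confirmed by the dataset. `load_claims` and `fraud_rate` are hypothetical helper names.

```python
import csv
import io

# Invented sample rows using a subset of the documented column names.
# In the project the data would come from backend/data/source.csv instead.
SAMPLE_CSV = """claim_number,age_of_driver,claim_est_payout,fraud
1,46,7530.94,0
2,21,2966.02,1
3,49,6283.89,0
4,58,6169.75,0
"""


def load_claims(text: str) -> list[dict]:
    """Parse CSV text into dicts, converting the columns used here (types assumed)."""
    claims = []
    for row in csv.DictReader(io.StringIO(text)):
        claims.append(
            {
                "claim_number": int(row["claim_number"]),
                "age_of_driver": int(row["age_of_driver"]),
                "claim_est_payout": float(row["claim_est_payout"]),
                "fraud": int(row["fraud"]),  # 1 = fraudulent, 0 = legitimate
            }
        )
    return claims


def fraud_rate(claims: list[dict]) -> float:
    """Return the fraction of claims labelled fraudulent (0.0 for an empty list)."""
    if not claims:
        return 0.0
    return sum(c["fraud"] for c in claims) / len(claims)


claims = load_claims(SAMPLE_CSV)
print(f"{len(claims)} claims, fraud rate {fraud_rate(claims):.2%}")
```

Checking the fraud rate early is useful because fraud labels are often a small minority of records, which affects how a classifier should be trained and evaluated.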

🛠️ Tech Stack

Backend

  • Python 3.13+
  • FastAPI - Modern, high-performance web framework
  • Pydantic - Data validation
  • uvicorn - ASGI server

Frontend

  • Next.js 16 - React framework with SSR
  • TypeScript - Type-safe JavaScript
  • Tailwind CSS 4 - Utility-first CSS framework
  • React 19

ML & Data Science

  • pandas - Data manipulation
  • scikit-learn - Machine learning (planned)

Development

  • uv - Fast Python package manager
  • pre-commit - Git hooks for code quality
  • ruff - Linter and formatter
  • mypy - Static type checker

🚀 Getting Started

Prerequisites

  • Python 3.13+
  • Node.js 20.9+ (minimum required by Next.js 16)
  • pnpm (fast and efficient Node.js package manager)
  • uv (recommended for Python)

Installation

  1. Clone the repository:

    git clone https://github.com/Symfa-Inc/fraud-detection.git
    cd fraud-detection

  2. Install Python dependencies:

    uv sync

  3. Install frontend dependencies:

    cd frontend
    pnpm install

Running the Application

Backend (FastAPI):

uv run uvicorn fraud_detection.main:app --reload

API will be available at: http://localhost:8000

API docs at: http://localhost:8000/docs
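FastAPI also serves a machine-readable OpenAPI schema at `/openapi.json` by default, so one way to see which endpoints the running backend exposes is to fetch that schema. The sketch below uses only the standard library. The helper names `api_url`, `list_routes`, and `fetch_routes` are hypothetical, the `/health` and `/predict` paths in the sample schema are invented, and the code assumes the backend is listening on the default address given above.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the run instructions
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head", "trace"}


def api_url(path: str, base: str = BASE_URL) -> str:
    """Join the base URL and a path without doubling or dropping slashes."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_routes(base: str = BASE_URL) -> list[tuple[str, str]]:
    """Download the schema from a running backend and list its routes."""
    with urlopen(api_url("/openapi.json", base), timeout=5) as response:
        return list_routes(json.load(response))


# Invented schema fragment, used here so the example runs without a server.
sample_spec = {"paths": {"/predict": {"post": {}}, "/health": {"get": {}}}}
for method, path in list_routes(sample_spec):
    print(method, path)
```

With the backend running, calling `fetch_routes()` applies the same listing to the real schema.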

Frontend (Next.js):

cd frontend
pnpm dev

Frontend will be available at: http://localhost:3000
