
The aim of this project is to generate a dataset of captchas associated with the segmentation mask of the characters. It will be used to train a model later.


malerbe/captcha_dataset_generation


Captcha Dataset Generator with Mask Extraction

Generate datasets of captchas and their character-level masks for training computer vision models (OCR, segmentation, etc.).
Built on the captcha library, through my custom fork that adds mask extraction for each character.


Features

  • Easy generation of captcha images with associated per-character mask images
  • Customizable captcha length, fonts, noise, backgrounds, etc.
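
To make "per-character mask" concrete, here is a toy sketch of how one binary mask per character could be merged into a single label map for segmentation training. It does not use the fork's API. The `merge_masks` helper and the pixel layout (0 for background, i for the i-th character) are assumptions made for illustration only.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 pixels, one mask per character


def merge_masks(masks: List[Mask]) -> Mask:
    """Merge per-character binary masks into one label map.

    Pixel value 0 is background; value i (1-based) marks the i-th character.
    Where characters overlap, the later character wins (an arbitrary choice).
    """
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    label_map = [[0] * width for _ in range(height)]
    for index, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = index
    return label_map


# Toy 2x4 "captcha" with two characters.
first = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
second = [[0, 0, 1, 1],
          [0, 0, 0, 1]]
label_map = merge_masks([first, second])
```

Keeping one mask per character and merging only at training time leaves the choice of overlap rule open, which matters when noise or distortion makes characters touch.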

📸 Examples

Captcha Example: [image: captcha example]
Character Mask Example: [image: mask example]

(Replace the image paths above with your own example images)


Usage Instructions

1. Clone this repository

git clone https://github.com/malerbe/captcha_dataset_generation.git
cd captcha_dataset_generation

2. Install my modified captcha library with mask support (IMPORTANT: do not skip this step)

pip install git+https://github.com/your-username/captcha-masks.git

3. Generate captchas and masks

python generate_dataset.py --count <number of captchas to generate> --output /path/to/output/directory --length <number of characters in each captcha>
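
For readers who want to see how the three flags fit together, the following is a minimal sketch of a command-line front end that accepts `--count`, `--output`, and `--length`. It is not the actual `generate_dataset.py`. The helper names (`build_parser`, `random_text`, `plan_samples`), the alphabet, the default length, and the file naming scheme are all assumptions, and the image and mask rendering step is left out because the fork's API is not shown here.

```python
import argparse
import random
import string
from pathlib import Path
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the three flags shown in the usage line."""
    parser = argparse.ArgumentParser(
        description="Generate captchas and per-character masks.")
    parser.add_argument("--count", type=int, required=True,
                        help="number of captchas to generate")
    parser.add_argument("--output", type=Path, required=True,
                        help="output directory")
    parser.add_argument("--length", type=int, default=5,
                        help="characters per captcha (default is an assumption)")
    return parser


def random_text(length: int, rng: random.Random) -> str:
    """Random captcha label; uppercase letters and digits are an assumption."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def plan_samples(count: int, length: int, output: Path,
                 seed: int = 0) -> List[Tuple[str, Path, Path]]:
    """Return (label, image path, mask path) for each sample to generate."""
    rng = random.Random(seed)
    samples = []
    for i in range(count):
        text = random_text(length, rng)
        stem = f"{i:06d}_{text}"  # hypothetical naming scheme
        samples.append((text,
                        output / f"{stem}.png",
                        output / f"{stem}_mask.png"))
    return samples


# Equivalent to: --count 3 --output dataset --length 5
args = build_parser().parse_args(
    ["--count", "3", "--output", "dataset", "--length", "5"])
samples = plan_samples(args.count, args.length, args.output)
```

Each planned sample would then be passed to the captcha library to render the image and its mask at the two paths.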

Requirements

captcha (modified fork)

License

MIT
