
BigOCR PDF is a utility integrated into the Linux desktop environment (optimized for BigLinux) that adds Optical Character Recognition (OCR) to PDF documents and image files. It converts scanned documents into searchable PDFs and extracts text from images or screen regions.

✨ Features

Make PDFs Searchable: Convert scanned, non-searchable PDFs into files whose text you can search, select, and copy.
Image OCR: Extract text directly from standard image files (JPG, PNG, etc.).
Screen Capture Integration: Extract text from anywhere on your screen by selecting a rectangular region. This is useful for grabbing text from videos, protected websites, or UI elements.
Batch Processing: Efficiently process multiple files at once directly from your file manager.

🚀 Usage

1. Processing PDF Files

Scanned PDFs often lack a text layer. To fix this:

Open your file manager.
Select one or more PDF files.
Right-click and select the "OCR" option.
A new, searchable version of the file will be generated.
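
For readers who want to script the same step outside the file manager, the sketch below builds an OCRmyPDF command line for each selected PDF. It is an illustration only: the README names OCRmyPDF as the core engine but does not say how BigOCR PDF invokes it, and the `-ocr` output suffix and the helper names (`searchable_name`, `build_ocrmypdf_command`, `plan_batch`, `run_batch`) are hypothetical.

```python
import subprocess
from pathlib import Path


def searchable_name(pdf: Path, suffix: str = "-ocr") -> Path:
    """Return a sibling output path, e.g. scan.pdf -> scan-ocr.pdf.

    The "-ocr" suffix is an assumption, not BigOCR PDF's documented naming.
    """
    return pdf.with_name(f"{pdf.stem}{suffix}{pdf.suffix}")


def build_ocrmypdf_command(pdf: Path, language: str = "eng") -> list[str]:
    """Build an ocrmypdf invocation that adds a text layer to a scanned PDF.

    --skip-text leaves pages that already contain text untouched;
    --language selects the Tesseract language pack.
    """
    return [
        "ocrmypdf",
        "--skip-text",
        "--language",
        language,
        str(pdf),
        str(searchable_name(pdf)),
    ]


def plan_batch(paths: list[str], language: str = "eng") -> list[list[str]]:
    """Return one command per PDF in the selection, ignoring other files."""
    return [
        build_ocrmypdf_command(Path(p), language)
        for p in paths
        if Path(p).suffix.lower() == ".pdf"
    ]


def run_batch(paths: list[str], language: str = "eng") -> None:
    """Run the planned commands one after another (requires ocrmypdf on PATH)."""
    for command in plan_batch(paths, language):
        subprocess.run(command, check=True)
```

Calling `run_batch(["scan1.pdf", "scan2.pdf"])` would write `scan1-ocr.pdf` and `scan2-ocr.pdf` next to the originals, assuming OCRmyPDF and an English Tesseract language pack are installed.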

2. Extracting Text from Images

Right-click on any image file.
Select "Extract text from image (OCR)".
The extracted text will be available for use.
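
The same result can be approximated from a script by calling the Tesseract command-line tool directly. This is a minimal sketch under assumptions: the README does not describe how BigOCR PDF calls Tesseract, the accepted suffix list is a guess based on "JPG, PNG, etc.", and `build_tesseract_command` and `extract_text` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Assumed set of accepted formats; the README only says "JPG, PNG, etc."
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_tesseract_command(image: Path, language: str = "eng") -> list[str]:
    """Build a tesseract invocation that prints recognized text to stdout.

    Passing "stdout" as the output base makes tesseract print the text
    instead of writing a .txt file.
    """
    return ["tesseract", str(image), "stdout", "-l", language]


def extract_text(image: Path, language: str = "eng") -> str:
    """Run tesseract on an image and return the recognized text."""
    if image.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"unsupported image type: {image.suffix}")
    result = subprocess.run(
        build_tesseract_command(image, language),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With Tesseract installed, `extract_text(Path("screenshot.png"))` returns the recognized text as a string, which a caller could print or copy to the clipboard.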

3. Screen Text Extraction

For text that cannot be selected normally (e.g., inside a video or image on a website):

Launch your screenshot tool (e.g., press Print Screen).
Select the "Rectangular Region" tool.
Highlight the area containing the text you want to copy.
Click "Export" and choose "Extract text from image (OCR)".

🛠️ Installation & Development

Prerequisites

Ensure you have the following system dependencies installed:

Python 3.10 or higher
GTK4 and Libadwaita
OCRmyPDF (the core OCR engine)
Tesseract OCR
Ghostscript
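
A quick way to confirm the command-line prerequisites before building is to look for their executables on `PATH`. The sketch below is not part of the project: the executable names (`ocrmypdf`, `tesseract`, `gs`) are the usual ones on Linux, and the helper names are hypothetical. GTK4 and Libadwaita are libraries rather than executables, so they are not covered here.

```python
import shutil
import sys

# Display name -> executable name as commonly installed on Linux (assumption).
REQUIRED_TOOLS = {
    "OCRmyPDF": "ocrmypdf",
    "Tesseract OCR": "tesseract",
    "Ghostscript": "gs",
}


def python_ok(version=sys.version_info, minimum=(3, 10)) -> bool:
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    return tuple(version[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> list[str]:
    """Return display names of tools whose executable is not on PATH."""
    return [label for label, exe in tools.items() if which(exe) is None]


def report() -> str:
    """Summarize the prerequisite check as a single line of text."""
    problems = list(missing_tools())
    if not python_ok():
        problems.insert(0, "Python 3.10 or higher")
    if not problems:
        return "All checked prerequisites found."
    return "Missing: " + ", ".join(problems)
```

Running `print(report())` lists any of the checked dependencies that still need to be installed through your package manager.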

Building from Source

To install the latest version from the repository:

# Clone the repository
git clone https://github.com/biglinux/bigocrpdf.git
cd bigocrpdf

# Install the package
pip install .
