OCRTextReader

A lightweight Windows desktop application that extracts text from images, PDFs, and Office documents.

Features

Image Selection: Select images or documents (JPG, PNG, BMP, GIF, TIFF, PDF, Excel, PowerPoint (PPTX))
OCR Processing: Extract text from images using the Tesseract OCR engine
Text Preview: View extracted text in the application
Word Export: Export extracted text to Microsoft Word (.docx) format
User-Friendly Interface: Clean and intuitive Windows Forms UI built using the ReaLTaiizor UI framework.

Download

Download Link

Prerequisites

.NET Framework 4.8 SDK (Developer Pack) or later
Tesseract OCR installed on your system

Installing Tesseract OCR

Option 1: Using Installer (Recommended for Windows)

Download the Tesseract OCR installer from: https://github.com/UB-Mannheim/tesseract/wiki
Run the installer and install to the default location (usually C:\Program Files\Tesseract-OCR)
The installer includes English language data files by default

Option 2: Using Chocolatey

choco install tesseract

Option 3: Manual Installation

Download Tesseract binaries
Extract to a folder (e.g., C:\Tesseract-OCR)
Download language data files from: https://github.com/tesseract-ocr/tessdata
Place eng.traineddata in the tessdata folder

Building the Application

Open a terminal in the project directory
Restore NuGet packages:
```
dotnet restore
```
Build the project:
```
dotnet build
```
Run the application:
```
dotnet run
```

Usage

Launch the application
Click "Select Image/Document" to choose an image or document file
Click "Extract Text (OCR)" to process the image and extract text
Review the extracted text in the text box
Click "Export to Word Document" to save the text as a .docx file

Project Structure

OCRTextReaderApp/
├── Main.cs                          # Main UI form
├── OCRService.cs                    # OCR text extraction service
├── DocumentTextExtractorService.cs  # Document text extraction service
├── WordExportService.cs             # Word document export service
├── Program.cs                       # Application entry point
└── OCRTextReader.csproj             # Project file

Dependencies

Tesseract: OCR engine for text extraction
DocumentFormat.OpenXml: For creating Word documents
ReaLTaiizor: UI framework for the Windows Forms interface

Troubleshooting

"OCR processing failed" Error

Ensure Tesseract OCR is installed
Verify that eng.traineddata exists in the tessdata folder
Check that the tessdata path is accessible
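
The file checks above can be scripted. What follows is a minimal standalone Python sketch, not part of the application: it looks for eng.traineddata in a list of candidate tessdata folders. The candidate paths are assumptions based on the install locations mentioned in this README plus the TESSDATA_PREFIX environment variable that Tesseract reads; the function names are hypothetical, and the folder the application actually uses may differ.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def find_tessdata(candidates: Iterable[str], lang: str = "eng") -> Optional[Path]:
    """Return the first candidate folder that contains <lang>.traineddata, or None."""
    for candidate in candidates:
        folder = Path(candidate)
        try:
            if (folder / f"{lang}.traineddata").is_file():
                return folder
        except OSError:
            # Folder exists but cannot be read: treat it as "not found".
            continue
    return None


def default_candidates() -> List[str]:
    """Assumed locations: TESSDATA_PREFIX (if set) and the install paths named in this README."""
    paths = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        paths.append(prefix)
    paths.append(r"C:\Program Files\Tesseract-OCR\tessdata")
    paths.append(r"C:\Tesseract-OCR\tessdata")
    return paths


if __name__ == "__main__":
    found = find_tessdata(default_candidates())
    if found is None:
        print("eng.traineddata not found in any candidate folder")
    else:
        print(f"eng.traineddata found in {found}")
```

If the script reports that the file was not found, reinstall Tesseract or copy eng.traineddata into the tessdata folder as described under Manual Installation.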

"No text could be extracted"

The image quality might be too low
Try using higher resolution images
Ensure the image contains clear, readable text
Check if the text is in a supported language (English by default)

Notes

The application currently supports English text extraction by default
To add support for other languages, download the corresponding language data files from the Tesseract tessdata repository
PDF files may require additional processing depending on their format
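
As a sketch of the language note above: Tesseract accepts several languages joined with `+` (for example `eng+fra`), and each one needs a matching `.traineddata` file in the tessdata folder. The standalone Python helper below, with the hypothetical name `missing_languages`, lists which of those files are absent. It is not part of the application, and how the application selects its language is not described in this README.

```python
from pathlib import Path
from typing import List


def missing_languages(tessdata_dir: str, languages: str) -> List[str]:
    """Return the codes in a '+'-joined language spec that lack a .traineddata file."""
    folder = Path(tessdata_dir)
    # Split "eng+fra" into ["eng", "fra"], ignoring stray whitespace and empty parts.
    codes = [code.strip() for code in languages.split("+") if code.strip()]
    return [code for code in codes if not (folder / f"{code}.traineddata").is_file()]
```

For example, calling `missing_languages(r"C:\Program Files\Tesseract-OCR\tessdata", "eng+fra")` returns the language codes whose data files still need to be downloaded from the tessdata repository.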

Third-Party Libraries

This project uses the following open-source libraries:

Tesseract OCR – Licensed under the Apache License 2.0
https://github.com/tesseract-ocr/tesseract
DocumentFormat.OpenXml – Licensed under the MIT License
https://github.com/OfficeDev/Open-XML-SDK
ReaLTaiizor – Licensed under the MIT License
https://github.com/Taiizor/ReaLTaiizor

Screenshot

Icons by Icons8
