TextGenerator: Advanced Text Analysis & Generation Tool

A powerful text processing tool that combines PDF summarization, text generation, and NLP analysis using state-of-the-art transformer models.

Key Features

  • PDF Document Processing

    • Automatic text extraction
    • Length-adaptive summarization
    • Smart chunking for large documents
  • Text Enhancement Options

    • Basic summarization
    • Detailed expansion
    • Question generation
    • Executive summary creation
    • Custom content generation
    • NLP analysis
  • NLP Analysis Capabilities

    • Sentiment analysis
    • Named Entity Recognition (NER)
    • Keyword extraction using TF-IDF
    • Document classification
    • Readability metrics
    • Flesch-Kincaid score calculation
  • AI Text Generation

    • Custom prompt-based generation
    • Adjustable output length
    • Optional NLP analysis integration
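
The feature list mentions readability metrics and a Flesch-Kincaid score but does not show how they are computed. The sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas using only the standard library. The function names `count_syllables` and `readability` are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed here for illustration; it is not taken from TextGenerator.py.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("make"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


if __name__ == "__main__":
    print(readability("The cat sat on the mat. It was a sunny day."))
```

Higher Reading Ease values indicate easier text, while the grade level approximates a US school grade. Scores from a heuristic syllable counter can differ slightly from those produced by dictionary-based tools.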

Prerequisites

Local Installation

  • Python 3.8 or higher
  • CUDA-compatible GPU (optional, for faster processing)
  • 8GB RAM minimum (16GB recommended)
  • 10GB free disk space for models

Required Python Packages

transformers>=4.30.0
torch>=2.0.0
gradio>=3.40.0
PyPDF2>=3.0.0
spacy>=3.5.0
nltk>=3.8.0
scikit-learn>=1.0.0
numpy>=1.24.0

Installation

Local Setup

  1. Clone the repository:
git clone https://github.com/SauRavRwT/TextGenerator.git
cd TextGenerator
  2. Create and activate a virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
  3. Install dependencies:
pip install -r requirements.txt

Google Colab Setup

  1. Create a new Colab notebook

  2. Install required packages:

!pip install gradio flask-ngrok PyPDF2
  3. Copy the TextGenerator.py code into a code cell

Usage

Local Run

python TextGenerator.py

The interface will be available at http://localhost:7860

Colab Run

Execute the cell containing the TextGenerator code. Gradio will provide a public URL for accessing the interface.

Model Information

The tool uses several pre-trained models:

  • BART (facebook/bart-large-cnn) for summarization
  • GPT-2 Medium for text generation
  • DistilBERT for sentiment analysis
  • BERT for named entity recognition
  • BART Large MNLI for text classification

Models are automatically downloaded on first use.
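
As a minimal sketch of how the models listed above could be loaded on demand with the Hugging Face `pipeline` API: only `facebook/bart-large-cnn` is named explicitly in this README. The other Hub IDs, the task names, and the helper `get_pipeline` are assumptions chosen to match the descriptions, and may differ from what TextGenerator.py actually uses.

```python
# Hub model IDs per task. Only the summarization ID appears in the README;
# every other ID here is an assumption matching the model family it names.
MODEL_IDS = {
    "summarization": "facebook/bart-large-cnn",
    "text-generation": "gpt2-medium",
    "sentiment-analysis": "distilbert-base-uncased-finetuned-sst-2-english",
    "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "zero-shot-classification": "facebook/bart-large-mnli",
}

_loaded = {}  # cache so each model is created at most once per (task, device)


def get_pipeline(task: str, device: int = -1):
    """Create the pipeline for `task` on first use and cache it.

    device=-1 runs on CPU; device=0 uses the first CUDA GPU.
    """
    if task not in MODEL_IDS:
        raise KeyError(f"unknown task: {task}")
    key = (task, device)
    if key not in _loaded:
        # Imported lazily: transformers is a heavy dependency, and the first
        # call for each model triggers the download from the Hub.
        from transformers import pipeline
        _loaded[key] = pipeline(task, model=MODEL_IDS[task], device=device)
    return _loaded[key]
```

Calling `get_pipeline("summarization")` for the first time downloads the weights into the local Hugging Face cache; later calls reuse the cached pipeline object.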

Memory Requirements

  • GPU Mode: ~6GB VRAM recommended
  • CPU Mode: ~4GB RAM minimum
  • Additional memory required for processing large PDFs

Performance Notes

  • First run will download required models (~5GB total)
  • GPU acceleration automatically enabled if available
  • Processing time varies based on input length and selected features
  • Large PDFs are automatically chunked for efficient processing
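
The README says GPU acceleration is enabled automatically but does not show the check. One common way to do this, sketched here under the assumption that PyTorch is the backend, is to ask `torch.cuda.is_available()` and fall back to CPU otherwise. The helper name `pick_device` is hypothetical.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch reports one, otherwise -1 (CPU).

    The integer convention matches the `device` argument of
    transformers.pipeline: -1 means CPU, a non-negative value is a GPU index.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to accelerate.
        return -1
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    print("Using GPU" if pick_device() == 0 else "Using CPU")
```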

Limitations

  • Maximum input text length: 1024 tokens per chunk
  • PDF processing limited by available RAM
  • Some features may be slower without GPU acceleration
  • English-language text only
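
To illustrate the 1024-token limit per chunk, here is a minimal sketch of splitting long text into pieces that should stay under that budget. It approximates token counts from whitespace-separated words using a ratio of 1.3 tokens per word, which is a rough rule of thumb assumed here; the function name `chunk_text` is hypothetical, and the actual chunking logic in TextGenerator.py may work differently (for example, by counting tokens with the model's tokenizer).

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 1024, tokens_per_word: float = 1.3) -> List[str]:
    """Split `text` into word-based chunks sized to fit a token budget.

    tokens_per_word is an assumed average; subword tokenizers usually emit
    more tokens than there are words, so the word budget is scaled down.
    """
    if max_tokens <= 0 or tokens_per_word <= 0:
        raise ValueError("max_tokens and tokens_per_word must be positive")
    words_per_chunk = max(1, int(max_tokens / tokens_per_word))
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    sample = " ".join(["word"] * 2000)
    print([len(chunk.split()) for chunk in chunk_text(sample)])
```

Each chunk can then be summarized separately and the partial summaries joined. Because the ratio is only an estimate, unusually dense text may still exceed the limit, so enabling truncation in the model call is a sensible safeguard.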

Contributing

Feel free to open issues or submit pull requests for improvements and bug fixes.
