# TextGenerator

A powerful text processing tool that combines PDF summarization, text generation, and NLP analysis using state-of-the-art transformer models.
## PDF Document Processing
- Automatic text extraction
- Length-adaptive summarization
- Smart chunking for large documents
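Smart chunking splits a long document into pieces small enough for the models (the tool caps input at 1024 tokens per chunk). A minimal sketch of the idea — the `chunk_text` name and the use of whitespace words as a stand-in for tokens are illustrative, not the tool's actual implementation:

```python
# Illustrative sketch of length-based chunking; TextGenerator's actual
# implementation may differ. Whitespace words approximate tokens here.
def chunk_text(text, max_tokens=1024):
    """Split text into pieces of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Each chunk can then be summarized independently and the partial summaries joined into one result.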
## Text Enhancement Options
- Basic summarization
- Detailed expansion
- Question generation
- Executive summary creation
- Custom content generation
- NLP analysis
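Options such as detailed expansion, question generation, and executive summaries are typically driven by prompt templates handed to the generation model. A hypothetical sketch — the template wording and the `build_prompt` helper are illustrative and not taken from TextGenerator.py, which may route some options to dedicated models instead:

```python
# Hypothetical prompt templates for the enhancement options; the real
# tool may implement some options without prompting at all.
TEMPLATES = {
    "summary": "Summarize the following text:\n{text}",
    "expansion": "Expand on the following text in more detail:\n{text}",
    "questions": "Write study questions about the following text:\n{text}",
    "executive": "Write an executive summary of the following text:\n{text}",
}

def build_prompt(option, text):
    """Fill the chosen template; unknown options raise KeyError."""
    return TEMPLATES[option].format(text=text)
```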
## NLP Analysis Capabilities
- Sentiment analysis
- Named Entity Recognition (NER)
- Keyword extraction using TF-IDF
- Document classification
- Readability metrics
- Flesch-Kincaid score calculation
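TF-IDF ranks a word highly when it is frequent in one document but rare across the corpus. A self-contained sketch of the classic formulation — the production code presumably uses scikit-learn's `TfidfVectorizer`, which adds smoothing and normalization this stdlib version omits:

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=5):
    """Rank words in docs[doc_index] by TF-IDF against the whole corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in tokenized for word in set(doc))
    counts = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Words appearing in every document (e.g. "the") get an IDF of log(1) = 0 and fall to the bottom of the ranking.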
## AI Text Generation
- Custom prompt-based generation
- Adjustable output length
- Optional NLP analysis integration
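Prompt-based generation with adjustable length maps onto the `transformers` text-generation pipeline. A sketch — the `clamp_length` helper and its bounds are illustrative assumptions, and the demo at the bottom is opt-in because it downloads GPT-2 Medium (roughly 1.4 GB):

```python
import os

def clamp_length(requested, minimum=20, maximum=1024):
    """Keep a user-requested output length inside GPT-2's 1024-token window."""
    return max(minimum, min(int(requested), maximum))

# Opt-in demo: set RUN_GENERATION_DEMO=1 to actually download and run GPT-2.
if os.environ.get("RUN_GENERATION_DEMO"):
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2-medium")
    result = generator(
        "The key ideas of transformer models are",
        max_length=clamp_length(200),
        num_return_sequences=1,
    )
    print(result[0]["generated_text"])
```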
## System Requirements

- Python 3.8 or higher
- CUDA-compatible GPU (optional, for faster processing)
- 8GB RAM minimum (16GB recommended)
- 10GB free disk space for models
## Dependencies

```text
transformers>=4.30.0
torch>=2.0.0
gradio>=3.40.0
PyPDF2>=3.0.0
spacy>=3.5.0
nltk>=3.8.0
scikit-learn>=1.0.0
numpy>=1.24.0
```

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/SauRavRwT/TextGenerator.git
   cd TextGenerator
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Linux/Mac
   source venv/bin/activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Running Locally

```bash
python TextGenerator.py
```

The interface will be available at http://localhost:7860.

## Running on Google Colab

1. Create a new Colab notebook.
2. Install the required packages:

   ```bash
   !pip install gradio flask-ngrok PyPDF2
   ```

3. Copy the TextGenerator.py code into a code cell.
4. Execute the cell; Gradio will provide a public URL for accessing the interface.
## Models

The tool uses several pre-trained models:
- BART (facebook/bart-large-cnn) for summarization
- GPT-2 Medium for text generation
- DistilBERT for sentiment analysis
- BERT for named entity recognition
- BART Large MNLI for text classification
Models are automatically downloaded on first use.
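For reference, these models correspond to the following Hugging Face Hub identifiers. The sentiment and NER entries are the `transformers` pipeline defaults for DistilBERT and BERT and are assumed here; TextGenerator.py may pin different checkpoints:

```python
# Pipeline task -> Hugging Face model id, as used with transformers.pipeline().
# The sentiment and NER ids are the library defaults, assumed for this sketch.
MODELS = {
    "summarization": "facebook/bart-large-cnn",
    "text-generation": "gpt2-medium",
    "sentiment-analysis": "distilbert-base-uncased-finetuned-sst-2-english",
    "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "zero-shot-classification": "facebook/bart-large-mnli",
}
```

For example, `pipeline("summarization", model=MODELS["summarization"])` builds the summarizer; the first call downloads the weights.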
## Memory Requirements

- GPU Mode: ~6GB VRAM recommended
- CPU Mode: ~4GB RAM minimum
- Additional memory required for processing large PDFs
## Notes

- First run will download required models (~5GB total)
- GPU acceleration automatically enabled if available
- Processing time varies based on input length and selected features
- Large PDFs are automatically chunked for efficient processing
## Limitations

- Maximum input text length: 1024 tokens per chunk
- PDF processing limited by available RAM
- Some features may be slower without GPU acceleration
- Support for English language only
## Contributing

Feel free to open issues or submit pull requests for improvements and bug fixes.

