Intelligent medical assistant powered by LangChain, Pinecone, and Google Gemini
Production-ready medical chatbot combining Retrieval-Augmented Generation (RAG) with real-time streaming, intelligent query classification, and automated CI/CD deployment.
Demo video: Medical Chatbot (made with Clipchamp)
- Pattern-based classifier distinguishes medical queries from casual conversation
- Reduces API costs by roughly 40% by skipping retrieval for non-medical queries
- Sub-millisecond classification time
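The pattern-based classifier can be sketched as a compiled regex over medical keywords; the keyword list below is illustrative (the real patterns live in `src/utility.py` and may differ), but it shows why classification is sub-millisecond: one regex scan, no model call.

```python
import re

# Hypothetical keyword patterns; the project's actual list may differ.
MEDICAL_PATTERNS = re.compile(
    r"\b(symptom|diagnos\w*|treat\w*|medicin\w*|dosage|disease|"
    r"infection|fever|pain|prescription|side effects?)\b",
    re.IGNORECASE,
)

def is_medical_query(text: str) -> bool:
    """Return True if the query looks medical and should trigger retrieval."""
    return bool(MEDICAL_PATTERNS.search(text))

print(is_medical_query("What is the dosage of ibuprofen?"))  # True
print(is_medical_query("hi, how are you?"))                  # False
```

Casual messages short-circuit straight to the LLM (or a canned reply), which is where the cost savings come from.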
- Pinecone vector store with 384-dimensional embeddings
- Semantic search across medical document corpus
- Google Gemini 2.5 Flash for response generation
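At its core, the retrieval step ranks stored chunks by embedding similarity. The sketch below uses toy 4-dimensional vectors and a hard-coded "embedding" of the query; in the real system the vectors are 384-dimensional, stored in Pinecone, and produced by an embedding model, with Gemini generating the final answer from the retrieved chunks.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], corpus, k: int = 2) -> list[str]:
    """Rank (text, vector) pairs by similarity to the query vector."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy corpus; in the real system these are 384-dim vectors in Pinecone.
corpus = [
    ("Paracetamol relieves mild pain and fever.", [0.9, 0.1, 0.0, 0.0]),
    ("Insulin regulates blood glucose levels.",   [0.0, 0.9, 0.1, 0.0]),
    ("Docker packages apps into containers.",     [0.0, 0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.0, 0.0]  # pretend embedding of "What treats fever?"
print(top_k(query_vec, corpus, k=1))
# → ['Paracetamol relieves mild pain and fever.']
```

Pinecone performs this ranking server-side at scale; the retrieved chunks are then stuffed into the Gemini prompt.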
- Token-by-token response streaming
- Async implementation for non-blocking operations
- Smooth UI with animated typing cursor
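Token-by-token streaming boils down to an async generator the server consumes as chunks arrive. The names below are stand-ins (in the real app the tokens come from Gemini via LangChain, and FastAPI's `StreamingResponse` forwards them to the browser):

```python
import asyncio
from typing import AsyncIterator

async def fake_llm_tokens(answer: str) -> AsyncIterator[str]:
    """Stand-in for an LLM's async token stream."""
    for token in answer.split():
        await asyncio.sleep(0)  # yield control; real tokens arrive over the network
        yield token + " "

async def stream_response(question: str) -> str:
    # In FastAPI, this async generator would be wrapped in a
    # StreamingResponse so the browser renders tokens as they arrive.
    chunks = []
    async for token in fake_llm_tokens("Drink fluids and rest."):
        chunks.append(token)  # the UI appends each chunk to the chat bubble
    return "".join(chunks).strip()

print(asyncio.run(stream_response("How do I treat a mild cold?")))
# → Drink fluids and rest.
```

Because the handler is async, the event loop stays free to serve other sessions while tokens trickle in.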
- Session-based context retention (5 message pairs)
- Automatic cleanup of old messages prevents unbounded memory growth
- Fresh session on page reload
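A bounded history like this can be kept with a `collections.deque` whose `maxlen` caps it at 5 message pairs (10 messages), so the oldest turns are dropped automatically. This is a sketch under that assumption; the actual session handling in `app.py` may differ.

```python
from collections import deque

MAX_PAIRS = 5  # keep the last 5 user/assistant pairs

class SessionMemory:
    """Per-session chat history; old messages fall off automatically."""

    def __init__(self) -> None:
        # maxlen = pairs * 2 because each turn stores two messages
        self.messages: deque[tuple[str, str]] = deque(maxlen=MAX_PAIRS * 2)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        self.messages.append(("user", user_msg))
        self.messages.append(("assistant", assistant_msg))

    def history(self) -> list[tuple[str, str]]:
        return list(self.messages)

memory = SessionMemory()
for i in range(7):  # 7 turns, but only the last 5 pairs survive
    memory.add_turn(f"question {i}", f"answer {i}")

print(len(memory.history()))  # 10 messages = 5 pairs
print(memory.history()[0])    # oldest surviving message: ('user', 'question 2')
```

A fresh `SessionMemory` per page load gives the "fresh session on reload" behavior with no explicit cleanup code.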
- Docker containerization
- GitHub Actions CI/CD pipeline
- Automated deployment to AWS ECR + EC2
- Health checks and auto-restart
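A minimal Dockerfile for a FastAPI app like this might look as follows. The file names, port, and health-check endpoint are assumptions for illustration; the project's actual Dockerfile may differ.

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8080

# Liveness probe without curl (not present in slim images);
# assumes the app answers on / at port 8080
HEALTHCHECK --interval=30s --timeout=5s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/')" || exit 1

# uvicorn serves the FastAPI app defined in app.py
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```

Running the container with `docker run --restart unless-stopped` (or an equivalent EC2 service setting) gives the auto-restart behavior.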
- Backend: FastAPI, LangChain, Google Gemini, Pinecone
- Frontend: Jinja2, Tailwind CSS
- DevOps: Docker, GitHub Actions, AWS (EC2, ECR)
```
├── src/
│   ├── config.py          # Configuration
│   ├── helper.py          # Data processing
│   ├── prompt.py          # LLM prompts
│   └── utility.py         # Classifier & streaming
├── templates/
│   └── index.html         # UI
├── app.py                 # FastAPI app
├── store_index.py         # Vector store setup
├── Dockerfile             # Container config
└── .github/workflows/     # CI/CD pipeline
```
Want to run this project? The setup guide covers local setup, Docker deployment, AWS deployment, and CI/CD configuration.
Harsh Patel
📧 code.by.hp@gmail.com
🔗 GitHub • LinkedIn
⭐ Star this repo if you find it useful


