Build a fast MVP that allows internal team members to search for engineers using natural language queries, such as:
"Looking for backend engineers with NestJS and AWS, 3+ years of experience in fintech."
```
┌────────────────────────────┐
│      Next.js Frontend      │
│  - Query input             │
│  - Results list            │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│      FastAPI Backend       │
│  - /ingest endpoint        │
│  - /search endpoint        │
└────────────┬───────────────┘
             │
     ┌───────┴───────────────────────┐
     ▼                               ▼
[Greenhouse API]             [Pinecone Vector DB]
 - Fetch resumes              - Store semantic embeddings
 - PDFs + metadata            - Search similar profiles
     ▼                               ▲
[OpenAI (GPT-4)]             [OpenAI Embeddings]
 - Parse resume text to JSON  - Convert query into vector
 - Extract skills, years, domains
     │
     ▼
[PostgreSQL Database]
 - Store resume text + structured metadata
```
| Layer | Tool | Why |
|---|---|---|
| Frontend | Next.js | Fast React-based framework, easy to deploy |
| Backend | FastAPI | Simple, async Python API layer |
| Resume Source | Greenhouse API | Resume and candidate data |
| Resume Parsing | pdfplumber + OpenAI GPT-4 | Easy, accurate text extraction and structuring |
| Embedding | OpenAI text-embedding-ada-002 | Reliable and powerful for semantic search |
| Vector DB | Pinecone | Hosted, scalable vector database |
| Relational DB | PostgreSQL | Stores resume text and structured data |
| Deployment | Docker | For clean reproducible local/in-cloud deployment |
- Python 3.8+
- Node.js 18+
- Docker and Docker Compose
- PostgreSQL
- Pinecone account
- OpenAI API key
- Greenhouse API access
- Clone the repository
- Copy `.env.example` to `.env` and fill in your API keys
- Run the setup scripts
```bash
# Backend setup
cd backend
pip install -r requirements.txt

# Frontend setup
cd frontend
npm install

# Database setup
docker-compose up -d postgres
```

```bash
# Start backend
cd backend
uvicorn main:app --reload

# Start frontend
cd frontend
npm run dev
```

```
pipelinepal/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── api/             # API routes
│   │   ├── core/            # Configuration and utilities
│   │   ├── models/          # Database models
│   │   ├── services/        # Business logic
│   │   └── utils/           # Helper functions
│   ├── requirements.txt
│   └── main.py
├── frontend/                # Next.js frontend
│   ├── components/          # React components
│   ├── pages/               # Next.js pages
│   ├── styles/              # CSS styles
│   └── package.json
├── docker-compose.yml       # Database services
└── README.md
```
- Pull resumes from Greenhouse API (PDFs + candidate metadata)
- Convert PDFs to text using `pdfplumber`
- Use OpenAI (GPT-4) to extract skills, job titles, companies, domains, and years of experience
- Save structured data + full text in PostgreSQL
- Generate embeddings using OpenAI Embeddings
- Store embeddings in Pinecone for semantic search
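One brittle spot in the pipeline above is step 3: GPT-4 returns JSON whose fields may be missing or inconsistently typed. A minimal sketch of a normalization step before the PostgreSQL write, assuming illustrative field names (`skills`, `domains`, `years_of_experience`) rather than the project's actual schema:

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedResume:
    """Typed record for a resume parsed by GPT-4 (field names are assumptions)."""
    candidate_id: str
    skills: list
    domains: list
    years_of_experience: float


def normalize_resume(candidate_id: str, raw_json: str) -> ParsedResume:
    """Validate GPT-4's JSON output, lowercasing terms and defaulting missing fields."""
    data = json.loads(raw_json)
    return ParsedResume(
        candidate_id=candidate_id,
        skills=[s.strip().lower() for s in data.get("skills", [])],
        domains=[d.strip().lower() for d in data.get("domains", [])],
        # GPT may return numbers as strings ("3"); coerce defensively
        years_of_experience=float(data.get("years_of_experience", 0)),
    )
```

Lowercasing skills and domains at ingest time keeps later filtering case-insensitive without extra query logic.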
- User submits natural language query via UI
- Query is embedded using OpenAI
- Perform vector search in Pinecone
- Retrieve top matching candidates and metadata from PostgreSQL
- Return results to frontend for display
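The vector-search step (3) can be sketched with an in-memory stand-in for Pinecone: the production index is hosted and the vectors come from OpenAI, but the ranking math — cosine similarity, sorted descending, top-k — is the same:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, index, k=5):
    """Rank candidates by similarity. `index` maps candidate_id -> embedding."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The `(candidate_id, score)` pairs returned here correspond to what Pinecone's query response provides; the IDs are then used to pull full metadata from PostgreSQL.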
- Resume ingestion from Greenhouse API
- GPT-4-based resume parsing
- Embedding pipeline using OpenAI
- PostgreSQL storage for resumes and metadata
- Pinecone vector storage for semantic search
- Search endpoint with natural language queries
- Next.js frontend with search interface
- Docker deployment setup
- `POST /ingest` - Ingest resumes from Greenhouse
- `GET /search` - Search candidates with natural language
- `GET /candidates` - List all candidates
- `GET /health` - Health check
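A hedged sketch of what the search endpoint does internally, with the OpenAI and Pinecone calls injected as plain functions so the flow reads without network access. The names `embed_fn`, `search_fn`, and `fetch_fn` are illustrative, not the project's actual interfaces:

```python
def search_candidates(query: str, embed_fn, search_fn, fetch_fn, k: int = 5):
    """Embed the query, run a vector search, then hydrate results from Postgres.

    embed_fn:  str -> vector            (OpenAI embedding call in production)
    search_fn: (vector, k) -> pairs     (Pinecone top-k query in production)
    fetch_fn:  candidate_id -> metadata (PostgreSQL lookup in production)
    """
    query_vec = embed_fn(query)
    matches = search_fn(query_vec, k)  # list of (candidate_id, score)
    return [
        {"candidate_id": cid, "score": score, **fetch_fn(cid)}
        for cid, score in matches
    ]
```

Keeping the external calls as parameters also makes the handler trivially unit-testable with stubs, which matters for a one-to-two-week MVP timeline.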
| Week | Focus |
|---|---|
| Week 1 | Resume ingestion, parsing, storage, and embedding |
| Week 2 | Search pipeline, frontend UI, filters, testing, and deployment |
- Using LLMs instead of rule-based NLP allows for rapid prototyping and handles messy resume formats
- MVP favors speed and quality over scale
- Future improvements: add filters, improve ranking, move to local LLM if needed