This repository contains my submission for the Voice AI Startup Assignment.
The project is built in Google Colab and analyzes sales call recordings to extract useful insights.
- Talk-time ratio (percentage each person spoke)
- Number of questions asked
- Longest monologue duration
- Call sentiment (positive / negative / neutral)
- One actionable insight for improvement
- Bonus: Speaker diarization (identify Sales Rep vs Customer)
- Python
- Google Colab
- OpenAI Whisper: Speech-to-text
- HuggingFace Transformers: Sentiment analysis
- Pyannote / WhisperX: Speaker diarization
- yt-dlp: Extract audio from YouTube
My approach uses speech-to-text + text analysis.
I first extract the call audio and transcribe it with Whisper, which copes reasonably well with noisy or low-quality audio.
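A minimal sketch of the transcription step, assuming the openai-whisper package and ffmpeg are installed. The function names `transcribe` and `to_rows` and the choice of the `base` model are illustrative assumptions, not necessarily what the notebook uses.

```python
from typing import Any, Dict, List


def transcribe(audio_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Transcribe an audio file with openai-whisper.

    The import is done lazily so that the helper below can be used
    (and tested) on machines where whisper is not installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    # The result is a dict with the full "text" and a list of "segments".
    return model.transcribe(audio_path)


def to_rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the fields the later analysis needs from each segment."""
    return [
        {
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            "text": seg["text"].strip(),
        }
        for seg in result.get("segments", [])
    ]
```

In Colab this would be called roughly as `rows = to_rows(transcribe("call.wav"))`, where `call.wav` is a placeholder file name.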
Using the segment timestamps, I calculate the talk-time ratio and the longest monologue. Questions are counted by detecting question marks and interrogative words. Sentiment is classified with a HuggingFace Transformers model. Finally, I generate an actionable insight to improve sales interactions.
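The timestamp-based metrics and the question count can be sketched as follows. This is a rough illustration under stated assumptions: each segment is a dict with hypothetical keys `speaker`, `start`, `end`, and `text`, the interrogative word list is my own, and a monologue is taken to be a run of consecutive segments by one speaker (pauses inside the run are included).

```python
import re
from collections import defaultdict

# Assumed word list; a sentence starting with one of these counts as a question.
INTERROGATIVES = {"who", "what", "when", "where", "why", "how", "which",
                  "can", "could", "would", "does", "did"}


def talk_time_ratio(segments):
    """Percentage of total speaking time per speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {spk: 100.0 * t / grand_total for spk, t in totals.items()}


def longest_monologue(segments):
    """Longest uninterrupted run by a single speaker, in seconds."""
    best = 0.0
    run_speaker, run_start = None, 0.0
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["speaker"] != run_speaker:
            # A different speaker starts a new run.
            run_speaker, run_start = seg["speaker"], seg["start"]
        best = max(best, seg["end"] - run_start)
    return best


def count_questions(segments):
    """Count sentences that end in '?' or begin with an interrogative word."""
    count = 0
    for seg in segments:
        for sentence in re.split(r"(?<=[.!?])\s+", seg["text"].strip()):
            if not sentence:
                continue
            first_word = re.match(r"[A-Za-z']+", sentence)
            starts_like_question = (
                first_word is not None
                and first_word.group().lower() in INTERROGATIVES
            )
            if sentence.endswith("?") or starts_like_question:
                count += 1
    return count
```

The word-list check catches questions that Whisper transcribes without a question mark, at the cost of some false positives (for example, "What a great demo.").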
For the bonus task, I used speaker diarization with Pyannote/WhisperX to differentiate between the sales rep and the customer.
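One common way to combine a transcript with diarization output is to give each transcript segment the speaker whose turn overlaps it most in time. The sketch below shows that idea in plain Python. The dict shapes are hypothetical, and the rule "the first speaker is the sales rep" is purely an assumption for illustration, not something the notebook is known to do.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, turns):
    """Label each transcript segment with the most-overlapping speaker turn."""
    labeled = []
    for seg in transcript:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


def name_roles(labeled):
    """Map raw labels to roles. ASSUMPTION: whoever speaks first is the rep."""
    roles = {}
    for seg in sorted(labeled, key=lambda s: s["start"]):
        spk = seg["speaker"]
        if spk != "unknown" and spk not in roles:
            roles[spk] = "Sales Rep" if not roles else "Customer"
    return [{**seg, "speaker": roles.get(seg["speaker"], "unknown")}
            for seg in labeled]
```

Segments that overlap no diarization turn stay labeled `unknown`, so they can be excluded from the talk-time totals instead of being credited to the wrong person.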
The system runs in under 30 seconds on the free Colab tier.
Call_Quality_Analyzer
┣ Call_Quality_Analyzer.ipynb   # Main Colab notebook
┗ README.md                     # Project documentation
- Open the notebook in Google Colab
- Run all cells in order (install → import → download audio → transcription → analysis)
- Results will be printed at the end
- Assignment test file: YouTube Call Recording
Vimal Anand