As LLM ecosystems grow (GPT, Claude, Llama, Mixtral, etc.), choosing the right model for the right prompt becomes an optimization problem.
Querying all models is expensive; querying only one model is risky.
This project contains two parts:
- An academic literature review, summarizing current research on LLM routing and decision-making under uncertainty (see `/report`)
- An implementation and evaluation of a promising routing strategy (see `/src`)
Our research pointed towards multi-armed bandits (MAB) as a promising solution, since they thrive in dynamic, exploratory
environments. Lin et al. [1] proposed a neural MAB that determines its routing decisions through an ensemble of two neural networks.
We tested their approach since their neural bandits offer:
- Fast online learning
- Low feedback requirements
- Adaptation to prompt distributions
- Scalability to many LLMs
This project demonstrates that even simple neural routing models learn robust selection strategies that outperform a random baseline
and achieve a higher average correctness score than always querying a single strong model.
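To make the idea concrete, the following is a minimal sketch of neural bandit routing: one small reward-prediction network per LLM arm, greedy selection on the predicted reward, and online gradient updates from observed feedback. It is an illustration of the general technique only, not a reimplementation of Lin et al.'s two-network ensemble; all class names, hyperparameters, and the toy reward function are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyNet:
    """One-hidden-layer MLP that predicts the expected reward of one LLM arm."""
    def __init__(self, dim, hidden=8, lr=0.02):
        self.w1 = rng.normal(0, 0.5, (dim, hidden))
        self.w2 = rng.normal(0, 0.5, hidden)
        self.lr = lr

    def predict(self, x):
        self.h = np.tanh(x @ self.w1)  # cache activations for the update step
        return float(self.h @ self.w2)

    def update(self, x, reward):
        err = self.predict(x) - reward  # squared-error gradient
        grad_w1 = err * np.outer(x, (1.0 - self.h ** 2) * self.w2)
        self.w2 -= self.lr * err * self.h
        self.w1 -= self.lr * grad_w1

class NeuralBanditRouter:
    """Route each prompt embedding to the arm with the highest predicted reward."""
    def __init__(self, n_arms, dim):
        self.nets = [TinyNet(dim) for _ in range(n_arms)]

    def select(self, x):
        return int(np.argmax([net.predict(x) for net in self.nets]))

    def learn(self, x, arm, reward):
        self.nets[arm].update(x, reward)

# Toy online loop: arm 0 is "best" when the first prompt feature is positive,
# arm 1 otherwise; a little forced exploration keeps both networks trained.
router = NeuralBanditRouter(n_arms=2, dim=4)
for _ in range(3000):
    x = rng.normal(size=4)
    arm = router.select(x) if rng.random() > 0.1 else int(rng.integers(2))
    router.learn(x, arm, float((arm == 0) == (x[0] > 0)))
```

In a real deployment, `x` would be a prompt embedding and the reward a correctness or preference signal for the chosen LLM's answer; the online update is what gives the approach its fast learning and low feedback requirements.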
To further improve performance in static or semi-static settings, we augment their method with an initial ε-greedy exploration phase, which gives the router broader coverage of the action space before it commits to its learned preferences.
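One way such an exploration schedule could look is sketched below: uniform exploration during a warm-up phase, then ε-greedy selection. The warm-up length and ε value are illustrative assumptions, not the project's actual settings.

```python
import random

class EpsilonGreedySchedule:
    """Uniform exploration during a warm-up phase, then epsilon-greedy selection."""
    def __init__(self, n_arms, warmup=500, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.warmup = warmup
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.step = 0

    def select(self, greedy_arm):
        self.step += 1
        if self.step <= self.warmup or self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore a random arm
        return greedy_arm                           # exploit the router's choice
```

After the warm-up, the router mostly follows its learned greedy choice while still occasionally sampling other arms, so its reward estimates stay calibrated across the whole action space.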
[1] S. Lin, Y. Yao, P. Zhang, H. Y. Noh, and C. Joe-Wong, “A neural-based bandit approach to mobile crowdsourcing,” in Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications, ser. HotMobile ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 15–21. [Online]. Available: https://doi.org/10.1145/3508396.3512886
Before starting, ensure you have the following on your machine:
- GitHub repository: Clone the repository to your machine.
- Python: The project was tested with Python 3.12.
- Navigate to the root of the repository.
- Create a new virtual environment (name it `.venv`) using the `requirements.txt` file and `python3.12`.
- Execute one of the provided training scripts:
- Basic online training: compares a random LLM routing strategy with the neural MAB
- Online training with ε-greedy exploration: compares random routing, standard Neural MAB, and Neural MAB augmented with an initial exploration phase
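Assuming a Unix-like shell, the setup steps above can be sketched roughly as follows. The repository URL and the training script file names are placeholders, not the actual names from this repository; substitute the real ones.

```shell
# Clone and enter the repository (URL and directory are placeholders)
git clone <repository-url>
cd <repository-directory>

# Create and activate the virtual environment with Python 3.12
python3.12 -m venv .venv
source .venv/bin/activate

# Install the pinned dependencies
pip install -r requirements.txt

# Run one of the provided training scripts (file names are placeholders)
python <basic_online_training>.py
python <epsilon_greedy_training>.py
```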