LLM-Routing between Expert Agents in Multi-Agent Systems (MAS)


About

As LLM ecosystems grow (GPT, Claude, Llama, Mixtral, etc.), choosing the right model for the right prompt becomes an optimization problem. Querying all models is expensive; querying only one model is risky.

This project contains two parts:

  1. An academic literature review, summarizing current research on LLM routing and decision-making under uncertainty (see /report)
  2. An implementation and evaluation of a promising routing strategy (see /src)

Our literature review pointed towards multi-armed bandits (MABs) as a promising solution, since they thrive in dynamic, exploratory environments. Lin et al. [1] proposed a neural MAB that makes its routing decisions through an ensemble of two neural networks.
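The two-network idea can be sketched as follows. This is a minimal illustration in the spirit of [1], not the authors' exact architecture: we assume one network (`exploit`) regresses the expected reward of each LLM given a prompt embedding, while a second (`explore`) learns an optimism bonus from the exploitation network's residual error. All names and hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small feed-forward net mapping a prompt embedding to one score per LLM."""
    def __init__(self, dim_in: int, n_arms: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, hidden), nn.ReLU(), nn.Linear(hidden, n_arms)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class NeuralBanditRouter:
    """Ensemble of two networks: `exploit` estimates the expected reward of
    each LLM, `explore` estimates how much reward might still be hiding there."""

    def __init__(self, dim_in: int, n_arms: int, lr: float = 1e-3):
        self.exploit = MLP(dim_in, n_arms)
        self.explore = MLP(dim_in, n_arms)
        self.opt = torch.optim.Adam(
            list(self.exploit.parameters()) + list(self.explore.parameters()), lr=lr
        )

    def select(self, x: torch.Tensor) -> int:
        """Route the prompt embedding `x` to the LLM with the highest
        exploitation-plus-exploration score."""
        with torch.no_grad():
            scores = self.exploit(x) + self.explore(x)
        return int(scores.argmax())

    def update(self, x: torch.Tensor, arm: int, reward: float) -> None:
        """Online update after observing the reward of the chosen LLM."""
        pred = self.exploit(x)[arm]
        bonus = self.explore(x)[arm]
        # The exploitation net regresses onto the observed reward; the
        # exploration net regresses onto the exploitation net's absolute
        # residual, so poorly understood arms keep a high bonus.
        loss = (pred - reward) ** 2 + (bonus - (pred.detach() - reward).abs()) ** 2
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

In an online loop, each incoming prompt is embedded (e.g. with a sentence encoder), `select` picks an LLM, the answer is scored for correctness, and that score is fed back through `update`.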

We tested their approach because neural bandits offer:

  • Fast online learning
  • Low feedback requirements
  • Adaptation to prompt distributions
  • Scalability to many LLMs

This project demonstrates that even simple neural routing models learn robust selection strategies that outperform random baselines and achieve a higher average correctness score than always querying a single strong model.

To further improve performance in static or semi-static settings, we augment their method with ε-greedy exploration, which lets the router sample the action space more broadly before committing to its learned policy (see the sketch below).
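A minimal sketch of the ε-greedy augmentation around the router above; the warm-up length and residual ε are illustrative placeholders, not the values used in our experiments:

```python
import random
import torch

def select_epsilon_greedy(router, x: torch.Tensor, n_arms: int, t: int,
                          warmup: int = 500, eps: float = 0.05) -> int:
    """Wrap the bandit's choice in epsilon-greedy exploration.

    During the initial warm-up phase every LLM is sampled uniformly at
    random, giving the router broad coverage of the action space; after
    that it still explores with a small residual probability `eps`.
    """
    epsilon = 1.0 if t < warmup else eps
    if random.random() < epsilon:
        return random.randrange(n_arms)  # explore: pick a random LLM
    return router.select(x)              # exploit: trust the learned scores
```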


[1] S. Lin, Y. Yao, P. Zhang, H. Y. Noh, and C. Joe-Wong, “A neural-based bandit approach to mobile crowdsourcing,” in Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications, ser. HotMobile ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 15–21. [Online]. Available: https://doi.org/10.1145/3508396.3512886


Prerequisites

Before starting, make sure you have the following on your machine:

  • Repository: clone this repository to your machine.
  • Python: the project was tested with Python 3.12.

Setup & Run

  1. Navigate to the root of the repository.
  2. Create a new virtual environment named .venv with Python 3.12 and install the dependencies from requirements.txt (see the snippet below).
  3. Execute one of the provided training scripts:
    • Basic online training: compares a random LLM routing strategy with the neural MAB.
    • Online training with ε-greedy exploration: compares random routing, the standard neural MAB, and the neural MAB augmented with an initial exploration phase.
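A typical setup, assuming a Unix-like shell (these are standard venv/pip commands, not scripts shipped with the repository):

```bash
# create and activate a Python 3.12 virtual environment named .venv
python3.12 -m venv .venv
source .venv/bin/activate

# install the pinned dependencies
pip install -r requirements.txt
```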
