This repository contains the findings and methodology for the research paper:
Agent-Based GPT: A Comparative Study of MetaGPT and AutoGen
This project provides a detailed analysis and comparison of two open-source multi-agent frameworks: MetaGPT and AutoGen.
We investigated their capabilities by integrating them with open-source Large Language Models (LLMs) like CodeLLaMA and Mistral, and compared their performance against larger models like GPT-4.
The study focuses on:
- Software requirements and installation procedures
- Output consistency across frameworks
- Practical use cases such as automated game generation
The main goal was to determine whether open-source models provide a viable alternative to expensive, proprietary AI models for complex tasks.
Key Findings

Performance with Open-Source LLMs
- Both MetaGPT and AutoGen struggled to complete complex code-generation tasks (e.g., creating a Snake game).
- Issues included timeout errors, repetitive text instead of code, and memory allocation warnings.
Performance with GPT-4
- With GPT-4 (via Azure OpenAI Service), MetaGPT successfully and consistently generated complete playable games (Snake, 2048, Brick Breaker).
Resource Requirements
- MetaGPT is more resource-intensive, ideally requiring 32 GB of RAM and a powerful processor for large local models.
- AutoGen is lighter but still needs significant resources.
Framework Approach
- MetaGPT: Structured meta-programming workflow modeled after human software engineering (with SOPs).
- AutoGen: Conversation-based multi-agent workflows with customizable agents.
Requirements

- Python: 3.9 or higher (tested with 3.11.4)
- RAM: 16 GB minimum; 32 GB recommended for large local models
- Processor: A strong CPU is recommended for MetaGPT
- OS: Running Ollama on Windows requires WSL
Installation

Create a conda environment for each framework:

```bash
conda create -n metagpt python=3.11.4
conda activate metagpt

conda create -n autogen python=3.11.4
conda activate autogen
```
Install the frameworks in their respective environments:

```bash
pip install metagpt
pip install pyautogen
```
Install Ollama, start the server, and pull the models:

```bash
curl https://ollama.ai/install.sh | sh
ollama serve
ollama pull codellama
ollama pull mistral
```
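Before wiring Ollama into either framework, it can save time to confirm the server actually answers. The snippet below is a minimal sanity check (not from the report) against Ollama's /api/generate endpoint; it assumes the default port 11434 and the third-party requests package:

```python
# Minimal sanity check: ask the local Ollama server for a completion.
# Assumes `ollama serve` is running on the default port (11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codellama",
        "prompt": "Write a one-line Python hello world.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```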
Expose the model through an OpenAI-compatible endpoint with LiteLLM:

```bash
pip install litellm
pip install async_generator  # if needed
litellm --model ollama/codellama
```
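Because the LiteLLM proxy speaks the OpenAI chat-completions protocol, any OpenAI client can be used to verify it. This is a sketch, not part of the original setup, assuming the proxy listens on port 8000 as configured below (the dummy key mirrors the fake key used there):

```python
# Hedged sketch: query the LiteLLM proxy with the openai v1 client.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000", api_key="NULL")  # dummy key
reply = client.chat.completions.create(
    model="codellama",
    messages=[{"role": "user", "content": "Reply with the word 'ready'."}],
)
print(reply.choices[0].message.content)
```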
Point MetaGPT at the local proxy by editing config.yaml:

```yaml
OPENAI_API_KEY: "sk-fake-key"
OPENAI_API_BASE: "http://0.0.0.0:8000"
```
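With the proxy running and config.yaml in place, a generation run can be launched from the shell. The exact entry point depends on the MetaGPT version installed: recent pip releases provide a `metagpt` console command (e.g., `metagpt "write a CLI snake game based on pygame"`), while older source checkouts used `python startup.py` with the same prompt.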
For AutoGen, the model endpoint is configured in Python instead:

```python
import autogen

# Route the Coder agent through the LiteLLM proxy started above.
config_list_codellama = [
    {
        "base_url": "http://0.0.0.0:8000",
        "api_key": "NULL",
        "model": "codellama",
    }
]

coder = autogen.AssistantAgent(
    name="Coder",
    llm_config={"config_list": config_list_codellama},
)
```
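The report stops at the agent definition; a minimal driver to actually exercise the Coder agent might look like the sketch below. The user-proxy name, reply limit, and working directory are illustrative assumptions, not taken from the paper:

```python
# Hypothetical driver: pair the Coder with an automated user proxy.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",      # no human in the loop
    max_consecutive_auto_reply=5,  # cap the back-and-forth
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Kick off the conversation with the same task used in the experiments.
user_proxy.initiate_chat(
    coder,
    message="write a CLI snake game based on pygame",
)
```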
Experiments

This project explored how different AI frameworks and models perform at end-to-end game generation from a single prompt.
Open-Source LLMs (CodeLLaMA, Mistral)

- Prompt: "write a CLI snake game based on pygame"
- Outcome: ❌ Failure
- Details:
  - Frameworks produced repetitive text outputs (product goals, user stories, competitive analysis) instead of executable code.
  - Long runtimes with no usable results.
  - The Ollama server encountered memory allocation errors even on a 16 GB RAM system.
GPT-4 (via Azure OpenAI Service)

- Prompts: "Create a complete 2048 game based on pygame", "Write a brick breaker based on pygame"
- Outcome: ✅ Success
- Details:
  - MetaGPT, configured with GPT-4, generated full, modular, and playable source code.
  - Games included Snake, 2048, and Brick Breaker, each runnable directly after code generation.
  - Output quality was robust, with structured files, comments, and reusable functions.
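The report does not reproduce the Azure OpenAI configuration used for these runs; for AutoGen it would look roughly like the sketch below, where the deployment name, endpoint, key, and API version are placeholders, and the field names follow pyautogen 0.2 (older releases used api_base instead of base_url):

```python
# Hedged sketch of an Azure OpenAI config list for AutoGen.
import autogen

config_list_gpt4 = [
    {
        "model": "gpt-4",                                    # Azure deployment name
        "api_key": "<your-azure-openai-key>",                # placeholder
        "base_url": "https://<resource>.openai.azure.com/",  # placeholder endpoint
        "api_type": "azure",
        "api_version": "2023-07-01-preview",  # assumption: a version current at the time
    }
]

coder_gpt4 = autogen.AssistantAgent(
    name="Coder",
    llm_config={"config_list": config_list_gpt4},
)
```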
Key Takeaways

- Open-source LLMs (CodeLLaMA, Mistral) struggled with complex multi-step generation such as building playable games.
- GPT-4 demonstrated reliable code synthesis and framework orchestration, producing end-to-end working solutions.
For a complete and detailed overview of the methodology, analysis, and results, please read the full PDF report.