This project is a proof-of-concept (PoC) for an AI-powered tool that demonstrates function calling with Large Language Models (LLMs). It aims to help researchers and professionals analyze biodiversity datasets, and currently covers mammal occurrences only. You can ask questions in natural language, and the tool returns summarized results and suggests chart visualizations whenever possible.
Demo video: Zoogist_demo.mp4
- Submit a question using the text input or select a demo query.
- After you submit, the LLM-powered assistant responds and suggests charts to visualize the answer.
- Use the container in the sidebar to plot charts by choosing the x-axis, y-axis, chart type, and color from the drop-down menus.
- You can also select values from the habitat column and generate specific charts.
Here's a breakdown of how Zoogist Insights processes your queries and delivers data insights:
utils.py:
- First, it loads and preprocesses the dataset through the `load_and_preprocess_data` function.
- Second, it includes the `execute_sql_query` function (defined as a tool), which executes the SQL queries generated by the LLM and retrieves data from the dataset. It runs the queries against an in-memory (temporary) `sqlite3` database and returns the results as a list of dictionaries.
- Third, it sets up the LLM agent, powered by the `llama-3.1-8b-instant` model through the Groq API, for data analysis and summarization, with specific instructions about the data. All of this is done through the LangChain framework.
  - This language model was chosen for its balance between speed and text generation quality, and it worked well during PoC development for summarizing insights.
- The `execute_sql_query` function serves as a tool for the LLM agent to query the database: the agent decides when to call the function and passes the required query string in the specified format.
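The tool flow described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `utils.py`: the `mammals` table name, the sample rows, the `load_rows_into_memory` helper, and the shape of the `tool_call` dictionary are all assumptions made for the example, and the real project routes the call through LangChain and the Groq API rather than a hand-written dispatch.

```python
import sqlite3


def load_rows_into_memory(rows, table="mammals"):
    """Create a temporary in-memory SQLite database holding the rows.

    Hypothetical helper: the real app builds its table from the CSV dataset.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
    columns = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_sql})")
    conn.executemany(
        f"INSERT INTO {table} ({col_sql}) VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    conn.commit()
    return conn


def execute_sql_query(conn, query):
    """Run one SQL statement and return the result as a list of dictionaries."""
    try:
        cursor = conn.execute(query)
        return [dict(row) for row in cursor.fetchall()]
    except sqlite3.Error as exc:
        # Hand the error back as data so the caller (or agent) can react to it.
        return [{"error": str(exc)}]


# Made-up sample rows standing in for the preprocessed dataset.
sample_rows = [
    {"species": "Panthera tigris", "habitat": "forest", "individuals": 3},
    {"species": "Axis axis", "habitat": "grassland", "individuals": 12},
    {"species": "Elephas maximus", "habitat": "forest", "individuals": 5},
]
conn = load_rows_into_memory(sample_rows)

# Assumed shape of a tool call: the model names a tool and supplies arguments.
tool_call = {
    "name": "execute_sql_query",
    "args": {
        "query": "SELECT habitat, SUM(individuals) AS total "
                 "FROM mammals GROUP BY habitat ORDER BY habitat"
    },
}

# Registry mapping tool names to callables; the agent framework normally owns this.
tools = {"execute_sql_query": lambda query: execute_sql_query(conn, query)}
result = tools[tool_call["name"]](**tool_call["args"])
print(result)
```

Returning errors as ordinary rows, rather than raising, is one possible design choice here: it gives an agent the chance to read the message and retry with a corrected query.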
app.py:
- The main script demonstrates how the LLM agent works through a user-friendly web interface using Streamlit.
- It loads the `01-mammals-data-final.csv` dataset and uses the functions from `utils.py` to perform data analysis.
  - The LLM agent can handle your biodiversity-related questions and provide suggestions for plotting relevant charts.
- Chart plotting is handled separately, so you can tweak parameters and explore the data further.
- Changing the query using the `Run Query` button clears the stored charts and generates a new response based on the updated question.
- The map is displayed separately with geographical points, where the tooltips show both place and species names.
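The `Run Query` behavior described above (stored charts are cleared and a new response is generated) can be illustrated without any web framework. The sketch below uses a plain dictionary in place of Streamlit's session state; `handle_run_query`, `add_chart`, the state keys, and the stand-in agent are hypothetical names chosen for the example, not taken from `app.py`.

```python
def handle_run_query(state, question, answer_fn):
    """Simulate a 'Run Query' click: drop old charts, store the new response."""
    state["charts"] = []  # charts belong to the previous question, so reset them
    state["question"] = question
    state["response"] = answer_fn(question)
    return state


def add_chart(state, chart_spec):
    """Remember a chart the user plotted for the current question."""
    state.setdefault("charts", []).append(chart_spec)


def fake_agent(question):
    """Stand-in for the LLM agent; it only echoes the question."""
    return f"(summary for: {question})"


state = {}  # plays the role of the app's per-session storage

handle_run_query(state, "Which habitats have the most species?", fake_agent)
add_chart(state, {"type": "bar", "x": "habitat", "y": "species_count"})

# A new query wipes the chart stored for the previous one.
handle_run_query(state, "How many records are there per year?", fake_agent)
print(state["charts"])
```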
- Streamlit
- LangChain
- Groq API
- Pandas
- Plotly
This PoC is a demonstration of the potential of AI in biodiversity data analysis and is not intended to be a fully functional or specialized AI tool. It should not be used for professional analysis or critical scientific decisions without proper validation. This tool may have inaccuracies, biases, or limitations based on underlying language model capabilities and data quality. The app is a basic implementation and can be further improved by implementing advanced analytics and agentic AI techniques.
- The original dataset was taken from Zenodo, where it is published under the NCF public repository.
- Thanks to Yuichiro's Streamlit Theme Editor, which helped me find a suitable theme for the app :)