Skip to content

ShruAgarwal/Zoogist-Insights

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zoogist Insights 🐘🦦

About 🎯

This project is a proof-of-concept (PoC) for an AI-powered tool designed to demonstrate the capabilities of function calling using Large Language Models (LLMs). It aims to assist researchers and professionals in analyzing biodiversity datasets, specifically focusing on mammal occurrences (as of now). You can ask questions in natural language, and the tool will provide summarized results and suggest chart visualizations whenever possible.

Demo 🕹

Streamlit App

Zoogist_demo.mp4

How To Use 👀

  1. Submit a question using the text input or select a demo query.
  2. After submitting, the LLM powered assistant will give a response, and also suggest charts to visualise it.
  3. Use the container in the sidebar to plot charts through given drop-down selections for x, y axes, chart type, and color.
  4. You can also select values from the habitat column and generate specific charts.

Behind the Scenes ⚙

Here's a breakdown of how Zoogist Insights processes your queries and delivers data insights:

  1. utils.py:
  • First, it loads and preprocesses the dataset through the load_and_preprocess_data function.
  • Second, it includes the execute_sql_query function (defined as a tool) to execute the SQL queries generated by the LLM & retrieve data from the dataset. Further, it utilizes an in-memory (temporary) sqlite3 database for executing the queries and then returns the results of SQL execution as a list of dictionaries.
  • Third, it sets up the LLM agent powered by the llama-3.1-8b-instant model through the Groq API for data analysis, and summarization with specific instructions about the data, all this through LangChain framework.
    • This language model has been used for its balance between speed and text generation quality, and has worked well during PoC development for summarizing insights.
    • The execute_sql_query function serves as a tool for the LLM agent to query the database by understanding when to call the function and passing the required query string in the specified format for analysis.
  1. app.py:
  • The main script demonstrates how the LLM agent works through a user-friendly web interface using Streamlit.
  • It loads the 01-mammals-data-final.csv dataset, and utilizes the functions from utils.py to perform data analysis.
    • The LLM agent can handle your biodiversity-related questions and provide suggestions for plotting relevant charts.
    • Chart plotting is tackled separately, enabling you to tweak parameters and unlock valuable insights from the data.
    • Changing the query using the Run Query button will clear the stored charts and generate a new response based on the updated question.
    • The map is displayed separately with geographical points, where the tooltips show both place and species names.

Tech Stack 🛠

  • Streamlit
  • LangChain
  • Groq API
  • Pandas
  • Plotly

Disclaimer ⚠

This PoC is a demonstration of the potential of AI in biodiversity data analysis and is not intended to be a fully functional or specialized AI tool. It should not be used for professional analysis or critical scientific decisions without proper validation. This tool may have inaccuracies, biases, or limitations based on underlying language model capabilities and data quality. The app is a basic implementation and can be further improved by implementing advanced analytics and agentic AI techniques.

Credits ✨

About

Ask questions about mammal species and get AI-powered summaries with interactive charting options.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages