
Add run_async to LlamaCppChatGenerator. #2821

Open
kudos07 wants to merge 3 commits into deepset-ai:main from kudos07:feat/llamacpp-add-run-async

Conversation

kudos07 (Contributor) commented Feb 9, 2026

Summary

Add run_async to LlamaCppChatGenerator.

  • Implement run_async (wraps run() in asyncio.to_thread since llama-cpp-python is synchronous); a rough sketch follows this list.
  • Add async unit tests and an optional integration test.
  • Add pytest-asyncio config and a CHANGELOG entry.
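
A rough sketch of the approach, a minimal wrapper whose parameter names mirror the existing run method and are assumptions here:

    import asyncio

    async def run_async(self, messages, generation_kwargs=None):
        # llama-cpp-python is synchronous, so delegate to the existing run()
        # in a worker thread instead of blocking the event loop.
        return await asyncio.to_thread(
            self.run, messages=messages, generation_kwargs=generation_kwargs
        )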

Related Issues

Proposed Changes

  • New run_async method on LlamaCppChatGenerator with identical signature to run.
  • Unit tests:
    • TestLlamaCppChatGeneratorAsync::test_run_async
    • TestLlamaCppChatGeneratorAsync::test_run_async_with_params
    • TestLlamaCppChatGeneratorAsync::test_run_async_with_empty_message
  • Optional integration test test_live_run_async (marked integration).
  • pytest config updated to enable asyncio mode (a config sketch follows this list).
  • CHANGELOG updated with an Unreleased note.
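
A minimal sketch of the pytest-asyncio setting, assuming it lives in the integration's pyproject.toml (both the file location and the mode value are assumptions):

    [tool.pytest.ini_options]
    asyncio_mode = "auto"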

How did you test it?

  • Created local venv and installed dependencies (used CPU wheels for llama-cpp-python on Windows).
  • Ran unit tests (mocked) locally:
    • python -m pytest tests/test_chat_generator.py::TestLlamaCppChatGeneratorAsync -v -m "not integration" — all async unit tests passed.
  • Integration test included but may be skipped locally (downloads a GGUF model). CI will run full integration tests.

Notes for the reviewer

  • Implementation is intentionally minimal and consistent with existing patterns (Fallback uses asyncio.to_thread).
  • Streaming behavior is unchanged; run_async currently uses the same streaming semantics as run (the streaming callback runs from the worker thread). If the project prefers a queue-based async streaming bridge (like the HF local generator), that can be implemented in a follow-up; a rough sketch of such a bridge follows this list.
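
For reference, such a bridge could look roughly like the sketch below. It assumes the streaming callback can be supplied per call, that the caller passes an async callback, and that run forwards chunks to whatever callback it receives; all names are illustrative:

    import asyncio

    async def run_async(self, messages, generation_kwargs=None, streaming_callback=None):
        loop = asyncio.get_running_loop()
        queue: asyncio.Queue = asyncio.Queue()
        _DONE = object()  # sentinel marking the end of generation

        def thread_callback(chunk):
            # Runs in the worker thread; hand the chunk back to the event loop.
            loop.call_soon_threadsafe(queue.put_nowait, chunk)

        async def consume():
            # Forward chunks to the caller's async callback as they arrive.
            while True:
                chunk = await queue.get()
                if chunk is _DONE:
                    break
                if streaming_callback is not None:
                    await streaming_callback(chunk)

        def generate():
            try:
                return self.run(
                    messages=messages,
                    generation_kwargs=generation_kwargs,
                    streaming_callback=thread_callback,
                )
            finally:
                loop.call_soon_threadsafe(queue.put_nowait, _DONE)

        consumer = asyncio.create_task(consume())
        result = await asyncio.to_thread(generate)
        await consumer
        return result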

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used an appropriate conventional commit type for the PR title (e.g., feat:)

- Implement run_async (wraps run() in asyncio.to_thread).
- Add async unit tests and optional integration test.
- Add pytest-asyncio config and CHANGELOG entry.
@kudos07 kudos07 requested a review from a team as a code owner February 9, 2026 06:57
@kudos07 kudos07 requested review from anakin87 and removed request for a team February 9, 2026 06:57
@github-actions github-actions bot added the integration:llama_cpp and type:documentation labels Feb 9, 2026
anakin87 (Member) commented Feb 9, 2026

@kudos07 thank you for the contribution!

I'll take a look in the next few days...

@anakin87 anakin87 left a comment


I left some comments on possible improvements...

### 🚀 Features

- Add `run_async` to `LlamaCppChatGenerator` for AsyncPipeline support

anakin87 (Member):

this file is automatically generated at release time, so please remove the addition

return generator

@pytest.mark.integration
async def test_live_run_async(self, generator):
anakin87 (Member):

Suggested change:
-    async def test_live_run_async(self, generator):
+    @pytest.mark.parametrize("streaming_callback", [None, print_streaming_chunk])
+    async def test_live_run_async(self, generator):

let's also verify that async+streaming works

:returns: A dictionary with the following keys:
- `replies`: The responses from the model
"""
return await asyncio.to_thread(
anakin87 (Member):

my impression is that since llama-cpp-python is not thread-safe (abetlen/llama-cpp-python#951), this could be problematic.

A simple idea to fix this is the following:

  1. Lock in __init__ (with an explanatory comment)
    self._inference_lock = asyncio.Lock()

  2. Use the lock in run_async

    async with self._inference_lock:
        return await asyncio.to_thread(self.run, ...)

Of course, this means only one generation at a time in case of multiple concurrent requests, but it is thread-safe and still exposes an async interface.
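
Put together, a rough sketch of that suggestion (parameter names again mirror the existing run method and are assumptions):

    import asyncio

    class LlamaCppChatGenerator:
        def __init__(self):
            # ... existing initialization ...
            # llama-cpp-python is not thread-safe (abetlen/llama-cpp-python#951),
            # so serialize generations issued through the async interface.
            self._inference_lock = asyncio.Lock()

        def run(self, messages, generation_kwargs=None):
            ...  # existing synchronous implementation (elided)

        async def run_async(self, messages, generation_kwargs=None):
            # One generation at a time, executed in a worker thread so the
            # event loop stays responsive while llama.cpp generates.
            async with self._inference_lock:
                return await asyncio.to_thread(
                    self.run, messages=messages, generation_kwargs=generation_kwargs
                )

One detail worth double-checking: creating asyncio.Lock() in __init__ (outside a running event loop) is fine on Python 3.10+, where the lock binds to a loop lazily on first use.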


Development

Successfully merging this pull request may close these issues.

add run_async for LlamaCppChatGenerator
