[BUG] MCP session cleanup fails with cancel-scope error when agents call other agents with MCP tools #1276

@Daniel-Vaz

📋 Prerequisites

  • I have searched the existing issues to avoid creating a duplicate
  • By submitting this issue, you agree to follow our Code of Conduct
  • I am using the latest version of the software
  • I have tried clearing cache/cookies or using incognito mode (if UI-related)
  • I can consistently reproduce this issue

🎯 Affected Service(s)

Multiple services / System-wide issue

🚦 Impact/Severity

Blocker

🐛 Bug Description

When using kagent 0.7.13 in a Kubernetes environment with a multi-agent setup (an orchestrator agent invoking other agents that use MCP tools), kagent intermittently crashes during MCP session cleanup.

The failure manifests as:

Warning: Error during MCP session cleanup for session_no_headers:
Attempted to exit a cancel scope that isn't the current task's current cancel scope

followed by a CancelledError from an asyncio queue during event stream shutdown, ultimately resulting in a 500 Internal Server Error from the kagent API.

The problem appears when:

  • An orchestrator agent delegates work to another agent that uses MCP tools, and
  • The tool call completes (or is cancelled), triggering MCP session teardown.

This gives me the impression that the bug lies in how MCP cancel scopes and task lifecycles are managed during cleanup, most likely when nested agents and MCP tools are involved.

As a result, agent calls in the UI return an empty response:

{"result":""}

Any tips on how this could be resolved would be greatly appreciated.

🔄 Steps To Reproduce

  1. Deploy kagent v0.7.13 on a kubeadm Kubernetes cluster (v1.34.3).
  2. Configure Azure OpenAI gpt-5-mini as the default model for all agents.
  3. Create an orchestrator agent that calls another agent as a tool:
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: orchestrator-agent
spec:
  description: "(...)"
  type: Declarative
  declarative:
    modelConfig: default-model-config
    systemMessage: |
      (...)
    tools:
    - type: Agent
      agent:
        name: discovery-agent
  4. Create a secondary agent that uses an MCP server:
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: discovery-agent
spec:
  description: "(...)"
  type: Declarative
  declarative:
    a2aConfig:
      skills:
        (...)
    modelConfig: default-model-config
    systemMessage: |
      (...)
    tools:
    - type: McpServer
      mcpServer:
        apiGroup: kagent.dev
        kind: RemoteMCPServer
        name: grafana-mcpserver
        toolNames:
        - list_prometheus_metric_names
        - list_prometheus_metric_metadata
        - list_prometheus_label_names
        - list_prometheus_label_values
  5. Issue a request to the orchestrator agent that causes it to call the discovery-agent, which then calls one of the Grafana MCP tools.
  6. Observe logs during tool execution.

🤔 Expected Behavior

  • MCP sessions should close cleanly after tool execution.
  • No warnings about cancel scopes.

📱 Actual Behavior

  • kagent logs emit:
Warning: Error during MCP session cleanup for session_no_headers:
Attempted to exit a cancel scope that isn't the current task's current cancel scope
  • This is followed by:
asyncio.exceptions.CancelledError: Cancelled by cancel scope ...
  • The HTTP request ultimately fails with:
500 Internal Server Error
  • The crash happens during event queue shutdown inside MCP cleanup (event_queue.close()), indicating incorrect handling of cancel scopes across tasks.
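
To illustrate why the cancellation surfaces as a 500 rather than being handled, here is a minimal stdlib-only sketch. It is not the a2a or kagent code: `close_queue` is a hypothetical stand-in for a close routine that awaits `asyncio.gather(queue.join())`, and the cancellation is triggered manually with `task.cancel()` instead of by a cancel scope. It only shows that `CancelledError` derives from `BaseException`, so an `except Exception` handler does not catch it.

```python
import asyncio


async def close_queue(queue: asyncio.Queue) -> str:
    """Hypothetical stand-in for a close routine that waits on queue.join()."""
    try:
        await asyncio.gather(queue.join())
    except Exception:
        # Not reached on cancellation: CancelledError is a BaseException.
        return "handled"
    return "closed"


async def main() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait("event")  # task_done() is never called, so join() blocks

    task = asyncio.create_task(close_queue(queue))
    await asyncio.sleep(0)  # let the task start and block on join()
    task.cancel()  # stands in for the cancel scope cancelling the task

    try:
        return await task
    except asyncio.CancelledError:
        return "cancelled"


outcome = asyncio.run(main())
print(outcome)
```

In the traceback below nothing between `queue.join()` and the ASGI server catches the cancellation, which would be consistent with the request ending in a 500.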

💻 Environment

| Component | Version / Details |
| --- | --- |
| kagent | 0.7.13 |
| Kubernetes | kubeadm v1.34.3 |
| Cloud / Infra | Self-managed k8s cluster |
| LLM provider | Azure OpenAI |
| Model | gpt-5-mini |
| MCP Servers Used | kagent-tools + Grafana MCP Server |
| Grafana MCP Server | 0.9.0 |
| Agent topology | Orchestrator → secondary agent → MCP tools |

🔧 CLI Bug Report

No response

🔍 Additional Context

  • The stack trace suggests a mismatch between where a cancel scope is entered vs. exited during MCP session cleanup.
  • Similar issues have been reported in other MCP-based projects involving cancel-scope lifecycles during teardown (e.g., mcp-agent and ADK Python).
  • This may indicate a deeper issue in how kagent integrates MCP session management with asyncio task groups.
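
The entered-vs-exited mismatch described above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `TaskBoundScope` is a hypothetical stand-in that only mimics the "must exit in the task that entered" rule (it is not anyio's implementation), and `owned_cleanup` shows one commonly used pattern, a dedicated task that both opens and closes the resource, not a confirmed fix for kagent.

```python
import asyncio


class TaskBoundScope:
    """Hypothetical scope that must be exited by the task that entered it."""

    def __init__(self) -> None:
        self._owner = None

    async def __aenter__(self) -> "TaskBoundScope":
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        if asyncio.current_task() is not self._owner:
            raise RuntimeError("scope exited in a different task than it was entered")
        return False


async def mismatched_cleanup() -> str:
    """Enter the scope in one task and exit it in another."""
    scope = TaskBoundScope()
    await asyncio.create_task(scope.__aenter__())  # entered in task A
    try:
        await asyncio.create_task(scope.__aexit__(None, None, None))  # exited in task B
    except RuntimeError as err:
        return f"error: {err}"
    return "ok"


async def owned_cleanup() -> str:
    """Let a single owner task enter and exit the scope; others only signal it."""
    close_requested = asyncio.Event()

    async def owner() -> None:
        async with TaskBoundScope():
            await close_requested.wait()

    owner_task = asyncio.create_task(owner())
    await asyncio.sleep(0)  # let the owner enter the scope
    close_requested.set()   # request teardown from a different task
    await owner_task        # the exit itself happens inside the owner task
    return "ok"


async def main() -> tuple[str, str]:
    return await mismatched_cleanup(), await owned_cleanup()


results = asyncio.run(main())
print(results)
```

If the MCP session is opened while handling one task and torn down from another (for example, a cleanup path triggered by the orchestrator's request), the first pattern would apply. Whether that is what happens in kagent is still an open question.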

📋 Logs

Warning: Error during MCP session cleanup for session_no_headers: Attempted to exit a cancel scope that isn't the current tasks's current cancel scope
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/.kagent/.venv/lib/python3.13/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        self.scope, self.receive, self.send
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/applications.py", line 1139, in __call__
INFO:     172.17.103.57:37580 - "POST / HTTP/1.1" 500 Internal Server Error
    await super().__call__(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/applications.py", line 107, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/.kagent/.venv/lib/python3.13/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 795, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/.kagent/.venv/lib/python3.13/site-packages/opentelemetry/instrumentation/fastapi/__init__.py", line 307, in __call__
    await self.app(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/routing.py", line 716, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/routing.py", line 736, in app
    await route.handle(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/routing.py", line 290, in handle
    await self.app(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 119, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/.kagent/.venv/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 105, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 385, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 284, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/apps/jsonrpc/jsonrpc_app.py", line 368, in _handle_requests
    return await self._process_non_streaming_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        request_id, a2a_request, call_context
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/apps/jsonrpc/jsonrpc_app.py", line 448, in _process_non_streaming_request
    handler_result = await self.handler.on_message_send(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        request_obj, context
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/utils/telemetry.py", line 196, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/request_handlers/jsonrpc_handler.py", line 106, in on_message_send
    task_or_message = await self.request_handler.on_message_send(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        request.params, context
        ^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/utils/telemetry.py", line 196, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/request_handlers/default_request_handler.py", line 342, in on_message_send
    await self._cleanup_producer(producer_task, task_id)
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/utils/telemetry.py", line 196, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/request_handlers/default_request_handler.py", line 438, in _cleanup_producer
    await producer_task
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/utils/telemetry.py", line 196, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/request_handlers/default_request_handler.py", line 197, in _run_event_stream
    await queue.close()
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/utils/telemetry.py", line 196, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/a2a/server/events/event_queue.py", line 175, in close
    await asyncio.gather(
        self.queue.join(), *(child.close() for child in self._children)
    )
  File "/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/queues.py", line 239, in join
    async def join(self):
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f3e5ba34c00

🙋 Are you willing to contribute?

  • I am willing to submit a PR to fix this issue
