fix: increase session supervisor resilience for cloud STT models #3583

Open
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1770092793-fix-cloud-stt-meltdown

@devin-ai-integration
Contributor

Summary

Fixes the "Session failed: Meltdown { reason: 'max_restarts exceeded' }" error that users experience when using cloud STT models (Deepgram, AssemblyAI, etc.) with temporary connectivity issues.

Changes:

  • Increased max_restarts from 3 to 15 to handle transient cloud service failures
  • Increased max_window from 15s to 60s to allow more recovery time
  • Added exponential backoff for listener retries: 500ms → 1s → 2s → 4s → 8s, then 10s (the cap) for every later retry

The previous settings were too aggressive for cloud services: a session failed after just 3 connection issues within 15 seconds. The new settings are closer to the local-stt plugin's approach (which allows 100 restarts in 180s).
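To make the effect of the two limits concrete, here is a minimal sketch of a sliding-window restart budget. It is not the supervisor's actual code: `RestartBudget` and `record` are hypothetical names, and both the sliding-window semantics and the "more than `max_restarts` means meltdown" rule are assumptions that should be checked against the real implementation.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical restart budget: at most `max_restarts` restarts
/// within any `max_window` span (assumed sliding-window semantics).
struct RestartBudget {
    max_restarts: usize,
    max_window: Duration,
    restarts: VecDeque<Instant>,
}

impl RestartBudget {
    fn new(max_restarts: usize, max_window: Duration) -> Self {
        Self { max_restarts, max_window, restarts: VecDeque::new() }
    }

    /// Records a restart at `now`. Returns `false` once the budget is
    /// exceeded, which is where a supervisor would report a meltdown.
    fn record(&mut self, now: Instant) -> bool {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.restarts.front() {
            if now.duration_since(oldest) > self.max_window {
                self.restarts.pop_front();
            } else {
                break;
            }
        }
        self.restarts.push_back(now);
        self.restarts.len() <= self.max_restarts
    }
}
```

Under these assumptions, four restarts spaced one second apart exhaust a 3-in-15s budget but stay well inside a 15-in-60s one.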

Review & Testing Checklist for Human

  • Verify the exponential backoff formula is correct: (500 * 2^(count-1)).min(10000), in milliseconds, should give 500, 1000, 2000, 4000, 8000 for counts 1-5 and 10000 for count 6 and above
  • Test with a cloud STT model during poor network conditions to confirm sessions recover instead of failing
  • Confirm that 60 seconds is an acceptable maximum wait time before showing a persistent failure to users
  • Test that local STT models are unaffected by this change
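For reviewers checking the first item, the formula can be sketched as a standalone function. This is an illustration only: `retry_backoff` is a hypothetical name, and treating a count of 0 like a count of 1 is an assumption, not something the PR states.

```rust
use std::time::Duration;

/// Delay before retry number `count` (1-based): 500 ms doubled on each
/// retry and capped at 10 s. Hypothetical helper, not the PR's code.
fn retry_backoff(count: u32) -> Duration {
    const BASE_MS: u64 = 500;
    const CAP_MS: u64 = 10_000;
    // Clamp the exponent so the shift cannot overflow for large counts.
    // Assumption: a count of 0 is treated the same as a count of 1.
    let exp = count.saturating_sub(1).min(16);
    let ms = BASE_MS.saturating_mul(1u64 << exp).min(CAP_MS);
    Duration::from_millis(ms)
}
```

Clamping the exponent matters because a naive `2^(count-1)` overflows for large counts; the cap alone does not prevent that if it is applied after the multiplication.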

Recommended test plan: Start a recording session with a cloud model, then temporarily disable network connectivity for 10-20 seconds. The session should recover when connectivity returns instead of showing "Session failed: Meltdown".
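As a rough sanity check on this test plan, the sketch below estimates how many retries would fail during an outage of a given length. It assumes each retry fails immediately while the network is down, that the backoff counter does not reset in between, and that delays follow the 500 ms doubling schedule capped at 10 s; `backoff_ms` and `failed_retries_during` are hypothetical names, and real connection timeouts would change the numbers.

```rust
/// Backoff in milliseconds for retry `count` (1-based), capped at 10 s.
fn backoff_ms(count: u32) -> u64 {
    let exp = count.saturating_sub(1).min(16);
    (500u64 << exp).min(10_000)
}

/// Number of retries that land inside an outage lasting `outage_ms`,
/// assuming the first failure happens at the start of the outage and
/// every retry fails instantly while the network is down.
fn failed_retries_during(outage_ms: u64) -> u32 {
    let mut elapsed_ms = 0u64;
    let mut failed = 0u32;
    loop {
        // Wait out the backoff before the next retry.
        elapsed_ms += backoff_ms(failed + 1);
        if elapsed_ms >= outage_ms {
            // The next retry happens after connectivity has returned.
            return failed;
        }
        failed += 1;
    }
}
```

Under these assumptions a 10-second outage costs 4 failed retries and a 20-second outage costs 5, both far below the new limit of 15 restarts.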

Notes

Link to Devin run: https://app.devin.ai/sessions/0e124d772936441e8647f42d920d510b
Requested by: @ComputelessComputer

- Increase max_restarts from 3 to 15 to handle transient cloud service failures
- Increase max_window from 15s to 60s to allow more recovery time
- Add exponential backoff (500ms, 1s, 2s, 4s, 8s, capped at 10s) for listener retries

This fixes the 'Session failed: Meltdown { reason: max_restarts exceeded }' error
that users experience when using cloud STT models with temporary connectivity issues.

Co-Authored-By: john@hyprnote.com <john@hyprnote.com>
@netlify

netlify bot commented Feb 3, 2026

Deploy Preview for hyprnote canceled.

Name Link
🔨 Latest commit 814cd71
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/69817b455d607000073d298f

@netlify

netlify bot commented Feb 3, 2026

Deploy Preview for hyprnote-storybook canceled.

Name Link
🔨 Latest commit 814cd71
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/69817b45a6f83f0008ecf0bc

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

