Describe the bug
Runpod SDK 1.7.12, which attempts to fix a bug in version 1.7.11, does fix the original bug in local testing. It is still broken on Runpod serverless, though: every request is sent to the same worker, so requests sit in the queue waiting for that worker to become available and delay times become very long.
To Reproduce
Steps to reproduce the behavior:
- Create a serverless endpoint that uses Python SDK version 1.7.12.
- Deploy the endpoint with multiple max workers and some active workers.
- Send many concurrent requests to the endpoint.
- Observe that all requests are routed to the same worker instead of being distributed across multiple workers.
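The steps above can be sketched as a small client script. This is a rough sketch rather than the exact script used for this report: it assumes the endpoint is reachable at `https://api.runpod.ai/v2/<endpoint_id>/runsync` with a bearer API key, and that each reply carries `workerId` and `delayTime` fields. The helper names (`run_sync`, `tally_workers`, `reproduce`) and the `{"n": i}` payload are made up for illustration.

```python
import json
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

API_BASE = "https://api.runpod.ai/v2"  # assumed base URL for serverless endpoints


def run_sync(endpoint_id, api_key, payload, timeout=300):
    """POST one job to the endpoint's /runsync route and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/runsync",
        data=json.dumps({"input": payload}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def tally_workers(responses):
    """Count jobs per worker; replies without a workerId are grouped as 'unknown'."""
    return Counter(r.get("workerId", "unknown") for r in responses)


def reproduce(endpoint_id, api_key, n_requests=20, send=run_sync):
    """Fire n_requests concurrently and return a Counter of jobs per worker."""
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [
            pool.submit(send, endpoint_id, api_key, {"n": i})  # hypothetical payload
            for i in range(n_requests)
        ]
        responses = [f.result() for f in futures]
    delays = [r["delayTime"] for r in responses if "delayTime" in r]
    if delays:
        print("max delayTime:", max(delays))
    return tally_workers(responses)
```

Calling `reproduce("<endpoint_id>", "<api_key>")` and printing the result shows how the jobs were distributed. With the behavior described in this report, the counter would contain a single worker ID holding all of the jobs.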
Expected behavior
When an endpoint is configured with multiple max workers and active workers, new requests should be spread across the available workers (subject to the queue configuration) rather than all being sent to the same worker.
Additional context
Reverting to SDK version 1.7.10 (since 1.7.11 is also broken) resolves the issue.
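To confirm which SDK version a worker image actually has installed, a check like the one below could be logged at startup. It is only a sketch: the set of affected versions comes from this report alone and is not an official list, and the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as affected in this issue; not an official list.
AFFECTED_VERSIONS = {"1.7.11", "1.7.12"}


def is_affected(sdk_version):
    """Return True if the given version string is one this report describes as broken."""
    return sdk_version in AFFECTED_VERSIONS


def installed_sdk_version(package="runpod"):
    """Return the installed version of the package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For example, `is_affected(installed_sdk_version() or "")` would be `True` inside an image built with 1.7.12 and `False` after pinning to 1.7.10.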
