
Conversation


@ma-hang ma-hang commented Jan 14, 2026

9G7B, single GPU, max_batch_size=32:
[screenshot attachment e73679451a5e665aace0317ed4e6546f]

9G7B, 4 GPUs, max_batch_size=32:
[screenshot attachment d77ca5b3b22ccccf91360ce278067378]

@ma-hang ma-hang requested review from a team, PanZezhong1725 and whjthu January 14, 2026 15:58
@ma-hang ma-hang linked an issue Jan 15, 2026 that may be closed by this pull request
```python
def start(self):
    app = self._create_app()
    logger.info("Starting API Server...")
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Collaborator commented:

Make the port a script argument, with a default of 8000.
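A minimal sketch of that suggestion, assuming argparse for the CLI; `Server` and its methods are stand-ins for the PR's actual class, whose full definition is not shown in this thread:

```python
import argparse
import logging

import uvicorn
from fastapi import FastAPI

logger = logging.getLogger(__name__)

class Server:
    """Stand-in for the PR's server class; only the port wiring is the point here."""

    def _create_app(self) -> FastAPI:
        # Placeholder app; the real method builds the full API with a lifespan hook.
        return FastAPI()

    def start(self, port: int = 8000):
        app = self._create_app()
        logger.info("Starting API Server on port %d...", port)
        uvicorn.run(app, host="0.0.0.0", port=port)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="InfiniLM API server")
    parser.add_argument("--port", type=int, default=8000,
                        help="port to listen on (default: 8000)")
    args = parser.parse_args()
    Server().start(port=args.port)
```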


```python
def _create_app(self):
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        ...
```
@PanZezhong1725 (Collaborator) commented Jan 15, 2026:

Please wrap this logic in the style of vllm.LLM or sglang.Engine. The server script should, as far as possible, only pass requests through; business-logic code like this should not appear in it.
The wrapper should provide two usage modes (see the sketch after this list):

  1. Network serving: asynchronous streaming output, i.e., how the server script uses it.
  2. Standalone use: a batch-generate method where the user can pass in multiple requests at once and get all results back. The current use case for this is running the C-Eval test in batches.

Please rename the current core/ directory to match this wrapper, and put the wrapper code in it as well.
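A minimal sketch of what such a wrapper could look like; every name here (`Engine`, `generate_stream`, `batch_generate`, `_decode`) is hypothetical and only illustrates the two requested usage modes, not the PR's actual API:

```python
import asyncio
from typing import AsyncIterator

class Engine:
    """Hypothetical wrapper in the spirit of vllm.LLM / sglang.Engine."""

    def __init__(self, model_path: str, max_batch_size: int = 32):
        self.model_path = model_path
        self.max_batch_size = max_batch_size
        # The real engine would load the model and start a batching scheduler here.

    def _decode(self, prompt: str):
        # Placeholder decode loop standing in for real batched inference.
        yield from prompt.split()

    async def generate_stream(self, prompt: str) -> AsyncIterator[str]:
        # Usage 1: network serving -- the server script just forwards the
        # request and streams tokens back, with no business logic of its own.
        for token in self._decode(prompt):
            yield token

    def batch_generate(self, prompts: list[str]) -> list[str]:
        # Usage 2: standalone batch mode, e.g. running C-Eval in batches.
        async def collect(p: str) -> str:
            return " ".join([tok async for tok in self.generate_stream(p)])

        async def run_all() -> list[str]:
            return list(await asyncio.gather(*(collect(p) for p in prompts)))

        return asyncio.run(run_all())
```

With this split, a server endpoint only awaits `generate_stream`, while the C-Eval harness calls `batch_generate(prompts)` once and gets all completions back.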



Development

Successfully merging this pull request may close these issues.

[DEV] Add an inference service to InfiniLM
