GitHub Issue

RTF statistics on consumer GPUs (e.g. 5090/4090)

Discovered On Apr 3, 2026
Primary Metric open
Roughly what RTF is everyone getting with the original PyTorch model?

Developer & User Discourse

cacard • Apr 3, 2026
Generating 14 seconds of audio takes 1.12 seconds on average, so RTF = 0.08, which is pretty good. (on a 5090 laptop with 24 GB VRAM)
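
For reference, the 0.08 figure is consistent with the usual definition of the real-time factor: generation time divided by the duration of the generated audio. The thread does not spell the definition out, so it is an assumption here.

$$
\mathrm{RTF} = \frac{T_{\text{generation}}}{T_{\text{audio}}} = \frac{1.12\ \text{s}}{14\ \text{s}} = 0.08
$$

An RTF below 1 means the audio is generated faster than real time.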
rennyka-107 • Apr 3, 2026
@cacard what's your config? I only got RTF = 0.3 on 3090 and even 5090. (with same num_step=16)
cacard • Apr 3, 2026
> [@cacard](https://github.com/cacard) what's your config? I only got RTF = 0.3 on 3090 and even 5090. (with same num_step=16)

Let me run the test again and see.
cacard • Apr 3, 2026
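344 seconds of audio took 51 seconds to generate, so RTF = 0.15.

Test method:
1) A custom HTTP server loads the model only once; all subsequent HTTP requests reuse the model already in VRAM;
2) 50 random audio clone requests, run serially;
3) Record the total duration of generated audio and the total elapsed time.
Conclusion:
344 seconds of audio generated in total, 51 seconds elapsed, so RTF = 51 / 344 ≈ 0.15.

Machine: 5090 laptop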
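
The measurement described above might look roughly like the sketch below. It is not the script actually used: `aggregate_rtf` and its `synthesize` argument are hypothetical names, with `synthesize` standing in for one clone request against the already-loaded model and returning the duration in seconds of the audio it produced.

```python
import time
from typing import Callable, Iterable, Tuple


def aggregate_rtf(
    synthesize: Callable[[str], float],
    requests: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, float]:
    """Run requests one at a time and return (audio_s, elapsed_s, rtf).

    `synthesize` is a hypothetical stand-in for a single clone request;
    it must return the duration (seconds) of the audio it generated.
    """
    total_audio = 0.0
    total_elapsed = 0.0
    for text in requests:
        start = clock()
        audio_seconds = synthesize(text)  # serial: one request in flight
        total_elapsed += clock() - start
        total_audio += audio_seconds
    if total_audio <= 0:
        raise ValueError("no audio was generated")
    # Aggregate RTF: total wall-clock time over total audio duration.
    return total_audio, total_elapsed, total_elapsed / total_audio
```

Dividing the two totals weights long utterances more heavily than averaging per-request RTFs would, which matches the "total duration / total elapsed time" method in the comment. If `synthesize` calls a PyTorch model directly instead of going through HTTP, the GPU work may still be in flight when the call returns, so a `torch.cuda.synchronize()` before reading the clock is probably needed for the timing to be meaningful.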
zhu-han • Apr 4, 2026
RTF depends on the GPU, the number of inference steps, the batch size, and particularly the lengths of the audio prompt and the generated audio. Therefore, without aligning the evaluation setup, even identical GPUs can yield highly divergent RTF results.

Anyone interested can refer to our evaluation setup in https://github.com/k2-fsa/OmniVoice/issues/7#issuecomment-4181480657