Has anyone done a comparison of combined speech-to-text and TTS vs. speech-to-speech for creating audio-only interfaces? I'm particularly curious about latency and the quality of the audio output.





Hugging Face has a TTS leaderboard (an arena, like lmsys) - https://huggingface.co/spaces/TTS-AGI/TTS-Arena



