Hey HN!
I love watching YouTube with my 7-year-old daughter. Unfortunately, the best stuff is often in English (we're German). So I made an AI tool that translates videos directly, using the original voices. All other sounds, as well as background music, are preserved, too.
Turns out it works for many other language pairs as well. So far, it can create dubs in English, Mandarin Chinese, Spanish, Arabic, French, Russian, German, Italian, Korean, Polish and Dutch.
The main challenge in building this was balancing a faithful translation of the original meaning against getting the timing right. That's especially hard for language pairs like English -> German, where the target is often longer than the source ("bat" -> "Fle-der-maus", "speed" -> "Ge-schwin-dig-keit").
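To make the timing problem concrete, here is a rough sketch in Python. It is not how the tool works internally. It just counts syllables in the hyphenated examples above and checks whether a translation could be spoken within the original time slot. The helper names and the 6 syllables-per-second limit are assumptions chosen for illustration.

```python
def syllables(hyphenated: str) -> int:
    """Count syllables in a word written with hyphens between syllables."""
    return len(hyphenated.split("-"))


def required_rate(target_syllables: int, slot_seconds: float) -> float:
    """Syllables per second needed to fit the target into the source's time slot."""
    return target_syllables / slot_seconds


def fits(target_syllables: int, slot_seconds: float, max_rate: float = 6.0) -> bool:
    """True if the target can be spoken in the slot without exceeding max_rate.

    max_rate is an assumed comfort limit, not a measured value.
    """
    return required_rate(target_syllables, slot_seconds) <= max_rate


# The two examples from the post: source word and hyphenated German target.
pairs = [("bat", "Fle-der-maus"), ("speed", "Ge-schwin-dig-keit")]
for src, tgt in pairs:
    ratio = syllables(tgt) / syllables(src)
    print(f"{src} -> {tgt}: {ratio:.0f}x the syllables")
```

When `fits` returns False, the choice is between a shorter (less literal) translation and faster speech, which is the trade-off described above.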
Let me know what you think! :)
https://haonowshaokao.com/2013/05/18/does-dubbing-tv-harm-la...
Edit: I forgot to mention that the samples on the website are impressive and well made. How do you do the speaker diarization and voice cloning?