audio - Define the pronunciation start time for each word in a script


I have a text script that was used to create podcasts. The words in the podcast audio are the same as in the text. I want to have the following:

word in text | pronunciation started at
hello        | 0:0:0.000
             | 0:0:1.125
friends      | 0:0:2.750

Is this possible at all? Thanks in advance!

One of the key terms to start with when approaching the complexity of this problem is "forced alignment". This site covers questions on the topic, e.g. here, which leads via related threads to questions and answers concerning HTK (the Hidden Markov Model Toolkit).

You can find a more hands-on description of how to use forced alignment in automated audio segmentation here.

So the answer is: yes, it is possible, but it is algorithmically complex and, even in the best implementations, not error-free.
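Whichever aligner you end up using, getting from its result to the table in the question is the easy part. Here is a minimal sketch in Python, assuming the aligner gives you (word, start time in seconds) pairs. The `alignment` list and both function names are made up for illustration and do not come from HTK or any other tool:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as h:m:s.mmm, the layout used in the question."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes}:{secs}.{ms:03d}"


def word_table(alignment):
    """Build 'word | start time' rows from (word, start_seconds) pairs."""
    # Pad every word to the longest one so the columns line up.
    width = max((len(word) for word, _ in alignment), default=0)
    return [f"{word:<{width}} | {format_timestamp(start)}"
            for word, start in alignment]


# Hypothetical aligner output (assumption): word plus start time in seconds.
alignment = [("hello", 0.0), ("friends", 2.75)]

for row in word_table(alignment):
    print(row)
```

The times are converted to whole milliseconds first, so rounding happens once and the hour, minute and second fields stay consistent with each other.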

PS: I found a simple tool

