Audio - define the pronunciation start time for each word in a script
I have a text script that was used to create podcasts. The words in the podcast audio are the same as in the text. I want to have the following:
word in text | pronunciation started at
hello | 0:0:0.000
[...] | 0:0:1.125
friends | 0:0:2.750
Is this possible at all? Thanks in advance!
One of the key words to start approaching the complexity of this problem is "forced alignment". This site covers questions on the topic, e.g. here, which leads via related threads to questions and answers concerning HTK (the Hidden Markov Model Toolkit).
You can find a more hands-on description of how to use forced alignment in automated audio segmentation here.
So the answer is: yes, it is possible, but it is algorithmically complex and, even in the best implementations, not error-free.
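To give a feel for what a forced aligner does internally, here is a minimal sketch in Python. It is not a real tool. It assumes some acoustic model has already scored every audio frame against every word of the script (the `log_probs` matrix, which is invented here), and it uses a Viterbi-style dynamic programming search to find the best segmentation that keeps the words in script order. The function names, the 0.5 s frame length and the toy scores are all assumptions for illustration; real aligners typically work on much shorter frames and on phoneme-level model states rather than whole words.

```python
import math


def forced_align(log_probs):
    """Return the start frame of each word.

    log_probs[t][k] is a (hypothetical) log-likelihood that audio frame t
    belongs to word k. Words must appear in order and each word must cover
    at least one frame.
    """
    n_frames, n_words = len(log_probs), len(log_probs[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")

    neg_inf = -math.inf
    score = [[neg_inf] * n_words for _ in range(n_frames)]
    # advanced[t][k] is True if the best path enters word k exactly at frame t.
    advanced = [[False] * n_words for _ in range(n_frames)]
    score[0][0] = log_probs[0][0]  # the path must start in the first word

    for t in range(1, n_frames):
        for k in range(n_words):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                advanced[t][k] = True
                best = move
            else:
                best = stay
            score[t][k] = best + log_probs[t][k]

    # Backtrack from the last frame of the last word.
    starts = [0] * n_words
    k = n_words - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][k]:
            starts[k] = t
            k -= 1
    return starts


def format_time(seconds):
    """Format seconds as h:m:s.mmm, the layout asked for in the question."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours}:{minutes}:{secs}.{ms:03d}"


# Toy input: 6 frames, 2 words. All numbers are made up.
words = ["hello", "friends"]
frame_seconds = 0.5  # assumed frame length
log_probs = [
    [-0.1, -3.0],
    [-0.1, -3.0],
    [-0.2, -2.0],
    [-0.1, -3.0],
    [-3.0, -0.1],
    [-3.0, -0.2],
]

for word, frame in zip(words, forced_align(log_probs)):
    print(word, "|", format_time(frame * frame_seconds))
```

The hard part in practice is producing good per-frame scores from real audio, which is what toolkits like HTK provide; the search itself is the easy part, and it can only be as accurate as those scores.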
PS: I found a simple tool