Telfor Journal
2017, vol. 9, no. 1, pp. 32-37
language: English
article type: unclassified
doi:10.5937/telfor1701032D


A review of Serbian parametric speech synthesis based on deep neural networks
University of Novi Sad, Faculty of Technical Sciences

e-mail: tijanadelic@uns.ac.rs, secujski@uns.ac.rs, sinisa.suzic@uns.ac.rs

Project

Development of dialogue systems for Serbian and other South Slavic languages (MPNTR - 32035)
EUREKA project DANSPLAT (E!9944)
Central audio library of the University of Novi Sad

Abstract

This paper describes research on the development of a deep neural network based speech synthesizer for the Serbian language, trained on recorded utterances of a single female voice talent. Two separate networks are used, one for predicting acoustic features and one for predicting phonetic segment durations. A set of experiments first establishes the optimal values of the networks' hyper-parameters and then examines how the amount of training data affects the quality of the synthesized speech. Quality is evaluated through objective measures as well as listening tests. The results confirm that 4-layer deep neural networks with 512 units per hidden layer, trained on 3 hours of data, produce speech of very good quality. They also suggest that a further increase in the amount of training data may yield additional improvement in quality.
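
The two-network arrangement in the abstract can be illustrated with a minimal sketch. Only the 512 units per hidden layer come from the abstract. Reading "4-layer" as four hidden layers, the input and output sizes, the tanh activation, and the weight initialization are all assumptions made for illustration, and every name below is hypothetical rather than taken from the authors' system.

```python
import math
import random

HIDDEN_UNITS = 512   # from the abstract
HIDDEN_LAYERS = 4    # assumption: "4-layer" read as four hidden layers

# Hypothetical feature sizes; the abstract does not state them.
ACOUSTIC_IN, ACOUSTIC_OUT = 400, 187   # linguistic input -> acoustic features
DURATION_IN, DURATION_OUT = 390, 1     # linguistic input -> segment duration


def layer_sizes(n_in, n_out):
    """Layer widths: input, the hidden layers, output."""
    return [n_in] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [n_out]


def init_network(n_in, n_out, seed=0):
    """Build a list of (weights, biases) pairs with small random weights."""
    rng = random.Random(seed)
    sizes = layer_sizes(n_in, n_out)
    net = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        net.append((weights, [0.0] * fan_out))
    return net


def forward(net, x):
    """Feed one input vector through the net: tanh hidden layers (assumed), linear output."""
    for i, (weights, biases) in enumerate(net):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(net) - 1:  # apply the nonlinearity to hidden layers only
            x = [math.tanh(v) for v in x]
    return x


def parameter_count(net):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in net)


# Two separate, untrained networks, mirroring the split described in the abstract.
duration_net = init_network(DURATION_IN, DURATION_OUT)
acoustic_net = init_network(ACOUSTIC_IN, ACOUSTIC_OUT)
```

Calling `forward(duration_net, features)` with a list of `DURATION_IN` numbers returns a one-element list, and `forward(acoustic_net, features)` returns one vector of `ACOUSTIC_OUT` values. A real system would train both networks on aligned recordings and pass the predicted acoustic features to a vocoder. Neither step is shown here.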
