Vojnotehnički glasnik
2014, vol. 62, iss. 4, pp. 7-37
article language: English
document type: Original Scientific Paper
published on: 22/10/2014
doi: 10.5937/vojtehg62-5170
A comparative analysis of Serbian phonemes: Linear and non-linear models
General Staff of the Serbian Army, Department of Telecommunications and Information Technology (J-6), Centre for Applied Mathematics and Electronics, Belgrade

e-mail: adanijela@ptt.rs

Abstract

This paper presents the results of a comparative analysis of Serbian phonemes. Vowels are characterized by quasi-periodicity and clearly visible formants. Non-vowels are short-term quasi-periodic signals with a low-power excitation signal. For the purpose of this work, speech production systems were modelled with linear AR models and the corresponding non-linear models, based on feed-forward neural networks with one hidden layer. Sum squared error minimization and the back-propagation algorithm were used to train the models. The selection of the optimal model was based on two stopping criteria: the normalized mean squared test error and the final prediction error. The Levenberg-Marquardt method was used for the Hessian matrix calculation. The Optimal Brain Surgeon method was used for pruning. The generalization properties, based on the time-domain signals and the spectra of the outputs at hidden-layer neurons, are presented.
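As an illustration of the linear part of this approach, the following minimal sketch fits AR models of increasing order and selects the order with the smallest final prediction error, FPE = V (N + p) / (N - p), where V is the residual variance, N the number of samples and p the model order. It is not the procedure used in the paper: the autocorrelation method with the Levinson-Durbin recursion, the synthetic AR(2) test signal, and all function names (`autocorr`, `levinson_durbin`, `final_prediction_error`, `select_order`, `synth_ar2`) are assumptions made for the example.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] after mean removal."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return [sum(z[t] * z[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Returns (a, err): x[t] is predicted as sum(a[i] * x[t - 1 - i]) and
    err is the variance of the prediction residual.
    """
    a = []
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / err  # reflection coefficient for order m + 1
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= 1.0 - k * k
    return a, err


def final_prediction_error(err, n, order):
    """Akaike's FPE: residual variance penalized by the parameter count."""
    return err * (n + order) / (n - order)


def select_order(x, max_order):
    """Return (best_order, fpes) where fpes[p - 1] is the FPE of order p."""
    r = autocorr(x, max_order)
    fpes = []
    for p in range(1, max_order + 1):
        _, err = levinson_durbin(r, p)
        fpes.append(final_prediction_error(err, len(x), p))
    best = min(range(len(fpes)), key=fpes.__getitem__) + 1
    return best, fpes


def synth_ar2(n, seed=0):
    """Hypothetical test signal: x[t] = 1.3 x[t-1] - 0.6 x[t-2] + e[t]."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


if __name__ == "__main__":
    signal = synth_ar2(2000)
    order, fpes = select_order(signal, 8)
    print("selected order:", order)
    print("FPE per order:", [round(v, 3) for v in fpes])
```

In practice the same selection loop would run on a windowed speech frame rather than a synthetic signal. The paper's non-linear models replace the linear predictor with a one-hidden-layer network, which this sketch does not attempt to reproduce.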
