THE WORKFLOW OF VOICE RECOGNITION IN THE CONTEXT OF STATISTICAL MODELING SYSTEMS

Kolesnyk Valeriia Viktorivna, Shmelov Oleh Borysovych

Many technical devices can now perceive (recognize) spoken voice messages: computers, medical electronic equipment, cars, mobile phones, etc. What is speech recognition? At first glance, everything seems very simple: a person pronounces a word (phrase), and the technical system responds to it appropriately: it either executes the command contained in the word (phrase), transcribes the dictated text, or otherwise uses the information extracted from the phrase. The rapid development of speech recognition on the personal computer (PC) began in 1993. Two key objectives of speech recognition remain unresolved despite numerous attempts over the last 50 years [1]: achieving 100% recognition on a limited set of commands for at least one speaker, and speaker-independent recognition of a continuous speech stream in an arbitrary language, in real time and with acceptable quality. Voice recognition has two major stages: a training stage and a testing stage. Training involves “teaching” the system by building its dictionary, that is, an acoustic model for each word that the system needs to recognize. In the testing stage, the acoustic models of these words are used to recognize spoken words by means of a classification algorithm. The development workflow of voice recognition consists of three steps: speech acquisition, speech analysis, and user interface development.
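The two stages described above can be illustrated with a minimal sketch. It is not the method of this work: the text does not specify the form of the acoustic model or the classification algorithm, so the sketch assumes, purely for illustration, that each word is represented by a fixed-length feature vector, that each word's acoustic model is a diagonal Gaussian, and that classification picks the word with the highest log-likelihood. The function names, the two-word dictionary, and the synthetic feature vectors are all hypothetical.

```python
import math
import random


def train_word_model(examples):
    """Training stage for one word: fit a diagonal Gaussian to its feature vectors.

    Assumption: every example is a fixed-length list of floats.
    """
    n = len(examples)
    dim = len(examples[0])
    means = [sum(x[d] for x in examples) / n for d in range(dim)]
    # Floor the variance so a constant feature cannot cause division by zero.
    variances = [
        max(sum((x[d] - means[d]) ** 2 for x in examples) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def train(dictionary_examples):
    """Build the dictionary: one acoustic model per word."""
    return {word: train_word_model(ex) for word, ex in dictionary_examples.items()}


def log_likelihood(features, model):
    """Log-density of a feature vector under a diagonal Gaussian word model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (f - m) ** 2 / v)
        for f, m, v in zip(features, means, variances)
    )


def recognize(features, models):
    """Testing stage: return the word whose model scores the features highest."""
    return max(models, key=lambda word: log_likelihood(features, models[word]))


def synthetic_examples(center, count, rng, spread=0.1):
    """Hypothetical stand-in for real acoustic features: noisy copies of a center."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(count)]


rng = random.Random(0)
training_data = {
    "yes": synthetic_examples([1.0, 0.0, 0.5], 20, rng),
    "no": synthetic_examples([-1.0, 1.0, 0.0], 20, rng),
}
models = train(training_data)

print(recognize([0.9, 0.1, 0.4], models))
print(recognize([-1.1, 0.9, 0.1], models))
```

In a real system the feature vectors would come from the speech acquisition and speech analysis steps rather than from a random generator, and the per-word model would typically be richer than a single Gaussian; the sketch only shows how training produces one model per dictionary word and how testing compares an utterance against all of them.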