Why is it hard to recognize Pathological Speech?

 “All happy families are alike; each unhappy family is unhappy in its own way.” 

-- Leo Tolstoy, Anna Karenina

Automatic Speech Recognition (ASR), or Speech Transcription (ST), is the process of converting human speech into text. Thanks to abundant speech data, powerful processing in the form of GPUs, and significant strides in deep machine learning, speech transcription seems to have been solved. The cloud biggies have made it a commodity and packaged it well, turning it into a desirable toy to own: the smart speaker. Wondering which toys? The Echos and the Homes.

Productization has had a free learning curve: the providers learned what people want by offering free "search what you want" interfaces. This record of people's behaviour on the web comes in handy for building reliable ASRs under the hood of smart speakers. ASR performance improves further when the system knows more about you (personal information), so that it hardly makes a mistake when you say "Call my Mom" at 1500 hrs on a Saturday, because this is what you do as part of your weekly routine. This is not to take away from the advances in speech signal processing, but this kind of understanding of human behaviour does help the overall performance of ASR, and hence the user experience, making us all believe that ASR is a solved problem.

It also helps that most of us are clueless about what to do when faced with a screen-less interface. As humans, we expect a hint as to what a machine interface is capable of doing for us, and the hint comes in the form of /You can ask ...../. Eight out of 10 people seem to pick the top 2 hints, making it even easier for smart speakers to seem smarter than they are!

The ASR is one of the important components of a smart speaker and is generally built by a process called "training". Basically, you throw in a lot of speech data along with the corresponding text and, using some well-known techniques (machine learning), learn the association between the speech and the text. When you throw in hours and hours of data, these techniques become capable of associating patterns in speech with text (generally called learning).
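To make "learning the association" concrete, here is a deliberately tiny sketch. It is not how a production ASR is built: each utterance is reduced to two made-up numbers standing in for acoustic features, the three words and their feature centres are invented for illustration, and a nearest-centroid rule stands in for the real machine learning techniques.

```python
import random

# Hypothetical "true" feature pattern for each word (assumed values, not real acoustics).
WORD_CENTERS = {"call": [0.0, 0.0], "play": [4.0, 0.0], "stop": [0.0, 4.0]}


def make_utterance(word_center, spread, rng):
    """Return a fake 2-number feature vector for one spoken instance of a word."""
    return [c + rng.gauss(0.0, spread) for c in word_center]


def train(pairs):
    """Learn one average feature vector (centroid) per text label."""
    sums, counts = {}, {}
    for features, text in pairs:
        acc = sums.setdefault(text, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[text] = counts.get(text, 0) + 1
    return {text: [v / counts[text] for v in acc] for text, acc in sums.items()}


def recognize(model, features):
    """Return the text whose learned centroid is closest to the features."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda text: dist2(model[text]))


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# "Lots of speech data and the corresponding text": 200 noisy examples per word.
training_pairs = [
    (make_utterance(center, 0.5, rng), text)
    for text, center in WORD_CENTERS.items()
    for _ in range(200)
]

model = train(training_pairs)
print(recognize(model, [3.8, 0.3]))  # a new utterance that lies near the "play" pattern
```

The only thing the model "knows" is the average pattern it saw for each piece of text, which is why the data you feed it matters so much.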

Now here is the catch. The ability to associate depends on the data that you feed in. While the speech of a man and a woman (gender), or of people with different accents or dialects, does vary, the variation is not so large that the machine learning process cannot arrive at a pattern of association. The more the variation, the lower the chance that the process finds an audio pattern to associate with the text. This leads to ASR mis-recognition.

Most of the cloud biggies have access to normal speech (that is, speech from most of us), and for this reason the smart speakers (or the ASR under the hood) seem to recognize us. On the other hand, if someone with an abnormal | pathological voice were to use these smart speakers, they would fail to recognize that person. Why? Because that kind of speech data was never used to train the ASR.

So what is the big deal? Add pathological speech data and train the system to make it inclusive. There are two problems. Getting pathological speech data is an issue, but more than that, pathological speech from two different people shows none of the similarity seen in normal speech. Something along the lines of

 “All happy families are alike; each unhappy family is unhappy in its own way.” -- Leo Tolstoy 

where normal speech is like the happy families, more or less alike, whereas pathological speech is different in its own way. This makes it difficult to find common patterns across pathological speech from different people and associate them with text, and makes it damn hard to build systems that can be used by people with defective | abnormal | pathological speech.
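The Tolstoy analogy can be illustrated with a small simulation. This is a toy sketch under loud assumptions, not a model of real pathological speech: every speaker gets a personal version of each word's two-number "feature pattern", typical speakers stay close to a shared pattern (spread 0.3), and atypical speakers each drift in their own direction (spread 3.0). All names and numbers are hypothetical, and a nearest-centroid recognizer stands in for a real ASR.

```python
import random

# Assumed shared feature pattern per word (illustrative values only).
WORD_CENTERS = {"call": (0.0, 0.0), "play": (4.0, 0.0), "stop": (0.0, 4.0)}


def make_speaker(speaker_spread, rng):
    """Give one speaker a personal version of every word's feature pattern."""
    return {
        text: tuple(c + rng.gauss(0.0, speaker_spread) for c in center)
        for text, center in WORD_CENTERS.items()
    }


def speak(speaker, text, rng, noise=0.3):
    """One utterance: the speaker's own pattern plus small utterance-level noise."""
    return tuple(c + rng.gauss(0.0, noise) for c in speaker[text])


def collect(n_speakers, speaker_spread, rng, per_word=10):
    """Record (features, text) pairs from freshly drawn speakers."""
    data = []
    for _ in range(n_speakers):
        speaker = make_speaker(speaker_spread, rng)
        for text in WORD_CENTERS:
            data.extend((speak(speaker, text, rng), text) for _ in range(per_word))
    return data


def train(pairs):
    """Learn one centroid per text label."""
    grouped = {}
    for features, text in pairs:
        grouped.setdefault(text, []).append(features)
    return {
        text: tuple(sum(dim) / len(rows) for dim in zip(*rows))
        for text, rows in grouped.items()
    }


def recognize(model, features):
    """Return the text whose centroid is nearest to the features."""
    return min(
        model,
        key=lambda text: sum((a - b) ** 2 for a, b in zip(features, model[text])),
    )


def accuracy(model, pairs):
    """Fraction of utterances whose recognized text matches the true text."""
    return sum(recognize(model, f) == t for f, t in pairs) / len(pairs)


rng = random.Random(1)  # fixed seed for reproducibility

# Train only on typical speakers, as the article says the commercial systems are.
model = train(collect(50, 0.3, rng))

typical_acc = accuracy(model, collect(50, 0.3, rng))   # new, unseen typical speakers
atypical_acc = accuracy(model, collect(50, 3.0, rng))  # new, unseen atypical speakers

print(f"unseen typical speakers:  {typical_acc:.2f}")
print(f"unseen atypical speakers: {atypical_acc:.2f}")
```

In this toy setting the recognizer copes with typical speakers it has never heard, because they resemble the ones it was trained on, while each atypical speaker departs from the shared pattern in a different way and is recognized far less reliably.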


Comments

https://authors.elsevier.com/a/1cmL839HpSeKGE for the complete article.
