
Why is Speech Signal Processing more complex than Text Processing


At best, a speech signal can be described as

 indiaisthelargestdemocracywelcometoindia

 and in reality it is 

either 

indiaesthelarzestdemocracyvelcometwondia  or indiaes thelarzestdemo cracyvel cometw ondia

Those of you who paid attention to the 40-character signal will have seen that it is actually

India is the largest democracy. Welcome to India.

This, essentially, is the difference between the inputs seen in a text processing pipeline and those seen in a speech processing pipeline. For this reason, the most common ask of the speech processing community is (Speech Recognition):

Can you give us "India is the largest democracy. Welcome to India." from the speech signal "indiaes thelarzestdemo cracyvel cometw ondia"? 

so that we can get on with our text processing pipeline and do all the glittery stuff in Natural Language Processing.

A speech processing researcher is looking at "indiaes thelarzestdemo cracyvel cometw ondia" or "indiaesthelarzestdemocracyvelcometwondia", while a text processing researcher is looking at "India is the largest democracy. Welcome to India."

The differences (capital letters, punctuation marks, word separators, ...) are a boon, served on a silver platter, to a text processing engineer. A speech engineer, on the other hand, has to decipher them, with significant effort, from the speech signal.
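To make the missing word separators concrete, here is a minimal sketch that tries to put the spaces back into the character stream using a tiny word list and dynamic programming. The function name `segment` and the seven-word `vocabulary` are assumptions made for this illustration only; a real speech recognizer works on acoustic features, not on letters.

```python
def segment(stream, vocabulary):
    """Split an unsegmented character stream into vocabulary words.

    Returns a list of words, or None if the stream cannot be fully
    covered by words from the vocabulary.
    """
    n = len(stream)
    max_len = max(len(word) for word in vocabulary)
    # best[i] holds one segmentation of stream[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = stream[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[n]


# Hypothetical toy vocabulary, just large enough for the example sentence.
vocabulary = {"india", "is", "the", "largest", "democracy", "welcome", "to"}

clean = "indiaisthelargestdemocracywelcometoindia"
noisy = "indiaesthelarzestdemocracyvelcometwondia"

print(segment(clean, vocabulary))
print(segment(noisy, vocabulary))
```

The clean stream splits back into its eight words, but the "realistic" stream cannot be segmented at all with exact dictionary lookups: a single distorted character is enough to break the chain. That, in miniature, is the extra work a speech engineer has to do.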

As an example, just imagine how easy it is to count the number of times "India" occurs in a text signal versus trying to spot "India" in the speech signal "indiaes thelarzestdemo cracyvel cometw ondia" (Keyword Spotting).
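The contrast can be sketched in a few lines of Python. Counting in text is a one-line regular expression; spotting in the distorted stream needs some form of approximate matching. The helper names `count_in_text` and `spot_in_stream` are hypothetical, and the mismatch count over a sliding window is a deliberately crude stand-in for what a real keyword spotter does on audio. It is only meant to mirror the character-string analogy used in this post.

```python
import re


def count_in_text(text, keyword):
    """Count whole-word occurrences of keyword in clean text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text))


def spot_in_stream(stream, keyword, max_mismatches=1):
    """Slide a window over the stream and report near matches.

    Returns (position, fragment, mismatches) for every window whose
    characters differ from the keyword in at most max_mismatches places.
    """
    keyword = keyword.lower()
    k = len(keyword)
    hits = []
    for start in range(len(stream) - k + 1):
        fragment = stream[start:start + k]
        mismatches = sum(a != b for a, b in zip(fragment, keyword))
        if mismatches <= max_mismatches:
            hits.append((start, fragment, mismatches))
    return hits


text = "India is the largest democracy. Welcome to India."
stream = "indiaes thelarzestdemo cracyvel cometw ondia"

print(count_in_text(text, "India"))
print(spot_in_stream(stream, "India"))
```

In the text, both occurrences are found exactly. In the stream, the second "India" only shows up as "ondia", so it is found only because one mismatch is tolerated. Set `max_mismatches=0` and it is missed; set it too high and false alarms creep in. Choosing that tolerance is part of what makes keyword spotting hard.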

But a speech signal comes with a lot more information than just the linguistic content. It carries information about the speaker's gender (male, female), age (adult, child), identity (Sunil, ...), state (happy, neutral, stressed, ...), accent (Indian, Australian, ...), dialect, health (dysarthric, ...) and more, which most often does not accompany the text signal "India is the largest democracy. Welcome to India."

However, for any transaction to happen between a machine and a human, the crucial step is getting at the linguistic content in the speech. For example, a robot in an airport trying to answer questions posed by travelers needs to convert speech into text perfectly. For this reason the priority and focus in the speech community has been on exploiting only the linguistic content in the speech signal (Speech Recognition), and not so much on the other information that is abundantly and uniquely present in the speech signal.

A robot that tries to answer questions about Bangalore airport in India. Its first task is to understand what the human is "asking", namely to convert the speech signal into text.
 

For this reason it is natural that concepts and algorithms that pass the test of performance on text strings (signals) are adopted (given a try!) in speech signal processing.

For those of you who are curious: for a visual feel of why a speech signal is complex, please see complexity of speech.

