
Why is Speech Signal Processing more complex than Text Processing?


At best, a speech signal can be described as

 indiaisthelargestdemocracywelcometoindia

 and in reality it is either

indiaesthelarzestdemocracyvelcometwondia

or

indiaes thelarzestdemo cracyvel cometw ondia

Those of you who paid attention to the 40-character signal will see that it is actually

India is the largest democracy. Welcome to India.

This, essentially, is the difference between the inputs seen in a text processing pipeline and those seen in a speech processing pipeline. For this reason, the ask that several people make of the speech processing community is (Speech Recognition):

Can you give us "India is the largest democracy. Welcome to India." from the speech signal "indiaes thelarzestdemo cracyvel cometw ondia"? 

so that we can go on with our text processing pipeline and do all the glittery stuff in Natural Language Processing.

A speech processing researcher is looking at  "indiaes thelarzestdemo cracyvel cometw ondia" or "indiaesthelarzestdemocracyvelcometwondia"  while a text processing researcher is looking at "India is the largest democracy. Welcome to India."

The differences (capitalized letters, punctuation, word separators, ...) are a boon, served on a silver plate, to a text processing engineer, while a speech engineer has to decipher them, with significant effort, from the speech signal.
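To make the value of word separators concrete, here is a minimal Python sketch. It is only an illustration of the letter-string analogy used in this post, not how a speech recogniser works: the tiny `VOCABULARY` and the `segment` helper are assumptions made up for this example. With separators, splitting text into words is a one-liner; without them, even the idealised 40-character string needs a dictionary search, and that search fails outright on the "real" signal.

```python
# Hypothetical toy vocabulary, just large enough for the running example.
VOCABULARY = ("india", "is", "the", "largest", "democracy", "welcome", "to")

def segment(signal, vocabulary):
    """Split signal into vocabulary words; return None if no split exists."""
    # best[i] holds one segmentation of signal[:i], or None if there is none.
    best = [None] * (len(signal) + 1)
    best[0] = []
    for end in range(1, len(signal) + 1):
        for word in vocabulary:
            start = end - len(word)
            if start >= 0 and best[start] is not None and signal[start:end] == word:
                best[end] = best[start] + [word]
                break
    return best[-1]

text = "India is the largest democracy. Welcome to India."
ideal = "indiaisthelargestdemocracywelcometoindia"
observed = "indiaesthelarzestdemocracyvelcometwondia"

print(text.split())                   # separators do all the work
print(segment(ideal, VOCABULARY))     # recoverable, but only with a dictionary
print(segment(observed, VOCABULARY))  # None: exact lookup breaks on the real signal
```

The last call returns `None` because "es", "larzest", "velcome" and the rest are not dictionary words, so anything built on exact matching has nothing to hold on to.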

As an example, just imagine how easy it is to identify the number of times "India" occurs in  a text signal versus trying to identify "India" in the speech signal "indiaes thelarzestdemo cracyvel cometw ondia" (Keyword spotting).
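The contrast can be sketched in a few lines of Python. This stays within the letter-string analogy of this post and is not a real keyword spotting system: the helper names (`count_in_text`, `edit_distance`, `spot_keyword`) and the `max_errors` tolerance are assumptions introduced for illustration. In text, an exact whole-word search is enough; in the speech-like string, the pauses have to be ignored and an approximate match tolerated.

```python
import re

def count_in_text(text, keyword):
    """Count whole-word occurrences of keyword in ordinary text."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text))

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def spot_keyword(signal, keyword, max_errors=1):
    """Return offsets of windows that are within max_errors edits of keyword."""
    signal = signal.replace(" ", "")  # pauses are not reliable word boundaries
    n = len(keyword)
    return [
        start
        for start in range(len(signal) - n + 1)
        if edit_distance(signal[start:start + n], keyword) <= max_errors
    ]

text = "India is the largest democracy. Welcome to India."
speech_like = "indiaes thelarzestdemo cracyvel cometw ondia"

text_count = count_in_text(text, "India")
spotted = spot_keyword(speech_like, "india", max_errors=1)
print(text_count)  # 2
print(spotted)     # [0, 35]
```

The second hit (offset 35, "ondia") is found only because one error is tolerated. The same tolerance would produce false alarms on a longer signal, which is part of what makes keyword spotting hard.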

But a speech signal comes with a lot more information than just the linguistic content. It carries information about the speaker's gender (male, female), age (adult, child), identity (Sunil, ...), state (happy, neutral, stressed, ...), accent (Indian, Australian, ...), dialect, health (dysarthric, ...) and so on, which most often does not accompany the text signal "India is the largest democracy. Welcome to India."

However, for any transaction to happen between a machine and a human, the crucial step is to get the linguistic content in the speech. For example, a robot in an airport that answers questions posed by travelers must first convert their speech into text perfectly. For this reason the priority and focus in the speech community has been on exploiting only the linguistic content in the speech signal (Speech Recognition), and not so much on the other information that is abundantly and uniquely present in the speech signal.

A robot that tries to answer questions about Bangalore airport in India. The first step is the ability to understand what the human is "asking", namely to convert the speech signal into text.
 

For this reason it is natural that concepts and algorithms that pass the test of performance on text strings (signals) are adopted (given a try!) in speech signal processing.

For those of you who are curious: for a visual feel of why a speech signal is complex, please see complexity of speech.

