
Speech to Text Using IBM Watson from Command Line

These are more like personal notes for my own reference. If they help anyone, that is a bonus.

I assume that

  • You have created an account on Bluemix
  • You have obtained the user-id and password (I will call them your_userid and your_password)
  • You have a speech file called speech.wav
  • You want the transcript in the file transcribed_speech.txt
  • You have installed curl on your machine
  • You are not behind a firewall

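Then, from the command line, the following (long) curl command

curl -u your_userid:your_password -X POST --header "Content-Type: audio/wav" --header "Transfer-Encoding: chunked" --data-binary @speech.wav your_recognize_url > transcribed_speech.txt

where your_recognize_url stands for the recognize endpoint URL of your Watson Speech to Text service, should write the transcription of speech.wav that the Watson Speech to Text engine considers best to the file transcribed_speech.txt. The file holds Watson's JSON response; the text itself is in the "transcript" fields.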
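The file written by curl holds JSON rather than plain text. As a minimal sketch (in Python, standard library only), the transcript can be pulled out as below, assuming the response layout `results[].alternatives[].transcript` and taking the first alternative of each result as the best one. The function names are my own, hypothetical choices.

```python
import json


def extract_transcript(response):
    """Join the first (best) alternative of every result into one string.

    Assumes the Watson Speech to Text response layout:
    {"results": [{"alternatives": [{"transcript": "...", ...}, ...]}, ...]}
    """
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is taken as the engine's best guess.
            text = alternatives[0].get("transcript", "").strip()
            if text:
                pieces.append(text)
    return " ".join(pieces)


def transcript_from_file(path="transcribed_speech.txt"):
    """Read the JSON that curl saved and return only the transcript text."""
    with open(path, encoding="utf-8") as fh:
        return extract_transcript(json.load(fh))
```

After running curl, `print(transcript_from_file())` should print just the recognized words, one result after another.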

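If you are behind a firewall, with p_user as the proxy user-id and p_passwd as the proxy password, then the following should do the job (your_proxy_host:your_proxy_port is your proxy's address and port, and your_recognize_url is the recognize endpoint URL of your Speech to Text service):

curl -x your_proxy_host:your_proxy_port -U p_user:p_passwd -u your_userid:your_password -X POST --header "Content-Type: audio/wav" --header "Transfer-Encoding: chunked" --data-binary @speech.wav your_recognize_url > transcribed_speech.txt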
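The same request, optionally through an authenticating proxy, can be sketched in Python with only the standard library. This is a sketch under assumptions: `url` must be your own service's recognize endpoint, the helper names are hypothetical, and the whole file is sent with a Content-Length header instead of chunked transfer encoding.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(url, userid, password, wav_bytes):
    """Build a POST like the curl command: basic auth plus a WAV body."""
    token = base64.b64encode(f"{userid}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={
            "Content-Type": "audio/wav",
            "Authorization": f"Basic {token}",
        },
    )


def make_opener(proxy=None, proxy_user=None, proxy_password=None):
    """Return an opener, optionally routed through an HTTP proxy (host:port).

    As with curl's -x/-U options, the proxy credentials are separate from
    the Watson credentials; urllib sends them as a Proxy-Authorization header.
    """
    if not proxy:
        return urllib.request.build_opener()
    if proxy_user:
        # Percent-encode so characters such as '@' or ':' survive in the URL.
        creds = (urllib.parse.quote(proxy_user, safe="") + ":"
                 + urllib.parse.quote(proxy_password or "", safe=""))
        proxy_url = f"http://{creds}@{proxy}"
    else:
        proxy_url = f"http://{proxy}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def recognize(url, userid, password, wav_path, opener=None):
    """POST the WAV file and return the service's raw JSON response as text."""
    with open(wav_path, "rb") as fh:
        request = build_recognize_request(url, userid, password, fh.read())
    opener = opener or make_opener()
    with opener.open(request) as response:
        return response.read().decode("utf-8")
```

For example, `recognize(url, "your_userid", "your_password", "speech.wav", opener=make_opener("your_proxy_host:8080", "p_user", "p_passwd"))` should return the same JSON that curl writes to transcribed_speech.txt; leave out `opener` when no proxy is needed.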

Happy speech to text-ing :-) 
A sample (very rudimentary) Perl script

Also see how to use Google ASR in a similar way.

