Datasets


Def: A dataset is a collection of data; it consists of all the information that needs analysis. Generally, the data is gathered during a survey.
 
You have a great idea and want to conduct some experiments. You collect your own data, test your idea and report the results. Most often, the data collected locally is small (just enough to suit your experimentation).

You submit your results, and then a reviewer comments that (a) your experiments have not been tested on a large public dataset and (b) you need to compare your results with the existing literature.

You can address (b) by implementing the existing algorithms and then running them on your small dataset. Obviously, this is more work (you need to implement an already known algorithm!) without much return.

Using publicly available datasets is a good idea. It addresses both (a) and (b)!
  1. You get to test your idea and experiment on a large public dataset.
  2. You can compare your results with the existing literature (assuming that the authors have also reported results on the same dataset).
Here is a list of datasets that can be of use.
All said and done, as a user of these datasets, one should also strive to create datasets and put them up for public use!

Comments

Apurv said…
Nice information Sir. Thanks
oops. saw it only today. Thanks.
