The article "The face recognition systems that law enforcement agencies use are probably biased" came as no surprise, because the essential problem is one of data: data mismatch, to be precise.
Today most so-called smart or intelligent systems are built on "learning methodologies" or "learning techniques". What this means is that you bombard a black box with lots of input-output pairs of data; the black box churns through them using some well-known optimization algorithm to learn the match between the inputs and the outputs. So far so good.
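To make the idea concrete, here is a toy sketch of that "black box": a single learnable weight fitted to input-output pairs by gradient descent. The data, the hidden rule (y = 3x), and all the numbers are hypothetical, chosen only to illustrate the learning loop.

```python
import numpy as np

# Hypothetical input-output pairs: the hidden rule is y = 3x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)   # inputs
y = 3.0 * x                   # matching outputs

w = 0.0                       # the black box starts off knowing nothing
for _ in range(500):          # a well-known optimizer: gradient descent
    grad = np.mean(2 * (w * x - y) * x)  # gradient of the mean squared error
    w -= 0.1 * grad           # nudge the weight toward a better match

print(round(w, 2))            # the box has learned the input-output match: 3.0
```

After enough churning, the weight converges to the hidden rule. That is all "learning" means here: optimizing until the box reproduces the training pairs.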
Once the system is trained, just imagine what happens if the data you test it on is not statistically related to the input data used for training. You got it right: the black box has no clue what to infer from such (unseen) data. It just cannot do anything, like a child in a new neighbourhood. At best it makes a guess, which is no good.
This is a classical problem in all walks of life. It is especially true in speech signal processing, where one of the well-known tasks is converting speech to text (yes, like Siri on your iPhone!). You train a speech-to-text system on some kind of speech data (say, clean speech), and then when you try to use the same system on speech that is noisy, the speech recognition system simply fails.
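A toy sketch of that failure mode, under loudly hypothetical assumptions: two classes of clean sinusoidal "signals" stand in for two spoken words, and a nearest-centroid classifier stands in for the recognition system. Trained on clean signals, it works; tested on noisy signals it never saw during training, its accuracy drops.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, noise_std):
    # Two toy "word" classes: low- vs high-frequency sinusoids (a stand-in
    # for clean speech), plus optional additive noise.
    t = np.linspace(0, 1, 50)
    labels = rng.integers(0, 2, n)
    freqs = np.where(labels == 0, 2.0, 5.0)
    clean = np.sin(2 * np.pi * freqs[:, None] * t)
    return clean + rng.normal(0, noise_std, (n, 50)), labels

# "Train" on CLEAN signals only: compute one centroid per class.
x_train, y_train = make_data(200, noise_std=0.0)
centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])

def accuracy(x, y):
    # Nearest-centroid classifier: assign each signal to the closest centroid.
    d = np.linalg.norm(x[:, None, :] - centroids[None], axis=2)
    return np.mean(d.argmin(axis=1) == y)

x_clean, y_clean = make_data(200, noise_std=0.0)  # matched test data
x_noisy, y_noisy = make_data(200, noise_std=5.0)  # mismatched (noisy) test data

print(accuracy(x_clean, y_clean))  # matched conditions: near-perfect
print(accuracy(x_noisy, y_noisy))  # mismatched conditions: noticeably worse
```

The classifier itself is unchanged between the two tests; only the statistics of the test data moved away from the training data, and that alone is enough to hurt it.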
Robust speech recognition is what researchers have been actively working on, essentially addressing how to handle test data that was not seen in the data used to train the recognition system.
This is the classical mismatched data problem.