Am I alone in thinking that Microsoft’s (and others’) push to “Democratize AI” represents a threat to business equal to that of, say, the Y2K Bug?
Here’s why I think it is. A while ago I was attending a conference and I saw an AzureML talk. The presenter did an amazing job: they were really engaging, and the talk was deep, technical, and… utterly flawed. During the talk the presenter took Likert scale data and clustered it using the K-Means algorithm. The audience were enthralled.
Now, for the non-data-scientists in the audience, here’s a quick catch-up. You’ll see Likert scales used most often in surveys; you know, the kinds of questions that ask: “Where 1 is strongly disagree, 2 is disagree, 3 is neither agree nor disagree, 4 is agree and 5 is strongly agree, how strongly do you agree with the following statement…”. This makes it ordinal data: data which is categorical in nature, where the categories have an order but the gaps between them are not known to be equal.
However, because the answers are recorded numerically (1 through 5), when the data comes to be analysed it looks like interval data (numbers that you can do arithmetic with). Now, when that data is analysed by an experienced data scientist, there’s no problem: they recognize the trap and apply techniques suited to ordinal data, such as Spearman’s rank correlation.
The presenter, however, although a very experienced database person, had little to no experience as a data scientist and immediately fell into the trap of either seeing numbers and assuming the data was interval in nature, or not knowing that, in the K-Means algorithm, the “Means” part relates to averages. The data being ordinal in nature meant that the presenter was not entitled to do arithmetic on it.
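To make the trap concrete, here is a minimal sketch in Python using only the standard library. The responses are made up for illustration, and the single average stands in for the centroid that K-Means would compute if it put all of these answers into one cluster.

```python
from collections import Counter
from statistics import mean

# The survey wording for each code on the 1-5 scale.
LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "neither agree nor disagree",
    4: "agree",
    5: "strongly agree",
}

# Hypothetical, made-up responses from a sharply divided group.
responses = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

# The arithmetic that the "Means" in K-Means relies on.
centroid = mean(responses)

# What an ordinal-friendly summary looks like: just count the answers.
counts = Counter(responses)

print("Average response:", centroid, "=", LABELS[round(centroid)])
print("People who actually answered 3:", counts[3])
for code in sorted(counts):
    print(f"  {LABELS[code]}: {counts[code]}")
```

The average lands on “neither agree nor disagree”, an answer that nobody in the group gave. The counts show what is really going on: the group is split down the middle.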
Now this wouldn’t be a real problem if it were not for two issues. Firstly, the very nature of “Democratizing AI” means that, in the future, the majority of analysis is going to be carried out by people with no data science training. Secondly, tools like AzureML can’t tell that what you are doing is nonsense, because they don’t have the context of how the data was captured. You feed them numbers and ask them to do arithmetic; they do that arithmetic and give you an answer.
Going forward, the danger will be that untrained people will take that answer and run with it, not having the training to realize that the answer is utterly meaningless in the context of the problem.
To solve this problem, enterprises need to understand that no matter how simple Microsoft make data science appear, to operate it safely you need trained staff. Also, Microsoft and other companies need to think about how they can make these tools “safer”. Something like compilers and linters for data science needs to be developed.
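As a rough idea of what a “linter for data science” might look like, here is a small Python sketch. Everything in it (the `Scale` enum, the `Column` class, the `lint_for_means` function and the example columns) is hypothetical and invented for illustration; it is not part of AzureML or any other real tool. The assumption is that whoever captures the data also records its measurement scale, which is exactly the context the tools lack today.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """Measurement scale of a column, declared when the data is captured."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Scales on which taking an average is meaningful.
MEAN_SAFE = {Scale.INTERVAL, Scale.RATIO}


@dataclass
class Column:
    name: str
    scale: Scale
    values: list


def lint_for_means(columns):
    """Return one warning per column that a mean-based method should not touch."""
    warnings = []
    for col in columns:
        if col.scale not in MEAN_SAFE:
            warnings.append(
                f"{col.name}: {col.scale.value} data, "
                "averaging it (as K-Means does) is not meaningful"
            )
    return warnings


# Hypothetical dataset: one Likert question and one genuinely numeric column.
columns = [
    Column("satisfaction", Scale.ORDINAL, [1, 5, 3]),
    Column("age_years", Scale.RATIO, [34, 51, 27]),
]

warnings = lint_for_means(columns)
for message in warnings:
    print("WARNING:", message)
```

A check like this is trivial once the scale is known. The hard part is getting that one piece of context recorded alongside the numbers in the first place.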
The bottom line is, “Democratizing AI” should mean that LESS training is required to use these tools, not that NO training is required.