In this song about a distant, magical place, not only is the land fair and bright, but it never rains or snows, the wind is never blowing, eggs come out of hens already soft-boiled, work never has to be done, and, among other nice things, people are free to do whatever they want there.
While the song is clearly fictional, there is a Big Rock Candy Mountain of sorts that exists in the minds of many people involved in data analysis. In this “place,” every question about data can be answered quickly, and any outcome that can be imagined can be achieved. The math and underlying theory don’t need to be understood, ethics are of no concern, and computers do all the thinking. This place is known as the Big Rock Machine Learning Mountain.
Admittedly, this post is a little late to hop on the machine learning (ML) criticism bandwagon. The excitement and unrealistic expectations for ML have died down considerably over the last couple of years, making way for more realistic (yet still optimistic) views. But machine learning is still a buzzword that many who work with analytics throw around carelessly. Plenty of these people still believe in the Big Rock Machine Learning Mountain.
To be clear, I’m no hater of ML. The work I’m doing for my doctoral dissertation makes heavy use of some popular machine learning methods. So I’m no unbeliever, but I often groan and roll my eyes when I read about how some company is using ML in an unprecedented way to achieve results that we can barely fathom. Rather than dismissing ML and everybody who is optimistic about it (I’m optimistic about its potential too), this post is meant to restrain some over-hyped assumptions.
Machine learning hype is often fueled by headlines that brag about some amazing new accomplishment. Headlines like “Computer Beats Human Chess Masters” or “Robot Solves Rubik’s Cube in 0.5 Seconds” or “Bots Begin Communicating in Strange New Language; Immediately Shut Down” only stir the pot by increasing excitement and fear about ML. While I made these headlines up, they’re similar to ones I’ve seen over the past year. We see headlines like these and think the machine uprising is upon us. But one thing we often fail to realize is that the Rubik’s Cube-solving robot would not fare very well playing chess or conversing with the bots. Even the chess-playing algorithm would not be able to play other games very well. Many of ML’s great accomplishments involve a very narrow scope. True artificial intelligence (AI), on the other hand, would presumably not be so restricted.
One reason for the success of ML in recent years has been the increase in available computational resources. Some methods, such as neural networks, were known in theory for years before they could be implemented successfully. The increase in computing power has made it possible to use these once-theoretical methods. There seems to be a belief that these methods are all we need to unlock the secrets of our data (and maybe even the universe). Because we can now efficiently use neural networks, support vector machines and so on, the thinking goes, we’ve arrived at Big Rock Candy Mountain.
However, ML methods typically require a large amount of data to be successful. Nowadays, algorithms are being trained using terabytes and even petabytes of data. But the popular ML methods are not the only ones that can benefit from having more data. More traditional methods can often produce results similar to those of the most cutting-edge methods when given the same data sets. That is, a large part of the success of ML is due to the ingredients and not the methods of combining them.
Although ML presently has limitations, it still has, in my opinion, great untapped potential. With an increasing number of researchers and practitioners, the theory underlying ML methods will be better understood, generalizations and extensions will allow for even more practical applications, and these methods may even someday contribute to true artificial intelligence. In the meantime, for those using machine learning in any kind of data analysis, it is important to recognize that there is no Big Rock Machine Learning Mountain. Asking the right questions, thoroughly understanding the data, and combining results from ML algorithms with expert knowledge are all still necessary to get the most out of data.