The first rule of ML is don’t use ML

02 May, 2023

Keep it simple

The first rule of machine learning: start without machine learning. (Does this sound like a bold statement?)

I think it's important to do without ML first. Solve the problem manually, or with heuristics: a few if/else rules that make simple decisions and trigger actions as a result.

Why is that? It forces you to become intimately familiar with the problem and the data, which is the most important first step. Furthermore, a non-ML baseline keeps you honest: any model you build later has to beat it to justify its extra cost.
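To make this concrete, here is a minimal sketch of what such a baseline could look like. Everything in it is made up for illustration: the spam-flagging task, the keyword list, the `Message` type and the five hand-labelled messages are my assumptions, not a real dataset or a recommended rule set.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    is_spam: bool  # hand-labelled ground truth (toy data, invented for this sketch)


# Hypothetical keyword list; in practice you would derive it from reading real data.
SUSPICIOUS_PHRASES = ("free money", "winner", "click here")


def rule_based_is_spam(text: str) -> bool:
    """Plain if/else heuristics, no model involved."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in SUSPICIOUS_PHRASES):
        return True
    if text.isupper() and len(text) > 10:  # long all-caps messages look shouty
        return True
    return False


def accuracy(messages: list[Message]) -> float:
    """Fraction of messages where the rules agree with the label."""
    correct = sum(rule_based_is_spam(m.text) == m.is_spam for m in messages)
    return correct / len(messages)


SAMPLE = [
    Message("Click here to claim your prize", True),
    Message("Lunch at noon?", False),
    Message("YOU ARE A WINNER", True),
    Message("Meeting notes attached", False),
    Message("Did you see who the winner was last night?", False),  # the rules get this one wrong
]

baseline = accuracy(SAMPLE)
print(f"Rule-based baseline accuracy: {baseline:.2f}")
```

The number itself means nothing on five toy messages. The point is the habit: write the rules, measure them, and only then ask whether a model beats that score by enough to be worth owning.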

I’d first try really hard to see if I could solve it without machine learning. I’m all about trying the less glamorous, easy stuff first before moving on to any more complicated solutions.

High cost of ownership

ML systems (I’m thinking neural nets here) are complex to develop and maintain. They require rare expertise and need close monitoring. Their complexity also makes them very error-prone when used in a real-world setting.

Production ML used for critical business tasks should inspire even more caution

Remember Zillow’s “AI-driven” failure? Around November 2021, Zillow’s home price estimation models “failed” in production. As a result, 25% of Zillow employees lost their jobs and Zillow lost 30% of its value in 5 days. Zillow had believed its model quality was high enough to support a new business model: buying and flipping homes using automated ML-based systems. Here’s the workflow: a homeowner comes to the platform. Zillow uses its models to predict two things: a current price the homeowner will accept, and the price of the home in 6 months. If the profit margin is high enough, the platform offers to buy the home at the current predicted price. It obviously didn’t work out well.
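The workflow above boils down to a very small decision rule. Here is a rough sketch of it, assuming a simple relative margin and a made-up threshold; the function name, the parameters and the 5% default are hypothetical, and this is not Zillow's actual system (it ignores renovation, holding and transaction costs entirely).

```python
def should_make_offer(
    predicted_accept_price: float,
    predicted_resale_price: float,
    min_margin: float = 0.05,  # hypothetical threshold, chosen only for illustration
) -> bool:
    """Offer to buy only if the predicted relative margin clears the threshold."""
    if predicted_accept_price <= 0:
        raise ValueError("predicted_accept_price must be positive")
    margin = (predicted_resale_price - predicted_accept_price) / predicted_accept_price
    return margin >= min_margin


# Both inputs are model outputs, so the decision inherits their errors.
print(should_make_offer(300_000, 330_000))  # predicted margin of 10%
print(should_make_offer(300_000, 305_000))  # predicted margin under 2%
```

Notice that both inputs are predictions. If the resale estimate is too optimistic, the rule happily says "buy", and nothing in the rule itself can tell you the estimate was wrong.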

“If it sounds too good to be true, it probably is”

This is a painful example of magical thinking: the dogmatic belief that ML is always the best way to go, and that algorithms are always better than humans. This belief is very common among data science practitioners and even among non-tech executives.

The ML-educated C-level executive

Of course, it all boils down (as usual?) to a techno-political leadership and decision-making problem. Some questions must be asked. Sure, ML is awesome, but is it safe/good to use for my specific use case, given the stakes and risks? Should I approach the real estate market (highly complex, constantly changing and driven by seasoned experts) with an automated system created by a bunch of applied maths graduates who have no business knowledge, and let it make million-dollar transactions with little human supervision?

“When you have a problem, build two solutions - a deep Bayesian transformer running on multicloud Kubernetes and a SQL query built on a stack of egregiously oversimplifying assumptions. Put one on your resume, the other in production. Everyone goes home happy.” - Brandon Bohrer

#AI #english