Feature Engineering secrets
Some say that feature engineering is a thing of the past, since we’ve made the transition to deep learning and GPUs.
I don’t believe that’s true at all for a few reasons:
- Relevant expert knowledge and better data ALWAYS win over more compute/model complexity
- KISS as a fundamental first principle for pretty much all things in engineering (and Life? 🤔)
- Cost of ownership: you need to think about inference cost and speed, maintenance, and how easily the model can evolve. You’ll find more people who can edit a sklearn tree-based model than people who can edit a custom neural net written in an ancient version of PyTorch.
- You just don’t use ☢️ the nuclear option ☢️ when you can keep things simple, easy, nice and cheap. Except for unstructured data (image, video, text), you can do really well with regular ML methods. Proportionate response.
Some wise man once articulated one of the best pieces of data science career advice ever:
“When you have a problem, build two solutions: a deep Bayesian transformer running on multicloud Kubernetes and a SQL query built on a stack of egregiously oversimplifying assumptions. Put one on your resume, the other in production. Everyone goes home happy.”
That’s why I’m always looking for tricks. Check out these cool feature engineering hacks:
- Target encoding, instead of one-hot encoding (careful with leakage)
- The hashing trick instead of learned embeddings, I wrote about it here
- Cyclical variable encoding, a great way of encoding time variables for classic ML methods that, unlike RNNs or time series models, don’t treat data points sequentially.
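To make the cyclical encoding idea concrete, here is a minimal sketch using only the standard library. The helper names (`encode_cyclical`, `encoded_distance`) and the hour-of-day example are mine, chosen for illustration. The idea is to map a periodic value onto the unit circle with a sine/cosine pair, so that 23:00 and 00:00 end up as neighbours instead of 23 units apart.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (hour, weekday, month...) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two values once they are on the circle."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hypothetical example: hour of day, so the period is 24.
for hour in (0, 6, 12, 23):
    hour_sin, hour_cos = encode_cyclical(hour, period=24)
    print(f"{hour:>2}h -> sin={hour_sin:+.3f}, cos={hour_cos:+.3f}")

# 23h is now closer to 0h than 12h is, which the raw integers cannot express.
print(encoded_distance(23, 0, 24) < encoded_distance(12, 0, 24))
```

You need both columns: the sine alone cannot tell 0h from 12h, since both map to roughly zero. One caveat: tree-based models split on one feature at a time, so they can only approximate the circle from the two columns. Whether that beats the raw integer is worth checking on your own data.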

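The two categorical tricks (the hashing trick and target encoding) can also be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not a reference implementation: the function names, the SHA-256 hash, the bucket count, the smoothing value and the interleaved fold assignment are all choices made for the example. In practice you would probably reach for a library such as scikit-learn's `FeatureHasher`. The target encoder computes each row's value from the other folds only, which is one common way to limit the leakage mentioned above.

```python
import hashlib
from collections import defaultdict


def hash_bucket(category, n_buckets=32):
    """Hashing trick: map a category string to a stable bucket index.

    A cryptographic hash is used only because it is stable across runs;
    Python's built-in hash() is salted per process for strings.
    """
    digest = hashlib.sha256(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def out_of_fold_target_encode(categories, targets, n_folds=5, smoothing=10.0):
    """Encode each row with the smoothed target mean of its category,
    computed on the other folds only (requires smoothing > 0).

    Folds are assigned by row index modulo n_folds, which assumes the rows
    are already shuffled.
    """
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        train_sum, train_count = 0.0, 0
        for i in range(n):
            if i % n_folds != fold:  # training rows for this fold
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                train_sum += targets[i]
                train_count += 1
        prior = train_sum / train_count  # mean target of the training rows
        for i in range(fold, n, n_folds):  # held-out rows
            cat = categories[i]
            # Shrink toward the prior: rare or unseen categories get
            # a value close to the overall training mean.
            encoded[i] = (sums.get(cat, 0.0) + smoothing * prior) / (
                counts.get(cat, 0) + smoothing
            )
    return encoded


# Hypothetical toy data: a city column and a binary target.
cities = ["paris", "lyon", "paris", "nice", "lyon", "paris", "nice", "paris"]
converted = [1, 0, 1, 0, 0, 1, 1, 0]

print([hash_bucket(city, n_buckets=8) for city in cities])
print(out_of_fold_target_encode(cities, converted, n_folds=4, smoothing=2.0))
```

The trade-offs differ. Hashing needs no fitting and handles unseen categories for free, at the cost of collisions when two categories share a bucket. Target encoding gives one dense, informative column, but it has to be fitted carefully (out-of-fold or on a separate split) to avoid overfitting.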