Anima Mundi

The Hashing Trick

The hashing trick is a technique used in machine learning to represent high-dimensional data, such as text or categorical features, as fixed-length vectors, by mapping them to a lower dimensional space using a hash function. This allows the data to be processed more efficiently by machine learning algorithms, which often require fixed-length inputs. However, since collisions may occur during the hashing process, the resulting vectors may not be unique, but in practice, the hashing trick is effective and widely used.

It sounds like embedding, except:

This is perfect if:

The rationale behind this technique is quite intricate, and I had to go through many ressources before getting to a good understanding of it. Here’s a pretty good ressource, using text as an example. There’s a scikit-learn function for that (sklearn.feature_extraction.FeatureHasher)

#AI #english