Notes: Curse of Dimensionality

“Curse of Dimensionality”, in the sense of Bellman as well as in machine learning.

Geometry

In $$\mathbb{R}^n$$ where $$n$$ is large,

• The volume of the unit ball goes to 0 as the dimension goes to infinity.
• Most of the volume of an $$n$$-ball is concentrated near its boundary.
• Pick two vectors independently and uniformly at random from the surface of the unit ball. With high probability, the two are nearly orthogonal.
• Johnson-Lindenstrauss lemma: any set of $$m$$ points can be embedded almost isometrically w.r.t. $$L_2$$ into a space of dimension $$O(\varepsilon^{-2} \log m)$$, much lower than the ambient dimension. The $$L_1$$ analogue doesn’t hold. Look up the fast JL transform.
• Distances between randomly drawn points are concentrated: $$\frac{ \max_{x, y} d(x,y) - \min_{x, y} d(x,y) } { \min_{x, y} d(x,y) } \to 0$$, where $$d$$ is the distance induced by an $$L_p$$ norm, $$p \geq 1$$.
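Two of these facts are easy to check numerically. A minimal NumPy sketch (illustrative only; the dimensions and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 100  # ambient dimension, number of sample points

# Near-orthogonality: two independent random unit vectors in R^n.
x = rng.standard_normal(n)
y = rng.standard_normal(n)
cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(f"|cos angle| = {abs(cos):.4f}")  # typically of order 1/sqrt(n)

# Distance concentration: pairwise L2 distances of i.i.d. Gaussian points.
pts = rng.standard_normal((m, n))
sq = (pts ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2 * pts @ pts.T   # squared distance matrix
d = np.sqrt(np.clip(d2[np.triu_indices(m, k=1)], 0, None))
ratio = (d.max() - d.min()) / d.min()
print(f"(max - min) / min = {ratio:.4f}")  # shrinks as n grows
```

Rerunning with small $$n$$ (say 2 or 3) makes the contrast visible: the cosine fluctuates wildly and the distance ratio is large.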

ML

Suppose the data has $$p$$ features,

• Suppose that the features are binary categorical. The cardinality of the feature space is $$2^p$$, growing exponentially with $$p$$. With the size of the training set fixed, the feature space is sampled ever more sparsely, so the model is highly likely to overfit when $$p$$ is large.
• Suppose that the features are continuous and in $$[0,1]$$. Then, for large $$p$$, almost all of the cube’s volume lies outside the ball inscribed in it, so a uniformly drawn point is almost surely outside the ball, near the corners.
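The inscribed-ball claim can be verified by Monte Carlo (a quick sketch, not part of the original notes; sample size and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000  # Monte Carlo samples per dimension

# Fraction of uniform points in [0,1]^p inside the inscribed ball,
# i.e. the ball with center (0.5, ..., 0.5) and radius 0.5.
fracs = {}
for p in (2, 5, 10, 20):
    pts = rng.uniform(size=(m, p))
    fracs[p] = (np.linalg.norm(pts - 0.5, axis=1) <= 0.5).mean()
    print(f"p = {p:2d}: fraction inside ball = {fracs[p]:.6f}")
```

For $$p = 2$$ the fraction is near $$\pi/4 \approx 0.785$$; by $$p = 20$$ it is essentially zero, matching the fact that the unit ball's volume vanishes as the dimension grows.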