The last century has seen tremendous innovation in mathematics. New theories have been postulated and classical theorems have been strengthened by persistent mathematicians, and we are still reaping the benefits of their endeavours as we build intelligent machines.
Here is a list of five theorems which act as cornerstones of standard machine learning models:
The Gauss-Markov Theorem
The theorem was first proved by Carl Friedrich Gauss in 1821 and later rediscovered by Andrey Markov in 1900. The modern formulation of the theorem is due to F. A. Graybill (1976).
Statement: In a linear model whose errors have zero mean, equal (unknown) variance and are uncorrelated, the estimator obtained by the method of least squares has the smallest variance among all linear unbiased estimators of the model's parameters, regardless of the form of the error distribution.
Application: Linear Regression models
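As a minimal illustration, the sketch below generates data from a linear model satisfying the Gauss-Markov assumptions and recovers the coefficients by ordinary least squares with NumPy; the coefficients, noise scale and sample size are assumptions made purely for the example.

```python
import numpy as np

# Synthetic linear model y = X beta + eps with zero-mean, equal-variance,
# uncorrelated errors (the Gauss-Markov assumptions); the true coefficients
# and noise scale below are arbitrary choices for this example.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 features
beta = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta + rng.normal(scale=0.1, size=n)

# Ordinary least squares, the minimum-variance linear unbiased estimator
# under the assumptions above: beta_hat = argmin_b ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the true coefficients
```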
Universal Approximation Theorem
Statement: A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function.
Application: Artificial neural networks
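A small sketch of the theorem in action, assuming scikit-learn's MLPRegressor and sin on the compact interval [-π, π] as the target function: a single hidden layer of tanh units is fitted and its worst-case error over the interval is checked. The hidden-layer width and training settings are illustrative choices, not part of the theorem.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# A single hidden layer with a finite number of tanh units approximating a
# continuous function (sin) on a compact interval.
rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y = np.sin(X).ravel()

net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X, y)

# Sup-norm error on a dense grid of the compact set [-pi, pi]
X_test = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
print(np.max(np.abs(net.predict(X_test) - np.sin(X_test).ravel())))
```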
Singular Value Decomposition
It generalises the eigendecomposition of a symmetric matrix with non-negative eigenvalues to any m × n matrix, and is closely related to the polar decomposition.
Statement: Suppose M is an m × n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorisation, called a ‘singular value decomposition’ of M, of the form

M = UΣV∗

where
- U is an m × m unitary matrix over K (when K is the field of real numbers, unitary matrices are orthogonal matrices),
- Σ is an m × n rectangular diagonal matrix whose diagonal entries are non-negative real numbers (the singular values of M),
- V is an n × n unitary matrix over K, and V∗ is the conjugate transpose of V.
Application: Principal Component Analysis
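A minimal NumPy sketch of the factorisation and its link to PCA; the random matrix below is an assumption made purely for illustration.

```python
import numpy as np

# SVD of an arbitrary m x n real matrix: M = U @ Sigma @ V*, with U and V
# unitary (orthogonal, since K = R here) and Sigma rectangular diagonal with
# non-negative entries.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=True)   # Vt is V* (here, V transpose)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(M, U @ Sigma @ Vt))             # True: the factorisation reconstructs M

# PCA connection: for a column-centred data matrix, the rows of Vt are the
# principal axes and s**2 / (m - 1) the variances explained along them.
Xc = M - M.mean(axis=0)
_, s_c, Vt_c = np.linalg.svd(Xc, full_matrices=False)
print(Vt_c, s_c**2 / (M.shape[0] - 1))
```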
Mercer’s Theorem
Presented by James Mercer in 1909, this theorem represents a symmetric positive-definite function on a square as a convergent sum of products of functions.
Statement: Suppose K is a continuous symmetric non-negative definite kernel on [a, b] × [a, b]. Then there is an orthonormal basis {e_i} of L^2[a, b] consisting of eigenfunctions of the associated integral operator such that the corresponding sequence of eigenvalues {λ_i} is non-negative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b], and K has the representation

K(s, t) = ∑_{j=1}^∞ λ_j e_j(s) e_j(t)

where the convergence is absolute and uniform.
Application: Support Vector Machines.
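The finite-sample picture can be checked numerically: for a Gram matrix built from a continuous symmetric non-negative definite kernel, the eigenvalues are non-negative and the eigen-expansion reproduces the matrix. The RBF kernel and the sample points below are assumptions made for this sketch.

```python
import numpy as np

# Finite-sample analogue of Mercer's theorem: the Gram matrix of a continuous,
# symmetric, non-negative definite kernel (an RBF kernel, chosen for illustration)
# has non-negative eigenvalues and is recovered exactly by its eigen-expansion,
# mirroring K(s, t) = sum_j lambda_j e_j(s) e_j(t).
def rbf_kernel(s, t, gamma=1.0):
    return np.exp(-gamma * (s - t) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
K = rbf_kernel(x[:, None], x[None, :])        # 30 x 30 Gram matrix

eigvals, eigvecs = np.linalg.eigh(K)          # eigh: K is symmetric
print(eigvals.min() >= -1e-10)                # True: no (numerically) negative eigenvalues

K_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T
print(np.allclose(K, K_rebuilt))              # True: eigen-expansion reproduces K
```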
Representer Theorem
Statement: Among all functions in H, which by Mercer’s theorem may admit an infinite representation in terms of eigenfunctions, the minimiser f∗ of the regularised risk always has a finite representation in the basis formed by the kernel evaluated at the n training points:

f∗(·) = ∑_{i=1}^n α_i k(·, x_i)

where H is the reproducing kernel Hilbert space, k is its reproducing kernel and the α_i are real coefficients.
Application: Kernel methods (a class of algorithms for pattern analysis, including Support Vector Machines)
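A minimal kernel ridge regression sketch that exhibits this finite representation, assuming an RBF kernel, a small regularisation constant and toy sine data; none of these choices come from the theorem itself.

```python
import numpy as np

# Kernel ridge regression as an illustration of the representer theorem: the
# minimiser of the regularised risk can be written as a finite combination
# f(x) = sum_i alpha_i k(x, x_i) over the n training points.
def rbf_kernel(a, b, gamma=10.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=40)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=40)

lam = 1e-3
K = rbf_kernel(x_train, x_train)                                   # n x n Gram matrix
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)   # coefficients alpha_i

# Predictions only ever need the kernel evaluated against the training points.
x_test = np.linspace(0.0, 1.0, 5)
print(rbf_kernel(x_test, x_train) @ alpha)
```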