Linear algebra


Another term for the conjugate transpose. Identical to the transpose if the matrix is real.

Affine combination

A linear combination of vectors where the weights sum to 1. Unlike a convex combination, the weights can be negative.

Condition number

The condition number of a matrix A is defined as:

\kappa(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}

where \sigma_{\max}(A) and \sigma_{\min}(A) are the largest and smallest singular values of A respectively.

If \kappa(A) is high, the matrix A is said to be ill-conditioned. Conversely, if the condition number is very low (ie close to 0) we say A is well-conditioned.

Since singular values are always non-negative, condition numbers are also always non-negative.

Conjugate transpose

The matrix obtained by taking the transpose followed by the complex conjugate of each entry.

Dot product

The dot product for two vectors a and b is:

a \cdot b = \sum_{i=1}^n a_i b_i

Eigenvalues and eigenvectors

Let A be a square matrix. Then the eigenvalues and eigenvectors of the matrix are the vectors v and scalars \lambda respectively that satisfy the equation:

Av = \lambda v


The trace of A is the sum of its eigenvalues:

\text{tr}(A) = \sum_i \lambda_i

The determinant of A is the product of its eigenvalues.

\text{det}(A) = \prod_i \lambda_i

Gaussian elimination

An algorithm for solving SLEs that iteratively transforms the matrix into an upper triangular one in row echelon form.

Hadamard product

Synonymous with elementwise-multiplication.


The inverse of a matrix A is written as A^{-1}.

A matrix A is invertible if and only if there exists a matrix B such that AB = BA = I.

The inverse can be found using:

  • Gaussian elimination
  • LU decomposition
  • Gauss-Jordan elimination

Matrix decomposition

Also known as matrix factorization.

Cholesky decomposition

A = LL^*

where A is Hermitian and positive-definite, L is lower-triangular and L^* is its conjugate transpose. Can be used for solving SLEs.


A = Q \Lambda Q^*

Where the columns of Q are the eigenvectors of A. \Lambda is a diagonal matrix in which \Lambda_{ii} is the i’th eigenvalue of A.

LU decomposition

A = LU, where L is lower triangular and U is upper triangular. Can be used to solve SLEs.

Polar decomposition

A = UP

where U is unitary and P is positive semi-definite and Hermitian.

QR decomposition

Decomposes a real square matrix A such that A = QR. Q is an orthogonal matrix and R is upper triangular.

Singular value decomposition (SVD)

Let A \in \mathbb{R}^{m \times n} be the matrix to be decomposed. SVD is:

A = U\Sigma V^*

where U \in \mathbb{R}^{m \times m} is a unitary matrix, \Sigma \in \mathbb{R}^{m \times n} is a rectangular diagonal matrix containing the singular values and V \in \mathbb{R}^{n \times n} is a unitary matrix.

Can be used for computing the sum of squares or the pseudoinverse.

Orthonormal vectors

Two vectors are orthonormal if they are orthogonal and both unit vectors.

Outer product

The outer product of two column vectors x and y is:

A = xy^T

Principal Component Analysis (PCA)

Decomposes a matrix X \in \mathbb{R}^{n \times m} into a set of k orthogonal vectors. The matrix X represents a dataset with n examples and m features.

Method for PCA via eigendecomposition:

  1. Center the data by subtracting the mean for each dimension.
  2. Compute the covariance matrix on the centered data C = (X^TX)/(n-1).
  3. Do eigendecomposition of the covariance matrix to get C = Q \Lambda Q^*.
  4. Take the k largest eigenvalues and their associated eigenvectors. These eigenvectors are the ‘principal components’.
  5. Construct the new matrix from the principal components by multiplying the centered X by the truncated Q.

PCA can also be done via SVD.


Matrix rank

The number of linearly independent columns.

Tensor rank

When the term is applied to tensors, the rank refers to the dimensionality: * Rank 0 is a scalar * Rank 1 is a vector * Rank 2 is a matrix etc.

Singular values

For a matrix A the singular values are the set of numbers:

\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_n \geq 0

where \sigma_i = \sqrt{\lambda_i} and \lambda_i is an eigenvalue of the matrix A^{T}A.


The span of a matrix is the set of all points that can be obtained as a linear combination of the vectors in the matrix.

Spectral norm

The maximum singular value of a matrix.

Spectral radius

The maximum of the magnitudes of the eigenvalues.


The set of eigenvalues of a matrix.

System of Linear Equations (SLE)

A set of n linear equations using a common set of m variables. For example:

3x_0 + 4x_1 = 5

-2x_0 + x_1 = 11

In matrix form an SLE can be written as:

Ax = b

Where x is the vector of unknowns to be determined, A is a matrix of the coefficients from the left-hand side and the vector b contains the numbers from the right-hand side of the equations.

Systems of linear equations can be solved in many ways. Gaussian elimination is one.

Underdetermined and overdetermined systems

  • If the number of variables exceeds the number of equations the system is underdetermined.
  • If the number of variables is less than the number of equations the system is overdetermined.


The sum of the elements along the main diagonal of a square matrix.

\text{tr}(A) = \sum_{i=1}^n A_{ii}

Satisfies the following properties:

\text{tr}(A) = \text{tr}(A^T)

\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)

\text{tr}(cA) = c\text{tr}(A)


(A^T)_{ij} = A_{ji}

Satisfies the following properties:

(A+B)^T = A^T + B^T

(AB)^T = B^TA^T

(A^T)^{-1} = (A^{-1})^T

Types of matrix

This table summarises the relationship between types of real and complex matrices. The concept in the complex column is the same as the concept in the same row of the real column if the matrix is real-valued.

Real Complex
Symettric Hermitian
Orthogonal Unitary
Transpose Conjugate transpose


A matrix that is not invertible.

Diagonal matrix

A matrix where A_{ij} = 0 if i \neq j.

Can be written as \text{diag}(a) where a is a vector of values specifying the diagonal entries.

Diagonal matrices have the following properties:

\text{diag}(a) + \text{diag}(b) = \text{diag}(a + b)

\text{diag}(a) \cdot \text{diag}(b) = \text{diag}(a * b)

\text{diag}(a)^{-1} = \text{diag}(a_1^{-1},...,a_n^{-1})

\text{det}(\text{diag}(a)) = \prod_i{a_i}

The eigenvalues of a diagonal matrix are the set of its values on the diagonal.

Hermitian matrix

The complex equivalent of a symmetric matrix. A = A^*, where * represents the conjugate transpose.

Also known as a self-adjoint matrix.

Normal matrix

A^*A = AA^* where A^* is the conjugate transpose of A.

Orthogonal matrix

A^TA = AA^T = I

Positive and negative (semi-)definite

A matrix A \in \mathbb{R}^{n \times n} is positive definite if:

z^TAz > 0, \forall z \in \mathbb{R}^n, z \neq 0

Positive semi-definite matrices are defined analogously, except with z^TAz \geq 0

Negative definite and negative semi-definite matrices are the same but with the inequality round the other way.

Singular matrix

A square matrix which is not invertible. A matrix is singular if and only if the determinant is zero.

Symmetric matrix

A square matrix A where A = A^T.

Some properties of symmetric matrices are:

  • All the eigenvalues of the matrix are real.

Triangular matrix

Either a lower triangular or an upper triangular matrix.

Lower triangular matrix

A square matrix where only the lower triangle is not composed of zeros. Formally:

A_{ij} = 0, \text{if} i < j

Upper triangular matrix

A square matrix where only the upper triangle is not composed of zeros. Formally:

A_{ij} = 0, \text{if} i \geq j

Unitary matrix

A matrix where its inverse is the same as its complex conjugate. The complex version of an orthogonal matrix.

A^*A = AA^* = I


Like PCA, ZCA converts the data to have zero mean and an identity covariance matrix. Unlike PCA, it does not reduce the dimensionality of the data and tries to create a whitened version that is minimally different from the original.