
Linear Algebra

Vector Spaces

Subspace

  • Definition: omitted here for brevity; informally, a subspace of a vector space \(\mathbf V\) is a subset of \(\mathbf V\) that is itself a vector space under the same operations (it contains \(\mathbf 0\) and is closed under addition and scalar multiplication).

Say a vector space \(\mathbf V = \text{Span}\set{\mathbf{v_1}, \mathbf{v_2}, \mathbf{v_3}}\),

then \(\mathbf H=\text{Span}\set{\mathbf{v_1}, \mathbf{v_2}},\ \text{Span}\set{\mathbf{v_2}, \mathbf{v_3}}, \dots\) are subspaces of \(\mathbf V\).

Thus it is intuitive that every subspace \(\mathbf H\) has the property \(\mathbf 0 \in \mathbf H\), since a span always contains the zero vector.


Null Space

  • Definition
    for the homogeneous equation
\[ \begin{gather} \mathbf {Ax}= \mathbf 0 \\\\ \text{Nul } \mathbf A = \set {\mathbf x \in ℝ^{n} \mid \mathbf {Ax=0}} \end{gather} \]
Example
\[ \begin{gather} A= \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix} \qquad u= \begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix} \\\\ \mathbf {Au}= \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{gather} \]

Thus \(\mathbf u\) is in \(\text{Nul }\mathbf A\).
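A quick numerical check of this example (a minimal NumPy sketch; the matrix and vector are the ones above):

```python
import numpy as np

# The matrix A and vector u from the example above.
A = np.array([[1, -3, -2],
              [-5, 9, 1]])
u = np.array([5, 3, -2])

# u is in Nul A exactly when A @ u is the zero vector.
print(A @ u)                  # [0 0]
print(np.allclose(A @ u, 0))  # True, so u is in Nul A
```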

Properties

  1. \(\text{Nul }\mathbf A\) is exactly the solution set of the homogeneous equation \(\mathbf {Ax=0}\)

  2. for the matrix \(\mathbf A_{m\times n}\), \(\text{Nul }\mathbf A\) is a subspace of \(ℝ^{n}\)
    since \(\mathbf {Ax}\) is defined only when \(\mathbf x\) has \(n\) entries, every element of \(\text{Nul }\mathbf A\) can be written as

\[ \begin{gather} \mathbf x= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \end{gather} \]

so \(\text{Nul }\mathbf A \subseteq ℝ^{n}\); moreover it contains \(\mathbf 0\) and is closed under addition and scalar multiplication, since \(\mathbf A(\mathbf u+\mathbf v)=\mathbf{Au}+\mathbf{Av}=\mathbf 0\) and \(\mathbf A(c\,\mathbf u)=c\,\mathbf{Au}=\mathbf 0\). Hence it is a subspace of \(ℝ^{n}\).


Column Space

  • Definition: For an \(m\times n\) matrix \(\mathbf A=[a_1, a_2, \dots, a_n]\)
\[ \begin{gather} \text{Col }\mathbf A= \text{Span}\set{a_1, a_2, \dots, a_n} \\\\ a_i \in ℝ^{m} \end{gather} \]

or equivalently,

\[ \begin{gather} \text{Col }\mathbf A=\set{\mathbf {Ax}|\mathbf x \in ℝ^{n}} \end{gather} \]

Properties

  • \(\text{Col }\mathbf A\) is a subspace of \(ℝ^{m}\)
  • \(\text{Col }\mathbf A\) is the range of the linear transformation \(\mathbf x \mapsto \mathbf {Ax}\)
  • \(\text{rank }\mathbf A = \text{dim}(\text{Col } \mathbf A)\)
  • \(\text{rank }\mathbf A+\text{dim}(\text{Nul }\mathbf A)=n\)
    this is intuitive: each dimension "lost" in \(\text{Col }\mathbf A\) (a dependent column) gives one free variable, i.e. one extra dimension in the solution space.
    e.g. \(3\) independent equations in \(ℝ^{3}\) can determine a unique solution, whereas \(2\) equations leave at least a line (one dimension) of solutions; a quick numerical check follows below.
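A numerical check of the rank–nullity relation (a minimal sketch; it assumes NumPy and SciPy are available and reuses the \(2\times 3\) matrix from the null-space example, so \(n=3\)):

```python
import numpy as np
from scipy.linalg import null_space  # returns an orthonormal basis of Nul A

A = np.array([[1, -3, -2],
              [-5, 9, 1]])           # 2x3, so n = 3

rank = np.linalg.matrix_rank(A)      # dim(Col A)
dim_nul = null_space(A).shape[1]     # dim(Nul A) = number of basis vectors

print(rank, dim_nul, rank + dim_nul) # 2 1 3  ->  rank A + dim(Nul A) = n
```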

Characteristic Equation

Eigenvector & Eigenvalue

  • Definition
\[ \begin{gather} \mathbf {Ax}= \lambda \mathbf x \\\\ \Updownarrow \\\\ \mathbf {(A-\lambda I)\,x}= 0 \end{gather} \]

for an \(n\times n\) matrix \(\mathbf A\), a nonzero vector \(\mathbf x\) satisfying this is an eigenvector of \(\mathbf A\) with eigenvalue \(\lambda\). A nonzero solution exists iff \(\det(\mathbf A-\lambda \mathbf I)=0\), which is the characteristic equation.

note:

  1. \(\mathbf 0\) cannot be an eigenvector (eigenvectors are required to be nonzero by definition; otherwise every \(\lambda\) would satisfy the equation trivially)
  2. 0 can be an eigenvalue.
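A small numerical illustration of the definition (a minimal NumPy sketch; the matrix is an arbitrary choice, not from the text):

```python
import numpy as np

# An arbitrary 2x2 example with eigenvalues 3 and 1.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
vals, vecs = np.linalg.eig(A)

for lam, x in zip(vals, vecs.T):
    # Each pair satisfies A x = lambda x, i.e. (A - lambda I) x = 0 with x != 0.
    print(lam, np.allclose(A @ x, lam * x))  # 3.0 True / 1.0 True
```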

Orthogonality

Orthogonal Complements

  1. A vector \(\mathbf x\) is in \(\mathbf {W^{\bot}}\) iff \(\mathbf x\) is orthogonal to every vector in \(\mathbf W\)
  2. \((\text{Row }\mathbf {A})^{\bot}=\text{Nul }\mathbf A\)
  3. \((\text{Col }\mathbf {A})^{\bot}=\text{Nul }\mathbf {A^{T}}\)
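A quick check of properties 2 and 3 above (a minimal sketch assuming NumPy and SciPy; the rank-1 matrix is just an illustration):

```python
import numpy as np
from scipy.linalg import null_space

# A rank-1 matrix, so both Nul A and Nul A^T are nontrivial.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# (Row A)^perp = Nul A: every null-space vector is orthogonal to every row.
N = null_space(A)                # columns form a basis of Nul A
print(np.allclose(A @ N, 0))     # True

# (Col A)^perp = Nul A^T: every vector of Nul A^T is orthogonal to every column.
M = null_space(A.T)              # columns form a basis of Nul A^T
print(np.allclose(A.T @ M, 0))   # True
```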

Orthogonal Projection

  • Given a vector \(\mathbf y\) and a subspace \(\mathbf W\) in \(ℝ^{n}\), expand \(\mathbf y\) in a basis \(\{\mathbf u_1, \dots, \mathbf u_n\}\) of \(ℝ^{n}\):
\[ \begin{align} \mathbf y &= c_1 \mathbf {u_1} + c_2 \mathbf {u_2} + \dots + c_n \mathbf {u_n} \\\\ &= \begin{bmatrix} \mathbf {u_1} & \mathbf {u_2} & \cdots & \mathbf {u_n} \\ \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \end{align} \]

Let's say \(\mathbf W = \text{Span }\{\mathbf u_1, \mathbf u_2, \dots, \mathbf u_p\}\) (\(p \le n\)), where \(\{\mathbf u_1, \dots, \mathbf u_p\}\) is an orthogonal basis of \(\mathbf W\);

then the orthogonal projection of \(\mathbf y\) onto \(\mathbf W\), written \(\hat {\mathbf y}\) (a.k.a. \(\text{proj}_{\mathbf W}\mathbf y\)), is

\[ \begin{gather} \hat {\mathbf y} = \frac{\mathbf {y \cdot u_1}}{||\mathbf {u_1}||^{2}}\mathbf u_1 + \frac{\mathbf {y \cdot u_2}}{||\mathbf {u_2}||^{2}}\mathbf u_2 + \dots+ \frac{\mathbf {y \cdot u_p}}{||\mathbf {u_p}||^{2}}\mathbf {u_p} \end{gather} \]

now assume \(\{\mathbf u_1, \mathbf u_2, \dots, \mathbf u_p\}\) is an orthonormal basis, so \(||\mathbf u_i||=1\). We can thus further write

\[ \begin{align} \hat {\mathbf y} &= \frac{\mathbf {y \cdot u_1}}{||\mathbf {u_1}||^{2}}\mathbf u_1 + \frac{\mathbf {y \cdot u_2}}{||\mathbf {u_2}||^{2}}\mathbf u_2 + \dots+ \frac{\mathbf {y \cdot u_p}}{||\mathbf {u_p}||^{2}}\mathbf {u_p} \\\\ &= (\mathbf {y \cdot u_1})\mathbf {u_1} + (\mathbf {y \cdot u_2})\mathbf {u_2} + \dots + (\mathbf {y \cdot u_p})\mathbf {u_p} \\\\ &= \begin{bmatrix} \mathbf {u_1} & \mathbf {u_2} & \cdots & \mathbf {u_p} \\ \end{bmatrix} \begin{bmatrix} \mathbf {y \cdot u_1} \\ \mathbf {y \cdot u_2} \\ \vdots \\ \mathbf {y \cdot u_p} \\ \end{bmatrix} \\\\ &= \begin{bmatrix} \mathbf {u_1} & \mathbf {u_2} & \cdots & \mathbf {u_p} \\ \end{bmatrix} \begin{bmatrix} \mathbf {u_1^{T}} \\ \mathbf {u_2^{T}} \\ \vdots \\ \mathbf {u_p^{T}} \\ \end{bmatrix} \mathbf y \\\\ &= \mathbf {UU^{T}y} \end{align} \]

Thus, we can conclude that for an \(n \times p\) matrix \(\mathbf U\) whose columns form an orthonormal basis of the \(p\)-dimensional subspace \(\mathbf W\) of \(ℝ^{n}\), the matrix \(\mathbf {UU^{T}}\) projects any \(\mathbf y \in ℝ^{n}\) onto \(\mathbf W\).

\[ \begin{gather} \mathbf {UU^{T}y}=\text{proj}_{\text{W}}\,\mathbf y \end{gather} \]
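A minimal NumPy sketch of this result (the subspace and the vector are arbitrary illustrations):

```python
import numpy as np

# Build an orthonormal basis U for a 2-D subspace W of R^3
# by taking the QR factorization of two independent vectors.
B = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U, _ = np.linalg.qr(B)           # columns of U: orthonormal basis of W = Col B

y = np.array([3.0, 4.0, 5.0])
y_hat = U @ U.T @ y              # proj_W y = U U^T y

print(y_hat)
print(np.allclose(U.T @ (y - y_hat), 0))  # True: the residual is orthogonal to W
```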

Diagonalization

Diagonal Matrix

\(\mathbf D\) is said to be a diagonal matrix if \(\mathbf D\) has the form

\[ \begin{gather} \mathbf D= \begin{bmatrix} a & 0 & 0 & \cdots \\ 0 & b & 0 & \cdots \\ 0 & 0 & c & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \end{gather} \]

thus \(\mathbf D\) has the property

\[ \begin{gather} \mathbf D^{n}= \begin{bmatrix} a^{n} & 0 & 0 & \cdots \\ 0 & b^{n} & 0 & \cdots \\ 0 & 0 & c^{n} & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \end{gather} \]
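A one-line check of this power property (a minimal NumPy sketch with arbitrary diagonal entries):

```python
import numpy as np

D = np.diag([2.0, 3.0, 5.0])

# D^n just raises each diagonal entry to the n-th power.
print(np.allclose(np.linalg.matrix_power(D, 4),
                  np.diag([2.0**4, 3.0**4, 5.0**4])))  # True
```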

Diagonalization

Square matrix \(\mathbf A\) is said to be diagonalizable if

  1. \(\mathbf {A=PDP^{-1}}\)
  2. where \(\mathbf {D}\) is a diagonal matrix
  3. and \(\mathbf P\) is an invertible matrix.

The Diagonalization Theorem

An \(n \times n\) matrix \(\mathbf A\) is diagonalizable iff \(\mathbf A\) has \(n\) linearly independent eigenvectors (the corresponding eigenvalues may be repeated).

Easy Proof

Let's say we have an invertible matrix \(\mathbf P_{n\times n}\) whose columns are eigenvectors of \(\mathbf A\), and a diagonal matrix \(\mathbf D\) of the corresponding eigenvalues

\[ \begin{gather} \mathbf P = [\mathbf {v_1, v_2, \dots , v_n}] \qquad D = \begin{bmatrix} \lambda_1 & 0 & \\ 0 & \lambda_2 & \\ & & \ddots \\ & & &\lambda_n \end{bmatrix} \\\\ \mathbf {AP}=\mathbf{[Av_1, Av_2, \dots]} \\\\ \mathbf {PD}=[\mathbf {v_1, v_2, \dots , v_n}] \begin{bmatrix} \lambda_1 & 0 & \\ 0 & \lambda_2 & \\ & & \ddots \\ & & &\lambda_n \end{bmatrix} =[\mathbf {\lambda_1v_1,\,\lambda_2v_2,\,\dots}] \end{gather} \]

since \(\mathbf {Av_i}=\lambda_i \mathbf {v_i}\) by the definition of eigenvectors and eigenvalues, the two products above are equal column by column:

\[ \begin{gather} \mathbf {AP=PD} \\\\ \implies \mathbf {A=PDP^{-1}} \end{gather} \]
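A minimal NumPy sketch of this factorization (the matrix is an arbitrary diagonalizable example):

```python
import numpy as np

# An arbitrary matrix with two distinct eigenvalues (5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

vals, P = np.linalg.eig(A)   # columns of P are eigenvectors, vals the eigenvalues
D = np.diag(vals)

# A = P D P^{-1}
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```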

Orthogonally Diagonalizable

  • Definition
    A matrix is said to be orthogonally diagonalizable if there is an orthogonal matrix \(\mathbf P\) (\(\mathbf {P^{T}=P^{-1}}\)) and a diagonal matrix \(\mathbf D\) such that
\[ \begin{gather} \mathbf {A=PDP^{T}=PDP^{-1}} \\\\ \Updownarrow \\\\ \mathbf {A^{T}=P^{TT}D^{T}P^{T}=PDP^{T}=A} \end{gather} \]
  • This is possible iff \(\mathbf A\) is a symmetric matrix, as the equivalence above suggests (\(\mathbf {A^{T}=A}\)).
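A minimal NumPy sketch of orthogonal diagonalization for a symmetric matrix (the matrix is an arbitrary choice):

```python
import numpy as np

# An arbitrary symmetric matrix (eigenvalues 7 and 2).
A = np.array([[6.0, -2.0],
              [-2.0, 3.0]])

# For a symmetric matrix, eigh returns real eigenvalues and an orthogonal
# eigenvector matrix P, i.e. P^T = P^{-1}.
vals, P = np.linalg.eigh(A)
D = np.diag(vals)

print(np.allclose(P.T, np.linalg.inv(P)))  # True: P is orthogonal
print(np.allclose(A, P @ D @ P.T))         # True: A = P D P^T
```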

SVD

  • Singular Value Decomposition

Singular Values

Let \(\mathbf A\) be an \(m\times n\) matrix. Then

\[ \begin{gather} ({\mathbf {A^{T}A}})^{\mathbf T}=\mathbf {A^{T}A^{TT}}={\mathbf {A^{T}A}} \end{gather} \]

Thus, \(\mathbf {A^{T}A}\) is symmetric and orthogonally diagonalizable.

Then, let \(\{\mathbf {v_1, v_2, \dots, v_n}\}\) be an orthonormal basis for \(ℝ^{n}\) consisting of eigenvectors of \(\mathbf {A^T A}\),

and let \(\lambda_1, \lambda_2, \dots, \lambda_n\) be the corresponding eigenvalues of \(\mathbf {A^{T}A}\),
ordered so that \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0\) (they are nonnegative since \(\lambda_i = \mathbf {v_i^{T}A^{T}Av_i}=||\mathbf {Av_i}||^{2}\)).

The singular values \(\sigma_i\) of \(\mathbf A\) are

\[ \begin{gather} \sigma_i = \sqrt{\lambda_i} \end{gather} \]

Thinking


Singular values are the "eigenvalues" of non-square matrix \(\mathbf A\).

Properties
  1. \(\{\mathbf {Av_1, Av_2, \dots, Av_n}\}\) is an orthogonal set.
    for any \(i\neq j\), \(\mathbf {(Av_i)^{T}(Av_j)=v_i^{T} A^{T}Av_j}=\lambda_j\, \mathbf {v_i^{T} v_j}=0\)
  2. \(\{\mathbf {Av_1, Av_2, \dots, Av_r}\}\) is an orthogonal basis of \(\text{Col }\mathbf A\), where \(r = \text{rank }\mathbf A = \text{dim}(\text{Col } \mathbf A)\).
    say \(\mathbf {y = Ax}\) is any vector in \(\text{Col }\mathbf A\); expand \(\mathbf x\) in the basis \(\{\mathbf {v_i}\}\) and note \(||\mathbf {Av_i}|| = \sigma_i = 0\) for \(i > r\), so those terms vanish:
\[ \begin{gather} \mathbf x = c_1 \mathbf v_1 + c_2 \mathbf v_2 + \dots + c_r \mathbf v_r + \dots + c_n \mathbf v_n \\\\ \Downarrow \\\\ \begin{aligned} \mathbf y &=\mathbf {Ax} \\\\ &= c_1 \mathbf {Av_1} + c_2 \mathbf {Av_2} + \dots + c_r \mathbf {Av_r} \end{aligned} \end{gather} \]
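A minimal NumPy sketch relating \(\sigma_i=\sqrt{\lambda_i}\) to the built-in SVD (the \(2\times 3\) matrix is an arbitrary illustration):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues of A^T A, sorted in decreasing order (clip guards against
# tiny negative values caused by round-off).
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigma_from_eig = np.sqrt(np.clip(lam, 0, None))

# Compare with the singular values reported by NumPy.
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma_from_eig)  # ~ [18.97  9.49  0.  ]
print(sigma)           # ~ [18.97  9.49]
```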

Singular Value Decomposition

  • \(m\times n\) "diagonal" matrix, where \(\mathbf D_{r\times r} = \text{diag}(\sigma_1, \dots, \sigma_r)\) holds the nonzero singular values
\[ \begin{gather} \mathbf \Sigma_{m\times n}= \begin{bmatrix} \mathbf D_{r\times r} & \mathbf 0_{r\times (n-r)} \\ \mathbf 0_{(m-r)\times r} & \mathbf 0_{(m-r)\times (n-r)} \\ \end{bmatrix} \end{gather} \]

e.g. \(\mathbf A_{2\times 3}\) with \(r=2\)

\[ \begin{gather} \mathbf \Sigma= \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ \end{bmatrix} \end{gather} \]

  • singular value decomposition
\[ \begin{gather} \mathbf A_{m\times n}= \mathbf U_{m\times m} \mathbf \Sigma_{m\times n} {\mathbf V_{n\times n}}^{\mathbf T} \end{gather} \]
  • Left singular vectors of \(A\,\,\) : The columns of \(\mathbf U\)
  • Right singular vectors of \(A\) : The columns of \(\mathbf V\)

Easy Proof

First, following property 2 of the singular values, we can obtain an orthonormal basis \(\{\mathbf {u_1, u_2, \dots, u_r}\}\) of \(\text{Col }\mathbf A\) (extended, if \(r<m\), to an orthonormal basis \(\{\mathbf {u_1, \dots, u_m}\}\) of \(ℝ^{m}\)) by

\[ \begin{gather} \mathbf {u_i} = \mathbf {\frac{Av_i}{||Av_i||}} = \frac{\mathbf {Av_i}}{\sigma_i} \end{gather} \]

then

\[ \begin{gather} \begin{aligned} \mathbf {U\Sigma} &= \begin{bmatrix} \mathbf {u_1} & \mathbf {u_2} & \cdots & \mathbf {u_m} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots \\ 0 & \sigma_2 & \cdots \\ & & \ddots \end{bmatrix}_{m\times n} \\\\ &= \begin{bmatrix} \sigma_1 \mathbf u_1 & \sigma_2 \mathbf u_2 & \cdots & \sigma_r \mathbf u_r & \mathbf 0 & \cdots & \mathbf 0 \end{bmatrix} \\\\\\ \mathbf {AV} &= \begin{bmatrix} \mathbf {Av_1} & \mathbf {Av_2} & \cdots & \mathbf {Av_n} \end{bmatrix} \\\\ &= \begin{bmatrix} \sigma_1 \mathbf u_1 & \sigma_2 \mathbf u_2 & \cdots & \sigma_r \mathbf u_r & \mathbf 0 & \cdots & \mathbf 0 \end{bmatrix} \end{aligned} \\\\\\ \implies \mathbf {U\Sigma=AV} \implies \mathbf {U\Sigma V^{T}=A} \end{gather} \]
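A minimal NumPy sketch of the full decomposition \(\mathbf {A=U\Sigma V^{T}}\) (reusing the same illustrative matrix):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Full SVD: U is m x m, Vt is n x n, s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n "diagonal" Sigma and check the factorization.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```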

Least-Squares Solution

  • Target
    for \(m \times n\) matrix \(\mathbf A\)
\[ \begin{gather} \mathbf A= \begin{bmatrix} \mathbf {a_1} & \mathbf {a_2} & \cdots &\mathbf {a_n} \end{bmatrix} \\\\ \mathbf {Ax=b} \end{gather} \]

solving \(\mathbf {Ax=b}\) means looking for weights \(\mathbf x\) that write \(\mathbf b\) as a linear combination of the columns of \(\mathbf A\).

Most of the time (e.g. when \(\mathbf b \notin \text{Col }\mathbf A\)) no exact solution exists. Thus, we try to find the least-squares solution \(\mathbf {\hat x}\).

Find

\[ \begin{gather} \mathbf {\hat x} = \mathop{\arg\min}_{\mathbf {x}}{\,||\mathbf {b-Ax}||}^{2} \end{gather} \]

First, let us define

\[ \begin{gather} \mathbf {p = A\hat{x} \in \text{Col }A} \end{gather} \]

We know that the minimum is attained when \(\mathbf {(b-p)}\) is orthogonal to \(\text{Col }\mathbf A\), i.e. when \(\mathbf p\) is the orthogonal projection of \(\mathbf b\) onto \(\text{Col }\mathbf A\).

Thus say the projection matrix \(\mathbf P\) s.t.

\[\mathbf {P b=p}\]

Back to the orthogonality condition: \(\mathbf {(b-p)}\) is orthogonal to every vector in \(\text{Col }\mathbf A\). That is

\[ \begin{align} \mathbf p &\in \text{Col }\mathbf {A} \\\\ \mathbf {(b-p)} &\in (\text{Col }\mathbf {A})^{\bot} = \text{Nul }\mathbf {A^{T}} \tag{a} \end{align} \]

from the definition of the null space and equation \((\mathrm a)\), and assuming \(\mathbf {A^{T}A}\) is invertible (i.e. the columns of \(\mathbf A\) are linearly independent), we have

\[ \begin{gather} \mathbf {A^{T}(b-p)=0} \\\\ \implies \mathbf {A^{T}(b-A\hat x)=0} \\\\ \implies \mathbf {A^{T}A\hat x=A^{T}\,b} \\\\ \implies \mathbf {\hat x=(A^{T}A)^{-1}A^{T}\,b} \end{gather} \]

Besides, we now further have

\[ \begin{gather} \mathbf {p=A\hat x=A(A^{T}A)^{-1}A^{T}\,b} \tag{b} \\\\ \mathbf {P = A(A^{T}A)^{-1}A^{T}} \end{gather} \]
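A minimal NumPy sketch of the normal equations (the overdetermined system below is an arbitrary illustration, not from the text):

```python
import numpy as np

# 3 equations, 2 unknowns: Ax = b usually has no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's least-squares routine should agree.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat, x_lstsq)                         # both ~ [ 5. -3.]

# The residual b - A x_hat is orthogonal to Col A.
print(np.allclose(A.T @ (b - A @ x_hat), 0))  # True
```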

Pseudoinverse Aspect
The least-squares solution can also be written directly with the pseudoinverse \(\mathbf {A^{\dagger}}\):
\[ \begin{gather} \mathbf {\hat x=A^{\dagger}\,b} \\\\ \implies \mathbf {p=A\hat x=AA^{\dagger}\,b} \tag{c} \end{gather} \]

compare equations \((\mathrm b)\) and \((\mathrm c)\); writing the reduced SVD \(\mathbf {A = U_r D V_r^{T}}\) (keeping only the \(r\) nonzero singular values, \(\mathbf D = \text{diag}(\sigma_1, \dots, \sigma_r)\)) and \(\mathbf {A^{\dagger} = V_r D^{-1} U_r^{T}}\),

\[ \begin{align} \mathbf {p}&=\mathbf {A\hat x=AA^{\dagger}\,b} \\\\ &= (\mathbf {U_r D V_r^{T}})(\mathbf {V_r D^{-1} U_r^{T}})\,\mathbf b \\\\ &= \mathbf {U_r U_r^{T}}\,\mathbf {b} \end{align} \]

by the theorem of Orthogonal Projection above (\(\mathbf {UU^{T}}\) projects onto the column space of \(\mathbf U\)), we know that

\[ \begin{gather} \mathbf {P} = \mathbf {AA^{\dagger}} = \mathbf {U_r U_r^{T}} \end{gather} \]
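A minimal NumPy sketch comparing the three expressions for the projection matrix \(\mathbf P\) (same illustrative system as above; `np.linalg.pinv` computes \(\mathbf {A^{\dagger}}\) via the SVD):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

P1 = A @ np.linalg.pinv(A)                # A A^dagger
P2 = A @ np.linalg.inv(A.T @ A) @ A.T     # A (A^T A)^{-1} A^T

# U_r: left singular vectors belonging to the nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ur = U[:, s > 1e-12]
P3 = Ur @ Ur.T                            # U_r U_r^T

print(np.allclose(P1, P2), np.allclose(P2, P3))  # True True
print(P1 @ b)                                    # p = proj_{Col A} b
```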