Geometric Algorithms: Introductory Linear Algebra for Computer Science Applications

1. Introduction and Motivation

1.1. Models, systems, and system states
1.2. A language of system states: terms and their algebraic properties
1.3. Extending the language of models: vectors and matrices
1.4. Review: formulas, sets, and term equality

2. Vectors
3. Matrices
Review 1. Vector and Matrix Algebra and Applications
4. Vector Spaces
5. Linear Transformations
Review 2. Vector and Matrix Algebra, Vector Spaces, and Linear Transformations

1. Introduction and Motivation

This section introduces at a high level the direction and goals of this course, as well as some terminology. Do not worry if all the details are not clear; all this material will be covered in greater detail and with many more examples during the course.

1.1. Models, systems, and system states

When many real-world problems are addressed or solved mathematically and computationally, the details of those problems are abstracted away until they can be represented directly as idealized mathematical structures (e.g., numbers, sets, matrices, and so on). We can talk about this distinction by calling the real-world problems systems (i.e., organized collections of interdependent components) that have distinguishable system states, and the abstract versions of these systems that only capture some parts or aspects of the different system states as models. Since models are abstractions, we must also consider the interpretation of a model that explains how the different parts of the model are related to the different parts of the system (i.e., what each part of the model represents).

Definition: We represent the set of all real numbers using the symbol R.

{ -1, 1.2, 3, 4, 1/3, π, e, ... }

The set of real numbers is a very simple kind of model that can be used to represent quantities or, by considering quantities and fractions of units (e.g., inch, meter, kilogram, and so on), a magnitude. If we adopt R as our model of a system, we can then represent individual system states using individual real numbers.

model (language for describing system states) | example of a system state in the model | interpretation | system
R | 7 | number of giraffes | zoo
R | 1 | distance in AU between the two objects | the Earth-Sun system
R | 5.6 | temperature in Celsius | weather in Boston

Thus, real numbers are a very simple symbolic language for modelling systems. Furthermore, this language allows us to describe characteristics of systems concisely ("7" is much more concise than a drawing of seven giraffes).

1.2. A language of system states: terms and their algebraic properties

Real numbers can be used to characterize the quantities or magnitudes of parts of a system, but this is often not enough. We may want to capture within our models certain kinds of changes that might occur to a system. In order to do so, we must add something to our language of models: binary operators such as +, -, ⋅, and so on.

model (language for describing system states) | a system state in the model | interpretation | system
R with addition (+) | 3 | number of apples in one of the apple baskets | a collection of apple baskets
R with addition (+) | 3 + 2 | number of apples in two of the apple baskets | a collection of apple baskets
R with addition (+) | 2 + 3 | number of apples in two of the apple baskets | a collection of apple baskets
symbolic language | symbol string (a.k.a. "term") in the language | meaning of symbol string | system

The different parts of this language for models have technical names. So far, we have seen real numbers and operators. We can build up larger and larger symbol strings using operators and real numbers. These are all called terms.

Definition: The following table defines the collection of symbol strings corresponding to terms.

term | conditions
0 |
1.2 |
x | x is a real number
t₁ + t₂ | if t₁ and t₂ are terms
t₁ − t₂ | if t₁ and t₂ are terms
− t | if t is a term
t₁ ⋅ t₂ | if t₁ and t₂ are terms

Note that in these notes, we use the multiplication symbols ⋅ and * interchangeably, or omit them entirely.

Notice that when our language for models only had real numbers, each symbol corresponded to a unique system state. However, by introducing operators into our language for describing system states, we are now able to write down multiple symbol strings that represent the same system state (e.g., 2 + 3 and 3 + 2). In other words, "2 + 3" and "3 + 2" have the same meaning.

Exercise: List as many of the algebraic properties of the operators + and ⋅ over the real numbers as you can.

For all real numbers x ∈ R, y ∈ R, and z ∈ R, it is true that:
x + y = y + x
x + (y + z) = (x + y) + z
x + 0 = x
x ⋅ y = y ⋅ x
x ⋅ (y ⋅ z) = (x ⋅ y) ⋅ z
x ⋅ 1 = x
x ⋅ (y + z) = (x ⋅ y) + (x ⋅ z)
x + (-x) = 0

1.3. Extending the language of models: vectors and matrices

Real numbers and operators are still not sufficient to capture many interesting aspects of real-world systems: they can only capture quantities, one-dimensional magnitudes (e.g., distance). What if we want to capture multidimensional magnitudes (e.g., temperature and pressure) or relationships between different quantities? These can correspond to points, lines, planes, spaces; relationships and changes within such systems can be modelled as translations, rotations, reflections, stretching, skewing, and other transformations.

Example: Suppose we have $12,000 and two investment opportunities: one has an annual return of 10%, and the other has an annual return of 20%. How much should we invest in each opportunity to get exactly $1800 over one year?

The above problem can be represented as a system in which two magnitudes x ∈ R and y ∈ R have a relationship. Typically, this is represented using a system of equations:
x + y = 12000
0.1 x + 0.2 y = 1800
The possible solutions to this system are the collection of all pairs of real numbers that can be assigned to the pair (x,y). These are the possible system states. Notice that these can be interpreted directly as points on a plane.
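As a small illustration, the system above can be solved by substitution. The sketch below is only one way to do it; the function name solve_investment and its parameters are our own hypothetical choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def solve_investment(total, rate_a, rate_b, target):
    """Solve x + y = total and rate_a*x + rate_b*y = target by substitution.

    Hypothetical helper for illustration; all arguments are Fractions.
    """
    if rate_a == rate_b:
        raise ValueError("equal rates: the system has no unique solution")
    # Substitute y = total - x into the second equation:
    #   rate_a*x + rate_b*(total - x) = target
    #   (rate_a - rate_b)*x = target - rate_b*total
    x = (target - rate_b * total) / (rate_a - rate_b)
    y = total - x
    return x, y

x, y = solve_investment(Fraction(12000), Fraction(1, 10), Fraction(2, 10), Fraction(1800))
print(x, y)  # the system state (x, y), i.e., a point on the plane
```

The returned pair can be checked by substituting it back into both equations.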

How does the above example suggest we might extend our language for system states in a model? We might consider ordered pairs (x,y) of real numbers. More generally, we might consider ordered lists (x₁,...,x_n) of real numbers. In this course, we will introduce a few new kinds of terms into our language for models that can help us represent these sorts of relationships within systems: vectors and matrices. We will also study the algebraic properties of this extended language.

new term (language construct) | what it represents
vectors | system state in the model
matrix | transitions between system states
matrix | changes to system states
matrix | relationships between system states
matrix multiplication | composition of transformations of system states

1.4. Review: sets and formulas

Definition: The symbols and, or, and not are logical operators. The symbols ∀ and ∃ are quantifiers, where ∀ is read "for all" and is called the universal quantifier, while ∃ is read "exists" and is called the existential quantifier.

Definition: Strings of mathematical symbols that can be either true or false are called formulas. The following table describes how formulas can be built up using logical operators and logical quantifiers, and their meanings.

formula | conditions | meaning
true | | always true
false | | always false
t₁ = t₂ | t₁ and t₂ are terms | true only if the meaning (e.g., system state) of t₁ and t₂ is the same
t₁ < t₂ | t₁ and t₂ are terms | true only if t₁ and t₂ represent real numbers, and t₁ is less than t₂
f₁ and f₂ | if f₁ and f₂ are formulas | only true if both f₁ and f₂ are true; otherwise, it is false
f₁ or f₂ | if f₁ and f₂ are formulas | true if f₁, f₂, or both are true; only false if both are false
not f | if f is a formula | true if f is false; false if f is true
f₁ implies f₂ | if f₁ and f₂ are formulas | only true if f₂ is true whenever f₁ is true, or equivalently, only true if f₁ is false or f₂ is true
f₁ iff f₂ | if f₁ and f₂ are formulas | only true if f₁ and f₂ are both true, or f₁ and f₂ are both false
∀ x ∈ S, f | if S is a set and f is a formula | true only if, for every element of S, replacing x inside f with that element makes f true
∃ x ∈ S, f | if S is a set and f is a formula | true only if there is at least one element of S that can replace x inside f so that f is true

Definition: The symbols =, <, >, ≤, ≥, ≠, and ∈ are relational operators or relations. Given two terms, t₁ and t₂, applying any of these operators to the terms produces a formula that is either true or false (e.g., "5 ≤ 6").

Fact: The relational operator = (the equality relation or just equality) has the following properties.

property | definition | example
reflexivity | for any term t, t = t | ∀ x ∈ R, x = x
symmetry | for any terms t₁ and t₂, t₁ = t₂ implies t₂ = t₁ | ∀ x,y ∈ R, x = y implies y = x
transitivity | for any terms t₁, t₂, and t₃, t₁ = t₂ and t₂ = t₃ implies t₁ = t₃ | ∀ x,y,z ∈ R, x = y and y = z implies x = z

The above properties always apply to =. The properties below are assumptions we make in this course (they are extensions of the definition of equality):

equality of vectors: for any terms t₁, t₂, t₃, and t₄, t₁ = t₃ and t₂ = t₄ iff [t₁; t₂] = [t₃; t₄]
replacement: for any terms t₁, t₂, and t₃, t₁ = t₂ iff [t₁; t₃] = [t₂; t₃], and t₁ = t₂ iff [t₃; t₁] = [t₃; t₂]

2. Vectors

In this section we consider a particular kind of term: a vector. A vector can have many interpretations, but the most common is that of a point in a geometric space. We first define a notation for names of vectors. Vectors will be written using any of the following equivalent notations:

(2,3) [2;3]

(x, y, z) [x; y; z]

2.1. Defining operations on vectors

We have introduced a new kind of term (vectors) with its own corresponding notation. As with R, the set of real numbers, we define symbols to represent sets of vectors.

Definition: We represent the set of all vectors with two components using the symbol R²:

R² = { [x; y] | x ∈ R, y ∈ R }

Definition: For positive n ∈ N, we represent the set of all vectors with n components using the symbol Rⁿ:

Rⁿ = { [x₁; ...; x_n] | x₁ ∈ R, ..., x_n ∈ R }

We will also use symbols to name some geometric manipulations of vectors. One such operation is addition. Suppose we treat vectors as paths from (0,0), so that (2,3) is the path from (0,0) to (2,3) and (1,2) is the path from (0,0) to (1,2). Then vector addition would represent the final destination if someone walks first along the length and direction specified by one vector, and then from that destination along the length and direction specified by the other. The following definition for the operation + and vectors of two components corresponds to this intuition.

Definition: The following formula is assumed to be true; it defines what operation the symbol + represents when applied to two vectors. We call + vector addition in this context.

∀ x, y, x', y' ∈ R,
  [x; y] + [x'; y'] = [x + x'; y + y']

Notice that we have defined an entirely new operation; we are merely reusing (or "overloading") the + symbol to represent this operation. We cannot assume that this operation has any of the properties we normally associate with addition of real numbers. However, we do know that according to our interpretation of vector addition (walking along one vector, then along the other from that destination), this operation should be commutative. Does our symbolic definition conform to this interpretation? If it does, then we should be able to show that [x;y] + [x';y'] and [x';y'] + [x;y] are names for the same vector. Using the commutativity of addition of real numbers and our definition above, we can indeed write the proof of this property.

Fact: Vector addition is commutative.

∀ x, y, x', y' ∈ R,
  [x; y] + [x'; y'] = [x + x'; y + y']
                    = [x' + x; y' + y]
                    = [x'; y'] + [x; y]
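The definition of vector addition can be tried out directly. The sketch below represents vectors as Python tuples (a choice made only for illustration) and uses a hypothetical helper named vec_add; it adds the two example paths (2,3) and (1,2) in both orders.

```python
def vec_add(v, w):
    """Component-wise sum of two vectors given as tuples of equal length."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))

v, w = (2, 3), (1, 2)
print(vec_add(v, w))  # walk along v, then along w
print(vec_add(w, v))  # walk along w, then along v
```

Both calls name the same destination, which is what commutativity predicts. Checking a few inputs is of course not a proof; the derivation above is.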

Recall that we can view multiplication by a number as repeated addition. Thus, we could use this intuition and our new notion of vector addition defined above to define multiplication by a real number; in this context, it will be called a scalar, and the operation is known as scalar multiplication.

Definition: The following formula is assumed to be true; it defines what operation the symbol ⋅ represents when applied to a scalar (a real number) s ∈ R and a vector. We call ⋅ scalar multiplication in this context.

∀ s, x, y ∈ R,
  s ⋅ [x; y] = [s ⋅ x; s ⋅ y]

Note that, as with multiplication of real numbers, the symbol ⋅ is sometimes omitted.

Scalar multiplication has some of the intuitive algebraic properties that are familiar to us from our experience with R. Below, we provide proofs of a few.

Fact: The following fact is true for scalar multiplication.

∀ s, t, x, y ∈ R,
  s (t [x; y]) = s [t x; t y]
               = [s (t x); s (t y)]
               = [(s t) x; (s t) y]
               = [(t s) x; (t s) y]
               = [t (s x); t (s y)]
               = t [s x; s y]
               = t (s [x; y])

Alternatively, we could also derive:

  [s (t x); s (t y)] = [(s t) x; (s t) y]
                     = (s t) [x; y]

Once we have defined scalar multiplication, we can now assign a meaning to the negation operator (interpreting -v for any vector v ∈ R² as referring to the scalar multiplication −1 ⋅ v).

Definition: The following formula is assumed to be true; it defines what operation the symbol − represents when applied to a vector.

∀ x, y ∈ R,
  − [x; y] = [− x; − y]

The above definition allows us to define vector subtraction: for two vectors v and w ∈ R², v - w = v + (-w).

Notice that we did not define vector division. In fact, division by a vector is undefined (in the same way that division by 0 is undefined). However, we can prove a cancellation law for vectors.

Fact: Let [x; y] = v ∈ R² be a vector with at least one nonzero real number component, and let a,b ∈ R be scalars. Suppose that a ⋅ v = b ⋅ v. If x is a nonzero component, then we have that:

a ⋅ v = b ⋅ v
a ⋅ [x; y] = b ⋅ [x; y]
a ⋅ x = b ⋅ x
a = b

Notice that we used division by x, a real number.

2.2. Properties of vector operations

In this course, we will study collections of vectors that have particular properties. When a collection of vectors satisfies these properties, we will call it a vector space. We list these eight properties (in the form of equations) below. These must hold for any u, v, and w in the collection, and 0 must also be a vector in the collection.

u + (v + w) = (u + v) + w
u + v = v + u
0 + v = v
v + (-v) = 0
1 * v = v
s * (u + v) = (s * u) + (s * v)
(u + v) * s = (u * s) + (v * s)
s * (t * u) = (s * t) * u

Note that these are not unique; we could have specified an alternative equation for the additive identity.

v + 0 = v

Each equation can be derived from the other using the commutativity of vector addition:

v + 0 = 0 + v

These eight equations can be called axioms (i.e., assumptions) when we are studying vector spaces without thinking about the internal representation of vectors. For example, the above derivation of the alternative additive identity equation is based entirely on the axioms and will work for any vector space. However, to show that some other structure, such as R², is a vector space we must prove that these properties are satisfied. Thus, to define a new vector space, we usually must:

define a way to construct vectors;
define a vector addition operator + (i.e., how to add vectors that we have constructed);
specify the vector that is the additive identity (call it 0);
define vector inversion: a process for constructing the inverse of a vector;
prove that the eight properties of a vector space are satisfied by the vectors, +, 0, and *.

Example: Assuming the following equation is true, solve for x ∈ R and y ∈ R:

[x; 10] = 2 ⋅ [4; y]

∀ x, y ∈ R,
  [x; 10] = 2 ⋅ [4; y]
implies
  [x; 10] = [2 ⋅ 4; 2 ⋅ y]
          = [8; 2 ⋅ y]
  x = 8
  10 = 2 ⋅ y
  0.5 ⋅ 10 = 0.5 ⋅ (2 ⋅ y)
  5 = (0.5 ⋅ 2) ⋅ y
  5 = 1 ⋅ y
  5 = y
  y = 5

Example: Assuming the following equation is true, solve for x ∈ R and y ∈ R:

[x; 2 ⋅ x] = 5 ⋅ [2 ⋅ y; 20]

∀ x, y ∈ R,
  [x; 2 ⋅ x] = 5 ⋅ [2 ⋅ y; 20]
implies
  [x; 2 ⋅ x] = 5 ⋅ [2 ⋅ y; 20]
             = [5 ⋅ (2 ⋅ y); 5 ⋅ 20]
  [x; 2 ⋅ x] = [(5 ⋅ 2) ⋅ y; 100]
  2 ⋅ x = 100
  0.5 (2 ⋅ x) = 0.5 ⋅ 100
  (0.5 ⋅ 2) x = 50
  1 ⋅ x = 50
  x = 50
  x = 10 ⋅ y
  0.1 ⋅ 50 = 0.1 ⋅ (10 ⋅ y)
           = (0.1 ⋅ 10) ⋅ y
           = 1 ⋅ y
           = y
  y = 5

2.4. Common vector properties, relationships, and operators

We introduce two new operations on vectors (e.g. u and v): the norm (|| ... ||) and the dot product (u ⋅ v). When interpreting some other structure, such as R², as a vector space, we must provide definitions for these operators in order to use them.

Definition: The operator ⋅ when applied to two vectors with the same number of components is called the dot product, and is defined for x,y,x',y' ∈ R as:

[x; y] ⋅ [x'; y'] = x*x' + y*y'

The operator || ... || when applied to a vector is called the norm and is defined as:

|| [x; y] || = √(x*x + y*y)

The norm || v || of a vector v represents its length in Euclidean space.

Notice that the dot product is a new, distinct form of multiplication (distinct from multiplication of real numbers, and distinct from scalar multiplication). Also notice that the two operations are related:

Fact: The following equation is true for all x,y ∈ R:

|| [x; y] || = √(x*x + y*y) = √([x; y] ⋅ [x; y])

These operations can be shown to have various algebraic properties.

Fact: The dot product is commutative. Notice that below, the dot product is a real number.

∀ a, b, c, d ∈ R,
  [a; b] ⋅ [c; d] = a⋅c + b⋅d
                  = c⋅a + d⋅b
                  = [c; d] ⋅ [a; b]
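The dot product and the norm translate almost word for word into code. The following is a minimal sketch with vectors as tuples; the helper names dot and norm are our own, and the norm is computed through the dot product to mirror the relationship between the two operations.

```python
import math

def dot(v, w):
    """Dot product of two vectors with the same number of components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Euclidean length of v, computed as the square root of v . v."""
    return math.sqrt(dot(v, v))

print(dot((3, 4), (1, 2)), dot((1, 2), (3, 4)))  # same real number in either order
print(norm((3, 4)))
```

Note that dot returns a real number while norm always returns a nonnegative one.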

We also introduce several vector properties. Some deal with a single vector; some deal with two vectors; and some deal with three vectors.

Definition: The table below summarizes the definitions of several vector properties and relationships and how they are related in some cases to vector operators and associated algebraic properties.

Each entry below lists a property, its definition(s), and (where available) the corresponding algebraic properties for R². We write u = [x; y] and v = [x'; y'] when a property involves two vectors, and v = [x; y] when it involves a single vector.

property: v has length s
definition(s): ||v|| = s or √(v ⋅ v) = s
algebraic properties for R²: ||v|| = √(x*x + y*y) = √([x; y] ⋅ [x; y])

property: v is a unit vector
definition(s): ||v|| = 1 or v ⋅ v = 1
algebraic properties for R²: 1 = ||v|| = √(x*x + y*y) = √([x; y] ⋅ [x; y]), and 1 = v ⋅ v = x*x + y*y = [x; y] ⋅ [x; y]

property: u and v are linearly dependent (u and v are collinear)
definition(s): ∃ a ∈ R, a ⋅ u = v
algebraic properties for R²: y/x = y'/x' (the vectors have the same slope)

property: u and v are linearly independent
definition(s): ∀ a ∈ R, a ⋅ u ≠ v or equivalently, not (∃ a ∈ R, a ⋅ u = v)
algebraic properties for R²: y/x ≠ y'/x' (the vectors have different slopes)

property: u and v are orthogonal
definition(s): u ⋅ v = 0
algebraic properties for R²: y/x = -x'/y'

property: w is a projection of v onto u
definition(s): w = (v ⋅ (u/||u||)) ⋅ u/||u||

property: d is the (Euclidean) distance between u and v
definition(s): d = ||u - v|| = ||v - u||
algebraic properties for R²: d = √((x - x')² + (y - y')²)

property: L is the unique line parallel to v ∈ R²
definition(s): L = { a ⋅ v | a ∈ R } or L = { p | ∃ a ∈ R, p = a ⋅ v }
algebraic properties for R²: { [x'; y'] | y' = m x' } where m = y/x

property: L is the unique line orthogonal to v ∈ R²
definition(s): L = { w | v ⋅ w = 0 }

property: L is the unique line defined by the two points u ∈ R² and v ∈ R²
definition(s): L = { a (u - v) + u | a ∈ R } or L = { p | ∃ a ∈ R, p = a (u - v) + u }

property: P is the unique plane orthogonal to v ∈ R³
definition(s): P = { w | v ⋅ w = 0 }

property: P is the unique plane of linear combinations of v, w ∈ R³ where v and w are linearly independent
definition(s): P = { a v + b w | a ∈ R, b ∈ R }

property: w is a linear combination of u and v
definition(s): ∃ a,b ∈ R, w = au + bv

property: {u, v, w} are linearly independent
definition(s): not (u is a linear combination of v and w) and not (v is a linear combination of u and w) and not (w is a linear combination of u and v)

In R², we can derive the linear independence of [x;y] and [x';y'] from the orthogonality of [x;y] and [x';y'] using a proof by contradiction.

Fact: If [x;y] and [x';y'] are orthogonal (and have nonzero length), then they are linearly independent.

Suppose that [x;y] and [x';y'] are both orthogonal and linearly dependent. Since they are linearly dependent, we have that there exists an a ∈ R such that:

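[x'; y'] = a ⋅ [x; y]
x' = a ⋅ x
y' = a ⋅ y
Since they are orthogonal, we have that:
[x; y] ⋅ [x'; y'] = 0
[x; y] ⋅ [a ⋅ x; a ⋅ y] = 0
x ⋅ (a ⋅ x) + y ⋅ (a ⋅ y) = 0
a ⋅ (x² + y²) = 0
Because [x'; y'] has nonzero length, a cannot be 0, so we can divide both sides by a:
x² + y² = 0
x² = - y²
If y = 0, then x = 0 as well, which contradicts the assumption that [x; y] has nonzero length. Otherwise, we can divide both sides by y²:
(x²)/(y²) = - 1
(x/y)² = - 1
x/y = √(-1)
No real numbers y and x satisfy the above equation, so we must have introduced a contradiction by supposing that [x;y] and [x';y'] are not linearly independent. Thus, they must be linearly independent.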

Also, notice that [0;0] is the only solution to x² + y² = 0. Thus, all vectors in R² are linearly dependent, linearly independent, and orthogonal with [0;0].

Fact: If v ∈ R² is a vector that has nonzero length, then we know that the vector v/||v||: (1) is linearly dependent with v, and (2) is a unit vector.

Because v has nonzero length, ||v|| is a nonzero real number, so 1/||v|| is a real number. By our definition of linear dependence, (1/||v||) ⋅ v = v/||v|| is linearly dependent with v, because the scalar 1/||v|| exists.

Let v = [x;y]. We have that:

1/||v|| = 1 / ||[x; y]|| = 1 / √(x² + y²)

Then, we have that:

(1 / √(x² + y²)) ⋅ [x; y] = [x / √(x² + y²); y / √(x² + y²)]

If we take the norm of the above vector, we have:

|| [x / √(x² + y²); y / √(x² + y²)] || = √( (x / √(x² + y²))² + (y / √(x² + y²))² )
  = √( (x² / (x² + y²)) + (y² / (x² + y²)) )
  = √( (x² + y²) / (x² + y²) )
  = √(1)
  = 1

Thus, || v/||v|| || = 1, so v/||v|| is a unit vector.
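The construction of v/||v|| can be sketched in code as follows. The function name normalize is our own, vectors are tuples, and floating-point arithmetic is assumed, so the resulting length is 1 only up to rounding.

```python
import math

def normalize(v):
    """Return v / ||v||, the unit vector that is linearly dependent with v."""
    length = math.sqrt(sum(a * a for a in v))
    if length == 0:
        # The fact above only applies to vectors of nonzero length.
        raise ValueError("cannot normalize a vector of length zero")
    return tuple(a / length for a in v)

u = normalize((3, 4))
print(u)                                   # components of v divided by ||v||
print(math.sqrt(sum(a * a for a in u)))    # length of the result, approximately 1
```

The explicit check for length zero corresponds to the "nonzero length" condition in the statement of the fact.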

Example: List all the unit vectors that are linearly dependent with [5; 12].

It is sufficient to solve for s ∈ R, x ∈ R, and y ∈ R that satisfy the following equations:
s ⋅ [x; y] = [5; 12]
||[x; y]|| = 1
One approach is to write x and y in terms of s and then solve for s using the second equation.

Example: Solve the following three problems.

Solve the following equation for x ∈ R:
|| [x; 2 ⋅ √(2)] || = 3
Solve the following for x ∈ R and y ∈ R:
[x; y] has length 10
[x; y] and [2; 0] are linearly dependent
Determine if the following two vectors are linearly dependent or linearly independent:
[2; 3; 0] and [2; 0; 1]

In the case of vectors in R², the properties of linear dependence and orthogonality can be defined in terms of the slopes of the vectors involved. However, note that these definitions in terms of slope are a special case for R². The more general definitions (in terms of scalar multiplication and dot product, respectively) apply to any vectors in a vector space. Thus, we can derive the slope definitions from the more general definitions.

Fact: If two vectors [x;y] ∈ R² and [x';y'] ∈ R² are linearly dependent, then x/y = x'/y' (when y and y' are nonzero). If the two vectors are linearly dependent, then there exists a scalar s ∈ R such that:

s ⋅ [x'; y'] = [x; y]
[s ⋅ x'; s ⋅ y'] = [x; y]
s ⋅ x' = x
s ⋅ y' = y
(s ⋅ x') / (s ⋅ y') = x/y
x'/y' = x/y

Fact: If two vectors [x;y] ∈ R² and [x';y'] ∈ R² are orthogonal, then x/y = -y'/x':

[x; y] ⋅ [x'; y'] = 0
x ⋅ x' + y ⋅ y' = 0
x ⋅ x' = - y ⋅ y'
x = - y ⋅ y'/x'
x/y = - y'/x'

Example: List all unit vectors orthogonal to [4; −3].

It is sufficient to solve for x ∈ R and y ∈ R that satisfy the following equations:
[x; y] ⋅ [4; −3] = 0
||[x; y]|| = 1
We can solve the above for the two vectors by first solving for both possible values of x, then finding the corresponding values for y:
4 x − 3 y = 0
y = (4/3) x
√(x² + y²) = 1
√(x² + ((4/3) x)²) = 1
√((9/9) x² + (16/9) x²) = 1
√((25/9) x²) = 1
± (5/3) x = 1
x = ± 3/5
Thus, the two vectors are:
[x; y] ∈ { [3/5; 4/5], [−3/5; −4/5] }
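The same answer can be obtained in code with a shortcut that is not the method used above: in R², swapping the components of [p; q] and negating one of them gives [−q; p], which is orthogonal to [p; q] because p ⋅ (−q) + q ⋅ p = 0. The sketch below (with the hypothetical name unit_orthogonals and floating-point arithmetic) scales that vector to length 1 and also returns its negation.

```python
import math

def unit_orthogonals(v):
    """Return the two unit vectors in R^2 that are orthogonal to v = (p, q)."""
    p, q = v
    length = math.hypot(p, q)
    if length == 0:
        raise ValueError("every vector is orthogonal to the zero vector")
    # (-q, p) is orthogonal to (p, q) and has the same length as (p, q).
    w = (-q / length, p / length)
    return w, (-w[0], -w[1])

print(unit_orthogonals((4, -3)))
```

For [4; −3] this produces the vectors [3/5; 4/5] and [−3/5; −4/5] up to floating-point rounding.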

Orthogonal projections of vectors have a variety of interpretations and applications; one simple interpretation of the projection of a vector v onto a vector w is the shadow of vector v on w with respect to a light source that is orthogonal to w.

Example: Consider the following vectors, where u, v ∈ R² and u is a unit vector:

u = [1; 0]
v = [x; y]

What is the orthogonal projection of v onto u? It is simply the x component of v, which is x. Thus:

[x; 0] is the projection of v onto u

Notice that we can obtain the length of the orthogonal projection using the dot product:

v ⋅ u = [x; y] ⋅ [1; 0] = x

Since u is a unit vector, we can then simply multiply u by the scalar x to obtain the actual projection:

(v ⋅ u) ⋅ u = ([x; y] ⋅ [1; 0]) ⋅ [1; 0] = x ⋅ [1; 0] = [x; 0]

The above example can be generalized to an arbitrary vector v and unit vector u; then, it can be generalized to any vector u by using the fact that u/||u|| is always a unit vector.

Example: Compute the orthogonal projection of v onto u where:

v = [9; 2]
u = [3; 4]

We can apply the formula:
||u|| = √(3² + 4²) = √(9 + 16) = 5
u/||u|| = [3/5; 4/5]
(v ⋅ (u/||u||)) ⋅ (u/||u||) = ([9; 2] ⋅ [3/5; 4/5]) ⋅ [3/5; 4/5]
  = (27/5 + 8/5) ⋅ [3/5; 4/5]
  = (35/5) ⋅ [3/5; 4/5]
  = 7 ⋅ [3/5; 4/5]
  = [21/5; 28/5]
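The projection formula can also be sketched in code. To keep the arithmetic exact, the version below uses the algebraically equivalent form ((v ⋅ u)/(u ⋅ u)) ⋅ u, which follows from (v ⋅ (u/||u||)) ⋅ (u/||u||) because ||u||² = u ⋅ u; it assumes integer components, and the name project is our own.

```python
from fractions import Fraction

def project(v, u):
    """Orthogonal projection of v onto u, for vectors with integer components."""
    uu = sum(a * a for a in u)  # u . u, which equals ||u|| squared
    if uu == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = Fraction(sum(a * b for a, b in zip(v, u)), uu)  # (v . u) / (u . u)
    return tuple(scale * a for a in u)

print(project((9, 2), (3, 4)))  # the vectors used in the computation above
```

Avoiding the square root means the components come out as exact fractions rather than rounded decimals.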

Other tools available online can be used to perform and check computations such as the above.

Example: Solve the problems below.

Compute the orthogonal projection of v onto u where u is a unit vector and:

v = [4; 2]
u = [1/√(2); 1/√(2)]

We can apply the formula:
(v ⋅ (u/||u||)) ⋅ (u/||u||) = ([4; 2] ⋅ [1/√(2); 1/√(2)]) ⋅ [1/√(2); 1/√(2)]
  = (6/√(2)) ⋅ [1/√(2); 1/√(2)]
  = [3; 3]
Compute the projection of v onto w where:

v = [13 ⋅ 3; 13 ⋅ 2]
w = [5; 12]

We can apply the formula:
||w|| = √((5)² + (12)²) = √(25 + 144) = 13
w/||w|| = [5/13; 12/13]
(v ⋅ (w/||w||)) ⋅ (w/||w||) = ([13 ⋅ 3; 13 ⋅ 2] ⋅ [5/13; 12/13]) ⋅ [5/13; 12/13]
  = (15 + 24) ⋅ [5/13; 12/13]
  = 39 ⋅ [5/13; 12/13]
  = [15; 36]

Fact: Given the points u = [x₁; y₁] and v = [x₂; y₂], we can find the equation of the line between these two points in the form y = mx + b.

We recall the definition for a line L defined by two points:
L = { p | ∃ a ∈ R, p = a (u - v) + u }.
Thus, if [x; y] is on the line, we have:
[x; y] = a ([x₁; y₁] - [x₂; y₂]) + [x₁; y₁].
This implies the following system of equations (one from the x components in the above, and one from the y components):
x = a (x₁ - x₂) + x₁
y = a (y₁ - y₂) + y₁
If we solve for a in terms of x, we can recover a single equation for the line:
a = (x - x₁)/(x₁ - x₂)
y = ((x - x₁)/(x₁ - x₂)) (y₁ - y₂) + y₁
y = ((y₁ - y₂)/(x₁ - x₂)) (x - x₁) + y₁
Notice that we can set m = (y₁ - y₂)/(x₁ - x₂) because that is exactly the slope of the line between [x₁;y₁] and [x₂;y₂].
y = m (x - x₁) + y₁
y = mx - m x₁ + y₁

We see that we can set b = - m x₁ + y₁.
m = (y₁ - y₂)/(x₁ - x₂)
b = - m x₁ + y₁
y = mx + b
L = { [x; y] | y = m x + b }
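The two formulas for m and b can be packaged as a small function. This is a sketch under the assumption that the points have integer components and different x coordinates (otherwise the slope is undefined); the name slope_intercept is our own.

```python
from fractions import Fraction

def slope_intercept(u, v):
    """Return (m, b) such that the line through points u and v is y = m x + b."""
    (x1, y1), (x2, y2) = u, v
    if x1 == x2:
        raise ValueError("vertical line: it has no equation of the form y = m x + b")
    m = Fraction(y1 - y2, x1 - x2)  # slope between the two points
    b = -m * x1 + y1                # intercept, from b = -m x1 + y1
    return m, b

m, b = slope_intercept((19, 7), (1, -5))
print(m, b)
print(m * 1 + b)  # substituting x = 1 should give back y = -5
```

Substituting either of the two defining points into y = m x + b is a quick way to check the result.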

2.5. Solving common problems involving vector algebra

We review the operations and properties of vectors introduced in this section by considering several example problems.

Example: Are the vectors [2; 1] and [3; 2] linearly independent?

There are at least two ways we can proceed in checking pairwise linear independence. Both involve checking if the vectors are linearly dependent. If they are linearly dependent, then they cannot be linearly independent. If they are not linearly dependent, they must be linearly independent.

We can compare the slopes; we see they are different, so they must be linearly independent.

1/2 ≠ 2/3

We can also use the definition of linear dependence. If they are linearly dependent, then we know that
∃ a ∈ R, a ⋅ [2; 1] = [3; 2]
Does such an a exist? We try to solve for it:
a ⋅ [2; 1] = [3; 2]
2a = 3
a = 2
4 = 3

Since we derive a contradiction, there is no such a, so the two vectors are not linearly dependent, which means they are linearly independent.
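The slope comparison can be written without any division by cross-multiplying: y/x = y'/x' becomes x ⋅ y' = y ⋅ x'. The sketch below assumes both vectors are in R² and are nonzero; the helper name same_slope is our own.

```python
def same_slope(u, v):
    """True if nonzero vectors u = (x, y) and v = (x', y') in R^2 are collinear."""
    (x, y), (xp, yp) = u, v
    # Cross-multiplied form of y/x = y'/x'; it also works when x or x' is 0.
    return x * yp == y * xp

print(same_slope((2, 1), (3, 2)))  # the two vectors from the example
print(same_slope((2, 1), (4, 2)))  # [4; 2] = 2 * [2; 1]
```

For [2; 1] and [3; 2] the two products are 4 and 3, the same mismatch that appears as the contradiction 4 = 3 in the derivation.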

Example: Given v = [15; 20], list the vectors that are orthogonal to v, but of the same length as v.

The two constraints on the vectors [x;y] we seek are:

[15; 20] ⋅ [x; y] = 0
|| [15; 20] || = || [x; y] ||

We take the first constraint and solve for x in terms of y.

[15; 20] ⋅ [x; y] = 0
15x + 20y = 0
x = (-20/15) y

We now plug this into the second equation.

|| [15; 20] || = || [(-20/15) y; y] ||
√(15² + 20²) = √((400/225) y² + y²)
√(625) = √((625/225) y²)
25 = ± (25/15) y
15 = ± y
y = ± 15

Thus, the vectors are [-20; 15] and [20; -15].

Example: Given constants a,b,c ∈ R, find a vector orthogonal to the plane P defined by:

P = { [x; y; z] | a (x + y + z) + b (y + z) + c z = 0 }.

We only need to rewrite the equation defining the plane in a more familiar form. For any [x; y; z] ∈ P, we know that:

a (x + y + z) + b (y + z) + c z = 0
ax + (a + b) y + (a + b + c) z = 0
[a; a + b; a + b + c] ⋅ [x; y; z] = 0

In order to be orthogonal to a plane, a vector must be orthogonal to all vectors [x;y;z] on that plane. Since all points on the plane are orthogonal to [a; a + b; a + b + c] by the above argument, [a; a + b; a + b + c] is such a vector.

Example: Define the line L that is orthogonal to the vector [a; b] but also crosses [a; b] (i.e., [a; b] falls on the line).

We know that the line must be parallel to the vector that is orthogonal to [a; b]. The line L₀ crossing [0; 0] that is orthogonal to [a; b] is defined as:
L₀ = { [x'; y'] | [a; b] ⋅ [x'; y'] = 0 }.
However, we need the line to also cross the point [a; b]. This is easily accomplished by adding the vector [a; b] to all the points on the orthogonal line going through [0; 0] (as defined above). Thus, we have:
L = { [x'; y'] + [a; b] | [a; b] ⋅ [x'; y'] = 0 }.
If we want to find points [x; y] on the line directly without the intermediate term [x'; y'], we can solve for [x'; y'] in terms of [x; y]:
[x; y] = [x'; y'] + [a; b]
[x'; y'] = [x; y] − [a; b]
We can then substitute to obtain a more direct definition of L (in terms of a constraint on the vectors [x; y] in L):
L = { [x'; y'] + [a; b] | [a; b] ⋅ [x'; y'] = 0 }
  = { ([x; y] − [a; b]) + [a; b] | [a; b] ⋅ ([x; y] − [a; b]) = 0 }
  = { [x; y] | [a; b] ⋅ ([x; y] − [a; b]) = 0 }

Example: Is [7; −1] on the line defined by the points u = [19; 7] and v = [1; −5]?

To solve this problem, we recall the definition for a line defined by two points:

{ p | ∃ a ∈ R, p = a (u - v) + u }.

Thus, we want to know if [7; -1] is in the set defined as above. This can only occur if there exists an a such that [7; -1] = a (u - v) + u.

We solve for a; if no solution exists, then the point [7; -1] is not on the line L. If an a exists, then it is. In this case, a = −2/3 is a solution to both equations, so [7; -1] is on the line.

[7; −1] = a ([19; 7] − [1; −5]) + [19; 7]
        = [19a − 1a; 7a + 5a] + [19; 7]
        = [18a + 19; 12a + 7]
7 = 18a + 19
−1 = 12a + 7
a = −2/3
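The membership test for a line defined by two points can be sketched as follows: solve one component equation of p = a (u - v) + u for a, then check that the same a satisfies the other component. The function name on_line is our own, and the sketch assumes points in R² with integer components.

```python
from fractions import Fraction

def on_line(p, u, v):
    """True if p = a (u - v) + u for some real number a (points in R^2)."""
    d = (u[0] - v[0], u[1] - v[1])  # direction u - v
    r = (p[0] - u[0], p[1] - u[1])  # p - u, which must equal a * d
    if d == (0, 0):
        raise ValueError("u and v must be two distinct points")
    # Solve for a using a component where the direction is nonzero.
    i = 0 if d[0] != 0 else 1
    a = Fraction(r[i], d[i])
    # The same a must satisfy both component equations.
    return all(a * d[k] == r[k] for k in range(2))

print(on_line((7, -1), (19, 7), (1, -5)))  # the point from the example
print(on_line((7, 0), (19, 7), (1, -5)))   # a nearby point for comparison
```

Using exact fractions avoids false negatives that could arise from comparing rounded floating-point values.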

Example: Define the line that is orthogonal to the vector [3; 5] but also crosses [3; 5] (i.e., [3; 5] falls on the line).

We know that the line must be parallel to the vector that is orthogonal to [3; 5]. The line crossing [0; 0] that is orthogonal to [3; 5] is defined as:
{

x
y

|

3
5

⋅

x
y

= 0 }
We can rewrite the above in a more familiar form:
5 y
=
− 3 x
y
=
− (3/5) x
However, we need the line to also cross the point [3; 5]. This is easily accomplished by adding the vector [3; 5] to all the points on the orthogonal line going through [0; 0] (as defined above). Thus, we have:
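$$\left\{ \begin{bmatrix} 3 \\ 5 \end{bmatrix} + \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; \begin{bmatrix} 3 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = 0 \right\}$$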
We can rewrite the above by defining:

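$$\begin{aligned} \begin{bmatrix} x' \\ y' \end{bmatrix} &= \begin{bmatrix} 3 \\ 5 \end{bmatrix} + \begin{bmatrix} x \\ y \end{bmatrix} \\ \begin{bmatrix} x' \\ y' \end{bmatrix} - \begin{bmatrix} 3 \\ 5 \end{bmatrix} &= \begin{bmatrix} x \\ y \end{bmatrix} \end{aligned}$$
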
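Now we replace [x; y] with [x'; y'] − [3; 5] in our definition of the line:

$$\left\{ \begin{bmatrix} x' \\ y' \end{bmatrix} \;\middle|\; \begin{bmatrix} 3 \\ 5 \end{bmatrix} \cdot \left( \begin{bmatrix} x' \\ y' \end{bmatrix} - \begin{bmatrix} 3 \\ 5 \end{bmatrix} \right) = 0 \right\}$$
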
We can now write the equation for the line:
$$\begin{aligned} 3(x' - 3) + 5(y' - 5) &= 0 \\ 5(y' - 5) &= -3(x' - 3) \\ 5y' - 25 &= -3(x' - 3) \\ 5y' &= -3(x' - 3) + 25 \\ y' &= -(3/5)(x' - 3) + 5 \\ y' &= -(3/5)x' + 9/5 + 5 \end{aligned}$$

Notice that, alternatively, we could have instead simply found the y-intercept b ∈ R of the following equation using the point [3;5]:
$$\begin{aligned} y &= -(3/5)x + b \\ 5 &= -(3/5)(3) + b \\ 5 + 9/5 &= b \\ b &= 5 + 9/5 \end{aligned}$$
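
As a sanity check, the two descriptions of this line (the dot-product constraint and the slope-intercept equation) can be compared in a short Python sketch. The helper names `on_orthogonal_line` and `y_on_line` are hypothetical, and the sample x values are arbitrary.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def on_orthogonal_line(n, p):
    """True when n . (p - n) = 0, i.e. p lies on the line through n orthogonal to n."""
    return dot(n, [pi - ni for pi, ni in zip(p, n)]) == 0

n = [Fraction(3), Fraction(5)]
b = 5 + Fraction(9, 5)                      # y-intercept found in the example

def y_on_line(x):
    return -Fraction(3, 5) * x + b          # y = -(3/5) x + b

# Every point produced by the slope-intercept form satisfies the constraint.
for x in [Fraction(-2), Fraction(0), Fraction(1, 2), Fraction(3), Fraction(8)]:
    assert on_orthogonal_line(n, [x, y_on_line(x)])

print(on_orthogonal_line(n, [Fraction(0), Fraction(0)]))   # False
```

The origin fails the test because the line was shifted away from [0; 0] by adding [3; 5].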

Example: Is [8; -6] a linear combination of the vectors [19; 7] and [1; -5]?

We recall the definition of a linear combination and instantiate it for this example:

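$$\exists\, a, b \in \mathbb{R}, \quad \begin{bmatrix} 8 \\ -6 \end{bmatrix} = a \begin{bmatrix} 19 \\ 7 \end{bmatrix} + b \begin{bmatrix} 1 \\ -5 \end{bmatrix}.$$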

Thus, if we can solve for a and b, then [8; −6] is indeed a linear combination.

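$$\begin{bmatrix} 8 \\ -6 \end{bmatrix} = a \begin{bmatrix} 19 \\ 7 \end{bmatrix} + b \begin{bmatrix} 1 \\ -5 \end{bmatrix} = \begin{bmatrix} 19a \\ 7a \end{bmatrix} + \begin{bmatrix} 1b \\ -5b \end{bmatrix}$$

$$\begin{aligned} 8 &= 19a + b \\ b &= -19a + 8 \\ -6 &= 7a - 5b \\ -6 &= 7a - 5(-19a + 8) \\ -6 &= 7a + 95a - 40 \\ 34 &= 102a \\ a &= 1/3 \\ b &= -19/3 + 8 = 5/3 \end{aligned}$$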
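
The same coefficients can be computed mechanically. The sketch below does not use the substitution steps shown above; it swaps in Cramer's rule for two equations in two unknowns, which relies on the quantity u₁v₂ − v₁u₂ being nonzero. The function name `combination_coefficients` is hypothetical.

```python
from fractions import Fraction

def combination_coefficients(u, v, w):
    """Return (a, b) with a*u + b*v == w for 2-component vectors.

    Uses Cramer's rule. Returns None when u and v are linearly dependent,
    because the rule does not apply in that case.
    """
    det = Fraction(u[0] * v[1] - v[0] * u[1])
    if det == 0:
        return None
    a = (w[0] * v[1] - v[0] * w[1]) / det
    b = (u[0] * w[1] - w[0] * u[1]) / det
    return a, b

a, b = combination_coefficients([19, 7], [1, -5], [8, -6])
print(a, b)   # 1/3 5/3
```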

Example: Are the vectors V = {[2; 0; 4; 0], [6; 0; 4; 3], [1; 7; 4; 3]} linearly independent?

The definition for linear independence requires that none of the vectors being considered can be expressed as a linear combination of the others. Thus, we must check each pair of vectors against the remaining third vector not in the pair. There are 3!/(2! ⋅ 1!) = 3 such pairs:

Can [2; 0; 4; 0] be expressed as a combination of [6; 0; 4; 3] and [1; 7; 4; 3]?
Can [6; 0; 4; 3] be expressed as a combination of [2; 0; 4; 0] and [1; 7; 4; 3]?
Can [1; 7; 4; 3] be expressed as a combination of [6; 0; 4; 3] and [2; 0; 4; 0]?

For each pair, we can check whether the third vector is a linear combination of the pair. If it is, we can stop and say that the three vectors are not linearly independent. If it is not, we must continue checking the remaining pairs. If no pair can be scaled and added in some way to obtain the third vector, then the three vectors are linearly independent.

not (V is linearly independent) iff (∃ distinct u,v,w ∈ V, ∃ a,b ∈ R, w = a u + b v)

Notice that this is an example of a general logical rule:

not (∀ x ∈ S, p) iff (∃ x ∈ S, not p)

We check each possible combination and find that we derive a contradiction if we assume they are not independent.

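$$\begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 4 \\ 3 \end{bmatrix} a + \begin{bmatrix} 1 \\ 7 \\ 4 \\ 3 \end{bmatrix} b$$

$$\begin{aligned} 0 &= 7b \\ b &= 0 \\ 2 &= 6a + 1b \\ a &= 2/6 \\ 0 &= 3(2/6) + 3(0) \\ 0 &= 1 \end{aligned}$$

$$\begin{bmatrix} 6 \\ 0 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \end{bmatrix} a + \begin{bmatrix} 1 \\ 7 \\ 4 \\ 3 \end{bmatrix} b$$

$$\begin{aligned} 3 &= 3b \\ b &= 1 \\ 6 &= 2a + b \\ 6 &= 2a + 1 \\ 5 &= 2a \\ a &= 5/2 \\ 4 &= 4(5/2) + 4 \\ 0 &= 10 \end{aligned}$$

$$\begin{bmatrix} 1 \\ 7 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \end{bmatrix} a + \begin{bmatrix} 6 \\ 0 \\ 4 \\ 3 \end{bmatrix} b$$

$$\begin{aligned} 7 &= 0a + 0b \\ 7 &= 0 \end{aligned}$$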

Thus, V is a set of linearly independent vectors.
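
Checking every pair by hand grows tedious as the number of vectors increases. The Python sketch below reaches the same verdict by a different route than the pairwise check above: it runs Gaussian elimination on the vectors (written as rows) and counts the nonzero rows that remain. This is a swapped-in technique, and the function names `rank` and `linearly_independent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows left after Gaussian elimination on the vectors."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0                                      # index of the next pivot row
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                           # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):      # clear the entries below the pivot
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def linearly_independent(vectors):
    return rank(vectors) == len(vectors)

V = [[2, 0, 4, 0], [6, 0, 4, 3], [1, 7, 4, 3]]
print(linearly_independent(V))                    # True
print(linearly_independent(V + [[8, 0, 8, 3]]))   # False: the new vector is the sum of the first two
```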

2.6. Using vectors and linear combinations to model systems

In the introduction we noted that in this course, we would define a symbolic language for working with a certain collection of idealized mathematical objects that can be used to model system states of real-world systems in an abstract way. Because we are considering a particular collection of objects (vectors, planes, spaces, and their relationships), it is natural to ask what kinds of problems are well-suited for such a representation (and also what problems are not well-suited).

What situations and associated problems can be modelled using vectors and related operators and properties? Problems involving concrete objects that have a position, velocity, direction, geometric shape, and relationships between these (particularly in two or three dimensions) are natural candidates. For example, we have seen that it is possible to compute the projection of one vector onto another. However, these are just a particular example of a more general family of problems that can be studied using vectors and their associated operations and properties.

A vector of real numbers can be used to represent an object or collection of objects with some fixed number of characteristics (each corresponding to a dimension or component of the vector) where each characteristic has a range of possible values. This range could be a set of magnitudes (e.g., position, cost, mass), a discrete collection of states (e.g., absence or presence of an edge in a graph), or even a set of relationships (e.g., for every cow, there are four cow legs; for every $1 invested, there is a return of $0.02). Thus, vectors are well-suited for representing problems involving many instances of objects where all the objects have the same set of possible characteristics along the same set of linear dimensions. In these instances, many vector operations also have natural interpretations. For example, addition and scalar multiplication (i.e., linear combinations) typically correspond to the aggregation of a property across multiple instances or copies of objects with various properties (e.g., the total mass of a collection of objects).

In order to illustrate how vectors and linear combinations of vectors might be used in applications, we recall the notion of a system. A system is any physical or abstract phenomenon, or observations of a phenomenon, that we characterize as a collection of real values along one or more dimensions. A system state or state of a system is a particular collection of real values. For example, if a system is represented by R⁴, states of that system are represented by individual vectors in R⁴ (note that not all vectors need to correspond to valid or possible states; see the examples below).

Example: Consider the following system: a barn with cows and chickens inside it. There are several dimensions along which an observer might be able to measure this system (we assume that the observer has such poor eyesight that chickens and cows are indistinguishable from above):

number of chickens inside
number of cows inside
number of legs that can be seen by peeking under the door
number of heads that can be seen by looking inside from a high window

Notice that we could represent a particular state of this system using a vector in R⁴. However, notice also that many vectors in R⁴ will not correspond to any system state that one would expect to observe. Usually, the number of heads and legs in the entire system will be a linear combination of two vectors: the number of heads and legs per chicken, and the number of heads and legs per cow:

$$\begin{bmatrix} 1 \text{ head} \\ 2 \text{ legs} \end{bmatrix} \cdot x \text{ chickens} + \begin{bmatrix} 1 \text{ head} \\ 4 \text{ legs} \end{bmatrix} \cdot y \text{ cows} = \begin{bmatrix} x + y \text{ heads} \\ 2x + 4y \text{ legs} \end{bmatrix}$$

Given this relationship, it may be possible to derive some characteristics of the system given only partial information. Consider the following problem: how many chickens and cows are in a barn if 8 heads and 26 legs were observed?

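$$\begin{bmatrix} 1 \text{ head} \\ 2 \text{ legs} \end{bmatrix} \cdot x \text{ chickens} + \begin{bmatrix} 1 \text{ head} \\ 4 \text{ legs} \end{bmatrix} \cdot y \text{ cows} = \begin{bmatrix} 8 \text{ heads} \\ 26 \text{ legs} \end{bmatrix}$$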

Notice that a linear combination of vectors can be viewed as a translation from a vector describing one set of dimensions to a vector describing another set of dimensions. Many problems might exist in which the values are known along one set of dimensions and unknown along another set.
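
For the barn problem, only nonnegative whole numbers of animals are meaningful, so one simple way to find the state is to try every possible number of chickens. The sketch below does exactly that; the function name `barn_states` is hypothetical.

```python
def barn_states(heads, legs):
    """All (chickens, cows) pairs of nonnegative integers matching the observation."""
    solutions = []
    for chickens in range(heads + 1):
        cows = heads - chickens                  # first component: x + y = heads
        if 2 * chickens + 4 * cows == legs:      # second component: 2x + 4y = legs
            solutions.append((chickens, cows))
    return solutions

print(barn_states(8, 26))   # [(3, 5)]
print(barn_states(8, 27))   # []
```

The second call illustrates the earlier remark that not every vector corresponds to a possible state: no barn of chickens and cows shows an odd number of legs.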

Example: We can restate the example from the introduction using linear combinations. Suppose we have $12,000 and two investment opportunities: A has an annual return of 10%, and B has an annual return of 20%. How much should we invest in each opportunity to get $1800 over one year?

The two investment opportunities are two-dimensional vectors representing the rate of return on a dollar:

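$$\begin{bmatrix} 1 \text{ dollar} \\ 0.1 \text{ interest} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \text{ dollar} \\ 0.2 \text{ interest} \end{bmatrix}$$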

The problem is to find what combination of the two opportunities would yield the desired observation of the entire system:

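$$\begin{bmatrix} 1 \text{ dollar} \\ 0.1 \text{ dollars of interest} \end{bmatrix} \cdot x \text{ dollars in opportunity A} + \begin{bmatrix} 1 \text{ dollar} \\ 0.2 \text{ dollars of interest} \end{bmatrix} \cdot y \text{ dollars in opportunity B} = \begin{bmatrix} 12{,}000 \text{ dollars} \\ 1800 \text{ dollars of interest} \end{bmatrix}$$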

Next, we consider a problem with discrete dimensions.

Example: Suppose there is a network of streets and intersections and the city wants to set up cameras at some of the intersections. Cameras can only see as far as the next intersection. Suppose there are five streets (#1, #2, #3, #4, #5) and four intersections (A, B, C, and D) at which cameras can be placed, and the city wants to make sure a camera can see every street while not using any cameras redundantly (i.e., two cameras should not film the same street).

Vectors in R⁵ can represent which streets are covered by a camera. A fixed collection of vectors, one for each intersection, can represent what streets a camera can see from each intersection. Thus, the system's dimensions are:

is street #1 covered by a camera?
is street #2 covered by a camera?
is street #3 covered by a camera?
is street #4 covered by a camera?
is street #5 covered by a camera?
is there a camera at intersection A? (represented by the variable a below)
is there a camera at intersection B? (represented by the variable b below)
is there a camera at intersection C? (represented by the variable c below)
is there a camera at intersection D? (represented by the variable d below)

Four fixed vectors will be used to represent which streets are adjacent to which intersections:

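$$\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$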

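Placing the cameras in the way required is possible if there is an integer solution (with each of a, b, c, and d equal to 0 or 1) to the following equation involving a linear combination of the above vectors:

$$\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} a + \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} b + \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} c + \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} d = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$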
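
Because each of a, b, c, and d can only be 0 or 1, there are just 16 candidate placements, and they can be enumerated directly. The sketch below assumes the four vectors are listed in the order A, B, C, D (matching the variables a, b, c, d); the names `VISIBLE` and `camera_placements` are hypothetical.

```python
from itertools import product

# Streets (#1..#5) visible from each intersection, assuming the four vectors
# in the text are listed in the order A, B, C, D.
VISIBLE = {
    "A": [0, 1, 0, 0, 1],
    "B": [1, 0, 1, 1, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [1, 0, 0, 1, 1],
}

def camera_placements(visible, target):
    """Every set of intersections whose vectors add up to the target vector."""
    names = sorted(visible)
    found = []
    for choice in product([0, 1], repeat=len(names)):
        covered = [sum(c * visible[n][s] for c, n in zip(choice, names))
                   for s in range(len(target))]
        if covered == target:
            found.append([n for c, n in zip(choice, names) if c])
    return found

print(camera_placements(VISIBLE, [1, 1, 1, 1, 1]))   # [['A', 'B']]
```

Under that assumption, cameras at A and B cover every street exactly once, and no other placement does.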

Example: Suppose a chemist wants to model a chemical reaction. The dimensions of the system might be:

how many molecules of C₃H₈ are present?
how many molecules of O₂ are present?
how many molecules of CO₂ are present?
how many molecules of H₂O are present?
how many atoms of carbon are present?
how many atoms of hydrogen are present?
how many atoms of oxygen are present?

Individual vectors in R³ can be used to represent how many atoms of each element are in each type of molecule being considered:

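$$\mathrm{C_3H_8}: \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}, \quad \mathrm{O_2}: \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{CO_2}: \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{H_2O}: \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$$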

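Suppose we know that the number of atoms in a system may never change during a reaction, and that some quantity of C₃H₈ and O₂ can react to yield only CO₂ and H₂O. How many molecules of each compound will be involved in the reaction? The answer is a solution to the following equation, in which a linear combination appears on each side:

$$\begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix} x_1 + \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} x_2 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} x_3 + \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} x_4$$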

For example, suppose we start with 1000 molecules of C₃H₈ and 5000 molecules of O₂. If both of these compounds react to produce only CO₂ and H₂O, how many molecules of each will be produced?

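$$\begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix} 1000 + \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} 5000 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} a + \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} b$$

$$\begin{bmatrix} 3000 \\ 8000 \\ 10000 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} a + \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} b$$

$$\begin{aligned} 3000 &= 1 \cdot a + 0 \cdot b \\ a &= 3000 \\ 8000 &= 0 \cdot a + 2 \cdot b \\ b &= 4000 \\ 10000 &= 2 \cdot 3000 + 1 \cdot 4000 \end{aligned}$$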

Thus, a = 3000 molecules of CO₂ and b = 4000 molecules of H₂O will be produced.
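
The bookkeeping in this example can be sketched in Python. The function name `products` is hypothetical; it follows the same three component equations (carbon, hydrogen, oxygen) and reports None when the atoms cannot all be accounted for by CO₂ and H₂O alone.

```python
# Atom counts (carbon, hydrogen, oxygen) per molecule, as in the text.
C3H8 = (3, 8, 0)
O2 = (0, 0, 2)

def products(n_c3h8, n_o2):
    """Return (CO2, H2O) molecule counts, or None if the atoms cannot balance."""
    carbon, hydrogen, oxygen = [n_c3h8 * p + n_o2 * q for p, q in zip(C3H8, O2)]
    a = carbon                      # carbon:   1*a + 0*b = carbon
    if hydrogen % 2:
        return None
    b = hydrogen // 2               # hydrogen: 0*a + 2*b = hydrogen
    if 2 * a + 1 * b != oxygen:     # oxygen:   2*a + 1*b must equal oxygen
        return None
    return a, b

print(products(1000, 5000))   # (3000, 4000)
print(products(1000, 4000))   # None
```

The second call shows a starting state for which the third component equation fails, so the two compounds cannot react completely into only CO₂ and H₂O.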

The notion of a linear combination of vectors is common and can be used to mathematically model a wide variety of problems. Thus, a more concise notation for linear combinations of vectors would be valuable. This is one of the issues addressed by introducing a new type of term: the matrix.

3. Matrices

In this section we introduce a new kind of term: a matrix. We define some operations on matrices and some properties of matrices, and we describe some of the possible ways to interpret and use matrices.

3.1. Matrices and multiplication of a vector by a matrix

Matrices are a concise way to represent and reason about linear combinations and linear independence of vectors (e.g., setwise linear independence might be difficult to check using an exhaustive approach), reinterpretations of systems using different dimensions, and so on. One way to interpret a matrix is as a collection of vectors. Multiplying a matrix by a vector corresponds to computing a linear combination of that collection of vectors.

As an example, we consider the case for linear combinations of two vectors. The two scalars in the linear combination can be interpreted as a 2-component vector. We can then put the two vectors together into a single object in our notation, which we call a matrix.

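Definition:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} \cdot x + \begin{bmatrix} b \\ d \end{bmatrix} \cdot y$$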

Notice that the columns of the matrix are the vectors used in our linear combination. Notice also that we can now reinterpret the result of multiplying a vector by a matrix as taking the dot product of each of the matrix rows with the vector.

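Fact:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} \cdot x + \begin{bmatrix} b \\ d \end{bmatrix} \cdot y = \begin{bmatrix} ax \\ cx \end{bmatrix} + \begin{bmatrix} by \\ dy \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} = \begin{bmatrix} (a,b) \cdot (x,y) \\ (c,d) \cdot (x,y) \end{bmatrix}$$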

Because a matrix is just two column vectors, we can naturally extend this definition of multiplication to cases in which we have multiplication of a matrix by a matrix: we simply multiply each column of the second matrix by the first matrix and write down each of the resulting columns in the result matrix.

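Definition:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} x & s \\ y & t \end{bmatrix} = \begin{bmatrix} (a,b) \cdot (x,y) & (a,b) \cdot (s,t) \\ (c,d) \cdot (x,y) & (c,d) \cdot (s,t) \end{bmatrix}$$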

These definitions can be extended naturally to vectors and matrices with more than two components. If we denote using M_ij the entry in a matrix M found in the ith row and jth column, then we can define the result of matrix multiplication of two matrices A and B as a matrix M such that

M_ij = ith row of A ⋅ jth column of B.
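
The two readings of matrix-vector multiplication (a linear combination of columns, and a dot product with each row) can be compared in a short Python sketch, along with the row-by-column rule for M_ij. Matrices are represented here as lists of rows; all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def columns(M):
    return [list(col) for col in zip(*M)]

def mat_vec_by_columns(M, v):
    """M v as a linear combination of the columns of M."""
    result = [0] * len(M)
    for scalar, col in zip(v, columns(M)):
        result = [r + scalar * c for r, c in zip(result, col)]
    return result

def mat_vec_by_rows(M, v):
    """M v as the dot product of each row of M with v."""
    return [dot(row, v) for row in M]

def mat_mul(A, B):
    """M_ij = (ith row of A) . (jth column of B)."""
    return [[dot(row, col) for col in columns(B)] for row in A]

M = [[1, 2], [3, 4]]
v = [5, 6]
print(mat_vec_by_columns(M, v))         # [17, 39]
print(mat_vec_by_rows(M, v))            # [17, 39]
print(mat_mul(M, [[0, 1], [1, 0]]))     # [[2, 1], [4, 3]]
```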

3.2. Interpreting matrices as tables of relationships and transformations of system states

We saw how vectors can be used to represent system states. We can extend this interpretation to matrices and use matrices to represent relationships between the dimensions of system states. This allows us to interpret matrices as transformations between system states (or partial observations of system states).

If we again consider the example system involving a barn of cows and chickens, we can reinterpret the matrix as a table of relationships between dimensions. Each entry in the table has a unit indicating the relationship it represents.

|       | chickens       | cows       |
|-------|----------------|------------|
| heads | 1 head/chicken | 1 head/cow |
| legs  | 2 legs/chicken | 4 legs/cow |

Notice that the column labels in this table represent the dimensions of an "input" vector that could be multiplied by this matrix, and the row labels specify the dimensions of the "output" vector that is obtained as a result. That is, if we multiply using the above matrix a vector that specifies the number of chickens and the number of cows in a system state, we will get a vector that specifies the number of heads and legs we can observe in that system.

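$$\begin{bmatrix} 1 \text{ head/chicken} & 1 \text{ head/cow} \\ 2 \text{ legs/chicken} & 4 \text{ legs/cow} \end{bmatrix} \cdot \begin{bmatrix} x \text{ chickens} \\ y \text{ cows} \end{bmatrix} = \begin{bmatrix} x + y \text{ heads} \\ 2x + 4y \text{ legs} \end{bmatrix}$$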

Thus, we can interpret multiplication by this matrix as a function that takes system states that only specify the number of chickens and cows, and converts them to system states that only specify the number of heads and legs:

(# chickens × # cows) → (# heads × # legs)

3.3. Interpreting multiplication of matrices as composition of system state transformations

Example: Suppose that we have a system with the following dimensions.

number of wind farms
number of coal power plants
units of power
units of cost (e.g., pollution)
number of single family homes (s.f.h.'s)
number of businesses

Two different matrices might specify the relationships between some combinations of dimensions in this system.

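$$M_1 = \begin{bmatrix} 100 \text{ power/wind farm} & 250 \text{ power/coal plant} \\ 50 \text{ cost/wind farm} & 400 \text{ cost/coal plant} \end{bmatrix}, \quad M_2 = \begin{bmatrix} 4 \text{ s.f.h./unit power} & -2 \text{ s.f.h./unit cost} \\ 1 \text{ businesses/unit power} & 0 \text{ businesses/unit cost} \end{bmatrix}$$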

Notice that these two matrices both represent transformations between partial system state descriptions.

T₁: (# wind farms × # coal plants) → (units of power × units of cost)

T₂: (units of power × units of cost) → (# s.f.h. × # businesses)

Notice that because the interpretation of a result obtained using the first transformation matches the interpretation of an input to the second, we can compose these transformations to obtain a third transformation.

T₂ ∘ T₁: (# wind farms × # coal plants) → (# s.f.h. × # businesses)

This corresponds to multiplying the two matrices to obtain a third matrix. Notice that the units of the resulting matrix can be computed using a process that should be familiar to you from earlier coursework.

$$\begin{bmatrix} 4 \text{ s.f.h./unit power} & -2 \text{ s.f.h./unit cost} \\ 1 \text{ businesses/unit power} & 0 \text{ businesses/unit cost} \end{bmatrix} \cdot \begin{bmatrix} 100 \text{ power/wind farm} & 250 \text{ power/coal plant} \\ 50 \text{ cost/wind farm} & 400 \text{ cost/coal plant} \end{bmatrix} = \begin{bmatrix} 300 \text{ s.f.h./wind farm} & 200 \text{ s.f.h./coal plant} \\ 100 \text{ businesses/wind farm} & 250 \text{ businesses/coal plant} \end{bmatrix}$$

Thus, given some vector describing the number of wind farms and coal plants in the system, we can multiply that vector by (M₂ ⋅ M₁) to compute the number of single family homes and businesses we expect to find in that system.
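
A short Python sketch can confirm that multiplying by the product M₂ ⋅ M₁ gives the same result as applying the two transformations one after the other. The input state (3 wind farms, 2 coal plants) is a made-up value chosen only for illustration, and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def mat_mul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

M1 = [[100, 250], [50, 400]]    # (wind farms, coal plants) -> (power, cost)
M2 = [[4, -2], [1, 0]]          # (power, cost) -> (s.f.h., businesses)

M21 = mat_mul(M2, M1)
print(M21)                              # [[300, 200], [100, 250]]

state = [3, 2]                          # hypothetical: 3 wind farms, 2 coal plants
print(mat_vec(M21, state))              # [1300, 800]
print(mat_vec(M2, mat_vec(M1, state)))  # [1300, 800]
```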

Example: Suppose that a gram of gold costs $50, while a gram of silver costs $10. After purchasing some of each, you have spent $350 on 15 grams of material. How many grams of each commodity have you purchased?

Write down four dimensions describing this system.
Define a matrix A that can be used to convert a description of a system state that specifies only the amount of gold and silver purchased into a description of the system state that specifies only the cost and total weight.
Write down a matrix equation describing this problem and solve it to find the solution.
Define a matrix B such that for any description of a system state v that specifies only the total weight and amount spent, B ⋅ v is a description of that system state that specifies the amount of gold and silver in that system state.

Example: Suppose we characterize our system in terms of two dimensions:

number of single family homes (s.f.h.'s)
number of power plants (p.p.'s)

In this example, instead of studying the relationships between dimensions, we want to study how the dimensions change (possibly in an interdependent way) over time. For example, the following matrix might capture how the system state evolves from year to year:

$$M = \begin{bmatrix} 2 \text{ s.f.h. in year 1/s.f.h. in year 0} & -1 \text{ s.f.h. in year 1/p.p. in year 0} \\ 0 \text{ p.p. in year 1/s.f.h. in year 0} & 1 \text{ p.p. in year 1/p.p. in year 0} \end{bmatrix}$$

We can parameterize the above in terms of a year t ∈ R. Notice that the matrix above is just a special case of the matrix below (when t = 0):

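$$M = \begin{bmatrix} 2 \text{ s.f.h. in year } t{+}1 \text{/s.f.h. in year } t & -1 \text{ s.f.h. in year } t{+}1 \text{/p.p. in year } t \\ 0 \text{ p.p. in year } t{+}1 \text{/s.f.h. in year } t & 1 \text{ p.p. in year } t{+}1 \text{/p.p. in year } t \end{bmatrix}$$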

What does M ⋅ M represent? If we consider the units, we have:

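$$M \cdot M = \begin{bmatrix} 4 \text{ s.f.h. in year } t{+}2 \text{/s.f.h. in year } t & -3 \text{ s.f.h. in year } t{+}2 \text{/p.p. in year } t \\ 0 \text{ p.p. in year } t{+}2 \text{/s.f.h. in year } t & 1 \text{ p.p. in year } t{+}2 \text{/p.p. in year } t \end{bmatrix}$$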

Suppose v = [x; y] ∈ R² represents the number of single family homes and power plants in a given year. We can then define the number of single family homes and power plants after t years as:

$$M^t \cdot v = \begin{bmatrix} 2 & -1 \\ 0 & 1 \end{bmatrix}^t \cdot \begin{bmatrix} x \\ y \end{bmatrix}$$

If we wanted to write the number of single family homes and power plants as a function of t ∈ R and an initial state x₀, y₀ ∈ R, we could nest the dot products as follows and use algebra to simplify:

$$\begin{aligned} \text{\# s.f.h. in year } t &= 2(2(2( \ldots 2(2x_0 - y_0) - y_0 \ldots ) - y_0) - y_0) - y_0 \\ &= 2^t x_0 - (2^{t-1} + \ldots + 4 + 2 + 1) y_0 \\ &= 2^t x_0 - (2^t - 1) y_0 \\ &= 2^t x_0 - 2^t y_0 + y_0 \end{aligned}$$

$$\begin{aligned} \text{\# p.p. in year } t &= 0 \cdot x_t + 1 \cdot ( \ldots (0 \cdot x_2 + 1 \cdot (0 \cdot x_1 + (0 \cdot x_0 + 1 \cdot y_0))) \ldots ) \\ &= y_0 \end{aligned}$$
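
The closed form can be checked against repeated application of the year-to-year transformation. In the sketch below, t is restricted to nonnegative whole numbers of years, the initial state (10, 3) is an arbitrary illustrative value, and the function names are hypothetical.

```python
def step(state):
    """One year of evolution: multiply the state by M = [2, -1; 0, 1]."""
    x, y = state
    return (2 * x - 1 * y, 0 * x + 1 * y)

def after_years(state, t):
    for _ in range(t):
        state = step(state)
    return state

def closed_form(state, t):
    x0, y0 = state
    return (2**t * x0 - (2**t - 1) * y0, y0)

# The nested products and the simplified formula agree for small t.
for t in range(6):
    assert after_years((10, 3), t) == closed_form((10, 3), t)

print(after_years((10, 3), 2))   # (31, 3)
```

For t = 2 the result matches multiplying by M ⋅ M directly: 4 ⋅ 10 − 3 ⋅ 3 = 31.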

3.5. Matrix operations and their interpretations

The following table summarizes the matrix operations that we are considering in this course.

| term | definition | restrictions | general properties |
|---|---|---|---|
| M₁ + M₂ | component-wise | matrices must have the same number of rows and columns | commutative; associative; has identity (matrix with all 0 components); has inverse (multiply matrix by -1); scalar multiplication is distributive |
| M₁ ⋅ M₂ | row-column-wise dot products | columns in M₁ = rows in M₂; rows in M₁ ⋅ M₂ = rows in M₁; columns in M₁ ⋅ M₂ = columns in M₂ | associative; has identity I (1s in diagonal and 0s elsewhere); distributive over matrix addition; not commutative in general; no inverse in general |
| M^-1 | | columns in M = rows in M; matrix is invertible | M^-1 ⋅ M = M ⋅ M^-1 = I |

The following tables list some high-level intuitions about how matrix operations can be understood.

| level of abstraction | interpretations of multiplication of a vector by a matrix | | | |
|---|---|---|---|---|
| applications | transformation of system states | extraction of information about system states | computing properties of combinations or aggregations of objects (or system states) | conversion of system state observations from one set of dimensions to another |
| geometry | "moving" vectors in a space (stretching, skewing, rotating, reflecting) | projecting vectors | taking a linear combination of two vectors | reinterpreting vector notation as referring to a collection of non-canonical vectors |

| level of abstraction | interpretations of multiplication of two matrices |
|---|---|
| applications | composition of system state transformations or conversions |
| geometry | sequencing of motions of vectors within a space (stretching, skewing, rotating, reflecting) |

| level of abstraction | invertible matrix | singular matrix |
|---|---|---|
| applications | reversible transformation of system states; extraction of complete information uniquely determining a system state | irreversible transformation of system states; extraction of incomplete information about a system state |
| geometry | reversible transformation or motion of vectors in a space | projection onto a strict subset of a set of vectors (space) |
| symbolic | reversible transformation of information numerically encoded in matrix (example of such information: system of linear equations encoded as matrix) | irreversible/"lossy" transformation of information encoded in matrix |

Suppose we interpret multiplication of a vector by a matrix M as a function from vectors to vectors:

f(v) = M v.

Fact: Notice that for any M, if f(v) = M v then f is always a function because M v has only one possible result (i.e., matrix multiplication is deterministic) for a given M and v.

Fact: If f(v) = M v, then f is invertible if M is an invertible matrix. The inverse of f is then defined to be:

f^-1(v) = M^-1 v.

Notice that f^-1 is a function because M^-1 v only has one possible result. Notice that f^-1 is the inverse of f because

f^-1(f(v)) = M^-1 M v = I v = v.

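Fact: If the columns of a matrix M are linearly dependent and f(v) = M v, then M cannot have an inverse. We consider the case in which M ∈ R^2×2. Suppose we have that

$$M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

If the columns of M are linearly dependent, then we know that there is some s ∈ R such that

$$s \begin{bmatrix} a \\ c \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}$$

(if instead only the first column is a multiple of the second, the argument is symmetric). This means that we can rewrite M:

$$M = \begin{bmatrix} a & sa \\ c & sc \end{bmatrix}.$$

Since matrix multiplication can be interpreted as taking a linear combination of the column vectors, this means that for x,y ∈ R,

$$\begin{bmatrix} a & sa \\ c & sc \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} x + \begin{bmatrix} sa \\ sc \end{bmatrix} y = \begin{bmatrix} a \\ c \end{bmatrix} (x + s y).$$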

But this means that for any two vectors [x;y] and [x';y'], if x + sy = x' + sy' then multiplying by M will lead to the same result. Thus, f is a function that takes two different vector arguments and maps them to the same result. If we interpret f as a relation and take its inverse f^-1, f^-1 cannot be a function.

Thus, M cannot have an inverse (if it did, then f^-1 would be a function).

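Fact: If the columns of a matrix M are linearly independent and f(v) = M v, then M has an inverse. We consider the case in which M ∈ R^2×2. Suppose we have that

$$M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

If the columns of M are linearly independent, then we know that a d - b c ≠ 0:

$$\begin{aligned} a/c &\neq b/d \\ ad &\neq bc \\ ad - bc &\neq 0 \end{aligned}$$

Suppose we pick the following M^-1:

$$M^{-1} = \frac{1}{ad - bc} \cdot \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Then we have that:

$$\begin{aligned} M^{-1} \cdot M &= \frac{1}{ad - bc} \cdot \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \cdot \begin{bmatrix} a & b \\ c & d \end{bmatrix} \\ &= \frac{1}{ad - bc} \cdot \begin{bmatrix} ad - bc & bd - bd \\ ac - ac & ad - bc \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{aligned}$$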
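
The formula for the inverse of a matrix in R^2×2 translates directly into code. The sketch below uses exact fractions and returns None when a d − b c = 0; the function names are hypothetical, and the sample matrices are only illustrative inputs.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] via (1/(ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        return None                     # linearly dependent columns: no inverse
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

M = [[1, 2], [3, 4]]
M_inv = inverse_2x2(M)
print(M_inv == [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]])   # True
print(mat_mul(M_inv, M) == [[1, 0], [0, 1]])                    # True
print(inverse_2x2([[2, -7], [-4, 14]]))                         # None
```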

Example: Solve the following problems.

Determine which of the following matrices are not invertible:

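$$\begin{bmatrix} 2 & -7 \\ -4 & 14 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix}$$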

The columns of the first matrix are linearly dependent, so it is not invertible. The first column of the second matrix can be obtained by multiplying the second column by 0, so the two columns of that matrix are linearly dependent; thus, it is not invertible. For the third matrix, the following equation has no solution:

$$\begin{bmatrix} 2 \\ 0 \end{bmatrix} = s \cdot \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$
Thus, the third matrix is invertible. It is also possible to determine this by computing the determinant for each matrix.
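
The matrix below is not invertible, and the following equation is true. What is the matrix? List all four of its components (real numbers).

$$\begin{bmatrix} a & b \\ b & c \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}$$

Since the matrix is not invertible, it must be that its determinant is 0. Thus, we have the following system of equations:

$$\begin{aligned} a + 2b &= 5 \\ b + 2c &= 10 \\ ac - b^2 &= 0 \end{aligned}$$

If we solve the above, we get:

$$\begin{aligned} a &= 1 \\ b &= 2 \\ c &= 4 \end{aligned}$$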

3.6. Matrix properties

The following are subsets of R^n×n that are of interest in this course because they correspond to transformations and systems that have desirable or useful properties. For some of these sets of matrices, matrix multiplication and inversion have properties that they do not in general.

| subset of R^n×n | definition | closed under matrix multiplication | properties of matrix multiplication | inversion |
|---|---|---|---|---|
| identity matrix | ∀ i,j, M_ij = 1 if i=j, 0 otherwise | closed | commutative, associative, distributive with addition, has identity | has inverse (itself); closed under inversion |
| elementary matrices | can be obtained via an elementary row operation from I: (1) add nonzero multiple of one row of the matrix to another row; (2) multiply a row by a nonzero scalar; (3) swap two rows of the matrix. Note: the third is a combination of the first two operations. | | associative, distributive with addition, have identity | have inverses; closed under inversion |
| scalar matrices | ∃ s ∈ R, ∀ i,j, M_ij = s if i=j, 0 otherwise | closed | commutative, associative, distributive with addition, have identity | nonzero members have inverses; closed under inversion |
| diagonal matrices | ∀ i,j, M_ij ∈ R if i=j, 0 otherwise | closed | associative, distributive with addition, have identity | members with no zero diagonal entries have inverses; closed under inversion |
| matrices with constant diagonal | ∀ i,j, M_ii = M_jj | | associative, distributive with addition, have identity | |
| symmetric matrices | ∀ i,j, M_ij = M_ji | | associative, distributive with addition, have identity | |
| symmetric matrices with constant diagonal | ∀ i,j, M_ii = M_jj and M_ij = M_ji | closed (when n = 2) | commutative (when n = 2), associative, distributive with addition, have identity | |
| upper triangular matrices | ∀ i,j, M_ij = 0 if i > j | closed | associative, distributive with addition, have identity | not invertible in general; closed under inversion when invertible |
| lower triangular matrices | ∀ i,j, M_ij = 0 if i < j | closed | associative, distributive with addition, have identity | not invertible in general; closed under inversion when invertible |
| invertible matrices | ∃ M^-1 s.t. M^-1 M = M M^-1 = I | closed | associative, distributive with addition, have identity | all members have inverses; closed under inversion |
| square matrices | all of R^n×n | closed | associative, distributive with addition, have identity | |

Two facts presented in the above table are worth noting.

Fact: Suppose A is invertible. Then the inverse of A^-1 is A, because A A^-1 = I. Thus, (A^-1)^-1 = A.

Fact: If A and B are invertible, so is AB. That is, invertible matrices are closed under matrix multiplication. We can show this by using the associativity of matrix multiplication. Since A and B are invertible, there exist B^-1 and A^-1 such that:

$$\begin{aligned} (B^{-1} A^{-1}) A B &= B^{-1} (A^{-1} A) B \\ &= B^{-1} I B \\ &= B^{-1} B \\ &= I. \end{aligned}$$

Thus, since there exists a matrix (B^-1 A^-1) such that (B^-1 A^-1) A B = I, (B^-1 A^-1) is the inverse of AB.

Example: Given an invertible upper triangular matrix M ∈ R^2×2, show that M^-1 is also upper triangular. Hint: write out the matrix M explicitly.

Suppose we have the following upper triangular matrix:

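$$M = \begin{bmatrix} x & y \\ 0 & z \end{bmatrix}$$

If it is invertible, then there exists an inverse such that:

$$\begin{bmatrix} x & y \\ 0 & z \end{bmatrix} \cdot \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

This implies that

$$\begin{aligned} xa + yc &= 1 \\ xb + yd &= 0 \\ zc &= 0 \\ zd &= 1 \end{aligned}$$

Because M is invertible (in particular, zd = 1), we know that z ≠ 0. Since zc = 0, this means that c = 0. Thus, we have that the inverse is upper triangular.

Alternatively, we can observe that the lower-left entry of M is 0, so using the formula for the inverse would yield a lower-left matrix entry equal to −0/(det M) = 0.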

Example: In terms of a, b ∈ R where a ≠ 0 and b ≠ 0, compute the inverse of:

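$$\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \cdot \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \cdot \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \cdot \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$$

While we could perform the instances of matrix multiplication step-by-step and then invert the result (either by solving an equation or using the formula for the inverse of a matrix in R^2×2), it's easier to recall that diagonal matrices behave in a manner that is very similar to the real numbers. Thus, the above product is equal to

$$\begin{bmatrix} a^4 & 0 \\ 0 & b^4 \end{bmatrix}$$

and its inverse simply has the multiplicative inverses of the two diagonal entries as its diagonal entries:

$$\begin{bmatrix} 1/a^4 & 0 \\ 0 & 1/b^4 \end{bmatrix}$$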

Another fact not in the table is also worth noting.

Fact: Any product of a finite number of elementary matrices is invertible. This follows from the fact that all elementary matrices are invertible, and that the set of invertible matrices is closed under multiplication.

Given the last fact, we might ask whether the opposite is true: are all invertible matrices the product of a finite number of elementary matrices? The answer is yes, as we will see further below.

3.7. Solving the equation M v = w for M with various properties

Recall that for v,w ∈ Rⁿ and M ∈ R^n×n, an equation of the following form can represent a system of equations:

M v = w

Notice that if M is invertible, we can solve for v by multiplying both sides by M^-1. More generally, if M is a member of some of the sets in the above table, we can find straightforward algorithms for solving such an equation for v.

| M is ... | algorithm to solve M v = w for v |
|---|---|
| the identity matrix | w is the solution |
| an elementary matrix | perform a row operation on M to obtain I; perform the same operation on w |
| a scalar matrix | divide the components of w by the scalar |
| a diagonal matrix | divide each component of w by the corresponding matrix component |
| an upper triangular matrix | start with the last entry in v, which is easily obtained; move backwards through v, filling in the values by substituting the already known variables |
| a lower triangular matrix | start with the first entry in v, which is easily obtained; move forward through v, filling in the values by substituting the already known variables |
| product of a lower triangular matrix and an upper triangular matrix | combine the algorithms for upper and lower triangular matrices in sequence (see example below) |
| an invertible matrix | compute the inverse and multiply w by it |

Example: We consider an equation M v = w where M is a diagonal matrix (the identity matrix and all scalar matrices are also diagonal matrices).

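$$\begin{bmatrix} 4 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 9 \\ 10 \end{bmatrix}$$

$$\begin{aligned} 4x &= 2 \\ 3y &= 9 \\ 5z &= 10 \\ x &= 1/2 \\ y &= 3 \\ z &= 2 \end{aligned}$$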

Example: We consider an equation M v = w where M is a lower triangular matrix.

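$$\begin{bmatrix} 4 & 0 & 0 \\ 2 & 4 & 0 \\ 4 & 3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 9 \\ 18 \end{bmatrix}$$

$$\begin{aligned} 4x &= 2 \\ 2x + 4y &= 9 \\ 4x + 3y + 5z &= 18 \\ x &= 1/2 \\ y &= 2 \\ z &= 2 \end{aligned}$$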
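
The procedure for lower triangular matrices (solve for the first entry, then move forward, substituting the values already known) can be sketched as follows. The function name `forward_substitution` is hypothetical, and the sketch assumes every diagonal entry of the matrix is nonzero.

```python
from fractions import Fraction

def forward_substitution(L, w):
    """Solve L v = w for a lower triangular L with nonzero diagonal entries."""
    v = []
    for i, row in enumerate(L):
        known = sum(row[j] * v[j] for j in range(i))    # contribution of solved entries
        v.append((Fraction(w[i]) - known) / row[i])
    return v

L = [[4, 0, 0], [2, 4, 0], [4, 3, 5]]
print(forward_substitution(L, [2, 9, 18]))
# [Fraction(1, 2), Fraction(2, 1), Fraction(2, 1)]
```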

Fact: Suppose that M = L U where L is lower triangular and U is upper triangular. How can we solve M v = w?

First, note that because matrix multiplication is associative, we have

$$\begin{aligned} M v &= (L U) v \\ &= L (U v) \end{aligned}$$

We introduce a new vector for the intermediate result U v, which we call u. Now, we have a system of matrix equations.

$$\begin{aligned} U v &= u \\ L u &= w \end{aligned}$$

We first solve for u using the algorithm for lower triangular matrices, then we solve for v using the algorithm for upper triangular matrices.

Example: Solve the following equation for x,y,z ∈ R:

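$$\begin{bmatrix} -2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & -1 & -1 \end{bmatrix} \cdot \begin{bmatrix} 3 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -8 \\ 12 \\ -1 \end{bmatrix}$$

We first divide the problem into two steps using the intermediate vector [ a ; b ; c ]:

$$\begin{bmatrix} -2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & -1 & -1 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} -8 \\ 12 \\ -1 \end{bmatrix}$$

$$\begin{bmatrix} 3 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

Solving the first equation from the top down and then the second from the bottom up, we get:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$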
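
The two-step procedure for M = L U can be sketched in Python by chaining a forward pass (for L) with a backward pass (for U). The function names are hypothetical, and both passes assume nonzero diagonal entries.

```python
from fractions import Fraction

def forward_substitution(L, w):
    """Solve L u = w for lower triangular L, from the first entry down."""
    u = []
    for i, row in enumerate(L):
        known = sum(row[j] * u[j] for j in range(i))
        u.append((Fraction(w[i]) - known) / row[i])
    return u

def back_substitution(U, u):
    """Solve U v = u for upper triangular U, from the last entry up."""
    n = len(U)
    v = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (Fraction(u[i]) - known) / U[i][i]
    return v

def solve_lu(L, U, w):
    """Solve (L U) v = w by solving L u = w, then U v = u."""
    return back_substitution(U, forward_substitution(L, w))

L = [[-2, 0, 0], [1, 2, 0], [1, -1, -1]]
U = [[3, 0, 1], [0, 1, 2], [0, 0, 1]]
print(forward_substitution(L, [-8, 12, -1]) == [4, 4, 1])   # True
print(solve_lu(L, U, [-8, 12, -1]) == [1, 2, 1])            # True
```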

Example: The inverse of a matrix in R^2×2 can be computed as follows.

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \begin{bmatrix} d/(ad - bc) & -b/(ad - bc) \\ -c/(ad - bc) & a/(ad - bc) \end{bmatrix}$$

Thus, if we find that the determinant of a matrix M ∈ R^2×2 is nonzero, the algorithm for solving M v = w is straightforward. Consider the following example.

1 2
3 4

x
y

=

5
13

1 2
3 4

^-1

1 2
3 4

x
y

=

1 2
3 4

^-1

5
13

1 0
0 1

x
y

=

4/((1⋅4)-(2⋅3)) −2/((1⋅4)-(2⋅3))
−3/((1⋅4)-(2⋅3)) 1/((1⋅4)-(2⋅3))

5
13

1 0
0 1

x
y

=

−2 1
3/2 −1/2

5
13

x
y

=

3
1
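
As a quick check of the computation above, the formula for the inverse of a matrix in R^2×2 can be written as a short function. This is only a sketch; the names `inverse_2x2` and `apply` are hypothetical helpers chosen for this illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix using the determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible (determinant is 0)")
    det = Fraction(det)
    return [[d / det, -b / det],
            [-c / det, a / det]]

def apply(M, v):
    """Multiply the matrix M by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = [[1, 2], [3, 4]]
w = [5, 13]
v = apply(inverse_2x2(M), w)
print(v)
```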

3.8. Row echelon form and reduced row echelon form

We define two more properties that a matrix may possess.

M is in row echelon form

all nonzero rows are above any rows consisting of all zeroes

the first nonzero entry (from the left) of a nonzero row is strictly
to the right of the first nonzero entry of the row above it

all entries in a column below the first nonzero entry in a row are zero
(the first two conditions imply this)

M is in reduced row echelon form

M is in row echelon form

the first nonzero entry in every row is 1;
this 1 entry is the only nonzero entry in its column

We can obtain the reduced row echelon form of a matrix using a sequence of appropriately chosen elementary row operations.

Example: Suppose we want to find the reduced row echelon form of the matrix below. We list the steps of the procedure.

1 2 1
-2 -3 1
3 5 0

→

1 2 1
0 1 3
3 5 0

→

1 2 1
0 1 3
0 -1 -3

→

1 2 1
0 1 3
0 0 0

→

1 0 -5
0 1 3
0 0 0
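
The procedure used in this example can be sketched as a short program. The function below is one possible way to organize the row operations (choose a pivot, swap it into place, scale it to 1, clear the rest of its column). It is an illustration under our own naming, not a definitive implementation.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M, computed with elementary row operations."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, rows) if A[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        A[pivot_row], A[r] = A[r], A[pivot_row]          # swap two rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]     # scale the row so the pivot is 1
        for other in range(rows):
            if other != pivot_row and A[other][col] != 0:
                f = A[other][col]
                # add a multiple of the pivot row to another row
                A[other] = [x - f * y for x, y in zip(A[other], A[pivot_row])]
        pivot_row += 1
    return A

M = [[1, 2, 1], [-2, -3, 1], [3, 5, 0]]
print(rref(M))
```

The order of the row operations differs slightly from the steps listed above, but the resulting matrix is the same.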

Fact: For any matrix M, there may be more than one sequence of elementary row operations that reaches the reduced row echelon form. However, reaching it is always possible, and the reduced row echelon form of M is unique. We do not prove this result in this course, but fairly short proofs by induction can be found elsewhere. Because the reduced row echelon form of a matrix M is unique, we use the following notation to denote it:

rref M.

Example: Determine which of the following matrices are elementary:

1 0 0
0 1 0

,

0 0 2
0 1 0
1 0 0

,

1 0 0
0 1 0
0 0 0

,

1 0 0
0 1 0
-2 0 1

The matrices are: (a) not elementary (not square, so not invertible), (b) not elementary (composition of two elementary row operations applied to the identity), (c) not elementary (multiplication of a row by the 0 scalar is not invertible), and (d) elementary (multiple of first row added to last row).

Example: Determine which of the following matrices are in reduced row echelon form:

1	0	-2
0	1	4

1	0	0
0	1	0
0	1	3
0	0	1

1	0
0	1
0	0

1	0
0	1
0	1
0	0

0	0	0	1	0
0	0	0	0	1

The matrices are: (a) in reduced row echelon form, (b) not in reduced row echelon form, (c) in reduced row echelon form, (d) not in reduced row echelon form, and (e) in reduced row echelon form.

Example: Suppose the matrix below is in reduced row echelon form. Solve for a, b ∈ R.

1	a	0
0	b	1
0	0	1 − a

Without loss of generality, we can focus on four possibilities: a is either zero or nonzero, and b is either zero or nonzero. We can further simplify this by considering 1 as the only nonzero value of interest. Then we have that:

if a = 0 and b = 0, then the matrix is not in reduced row echelon form;
if a = 0 and b = 1, then the matrix is not in reduced row echelon form;
if a = 1 and b = 0, then the matrix is in reduced row echelon form;
if a = 1 and b = 1, then the matrix is not in reduced row echelon form.

Thus, a = 1 and b = 0.

Question: For a given matrix in reduced row echelon form, how many different matrices can be reduced to it (one or more than one)?

Fact: It is always true that rref M = rref (rref M). Thus, the rref operation is idempotent.

Fact: If rref M is I (the identity matrix), then M is invertible. This is because I is an elementary matrix, and the row operations used to obtain rref M from M can be represented as a product of elementary (and thus, invertible) matrices E₁ ⋅ ... ⋅ E_n. Thus, we have:

E₁ ⋅ ... ⋅ E_n ⋅ M
=
I
(E^-1_n ⋅ ... ⋅ E^-1₁) ⋅ E₁ ⋅ ... ⋅ E_n ⋅ M
=
(E^-1_n ⋅ ... ⋅ E^-1₁) ⋅ I
M
=
(E^-1_n ⋅ ... ⋅ E^-1₁) ⋅ I

Since elementary matrices and I are invertible, and invertible matrices are closed under matrix multiplication, M must be invertible, too.

Fact: A matrix M ∈ R^n×n is not invertible iff the bottom row of rref M has all zeroes. This is because when all rows of rref M ∈ R^n×n have at least one nonzero value, rref M must be the identity matrix (try putting a nonzero value on the bottom row, and see what the definition of reduced row echelon form implies about the rest of the matrix). Since rref M = I, this implies that M must then be invertible by the fact immediately above this one.

Fact: For any invertible matrix M ∈ R^n×n, the reduced row echelon form of M is I ∈ R^n×n.

We can show this is true using a proof by contradiction. Suppose that M is invertible, but rref M ≠ I. We know that rref M can be obtained via a finite number of elementary row operations E₁ ⋅ ... ⋅ E_n:

(E₁ ⋅ ... ⋅ E_n) M = rref M.

If rref M is not I, then the last row of rref M must consist of only zeroes. But because M is invertible, we have:

((E₁ ⋅ ... ⋅ E_n) M) M^-1
=
(rref M) ⋅ M^-1
E₁ ⋅ ... ⋅ E_n
=
(rref M) ⋅ M^-1

Since the product of elementary matrices is invertible, (rref M) ⋅ M^-1 is also invertible. But if the last row of rref M consists of only zeroes, then the last row of (rref M) M^-1 also contains only zeroes, so it cannot be that (rref M) M^-1 is invertible. Thus, we have a contradiction, so our assumption that rref M ≠ I is false.

The following table provides an alternative illustration of how the contradiction is derived:

assumption 1: M is invertible
assumption 2: rref M ≠ I

from assumption 1: the matrix M^-1 exists
from assumption 2: the last row of rref M is all zeroes

(E₁ ⋅ ... ⋅ E_n) M = rref M

((E₁ ⋅ ... ⋅ E_n) M) M^-1 = (rref M) ⋅ M^-1

from assumption 1: E₁ ⋅ ... ⋅ E_n = (rref M) ⋅ M^-1
from assumption 2: the last row of (rref M) ⋅ M^-1 is all zeroes

from assumption 1: (rref M) ⋅ M^-1 is invertible because it is a product of the invertible matrices E₁, ..., E_n
from assumption 2: (rref M) ⋅ M^-1 is not invertible because multiplication by it is a many-to-one function

The above result implies the following fact.

Fact: If a matrix M is invertible, it is the product of a finite number of elementary matrices. This is because rref M is the identity, which is an elementary matrix, and M can be reduced via a finite number of invertible row operations to I. Thus, the elementary matrices can be used to generate every possible invertible matrix.

Example: Suppose that for some matrix M ∈ R^2×2, the following row operations can be applied to M (in the order specified) to obtain the identity matrix I:

add the bottom row to the top row;
swap the two rows;
multiply the bottom row by 1/3.

Find the matrix M^-1.

We know that performing the three row operations on M will result in I. Thus, we can write out the row operations as three matrices E₁, E₂, and E₃:
E₃ ⋅ E₂ ⋅ E₁ ⋅ M
=
I

1 0
0 1/3

⋅

0 1
1 0

⋅

1 1
0 1

⋅ M
=
I

Thus, we have that:
M^-1
=

1 0
0 1/3

⋅

0 1
1 0

⋅

1 1
0 1

=

0 1
1/3 1/3

We can also find M:
det (M^-1)
=
−1/3
M
=
1/(−1/3) ⋅

1/3 −1
−1/3 0

=

−1 3
1 0
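
We can check this example numerically. The sketch below builds the three elementary matrices, multiplies them in the order E₃ ⋅ E₂ ⋅ E₁, and confirms that the result is the inverse of the matrix M found above. The helper `matmul` is our own name for ordinary matrix multiplication.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

E1 = [[1, 1], [0, 1]]                # add the bottom row to the top row
E2 = [[0, 1], [1, 0]]                # swap the two rows
E3 = [[1, 0], [0, Fraction(1, 3)]]   # multiply the bottom row by 1/3

M_inv = matmul(E3, matmul(E2, E1))
M = [[-1, 3], [1, 0]]

print(M_inv)
print(matmul(M_inv, M))  # should be the identity matrix
```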

The following table summarizes the results.

fact justification

(1) {M | M is a finite product of elementary matrices} = {M | rref M = I} I is an elementary matrix;
sequences of row operations
are equivalent to multiplication by
elementary matrices

(2) {M | M is a finite product of elementary matrices} ⊂ {M | M is invertible} elementary matrices are invertible;
products of invertible matrices are invertible

(3) {M | rref M = I} ⊂ {M | M is invertible} fact (1) in this table;
fact (2) in this table;
transitivity of equality

(4) {M | M is invertible} ⊂ {M | rref M = I} proof by contradiction;
non-invertible M implies
rref M has all zeroes in bottom row

(5) {M | M is invertible} = {M | rref M = I} for any sets A,B,
A ⊂ B and B ⊂ A
implies A = B

(6) {M | M is a finite product of elementary matrices} = {M | M is invertible} fact (1) in this table;
fact (5) in this table;
transitivity of equality

Given all these results, we can say that the properties for a matrix M in the following table are all equivalent: if any of these is true, then all of them are true. If any of these is false, then all of them are false.

M is invertible

det M ≠ 0

the columns of M are (setwise) linearly independent

Mv = w has exactly one solution

M is a finite product of elementary matrices

Fact: For matrices M ∈ R^n×n, rref M is guaranteed to be upper triangular. Notice that this means that for some finite product of elementary matrices E₁ ⋅ ... ⋅ E_n, it is the case that

M = (E₁ ⋅ ... ⋅ E_n) ⋅ U.

If E₁ ,..., E_n are all lower triangular, then M has an LU decomposition. However, this will not always be the case. But recall that E₁ ,..., E_n are all elementary matrices.

Fact: Given any product of elementary matrices E₁ ⋅ ... ⋅ E_n, it is possible to find a lower triangular matrix L by applying a finite number of elementary swap operations S₁ ,..., S_n such that L = S₁ ⋅ ... ⋅ S_n ⋅ E₁ ⋅ ... ⋅ E_n is lower triangular; then we have:

L
=
S₁ ⋅ ... ⋅ S_n ⋅ E₁ ⋅ ... ⋅ E_n
(S_n^-1 ⋅ ... ⋅ S₁^-1) ⋅ L
=
E₁ ⋅ ... ⋅ E_n

Thus, E₁ ⋅ ... ⋅ E_n can be decomposed into a lower triangular matrix and a product of elementary swap matrices.

Fact: Any product of a finite number of swap elementary matrices S₁ ,..., S_n is a permutation matrix.

Fact: Any matrix M can be written as the product of three matrices P ⋅ L ⋅ U where P is a permutation matrix, L is a lower-triangular matrix, and U is an upper-triangular matrix, where:

P
=
S₁ ⋅ ... ⋅ S_n
L
=
E₁ ⋅ ... ⋅ E_k
U
=
rref M

Example: Suppose we have a system in which a single object is traveling along one spatial dimension (i.e., in a straight line). The object has a distance travelled, a velocity, and an acceleration; these are the three dimensions of the system:

distance (m)
velocity (m/s)
acceleration (m/s²)

Consider the following equations that can be used to compute the distance the object travels and its final velocity given its acceleration a ∈ R⁺, initial velocity v₀ ∈ R⁺, and the amount of time t ∈ R⁺ it travels.
d
=
0.5 a t² + v₀ t
v
=
a t + v₀
Suppose we want to convert a view of the system state that describes the acceleration and velocity at time 0 into a view of the system state that represents the distance travelled and velocity at time t. This conversion operation can be represented using a matrix M.
M
=

0.5 t² dist. at t+1 / accel. at t t dist. at t+1 / vel. at t
t vel. at t+1 / accel. at t 1 vel. at t+1 / vel. at t

=

0.5 t² m / (m/s²) t m / (m/s)
t (m/s) /(m/s²) 1 (m/s)/(m/s)

=

0.5 t² s² t s
t s 1 scalar identity

This matrix is invertible. This immediately tells us that it is possible to derive the initial acceleration and velocity given only the amount of time that has elapsed, the distance travelled, and the final velocity.

By computing the inverse of this matrix, we can obtain formulas that allow us to derive the initial velocity and acceleration of an object given how much time has passed, how far it has travelled, and its current velocity.
M^-1
=

-2/t² accel. at t / dist. at t+1 2/t accel. at t / vel. at t+1
2/t vel. at t / dist. at t+1 -1 vel. at t / vel. at t+1

=

-2/t² (m/s²)/m 2/t (m/s²) / (m/s)
2/t (m/s) / m -1 (m/s)/(m/s)

=

-2/t² 1/s² 2/t 1 / s
2/t 1 / s -1 scalar identity

The formulas can be obtained by multiplying M^-1 by a system state describing the distance travelled and current velocity. This yields:
a
=
-2d/t² + 2v/t
v₀
=
2d/t - v
We could have also obtained the above formulas using manipulation of equations of real numbers, or via the corresponding row operations. Let us consider the latter approach:

0.5 t² t
t 1

⋅

a
v₀

=

d
v

0.5 t² t
t − (2/t ⋅ 0.5 t²) 1 − (2/t ⋅ t)

⋅

a
v₀

=

d
v − d ⋅ (2/t)

0.5 t² t
0 1 − 2

⋅

a
v₀

=

d
v − (2d/t)

0.5 t² t
0 1

⋅

a
v₀

=

d
(2d/t) − v

0.5 t² − (t ⋅ 0) t − (t ⋅ 1)
0 1

⋅

a
v₀

=

d − (t ⋅ ((2d/t) − v))
(2d/t) − v

0.5 t² 0
0 1

⋅

a
v₀

=

− d + v t
(2d/t) − v

1 0
0 1

⋅

a
v₀

=

(2/t²) ⋅ (− d + v t)
(2d / t) − v

a
v₀

=

−2d/t² + 2v/t
2d / t − v
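
The following sketch checks that the formulas obtained from M^-1 undo the conversion represented by M. The function names `forward` and `backward` and the sample values (a = 2, v₀ = 3, t = 4) are our own choices for illustration. The forward direction follows the two rows of the matrix M, that is, d = 0.5 a t² + v₀ t and v = a t + v₀.

```python
from fractions import Fraction

def forward(a, v0, t):
    """Apply M: from (acceleration, initial velocity) to (distance, final velocity)."""
    d = Fraction(1, 2) * a * t**2 + v0 * t
    v = a * t + v0
    return d, v

def backward(d, v, t):
    """Apply M^-1: from (distance, final velocity) to (acceleration, initial velocity)."""
    a = -2 * d / t**2 + 2 * v / t
    v0 = 2 * d / t - v
    return a, v0

t = Fraction(4)
d, v = forward(Fraction(2), Fraction(3), t)
print(d, v)
print(backward(d, v, t))
```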

3.9. Matrix transpose

Definition: The transpose of a matrix M ∈ R^n×n, denoted M^⊤, is defined to be A such that for all i and j, A_ij = M_ji.

Fact: If a matrix M is scalar, diagonal, or symmetric, M^⊤ = M. If a matrix M is upper triangular, M^⊤ is lower triangular (and vice versa).

Example: Below are some examples of matrices and their transposes:

1	0
0	1

^⊤
=

1	0
0	1

a	b
c	d

^⊤
=

a	c
b	d

1	2
3	4
5	6

^⊤
=

1	3	5
2	4	6

1	0	0
2	3	0
4	5	6

^⊤
=

1	2	4
0	3	5
0	0	6

Fact: For A,B ∈ R^n×n, it is always the case that:

(A^⊤)^⊤
=
A
(A + B)^⊤
=
A^⊤ + B^⊤
s(B^⊤)
=
(sB)^⊤

Fact: It is always the case that (AB)^⊤ = B^⊤ A^⊤. If M = AB, then:

(AB)_ij
=
ith row of A ⋅ jth column of B
=
ith column of A^⊤ ⋅ jth row of B^⊤
=
jth row of B^⊤ ⋅ ith column of A^⊤
=
(B^⊤ A^⊤)_ji.

Example: We consider the following example with matrices in R^{3× 3}. Let a, b, c, x, y, and z be vectors in R³, with v_i representing the ith entry in a vector v. Suppose we have the following product of two matrices. Notice that a, b, and c are the rows of the left-hand matrix, and x, y, and z are the columns of the right-hand matrix.

a₁	a₂	a₃
b₁	b₂	b₃
c₁	c₂	c₃

⋅

x₁	y₁	z₁
x₂	y₂	z₂
x₃	y₃	z₃

=

(a₁, a₂, a₃) ⋅ (x₁, x₂, x₃)	(a₁, a₂, a₃) ⋅ (y₁, y₂, y₃)	(a₁, a₂, a₃) ⋅ (z₁, z₂, z₃)
(b₁, b₂, b₃) ⋅ (x₁, x₂, x₃)	(b₁, b₂, b₃) ⋅ (y₁, y₂, y₃)	(b₁, b₂, b₃) ⋅ (z₁, z₂, z₃)
(c₁, c₂, c₃) ⋅ (x₁, x₂, x₃)	(c₁, c₂, c₃) ⋅ (y₁, y₂, y₃)	(c₁, c₂, c₃) ⋅ (z₁, z₂, z₃)

=

a ⋅ x	a ⋅ y	a ⋅ z
b ⋅ x	b ⋅ y	b ⋅ z
c ⋅ x	c ⋅ y	c ⋅ z

Suppose we take the transpose of both sides of the equation above. Then we would have:

(

a₁	a₂	a₃
b₁	b₂	b₃
c₁	c₂	c₃

⋅

x₁	y₁	z₁
x₂	y₂	z₂
x₃	y₃	z₃

)^⊤

=

a ⋅ x	a ⋅ y	a ⋅ z
b ⋅ x	b ⋅ y	b ⋅ z
c ⋅ x	c ⋅ y	c ⋅ z

^⊤

=

a ⋅ x	b ⋅ x	c ⋅ x
a ⋅ y	b ⋅ y	c ⋅ y
a ⋅ z	b ⋅ z	c ⋅ z

=

x ⋅ a	x ⋅ b	x ⋅ c
y ⋅ a	y ⋅ b	y ⋅ c
z ⋅ a	z ⋅ b	z ⋅ c

=

x₁	x₂	x₃
y₁	y₂	y₃
z₁	z₂	z₃

⋅

a₁	b₁	c₁
a₂	b₂	c₂
a₃	b₃	c₃

=

x₁	y₁	z₁
x₂	y₂	z₂
x₃	y₃	z₃

^⊤ ⋅

a₁	a₂	a₃
b₁	b₂	b₃
c₁	c₂	c₃

^⊤
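
The identity (AB)^⊤ = B^⊤ A^⊤ can also be checked on concrete matrices. The sketch below uses two arbitrarily chosen matrices in R^3×3. The entries and the helper names `transpose` and `matmul` are our own and carry no special meaning.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[0, 1, 2], [1, 0, 3], [4, -1, 5]]

left = transpose(matmul(A, B))               # (AB)^T
right = matmul(transpose(B), transpose(A))   # B^T A^T
print(left == right)
```

Note that the order of the factors matters: in general A^⊤ B^⊤ is a different matrix.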

Fact: If A is invertible, then so is A^⊤. This can be proven using the fact directly above.


∀ A,B ∈ R^2×2,
(A) is invertible
implies

(A⁻¹) ⋅ A
=

1	0
0	1

A^⊤ ⋅ (A⁻¹)^⊤
=
((A⁻¹) ⋅ A)^⊤

=

1	0
0	1

^⊤

=

1	0
0	1

\forall A,B \in \R^(2 \times 2),
    `(A) is invertible` 
  \implies
   
   (A^(-1)) * A  =   [1 , 0; 0 , 1] 
   A^\t * (A^(-1))^\t  =  ((A^(-1)) * A)^\t  
           ``  =  [1,0;0,1]^\t  
           ``  =  [1,0;0,1]

Fact: det A = det A^⊤. We can see this easily in the A ∈ R^2×2 case.


∀ a,b,c,d ∈ R,

det

a	b
c	d

=
a d − b c

=
a d − c b

=
det

a	c
b	d

det

a	b
c	d

=
det

a	c
b	d

\forall a,b,c,d \in \R,
  
  \det [a,b;c,d]  =  a d-b c 
              ``  =  a d - c b  
              ``  =  \det [a,c;b,d]  
  \det [a,b;c,d]  =  \det [a,c;b,d]

3.10. Orthogonal matrices

Definition: A matrix M ∈ R^n×n is orthogonal iff M^⊤ = M^-1.

Fact: The columns of orthogonal matrices are always orthogonal unit vectors. The rows of orthogonal matrices are always orthogonal unit vectors. We can see this in the R^2×2 case. For columns, we use the fact that M^⊤ ⋅ M = I:

a	b
c	d

^⊤

⋅

a	b
c	d

=

1	0
0	1

a	c
b	d

⋅

a	b
c	d

=

1	0
0	1

(a,c) ⋅ (a,c)	(a,c) ⋅ (b,d)
(b,d) ⋅ (a,c)	(b,d) ⋅ (b,d)

=

1	0
0	1

For rows, we use that M ⋅ M^⊤ = I:

a	b
c	d

⋅

a	b
c	d

^⊤

=

1	0
0	1

a	b
c	d

⋅

a	c
b	d

=

1	0
0	1

(a,b) ⋅ (a,b)	(a,b) ⋅ (c,d)
(c,d) ⋅ (a,b)	(c,d) ⋅ (c,d)

=

1	0
0	1

Below, we provide a verifiable argument of the above fact for the R^2×2 case.


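∀ a,b,c,d ∈ R,

a	b
c	d

⋅

a	b
c	d

^⊤

=

1	0
0	1

implies

a	b
c	d

⋅

a	c
b	d

=

1	0
0	1

a a + b b	a c + b d
c a + d b	c c + d d

=

1	0
0	1

a a + b b
=
1
[ a ; b ] ⋅ [ a ; b ]
=
1
|| [ a ; b ] ||
=
1

( [ a ; b ] ) is a unit vector

c c + d d
=
1
[ c ; d ] ⋅ [ c ; d ]
=
1
|| [ c ; d ] ||
=
1

( [ c ; d ] ) is a unit vector

a c + b d
=
0
[ a ; b ] ⋅ [ c ; d ]
=
0

( [ a ; b ] ) and ( [ c ; d ] ) are orthogonal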

\forall a,b,c,d \in \R,
    
    [a,b;c,d] * [a,b;c,d]^\t  =  [1,0;0,1]
    
  \implies
    
    [a,b;c,d] * [a,c;b,d]  =  [1,0;0,1]  
    [a a + b b, a c + b d; c a + d b, c c + d d]  =  [1,0;0,1]  

    a a + b b  =  1  
    [a;b] * [a;b]  =  1  
    ||[a;b]||  =  1  
    
    
    `([a;b]) is a unit vector`  
    
    
    c c + d d  =  1  
    [c;d] * [c;d]  =  1 
    ||[c;d]||  =  1  
    
    
    `([c;d]) is a unit vector`  
    
    
    a c + b d  =  0  
    [a;b] * [c;d]  =  0  
    
    
    `([a;b]) and ([c;d]) are orthogonal`

Fact: Matrices representing rotations and reflections are orthogonal. We can show this for a general rotation matrix (the form below rotates clockwise by θ; its transpose rotates counterclockwise by θ):

cos θ sin θ
-sin θ cos θ

⋅

cos θ sin θ
-sin θ cos θ

^⊤
=

cos θ sin θ
-sin θ cos θ

⋅

cos θ -sin θ
sin θ cos θ

=

cos ² θ + sin ² θ -cos θ sin θ + sin θ cos θ
-sin θ cos θ + cos θ sin θ sin ² θ + cos ² θ

=

1 0
0 1

.

Example: Suppose that the hour hand of an animated 12-hour clock face is represented using a vector [ x ; y ]. To maintain the time, the coordinates [ x ; y ] must be updated once every hour by applying a matrix M to the vector representing the hour hand. What is the matrix M?

Since the hour hand must be rotated by 360/12 = 30 degrees in the clockwise direction while the rotation matrix represents counterclockwise rotation, we actually want to find the rotation matrix for 360 - (360/12) = 330 degrees.

Thus, M must be the rotation matrix for θ = 2π - (2π / 12), where θ is specified in radians. Thus, we have:
cos (330/360 ⋅ 2π)
=
(√ 3)/2
sin (330/360 ⋅ 2π)
=
−1/2
M
=

cos (330/360 ⋅ 2π) −sin (330/360 ⋅ 2π)
sin (330/360 ⋅ 2π) cos (330/360 ⋅ 2π)

=

(√ 3)/2 1/2
−1/2 (√ 3)/2
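
The matrix M can be checked numerically. The sketch below builds the counterclockwise rotation matrix for 330 degrees and applies it twelve times to a hand that starts at 12 o'clock. The names `rotation` and `apply` and the starting vector [ 0 ; 1 ] are assumptions made for this illustration. Since floating-point arithmetic is used, the results are only approximately equal to the exact values.

```python
import math

def rotation(theta):
    """Counterclockwise rotation by theta radians."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def apply(M, v):
    """Multiply the matrix M by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = rotation(math.radians(330))

one_oclock = apply(M, [0.0, 1.0])   # one update starting from 12 o'clock

hand = [0.0, 1.0]
for _ in range(12):                 # twelve hourly updates
    hand = apply(M, hand)

print(M)
print(one_oclock)
print(hand)
```

After twelve updates the hand returns (up to rounding) to its starting position, as expected for a 12-hour clock face.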

Example: You are instructed to provide a simple algorithm for drawing a spiral on the Cartesian plane. The spiral is obtained using counterclockwise rotation, and for every 360 degree turn of the spiral, the spiral arm's distance from the origin should double. Provide a matrix M ∈ R^2×2 that will take any point [ x ; y ] on the spiral and provide the next point [ x' ; y' ] after θ radians of rotation (your definition of M should contain θ).

It is sufficient to multiply a rotation matrix by a scalar matrix that scales a vector according to the angle of rotation. If the angle of rotation is θ, then the scale factor s should be such that:
s^{((2π) / θ)}
=
2
s
=
2^{1 / ((2π) / θ)}
s
=
2^{θ/(2π)}
In the above, if n = (2π)/θ then s is the nth root of 2. Thus M would be:

2^{θ/(2π)} 0
0 2^{θ/(2π)}

⋅

cos θ −sin θ
sin θ cos θ

=

2^{θ/(2π)} cos θ −2^{θ/(2π)} sin θ
2^{θ/(2π)} sin θ 2^{θ/(2π)} cos θ
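
A small numerical experiment illustrates this matrix. In the sketch below, the function name `spiral_step`, the step angle π/4, and the starting point [ 1 ; 0 ] are our own choices. Eight steps of π/4 radians make one full turn, so the final point should be (approximately) twice as far from the origin as the starting point.

```python
import math

def spiral_step(theta):
    """Scaled rotation: rotate counterclockwise by theta and scale by 2^(theta/(2*pi))."""
    s = 2 ** (theta / (2 * math.pi))
    return [[s * math.cos(theta), -s * math.sin(theta)],
            [s * math.sin(theta),  s * math.cos(theta)]]

def apply(M, v):
    """Multiply the matrix M by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = spiral_step(math.pi / 4)
point = [1.0, 0.0]
for _ in range(8):          # eight steps of pi/4 radians: one full turn
    point = apply(M, point)

print(point)
```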

Fact: Orthogonal matrices are closed under multiplication. For orthogonal A, B ∈ R^n×n we have:

(AB)^⊤
=
B^⊤ A^⊤
=
B^-1 A^-1
=
(AB)^-1.

Fact: Orthogonal matrices are closed under inversion and transposition. For orthogonal A ∈ R^n×n we have:

(A^-1)^⊤
=
(A^⊤)^-1.

Thus, we can show that both A^-1 and A^⊤ are orthogonal. Because A^-1 = A^⊤, we have:

(A^-1)^⊤
=
(A^⊤)^⊤
=
A
=
(A^-1)^-1

(A^⊤)^⊤
=
A
=
(A^-1)^-1
=
(A^⊤)^-1

We can summarize this by adding another row to our table of matrix subsets.

subset of R^n×n definition closed under
matrix
multiplication properties of
matrix multiplication inversion

orthogonal matrices M^⊤ = M^-1 closed associative,
distributive with addition,
have identity nonzero members
have inverses;
closed under inversion

Example: Let A ∈ R^2×2 be an orthogonal matrix. Compute A A^⊤ A.

We recall that because A is orthogonal, A^⊤ = A^-1. Thus, we have that

A A^⊤ A = (A A^-1) A = I A = A.

Fact: Suppose the last row of a matrix M ∈ R^n×n consists of all zeroes. Show that M is not invertible.

Suppose M is invertible. Then M^⊤ is invertible. But the last column of M^⊤ is all zeroes, which means it is a trivial linear combination of the other column vectors of M^⊤, so the columns of M^⊤ are linearly dependent and M^⊤ is not invertible. Thus, we have a contradiction, so M cannot be invertible.

3.11. Matrix rank

Yet another way to characterize matrices is by considering their rank.

Definition: We define the rank of a matrix M, rank M, as the number of nonzero rows in rref M.

We will see in this course that rank M is related in many ways to the various characteristics of a matrix.

Fact: Matrix rank is preserved by elementary row operations. Why is this the case? Because rref is idempotent and rank is defined in terms of it:

rref M
=
rref (rref M)
number of nonzero rows in rref M
=
number of nonzero rows in rref (rref M)
rank (M)
=
rank (rref M)

Fact: If M ∈ R^n×n is invertible, rank M = n. How can we prove this? Because all invertible matrices have rref M = I, and I has no rows with all zeroes. If I ∈ R^n×n, then I has n nonzero rows.

Fact: For A ∈ R^n×n, rank A = rank (A^⊤).

We will not prove the above in this course in general, but typical proofs involve first proving that for all A ∈ R^n×n, rank A ≤ rank (A^⊤). Why is this sufficient to complete the proof? Because we can apply this fact for both A and A^⊤ and use the fact that (A^⊤)^⊤ = A:

rank(A)
≤
rank(A^⊤)
rank(A^⊤)
≤
rank((A^⊤)^⊤)
rank(A^⊤)
≤
rank(A)
rank(A^⊤)
=
rank(A)

To get some intuition about the previous fact, we might consider the following question: if the columns of a matrix M ∈ R^2×2 are linearly independent, are the rows also linearly independent?

Fact: The only matrix in R^n×n in reduced row echelon form that has no rows consisting of all zeroes is I.

Fact: If a matrix M ∈ R^n×n has rank n, then it must be that rref M = I (and, thus, M is invertible). This is derived from the fact above.

Fact: Invertible matrices are closed under the transposition operation. In other words, if M is invertible, then M^⊤ is invertible. How can we prove this using the rank operator? We know that rank is preserved under transposition, so we have:

n
=
rank(M)
=
rank(M^⊤)

Thus, the rank of M^⊤ is n, so M^⊤ is invertible.

The following table summarizes the important facts about the rank and rref operators.

rank (rref M) = rank M

rref (rref M ) = rref M

rank (M^⊤) = rank M

for M ∈ R^n×n, M is invertible iff rank M = n

Review 1. Vector and Matrix Algebra and Applications

The following is a breakdown of what you should be able to do at this point in the course (and of what you may be tested on in an exam). Notice that many of the tasks below can be composed. This also means that many problems can be solved in more than one way.

vectors
- definitions and algebraic properties of scalar and vector operations (addition, multiplication, etc.)
- vector properties and relationships between vectors
  - dot product of two vectors
  - norm of a vector
  - unit vectors
  - orthogonal projection of a vector onto another vector
  - orthogonal vectors
  - linear dependence of two vectors
  - linear independence of two vectors
  - linear combinations of vectors
  - linear independence of three vectors
- lines and planes
  - line defined by a vector and the origin ([0; 0])
  - line defined by two vectors
  - line in R² defined by a vector orthogonal to that line
  - plane in R³ defined by a vector orthogonal to that plane
matrices
- algebraic properties of scalar and matrix multiplication and matrix addition
- collections of matrices and their properties (e.g., invertibility, closure)
  - identity matrix
  - elementary matrices
  - scalar matrices
  - diagonal matrices
  - upper and lower triangular matrices
  - matrices in reduced row echelon form
  - determinant of a matrix in R^2×2
  - inverse of a matrix and invertible matrices
- other matrix operations and properties
  - determine whether a matrix is invertible
    - using the determinant for matrices in R^2×2
    - using facts about rref for matrices in R^n×n
  - algebraic properties of matrix inverses with respect to matrix multiplication
  - transpose of a matrix
    - algebraic properties of transposed matrices with respect to matrix addition, multiplication, and inversion
  - matrix rank
- matrices in applications
  - solve an equation of the form L U v = w
  - matrices and systems of states
    - interpret partial observations of system states as vectors
    - interpret relationships between dimensions in a system of states as a matrix
    - given a partial description of a system state and a matrix of relationships, find the full description of the system state
    - interpret system state transitions/transformations over time as matrices
      - population growth/distributions over time
        - compute the system state after a specified amount of time
        - find the fixed point of a transition matrix

Below is a comprehensive collection of review problems going over the course material covered until this point. These problems are an accurate representation of the kinds of problems you may see on an exam.

Example: Find any h ∈ R such that the following two vectors are linearly independent.

[ 5 ; -5 ; -2 ] , [ 20 ; -20 ; h ]

There are many ways to solve this problem. One way is to use the definition of linear dependence and find an h that does not satisfy it.

s ⋅ [ 5 ; -5 ; -2 ] = [ 20 ; -20 ; h ]

Then, we have that any h such that h ≠ -8 is sufficient to contradict linear dependence (and, thus, imply linear independence):

s ⋅ 5
=
20
s
=
4
s ⋅ (-2)
=
h
h
=
-8

Another solution is to recall that orthogonality implies linear independence. Thus, it is sufficient to find h such that the two vectors are orthogonal.

[ 5 ; -5 ; -2 ] ⋅ [ 20 ; -20 ; h ] = 0

This implies h = 100.

5(20) + (-5)(-20) + (-2)h
=
0
h
=
100

Example: Suppose we have a matrix M such that the following three equations are true:

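M ⋅ [ 1 ; 0 ; 0 ] = [ 3 ; -2 ]
M ⋅ [ 0 ; 1 ; 0 ] = [ 3 ; 0 ]
M ⋅ [ 0 ; 0 ; 1 ] = [ -3 ; 1 ]

Compute the following:

M ⋅ [ 2 ; 1 ; -1 ]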

We should recall that multiplying a matrix by a canonical unit vector with 1 in the ith row in the vector gives us the ith column of the matrix. Thus, we can immediately infer that:
M
=

3 3 -3
-2 0 1

Thus, we have:
M ⋅

2
1
-1

=

(3,3,-3) ⋅ (2,1,-1)
(-2,0,1) ⋅ (2,1,-1)

=

12
-5

Example: List at least three properties of the following matrix:

0	1	0
1	0	0
0	0	1

The matrix has many properties, such as:

it is an elementary matrix
it is an invertible matrix
it is an orthogonal matrix
it is a symmetric matrix
it has rank n
its reduced row echelon form is the identity

Example: Find the matrix B ∈ R^2×2 that is symmetric, has a constant diagonal (i.e., all entries on the diagonal are the same real number), and satisfies the following equation:

B ⋅ [ 2 ; 1 ] = [ 15 ; 6 ]

We know that B is symmetric and has a constant diagonal, so we need to solve for a and b in:

a b
b a

⋅

2
1

=

15
6

This gives the equations 2a + b = 15 and a + 2b = 6, so a = 8 and b = −1.

Example: Compute the inverse of the following matrix:

2	3
1	2

One approach is to set up the following equation and solve for a,b,c, and d:

a b
c d

⋅

2 3
1 2

=

1 0
0 1

Another approach is to apply the formula for the inverse of a matrix in R^2×2:

2 3
1 2

^-1
=
(1 / (2 ⋅ 2 - 3 ⋅ 1)) ⋅

2 −3
−1 2

=

2 −3
−1 2

Example: Let a ∈ R be such that a ≠ 0. Compute the inverse of the following matrix:

a	-a
-a	-a

As in the previous problem, we can either solve an equation or apply the formula:

a -a
-a -a

^-1
=
(1 / (-a² - a²)) ⋅

-a a
a a

Example: Suppose x ∈ R is such that x ≠ 0. Compute the inverse of the following matrix:

x	0	-2x
0	x	0
0	0	4x

Because we have an upper triangular matrix, computing the inverse by solving the following equation is fairly efficient. Start by considering the bottom row and its product with each of the columns. This will generate the values for g, h, and i. You can then proceed to the other rows.

x 0 -2x
0 x 0
0 0 4x

a b c
d e f
g h i

=

1 0 0
0 1 0
0 0 1

The solution is:

1/x	0	1/(2x)
0	1/x	0
0	0	1/(4x)

Example: Assume a matrix M ∈ R^2×2 is symmetric, has constant diagonal, and all its entries are nonzero. Show that M cannot be an orthogonal matrix.

Let a ≠ 0 and b ≠ 0, and let us define:

M
=

a b
b a

.

Suppose that M is an orthogonal matrix; then, we have that:

M^-1
=
M^⊤
M ⋅ M^⊤
=
I

a b
b a

⋅

a b
b a

=

1 0
0 1

.

Thus, we have ba + ab = 0, so 2ab = 0. This implies that either a = 0 or b = 0. But this contradicts the assumptions, so M cannot be orthogonal.

Example: Suppose the Earth is located at the origin and your spaceship is in space at the location corresponding to the vector [ -5 ; 4; 2 ]. Earth is sending transmissions along the vector [ √(3)/3 ; √(3)/3 ; √(3)/3 ]. What is the shortest distance your spaceship must travel in order to hear a transmission from Earth?

By the triangle inequality, for any vector v ∈ R³, the closest point to v on a line is the orthogonal projection of v onto that line. We need to find the distance from the spaceship's current position to that point.

We first compute the orthogonal projection of [ -5 ; 4; 2 ] onto the line L defined as follows:
L
=
{ a ⋅

√(3)/3
√(3)/3
√(3)/3

| a ∈ R}
We can compute the orthogonal projection using the formula for an orthogonal projection of one vector onto another. Notice that the vector specifying the direction of transmission is already a unit vector:
||

√(3)/3
√(3)/3
√(3)/3

||
=
1

-5
4
2

⋅

√(3)/3
√(3)/3
√(3)/3

=
√(3)/3
(v ⋅ u/||u||) ⋅ u/||u||
=
√(3)/3 ⋅

√(3)/3
√(3)/3
√(3)/3

=

1/3
1/3
1/3

Now, we must compute the distance between the destination above and the spaceship's current position:
||

1/3
1/3
1/3

-

-5
4
2

||
=
||

16/3
-11/3
-5/3

||

=
√((16/3)² + (-11/3)² + (-5/3)²)

=
√(256/9 + 121/9 + 25/9)

=
√(402)/3
Thus, the shortest distance is √(402)/3.
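
The computation above can be reproduced with a few lines of code. The helper names `dot` and `project` below are our own. The projection formula divides by u ⋅ u, so it does not rely on the direction vector already being a unit vector. Floating-point arithmetic is used, so the results are approximate.

```python
import math

def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u."""
    scale = dot(v, u) / dot(u, u)
    return [scale * x for x in u]

ship = [-5, 4, 2]
direction = [math.sqrt(3) / 3] * 3

closest = project(ship, direction)     # closest point on the line of transmission
distance = math.dist(ship, closest)    # distance the spaceship must travel

print(closest)
print(distance, math.sqrt(402) / 3)
```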

Example: Two communication towers located at v ∈ R² and u ∈ R² are sending directed signals to each other (it is only possible to hear the signal when in its direct path). You are at the origin and want to travel the shortest possible distance to intercept their signals. What vector specifies the distance and direction you must travel to intercept their signals?

We can solve this problem by recognizing that the closest point to the origin on the line along which the signals can be intercepted is the vector that is orthogonal to the line (i.e., it is the orthogonal projection of the origin onto the line). The definition of the line is:
L
=
{ a ⋅ (u - v) + v | a ∈ R}
Thus, the point p ∈ R² to which we want to travel must be both on the line and orthogonal to the line (i.e., its slope must be orthogonal to any vector that represents the slope of the line, such as u - v). In other words, it must satisfy the following two equations:
p
=
a ⋅ (u - v) + v
p ⋅ (u - v)
=
0
Thus, it is sufficient to solve the above system of equations for p.
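
One way to solve the system is to substitute the first equation into the second and solve for a, which gives a = −(v ⋅ (u − v)) / ((u − v) ⋅ (u − v)). The sketch below implements this under the assumption that u ≠ v. The function name `closest_point_to_origin` and the sample tower positions are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def closest_point_to_origin(u, v):
    """Point p on the line through u and v that satisfies p . (u - v) = 0.

    Assumes u != v and integer coordinates (so exact fractions can be used).
    """
    direction = [ui - vi for ui, vi in zip(u, v)]
    # Substitute p = a (u - v) + v into p . (u - v) = 0 and solve for a.
    a = Fraction(-dot(v, direction), dot(direction, direction))
    return [a * di + vi for di, vi in zip(direction, v)]

u = [2, 0]   # sample tower position (assumption)
v = [0, 2]   # sample tower position (assumption)
p = closest_point_to_origin(u, v)
print(p)
```

For these sample positions the line is x + y = 2, and the closest point to the origin is [ 1 ; 1 ].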

4. Vector Spaces

We are interested in studying sets of vectors because they can be used to model sets of system states, observations and data that might be obtained about systems, geometric shapes and regions, and so on. We can then represent real-world problems (e.g., given some observations, what is the actual system state) as equations of the form M ⋅ v = w, and sets of vectors as the collections of possible solutions to those equations. But what is exactly the set of possible solutions to M ⋅ v = w? Can we characterize it precisely? Can we define a succinct notation for it? Can we say anything about it beyond simply solving the equation? Does this tell us anything about our system?

In some cases, it may make more sense to consider only a finite set of system states, or an infinite set of discrete states (i.e., only vectors that contain integer components); for example, this occurs if vectors are used to represent the number of atoms, molecules, cows, chickens, power plants, single family homes, and so on. However, in this course, we make the assumption that our sets of system states (our models of systems) are infinite and continuous (i.e., not finite and not discrete); in this context, this simply means that the entries in the vectors we use to represent system states are real numbers.

Notice that the assumption of continuity means that, for example, if we are looking for a particular state (e.g., a state corresponding to some set of observations), we allow the possibility that the state we find will not correspond exactly to a state that "makes sense". Consider the example problem involving the barn of cows and chickens. Suppose we observe 4 heads and 9 legs. We use the matrix that represents the relationships between the various dimensions of the system to find the number of cows and chickens:

1 1
2 4

⋅

x chickens
y cows

=

4 heads
9 legs

x
=
3.5
y
=
0.5
Notice that the solution above is not an integer solution; yet, it is a solution to the equation we introduced because the set of system states we are allowing in our solution space (the model of the system) contains all vectors in R², not just those with integer entries.

As we did with vectors and matrices, we introduce a succinct language (consisting of symbols, operators, predicates, terms, and formulas) for infinite, continuous sets of vectors. And, as with vectors and matrices, we study the algebraic laws that govern these symbolic expressions.

4.1. Sets of vectors and their notation

We will consider three kinds of sets of vectors in this course; they are listed in the table below.

| kind of set (of vectors) | maximum cardinality ("quantity of elements") | solution space of a... | examples |
|---|---|---|---|
| finite set of vectors | finite | | {(0,0)}; {(2,3),(4,5),(0,1)} |
| vector space | infinite | homogeneous system of linear equations: M ⋅ v = 0 | {(0,0)}; R; R²; span{(1,2),(2,3),(0,1)}; any point, line, or plane intersecting the origin |
| affine space | infinite | nonhomogeneous system of linear equations: M ⋅ v = w | { a + v : v ∈ V } where V is a vector space and a is a vector; any point, line, or plane |

To represent finite sets of vectors symbolically, we adopt the convention of simply listing the vectors between a pair of braces (as with any set of objects). However, we need a different convention for symbolically representing vector spaces and affine spaces. This is because we must use a symbol of finite size to represent a vector space or affine space that may contain an infinite number of vectors.

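If the solution spaces to equations of the form M ⋅ v = w are infinite, continuous sets of vectors, in what way can we characterize them? Suppose that M ∈ R^{2 × 2} and that M is invertible. Then we have that:

$$ M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad w = \begin{bmatrix} s \\ t \end{bmatrix} $$

$$ \begin{aligned}
M \cdot v &= w \\
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot v &= \begin{bmatrix} s \\ t \end{bmatrix} \\
v &= \frac{1}{ad - bc} \cdot \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \cdot \begin{bmatrix} s \\ t \end{bmatrix} \\
v &= \frac{s}{ad - bc} \cdot \begin{bmatrix} d \\ -c \end{bmatrix} + \frac{t}{ad - bc} \cdot \begin{bmatrix} -b \\ a \end{bmatrix}
\end{aligned} $$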

Notice that the set of possible solutions v is a linear combination of two vectors in R². In fact, if a collection of solutions (i.e., vectors) to the equation M ⋅ v = 0 exists, it must be a set of linear combinations (in the more general case of M ⋅ v = w, it is a set of linear combinations with some specified offset). Thus, we introduce a succinct notation for a collection of linear combinations of vectors.
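
As a minimal sketch (not part of the original notes), the closed-form expression above can be evaluated directly. The function name `solve_2x2` is an assumption introduced here for illustration, and exact rational arithmetic from Python's `fractions` module is used so that no rounding occurs.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, s, t):
    """Solve [[a, b], [c, d]] . v = [s, t] for an invertible 2x2 matrix."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    # v = (s/det) * [d, -c] + (t/det) * [-b, a]
    p, q = s / det, t / det
    return (p * d + q * (-b), p * (-c) + q * a)

# Barn example: x + y = 4 heads, 2x + 4y = 9 legs.
x, y = solve_2x2(1, 1, 2, 4, 4, 9)
print(x, y)  # prints: 7/2 1/2
```

The result agrees with the non-integer solution x = 3.5, y = 0.5 found for the barn of cows and chickens.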

Definition: The span of a set of vectors { v₁, ..., v_n } is the set of all linear combinations of vectors in { v₁, ..., v_n }:

span { v₁, ..., v_n } = { a₁ ⋅ v₁ + ... + a_n ⋅ v_n | a₁ ∈ R, ..., a_n ∈ R }

Example: For each of the following spans, expand the notation into the equivalent set comprehension and determine if the set of vectors is a point, a line, a plane, or a three-dimensional space.

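The set defined by:
$$ \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\} = \left\{ a \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} \;\middle|\; a \in \mathbb{R} \right\} $$
The above span is a line.

The set defined by:
$$ \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \begin{bmatrix} -2 \\ 4 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\} = \left\{ a \cdot \begin{bmatrix} 1 \\ -2 \end{bmatrix} \;\middle|\; a \in \mathbb{R} \right\} $$
The above span is a line.

The set defined by:
$$ \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\} = \left\{ a \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} \;\middle|\; a, b \in \mathbb{R} \right\} = \mathbb{R}^2 $$
The above span is a plane.

The set defined by:
$$ \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\} = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\} $$
The above span is a set containing a single point (the origin).

The set defined by:
$$ \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\} = \left\{ a \cdot \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + b \cdot \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + c \cdot \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} = \mathbb{R}^3 $$
The above span is a three-dimensional space.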

Example: Using span notation, describe the set of vectors that are orthogonal to the vector [ 1 ; 2 ] ∈ R².

We know that the set of vectors orthogonal to [ 1 ; 2 ] can be defined as the line L where:
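$$ L = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; \begin{bmatrix} x \\ y \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 0 \right\} $$
It suffices to find a nonzero vector on the line L. The equation for the line is:
$$ \begin{aligned}
\begin{bmatrix} x \\ y \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} &= 0 \\
1 \cdot x + 2 \cdot y &= 0 \\
y &= -\tfrac{1}{2} x
\end{aligned} $$
We can choose any nonzero point on the line y = (-1/2) x; for example, [ 2 ; -1 ]. Then, we have that:
$$ L = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right\} $$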

Example: Using span notation, describe the set of solutions to the following matrix equation:

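$$ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$

Notice that the above equation implies two equations:
$$ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = 0, \qquad \begin{bmatrix} 2 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = 0 $$
In fact, the second equation provides no additional information because 2 ⋅ [ 1 ; 2 ] = [ 2 ; 4 ]. Thus, the solution space is the set of vectors:
$$ L = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; \begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = 0 \right\} $$
We can now use our solution to the previous problem to find a span notation for the solution space:
$$ L = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right\} $$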
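
The claim that every vector in span { [ 2 ; -1 ] } solves the matrix equation can be spot-checked numerically. The sketch below is illustrative only; the helper names `dot` and `is_solution` are assumptions introduced here, and a finite check of a few scalars is not a proof.

```python
def dot(v, w):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(v, w))

rows = [(1, 2), (2, 4)]  # rows of the matrix in the example
u = (2, -1)              # the vector spanning the solution space

def is_solution(v):
    """True iff M . v = 0, i.e. v is orthogonal to every row of M."""
    return all(dot(row, v) == 0 for row in rows)

# Several scalar multiples of u are solutions; a vector off the line is not.
print([is_solution(tuple(a * x for x in u)) for a in (-3, 0, 1, 5)])
print(is_solution((1, 1)))
```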

Example: Suppose that system states are described by vectors v ∈ R³ with the following units for each dimension:

x # carbon atoms

y # hydrogen atoms

z # oxygen atoms

Individual molecules are characterized using the following vectors:

$$ \mathrm{C_3H_8}: \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}, \quad \mathrm{O_2}: \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{CO_2}: \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{H_2O}: \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} $$

Suppose that a mixture contains only molecules of water (H₂O) and carbon dioxide (CO₂). Using the span notation, specify the possible set of system states that satisfy these criteria.

The possible set of system states for the mixture is:
$$ V = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \right\} $$
Recall that the above set is continuous. In this particular problem, only vectors with integer components would be of interest; thus, the above set is at least guaranteed to contain all possible system states that satisfy the specified criteria.

Then, if we wanted to solve a more constrained problem that satisfies the above criteria, we would know that we should only consider vectors v ∈ V as possible solutions.

Suppose we observe the following system state (in terms of the number of each kind of atom):
$$ v = \begin{bmatrix} 100 \\ 400 \\ 400 \end{bmatrix} $$
How can we determine whether the above mixture could contain only H₂O and CO₂?

Suppose we want to know if a mixture of only C₃H₈ and O₂ can ever react to produce a mixture of only CO₂ and H₂O. How can we represent this as an equation of spans? This problem can be represented as follows:
$$ \operatorname{span}\left\{ \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \right\} $$

Recall the definition for what constitutes a vector space (there are many equivalent definitions).

Definition: A vector space is a set of vectors that contains 0, is closed under vector addition and scalar multiplication, and is such that all the elements in the set satisfy the vector space axioms governing vector addition and scalar multiplication.

Fact: Any set of linear combinations of a collection of vectors is closed under vector addition and scalar multiplication, contains 0, and satisfies the vector space axioms. In other words, for any collection of vectors v₁, ..., v_n, span{v₁, ..., v_n} is a vector space.

Fact: For any set of vectors V ⊂ Rⁿ that satisfies the vector space axioms, there exists a finite set of at most n vectors v₁, ..., v_k (where k ≤ n) such that span{v₁, ..., v_k} = V.

Given the two facts above, we can safely adopt span{v₁, ..., v_n} as a standard notation for vector spaces. In addition to this notation, we will also often use the notation Rⁿ for specific values of n (e.g., R² is equivalent to span{[1;0],[0;1]}), and {0} for specific vector 0 (e.g., {[0;0]} is equivalent to span{[0;0]}).

4.2. Membership and equality relations involving sets of vectors

Recall that many of the algebraic laws we saw governing operators on vectors and matrices involved equality of vectors and matrices. In fact, one can view these laws as collectively defining the semantic equality of the symbols (i.e., they specify when two symbols refer to the same object).

The meaning of symbols is closely tied to the equality relation we define over them. In the case of the span notation, one potential problem is that there is more than one way to describe a set using span notation. For example:
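$$ \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 4 \\ 6 \end{bmatrix} \right\} $$
$$ \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \end{bmatrix} \right\} $$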
How can we determine if two spans are equivalent? We must define the equality relation (i.e., the relational operator) that applies to sets of vectors (including infinite sets containing all linear combinations of vectors). We first recall the definition of equality for our vector notation.

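Fact: Recall that two vectors v, w ∈ Rⁿ are equal if and only if their corresponding components are equal:

$$ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \quad \text{iff} \quad a_1 = b_1 \text{ and } a_2 = b_2 \text{ and } \ldots \text{ and } a_n = b_n $$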

With the equality of pairs of vectors defined, it is possible to provide a definition of equality for sets of vectors (both finite and infinite). However, we will build the definition gradually. First, let us define when a vector is a member of a finite set of vectors using vector equality.

Fact: A vector v ∈ Rⁿ is a member of the finite set of vectors { w₁, ..., w_k } ⊂ Rⁿ if it is equivalent to one of the vectors in the set:

v ∈ { w₁, ..., w_k }

iff

∃ w ∈ { w₁, ..., w_k } s.t. v = w

Notice that the above defines membership of v in the set using an equation (in this case, v = w_i where i ∈ {1,...,k}). We can take the same approach in the case of an infinite set of vectors.

Fact: A vector v ∈ Rⁿ is a member of the set of vectors span { w₁, ..., w_k } ⊂ Rⁿ if it is equal to one of the vectors in the set:

v ∈ span { w₁, ..., w_k }

iff

∃ w ∈ span { w₁, ..., w_k } s.t. v = w

iff

∃ w ∈ { a₁ ⋅ w₁ + ... + a_k ⋅ w_k | a₁ ∈ R, ..., a_k ∈ R } s.t. v = w

iff

∃ a₁ ⋅ w₁ + ... + a_k ⋅ w_k ∈ span { w₁, ..., w_k } s.t. v = a₁ ⋅ w₁ + ... + a_k ⋅ w_k

iff

∃ a₁ ∈ R, ..., a_k ∈ R s.t. v = a₁ ⋅ w₁ + ... + a_k ⋅ w_k

iff

$$ \exists \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix} \in \mathbb{R}^k \text{ s.t. } \begin{bmatrix} \uparrow & & \uparrow \\ w_1 & \cdots & w_k \\ \downarrow & & \downarrow \end{bmatrix} \cdot \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix} = v $$

Thus, we can determine membership of a vector in a span by solving a matrix equation of the form M ⋅ u = v.
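
One way to automate this membership test is sketched below. Rather than producing the solution u explicitly, it uses an equivalent rank comparison: M ⋅ u = v has a solution exactly when appending v to the spanning vectors does not increase the rank. This swapped-in technique and the helper names `rank` and `in_span` are assumptions made for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows), by exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(A[0]) if A else 0):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def in_span(v, ws):
    """True iff v is a linear combination of the vectors in the list ws."""
    return rank(ws + [v]) == rank(ws)

print(in_span([2, 5, -1], [[4, 0, -2], [0, 1, 0]]))        # True
print(in_span([100, 400, 400], [[1, 0, 2], [0, 2, 1]]))    # True
print(in_span([1, 2], [[9, 3], [3, 1]]))                   # False
```

The second call uses the CO₂ and H₂O vectors from the mixture example: the observed state [ 100 ; 400 ; 400 ] equals 100 ⋅ [ 1 ; 0 ; 2 ] + 200 ⋅ [ 0 ; 2 ; 1 ].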

Example: Determine whether the following formula is true:

-1

∈

span {

-1

}

We proceed as in the fact above:

-1

∈ span {

-1

}

iff

∃ w ∈ span {

-1

} s.t.

-1

= w

iff

∃ w ∈ { a ⋅

+ b ⋅

-1

| a ∈ R, b ∈ R } s.t.

-1

= w

iff

∃ a ⋅

+ b ⋅

-1

∈ span {

-1

} s.t.

-1

= a ⋅

+ b ⋅

-1

iff

∃ a ∈ R, b ∈ R s.t.

-1

= a ⋅

+ b ⋅

-1

iff

∃

∈ R² s.t.

1	1
4	-1

⋅

-1

Since a solution to the matrix equation exists, the formula is true:

1	1
4	-1

⋅

-1

-2

Example: Solve each of the following problems.

Determine whether the following formula is true or false:

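$$ \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix} \in \operatorname{span}\left\{ \begin{bmatrix} 4 \\ 0 \\ -2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\} $$
Define the following line using span notation:
$$ L = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; y = -3 \cdot x \right\} $$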

Now suppose we want to determine if a finite set of vectors is a subset of a span. Let us first consider how we would check if a finite set of vectors is a subset of another finite set of vectors.

Fact: A finite set of vectors { v₁, ..., v_j } ⊂ Rⁿ is a subset of the finite set of vectors { w₁, ..., w_k } ⊂ Rⁿ if every member of { v₁, ..., v_j } is a member of { w₁, ..., w_k }:

{ v₁, ..., v_j } ⊂ { w₁, ..., w_k }

iff

∀ v ∈ { v₁, ..., v_j }, ∃ w ∈ { w₁, ..., w_k } s.t. v = w

Thus, we can generalize the above by replacing the second finite set of vectors with a span.

Fact: A finite set of vectors { v₁, ..., v_j } ⊂ Rⁿ is a subset of the set of vectors span { w₁, ..., w_k } ⊂ Rⁿ if every member of { v₁, ..., v_j } is a member of span { w₁, ..., w_k }:

{ v₁, ..., v_j } ⊂ span { w₁, ..., w_k }

iff

∀ v ∈ { v₁, ..., v_j }, ∃ w ∈ span { w₁, ..., w_k } s.t. v = w

iff

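$$ \forall v \in \{ v_1, \ldots, v_j \},\ \exists \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix} \in \mathbb{R}^k \text{ s.t. } \begin{bmatrix} \uparrow & & \uparrow \\ w_1 & \cdots & w_k \\ \downarrow & & \downarrow \end{bmatrix} \cdot \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix} = v $$

iff

$$ \exists \begin{bmatrix} a_{11} & \cdots & a_{1j} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kj} \end{bmatrix} \in \mathbb{R}^{k \times j} \text{ s.t. } \begin{bmatrix} \uparrow & & \uparrow \\ w_1 & \cdots & w_k \\ \downarrow & & \downarrow \end{bmatrix} \cdot \begin{bmatrix} a_{11} & \cdots & a_{1j} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kj} \end{bmatrix} = \begin{bmatrix} \uparrow & & \uparrow \\ v_1 & \cdots & v_j \\ \downarrow & & \downarrow \end{bmatrix} $$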

Fact: For any finite set of vectors { v₁, ..., v_j } ⊂ Rⁿ, for any real number scalars a₁ ∈ R, ..., a_j ∈ R, we have that:

{ v₁, ..., v_j } ⊂ span { w₁, ..., w_k } implies a₁ ⋅ v₁ + ... + a_j ⋅ v_j ∈ span { w₁, ..., w_k }

We can see the above is true, because:

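$$ \begin{aligned}
v_1 &= s_{11} \cdot w_1 + \ldots + s_{k1} \cdot w_k \\
&\ \ \vdots \\
v_j &= s_{1j} \cdot w_1 + \ldots + s_{kj} \cdot w_k \\
a_1 \cdot v_1 &= (a_1 s_{11}) \cdot w_1 + \ldots + (a_1 s_{k1}) \cdot w_k \\
&\ \ \vdots \\
a_j \cdot v_j &= (a_j s_{1j}) \cdot w_1 + \ldots + (a_j s_{kj}) \cdot w_k \\
a_1 \cdot v_1 + \ldots + a_j \cdot v_j &= (a_1 s_{11} + \ldots + a_j s_{1j}) \cdot w_1 + \ldots + (a_1 s_{k1} + \ldots + a_j s_{kj}) \cdot w_k
\end{aligned} $$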

Thus, we have that:

{ v₁, ..., v_j } ⊂ span { w₁, ..., w_k } implies span { v₁, ..., v_j } ⊂ span { w₁, ..., w_k }

Definition: Using facts about sets, we can then specify the following definition of equality between two spans for two finite sets of vectors V ⊂ Rⁿ and W ⊂ Rⁿ:

span W = span V

iff

span W ⊂ span V and span V ⊂ span W

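Example: Determine whether the following formula is true:

$$ \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 9 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} -6 \\ -2 \end{bmatrix} \right\} $$

It suffices to set up two matrix equations and determine if solutions A ∈ R^{2 × 3} and B ∈ R^{3 × 2} exist:

$$ \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \cdot A = \begin{bmatrix} 9 & 3 & -6 \\ 3 & 1 & -2 \end{bmatrix} $$

$$ \begin{bmatrix} 9 & 3 & -6 \\ 3 & 1 & -2 \end{bmatrix} \cdot B = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} $$

Alternatively, we can check each vector individually (this is analogous to considering the above equations for A and B one column at a time).

[ 1 ; 2 ] is a linear combination of [ 9 ; 3 ], [ 3 ; 1 ], and [ -6 ; -2 ]

[ 2 ; 1 ] is a linear combination of [ 9 ; 3 ], [ 3 ; 1 ], and [ -6 ; -2 ]

[ 9 ; 3 ] is a linear combination of [ 1 ; 2 ] and [ 2 ; 1 ]

[ 3 ; 1 ] is a linear combination of [ 1 ; 2 ] and [ 2 ; 1 ]

[ -6 ; -2 ] is a linear combination of [ 1 ; 2 ] and [ 2 ; 1 ]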

If all of the above are true, then the two spans are equivalent. If any of the above is false, then the two spans are not equivalent.
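
The two-way subset check can be sketched in code. Instead of solving explicitly for the matrices A and B, the sketch below uses an equivalent rank comparison (span V ⊂ span W exactly when adding the vectors of V to W leaves the rank unchanged); this substitution and the names `rank` and `same_span` are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows), by exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(A[0]) if A else 0):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def same_span(V, W):
    """True iff span V = span W, for lists of vectors V and W."""
    # Each span contains the other iff pooling the vectors adds nothing to either.
    return rank(V) == rank(W) == rank(V + W)

print(same_span([[2, 3]], [[4, 6]]))                          # True
print(same_span([[1, 0], [0, 1]], [[2, 1], [0, -1]]))         # True
print(same_span([[1, 2], [2, 1]], [[9, 3], [3, 1], [-6, -2]]))  # False
```

The last call is false because the vectors [ 9 ; 3 ], [ 3 ; 1 ], and [ -6 ; -2 ] are all multiples of [ 3 ; 1 ] and so span only a line, while [ 1 ; 2 ] and [ 2 ; 1 ] span all of R².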

The above implies that a matrix can be used to represent a particular vector space. We will see in other sections further below that a given vector space can be represented in more than one way by a matrix, and that more than one matrix can be used to represent a vector space.

Definition: Given two vector spaces W and V, we have:

W is a (vector) subspace of V iff W ⊂ V.

4.3. Vector spaces as abstract structures

Our definitions of a vector space so far have explicitly referenced sets of concrete vectors as we usually understand them. However, this is not necessary.

Definition: A vector space is a set of objects X that satisfies the following conditions:

addition ⊕: X × X → X is an operation on elements of X such that:
- X is closed under ⊕, so for any x,y ∈ X: x ⊕ y ∈ X
- ⊕ is commutative and associative; for any x,y,z ∈ X, we have: x ⊕ y = y ⊕ x and (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z)
- there is a unique additive identity 0 ∈ X for the operation ⊕ where for any x ∈ X: x ⊕ 0 = x and 0 ⊕ x = x
- every element x ∈ X has an additive inverse -x where: x ⊕ (-x) = 0 and (-x) ⊕ x = 0
scalar multiplication ⊗: R × X → X is an operation on elements of X such that:
- X is closed under ⊗, so for any s ∈ R, x ∈ X: s ⊗ x ∈ X
- 1 ∈ R is an identity with ⊗, so for any x ∈ X: 1 ⊗ x = x
- ⊗ is associative, so for any s, t ∈ R, x ∈ X: s ⊗ (t ⊗ x) = (s ⋅ t) ⊗ x
- ⊗ distributes across ⊕, so for any s ∈ R, x,y ∈ X: s ⊗ (x ⊕ y) = (s ⊗ x) ⊕ (s ⊗ y)

The above definition specifies conditions under which any set of objects X can be studied as a vector space.

We have encountered many sets that satisfy the above properties (thus making them vector spaces); below is a table of vector spaces, including vector spaces we have already encountered and vector spaces we will encounter later in the course.

| vector space | addition operation | additive identity | scalar multiplication operation |
|---|---|---|---|
| R | addition of real numbers | 0 | multiplication of real numbers |
| R² | vector addition: [ a ; b ] + [ c ; d ] = [ a + c ; b + d ] | [ 0 ; 0 ] | scalar multiplication: s ⋅ [ a ; b ] = [ s ⋅ a ; s ⋅ b ] |
| R³ | vector addition: [ a ; b ; c ] + [ d ; e ; f ] = [ a + d ; b + e ; c + f ] | [ 0 ; 0 ; 0 ] | scalar multiplication: s ⋅ [ a ; b ; c ] = [ s ⋅ a ; s ⋅ b ; s ⋅ c ] |
| Rⁿ | vector addition: [ a₁ ; ... ; a_n ] + [ b₁ ; ... ; b_n ] = [ a₁ + b₁ ; ... ; a_n + b_n ] | [ 0 ; ... ; 0 ] | scalar multiplication: s ⋅ [ a₁ ; ... ; a_n ] = [ s ⋅ a₁ ; ... ; s ⋅ a_n ] |
| span { [0;0] } = { [0;0] } | vector addition | [0;0] | scalar multiplication of vectors in R² |
| span { v₁ , ... , v_k } ⊂ Rⁿ | vector addition | [ 0 ; ... ; 0 ] | scalar multiplication of vectors in Rⁿ |
| R^{2 × 2} | matrix addition | [ 0 , 0 ; 0 , 0 ] | scalar multiplication of a matrix |
| R^{n × n} | matrix addition | [ 0, ..., 0 ; ... ; 0, ..., 0 ] | scalar multiplication of a matrix |
| affine space with origin at a ∈ R² | v ⊕ w = (v - a) + (w - a) + a | a | s ⊗ v = s ⋅ (v - a) + a |
| set of lines through the origin, f(x) = a x | f ⊕ g = h where h(x) = f(x) + g(x) | f(x) = 0 ⋅ x | s ⊗ f = h where h(x) = s × f(x) |
| set of polynomials of degree at most 2, f(x) = a x² + b x + c | f ⊕ g = h where h(x) = f(x) + g(x) | f(x) = 0 | s ⊗ f = h where h(x) = s × f(x) |
| set of polynomials of degree at most k, f(x) = a_k x^k + ... + a₀ | f ⊕ g = h where h(x) = f(x) + g(x) | f(x) = 0 | s ⊗ f = h where h(x) = s × f(x) |

Example: The affine space A = {a + v | v ∈ V} is a vector space for appropriate definitions of addition, scalar multiplication, and identity.

addition (⊕) can be defined as follows. It is an operation on elements of A under which A is closed, and which satisfies the vector space axioms:

v ⊕ w = u where u = (v - a) + (w - a) + a = v + w - 2a + a = v + w - a.
scalar multiplication (⊗) can be defined as follows. It is an operation on elements of A under which A is closed, and which satisfies the vector space axioms:

s ⊗ v = u where u = (s ⋅ (v - a)) + a = sv - sa + a = sv + (1-s)a.
there is a unique additive identity in A; it is the vector a:

v ⊕ a = v + a - a = v.
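
The operations ⊕ and ⊗ defined above can be tried out on concrete vectors. The sketch below is illustrative only: the offset `a = (1, 2)` is a hypothetical choice, V is taken to be all of R² (so every point of R² lies in A), and the function names `add` and `scale` are assumptions introduced here.

```python
a = (1, 2)  # hypothetical offset ("origin") of the affine space A = {a + v | v in R^2}

def add(v, w):
    """v (+) w = (v - a) + (w - a) + a = v + w - a."""
    return tuple(x + y - o for x, y, o in zip(v, w, a))

def scale(s, v):
    """s (x) v = s * (v - a) + a."""
    return tuple(s * (x - o) + o for x, o in zip(v, a))

v, w = (4, 6), (0, -1)
print(add(v, a) == v)             # a acts as the additive identity
print(add(v, scale(-1, v)) == a)  # (-1) (x) v is the additive inverse of v
print(scale(3, add(v, w)) == add(scale(3, v), scale(3, w)))  # distributivity
```

All three checks print True for these sample vectors; checking a few vectors illustrates the axioms but does not prove them.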

An abstract definition of vector spaces is useful because it also allows us to study by analogy and learn about other objects that are not necessarily sets of vectors. All the properties we can derive about vector spaces using the above definition will apply to other sets of objects that also satisfy the above definition.

Example: We consider the set of functions F = {f | f(x) = cx, c ∈ R}. How do we show that F is a vector space?

there is a unique additive identity in F: f(x) = 0x
addition (+) is an operation on elements of F under which F is closed, and which satisfies the vector space axioms:

f + g = h where h(x) = f(x) + g(x)
scalar multiplication (⋅) is an operation on elements of F under which F is closed, and which satisfies the vector space axioms:

s ⋅ f = h where h(x) = s ⋅ f(x)

Example: Show that F = {f | f(x) = bx² + cx, b,c ∈ R} is a vector space.

For our purposes, it is sufficient to show that there is an additive identity, that the set is closed under addition, and that the set is closed under scalar multiplication. There is an additive identity:
f(x) = 0 ⋅ x² + 0 ⋅ x
The set is closed under the usual addition operation on polynomial curves; for any f, g ∈ F we have f + g = h where:
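$$ \begin{aligned}
f(x) &= bx^2 + cx \\
g(x) &= b'x^2 + c'x \\
f(x) + g(x) &= (bx^2 + cx) + (b'x^2 + c'x) \\
&= (b + b') \cdot x^2 + (c + c') \cdot x \\
h(x) &= (b + b') \cdot x^2 + (c + c') \cdot x \\
h &\in F
\end{aligned} $$
For any f ∈ F and s ∈ R we have h = s ⋅ f where:
$$ \begin{aligned}
f(x) &= bx^2 + cx \\
s \cdot f(x) &= s \cdot (bx^2 + cx) \\
s \cdot f(x) &= (sb)x^2 + (sc)x \\
h(x) &= (sb)x^2 + (sc)x \\
h &\in F
\end{aligned} $$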
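
The closure argument above amounts to saying that an element of F is determined by its coefficient pair (b, c), and that addition and scalar multiplication act on those pairs componentwise. A small sketch of this view follows; the pair representation and the function names `evaluate`, `add`, and `scale` are assumptions made for illustration.

```python
def evaluate(f, x):
    """Evaluate f(x) = b*x**2 + c*x, where f is the coefficient pair (b, c)."""
    b, c = f
    return b * x**2 + c * x

def add(f, g):
    """Coefficients of f + g: (b + b', c + c')."""
    return (f[0] + g[0], f[1] + g[1])

def scale(s, f):
    """Coefficients of s * f: (s*b, s*c)."""
    return (s * f[0], s * f[1])

f, g = (3, -1), (2, 5)
# The coefficient-wise operations agree with pointwise addition and scaling.
print(all(evaluate(add(f, g), x) == evaluate(f, x) + evaluate(g, x)
          for x in range(-3, 4)))
print(all(evaluate(scale(4, f), x) == 4 * evaluate(f, x)
          for x in range(-3, 4)))
```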

Example: Find two elements in F = {f | f(x) = bx² + cx + d, b,c,d ∈ R} that are linearly independent. What does it mean for two functions to be linearly independent?

Two vectors g, h ∈ F are linearly independent if they are not linearly dependent. This means there exists no scalar s such that s ⋅ g = h or s ⋅ h = g. One example of such a pair would be:
$$ g(x) = x^2, \qquad h(x) = 1 $$
There is no single scalar by which we can multiply the curve h(x) = 1 to obtain a parabola, and there is no single scalar by which we can multiply the parabola to obtain a non-zero flat line.

Example: What is a finite set of functions that spans {f | f(x) = bx² + cx + d, b,c,d ∈ R}? One such set is:

{f,g,h} where f(x) = 1, g(x) = x, h(x) = x²

Example: We consider the set of functions {f | f(x) = x^k, k ∈ R}. The following definitions of addition and scalar multiplication make this set of functions a vector space.

there is a unique additive identity: f(x) = x⁰
addition (+) is defined as:

f + g = h where h(x) = f(x) ⋅ g(x)
scalar multiplication (⋅) is an operation on elements of this set under which the set is closed, and which satisfies the vector space axioms:

s ⋅ f = h where h(x) = f(x)^s

Fact: Once we start thinking of curves as a vector space, we can rewrite curve-fitting problems as equations involving linear combinations of curves. For example, suppose we have some curve φ for which we do not know the formula...

φ(x) = ...

...that represents some observable phenomenon. We can hypothesize that the curve φ(x) is a linear combination of simpler curves, such as f(x) = x and g(x) = x². This is equivalent to hypothesizing that φ is in the following vector space:

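span {f, g} = { h | h(x) = a x + b x² where a,b ∈ R }

We can then set up the following equation:

a ⋅ f + b ⋅ g = φ

However, the above equation cannot directly be solved for a,b ∈ R because we do not know the formula for φ. Suppose we can sample φ on some finite collection of input values { x₁,...,x_m } ⊂ R. We can then write an equation that approximates the above equation at those points:

$$ a \cdot \begin{bmatrix} f(x_1) \\ \vdots \\ f(x_m) \end{bmatrix} + b \cdot \begin{bmatrix} g(x_1) \\ \vdots \\ g(x_m) \end{bmatrix} = \begin{bmatrix} \varphi(x_1) \\ \vdots \\ \varphi(x_m) \end{bmatrix} $$

The above is equivalent to the following system of equations:

$$ \begin{aligned}
a \cdot f(x_1) + b \cdot g(x_1) &= \varphi(x_1) \\
&\ \ \vdots \\
a \cdot f(x_m) + b \cdot g(x_m) &= \varphi(x_m)
\end{aligned} $$

It can also be rewritten as the following matrix equation:

$$ \begin{bmatrix} f(x_1) & g(x_1) \\ \vdots & \vdots \\ f(x_m) & g(x_m) \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \varphi(x_1) \\ \vdots \\ \varphi(x_m) \end{bmatrix} $$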

Notice that because we know f and g exactly, the matrix above has all constant entries. Furthermore, the right-hand side of the equation also consists of a vector with all constants as entries because we assumed we can sample φ on those inputs. Thus, we now have a matrix equation of the form M ⋅ v = w.

We can now use rref computation or, if the matrix is invertible, any other method for computing the matrix inverse to solve the above equation for a,b ∈ R. This would give us the exact coefficients of a curve in span {f, g} that fits φ at the x values { x₁,...,x_m }.
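
A minimal sketch of this procedure follows, for the case of m = 2 samples so that the matrix is square. The "unknown" curve `phi` below is a hypothetical stand-in used only to generate sample values, and the solver `solve` is an illustrative Gauss-Jordan elimination over exact fractions that assumes the matrix is invertible; neither comes from the original notes.

```python
from fractions import Fraction

def solve(M, w):
    """Solve the square system M v = w exactly (assumes M is invertible)."""
    n = len(M)
    # Augmented matrix [M | w] with exact rational entries.
    A = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, w)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column as the pivot row.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

def f(x): return x
def g(x): return x**2
def phi(x): return 3 * x + 2 * x**2   # hypothetical curve; only its samples are used

xs = [1, 2]                           # sample inputs x_1, ..., x_m (here m = 2)
M = [[f(x), g(x)] for x in xs]        # one row [f(x_i), g(x_i)] per sample
w = [phi(x) for x in xs]              # sampled values phi(x_i)
a, b = solve(M, w)
print(a, b)  # prints: 3 2
```

Here the recovered coefficients a = 3 and b = 2 reproduce the stand-in curve exactly because it happens to lie in span {f, g}.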

Fact: For any two natural numbers k, m ∈ N where k ≠ m, the curves f and g defined below are linearly independent:

f(x) = x^k

g(x) = x^m

Corollary: For any natural number k ∈ N, the curves f and g defined below are linearly independent:

f(x) = x^k

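g(x) = x^{k+1}

We can see this by choosing any set of k + 2 distinct real number inputs { x₁,...,x_{k+2} }; the equation below then has no solution s ∈ R:

$$ \begin{aligned}
f &= s \cdot g \\
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_{k+2}) \end{bmatrix} &= s \cdot \begin{bmatrix} g(x_1) \\ \vdots \\ g(x_{k+2}) \end{bmatrix}
\end{aligned} $$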

Since no such s exists, f and g must be linearly independent.

Example: Suppose we have the following two curves:

f(x) = x

g(x) = x²

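If we choose any three inputs, e.g., {1,2,3}, we will find that f cannot be a scalar multiple of g because the three points will not be collinear.

$$ \begin{aligned}
f &= s \cdot g \\
\begin{bmatrix} f(1) \\ f(2) \\ f(3) \end{bmatrix} &= s \cdot \begin{bmatrix} g(1) \\ g(2) \\ g(3) \end{bmatrix} \\
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} &= s \cdot \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}
\end{aligned} $$

The first component requires s = 1, while the second requires s = 1/2.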

Since we derived a contradiction above, no such s exists, so f and g must be linearly independent.

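Fact: For any collection of exactly k points in R² with distinct x coordinates:

$$ \left\{ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \ldots, \begin{bmatrix} x_k \\ y_k \end{bmatrix} \right\} $$

there exists a polynomial f(x) = a_{k-1} x^{k-1} + ... + a₁ x + a₀ that fits those points exactly.

We know the above must be true because the columns of the following matrix in R^{k × k} must be linearly independent:

$$ \begin{bmatrix} (x_1)^{k-1} & \cdots & x_1 & 1 \\ \vdots & & \vdots & \vdots \\ (x_k)^{k-1} & \cdots & x_k & 1 \end{bmatrix} $$

This means that the equation below must have a unique solution:

$$ \begin{bmatrix} (x_1)^{k-1} & \cdots & x_1 & 1 \\ \vdots & & \vdots & \vdots \\ (x_k)^{k-1} & \cdots & x_k & 1 \end{bmatrix} \cdot \begin{bmatrix} a_{k-1} \\ \vdots \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_k \end{bmatrix} $$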

Example: Just as with any vector space, we can set up and solve problems involving linear combinations of vectors. Suppose we want to find a function in the space {f | f(x) = ax³ + bx² + cx + d, a,b,c,d ∈ R} that fits a certain collection of points, such as:

{(1,3), (-1,13), (2,1), (-2,33)}

We can interpret each point as a pair (x, f(x)). We can then set up the following system of equations:

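$$ \begin{aligned}
f(1) &= 3 \\
f(-1) &= 13 \\
f(2) &= 1 \\
f(-2) &= 33
\end{aligned} $$

Expanding, we have:

$$ \begin{aligned}
a(1)^3 + b(1)^2 + c(1) + d &= 3 \\
a(-1)^3 + b(-1)^2 + c(-1) + d &= 13 \\
a(2)^3 + b(2)^2 + c(2) + d &= 1 \\
a(-2)^3 + b(-2)^2 + c(-2) + d &= 33
\end{aligned} $$

Notice that we can rewrite this in the form of an equation M v = w:

$$ \begin{bmatrix} (1)^3 & (1)^2 & (1) & 1 \\ (-1)^3 & (-1)^2 & (-1) & 1 \\ (2)^3 & (2)^2 & (2) & 1 \\ (-2)^3 & (-2)^2 & (-2) & 1 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} 3 \\ 13 \\ 1 \\ 33 \end{bmatrix} $$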

This shows us that we can interpret individual objects in {f | f(x) = ax³ + bx² + cx + d, a,b,c,d ∈ R} as vectors in R⁴. We can also view this problem in terms of systems, system states, and observations. If the function f(x) = ax³ + bx² + cx + d that solves the above system is a state of the system, then (3, 13, 1, 33) is a partial observation of that state along four distinct "dimensions".
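
The system above can be solved mechanically. The sketch below builds the matrix from the four points and solves it with an illustrative Gauss-Jordan routine over exact fractions; the helper name `solve` is an assumption introduced here, and the routine assumes the matrix is invertible (which holds because the x values are distinct).

```python
from fractions import Fraction

def solve(M, w):
    """Solve the square system M v = w exactly (assumes M is invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, w)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column as the pivot row.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

points = [(1, 3), (-1, 13), (2, 1), (-2, 33)]
M = [[x**3, x**2, x, 1] for x, _ in points]   # one row per point
w = [y for _, y in points]
a, b, c, d = solve(M, w)
print(a, b, c, d)  # prints: -1 3 -4 5
```

That is, f(x) = -x³ + 3x² - 4x + 5 passes through all four points.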

Note that finding the equation for a line that crosses two points is just a special case of the above.

Example: We want to find a function f(x) = cx + d such that the points (2,3) and (5,-4) fall on the line the equation represents. Thus, we have:

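$$ \begin{bmatrix} (2) & 1 \\ (5) & 1 \end{bmatrix} \cdot \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix} $$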

We can consider many spaces of functions beyond spaces of polynomial functions. For example, suppose exp(x) = e^x. Then we can define the following vector space:

span{ exp, cos , sin }.

We can also generalize this approach to families of related functions.

Example: Consider the following set of vectors of functions, where f' denotes the derivative of a function f:

F = { (f, f, f') | f(x) = bx² + cx + d }

The set F is a vector space. Now, suppose we have the following problem. Find the function for a parabola in F such that (1,4) and (2,-3) lie on the parabola, and the maximum or minimum point on the parabola is at x = 1. Such a function is represented by the solution to the following equation:

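$$ \begin{bmatrix} (1)^2 & (1) & 1 \\ (2)^2 & (2) & 1 \\ 2(1) & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} b \\ c \\ d \end{bmatrix} = \begin{bmatrix} 4 \\ -3 \\ 0 \end{bmatrix} $$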

Note that the number of rows in the matrix (and result vector) corresponds to the number of data points for which a model is being sought, and is not bounded.
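
The parabola problem above mixes two kinds of constraints: rows that evaluate f at a point and a row that evaluates f' at a point. The sketch below makes that explicit; the helper names `value_row`, `slope_row`, and `solve` are assumptions introduced for illustration, and `solve` assumes an invertible matrix.

```python
from fractions import Fraction

def solve(M, w):
    """Solve the square system M v = w exactly (assumes M is invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, w)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

def value_row(x):
    """Row encoding f(x) for f(x) = b*x**2 + c*x + d."""
    return [x**2, x, 1]

def slope_row(x):
    """Row encoding f'(x) = 2*b*x + c."""
    return [2 * x, 1, 0]

# f(1) = 4, f(2) = -3, and f'(1) = 0 (extremum at x = 1).
M = [value_row(1), value_row(2), slope_row(1)]
b, c, d = solve(M, [4, -3, 0])
print(b, c, d)  # prints: -7 14 -3
```

The resulting parabola f(x) = -7x² + 14x - 3 has its maximum at x = 1.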

Example: Find the order 5 polynomial that exactly fits the following points in R²:

{(-1,45), (8,4), (6,5), (2,-8), (-10,8), (15,6)}

We must first compute the vectors that approximate each of the following curves that span our space of possible polynomials at each of the x components of the points above:
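$$ \{ f \mid f(x) = a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 \text{ where } a_i \in \mathbb{R} \} = \operatorname{span}\{f, g, h, i, j, l\} $$
where
$$ f(x) = x^5, \quad g(x) = x^4, \quad h(x) = x^3, \quad i(x) = x^2, \quad j(x) = x, \quad l(x) = 1 $$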

Thus, we want to solve for the coefficients of the following linear combination of curves:
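$$ a_5 f + a_4 g + a_3 h + a_2 i + a_1 j + a_0 l = \text{curve represented by data points} $$
Thus, we would have the following approximation of the above linear combination at the specific x values in { -1, 8, 6, 2, -10, 15 }:
$$ a_5 \begin{bmatrix} f(-1) \\ f(8) \\ f(6) \\ f(2) \\ f(-10) \\ f(15) \end{bmatrix} + a_4 \begin{bmatrix} g(-1) \\ g(8) \\ g(6) \\ g(2) \\ g(-10) \\ g(15) \end{bmatrix} + a_3 \begin{bmatrix} h(-1) \\ h(8) \\ h(6) \\ h(2) \\ h(-10) \\ h(15) \end{bmatrix} + a_2 \begin{bmatrix} i(-1) \\ i(8) \\ i(6) \\ i(2) \\ i(-10) \\ i(15) \end{bmatrix} + a_1 \begin{bmatrix} j(-1) \\ j(8) \\ j(6) \\ j(2) \\ j(-10) \\ j(15) \end{bmatrix} + a_0 \begin{bmatrix} l(-1) \\ l(8) \\ l(6) \\ l(2) \\ l(-10) \\ l(15) \end{bmatrix} = \begin{bmatrix} 45 \\ 4 \\ 5 \\ -8 \\ 8 \\ 6 \end{bmatrix} $$

We can convert the above into the following matrix equation:

$$ \begin{bmatrix}
f(-1) & g(-1) & h(-1) & i(-1) & j(-1) & l(-1) \\
f(8) & g(8) & h(8) & i(8) & j(8) & l(8) \\
f(6) & g(6) & h(6) & i(6) & j(6) & l(6) \\
f(2) & g(2) & h(2) & i(2) & j(2) & l(2) \\
f(-10) & g(-10) & h(-10) & i(-10) & j(-10) & l(-10) \\
f(15) & g(15) & h(15) & i(15) & j(15) & l(15)
\end{bmatrix} \cdot \begin{bmatrix} a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 45 \\ 4 \\ 5 \\ -8 \\ 8 \\ 6 \end{bmatrix} $$

It is then sufficient to solve the above equation to obtain the coefficients, and thus the degree 5 polynomial that exactly fits the points above.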

The disadvantage to using the approach presented in this subsection to fitting functions to data points is that the data must match an actual function perfectly (or, equivalently, the space of functions being considered must be rich enough to contain a function that can fit the data points exactly). We will see further below how to find functions that have the "best" fit (for one particular definition of "best") without necessarily matching the points exactly.

Example: Let polynomials in F = {f | f(t) = a t² + b t + c } represent a space of possible radio signals. To send a vector v ∈ R³ to Bob, Alice sets her device to generate the signal corresponding to the polynomial in F whose coefficients are represented by v. Bob can then let his radio receiver sample the radio signals in his environment at multiple points in time t to retrieve the message.

Suppose Alice wants to transmit the following vector v ∈ R³ to Bob:
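$$ v = \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix} $$
Alice sets her radio transmitter to generate a signal whose amplitude as a function of time is:
$$ f(t) = 5 t^2 - 2 t + 1 $$
How can Bob recover the message Alice is sending? Bob can recover the message by having his device sample the signal at three points, e.g., t₁ = 1, t₂ = 2, and t₃ = 3. Once Bob does this, he can set up an equation to recover the curve Alice used to generate the signal:

$$ \begin{aligned}
\begin{bmatrix} (1)^2 & (1) & 1 \\ (2)^2 & (2) & 1 \\ (3)^2 & (3) & 1 \end{bmatrix} \cdot v &= \begin{bmatrix} f(1) \\ f(2) \\ f(3) \end{bmatrix} \\
\begin{bmatrix} 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix} \cdot v &= \begin{bmatrix} 4 \\ 17 \\ 40 \end{bmatrix}
\end{aligned} $$
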
The above equation can then be solved to obtain Alice's message v.
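
The exchange between Alice and Bob can be simulated end to end. In the sketch below, the function names `signal`, `decode`, and `solve` are assumptions introduced for illustration; `solve` is a simple Gauss-Jordan elimination over exact fractions that assumes the sampling matrix is invertible (true whenever the three sample times are distinct).

```python
from fractions import Fraction

def solve(M, w):
    """Solve the square system M v = w exactly (assumes M is invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, w)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

def signal(v, t):
    """Alice's signal: amplitude a*t**2 + b*t + c for the message v = (a, b, c)."""
    a, b, c = v
    return a * t**2 + b * t + c

def decode(samples):
    """Bob's side: recover the coefficients from three (time, amplitude) samples."""
    M = [[t**2, t, 1] for t, _ in samples]
    w = [amplitude for _, amplitude in samples]
    return solve(M, w)

message = [5, -2, 1]
samples = [(t, signal(message, t)) for t in (1, 2, 3)]
print(samples)                     # [(1, 4), (2, 17), (3, 40)]
print(decode(samples) == message)  # True
```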

4.5. Basis, dimension, and orthonormal basis of a vector space

We have adopted span{v₁,...,v_n} as our notation for vector spaces. However, this makes it possible to represent a given vector space in a variety of ways:

span{(0,1),(1,0)} = span{(0,1),(1,0),(1,0)} = span{(0,1),(1,0),(1,0),(1,0)}, and so on.

We might naturally ask: given a vector space V, what is the smallest possible size for a finite set of vectors W such that V = span W?

Definition: A finite set of vectors B is a basis of the vector space V if span B = V and for all finite sets of vectors W such that span W = V, |B| ≤ |W|.

Fact: A finite set of vectors v₁,...,v_n is a basis for V iff span {v₁,...,v_n} = V and the vectors v₁,...,v_n are setwise linearly independent.

Fact: Given a finite set of vectors v₁,...,v_n, we can find a basis for span {v₁,...,v_n} by creating a matrix M in which the rows are the vectors v₁,...,v_n and computing rref M. The set of distinct nonzero rows in rref M is a basis for span {v₁,...,v_n}.

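Example: We can find a basis of V = span {[-3;0;7;0], [-9;0;21;0], [0;0;2;8], [0;0;1;4], [3;0;13;4], [0;0;1;0]} by computing the reduced row echelon form of the following matrix. We can use a third-party tool to do so (e.g., WolframAlpha).

$$ M = \begin{bmatrix} -3 & 0 & 7 & 0 \\ -9 & 0 & 21 & 0 \\ 0 & 0 & 2 & 8 \\ 0 & 0 & 1 & 4 \\ 3 & 0 & 13 & 4 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad \operatorname{rref} M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$

The nonzero rows of the matrix rref M will correspond to the vectors in a basis for V. Thus, we have that:

$$ \operatorname{span}\left\{ \begin{bmatrix} -3 \\ 0 \\ 7 \\ 0 \end{bmatrix}, \begin{bmatrix} -9 \\ 0 \\ 21 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \\ 8 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 13 \\ 4 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \right\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\} $$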
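
In place of a third-party tool, the reduced row echelon form can be computed with a short routine. The sketch below is one possible implementation over exact fractions; the function names `rref` and `basis` are assumptions introduced for illustration.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        A[r] = [x / A[r][col] for x in A[r]]          # make the pivot equal to 1
        for i in range(len(A)):
            if i != r:                                # clear the rest of the column
                A[i] = [x - A[i][col] * y for x, y in zip(A[i], A[r])]
        r += 1
    return A

def basis(vectors):
    """Nonzero rows of rref M, where the rows of M are the given vectors."""
    return [row for row in rref(vectors) if any(row)]

V = [[-3, 0, 7, 0], [-9, 0, 21, 0], [0, 0, 2, 8],
     [0, 0, 1, 4], [3, 0, 13, 4], [0, 0, 1, 0]]
print([[int(x) for x in row] for row in basis(V)])
# prints: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```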

Definition: The dimension of a vector space V, which we write as dim V, is the size of any basis of V (recall that every basis is the same size as every other basis, since they all must have the smallest size of any set of vectors that spans V).

Example: What is the dimension of {f | f(x) = ax³ + bx² + cx + d, a,b,c,d ∈ R}?

The dimension of a space is the size of its basis. A basis must consist of vectors that are linearly independent from one another, and it must span the space. We know that the following set B spans the space:
$$ B = \{ f, g, h, l \} \text{ where } f(x) = x^3, \quad g(x) = x^2, \quad h(x) = x, \quad l(x) = 1 $$

We also know that the curves f, g, h, and l are linearly independent. Thus, B is a basis. Since |B| = 4 and span B = V, then we have that:
$$ \dim V = |B| = 4 $$

Example: Consider the vector space V consisting of finite sets of real numbers:

the empty set is the unique additive identity: ∅
addition (+) is defined as follows: for any two objects in our vector space P ∈ V and T ∈ V, we have that

P + T = Q where Q = (P ∪ T) - (P ∩ T)
scalar multiplication is defined as:

s ⋅ T = Q where Q = {s ⋅ r | r ∈ T}

What are the basis and dimension of this vector space?

The basis of the space is B = { {1} } since you can obtain any finite set of real numbers by taking linear combinations of the set {1} ∈ V. Since span { {1} } = V and |{ {1} }| = 1, dim V = 1.

Definition: A finite set of vectors B is an orthonormal basis of the vector space V if B is a basis of V, all the vectors in B are unit vectors, and the vectors in B are setwise orthogonal.

Recall that for any vector v ∈ Rⁿ and any unit vector u ∈ Rⁿ, (v ⋅ u) ⋅ u is the projection of v onto the line parallel to u (i.e., the vector space {a u | a ∈ R}). We can use this fact to define a process for turning an arbitrary basis into an orthonormal basis.

Fact: Given a basis B = {v₁,...,v_n}, it is possible to turn it into an orthonormal basis {e₁,...,e_n} by using the Gram-Schmidt process. This algorithm can be summarized as follows.

$$ \begin{aligned}
u_1 &= v_1, & e_1 &= u_1 / \|u_1\| \\
u_2 &= v_2 - (v_2 \cdot e_1) \cdot e_1, & e_2 &= u_2 / \|u_2\| \\
u_3 &= v_3 - (v_3 \cdot e_1) \cdot e_1 - (v_3 \cdot e_2) \cdot e_2, & e_3 &= u_3 / \|u_3\| \\
u_4 &= v_4 - (v_4 \cdot e_1) \cdot e_1 - (v_4 \cdot e_2) \cdot e_2 - (v_4 \cdot e_3) \cdot e_3, & e_4 &= u_4 / \|u_4\| \\
&\ \ \vdots \\
u_n &= v_n - \sum_{i=1}^{n-1} (v_n \cdot e_i) \cdot e_i, & e_n &= u_n / \|u_n\|
\end{aligned} $$

Intuitively, for each vector v_i ∈ B, the algorithm subtracts out the contributions of the already-computed orthogonal unit vectors from v_i, obtaining u_i, and then rescales u_i to make it into a unit vector e_i.

At the end of the above process, {e₁,...,e_n} is the orthonormal basis (and {u₁,...,u_n} is a basis of orthogonal vectors that are not necessarily unit vectors).
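
The process summarized above translates almost line for line into code. The following is a minimal floating-point sketch, assuming the input vectors are linearly independent (otherwise some u_i is the zero vector and the normalization divides by zero); the function names `dot` and `gram_schmidt` are assumptions introduced for illustration.

```python
import math

def dot(v, w):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(v, w))

def gram_schmidt(basis):
    """Turn a list of linearly independent vectors into an orthonormal list."""
    es = []
    for v in basis:
        u = list(v)
        for e in es:
            # Subtract the projection (v . e) * e onto each earlier unit vector.
            c = dot(v, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(dot(u, u))
        es.append([ui / norm for ui in u])  # rescale u to a unit vector
    return es

es = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for i, e in enumerate(es):
    # Each row shows e_i . e_j, which should be close to 1 when i == j and 0 otherwise.
    print([round(dot(e, other), 9) for other in es])
```

Because of floating-point rounding, the computed dot products are only approximately 0 or 1, which is why the printout rounds them.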

Many computational environments for mathematics that support linear algebra provide an operator for turning a basis into an orthonormal basis.

Why is an orthonormal basis useful? Once again, we appeal to the fact that (v ⋅ u) ⋅ u is the projection of v onto u, where v ⋅ u is the length of that projection. Suppose we want to determine how to express an arbitrary vector v using an orthonormal basis u₁,...,u_n.

Fact: Given a vector v ∈ Rⁿ, and an orthonormal basis B = {u₁,...,u_n} for Rⁿ, we can determine what linear combination of the vectors u₁,...,u_n yields v by computing the following, where M_B is the matrix whose columns are the vectors u₁,...,u_n:

a = (M_B)^⊤ ⋅ v.

Notice that the components of the vector a, which we will call a₁,...,a_n, are the scalars needed to compute v as a linear combination of the vectors u₁,...,u_n:

v = a₁ u₁ + ... + a_n u_n = M_B a.
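
The fact above can be checked numerically. In the sketch below, the orthonormal basis and the vector v are hypothetical values chosen for illustration; the i-th component of a = (M_B)^⊤ ⋅ v is computed as the dot product u_i ⋅ v, and v is then rebuilt as a₁ u₁ + ... + a_n u_n.

```python
import math

def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

s = 1 / math.sqrt(2)
basis = [[s, s], [-s, s]]   # hypothetical orthonormal basis u_1, u_2 of R^2
v = [4.0, 2.0]              # hypothetical vector

# a = (M_B)^T v: each component of a is the dot product of a basis vector with v.
a = [dot(u, v) for u in basis]

# v = a_1 u_1 + ... + a_n u_n = M_B a
rebuilt = [sum(a[i] * basis[i][k] for i in range(len(basis)))
           for k in range(len(v))]
print(a)
print(rebuilt)  # agrees with v up to floating-point rounding
```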

Fact: Given a vector v ∈ Rⁿ, some k < n, and an orthonormal basis B = {u₁,...,u_k} for a subspace W ⊂ Rⁿ, the product (M_B ⋅ (M_B)^⊤ ⋅ v) is the projection of v onto W.

Example: Notice the relationship of the above fact to projection onto the span of a single unit vector u = (x,y) in R². Suppose we have a basis B = {u} for a subspace {s ⋅ u | s ∈ R} ⊂ R². Then we have:

M_B = [x; y]

Thus, for an arbitrary v ∈ R² not necessarily in {s ⋅ (x,y) | s ∈ R} we have:

M_B ⋅ (M_B)^⊤ ⋅ v = [x; y] ⋅ [x; y]^⊤ ⋅ v
= [x; y] ⋅ ([x; y] ⋅ v)
= ([x; y] ⋅ v) ⋅ [x; y]
= (v ⋅ [x; y]) ⋅ [x; y]
= (v ⋅ u) ⋅ u

Thus, we see that the projection of a vector onto another unit vector in R² is just a special case of the general approach to projection.

Example: Suppose we want to project the vector v ∈ R² onto the space W ⊂ R² where:

W = span { [1/√(2); 1/√(2)] }

We compute the matrix M_B and its transpose. M_B is a 2 × 1 matrix in this case, and (M_B)^⊤ is a 1 × 2 matrix:

M_B = [1/√(2); 1/√(2)]
(M_B)^⊤ = [1/√(2), 1/√(2)]

We can now apply the formula. Notice that it is equivalent to the formula for projection of v onto the single vector that spans W (for the given v, the dot product [1/√(2); 1/√(2)] ⋅ v is 6/√(2)):

M_B ⋅ (M_B)^⊤ ⋅ v = [1/√(2); 1/√(2)] ⋅ [1/√(2), 1/√(2)] ⋅ v
= [1/√(2); 1/√(2)] ⋅ ([1/√(2); 1/√(2)] ⋅ v)
= [1/√(2); 1/√(2)] ⋅ (6/√(2))

Thus, the projection of v onto W is [3; 3].

Example: Suppose we want to project the vector v ∈ R³ onto the space W ⊂ R³ where:

W = span { [1/√(2); 1/√(2); 0], [0; 0; 1] }

We compute the matrix M_B and its transpose. M_B is a 3 × 2 matrix in this case, and (M_B)^⊤ is a 2 × 3 matrix:

M_B = [1/√(2), 0; 1/√(2), 0; 0, 1]
(M_B)^⊤ = [1/√(2), 1/√(2), 0; 0, 0, 1]

We can now apply the formula (for the given v, the product (M_B)^⊤ ⋅ v is [6/√(2); 5]):

M_B ⋅ (M_B)^⊤ ⋅ v = [1/√(2), 0; 1/√(2), 0; 0, 1] ⋅ [1/√(2), 1/√(2), 0; 0, 0, 1] ⋅ v
= [1/√(2), 0; 1/√(2), 0; 0, 1] ⋅ [6/√(2); 5]

Thus, the projection of v onto W is [3; 3; 5].

Example: Suppose we want to project the vector v ∈ R³ onto the space W ⊂ R³ where:

v = [3; 2; 4]
W = span { [1/√(2); 0; 1/√(2)], [0; 1; 0] }

What is the orthogonal projection of v onto W?

Since the two vectors that span W are already an orthonormal basis, we can simply project v onto the two individual vectors in the orthonormal basis and take the sum to obtain the orthogonal projection of v onto W:

([3; 2; 4] ⋅ [1/√(2); 0; 1/√(2)]) ⋅ [1/√(2); 0; 1/√(2)] + ([3; 2; 4] ⋅ [0; 1; 0]) ⋅ [0; 1; 0]
= (7/√(2)) ⋅ [1/√(2); 0; 1/√(2)] + 2 ⋅ [0; 1; 0]
= [7/2; 2; 7/2]
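
The sum-of-projections computation in this example can be sketched in Python as follows. The helper names dot and project are assumptions introduced for illustration, and the function is only correct when the vectors passed in really are orthonormal.

```python
import math

def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def project(v, orthonormal_basis):
    """Orthogonal projection of v onto the span of an orthonormal list of vectors."""
    result = [0.0] * len(v)
    for u in orthonormal_basis:
        c = dot(v, u)  # signed length of the projection of v onto u
        result = [r + c * ui for r, ui in zip(result, u)]
    return result

s = 1 / math.sqrt(2)
p = project([3.0, 2.0, 4.0], [[s, 0.0, s], [0.0, 1.0, 0.0]])
print(p)  # approximately [3.5, 2.0, 3.5]
```

This computes the same vector as M_B ⋅ (M_B)^⊤ ⋅ v, since the columns of M_B are the orthonormal basis vectors.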

Example: Suppose we want to project the vector v ∈ R³ onto the space W ⊂ R³ where:

v = [-1; 2; 1]
W = span { [3; 4; 0], [0; 0; 2] }

What is the orthogonal projection of v onto W?

Since the two vectors that span W are already orthogonal, it is sufficient to turn them into unit vectors in order to obtain an orthonormal basis of W. We can then project v onto the two individual vectors in the orthonormal basis and take the sum to obtain the orthogonal projection of v onto W:

span { [3; 4; 0], [0; 0; 2] } = span { [3/5; 4/5; 0], [0; 0; 1] }

([-1; 2; 1] ⋅ [3/5; 4/5; 0]) ⋅ [3/5; 4/5; 0] + ([-1; 2; 1] ⋅ [0; 0; 1]) ⋅ [0; 0; 1]
= 1 ⋅ [3/5; 4/5; 0] + 1 ⋅ [0; 0; 1]
= [3/5; 4/5; 1]


4.6. Homogeneous, non-homogeneous, overdetermined, and underdetermined systems

While we have usually considered n × n matrices, in real-world applications (especially those involving data), system states usually have a shorter description than the corpus of data (e.g., observational data) being used to determine the system state. Thus, problems in such applications usually involve matrices in which the number of rows is very different from the number of columns. Furthermore, the data may be noisy, which means that there may be no exact system state that matches the data. Informally, this might mean that the system of equations being used to determine a system state is overdetermined. On the other hand, if the amount of data is not sufficient to determine a single system state, a system is underdetermined.

Definition: For any matrix M ∈ R^n×m and vector v ∈ Rⁿ, the system M x = v is overdetermined if there exist no solutions to the equation M x = v.

Definition: For any matrix M ∈ R^n×m and vector v ∈ Rⁿ, the system M x = v is underdetermined if there exist two or more solutions for x that satisfy the equation M x = v.

Definition: For any matrix M ∈ R^n×m, the system M x = 0 is homogeneous.

Definition: For any matrix M ∈ R^n×m and vector v ∈ Rⁿ where v ≠ 0, the system M x = v is nonhomogeneous.

Fact: A homogeneous system M x = 0 has at least one solution.

Fact: A homogeneous system M x = 0 is never overdetermined.

Fact: The solution space of a homogeneous system is a vector space:

there is a unique additive identity: 0 is a solution;
adding two solutions yields a solution: if M x = 0 and M y = 0, then M (x + y) = M x + M y = 0 + 0 = 0;
multiplying a solution by a scalar yields a solution: if M x = 0, then M (s x) = s (M x) = s 0 = 0.
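
The closure properties above can be checked on a small instance. In this sketch the matrix M and the two solutions x and y are hypothetical values chosen so that M x = 0 has nontrivial solutions; mat_vec is a helper name introduced here.

```python
def mat_vec(M, x):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

M = [[1, 2], [2, 4]]   # hypothetical matrix for which M x = 0 has nontrivial solutions
x = [2, -1]            # one solution of M x = 0
y = [-4, 2]            # another solution of M x = 0

total = [xi + yi for xi, yi in zip(x, y)]   # x + y
scaled = [3 * xi for xi in x]               # 3 x

# Each product is the zero vector, so x + y and 3 x are also solutions.
print(mat_vec(M, x), mat_vec(M, y), mat_vec(M, total), mat_vec(M, scaled))
```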

Fact: The solution space of a nonhomogeneous system that has at least one solution is an affine space.

Motivating Example: Suppose we have a nonhomogeneous system M x = v where v ∈ R^1000000.

We may not know ahead of time whether or not it is overdetermined. Suppose we do not want to incur the cost of attempting to solve M x = v exactly. This might be because we only want to make one pass of the data to reduce costs, but it might also be because this is a data stream that is too voluminous to store, so the data will be gone unless we solve the system right away (e.g., data from a telescope, a particle collider, or Twitter).

In such cases, we can typically assume the data will have noise, so the system is overwhelmingly likely to be overdetermined and have no solution. Thus, we are okay with finding an approximate solution.

We do know, however, that M x = 0 has at least one solution, and that the solution space is a vector space (so, for example, we could try to "search" for a better solution using some other technique involving addition and scalar multiplication once we find at least one non-trivial one). Thus, we might instead choose to solve the system M x = 0, knowing that even in the worst case, a very poor approximate solution of 0 will always be found.

Is there a better compromise between these two strategies of keeping v or changing it to 0? Can we find some other space of solutions that won't be overdetermined, but will still allow us to find a non-trivial (nonzero) approximate solution? One possibility is to project the vector v onto a subspace of the solution space, ensuring that the system is no longer overdetermined. This approach is covered in the section below.

4.7. Application: approximating overdetermined systems

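Fact: Suppose we have an overdetermined system where M ∈ R^{m × n}, v ∈ Rⁿ, and w ∈ R^m:

M ⋅ v = w

Let us rewrite M in terms of its columns c_i ∈ R^m (writing [c₁ ... c_n] for the matrix whose columns are c₁, ..., c_n) and v in terms of its components:

[c₁ ... c_n] ⋅ [x₁; ⋮; x_n] = w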

Let B = {c₁,...,c_n} be a basis for span B ⊂ R^m. To be more general, we will use W = R^m and W' = span B:

B = {c₁,...,c_n}
W = R^m
W' = span B
span B ⊂ R^m
W' ⊂ W

Now, it is possible to restate the fact that M ⋅ v = w has no solution in a different way:

M ⋅ v = [c₁ ... c_n] ⋅ [x₁; ⋮; x_n] = x₁ ⋅ c₁ + ... + x_n ⋅ c_n ∈ span B
M ⋅ v ∈ span B
w ∉ span B

The above tells us that the real problem is that w ∉ span B. It also tells us that for any w' in the span of the columns of M (which we call span B), M ⋅ v = w' will have a solution:

M = [c₁ ... c_n]
w' ∈ span {c₁, ..., c_n}
M ⋅ v = w' has at least one solution

Definition: Suppose we have an overdetermined system where v ∈ Rⁿ, and w ∈ R^m:

[c₁ ... c_n] ⋅ v = w

We can find an approximate solution v' by picking some w' ∈ span {c₁, ..., c_n} and solving the new equation:

[c₁ ... c_n] ⋅ v' = w'

The error of the approximate solution v' is defined to be ||w' − w||.

We see that we may have many choices for w' in changing an overdetermined system M ⋅ v = w to a system with at least one solution M ⋅ v' = w'. Above, we have introduced a notion of error so that we can compare the quality of different choices of w'. How can we pick the w' that minimizes the error ||w - w'||? Let us first define what the vector leading to the minimal error is.

Definition: Given w ∈ W and a subspace W' ⊂ W, the vector within the subspace W' that is closest to w is a vector w* ∈ W' such that ||w - w*|| is minimal. In other words, for all w' ∈ W',

||w - w*|| ≤ ||w - w'||.

How do we find w*?

Fact: We first review the triangle inequality in the case of a right triangle. Suppose we have a triangle with a height a, a base length of b, and a hypotenuse of length c. Since c² = a² + b², we have:

a² + b² ≥ a²
c² ≥ a²
c ≥ a

This implies that the orthogonal projection of a vector v ∈ V onto a subspace W ⊂ V is the closest point in W to v.

Fact: Given w ∈ W and a subspace W' ⊂ W, the orthogonal projection w* of w onto W' is the closest vector in W' to w.

Fact: Suppose we are given an overdetermined system:

M ⋅ v = w

We assume the matrix can be written in terms of its columns:

M = [c₁ ... c_n]

The least-squares approximate solution to M ⋅ v = w is the solution v* to the equation:

M ⋅ v* = w*

where w* is the orthogonal projection of w onto span {c₁, ..., c_n}. The solution v* is the one that leads to the smallest possible error for any possible vector v that we can multiply by M:

||w - M v*|| ≤ ||w - M v||

Why is it called a "least-squares" approximate solution? In the table below, we summarize the correspondences used in the above facts.

concept related to solving M v = w: the space of values W' with which we can replace w in the overdetermined system M v = w to make a system M v = w' that has solutions
notation: {M ⋅ v | v ∈ Rⁿ}
relationship: the span of the columns of M
notation: span B
geometric concept: the subspace W' of W spanned by B

concept related to solving M v = w: the error of an approximate solution v'
notation: ||w - M v'||
relationship: M v' = w'
notation: ||w - w'||
geometric concept: the distance between w ∈ W and w' ∈ span B

concept related to solving M v = w: M ⋅ v* where v* is the minimum error solution
notation: for all v', ||w - M v*|| ≤ ||w - M v'||
relationship: M v* = w*
notation: for all w' ∈ span B, ||w - w*|| ≤ ||w - w'||
geometric concept: the orthogonal projection w* of w ∈ W onto span B (the closest vector in span B to w)

Notice that we know a solution to M v* = w* exists, and that we can show this in two different ways. Since w* ∈ span B, it must be that w* is a linear combination of the vectors in B, so it is a linear combination of the columns of M. Alternatively, if M is a square matrix and we know that B is a basis, then the columns of M are linearly independent, which means M is invertible, so

v* = M^-1 w*.

Fact: We can compute the least-squares approximate solution v* to any equation M ⋅ v = w by using the following process:

break M up into a set of its column vectors
find an orthonormal basis B of the span of the column vectors of M to make M_B
use M_B to compute the projection w* of w onto span B
solve the system M v* = w* (e.g., by finding the rref of an augmented matrix)

Example: Find the least-squares approximate solution v* to the following equation:

[3, 0; 4, 0; 0, 2] ⋅ v = [-1; 2; 1]

We have an overdetermined system (i.e., an equation with no solution) of the form M ⋅ v = w. We must turn it into an equation M ⋅ v = w* that does have a solution by replacing w with w*, the orthogonal projection of w onto the span of the columns of M.

Let W be the span of the columns of the matrix in the equation above:
W = span { [3; 4; 0], [0; 0; 2] }
We first need to find an orthonormal basis of W. Since the two vectors above are already orthogonal, it is sufficient to normalize them. More generally, we would need to apply the Gram-Schmidt process (this is technically what we are doing here, it is just that the terms we need to subtract are always 0 in this case). Thus, we have an orthonormal basis:
W = span { [3/5; 4/5; 0], [0; 0; 1] }
We now compute the orthogonal projection of w to find w* ∈ W:
w* = ([-1; 2; 1] ⋅ [3/5; 4/5; 0]) ⋅ [3/5; 4/5; 0] + ([-1; 2; 1] ⋅ [0; 0; 1]) ⋅ [0; 0; 1]
= 1 ⋅ [3/5; 4/5; 0] + 1 ⋅ [0; 0; 1]
= [3/5; 4/5; 1]

Thus, we now have a solvable equation M ⋅ v = w*:

[3, 0; 4, 0; 0, 2] ⋅ v = [3/5; 4/5; 1]
v = [1/5; 1/2]

The error of the approximate solution is ||w − w*||:
error of the approximate solution [1/5; 1/2] = ||w − w*||
= || [-1; 2; 1] − [3/5; 4/5; 1] ||
= || [-8/5; 6/5; 0] ||
= 2
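
The four-step process, applied to this example, can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the function names are introduced here for illustration, the columns of M are assumed to be linearly independent, and the last step uses back-substitution on the triangular system with entries e_i ⋅ c_j in place of computing the rref of an augmented matrix.

```python
import math

def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormal basis of the span of linearly independent vectors."""
    es = []
    for v in vectors:
        u = list(v)
        for e in es:
            c = dot(v, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(dot(u, u))
        es.append([ui / norm for ui in u])
    return es

def least_squares(M, w):
    """Return (v_star, w_star) for M v = w, assuming independent columns."""
    cols = [list(c) for c in zip(*M)]          # step 1: column vectors of M
    es = gram_schmidt(cols)                    # step 2: orthonormal basis B
    w_star = [0.0] * len(w)                    # step 3: projection of w onto span B
    for e in es:
        c = dot(w, e)
        w_star = [p + c * ei for p, ei in zip(w_star, e)]
    # Step 4: solve M v* = w*. Taking the dot product of both sides with each e_i
    # gives an upper triangular system, because e_i is orthogonal to c_j for j < i.
    n = len(cols)
    rhs = [dot(e, w_star) for e in es]
    v_star = [0.0] * n
    for i in reversed(range(n)):
        acc = rhs[i] - sum(dot(es[i], cols[j]) * v_star[j] for j in range(i + 1, n))
        v_star[i] = acc / dot(es[i], cols[i])
    return v_star, w_star

M = [[3.0, 0.0], [4.0, 0.0], [0.0, 2.0]]
w = [-1.0, 2.0, 1.0]
v_star, w_star = least_squares(M, w)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_star)))
print(v_star, w_star, error)  # approximately [0.2, 0.5], [0.6, 0.8, 1.0], 2.0
```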

Example: Suppose a matrix M is such that M ⋅ v = 0 has exactly one solution. What can we say about the error of the least-squares approximation of any system M ⋅ v = w?

Assuming M is a square matrix, M is invertible (because there is exactly one solution to M ⋅ v = 0), so there is always an exact solution v to M ⋅ v = w:
M ⋅ v = w
v = M^-1 ⋅ w
Thus, the columns of M are linearly independent and span the space of all possible vectors that can appear on the right-hand side of the equation. This means that w* = w, so the error is ||w − w*|| = ||w − w|| = 0.

Example: Suppose a matrix M is such that M ⋅ v = 0 has Rⁿ as its space of solutions. What can we say about any system M ⋅ v = w? What can we say about the error of any least-squares approximation of a solution for M ⋅ v = w?

This means that M is the matrix consisting of all zeroes. Thus, the span of its columns is { 0 }, the set consisting of only the origin. Thus, the error of any approximate solution will be ||w - 0|| = ||w||.

Example: We saw that if M is the matrix consisting of all zero entries, w* must be the zero vector (i.e., the origin). Are there other matrices M for which w* will be 0?

Yes, if all the columns of M are orthogonal to w, then the orthogonal projection of w onto the span of the columns of M will be the origin, 0. For example, consider the following overdetermined system:

[1, 2; 2, 4] ⋅ v = [−2; 1]

Since [−2; 1] ⋅ [1; 2] = 0 and [−2; 1] ⋅ [2; 4] = 0, the orthogonal projection of [−2; 1] onto the span of the two columns will be [0; 0].

Relationship to curve fitting and linear regressions in statistics

The method described above does overlap with standard curve fitting and linear regression techniques you see in statistics. There are many variants to these approaches, and the one considered above corresponds to a fairly basic variant that has no specialized characteristics (i.e., it makes no special assumptions about the data points, the relative importance of the data points, their distribution, or the way they can be interpreted).

The approach above is known as ordinary least squares and as linear least squares.
The case in which the function space is {f | f(x) = ax + b, a,b ∈ R} corresponds to a simple linear regression. This involves fitting a line to a collection of points while minimizing the sum of the squares of the distances parallel to the y-axis between all the data points and the approximate line. This diagram is an illustration of this. The method in these notes is more general in that it is not restricted to polynomials of order 1, but can be used to fit polynomials of any order to data.
We can restate the least squares method described in these notes as finding x* such that the vector ε in M x* = v - ε is minimized. Notice that ε + M x* = v. The squared length of the vector ε can be viewed as the sum of squares of y-axis-aligned distances from the estimated curve to the actual data points:

||ε|| = ||v - M x*||.
This diagram illustrates the projection method being used.
The matrix M in M x = v is sometimes called the design matrix. The same term is used to refer to the corresponding matrix in a simple linear regression.
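
As a small illustration of the simple linear regression case, the sketch below fits a line f(x) = a x + b to data points by projecting the vector of y values onto the span of the two columns of the design matrix (the x values and a column of ones). The function name fit_line and the data points are hypothetical; the x values are assumed not to be all equal, so that the two columns are linearly independent.

```python
def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def fit_line(xs, ys):
    """Least-squares fit of ys to a * xs + b; returns (a, b)."""
    n = len(xs)
    ones = [1.0] * n
    # Gram-Schmidt step (without normalizing): remove from xs its component along ones.
    x_mean = dot(xs, ones) / dot(ones, ones)
    u = [x - x_mean for x in xs]
    # Coefficients of the projection of ys onto the orthogonal vectors u and ones.
    a = dot(ys, u) / dot(u, u)
    y_mean = dot(ys, ones) / dot(ones, ones)
    # projection = a * u + y_mean * ones = a * xs + (y_mean - a * x_mean) * ones
    b = y_mean - a * x_mean
    return a, b

# Hypothetical data points lying exactly on the line y = 2x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```

With noisy data the same function returns the line whose y-axis-aligned squared distances to the points sum to the smallest possible value.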

4.8. Application: approximating a model of system state dimension relationships

The approach described in the previous section can be used to approximate a solution to the equation M v = w. Usually, if M describes a model of a system (in this sense, system refers to a system of states that are described by values along some number of dimensions), solving M v = w corresponds to determining a system state v given some other information about a system state w.

What if we instead have a large collection of pairs of observations of system states of the form (v,w) (or, more generally, any collection of observations of a system along multiple dimensions)? Can we approximate a model of a set of relationships between some two subsets of the dimensions in the system? In other words, if we take all our observations (x₁, ..., x_n) and divide them into pairs of descriptions x = (x₁,...,x_k) and v = (x_{k+1},...,x_n), can we find the best M that approximates the relationship between incomplete system state descriptions x and v?

Example: Suppose we have the following dimensions in our system:

number of red dwarf stars
number of G type stars
number of pulsars
units of infrared (IR) radiation observed
units of ultraviolet (UV) radiation observed
units of gamma (γ) radiation observed

Suppose that we already have some set of confirmed (but possibly noisy) observations of systems along all the dimensions (the dimensions of the quantities in each point correspond to the list of dimensions above):

[4; 5; 1; 243; 3341; 700], [3; 6; 21; 125; 1431; 1465], [4; 2; 13; 533; 3432; 334], [16; 4; 4; 334; 143; 762], [13; 8; 13; 235; 1534; 513], [34; 16; 17; 333; 3532; 450]

For the above observations, we want to find a matrix that relates the first three dimensions (the number of each type of star) to the last three dimensions (the amount of each kind of radiation). In other words, we are looking for a matrix M ∈ R^{3× 3} of the form:

a units IR/red dwarf	b units IR/G type	c units IR/pulsar
d units UV/red dwarf	e units UV/G type	f units UV/pulsar
g units γ/red dwarf	h units γ/G type	i units γ/pulsar

Multiplication by the above matrix represents a system state description transformation with the following units:

(# red dwarf stars, # G type stars, # pulsars) → (units IR, units UV, units γ)

We can split the data into two collections of points: those that specify the number of each type of star (the inputs to the above transformation), and those that specify the amount of each kind of radiation (the output of the above transformation). Below, we call these P and Q. We then turn these two sets of data points into matrices. In order to turn this problem into the familiar form M v = w, we can transpose the two matrices.

Example: Suppose we have four dimensions in our system. Also, suppose that we already have some set of confirmed (but possibly noisy) observations of systems along all the dimensions:

[1; 0; 1; 2], [0; 1; 0; 3], [2; 0; 2; 4], [0; 1; 0; 2]

For the above observations, we want to find a matrix that relates the first two dimensions to the last two dimensions. We would then set up the following equations:

[a, b; c, d] ⋅ [1; 0] = [1; 2]
[a, b; c, d] ⋅ [0; 1] = [0; 3]
[a, b; c, d] ⋅ [2; 0] = [2; 4]
[a, b; c, d] ⋅ [0; 1] = [0; 2]

Unfortunately, there is no single solution a,b,c,d ∈ R to the above collection of equations. To find an approximate solution, we can convert the above into a single matrix equation:

[1, 0; 0, 1; 2, 0; 0, 1] ⋅ [a, c; b, d] = [1, 2; 0, 3; 2, 4; 0, 2]

We can go further and turn this into an equation of the form M ⋅ v = w:

[1, 0, 0, 0; 0, 1, 0, 0; 2, 0, 0, 0; 0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1; 0, 0, 2, 0; 0, 0, 0, 1] ⋅ [a; b; c; d] = [1; 0; 2; 0; 2; 3; 4; 2]

In either case, we can now approach the problem by finding the closest vector or matrix that is in the span of the matrix on the left-hand side of the equation. We do this by finding the orthogonal projection of the right-hand side onto the span of the matrix.

[1; 0; 2; 0] ∈ span { [1; 0; 2; 0], [0; 1; 0; 1] }
[2; 3; 4; 2] ∉ span { [1; 0; 2; 0], [0; 1; 0; 1] }
The above tells us that it is sufficient to project the vector [2; 3; 4; 2] onto the span. The two spanning vectors are orthogonal, with lengths √(5) and √(2), so the projection is:
([2; 3; 4; 2] ⋅ [1; 0; 2; 0] ⋅ (1/√(5))) ⋅ [1; 0; 2; 0] ⋅ (1/√(5)) + ([2; 3; 4; 2] ⋅ [0; 1; 0; 1] ⋅ (1/√(2))) ⋅ [0; 1; 0; 1] ⋅ (1/√(2))
The above yields:
10/5 ⋅ [1; 0; 2; 0] + 5/2 ⋅ [0; 1; 0; 1] = [2; 5/2; 4; 5/2]

We can now set up an equation that has a solution:

[1, 0; 0, 1; 2, 0; 0, 1] ⋅ [a, c; b, d] = [1, 2; 0, 5/2; 2, 4; 0, 5/2]

[a, b; c, d] ≈ [1, 0; 2, 5/2]

4.9. Application: distributed computation of least-squares approximations

Fact: We assume there are n distinct computing devices {C₁, ..., C_n} (e.g., servers, virtual machines, motes, mobile devices, etc.), that any device can communicate with a small number (e.g., two) of other devices at any given moment in time, and that all devices can compute or communicate simultaneously (i.e., independently of one another). Then the following are true:

if C_i knows some small piece of information (e.g., a real number r ∈ R), the devices can distribute r to all the devices in about log n time steps;
if every device C_i has some real number r_i, the devices can collectively compute r₁ + ... + r_n in about log n time steps.
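
The second claim can be illustrated with a sequential simulation of pairwise (tree) summation, in which each round stands for one time step during which disjoint pairs of devices combine their partial sums simultaneously. This is only a sketch of the counting argument: the function name parallel_sum is hypothetical, the input is assumed to be non-empty, and no real communication takes place.

```python
def parallel_sum(values):
    """Simulate pairwise summation; return (total, number of rounds used)."""
    current = list(values)
    rounds = 0
    while len(current) > 1:
        # In one time step, devices 0 and 1, 2 and 3, ... combine their partial sums.
        merged = [current[i] + current[i + 1] for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            merged.append(current[-1])  # an unpaired device keeps its value this round
        current = merged
        rounds += 1
    return current[0], rounds

total, rounds = parallel_sum(range(1, 17))   # 16 hypothetical devices holding 1, ..., 16
print(total, rounds)  # 136 4
```

Each round roughly halves the number of partial sums, which is why about log n rounds suffice.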

Fact: The process of finding an approximate solution v* to an overdetermined system M ⋅ v = w can be broken down into the following basic operations:

addition and subtraction of vectors;
projection of one vector onto another vector;
solving a system M ⋅ v* = w* that must have a solution.

Fact: If two vectors [a₁; ...; a_n] and [b₁; ...; b_n] are stored in a distributed manner across multiple devices {C₁, ..., C_n} such that device C_i stores only a_i and b_i, then the devices can collectively compute the sum of these two vectors in one time step: each device C_i simply computes a_i + b_i and stores the result internally.

4.10. Orthogonal complements and algebra of vector spaces

Definition: The orthogonal complement W^⊥ of a vector subspace W ⊂ V is the set of all vectors in V that are orthogonal to all the vectors in W:

W^⊥ = {v | v ∈ V, ∀ w ∈ W, v ⋅ w = 0}

Notice that the orthogonal complement of a subspace is always taken within the context of some specified space V. We consider vector spaces, their orthogonal complements, and common set operations: set union, set intersection, and set product.

Example: Given any two vector spaces V and W, is V ∪ W a vector space? Not in general. For example, consider

V = span{[1;0]} and W = span{[0;1]}.

We have that [1;0] ∈ V ∪ W and [0;1] ∈ V ∪ W, but [1;0] + [0;1] = [1;1] ∉ V ∪ W.

Example: Given any two vector spaces V and W, is V ∩ W a vector space? Yes.

there is a unique additive identity: 0 ∈ V and 0 ∈ W, so 0 ∈ V ∩ W
V ∩ W is closed under addition (+): if v,w ∈ V ∩ W then we have that v ∈ V and w ∈ V, and we also have that v ∈ W and w ∈ W, so we have that

v + w ∈ V and v + w ∈ W, so v + w ∈ V ∩ W.
Since addition satisfies the vector space axioms for elements of V and W, it also does so for elements in V ∩ W.
V ∩ W is closed under scalar multiplication (⋅): if v ∈ V ∩ W then we have that v ∈ V and v ∈ W, so for any scalar s we have that

sv ∈ V and sv ∈ W, so sv ∈ V ∩ W.

Since scalar multiplication satisfies the vector space axioms for elements of V and W, it also does so for elements in V ∩ W.

Example: Given any two vector spaces V and W, is V × W a vector space? Yes.

there is a unique additive identity: (0_V,0_W) where 0_V ∈ V and 0_W ∈ W
there is an addition operation (+):

(v,w) + (v',w') = (v+v', w+w')
there is a scalar multiplication (⋅) operation:

s (v,w) = (sv, sw)

Exercise: For a vector space W ⊂ V, compute W ∩ W^⊥.

Fact: For a vector subspace W ⊂ V,

dim W + dim W^⊥ = dim V.

Fact: For a vector subspace W ⊂ V,

W × W^⊥ = V.
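
One way to read the two facts above is that every vector v ∈ V splits into a part in W and a part in W^⊥. The sketch below illustrates this for a hypothetical two-dimensional subspace W of R³ given by an orthonormal basis: the projection p lies in W, and the remainder r = v − p is orthogonal to every basis vector of W.

```python
def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

basis_W = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]   # hypothetical orthonormal basis of W
v = [-1.0, 2.0, 1.0]                            # hypothetical vector in V = R^3

# p is the orthogonal projection of v onto W.
p = [0.0, 0.0, 0.0]
for u in basis_W:
    c = dot(v, u)
    p = [pi + c * ui for pi, ui in zip(p, u)]

# r is what remains of v; it lies in the orthogonal complement of W.
r = [vi - pi for vi, pi in zip(v, p)]

print(p, r)
print([dot(r, u) for u in basis_W])  # both entries are close to 0
```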

5. Linear Transformations

5.1. Set products, relations, and maps

We consider the following sets of vectors and their properties.

construct: V × W
definition: { (v,w) | v ∈ V, w ∈ W }
example: {1,2,3} × {4,5,6} = {(1,4),(1,5),(1,6),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6)}

construct: R is a relation between V and W
definition: R ⊂ V × W
example: {(1,D), (2,B), (2,C)} is a relation between {1,2,3,4} and {A,B,C,D}

construct: f is a function from V to W; f is a map from V to W
definition: f is a relation between V and W and ∀ v ∈ V, there is at most one w ∈ W s.t. f relates v to w
example: { (x,f(x)) | f(x) = x² }

construct: R^-1 is the inverse of R
definition: { (w,v) | (v,w) ∈ R }

construct: f: X → Y is injective

construct: f: X → Y is surjective

construct: f: X → Y is bijective

Notice that we may have f such that f is a function, but f^-1 is not a function.

Fact: If we say that a map f is a relation between V and W, we can denote this as f : V → W or f ∈ V → W. Then, V is the domain of f and W is the codomain.

Notice that the definition of f does not uniquely determine its codomain. For example, f(x) = x² could have a codomain of R if f : R → R, or it could have a codomain of R⁺ if f : R → R⁺.

Example: Describe with as much detail as you can the domains, codomains, and images of the following maps (there is more than one right answer for these, as the definitions do not specify the domains of the functions). Determine whether each map is injective, surjective, or both given the domains and codomains you chose.

f(x) = 3x
f(x) = x²
f(x) = -x
f(x) = x ⋅ √(-1)
f(x) = [x, 0; 0, x]
f(M, N) = M ⋅ N where M and N are matrices
f([x; y]) = [1, 2; 3, 4] ⋅ [x; y]

Fact: Given some f:V → W, the following process can be used to show a map f is injective:

assume v,v' ∈ V and v ≠ v';
it is necessary to show that f(v) ≠ f(v'):
1. expand f(v) and f(v') using the definition of f;
2. use algebraic properties or other facts to derive f(v) ≠ f(v').

Alternatively, we can use the following approach:

assume v,v' ∈ V;
assume f(v) = f(v'):
1. expand f(v) and f(v') using the definition of f;
2. use algebraic properties or other facts to derive v = v'.

Fact: Given some f: V → W, the following process can be used to determine whether a map f is surjective:

let w ∈ W;
solve f(v) = w:

if you can solve the equation (or derive an equation v = g(w)) such that v ∈ V, f:V → W is surjective;
if the equation has no solutions in V, f:V → W is not surjective.

Notice that the above does not mean that there is not some V' where V ⊂ V' that does have a solution to the equation f(v) = w. For example, f(x) = 3 x is surjective if f ∈ R → R but not surjective if f ∈ Z → Z.
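
For maps between small finite sets, both properties can be checked by brute force. The sketch below is only an illustration of the definitions (it proves nothing about infinite domains such as R or Z); the function names and the finite ranges standing in for pieces of Z are assumptions introduced here.

```python
def is_injective(f, domain):
    """True if no two distinct elements of the finite domain share an output."""
    outputs = [f(v) for v in domain]
    return len(set(outputs)) == len(outputs)

def is_surjective(f, domain, codomain):
    """True if every element of the finite codomain is f(v) for some v in the domain."""
    image = {f(v) for v in domain}
    return all(w in image for w in codomain)

def triple(x):
    return 3 * x

def shift_mod_7(x):
    return (x + 1) % 7

# f(x) = 3x on a finite piece of Z: injective, but values such as 1 are never reached.
print(is_injective(triple, range(-3, 4)))                   # True
print(is_surjective(triple, range(-3, 4), range(-9, 10)))   # False

# x -> (x + 1) mod 7 on {0, ..., 6} is both injective and surjective.
print(is_injective(shift_mod_7, range(7)))                  # True
print(is_surjective(shift_mod_7, range(7), range(7)))       # True
```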

Example: Show that f: R → R where f(x) = x + 1 is injective.

We can show this as follows by first adding 1 to both sides, then applying the definition of f:
x ≠ x'
x+1 ≠ x'+1
f(x) ≠ f(x')
Alternatively, we could use the other method by first applying the definition of f, then subtracting 1 from both sides.
f(x) = f(x')
x+1 = x'+1
x = x'

Example: Show that f: R → R where f(x) = x + 1 is surjective.

Suppose any value in the codomain r ∈ R is chosen. Then we can solve the following equation for a corresponding value x in the domain:
f(x) = r
x+1 = r
x = r - 1

Definition: A map f: V → W is a bijection or is bijective if f is an injection from V to W and f is a surjection from V to W.

Example: Show that f: R → R where f(x) = x + 1 is bijective.

Definition: Given a map f: V → W, we call { f(v) | v ∈ V } the image of f:

im(f) = { f(v) | v ∈ V }

5.2. Homomorphisms and isomorphisms

The previous section defined a number of properties of functions f: A → B that only address the equality of elements in A and B. However, what if A and B have relations and operators? For example, what if A and B are vector spaces with an identity, an addition operation, and a scalar multiplication operation?

We can restrict our notion of a map or function to maps or functions f: A → B that somehow preserve or consistently transform the properties of operators and relations we may have already defined on A and B. For example, suppose we have A = {a₁, a₂, a₃}, B = {b₁, b₂, b₃}, an operator ⊕ such that a₁ ⊕ a₂ = a₃, and an operator ⊕ over elements of B such that b₃ ⊕ b₂ = b₁. A map f from A to B that preserves (or respects, or consistently transforms) ⊕ would be such that

f(a₃) = b₁
f(a₂) = b₂
f(a₁) = b₃

Notice that the above map is such that the operation ⊕ is respected by the map f:

f(a₃) = f(a₁ ⊕ a₂) = f(a₁) ⊕ f(a₂) = b₃ ⊕ b₂ = b₁

Definition: Given a binary operator ⊗ over A and another operator ⊕ over B, we say that a map f: A → B is a homomorphism if we have that

∀ a,a' ∈ A, f(a ⊗ a') = f(a) ⊕ f(a').

Notice that a homomorphism might have, but does not necessarily have, any of the properties we introduced for maps: it could be injective, surjective, and so on.
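
The homomorphism condition can be tested on a finite sample of inputs. The sketch below uses hypothetical choices: ⊗ is integer addition, ⊕ is addition modulo 3, and f maps an integer to its remainder modulo 3. Passing the check on a sample is evidence rather than a proof, but a single failing pair is enough to show that a map is not a homomorphism.

```python
def respects(f, op_a, op_b, sample):
    """Check f(a (x) a') == f(a) (+) f(a') for every pair drawn from the sample."""
    return all(f(op_a(a, b)) == op_b(f(a), f(b)) for a in sample for b in sample)

def add(a, b):
    return a + b

def add_mod_3(a, b):
    return (a + b) % 3

def remainder_mod_3(x):
    return x % 3

def square(x):
    return x * x

sample = range(-10, 11)
print(respects(remainder_mod_3, add, add_mod_3, sample))  # True
print(respects(square, add, add, sample))                 # False: (1 + 1)² ≠ 1² + 1²
```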

Definition: A bijective homomorphism f: A → B is called an isomorphism.

5.3. Linear transformations

In this course we are interested in a specific kind of homomorphism.

Definition: For any two vector spaces V and W, a map f: V → W is a linear transformation iff we have that for all v,v' ∈ V and scalars s ∈ R,

f(v + v') = f(v) + f(v')
f(s ⋅ v) = s ⋅ f(v)

In other words, a linear transformation is a homomorphism between vector spaces that preserves vector addition and scalar multiplication. If a map f does not satisfy the above properties, it is not a linear transformation.

Example: For any matrix M ∈ R^n×m, the map f : R^m → Rⁿ defined as f(v) = M ⋅ v is a linear transformation.

We can confirm that any such f satisfies the properties of a linear transformation:
f(v + v') = M ⋅ (v + v') = M ⋅ v + M ⋅ v' = f(v) + f(v')
f(s ⋅ v) = M ⋅ (s ⋅ v) = s ⋅ (M ⋅ v) = s ⋅ f(v)

Example: Representation of a polynomial curve f as a vector of its y values on some fixed number of inputs x₁,...,x_n is a linear transformation.

Suppose we have the transformation φ defined as follows:
φ(f) = [f(x₁); ⋮; f(x_n)]

We can confirm that φ is a linear transformation. Suppose that h(x) = f(x) + g(x). Then we have:
φ(f + g) = φ(h)
= [h(x₁); ⋮; h(x_n)]
= [f(x₁) + g(x₁); ⋮; f(x_n) + g(x_n)]
= [f(x₁); ⋮; f(x_n)] + [g(x₁); ⋮; g(x_n)]
= φ(f) + φ(g)
Suppose that h(x) = s ⋅ f(x). Then we have:
φ(s ⋅ f) = φ(h)
= [h(x₁); ⋮; h(x_n)]
= [s ⋅ f(x₁); ⋮; s ⋅ f(x_n)]
= s ⋅ [f(x₁); ⋮; f(x_n)]
= s ⋅ φ(f)

Fact: For any two linear transformations f: V → W, g: U → V, the composition f o g: U → W is also a linear transformation. To show this, we must demonstrate that if h(x) = f(g(x)) then h respects vector addition and scalar multiplication. For any u,u' ∈ U, and s ∈ R, we have:

h(u + u') = f(g(u + u'))
= f(g(u) + g(u'))
= f(g(u)) + f(g(u'))
= h(u) + h(u')

and

h(s ⋅ u) = f(g(s ⋅ u))
= f(s ⋅ g(u))
= s ⋅ f(g(u))
= s ⋅ h(u)

Because linear transformations are homomorphisms (and, thus, maps), we can ask whether they have properties of maps (i.e., whether they are injective, surjective, bijective, and so on). We will do so further below.

Example: Let the map φ : R³ → {f | f(x) = a x² + b x + c} be defined as follows:

φ([a; b; c]) = h where h(x) = a x² + b x + c

It is the case that φ is a linear transformation and a bijection. Thus, φ is an isomorphism between R³ and {f | f(x) = a x² + b x + c}.

Example: Differentiation (i.e., finding the derivative) of polynomials is a linear transformation.

As an example, consider the space of polynomials of the form f(x) = a x² + b x + c. If each polynomial is represented as a vector of its coefficients, the differentiation operator for this vector space of functions can be represented as a matrix:

[0, 0, 0; 2, 0, 0; 0, 1, 0] ⋅ [a; b; c] = [0; 2 a; b]

Because there exists an isomorphism φ between {f | f(x) = a x² + b x + c} and R³, we can compose φ and φ^-1 with the following linear transformation ψ : R³ → R³
ψ(v) = [0, 0, 0; 2, 0, 0; 0, 1, 0] ⋅ v
Then, the following differential operator d/dx : {f | f(x) = a x² + b x + c} → {f | f(x) = a x² + b x + c} is a linear transformation because compositions of linear transformations are linear transformations:
d/dx = φ o ψ o φ^-1
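
The matrix form of differentiation can be tried out directly. In the sketch below, a polynomial a x² + b x + c is stored as the coefficient list [a, b, c], matching the vector representation used above; the helper name mat_vec and the sample polynomials are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Differentiation of a x² + b x + c as a matrix acting on the coefficients [a, b, c].
D = [[0, 0, 0],
     [2, 0, 0],
     [0, 1, 0]]

p = [5, 3, 7]    # 5x² + 3x + 7
q = [1, -2, 4]   # x² - 2x + 4

print(mat_vec(D, p))  # [0, 10, 3], i.e., 10x + 3

# Linearity: differentiating p + q gives the sum of the two derivatives.
p_plus_q = [a + b for a, b in zip(p, q)]
sum_of_derivatives = [a + b for a, b in zip(mat_vec(D, p), mat_vec(D, q))]
print(mat_vec(D, p_plus_q) == sum_of_derivatives)  # True
```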

Example: Orthogonal projection onto a vector space is a linear transformation.

We know that any vector space has an orthonormal basis B. Thus, we can compute the projection of any vector v using the formula M_B ⋅ M_B^⊤ ⋅ v. Thus, orthogonal projection onto span B can be defined as a linear transformation f where:
f(v) = (M_B ⋅ M_B^⊤) ⋅ v
Since (M_B ⋅ M_B^⊤) is a matrix, f must be a linear transformation.

Fact: For any linear transformation f:V → W and appropriate constants 0 ∈ V and 0' ∈ W,

f(0) = 0'

We can show this in the following way: given any v ∈ V,
0 = 0 ⋅ v
f(0) = f(0 ⋅ v) = 0 ⋅ f(v) = 0'

Fact: For any linear transformation f:V → W and the corresponding additive inversion operations of V and W, it is the case that f respects additive inversion:

f(-v) = -f(v)

We can derive this fact in two ways. One approach is to use the previous fact that f(0) = 0' to show that f(-v) is indeed the inverse of f(v):

f(v) + f(-v) = f(v + (-v)) = f(0) = 0'

The other approach is to use the fact that in both vector spaces, -v = (-1) ⋅ v:

v = 1 ⋅ v
v + ((-1) ⋅ v) = (1 ⋅ v) + ((-1) ⋅ v) = (1 + (-1)) ⋅ v = 0 ⋅ v = 0

Then, because f respects scalar multiplication, we have that:

f((-1) ⋅ v) = (-1) ⋅ f(v)

Thus, the argument is:

f(v) + f(-v) = f(v) + f((-1) ⋅ v) = f(v) + ((-1) ⋅ f(v)) = f(v) + (-f(v)) = 0'

5.4. Orthogonal projections as linear transformations

Fact: For v ∈ Rⁿ, an orthogonal projection onto a one-dimensional vector space span{w} is a linear transformation. First, note that the dot product of two vectors can be rewritten as matrix multiplication of two matrices:

w ⋅ v = w^⊤ ⋅ v.

Also, notice that

||w|| = √(w ⋅ w)
||w||² = w ⋅ w

Finally, notice that for any nonzero r ∈ R, the 1 × 1 matrix [r] ∈ R^{1 × 1} is invertible and has inverse [1/r].

Consider the formula for the projection of v ∈ Rⁿ onto span{w}. We can rewrite the formula to use only multiplication of matrices.

(v ⋅ w/||w||) ⋅ w/||w|| = 1/||w||² ⋅ (v ⋅ w) ⋅ w
  = w ⋅ 1/||w||² ⋅ (v ⋅ w)
  = w ⋅ 1/(w ⋅ w) ⋅ (w ⋅ v)
  = w ⋅ (w^⊤ ⋅ w)^-1 ⋅ (w^⊤ ⋅ v)
  = (w ⋅ (w^⊤ ⋅ w)^-1 ⋅ w^⊤) ⋅ v

Thus, we have a product of the matrices w ∈ R^n×1, (w^⊤ ⋅ w)^-1 ∈ R^1×1, and w^⊤ ∈ R^1×n. This product is a matrix in R^n×n. Call it M_span{w}. Then we can define the linear transformation f : Rⁿ → Rⁿ that performs the orthogonal projection onto span{w} as:

f(v) = M_span{w} ⋅ v.

We used the above approach because we will see later that it can be generalized to orthogonal projections onto multidimensional vector spaces. An alternative way to show the above is to notice that w/||w|| can be rewritten as a matrix u ∈ R^n×1. In that case, we have:

(v ⋅ w/||w||) ⋅ w/||w|| = (v ⋅ u) ⋅ u
  = u ⋅ (u ⋅ v)
  = (u ⋅ u^⊤) ⋅ v

Here, u ⋅ u^⊤ ∈ R^n×n.
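
The construction of M_span{w} can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `projection_matrix` and `apply` are hypothetical, matrices are encoded as lists of rows, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def projection_matrix(w):
    """Return w (w^T w)^-1 w^T as a list of rows, using exact fractions."""
    ww = sum(Fraction(x) * x for x in w)  # w^T w, the single entry of a 1 x 1 matrix
    if ww == 0:
        raise ValueError("w must be nonzero")
    return [[Fraction(wi) * wj / ww for wj in w] for wi in w]

def apply(matrix, v):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

# Projection onto span{[3; 4]}.
P = projection_matrix([3, 4])
print(P)
print(apply(P, [3, 4]))   # a vector already in span{w} is left unchanged
print(apply(P, [4, -3]))  # a vector orthogonal to w is mapped to 0
```

For w = [3; 4] the matrix has rows [9/25, 12/25] and [12/25, 16/25].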

Example: Suppose we want to compute the projection of v onto w where:

-2

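We can compute the projection as follows, using w = [3; 4] and ||w|| = 5:

(v ⋅ w/||w||) ⋅ w/||w||
  = (v ⋅ ([3; 4] ⋅ (1/5))) ⋅ ([3; 4] ⋅ (1/5))
  = (1/25) ⋅ (v ⋅ [3; 4]) ⋅ [3; 4]
  = [3; 4] ⋅ ([3 4] ⋅ [3; 4])^-1 ⋅ ([3 4] ⋅ v)
  = ([3; 4] ⋅ ([3 4] ⋅ [3; 4])^-1 ⋅ [3 4]) ⋅ v
  = ([3; 4] ⋅ [25]^-1 ⋅ [3 4]) ⋅ v
  = ([3; 4] ⋅ [1/25] ⋅ [3 4]) ⋅ v
  = ([3/25; 4/25] ⋅ [3 4]) ⋅ v
  = [9/25 12/25; 12/25 16/25] ⋅ v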

Notice that the columns of the matrix above are in span {w}:

[9/25; 12/25] = (3/25) ⋅ [3; 4], so [9/25; 12/25] ∈ span {[3; 4]}
[12/25; 16/25] = (4/25) ⋅ [3; 4], so [12/25; 16/25] ∈ span {[3; 4]}

Thus, we have that:

span {[9/25; 12/25], [12/25; 16/25]} = span {[3; 4]}

Fact: Suppose that for M ∈ R^n×m and f : R^m → Rⁿ we have an overdetermined system:

M ⋅ v = w

If M^⊤ ⋅ M is invertible (and, thus, (M^⊤ ⋅ M)^-1 is defined), we can say that:

M ⋅ v = w
M^⊤ ⋅ (M ⋅ v) = M^⊤ ⋅ w
(M^⊤ ⋅ M) ⋅ v = M^⊤ ⋅ w
(M^⊤ ⋅ M)^-1 ⋅ (M^⊤ ⋅ M) ⋅ v = (M^⊤ ⋅ M)^-1 ⋅ M^⊤ ⋅ w
v = (M^⊤ ⋅ M)^-1 ⋅ M^⊤ ⋅ w
M ⋅ v = M ⋅ (M^⊤ ⋅ M)^-1 ⋅ M^⊤ ⋅ w.

Thus, even if M ⋅ v = w is overdetermined, M^⊤ ⋅ (M ⋅ v) = M^⊤ ⋅ w has a solution v* such that M ⋅ v* corresponds to the orthogonal projection of w onto the span of the column vectors of M.

Thus, for any matrix M where M^⊤ ⋅ M is invertible, we can define the linear transformation that represents orthogonal projection onto the span of the columns of M:

f(v) = M ⋅ (M^⊤ ⋅ M)^-1 ⋅ M^⊤ ⋅ v.

5.6. Matrices as linear transformations

Fact: Suppose we have the following matrix M ∈ R^m×n, with column vectors c₁, ..., c_n, and the corresponding linear transformation f : Rⁿ → R^m:

M = [c₁ ... c_n]
f(v) = M ⋅ v

Then the following is true:

span {c₁, ..., c_n} = im(f)

Fact: Suppose we have an invertible matrix M ∈ R^n×n and the corresponding linear transformation f : Rⁿ → Rⁿ:

f(v) = M ⋅ v

Then f is a bijection. Because f is also a linear transformation, f is an isomorphism.

Example: Consider the following linear transformation f: R → R²:

f(x) = [x; x]

The linear transformation f is injective but not surjective.

We can show f is injective:

x ≠ x'
[x; x] ≠ [x'; x']
f(x) ≠ f(x')

To show it is not surjective, we choose the following vector w ∈ R² (i.e., in the codomain) and show that the equation f(x) = w is not solvable:

w = [1; 0]
f(x) = [1; 0]
[x; x] = [1; 0]
x = 1
x = 0
1 = 0

Since we derive a contradiction, no such x exists, so f is not surjective.

Example: Consider the following linear transformation f: R² → span{ [3;4] }:

f(v) = [3 6; 4 8] ⋅ v

The linear transformation f is surjective but not injective.

Example: Consider the following linear transformation f: R³ → R²:

f(v) = [0 1 1; 1 0 1] ⋅ v

The linear transformation f is surjective but not injective.

5.7. Application: communication

We want to transmit information, but we have constraints (e.g., security or high cost of transmission). Suppose we have two linear transformations f : V → W and f^-1 : W → V such that f^-1 o f is a bijection (and, thus, an isomorphism). If im(f) has some desirable property that the domain of f does not possess (e.g., it is obscured, or it has a smaller representation size), we can use f as an encoding function for the information we want to transmit, and f^-1 as a decoding function.

Example: Suppose Alice wants to transmit the following vectors to Bob:

{ [-3; 6], [2; -4], [1; -2], [0; 0], [-5; 10] }

If she simply transmitted the individual real numbers to Bob one-by-one, she would need to transmit 10 real numbers. Can Alice and Bob agree on some protocol that would allow Alice to send fewer real numbers but transmit the same amount of information in this case?

All the vectors in the set are in R². However, Alice and Bob can take advantage of the fact that the vectors are in a subspace of R². In particular, all five vectors are in:

span {[1; -2]} ⊂ R²

Suppose Alice and Bob define the following encoding and decoding linear transformations:

f : span {[1; -2]} → R
f^-1 : R → span {[1; -2]}

f([x; y]) = x
f^-1(s) = s ⋅ [1; -2]
Then f^-1 o f is invertible, and Alice can use f to encode vectors that contain two real numbers within a single real number. This means it is sufficient for Alice to send just five real numbers in order to send Bob the five vectors:

{ f([-3; 6]), f([2; -4]), f([1; -2]), f([0; 0]), f([-5; 10]) } = {-3, 2, 1, 0, -5}
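
The protocol above can be sketched as a pair of functions. The names `encode` and `decode` are hypothetical stand-ins for f and f^-1, and vectors are encoded as Python tuples; this is an illustration, not a definitive implementation.

```python
def encode(vec):
    """f : span{[1; -2]} -> R, keeping only the first component."""
    x, y = vec
    return x

def decode(s):
    """f^-1 : R -> span{[1; -2]}, rebuilding the vector as s * [1; -2]."""
    return (s * 1, s * -2)

# The five vectors Alice wants to transmit (10 real numbers in total).
vectors = [(-3, 6), (2, -4), (1, -2), (0, 0), (-5, 10)]

sent = [encode(v) for v in vectors]      # only five real numbers are sent
received = [decode(s) for s in sent]     # Bob rebuilds the original vectors

print(sent)
print(received == vectors)
```

Decoding recovers the inputs only because every transmitted vector lies in span{[1; -2]}; a vector outside that subspace would not survive the round trip.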

5.8. Affine spaces and affine transformations

We are interested in studying solution spaces of systems of the form M ⋅ v = w. However, these are not vector spaces because they do not always contain 0 (i.e., the origin). Recall that if a solution space of a system does contain 0, the solution space is a vector space and M ⋅ v = w must be a homogeneous system where w = 0.

Definition: Suppose W is a set of vectors such that 0 ∉ W, and suppose w₀ is the orthogonal projection of 0 onto W. Then we can define W to be a vector space with the following vector addition and scalar multiplication operations:

- addition (⊕) can be defined as follows. It is an operation on elements of W under which W is closed, and which satisfies the vector space axioms:

  v ⊕ v' = u where u = (v - w₀) + (v' - w₀) + w₀ = v + v' - 2 ⋅ w₀ + w₀ = v + v' - w₀.
- scalar multiplication (⊗) can be defined as follows. It is an operation on elements of W under which W is closed, and which satisfies the vector space axioms:

  s ⊗ v = u where u = (s ⋅ (v - w₀)) + w₀ = s v - s w₀ + w₀ = s v + (1-s) ⋅ w₀.
- there is a unique additive identity in W; it is the vector w₀:

  v ⊕ w₀ = v + w₀ - w₀ = v.

Fact: If V is a vector space with addition operation + and scalar multiplication operation ⋅, the vector space W = {v + w₀ | v ∈ V} with addition operation ⊕ and scalar multiplication operation ⊗ is isomorphic to V, with isomorphism f:V → W defined as:

f(v) = v + w₀

We can see that f is a linear transformation because it has the properties of a linear transformation:

f(v + v') = v + v' + w₀
  = v + w₀ + v'
  = v + w₀ + v' + w₀ - w₀
  = (v + w₀) + (v' + w₀) - w₀
  = f(v) ⊕ f(v')

f(s ⋅ v) = s ⋅ v + w₀
  = s ⋅ v + w₀ + (s ⋅ w₀ - s ⋅ w₀)
  = s ⋅ v + s ⋅ w₀ + w₀ - s ⋅ w₀
  = s ⋅ (v + w₀) + (1-s) ⋅ w₀
  = s ⊗ f(v)

We can see it is injective:

v ≠ v'
v + w₀ ≠ v' + w₀
f(v) ≠ f(v')

We can see it is surjective because for any w' ∈ W we can solve f(v) = w' for v:

f(v) = w'
v + w₀ = w'
v = w' - w₀

Since f is injective, surjective, and a linear transformation, it is an isomorphism. Thus, V and W are isomorphic. This means we can work with W as if it is a vector space by simply mapping back to V using the isomorphism, doing any necessary work, and then mapping back to W by using the inverse of that isomorphism, f^-1.

Example: Consider the following affine space:

W = {[x; y] | y = 2 x + 3}

Suppose we want to find the orthogonal projection of the following point p ∈ R² onto the affine space W:

To do so, we first need to find the isomorphism between a vector space V and the space W. We will choose the vector space V that is parallel to W but that goes through the origin; it is defined as:

V = {[x; y] | y = 2 x }

Notice that V is a line with the same slope as W. Next, we must find the additive identity w₀ of W by finding the intersection of V^⊥ and W, where V^⊥ is the orthogonal complement of V. We first compute V^⊥:

V = span {[1; 2]}
V^⊥ = { u | u ⋅ v = 0 for all v ∈ V}
  = { u | u ⋅ v = 0 for all v ∈ basis V}
  = { u | u ⋅ v = 0 for all v ∈ {[1; 2]}}
  = { u | u ⋅ [1; 2] = 0 }
  = {[x; y] | [x; y] ⋅ [1; 2] = 0 }
  = {[x; y] | x + 2 ⋅ y = 0 }
  = {[x; y] | y = -(1/2) x }

We can now compute the intersection and, thus, w₀:

{w₀} = W ∩ V^⊥
  = {[x; y] | y = 2 x + 3} ∩ {[x; y] | y = -(1/2) x }
  = {[x; y] | y = 2 x + 3 and y = -(1/2) x}
  = {[−6/5; 3/5]}
w₀ = [−6/5; 3/5]

We can now define an isomorphism f:V → W:

f(v) = v + w₀
  = v + [−6/5; 3/5]

Next, to project the point p onto W, it is now sufficient to first project the point onto V, then apply f:

V = span {[1; 2]}
||[1; 2]|| = √(5)
orthogonal projection of p onto span {[1; 2]} = (p ⋅ ([1; 2] ⋅ (1/√(5)))) ⋅ ([1; 2] ⋅ (1/√(5)))
  = (10/√(5)) ⋅ (1/√(5)) ⋅ [1; 2]
  = 2 ⋅ [1; 2]
  = [2; 4]

Finally, the orthogonal projection of p onto W can be computed via f:

orthogonal projection of p onto W = f([2; 4])
  = [2; 4] + [−6/5; 3/5]
  = [4/5; 23/5]
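
The two-step procedure (project onto the parallel vector space V, then apply f) can be sketched as follows. The helper names are hypothetical, and the point `p = [4, 3]` is an assumption chosen for illustration only so that p ⋅ [1; 2] = 10, matching the intermediate value in the computation above.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def project_onto_affine_line(p, direction, w0):
    """Project p onto V = span{direction}, then apply f(v) = v + w0."""
    scale = Fraction(dot(p, direction), dot(direction, direction))
    projection_onto_v = [scale * d for d in direction]
    return [v + w for v, w in zip(projection_onto_v, w0)]

w0 = [Fraction(-6, 5), Fraction(3, 5)]  # additive identity of W computed above
p = [4, 3]                              # hypothetical point with p . [1; 2] = 10
result = project_onto_affine_line(p, [1, 2], w0)
print(result)
```

The result lies on the line y = 2 x + 3, and the difference between p and the result is orthogonal to the direction [1; 2], which is what an orthogonal projection onto W requires.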

How do we find an isomorphism between an affine space and a parallel vector space?

Fact: Suppose that for a matrix M ∈ R^n×n, M ⋅ u = w is a non-homogeneous system with at least one solution. Let U be the solution space of M ⋅ u = w:

U = {u | M ⋅ u = w }

How can we find an isomorphism between U and some vector space V? We let V be the solution space of the corresponding homogeneous system:

V = {v | M ⋅ v = 0}

We must now find V^⊥. Because every v ∈ V is orthogonal to every row of M, V^⊥ is the span of the rows of M, which is the image of M^⊤:

V^⊥ = {M^⊤ ⋅ v | v ∈ Rⁿ }

To find u₀, it is now sufficient to find the intersection of U and V^⊥:

{u₀} = U ∩ V^⊥

The isomorphism f:V → U is then:

f(v) = v + u₀

Example: Suppose we have a non-homogeneous system M ⋅ v = w where M ∈ R^n×n and we want to compute the projection of some p ∈ Rⁿ onto the affine space {v | M ⋅ v = w}. How can we do so?

Suppose that M is of the following form, with columns c₁, ..., c_n and rows r₁, ..., r_n:

M = [c₁ ... c_n] = [r₁; ...; r_n]

Then we can say that we want to project p onto {v | M ⋅ v = 0} and find an isomorphism f: {v | M ⋅ v = 0} → {v | M ⋅ v = w}; the orthogonal projection onto {v | M ⋅ v = w} is then obtained by applying f to the projection of p onto {v | M ⋅ v = 0}.

Thus, we find an orthonormal basis of {v | M ⋅ v = 0} and compute the orthogonal projection of p onto {v | M ⋅ v = 0}; call this p*:

p* = orthogonal projection of p onto {v | M ⋅ v = 0}

We then find f by solving the following system of equations for u₀:

M ⋅ u₀ = w
M^⊤ ⋅ v = u₀

We solve the above by first solving for v and then computing u₀:

M ⋅ (M^⊤ ⋅ v) = w
u₀ = M^⊤ ⋅ v

We now have the isomorphism f:

f(v) = v + u₀

Then, we can compute the orthogonal projection:

orthogonal projection of p onto {v | M ⋅ v = w} = f(p*)
  = p* + u₀

Example: We can find a basis of the following set of solutions of a homogeneous system (thus, a vector space):

V = { v | [3 6; 1 2] ⋅ v = 0 }

We can do so by starting with a basis that spans the entire space that contains V (which in this case is R²), and then "removing" the contribution of the orthogonal complement V^⊥ from these basis vectors.

Thus, we begin with:

R² = span {[1; 0], [0; 1]}

We know that:

V^⊥ = span {[3; 6], [1; 2]} = span {[1; 2]}

Thus, we project each basis vector for R² onto V^⊥, then subtract that from the basis vector:

[1; 0] - ([1; 0] ⋅ ((1/√(5)) ⋅ [1; 2])) ⋅ (1/√(5)) ⋅ [1; 2] = [4/5; −2/5]
[0; 1] - ([0; 1] ⋅ ((1/√(5)) ⋅ [1; 2])) ⋅ (1/√(5)) ⋅ [1; 2] = [−2/5; 1/5]

The remaining vectors above span V:

V = { v | [3 6; 1 2] ⋅ v = 0 }
  = span {[4/5; −2/5], [−2/5; 1/5]}
  = span {[−2/5; 1/5]}

5.9. Fixed points, eigenvectors, and eigenvalues

We introduce several properties of linear transformations; these properties make it possible to work with linear transformations in new and convenient ways.

Definition: Given a linear transformation f: V → V, v ∈ V is a fixed point of f if:

f(v) = v

Example: Consider the following linear transformation f: R² → R²:

f(v) = [2 1; 1 2] ⋅ v

The following vector v is a fixed point of f:

v = [1; −1]
f([1; −1]) = [2 1; 1 2] ⋅ [1; −1] = [1; −1]
Fact: For any linear transformation f, 0 is a fixed point of f (see the definition of a linear transformation).

Fact: If v is a fixed point of f and v ≠ 0, then f has an infinite number of fixed points. This is because for any s ∈ R and any fixed point v, we have that:

f(v) = v
f(s ⋅ v) = s ⋅ f(v)
  = s ⋅ v

Fact: For any linear transformation f where f(v) = M ⋅ v, let S be the set of fixed points of f:

S = {v | f(v) = v }

Then S is a vector space. There are several ways to show this. Let v be any fixed point of f. Then we have:

f(v) = v
M ⋅ v = v
(M ⋅ v) − v = 0
(M ⋅ v) − I ⋅ v = 0
(M − I) ⋅ v = 0

Thus, the fixed points of f represented by M are exactly the elements of the solution space of the homogeneous system (M - I) ⋅ v = 0. Alternatively, we could show that S contains 0, S is closed under vector addition, and S is closed under scalar multiplication.

We can generalize the notion of a fixed point by observing that a fixed point is just a vector on which the linear transformation acts as a scalar multiplier (in the case of a fixed point, the multiplier is 1). If v is a fixed point of f, then we have that:

f(v) = 1 ⋅ v

What if we created another notion that was not restricted to 1?

Definition: Given a linear transformation f: V → V, v ∈ V is an eigenvector of f with eigenvalue λ if:

f(v) = λ ⋅ v

Fact: For any linear transformation f, if v is a fixed point of f, then it is an eigenvector of f with eigenvalue 1.

Definition: The set of eigenvectors of a linear transformation f that have eigenvalue λ is its eigenspace:

eigenspace of f with eigenvalue λ = { v | f(v) = λ ⋅ v }

Any eigenspace is a vector space.

Fact: Given a linear transformation f : Rⁿ → Rⁿ represented by M ∈ R^n×n and an eigenvector v with eigenvalue λ, consider the following:

f(v) = λ v
M ⋅ v = λ v
(M ⋅ v) − λ v = 0
(M ⋅ v) − λ I ⋅ v = 0
(M − λ I) ⋅ v = 0

The above equation has only the zero solution if (M - λ I) is invertible:

(M - λ I) ⋅ v = 0
(M - λ I)^-1 ⋅ (M - λ I) ⋅ v = (M − λ I)^-1 ⋅ 0
v = 0

Thus, nonzero eigenvectors v exist only if (M - λ I) is not invertible, which is the case exactly when det (M − λ I) = 0. In fact, the eigenvalues of f are exactly the solutions λ to the equation:

det (M − λ I) = 0

Example: Suppose that we are modelling a system with two dimensions that are each modelled using R:

population in the city;
population in the suburbs.

The following matrix (call it T) represents the movement of the two populations between the two locations over one year (each entry is the fraction of the source population that ends up in the destination):

             from city                        from suburbs
to city      0.95 stay in the city            0.03 move from suburbs to city
to suburbs   0.05 move from city to suburbs   0.97 stay in the suburbs

For example, suppose the population in 1999 is represented by v below. Then the population in 2000 is represented by T ⋅ v:

T ⋅ v = [0.95 0.03; 0.05 0.97] ⋅ [600,000; 400,000] = [582,000; 418,000]

Let f(v) = T ⋅ v be the linear transformation represented by T.

What does a fixed point of f represent?

A fixed point of f represents a stable population distribution that will not change from one year to the next. For example:

[0.95 0.03; 0.05 0.97] ⋅ [375,000; 625,000] = [375,000; 625,000]

Notice that any scalar multiple of this vector, including a vector that is normalized so that the two components add up to 1, is also a fixed point of f:

[0.95 0.03; 0.05 0.97] ⋅ [0.375; 0.625] = [0.375; 0.625]

Thus, for this transformation, for any distribution of the population in which 37.5% live in the city, the distribution is stable.

What does an eigenvector of f represent?

An eigenvector of f represents a population distribution that may grow or shrink, but whose relative distribution between the city and the suburbs remains the same from one year to the next.

Does f have any nonzero eigenvectors other than the fixed points?

Not among vectors that can represent populations. The sum of the components of any vector in im f is always the same as the sum of the components of the input vector that produced that image. Thus, if T ⋅ v = λ ⋅ v, then either λ = 1 or the components of v sum to 0, and a nonzero vector whose components sum to 0 has a negative component.

[0.95 0.03; 0.05 0.97] ⋅ [x; y] = [0.95 x + 0.03 y; 0.05 x + 0.97 y]

x + y = (0.95 x + 0.05 x) + (0.03 y + 0.97 y)
  = (0.95 x + 0.03 y) + (0.05 x + 0.97 y)

This is because the sums of the components of the column vectors of T are 1. This makes T a stochastic matrix, and it makes the fixed points the steady-state or equilibrium vectors of T.

Find the vector space of fixed points of f.

Note that either the fixed points of f are {0}, or there are infinitely many. Thus, if we can find a matrix equation whose solutions are the fixed points, we will obtain either a system whose only solution is 0, or an underdetermined system. We know that for our particular f and T, the space of fixed points is the set of solutions to the equation:

S = { v | ([0.95 0.03; 0.05 0.97] - [1 0; 0 1]) ⋅ v = [0; 0] }

We then find a basis that spans S:

S = {v | v ⋅ [-0.05; 0.03] = 0}
  = span {[0.03; 0.05]}
  = span {[3; 5]}

Suppose we want to find a closed formula for the population k years after the initial state v (i.e., after applying T to an initial vector v a total of k times, or T^k ⋅ v) where we have:

v = [0.6; 0.4]

The formula should be in terms of k and should not require matrix multiplication. In other words, we should be able to obtain closed formulas for the city population and the suburb population in terms of k.

We can approach this problem by finding the eigenvectors of f. Then, we can express the result of T^k ⋅ v as a linear combination of eigenvectors.

det ([0.95 0.03; 0.05 0.97] - λ ⋅ [1 0; 0 1]) = 0
det [0.95 - λ, 0.03; 0.05, 0.97 - λ] = 0
(0.95 - λ) ⋅ (0.97 - λ) - (0.03 ⋅ 0.05) = 0
λ² - 1.92 λ + 0.9215 - 0.0015 = 0
λ² - 1.92 λ + 0.92 = 0
λ = (1.92 ± √(1.92² - 4(0.92)))/2
λ = 1.92/2 ± 0.08/2
λ = 0.96 ± 0.04
λ = 1 or 0.92
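
The same calculation can be sketched for any 2 × 2 matrix by solving the quadratic det (M − λ I) = 0 directly. The function name `eigenvalues_2x2` is hypothetical, and the sketch assumes the eigenvalues are real (it raises an error otherwise).

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [a b; c d], from λ² - (a + d) λ + (a d - b c) = 0."""
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("no real eigenvalues")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)

# The population transition matrix T from the example.
larger, smaller = eigenvalues_2x2(0.95, 0.03, 0.05, 0.97)
print(larger, smaller)
```

Because the inputs are floating-point numbers, the printed values agree with 1 and 0.92 only up to rounding error.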

We already know the eigenvector for eigenvalue 1 (it is the fixed point). To find the eigenvector for the other eigenvalue, we solve:

[0.95 0.03; 0.05 0.97] ⋅ [x; y] = 0.92 ⋅ [x; y]
0.95 x + 0.03 y = 0.92 x
0.05 x + 0.97 y = 0.92 y
0.03 x + 0.03 y = 0
0.05 x + 0.05 y = 0
x = 1
y = -1

Thus, our eigenvectors are:

e₁ = [3; 5]
e₂ = [1; -1]

Notice that we had a space of solutions because the system was underdetermined; we chose a particular eigenvector. Finally, notice that:

T^k ⋅ (a ⋅ e₁ + b ⋅ e₂) = (T^k ⋅ a ⋅ e₁) + (T^k ⋅ b ⋅ e₂)
  = a ⋅ (T^k ⋅ e₁) + b ⋅ (T^k ⋅ e₂)
  = a ⋅ e₁ + b ⋅ 0.92^k ⋅ e₂

Thus, if the initial input vector can be rewritten in terms of the two eigenvectors, we can find the closed formula. In fact, it can be because the two eigenvectors are linearly independent:

a ⋅ [3; 5] + b ⋅ [1; -1] = [x; y]
[3 1; 5 -1] ⋅ [a; b] = [x; y]
[a; b] = [3 1; 5 -1]^-1 ⋅ [x; y]
[a; b] = [3 1; 5 -1]^-1 ⋅ [0.6; 0.4]
[a; b] = [0.125; 0.225]

0.125 ⋅ [3; 5] + 0.225 ⋅ [1; -1] = [0.6; 0.4]

The closed formula is:

T^k ⋅ [0.6; 0.4] = 0.125 ⋅ [3; 5] + 0.225 ⋅ 0.92^k ⋅ [1; -1]
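
As a sanity check, the closed formula can be compared against repeated multiplication by T. The function names `iterate` and `closed_form` are hypothetical, and the comparison is only approximate because it uses floating-point arithmetic.

```python
T = [[0.95, 0.03],
     [0.05, 0.97]]

def iterate(v, k):
    """Apply T to v exactly k times by repeated matrix-vector multiplication."""
    x, y = v
    for _ in range(k):
        x, y = (T[0][0] * x + T[0][1] * y,
                T[1][0] * x + T[1][1] * y)
    return (x, y)

def closed_form(k):
    """Evaluate 0.125 * [3; 5] + 0.225 * 0.92**k * [1; -1]."""
    decay = 0.225 * 0.92 ** k
    return (0.125 * 3 + decay, 0.125 * 5 - decay)

for k in (0, 1, 8, 50):
    print(k, iterate((0.6, 0.4), k), closed_form(k))
```

As k grows, the 0.92^k term vanishes and both computations approach the fixed point [0.375; 0.625].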

Eigenspaces have many other applications. In particular, they make it possible to provide "natural" interpretations of general notions of concepts such as differentiation in the context of vector spaces.

Example: Differentiation is a linear transformation on the vector space of differentiable functions (or a subset, e.g., the polynomials):

d/dx (a f + b g) = a (d/dx f) + b (d/dx g)

[0 0 0; 2 0 0; 0 1 0] ⋅ [a; b; c] = [0; 2 a; b]

Thus, an eigenvector in this vector space is any differentiable function f such that:

d/dx f = λ f

Notice that the above is a differential equation. If λ = 0, then we have for any constant c ∈ R the solution:

f(x) = c

If λ ≠ 0 and we do not restrict ourselves to polynomials but allow all infinitely differentiable functions, then we have the solution:

f(x) = c e^(λ x)

Review 2. Vector and Matrix Algebra, Vector Spaces, and Linear Transformations

The following is a breakdown of what you should be able to do at the end of the course (and of what you may be tested on in an exam). Notice that many of the tasks below can be composed. This also means that many problems can be solved in more than one way.

vectors
- definitions and algebraic properties of scalar and vector operations (addition, multiplication, etc.)
- vector properties and relationships between vectors
  - dot product of two vectors
  - norm of a vector
  - unit vectors
  - orthogonal projection of a vector onto another vector
  - orthogonal vectors
  - linear dependence of two vectors
  - linear independence of two vectors
  - linear combinations of vectors
  - linear independence of three vectors
- lines and planes
  - line defined by a vector and the origin ([0; 0])
  - line defined by two vectors
  - line in R² defined by a vector orthogonal to that line
  - plane in R³ defined by a vector orthogonal to that plane
matrices
- algebraic properties of scalar and matrix multiplication and matrix addition
- collections of matrices and their properties (e.g., invertibility, closure)
  - identity matrix
  - elementary matrices
  - scalar matrices
  - diagonal matrices
  - upper and lower triangular matrices
  - matrices in reduced row echelon form
  - determinant of a matrix in R^2×2
  - inverse of a matrix and invertible matrices
- other matrix operations and properties
  - determine whether a matrix is invertible
    - using the determinant for matrices in R^2×2
    - using facts about rref for matrices in R^n×n
  - algebraic properties of matrix inverses with respect to matrix multiplication
  - transpose of a matrix
    - algebraic properties of transposed matrices with respect to matrix addition, multiplication, and inversion
  - matrix rank
- matrices in applications
  - solve an equation of the form LU = w
  - matrices and systems of states
    - interpret partial observations of system states as vectors
    - interpret relationships between dimensions in a system of states as a matrix
    - given a partial description of a system state and a matrix of relationships, find the full description of the system state
    - interpret system state transitions/transformations over time as matrices
      - population growth/distributions over time
        - compute the system state after a specified amount of time
        - find the fixed point of a transition matrix
vector spaces
- vector spaces and their properties
  - given a set of vectors or other objects, show that it is a vector space
  - express a vector space as a span of a finite set of vectors
  - given two vector spaces defined using span notation, show one is a subspace of the other
  - given two vector spaces defined using span notation, show they are equal
  - find the basis of a vector space
  - find an orthonormal basis of a vector space
  - find the dimension of a vector space
  - find the orthogonal complement of a vector space
- particular vector spaces
  - the set of polynomials {f | f(x) = a_k x^k + ... + a₁ x + a₀, a₀,...,a_k ∈ R}
  - the solution set of an equation { v | M ⋅ v = w }
- affine spaces
- find the least-squares approximation of an overdetermined linear system
linear transformations
- determine if a relation is a map
- determine if a map is a linear transformation
- given a linear transformation (and/or its matrix representation)...
  - show it is injective
  - show it is surjective
  - show it is bijective
  - find its image (a vector space)
  - find its space of fixed points
  - find its eigenvalues
  - find its eigenvector for a given eigenvalue
- compositions of linear transformations and their properties
- compute orthogonal projections...
  - onto the span of a single vector in Rⁿ
  - onto a subspace of Rⁿ...
    - by first computing an orthonormal basis and then using it to find the projection
    - by using the formula M ⋅ M^⊤ if M has orthonormal columns
    - by using the formula M ⋅ (M^⊤ ⋅ M)^-1 ⋅ M^⊤
applications and solving systems of equations
- curve fitting
  - find a polynomial curve that exactly fits a given set of points in R²
  - find a least-squares approximate polynomial curve that best fits a set of points in R²

Below is a comprehensive collection of review problems going over the material covered in this course. These problems are an accurate representation of the kinds of problems you may see on an exam.

Example: Consider the following vector space:

V = span{[2; 1; 4], [0; 1; -1]}

Find the orthogonal projection of the following vector onto V:

u = [1; 2; 0]

If the two spanning vectors were orthogonal, one approach would be to project u onto a normalized form of each vector, and to add the results. If the two spanning vectors were orthonormal, it would be sufficient to simply project onto each vector, and add the results. Because the two vectors are neither, we can use the formula M (M^⊤ M)^-1 M^⊤ u for the projection where

M = [2 0; 1 1; 4 -1]
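
The formula M (M^⊤ M)^-1 M^⊤ u can be evaluated with a short sketch. It is specialized to a matrix with two columns so that the 2 × 2 system (M^⊤ M) ⋅ [s; t] = M^⊤ u can be solved by Cramer's rule; the function names are hypothetical and exact fractions are used.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product with exact rational arithmetic."""
    return sum(Fraction(x) * y for x, y in zip(a, b))

def project_onto_columns(c1, c2, u):
    """Return M (M^T M)^-1 M^T u where M has the two columns c1 and c2."""
    g11, g12, g22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)  # entries of M^T M
    r1, r2 = dot(c1, u), dot(c2, u)                        # entries of M^T u
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("M^T M is not invertible")
    s = (g22 * r1 - g12 * r2) / det
    t = (g11 * r2 - g12 * r1) / det
    return [s * a + t * b for a, b in zip(c1, c2)]

c1, c2, u = [2, 1, 4], [0, 1, -1], [1, 2, 0]
projection = project_onto_columns(c1, c2, u)
residual = [x - y for x, y in zip(u, projection)]
print(projection)
print(dot(residual, c1), dot(residual, c2))  # both are 0
```

The residual u minus its projection is orthogonal to both spanning vectors, which confirms that the result is the orthogonal projection onto V.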
Find a basis of V^⊥.

It is sufficient to set up an underdetermined system, solve for two of the variables in terms of the third, and set the third to an arbitrary constant:

[2; 1; 4] ⋅ [x; y; z] = 0
[0; 1; -1] ⋅ [x; y; z] = 0
2x + y + 4z = 0
y - z = 0
y = z
x = -y/2 - 4z/2 = -z/2 - 4z/2 = -5/2 ⋅ z

Setting z = 2, we get:

V^⊥ = span {[-5; 2; 2]}

Find any matrix M such that for f(v) = M ⋅ v, f(v) = 0 for all v ∈ V.

It is sufficient to find a matrix that maps both spanning vectors to 0. From above, we already have a vector that spans V^⊥, so such a matrix can be computed using the formula:

[-5 2 0; 2 1 1; 2 4 -1] ⋅ [1 0 0; 0 0 0; 0 0 0] ⋅ [-5 2 0; 2 1 1; 2 4 -1]^-1

Example: Consider the vector space of polynomials of degree at most 2:

F = { f | f(x) = a x² + b x + c, a,b,c ∈ R}

The map d:F → F represents differentiation. For example:

f(x) = 5 x² - 2 x + 3
g(x) = 10 x - 2
d(f) = g

Determine whether d:F → F is injective.

It is not injective because we can find two unequal inputs that produce the same output. Consider the following polynomials:

f(x) = 1
g(x) = 2
h(x) = 0

Then we have that:

d(f) = h
d(g) = h

Thus, the map d is not injective.

Recall that a polynomial can be represented as a vector. For example, f(x) = 5 x² - 2 x + 3 can be represented as:

[5; -2; 3]

Show that d is a linear transformation by finding a matrix representation for d.

The matrix representation is:

[0 0 0; 2 0 0; 0 1 0]

Notice that:

[0 0 0; 2 0 0; 0 1 0] ⋅ [a; b; c] = [0; 2 a; b]

Show that d:F → F is not surjective.

Intuitively, we see that there are only two linearly independent columns in the matrix representing d, so there must be vectors in the codomain of d that are not in the image of d, so d is not surjective. To prove this, we find some vector w in the codomain for which there is no solution to the following equation:

w = [1; 0; 0]
[0 0 0; 2 0 0; 0 1 0] ⋅ [a; b; c] = w
[0 0 0; 2 0 0; 0 1 0] ⋅ [a; b; c] = [1; 0; 0]
[0; 2a; b] = [1; 0; 0]
0 = 1

Since we derive a contradiction, the above equation must have no solution.

Example: Alice wants to send vectors in R² to Bob. For any vector v = [x; y] ∈ R² that she wants to send, she generates a random scalar z ∈ R and sends w ∈ R³ to Bob as defined below:

w = [1 2 0; 0 3 0; 0 0 2] ⋅ [x; y; z]

Find a matrix D ∈ R^{3× 3} that Bob can use to decode Alice's messages.

Alice is sending vectors that are a linear combination of two vectors that carry the two scalars in a message vector v and a noise vector:

[1; 0; 0] ⋅ x + [2; 3; 0] ⋅ y + [0; 0; 2] ⋅ z

Bob must first cancel out the noise using a matrix in R^3×3. Then he must take the result of that and recover the scalars x and y. To cancel out the noise, Bob can use the following matrix:

C = [1 0 0; 0 1 0; 0 0 0]

Once Bob applies C to the vector he receives from Alice, he has:

[1; 0; 0] ⋅ x + [2; 3; 0] ⋅ y + [0; 0; 2] ⋅ 0 = [1 2 0; 0 3 0; 0 0 0] ⋅ [x; y; 0]

Notice that we can find a matrix that inverts this operation by replacing the top-left portion of the above encoding matrix with its inverse:

[1 2; 0 3]^-1 = (1/3) ⋅ [3 -2; 0 1] = [1 -2/3; 0 1/3]

D = [1 -2/3 0; 0 1/3 0; 0 0 0]

[1 -2/3 0; 0 1/3 0; 0 0 0] ⋅ [1 2 0; 0 3 0; 0 0 0] ⋅ [x; y; 0] = [x; y; 0]

Thus, Bob can use the following matrix to decode messages:

D ⋅ C

If Bob receives a message w that is the encoded version of v, he can retrieve it by computing:

D ⋅ C ⋅ w

Find a matrix D' ∈ R^2×3 that Bob can use to retrieve v ∈ R² given a transmitted w ∈ R³.

Bob simply needs to drop the third component of the result of D ⋅ C ⋅ w. This can be accomplished by computing:

[1 0 0; 0 1 0] ⋅ D ⋅ C ⋅ w

Thus, an appropriate matrix in R^2×3 would be:

[1 0 0; 0 1 0] ⋅ D ⋅ C
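
The decoding matrix can be checked with a short sketch. The helper names (`matmul`, `matvec`, `encode`, `decode`) and the variable names `E`, `DROP`, and `D_PRIME` are hypothetical labels for the matrices in this example, and exact fractions are used so that the round trip can be compared exactly.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Product of a matrix (list of rows) and a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

E = [[1, 2, 0], [0, 3, 0], [0, 0, 2]]   # Alice's encoding matrix
C = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # cancels the noise component
D = [[1, Fraction(-2, 3), 0], [0, Fraction(1, 3), 0], [0, 0, 0]]
DROP = [[1, 0, 0], [0, 1, 0]]           # drops the third component
D_PRIME = matmul(DROP, matmul(D, C))    # the matrix D' in R^2x3

def encode(x, y, noise):
    return matvec(E, [x, y, noise])

def decode(w):
    return matvec(D_PRIME, w)

print(D_PRIME)
print(decode(encode(5, -2, 17)))  # recovers [5, -2] regardless of the noise
```

The noise value 17 is arbitrary; any other scalar yields the same decoded vector because C zeroes out the third component.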

Example: Suppose there are two locations across which a population is distributed. Over the course of each year, the population moves between the two locations according to one of two population distribution transformations depending on how well the economy is doing (A if the economy is doing well, B otherwise):

A = [0.55 0.9; 0.45 0.1]
B = [0.75 0.3; 0.25 0.7]

Find an initial state v ∈ R² such that if the economy is always doing well, the population distribution will remain the same.

We need to find a fixed point of f(v) = A ⋅ v. Since dim R² = 2, we know that there are three possibilities:
- 0 is the only fixed point;
- the space of fixed points has dimension 1, so there are infinitely many fixed points that all fall on a single line;
- all points in R² are fixed points.

We solve the following system:

[0.55 0.9; 0.45 0.1] ⋅ [x; y] = [x; y]
0.55 x + 0.9 y = x
0.9 y = 0.45 x
0.45 x + 0.1 y = y
0.45 x = 0.9 y
x = 2 y

Since one equation can be derived from the other, the above system is underdetermined, but x can be expressed in terms of y. Thus, the space of fixed points is one-dimensional. Setting y = 10, it can be expressed as the span of the following fixed point vector:

[20; 10]

Suppose that over the course of several years, the economy has both done well and not well. Has the total population (the sum of the populations in the two locations) changed over this duration?

Notice that neither A nor B change the total population in the two locations as represented by a state vector in R². Thus, the total population has not changed.

Example: Given u ∈ Rⁿ, what is the orthogonal projection of v ∈ Rⁿ onto span{u}?

It is sufficient to compute the orthogonal projection of v onto u. Given a unit vector e parallel to u, the projection of v onto e would be:

(e ⋅ v) ⋅ e.

However, we cannot assume u is a unit vector. Thus, we scale u to obtain a unit vector e = u/||u||. Then, the solution is:

((u/||u||) ⋅ v) ⋅ (u/||u||).

Example: Let M ∈ R^{17× 9} and f(v) = M ⋅ v. What is the largest possible value of dim(im(f))?

We know that we can only compute M v if v has as many rows as M has columns. Thus, if f(v) = M ⋅ v, then v ∈ R⁹. We also know that M ⋅ v will be a vector with 17 rows because M has 17 rows, so f(v) ∈ R¹⁷. Thus, f : R⁹ → R¹⁷.

This means that dim(im(f)) cannot be greater than dim(R¹⁷), so it cannot exceed 17. However, we also need to note that M has 9 columns, and that any value in the image of f is thus a linear combination of at most 9 vectors. Thus, any basis of im(f) has at most 9 distinct vectors in it. Since dim(im(f)) is the size of the basis of im(f), dim(im(f)) can be at most 9.

Example: Find an orthonormal basis for the following vector space:

span {[1; 0; 2; 0], [0; 1; 2; 3], [0; 0; 0; 1]}.

We can use the algorithm for computing the vectors in an orthonormal basis. We can work with the vectors in any order, so suppose we have:

v₁ = [0; 0; 0; 1], v₂ = [0; 1; 2; 3], and v₃ = [1; 0; 2; 0].

According to the algorithm, we then let u₁ = v₁ and e₁ = u₁ / ||u₁||. In this case, we still have e₁ = v₁. Next, we compute:

u₂ = v₂ - ((v₂ ⋅ e₁) ⋅ e₁)
  = [0; 1; 2; 3] - [0; 0; 0; 3]
  = [0; 1; 2; 0]
e₂ = [0; 1/√(5); 2/√(5); 0]
u₃ = v₃ - ((v₃ ⋅ e₁) ⋅ e₁) - ((v₃ ⋅ e₂) ⋅ e₂)
  = [1; 0; 2; 0] - [0; 0; 0; 0] - [0; 4/5; 8/5; 0]
  = [1; -4/5; 2/5; 0]
e₃ = u₃ / ||u₃|| = (√(5)/3) ⋅ [1; -4/5; 2/5; 0]

Thus, {e₁, e₂, e₃} is an orthonormal basis for span{v₁, v₂, v₃}.
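
The algorithm used above can be sketched as follows. This is a minimal floating-point version that subtracts from each v its projections onto the previously computed e vectors, as in the formulas for u₂ and u₃; the function name `gram_schmidt` and the tolerance `1e-12` for detecting dependent vectors are assumptions of this sketch.

```python
import math

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            coeff = sum(a * b for a, b in zip(v, e))       # v . e
            u = [ui - coeff * ei for ui, ei in zip(u, e)]  # remove that component
        norm = math.sqrt(sum(x * x for x in u))
        if norm < 1e-12:
            continue  # v depends on earlier vectors; it adds nothing to the basis
        basis.append([x / norm for x in u])
    return basis

v1, v2, v3 = [0, 0, 0, 1], [0, 1, 2, 3], [1, 0, 2, 0]
e1, e2, e3 = gram_schmidt([v1, v2, v3])
print(e1)
print(e2)
print(e3)
```

Processing the vectors in a different order generally yields a different orthonormal basis for the same space.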

Example: Suppose we are given a linear transformation f:R² → R² where:

f(v) = [2 7; 3 1] ⋅ v.

Find an orthonormal basis for im(f).

We know that the column vectors of the matrix are linearly independent. Thus, dim(im(f)) is at least 2, and since the codomain of f is R², im(f) = R². Thus, any orthonormal basis of R² is appropriate, such as:

{[1; 0], [0; 1]}.

Find the set of all v such that f(v) = 0.

Let the matrix in the definition of f be M. We know from lecture that M is invertible iff M ⋅ v = 0 has exactly one solution 0. Since det M ≠ 0, M is invertible, so there is exactly one solution in the solution space {0}.

Alternatively, it is sufficient to find the set of solutions to the equation f(v) = 0, which is the set of solutions (expanding the definition of f) of:

[2, 7; 3, 1] ⋅ v = [0; 0].
Since M is invertible, we can multiply both sides by M^-1 to obtain:
v = 1/(2-21) ⋅ [1, -7; -3, 2] ⋅ [0; 0]
v = [0; 0].
Thus, there is only one solution in the solution set: {0}.

Show that f is surjective.

To show that f is surjective, it is sufficient to show that for every v in the codomain, there exists x such that:
f(x) = v
In other words, we want a formula for x in terms of v that is defined for any v. Let the matrix in the definition of f be M. Since M is invertible, we can define:
x = 1/(2-21) ⋅ [1, -7; -3, 2] ⋅ v
This formula for x is always defined, so f is surjective.
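As a sanity check, the following minimal Python sketch builds the inverse of a 2 × 2 matrix from its determinant and confirms that x = M^-1 ⋅ v is mapped back to v. The helper names inverse_2x2 and apply, and the sample vector v, are assumptions made for illustration.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Return the inverse of a 2 x 2 matrix [[a, b], [c, d]], or None if it is singular."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

M = [[2, 7], [3, 1]]
M_inv = inverse_2x2(M)
v = [5, -4]           # arbitrary sample vector in the codomain
x = apply(M_inv, v)   # candidate preimage of v
```

Applying M to x returns v again, and applying M_inv to the zero vector returns the zero vector, which matches the two arguments above.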

Example: Suppose we are given a linear transformation f:R² → R² where:

f(v) = [2, 1; 8, 4] ⋅ v.

Find dim(im(f)).

Recall that the dimension of a space is the size of any basis of that space (all bases of a space have the same size). Let M be the matrix in the definition of f. The space im(f) is equivalent to the span of the column vectors of the matrix:

im(f) = span{ [2; 8], [1; 4] }

To find a basis of a space spanned by a collection of vectors, we create a matrix whose rows are the vectors in that collection, find its reduced row echelon form, and keep only the nonzero rows in the basis:

[2, 8; 1, 4] → [1, 4; 1, 4] → [1, 4; 0, 0]

im(f) = span{ [1; 4] }

Thus, the dimension of im(f) is 1.
Show that f is not injective.

To show that a map is not injective, it is sufficient to find v and v' such that v ≠ v' but f(v) = f(v').

One approach to finding such v and v' is to expand f(v) = f(v') until the constraints are simple enough that it is straightforward to try some inputs and easily check that they satisfy the constraints.

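f(v) = f(v')
[2, 1; 8, 4] ⋅ [x; y] = [2, 1; 8, 4] ⋅ [x'; y']

Here v = [x; y] and v' = [x'; y']. Since the bottom row of the matrix is a multiple of the top row, both rows impose the same equation. Together with the requirement that the two vectors differ, we get:

x ≠ x'
y ≠ y'
2x + y = 2x' + y'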

One possible pair of vectors in the domain that satisfies the above is:

[x; y] = [0; 2]
[x'; y'] = [1; 0]

Thus, f is not injective.
Define the solution space of f(v) = 0 as a span and find a basis for it.

We write down the definition of the solution space of the homogeneous system above:
{ [x; y] | f([x; y]) = [0; 0] } = { [x; y] | [2, 1; 8, 4] ⋅ [x; y] = [0; 0] }
The constraints imposed above can be summarized as:
{ [x; y] | 2x + y = 0 } = { [x; y] | y = -2x }
Thus, the solution space is the line in R² defined by the equation y = -2x. We can choose any vector on this line and take the span of the set containing only that vector to provide an explicit definition for the set of solutions:
ker(f) = span { [-1; 2] }.
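Both claims about this f are easy to confirm by direct multiplication. The following is a small Python sketch; the helper name apply and the range of sample scalars are illustrative assumptions.

```python
def apply(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

M = [[2, 1], [8, 4]]

# Two different inputs with the same output, so f is not injective.
same_output = apply(M, [0, 2]) == apply(M, [1, 0])

# Every sampled multiple of [-1; 2] is sent to the zero vector.
in_kernel = all(apply(M, [-s, 2 * s]) == [0, 0] for s in range(-3, 4))
```

Checking a handful of scalars does not prove the claim for all of them; the algebraic argument above does that.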

Example: Find a matrix M such that for f(v) = M ⋅ v,

im(f) = (span{ [2; 0; 0; 0], [0; 0; 2; 0] })^⊥.

One way to approach this problem is to expand the definition on the right-hand side using the definition of the orthogonal complement operation:

im(f) = (span{ [2; 0; 0; 0], [0; 0; 2; 0] })^⊥
= { [x; y; z; t] | [2; 0; 0; 0] ⋅ [x; y; z; t] = 0, [0; 0; 2; 0] ⋅ [x; y; z; t] = 0 }
= { [x; y; z; t] | 2x = 0, 2z = 0, y,t ∈ R }
= { [x; y; z; t] | x = 0, z = 0, y,t ∈ R }
= { [0; y; 0; t] | y,t ∈ R }.
Thus,
im(f) = span{ [0; 1; 0; 0], [0; 0; 0; 1] }.
Recall that the image of f is the span of the columns of M. Thus, one possible solution is:
M = [0, 0; 1, 0; 0, 0; 0, 1].

Example: Suppose that a² + b² = 1. What is the orthogonal projection of v onto im(f) if f(v) = M ⋅ v, and:

v = [1; 2; 3]
M = [a, 0, 0; b, 0, 0; 0, c, 0]

We have that:
im(f) = span{ [a; b; 0], [0; 0; c] }.
Because a² + b² = 1, the first vector is already a unit vector, and it is orthogonal to the second vector. Assuming c ≠ 0, we can rescale the second vector to be a unit vector and obtain an orthonormal basis:
im(f) = span{ [a; b; 0], [0; 0; 1] }.
We can now find the orthogonal projection of v onto each vector in the orthonormal basis, and add these to find the orthogonal projection of the vector onto im(f):
([1; 2; 3] ⋅ [a; b; 0]) ⋅ [a; b; 0] + ([1; 2; 3] ⋅ [0; 0; 1]) ⋅ [0; 0; 1] = [a² + 2ba; ab + 2b²; 3].
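The formula can be spot-checked for particular values of a and b. The following is a minimal Python sketch that projects a vector onto a space by summing its projections onto an orthonormal basis. The names dot and project and the sample values a = 0.6, b = 0.8 are assumptions chosen for illustration.

```python
def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def project(v, orthonormal_basis):
    """Sum of the projections of v onto each vector of an orthonormal basis."""
    result = [0.0] * len(v)
    for e in orthonormal_basis:
        c = dot(v, e)
        result = [r + c * ei for r, ei in zip(result, e)]
    return result

a, b = 0.6, 0.8   # sample values satisfying a^2 + b^2 = 1
p = project([1, 2, 3], [[a, b, 0], [0, 0, 1]])
expected = [a * a + 2 * b * a, a * b + 2 * b * b, 3]
```

The function project is only valid when the basis passed to it really is orthonormal; for an arbitrary spanning set, an orthonormal basis has to be computed first.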

Example: Suppose we are given the following points:

[0; 6], [2; 6], [2; 12].

Find a function of the form f(x) = ax² + bx + c that is the best least-squares fit for these points.

We begin by writing down the equations in terms of f for each point:

f(0) = 6
f(2) = 6
f(2) = 12

We construct the matrix equation that represents the above system:

[(0)², (0), 1; (2)², (2), 1; (2)², (2), 1] ⋅ [a; b; c] = [6; 6; 12]

There is no solution to the above equation. However, we are looking for the least-squares best fit approximation. Thus, we first find an orthonormal basis of the image of the matrix. If the matrix is M and f(v) = M ⋅ v, we have that:

im(f) = span { [0; 4; 4], [0; 2; 2], [1; 1; 1] }.

Using the algorithm for finding an orthonormal basis, we obtain:
im(f) = span { [0; 1/√(2); 1/√(2)], [1; 0; 0] }.

We now find the orthogonal projection of [6; 6; 12] onto im(f) and use it to rewrite the equation so that it is not overdetermined:

[(0)², (0), 1; (2)², (2), 1; (2)², (2), 1] ⋅ [a; b; c] = [6; 9; 9]

This system is underdetermined. The space of solutions implied by the above equation is:
{ f | f(x) = ax² + bx + c, c = 6, 4a + 2b + c = 9}
This means that any solution to the system is a least-squares best fit approximation. One possible best fit approximation is:

f(x) = x² + (-1/2)x + 6

Find a function of the form f(x) = bx + c that is the best least-squares fit for these points.

We follow the same process as in the first part above. The matrix equation is:

[(0), 1; (2), 1; (2), 1] ⋅ [b; c] = [6; 6; 12]

There is no solution to the above equation. However, we are looking for the least-squares best fit approximation. Thus, we first find an orthonormal basis of the image of the matrix. If the matrix is M and f(v) = M ⋅ v, we have that:

im(f) = span { [0; 2; 2], [1; 1; 1] }.

Using the algorithm for finding an orthonormal basis, we obtain:

im(f) = span { [0; 1/√(2); 1/√(2)], [1; 0; 0] }.

We again find the orthogonal projection of [6; 6; 12] onto im(f) and use it to rewrite the equation so that it is not overdetermined:

[(0), 1; (2), 1; (2), 1] ⋅ [b; c] = [6; 9; 9]

This yields c = 6 and b = 3/2. Thus, the least-squares best fit approximation is:

f(x) = (3/2)x + 6
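The result can be cross-checked with a different but equivalent technique. The following minimal Python sketch does not use the projection method above; instead it solves the normal equations (M^⊤ M) ⋅ [b; c] = M^⊤ ⋅ w, which is a standard alternative characterization of the least-squares solution. The function name fit_line is an assumption made for illustration.

```python
from fractions import Fraction

def fit_line(points):
    """Least-squares fit of f(x) = b*x + c via the normal equations (M^T M) [b, c] = M^T w."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) * x for x, _ in points)
    sxy = sum(Fraction(x) * y for x, y in points)
    det = sxx * n - sx * sx   # determinant of M^T M; zero when all x values coincide
    if det == 0:
        raise ValueError("need at least two distinct x values")
    b = (sxy * n - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return b, c

points = [(0, 6), (2, 6), (2, 12)]
b, c = fit_line(points)
fitted = [b * x + c for x, _ in points]   # values of the fitted line at each x
```

The list fitted contains the values of the fitted line at the three x coordinates, which is the orthogonal projection of [6; 6; 12] onto im(f).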

Example: Show that for M ∈ R^n×n, the space of solutions to M x = 0 is a vector space (you do not need to show that all the axioms hold; it is sufficient to show the appropriate closure properties).

Let S be the space of solutions.

We show that 0 ∈ S:
M ⋅ 0 = 0.

We show that if v,v' ∈ S, then v + v' ∈ S:
M ⋅ v = 0
M ⋅ v' = 0
M ⋅ (v + v') = M ⋅ v + M ⋅ v' = 0 + 0 = 0.

We show that if v ∈ S, then for any scalar s ∈ R, s v ∈ S:
M ⋅ v = 0
M ⋅ (s ⋅ v) = s ⋅ (M ⋅ v) = s ⋅ 0 = 0.

Example: Compute the orthogonal projection of v onto im(f) where f(v) = M ⋅ v,

v = [1; -2; 3], and M = [1, 1; 2, 8; 1, 5].

Normally, the way to approach such problems is to find an orthonormal basis for im(f) where f(v) = M ⋅ v, then use that orthonormal basis to project v onto im(f). However, it's a good idea to check for special cases before delving into a large computation. In this particular example, we have that:

[1; -2; 3] ⋅ [1; 2; 1] = 0
[1; -2; 3] ⋅ [1; 8; 5] = 0.

Thus, v is orthogonal to both vectors, so the orthogonal projection of v onto im(f) is 0 ∈ R³.

It is worth noting that v is in the orthogonal complement of im(f). In general, given a vector subspace W ⊂ V, for any w ∈ W^⊥, the orthogonal projection of w onto W is 0.

Example: Is the system below overdetermined or underdetermined? Does it have a solution?

[1, 2, 0; 0, 1, -4; 0, 0, 1] ⋅ v = [8; 3; 0]

The system is neither overdetermined nor underdetermined. It has a solution. Since the matrix is upper triangular, we can use the algorithm for solving a system with an upper triangular matrix to obtain v:
v = [2; 3; 0].
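The algorithm for an upper triangular matrix can be sketched as follows: solve for the last entry first, then move upwards, substituting the entries already known. This is a minimal Python sketch that assumes every diagonal entry is nonzero; the function name back_substitute is an illustrative choice.

```python
from fractions import Fraction

def back_substitute(upper, w):
    """Solve upper * v = w for an upper triangular matrix with nonzero diagonal entries."""
    n = len(w)
    v = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Contribution of the entries of v that are already known.
        known = sum(upper[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (Fraction(w[i]) - known) / upper[i][i]
    return v

v = back_substitute([[1, 2, 0], [0, 1, -4], [0, 0, 1]], [8, 3, 0])
```

Exact fractions are used here so that the result can be compared directly with a hand computation.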

Example: Determine whether the system below has a solution for all possible v ∈ R³. If it does not, describe exactly the set of vectors v for which the system below has a solution and determine whether this set is a vector space.

[1, 2, 2; 0, 2, -1; 1, 4, 1] ⋅ x = v.

Notice that determining whether the system has a solution for any v is equivalent to determining whether the linear transformation represented by the matrix is surjective. It is also equivalent to determining whether the matrix is invertible.

To determine whether the matrix is invertible, we could compute its reduced row echelon form by finding an appropriate series of row operations.

[1, 2, 2; 0, 2, -1; 1, 4, 1] → [1, 2, 2; 0, 1, -1/2; 1, 4, 1] → [1, 0, 3; 0, 1, -1/2; 1, 4, 1] → [1, 0, 3; 0, 1, -1/2; 0, 4, -2] → [1, 0, 3; 0, 1, -1/2; 0, 0, 0]

Given the above, we see that the matrix is not invertible.
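The row reduction can be reproduced mechanically. The following is a minimal Python sketch of a reduced row echelon form computation with exact fractions. The function name rref is an illustrative assumption, and the order of its row operations may differ from the hand computation above even though the final form is the same.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]   # scale the row so the pivot is 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

R = rref([[1, 2, 2], [0, 2, -1], [1, 4, 1]])
is_invertible = R == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

A square matrix is invertible exactly when its reduced row echelon form is the identity, which is what is_invertible tests.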

Let the matrix be M. The set of vectors for which there is a solution to the above equation is im(f) where f(x) = M ⋅ x. Thus, we want to find im(f) explicitly. We could say:
im(f) = span { [1; 0; 1], [2; 2; 4], [2; -1; 1] }.
If we want to be more precise, we could compute rref(M^⊤) to find a basis for im(f).

Since the set of possible vectors v for which the equation has a solution is im(f) and f is a linear transformation, the set of such v is a vector space.

model (language for describing system states)	example of a system state in the model	interpretation	system
R	7	number of giraffes	zoo
R	1	distance in AU between the two objects	the Earth-Sun system
R	5.6	temperature in Celsius	weather in Boston

model (language for describing system states)	a system state in the model	interpretation	system
R with addition (+)	3	number of apples in one of the apple baskets	a collection of apple baskets
R with addition (+)	3 + 2	number of apples in two of the apple baskets	a collection of apple baskets
R with addition (+)	2 + 3	number of apples in two of the apple baskets	a collection of apple baskets
symbolic language	symbol string (a.k.a., "term") in the language	meaning of symbol string	system

term
0
1.2
x	x is a real number
t₁ + t₂	if t₁ and t₂ are terms
t₁ − t₂	if t₁ and t₂ are terms
− t	if t is a term
t₁ ⋅ t₂	if t₁ and t₂ are terms

new term language construct	what it represents
vectors	system state in the model
matrix	transitions between system states
matrix	changes to system states
matrix	relationships between system states
matrix multiplication	composition of transformations of system states

formula	conditions	meaning
true		always true
false		always false
t₁ = t₂	t₁ and t₂ are terms	true only if the meaning (e.g., system state) of t₁ and t₂ is the same
t₁ < t₂	t₁ and t₂ are terms	true only if t₁ and t₂ represent real numbers, and t₁ is less than t₂
f₁ and f₂	if f₁ and f₂ are formulas	only true if both f₁ and f₂ are true; otherwise, it is false
f₁ or f₂	if f₁ and f₂ are formulas	true if f₁, f₂, or both are true; only false if both are false
not f	if f is a formula	true if f is false; false if f is true
f₁ implies f₂	if f₁ and f₂ are formulas	only true if f₂ is true whenever f₁ is true, or equivalently, only true if f₁ is false or f₂ is true
f₁ iff f₂	if f₁ and f₂ are formulas	only true if f₁ and f₂ are both true, or f₁ and f₂ are both false
∀ x ∈ S, f	if S is a set and f is a formula	true only if, for every element of S, replacing x inside f with that element makes f true
∃ x ∈ S, f	if S is a set and f is a formula	true only if there is at least one element of S that can replace x inside f so that f is true

property	definition	example
reflexivity	for any term t, t = t	∀ x ∈ R, x = x `\forall x \in \R, x = x`
symmetry	for any terms t₁ and t₂, t₁ = t₂ implies t₂ = t₁	∀ x,y ∈ R, x = y implies y = x `\forall x,y \in \R, x = y \implies y = x`
transitivity	for any terms t₁, t₂, and t₃, t₁ = t₂ and t₂ = t₃ implies t₁ = t₃	∀ x,y,z ∈ R, x = y and y = z implies x = z `\forall x,y,z \in \R, x = y y = z \implies x = z`

property	definition(s)	algebraic properties for R² u = [x;y], v = [x',y'], w = [x'',y'']
v has length s	||v|| = s or √(v ⋅ v) = s	||v|| = √(xx + yy) = √([x,y] ⋅ [x,y])
v is a unit vector	||v|| = 1 or v ⋅ v = 1	1 = ||v|| = √(xx + yy) = √([x,y] ⋅ [x,y]); 1 = ||v||² = xx + yy = [x,y] ⋅ [x,y]
u and v are linearly dependent; u and v are collinear	∃ a ∈ R, a ⋅ u = v	y/x = y'/x' (the vectors have the same slope)
u and v are linearly independent	∀ a ∈ R, a ⋅ u ≠ v or equivalently, not (∃ a ∈ R, a ⋅ u = v)	y/x ≠ y'/x' (the vectors have different slopes)
u and v are orthogonal	u ⋅ v = 0	y/x = -x'/y'
w is a projection of v onto u	w = (v ⋅ (u/||u||)) ⋅ u/||u||
d is the (Euclidean) distance between u and v	d = ||u - v|| = ||v - u||	d = √((x - x')² + (y - y')²)
L is the unique line parallel to v ∈ R²	L = { a ⋅ v | a ∈ R }; L = { p | ∃ a ∈ R, p = a ⋅ v }	{ [x',y'] | y' = m x' } where m = y/x
L is the unique line orthogonal to v ∈ R²	L = { w | v ⋅ w = 0 }
L is the unique line defined by the two points u ∈ R² and v ∈ R²	L = { a (u - v) + u | a ∈ R }; L = { p | ∃ a ∈ R, p = a (u - v) + u }
P is the unique plane orthogonal to v ∈ R³	P = { w | v ⋅ w = 0 }
P is the unique plane of linear combinations of v, w ∈ R³ where v and w are linearly independent	P = { a v + b w | a ∈ R, b ∈ R }
w is a linear combination of u and v	∃ a,b ∈ R, w = au + bv
{u, v, w} are linearly independent	not (u is a linear combination of v and w) and not (v is a linear combination of u and w) and not (w is a linear combination of u and v)

	chickens	cows
heads	1 head/chicken	1 head/cow
legs	2 legs/chicken	4 legs/cow

term	definition	restrictions	general properties
M₁ + M₂	component-wise	matrices must have the same number of rows and columns	commutative, associative, has identity (matrix with all 0 components), has inverse (multiply matrix by -1), scalar multiplication is distributive
M₁ ⋅ M₂	row-column-wise dot products	columns in M₁ = rows in M₂; rows in M₁ ⋅ M₂ = rows in M₁; columns in M₁ ⋅ M₂ = columns in M₂	associative, has identity I (1s in diagonal and 0s elsewhere), distributive over matrix addition, not commutative in general, no inverse in general
M^-1		columns in M = rows in M; matrix is invertible	M^-1 ⋅ M = M ⋅ M^-1 = I

level of abstraction	interpretations of multiplication of a vector by a matrix
applications	transformation of system states	extraction of information about system states	computing properties of combinations or aggregations of objects (or system states)	conversion of system state observations from one set of dimensions to another
geometry	"moving" vectors in a space (stretching, skewing, rotating, reflecting)	projecting vectors	taking a linear combination of two vectors	reinterpreting vector notation as referring to a collection of non-canonical vectors

level of abstraction	interpretations of multiplication of two matrices
applications	composition of system state transformations or conversions
geometry	sequencing of motions of vectors within a space (stretching, skewing, rotating, reflecting)

level of abstraction	invertible matrix		singular matrix
applications	reversible transformation of system states	extraction of complete information uniquely determining a system state	irreversible transformation of system states	extraction of incomplete information about a system state
geometry	reversible transformation or motion of vectors in a space		projection onto a strict subset of a set of vectors (space)
symbolic	reversible transformation of information numerically encoded in matrix (example of such information: system of linear equations encoded as matrix)		irreversible/"lossy" transformation of information encoded in matrix

subset of R^n×n	definition	closed under matrix multiplication	properties of matrix multiplication	inversion
identity matrix	∀ i,j M_ij = 1 if i=j, 0 otherwise	closed	commutative, associative, distributive with addition, has identity	has inverse (itself); closed under inversion
elementary matrix	can be obtained via an elementary row operation from I: add a nonzero multiple of one row of the matrix to another row; multiply a row by a nonzero scalar; swap two rows of the matrix. Note: the third is a combination of the first two operations.		associative, distributive with addition, have identity	have inverses; closed under inversion
scalar matrices	∃ s ∈ R, ∀ i,j M_ij = s if i=j, 0 otherwise	closed	commutative, associative, distributive with addition, have identity	nonzero members have inverses; closed under inversion
diagonal matrices	∀ i,j M_ij ∈ R if i=j, 0 otherwise	closed	associative, distributive with addition, have identity	members with all diagonal entries nonzero have inverses; closed under inversion
matrices with constant diagonal	∀ i,j M_ii = M_jj		associative, distributive with addition, have identity
symmetric matrices	∀ i,j M_ij = M_ji		associative, distributive with addition, have identity
symmetric matrices with constant diagonal	∀ i,j M_ii = M_jj and M_ij = M_ji	closed	commutative, associative, distributive with addition, have identity
upper triangular matrices	∀ i,j M_ij = 0 if i > j	closed	associative, distributive with addition, have identity	not invertible in general; closed under inversion when invertible
lower triangular matrices	∀ i,j M_ij = 0 if i < j	closed	associative, distributive with addition, have identity	not invertible in general; closed under inversion when invertible
invertible matrices	∃ M^-1 s.t. M^-1 M = M M^-1 = I	closed	associative, distributive with addition, have identity	all members have inverses; closed under inversion
square matrices	all of R^n×n	closed	associative, distributive with addition, have identity

M is ...	algorithm to solve M v = w for v
the identity matrix	w is the solution
an elementary matrix	perform a row operation on M to obtain I; perform the same operation on w
a scalar matrix	divide the components of w by the scalar
a diagonal matrix	divide each component of w by the corresponding matrix component
an upper triangular matrix	start with the last entry in v, which is easily obtained; move backwards through v, filling in the values by substituting the already known variables
a lower triangular matrix	start with the first entry in v, which is easily obtained; move forward through v, filling in the values by substituting the already known variables
product of a lower triangular matrix and an upper triangular matrix	combine the algorithms for upper and lower triangular matrices in sequence (see example below)
an invertible matrix	compute the inverse and multiply w by it

M is in row echelon form	all nonzero rows are above any rows consisting of all zeroes; the first nonzero entry (from the left) of a nonzero row is strictly to the right of the first nonzero entry of the row above it; all entries in a column below the first nonzero entry in a row are zero (the first two conditions imply this)
M is in reduced row echelon form	M is in row echelon form; the first nonzero entry in every row is 1; this 1 entry is the only nonzero entry in its column

M is invertible	rref M ≠ I
the matrix M^-1 exists	the last row of rref M is all zeroes
(E₁ ⋅ ... ⋅ E_n) M = rref M
((E₁ ⋅ ... ⋅ E_n) M) M^-1 = (rref M) ⋅ M^-1
E₁ ⋅ ... ⋅ E_n = (rref M) ⋅ M^-1	the last row of (rref M) ⋅ M^-1 is all zeroes
(rref M) ⋅ M^-1 is invertible because it is a product of the invertible matrices E₁, ..., E_n	(rref M) ⋅ M^-1 is not invertible because multiplication by it is a many-to-one function

fact				justification
(1)	{M | M is a finite product of elementary matrices}	=	{M | rref M = I}	I is an elementary matrix; sequences of row operations are equivalent to multiplication by elementary matrices
(2)	{M | M is a finite product of elementary matrices}	⊂	{M | M is invertible}	elementary matrices are invertible; products of invertible matrices are invertible
(3)	{M | rref M = I}	⊂	{M | M is invertible}	fact (1) in this table; fact (2) in this table; transitivity of equality
(4)	{M | M is invertible}	⊂	{M | rref M = I}	proof by contradiction; non-invertible M implies rref M has all zeroes in bottom row
(5)	{M | M is invertible}	=	{M | rref M = I}	for any sets A,B, A ⊂ B and B ⊂ A implies A = B
(6)	{M | M is a finite product of elementary matrices}	=	{M | M is invertible}	fact (1) in this table; fact (5) in this table; transitivity of equality

kind of set (of vectors)	maximum cardinality ("quantity of elements")	solution space of a...	examples
finite set of vectors	finite		{(0,0)} {(2,3),(4,5),(0,1)}
vector space	infinite	homogeneous system of linear equations: M ⋅ v = 0	{(0,0)} R R² span{(1,2),(2,3),(0,1)} any point, line, or plane intersecting the origin
affine space	infinite	nonhomogeneous system of linear equations: M ⋅ v = w	{ a + v | v ∈ V} where V is a vector space and a is a vector; any point, line, or plane

vector space	addition operation	additive identity	scalar multiplication operation
R	addition of real numbers	0	multiplication of real numbers
R²	vector addition: [ a ; b ] + [ c ; d ] = [ a + c ; b + d ]	[ 0 ; 0 ]	scalar multiplication: s ⋅ [ a ; b ] = [ s ⋅ a ; s ⋅ b ]
R³	vector addition: [ a ; b ; c ] + [ d ; e ; f ] = [ a + d ; b + e ; c + f ]	[ 0 ; 0 ; 0]	scalar multiplication: s ⋅ [ a ; b ; c ] = [ s ⋅ a ; s ⋅ b ; s ⋅ c ]
Rⁿ	vector addition: [ a₁ ; ... ; a_n ] + [ b₁ ; ... ; b_n ] = [ a₁ + b₁ ; ... ; a_n + b_n ]	[ 0 ; ... ; 0 ]	scalar multiplication: s ⋅ [ a₁ ; ... ; a_n ] = [ s ⋅ a₁ ; ... ; s ⋅ a_n ]
span { [0;0] } = { [0;0] }	vector addition	[0;0]	scalar multiplication of vectors in R²
span { v₁ , ... , v_k } ⊂ Rⁿ	vector addition	[ 0 ; ... ; 0 ]	scalar multiplication of vectors in Rⁿ
R^2×2	matrix addition	[ 0 , 0 ; 0 , 0 ]	scalar multiplication of a matrix
R^n×n	matrix addition	[ 0, ..., 0 ; ... ; 0, ..., 0 ]	scalar multiplication of a matrix
affine space with origin at a ∈ R²	v ⊕ w = (v - a) + (w - a) + a	a	s ⊗ v = s ⋅ (v - a) + a
set of lines through the origin f(x) = a x	f ⊕ g = h where h(x) = f(x) + g(x)	f(x) = 0 ⋅ x	s ⊗ f = h where h(x) = s × f(x)
set of polynomials of degree 2 f(x) = a x² + b x + c	f ⊕ g = h where h(x) = f(x) + g(x)	f(x) = 0	s ⊗ f = h where h(x) = s × f(x)
set of polynomials of degree k f(x) = a_k x^k + ... + a₀	f ⊕ g = h where h(x) = f(x) + g(x)	f(x) = 0	s ⊗ f = h where h(x) = s × f(x)

concept related to solving M_B x = v	notation	relationship	notation	geometric concept
the space of values W' with which we can replace w in the overdetermined system M v = w to make a system M v = w' that has solutions	{M ⋅ v | v ∈ Rⁿ}	the span of the columns of M	span B	the subspace W' of W spanned by B
the error of an approximate solution v'	||w - M v'||	M v' = w'	||w - w'||	the distance between w ∈ W and w' ∈ span B
M ⋅ v* where v* is the minimum error solution	for all v', ||w - M v*|| ≤ ||w - M v'||	M v* = w*	for all w' ∈ span B, ||w - w*|| ≤ ||w - w'||	the orthogonal projection w* of w ∈ W onto span B (the closest vector in span B to w)

construct	definition	example	graphical example
V × W	{ (v,w) | v ∈ V, w ∈ W }	{1,2,3} × {4,5,6} = {(1,4),(1,5),(1,6), (2,4),(2,5),(2,6), (3,4),(3,5),(3,6)}
R is a relation between V and W	R ⊂ V × W	{(1,D), (2,B), (2,C)} is a relation between {1,2,3,4} and {A,B,C,D}
f is a function from V to W; f is a map from V to W	f is a relation between V and W and ∀ v ∈ V, there is at most one w ∈ W s.t. f relates v to w	{ (x,f(x)) | f(x) = x² }
R^-1 is the inverse of R	{ (w,v) | (v,w) ∈ R }
f: X → Y is injective
f: X → Y is surjective
f: X → Y is bijective