The lecture notes below are presented for archival purposes as they appeared at the time that this course was originally offered. The latest version of these notes is available at https://lapets.io/courselinearalgebra.
NOTE: This page contains all the machineverifiable examples presented during the lectures, as well as all the homework assignments. Click here to go back to the main page with the course information and schedule.
This section introduces at a high level the direction and goals of this course, as well as some terminology.
Do not worry if all the details are not clear; all this material will be covered in greater detail and with many more examples during the course.
1.1. Models, systems, and system states
When many realworld problems are addressed or solved mathematically and computationally, the details of those problems are abstracted away until they can be represented directly as idealized mathematical structures (e.g., numbers, sets, matrices, and so on). We can talk
about this distinction by calling the realworld problems systems (i.e., organized collections of interdepentend components) that have
distinguishable system states, and the abstract versions of these systems that only capture some parts or aspects of the different
system states as models. Since models are abstractions, we must also consider the interpretation of a model that explains how
the different parts of the model are related to the different parts of the system (i.e., what each part of the model represents).
Definition: We represent the set of all real numbers using the symbol R.
R
=
{ 1, 1.2, 3, 4, 1/3, π, e, ... }
The set of real numbers is a very simple kind of model that can be used to represent
quantities or, by considering quantities and fractions of units (e.g., inch, meter, kilogram, and so on), a magnitude.
If we adopt R as our model of a system, we can then represent individual system states using individual real numbers.
model (language for describing system states)
example of a system state in the model
interpretation
system
R
7
number of giraffes
zoo
R
1
distance in AU between the two objects
the EarthSun system
R
5.6
temperature in Celsius
weather in Boston
Thus, real numbers are a very simple symbolic language for modelling systems. Furthermore, this language allows us to describe characteristics of
systems concisely ("7" is much more concise than a drawing of seven giraffes).
1.2. A language of system states: terms and their algebraic properties
Real numbers can be used to characterize the quantities or magnitudes of parts of a system, but this is often not enough.
We may want to capture within our models certain kinds of changes that might occur to a system. In order to do so, we must
add something to our language of models: binary operators such as +, , ⋅, and so on.
model (language for describing system states)
a system state in the model
interpretation
system
R with addition (+)
3
number of apples in one of the apple baskets
a collection of apple baskets
R with addition (+)
3 + 2
number of apples in two of the apple baskets
a collection of apple baskets
R with addition (+)
2 + 3
number of apples in two of the apple baskets
a collection of apple baskets
symbolic language
symbol string (a.k.a., "term") in the language
meaning of symbol string
system
The different parts of this language for models have technical names. So far, we have seen real numbers and operators. We can build up
larger and larger symbol strings using operators and real numbers. These are all called terms.
Definition: The following table defines the collection of symbol strings corresponding to terms.
term
0
1.2
x
x is a real number
t_{1} + t_{2}
if t_{1} and t_{2} are terms
t_{1} − t_{2}
if t_{1} and t_{2} are terms
− t
if t is a term
t_{1} ⋅ t_{2}
if t_{1} and t_{2} are terms
Note that in these notes, we use the multiplication symbols ⋅ and * interchangeably, or omit them entirely.
Notice that when our language for models only had real numbers, each symbol corresponded to a unique system state. However, by introducing
operators into our language for describing system states, we are now able to write down multiple symbol strings that represent the same
system state (e.g., 2 + 3 and 3 + 2). In other words, "2 + 3" and "3 + 2" have the same meaning.
Exercise: List as many of the algebraic properties of the operators + and ⋅ over the real numbers as you can.
For all real numbers x∈R, y∈R, and z∈R, it is true that:
x + y
=
y + x
x + (y + z)
=
(x + y) + z
x + 0
=
x
x ⋅ y
=
y ⋅ x
x ⋅ (y ⋅ z)
=
(x ⋅ y) ⋅ z
x ⋅ 1
=
x
x ⋅ (y + z)
=
(x ⋅ y) + (x ⋅ z)
x + (x)
=
0
1.3. Extending the language of models: vectors and matrices
Real numbers and operators are still not sufficient to capture many interesting aspects of realworld systems: they can only capture quantities,
onedimensional magnitudes (e.g., distance). What if we want to capture multidimensional magnitudes (e.g., temperature and pressure)
or relationships between different quantities? These can correspond to points, lines, planes, spaces; relationships and changes within
such systems can be modelled as translations, rotations, reflections, stretching, skewing, and other transformations.
Example: Suppose we have $12,000 and two investment opportunities: one has an annual return of 10%, and the other has an annual return
of 20%. How much should we invest in each opportunity to get exactly $1800 over one year?
The above problem can be represented as a system in which two magnitudes x∈R and y∈R have a relationship. Typically, this is
represented using a system of equations:
x + y
=
12000
0.1 x + 0.2 y
=
1800
The possible solutions to this system are the collection of all pairs of real numbers that can be assigned to the pair (x,y). These are the
possible system states. Notice that these can be interpreted directly as points on a plane.
How does the above example suggest we might extend our language for system states in a model? We might consider ordered pairs (x,y) of
real numbers. More generally, we might consider ordered lists (x_{1},...,x_{n}) of real numbers. In this course,
we will introduce a few new kinds of terms into our language for models that can help us represent these sorts of relationships within systems:
vectors and matrices. We will also study the algebraic properties of this extended language.
new term language construct
what it represents
vectors
system state in the model
matrix
transitions between system states
matrix
changes to system states
matrix
relationships between system states
matrix multiplication
composition of transformations of system states
1.4. Review: sets and formulas
Definition: The symbols and, or, and not are logical operators. The symbols ∀ and ∃ are
quantifiers, where ∀ is read "for all" and is called the universal quantifier, while ∃ is read "exists"
and is called the existential quantifier.
Definition: Strings of mathematical symbols that can be either true or false are called formulas. The following
table describes how formulas can be built up using logical operators and logical quantifiers, and their meanings.
formula
conditions
meaning
true
always true
false
always false
t_{1} = t_{2}
t_{1} and t_{2} are terms
true only if the meaning (e.g., system state) of t_{1} and t_{2} is the same
t_{1} < t_{2}
t_{1} and t_{2} are terms
true only if t_{1} and t_{2} represent real numbers, and t_{1} is less than t_{2}
f_{1}andf_{2}
if f_{1} and f_{2} are formulas
only true if both f_{1} and f_{2} are true; otherwise, it is false
f_{1}orf_{2}
if f_{1} and f_{2} are formulas
true if f_{1}, f_{2}, or both are true; only false if both are false
notf
if f is a formula
true if f is false; false if f is true
f_{1}impliesf_{2}
if f_{1} and f_{2} are formulas
only true if f_{2} is true whenever f_{1} is true, or equivalently, only true if f_{1} is false or f_{2} is true
f_{1}ifff_{2}
if f_{1} and f_{2} are formulas
only true if f_{1} and f_{2} are both true, or f_{1} and f_{2} are both false
∀ x∈S, f
if S is a set and f is a formula
true only if taking for every element of S, replacing x inside f with that element makes f true
∃ x∈S, f
if S is a set and f is a formula
true only if there is at least one element of S that can replace x inside f so that f is true
Definition: The symbols =, <, >, ≤, ≥, ≠, and ∈ are
relational operators or relations. Given two terms, t_{1} and t_{2}, apply any of these operators to the terms produces
a formula that is either true of false (e.g., "5 ≤ 6").
Fact: The relational operator = (the equality relation or just equality) has the following properties.
The above properties always apply to =. The properties below are assumptions we
make in this course (they are extensions of the definition of equality):
equality of vectors: for any terms t_{1}, t_{2}, and t_{3}, t_{1} = t_{3} and t_{2} = t_{4} iff
t_{1}
t_{2}
=
t_{3}
t_{4}
replacement:
for any terms t_{1}, t_{2}, and t_{3}, t_{1} = t_{2} iff
t_{1}
t_{3}
=
t_{2}
t_{3}
t_{3}
t_{1}
=
t_{3}
t_{2}
2. Vectors
In this section we consider a particular kind of term: a vector. A vector can have many interpretations, but the most common
is that of a point in a geometric space. We first define a notation for names of vectors. Vectors will be written using any
of the the following equivalent notations:
(2,3) [2;3]
2
3
(x, y, z) [x; y; z]
x
y
z
2.1. Defining operations on vectors
We have introduced a new kind of term (vectors) with their own corresponding notation.
As with R, the set of real numbers, we define symbol to represent the sets of vectors.
Definition: We represent the set of all vectors with two components using the symbol R^{2}:
R^{2}
=
{
x
y
 x∈R, y∈R }
Definition: For positive n∈N, we represent the set of all vectors with n components using the symbol R^{n}:
R^{n}
=
{
x_{1}
⋮
x_{n}
 x_{1}∈R, ..., x_{n}∈R }
We will also use symbols to name some geometric manipulations of vectors. One such operation is addition. Suppose we treat vectors as paths
from (0,0), so that (2,3) is the path from (0,0) to (2,3) and (1,2) is the path from (0,0) to (1,2). Then vector addition would represent
the final destination if someone walks first along the length and direction specified by one vector, and then from that destination along
the length and direction specified by the other. The following definition for the operation + and vectors of two components corresponds
to this intuition.
Definition: The following formula is assumed to be true; it defines what operation the symbol + represents when applied to two vectors.
We call + vector addition in this context.
Notice that we have defined an entirely new operation; we are merely reusing (or "overloading") the + symbol to represent this operation.
We cannot assume that this operation has any of the properties we normally associate with addition of real numbers. However, we do know
that according to our interpretion of vector addition (walking along one vector, then along the other from that destination), this operation
should be commutative. Does our symbolic definition conform to this interpretation? If it does, then we should be able to show that
[x;y] + [x';y'] and [x+x';y+y'] are names for the same vector. Using the commutativity of the real numbers and our definition
above, we can indeed write the proof of this property.
Recall that we can view multiplication by a number as repeated addition. Thus, we could use this intuition and our new notion of vector addition
defined above to define multiplication by a real number; in this context, it will be called a scalar, and the operation is known as
scalar multiplication.
Definition: The following formula is assumed to be true; it defines what operation the symbol ⋅ represents when applied to
a scalar (a real number) s∈R and a vector. We call ⋅ scalar multiplication in this context.
Note that, as with multiplication, the symbol is sometimes omitted.
Scalar multiplication has some of the intuitive algebraic properties that are familiar to us from our experience with R. Below, we provide
proofs of a few.
Fact: The following fact is true for scalar multiplication.
s (t [x; y]) = s ([t x; t y]) `` = [ s (t x); s (t y) ] `` = [ (s t) x; (s t) y ] `` = [ (t s) x; (t s) y ] `` = [ t (s x); t (s y) ] `` = t [ s x; s y ] `` = t (s ( [x; y]))
# alternatively, we could also derive...
[ s (t x); s (t y) ] = [ (s t) x; (s t) y ] `` = (s t) [x;y]
Once we have defined scalar multiplication, we can now assign a meaning to the negation operator (interpreting v for any vector v∈R^{2} as
referring to the scalar multiplication −1 ⋅ v).
Definition: The following formula is assumed to be true; it defines what operation the symbol − represents when applied to
a vector.
The above definition allows us to define vector subtraction: for two vectors v and w∈R^{2}, v  w = v + (w).
Notice that we did not define vector division. In fact, division by a vector is undefined (in the same way that division by 0 is undefined).
However, we can prove a cancellation law for vectors.
Fact: Let [x; y] = v∈R^{2} be a vector with at least one nonzero real number component,
and let a,b∈R be scalars. Suppose that a ⋅ v = b ⋅ v. If x is a nonzero component, then we have that:
a ⋅ v
=
b ⋅ v
a ⋅
x
y
=
b ⋅
x
y
a ⋅ x
=
b ⋅ x
a
=
b
Notice that we used division by x, a real number.
2.2. Properties of vector operations
In this course, we will study collections of vectors that have particular properties. When a collection of vectors satisfies these properties,
we will call it a vector space. We list these eight properties (in the form of equations) below. These must hold for any u, v, and w
in the collection, and 0 must also be a vector in the collection.
u + (v + w)
=
(u + v) + w
u + v
=
v + u
0 + v
=
v
v + (v)
=
0
1 * v
=
v
s * (u + v)
=
(s * u) + (s * v)
(u + v) * s
=
(u * s) + (v * s)
s * (t * u)
=
(s * t) * u
Note that these are not unique; we could have specified an alternative equation for the additive identity.
v + 0 = v
Each equation can be derived from the other using the commutativity
v + 0 = 0 + v
These eight equations can be called axioms (i.e., assumptions) when we are studying vector spaces without thinking about the internal
representation of vectors. For example, the above derivation of the alternative additive identity equation is based entirely on the axioms
and will work for any vector space. However, to show that some other structure, such as R^{2}, is a vector space we must prove that these
properties are satisfied. Thus, to define a new vector space, we usually must:
define a way to construct vectors;
define a vector addition operator + (i.e., how to add vectors that we have constructed);
specify the vector that is the additive identity (call it 0);
define vector inversion: a process for constructing the inverse of a vector;
prove that the eight properties of a vector space are satisfied by the vectors, +,0, and *.
Example: Assuming the following equation is true, solve for x∈R and y∈R:
[x; 2 * x] = 5 * [2 * y; 20] `` = [5 * (2 * y); 5 * 20] [x; 2 * x] = [(5 * 2) * y; 100] 2 * x = 100 0.5 (2 * x) = 0.5 * 100 (0.5 * 2) x = 50 1 * x = 50 x = 50 x = 10 * y 0.1 * 50 = 0.1 * (10 * y) `` = (0.1 * 10) * y `` = 1 * y `` = y y = 5
2.3.Assignment #1: Vector Algebra
In this assignment you will perform stepbystep algebraic manipulations involving vectors and vector operations. In doing
so, you will assemble machineverifiable proofs of algebraic facts.
Your proofs must be automatically verifiable using the proof verifier.Please submit a single text file a1.txt
containing your proof scripts for each of the problem parts below.
Working with machine verification can be frustrating; minute operations that are normally implied must often be explicit.
However, the process should familiarize you with common practices of rigorous algebraic reasoning in mathematics: finding, applying,
or expanding definitions to reach a particular formula that represents a desired argument or solution.
Finish the proof below by solving for a using a sequence of permitted algebraic manipulations.
`([x; y]) is a unit vector` `([x*x; y*y]) and ([1; 1]) are orthogonal`
\implies # ... put proof steps here ...
? # this should be a contradiction
We have shown in lecture that R^{2}, together with vector addition and scalar multiplication, satisfies some of the vector space axioms.
In this problem, you will show that the remaining axioms are satisfied.
Finish the proof below showing that [0; 0] is a left identity for addition of vectors in R^{2}.
∀ s, x, y, x', y' ∈R, # ... put proof steps here ...
s (
x
y
+
x'
y'
) = (s
x
y
) + (s
x'
y'
)
\forall s, x, y, x', y' \in \R, # ... put proof steps here ...
s ([x; y] + [x'; y']) = (s [x; y]) + (s [x'; y'])
Any point p on the line between vectors u and v can be expressed as a(uv)+u for some scalar a. In fact, it can also be
expressed as b(uv)+v for some other scalar b. In this problem, you will prove this fact for R^{3}.
Given some p = a(uv)+u, find a formula for b in terms of a so that p = b(uv)+v. Add this formula to the beginning
of the proof (after a = ) and then complete the proof.
The verifier's library contains
a variety of derived algebraic properties for R and R^{3} in addition to the vector space axioms for R^{3}. Look over them
to see which might be useful.
2.4. Common vector properties, relationships, and operators
We introduce two new operations on vectors (e.g. u and v): the norm ( ... ) and the dot product (u ⋅ v). When interpreting some other structure,
such as R^{2}, as a vector space, we must provide definitions for these operators in order to use them.
Definition: The operator ⋅ when applied to two vectors with the same number of components is called the dot product, and
is defined for x,y,x',y' ∈R as:
[x; y] ⋅ [x'; y']
=
x*x' + y*y'
The operator  ...  when applied to a vector is called the norm and is defined as:
 [x;y] 
=
√(x*x + y*y)
The norm  v  of a vector v represents its length in Euclidean space.
Notice that the dot product is a new, distinct form of multiplication (distinct from multiplication of real numbers, and distinct from scalar multiplication). Also
notice that the two operations are related:
Fact: The following equation is true for all x,y,x',y' ∈R:
 [x; y] 
=
√(x*x + y*y) = √([x; y] ⋅ [x; y])
These operations can be shown to have various algebraic properties.
Fact: The dot product is commutative. Notice that below, the dot product is a real number.
We also introduce several vector properties. Some deal with a single vector; some deal with two vectors; and some deal with three vectors.
Definition: The table below summarizes the definitions of several vector properties and relationships and how they are related in some
cases to vector operators and associated algebraic properties.
property
definition(s)
algebraic properties for R^{2} u = [x;y], v = [x',y'], w = [x'',y'']
u and v are linearly dependent u and v are collinear
∃ a∈R, a ⋅ u = v
y/x = y'/x' (the vectors have the same slope)
u and v are linearly independent
∀ a∈R, a ⋅ u ≠ v or equivalently, not (∃ a∈R, a ⋅ u = v)
y/x ≠ y'/x' (the vectors have different slopes)
u and v are orthogonal
u ⋅ v = 0
y/x = x'/y'
w is a projection of v onto u
w = (v ⋅ (u/u)) ⋅ u/u
d is the (Euclidean) distance between v and w
d = u  v = v  u
d = √((x  x')^{2} + (y  y')^{2})
L is the unique line parallel to v∈R^{2}
L = { a ⋅ v  a∈R } L = { p  ∃ a∈R, p = a ⋅ v }
{ [x',y']  y' = mx' } where m = y/x
L is the unique line orthogonal to v∈R^{2}
L = { w  v ⋅ w = 0 }
L is the unique line defined by the two points u∈R^{2} and v∈R^{2}
L = { a (u  v) + u  a∈R } L = { p  ∃ a∈R, p = a (u  v) + u }
P is the unique plane orthogonal to v∈R^{3}
P = { w  v ⋅ w = 0 }
P is the unique plane of linear combinations of v, w∈R^{3} where v and w are linearly independent
P = { av + bw  a∈R, b∈R }
w is a linear combination of u and v
∃ a,b∈R, w = au + bv
{u, v, w} are linearly independent
not (u is a linear combination of v and w) and not (v is a linear combination of u and w) and not (w is a linear combination of u and v)
In R^{2}, we can derive the linear independence of [x;y] and [x';y'] from the orthogonality of [x;y] and [x';y'] using a proof
by contradiction.
Fact: If [x;y] and [x';y'] are orthogonal (and have nonzero length), then they are linearly independent.
Suppose that [x;y] and [x';y'] are both orthogonal and linearly dependent. Since they are linearly dependent, we have that
there exists an a∈R such that:
x'
y'
=
a ⋅
x
y
x'
=
a ⋅ x
y'
=
a ⋅ y
Since they are orthogonal, we have that:
x
y
⋅
x'
y'
=
0
x
y
⋅
a ⋅ x
a ⋅ y
=
0
x ⋅ (a ⋅ x) + y ⋅ (a ⋅ y)
=
0
a ⋅ (x^{2} + y^{2})
=
0
x^{2} + y^{2}
=
0
x^{2}
=
 y^{2}
(x^{2})/(y^{2})
=
 1
(x/y)^{2}
=
 1
x/y
=
√(1)
No real numbers y and x satisfy the above equation, so we must have introduced a contradiction by supposing that [x;y] and [x';y']
are not linearly independent. Thus, they must be linearly independent.
Also, notice that [0;0] is the only solution to x^{2} + y^{2} = 0. Thus, all vectors in
R^{2} are linearly dependent, linearly independent, and orthogonal with [0;0].
Fact: If v∈R^{2} is a vector that has nonzero length,
then we know that the vector v/v: (1) is linearly dependent with v, and (2) is a unit vector.
Because v is a vector, then v is a real number that is nonzero, and 1/v is a real number. By our definition of
linear dependence, (1/v) ⋅ v = v/v is linearly dependent with v because 1/v exists.
Thus,  v/v  = 1, so v/v is a unit vector.
Example: List all the unit vectors that are linearly dependent with:
5
12
It is sufficient to solve for s∈R, x∈R, and y∈R that satisfy the following equations:
s
x
y
=
5
12
(x,y)
=
1
One approach is to write x and y in terms of s and then solve for s using the second equation.
Example: Solve the following three problems.
Solve the following equation for x∈R:

x
2 ⋅ √(2)

=
3
Solve the following equation for x∈R and y∈R:
x
y
has length 10
x
y
and
2
0
are linearly dependent
Determine if the following two vectors are linearly dependent or linearly independent:
2
3
0
and
2
0
1
In the case of vectors in R^{2}, the properties of linear dependence and orthogonality can be defined in terms of
the slopes of the vectors involved. However, note that these definitions in terms of slope are a special case for R^{2}.
The more general definitions (in terms of scalar multiplication and dot product, respectively) apply to any vectors in a vector space.
Thus, we can derive the slope definitions from the more general definitions.
Fact: If two vectors [x;y] ∈R^{2} and [x';y'] ∈R^{2} are linearly dependent, then x/y = x'/y'.
If the two vectors a linearly dependent, then there exists a scalar s∈R such that:
s ⋅
x'
y'
=
x
y
s ⋅ x'
s ⋅ y'
=
x
y
s ⋅ x'
=
x
s ⋅ y'
=
y
(s ⋅ x') / (s ⋅ y')
=
x/y
x'/y'
=
x/y
Fact: If two vectors [x;y] ∈R^{2} and [x';y'] ∈R^{2} are orthogonal, then x/y = y'/x':
x
y
⋅
x'
y'
=
0
x ⋅ x' + y ⋅ y'
=
0
x ⋅ x'
=
 y ⋅ y'
x
=
 y ⋅ y'/x'
x/y
=
 y'/x'
Example: List all unit vectors orthogonal to:
4
−3
It is sufficient to solve for x∈R and y∈R that satisfy the following equations:
x
y
⋅
4
−3
=
0
(x,y)
=
1
We can solve the above for the two vectors by first solving for both possible values of
x, then finding the corresponding values for y:
4 x − 3 y
=
0
y
=
(4/3) x
√(x^{2} + y^{2})
=
1
√(x^{2} + ((4/3) x)^{2})
=
1
√((9/9)x^{2} + (16/9)x^{2}
=
1
√((25/9) x^{2})
=
1
√((25/9) x^{2})
=
1
± (5/3) x
=
1
x
=
± 3/5
Thus, the two vectors are:
x
y
∈
{
3/5
4/5
,
−3/5
−4/5
}
Orthogonal projections of vectors have a variety of interpretations and applications; one simple interpretation of the projection of
a vector v onto a vector w is the shadow of vector v on w with respect to a light source that is orthogonal to w.
Example: Consider the vector following vectors, where u, v∈R^{2} and u is a unit vector:
v
=
x
y
u
=
1
0
What is the orthogonal projection of v onto u? It is simply the x component of v, which is x. Thus:
x
0
is the projection of v onto u
Notice that we can obtain the length of the orthogonal projection using the dot product:
v ⋅ u
=
x
y
⋅
1
0
=
x
Since u is a unit vector, we can then simply multiply u by the scalar x to obtain the actual projection:
(v ⋅ u) ⋅ u
=
(
x
y
⋅
1
0
) ⋅
1
0
=
x ⋅
1
0
=
x
0
The above example can be generalized to an arbitrary vector v and unit vector u; then, it can be generalized to any vector u by using
the fact that u/u is always a unit vector.
Example: Compute the orthogonal projection of v onto u where:
Notice that we can set m = (y_{1}  y_{2})/(x_{1}  x_{2}) because that is exactly the slope of the line between [x_{1};y_{1}] and [x_{2};y_{2}].
y
=
m (x  x_{1}) + y_{1}
y
=
mx  mx_{1} + y_{1}
We see that we can set b =  mx_{1} + y_{1}.
m
=
(y_{1}  y_{2})/(x_{1}  x_{2})
b
=
 mx_{1} + y_{1}
y
=
mx + b
L
=
{
x
y
 y = mx + b }
2.5. Solving common problems involving vector algebra
We review the operations and properties of vectors introduced in this section by considering several example problems.
Example: Are the vectors [2; 1] and [3; 2] linearly independent?
There are at least two ways we can proceed in checking pairwise linear independence. Both involve checking if the vectors
are linearly dependent. If they are linearly dependent, then they cannot be linearly independent. If they are not linearly
dependent, they must be linearly independent.
We can compare the slopes; we see they are different, so they must be linearly independent.
1/2 ≠ 2/3
We can also use the definition of linear dependence. If they are linearly dependent, then we know that
∃ a∈R, a
2
1
=
3
2
Does such an a exist? We try to solve for it:
a
2
1
=
3
2
2a
=
3
a
=
2
4
=
3
Since we derive a contradiction, there is no such a, so the two vectors are not linearly dependent, which means they
are linearly independent.
Example: Given v = [15; 20], list the vectors that are orthogonal to v, but of the same length as v.
The two constraints on the vectors [x;y] we seek are:
15
20
⋅
x
y
=
0

15
20

=

x
y

We take the first constraint and solve for x in terms of y.
15
20
⋅
x
y
=
0
15x + 20y
=
0
x
=
(20/15) y
We now plug this into the second equation.

15
20

=

(20/15) y
y

√(15^{2} + 20^{2})
=
√((400/225) y^{2} + y^{2})
√(625)
=
√((625/225) y^{2})
25
=
± (25/15) y
15
=
± y
y
=
± 15
Thus, the vectors are [20; 15] and [20; 15].
Example: Given constants a,b,c∈R, find a vector orthogonal to the plane P defined
by:
P
=
{
x
y
z
 a (x + y + z) + b (y + z) + cz = 0 }.
We only need to rewrite the equation defining the plane in a more familiar form. For any [x; y; z] ∈P, we know that:
a (x + y + z) + b (y + z) + cz
=
0
ax + (a + b) y + (a + b + c) z
=
0
a
a + b
a + b + c
⋅
x
y
z
=
0
In order to be orthogonal to a plane, a vector must be orthogonal to all vectors [x;y;z] on that plane.
Since all points on the plane are orthogonal to [a; a + b; a + b + c] by the above argument, [a; a + b; a + b + c] is such a point.
Example: Define the line L that is orthogonal to the vector [a; b] but also crosses [a; b] (i.e, [a; b] falls on the line).
We know that the line must be parallel to the vector that is orthogonal to [a; b]. The line L_{0} crossing [0; 0] that is orthogonal
to [a; b] is defined as:
L_{0}
=
{
x'
y'

a
b
⋅
x'
y'
= 0 }.
However, we need the line to also cross the point [a; b]. This is easily accomplished by adding the vector [a; b] to all the points
on the orthogonal line going through [0; 0] (as defined above). Thus, we have:
L
=
{
x'
y'
+
a
b

a
b
⋅
x'
y'
= 0 }.
If we want to find points [ x ; y ] on the line directly without the intermediate term [ x' ; y' ], we can solve for
[ x' ; y' ] in terms of [ x ; y ]:
x
y
=
x'
y'
+
a
b
x'
y'
=
x
y
−
a
b
We can then substitute to obtain a more direct definition of L (in terms of a constraint on the
vectors [ x ; y ] in L):
L
=
{
x'
y'
+
a
b

a
b
⋅
x'
y'
= 0 }
=
{ (
x
y
−
a
b
) +
a
b

a
b
⋅ (
x
y
−
a
b
) = 0 }
=
{
x
y

a
b
⋅ (
x
y
−
a
b
) = 0 }
Example: Is [7; −1] on the line defined by the points u = [19; 7] and v = [1; −5]?
To solve this problem, we recall the definition for a line defined by two points:
{ p  ∃ a∈R, p = a (u  v) + u }.
Thus, we want to know if [7; 1] is in the set defined as above. This can only occur if there exists
an a such that [7; 1] = a (u  v) + u.
We solve for a; if no solution exists, then the point [7; 1] is not on the line L. If an a exists, then it is. In this case,
a = 1/3 is a solution to both equations, so [7; 1] is on the line.
7
1
=
a(
19
7
−
1
5
) +
1
−5
=
(
19a − 1a
7a + 5a
) +
1
−5
=
(
18a + 1
12a − 5
)
7
=
18a+1
−1
=
12a − 5
a
=
1/3
Example: Define the line that is orthogonal to the vector [3; 5] but also crosses [3; 5] (i.e, [3; 5] falls on the line).
We know that the line must be parallel to the vector that is orthogonal to [3; 5]. The line crossing [0; 0] that is orthogonal
to [3; 5] is defined as:
{
x
y

3
5
⋅
x
y
= 0 }
We can rewrite the above in a more familiar form:
5 y
=
− 3 x
y
=
− (3/5) x
However, we need the line to also cross the point [3; 5]. This is easily accomplished by adding the vector [3; 5] to all the points
on the orthogonal line going through [0; 0] (as defined above). Thus, we have:
{
3
5
+
x
y

3
5
⋅
x
y
= 0 }
We can rewrite the above by defining:
x'
y'
=
3
5
+
x
y
x'
y'
−
3
5
=
x
y
Now we substitute [x;y] with [x',y'] in our definition of the line:
{
x'
y'

3
5
⋅ (
x'
y'

3
5
) = 0 }
We can now write the equation for the line:
3 (x' − 3) + 5 (y' − 5)
=
0
5 (y' − 5)
=
3 (x' − 3)
5 y' − 25
=
3 (x' − 3)
5 y'
=
3 (x' − 3) + 25
y'
=
−3/5 (x' − 3) + 5
y'
=
−3/5 x' + 9/5 + 5
Notice that, alternatively, we could have instead simply found the yintercept b∈R of the following equation using the point [3;5]:
y
=
− (3/5) x + b
5
=
− (3/5) (3) + b
5 + 9/5
=
b
b
=
5 + 9/5
Example: Is [8; 6] a linear combination of the vectors [19; 7] and [1; 5]?
We recall the definition of a linear combination and instantiate it for this example:
∃ a,b∈R,
8
6
= a
19
7
+ b
1
−5
.
Thus, if we can solve for a and b, then [8; −6] is indeed a linear combination.
8
−6
=
a
19
7
+ b
1
−5
=
19a
7a
+
1b
−5b
8
=
19a + b
b
=
19a + 8
−6
=
7a − 5b
−6
=
7a − 5(−19a + 8)
−6
=
7a + 95a − 40
34
=
102 a
a
=
1/3
b
=
−19/3 + 8 = 5/3
Example: Are the vectors V = {[2; 0; 4; 0], [6; 0; 4; 3], [1; 7; 4; 3]} linearly independent?
The definition for linear independence requires that none of the vectors being considered can be expressed
as a sum of the others. Thus, we must check all pairs of vectors against the remaining third vector not in the pair.
There are 3!/(2!*1!) such pairs:
can
2
0
4
0
be expressed as a combination of
6
0
4
3
and
1
7
4
3
?
can
6
0
4
3
be expressed as a combination of
2
0
4
0
and
1
7
4
3
?
can
1
7
4
3
be expressed as a combination of
6
0
4
3
and
2
0
4
0
?
For each combination, we can check whether the third vector is linearly dependent.
If it is linearly dependent, we can stop and say that the three vectors are not
linearly independent. If it is not, we must continue checking all the pairs. If all the pairs are incapable of
being scaled and added in some way to obtain the third vector, then the three vectors are linearly independent.
not (u, v, w are linearly independent) iff (∃ a,b∈R, u,v,w∈V, w is a linear combination of u and v)
Notice that this an example of a general logical rule:
not (∀ x∈S, p) iff (∃ x∈S, not p)
We check each possible combination and find that we derive a contradiction if we assume they are not independent.
2
0
4
0
=
6
0
4
3
a +
1
7
4
3
b
0
=
7 b
b
=
0
2
=
6 a + 1 b
a
=
2/6
0
=
3 (2/6) + 3 (0)
0
=
1
6
0
4
3
=
2
0
4
0
a +
1
7
4
3
b
3
=
3 b
b
=
1
6
=
2 a + b
6
=
2 a + 1
5
=
2 a
a
=
5/2
4
=
4 (5/2) + 4
0
=
10
1
7
4
3
=
2
0
4
0
a +
6
0
4
3
b
7
=
0 a + 0 b
7
=
0
Thus, V is a set of linearly independent vectors.
2.6. Using vectors and linear combinations to model systems
In the introduction we noted that in this course, we would define a symbolic language for working with a certain
collection of idealized mathematical objects that can be used to model system states of realworld systems in an abstract
way. Because we are considering a particular collection of objects (vectors, planes, spaces, and their relationships),
it is natural to ask what kinds of problems are wellsuited for such a representation (and also what problems are
not wellsuited).
What situations and associated problems can be modelled using vectors and related operators and properties?
Problems involving concrete objects that have a position, velocity, direction, geometric shape, and relationships between
these (particularly in two or three dimensions) are natural candidates. For example, we have seen that it is possible
to compute the projection of one vector onto another. However, these are just a particular example of a more general
family of problems that can be studied using vectors and their associated operations and properties.
A vector of real numbers can be used to represent an object or collection of objects with some fixed number of characteristics (each
corresponding to a dimension or component of the vector) where each characteristic has a range of possible values.
This range could be a set of magnitudes (e.g., position, cost, mass), a discrete collection of states (e.g.,
absence or presence of an edge in a graph), or even a set of relationships (e.g., for every cow, there are four
cow legs; for every $1 invested, there is a return of $0.02). Thus, vectors are wellsuited for representing
problems involving many instances of objects where all the objects have the same set of possible characteristics
along the same set of linear dimensions. In these instances, many vector operations also have natural interpretations.
For example, addition and scalar multiplication (i.e., linear combinations) typically correspond to the aggregation
of a property across multiple instances or copies of objects with various properties (e.g., the total mass of a
collection of objects).
In order to illustrate how vectors and linear combinations of vectors might be used in applications, we recall the
notion of a system. A system is any physical or abstract phenomenon, or observations of a phenomenon, that
we characterize as a collection of real values along one or more dimensions. A
system state or state a system is a particular collection
of real values. For example, if a system is represented by R^{4}, states of that system are represented by individual vectors
in R^{4} (note that not all vectors need to correspond to valid or possible states; see the examples below).
Example: Consider the following system: a barn with cows and chickens inside it. There are several dimensions along
which an observer might be able to measure this system (we assume that the observer has such poor eyesight that chickens and
cows are indistinguishable from above):
number of chickens inside
number of cows inside
number of legs that can be seen by peeking under the door
number of heads that can be seen by looking inside from a high window
Notice that we could represent a particular state of this system using a vector in R^{4}. However, notice also that
many vectors in R^{4} will not correspond to any system that one would expect to observe. Usually, the number of
legs and heads in the entire system will be a linear combination of two vectors: the number of legs per cow,
and the number of legs per chicken:
1 head
2 legs
⋅
xchickens +
1 head
4 legs
⋅ ycows
=
x+yheads
2x+4ylegs
Given this relationship, it may be possible to derive some characteristics of the system given only partial information.
Consider the following problem: how many chickens and cows are in a barn if 8 heads and 26 legs were observed?
1 head
2 legs
⋅
xchickens +
1 head
4 legs
⋅ ycows
=
8 heads
26 legs
Notice that a linear combination of vectors can be viewed as a translation from a vector describing one set of
dimensions to a vector describing another set of dimensions. Many problems might exist in which the values are
known along one set of dimensions and unknown along another set.
Example: We can restate the example from the introduction using linear combinations. Suppose we have $12,000 and two
investment opportunities: A has an annual return of 10%, and B has an annual return of 20%. How much should we invest
in each opportunity to get $1800 over one year?
The two investment opportunities are twodimensional vectors representing the rate of return on a dollar:
1 dollar
0.1 interest
and
1 dollar
0.2 interest
The problem is to find what combination of the two opportunities would yield the desired observation of the entire system:
1 dollar
0.1 dollars of interest
⋅ x dollars in opportunity A
+
1 dollar
0.2 dollars of interest
⋅ y dollars in opportunity B
=
12,000 dollars
1800 dollars of interest
Next, we consider a problem with discrete dimensions.
Example: Suppose there is a network of streets and intersections and the city wants to set up cameras
at some of the intersections. Cameras can only see as far as the next intersection. Suppose there are five
streets (#1, #2, #3, #4, #5) and four intersections (A, B, C, and D) at which cameras can be placed, and the city
wants to make sure a camera can see every street while not using any cameras redundantly (i.e., two cameras
should not film the same street).
Vectors in R^{5} can represent which streets are covered by a camera. A fixed collection of vectors, one for each
intersection, can represent what streets a camera can see from each intersection. Thus, the system's dimensions
are:
is street #1 covered by a camera?
is street #2 covered by a camera?
is street #3 covered by a camera?
is street #4 covered by a camera?
is street #5 covered by a camera?
is there a camera at intersection A? (represented by the variable a below)
is there a camera at intersection B? (represented by the variable b below)
is there a camera at intersection C? (represented by the variable c below)
is there a camera at intersection D? (represented by the variable d below)
Four fixed vectors will be used to represent which streets are adjacent to which intersections:
0
1
0
0
1
,
1
0
1
1
0
,
1
1
0
0
0
,
1
0
0
1
1
Placing the cameras in the way required is possible if there is integer solution to the following equation
involving a linear combination of the above vectors:
0
1
0
0
1
a +
1
0
1
1
0
b +
1
1
0
0
0
c +
1
0
0
1
1
d
=
1
1
1
1
1
Example: Suppose a chemist wants to model a chemical reaction. The dimensions of the system might be:
how many molecules of C_{3}H_{8} are present?
how many molecules of O_{2} are present?
how many molecules of CO_{2} are present?
how many molecules of H_{2}O are present?
how many atoms of carbon are present?
how many atoms of hydrogen are present?
how many atoms of oxygen are present?
Individual vectors in R^{3} can be used to represent how many atoms of each element are in each type of
molecule being considered:
C_{3}H_{8}:
3
8
0
, O_{2}:
0
0
2
, CO_{2}:
1
0
2
, H_{2}O:
0
2
1
Suppose we know that the number of atoms in a system may never change during a reaction, and that some quantity
of C_{3}H_{8} and O_{2} can react to yield only CO_{2} and H_{2}O. How many molecules of each compound will be involved in the reaction? That is the solution
to the following linear combination.
3
8
0
x_{1} +
0
0
2
x_{2}
=
1
0
2
x_{3} +
0
2
1
x_{4}
For example, suppose we start with 1000 molecules of C_{3}H_{8} and 5000 molecules of O_{2}. If both of these compounds react to produce only
CO_{2} and H_{2}O, how many molecules of each will be produced?
3
8
0
1000 +
0
0
2
5000
=
1
0
2
a +
0
2
1
b
3000
8000
10000
=
1
0
2
a +
0
2
1
b
3000
=
1 ⋅ a + 0 ⋅ b
a
=
3000
8000
=
0 ⋅ a + 2 ⋅ b
b
=
4000
10000
=
2 ⋅ 3000 + 1 ⋅ 4000
Thus, a = 3000 molecules of CO_{2} and b = 4000 molecules of H_{2}O will be produced.
The notion of a linear combination of vectors is common and can be used to mathematically model a wide variety of problems.
Thus, a more concise notation for linear combinations of vectors would be valuable. This is one of the issues addressed by introducing
a new type of term: the matrix.
3. Matrices
In this section we introduce a new kind of term: a matrix. We define some operations on matrices and some
properties of matrices, and we describe some of the possible ways to interpret and use matrices.
3.1. Matrices and multiplication of a vector by a matrix
Matrices are a concise way to represent and reason about linear combinations and linear independence of vectors (e.g., setwise linear
independence might be difficult to check using an exhaustive approach), reinterpretations of systems using different dimensions,
and so on. One way to interpret a matrix is as a collection of vectors. Multiplying a matrix by a vector corresponds to computing a linear
combination of that collection of vectors.
As an example, we consider the case for linear combinations of two vectors. The two scalars in the linear combination
can be interpreted as a 2component vector. We can then put the two vectors together into a single object in our
notation, which we call a matrix.
Definition:
a
b
c
d
⋅
x
y
=
a
c
⋅ x +
b
d
⋅ y
Notice that the columns of the matrix are the vectors used in our linear combination. Notice also that we can now
reinterpret the result of multiplying a vector by a matrix as taking the dot product of each of the matrix rows with the
vector.
Fact:
a
b
c
d
⋅
x
y
=
a
c
⋅ x +
b
d
⋅ y
=
ax
cx
+
by
dy
=
ax + by
cx + dy
=
(a,b) ⋅ (x,y)
(c,d) ⋅ (x,y)
Because a matrix is just two column vectors, we can naturally extend this definition of multiplication
to cases in which we have multiplication of a matrix by a matrix: we simply multiply each column of the
second matrix by the first matrix and write down each of the resulting columns in the result matrix.
Definition:
a
b
c
d
⋅
x
s
y
t
=
(a,b) ⋅ (x,y)
(a,b) ⋅(s,t)
(c,d) ⋅ (x,y)
(c,d) ⋅ (s,t)
These definitions can be extended naturally to vectors and matrices with more than two components. If we denote using M_{ij} the entry in a matrix M found
in the ith row and jth column, then we can define the result of matrix multiplication of two matrices A and B as a matrix M such that
M_{ij} = ith row of A ⋅ jth column of B.
3.2. Interpreting matrices as tables of relationships and transformations of system states
We saw how vectors can be used to represent system states. We can extend this interpretation to matrices and use
matrices to represent relationships between the dimensions of system states. This allows us to interpret matrices as transformations
between system states (or partial observations of system states).
If we again consider the example system involving a barn of cows and chickens, we can reinterpret the matrix as a table of relationships between
dimensions. Each entry in the table has a unit indicating the relationship it represents.
chickens
cows
heads
1 head/chicken
1 head/cow
legs
2 legs/chicken
4 legs/cow
Notice that the column labels in this table represent the dimensions of an "input" vector that could be multiplied by this matrix, and
the row labels specify the dimensions of the "output" vector that is obtained as a result. That is,
if we multiply using the above matrix a vector that specifies the number of chickens and the number of cows in a system state, we will get a
vector that specifies the number of heads and legs we can observe in that system.
1 head/chicken
1 head/cow
2 legs/chicken
4 legs/cow
⋅
x chickens
y cows
=
x+yheads
2x+4ylegs
Thus, we can interpret multiplication by this matrix as a function that takes system states that only specify the number of chickens and
cows, and converts them to system states that only specify the number of heads and legs:
(# chickens × # cows) → (# heads × # legs)
3.3. Interpreting multiplication of matrices as composition of system state transformations
Example: Suppose that we have a system with the following dimensions.
number of wind farms
number of coal power plants
units of power
units of cost (e.g., pollution)
number of single family homes (s.f.h.'s)
number of businesses
Two different matrices might specify the relationships between some combinations of dimensions in this system.
M_{1} =
100 power/wind farm
250 power/coal plant
50 cost/wind farm
400 cost/coal plant
,
M_{2} =
4 s.f.h./unit power
2 s.f.h./unit cost
1 businesses/unit power
0 businesses/unit cost
Notice that these two matrices both represent transformations between partial system state descriptions.
T_{1}: (# wind farms × # coal plants) → (units of power × units of cost)
T_{2}: (units of power × units of cost) → (# s.f.h. × # businesses)
Notice that because the interpretion of a result obtained using the first transformation matches the interpretation of an input to the second, we
can compose these transformations to obtain a third transformation.
This corresponds to multiplying the two matrices to obtain a third matrix. Notice that the units of the resulting matrix can be computed using a
process that should be familiar to you from earlier coursework.
4 s.f.h./unit power
2 s.f.h./unit cost
1 businesses/unit power
0 businesses/unit cost
⋅
100 power/wind farm
250 power/coal plant
50 cost/wind farm
400 cost/coal plant
=
300 s.f.h./wind farm
200 s.f.h./coal plant
100 business/wind farm
250 businesses/coal plant
Thus, given some vector describing the number of wind farms and coal plants in the system, we can multiply that vector by (M_{2} ⋅ M_{1})
to compute the number of single family homes and business we expect to find in that system.
Example: Suppose that a gram of gold costs $50, while a gram of silver costs $10. After purchasing some of each, you have spent $350 on
15 grams of material. How many grams of each commodity have you purchased?
Write down four dimensions describing this system.
Define a matrix A that can be used to convert a description of a system state that specifies only the amount of gold and silver purchased into
a description of the system state that specifies only the cost and total weight.
Write down a matrix equation describing this problem and solve it to find the solution.
Define a matrix B such that for any description of a system state v that specifies only the total weight and amount spent, B ⋅ v
is a description of that system state that specifies the amount of gold and silver in that system state.
Example: Suppose we characterize our system in terms of two dimensions:
number of single family homes (s.f.h.'s)
number of power plants (p.p.'s)
In this example, instead of studying the relationships between dimensions, we want to study how the dimensions
change (possibly in an interdependent way) over time. For example, the following matrix might capture how
the system state evolves from year to year:
M =
2 s.f.h. in year 1/s.f.h. in year 0
1 s.f.h. in year 1/p.p. in year 0
0 p.p. in year 1/s.f.h in year 0
1 p.p. in year 1/p.p. in year 0
We can parameterize the above in terms of a year t∈R. Notice that the matrix above is just a special case of the matrix below (when t = 0):
M =
2 s.f.h. in year t+1/s.f.h. in year t
1 s.f.h. in year t+1/p.p. in year t
0 p.p. in year t+1/s.f.h in year t
1 p.p. in year t+1/p.p. in year t
What does M ⋅ M represent? If we consider the units, we have:
M ⋅ M =
4 s.f.h. in year t+2/s.f.h. in year t
3 s.f.h. in year t+2/p.p. in year t
0 p.p. in year t+2/s.f.h in year t
1 p.p. in year t+2/p.p. in year t
Suppose v = [x; y] ∈R^{2} represents the number of single family homes and factories in a given year. We can then define the number of single
family homes and factories after t years as:
M^{t} ⋅ v
=
2
1
0
1
^{t} ⋅
x
y
If we wanted to write the number of single family homes and factories as a function of t∈R and an initial state x_{0}, y_{0}∈R, we
could nest the dot products as follows and use algebra to simplify:
In this assignment you will solve problems involving vectors and matrices.
Please submit a single file a2.* containing your solutions. The file extension may be anything you choose;
please ask before submitting a file in an exotic or obscure file format (plain text is preferred).
Consider the line L in R^{2} that passes through the following two vectors:
u
=
9
7
v
=
1
5
Using set notation, define L in terms of u and v.
Below are some possible definitions:
L
=
{ a (u  v) + u  a∈R }
L
=
{ a (u  v) + v  a∈R }
L
=
{ a (
9
7

1
5
) +
1
5
 a∈R }
L
=
{
x
y
 y = (3/2) x  13/2 }
Solutions are acceptable as long as the definition is correct set notation for the set of
vectors representing the line.
Determine whether the following vectors are on the line L:
25
31
,
7
−1
,
−3
−11
The quickest way is to derive the y = (3/2) x − 13/2 equation for points on the line
and use it to check each point:
31
=
(3/2) ⋅ 25 − 13/2
,
so
25
31
∈L
−1
≠
(3/2) ⋅ 7 − 13/2
,
so
7
−1
∉ L
−11
=
(3/2) ⋅ (−3) − 13/2
,
so
−3
−11
∈L
Find all unit vectors orthogonal to L.
It is sufficient to find the two unit vectors on the line through the origin
that is perpendicular to L. A vector that is parallel to L can be obtained using u  v:
9
7
−
1
5
=
8
12
The unit vectors must be orthogonal to the above vector. Thus, we have the following
two equations for the unit vectors [ x ; y ] that we want to find:
8
12
⋅
x
y
=
0

x
y

=
1
The two vectors that satisfy the above are:
x
y
∈
{
3/√(13)
−2/√(13)
,
−3/√(13)
2/√(13)
}
Define the line L^{⊥} that passes through the origin and is orthogonal to L using an equation of the form:
y
=
m ⋅ x + b
In other words, find m, b∈R such that:
L^{⊥}
=
{
x
y
 y = m ⋅ x + b }
We can use one of the unit vectors we found in part (c) above to determine
the slope of L^{⊥}. Since the line passes through the origin, b = 0.
m
=
(2/√(13)) / (−3/√(13))
=
−2/3
b
=
0
For each of the following collections of vectors, determine whether the collection is
linearly independent. If it is, explain why; if not, show that one of the vectors is a
linear combination of the other vectors in the collection.
The following two vectors in R^{2}:
2
1
,
3
2
The following equation has no solution a∈R since assuming there is a solution
leads to a contradiction:
a ⋅
2
1
=
3
2
a ⋅ 2
=
3
a ⋅ 1
=
2
a
=
2
2 ⋅ 2
≠
3
Thus, the two vectors do not satisfy the definition of linear independence.
They must be linearly independent, so the collection is linearly independent.
The following three vectors in R^{2}:
2
1
,
1
3
,
2
4
We must check if any of the three vectors might be a linear combination
of the other two. In fact, there is at least one such vector, so the collection of
vectors is not linearly independent:
2
1
=
5 ⋅
1
3
+ (−7/2) ⋅
2
4
Notice that we could have concluded this immediately without finding the above
counterexample
because these vectors are in R^{2} and there are at least two linearly independent
vectors in the collection. Thus, those two vectors can be used in a linear
combination to obtain the third.
The following three vectors in R^{4}:
2
0
4
0
,
6
0
4
3
,
1
7
4
3
We must check if any of the three vectors might be a linear combination
of the other two. We check if the first can be a linear combination of the second
and third; since we arrive at a contradiction below, it cannot.
2
0
4
0
=
a ⋅
6
0
4
3
+ b ⋅
1
7
4
3
2
=
6 a + b
0
=
7 b
b
=
0
a
=
1/3
4
≠
4/3 + 0
Next, we check if the second vector can be a linear combination of the first and third.
6
0
4
3
=
a ⋅
2
0
4
0
+ b ⋅
1
7
4
3
6
=
2 a + b
0
=
7 b
b
=
0
a
=
3
4
≠
12 + 0
Finally, we check if the third can be a linear combination of the first and second:
1
7
4
3
=
a ⋅
6
0
4
3
+ b ⋅
2
0
4
0
7
=
0 ⋅ a + 0 ⋅ b
7
≠
0
Since no vector is a linear combination of the other two, the collection is linearly
independent.
Consider the following vectors in R^{3}:
u
=
3
7
9
v
=
2
−5
3
w
=
3
4
−3
Compute the orthogonal projection of w onto the vector:
1/√(12)
1/√(12)
1/√(12)
We use the formula for an orthogonal projection. First, we compute the norm of
the vector onto which we are projecting.

1/√(12)
1/√(12)
1/√(12)

=
√(3/12)
Notice that 2 ⋅ √(3/12) = √((4 ⋅ 3)/12) = 1. Thus,
to obtain a unit vector, we simply multiply the vector onto which we are projecting
by the scalar 2.
2 ⋅
1/√(12)
1/√(12)
1/√(12)
=
2/√(12)
2/√(12)
2/√(12)
To project w onto the above unit vector, we can use the orthogonal projection formula:
(
3
4
3
⋅
2/√(12)
2/√(12)
2/√(12)
) ⋅
2/√(12)
2/√(12)
2/√(12)
=
(8/√(12)) ⋅
2/√(12)
2/√(12)
2/√(12)
=
16/12
16/12
16/12
=
4/3
4/3
4/3
Determine whether each of the following points lies on the plane perpendicular to u:
3
0
1
,
7
9
5
,
1
1
1
If a vector lies on the plane perpendicular to u, then it must be orthogonal to u.
Thus, we compute the dot product of each of these vectors with u:
3
0
1
⋅
3
7
9
=
9 + 0  9
=
0, so this vector is on the plane;
7
9
5
⋅
3
7
9
=
21  63  45
≠
0, so this vector is not on the plane;
1
1
1
⋅
3
7
9
=
3 + 7  9
≠
0, so this vector is not on the plane.
Extra credit: Given the vector v and w, let P be the plane orthogonal to v, and let Q be the plane orthogonal to w.
Find a vector p∈R^{3} that lies on the line in R^{3} along which P and Q intersect, and provide a definition of the line.
We have the following:
P
=
{ p  p ⋅ v = 0 }
Q
=
{ p  p ⋅ w = 0 }
The line along which P and Q intersect is the set of vectors that are orthogonal to both
P and Q.
P ∩ Q
=
{ p  p ⋅ v = 0 and p ⋅ w = 0}
We can expand the two equations p ⋅ v = 0 and p ⋅ w = 0 in order to obtain
a system of equations that restricts the possible components of p = [ x ; y ; z ]:
x
y
z
⋅
2
−5
3
=
0
x
y
z
⋅
3
4
−3
=
0
2 x − 5 y + 3 z
=
0
3 x + 4 y − 3 z
=
0
5 x − y
=
0
y
=
5 x
z
=
(23/3) x
Given the above, we can introduce a scalar a and write:
x
=
a
y
=
5 a
z
=
(23/3) a
P ∩ Q
=
{ a ⋅
1
5
(23/3)
 a∈R }
You decide to drive the 2800 miles from New York to Los Angeles in a hybrid vehicle.
A hybrid vehicle has two modes: using only the electric motor and battery, it can travel 1 mile on 3 units of battery power;
using only the internal combustion engine, it can travel 1 mile on 0.1 liters of gas (about 37 mpg) while also charging the
battery with 1 unit of battery power. At the end of your trip, you have 1400 fewer units of battery power than you did when
you began the trip. How much gasoline did you use (in liters)?
You should define a system with the following dimensions:
net change in the total units of battery power;
total liters of gasoline used;
total number of miles travelled;
number of miles travelled using the electric motor and battery;
number of miles travelled using the engine.
You should define a matrix M∈R^{3×2} to characterize this system. Then, write down an equation containing
that matrix (and three variables in R), and solve it to obtain the quantity of gasoline.
M ⋅
x
y
=
?
?
?
One possible solution is to solve the matrix equation below for n, the number of liters of gasoline used:
3 battery power/mile on battery
1 battery power/mile on engine
1 miles/mile on battery
1 miles/mile on engine
0 liters of gas/mile on battery
0.1 liters of gas/mile on engine
⋅
xmiles on battery
ymiles on engine
=
1400 battery power
2800 miles
nliters of gas
The above equation can be rewrittenn as:
−3 ⋅ x + 1 ⋅ y
=
1400
x + y
=
2800
0 ⋅ x  0.1 ⋅ y
=
n
We then have that 245 liters were used:
x
=
2800 − y
−3 ( 2800 − y) + y
=
−1400
−8400 + 4 y
=
−1400
4 y
=
7000
y
=
1750
n
=
− 0.1 ⋅ 1750
n
=
−175
Notice that it is possible to solve an equation for a 2 × 2 matrix first, and then
use x and y to determine n. For example:
3
1
1
1
⋅
x
y
=
1400
2800
It is also possible to immediately notice that 10 ⋅ n is the number of miles travelled
on n liters of gasoline. Then, the following matrix equation can be set up and solved:
3 battery power/mile on battery
10 battery power/liter of gas
1 miles/mile on battery
10 miles/liter of gas
⋅
xmiles on battery
nliters of gas
=
1400 battery power
2800 miles
Suppose we create a very simple system for modelling how predators and prey interact in a closed environment.
Our system has only two dimensions: the number of prey animals, and the number of predators. We want to model
how the state of the system changes from one generation to the next.
If there are x predators and y prey animals in a given generation, in the next generation the following will be the case:
all predators already in the system will stay in the system;
all prey animals already in the system will stay in the system;
for every prey animal, two new prey animals are introduced into the system;
for every predator, two prey animals are removed from the system;
we ignore any other factors that might affect the state (e.g., natural death or starvation).
Specify explicitly a matrix T in R^{2×2} that takes a description of the system state
in one generation and produces the state of the system during the next generation. Note: you may simply
enter the matrix on its own line for this part of the problem, but you must also use it in the
remaining three parts below.
The following matrix reflects the constraints imposed by the description of a
state transformation:
T
=
1 # predators at time t+1 / predator at time t
0 # predators at time t+1 / prey at time t
2 # prey at time t+1 / predator at time t
3 # prey at time t+1 / prey at time t
Show that the number of predators does not change from one generation to the next.
T ⋅
x
y
=
x
y'
It is sufficient to derive that the number of predators does not change
after a transformation is applied. Let x be the number of predators before
the transformation is applied, and let x' be the number of predators after it is applied.
Then we have that:
1
0
2
3
⋅
x
y
=
x'
y'
1 ⋅ x + 0 ⋅ y
=
x'
x
=
x'
Determine what initial state [x_{0}; y_{0}] is such that there is no change in the state from one generation to another.
T ⋅
x_{0}
y_{0}
=
x_{0}
y_{0}
It is sufficient to solve the equation for x_{0} and y_{0}:
T ⋅
x_{0}
y_{0}
=
x_{0}
y_{0}
1
0
2
3
⋅
x_{0}
y_{0}
=
x_{0}
y_{0}
1 ⋅ x_{0} + 0 ⋅ y_{0}
=
x_{0}
2 ⋅ x_{0} + 3 ⋅ y_{0}
=
y_{0}
The first equation of real numbers above imposes no constraints on x_{0} or y_{0}.
Thus, any vector that satisfies the last equation would be a solution. All vectors
on the line defined by this equation are solutions:
2 ⋅ x_{0} + 3 ⋅ y_{0}
=
y_{0}
2 ⋅ x_{0}
=
2 y_{0}
y_{0}
=
x_{0}
Suppose that in the fourth generation of an instance of this system (that is, after the transformation has
been applied three times), we have 2 predators and 56 prey animals. How many predators and prey animals were
in the system in the first generation (before any transformations were applied)? Let [x; y] represent the
state of the system in the first generation. Set up a matrix equation that involves [x; y] and
matrix multiplication, and solve it to obtain the answer.
It is sufficient to solve the following equation for x and y:
T^{3} ⋅
x
y
=
2
56
We solve it below:
1
0
2
3
⋅
1
0
2
3
⋅
1
0
2
3
⋅
x
y
=
2
56
1
0
8
9
⋅
1
0
2
3
⋅
x
y
=
2
56
1
0
26
27
⋅
x
y
=
2
56
x − 0 ⋅ y
=
2
x
=
2
−26 x + 27 y
=
56
27 y
=
108
y
=
4
3.5. Matrix operations and their interpretations
The following table summarizes the matrix operations that we are considering in this course.
term
definition
restrictions
general properties
M_{1} + M_{2}
componentwise
matrices must have the same number of rows and columns
commutative,
associative,
has identity (matrix with all 0 components),
has inverse (multiply matrix by 1),
scalar multiplication is distributive
M_{1} ⋅ M_{2}
rowcolumnwise dot products
columns in M_{1} = rows in M_{2} rows in M_{1} ⋅ M_{2} = rows in M_{1} columns in M_{1} ⋅ M_{2} = columns in M_{2}
associative,
has identity I (1s in diagonal and 0s elsewhere),
distributive over matrix addition, not commutative in general, no inverse in general
M^{ 1}
columns in M = rows in M
matrix is invertible
M^{ 1} ⋅ M = M ⋅ M^{ 1} = I
The following tables list some highlevel intuitions about how matrix operations can be understood.
level of abstraction
interpretations of multiplication of a vector by a matrix
applications
transformation of system states
extraction of information about system states
computing properties of combinations or aggregations of objects (or system states)
conversion of system state observations from one set of dimensions to another
geometry
"moving" vectors in a space (stretching, skewing, rotating, reflecting)
projecting vectors
taking a linear combination of two vectors
reinterpreting vector notation as referring to a collection of noncanonical vectors
level of abstraction
interpretations of multiplication of two matrices
applications
composition of system state transformations or conversions
geometry
sequencing of motions of vectors within a space (stretching, skewing, rotating, reflecting)
level of abstraction
invertible matrix
singular matrix
applications
reversible transformation of system states
extraction of complete information uniquely determining a system state
irreversible transformation of system states
extraction of incomplete information about a system state
geometry
reversible transformation or motion of vectors in a space
projection onto a strict subset of a set of vectors (space)
symbolic
reversible transformation of information numerically encoded in matrix
(example of such information: system of linear equations encoded as matrix)
irreversible/"lossy" transformation of information encoded in matrix
Suppose we interpret multiplication of a vector by a matrix M as a function from vectors to vectors:
f(v) = Mv.
Fact: Notice that for any M, if f(v) = Mv then f is always a function because Mv has only one possible result (i.e.,
matrix multiplication is deterministic) for a given M and v.
Fact: If f(v) = Mv, then f is invertible if M is an invertible matrix. The inverse of f is then defined to be:
f^{ 1}(v) = M^{ 1}v.
Notice that f^{ 1} is a function because M^{ 1}v only has one possible result. Notice that f^{ 1} is the inverse of f because
f^{ 1}(f(v)) = M^{ 1}Mv = Iv = v.
Fact: If the columns of a matrix M are linearly dependent and f(v) = Mv, then M cannot have an inverse. We consider the case
in which M∈R^{2×2}. Suppose we have that
M
=
a
b
c
d
.
If the columns of M are linearly dependent, then we know that there is some s∈R such that
s
a
c
=
b
d
.
This means that we can rewrite M:
M
=
a
sa
c
sc
.
Since matrix multiplication can be interpreted as taking a linear combination of the column vectors, this means that for x,y∈R,
a
sa
c
sc
x
y
=
a
c
x +
sa
sc
y
=
(x + sy)
a
c
But this means that for any two vectors [x;y] and [x';y'], if x + sy = x' + sy' then multiplying by M will lead to the same result.
Thus, f is a function that takes two different vector arguments and maps them to the same result. If we interpret f as a relation and take
its inverse f^{ 1}, f^{ 1} cannot be a function.
Thus, M cannot have an inverse (if it did, then f^{ 1} would be a function).
Fact: If the columns of a matrix M are linearly independent and f(v) = Mv, then M has an inverse. We consider the case
in which M∈R^{2×2}. Suppose we have that
M
=
a
b
c
d
.
If the columns of M are linearly independent, then we know that ad  b % c ≠ 0:
a/c
≠
b/d
ad
≠
bc
ad  bc
≠
0
Suppose we pick the following M^{ 1}:
M^{ 1}
=
(1/(ad  bc)) ⋅
d
b
c
a
.
The we have that:
M^{ 1} ⋅ M
=
(1/(ad  bc)) ⋅
d
b
c
a
⋅
a
b
c
d
=
(1/(ad  bc)) ⋅
ad  bc
bd  bd
ac  ac
ad  bc
=
1
0
0
1
=
I
Example: Solve the following problems.
Determine which of the following matrices are not invertible:
2
7
4
14
,
0
0
0
1
,
2
3
0
1
The columns of the first matrix are linearly dependent, so it is not invertible. The first column of
the second matrix can be obtained by multiplying the second column by 0, so the two columns of that
matrix are linearly dependent; thus, it is not invertible. For the third matrix, the following equation
has no solution:
2
0
=
s ⋅
3
1
Thus, the third matrix is invertible. It is also possible to determine this by computing the
determinant for each matrix.
The matrix below is not invertible, and the following equation is true. What is the matrix? List all four of its components (real numbers).
a
b
b
c
⋅
1
2
=
5
10
Since the matrix is not invertible, it must be that its determinant is 0. Thus, we have the
following system of equations:
a + 2 b
=
5
b + 2 c
=
10
ac − b^{2}
=
0
If we solve the above, we get:
a
=
1
b
=
2
c
=
4
3.6. Matrix properties
The following are subsets of R^{n×n} that are of interest in this course because they
correspond to transformations and systems that have desirable or useful properties. For some of these
sets of matrices, matrix multiplication and inversion have properties that they do not in general.
subset of R^{n×n}
definition
closed under matrix multiplication
properties of matrix multiplication
inversion
identity matrix
∀ i,j M_{ij} = 1 if i=j, 0 otherwise
closed
commutative, associative, distributive with addition, has identity
has inverse (itself); closed under inversion
elementary matrix
can be obtained via an elementary row operation from I:
add nonzero multiple of one row of the matrix to another row
multiply a row by a nonzero scalar
swap two rows of the matrix
Note: the third is a combination of the first two operations.
associative, distributive with addition, have identity
have inverses; closed under inversion
scalar matrices
∃ s∈R, ∀ i,j M_{ij} = s if i=j, 0 otherwise
closed
commutative, associative, distributive with addition, have identity
nonzero members have inverses; closed under inversion
diagonal matrices
∀ i,j M_{ij}∈R if i=j, 0 otherwise
closed
associative, distributive with addition, have identity
nonzero members have inverses; closed under inversion
matrices with constant diagonal
∀ i,j M_{ii} = M_{jj}
associative, distributive with addition, have identity
symmetric matrices
∀ i,j M_{ij} = M_{ji}
associative, distributive with addition, have identity
symmetric matrices with constant diagonal
∀ i,j M_{ii} = M_{jj} and M_{ij} = M_{ji}
closed
commutative, associative, distributive with addition, have identity
upper triangular matrices
∀ i,j M_{ij} = 0 if i > j
closed
associative, distributive with addition, have identity
not invertible in general; closed under inversion when invertible
lower triangular matrices
∀ i,j M_{ij} = 0 if i < j
closed
associative, distributive with addition, have identity
not invertible in general; closed under inversion when invertible
invertible matrices
∃ M^{ 1} s.t. M^{ 1}M = MM^{ 1} = I
closed
associative, distributive with addition, have identity
nonzero members have inverses; closed under inversion
square matrices
all of R^{n×n}
closed
associative, distributive with addition, have identity
Two facts presented in the above table are worth noting.
Fact: Suppose A is invertible. Then the inverse of A^{ 1} is A, because AA^{ 1} = I. Thus, (A^{ 1})^{ 1} = A.
Fact: If A and B are invertible, so is AB. That is, invertible matrices are closed under matrix multiplication. We can show
this by using the associativity of matrix multiplication. Since A and B are invertible, there exist B^{ 1} and A^{ 1} such that:
(B^{ 1}A^{ 1}) AB
=
(B^{ 1}A^{ 1}) AB
=
B^{ 1} (A^{ 1}A) B
=
B^{ 1}IB
=
B^{ 1}B
=
I.
Thus, since there exists a matrix (B^{ 1}A^{ 1}) such that (B^{ 1}A^{ 1}) AB = I, (B^{ 1}A^{ 1}) is the inverse of AB.
Example:
Given an invertible upper triangular matrix M∈R^{2×2},
show that M^{ 1} is also upper triangular. Hint: write out
the matrix M explicitly.
Suppose we have the following upper triangular matrix:
M
=
x
y
0
z
If it is invertible, then there exists an inverse such that:
x
y
0
z
⋅
a
b
c
d
=
1
0
0
1
This implies that
xa + yc
=
1
xb + yd
=
0
zc
=
0
zd
=
1
Because M is invertible, we know that z ≠ 0. Since zc = 0, This means that c = 0. Thus, we have that the inverse is upper triangular.
Alternatively, we can observe that c = 0, and that using the formula for the inverse would yield a lowerleft
matrix entry equal to −c/(det M) = 0.
Example: In terms of a, b∈R where a ≠ 0 and b ≠ 0, compute the inverse of:
a
0
0
b
a
0
0
b
a
0
0
b
a
0
0
b
While we could perform the instances of matrix multiplication stepbystep and then invert the result (either by solving an equation or
using the formula for the inverse of a matrix in R^{2×2}), it's easier to recall that diagonal matrices behave in a manner that is very
similar to the real numbers. Thus, the above product is equal to
a^{4}
0
0
b^{4}
,
and its inverse simply has the multiplicative inverses of the two diagonal entries as its diagonal entries:
1/a^{4}
0
0
1/b^{4}
.
Another fact not in the table is also worth noting.
Fact: Any product of a finite number of elementary row matrices is invertible. This fact follows from the fact that all
elementary matrices are invertible, and that the set of invertible matrices is closed under multiplication.
Given the last fact, we might ask whether the opposite is true: are all invertible matrices the product of a finite number of elementary
matrices? The answer is yes, as we will see further below.
3.7. Solving the equation Mv = w for M with various properties
Recall that for v,w∈R^{n} and M∈R^{n×n}, an equation of the following form
can represent a system of equations:
Mv = w
Notice that if M is invertible, we can solve for v by multiplying both sides by M^{ 1}. More generally, if M is a member of
some of the sets in the above table, we can find straightforward algorithms for solving such an equation for v.
M is ...
algorithm to solve Mv = w for v
the identity matrix
w is the solution
an elementary matrix
perform a row operation on M to obtain I; perform the same operation on w
a scalar matrix
divide the components of w by the scalar
a diagonal matrix
divide each component of w by the corresponding matrix component
an upper triangular matrix
start with the last entry in v, which is easily obtained;
move backwards through v, filling in the values by
substituting the already known variables
a lower triangular matrix
start with the first entry in v, which is easily obtained;
move forward through v, filling in the values by
substituting the already known variables
product of a lower triangular matrix
and an upper triangular matrix
combine the algorithms for upper and lower triangular
matrices in sequence (see example below)
an invertible matrix
compute the inverse and multiply w by it
Example: We consider an equation Mv = w where M is a diagonal matrix (the identity matrix and all scalar matrices are also diagonal
matrices).
4
0
0
0
3
0
0
0
5
x
y
z
=
2
9
10
4x
=
2
3y
=
9
5z
=
10
x
=
1/2
y
=
3
z
=
2
Example: We consider an equation Mv = w where M is a lower triangular matrix.
4
0
0
2
4
0
4
3
5
x
y
z
=
2
9
18
4x
=
2
2x + 4y
=
9
4x + 3y + 5z
=
10
x
=
1/2
y
=
2
z
=
2
Fact: Suppose that M = LU where L is lower triangular and U is upper triangular. How can we solve Mv = w?
First, note that because matrix multiplication is associative, we have
Mv
=
(LU) v
=
L (Uv)
We introduce a new vector for the intermediate result Uv, which we call u. Now, we have a system of matrix equations.
Uv
=
u
Lu
=
w
We first solve for u using the algorithm for lower triangular matrices, then we solve for v using the algorithm for upper triangular
matrices.
Example: Solve the following equation for x,y,z∈R:
−2
0
0
1
2
0
1
−1
−1
⋅
3
0
1
0
1
2
0
0
1
⋅
x
y
z
=
−8
12
−1
We first divide the problem into two steps using the intermediate vector [ a ; b ; c ]:
2
0
0
1
2
0
1
−1
−1
⋅
a
b
c
=
−8
12
−1
3
0
1
0
1
2
0
0
1
⋅
x
y
z
=
a
b
c
a
b
c
=
4
4
1
x
y
z
=
1
2
1
Example: The inverse of a matrix in R^{2×2} can be computed as follows.
a
b
c
d
^{ 1}
=
d/(ad − bc)
−b/(adbc)
−c/(ad − bc)
a/(ad − bc)
Thus, if we find that the determinant of a matrix M∈R^{2×2} is nonzero, the algorithm for solving Mv = w is
straightforward. Consider the following example.
1
2
3
4
x
y
=
5
13
1
2
3
4
^{ 1}
1
2
3
4
x
y
=
1
2
3
4
^{ 1}
5
13
1
0
0
1
x
y
=
4/((1⋅4)(2⋅3))
−2/((1⋅4)(2⋅3))
−3/((1⋅4)(2⋅3))
1/((1⋅4)(2⋅3))
5
13
1
0
0
1
x
y
=
−2
1
3/2
1/−2
5
13
x
y
=
3
1
3.8. Row echelon form and reduced row echelon form
We define two more properties that a matrix may possess.
M is in row echelon form
all nonzero rows are above any rows consisting of all zeroes
the first nonzero entry (from the left) of a nonzero row is strictly to the right of the first nonzero entry of the row above it
all entries in a column below the first nonzero entry in a row are zero (the first two conditions imply this)
M is in reduced row echelon form
M is in row echelon form
the first nonzero entry in every row is 1; this 1 entry is the only nonzero entry in its column
We can obtain the reduced row echelon form of a matrix using a sequence of appropriately chosen elementary row operations.
Example: Suppose we want to find the reduced row echelon form of the matrix below. We list the steps of the procedure.
1
2
1
2
3
1
3
5
0
→
1
2
1
0
1
3
3
5
0
→
1
2
1
0
1
3
0
1
3
→
1
2
1
0
1
3
0
0
0
→
1
0
5
0
1
3
0
0
0
Fact: For any matrix M, there may be more than one way to reach the reduced row echelon form using elementary row operations. However,
it is always possible, and there is exactly one unique reduced row echelon form for that matrix M. We do not prove this result in this course,
but fairly short proofs by induction can be found elsewhere (such as here).
Because the reduced row echelon form of a matrix M is unique, we use the following notation to denote it:
rref M.
Example: Determine which of the following matrices are elementary:
1
0
0
0
1
0
,
0
0
2
0
1
0
1
0
0
,
1
0
0
0
1
0
0
0
0
,
1
0
0
0
1
0
2
0
1
The matrices are:
(a) not elementary (not square, so not invertible),
(b) not elementary (composition of two elementary row operations applied to the identity),
(c) not elementary (multiplication of a row by the 0 scalar is not invertible),
and (d) elementary (multiple of first row added to last row).
Example: Determine which of the following matrices are in reduced row echelon form:
1
0
2
0
1
4
,
1
0
0
0
1
0
0
1
3
0
0
1
,
1
0
0
1
0
0
,
1
0
0
1
0
1
0
0
,
0
0
0
1
0
0
0
0
0
1
The matrices are:
(a) in reduced row echelon form,
(b) not in reduced row echelon form,
(c) in reduced row echelon form,
(d) not in reduced row echelon form,
and (e) in reduced row echelon form.
Example: Suppose the matrix below is in reduced row echelon form. Solve for a, b∈R.
1
a
0
0
b
1
0
0
1 − a
Without loss of generality, we can focus on four possibility: a is either zero or nonzero, and
b is either zero or nonzero. We can further simplify this by considering 1 as the only nonzero
value of interest. Then we have that:
if a = 0 and b = 0, then the matrix is not in reduced row echelon form;
if a = 0 and b = 1, then the matrix is not in reduced row echelon form;
if a = 1 and b = 0, then the matrix is in reduced row echelon form;
if a = 1 and b = 1, then the matrix is not in reduced row echelon form.
Thus, a = 1 and b = 0.
Question: For a given matrix in reduced row echelon form, how many different matrices can be reduced to it (one or more than one)?
Fact: It is always true that rref M = rref (rref M). Thus, the rref operation is idempotent.
Fact: If rref M is I (the identity matrix), then M is invertible. This is because I is an elementary matrix, and the row operations
used to obtain rref M from M can be represented as a product of elementary (and thus, invertible) matrices E_{1} ⋅ ... ⋅ E_{n}. Thus, we have:
Since elementary matrices and I are invertible, and invertible matrices are closed under matrix multiplication, M must be invertible, too.
Fact: A matrix M∈R^{n×n} is not invertible iff the bottom row of rref M has all zeroes. This is because when all
rows of rref M∈R^{n×n} have at least one nonzero value, rref M must be the identity matrix (try putting a nonzero value on the bottom row, and see what the
definition of reduced row echelon form implies about the rest of the matrix). Since rref M = I, this implies that M must then be invertible by the fact immediately above this one.
Fact: For any invertible matrix M∈R^{n×n}, the reduced row echelon form of M is I∈R^{n×n}.
We can show this is true using a proof by contradiction. Suppose that M is invertible, but rref M ≠ I. We know that rref M can be
obtained via a finite number of elementary row operations E_{1} ⋅ ... ⋅ E_{n}:
(E_{1} ⋅ ... ⋅ E_{n}) M = rref M.
If rref M is not I, then the last row of rref M must consist of only zeroes. But because M is invertible, we have:
((E_{1} ⋅ ... ⋅ E_{n}) M) M^{ 1}
=
(rref M) ⋅ M^{ 1}
E_{1} ⋅ ... ⋅ E_{n}
=
(rref M) ⋅ M^{ 1}
Since the product of elementary matrices is invertible, (rref M) ⋅ M^{ 1} is also invertible. But if the last row of rref M consists of only
zeroes, then the last row of (rref M) M^{ 1} also contains only zeroes, so it cannot be that (rref M) M^{ 1} is invertible. Thus, we have a contradiction, so our assumption that rref M ≠ I is false.
The following table provides an alternative illustration of how the contradiction is derived:
M is invertible
rref M ≠ I
the matrix M^{ 1} exists
the last row of rref M is all zeroes
(E_{1} ⋅ ... ⋅ E_{n}) M = rref M
((E_{1} ⋅ ... ⋅ E_{n}) M) M^{ 1} = (rref M) ⋅ M^{ 1}
E_{1} ⋅ ... ⋅ E_{n} = (rref M) ⋅ M^{ 1}
the last row of (rref M) ⋅ M^{ 1} is all zeroes
(rref M) ⋅ M^{ 1} is invertible because it is a product of the invertible matrices E_{1}, ..., E_{n}
(rref M) ⋅ M^{ 1} is not invertible because multiplication by it is a manytoone function
The above result implies the following fact.
Fact: If a matrix M is invertible, it is the product of a finite number of elementary matrices. This is because rref M is
the identity, which is an elementary matrix, and M can be reduced via a finite number of invertible row operations to I. Thus, the
elementary matrices can be used to generate every possible invertible matrix.
Example: Suppose that for some matrix M∈R^{2×2}, the following row operations can be applied to M (in the order
specified) to obtain the identity matrix I:
add the bottom row to the top row;
swap the two rows;
multiply the bottom row by 1/3.
Find the matrix M^{ 1}.
We know that performing the three row operations on M will result in I. Thus,
we can write out the row operations as three matrices E_{1}, E_{2}, and E_{3}:
E_{3} ⋅ E_{2} ⋅ E_{1} ⋅ M
=
I
1
0
0
1/3
⋅
0
1
1
0
⋅
1
1
0
1
⋅ M
=
I
Thus, we have that:
M^{ 1}
=
1
0
0
1/3
⋅
0
1
1
0
⋅
1
1
0
1
=
0
1
1/3
1/3
We can also find M:
det M
=
−1/3
M
=
1/(−1/3) ⋅
1/3
−1
−1/3
0
=
−1
3
1
0
The following table summarizes the results.
fact
justification
(1)
{M  M is a finite product of elementary matrices}
=
{M  rref M = I}
I is an elementary matrix; sequences of row operations are equivalent to multiplication by elementary matrices
(2)
{M  M is a finite product of elementary matrices}
⊂
{M  M is invertible}
elementary matrices are invertible; products of invertible matrices are invertible
(3)
{M  rref M = I}
⊂
{M  M is invertible}
fact (1) in this table; fact (2) in this table; transitivity of equality
(4)
{M  M is invertible}
⊂
{M  rref M = I}
proof by contradiction; noninvertible M implies rref M has all zeroes in bottom row
(5)
{M  M is invertible}
=
{M  rref M = I}
for any sets A,B, A ⊂ B and B ⊂ A implies A = B
(6)
{M  M is a finite product of elementary matrices}
=
{M  M is invertible}
fact (1) in this table; fact (5) in this table; transitivity of equality
Given all these results, we can say that the properties for a matrix M in the following table are all equivalent:
if any of these is true, then all of them are true. If any of these is false, then all of them are false.
M is invertible
det M ≠ 0
the columns of M are (setwise) linearly independent
Mv = w has exactly one solution
M is a finite product of elementary matrices
Fact: For matrices M∈R^{n×n}, rref M is guaranteed to be upper triangular. Notice that this means that for some finite
product of elementary matrices E_{1} ⋅ ... ⋅ E_{n}, it is the case that
M = (E_{1} ⋅ ... ⋅ E_{n}) ⋅ U.
If E_{1} ,..., E_{n} are all lower triangular, then then M has an LU decomposition. However, this will not always be the case. But recall that
E_{1} ,..., E_{n} are all elementary matrices.
Fact: Given any product of elementary matrices E_{1} ⋅ ... ⋅ E_{n}, it is possible to find a lower triangular matrix
L by applying a finite number of elementary swap operations
S_{1} ,..., S_{n} such that L = S_{1} ⋅ ... ⋅ S_{n} ⋅ E_{1} ⋅ ... ⋅ E_{n} is lower triangular; then we have:
L
=
S_{1} ⋅ ... ⋅ S_{n} ⋅ E_{1} ⋅ ... ⋅ E_{n}
(S_{n}^{ 1} ⋅ ... ⋅ S_{1}^{ 1}) ⋅ L
=
E_{1} ⋅ ... ⋅ E_{n}
Thus, E_{1} ⋅ ... ⋅ E_{n} can be decomposed into a lower triangular matrix and a product of elementary swap matrices.
Fact: Any product of a finite number of swap elementary matrices S_{1} ,..., S_{n} is a permutation matrix.
Fact: Any matrix M can be written as the product of three matrices P ⋅ L ⋅ U where P is a permutation matrix, L is a lowertriangular matrix, and U is an uppertriangular matrix,
where:
P
=
S_{1} ⋅ ... ⋅ S_{n}
L
=
E_{1} ⋅ ... ⋅ E_{k}
U
=
rref M
Example: Suppose we have a system in which a single object is traveling along one spatial dimension (i.e., in a straight line).
The object has a distance travelled, a velocity, and an acceleration; these are the three dimensions of the system:
distance (m)
velocity (m/s)
acceleration (m/s^{2})
Consider the following equations that can be used to compute the distance the object travels and its final velocity given its
acceleration a∈R^{+}, initial velocity v_{0}∈R^{+}, and the amount of time t∈∈R^{+} it travels.
d
=
0.5 at^{2} + v_{0}t
v
=
at
Suppose we want to convert a view of the system state that describes the acceleration and velocity at time 0 into a view of the system
state that represents the distance travelled and velocity at time t. This conversion operation can be represented using a matrix M.
M
=
0.5 t^{2}dist. at t+1 / accel. at t
tdist. at t+1 / vel. at t
tvel. at t+1 / accel. at t
1 vel. at t+1 / vel. at t
=
0.5 t^{2}m / (m/s^{2})
tm / (m/s)
t(m/s) /(m/s^{2})
1 (m/s)/(m/s)
=
0.5 t^{2}s^{2}
ts
ts
1 scalar identity
This matrix is invertible. This immediately tells us that it is possible to derive the initial
acceleration and velocity given only the amount of time that has elapsed,
the distance travelled, and the final velocity.
By computing the inverse of this matrix, we can obtain formulas that allow us to derive the
initial velocity and acceleration of an object given how much time has passed, how far it has travelled, and its current velocity.
M^{ 1}
=
2/t^{2}accel. at t / dist. at t+1
taccel. at t / vel. at t+1
tvel. at t / dist. at t+1
1 vel. at t / vel. at t+1
=
2/t^{2}(m/s^{2})/m
2/t(m/s^{2}) / (m/s)
2/t(m/s) / m
1 (m/s)/(m/s)
=
2/t^{2}1/s^{2}
2/t1 / s
2/t1 / s
1 scalar identity
The formulas can be obtained by multiplying M^{ 1} by a system state describing the distance travelled and current velocity.
This yields:
a
=
2d/t^{2} + 2v/t
v_{0}
=
2d/t  v
We could have also obtained the above formulas using manipulation of equations of real numbers,
or via the corresponding row operations. Let us consider the latter approach:
0.5 t^{2}
t
t
1
⋅
a
v_{0}
=
d
v
0.5 t^{2}
t
t − (2/t ⋅ 0.5 t^{2})
1 − (2/t ⋅ t)
⋅
a
v_{0}
=
d
v − d ⋅ (2/t)
0.5 t^{2}
t
0
1 − 2
⋅
a
v_{0}
=
d
v − (2d/t)
0.5 t^{2}
t
0
1
⋅
a
v_{0}
=
d
(2d/t) − v
0.5 t^{2} − (t ⋅ 0)
t − (t ⋅ 1)
0
1
⋅
a
v_{0}
=
d − (t ⋅ ((2d/t) − v))
(2d/t) − v
0.5 t^{2}
0
0
1
⋅
a
v_{0}
=
− d + vt
(2d/t) − v
1
0
0
1
⋅
a
v_{0}
=
(2/t^{2}) ⋅ (− d + vt)
(2d / t) − v
a
v_{0}
=
−2d/t^{2} + v/t
2d / t − v
3.9. Matrix transpose
Definition: The transpose of a matrix M∈R^{n×n}, denoted M^{⊤}, is defined to be A such that for all i and j, A_{ij} = M_{ji}.
Fact: If a matrix M is scalar, diagonal, or symmetric, M^{⊤} = M. If a matrix M is upper triangular, M^{⊤} is lower triangular (and vice versa).
Example: Below are some examples of matrices and their transposes:
1
0
0
1
^{⊤}
=
1
0
0
1
a
b
c
d
^{⊤}
=
a
c
b
d
1
2
3
4
5
6
^{⊤}
=
1
3
5
2
4
6
1
0
0
2
3
0
4
5
6
^{⊤}
=
1
2
4
0
3
5
0
0
6
Fact: For A,B∈R^{n×n}, it is always the case that:
(A^{⊤})^{⊤}
=
A
(A + B)^{⊤}
=
A^{⊤} + B^{⊤}
s(B^{⊤})
=
(sB)^{⊤}
Fact: It is always the case that (AB)^{⊤} = B^{⊤}A^{⊤}. If M = AB, then:
(AB)_{ij}
=
ith row of A ⋅ jth column of B
=
ith column of A^{⊤} ⋅ jth row of B^{⊤}
=
jth row of B^{⊤} ⋅ ith column of A^{⊤}
=
(B^{⊤}A^{⊤})_{ji}.
Example: We consider the following example with matrices in R^{3× 3}. Let a, b, c, x, y, and z be vectors in R^{3}, with
v_{i} representing the ith entry in a vector v. Suppose we have the following product of two matrices. Notice that a, b, and c are the rows of the lefthand matrix,
and x, y, and z are the columns of the righthand matrix.
a_{1}
a_{2}
a_{3}
b_{1}
b_{2}
b_{3}
c_{1}
c_{2}
c_{3}
⋅
x_{1}
y_{1}
z_{1}
x_{2}
y_{2}
z_{2}
x_{3}
y_{3}
z_{3}
=
(a_{1}, a_{2}, a_{3}) ⋅ (x_{1}, x_{2}, x_{3})
(a_{1}, a_{2}, a_{3}) ⋅ (y_{1}, y_{2}, y_{3})
(a_{1}, a_{2}, a_{3}) ⋅ (z_{1}, z_{2}, z_{3})
(b_{1}, b_{2}, b_{3}) ⋅ (x_{1}, x_{2}, x_{3})
(b_{1}, b_{2}, b_{3}) ⋅ (y_{1}, y_{2}, y_{3})
(b_{1}, b_{2}, b_{3}) ⋅ (z_{1}, z_{2}, z_{3})
(c_{1}, c_{2}, c_{3}) ⋅ (x_{1}, x_{2}, x_{3})
(c_{1}, c_{2}, c_{3}) ⋅ (y_{1}, y_{2}, y_{3})
(c_{1}, c_{2}, c_{3}) ⋅ (z_{1}, z_{2}, z_{3})
=
a ⋅ x
a ⋅ y
a ⋅ z
b ⋅ x
b ⋅ y
b ⋅ z
c ⋅ x
c ⋅ y
c ⋅ z
Suppose we take the transpose of both sides of the equation above. Then we would have:
(
a_{1}
a_{2}
a_{3}
b_{1}
b_{2}
b_{3}
c_{1}
c_{2}
c_{3}
⋅
x_{1}
y_{1}
z_{1}
x_{2}
y_{2}
z_{2}
x_{3}
y_{3}
z_{3}
)^{⊤}
=
a ⋅ x
a ⋅ y
a ⋅ z
b ⋅ x
b ⋅ y
b ⋅ z
c ⋅ x
c ⋅ y
c ⋅ z
^{⊤}
=
a ⋅ x
b ⋅ x
c ⋅ x
a ⋅ y
b ⋅ y
c ⋅ y
a ⋅ z
b ⋅ z
c ⋅ z
=
x ⋅ a
x ⋅ b
x ⋅ c
y ⋅ a
y ⋅ b
y ⋅ c
z ⋅ a
z ⋅ b
z ⋅ c
=
x_{1}
x_{2}
x_{3}
y_{1}
y_{2}
y_{3}
z_{1}
z_{2}
z_{3}
⋅
a_{1}
b_{1}
c_{1}
a_{2}
b_{2}
c_{2}
a_{3}
b_{3}
c_{3}
=
x_{1}
y_{1}
z_{1}
x_{2}
y_{2}
z_{2}
x_{3}
y_{3}
z_{3}
^{⊤}
⋅
a_{1}
a_{2}
a_{3}
b_{1}
b_{2}
b_{3}
c_{1}
c_{2}
c_{3}
^{⊤}
Fact: If A is invertible, then so is A^{⊤}. This can be proven using the fact directly above.
\det [a,b;c,d] = a db c `` = a d  c b `` = \det [a,c;b,d] \det [a,b;c,d] = \det [a,c;b,d]
3.10. Orthogonal matrices
Definition: A matrix M∈R^{n×n} is orthogonal iff M^{⊤} = M^{ 1}.
Fact: The columns of orthogonal matrices are always orthogonal unit vectors.
The rows of orthogonal matrices are always orthogonal unit vectors.
We can see this in the R^{2×2} case. For columns, we use the fact that
M^{⊤} ⋅ M = I:
a
b
c
d
^{⊤}
a
b
c
d
=
1
0
0
1
a
c
b
d
a
b
c
d
=
1
0
0
1
(a,c) ⋅ (a,c)
(a,c) ⋅ (b,d)
(b,d) ⋅ (a,c)
(b,d) ⋅ (b,d)
=
1
0
0
1
.
For rows, we use that M ⋅ M^{⊤} = I:
a
b
c
d
a
b
c
d
^{⊤}
=
1
0
0
1
a
b
c
d
a
c
b
d
=
1
0
0
1
(a,b) ⋅ (a,b)
(a,b) ⋅ (c,d)
(c,d) ⋅ (a,b)
(c,d) ⋅ (c,d)
=
1
0
0
1
.
Below, we provide a verifiable argument of the above fact for the R^{2×2} case.
[a,b;c,d] * [a,c;b,d] = [1,0;0,1] [a a + b b, a c + b d; c a + d b, c c + d d] = [1,0;0,1]
a a + b b = 1 [a;b] * [a;b] = 1 [a;b] = 1
`([a;b]) is a unit vector`
c c + d d = 1 [c;d] * [c;d] = 1 [c;d] = 1
`([c;d]) is a unit vector`
a c + b d = 0 [a;b] * [c;d] = 0
`([a;b]) and ([c;d]) are orthogonal`
Fact: Matrices representing rotations and reflections are orthogonal. We can show this for the general (counterclockwise) rotation matrix:
cos θ
sin θ
sin θ
cos θ
cos θ
sin θ
sin θ
cos θ
^{⊤}
=
cos θ
sin θ
sin θ
cos θ
cos θ
sin θ
sin θ
cos θ
=
cos ^{2} θ + sin ^{2} θ
cos θ sin θ + sin θ cos θ
sin θ cos θ + cos θ sin θ
sin ^{2} θ + cos ^{2} θ
=
1
0
0
1
.
Example: Suppose that the hour hand of an animated
12hour clock face is represented using a vector [ x ; y ]. To
maintain the time, the coordinates [ x ; y ] must be updated once
every hour by applying a matrix M to the vector representing
the hour hand. What is the matrix M?
Since the hour hand must be rotated by 360/12 = 30 degrees in the clockwise direction while
the rotation matrix represents counterclockwise rotation, we actually want to find the
rotation matrix for 360  (360/12) = 330 degrees.
Thus, M must be the rotation matrix for θ = 2π  (2π / 30), where θ is specified
in radians. Thus, we have:
cos (330/360 ⋅ 2π)
=
(√ 3)/2
sin (330/360 ⋅ 2π)
=
−1/2
M
=
cos (330/360 ⋅ 2π)
−sin (330/360 ⋅ 2π)
sin (330/360 ⋅ 2π)
cos (330/360 ⋅ 2π)
=
(√ 3)/2
1/2
−1/2
(√ 3)/2
Example: You are instructed to provide a simple algorithm for
drawing a spiral on the Cartesian plane. The spiral is obtained using
counterclockwise rotation, and for every 360 degree
turn of the spiral, the spiral arm's distance from the origin
should double. Provide a matrix M∈R^{2×2} that will
take any point [ x ; y ] on the spiral and provide the next point
[ x' ; y' ] after θ radians of rotation (your definition of
M should contain θ).
It is sufficient to multiply a rotation matrix by a scalar
matrix that scales a vector according to the angle of
rotation. If the angle of rotation is θ, then the
scale factor s should be such that:
s^{((2π) / θ)}
=
2
s
=
2^{1 / ((2π) / θ)}
s
=
2^{θ/(2π)}
In the above, if n = π/θ then s is the nth root of 2.
Thus M would be:
2^{θ/(2π)}
0
0
2^{θ/(2π)}
⋅
cos θ
−sin θ
sin θ
cos θ
=
2^{θ/(2π)} cos θ
− (2^{θ/(2π)}) sin θ
2^{θ/(2π)} sin θ
2^{θ/(2π)} cos θ
Fact: Orthogonal matrices are closed under multiplication. For orthogonal A, B∈R^{n×n} we have:
(AB)^{⊤}
=
B^{⊤}A^{⊤}
=
B^{ 1}A^{ 1}
=
(AB)^{ 1}.
Fact: Orthogonal matrices are closed under inversion and transposition. For orthogonal A∈R^{n×n} we have:
(A^{ 1})^{⊤}
=
(A^{⊤})^{ 1}.
Thus, we can show that both A^{ 1} and A^{⊤} are orthogonal.
(A^{ 1})^{⊤}
=
(A^{ 1})^{ 1}
(A^{⊤})^{⊤}
=
A.
(A^{⊤})^{⊤}
=
(A^{⊤})^{ 1}
=
A.
We can summarize this by adding another row to our table of matrix subsets.
subset of R^{n×n}
definition
closed under matrix multiplication
properties of matrix multiplication
inversion
orthogonal matrices
M^{⊤} = M^{ 1}
closed
associative, distributive with addition, have identity
nonzero members have inverses; closed under inversion
Example: Let A∈R^{2×2} be an orthogonal matrix. Compute AA^{⊤}A.
We recall that because A is orthogonal, A^{⊤} = A^{ 1}. Thus, we have that
AA^{⊤}A = (AA^{ 1}) A = IA = A.
Fact: Suppose the last row of a matrix M∈R^{n×n} consists of all zeroes. Show that M is not invertible.
Suppose M is invertible. Then M^{⊤} is invertible. But the last column of M is all zeroes, which means it is a trivial linear
combination of the other column vectors of M. Thus, we have a contradiction, so M cannot be invertible.
3.11. Matrix rank
Yet another way to characterize matrices is by considering their rank.
Definition: We can define a the rank of a matrix M, rank M, as the number of nonzero rows in rref M.
We will see in this course that rank M is related in many ways to the various characteristics of a matrix.
Fact: Matrix rank is preserved by elementary row operations. Why is this the case? Because rref is idempotent and rank is defined in
terms of it:
rref M
=
rref (rref M)
number of nonzero rows in rref M
=
number of nonzero rows in rref (rref M)
rank (M)
=
rank (rref M)
Fact: If M∈R^{n×n} is invertible, rank M = n. How can we prove this? Because all invertible matrices
have rref M = I, and I has no rows with all zeroes. If I∈R^{n×n}, then I has n nonzero rows.
Fact: For A∈R^{n×n}, rank A = rank (A^{⊤}).
We will not prove the above in this course in general, but typical proofs
involve first proving that for all A∈R^{n×n}, rank A ≤ rank (A^{⊤}). Why is this sufficient to complete the proof? Because we
can apply this fact for both A and A^{⊤} and use the fact that (A^{⊤})^{⊤} = A:
rank(A)
≤
rank(A^{⊤})
rank(A^{⊤})
≤
rank((A^{⊤})^{⊤})
rank(A^{⊤})
≤
rank(A)
rank(A^{⊤})
=
rank(A)
To get some intuition about the previous fact, we might consider the following question: if the columns of a matrix M∈R^{2×2}
are linearly independent, are the rows also linearly independent?
Fact: The only matrix in R^{n×n} that has no nonzero rows in reduced row echelon form is I.
Fact: If a matrix M∈R^{n×n} has rank n, then it must be that rref M = I (and, thus, M is invertible). This is derived from
the fact above.
Fact: Invertible matrices are closed under the transposition operation. In other words, if M is invertible, then M^{⊤} is invertible. How
can we prove this using the rank operator? We know that rank is preserved under transposition, so we have:
n
=
rank(A)
=
rank(A^{⊤})
=
n
Thus, the rank of rank(A^{⊤}) is n, so it is invertible.
The following table summarizes the important facts about the rank and rref operators.
rank (rref M) = rank M
rref (rref M ) = rref M
rank (M^{⊤}) = rank M
for M∈R^{n×n}, M is invertible iff rank M = n
3.12.Assignment #3: Matrix Properties and Operations
In this assignment you will perform stepbystep algebraic manipulations involving vector and matrix properties and operations.
You may not add new lines or premises above an "implies" logical operator. If you add any
new premises, you will recieve no credit.
Finish the argument below that shows that matrix multiplication of vectors in R^{3} preserves lines.
\forall M \in \R^(3 \times 3), \forall u,v,w \in \R^3, \forall s \in \R,
w = s(uv) + v
`(w) is on the line defined by (u) and (v)`
\implies
# ... put proof steps here ...
`(M w) is on the line defined by (M u) and (M v)`
Finish the argument below that shows that if multiplication by a matrix M maps three vectors to [0; 0; 0], it
maps any linear combination v' of those vectors to [0; 0; 0].
Extra credit: you may include the opposite direction of this proof for extra credit (build a new, separate argument in which
det [a,c; c,d] = 0 is found above the implies and `([a;c]) and ([b;d]) are linearly dependent`
is at the end of the argument below it).
In this problem, you will define explicit matrices that correspond to elementary row operations on matrices in R^{2×2}.
Find appropriate matrices A, B, C, and D in R^{2×2} to finish the following argument.
A * [a,b;c,d] = [a+c, b+d ; c, d] B * [a,b;c,d] = [a, b ; c+a, d+b] C * [a,b;c,d] = [t*a, t*b ; c, d] D * [a,b;c,d] = [a, b ; t*c, t*d]
Use the matrices A, B, C, and D from part (a) with matrix multiplication to construct a matrix E that can be
shown to satisfy the last line in the following argument.
Extra credit: the row operations defined by A, B, C, and D are all invertible as long as t ≠ 0;
prove this for t = 1 by showing that the matrices A, B, C, and D are invertible.
\forall A,B,C,D \in \R^(2 \times 2), \forall t \in \R,
t = 1 A = ? B = ? C = ? D = ?
\implies
# ... put proof steps here ...
`(A) is invertible` `(B) is invertible` `(C) is invertible` `(D) is invertible`
Review 1. Vector and Matrix Algebra and Applications
The following is a breakdown of what you should be able to do at this point in the course
(and of what you may be tested on in an exam).
Notice that many of the tasks below can be composed.
This also means that many problems can be solved in more than one way.
vectors
definitions and algebraic properties of scalar and vector operations (addition, multiplication, etc.)
vector properties and relationships between vectors
dot product of two vectors
norm of a vector
unit vectors
orthogonal projection of a vector onto another vector
orthogonal vectors
linear dependence of two vectors
linear independence of two vectors
linear combinations of vectors
linear independence of three vectors
lines and planes
line defined by a vector and the origin ([0; 0])
line defined by two vectors
line in R^{2} defined by a vector orthogonal to that line
plane in R^{3} defined by a vector orthogonal to that plane
matrices
algebraic properties of scalar and matrix multiplication and matrix addition
collections of matrices and their properties (e.g., invertibility, closure)
identity matrix
elementary matrices
scalar matrices
diagonal matrices
upper and lower triangular matrices
matrices in reduced row echcelon form
determinant of a matrix in R^{2×2}
inverse of a matrix and invertible matrices
other matrix operations and properties
determine whether a matrix is invertible
using the determinant for matrices in R^{2×2}
using facts about rref for matrices in R^{n×n}
algebraic properties of matrix inverses with respect to matrix multiplication
transpose of a matrix
algebraic properties of transposed matrices with respect to matrix addition, multiplication, and inversion
matrix rank
matrices in applications
solve an equation of the form LU = w
matrices and systems of states
interpret partial observations of system states as vectors
interpret relationships betweem dimensions in a system of states as a matrix
given a partial description of a system state and a matrix of relationships, find the full description of the system state
interpret system state transitions/transformations over time as matrices
population growth/distributions over time
compute the system state after a specifieds amount of time
find the fixed point of a transition matrix
Below is a comprehensive collection of review problems going over the course material covered until this point. These problems are an
accurate representation of the kinds of problems you may see on an exam.
Example: Find any h∈R such that the following two vectors are linearly independent.
5
5
2
,
20
20
h
There are many ways to solve this problem. One way is to use the definition of linear dependence and find an h that does not satisfy it.
s
5
5
2
=
20
20
h
Then, we have that any h such that h ≠ 8 is sufficient to contradict linear dependence (and, thus, imply linear independence):
s ⋅ 5
=
20
s
=
4
s ⋅ (2)
=
h
8
=
h
Another solution is to recall that orthogonality implies linear independence. Thus, it is sufficient to find h such that
the two vectors are orthogonal.
5
5
2
⋅
20
20
h
=
0
This implies h = 100.
5(20) + (5)(20) + (2)h
=
0
h
=
100
Example: Suppose we have a matrix M such that the following three equations are true:
M
1
0
0
=
3
2
M
0
1
0
=
3
0
M
0
0
1
=
3
1
Compute the following:
M
2
1
1
=
?
We should recall that multiplying a matrix by a canonical unit vector with 1 in the ith row in the vector gives us the ith column of the matrix.
Thus, we can immediately infer that:
M
=
3
3
3
2
0
1
Thus, we have:
M ⋅
2
1
1
=
(3,3,3) ⋅ (2,1,1)
(2,0,1) ⋅ (2,1,1)
=
12
5
Example: List at least three properties of the following matrix:
0
1
0
1
0
0
0
0
1
The matrix has many properties, such as:
it is an elementary matrix
it is an invertible matrix
it is an orthogonal matrix
it is a symmetric matrix
it has rank n
its reduced row echelon form is the identity
Example: Find the matrix B∈R^{2×2} that is symmetric, has a constant diagonal
(i.e., all entries on the diagonal are the same real number), and satisfies the following
equation:
B
2
1
=
15
6
We know that B is symmetric and has a constant diagonal, so we need to solve for a and b in:
a
b
b
a
⋅
2
1
=
15
6
Example: Compute the inverse of the following matrix:
2
3
1
2
One approach is to set up the following equation and solve for a,b,c, and d:
a
b
c
d
⋅
2
3
1
2
=
1
0
0
1
Another approach is to apply the formula for the inverse of a matrix in R^{2×2}:
2
3
1
2
^{ 1}
=
(1 / (2 ⋅ 2  3 ⋅ 1)) ⋅
2
−3
−1
2
=
2
−3
−1
2
Example: Let a∈R be such that a ≠ 0. Compute the inverse of the following matrix:
a
a
a
a
As in the previous problem, we can either solve an equation or apply the formula:
a
a
a
a
^{ 1}
=
(1 / (a^{2}  a^{2})) ⋅
a
a
a
a
Example: Suppose x∈R is such that x ≠ 0. Compute the inverse of the following matrix:
x
0
2x
0
x
0
0
0
4x
Because we have an upper triangular matrix, computing the inverse by solving the following equation is fairly efficient. Start by considering
the bottom row and its product with each of the columns. This will generate the values for g, h, and i. You can then proceed to the other
rows.
x
0
2x
0
x
0
0
0
4x
a
b
c
d
e
f
g
h
i
=
1
0
0
0
1
0
0
0
1
The solution is:
1/x
0
1/(2x)
0
1/x
0
0
0
1/4x
Example: Assume a matrix M∈R^{2×2} is symmetric, has constant diagonal, and all its entries are nonzero. Show that A cannot
be an orthogonal matrix.
Let a ≠ 0 and b ≠ 0, and let us define:
M
=
a
b
b
a
.
Suppose that M is an orthogonal matrix; then, we have that:
M^{ 1}
=
M^{⊤}
M ⋅ M^{⊤}
=
I
a
b
b
a
⋅
a
b
b
a
=
1
0
0
1
.
Thus, we have ba + ab = 0, so 2ab = 0. This implies that either a = 0 or b = 0. But this contradicts the assumptions, so M cannot be
orthogonal.
Example: Suppose the Earth is located at the origin and your spaceship is in space at the location corresponding to the vector
[ 5 ; 4; 2 ]. Earth is sending transmissions along the vector [ √(3)/3 ; √(3)/3 ; √(3)/3 ]. What is the shortest distance
your spaceship must travel in order to hear a transmission from Earth?
By the triangle inequality, for any vector v∈R^{3}, the closest point to v on a line is the orthogonal projection
of v onto that line. We need to find the distance from the spaceship's current position to that point.
We first compute the orthogonal projection of [ 5 ; 4; 2 ] onto the line L
defined as follows:
L
=
{ a ⋅
√(3)/3
√(3)/3
√(3)/3
 a∈R}
We can compute the orthogonal projection using the formula for an orthogonal projection of one vector onto another.
Notice that the vector specifying the direction of transmission is already a unit vector:

√(3)/3
√(3)/3
√(3)/3

=
1
5
4
2
⋅
√(3)/3
√(3)/3
√(3)/3
=
√(3)/3
(v ⋅ u/u) ⋅ u/u
=
√(3)/3 ⋅
√(3)/3
√(3)/3
√(3)/3
=
1/3
1/3
1/3
Now, we must compute the distance between the destination above and the spaceship's current position:

1/3
1/3
1/3

5
4
2

=

16/3
11/3
5/3

=
√((16/3)^{2} + (11/3)^{2} + (5/3)^{2})
=
√(256/9 + 121/9 + 25/9)
=
√(402)/3
Thus, the shortest distance is √(402)/3.
Example: Two communication towers located at v∈R^{2} and u∈R^{2} are sending directed signals to each other (it is only possible to hear
the signal when in its direct path). You are at the origin and want to travel the shortest possible distance to
intercept their signals. What vector specifies the distance and direction you must travel to intercept their signals?
We can solve this problem by recognizing that the closest point to the origin on the line along which the signals can be intercepted is the vector that is orthogonal to the line (i.e., it is the orthogonal projection of the origin onto the
line). The definition of the line is:
L
=
{ a ⋅ (u  v) + v  a∈R}
Thus, the point p∈R^{2} to which we want to travel must be both on the line and orthogonal to the
line (i.e., its slope must be orthogonal to any vector that represents the slope of the line, such as u  v). In other words, it must satisfy the following two equations:
p
=
a ⋅ (u  v) + v
p ⋅ (u  v)
=
0
Thus, it is sufficient to solve the above system of equations for p.
4. Vector Spaces
We are interested in studying sets of vectors because they can be used to model sets of system states, observations
and data that might be obtained about systems, geometric shapes and regions, and so on. We can then represent realworld problems
(e.g., given some observations, what is the actual system state) as equations of the form M ⋅ v = w, and sets of vectors as
the collections of possible solutions to those equations. But what is exactly the set of possible solutions to M ⋅ v = w?
Can we characterize it precisely? Can we define a succinct notation for it? Can we say anything about it beyond simply solving the equation?
Does this tell us anything about our system?
In some cases, it may make more sense to consider only a finite set of system states, or an infinite set of discrete states (i.e., only
vectors that contain integer components); for example, this occurs if vectors are used to represent the number of atoms, molecules, cows, chickens,
power plants, single family homes, and so on. However, in this course, we make the assumption that our our sets
of system states (our models of systems) are infinite and continuous (i.e., not finite and not discrete); in this context, this simply
means that the entries in the vectors we use to represent system states are real numbers.
Notice that the assumption of continuity means that, for example, if we are looking for a particular state (e.g., a state corresponding
to some set of observations), we allow the possibility that the state we find will not correspond exactly to a state that "makes sense". Consider
the example problem involving the barn of cows and chickens. Suppose we observe 4 heads and 9 legs. We use the matrix that
represents the relationships between the various dimensions of the system to find the number of cows and chickens:
1
1
2
4
⋅
xchickens
ycows
=
4 heads
9 legs
x
=
3.5
y
=
0.5
Notice that the solution above is not an integer solution; yet, it is a solution to the equation we introduced because the set of system states
we are allowing in our solution space (the model of the system) contains all vectors in R^{2}, not just those with integer entries.
As we did with vectors and matrices, we introduce a succinct language (consisting of symbols, operators, predicates, terms, and formulas) for
infinite, continuous sets of vectors. And, as with vectors and matrices, we study the algebraic laws that govern these symbolic expressions.
4.1. Sets of vectors and their notation
We will consider three kinds of sets of vectors in this course; they are listed in the table below.
kind of set (of vectors)
maximum cardinality ("quantity of elements")
solution space of a...
examples
finite set of vectors
finite
{(0,0)}
{(2,3),(4,5),(0,1)}
vector space
infinite
homogenous system of linear equations: M ⋅ v = 0
{(0,0)}
R
R^{2}
span{(1,2),(2,3),(0,1)}
any point, line, or plane intersecting the origin
affine space
infinite
nonhomogenous system of linear equations: M ⋅ v = w
{ a + v  v∈V} where V is a vector space and a is a vector
any point, line, or plane
To represent finite sets of vectors symbolically, we adopt the convention of simply listing the vectors between a pair of braces (as with any
set of objects). However, we need a different convention for symbolically representing vector spaces and affine spaces. This is because we
must use a symbol of finite size to represent a vector space or affine space that may contain an infinite number of vectors.
If the solution spaces to equations of the form M ⋅ v = w are infinite, continuous sets of vectors, in what way can be characterize them?
Suppose that M∈R^{2×2} and that is M invertible. Then we have that:
M
=
a
b
c
d
w
=
s
t
M ⋅ v
=
w
a
b
c
d
⋅ v
=
s
t
v
=
(1/(adbc)) ⋅
d
−b
−c
a
⋅
s
t
v
=
(s/(ad−bc)) ⋅
d
−c
+ (t/(ad−bc)) ⋅
−b
a
Notice that the set of possible solutions v is a linear combination of two vectors in R^{2}. In fact, if a collection of solutions
(i.e., vectors) to the equation M ⋅ v = 0 exists, it must be a set of linear combinations (in the more general case of
M ⋅ v = w, it is a set of linear combinations with some specified offset). Thus, we introduce a succinct notation
for a collection of linear combinations of vectors.
Definition: A span of a set of vectors { v_{1}, ..., v_{n} } is the set of all linear combinations of vectors in { v_{1}, ..., v_{n} }:
Example: For each of the following spans, expand the notation into the equivalent set comprehension and determine if the set
of vectors is a point, a line, a plane, or a threedimensional space.
The set defined by:
span {
1
2
}
=
{a ⋅
1
2
 a∈R}
The above span is a line.
The set defined by:
span {
1
2
,
−2
4
}
=
span {
1
−2
}
=
{a ⋅
1
−2
 a∈R}
The above span is a line.
The set defined by:
span {
0
1
,
1
0
,
1
1
}
=
span {
0
1
,
1
0
}
=
{a ⋅
1
0
+ b ⋅
0
1
 a,b∈R}
=
R^{2}
The above span is a plane.
The set defined by:
span {
0
0
}
=
{
0
0
}
The above span is a set containing a single point (the origin).
The set defined by:
span {
0
1
0
,
1
0
0
,
0
0
1
}
=
{a ⋅
0
1
0
+ b ⋅
1
0
0
+ c ⋅
0
0
1
 a,b,c∈R}
=
R^{3}
The above span is a threedimensional space.
Example: Using span notation, describe the set of vectors that are orthogonal to the vector [ 1 ; 2 ] ∈R^{2}.
We know that the set of vectors orthogonal to [ 1 ; 2 ] can be defined as the line L where:
L
=
{
x
y

x
y
⋅
1
2
= 0 }
It suffices to find a vector on the the line L. The equation for the line is:
x
y
⋅
1
2
=
0
1 ⋅ x + 2 ⋅ y
=
0
y
=
(1/2) x
We can choose any point on the line y = (1/2) x; for example, [ 2 ; 1 ]. Then, we have that:
L
=
span {
2
1
}
Example: Using span notation, describe the set of solutions to the following matrix equation:
1
2
2
4
⋅
x
y
=
0
0
Notice that the above equation implies two equations:
1
2
⋅
x
y
=
0
2
4
⋅
x
y
=
0
In fact, the second equation provides no additional information because 2 ⋅ [ 1 ; 2 ] = [ 2 ; 4 ]. Thus, the solution space
is the set of vectors:
L
=
{
x
y

1
2
⋅
x
y
= 0 }
We can now use our solution to the previous problem to find a span notation for the solution space:
L
=
span {
2
1
}
Example: Suppose also that system states are described by vectors v∈R^{3} with the following units for each dimension:
v
=
x# carbon atoms
y# hydrogen atoms
z# oxygen atoms
Individual molecules are characterized using the following vectors:
C_{3}H_{8}:
3
8
0
, O_{2}:
0
0
2
, CO_{2}:
1
0
2
, H_{2}O:
0
2
1
Suppose that a mixture contains only molecules of water (H_{2}O) and carbon dioxide (CO_{2}).
Using the span notation, specify the possible set of system states that satisfy these criteria.
The possible set of system states for the mixture is:
V
=
span {
1
0
2
,
0
2
1
}
Recall that the above set is continuous. In this particular problem, only vectors with integer components
would be of interest; thus, the above set is at least guaranteed to contain all possible system states that satisfy
the specified criteria.
Then, if we wanted to solve a more constrained problem that satisfies the above criteria, we would know that we should only
consider vectors v∈V as possible solutions.
Suppose we observe the following system state (in terms of the number of each kind of atom):
v
=
100
400
400
How can we determine whether the above mixture could contain only H_{2}0 and CO_{2}?
Suppose we want to know if a mixture of only C_{3}H_{8} and O_{2} can ever react to produce a mixture of only CO_{2} and H_{2}O.
How can we represent this as an equation of spans? This problem can be represented as follows:
Definition: A vector space is a set of vectors that contains 0, is closed under vector addition and scalar multiplication, and is
such that all the elements in the set satisfy the vector space axioms governing vector addition and scalar multiplication.
Fact: Any set of linear combinations of a collection of vectors is closed under vector addition and scalar multiplication, contains
0, and satisfies the vector space axioms. In other words, for any collection of vectors v_{1}, ..., v_{n}, span{v_{1}, ..., v_{n}} is a vector space.
Fact: For any set of vectors V ⊂ R^{n} that satisfies the vector space axioms, there exists a finite set of at most n vectors
v_{1}, ..., v_{k} (where k ≤ n) such that span{v_{1}, ..., v_{k}} = V.
Given the two facts above, we can safely adopt span{v_{1}, ..., v_{n}} as a standard notation for vector spaces. In addition to this notation,
we will also often use the notation R^{n} for specific values of n (e.g., R^{2} is equivalent to span{[1;0],[0;1]}), and {0} for specific
vector 0 (e.g., {[0;0]} is equivalent to span{[0;0]}).
4.2. Membership and equality relations involving sets of vectors
Recall that many of the algebraic laws we saw governing operators on vectors and matrices involved equality of vectors and matrices. In fact, one
can view these laws as collectively defining the semantic equality of the symbols (i.e., they specify when two symbols refer to the same
object).
The meaning of symbols is closely tied to the equality relation we define over them. In the case of the span notation, one potential
problem is that there is more than one way to describe a set using span notation. For example:
span {
2
3
}
=
span {
4
6
}
span {
1
0
,
0
1
}
=
span {
2
1
,
0
1
}
How can we determine if two spans are equivalent? We must define the equality relation (i.e., the relational operator) that applies
to sets of vectors (including infinite sets containing all linear combinations of vectors). We first recall the definition of equality
for our vector notation.
Fact: Recall that two vectors v, w∈R^{n} are equal if and only if their components are equivalent:
a_{1}
a_{2}
⋮
a_{n}
=
b_{1}
b_{2}
⋮
b_{n}
iff
a_{1} = b_{1} and a_{2} = b_{2} and ... and a_{n} = b_{n}
With the equality of pairs of vectors defined, it is possible to provide a definition of equality for sets of vectors (both finite and infinite).
However, we will build the definition gradually. First, let us define when a vector is a member of a finite set of vectors using
vector equality.
Fact: A vector v∈R^{n} is a member of the finite set of vectors { w_{1}, ..., w_{k} } ⊂ R^{n} if it is equivalent to one of the
vectors in the set:
v∈ { w_{1}, ..., w_{k} }
iff
∃ w∈ { w_{1}, ..., w_{k} } s.t. v = w
Notice that the above defines membership of v in the set using an equation (in this case, v = w_{i} where i∈ {1,...,k}). We can
take the same approach in the case of an infinite set of vectors.
Fact: A vector v∈R^{n} is a member of the set of vectors span { w_{1}, ..., w_{k} } ⊂ R^{n} if it is equivalent to one of the
vectors in the set:
Thus, we can determine membership of a vector in a span by solving a matrix equation of the form M ⋅ u = v.
Example: Determine whether the following formula is true:
1
6
∈
span {
1
4
,
1
1
}
We proceed as in the fact above:
1
6
∈ span {
1
4
,
1
1
}
iff
∃ w∈ span {
1
4
,
1
1
} s.t.
1
6
= w
iff
∃ w∈ { a ⋅
1
4
+ b ⋅
1
1
 a∈R, b∈R } s.t.
1
6
= w
iff
∃ a ⋅
1
4
+ b ⋅
1
1
∈ span {
1
4
,
1
1
} s.t.
1
6
= a ⋅
1
4
+ b ⋅
1
1
iff
∃ a∈R, b∈R s.t.
1
6
= a ⋅
1
4
+ b ⋅
1
1
iff
∃
a
b
∈R^{2} s.t.
1
1
4
1
⋅
a
b
=
1
6
Since a solution to the matrix equation exists, the formula is true:
1
1
4
1
⋅
a
b
=
1
6
a
=
1
b
=
2
Example: Solve each of the following problems.
Determine whether the following formula is true or false:
2
5
1
∈ span {
4
0
2
,
0
1
0
}
Define the following line using span notation:
L
=
{
x
y
 y = 3 ⋅ x }
Now suppose we want to determine if a finite set of vectors is a subset of a span. Let us first consider how we would check if a
finite set of vectors is a subset of another finite set of vectors.
Fact: A finite set of vectors { v_{1}, ..., v_{j} } ⊂ R^{n} is a subset of the finite set of vectors { w_{1}, ..., w_{k} } ⊂ R^{n} if every
member of { v_{1}, ..., v_{j} } is a member of { w_{1}, ..., w_{k} }:
{ v_{1}, ..., v_{j} } ⊂ { w_{1}, ..., w_{k} }
iff
∀ v∈ { v_{1}, ..., v_{j} }, ∃ w∈ { w_{1}, ..., w_{k} } s.t. v = w
Thus, we can generalize the above by replacing the second finite set of vectors with a span.
Fact: A finite set of vectors { v_{1}, ..., v_{j} } ⊂ R^{n} is a subset of the set of vectors span { w_{1}, ..., w_{k} } ⊂ R^{n} if every
member of { v_{1}, ..., v_{j} } is a member of span { w_{1}, ..., w_{k} }:
Definition: Using facts about sets,
we can then specify the following definition of equality between two spans for two finite sets of vectors V ⊂ R^{n} and
W ⊂ R^{n}:
span W = span V
iff
span W ⊂ span V and span V ⊂ span W
Example: Determine whether the following formula is true:
span {
1
2
,
2
1
}
∈
span {
9
3
,
3
1
,
6
2
}
It suffices to set up two matrix equations and determine if solutions A∈R^{2 × 3} and B∈R^{3 × 2}
exist:
1
2
2
1
⋅ A
=
9
3
6
3
1
−2
9
3
6
3
1
−2
⋅ B
=
1
2
2
1
Alternatively, we can check each vector individually (this is analogous to considering the above equations for A and B one column at a time).
Is
1
2
a linear combination of
9
3
,
3
1
, and
6
2
?
Is
2
1
a linear combination of
9
3
,
3
1
, and
6
2
?
Is
9
3
a linear combination of
1
2
and
2
1
?
Is
3
1
a linear combination of
1
2
and
2
1
?
Is
6
2
a linear combination of
1
2
and
2
1
?
If all of the above are true, then the two spans are equivalent. If any of the above is false, then the two spans are not equivalent.
The above implies that a matrix can be used to represent a particular vector space. We will see in other sections
further below that a given vector space can be represented in more than one way by a matrix, and that more than one matrix can be used
to represent a vector space.
Definition: Given two vector spaces W and V, we have:
W is a (vector) subspace of V iff V ⊂ W.
4.3. Vector spaces as abstract structures
Our definitions of a vector space so far have explicitly referenced sets of concrete vectors as we usually understand them. However, this is not
necessary.
Definition: A vector space is a set of objects X that satisfies the following conditions:
addition ⊕: X × X→X is an operation on elements of X such that:
X is closed under ⊕, so for any x,y∈X:
x ⊕ y
∈
X
⊕ is commutative and associative; for any x,y,z∈X, we have:
x ⊕ y
=
y ⊕ x
(x ⊕ y) ⊕ z
=
x ⊕ (y ⊕ z)
there is a unique additive identity 0∈X for the operation ⊕ where for any x∈X:
x ⊕ 0
=
x
0 ⊕ x
=
x
every element x∈X has an additive inverse x where:
x ⊕ (x)
=
0
(x) ⊕ x
=
0
scalar multiplication ⊗: R × X→X is an operation on elements of X such that:
X is closed under ⊗, so for any s∈R, x∈X:
s ⊗ x
∈
X
1 ∈R is an identity with ⊗, so for any x∈X:
1 ⊗ x
=
x
⊗ is associative, so for any s, t∈R, x∈X:
s ⊗ (t ⊗ x)
=
(s ⊗ t) ⊗ x
⊗ distributes across ⊕, so for any x,y∈X:
s ⊗ (x ⊕ y)
=
s ⊗ x ⊕ s ⊗ y
The above definition specifies conditions under which any set of objects S can be studied as a vector space.
We have encountered many sets that satisfy the above properties (thus making them vector spaces); below is a table of vector spaces,
including vector spaces we have already encountered and vector spaces we will encounter later in the course.
vector space
addition operation
additive identity
scalar multiplication operation
R
addition of real numbers
0
multiplication of real numbers
R^{2}
vector addition:
[ a ; b ] + [ c ; d ] = [ a + c ; b + d ]
[ 0 ; 0 ]
scalar multiplication: s ⋅ [ a ; b ] = [ s ⋅ a ; s ⋅ b ]
R^{3}
vector addition:
[ a ; b ; c ] + [ d ; e ; f ] = [ a + d ; b + e ; c + f ]
[ 0 ; 0 ; 0]
scalar multiplication: s ⋅ [ a ; b ; c ] = [ s ⋅ a ; s ⋅ b ; s ⋅ c ]
scalar multiplication: s ⋅ [ a_{1} ; ... ; a_{n} ] = [ s ... a_{1} ; ... ; s ⋅ a_{n} ]
span { [0;0] } = { [0;0] }
vector addition
[0;0]
scalar multiplication of vectors in R^{2}
span { v_{1} , ... , v_{k} } ⊂ R^{n}
vector addition
[ 0 ; ... ; 0 ]
scalar multiplication of vectors in R^{n}
R^{2×2}
matrix addition
[ 0 , 0 ; 0 , 0 ]
scalar multiplication of a matrix
R^{n×n}
matrix addition
[ 0, ..., 0 ; ... ; 0, ..., 0 ]
scalar multiplication of a matrix
affine space with origin at a∈R^{2}
v ⊕ w = (v  a) + (w  a) + a
a
s ⊗ v = s ⋅ (v  a) + a
set of lines through the origin f(x) = ax
f ⊕ g = h where h(x) = f(x) + g(x)
f(x) = 0 ⋅ x
s ⊗ f = h where h(x) = s × f(x)
set of polynomials of degree 2 f(x) = ax^{2} + bx + c
f ⊕ g = h where h(x) = f(x) + g(x)
f(x) = 0
s ⊗ f = h where h(x) = s × f(x)
set of polynomials of degree k f(x) = a_{k}x^{k} + ... + a_{0}
f ⊕ g = h where h(x) = f(x) + g(x)
f(x) = 0
s ⊗ f = h where h(x) = s × f(x)
Example: The affine space A = {a + v  v∈V} is a vector space for appropriate definitions of addition, scalar multiplication, and identity.
addition (⊕) can be defined as follows. It is an operation on elements of A under which A is closed, and which satisfies the vector space axioms:
v ⊕ w = u where u = (v  a) + (w  a) + a = v + w  2a + a = v + w  a.
scalar multiplication (⊗) can be defined as follows. It is an operation on elements of A under which A is closed,
and which satisfies the vector space axioms:
s ⊗ v = u where u = (s ⋅ (v  a)) + a = sv  sa + a = sv + (1s)a.
there is a unique additive identity in A; it is the vector a:
v ⊕ a = v + a  a = v.
An abstract definition of vector spaces is useful because it also allows
us to study by analogy and learn about other objects that are not necessarily sets of vectors. All the properties we can derive about vector
spaces using the above definition will apply to other sets of objects that also satisfy the above definition.
Example: We consider the set of functions F = {f  f(x) = cx, c∈R}. How do we show that F is a vector
space?
there is a unique additive identity in F: f(x) = 0x
addition (+) is an operation on elements of F under which F is closed, and which satisfies the vector space axioms:
f + g = h where h(x) = f(x) + g(x)
scalar multiplication (⋅) is an operation on elements of F under which F is closed, and which satisfies the vector space axioms:
s ⋅ f = h where h(x) = s ⋅ f(x)
Example: Show that F = {f  f(x) = bx^{2} + cx, b,c∈R} is a vector space.
For our purposes, it is sufficient to show that there is an additive identity, that the set is closed under addition, and that the set is closed under
scalar multiplication. There is an additive identity:
f(x) = 0 ⋅ x^{2} + 0 ⋅ x
The set is closed under the usual addition operation on polynomial curves; for any f, g∈F we have f + g = h where:
f(x)
=
bx^{2} + cx
g(x)
=
b'x^{2} + c'x
f(x) + g(x)
=
(bx^{2} + cx) + (b'x^{2} + c'x)
=
(b + b') ⋅ x^{2} + (c + c') ⋅ x
h(x)
=
(b + b') ⋅ x^{2} + (c + c') ⋅ x
h
∈
F
For any f∈F and s∈R we have h = s ⋅ f where:
f(x)
=
bx^{2} + cx
s ⋅ f(x)
=
s ⋅ (bx^{2} + cx)
s ⋅ f(x)
=
(sb)x^{2} + (sc)x
h(x)
=
(sb)x^{2} + (sc)x
h
∈
F
Example: Find two elements in F = {f  f(x) = bx^{2} + cx + d, b,c,d∈R} that are linearly independent. What does it mean for
two functions to be linearly independent?
Two vectors g, h∈ \F are linearly independent if they are not linearly dependent. This means there exists no scalar such that
s ⋅ g = h or s ⋅ h = g. One example of such a pair would be:
g(x)
=
x^{2}
h(x)
=
1
There is no single scalar by which we can multiply the curve h(x) = 1 to obtain a parabola, and there is no single scalar by which we can
multiply the parabola to obtain a nonzero flat line.
Example: What is a finite set of functions that spans {f  f(x) = bx^{2} + cx + d, b,c,d∈R}? One such set is:
{f,g,h} where f(x) = 1, g(x) = x, h(x) = x^{2}
Example: We consider the set of functions {f  f(x) = x^{k}, k∈R}. The following definitions of addition and scalar multiplication
make this set of functions a vector space.
there is a unique additive identity: f(x) = x^{0}
addition (+) is defined as:
f + g = h where h(x) = f(x) ⋅ g(x)
scalar multiplication (⋅) is an operation on elements of S under which S is closed, and which satisfies the vector space axioms:
s ⋅ f = h where h(x) = f(x)^{s}
Fact: Once we start thinking of curves as a vector space, we can rewrite curvefitting problems as equations involving linear combinations of
curves. For example, suppose we have some curve φ for which we do not know the formula...
φ(x) = ...
...that represents some observable phenomenon. We can hypothesize that
the curve φ(x) is a linear combination of simpler curves, such as f(x) = x and g(x) = x^{2}. This is equivalent to
hypothesizing that φ is in the following vector space:
span {f, g} = { f  f(x) = ax^{2} + bx where a,b∈R }
We can then set up the following
equation:
a ⋅ f + b ⋅ g
=
φ
However, the above equation cannot directly be solved for a,b∈R because we do not know the formula for φ. Suppose we can sample
φ on some finite collection of input values { x_{1},...,x_{m} } ⊂ R.
We can then write an equation that approximates the above equation at those
points:
a ⋅
f(x_{1})
⋮
f(x_{m})
+ b ⋅
g(x_{1})
⋮
g(x_{m})
=
φ(x_{1})
⋮
φ(x_{m})
The above is equivalent to the following system of equations:
a ⋅ f(x_{1}) + b ⋅ g(x_{1})
=
φ(x_{1})
⋮
a ⋅ f(x_{m}) + b ⋅ g(x_{m})
=
φ(x_{m})
It can also be rewritten as the following matrix equation:
f(x_{1})
g(x_{1})
⋮
f(x_{m})
g(x_{m})
⋅
a
b
=
φ(x_{1})
⋮
φ(x_{m})
Notice that because we know f and g exactly, the matrix above has all constant entries. Furthermore, the righthand side of the equation also
consists of a vector with all constants as entries because we assumed we can sample φ on those inputs. Thus, we now have a matrix equation of
the form M ⋅ v = w.
We can now use rref computation or, if the matrix is invertible, any other method for computing the matrix inverse to solve
the above equation for a,b∈R. This would give us the exact coefficients of a curve in span {f, g} that fits φ at the x values { x_{1},...,x_{m} }.
Fact: For any two natural numbers k, m∈N where k ≠ m, the curves f and g defined below are linearly independent:
f(x) = x^{k}
g(x) = x^{m}
Corollary: For any natural number k∈N, the curves f and g defined below are linearly independent:
f(x) = x^{k}
g(x) = x^{k+1}
We can see this by choosing any set of real number inputs { x_{1},...,x_{k+2} }; the equation below then has no solution s∈R:
f
=
s ⋅ g
f(x_{1})
⋮
f(x_{k + 2})
=
s ⋅
g(x_{1})
⋮
g(x_{k + 2})
Since no such s exists, f and g must be linearly independent.
Example: Suppose we have the following two curves:
f(x) = x
g(x) = x^{2}
If we choose any three inputs, e.g., {1,2,3}, we will find that g cannot be a linear combination of f because the three points will not be collinear.
f
=
s ⋅ g
f(1)
f(2)
f(3)
=
s ⋅
g(1)
g(2)
g(3)
1
2
3
=
s ⋅
1
4
9
s
=
1
s
=
1/2
1
=
1/2
Since we derived a contradiction above, no such s exists, so f and g must be linearly independent.
Fact: For any collection of exactly k points in R^{2}:
{
x_{1}
y_{1}
, ...,
x_{k}
y_{k}
}
there exists a polynomial f(x) = a_{k1}x^{k1} + ... + a_{1}x + a_{0} that fits those points exactly.
We know the above must be true because the columns of the following matrix in R^{k × k} must be linearly independent:
(x_{1})^{k1}
...
(x_{1})
1
⋮
⋮
⋮
(x_{k})^{k1}
...
(x_{k})
1
This means that the equation below must have a unique solution:
(x_{1})^{k1}
...
(x_{1})
1
⋮
⋮
⋮
(x_{k})^{k1}
...
(x_{k})
1
⋅
a_{k1}
⋮
a_{0}
=
y_{1}
⋮
y_{k}
Example: Just as with any vector space, we can set up and solve problems involving linear combinations of vectors. Suppose we want to find a
function in the space {f  f(x) = ax^{3} + bx^{2} + cx + d, a,b,c,d∈R} that fits a certain collection of points, such as:
{(1,3), (1,13), (2,1), (2,33)}
We can interpret each point as a pair (x, f(x)). We can then set up the following system of equations:
f(1)
=
3
f(1)
=
13
f(2)
=
1
f(2)
=
33
Expanding, we have:
a(1)^{3} + b(1)^{2} + c(1) + d
=
3
a(1)^{3} + b(1)^{2} + c(1) + d
=
13
a(2)^{3} + b(2)^{2} + c(2) + d
=
1
a(2)^{3} + b(2)^{2} + c(1) + d
=
33
Notice that we can rewrite this in the form of an equation Mv = w:
(1)^{3}
(1)^{2}
(1)
1
(1)^{3}
(1)^{2}
(1)
1
(2)^{3}
(2)^{2}
(2)
1
(2)^{3}
(2)^{2}
(2)
1
⋅
a
b
c
d
=
3
13
1
33
This shows us that we can interpret individual objects in {f  f(x) = ax^{3} + bx^{2} + cx + d, a,b,c,d∈R} as vectors in R^{4}.
We can also view this problem in terms of systems, system states, and observations. If the function f(x) = ax^{3} + bx^{2} + cx + d that solves
the above system is a state of the system, then (3, 13, 1, 33) is a partial observation of that state along four distinct "dimensions".
Note that finding the equation for a line that crosses two points is just a special case of the above.
Example: We want to find a function f(x) = cx + d such that the points (2,3) and (5,4) fall on the line the equation represents. Thus,
we have:
(2)
1
(5)
1
⋅
c
d
=
3
4
We can consider many spaces of functions beyond spaces of polynomial functions. For example, suppose exp(x) = e^{x}.
Then we can define the following vector space:
span{ exp, cos , sin }.
We can also generalize this approach to families of related functions.
Example: Consider the following set of vectors of functions, where f' denotes the derivative of a function f:
F = { (f, f, f')  f(x) = bx^{2} + cx + d }
The set F is a vector space. Now, suppose we have the following problem. Find the function for a parabola in F such that
(1,4) and (2,3) lie on the parabola, and the maximum or minimum point on the parabola is at x = 1. Such a function is
represented by the solution to the following equation:
(1)^{2}
(1)
1
(2)^{2}
(2)
1
2(1)
1
0
⋅
b
c
d
=
4
3
0
Note that the number of rows in the matrix (and result vector) corresponds to the number of data points for which
a model is being sought, and is not bounded.
Example: Find the order 5 polynomial that exactly fits the following points in R^{2}:
1
45
,
8
4
,
6
5
,
2
8
,
10
8
,
15
6
We must first compute the vectors that approximate each of the following curves that span our space of possible polynomials
at each of the x components of the points above:
Thus, we would have the following approximation of the above linear combination at the specific x values in { 1, 8, 6, 2, 10, 15 }:
a_{5}
f(1)
f(8)
f(6)
f(2)
f(10)
f(15)
+
a_{4}
g(1)
g(8)
g(6)
g(2)
g(10)
g(15)
+
a_{3}
h(1)
h(8)
h(6)
h(2)
h(10)
h(15)
+
a_{2}
i(1)
i(8)
i(6)
i(2)
i(10)
i(15)
+
a_{1}
j(1)
j(8)
j(6)
j(2)
j(10)
j(15)
+
a_{0}
l(1)
l(8)
l(6)
l(2)
l(10)
l(15)
=
45
4
5
8
8
6
We can convert the above into the following matrix equation:
f(1)
g(1)
h(1)
i(1),
j(1)
l(1)
f(8)
g(8)
h(8)
i(8),
j(8)
l(8)
f(6)
g(6)
h(6)
i(6),
j(6)
l(6)
f(2)
g(2)
h(2)
i(2),
j(2)
l(2)
f(10)
g(10)
h(10)
i(10)
j(10)
l(10)
f(15)
g(15)
h(15)
i(15)
j(15)
l(15)
⋅
a_{5}
a_{4}
a_{3}
a_{2}
a_{1}
a_{0}
=
45
4
5
8
8
6
It is then sufficient to solve the above equation to obtain the ceofficients, and thus the degree 5 polynomial that exactly fits the points
above.
The disadvantage to using the approach presented in this subsection to fitting functions to data points is that the data must
match an actual function perfectly (or, equivalently, the space of functions being considered must be rich
enough to contain a function that can fit the data points exactly). We will see further below
how to find functions that have the "best" fit (for one particular definition of "best") without necessarily matching the points exactly.
Example: Let polynomials in F = {f  f(t) = at^{2} + bt + c } represent a space of possible radio signals.
To send a vector v∈R^{3} to Bob, Alice sets her device to generate the signal corresponding to the polynomial in F whose coefficients
are represented by v.
Bob can then let his radio receiver sample the radio signals in his environment at multiple points in time t to retrieve the message.
Suppose Alice wants to transmit the following vector v∈R^{3} to Bob:
v
=
5
2
1
Alice sets her radio transmitter to generate a signal whose amplitude as a function of time is:
f(t)
=
5 t^{2}  2 t + 1
How can Bob recover the message Alice is sending?
Bob can recover the message by having his device sample the signal at three points, e.g., t_{1} = 1, t_{2} = 2, and t_{3} = 3.
Once Bob does this, he can set up an equation to recover the curve Alice used to generate the signal:
(1)^{2}
(1)
1
(2)^{2}
(2)
1
(3)^{2}
(3)
1
⋅ v
=
f(1)
f(2)
f(3)
1
1
1
4
2
1
9
3
1
⋅ v
=
4
17
40
The above equation can then be solved to obtain Alice's message v.
4.4.Assignment #4: Vector Spaces and Polynomials
In this assignment you will solve problems involving vector spaces and vector spaces of polynomials.
Please submit a single file a4.* containing your solutions. The file extension may be anything you choose;
please ask before submitting a file in an exotic or obscure file format (plain text or PDF are preferred).
You can use the following form to compute the rref using WolframAlpha:
Explain in detail why the following formula is true for any a∈R and b∈R if a ≠ 0 and b ≠ 0:
span {
a
b
,
a+1
b
}
=
span {
1
0
,
0
1
}
This can be done in multiple ways.
One way is to show that the two vectors on the lefthand side are linearly independent. Suppose
they are linearly dependent. Then there exists s∈R such that:
a
b
=
s ⋅
a+1
b
a
=
s ⋅ (a + 1)
s
=
a / (a + 1)
b
=
s ⋅ b
s
=
1
a / (a + 1)
=
1
s
≠
s
Since assuming that they are linearly dependent leads to the contradiction above, they must be
linearly independent.
Alternatively, we can show that the following matrix is invertible:
det
a
a + 1
b
b
=
a ⋅ b  b ⋅ (a+1)
=
b ( a  a  1)
=
b ⋅ (1)
b
≠
0
det
a
a + 1
b
b
≠
0
Thus, the span of its columns is all of R^{2}.
Assume addition and scalar multiplication of curves in R^{2} are defined in the usual way as follows:
f + g = h
where h(x) = f(x) + g(x)
s ⋅ f = h
where h(x) = s ⋅ f(x)
Find a basis of the following vector space:
{ f  f(x) = a ⋅ 2^{x+2} + b ⋅ 2^{x} + cx where a,b,c∈R }
The space of curves defined above can be defined as the set of linear combinations of the following three curves:
f(x)
=
2^{x+2}
g(x)
=
2^{x}
h(x)
=
x
However, {f, g, h} is not a basis because f and g are linearly dependent:
f
=
4 ⋅ g
f(x)
=
4 ⋅ g(x)
2^{x+2}
=
4 ⋅ 2^{x}
2^{x+2}
=
2^{2} ⋅ 2^{x}
Find an orthonormal basis of each of the following spaces. Please show your work.
The line L defined by:
L
=
{
x
y
 y = 3 x }
Since L is a line, dim L = 1, so it is sufficient to find a single unit vector that spans L.
Any vector on the line sufficies. If x = 1, then y = 3, so we can say:
L
=
span {
1
3
}

1
3

=
√(1^{2} + (3)^{2})
=
√(10)
Thus, the orthonormal basis for L is:
L
=
span {
1/√(10)
3/√(10)
}
The space V defined by:
V
=
span {
1
2
0
3
,
1
0
3
1
,
2
0
0
0
,
1
2
0
3
}
First, we note that the last vector is a linear combination of two other vectors:
1
2
0
3
=
2
0
0
0
−
1
2
0
3
None of the remaining three vectors are a linear combination of the other two, so
we have found a basis:
basis V
=
{
1
2
0
3
,
1
0
3
1
,
2
0
0
0
}
Alternatively, we could have found a basis by computing the following rref and keeping only
the nonzero rows as the basis:
rref
1
2
0
3
1
0
3
1
2
0
0
0
1
2
0
3
We can now apply the GramSchmidt process to find an orthonormal basis:
u_{1}
=
2
0
0
0
e_{1}
=
1
0
0
0
u_{2}
=
1
2
0
3
− (
1
2
0
3
⋅
1
0
0
0
) ⋅
1
0
0
0
=
0
2
0
3
e_{2}
=
0
2/√(13)
0
3/√(13)
u_{3}
=
1
0
3
1
− (
1
0
3
1
⋅
1
0
0
0
) ⋅
1
0
0
0
− (
1
0
3
1
⋅
0
2/√(13)
0
3/√(13)
) ⋅
0
2/√(13)
0
3/√(13)
=
0
0
3
1
−
0
6/13
0
9/13
=
0
−6/13
3
4/13
e_{3}
=
√(1573/169) ⋅
0
−6/13
3
4/13
The plane in R^{3} orthogonal to the vector v where:
v
=
1
0
2
We must first find a basis for the plane P, which we can define as:
P
=
{ w  w ⋅ v = 0 }
=
{
x
y
z
 1 ⋅ x + 0 ⋅ y + 2 ⋅ z = 0 }
To find one vector on the plane, we simply use the equation in the definition above:
x + 2 ⋅ z
=
0
x
=
−2 ⋅ z
x
=
−2 (we choose x as the independent variable and assign a value to it)
y
=
2 (since y can be anything, we choose something convenient)
z
=
1 (this is determined by the equation)
Next, we want to find another vector in the plane. Since we are looking for an orthonormal basis,
we look for a vector that is orthogonal to the one we already found above. Thus, we must solve
a system of two equations:
x
y
z
⋅
1
0
2
=
0
x
y
z
⋅
−2
2
1
=
0
x + 2 ⋅ z
=
0
−2 x + 2 y + z
=
0
x
=
2 (we choose x as the independent variable and assign a value to it)
y
=
5/2
z
=
−1
Thus, we now have two orthogonal vectors. It suffices to normalize both of them:
V
=
span {
−2
2
1
,
2
5/2
−1
}
=
span {
−2/3
2/3
1/3
,
4/(3 √(5))
5/(3√(5))
−2/(3/√(5))
}
For each of the following, find the exact polynomial (i.e., the natural number representing the degree of the polynomial and its real number coefficients).
The input box below takes as input a spaceseparated list of real numbers. Clicking evaluate outputs the result of computing
f(x) for each of the numbers x in the list.
Suppose that the following is true:
f
∈
{ f  f(x) = ax^{2} + bx + c where a,b,c∈R }
Find the coefficients of the polynomial f being computed by the box above. Explain how you obtained your result (write out the equations).
It is sufficient to sample the polynomial at three points and then to solve for its coefficients:
(0)^{2}
(0)
1
(1)^{2}
(1)
1
(2)^{2}
(2)
1
⋅
a
b
c
=
f(0)
f(1)
f(2)
=
20
19
12
a
b
c
=
3
2
20
Any valid method or tool can be used to solve the equation, but the equation must be specified.
Note that the sampled points may vary; this is fine as long as the coefficients match the points
that were being used to fit the curve (that is, solutions with different coefficients should be
accepted as long as the initial points and curve oefficients and mutually consistent).
The input box below takes as input a spaceseparated list of real numbers. Clicking evaluate outputs the result of computing
f(x) for each of the numbers x in the list. The only information available to you is that f is a polynomial; it has some degree k,
but that degree is unknown to you.
Find all the coefficients of the polynomial f being computed by the box above.
Explain how you obtained your result (justify your answer by appealing to properties of polynomials and write out the equations).
This problem can be approached by repeatedly falsifying hypotheses about the polynomial until a hypothesis
that cannot be falsified is reached.
First, suppose that the polynomial for the curve has degree k = 1 (i.e., it is of the form ax + b). Then
it should be possible to fit a line (a polynomial of the form ax + b) exactly to three points
on the curve. If this is not possible, then the curve's polynomial has degree greater than k = 1.
By following this process, we find that an exact solution exists for 6 points and k = 5.
Let polynomials in F = {f  f(t) = at^{2} + bt + c } represent a space of possible radio signals.
To send a vector v∈R^{3} to Bob, Alice sets her device to generate the signal corresponding to the polynomial in F whose coefficients
are represented by v. Bob then has his receiver sample the radio signals in his environment at multiple points in time t to retrieve the message.
Suppose Alice wants to transmit the following vector v∈R^{3} to Bob:
v
=
2
1
3
Alice sets her radio transmitter to generate a signal whose amplitude as a function of time is:
f(t)
=
2 t^{2} + t + 3
Suppose Bob wants to recover the message sent by Alice. At how many values of t should Bob sample the signal to
recover Alice's message? Write out the computation Bob would perform to recover the message v from Alice.
Bob would need to sample the signal at three (3) distinct points, e.g. t∈ {0,1,2}. Bob would then solve the following equation:
(0)^{2}
(0)
1
(1)^{2}
(1)
1
(2)^{2}
(2)
1
⋅
a
b
c
=
f(0)
f(1)
f(2)
0
0
1
1
1
1
4
2
1
⋅
a
b
c
=
3
2
−3
a
b
c
=
−2
1
3
Suppose Bob samples the signals at t∈ {1,2,3} and obtains the vectors listed below. What vector in R^{3} did Alice send? Show your work.
{
1
1
,
2
−3
,
3
−11
}
Bob would need to solve the following equation:
(1)^{2}
(1)
1
(2)^{2}
(2)
1
(3)^{2}
(3)
1
⋅
a
b
c
=
1
−3
−11
a
b
c
=
−2
2
1
The solution can be obtained by solving a system of equations, by computing the rref of an augmented matrix,
or any other valid method
Suppose the environment contains noise from other communication devices; the possible signals in this noise are always from the span
of the following polynomials:
g(t)
=
2 t − 2
h(t)
=
t^{2} + 3 t
Alice and Bob agree that Alice will only try to communicate to Bob one scalar r∈R at a time.
They agree on a unit vector u∈R^{3} ahead of time. Any time Alice wants to send some r∈R to Bob, she will
have her device generate the signal corresponding to the polynomial represented by
r ⋅ u.
If the vectors below represent the samples collected by Bob, what scalar was Alice sending to Bob?
{
1
3
,
2
9
,
3
13
}
To allow Bob to distinguish Alice's signal from the noise, Alice must choose a signal that is linearly independent
from the two noise signals. While Alice and Bob could theoretically agree on any linearly independent vector (as long
as it is linearly independent, a correct solution is possible), the computions are easiest if the vector is orthogonal
to the span of the two noise vectors.
The two noise polynomials can be represented as vectors in R^{3}, and the noise plane P can be defined as their span:
P
=
span {
0
2
−2
,
1
3
0
}
We want to find a unit vector orthogonal to the plane P. It is sufficient to find a solution to the following system of
equations:
0
2
−2
⋅
x
y
z
=
0
1
3
0
⋅
x
y
z
=
0

x
y
z

=
1
Solving the above yields the vector:
u
=
(1/√(11)) ⋅
−3
1
1
Next, we must determine the coefficients of the curve Bob observed.
This curve is a linear combination of the
noise polynomials and Alice's polynomial (Alice's polynoial is represented by u).
(1)^{2}
(1)
1
(2)^{2}
(2)
1
(3)^{2}
(3)
1
⋅
a
b
c
=
3
9
13
a
b
c
=
−1
9
−5
Finally, we must remove the contribution of the noise vectors from the observed vector in order
to isolate Alice's signal. There are two ways to approach this (they are equivalent).
One approach is to find the orthogonal projection of the observed vector onto the span of the noise vectors,
and then to subtract this projection from the observed vector. Another approach is to project the observed
vector directly onto span {u}; we use the latter.
(
−1
9
−5
⋅ (1/√(11)) ⋅
−3
1
1
) ⋅ (1/√(11)) ⋅
−3
1
1
=
−21/11
7/11
7/11
Since u is a unit vector, the above vector's length represents the scalar by which u was multiplied,
so its length is the scalar s in s ⋅ u.
s
=

−21/11
7/11
7/11

=
√(539)/11
Suppose there is no noise in the environment. Alice, Bob, and Carol want to send individual scalars to one another at the same time:
all three would be sending a signal simultaneously, and all three would be listening at the same time:
Alice's device would generate a signal corresponding to a scalar multiple of f(x) = 2 x + 1;
Bob's device would generate a signal corresponding to a scalar multiple of g(x) = x^{2} + 1;
Carol's device would generate a signal corresponding to a scalar multiple of h(x) = x^{2}  3 x.
Suppose you sample the radio signals at a given moment and obtain the vectors below. Determine which scalars
Alice, Bob, and Carol are each transmitting. Your answer can be in the form of a vector, but you must specify which scalar corresponds
to which sender.
{
0
5
,
1
7
,
2
7
}
We can set up the following equation, where a, b, and c are the scalars being transmitted by
Alice, Bob, and Carol, respectively:
a ⋅
f(0)
f(1)
f(2)
+ b ⋅
g(0)
g(1)
g(2)
+ b ⋅
h(0)
h(1)
h(2)
=
5
7
7
f(0)
g(0)
h(0)
f(1)
g(1)
h(1)
f(2)
g(2)
h(2)
⋅
a
b
c
=
5
7
7
1
1
0
3
0
−2
5
−3
−2
⋅
a
b
c
=
5
7
7
Using any valid method for solving the equation above, we get:
a
b
c
=
3
2
1
An alternative approach is to first determine the polynomial being observed by setting up an equation,
finding the exact curve that fits the three points provided, and then setting up a second matrix equation
to determine what linear combination of the three signal polynomials adds up to the curve that fits the
three points.
4.5. Basis, dimension, and orthonormal basis of a vector space
We have adopted span{v_{1},...,v_{n}} as our notation for vector spaces. However, this makes it possible to represent a given vector space
in a variety of ways:
span{(0,1),(1,0)} = span{(0,1),(1,0),(1,0)} = span{(0,1),(1,0),(1,0),(1,0)}, and so on.
We might naturally ask: given
a vector space V, what is the smallest possible size for a finite set of vectors W such that V = span W?
Definition: A finite set of vectors B is a basis of the vector space V if span B = V and for all finite
sets of vectors W such that span W = V, B ≤ W.
Fact: A finite set of vectors v_{1},...,v_{n} is a basis for V iff span {v_{1},...,v_{n}} = V and the vectors v_{1},...,v_{n}
are setwise linearly independent.
Fact: Given a finite set of vectors v_{1},...,v_{n}, we can find the basis for span {v_{1},...,v_{n}} by creating a matrix M in which the
rows are the vectors v_{1},...,v_{n} and computing rref M. The set of distinct nonzero rows in rref M is a basis for span {v_{1},...,v_{n}}.
Example: We can find the basis of V = span {[3;0;7;0], [9;0;21;0], [0;0;2;8], [0;0;1;4], [3;0;13;4], [0;0;1;0]} by computing
the reduced row echelon form of the following matrix. We can use a thirdparty tool to do so (e.g.,
WolframAlpha).
M
=
3
0
7
0
9
0
21
0
0
0
2
8
0
0
1
4
3
0
13
4
0
0
1
0
rref M
=
1
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
The rows of the matrix rref M will correspond to the vectors in a basis for V. Thus, we have that:
V
=
span {
3
0
7
0
,
9
0
21
0
,
0
0
2
8
,
0
0
1
4
,
3
0
13
4
,
0
0
1
0
}
=
span {
1
0
0
0
,
0
0
1
0
,
0
0
0
1
}
Definition: The dimension of a vector space V, which we write as dim V, is the size of any basis of V (recall that every basis
is the same size as every other basis, since they all must have the smallest size of any set of vectors that spans V).
Example: What is the dimension of {f  f(x) = ax^{3} + bx^{2} + cx + d, a,b,c,d∈R}?
The dimension of a space is the size of its basis.
A basis must consist of vectors that are linearly independent from one another, and it must span the space. We know that the following set B
spans the space:
B
=
{ f, g, h, l } where
f(x)
=
x^{3}
g(x)
=
x^{2}
h(x)
=
x
l(x)
=
1
We also know that the curves f, g, h, and l are linearly independent. Thus, B is a basis. Since B = 4 and span B = V, then we
have that:
dim V
=
B
=
4
Example: Consider the vector space V consisting of finite sets of real numbers:
the empty set is the unique additive identity: ∅
addition (+) is defined as follows: for any two objects in our vectors space P∈V and T∈V, we have that
P + T = Q where Q = (P ∪ T)  (P ∩ T)
scalar multiplication is defined as:
s ⋅ T = Q where Q = {s ⋅ r  r∈T}
What are the basis and dimension of this vector space?
The basis of the space is B = { {1} } since you can obtain any finite set of real numbers
by taking linear combinations of the set {1} ∈V. Since span { {1} } = V and { {1} } = 1, dim V = 1.
Definition: A finite set of vectors B is an orthonormal basis of the vector space V if B is a basis of V,
all the vectors in B are unit vectors, and the vectors in B are setwise orthogonal.
Recall that for any vector v∈R^{n} and any unit vector u∈R^{n}, (v ⋅ u) ⋅ u is the projection of v onto the line parallel to
u (i.e., the vector space {au  a∈R}). We can use this fact to define a process for turning an arbitrary basis into an orthonormal basis.
Fact: Given a basis B = {v_{1},...,v_{n}}, it is possible to turn it into an orthonormal basis {e_{1},...,e_{n}} by using the GramSchmidt process. This
algorithm can be summarized as follows.
Intuitively, for each vector v_{i}∈B, the algorithm substracts out the contributions of the alreadycomputed orthogonal unit vectors from
v_{i}, obtaining u_{i}, and then rescales u_{i} to make it into a unit vector e_{i}.
At the end of the above process, {e_{1},...,e_{n}} is the orthonormal basis (and {e_{1},...,e_{n}} is a basis of orthogonal vectors that are not
necessarily unit vectors).
Many computational environments for mathematics that support linear algebra provide an operator for turning a basis into an orthonormal basis.
Why is an orthonormal basis useful? Once again, we appeal to the fact that (v ⋅ u) ⋅ u is the projection of v onto u, where
v ⋅ u is the length of that projection. Suppose we want to determine how to express an arbitrary vector v using an orthonormal
basis u_{1},...,u_{n}.
Fact: Given a vector v∈R^{n}, and an orthonormal basis B = {u_{1},...,u_{n}} for R^{n}, we can determine what linear combination of the vectors
u_{1},...,u_{n} yields v by computing:
a = (M_{B})^{⊤} ⋅ v.
Notice that the components of the vector a, which we will call a_{1},...,a_{n}, are the scalars needed to compute v as a
linear combination of the vectors u_{1},...,u_{n}:
v = a_{1}u_{1} + ... + a_{n}u_{n} = M_{B}a.
Fact: Given a vector v∈R^{n}, some k < n, and an orthonormal basis B = {u_{1},...,u_{k}} for a subspace W ⊂ R^{n},
the product (M_{B} ⋅ (M_{B})^{⊤} ⋅ v) is the projection of v onto W.
Example: Notice the relationship of the above fact to projection onto the span of a single unit vector u = (x,y) in R^{2}.
Suppose we have a basis B = {u} for a subspace {s ⋅ u  s∈R^{2}} ⊂ R^{2}. Then we have:
M_{B}
=
x
y
Thus, for an arbitrary v∈R^{2} not necessarily in {s ⋅ (x,y)  s∈R^{2}} we have:
M_{B} ⋅ (M_{B})^{⊤} ⋅ v
=
x
y
⋅
x
y
^{⊤} ⋅ v
=
x
y
⋅ (
x
y
⋅ v )
=
(
x
y
⋅ v ) ⋅
x
y
=
( v ⋅
x
y
) ⋅
x
y
=
( v ⋅ u ) ⋅ u
Thus, we see that the projection of a vector onto another unit vector in R^{2} is just a special case of the general approach to projection.
Example: Suppose we want to project the vector v∈R^{2} onto the space W ⊂ R^{2} where:
v
=
2
4
W
=
span {
1/√(2)
1/√(2)
}
B
=
{
1/√(2)
1/√(2)
}
We compute the matrix M_{B} and its transpose. M_{B} is a 2 × 1 matrix in this case, and (M_{B})^{⊤} is a 1 × 2 matrix:
M_{B}
=
1/√(2)
1/√(2)
(M_{B})^{⊤}
=
1/√(2)
1/√(2)
We can now apply the formula. Notice that it is equivalent to the formula for projection of v onto the single vector that spans W:
M_{B} ⋅ (M_{B})^{⊤} ⋅ v
=
1/√(2)
1/√(2)
⋅
1/√(2)
1/√(2)
⋅
2
4
=
1/√(2)
1/√(2)
⋅ (
1/√(2)
1/√(2)
⋅
2
4
)
=
1/√(2)
1/√(2)
⋅ (6/√(2))
=
3
3
Thus, the projection of v onto W is [3; 3].
Example: Suppose we want to project the vector v∈R^{3} onto the space W ⊂ R^{3} where:
v
=
2
4
5
W
=
span {
1/√(2)
1/√(2)
0
,
0
0
1
}
B
=
{
1/√(2)
1/√(2)
0
,
0
0
1
}
We compute the matrix M_{B} and its transpose. M_{B} is a 3 × 2 matrix in this case, and (M_{B})^{⊤} is a 2 × 3 matrix:
M_{B}
=
1/√(2)
0
1/√(2)
0
0
1
(M_{B})^{⊤}
=
1/√(2)
1/√(2)
0
0
0
1
We can now apply the formula:
M_{B} ⋅ (M_{B})^{⊤} ⋅ v
=
1/√(2)
0
1/√(2)
0
0
1
⋅
1/√(2)
1/√(2)
0
0
0
1
⋅
2
4
5
=
1/√(2)
0
1/√(2)
0
0
1
⋅
6/√(2)
5
=
3
3
5
Thus, the projection of v onto W is [3; 3; 5].
Example: Suppose we want to project the vector v∈R^{3} onto the space W ⊂ R^{3} where:
v
=
3
2
4
W
=
span {
1/√(2)
0
1/√(2)
,
0
1
0
}
What is the orthogonal projection of v onto W?
Since the two vectors that span W are already an orthonormal basis, we can simply project v onto the two individual vectors in the
orthonormal basis and take the sum to obtain the orthogonal projection of v onto W: