[link] Introduction, Background, and Motivation
[link] Overview
The physical laws in our universe appear to be predictable (i.e., they are consistent across time and space, and under similar circumstances, we can expect matter and energy to behave in similar ways). We can exploit this to make models and predictions about how events might unfold at a certain time and place in the universe, and we can test these predictions (we might call this the scientific method).
We can also exploit the predictability of physical laws by identifying what conditions match a particular goal we may wan to bring about, or task that we may want to accomplish, and then setting up initial conditions (such as inside a mechanical, electronic, chemical, biological, and/or computational device) that will eventually lead to those circumstances.
Many physical processes that currently understand and use to accomplish our tasks are based on only a small collection of building blocks (typically, these are collections of rules that are consistently followed by systems). Just as humans have developed symbolic systems (i.e., natural languages such as English) to name, describe, and think about day-to-day objects and processes in their environment, communities of people within the mathematics, applied mathematics, computer science, and related disciplines have developed symbolic systems for describing the simple building blocks according to which phenomena in our physical environment (including everything from mechanical devices, to physical computers, to software) seem to operate.
In this course, we will study a particular mathematical language that is well-suited for describing certain structures (as well as their properties and behaviors) that are routinely and widely employed within mathematical and computational modelling, data representation and storage, data processing, and other software applications. These structures include logical formulas, proofs, sets, trees, and graphs. We will also study the combinatorial properties of these structures, including how to define them recursively, how to quantify and compute their size, as well as how to categorize, count, and measure the ways in which they can be constructed, combined, and explored. We will become familiar with induction and recursive and self-referential definitions for these structures and their properties. We will also look at how these structures and associated techniques can be used in some practical applications.
[link] Expected background, prerequisites, and course format
In covering the material for this course, we will use the standard language and conventions for discussing mathematical structures that have been developed by the mathematics and computer science communities. You should be familiar with high school level algebra and calculus (including mathematical functions of a single variable, e.g., f(x) = x2). We will introduce the terminology and concepts associated with boolean arithmetic, logic, set theory, graph theory, and combinatorics as part of the course material, but you should have some familiarity with logical operators (i.e., "and", "or", and "not"), at least within the context of a programming language, and the mathematical notion of an abstract set that contains zero or more elements.
The homework assignments in this course will involve the assembly of programs in the Python programming language, as well as the manual assembly of solutions to mathematical problems. A basic familiarity with programming is expected, but no previous knowledge of the Python programming language is expected. Relevant features of the Python programming language will be introduced as needed within the course material.
[link] Boolean Arithmetic, Boolean Algebra, and Logical Formulas
In this section, we will introduce the fundamental building blocks of boolean arithmetic and logic, including the symbolic language used to represent logical formulas and proofs. Simultaneously, we will employ the modern programming language Python, together with some specialized libraries, to assemble and work with logical formulas and proofs.
As with most human languages that have developed organically over time, mathematics has a rich and often redundant vocabulary. We introduce terminology in this section that we will use consistently in this course. However, keep in mind that there are often other synonyms within mathematics and computer science for the concepts we encounter.
Those who may wish to review and practice the material in this section in the form of a board game
can be accommodated.
[link] Boolean formulas and boolean arithmetic
Definition:
A
boolean formula or
formula is a string of symbols that follow a certain syntax. If the formula is written using a correct syntax, we say it is a
well-formed formula (or
WFF). The symbols ∨ (which represents the binary
or operation, also called
disjunction), ∧ (which represents the binary
and operation, also called
conjunction) , ¬ (which represents the
not operation, also called
negation),
⇒ (which represents the
implies relationship), and
⇔ (which represents the
if and only if or
iff relationship) are
boolean operators.
boolean formula |
conditions |
⊤ |
always a well-formed formula |
⊥ |
always a well-formed formula |
f1 ∨ f2 |
is a formula if f1 and f2 are both well-formed formulas |
f1 ∧ f2 |
is a formula if f1 and f2 are both well-formed formulas |
¬ f |
is a formula if f is a well-formed formula |
f1 ⊕ f2 |
is a formula if f1 and f2 are both well-formed formulas |
f1 ⇒ f2 |
is a formula if f1 and f2 are both well-formed formulas |
f1 ⇔ f2 |
is a formula if f1 and f2 are both well-formed formulas |
( f ) |
is a formula if f is a well-formed formula |
Notice that we can "build up" formulas from the basic building blocks ⊤, ⊥, and by combining existing formulas using the operators ∨, ∧, ¬, ⊕, ⇒, and ⇔. We can say that formulas are defined in a self-referential way, or inductively, because a formula is built up of other, smaller formulas. We will introduce a more formal definition of induction later, but it is accurate to say that the definition for well-formed formulas is inductive.
In the same way that we can assign meaning to sequences of symbols such as "sky", "tree", or "the girl threw the frisbee" by pointing to the physical objects and events they represent, we can assign meaning to each sequence of symbols that happens to be a well-formed formula.
Definition:
There are only two possible meanings for a well-formed formula:
true and
false. The following table specifies how we can determine the meaning of a given well-formed formula.
boolean formula |
meaning (true or false) |
example using the representation of the Python programming language |
⊤ |
always true |
True |
⊥ |
always false |
False
|
f1 ∨ f2 |
true if f1 or f2 (or both) are true |
True or (False and True) |
f1 ∧ f2 |
only true if both f1 and f2 are true |
True and False |
¬ f |
true if f is false |
not (True or (False and True)) |
f1 ⊕ f2 |
true if f1 or f2, but not both, are true |
True != False |
f1 ⇒ f2 |
- if f1 is true, then f2 must be true
- f1 is false, or f2 is true
- f1 is "less than or equal to" f2
(if ⊥ is 0 and ⊤ is 1)
|
True <= False |
f1 ⇔ f2 |
- f1 is true if and only if f2 is true
- f1 and f2 are either
both true or both false
|
True == False |
( f ) |
true if f is true |
(True and False) |
Notice that like the definition of well-formed formulas, the process for assigning a meaning to a formula is self-referential, or recursive: to determine the meaning of a formula, we determine the meanings of the smaller formulas from which is it built and combine the results. We will introduce a more formal definition of recursion later, but it is accurate to say that the process of assigning a meaning to a formula is recursive.
Definition:
The process of converting a well-formed formula to its meaning is often called evaluation (especially within the context of a programming language), because the sequence of symbols that comprises the formula is converted into one of two possible values: true or false.
It is worth noting that the terms inductive and recursive are often used interchangeably to refer to self-referential definitions, whether the definitions specify a method for "building up" a self-referential structure, or an algorithm for "breaking down" a self-referential structure.
Example:
We can determine the meaning of a formula by working our way up from the innermost subformulas, replacing each subformula with its meaning. This process is guaranteed to terminate and produce the meaning of the formula. For example:
Example:
For each of the following formulas, determine its meaning.
-
⊤ ∨ ⊥
-
⊤ ∧ ⊤ ∧ ⊤
-
⊤ ⊕ ⊤
-
⊥
-
(⊤ ⇒ ⊤) ∧ (⊥ ⇒ ⊤)
-
(⊤ ∨ ⊥) ⇔ ⊤
-
(⊤ ∨ ⊥) ⇔ (⊤ ⊕ ⊤)
-
⊤ ∧ (¬ ⊤)
[link] Boolean algebra
Algebra of the integers and real numbers is often introduced as a method for solving equations with one or more unknown variables (in fact, this was historically the
purpose for which some of the techniques we now call algebra were developed). Equations are either true or false; they are, in fact, formulas. Solving an equation over the integers, such as the one below, usually involves finding a
value with which
x can be replaced so that the formula is
true.
In the above example, if
x is replaced with the integer 3, the equation is true. If we instead only allow our variables to have the values ⊤ and ⊥, we can ask the same question about any formula that contains a variable.
Definition:
We can extend our definition of a well-formed boolean formula to also allow variables:
boolean formula |
conditions |
⋮ |
⋮ |
x |
a well-formed formula if x is a variable name |
Sometimes, we will want to write very long variables names; in those cases, we will write the variable in
underlined italics. For example, a formula with two variables
the sun is visible and
it is daytime might look as follows:
the sun is visible ⇒ it is daytime
|
|
Example:
For each of the following formulas, determine what combinations of values for
x and
y ensure that the formula is true (in some cases, there may be no solution for
x and
y that would make the formula true).
-
x ∨ ⊥
-
y ∧ x ∧ ⊤
-
⊤ ⊕ y
-
⊥
-
(x ⇒ y) ∧ (y ⇒ x)
-
x ⇔ y
-
x ⇔ x
-
x ∧ (¬ x)
Example:
Suppose we want to find what values for
x and
y make the following formula
false:
Note that the
if and only if operator
⇔ behaves just like equality; thus, we could instead find the values for
x and
y that make the following "equation"
true:
Equivalently, we could find the values for
x and
y that make the logical negation of the original formula true:
Definition:
When we assign some value, such as ⊤, to a variable
x, we say that
the value ⊤ is assigned to x or that
x maps to ⊤, and we denote this using mathematical notation as:
Definition:
Suppose we have a mapping that assigns multiple variables to values, such as the following:
{x1 ↦ ⊤, x2 ↦ ⊥, x3 ↦ ⊥, ..., xk ↦ ⊤}
|
|
We call such a mapping an
assignment or a
model.
Definition:
Suppose that a formula
f contains some variables
x1, ...,
xk. If we can find some
assignment m of the values ⊥ and ⊤ to each of the variables such that substituting each variable in the formula with its assigned value results in a formula with the meaning
true, we say that this particular assignment of values to
x1, ...,
xk solves or
satisfies the formula
f. We denote this fact using the following notation:
Example:
Consider the formula
x ∨ (
y ∧
z) and the assignment {
x ↦ ⊤,
y ↦ ⊥,
z ↦ ⊥}. Replacing
x with ⊤,
y with ⊥, and
z with ⊥ in
x ∨ (
y ∧
z) results in the formula:
Since the above formula has the meaning
true, we can say that {
x ↦ ⊤,
y ↦ ⊥,
z ↦ ⊥}
satisfies x ∨ (
y ∧
z), which we denote:
{x ↦ ⊤, y ↦ ⊥, z ↦ ⊥} ⊨ x ∨ (y ∧ z)
|
|
When equations involving logical formulas are used to model possible facts about some system, the variables will often represent some state or property of the system.
Example:
Suppose we have the following formula involving the variables
the sun is visible and
it is daytime:
the sun is visible ⇒ it is daytime
|
|
This formula might describe a property of our real-world experience of a person that is in a particular fixed location on the surface of the Earth. We could state that the above formula is
always true (i.e., it is always an accurate description of the system it describes). For every possible assignment of values to each variable, the above formula is indeed accurate, in that it is true exactly in those situations that might occur on Earth, and false in any situation that cannot occur:
the sun is visible |
it is daytime |
meaning |
interpretation |
⊤ |
⊤ |
true |
a sunny day |
⊤ |
⊥ |
false |
|
⊥ |
⊤ |
true |
a cloudy day |
⊥ |
⊥ |
true |
nighttime |
In particular, only one set of values causes the formula to be false: if the sun is in the sky, but it is not daytime. This is indeed impossible; all the others are possible (it may be day or night, or it may be cloudy during the day). In other words, the
assignments that represent a sunny day, a cloudy day, and nighttime all
satisfy the formula
the sun is visible ⇒ it is daytime. For example:
{the sun is visible ↦ ⊤, it is daytime ↦ ⊤} ⊨ the sun is visible ⇒ it is daytime
|
|
One way to find all the solutions for a boolean formula with one or more variables is to build a truth table that enumerates every possible combination of values that can be assigned to the variables, and then shows how the formula (and possibly its subformulas) evaluate under that assignment.
Example:
Suppose we want to solve the following formula:
We can build a truth table for the above formula.
x |
y |
z |
z ∨ x |
y ⊕ z |
(z ∨ x) ⇔ (y ⊕ z) |
⊤ | ⊤ | ⊤ | ⊤ | ⊥ | ⊥ |
⊤ | ⊤ | ⊥ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊤ | ⊥ | ⊥ |
⊥ | ⊤ | ⊤ | ⊤ | ⊥ | ⊥ |
⊥ | ⊤ | ⊥ | ⊥ | ⊤ | ⊥ |
⊥ | ⊥ | ⊤ | ⊤ | ⊤ | ⊤ |
⊥ | ⊥ | ⊥ | ⊥ | ⊥ | ⊤ |
Thus, there are four possible solutions for the formula (
z ∨
x)
⇔ (
y ⊕
z). For example, the assignment
x ↦ ⊤,
y ↦ ⊤, and
z ↦ ⊥ is one possible solution.
Just as operators on the real numbers or integers have algebraic properties (such as
x +
y =
y +
x, or 2 ⋅
x =
x +
x, and so on), boolean operators also have algebraic properties that can be derived from their
meaning. In fact, every formula that contains variables and is true for every possible assignment of values to those variables is technically an algebraic property. However, we will list several common algebraic properties.
Fact:
We define ≡ to be equivalence of formulas: if f ≡ g for two formulas f and g, this means that f and g have the same meaning (i.e., under any given assignment of variables to values, both formulas are either true or false, but never different; in other words, their truth tables are identical).
Note that we do not need ≡, since ⇔ would suffice for a presentation of the properties below. However, we introduce ≡ to emphasize that we are defining algebraic properties that govern formulas which may themselves contain ⇔.
Fact:
The ∨, ∧, and ¬ operators have some familiar algebraic properties:
| ≡ | | | |
| ≡ | | | |
| ≡ | | | |
| ≡ | | | |
| ≡ | | | |
| ≡ | | | |
| ≡ | | | (distributivity of ∧ across ∨) |
|
| ≡ | | | |
Furthermore, ∨ and ∧ can be expressed in terms of one another with the help of ¬:
Using the above, we can define ∧ in terms of ∨ by negating both sides:
Alternatively, we could define ∨ in terms of ∧ by negating
x and
y (e.g., negating
x everywhere should not change anything, because the formula is true for
any value of
x, whether it is ⊤ or ⊥):
Example:
If an algebraic property holds, this means that the final rows of the truth tables for the two formulas are exactly the same. For example, consider the following algebraic law:
If we build truth tables for both formulas, we see that the final row in both tables is the same:
x |
y |
x ∧ y |
¬(x ∧ y) |
⊤ | ⊤ | ⊤ | ⊥ |
⊤ | ⊥ | ⊥ | ⊤ |
⊥ | ⊤ | ⊥ | ⊤ |
⊥ | ⊥ | ⊥ | ⊤ |
x |
y |
¬(x) |
¬(y) |
¬(x) ∨ ¬(y) |
⊤ | ⊤ | ⊥ | ⊥ | ⊥ |
⊤ | ⊥ | ⊥ | ⊤ | ⊤ |
⊥ | ⊤ | ⊤ | ⊥ | ⊤ |
⊥ | ⊥ | ⊤ | ⊤ | ⊤ |
Fact:
The ⊕,
⇒, and
⇔ operators have many algebraic properties. We present a few important properties. In particular, these operators can be defined using ∨, ∧, and ¬:
Furthermore,
⇔ can be defined in terms of
⇒, and
⇔ is an equivalence relation:
| ≡ | |
| ≡ | | | |
| ≡ | | | |
(x ⇔ y) ∧ (y ⇔ z) ⇒ (x ⇔ z) | |
| ≡ | | | |
|
One important property of ⊕ is that taking ⊕ of a formula with itself always yields ⊥:
Since ⊕, ⇔, and ⇔ can all be defined in terms of ∨, ∧, and ¬, we could actually eliminate them completely without affected the power of our boolean formula language. Since ∨ and ∧ are also related, we could also eliminate one of these. The fact that ∧ and ¬ would be sufficient to express all possible logical formulas makes ¬ and ∧ functionally complete (i.e., capable of expressing all possible truth tables on any number of variables).
In fact, we can go even further; the following single boolean operator
↑ (also called the
Sheffer stroke, or NAND) is functionally complete all by itself, because it can be used to define ∧ and ¬:
There are other such operators, such as
NOR.
Note on operator precedence.
Because different communities of mathematicians and computer scientists sometimes use different precedence conventions for logical operators, in this course we will include parentheses in formulas where the order of operations might otherwise be ambiguous. One common convention is that ¬ has highest precedence, followed by ∧ (since it can corresponding to multiplication when ⊤ is represented with 1 and ⊥ is represented with 0), then by ∨ (since it can corresponding to addition), then by ⇒ (since it is often used to organize relationships and dependencies between "simpler" formulas involving operations with higher precedence), and then by ⊕.
Example:
Using the programming language Python, we can represent a formula with variables as a function that takes boolean arguments and returns a boolean result. For example, suppose we have the following formula:
We could represent this as the following Python function:
def f(x, y, z):
return x and (y or (not(z)))
We can then evaluate the above for a given assignment by calling the function f
on some arguments. For example, if we want to evaluate the formula under the assignment {x ↦ ⊤, y ↦ ⊥, z ↦ ⊥}, we can make the following function call:
>>> f(True, False, False)
True
We can also use the Python unpacking operators to explicitly represent our assignment as a Python list or dictionary. For example:
>>> m = [True, False, False]
>>> f(*m)
True
>>> m = {'x': True, 'y': False, 'z': False}
>>> f(**m)
True
[link] Boolean satisfiability, recursive backtracking, and applications
Finding an assignment or model that satisfies a boolean formula is a fundamental and ubiquitous problem in computer science. An immense variety of different real-world and theoretical constructs, systems, devices, concepts, and problems can be represented as boolean formulas (where the solutions to the formula might provide a solution to the problem, or might answer some question about the properties of whatever is being represented). In this section we define two versions of this problem, and describe one standard algorithmic approach for solving the problem.
Definition:
We define an instance of the
boolean satisfiability problem as follows: given a formula
f, find an assignment (or
model)
m such that
m satisfies
f:
Definition:
We define an instance of the maximum boolean satisfiability problem as follows: given a formula f, find an assignment (or model) m such that m satisfies f, and such that m has at least as many variables assigned to ⊤ as any other model that satisfies f.
In other words, if we list all the models that satisfy f, we want the one that has the most variables mapped to ⊤ (or at least as many as any other, since more than one model may map the same number of variables to ⊤).
Unfortunately, unless something is known about the particular structure of the formula f, the problem of finding a model that satisfies a formula cannot be solved efficiently. Essentially, the best known algorithm is as fast as any algorithm that builds the entire truth table for the formula.
One standard way to implement an algorithm that exhaustively tries all possible combinations of assignments to a set of variables is to use recursive backtracking.
Algorithm (recursive backtracking):
The
recursive backtracking algorithm is a way to exhaustively try all possible solution to a problem by nesting an arbitrarily large number of loops inside each other using recursion. The following pseudocode describes a generic version of the recursive backtracking algorithm for some problem
f for which we are seeking a candidate solution
s.
- inputs: some function f, and some partial candidate solution s
- if s is a full-size candidate solution
- if s solves the problem f
- return the result s
- otherwise
- return nothing
- otherwise s is not a full-size solution and must be extended
- for every possible way that s can be extended to some s' that is closer to a full-size solution
- call this algorithm recursively on f and s'
Example:
A simple example is the problem of opening a safe. Suppose that a safe combination consists of some sequence of the letters a
, b
, and c
. We do not know the length of the sequence, or what sequence will unlock the safe. Suppose that there is a function that checks if the sequence is correct and unlocks the safe (we do not know how it works, but we know its name and that if it unlocks the safe, it returns True
):
def unlock(password):
if password == 'abcbac':
return True
else:
return False
We want to write a Python function that tries passing every possible string containing a
, b
, and c
to unlock()
. If we know the length of the safe combination (and if the length were short, e.g., three letters), then we could simply use nested for
loops to try all the combinations and print the combination that unlocks the safe:
def three():
for x in {'a', 'b', 'c'}:
for y in {'a', 'b', 'c'}:
for z in {'a', 'b', 'c'}:
if unlock(x + y + z):
print(x + y + z)
However, if we do not know the length of the sequence, we might want to make an arbitrary hierarchy of nested for
loops to find the password:
def any():
for x1 in {'a', 'b', 'c'}:
for x2 in {'a', 'b', 'c'}:
for xN in {'a', 'b', 'c'}:
if unlock(x1 + x2 + ... + xN + ...):
print(x1 + x2 + ... + xN + ...)
Of course, it is not possible to write the above explicitly. However, by using recursion, we can write a function that behaves exactly in this way: as an arbitrarily nested hierarchy of for
loops. First, suppose we simply want to print all the combinations. We could do the following:
def any(guess = ""):
print(guess)
for c in {'a', 'b', 'c'}:
newGuess = guess + c
any(newGuess)
The above algorithm actually keeps trying combinations of all lengths. If we run the above, we'll see that it keeps printing combinations as they get longer and longer. How can we make this process stop at a certain length of password? We would need to check the length of our guess and not make a recursive call if the guess is too long. We can do this by using an if
statement:
def any(guess = ""):
if len(guess) <= 4:
print(guess)
for c in {'a', 'b', 'c'}:
newGuess = guess + c
any(newGuess)
Suppose we only want to print the guess if it actually unlocks the safe. We can do so in the following way:
def any(guess = ""):
if unlock(guess):
print(guess)
else:
for c in {'a', 'b', 'c'}:
newGuess = guess + c
any(newGuess)
Suppose that you knew the length of the combination would be supplied as an integer argument to the function. How would you modify the above to actually return the correct guess instead of just printing it? One way to do the latter is presented below.
def any(guess = ""):
if unlock(guess):
return guess
else:
for c in {'a', 'b', 'c'}:
newGuess = guess + c
result = any(newGuess)
if result != None:
return result
How could you modify the above to only try combinations of a specific length, and nothing longer?
Example:
As a simpler example, suppose a password consists of a string of some length containing only the letter a
, but we do not know its length.
def unlock(password):
if password == "aaaaa":
return True
else:
return False
We can write a recursive function that tries every possible length, and returns the working password once it is found:
def find(guess = ""):
if unlock(guess):
return guess
else:
return find(guess + "a")
Notice that the above function find()
will recursively call itself and "build up" its parameter guess
until it is long enough to match the true password.
Now, suppose we are worried that the password might contain letters other than a
, in which case the above function will keep calling itself recursively forever. We may want to set an upper bound on how long the guess
can become, and then allow the function to fail gracefully by returning None
:
def find(guess = ""):
if unlock(guess):
return guess
elif len(guess) > 100:
return None
else:
return find(guess + "a")
Example:
Suppose we want to write a recursive algorithm that returns the largest integer in a list. We can do so by first splitting the list two two halves, and then finding the largest element in each half. We can then take the larger of these two results. This is guaranteed to give us the largest element in the list.
def largest(xs):
if len(xs) == 1:
return xs[0]
else:
leftHalf = xs[:len(xs)/2]
rightHalf = xs[len(xs)/2:]
return max(largest(leftHalf), largest(rightHalf))
Many real-world situations and devices, or systems, can be modeled mathematically by identifying a set of discrete system states in which that system can be (e.g., a lamp can either be "on" or "off").
In many cases, it is possible to model the various interesting, discrete properties of the system as variables (e.g., the variable the lamp is on can be used to keep track of whether the lamp is actually on or off by assigning the values ⊤ and ⊥ to it, respectively). Then, a given state can be modeled using assignments (or models) of values to variables (e.g., the model {the lamp is on ↦ ⊤} is a model of a system state in which the lamp is on, and {the lamp is on ↦ ⊤} is a model of the system state in which the lamp is off).
If a system can be described using a set of models of possible system states, we can then use formulas to describe logical constraints over (and relationships between) the various parts of the system. For example, we can then say that the only "allowed" or "possible" system states are those that satisfy a particular formula.
Example:
Suppose we want to model a system consisting of a single light switch that can either be in an "on" position or an "off" position, and a light controlled by the switch, which may be either "on" or "off". We can introduce two predicates for describing this system state: switch and light, where the first represents the state of the switch, and the second represents the state of the light (in both cases, ⊤ is "on" and ⊥ is "off").
While there are 2
2 = 4 possible ways to assign values to these two variables, not all would correspond to a possible state of the system if the light switch works correctly. In particular, any model
m of a state must adhere to the following:
In other words, the switch and light should always correspond to the same setting. The two system states that satisfy the above formula are:
Example:
Note that there is no single correct way to model a situation using formulas. In an alternative form of the light switch example, we can introduce four predicates for describing system states: switch is off, switch is on, light is off, light is on.
While there are 2
4 = 16 possible ways to assign values to these four variables, not all would correspond to a possible state of the system if the light switch works correctly. For example, any model
m of a state must adhere to the following:
m ⊨ switch is off ⊕ switch is on
|
|
In other words, the switch cannot be in two positions at once. Likewise, the light can only be in one state at a time:
m ⊨ light is off ⊕ light is on
|
|
Finally, the switch and light should always correspond:
m ⊨ (switch is off ⇔ light is off) ∧ (switch is on ⇔ light is on)
|
|
Thus, the two system states that satisfies the conjunction (i.e., ∧) all these formulas are:
| ≡ | {switch is off ↦ ⊤, switch is on ↦ ⊥, light is off ↦ ⊤, light is on ↦ ⊥} |
|
| ≡ | {switch is off ↦ ⊥, switch is on ↦ ⊤, light is off ↦ ⊥, light is on ↦ ⊤}
|
|
Example:
Suppose we want to model a system consisting of an airlock (i.e., a room with two doors, one leading outside and one leading into an enclosure). If the airlock's outer door is open, the inner door must be closed.
First, suppose we use
outer door and
inner door to represent whether the doors are open (⊤) or closed (⊥). A naive programmer implemented the airlock software so that every possible state
m of the airlock door adheres to the following requirement:
m ⊨ outer door ⊕ inner door
|
|
Does the above constraint satisfy the practical requirements of an airlock? If the airlock needs to allow someone to depressurize or pressurize before they exit the airlock, it should be possible to have both doors of the airlock closed at the same time. Is this possible under the given constraint? We can answer this question by determining if any state of the airlock satisfies the following formula:
m ⊨ (outer door ⊕ inner door) ∧ ¬(outer door) ∧ ¬(inner door)
|
|
In other words, is it possible for the constraint guaranteed by the programmer to be satisfied, and for the two doors to both be locked? If we try to solve the above boolean satisfaction problem for
m, we will find that it has no solution (states that satisfy the formula are in
green, and states that do not satisfy the formula are in
red):
outer door |
inner door |
¬(outer door) |
¬(inner door) |
outer door ⊕ inner door |
overall formula |
⊤ (open) | ⊤ (open) | ⊥ | ⊥ | ⊥ | ⊥ |
⊤ (open) | ⊥ (closed) | ⊥ | ⊤ | ⊤ | ⊥ |
⊥ (closed) | ⊤ (open) | ⊤ | ⊥ | ⊤ | ⊥ |
⊥ (closed) | ⊥ (closed) | ⊤ | ⊤ | ⊥ | ⊥ |
How can this be fixed? Instead of using the ⊕ operator, the programmer should have used
⇒, so that every state of the airlock
m should satisfy:
m ⊨ outer door ⇒ ¬(inner door)
|
|
Then, we can try to solve the following formula:
m ⊨ (outer door ⇒ ¬(inner door)) ∧ ¬(outer door) ∧ ¬(inner door)
|
|
We see that the above formula does have a solution:
outer door |
inner door |
¬(outer door) |
¬(inner door) |
outer door ⇒ ¬(inner door) |
overall formula |
⊤ (open) | ⊤ (open) | ⊥ | ⊥ | ⊥ | ⊥ |
⊤ (open) | ⊥ (closed) | ⊥ | ⊤ | ⊤ | ⊥ |
⊥ (closed) | ⊤ (open) | ⊤ | ⊥ | ⊤ | ⊥ |
⊥ (closed) | ⊥ (closed) | ⊤ | ⊤ | ⊤ | ⊤ |
Example:
Suppose we want to improve our airlock by installing a fire detector inside the enclosure. The fire detector should ensure that someone in the airlock cannot go inside the enclosure if there is a fire detected inside.
A new programmer comes along and incorporates the functionality of the fire detector into the software, and now the airlock's software adheres to the following constraint:
m ⊨ (outer door ⇒ ¬(inner door)) ∧ (fire ⇒ ¬(inner door))
|
|
In other words, in addition to the first constraint, the inner door locks if there is a fire. Is there a potentially dangerous flaw in the software? One problem is that if there is a fire, it should not be possible for both doors to be closed because this may be dangerous (e.g., due to heat accumulation in the airlock). For concision, let us call the formula to which the software adheres
f:
| ≡ | (outer door ⇒ ¬(inner door)) ∧ (fire ⇒ ¬(inner door))
|
|
We can then check whether the dangerous situation might occur by looking for states that adhere to the following:
m ⊨ f ∧ fire ∧ ¬(outer door) ∧ ¬(inner door)
|
|
We can examine the truth table to determine if there is a solution (states that satisfy the formula are in
green, and states that do not satisfy the formula are in
red, and we abbreviate some of the variable names):
outer |
inner |
¬(outer) |
¬(inner) |
fire |
outer ⇒ ¬(inner) |
fire ⇒ ¬(inner) |
⊤ (open) | ⊤ (open) | ⊥ | ⊥ | ⊥ (safe) | ⊥ | ⊤ |
⊤ (open) | ⊥ (closed) | ⊥ | ⊤ | ⊥ (safe) | ⊤ | ⊤ |
⊥ (closed) | ⊤ (open) | ⊤ | ⊥ | ⊥ (safe) | ⊤ | ⊤ |
⊥ (closed) | ⊥ (closed) | ⊤ | ⊤ | ⊥ (safe) | ⊤ | ⊤ |
⊤ (open) | ⊤ (open) | ⊥ | ⊥ | ⊤ (fire) | ⊥ | ⊥ |
⊤ (open) | ⊥ (closed) | ⊥ | ⊤ | ⊤ (fire) | ⊤ | ⊤ |
⊥ (closed) | ⊤ (open) | ⊤ | ⊥ | ⊤ (fire) | ⊥ | ⊥ |
⊥ (closed) | ⊥ (closed) | ⊤ | ⊤ | ⊤ (fire) | ⊤ | ⊤ |
Thus, there exists an unsafe state. One way to avoid this is to modify the software so that it ensure that the outer door is open during a fire (i.e., that the formula
fire ⇒ outer door is always true).
Example:
You are running a cloud computing service and receive several customer requests for a particular computing resource that fall into specific, fixed time windows:
- request a: from 2 AM - 4 AM
- request b: from 3 AM - 7 AM
- request c: from 6 AM - 10 AM
- request d: from 2 AM - 8 AM
- request e: from 8 AM - 9 AM
- request f: from 9 AM - 10 AM
Since the resource can only be allocated to one customer at a time, you cannot satisfy all the requests. How can you satisfy the maximum number of requests?
One approach is to assemble a collection of logical formulas that capture the fact that certain requests are mutually exclusive. If each request has a corresponding variable, then a model that assigns ⊤ to that variable would indicate that the request will be allocated, while a model that assigns ⊥ to that variable would indicate that the request will not be allocated. The formulas in this case would be:
If we take the conjunction (i.e., ∧) of all of the above formulas to make a new formula
h, then any model
m that satisfies the formula
h would correspond to a possible allocation of the requests. If we choose the model
m that allocates the
maximum number of requests (i.e., if we solve the
maximum boolean satisfiability problem), we can determine how to allocate the maximum number of requests.
[link] Assignment #1: Boolean Arithmetic, Algebra, and Satisfiability
In this assignment you will define several Python functions for solving problems involving boolean algebra.
Your solution may not import any modules or employ any external library functions (unless the problem statement explicitly permits this). The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
Convert each of the following boolean formulas into their corresponding Python syntax. For each problem part, define an uppercase single-letter variable and assign the formula to that variable in a single assignment statement. For example, suppose the formula for part (f) were specified as follows:
You would then add the following line to your Python file:
F = False and (True or (not (False)))
-
⊥ ⇒ (⊤ ∨ ⊥)
-
(⊤ ⇒ (⊤ ∨ ⊥)) ⊕ ⊥
-
(⊤ ⊕ (¬(⊤))) ⇔ (¬ (⊤))
-
(⊤ ⇔ ⊥) ⊕ (⊥ ⇔ ⊤)
-
(⊤ ⇒ ⊥) ∧ (¬(⊥) ⇒ ¬(⊤))
-
Convert each of the following boolean formulas that contain variables into a corresponding Python function. For each problem part, define a lowercase single-letter named function that takes one or more named arguments (one corresponding to each variable in the formula) and returns the result of evaluating that formula given those arguments. For example, suppose the formula for part (f) were specified as follows:
You would then add the following function definition to your Python file:
def f(x, y, z):
return z and (x or (not (y)))
-
x ⇒ y
-
x ⇔ (y ⊕ z)
-
((x ⇒ y) ∧ x) ⇒ y
-
(x ⇔ y) ⇒ (¬(x) ⇔ ¬(y))
-
((x ⇒ y) ∨ (w ⇒ v)) ⇒ ((x ∧ w) ⇒ (z ∧ (y ∨ v)))
-
In this problem, you will implement functions for printing truth tables for formulas with variables. You may use the following helper function, which prints a tab-delimited list of values.
def prints(values):
print("\t".join([str(value) for value in values]))
The above function can be used as follows:
>>> prints([True, False, True])
True False True
You may also use the following helper function, which returns a list of the argument names of a function:
def variables(f):
return list(f.__code__.co_varnames)
The above function can be used as follows:
>>> def h(x,y,z): return (y or x) and (not(z) <= x)
>>> variables(h)
['x', 'y', 'z']
-
Implement a function
truthtableXY(f)
that takes as its input a single function f
(i.e., a Python function corresponding to a formula such as those you defined in Problem #2 above). You may assume f
takes two boolean arguments x
and y
. The function should print a truth table for f
.
>>> def f(x,y): return x and y
>>> truthtableXY(f)
y x formula
True True True
True False False
False True False
False False False
-
Implement a recursive function
truthtable(f)
that takes as its first argument a single function f
(i.e., a Python function corresponding to a formula). The function f
may take any non-zero quantity of arguments. The function should print a truth table for f
.
>>> def h(x,y,z): return (y or x) and (not(z) <= x)
>>> truthtable(h)
x y z formula
True True True False
True True False False
True False True False
True False False False
False True True True
False True False False
False False True False
False False False False
Your truthtable()
function should employ the recursive backtracking approach, and can be organized as follows:
- the function should have a second parameter
values
with a default value of []
, which will be the list of values the function builds up and eventually passes to f
;
- if the list
values
is empty, the function should print a row containing all the variable names (one column header per variable);
- if the list
values
is the same length as the list of variables of f
, the function should print a row of values containing all the values in values
, as well as the result of applying f
to that list of values (use the *
-operator to apply f
to the list of arguments);
- if the list
values
is shorter than the list of variables of f
, the function should make recursive calls to truthtable()
, with approprate changes to the arguments of truthtable()
.
-
Implement a function
rows(f)
that takes as its first argument a single function f
(i.e., a Python function corresponding to a formula). The function should return the number of rows in the truth table for f
.
>>> def h(x,y,z): return (y or x) and (not(z) <= x)
>>> rows(h)
8
-
In this problem you will implement an algorithm for solving the maximum boolean satisfiability problem. You will then use it to solve a simple instance of the maximum boolean satisfiability problem. You may use the following helper function, which counts the number of
True
values in a list:
def count(values):
return len([v for v in values if v == True])
The above function can be used as follows:
>>> count([True, False, True, False])
2
-
Implement a recursive function
solve(f)
that takes as its first argument a single function f
(i.e., a Python function corresponding to a formula). The function f
may take any non-zero quantity of arguments. The function solve(f)
should return a list of values that satisfies f
and has the largest possible number of True
values (as many as any other list of values that satisfies f
). -
Suppose you need to transport some animals (a rhino, a lion, a zebra, a hyena, and a snake) across a river in a boat. Since certain animals are predators or may poison or step on each other, there are certain constraints in this situation that you want to follow, and you model these using boolean formulas:
- if a snake is in the boat, then the hyena is not in the boat;
- if a lion is in the boat, then the hyena is not in the boat;
- if a rhino is in the boat, then the snake is not in the boat;
- if a lion is in the boat, then the snake is not in the boat;
- if a lion is in the boat, then the zebra is not in the boat;
- if a snake is in the boat, then the zebra is not in the boat.
Define a formula transport()
containing five variables (rhino, lion, zebra, hyena, and snake). If a variable corresponding to a particular animal is assigned the value True
, this represents that the animal is in the boat. The formula should capture all of the constraints.
Define a variable maximum
to be an integer that represents the largest number of animals that can be in the boat at the same time. You must use the solve()
function in your definition of maximum
.
[link] Set Theory and its Relationships to Boolean Algebra
Set theory and boolean algebra were both derived from intuitive processes and techniques that humans developed for solving problems in their environment. Thus, both set theory and boolean algebra are languages for describing (in an abstract way that eliminates many of the concrete details) the same universal principles we observe in our environment. This also means that set theory and boolean algebra can be defined in terms of one another: it is possible to define boolean algebra using set theory, and it is possible to define set theory using boolean algebra. In this section, we will study and explore both directions of this relationship.
[link] Sets, set operations, and their algebraic properties
Sets are an intuitive mathematical concept that can capture the idea of a collection of distinct items. Since every item in a set must be distinct, there can be no duplicates. Thus, it is not a good way to represent collections containing various quantiities of each distinct item; it is more suitable for representing collections of more "abstract" things, like concepts.
As with formulas, we will use sequences of certain symbols to describe sets. We define this symbolic language below.
Definition:
The following table specifies various notations for sets, what they mean, and how they can be represented within the Python programming language.
set notation |
name |
meaning |
example in Python |
{a, b, c} |
set |
a set containing four distcint elements a, b, and c |
{'a', 'b', 'c'} |
∅ |
empty set |
an empty set that contains no elements |
set() |
A ∪ B |
union of A and B |
the set containing all elements in A and all elements in B |
A.union(B)
A | B |
A ∩ B |
intersection of A and B |
the set containing only elements that are both in A and in B |
A.intersection(B)
A & B |
A − B |
set difference |
the set containing all elements in A that are not in B |
A - B |
A |
complement of A |
the set containing only elements that are not in A |
U - A |
U |
universe |
the set containing all possible elements (depends on context) |
must be defined by user |
Typically (but not always, so it is always a good idea to confirm this), all the set operations have the same level of precedence, and should be performed left-to-right (i.e., they are left-associative). For example:
Example:
Determine what elements are contained in each of the following sets:
-
∅ ∩ {a, b, c}
-
({a, b, c} ∪ {c, d, e}) - {c}
-
{a, b, c} where U = {a, b, c, d, e, f}
-
{a, b, c} ∩ {b, d, f} where U = {a, b, c, d, e, f}
-
U where U = {a, b, c, d, e, f}
-
A where A = {d, e, f} and U = {a, b, c, d, e, f}
Fact:
The set operations have many algebraic properties. We list some of the ones we encounter frequently below:
| = | | | |
| = | | | |
| = | | | |
| = | | | |
| = | | | |
| = | | | |
| = | | | (distributivity of ∩ across ∪)
|
|
The ∪ and ∩ operations can be expressed in terms of one another with the help of the set complement operation:
[link] Sets of models
In this section we introduce a way to interpret (or, equivalently, understand, represent, or think of the meaning of) formulas as sets, and particularly sets of models. For the purposes of this discussion, we assume that the number of variables that can appear in our formulas is finite and fixed (i.e., it never changes once we agree on what the variables are). Having a finite, fixed number of variables immediately puts an upper bound on the number of distinct models that can exist, because each model is just a distinct way to assign a value (i.e., ⊤ or ⊥) to every one of the variables.
Definition:
Suppose that we have a collection of
k different models
m1,...,
mk (remember that each model is an assignment of variables to values that looks something like {
x ↦ ⊤,
y ↦ ⊥, ...}). Suppose also that each model satisfies some formula
f, so that we can write:
Then we say that the
set of models {
m1,...,
mk}
satisfies f, and we denote this using the following notation:
Example:
Suppose that we have only two variables:
x and
y. There are 2
2 = 4 distinct models if there are only two variables:
Suppose we set our universe
U to be the set of models {
m1,
m2,
m3,
m4}. Then consider the following three formulas contianing
x and
y:
Notice that
m1,
m2, and
m3 all satisfy
f, but only
m1 satisfies
g, and only
m1 and
m4 satisfy
h. We could write the following:
But this means that we could
interpret x ∨
y as the set of models {
m1,
m2,
m3}, we could interpret
x ∧
y as the set of models {
m1}, and we could interpret
x ⇔ y as the set of models {
m1,
m4}. In other words, each formula gives us a way to specify
which models we want to include in a set.
[link] Set algebra of model sets and boolean algebra
Fact:
Assume we are only working with models and formulas that contain a certain set of variables
x1,...,
xk. Suppose we have two model sets
A and
B that satisfy the two formulas
f and
g, respectively. In other words:
Then all of the following are true:
In other words:
- the union of two model sets satisfies the disjunction (or) of two corresponding formulas;
- the intersection of two model sets satisfies the conjunction (and) of two corresponding formulas;
- the complement of a model set satisfies the negation (not) of a corresponding formula.
Example:
Suppose we have the following sets of models:
| = | {{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}} |
|
| = | |
Use A, B, and set operations (union, intersection, difference, and compelement, but not ∅ or U) to build the largest model set you can that satisfies each of the following formulas:
-
x ⊕ y
-
x ∧ y
-
¬(x ∧ y)
-
x
-
⊥
-
⊤
There is exactly one possible set for each part, because there is always a largest set of models that satisfy the formula (if there were two largest sets that both satisfy a formula, they could be combined into a larger set that still satisfies the formula, since every model found in either set satisfies the formula). However, there may be more than one way to build that set using
A and
B. For each part below, we provide one or two of the ways to build that set.
-
The set A satisfies x ⊕ y.
-
The set B satisfies x ∧ y.
-
The set B satisfies ¬(x ∧ y). We could also say A ∪ B satisfies it (since it refers to the same set of models).
-
The set can only contain models in which x is assigned ⊤. However, there is no way to isolate the model in A where x ↦ ⊤, so the best we can do is the set B, which contains one of the two models in which x is assigned ⊤.
-
No models can satisfy ⊥, so we need to build the empty set. One example of an empty set is A ∩ B.
-
All models satisfy ⊤, so we can obtain all the model sets using A ∪ A, or B ∪ B, among others.
Example:
Suppose we are considering formulas with two variables x and y, and we have the following sets of models:
| = | {{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊥}} |
|
| = | |
| = | {{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}}
|
|
Find a formula that corresponds to each of the following model sets.
-
B
-
B
-
B ∪ C
-
A ∪ B ∪ C
-
∅
-
A ∩ C ∪ B
There are multiple solutions to each of the parts above. Below, we provide a few for each.
-
Since both x and y are ⊤, some formulas that are satisfied include x ∧ y, as well as (x ⇔ ⊤) ∧ (y ⇔ ⊤).
-
One formula is ¬(x ∧ y), and another is ¬(x) ∨ ¬(y).
-
One formula is (x ∧ y) ∨ (x ⊕ y) because x ∧ y is satisfied by B and x ⊕ y is satisfied by C. Another formula is x ∨ y, since at least one of the two variables is always true in B or C.
-
Since A ∪ B ∪ C contains all possible models, one formula that is satisfied is ⊤ (all models satisfy ⊤ because any assignment of values to variables results in ⊤ being true). Another formula could be x ∨ y ∨ ¬(x) ∨ ¬(y).
-
A formula that is always false is not satisfied by any models. This can include formulas such as ⊥, as well as x ⇔ (¬(x)) because a variable must always be equal to itself.
-
First, we can break down the notation to determine what set it describes:
| = | |
| = | {{x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}} |
|
| = | {{x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
|
At this point, we see the set only omits {x ↦ ⊤, y ↦ ⊥}, so possible formulas include x ⇒ y, as well as ¬(x) ∨ y.
Fact:
Given a formula f, there is a unique (i.e., single) largest set S of models that satisfies f.
To understand why, let us assume that this is not true. Then there is some formula
f for which there exist two sets
A and
B that are distinct, and yet we have:
But because all the models in
A and
B satisfy
f, the above implies that:
Thus, either
A and
B are the same set, or
A ∪
B is larger than both of them, contradicting our assumption that
A and
B are each as large as any other set of models that satisfies
f.
The
boolean satisfiability problem is solved by finding
any one model that can satisfy a formula, and the
maximum boolean satisfiability problem is solved by finding any model that satisfies a formula, and has as many variables mapped to ⊤ as any other formula that satisfies a formula. What if we are interested in knowing the
number of models that satisfy a formula?
Definition:
The size of a set S is the number of elements in that set, and is denoted using |S|, where |S| is always an integer that is 0 or greater.
Example:
Below, we show the sizes of some sets:
Fact:
Given a formula f, there is a unique (i.e., single) largest set S of models that satisfies f.
To understand why, let us assume that this is not true. Then there is some formula
f for which there exist two sets
A and
B that are distinct, and yet we have:
But because all the models in
A and
B satisfy
f, the above implies that:
Thus, either
A and
B are the same set, or
A ∪
B is larger than both of them, contradicting our assumption that
A and
B are each as large as any other set of models that satisfies
f.
Definition:
We define an instance of the
counting boolean satisfiability problem as follows: given a formula
f, find the total number of models that satisfy
f. In other words, find the size |
S| of the unique largest set
S that satisfies the formula:
Knowing the number of models that satisfy a formula can be useful. It can allow us to get a better idea of how easy it might be to find a model that satisfies (i.e., solves) the formula, or how restrictive a formula is.
Fact:
Suppose that we are working with formulas that can only contain any of the variables
x1, ...,
xk (so there are
k distinct variables). Then, the size of the universe
U of models that may or may not satisfy a formula has size 2
k:
Recall that for a given number of variables, we can determine the number of rows a truth table may have for a formula. We also know that the number of rows is equivalent to the largest possible set U that may satisfy a formula for that number of variables. But what about the number of formulas that only use variables from a collection of variables x1, ..., xk?
The number of well-formed formulas is infinite, because we can always build a larger formula. But for a given collection of variables x1, ..., xk, the number of distinct truth tables is finite. This is because all truth tables are of size 2k, so they are finite, and there is only a finite number of ways that we can assign a value to the formula for every one of those 2k rows in the truth table.
Example:
Suppose we are working with the variables
x and
y. How many truth tables are there involving these two variables? We can list them all (there are two variables, so there are 2
2 = 4 rows in each table, and so there are 2
# rows = 2
(22) = 2
22 = 2
4 = 16 truth tables).
Notice that no matter what well-formed formula we assemble that uses the variables
x and
y, we can
always evaluate that formula to determine its meaning for each of the 2
2 = 4 possible assignments of values to variables (i.e., models), and since the meaning of the formula must be either ⊤ or ⊥, we will in the end have one of the truth tables above.
How can we compute the number of truth tables for a given number of variables? Notice that each truth table corresponds to a particular set of models that satisfy the formula (each row for which the formula is ⊤ corresponds to a model in the set of satisfying models).
Definition:
Given a set
B, we say that
A is a
subset of
B if every element in
A is also in
B. In other words, if
A -
B = ∅, or if
A ∩
B =
A, then
A is a subset of
B. We denote this as follows:
Example:
Below are some examples of subset relationships:
Definition:
Given a set S, the power set of S, which we denote as ℘(S), is the set of subsets of S. In other words, every possible subset of S is inside the set ℘(S), and each element in ℘(S) is a set.
Fact:
Given a set
S of size |
S|, the size of the powerset is denoted |℘(
S)|, where:
This is because each possible element inside
S is either in a subset, or it is not in a subset. Thus, since each of the |
S| elements have two possible states (in and out), there are 2 ⋅ 2 ⋅ ... ⋅ 2 states in total (where we have |
S| factors of 2 in the product).
Example:
Below are some examples of power sets of specific sets:
| = | {∅, {a}, {b}, {c}, {a,b}, {b,c}, {a,c}, {a,b,c}} |
|
| = | |
| = | |
Notice that the empty set is a subset of the empty set, so the power set of the empty set is 2
0 = 1.
Fact:
Suppose we are working with formulas that contain the variables
x1, ...,
xk. Every truth table for these variables corresponds to a particular set of models (i.e., those that satisfy the formula and correspond to rows in the table for which the formula is ⊤). Every set of models must be a subset of the universe
U for this collection of variables. Thus, all the possible sets of models correspond exactly to the elements of ℘(
U), the power set of the universe
U:
| | | | a model (assignment of values to variables) |
|
all rows in a truth table | |
| | | | |
| | | | a set of models, subset of U (rows for which formula is ⊤) |
|
| | | | set of all sets of models, power set of U
|
|
Thus, the number of truth tables is exactly the size of ℘(
U):
Example:
Suppose we are working with the variables
x and
y. Each possible truth table for these variables corresponds to a particular subset of the rows (i.e., the rows that have a value of ⊤ in the formula column), and thus it corresponds to a particular subset of the universe. We can pair each truth table with the subset of the universe to which it corresponds.
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊤, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊤}}
|
{{x ↦ ⊤, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊤}}
|
{{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊤}}
|
{{x ↦ ⊤, y ↦ ⊥}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊤, y ↦ ⊥}}
|
{{x ↦ ⊥, y ↦ ⊤}, {x ↦ ⊥, y ↦ ⊥}}
|
{{x ↦ ⊥, y ↦ ⊤}}
|
{{x ↦ ⊥, y ↦ ⊥}}
|
∅
|
Example:
Suppose we want to know the number of truth tables for the set of formulas that have up to 4 variables. We can compute it:
Example:
Suppose you are implementing or maintaining a small social networking application with five members: Alice, Bob, Carl, Dan, and Eve. For any given instance (i.e., session) of the application running in a single browser window, the following constraints must be satisfied:
- if Alice is logged in, Bob is visible and Dan is visible;
- if Bob is logged in, Alice is visible and Dan is visible;
- if Carl is logged in, Eve is visible;
- if Dan is logged in, Alice is visible and Bob is visible;
- if Eve is logged in, Carl is visible;
- only one person can be logged in at a time.
Answer each of the following.
-
How can we represent the above constraints using a formula? What is the formula?
-
How many distinct models should satisfy the formula? In other words, in how many rows in the truth table should the formula column contain ⊤?
-
Suppose you are running the server and database for this application, and the members are all allowed to either make themselves visible or not visible to one another. From your perspective, at any given time, any of the members might be logged in or logged out, and the visibility properties between the members might have different settings depending on their current preferences. The database keeps track of everything: whether each member is logged in, and what the visibility preferences are. How many different database configurations can exist?
-
The above constraints suggest a particular set of relationships between the members: Alice, Bob, and Dan are all friends, and Carl and Eve are friends with each other, but no one else is a friend with anyone else. Suppose we wanted to use a single variable to represent each pair-wise relationship between members. How many variables would we need?
We know that each user must be either logged in or not logged in, and each user must be either visible or not visible. Since there are five users and two states to track for each user, we will use 2 ⋅ 5 variables:
Alice logged in,
Bob logged in,
Carl logged in,
Dan logged in,
Eve logged in,
Alice visible,
Bob visible,
Carl visible,
Dan visible, and
Eve visible. Thus, the size of our universe is 2
10 = 1024. We answer each part, with some discussion (there is sometimes more than one right answer).
-
How we represent the constraints using a formula depends on what the unspecified constraints might be. If we assume that anything that is not mentioned explicitly in the constraints should not be restricted, then the above constraints would allow for the possibility that everyone can be visible to, e.g., Alice, it is just required that Bob and Dan must be visible. Then, we can represent each of the first five constraints as follows using a formula:
- Alice logged in ⇒ (Bob visible ∧ Dan visible)
- Bob logged in ⇒ (Alice visible ∧ Dan visible)
- Carl logged in ⇒ Eve visible
- Dan logged in ⇒ (Alice visible ∧ Bob visible)
- Eve logged in ⇒ Carl visible
On the other hand, this is not necessarily what the person who wrote the constraints meant. Maybe by saying "Bob and Dan are visible", the person implied that the others should not be visible. Then we would need to write the constraints as follows:
- Alice logged in ⇒ (Bob visible ∧ ¬(Carl visible) ∧ Dan visible ∧ ¬(Eve visible))
- Bob logged in ⇒ (Alice visible ∧ ¬(Carl visible) ∧ Dan visible ∧ ¬(Eve visible))
- Carl logged in ⇒ (¬(Alice visible) ∧ ¬(Bob visible) ∧ ¬(Dan visible) ∧ Eve visible)
- Dan logged in ⇒ (Alice visible ∧ Bob visible ∧ ¬(Carl visible) ∧ ¬(Eve visible))
- Eve logged in ⇒ (¬(Alice visible) ∧ ¬(Bob visible) ∧ Carl visible ∧ ¬(Dan visible))
The only way to determine which of the above is the "correct" interpretation of the constraints written in English is to ask the original author of the constraints. If we imagine an application in which everyone can set their own privacy settings, we might imagine the first interpretation is more correct, because we want everyone to be able to make their profile public if they wish (e.g., Alice must be able to see her friends Bob and Dan, but she might also be able to see Eve if Eve made her profile public).
The last constraint is difficult to represent using only the well-formed formulas we have seen so far; we must do so by enumerating all the possibilities:
- Alice logged in ⇒ (¬(Bob logged in) ∧ ¬(Carl logged in) ∧ ¬(Dan logged in) ∧ ¬(Eve logged in))
- Bob logged in ⇒ (¬(Alice logged in) ∧ ¬(Carl logged in) ∧ ¬(Dan logged in) ∧ ¬(Eve logged in))
- Carl logged in ⇒ (¬(Alice logged in) ∧ ¬(Bob logged in) ∧ ¬(Dan logged in) ∧ ¬(Eve logged in))
- Dan logged in ⇒ (¬(Alice logged in) ∧ ¬(Bob logged in) ∧ ¬(Carl logged in) ∧ ¬(Eve logged in))
- Eve logged in ⇒ (¬(Alice logged in) ∧ ¬(Bob logged in) ∧ ¬(Carl logged in) ∧ ¬(Dan logged in))
-
If users are not allowed to make their profile public, the only possibilities are that one of the five users is logged in, or that no one is logged in. In that case, there are 5 + 1 = 6 possible rows in the truth table.
However, if users are allowed to set their profiles to be public or private, then computing the number of rows that satisfy the formula is more complicated (unless we do it automatically by simply submitting the appropriate formula to a solver for the counting boolean satisfiability problem).
If no one is logged in but everyone controls their own public profile privacy settings independently, then there are actually 25 = 32 possible combinations (there are five users, and the profile for each user is either public or private). Then, if any of the users are logged in, it must be that there are a number of valid configurations. For example, consider the case in which Alice is logged in: Carl and Eve may or may not be visible to Alice, and all of these are allowed:
Alice logged in |
Bob visible |
Carl visible |
Dan visible |
Eve visible |
⊤ | ⊤ | ⊥ | ⊤ | ⊥ |
⊤ | ⊤ | ⊤ | ⊤ | ⊥ |
⊤ | ⊤ | ⊥ | ⊤ | ⊤ |
⊤ | ⊤ | ⊤ | ⊤ | ⊤ |
Thus, for Alice, Bob, and Dan, there are 22 = 4 possible configurations of Carl's and Eve's profile privacy settings. From Carl and Eve's perspective, there are 23 = 8 possible configurations for Alice's, Bob's, and Dan's profile privacy settings. If we add up all the possibilities, we get:
Thus, there would be 60 rows in the table in which the column corresponding to the overall formula has the value ⊤.
-
This question is simply asking about the number of possible configurations of the ten variables we have considered, which is the same as the size of the universe. We know that in this problem, the size of the universe is 210 = 1024, so there are 1024 database configurations.
-
Since there are five users, we would have 4 + 3 + 2 + 1 = 10 pair-wise relationships: four relationships between Alice and the others, three more between Bob and the others (excluding his relationship with Alice, which we already counted), two more between Carl and the others (excluding relationships with Alice and Bob, which we already counted), and so on. Thus, we would need 10 variables if we had one variable for each pair-wise relationship.
[link] Assignment #2: Algebraic Properties of Formulas and Sets
In this assignment you will define a collection of Python variables and functions involving formulas and sets.
Your solution may not import any modules or employ any external library functions (unless the problem statement explicitly permits this). The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
For each of the following formula definitions, use De Morgan's Law and other algebraic properties of boolean operators to convert it into an equivalent formula that uses only ∧ and ¬, and no other operators. You only need to include the new versions of these definitions (that use only ∧ and ¬); you do not need to include the original definitions.
-
def oneA(x, y):
return (x or (not y))
-
def oneB(x, y):
return ((not x) <= (not y))
-
def oneC(x, y, z):
return ((x and y) <= z)
-
def oneD(w, x, y, z):
return (x or y) and (w or z)
-
def oneE(w, x, y, z):
return ((not w) or (not x)) or (not (y and z))
-
def oneF(v, w, x, y, z):
return (((v <= (not w)) and x) <= (y <= z))
-
Include the following set definitions in your Python file:
U = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'}
X = {'a', 'b', 'c'}
Y = {'c', 'd', 'e', 'f'}
Z = {'f', 'g'}
For each of the following sets, use some number of set intersection, union, and difference operations to build the specified set using U
, X
, Y
, and Z
. You can only use set operations and the four provided set variables; you may not include any sets built using the brace notation { ... }
. There may be more than one correct solution to each part; you only need to supply one correct solution for each part.-
Define a set
twoA
in your Python file that is equivalent to the set { 'f' }
.
Your definition cannot contain any explicit sets; it must use set operations to build the set { 'f' }
using U
, X
, Y
, Z
. It should then be possible to test your definition as follows:
-
Define a set
twoB
in your Python file that is equivalent to the set {'a', 'b'}
. -
Define a set
twoC
in your Python file that is equivalent to the set {'d', 'e'}
. -
Define a set
twoD
in your Python file that is equivalent to the set {'a', 'b', 'g', 'h'}
. -
Define a set
twoE
in your Python file that is equivalent to the set {'c', 'd', 'e', 'h'}
. -
Define a set
twoF
in your Python file that is equivalent to the set {'a', 'b', 'd', 'e', 'g', 'h'}
. -
Define a set
twoG
in your Python file that is equivalent to the empty set set()
.
-
For each of the following functions, use De Morgan's Law for sets to convert each function on sets into a function that returns the same result, but uses only set union and set difference operations, and does not use set intersection. You only need to include the new versions of these definitions (that use only set union and set difference); you do not need to include the original definitions. You may assume that
U
is always the set corresponding to the universe from which the elements in all the other sets are drawn.-
def threeA(U, X, Y):
return X & Y
-
def threeB(U, X, Y):
return (U - X) & (U - Y)
-
def threeC(U, X, Y, Z):
return U - ((U - X) & (Y & Z))
-
def threeD(U, X, Y, Z):
return (X & Y) | (Y & Z)
-
In this problem, you will implement Python functions for working with sets of models that satisfy particular formulas. You may include the following functions in your Python file:
variables(f)
returns the list of variables in the definition of a formula f
, and universe(n)
returns the set of all boolean value tuples of a certain length n
.
def variables(f):
return list(f.__code__.co_varnames)
def universe(n, values = tuple()):
if len(values) == n:
return {values}
else:
return universe(n, values + (True,)) | universe(n, values + (False,))
-
Implement a function
models(f)
that takes as its input a single function f
(i.e., a Python function corresponding to a formula). The function should return a set containing all the tuples that satisfy f
. Hint: you could define a local variable M
to be an empty set, then use a for
loop to look through all the formulas in an appropriate universe, and add each tuple that satisfies the formula to the set using M.add()
. -
Implement a function
satisfies(S, f)
that takes as its input a set of tuples S
and a function f
that represents a formula. The function should return True
if every tuple in S
satisfies f
, and False
, otherwise. -
Implement a function
equivalent(f, g)
that takes as its inputs two functions f
and g
that represent formulas. If both formulas take the same number of variables and are equivalent (i.e., they have the same truth tables), then equivalent(f, g)
should return True
; otherwise, it should return False
. Hint: use models()
to write a one- or two-line definition for this function. -
Implement a function
both(f, g)
that takes as its inputs two functions f
and g
that represent formulas. It should return a set of tuples that contains only those tuples that satisfy both f
and g
. Your implementation must be at most two lines. -
Implement a function
either(f, g)
that takes as its inputs two functions f
and g
that represent formulas. It should return a set of tuples that contains only those tuples that satisfy either f
or g
. Your implementation must be at most two lines. -
Implement a function
neither(f, g)
that takes as its inputs two functions f
and g
that represent formulas. It should return a set of tuples that contains only those tuples that do not satisfy either f
or g
. Your implementation must be at most two lines.
-
Suppose that Alice, Bob, Carl, Dan, and Eve are all trying to schedule a one-hour appointment with you. They each have one of two possible times that they can meet:
- Alice can meet at 2 PM or 4 PM;
- Bob can meet at 1 PM or 3 PM;
- Carl can meet at 1 PM or 5 PM;
- Dan can meet at 2 PM or 3 PM;
- Eve can meet at 4 PM or 5 PM.
How many different schedules exist in which everyone is scheduled for an appointment, and no appointments overlap? Define a single variable schedules
to be an integer that corresponds to the number of different schedules that will satisfy everyone:
Hints:
- define a new formula function that takes ten arguments (one for each possible time and person) and uses logical operations (you only need ⊕ and ∧) to determine whether a schedule is possible by returning
True
or False
;
- by calling the functions you implemented in Problem #4 above, compute the set of tuples that satisfies the formula you assembled and use this to determine the number of possible schedules.
[link] Logic and Formulas as Proofs
In this section we will formally define the notion of a proof and a theorem (a.k.a., a fact, lemma, and so on) using concepts that have been introduced in previous sections. In particular, we will define these as a certain kind of formula: a formula that is always true. We will also introduce logical rules that allow us to build such formulas.
[link] Building true formulas
Suppose we are interested in building boolean formulas that are always true. In other words, we want to build formulas that are true under every possible assignment of the variables to truth values. We could say that such formulas f are satisfied by the universe (i.e., U ⊨ f); that is, these formulas can accurately model the properties of the universe of assignments.
One naive, brute force strategy might be to begin listing all possible well-formed formulas, and then checking their truth tables. We could throw away formulas that are not always true, and keep those that are always true. However, this process would be extremely inefficient.
Another approach might be to assemble a particular collection of
algebraic laws that identify exactly those algebraic manipulations that can grow a formula but keep the formula's meaning
true. In other words, we might want algebraic laws that explain to us how we can expand the simplest true formulas, such as ⊤, into larger and larger true formulas. We can then use these laws as building blocks to build larger and larger formulas that are always true, or even to more efficiently check if a particular formula is always true.
We will call these particular algebraic rules for formulas logical rules, or just logic. We can begin by introducing two formulas that are always guaranteed to be true.
Fact:
The formula ⊤ is always true.
Fact:
For any variable
x, the following formula is always true:
We can confirm this by consulting the truth table for this formula:
In fact, for any
formula f, the following formula must be true:
We can confirm this by observing that no matter what assignment of variables we consider, the meaning of
f is the same on both the left and right side of the
⇒ operator:
x1 |
x2 |
... |
xk |
f |
f ⇒ f |
⊤ | ⊤ | ... | ⊤ | ? | ⊤ |
⊤ | ⊤ | ... | ⊥ | ? | ⊤ |
⋮ | ⋮ | | ⋮ | ⋮ | ⋮ |
Regardless of what the meaning of
f is in a particular row,
f ⇒ f will always be true.
Once we have a way to build simple formulas that are always true, we can expand these into larger formulas that are always true by converting them into formulas using transformations that are guaranteed not to make a formula that is already true into a formula that is false. Note that any transformation of the formula into an
equivalent formula using algebraic laws satisfies this condition, but there are other transformations that can satisfy this condition, as well.
Definition:
A logical inference rule (or simply inference rule) is a generalization of the notion of an algebraic law for equivalent formulas. Instead of requiring that the truth table for two formulas f and g must be the same, an inference rule only requires that for any row in the truth table for f in which f has a meaning of ⊤, the same row in the truth table for g must also be true. An inference rule is written using the following notation:
The formula f above the line is called the premise, and the formula g below the line is called the conclusion.
Example:
The following is a valid inference rule:
We can confirm it is a valid inference rule by consulting the truth tables for the two formulas:
x |
y |
x ∧ y |
x ∨ y |
⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊤ |
⊥ | ⊤ | ⊥ | ⊤ |
⊥ | ⊥ | ⊥ | ⊥ |
Notice that in some rows, if the premise has the meaning ⊥, the conclusion has the meaning ⊤. But there is no row in which a premise has the meaning ⊤ but the conclusion has the meaning ⊥.
Note that if an inference rule holds in both directions for two formulas, then the formulas are equivalent.
Fact:
Suppose that for two formulas f and g, both of the following inference rules are true:
Then we can be certain that:
This is because in any row in which f has meaning ⊤, g must also have the meaning ⊤ (according to the first inference rule), and in any row in which g has meaning ⊤, f must have the meaning top (according to the second inference rule):
f |
g |
|
⊤ | ⊤ | allowed by both inference rules |
⊤ | ⊥ | violates first inference rule |
⊥ | ⊤ | violates second inference rule |
⊥ | ⊥ | allowed by both inference rules |
We introduce inference rules in addition to equivalence of formulas because we are working with special formulas in this section: formulas that are always true. For these kinds of formulas, what seems like a weaker restriction imposed by an inference rule is actually not weaker.
Fact:
Suppose we have two formulas f and g and f is always true, and suppose that the following inference rule holds:
Then we can be certain that:
This is because f has the meaning ⊤ in all of the rows in its truth table, which means g must also have the meaning ⊤ (according to the inference rule) in all of its rows:
f |
g |
|
⊤ | ⊤ | allowed |
⊤ | ⊥ | violates the inference rule |
⊥ | ⊤ | violates condition that f is always true |
⊥ | ⊥ | violates condition that f is always true |
We can now introduce some additional logical inference rules that can be useful for converting formulas that are always true into larger formulas that are always true.
Fact:
For any three formulas f, g, and h, we have:
We can confirm this by consulting the truth table for the two formulas above:
f |
g |
h |
f ⇒ h |
(f ∧ g) ⇒ h |
⊤ | ⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊤ | ⊥ | ⊥ | ⊥ |
⊤ | ⊥ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊥ | ⊤ |
⊥ | ⊤ | ⊤ | ⊤ | ⊤ |
⊥ | ⊤ | ⊥ | ⊤ | ⊤ |
⊥ | ⊥ | ⊤ | ⊤ | ⊤ |
⊥ | ⊥ | ⊥ | ⊤ | ⊤ |
Note that because ∧ is commutative, this inference rule can be combined with algebraic laws to yield:
Furthermore, because ∧ is associative, this inference rule means that we can always introduce any number of formulas into a conjunction on the left-hand side of the ⇒ operator:
(f1 ∧ ... ∧ fm) ⇒ h | (f1 ∧ ... ∧ fm ∧ g1 ∧ ... ∧ gn) ⇒ h |
|
Example:
Suppose that we believe the following relationships to hold between a light source and the time:
- if the sun is in the sky, then it is daytime.
We can represent the relationship above using the formula:
Notice that if the meaning of the above formula is true, introducing additional constraints on the left-hand side of the ⇒ operator can never make new formula false. In some cases, it would just be additional, irrelevant information:
(giraffes are yellow ∧ sun is in the sky) | |
| ⇒ | |
What if the additional formula is false? This does not matter, either, because then the left-hand side of the ⇒ operator is false, which automatically makes the whole formula true (i.e., ⊥ ⇒ ⊤ and ⊥ ⇒ ⊥ are always true):
(giraffes are red ∧ sun is in the sky) | |
| ⇒ | |
This holds even in the following extreme example, where the left-hand side becomes contradictory because of what we introduce:
(¬(sun is in the sky) ∧ sun is in the sky) | |
| ⇒ | |
Fact:
For any three formulas f, g, and h, we have:
In the form of inference rules, this would be:
(f ∧ g) ⇒ h | (f ∧ g) ⇒ (h ∧ g) |
|
(f ∧ g) ⇒ (h ∧ g) | (f ∧ g) ⇒ h |
|
We can confirm this by consulting the truth table for the two formulas above:
f |
g |
h |
(f ∧ g) ⇒ h |
(f ∧ g) ⇒ (h ∧ g) |
⊤ | ⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊤ | ⊥ | ⊥ | ⊥ |
⊤ | ⊥ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊤ | ⊤ |
⊥ | ⊤ | ⊤ | ⊤ | ⊤ |
⊥ | ⊤ | ⊥ | ⊤ | ⊤ |
⊥ | ⊥ | ⊤ | ⊤ | ⊤ |
⊥ | ⊥ | ⊥ | ⊤ | ⊤ |
Intuitively, this means that as long as a subformula appears on the left-hand side of the ⇒ operator, we are free to remove or add it to the right-hand side of the ⇒ operator. In other words, "we can always conclude that what we assumed is true, and we are also free to remove that conclusion".
Example:
Suppose that we believe the following relationships to hold between a light source and the time:
(sun is in the sky ⇒ it is daytime) ∧ sun is in the sky | |
| ⇒ | |
Then we can add the fact that the sun is in the sky on the right-hand side of the outermost ⇒ operator without changing the meaning of the formula:
(sun is in the sky ⇒ it is daytime) ∧ sun is in the sky | |
| ⇒ | it is daytime ∧ sun is in the sky
|
|
Fact (modus ponens):
For any two formulas f and g, we have:
(f ⇒ g) ∧ f | (f ⇒ g) ∧ f ∧ g |
|
The above can be simplified further into either or both of the following:
We can confirm this by consulting the truth table for the two formulas above:
f |
g |
(f ⇒ g) ∧ f |
(f ⇒ g) ∧ f ∧ g |
f ∧ g |
g (as conclusion) |
⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊥ | ⊥ | ⊥ |
⊥ | ⊤ | ⊥ | ⊥ | ⊥ | ⊤ |
⊥ | ⊥ | ⊥ | ⊥ | ⊥ | ⊥ |
In fact, after inspecting the truth table, we may notice that in this case, we can make a more restrictive claim:
Example:
Suppose that we believe the following relationships to hold between a light source and the time:
- if the sun is in the sky, then it is daytime.
Then we intuitively know that if sun is in the sky, it must be daytime. The modus ponens inference rule captures this intuition (we introduce wider in this instance spacing for legibility):
(sun is in the sky ⇒ it is daytime) ∧ sun is in the sky | (sun is in the sky ⇒ it is daytime) ∧ sun is in the sky ∧ it is daytime |
|
Below are the variants of the rule with more concise conclusions:
(sun is in the sky ⇒ it is daytime) ∧ sun is in the sky | sun is in the sky ∧ it is daytime |
|
(sun is in the sky ⇒ it is daytime) ∧ sun is in the sky | it is daytime |
|
Example:
Suppose that we believe the following relationships to hold between a light sensor, a light source, and the time:
- if the light sensor is activated, the sun is in the sky;
- if the sun is in the sky, then it is daytime.
We can represent the two relationships above using two formulas:
We can use the facts we introduced in this subsection to build a formula that is always true, no matter what values we assign to the three variables in this formula. We can start with a formula that must always be true:
Using a known fact, we can introduce our two assumptions about the world into our formula, as well as sun, all on the left-hand side of the ⇒ operator:
((sensor ⇒ sun) ∧ (sun ⇒ daytime) ∧ sun ∧ daytime) | |
| ⇒ | |
Modus ponens and associativity of ∧ tell us that we can simplify a subformula in the above:
(sun ⇒ daytime) ∧ sun ∧ daytime | |
| ≡ | |
Then we have the following formula, which is still always true:
((sensor ⇒ sun) ∧ (sun ⇒ daytime) ∧ sun) | |
| ⇒ | |
Using a fact once more, we can introduce one more subformula on the left-hand side of the ⇒ operator:
((sensor ⇒ sun) ∧ sensor ∧ sun ∧ (sun ⇒ daytime)) | |
| ⇒ | |
Using modus ponens and associativity of ∧ once again, we get:
((sensor ⇒ sun) ∧ (sun ⇒ daytime) ∧ sensor) | |
| ⇒ | |
We have now built a formula that is still always true, but it expresses a relationship that is not as direct. In English, the above formula says: "given our two assumptions about the relationship between the light sensor, the sun, and the time of day, if the sensor is activated, it must be daytime". This formula is always true; there is no condition under which it can be false. This formula can be called a theorem. We can confirm this by building the truth table:
sensor |
sun |
daytime |
((sensor ⇒ sun) ∧ (sun ⇒ daytime) ∧ sensor) ⇒ daytime |
⊤ | ⊤ | ⊤ | ⊤ |
⊤ | ⊤ | ⊥ | ⊤ |
⊤ | ⊥ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ | ⊤ |
⊥ | ⊤ | ⊤ | ⊤ |
⊥ | ⊤ | ⊥ | ⊤ |
⊥ | ⊥ | ⊤ | ⊤ |
⊥ | ⊥ | ⊥ | ⊤ |
Notice that in the rows where daytime is true, the overall formula is true, as expected. If daytime is false, then either sensor does not imply sun, or sun does not imply daytime, so the left-hand side of the ⇒ operator is false, making the overall formula true.
In the example above, we used inference rules to ensure that the formula we were constructing would always be true. However, our approach might seem "backwards" in that particular example because the order of the transformations we performed on the formula do not correspond to the order in which we might intuitively reason about the relationships being modeled. Furthermore, copying the entire formula at every stop is burdensome. To address this, we will need to use a slightly different notation that can keep track of how our formula grows without requiring that we copy the entire formula over each time.
[link] Proofs and theorems
In the previous subsection, we saw that proofs and theorems are both formulas. The distinction between a
proof and a
theorem is that it is possible to efficiently verify that a proof is indeed always true, while it may not be possible to verify a theorem is true without building the entire truth table for it (i.e., solving the
counting boolean satisfiability problem and confirming that the total count is equal to the size of the universe).
We will introduce a slightly different notation for formulas that are proofs. This notation will make it easier to think of proofs as something we can "build" in a step-by-step manner.
Definition:
A proof is a well-formed formula written as a vertical list of subformulas f1,...,fk where that looks as follows:
This notation is equivalent to the following notations (note that these are all algebraically equivalent):
| ⇒ | |
f1 ⇒ (f2 ⇒ (f3 ⇒ ( ... ( fk-1 | |
| ⇒ | |
If the above formula is always true (i.e., all rows in its truth table have the value ⊤ in the column corresponding to the overall formula), then we say the sequence of formulas constitutes a valid proof.
Typically, in a proof sequence
f1,...,
fk, it is required that after some point, each subformula
fj must be derived using a simple inference rule from the formulas
f1, ...,
fj-1 that come before it. Effectively, this all means that the following formulas must
also all be true. For example, the following might be the case for a proof:
Sometimes, this is restricted even further: every
fj must either be exactly the same as some previous
fi where
i <
j, or it must be the direct conclusion of an inference rule for which the premises are the one, two, or three preceding formulas
fj-1,
fj-2, and
fj-3. Effectively, this means all the following formulas might also be true:
If either of the above collections of formulas are true, we can be certain the proof is
efficiently verifiable.
To see how each "subsequence" of formulas can correspond to a true formula in this way, we can consider the following. Suppose we have formula
f,
g, and
h, and we are sure that the following is true (e.g., due to modus ponens):
The following series of logical and algebraic manipulations ensures the formula being manipulated (found on the first line) remains true all the way to the end:
Notice that we can summarize this as follows in English: "if
f and
g can be used to derive
h, then the proof
f ⇒ g can be extended into the proof
f ∧
g ⇒ h".
Definition:
An axiom is any subformula within a proof that we assume to be true. In other words, we do not require that the axiom is always true. For example, suppose we have three axioms a1, a2, and a3 in the following proof:
This means that we do not require that a1, a2, or a3 must always be true; they will only ever appear on the left-hand side of the ⇒ operator. All the following would be required to be true, however:
| ⇒ | |
| ⇒ | |
| ⇒ | |
| ⋮ | |
(a1 ∧ a2 ∧ a3 f1 ∧ f2 ∧ ... ∧ fk-1) | |
| ⇒ | |
Notice that because a1, a2, and a3 always appear on the left-hand side of ⇒, they do not necessarily need to always be true. If they are false, then all of the above formulas are true, because ⊥ ⇒ ⊤ is always true.
Example:
Suppose we have the following axioms governing several components on a Mars rover:
| ⇒ | |
| ⇒ | |
| ⇒ | |
| ⇒ | |
| ⇒ | |
((antenna can turn ∧ communications work) | |
| ⇒ | |
Because these are axioms, we will assume they are always true and will not require that they be proven. Equivalently, we can say we simply will ignore any rows in a truth table in which any of the above are false.
Suppose we encode all of the above axioms within a single formula
a (that is,
f is a conjunction of all the axioms). Then we can begin to assemble some simple proofs.
The above is a simple proof corresponding to the formula (
a ∧
light sensor)
⇒ light sensor. It is true because of
this fact and
this fact. We can keep adding more to the proof:
| | axioms (no proof necessary, assumed to be true) |
|
| | |
light sensor ⇒ sun is visible | |
| | exactly the same as one of the axioms |
|
| | |
sun is visible ⇒ battery charging | |
| | exactly the same as one of the axioms |
|
| | |
|
The above, slightly longer proof represents a formula that is always true. We note that each formula along the way (i.e., each subsequence of steps in the list) is also a true formula:
| | a ∧ sensor ⇒ (sensor ⇒ sun) |
|
| | a ∧ sensor ∧ (sensor ⇒ sun) ⇒ sun |
|
| | a ∧ sensor ∧ (sensor ⇒ sun) ∧ sun ⇒ (sun ⇒ battery) |
|
| | a ∧ sensor ∧ (sensor ⇒ sun) ∧ sun ∧ (sun ⇒ battery) ⇒ battery
|
|
What is interesting about such proofs is that we can then use the equivalence version of modus ponens (i.e., (
f ⇒ g) ∧
f ∧
g ≡ (
f ⇒ g) ∧
f) along with the fact that a lot of things repeat (i.e.,
f ∧
f is equivalent to
f) to get rid of steps while keeping the formula true. Suppose we have the following very long proof:
| | axioms (no proof necessary, assumed to be true) |
|
| | |
light sensor ⇒ sun is visible | |
| | exactly the same as one of the axioms |
|
| | |
sun is visible ⇒ battery charging | |
| | exactly the same as one of the axioms |
|
| | |
battery charging ⇒ motors work | |
| | exactly the same as one of the axioms |
|
| | |
motors work ⇒ antenna can turn | |
| | exactly the same as one of the axioms |
|
| | |
|
We can eliminate many of the steps above because they are redundant, or because they do not change the meaning of the formula:
This is very interesting, because it means that the formula represented by the above proof has the same meaning as the original proof, so like the proof, it must always be true. Thus, the following formula is always true:
However, it is not immediately obvious that the above formula is always true. In fact, we would need to either rebuild the proof and check that each step was a valid logical or algebraic manipulation, or we would need to build the entire truth table and check all 2
7 = 128 rows (i.e., solve the
counting boolean satisfiability problem for this proof). Thus, the above formula is a
theorem (because it is always true), but it is not necessarily a
proof, anymore (depending on what we consider to be an
efficiently verifiable proof).
Fact:
We can provide sufficient conditions for an efficiently verifiable proof as follows: a formula f is an efficiently verifiable proof if it can be converted into a vertical list in which each non-axiom step is either (1) an exact duplicate of an axiom or another step above it, or (2) a direct conclusion of applying modus ponens to the two steps immediately before it.
Example:
For each of the following formulas, determine if it conforms to the definition of a proof:
- (a ∧ b ∧ c ∧ d ∧ e) ⇒ c
- (a ∧ (b ⇒ c) ∧ b) ⇒ c
- ((b ⇒ c) ∧ (a ⇒ b) ∧ a) ⇒ c
We discuss each of the formulas.
-
Since c is an exact duplicate of a formula on the left-hand side of the ⇒ operator, if the formulas on the left-hand side of ⇒ are axioms then this is a proof.
-
By modus ponens, we can obtain c from (b ⇒ c) ∧ b (because (b ⇒ c) ∧ b ≡ (b ⇒ c) ∧ b ∧ c is an algebraic law). Thus, c is a direct conclusion of applying modus ponens to the two steps immediately before c, so this formula is a proof.
-
This formula is a theorem, but it is not a proof. The final step c cannot be derived directly from the two previous steps using modus ponens, and it does not appear on the left-hand side of the ⇒ operator.
Example:
Suppose we adopt the same collection of axioms a is in the example above. However, suppose we are given a theorem and are asked to assemble a proof for it. For example, suppose we are given the following theorem:
In the form of a vertical list, this theorem would look as follows:
Our task is to find a list of subformulas that conforms to our definition of a proof: each step is either an axiom, a duplicate of a previous step, or the consequence of applying modus ponens to the two previous steps.
Under these circumstances, one strategy we can employ is to work backwards. In particular, we know that because there is no exact duplicate of
communications work anywhere in the theorem, this must be derived using modus ponens. Thus, we know that whatever the proof might contain, it must at least contain the following;
| | |
| | |
| | |
battery charging ⇒ communications work | |
| | exactly the same as one of the axioms |
|
| | |
However, the above tells us that we also need
battery charging for modus ponens to work:
| | |
| | |
| | |
| | |
battery charging ⇒ communications work | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above
|
|
But where can we get
battery charging? Also from modus ponens, since it is again not an axiom or anywhere else in the vertical list of subformulas:
| | |
| | |
| | |
sun is visible ⇒ battery charging | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above |
|
battery charging ⇒ communications work | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above
|
|
At this point, we again notice that
sun is visible can only be obtained using modus ponens:
| | |
| | |
light sensor ⇒ sun is visible | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above |
|
sun is visible ⇒ battery charging | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above |
|
battery charging ⇒ communications work | |
| | exactly the same as one of the axioms |
|
| | by modus ponens of the two subformulas immediately above
|
|
[link] Assignment #3: Theorems, Proofs, and Modus Ponens
In this assignment you will assemble a simple proof verifier and a short proof.
Your solution may not import any modules or employ any external library functions (unless the problem statement explicitly permits this). The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
For each of the following formulas, determine whether it is a theorem, an efficiently verifiable proof, or neither. In your Python file, define the variables
oneA
, oneB
, oneC
, oneD
, oneE
, oneF
, and oneG
to be either 'Theorem'
, 'Proof'
, or 'Neither'
. For example, if the question had a part (h) and the formula for that part were a proof, you would include in your Python file the following line:
In the above manner, answer the following.-
(x ∧ y) ⇒ x
-
((y ⇒ z) ∧ (x ⇒ y) ∧ (w ⇒ x) ∧ w) ⇒ z
-
x ⇒ x
-
(u ∧ v ∧ w ∧ x ∧ y) ⇒ z
-
((u ⇒ v) ∧ (w ⇒ x) ∧ u ∧ (u ⇒ v) ∧ v) ⇒ (w ⇒ x)
-
(x ⊕ y) ⇒ (x ∨ y)
-
¬(x) ∨ (y ∨ z)
-
For the remainder of this assignment, we will represent proofs in Python usings lists. Each step in the proof will either be a list such as
['x', 'y']
containing two items (representing an implication such as x ⇒ y), or a single string such as 'z'
(representing a predicate or variable such as z). For example, suppose we have the following proof:
We would represent this as the following Python list:
['z', 'x', ['x', 'y'], 'y']
In this problem, you will implement a Python function that can verify that a proof supplied using the above representation is actually valid (i.e., that the proof represents a formula that is always true). Implement a Python function valid(axioms, proof)
that takes two lists are arguments: axioms
is a list of implications that are assumed to be true, while proof
is a list of implications and predicates where each entry is:
- an exact copy of something that appears in
axioms
or earlier in the list proof
, or
- a conclusion of modus ponens where the two entries in the list immediately before it are the two premises of modus ponens (the order of the two premises does not need to matter).
If the input proofs
satisfies the above conditions, then valid(axioms, proofs)
should return True
. Othewise, it should return False
. Some example inputs and outputs are found below:
>>> valid(['a'], ['a', 'a'])
True
>>> valid(['a', 'b'], ['c', 'd'])
False
>>> valid(['x', ['x', 'y']], ['x', ['x', 'y'], 'y'])
True
>>> valid([['a', 'b'], ['b', 'c'], 'a'], ['a', ['a', 'b'], 'b', ['b', 'c'], 'c'])
True
>>> valid([['a', 'b'], ['b', 'c'], 'a'], ['a', 'c'])
False
Hint: You can write a for
loop that traverses a list xs
as follows:
for i in range(0, len(xs)):
print(xs[i])
-
You may include the following axiom list in your Python file:
axiomsThree = [\
['user can log in as admin', 'user can change admin password'],\
['user can own files', 'user can delete files'],\
['user can log in as admin', 'user can set security policy'],\
['user can log in as admin', 'user can see files'],\
['user can set security policy', 'user can set file ownership'],\
['user knows admin password', 'user can log in as admin'],\
['user can set file ownership', 'user can own files'],\
'user knows admin password'\
]
Define a proof (i.e., a Python list) called proofThree
in your file that concludes with the predicate 'user can delete files'
:
proofThree = [\
'user can delete files'\
]
Your proof should be valid according to valid()
:
>>> valid(axiomsThree, proofThree)
True
[link] Predicates, constants, and universal quantification
In this subsection, we generalize the notion of a predicate in order to make it possible to describe many properties of a system more concisely using logical formulas.
Recall that we can treat variables as predicates that represent particular binary (true or false) states of a system, object, or other entity being modelled. This is particularly reasonable within the context of proofs: proofs and theorems are formulas that are always true, no matter how we assign truth values to predicates. Thus, while a predicate might still technically "vary" in terms of its meaning (true or false), we might also work with it (e.g., in assembling proofs) without ever considering its meaning. In that case, the proofs would encode relationships between predicates, and not constraints on what values they can be assigned.
Definition:
A domain of discourse, or simply domain (which we usually denote using D) is a set of distinct objects. Usually, predicates represent relationships between or properties of elements of the domain of discourse. We sometimes call the elements in the domain of discourse constants.
So far, we have used variables to represent particular states of a system whose properties we might be modelling using formulas. However, we often had variable names that shared structure. For example, Alice has an appointment at 1 PM and Bob has an appointment at 2 PM are two variables that have a common structure: ... has an appointment at .... We can make our reuse of such patterns within our formula variables more explicit by introducing predicates.
Definition:
We can extend our definition of a well-formed boolean formula to also allow predicates that talk about elements in a domain of discourse. When elements from a domain of discourse appear inside a predicate, we usually call them
terms.
boolean formula |
comments |
⋮ |
⋮ |
example |
a predicate without any terms |
example x example |
a predicate with one term x from D |
x example y |
a predicate with two terms x, y from D |
⋮ |
⋮ |
Typically,
predicate,
predicate instance, and
predicate instantiation will refer to a predicate in which the slots are occupied by particular terms. We can think of each predicate instance as a distinct variable that may have the meaning ⊤ or ⊥.
Example:
Suppose our domain of discourse is D = {1 PM, 2 PM, Alice, Bob} and we have as predicates x has appointment at y and x is busy where x and y are from the domain of discourse. We can build well-formed formulas such as the following (each of the examples below is a formula consisting of a predicate instance):
Alice has appointment at 2 PM |
|
Bob has appointment at 1 PM |
|
|
However, notice that the following are also well-formed formulas given our definitions so far:
2 PM has appointment at 1 PM |
|
1 PM has appointment at Bob |
|
|
The above is not a flaw in our definitions. It is perfectly valid to ask whether 1 PM has appointment at Bob is a true formula; it is just desirable in this particular example to ensure that 1 PM has appointment at Bob must have the meaning ⊥ under any circumstances that we might consider to be an accurate model of a possible state of reality.
Fact:
If we know the number of elements in the domain of discourse D and the predicates that are available, we can compute the total number of predicate instantiations (i.e., variables) that can be constructed. For example, for each predicate that has one slot, there are |D| possible instantiations; for each predicate that has two slots, there are |D| ⋅ |D| different predicate instantiations, and so on.
In general, of
pi represents the number of distinct predicates with
i slots, then we can say that the total number of predicate instantiations is:
| = | p0 + (|D| ⋅ p1) + (|D|2 ⋅ p2) + (|D|3 ⋅ p3) + ... + (|D|i ⋅ pi) + ...
|
|
We could write the above using summation notation:
We can go further and say that there are 2
# variables possible
models (i.e., rows in a truth table) for a given
D and collection of predicates.
Example:
Suppose our domain of discourse is D = {2 PM, Alice, Bob} and we have as predicates ... has appointment at ..., ... is a person, and ... is a time. That means there is one predicate that has two slots, and two predicates that have one slot. Thus, the total number of predicate instantiations (a.k.a., variables) we can have is:
We can confirm this by listing all the instantiations:
|
|
|
|
|
|
2 PM has appointment at 2 PM |
|
2 PM has appointment at Alice |
|
2 PM has appointment at Bob |
|
Alice has appointment at 2 PM |
|
Alice has appointment at Alice |
|
Alice has appointment at Bob |
|
Bob has appointment at 2 PM |
|
Bob has appointment at Alice |
|
Bob has appointment at Bob
|
|
We also know that there are 2# variables = 215 possible models (i.e., rows in a truth table) for any formula involving the above variables.
By introducing predicates, we see that a model (i.e., a particular assignment of values to instantiated predicates) becomes a way to specify the properties and relationships between the constants in a domain of discourse. Thus, models become a way to represent knowledge (i.e., concepts and their relationships), or information. For example, a database consisting of one or more tables can be represented mathematically as a collection of predicates, and a given model can represent which entries actually appear in a database (thus, each individual model would be one possible state of the database).
Example:
Suppose our domain of discourse is D = {1 PM, 2 PM, Alice, Bob} and we have as predicates x is a person , x is a time and x has appointment at y where x and y are from the domain of discourse. We can use universal quantification to encode the fact that the predicate x has appointment at y can only be true for some x and y from D if it is also true that x is a person and that y is a time by requiring that the following universally quantified formula must always be true:
∀ x, y from D, x has appointment at y ⇒ (x is a person ∧ y is a time)
|
|
If we call the above formula f, we have the following truth table:
x has appointment at y |
x is a person |
y is a time |
f |
meaning |
⊤ | ⊤ | ⊤ | ⊤ | a person with an appointment |
⊤ | ⊤ | ⊥ | ⊥ | meaningless in context, so false |
⊤ | ⊥ | ⊤ | ⊥ | meaningless in context, so false |
⊤ | ⊥ | ⊥ | ⊥ | meaningless in context, so false |
⊥ | ⊤ | ⊤ | ⊤ | a person without an appointment |
⊥ | ⊤ | ⊥ | ⊤ | meaningless in context, so false |
⊥ | ⊥ | ⊤ | ⊤ | meaningless in context, so false |
⊥ | ⊥ | ⊥ | ⊤ | meaningless in context, so false |
Notice that in the cases in which the formula f is true, it is certain that in order for x has appointment at y to be true, x is a person and y is a time must both be true.
Definition:
We can extend our definition of a well-formed boolean formula to also include
univerally quantified formulas in which a variable appears that can be substituted with any element in the domain of discourse
D.
boolean formula |
comments |
⋮ |
⋮ |
∀ x, f |
f is a formula with predicates in which x can appear as a term |
We read the ∀ symbol as "
forall", and if a formula or subformula begins with ∀, we say the formula is
universally quantified. The meaning of a universally quantified formula can be defined using conjunction. Suppose that
D = {
a,
b,
c, ...} is the domain of discourse. Then we have that:
f where x is substituted with a ∧ f where x is substituted with b ∧ ... | |
| ≡ | |
Fact (∀-instantiation):
For any domain of discourse D, any element in the domain of discourse a, and any formula f that might contain some predicates in which x is a term, we have the following inference rule:
∀ x, f | f with all occurrences of x substituted with a |
|
Example:
Suppose our domain of discourse is D = {crow, pigeon} and we have as predicate ... is a bird. The the following are two examples of the ∀-instantiation inference rule being used:
∀ x, x is a bird | crow is a bird |
|
∀ x, x is a bird | pigeon is a bird |
|
Notice that the meaning of ∀ x, x is a bird is determined by the value assigned to the two possible variables that can be built: crow is a bird and pigeon is a bird:
pigeon is a bird |
crow is a bird |
∀ x, x is a bird |
⊤ | ⊤ | ⊤ |
⊤ | ⊥ | ⊥ |
⊥ | ⊤ | ⊥ |
⊥ | ⊥ | ⊥ |
The above suggests that we can simply view ∀ x, x is a bird as a short way of writing a conjunction of every possible instantiation of the predicate ... is a bird:
pigeon is a bird ∧ crow is a bird | |
| ≡ | |
Example:
Suppose our domain of discourse is D = {crow, pigeon, wug, ...} and we have the predicates ... is a bird and ... has wings. Suppose we have the following formula as an axiom:
| | ∀ x, x is a bird ⇒ x has wings |
|
| | |
We can use ∀-instantiation together with modus ponens to assemble a proof stating that wugs have wings.
∀ x, x is a bird ⇒ x has wings | |
| | |
| | |
wug is a bird ⇒ wug has wings | |
| | |
| | |
Predicates, universal quantification, and implication can allow us to represent category relationships in a more abstract (and a more concise) manner. Instead of having separate predicates such as ... is a bird or ... is an animal for each category, we can include the category names in the domain of discourse and use a single universally quantified formula that makes it possible to use category relationships to derive conclusions.
Example:
Suppose our domain of discourse is D = {crow, pigeon, wug, animal, bird, ...} and we have as predicates all ... are ... and ... is a .... Suppose we have the following formula as an axiom:
| | ∀ x, a, b, (all a are b ∧ x is a a) ⇒ x is a b |
|
| | |
| | |
|
We can use ∀-instantiation, modus ponens, and associativity of ∧ to assemble a proof stating that wugs are animals.
∀ x, a, b, (all a are b ∧ x is a a) ⇒ x is a b | |
| | |
| | |
| | |
all bird are animal ∧ wug is a bird | |
| | exact duplicates of axioms, grouped using associativity of ∧ |
|
(all bird are animal ∧ wug is a bird) ⇒ wug is a animal | |
| | |
| | by modus ponens of the two steps immediately above
|
|
[link] Infinitely large domains and logical induction
In the subsection above, we have seen how a collection of predicates can be used to represent the properties and relationships between a set of entities (i.e., the domain of discourse). However, writing down formulas that capture all the assumptions we may want to make (e.g., in a theorem or proof) about all the elements in a very large (or even an infinitely large) domain of discourse can be impractical or impossible. In this subsection, we look at one way in which we can address this difficulty.
Example:
Suppose the domain of discourse is a set of strings (where every string consists of some number of copies of the letter a) and the set of all positive integers:
D = {a, aa, aaa, ...} ∪ {1, 2, 3, 4, 5, ...}
|
|
We introduce the predicates ... is a number and ... is a string. The following axioms allow us to derive that anything in the subset {a, aa, aaa, ...} is a string:
| | |
| | ∀ s, s is a string ⇒ sa is a string
|
|
The following axioms allow us to derive that anything in the subset {1, 2, 3, 4, 5, ...} is a number:
| | |
| | ∀ n, n is a number ⇒ n+1 is a number
|
|
We could now constrain what constants from the domain of discourse are allowed in each slot of a predicate that specifies the length of a string: ... is of length .... We can do so by requiring that the following formula must always be true:
∀ s, n, s is of length n ⇒ (s is a string ∧ n is a number)
|
|
Example:
Suppose the domain of discourse is a set of strings (where every string consists of some number of copies of the letter a) and the set of all positive integers:
D = {a, aa, aaa, ...} ∪ {1, 2, 3, 4, 5, ...}
|
|
We want to use the predicate ... is of length ... to keep track of the sizes of strings. We can introduce two axioms that allow us to build up a proof about the length of any string:
| | |
∀ s, n, s is of length n ⇒ sa is of length n+1 | |
| | |
Notice that the above two cases actually correspond exactly to the two cases of a recursive implementation of a length function for strings:
def length(s):
if s == 'a':
return 1
else:
n = length(s[0:-1])
return n+1
We can assemble a proof as follows to show that the particular string aaaa has the length 4:
| | |
∀ s, n, s is of length n ⇒ sa is of length n+1 | |
| | |
| | |
a is of length 1 ⇒ aa is of length 2 | |
| | |
| | |
aa is of length 2 ⇒ aaa is of length 3 | |
| | |
| | |
aaa is of length 3 ⇒ aaaa is of length 4 | |
| | |
| | |
[link] Assignment #4: Review of Boolean Algebra and Logic
In this assignment you will complete a variety of problems that review the material we have seen so far in the course.
There is no programming in this assignment. However, please
use Python syntax for boolean and set operations in order to make grading easier and to avoid any confusion or grading issues:
- use
and
for ∧;
- use
or
for ∨;
- use
not
for ¬;
- use
&
for ∩;
- use
|
for ∪;
- use
U - X
for X.
You must show your work in your solutions.
-
For each of the formulas in this problem, do the following:
-
(x ∧ y) ⇒ x
-
x ⊕ y
-
x ⇒ (¬(x))
-
(x ∧ y) ⇔ z
-
⊤ ∧ (¬(x))
-
(x ⊕ y) ⇒ (x ∨ y)
-
For each of the algebraic laws and inference rules below, either indicate that the law holds or the rule holds and explain why by referring to known facts (e.g., distributivity, De Morgan's laws, modus ponens, and so on), or find a single model that shows that the law or rule does not hold.
-
x ≡ ¬(x)
-
¬(x ∧ ¬(y)) ≡ x ⇒ y
-
x ⇒ y ≡ y ⇒ x
-
(x ∧ y) ∨ (z ∧ x) ≡ ¬(¬(x) ∨ ¬(y ∨ z))
-
The following inference rule:
-
Suppose tha f, g, and h are well-formed formulas and that A, B, and C are three sets of models (where U is the universe) such that each set is the largest possible model set that satisfies its respective formula:
Using A, B, C, U, and the set operations ∪, ∩, and −, write out the largest model set that satisfies each of the following formulas. Hint: in some cases, you may need to apply some algebraic laws for logical operations first.
-
f ∨ ¬(h)
-
f ∧ (g ∨ h)
-
¬(f ∧ g ∧ h)
-
f ⇒ g
-
f ⇔ ¬(f)
-
⊤
-
Complete each of the following proofs so that each step in the proof is one of the following:
- an exact duplicate of an axiom or a previous step;
- an application of modus ponens using the two steps immediately above it;
- a grouping of subformulas using associativity of ∧;
- a ∀-instantiation;
- a conversion of a base case and inductive step into a single universally quantified formula via the axiom of induction.
Label each step you add with one of the above (you may use the roman numerals to make things more concise).-
Complete the following proof:
-
Suppose the domain of discourse is D = {products, boxes, containers, warehouses}. Complete the following proof:
| | ∀ x, y, z, (x are in y ∧ y are in z) ⇒ x are in z |
|
| | |
| | |
| | containers are in warehouses |
|
| | |
| | products are in warehouses
|
|
-
Suppose the domain of discourse is the set of strings that are made using the two letters a and b. Complete the following proof:
| | |
| | |
| | ∀ s, s is a string ⇒ sa is a string |
|
| | ∀ s, s is a string ⇒ sb is a string |
|
| | |
| | |
-
Suppose the domain of discourse is the set of positive integers D = {0, 0+1, 0+1+1, 0+1+1+1, ...}. Complete the following proof:
| | 0 is less than or equal to 0 |
|
| | ∀ n, 0 is less than or equal to n ⇒ 0 is less than or equal to n+1 |
|
| | |
| | 0 is less than or equal to 0+1+1+1+1
|
|
-
Extra credit: Suppose the domain of discourse is the set of positive integers D = {0, 0+1, 0+1+1, 0+1+1+1, ...}. Complete the following proof:
| | ∀ k, k is less than or equal to k |
|
| | ∀ m, n, m is less than or equal to n ⇒ m is less than or equal to n+1 |
|
| | |
| | 0+1+1+1+1 is less than or equal to 0+1+1+1+1+1+1+1+1
|
|
[link] Trees, Graphs, Measurements, and Probabilities
Trees and graphs are abstract concepts that are ubiquitous in computer science and applied mathematics (including many peripheral scientific disciplines that involve applied mathematics, such as engineering, biology, chemistry, physics, neuroscience, and so on). Trees and graphs allow us to represent networks of relationships that may exist between entities (whether they are neurons, computers, people, locations, abstract concepts, or other entities). They can also be used to represent flows (of materials, products, people, data, and so on) or other processes (such as growth of plants, how diseases propagate, how resources are allocated within an economy, or how algorithms explore a space of possibities to find a solution). As a result of their general applicability, graphs and trees are commonly found as data structures within software applications.
In this section we will use the concepts we have defined in previous sections on logic and induction to formally define the concept of a tree and a graph. We will also use logical induction to define different ways to measure trees and graphs along dimensions of interest (such as size, height, and so on). We will use recurrence relations to recursively define measurements of trees and graphs, and we will learn how to solve recurrence relations to obtain closed-form functions that can be used to compute these measurements.
[link] Defining trees and their properties using logical induction
In this subsection, we demonstrate how logical induction can be used to define trees, and we show how these definitions can be converted into corresponding Python code.
Definition:
We can define a
binary tree using the following two induction axioms (one base case and one inductive step):
| | |
| | ∀ t1, t2, (t1 is a binary tree ∧ t2 is a binary tree) ⇒ t1 /•\ t2 is a binary tree
|
|
In the above, we omitted the usual underlined notation for predicates to improve legibility. Any tree that fits the above definition is called a binary tree because every
node • in the tree has exactly two
children, which are also known as
subtrees. The top node in the tree is called the
root node.
Example:
We can convert the definition of a binary tree into a Python function that checks whether a given Python value is a binary tree. Suppose we represent a tree node using a Python tuple ( ... )
and subtrees as nested tuples. Then the following function checks whether a value consisting of nested tuples is a binary tree:
def is_binary_tree(t):
if type(t) == tuple and len(t) == 0:
return True
elif type(t) == tuple and len(t) == 2:
return is_binary_tree(t[0]) and is_binary_tree(t[1])
else:
return False
Definition:
We can define a
perfect binary tree using the following two induction axioms (one base case and one inductive step):
| | • is a perfect binary tree |
|
| | ∀ t, t is a perfect binary tree ⇒ t /•\ t is a perfect binary tree
|
|
Notice that in a perfect binary tree, every node has two subtrees that are exactly the same.
Definition:
We can define a
degenerate binary tree using the following three induction axioms (one base case and one inductive step):
| | • is a degenerate binary tree |
|
| | ∀ t, t is a degenerate binary tree ⇒ t /•\ • is a degenerate binary tree |
|
| | ∀ t, t is a degenerate binary tree ⇒ • /•\ t is a degenerate binary tree
|
|
Notice that in a degenerate binary tree, every node has at most one subtree child and one leaf child.
[link] Measurements of trees and recurrence relations
Given a definition for a set of trees, we can use induction to define various ways to measure the trees in that set along certain dimensions, sugh as size, height, width, and so on.
Definition:
The following is the definition of the
size (i.e., number of nodes) of a binary tree:
| | |
| | ∀ t1, t2, n1, n2 (t1 has size n1 ∧ t2 has size n2) ⇒ t1 /•\ t2 has size n1 + n2 + 1
|
|
The following is the definition of the
height of a binary tree:
| | |
| | ∀ t1, t2, n1, n2 (t1 has height n1 ∧ t2 has height n2) ⇒ t1 /•\ t2 has height max(n1, n2) + 1
|
|
The following is the definition of the
width of a binary tree:
| | |
| | ∀ t1, t2, n1, n2 (t1 has width n1 ∧ t2 has width n2) ⇒ t1 /•\ t2 has width n1 + n2
|
|
Definition:
The following is the definition of the
size (i.e., number of nodes) of a perfect binary tree:
| | |
| | ∀ t, n t has size n ⇒ t /•\ t has size n + n + 1
|
|
The following is the definition of the
height of a perfect binary tree:
| | |
| | ∀ t, n t has height n ⇒ t /•\ t has height n + 1
|
|
The following is the definition of the
width of a perfect binary tree:
| | |
| | ∀ t, n t has width n ⇒ t /•\ t has width n + n
|
|
Definition:
The following is a recursive definition (i.e., a recurrence relation) of the relationship between the height of a tree and its size:
| = | |
| = | height_to_size(n-1) + height_to_size(n-1) + 1
|
|
The above can be simplified as follows:
| = | |
| = | 2 ⋅ height_to_size(n-1) + 1
|
|
Example:
We can implement a Python function that computes the size of a binary tree (i.e, the number of nodes):
def size(t):
if type(t) == tuple and len(t) == 0:
return 1
elif type(t) == tuple and len(t) == 2:
return size(t[0]) + size(t[1]) + 1
Example:
We can implement a Python function that computes the height of a binary tree (the max()
operator returns the larger of its two integer arguments):
def height(t):
if type(t) == tuple and len(t) == 0:
return 1
elif type(t) == tuple and len(t) == 2:
return max(height(t[0]), height(t[1])) + 1
Example:
We can implement a Python function that computes the size of a tree given its height:
def height_to_size(n):
if n == 1:
return 1
elif n > 1:
return 2 * height_to_size(n-1) + 1
[link] Solving recurrence relations for closed-form functions
Example:
How can we find a closed-form function that defines the size of a degenerate binary tree in terms of its height? Recall that the relationship between height and size can be defined using the following recurrence relation:
Thus, we are looking for a non-recursive definition of a function
f that satisfies the above two equations for all positive integers
n. One such
f is:
We can confirm this by substituting the above
f into the recurrence relation:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Example:
How can we find a closed-form function that defines the size of a perfect binary tree in terms of its height? Recall that the relationship between height and size can be defined using the following recurrence relation:
Thus, we are looking for a non-recursive definition of a function
f that satisfies the above two equations for all positive integers
n. One such
f is:
We can confirm this by substituting the above
f into the recurrence relation:
Example:
How can we find a closed-form function that defines the size of a perfect ternary tree in terms of its height? For perfect ternary trees, the relationship between height and size can be defined using the following recurrence relation:
Thus, we are looking for a non-recursive definition of a function
f that satisfies the above two equations for all positive integers
n. One such
f is:
We can confirm this by substituting the above
f into the recurrence relation:
| = | |
| = | |
| = | |
| = | |
| = | |
| = | 3 ⋅ ((1/2) ⋅ (3n-1 − 1)) + 1 |
|
| = | |
|
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
| = | cx − 1 + k ⋅ (1/(c − 1)) ⋅ (cx - 1 − 1)
|
|
Recall that logb(x/y) = logb(x) − logb(y), and that logb(1) = 0 because b0 = 1.
Example:
How can we find a closed-form function that defines the height of a perfect binary tree in terms of its width? The relationship between width and height can be defined using the following recurrence relation:
Thus, we are looking for a non-recursive definition of a function
f that satisfies the above two equations for all positive integers
n. The
f that satisfies this equation is:
We can confirm this by substituting the above
f into the recurrence relation:
| = | |
| = | |
| = | |
| = | |
| = | |
| = | (log2(n) − log2(2) + 1) + 1 |
|
| = | |
Example:
How can we find a closed-form function that defines the height of a perfect ternary tree in terms of its width? The relationship between width and height can be defined using the following recurrence relation:
Thus, we are looking for a non-recursive definition of a function
f that satisfies the above two equations for all positive integers
n. The
f that satisfies this equation is:
We can confirm this by substituting the above
f into the recurrence relation:
| = | |
| = | |
| = | |
| = | |
| = | |
| = | (log3(n) − log3(3) + 1) + 1 |
|
| = | |
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
| = | (1/(a − 1)) ⋅ (a1 + logc x − 1)
|
|
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is the factorial function:
The number of edges that can exist in a graph with a certain number of nodes can be determined by defining a recurrence relation, as well.
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
| = | |
| = | f(n − 1) + 2 ⋅ (n − 1) + 1
|
|
The the closed-form function that satisfies the two equations above is:
Fact:
Suppose we have a recurrence relation of the following form:
The the closed-form function that satisfies the two equations above is:
| = | a ⋅ ((((x + 1) ⋅ x)/2) − 1) + (b ⋅ (x − 1)) + c
|
|
[link] Summations and recurrence relations
The mathematical notation for summation of terms parameterized by an integer index is a concise way to describe a mathematical term. It represents an algorithm (effectively, a loop) that can be run to obtain a result. Thus, it can be viewed as an alternative, iterative version of some recurrence relations. Any function defined using a summation over a finite range of integers can be converted into a recurrence relation.
Fact:
Suppose we have some function
g over the integers, and we also have a function
f defined in terms of
g in the following way:
Then we can convert the above into the following recurrence relation:
Fact:
We have the following identity for any constant integer
c:
Fact:
We have the following identity:
Fact:
We have the following identity for any constant integer
c:
We can see this is true by multiplying the sum 1 +
c +
c2 +
c3 + ... +
cx by (
c − 1):
(c − 1) ⋅ (1 + c + c2 + c3 + ... + cx) | |
| = | (c + c2 + c3 + ... + cx + cx+1) − (1 + c + c2 + c3 + ... + cx) |
|
| = | |
Example:
In the Python programming language, it is possible to write an expression that corresponds to mathematical summation notation. For example, suppose we have the following summation:
The above summation corresponds to the following Python expression:
sum(c**i for i in range(1,x+1))
For a particular instantiated example, suppose we have the following:
The above summation corresponds to the following Python expression:
>>> sum(2**i for i in range(1,11))
2046
>>> 2**1 + 2**2 + 2**3 + 2**4 + 2**5 + 2**6 + 2**7 + 2**8 + 2**9 + 2**10
2046
[link] Assignment #5: Trees, Graphs, and Recurrence Relations
In this assignment you will work with recurrence relations that describe measurements of trees and graphs.
You may import the following library function in your module:
Your solution may not import any other modules or employ any external library functions. The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
Implement a Python function
check(f, g)
that takes two functions f
and g
as its two arguments. The function check()
should return True
if both f
and g
return exactly the same output on all integer inputs from 1
to 100
(inclusive). Otherwise, it should return False
. You may use the check()
function to check the correctness of your solutions to for the remaining problems. -
For each of the following recurrence relations, define a closed-form function using only variables, integers,
+
, -
, *
, //
(integer division), **
(exponentiation), and log()
(logarithm). In your Python file, define the functions twoA(x)
, twoB(x)
, and so on. For example, if the solution to part (h) were the closed-form function f(x) = x2, you would include in your Python file the following definition:
In the above manner, solve the following problem parts.-
The following recurrence relation:
-
The following recurrence relation:
| = | |
| = | f(n − 1) + (6 ⋅ n) − (3 ⋅ n)
|
|
-
The following recurrence relation:
-
The following recurrence relation:
-
The following recurrence relation:
-
The following recurrence relation:
-
The following recurrence relation:
-
For each of the following summations or descriptions, define a recurrence relation as a Python function. In your Python file, define the functions
threeA(n)
, threeB(n)
, and so on. For example, if the solution to part (h) were the recurrence relation f(1) = 1, f(n) = f(n-1) + 2, you would include in your Python file the following definition:
def threeH(n):
if n == 1:
return 1
elif n > 1:
return threeH(n-1) + 2
In the above manner, solve the following problem parts.-
The following summation:
-
The following summation:
-
The following summation:
-
Suppose we have the following relationship between the height and size of a perfect 6-ary (or "senary") tree in which each non-leaf node has six children:
-
a perfect senary tree of height 1 has 1 node;
-
a perfect senary tree of height n has six times as many nodes as a perfect senary tree of height n − 1, plus one more root node.
Define the recurrence relation that captures that size of a senary tree in terms of its height n. -
Suppose a username must start with the
@
character, but can contain some sequence of any of the 26 letters of the alphabet after that. Thus, @abc
would be a username of length four. Define a recurrence relation that specifies how many possible usernames of length n are possible. Keep in mind your solution must be a recurrence relation, not a closed form function. -
Suppose we have the following relationship between the height and width of a perfect 3-ary (or "ternary") tree in which each non-leaf node has three children:
-
a perfect ternary tree of height 1 has width 1;
-
a perfect ternary tree of height n has three times the width of a perfect ternary tree of height n − 1.
Define the recurrence relation that captures that width of a perfect ternary tree in terms of its height n. -
Define a recurrence relation that specifies the size (i.e., number of nodes) of a perfect ternary tree in terms of its width (that is, for an input tree width n, the value f(n) should be the size of the tree having that width).
-
For parts (a) to (g) of Problem #3 above, determine the closed-form function that solves the recurrence relation you defined. Specify the function definitions in your Python module under the names
fourA(x)
, fourB(x)
, fourC(x)
, and so on.
[link] Permutations with repetition, permutations, and combinations
In this subsection we review a few common counting problems, and present algorithms that can be used to solve them.
Definition (permutation with repetition):
A permutation of n distinct objects is a particular ordering of these objects. Given n distinct objects (e.g., characters or letters), if we are allowed to use any number of copies of each of those n objects, then the number of distinct sequences of length k consisting of these n objects that can be built is nk. This is known as the number of permutations with repetition.
Example:
Suppose we are given an alphabet consisting of the characters a, b, and c. How many passwords of length 5 can we make if we are allowed to use any number of copies of each character?
There are 5 possible positions in the password, and each position can be filled with one of three distinct characters. Thus, the total number of passwords is:
Example:
Suppose we are given an alphabet consisting of the characters a, b, and c. How many passwords of length at least 1 and at most 5 can we make if we are allowed to use any number of copies of each character?
We can take the sum of the number of 1-character passwords, the number of 2-character passwords, and so on:
Using a
known fact about summations, we can find a closed form expression for the above:
| = | |
| = | ((1/(3 − 1)) ⋅ (36 − 1)) − 1 |
|
| = | |
| = | |
Thus, we can use three distinct characters to make 363 distinct passwords between length 1 and length 5, inclusive. We can confirm this is true:
Definition (permutation):
A
permutation of
n distinct objects is a particular ordering of these objects. There are
n! (read as "
n factorial") permutations of a collection of
n distinct objects, where:
| = | n ⋅ (n − 1) ⋅ (n − 2) ⋅ (n − 3) ⋅ ... ⋅ 3 ⋅ 2 ⋅ 1
|
|
Example:
Suppose we are given an alphabet consisting of the characters a, b, c, and d. How many passwords of length 4 can we make if each character must appear exactly once in the password?
We are in a situation where we must list all possible combinations, and we cannot reuse any character. Thus, if we have four possible character positions in our password, then the first position has one of four possible characters (that is,
a,
b,
c, or
d), and the next position has one of three characters (since we have used one up), and so on. Thus, the total number of passwords that we can build while satisfying this constraint is:
Example:
Suppose we are given an alphabet consisting of 26 characters. How many passwords of length 5 can we make if each character must appear at most once in the password?
We are in a situation where we must list all possible combinations, and we cannot reuse any character, but we are limited to 5 character positions. Thus, if we have 26 possible character positions in our password, then the first position has one of 26 possible characters, and the next position has one of 25 characters (since we have used one up), and so on. Thus, the total number of passwords that we can build while satisfying this constraint is:
| = | | | 26 ⋅ 25 ⋅ 24 ⋅ 23 ⋅ 22 ⋅ 21 ⋅ ... ⋅ 2 ⋅ 1 | 21 ⋅ ... ⋅ 2 ⋅ 1 |
| |
| | |
| = | |
Definition (combination):
A
combination of
k distinct objects from a collection of
n distinct objects is a particular subset of
k objects (such that order does not matter). The number of of combinations of
k objects from a collection of
n objects is denoted and defined as:
The above notation is read in English as "
n choose
k".
Example:
Given a set of objects {A, B, C, D}, how many different subsets of size 2 are there?
Here are all ordered sequences of two elements without repetition:
AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC
|
|
The number of ordered sequences of the above form is:
Notice that we do not care about the order, so
AB and
BA should only count once. Thus, we divide the above by 2! to account for this (the number of ways to permute two elements) and get the distinct subsets:
{A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}
|
|
The number of subsets is then:
| = | |
| = | | | 4 ⋅ 3 ⋅ 2 ⋅ 1 | (2 ⋅ 1) ⋅ (2 ⋅ 1) |
| |
| |
|
| = | |
Example:
Suppose we have a group of n people. How many different subgroups of k people can form? How many subgroups of people (including a group containing no people) form?
We know that the number of subgroups of size
k is:
We also know that the number of subgroups is just the number of subsets of a set of size
n, which is 2
n. This also tells us that the following identity must hold:
Example:
Suppose we have a graph with n nodes. How many different edges can be drawn between any two nodes in the graph if we are only allowed to draw one edge between each pair of nodes?
Since we can only draw one edge between each pair of nodes, we need to count the number of ways in which we can choose two nodes from a graph of
n nodes:
| = | |
| = | | | n ⋅ (n − 1) ⋅ (n − 2) ⋅ ... ⋅ 1 | (n − 2)! ⋅ 2! |
| |
| |
|
| = | |
| = | |
Notice that the above corresponds to a
recurrence relation solution and a
summation identity we saw earlier.
Example:
Suppose we have a collection of characters with some duplicate characters: a, a, a, b, b, c, d. If we do not distinguish between copies of the same character, how many distinct permutations (without repetition) are there of these 7 characters?
We know there are 7! permutations if we do distinguish between tha duplicate copies of
a and
b that we have. However, the total of 7! overcounts the actual number of distinct permutations. For example (if we relabel the
b copies as
b1 and
b2 for clarity), the following two sequences are counted as two distinct permutations:
But the above is true for every possible sequence. Thus, we are counting 2! as many sequences (since there are 2! ways to arrange
b1 and
b2) as we want to count because we do not distinguish between the two copies of
b. Likewise for the three copies of
a, we are counting 3! as many sequences. Thus, we can divide these factors out of our overall computation:
| = | | | 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1 | (3 ⋅ 2 ⋅ 1) ⋅ (2 ⋅ 1) |
| |
| | |
| = | | = | |
Example:
Suppose we want to implement a Python function groups(n, k, m)
that takes three integers n, k, and m, and computes how many groups of people can be chosen from n people, where the size of the group is between k and m (inclusive).
Since a group of people always has some specific size (i.e., the number of people in the group), we know that we can simply add the counts for each possible size of group (e.g., the number of groups of size three plus the number of groups of size four, and so on). Thus, we can use the sum(...)
function to implement our function as follows:
from math import factorial
def cmb(n, k):
return factorial(n) // (factorial(n-k) * factorial(k))
def groups(n, k, m):
return sum(cmb(n, i) for i in range(k, m+1))
[link] Generating functions
Generating functions are polynomials used for counting particular kinds of combinations. In order to solve a particular counting problem, we choose an appropriate collection of generating functions and multiply them so that the desired solution is a particular coefficient in the resulting polynomial.
Definition:
Consider any polynomial of the following form for some particular collection of coefficients
a0,
a1,
a2, ...:
a0 ⋅ x0 + a1 ⋅ x1 + ... + an ⋅ xn
|
|
We can also consider a polynomial that is an infinite sum for some particular collection of coefficients
a0,
a1,
a2, ...:
a0 ⋅ x0 + a1 ⋅ x1 + ... + an ⋅ xn + ...
|
|
Any such polynomial is called a
generating function.
Example:
The following is a simple generating function in which all coefficients are 1:
Example:
Suppose you must are allowed to choose any three coins from an unlimited collection of quarters, nickels, and dimes. In how many different ways can you choose coins so that their total value is 40 cents?
The following generating function captures the values that a given coin can have:
We are allowed to choose three coins, which means the coefficient of
x50 in the following product of generating functions will represent our solution:
We can compute the above explicitly:
| = | (x10 + 2x15 + x20 + 2x30 + 2x35 + x50) ⋅ (x5 + x10 + x25) |
|
| = | x15 + 3x20 + 3x25
+ x30 + 3x35 + 6x40 + 3x45 + 3x55 + 3x60
+ x75
|
|
Thus, there are 6 ways in which we can choose three coins from this collection so that their total value is 40.
Example:
Suppose you throw two 6-sided dice (with labels 1, 2, 3, 4, 5, and 6). In how many different ways can the total from the dice be 9? If all possible rolls are equally likely (i.e., both dice are fair), what is the probability that the total is 9?
We can use the following generating function to represent each of the individual dice:
x + x2 + x3 + x4 + x5 + x6
|
|
Then, the coefficient of the
x9 term in the following product would be our solution:
(x + x2 + x3 + x4 + x5 + x6)2
|
|
We can compute the above product explicitly:
| = | (x + x2 + x3 + x4 + x5 + x6) ⋅ (x + x2 + x3 + x4 + x5 + x6) |
|
| = | x2 + 2x3 + 3x4 + 4x5 + 5x6 + 6x7 + 5x8 + 4x9 + 3x10 + 2x11 + x12
|
|
Thus, there are 4 different ways in which the resulting throw can add up to 9.
The total number of possible rolls is permutation with repetition of length two using the six numbers in the range from 1 to 6. Thus, there are 62 = 36 distinct rolls of the dice. Out of these, 4 can add up to 9, so the probability of getting a 9 is 4/36 = 1/9.
Fact (binomial theorem):
The following identity holds:
| = | ( | | ) ⋅ x0 + ( | | ) ⋅ x1 + ( | | ) ⋅ x2 + ... + ( | | ) ⋅ xn |
|
| = | |
Example:
Suppose we want to count the number of ways in which we can choose any combination of two items from a set of three items.
Each item from the collection of three items can either be in the chosen combination (i.e., 1) or not in the chosen combination (i.e., 0). There are three such items. Thus, one way to restate the question is to say the following: suppose we must set three switches to either 0 or 1:
- switch A: 0 or 1;
- switch B: 0 or 1;
- switch C: 0 or 1.
In how many different ways can we set the switches so that their total adds up to 2? We can model the possible switch settings using exponents in a generating function:
x0 corresponds to 0, and
x1 corresponds to 1. Then, note that
x0 = 1 and
x1 =
x and consider the following product of generating functions:
(1 + x) ⋅ (1 + x) ⋅ (1 + x)
|
|
If we multiply the above out, we get:
1 ⋅ 1 ⋅ 1
+ 1 ⋅ 1 ⋅ x
+ 1 ⋅ x ⋅ 1
+ 1 ⋅ x ⋅ x
+ x ⋅ 1 ⋅ 1
+ x ⋅ 1 ⋅ x
+ x ⋅ x ⋅ 1
+ x ⋅ x ⋅ x
|
|
Rewriting the above using exponents, we get:
x0 + 0 + 0
+ x0 + 0 + 1
+ x0 + 1 + 0
+ x0 + 1 + 1
+ x1 + 0 + 0
+ x1 + 0 + 1
+ x1 + 1 + 0
+ x1 + 1 + 1
|
|
If we collect up
xk terms with the same powers (e.g.,
x2 +
x2 +
x2 = 3
x2), we get:
Notice that the fact that 3 is the coefficient of
x2 means that there are three ways in which we can set the switches (i.e., exponents) so that they add up to 2. Thus, the coefficient of
x2 is exactly the solution we wanted to our counting problem, which in this case is:
Fact:
The following identity holds:
We can see this is true by multiplying the infinite sum 1 +
x +
x2 +
x3 + ... by (1 −
x):
(1 − x) ⋅ (1 + x + x2 + x3 + ...) | |
| = | (1 + x + x2 + x3 + ...) − x ⋅ (1 + x + x2 + x3 + ...) |
|
| = | (1 + x + x2 + x3 + ...) − (x + x2 + x3 + ...) |
|
| = | |
Notice that this is just a variant of the
summation identity and
recurrence relation solution we saw earlier.
Fact (negative binomial theorem):
The following identity holds:
Example:
Consider the following equation involving three non-negative integer variables a, b, and c:
For how many different assignments of non-negative integers to the variables a, b, and c is the above equation true?
Suppose we model the possible values of
a using the exponents of the generating function 1 +
x +
x2 +
x3 + ..., and we do the same for
b and
c. Then this counting problem can be solved by finding the coefficient of the
x120 term in the result of the following product of generating functions:
Using
an identity, we can convert the above into:
Using the
negative binomial theorem, we then know that the coefficient of the
x120 term is:
Thus, there are 7381 different ways to assign non-negative integers to
a,
b, and
c so that
a +
b +
c = 120.
Example:
Suppose you throw three 100-sided dice (with labels 0, 1, ..., 99). In how many different ways can the total from the dice be 150? If every outcome is equally likely (i.e., the dice are fair), what is the probability that the total is 150?
We can use the following generating function to represent each of the individual dice:
Writing out the above in its entirety is not practical, but we can use
a known identity to rewrite the above more concisely:
If we want to count the number of ways in which the three dice can have a total of 150, we need to find the coefficient of
x150 in the following product of generating functions:
We can expand the first term:
| = | (1 − 3x100 + 3x200 − x300) ⋅ | | 3
|
|
We see that the only possible way to obtain
x150 terms in the above is either when the 1 term or the -3
x100 term is multiplied by an appropriate term in the right-hand side of the above product. Thus, we have the following coefficient:
1 ⋅ (coefficient of x150 in | | 3) − 3 ⋅ (coefficient of x50 in | | 3)
|
|
Using the
negative binomial theorem to fill in the above, we get:
The total number of possible rolls is permutation with repetition of length two using the six numbers in the range from 0 to 99. Thus, there are 1003 = 1,000,000 distinct rolls of the dice. Out of these, 7498 can add up to 150, so the probability of getting a 150 is 7498/1,000,000 ≈ 0.007.
In the previous two examples, we used an infinite generating function in one case, but a finite generating function in the other case. Why did we make this distinction?
Fact:
Suppose we have the following two generating functions (one is infinite, the other is finite and has a largest exponent of k):
| = | |
| = | 1 + x + x2 + x3 + ... + xk
|
|
Suppose we take some exponent n of each of the two functions (that is, we compute fn and gn). Notice that for exponents less than or equal to k, the coefficients for the two functions will actually be exactly the same. However, for any exponent greater than k, the coefficients will be different (the coefficients in fn will be larger). As an example, consider the following:
Notice that the coefficients diverge when the exponent is greater than the largest exponents of the finite generating function.
Thus, in situations in which we must model some choice of a number of integer values, we must ask: is the exponent corresponding to the coefficient we need greater than the largest possible integer we can choose? If it is, then we must use a finite generating function. If it is not, we can take a shortcut and just use the infinite generating function.
Example:
Suppose we have a tree in which each non-leaf can have either two or three branches. Given this constraint, how many possible trees of height 3 have width 6? We know that there are at least two such trees: (1) a root node with three branches, with a two-branch tree at the end of each branch, and (2) a root node with two branches, with a three-branch tree at the end of each branch. Is there a more systematic way for us to count the number of trees with a certain width?
We can use products of generating functions to keep track of how many trees there are of each width. For example, a tree of height 2 can have either width 2 or width 3, and we can represent this using the following generating function:
The above indicates that there is one tree having width 2 (the coefficient 1 of the term
x2), and one tree having width 3 (the coefficient of the term
x3). The number of trees having width 1 is represented using the following generating function:
Suppose we have a generating function
g that represents the number of trees of each width of some height
n. How can we build a generating function that represents the number of trees of each width having height
n+1? The following generating function will accomplish this:
In other words, we can introduce a new root node with two subtrees, or a new root node with three subtrees. Suppose we start with
x, which represents the number of trees having width 1:
Then, using the formula
g2 +
g3, we obtain:
If we set
g =
x2 +
x3 and apply the same calculation again, we get:
| = | x4 + 2x5 + 2x6 + 3x7 + 3x8 + x9
|
|
Thus, we see that there are indeed only 2 trees of height 3 having width 6.
def treesOfHeight(n):
g = x
for i in range(1, n):
g = g**2 + g**3
return g
To determine the number of trees of a certain height having a certain width, we can access the appropriate coefficient of the resulting generating function:
>>> treesOfHeight(3)
1*x**4 + 2*x**5 + 2*x**6 + 3*x**7 + 3*x**8 + 1*x**9
>>> g = treesOfHeight(3)
>>> g[6]
2
[link] Counting by measurements, and computing probability of measurements
In the previous subsections we have seen how to count the number of possible ways that objects from a collection can be selected and assembled into a sequence or new collection, and how generating functions can be used to count the number of possible ways that integer quantities can be combined to yield a particular total. All of these techniques are tools for counting the number of situations (similar to system states) that have a certain measurement value along certain dimensions. In this section, we will use logic to explicitly model the problem of counting only those situations that have a certain measurement. We will then use this framework to define a notion of the probability of a given situation occurring (i.e., being true), and we will show how we can use the counting techniques we have presented in the previous subsections to compute such probabilities.
Example:
Suppose we have a domain of discourse that consists of two sets: the objects {elephant, hospital, car, book, mouse, library, hammer} and the sizes {small, medium, large}. We also have the predicate ... is of ... size, and the following are all predicate instantiations (i.e., variables) that map to true in our model (in the technical sense we have seen before) of the world, while all other instantiations map to false:
elephant is of medium size |
|
hospital is of large size |
|
|
|
|
|
|
Suppose we wish to count the number of objects in {elephant, hospital, car, book, mouse, library, hammer} that have each of the possible sizes {small, medium, large}. We could do so as follows:
|{ x | x is of small size }| | |
| = | | = | |
|{ x | x is of medium size }| | |
| = | | = | |
|{ x | x is of large size }| | |
| = | | = | |
Suppose we want to count for each possible size the fraction of all the objects that have that size. We can do so as follows:
| | |{ x | x is of small size }| | |{elephant, hospital, car, book, mouse, library, hammer}| |
| |
| | |
| = | |
| | |{ x | x is of medium size }| | |{elephant, hospital, car, book, mouse, library, hammer}| |
| |
| | |
| = | |
| | |{ x | x is of large size }| | |{elephant, hospital, car, book, mouse, library, hammer}| |
| |
| | |
| = | |
Definition:
We use the following notation to denote that the probability that a formula f is true is p, where p is a real number in the range 0 ≤ p ≤ 1:
Suppose f is a formula that contains a variable x (such as "¬(x) ∨ ⊤" or "x is green"). Assuming that each possible assignment of a constant or value from a set S to the variable x is equally likely, we use the following notation to denote that when we choose an element of S and assign it to x, the probability of our choosing an replacement for x that makes the formula f true is p:
Example:
Suppose we have a domain of discourse that consists of three sets: the objects O = {elephant, whale, car, book, mouse, library}, the sizes {small, medium, large}, and the descriptions {animate, inanimate}. We also have the predicates ... is of ... size and ... is ..., and the following are all predicate instantiations (i.e., variables) that map to true in our model (in the technical sense we have seen before) of the world, while all other instantiations map to false:
elephant is of medium size | |
| | |
| | |
| | |
| | |
| | |
| | |
The two dimensions along which we can measure objects are size and animacy. These dimensions are independent, so we can represent them using orthogonal axes (i.e., rows and columns), as in the diagram below:
|
|
animate |
inanimate |
|
|
{elephant, mouse, whale} 3 / 6 |
{book, car, library} 3 / 6 |
small |
{book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium |
{elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large |
{library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Assuming that the chances of choosing each of the objects in O are equally likely, we can say the following:
Prx ∈ O [ x is of small size ] = | | |
|
Prx ∈ O [ x is of medium size ] = | | |
|
Prx ∈ O [ x is of large size ] = | | |
|
Prx ∈ O [ x is animate ] = | | |
|
Prx ∈ O [ x is inanimate ] = | |
|
|
Definition:
Suppose we know that for some logical formulas f and g, the following is true:
The probabilities that f and g are true are independent if the following holds:
- if we restrict ourselves to the cases where f is true, it is still the case that Pr[ g ] = q;
- if we restrict ourselves to the cases where f is false, it is still the case that Pr[ g ] = q;
- if we restrict ourselves to the cases where g is true, it is still the case that Pr[ f ] = p;
- if we restrict ourselves to the cases where g is false, it is still the case that Pr[ f ] = p.
Definition:
Two logical formulas f and g are mutually exclusive if the following holds:
- if we restrict ourselves to the cases where f is true, it is still the case that g is false;
- if we restrict ourselves to the cases where g is true, it is still the case that f is false.
Example:
Let the domain of discourse be O = {elephant, whale, car, book, mouse, library}. Are the probabilities for the dimensions in the following table independent?
|
animate {elephant, mouse, whale} |
inanimate {book, car, library} |
small {book, mouse} |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {car, elephant} |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {whale, library} |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Yes. If we restrict ourselves to any measurement along the size dimension, the probability of an element being animate is 1/2, which is also the overall probability (across all sizes) of an object being animate. Likewise, if we restrict ourselves to a particular animacy column, the probability that an object is of a particular size (e.g., small) is always 1/3, which is also the overall probability of an object being that size.
Example:
Are the probabilities for the dimensions in the following table independent?
|
can fly {flying squirrel, bat, robin} |
cannot fly {wolf, bear} |
has fur {flying squirrel, bat, wolf, bear} |
{bat, flying squirrel} 2 / 5 |
{wolf, bear} 2 / 5 |
has no fur {robin} |
{robin} 1 / 5 |
∅ 0 / 5 |
No, they are not independent because the overall probability that an element can fly is 3/5, while the probability that an element can fly if it has no fur is 1 and the probability that an element can fly if it does have fur is 1/2.
Example:
Are the probabilities for the dimensions in the following table independent?
|
can fly {flying squirrel, bat, robin, crow} |
cannot fly {bear, crocodile} |
has fur {flying squirrel, bat, bear} |
{bat, flying squirrel} 2 / 6 |
{bear} 1 / 6 |
has no fur {robin, crow, crocodile} |
{robin, crow} 2 / 6 |
{crocodile} 1 / 6 |
Yes, the overall probability that an element has fur is 1/2, which is also the probability that an animal that can fly has fur, and the the probability that an animal that cannot fly has fur. Likewise, the overall probability that an animal can fly is 2/3, which is also the probability that an animal with fur can fly, and the probability that an animal without fur can fly.
Fact:
Suppose we know that for some logical formulas f and g, the following is true:
If the probabilities for f and g are independent, then we have that:
Fact:
Suppose we know that for some logical formulas f, the following is true:
Then we have that:
As long as dimensions are independent and different measurements are mutually exclusive, the facts above are sufficient to determine the probability of any logical formula because the logical operators ¬ and ∧ are functionally complete (i.e., they can be used to represent all other logical operations).
Fact:
Suppose we know that for some logical formulas f and g, the following is true:
Then if f and g are mutually exclusive (i.e., can never both be true at the same time), we have:
One common situation in which this arises is if the two formulas represent different measurements of an object along the same dimension.
Fact:
Suppose we know that for some logical formulas f and g, the following is true:
If the probabilities that f and g are true are independent, then we have that:
Notice that 1 − (1 − p) ⋅ (1 − q) can be derived as the probability of the formula ¬(¬(f) ∧ ¬(g)); the latter formula is equivalent to f ∨ g via De Morgan's Law.
Example:
Suppose we have a domain of discourse that consists of three sets: the objects {elephant, whale, car, book, mouse, library}, the sizes {small, medium, large}, and the descriptions {animate, inanimate}. We also have the predicates ... is of ... size and ... is ..., and the axioms as exactly like those in the related example above.
|
|
animate |
inanimate |
|
|
{elephant, mouse, whale} 3 / 6 |
{book, car, library} 3 / 6 |
small |
{book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium |
{elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large |
{library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
-
What is the probability that a replacement for x chosen uniformly at random from O = {elephant, whale, car, book, mouse, library} is small?
The following is a visual representation of the set in question:
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
We can compute the probability as follows:
Prx ∈ O [ x is of small size ]
| |
| = | | | |{ x | x is of small size }| | |{elephant, whale, car, book, mouse, library}| |
| |
| |
|
| = | | | |{book, mouse}| | |{elephant, whale, car, book, mouse, library}| |
| |
| |
|
| = | |
|
-
What is the probability that a replacement for x chosen uniformly at random from O = {elephant, whale, car, book, mouse, library} is medium in size and animate?
The following is a visual representation of the possibilities specified by the formula:
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Since
medium and
animate are measurements along two independent measurement dimensions (i.e., size and animacy), any
x we choose must have exactly one measurement along each dimension. Thus, we must multiply the probability that
x is medium in size by the probability that
x is animate:
Prx ∈ O [ x is of medium size ∧ x is animate ]
| |
| = | Prx ∈ O [ x is of medium size ] ⋅ Prx ∈ O [ x is animate ] |
|
| = | |
| = | |
| = | |
-
What is the probability that a replacement for x chosen uniformly at random from O = {elephant, whale, car, book, mouse, library} is small or large in size?
The following is a visual representation of the possibilities specified by the formula:
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Since
small and
large are both mutually exclusive measurements of
x along the
same measurement dimension (i.e., size),
x can never have both measurements at the same time. Thus, we can add the probabilities that
x is of each of these two sizes:
Prx ∈ O [ x is of small size ∨ x is of large size ]
| |
| = | Prx ∈ O [ x is of small size ] + Prx ∈ O [ x is of large size ] |
|
| = | |
| = | |
| = | |
-
What is the probability that a replacement for x chosen uniformly at random from O = {elephant, whale, car, book, mouse, library} is small or medium in size, and is inanimate?
The following is a visual representation of the possibilities specified by the formula:
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Since
small and
medium are both possible measurements of
x along the
same measurement dimension (i.e., size),
x can never have both measurements at the same time. Thus, we can add the probabilities that
x is of each of these two sizes. However, if we also want to consider its animacy, we need to multiply the probability that its size is small or medium by the probability that it is inanimate.
Prx ∈ O [ (x is of small size ∨ x is of medium size) ∧ x is inanimate ]
|
|
Note the parentheses in the formula above. These are important, as they determine the order of operations when we compute the probability:
| = | (Prx ∈ O [ x is of small size ] + Prx ∈ O [ x is of medium size ]) ⋅ Prx ∈ O [ x is inanimate ] |
|
| = | |
| = | |
| = | |
| = | |
The above is equivalent to the following (by the distributive property of ∧ across ∨, and the corresponding distributive property of multiplication across addition):
Prx ∈ O [ (x is of small size ∧ x is inanimate) ∨ (x is of medium size ∧ x is inanimate) ]
|
|
We can compute the corresponding probability:
| = |
(Prx ∈ O [ x is of small size ] ⋅ Prx ∈ O [ x is inanimate ])
+ (Prx ∈ O [ x is of medium size ] ⋅ Prx ∈ O [ x is inanimate ])
|
|
| = | |
| = | |
| = | |
| = | |
-
What is the probability that a replacement for x chosen uniformly at random from O = {elephant, whale, car, book, mouse, library} is medium in size or animate?
The following is a visual representation of the possibilities specified by the formula. Notice that we must be careful not to "double count" the intersection of the regions:
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
Since
medium and
animate are independent, we can add the probabilities, but we must subtract the probability of the intersection occurring out of the total.
Prx ∈ O [ x is of medium size ∨ x is inanimate ]
|
|
We then compute the probability:
| = |
Prx ∈ O [ x is of medium size ]
+ Prx ∈ O [ x is animate ]
− (Prx ∈ O [ x is of medium size ] ⋅ Prx ∈ O [ x is animate ]) |
|
| = | |
| = | |
| = | |
| = | |
By De Morgan's Law, the above is equivalent to the following:
Prx ∈ O [ ¬(¬(x is of medium size) ∧ ¬(x is inanimate)) ]
|
|
The visual representation can be adjusted to reflect the above (in the diagram below, we show the intersection of the complements; this is then subtract from the overall set or from the probability of 1 to obtain the desired subset):
|
animate {elephant, mouse, whale} 3 / 6 |
inanimate {book, car, library} 3 / 6 |
small {book, mouse} 2 / 6 |
{mouse} 6 / 36 = 1 / 6 |
{book} 6 / 36 = 1 / 6 |
medium {elephant, car} 2 / 6 |
{elephant} 6 / 36 = 1 / 6 |
{car} 6 / 36 = 1 / 6 |
large {library, whale} 2 / 6 |
{whale} 6 / 36 = 1 / 6 |
{library} 6 / 36 = 1 / 6 |
We can compute the corresponding probability:
| = |
1 − ((1 − Prx ∈ O [ x is of medium size ])
⋅ (1 − Prx ∈ O [ x is animate ])) |
|
| = | |
| = | |
| = | |
| = | |
| = | |
Example:
Suppose we have the following assumptions about the probabilities that two boolean variables are true:
If we count the total number of models involving these two boolean variables, we see that each individual model is 1/4 of the total number of models:
x |
y |
|
⊤ | ⊤ | 1/4 |
⊤ | ⊥ | 1/4 |
⊥ | ⊤ | 1/4 |
⊥ | ⊥ | 1/4 |
Since x and y are independent dimensions of measurement (each variable can independently either have one of the two distinct "measurements", or values, ⊤ or ⊥), we can compute the probability of the particular model {x ↦ ⊤, y ↦ ⊤} being selected as:
However, we cannot directly compute the following because the values of x and y are not distinct measurements along the same dimension:
Instead, we need to use algebraic laws of logical operators (in particular, De Morgan's Law) to convert the formula above into one that uses only the negation and conjunction operators (¬ and ∧):
| = | Pr[ ¬(¬(x ≡ ⊤) ∧ ¬(y ≡ ⊤)) ]
|
|
Once we have the formula above, we can apply the known algebraic facts for computing probabilities of formulas:
We can now combine the above:
Pr[ ¬(x ≡ ⊤) ∧ ¬(y ≡ ⊤) ] | |
| = | Pr[ ¬(x ≡ ⊤) ] ⋅ Pr[ ¬(y ≡ ⊤) ] |
|
| = | |
| = | |
Finally, we have:
Pr[ ¬(¬(x ≡ ⊤) ∧ ¬(y ≡ ⊤)) ] | |
| = | 1 − Pr[ ¬(x ≡ ⊤) ∧ ¬(y ≡ ⊤) ] |
|
| = | |
| = | |
Thus, we have computed:
Example:
Suppose we have the following assumptions about the probabilities that two boolean variables are true (note that Pr[ x ] and Pr[ x ≡ ⊤ ] have the same meaning):
We want to compute the probability that the following formula is true:
We can do so by first converting the formula using algebraic laws so that it only contains the logical operators ¬ and ∧:
We compute the probability of ¬(y):
We also compute the probability of x ∧ ¬(y):
Finally, we can compute the probability of the overall formula:
Thus, we can say that:
Example:
Suppose we have the following assumptions about the probabilities that two boolean variables are true, and that a particular formula is true:
Note that we do not know exactly what the probability is of x being true, but we know it is some real number p such that 0 ≤ p ≤ 1.
We note that
x ⇒ y ≡ ¬(
x ∧ ¬(
y)), and compute the probability that ¬(
x ∧ ¬(
y)) is true (using
p wherever we need the probability that
x is true):
At this point, since Pr[
x ⇒ y ] = 9/10, we can set up an equation with a single unknown real number variable
p:
Thus, we have determined the probability Pr[
x ] (i.e., the probability that
x is true) to be 2/5.
Example:
Suppose that that we are working with perfect trees in which each non-leaf node is equally likely to have two or three branches (i.e., the probability that a non-leaf node has two branches is 1/2, and the probability it has three branches is also 1/2). Then we could write the following generating function for a tree of height 2; in this generating function, in each term c ⋅ xk the coefficient c represents the probability that a tree would have width k:
The generating function for a tree of height 3 would then be as follows:
| | ⋅ ( | | ⋅ x2 + | | ⋅ x3)2 + | | ⋅ ( | | ⋅ x2 + | | ⋅ x3)3
|
|
In the above generating function, the coefficient of x4 would be the probability that a tree of height 3 has width 4.
[link] Assignment #6: Permutations, Combinations, and Generating Functions
In this assignment you will solve problems requiring counting and computation of probabilities.
You may import the following library function in your module:
from math import factorial
from fractions import Fraction
Your solution may not import any other modules or employ any external library functions. The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
For each of the following descriptions, define a mathematical expression for computing the specified count by using multiplication
*
, integer division //
, addition +
, and/or the factorial()
function. In your Python file, define the variables oneA
, oneB
, oneC
, and so on. For example, a solution to part (h) might look as follows:
oneH = factorial(5) // factorial(2)
In the above manner, solve the following problem parts.-
You are given the collection of characters {a, b, c, d, e, f, g}, and you must count the number of ordered sequences of length four that can be made using these characters in which no character repeats.
-
You are given the collection of characters {a, a, b, b, c, c, d, d}, and you must count the number of distinct ordered sequences of length eight that can be made using these characters in which each of the characters from the collection can be used exactly once.
-
You must count the number of different ways that ten people can split into two groups consisting of four people and six people. Hint: once the group of four people has been chosen, this automatically means the remaining six people are also in a group.
-
Given a set of size 15, how many distinct subsets of size five does it have?
-
Suppose there are seven people and five pets. In how many different ways can we choose three people and two pets to invite to an event (the people and pets are chosen independently).
-
How many different groups of size two, three, four, or five can we choose from a group of seven people?
-
Suppose there are nine people and four pets. In how many different ways can we choose a group of three, four, or five people and a group of two, three, or four pets to invite to an event? The number of people we invite and the number of pets we invite are independent.
-
For each of the following descriptions, use Python to define a generating function and then specify using the index notation
[...]
which coefficient of the generating function corresponds to the correct count. In your Python file, you should include the class definition for generating function found in the appendix. For your solutions, define the variables twoA
, twoB
, twoC
, and so on. For example, a solution to part (h) might look as follows:
twoH = ((x**2 + x**3 + x**4)**2)[3]
In the above manner, solve the following problem parts.-
In how many ways can you choose a combination of four items from a set of six items? Hint: use the binomial theorem.
-
Suppose you have four six-sided dice, where each dice is labelled with the numbers 0, 2, 4, 6, 8, 10. In how many ways can you roll the four dice so that the total is 20?
-
You must build a fence of length 20. You are allowed to use any number of fence sections, and each fence section can be of length 2, 3, 5, or 7. You decide to buy a particular bundle of fence sections (e.g., four section of length 5, or ten sections of length 2). How many distinct bundles of fence sections could you buy that would allow you to build a fence of the desired length without having any extra fence sections left over (it's fine to count different arrangements of the sections as different bundles, so that 3 + 3 + 7 + 7 and 7 + 7 + 3 + 3 count as two different bundles). Hint: split the problem into cases based on the number of sections you buy, and then use
sum(...)
. -
Suppose you have five 50-sided dice, where each dice is labelled with the numbers 0, 1, 2, ..., 49. In how many ways can you roll the five dice so that the total is 158? Hint: use the
sum(...)
function as in this example, except with powers of x
to build an appropriate generating function. -
Suppose you have five 100-sided dice, where each dice is labelled with the numbers 1, 2, 3, ..., 100. In how many ways can you roll the five dice so that the total is 406? Hint: be careful when applying a theorem; you will need to do a little additional work because the labels start from 1 and not 0.
-
If non-leaf nodes of a tree can have either one, two, or three branches, how many trees of height 3 have width 7? You must write out the generating function; you may not use an algorithm.
-
Implement a function
sequences(cs)
that takes a list of characters cs
as its input and computes the number of distinct ordered sequences of length len(cs)
that can be made using these characters. Hint: use set()
and the .count()
method.
>>> sequences(['a','a','a'])
1
>>> sequences(['a','b','c'])
6
>>> sequences(['a','b','b','c'])
12
>>> sequences(['a','a','a','b','b','c','c'])
210
-
Implement a function
solutions(k, n)
that takes two integer inputs k
and n
. The function should compute the number of possible positive integer solutions to the following equation:
Each solution to the above would be some assignment of a positive integer to each of the yi so that they add up to n, and your function must return the total count of all such solutions. Hint: note that the integer must be positive and thus non-zero, so you'll need to think carefully about what generating functions to use; if you use appropriate theorems, your implementation should be very short (two or three lines).
>>> solutions(2, 10)
9
>>> solutions(3, 100)
4851
>>> solutions(5, 20)
3876
>>> solutions(6, 10)
126
-
Implement a function
hgtToSizes(h)
that takes an integer input h
that represents a tree height. The function should return the generating function that represents how many height-h
trees there are of each possible size (i.e., total number of all nodes in the tree) if each non-leaf node in the tree can have either one, two, or three branches.
>>> hgtToSizes(2)
1*x**2 + 1*x**3 + 1*x**4
>>> hgtToSizes(3)
1*x**3 + 1*x**4 + 2*x**5 + 2*x**6 + 4*x**7 + 5*x**8 + 7*x**9 + 7*x**10 + 6*x**11 + 3*x**12 + 1*x**13
-
Extra credit: Implement a function
hgtToWidthProb(b, h, w)
that take three integer inputs:
-
b
represents the maximum number of branches each non-leaf node in the tree might have (so each node can have 1
, 2
, ..., b-1
, or b
branches);
-
h
represents tree height;
-
w
represents a tree width.
The function should return the probability (represented using a Python Fraction()
object) that a tree of height h
has width w
. Hint: build a generating function by using Python Fraction(...)
objects as the coefficients.
[link] More applications of counting and probability
Example:
Suppose we are modelling how bacterial cells multiply over time. To simplify the problem, assume that time is divided into discrete time steps, and at a given time step, each cell has a 50% chance of dividing into two cells, and a 50% chance of not dividing into two cells. After 5 time steps, what is the probability that there are at least 10 cells?
We approach the problem by observing that all we need to do is compute the probabilities of the possible widths of trees of height 5 when each node can have either one or two branches (each with probability 1/2). Thus, we would start with:
Next, we would "update" our generating function four times using the following rule:
Finally, to find the probability that there are at least 10 cells, we would compute the sum of all coefficients of the generating function for the terms
x10,
x11,
x12, ...,
x16 (the maximum width of any tree of height 5 must be 16).
Example:
It is your job to allocate memory for an artificial intelligence algorithm that will model a game. You are not told the details of the game, but you know the following: during each turn, a player rolls a fair six-sided die that has the labels 1, 2, ..., 6. The result of the dice roll determines how many distinct options the player has in making the next move (e.g., if the roll results in a 2, the player has two possible ways they can make a move during that turn).
The AI algorithm needs to represent and store in memory the entire collection of possible ways the game can progress. Thus, it will use a tree to represent this: each node is a possible turn, and each branch is one of the possible moves a player can make. The number of children that each node has will be determined by the roll of a fair dice.
Suppose games always last exactly four turns. You decide to allocate enough memory to store exactly 20 nodes. What is the probability that this will be enough memory to store the entire tree of possible ways the game can progress?
Fact (generalized binomial theorem):
The following identity holds:
| = | ( | | ) ⋅ an ⋅ b0 + ( | | ) ⋅ an − 1 ⋅ b1 + ( | | ) ⋅ an − 2 ⋅ b2 + ... + ( | | ) ⋅ a0 ⋅ bn |
|
| = | |
| = | |
Example:
Suppose that a robot is standing in the center of a room. Every second, the robot moves one meter forward with probability 1/3, or it moves one meter backward with probability 2/3. What is the probability that after 10 seconds, the robot will end up in the center of the room?
We can model this situation using a generating function by representing the forward motion of the robot using a positive exponent, and the backward motion using a negative exponent. Thus, the behavior at any particular time step is:
After 10 seconds, the probability that the robot moved some distance
d is given by the coefficient of the
xd term in the following generating function:
In this particular problem, we are interested in the coefficient of
x0. One way to simplify the above is to introduce a factor of
x into the generating function, so that there are no negative exponents:
We still want to find the coefficient of
x0 of the above generating function. However, we see that this corresponds to the coefficient of
x10 of the second term ((2/3)
x0 + (1/3)
x2)
10. We can then use the
generalized binomial theorem to compute this:
| = | Σ10i = 0 ( | | ) ⋅ ( | | )10 − i ⋅ ( | | )i ⋅ (x2)i
|
|
The above shows us that the coefficient will correspond to the
i = 5 case, since 2 ⋅ 5 = 10. Thus, our desired coefficient is:
Notice we can explain the above formula in a different way. Each of the 10 time steps is an indepent choice the robot must make between moving forward or backward. The only way the robot will end up at the center of the room is if the robot chooses to move forward during exactly 5 of those time steps (and moves backward during the other five). This explains the combination term in the above. Then, it is only necessary to compute the probability of choosing to move forward five times, and choosing to move backward five times. This explains the two fraction terms in the above.
Example:
Suppose that there is an equal likelihood that on a given weekday, a stock price will fall 3, 2, or 1 dollars in value, will stay the same, or will rise 1, 2, or 3 dollars in value. What is the probability that after five weekdays, the stock has risen in value or stayed the same?
We can model the behavior of the stock price on a given day using a generating function:
| | (x−3 + x−2 + x−1 + x0 + x1 + x2 + x3)
|
|
Given these assumptions, the maximum that the stock price can rise over five days is 3 ⋅ 5 = 15. Thus, we then need to compute the sum of the coefficients of
xi for
i ≥ 0 for the following generating function:
( | | (x−3 + x−2 + x−1 + x0 + x1 + x2 + x3))5
|
|
We can do so using Python:
>>> g = (Fraction(1,7) * sum(x**i for i in range(-3,4)))**5
>>> sum(g[i] for i in range(0, 15))
Fraction(1304, 2401)
Thus, the overall probability that the stock price will be the same or higher after five days is 1304/2401.
Example:
Suppose that passwords for a certain device must be sequences that consist of some number of a and b characters. Assuming that the password for the device is aabab, answer each of the following questions.
-
Alice's password guessing algorithm is equally likely to choose any 5-character password that can be assembled from the five characters a, a, a, b, and b. What is the probability that it will guess the password on its first try?
The total number of possible 5-character passwords is:
Notice this corresponds to a combination because we are "choosing" which of the five password character positions will be
a (or, equivalently,
b):
Since Alice's algorithm is equally likely to choose any of these, the probability it will choose exactly the correct one is:
-
Bob's password guessing algorithm builds the password from left to right, one character at a time. At each step, the probability that it will choose a is 2/5, and the probability that it will choose b is 3/5. What is the probability that it will guess the correct 5-character password on its first try?
Each character in the 5-character password is an independent dimension consisting of two mutually exclusive choices (
a and
b). The probability that Bob's algorithm will choose the particular sequence is then the product of probabilities over these five independent dimensions:
Notice that this intuitively makes sense: there are 2
5 = 32 possible 5-character passwords that can be built, and if Bob's algorithm was equally likely to choose
a or
b for each character position, the overall probably would be 1/32. However, since Bob's algorithm is biased in favor of choosing a
b while the actual password has more
a character instances than
b character instances, the probability that Bob's algorithm will guess the correct password is actually lower.
-
What is the probability that it will guess a 5-character password that is not necessarily correct, but has the correct number of a and b characters?
We know that the probability that Bob's algorithm will choose a password containing three
a characters and two
b characters is (2/5)
3 ⋅ (3/5)
2. However, the number of ways in which this can happen is:
Thus, the probability would be:
Notice that this is exactly the coefficient of
x2 in the generalized binomial theorem expansion of the following generating function:
-
Suppose Alice runs her algorithm three times. What is the probability that at least one of the three guesses will be the correct password?
We want to measure the probability that at least one of the guesses generated by Alice's algorithm is correct. However, we also want to count any situation in which two or three of the guesses are correct. Thus, we want the probability of the following logical formula:
first guess is correct ∨ second guess is correct ∨ third guess is correct
|
|
The three situations that correspond to the three predicates above are not mutually exclusive (e.g., the first and third guesses might both be correct). Thus, we cannot simply add the probabilities. Fortunately, the three attempts are completely independent (for example, the probability of possible outcomes in the second attempt is the same whether or not we restrict ourselves to a particular outcome of the first attempt). Thus, we can instead use De Morgan's law and compute the probability of the following:
¬(¬(first guess is correct) ∧ ¬(second guess is correct) ∧ ¬(third guess is correct))
|
|
In other words, we instead multiply the complements of the probabilities to find the probability that
none of the guesses are correct. Then take the complement of that to determine the probability that
at least one guess works. We can can now compute:
Thus, the probability of at least one of the guesses being correct is 271/1000.
-
Suppose Alice runs her algorithm three times. What is the probability that exactly one of the three guesses will be the correct password?
In this case, we are considering three mutually exclusive possibilities: the first guess is correct and they other two are not, the second guess is correct and the others are not, and the third guess is correct and the others are not. We can compute each of these probabilities separately and then add them:
( | | ⋅ | | ⋅ | | ) + ( | | ⋅ | | ⋅ | | ) + ( | | ⋅ | | ⋅ | | ) | |
| = | |
| = | |
-
Suppose Bob runs his algorithm three times. What is the probability that all three guesses will be the correct password?
If each guess is an independent dimension, we want the probability of a particular measurement along each dimension, so we multiply:
[link] Assignment #7: Generating Functions, Probability, and Logical Formulas
In this assignment you will solve problems involving logical formulas and the computation of probabilities using generating functions and other counting methods.
from math import factorial
from fractions import Fraction
Your solution may not import any other modules or employ any external library functions. The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
-
Suppose we are given the following data about which predicates apply to elements in the collection of objects O = {tomato, frog, robin, parrot, fox, tree} (this particular collection of concepts and properties is adapted from examples in Semantic Cognition by Rogers & McClelland):
object |
predicates that apply (i.e., instantiations that are true) |
tomato |
tomato is red, tomato cannot eat, tomato cannot fly |
frog |
frog can eat, frog is green, frog cannot fly |
robin |
robin can eat, robin can fly, robin is red |
parrot |
parrot can eat, parrot is green, parrot can fly |
fox |
fox is red, fox cannot fly, fox can eat, |
tree |
tree cannot eat, tree is green, tree cannot fly |
Answer the following questions about the above information.-
Determine whether the following formula is true for all objects x in O. Indicate your answer by including in your Python file either
oneA = True
or, if it is not true, providing a counterexample such as oneA = 'penguin'
:
∀ x, (x is red ∧ x can eat) ⇒ x can fly
|
|
-
Determine whether the following formula is true for all objects x in O. Indicate your answer by including in your Python file either
oneB = True
or, if it is not true, providing a counterexample such as oneB = 'penguin'
:
∀ x, x cannot fly ⇒ x cannot eat
|
|
-
Separate the predicates into separate dimensions of mutually exclusive properties. Use Python strings such as
'cannot fly'
to represent each predicate (avoid typos to make grading easier), and define oneC = ...
to be a list of sets in which each set contains all the strings for a particular dimension. As an example, here is one way this could be done for the predicates ... is hot, ... is cold, ... is wide, and ... is narrow:
oneC = [{'is hot', 'is cold'}, {'is wide', 'is narrow'}]
-
For each pair of dimensions in part (c) above, indicate whether for the given data the pair of dimensions is independent (for purposes of probability computation). Indicate your answer by putting every pair of independent dimensions together into a set. For example, if
{'is hot', 'is cold'}
and {'is wide', 'is narrow'}
are independent, and {'is sweet', 'is sour'}
and {'is wide', 'is narrow'}
are independent, but no other pair of dimensions is independent, then you would write:
oneD = [\
{ {'is hot', 'is cold'}, {'is wide', 'is narrow'} },\
{ {'is sweet', 'is sour'}, {'is wide', 'is narrow'} }\
]
-
If an object x is selected uniformly at random from O (any object has the same probability of being selected as any other), what is the probability that x can fly is true? Indicate your answer by including in your Python file using the line
oneE = Fraction(..., ...)
(filling in the ...
appropriately). -
Use Python
Fraction()
objects to write out the computation oneF = ...
for the probability that the following formula is true (for an object x selected uniformly at random from O):
Prx ∈ O[ x cannot fly ∧ x is red ]
|
|
-
Use Python
Fraction()
objects to write out the computation oneG = ...
for the probability that the following formula is true (for an object x selected uniformly at random from O):
Prx ∈ O[ x is green ∨ x cannot eat ]
|
|
-
Use Python
Fraction()
objects to write out the computation oneH = ...
for the probability that the following formula is true (for an object x selected uniformly at random from O):
Prx ∈ O[ x is green ⇒ x cannot fly ]
|
|
-
Use Python
Fraction()
objects to write out the computation oneI = ...
for the probability that a new predicate ... is slithy applies to an object x chosen from O given the following information (assuming "slithiness" is independent from all the other dimensions):
Prx ∈ O[ x is slithy ∧ x cannot fly ] | |
| = | |
-
Suppose you are modelling a drunken walk along a bridge that is 4 steps wide. The drunk person starts from one end of the bridge (at the center of the bridge's walkway) and at each time step, they will step forward. They will also weave either one step to the right, one step to the left, or will only move forward without moving left or right (each possibility has an equal probability of occurring). They will fall of the bridge if they move 3 or more steps to the left, or 3 or more steps to the right.
-
What is the minimum number of steps until there is a non-zero probability of falling off the bridge? Indicate your answer using a line of the form
twoA = ...
. -
Assuming your answer to part (a) is the integer
twoA
, define the probability twoB = ...
(using addition and/or multiplication of Python Fraction()
objects) of falling off the bridge after either twoA
or after twoA
+ 1
time steps. Hint: you cannot fall off the bridge again after you have already fallen off the bridge; compute your probability accordingly. -
What is the probability
twoC = ...
that after five steps have been taken, no fall has occurred?
-
Suppose we have the following model for the way a tree grows: it begins growing from a single root node (which exists at time step
t
= 1
), and at each time step t
(starting with time step t
= 2
), every node that appeared at time step t
− 1
behaves as follows:
- with probability 1/3, the node does not grow any branches (and never again grows any branches);
- with probability 1/3, the node grows one branch;
- with probability 1/3, the node grows two branches.
Implement a Python function probHgtAtTime(h, t)
that uses generating functions to compute the probability that the tree has height h
at time step t
. -
You are modelling how two different types of bacteria reproduce. At each time step, each cell of type A that exists will behave as follows:
- with probability 2/3, the cell will split into two cells;
- with probability 1/3, the cell will not split.
Each cell of type B that exists at a given time step will behave as follows:
- with probability 1/3, the cell will split into three cells;
- with probability 2/3, the cell will not split.
Implement a Python function moreB(t)
that uses generating functions to compute the probability that at time step t
, there will be strictly more cells of type B than cells of type A. Hint: for each possible number of cells of type A, and for each possible number of cells of type B, we can compute the probability of that particular pair of quantities occuring at the same time; each of these possibilities is mutually exclusive. -
Suppose you are advising a company about the amount of funds they should have available to repay potential debt at a future point in time (e.g., to be kept in a separate account from which they can only withdraw at the time they need to pay the debt). The goal is to ensure that the probability the company will go bankrupt at that point in time is below 10%.
Assume that the mathematical model of the company's performance each month is as follows: given some positive integer
c
, each month the company will either earn c
in funds with probability p
, or it will lose an amount c
in funds with probability 1
− p
.-
Implement a Python function
probInRange(c, p, t, a, b)
that computes the probability that the company's net loss or gain after t
months will be between a
and b
(inclusive). You may assume the following (you do not need to check these conditions in your code):
c
> 0
;
0
< p
< 1
;
t
> 0
;
a
≤ b
.
Your code should work for large values of c
and t
, and you may assume that p
is a Python Fraction()
object. Hint: do not build a generating function; instead, use a theorem to determine how to compute the desired coefficient. -
At time
t
, the company's performance will be assessed; if its cumulative losses (i.e., negative earnings) at that time are greater than the amount of funding the company allocated to repay its debt, the company will go bankrupt. Implement a Python function minAlloc(c, p, t)
that computes the minimum amount of funds the company should allocate (as an integer) so that the probability that the company will go bankrupt is below 10%.