Definition:
A language supports
ad hoc polymorphism if it allows programmers to specify multiple definitions for the same constant or variable (which could be a literal, an operator symbol, a function name, a type, and so on). An interpreter or compiler then choose the appropriate definition to use based on the context (usually, the types of the argument expressions and/or the surrounding expressions and statements in the abstract syntax tree).
Common examples of language features that support ad hoc polymorphism include operator overloading in C++, C
and Python (operator overloading is not supported in Java), method overloading in Java subclasses, Java interfaces, and type classes in Haskell.
Definition:
A language supports parametric polymorphism if it allows programmers to specify a single definition for a function or procedure that can then be applied to values of multiple types (typically, a large range of types, or even all types). Typically, the function or procedure will operate on a container type, or some other parameterized or generic type. Thus, languages that support parametric polymorphism usually allow programmers to define data structures that can be parameterized by other data structures.
Common examples of language features that support parametric polymorphism include Java generics and the Haskell type system (the latter uses an extension of the Hindley-Milner type inference algorithm that is also used by the programming language ML).
Note an important distinction between ad hoc polymorphism and parametric polymorphism: ad hoc polymorphism allows programmers to continue adding new definitions for a given variable or constant within the same scope, while parametric polymorphism only permits a single, albeit generic, definition for a function or procedure within any given scope.
Example:
Suppose we define a function len for computing lengths of lists of integers.
len :: [Integer] -> Integer len [] = 0 len ((:) x xs) = 1 + len xs
Unfortunately, we cannot use the above function to compute the length of a list of, e.g., Bool values because the type [Integer] does not unify with the type [Bool] . We could write an alternative type to address this.
len :: [Bool] -> Integer len [] = 0 len ((:) x xs) = 1 + len xs
We can now use len on lists of type [Bool] , but we can no longer use it on lists of type [Integer] . Notice that the only thing that has changed is the type of the function; the body is the same in both cases. Haskell allows us to take advantage of this by employing a type variable.
len :: [a] -> Integer len [] = 0 len ((:) x xs) = 1 + len xs
With the above declaration, it is now possible to apply the len function to a list containing any type of element. This is because the type variable a unifies with any other type (including Bool and Integer ), so the input type [a] will unify with both [Integer] and [Bool] , as well as types such as [String -> String] (a list of functions that take String inputs and return String outputs).
*> len [True, False, True] 3 *> len [1, 2, 3, 4, 5] 5 *> len [\s -> s ++ ".", \t -> t ++ ";"] 2
Since len is a function that can be applied to different input types, but has only a single definition that works for all input types, it is an example of parametric polymorphism.
Example:
We can define a Set data structure for holding sets of Integer values that has an underlying list implementation.
module IntegerSet where data Set = Empty | Set [Integer] deriving Show insert :: Set -> Integer -> Set insert (Set l) x = Set (x:l) insert (Empty) x = Set [x]
contains :: Integer -> Set -> Bool contains x (Empty) = False contains x (Set l) = x `elem` l
However, what if we also want user to have the option of using a data structure for sets that uses an underlying binary search tree implementation? We can first define a class for set data structures that specifies the signatures of the functions that a set data structure must support. We can even define new functions that rely only on the signature.
module SetClass where
class IsSet s where empty :: s insert :: s -> Integer -> s contains :: Integer -> s -> Bool inserts :: s -> [Integer] -> s inserts s (x:xs) = inserts (insert s x) xs inserts s [] = s
This makes it possible to provide two different implementations that are both members of this class.
module IntegerSetList where
import SetClass data IntegerSetList = Empty | Set [Integer] deriving Show
instance IsSet IntegerSetList where empty = Empty
insert (Set l) x = Set (x:l) insert (Empty) x = Set [x]
contains x (Empty) = False contains x (Set l) = x `elem` l
module IntegerSetTree where
import SetClass
data IntegerSetTree = Leaf | Node Integer IntegerSetTree IntegerSetTree deriving Show
instance IsSet IntegerSetTree where empty = Leaf
insert (Leaf ) x = Node x Leaf Leaf insert (Node y s s') x = if x < y then Node y (insert s x) s' else Node y s (insert s' x)
contains x (Leaf ) = False contains x (Node y s s') = if y == x then True else if x < y then contains x s else contains x s'
Haskell type classes need not contain types; they can also contain type constructors (i.e., functions in the type system that can be used to construct a type given a type, i.e., type constructors with kind Type -> Type , and so on).
Example:
We can define a Set data structure for holding sets of values of any type a .
module Set where data Set a = Empty | Set [a] deriving Show insert :: Set a -> a -> Set a insert (Set l) x = Set (x:l) insert (Empty) x = Set [x]
contains :: Eq a => a -> Set a -> Bool contains x (Empty) = False contains x (Set l) = x `elem` l
However, what if we also want user to have the option of using a data structure for sets that uses an underlying binary search tree implementation? We can first define a class for set data structures that specifies the signatures of the functions that a set data structure must support. We can even define new functions that rely only on the signature.
module SetClass where
class IsSet s where empty :: s a insert :: Ord a => s a -> a -> s a contains :: (Ord a, Eq a) => a -> s a -> Bool inserts :: Ord a => s a -> [a] -> s a inserts s (x:xs) = inserts (insert s x) xs inserts s [] = s
This makes it possible to provide two different implementations of polymorphic set data structures that are both members of this class. Notice that we added the Ord a constraint on the element type a in case we want to compare them (e.g., if we are using a binary search tree implementation of sets).
module SetList where
import SetClass data SetList a = Empty | Set [a] deriving Show
instance IsSet SetList where empty = Empty
insert (Set l) x = Set (x:l) insert (Empty) x = Set [x]
contains x (Empty) = False contains x (Set l) = x `elem` l
module SetTree where
import SetClass
data SetTree a = Leaf | Node a (SetTree a) (SetTree a) deriving Show
instance IsSet SetTree where empty = Leaf
insert (Leaf ) x = Node x Leaf Leaf insert (Node y s s') x = if x < y then Node y (insert s x) s' else Node y s (insert s' x)
contains x (Leaf ) = False contains x (Node y s s') = if y == x then True else if x < y then contains x s else contains x s'
Example:
Suppose we have a Set data structure for holding sets of values of any type a that has a corresponding size :: Set a -> Int function.
module Set where data Set a = Empty | Set [a] deriving (Eq, Show) size :: Set a -> Int size (Set l) = length l size (Empty) = 0
We can make it possible to order sets by their size using the < operator by making the type Set a (for any possible a ) a member of the Ord type class.
instance Eq a => Ord (Set a) where s < s' = size s < size s'
Note that the instance declaration is prepended with the condition that the type a must be in the Eq type class. This is because the definition of Set a contains a deriving Eq clause, which means that a type Set a can only be used in a module if the parameter type a is a member of the Eq type class (otherwise, Haskell would not be able to compare two sets for equality using the automatically derived definition of (==) generated by the deriving Eq clause).
[link] Folds, Unfolds, and Algebraic Properties of Functional Programs
We have seen that the operational semantics of declarative and functional languages support an approach to problem solving that involves first defining a model of some system, and then querying that model to answer questions about that system or explore its properties. We can define this approach to programming with more consistency and mathematical rigor by identifying the mathematical transformations that are typically employed and composed when using this approach. In particular, most component operations fall into two categories:
- unfold (or anamorphism) operations are used to define or construct a data value;
- fold (or catamorphism) operations are used to reduce, simplify, break down, or transform a data value.
A few special cases of the composition of a fold followed by an unfold are commonly used, as well (these are commonly implemented using comprehensions when the data value being transformed is a set or list):
- map (sometimes known as metamorphism) operations are used to modify all the components of a data value using a single, often local transformation that is applied to all the components across the entire data value;
- filter operations are used to remove (or select) only some of the components of a data value.
The ability to describe an algorithm using a composition of unfold, fold, map, and filter operations imposes fewer restrictions on how that computation can be carried out (when compared to a description of the same algorithm using a sequential list of operations). A description using fold, map, and so on may still impose some hierarchical dependencies on the computation, but the operational semantics of such a description allows an implementation (or a compiled version) of the algorithm to perform operations in parallel unless such parallelization is explicitly forbidden by the algorithm itself through explicit sequential dependencies introduced by the programmer. In other words, the default interpretation of such an algorithm definition is inherently parallelizable.
Defining the above operations on algebraic data types (including polymorphic algebraic data types) is often a straightforward process (in fact, it can often be automated). The Haskell base library provides functions for all of the above operations on Haskell lists, and many library functions are implemented using these functions; the Haskell compiler is optimized to compile these functions into efficient machine code. Most of the above functions are also supported by languages such as Python.
Example:
We can begin by implementing a fold operation on Haskell Integer lists that does nothing.
fold :: [Integer] -> [Integer] fold [] = [] fold (x:xs) = x : fold xs
We can modify the above to replace the two list constructors (i.e., (:) and [] ) in the input list with a binary operator and a base case (in this case, (+) and 0 ). This way, the fold function now computes the sum of the elements in the list.
fold :: [Integer] -> Integer fold [] = 0 fold (x:xs) = x + fold xs
We can further generalize the above so that it takes the two replacement functions as arguments. The two functions are a replacement for (:) , so it must be of type Integer -> Integer -> Integer , and a replacement for [] , which must be of type Integer .
fold :: (Integer -> Integer -> Integer) -> Integer -> [Integer] -> Integer fold f b [] = b fold f b (x:xs) = f x (fold f b xs)
We can now use the above fold implementation to define many common functions on lists of integers.
sum :: [Integer] -> Integer sum = fold (+) 0
product :: [Integer] -> Integer product = fold (*) 1
maximum :: [Integer] -> Integer maximum = fold max 0
minimum :: [Integer] -> Integer minimum l = fold min (fold max l) l
Example:
The fold function for the default right-associative list implementation supported by Haskell (i.e., with the built-in constructors (:) and [] ) is called foldr .
foldr :: (a -> b -> b) -> b -> [a] -> b foldr f base [] = base foldr f base (x:xs) = f x (foldr f base xs)
The foldr function can be used to implement many of the usual library functions one might find useful when working with lists, and this is what is done in the Haskell libraries. Below are some examples (some of these are simplified versions of what is actually found in the libraries).
sum :: Num a -> [a] -> a sum xs = foldr (+) 0 xs
max :: Ord a => a -> a -> a max x y = if x > y then x else y
maximum :: Ord a => [a] -> a maximum xs = foldr max 0 xs
Example:
We can implement any map operation as a fold. Suppose we want to add the constant 1 to every element in an Integer list. For example, we want to achieve the following using a call to foldr :
*> [x + 1 | x <- [1,2,3,4,5]] [2,3,4,5,6]
We can accomplish this by defining a modified list node constructor function that adds 1 to its first argument, then builds the list node:
addOneThenCons :: Integer -> [Integer] -> [Integer] addOneThenCons x xs = (x + 1) : xs
We can now use the above together with foldr :
*> foldr addOneThenCons [] [1,2,3,4,5] [2,3,4,5,6]
Using λ abstractions, we can avoid having to explicitly define addOneThenCons :
*> foldr (\x xs -> (x+1) : xs) [] [1,2,3,4,5] [2,3,4,5,6]
Example:
We can implement any filter operation as a fold. For example, suppose we want to only keep positive integers from an integer list:
*> [x | x <- [-3,-2,-1,0,1,2,3], x > 0] [1,2,3]
We can accomplish this by defining a modified list node constructor function that only adds a new list node if the data inside it is a positive integer:
consIfPositive :: Integer -> [Integer] -> [Integer] consIfPositive x xs = if x > 0 then x : xs else xs
We can now use the above together with foldr :
*> foldr consIfPositive [] [-3,-2,-1,0,1,2,3] [1,2,3]
Using λ abstractions, we can avoid having to explicitly define consIfPositive :
*> foldr (\x xs -> if x > 0 then x : xs else xs) [] [-3,-2,-1,0,1,2,3] [1,2,3]
Example:
We can implement an easily parallelizable version of quicksort using filter operations:
qsort :: [Integer] -> [Integer] qsort [] = [] qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]
Example:
We can implement a fold operation that works for associative binary operators that recursively splits the problem into two parts over and over until a base case is reached.
fold :: (a -> a -> a) -> a -> [a] -> a fold f b [] = b fold f b [x] = x fold f b xs = fold f b (take (length xs `div` 2) xs) `f` fold f b (drop (length xs `div` 2) xs)
If each recursive invocation were run on a separate thread, processor, server, and so on, this would maximum the amount of parallelization that can be achieved in performing this fold operation.
Example (filter, map, and fold operations in SQL-like languages):
SQL-like languages (as well as the relational algebra on which they are based) can be viewed as supporting fold, map, and filter operations (and compositions thereof) on a particular data structure: a table. This allows databases to be distributed across multiple storage devices, and it makes it possible to simultaneously query different portions of a database, stored on different devices, using multiple computational devices (processors, virtual machines, servers, and so on) running in parallel.
For example, suppose an SQL table consists of some number of rows, and each row has an entry of the form (Name, Age) :
+ | People | + | Name | Age | + | Alice | 20 | | Bob | 17 | | Carl | 23 | +
The following SQL query allows a user to retrieve only the Name entries in the table; this query amounts to a map operation:
> SELECT Name FROM People
+ | Name | + | Alice | | Bob | | Carl | +
The following SQL query allows a user to retrieve only the entries within a certain Age range; this query amounts to a filter operation:
> SELECT * FROM People WHERE Age >= 18
+ | Name | Age | + | Alice | 20 | | Carl | 23 | +
The following SQL query allows a user to retrieve the sum of the Age fields of all the entries; this query amounts to fold operation (it is often called an aggregation operation):
> SELECT SUM(Age) FROM People
+ | Age | + | 60 | +
We can easily simulate all of the above capabilities using another language that supports the declarative and functional programming paradigms, such as Haskell. Suppose we have the following definitions:
type Name = String type Age = Integer data Table = [(Name, Age)]
people = [("Alice", 20), ("Bob", 17), ("Carl", 23)]
We can then perform the following queries:
*> [name | (name, age) <- people] ["Alice", "Bob", "Carl"]
*> [(name, age) | (name, age) <- people, age >= 18] [("Alice", 20), ("Carl", 23)]
*> foldr (+) 0 [age | (name, age) <- people] 60
Equivalently, we can use map , filter , foldr , and λ abstractions instead of list comprehensions (the Haskell compiler simply converts list comprehensions into some combination of these during compilation):
*> map (\(name, age) -> name) people ["Alice", "Bob", "Carl"]
*> map fst people ["Alice", "Bob", "Carl"]
*> filter (\(name, age) -> age >= 18) people] [("Alice", 20), ("Carl", 23)]
*> foldr (+) 0 (map fst people) 60
All of the above can also be done using Python list comprehensions:
>>> People = [("Alice", 20), ("Bob", 17), ("Carl", 23)]
>>> [name for (name, age) in People] ["Alice", "Bob", "Carl"]
>>> [(name, age) for (name, age) in People if age >= 18] [('Alice', 20), ('Carl', 23)]
>>> sum([age for (name, age) in People]) 60
Python also supports map , filter , reduce (similar to Haskell's foldr ), and lambda abstractions:
>>> list(map((lambda person: person[0]), People)) ['Alice', 'Bob', 'Carl']
>>> list(filter((lambda person: person[1] >= 18), People)) [('Alice', 20), ('Carl', 23)]
>>> from functools import reduce >>> reduce(lambda x,y: x + y, [age for (name, age) in People]) 60
Example (filter and map operations in JavaScript libraries):
Many JavaScript libraries, such as jQuery/jQuery UI, node.js, d3, and others support an abstraction for manipulating web documents (which are often used as application components on the web today) that is organized around filter and map operations.
In jQuery, it is possible to select all the elements with a certain tag in an HTML document and then to specify a function that will be applied to each element selected. This corresponds to the composition of a filter and map operation.
For example, the following code selects all the li elements and updates the text content of each to indicate the item number corresponding to that element. Notice that the .text() function takes a function as an argument (the function supplied as an argument takes an index as an input and returns a string):
$("ul li").text( function( index ) { return "item number " + ( index + 1 ); } );
More generally, it's possible to update individual document elements using the .each() function.
[link] Assignment #5: Embedded Languages and State Space Search
In this assignment you will practice using the declarative and functional programming language Haskell by implementing a small embedded language, and by building a library of algorithms for solving an optimization problem. You must submit two files:
Please follow the gsubmit directions and remember to put your files in the hw5 directory.
Your solutions to each of the problem parts below will be graded on their correctness, concision, and mathematical legibility. The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
A testing script with several test cases is available for download: hw5-tests.hs . You should be able to place it in the same directory with the other assignment files and load it. Feel free to modify or extend it as you see fit.
-
In this problem you will implement an interpreter for a small embedded programming language. Your solutions should be included in the file
hw5/Database.hs .
The module Database already contains data type definitions below for representing the syntax of the embedded language.
type Column = String data User = User String deriving (Eq, Show) data Table = Table String deriving (Eq, Show) data Command = Add User | Create Table | Allow (User, Table) | Insert (Table, [(Column, Integer)]) deriving (Eq, Show)
A program in this language will consist of a list of elements of values (i.e., a value of type [Command] ) such as the following example.
example :: [Command] example = [ Add (User "Alice"), Add (User "Bob"), Create (Table "Revenue"), Insert (Table "Revenue", [("Day", 1), ("Amount", 2400)]), Insert (Table "Revenue", [("Day", 2), ("Amount", 1700)]), Insert (Table "Revenue", [("Day", 3), ("Amount", 3100)]), Allow (User "Alice", Table "Revenue") ]
-
Implement a function
select :: [Command] -> User -> Table -> Column -> Maybe [Integer] that takes four arguments: a list of commands describing the current state of the database, the user evaluating the query, the table on which the query is being evaluated, and the name of the column the user wants to select. Hint: use list comprehensions with pattern matching.
If the specified user has permission to query the table, the function should return a list of integers corresponding to the values in the specified column of the specified table (wrapped with the Just constructor). If the user or table do not exist, or if the specified user does not have permission to query the specified table, the function should return Nothing .
*Database> select example (User "Alice") (Table "Revenue") "Day" Just [1,2,3]
*Database> select example (User "Alice") (Table "Revenue") "Amount" Just [2400,1700,3100]
*Database> select example (User "Bob") (Table "Revenue") "Amount" Nothing
-
Implement a function
aggregate :: [Command] -> User -> Table -> Column -> Operator -> Maybe Integer that takes five arguments: a list of commands describing the current state of the database, the user evaluating the query, the table on which the query is being evaluated, the name of the column to aggregate, and an operator (of type Integer -> Integer -> Integer ) to use when aggregating the values in the specified column. Hint: use foldr .
If the specified user has permission to query the table, the function should return the aggregate (with respect to the supplied aggregator function) of the integers corresponding to the values in the specified column of the specified table (wrapped with the Just constructor). If the user or table do not exist, or if the specified user does not have permission to query the specified table, the function should return Nothing .
*Database> aggregate example (User "Alice") (Table "Revenue") "Amount" (+) Just 7200
-
Implement a function
validate :: [Command] -> Bool that checks that every command in the list refers to a table that has already been created and a user that has already been added earlier in the list of commands. If every command is valid, it should return True ; otherwise, it should return False . Hint: write a helper function and call it on a reversed list of commands, then check each command against the rest of the list.
-
In this problem, you will implement a small library for representing and working with a state space graph for an optimization problem. All the functions you define should be included in the file
hw5/Allocation.hs .
The optimization problem we are addressing is defined as follows: given a list of items, each of a certain integer size, put each item into one of two bins so that once all items have been allocated, the sum of the sizes of the items in the first bin is as close as possible to the sum of the items in the second bin (note that this optimization problem is NP-complete and is equivalent to the subset sum problem, so no efficient solution for the problem is believed to exist).-
We will represent an allocation of items (either in the midst of running an optimization algorithm or within a final result) using the data type
Alloc :
data Alloc = Alloc Bin Bin deriving (Eq, Show)
We will represent the graph of possible states that an algorithm can traverse given a starting allocation and a list of tasks using the Graph data type:
data Graph = Branch Alloc Graph Graph | Finish Alloc deriving (Eq, Show)
Define a function graph :: Alloc -> [Item] -> Graph that takes a starting allocation and a list of items, and returns the full graph of all possible state space traversals available to an algorithm given the list of items. An example is provided below (spacing has been adjusted for legibility):
*> graph (Alloc 0 0) [1,2] Branch (Alloc 0 0) (Branch (Alloc 1 0) (Finish (Alloc 3 0)) (Finish (Alloc 1 2)) ) (Branch (Alloc 0 1) (Finish (Alloc 2 1)) (Finish (Alloc 0 3)) )
The definition must work even if the supplied list of items is infinite. -
Define a function
alloc :: Graph -> Alloc that makes it easy to retrieve the allocation within the root node of a particular graph of type Graph .
*> alloc (Finish (Alloc 0 0)) Alloc 0 0
-
One allocation
a :: Alloc is better than another allocation b :: Alloc if the difference between the two bin totals in a is less than the difference between the two bin totals in b . Add an instance declaration for the Alloc type so that it is possible to use the built-in Haskell infix operators < and <= to compare two allocations according to their differences. You may use the Haskell library function abs :: Integer -> Integer to compute the absolute value of an integer. If you're getting a stack overflow when testing < , make sure you also define <= explicitly.
*> (Alloc 1 4) < (Alloc 5 10) True *> (Alloc 10 4) < (Alloc 1 3) False
-
One graph
g :: Graph is better than another graph g' :: Graph if the difference between the two bin totals in the root node of g is less than the difference between the two bin totals in the root node of g' . Add an instance declaration for the Graph type so that it is possible to use the built-in Haskell infix operator < to compare two graphs.
*> (Finish (Alloc 1 4)) < (Finish (Alloc 5 10)) True
-
Define a function
final :: Graph -> [Alloc] that returns an aggregate list of the allocations within all the leaf nodes of the supplied state space graph. -
Define a function
depth :: Integer -> Graph -> [Alloc] that returns the allocations within the state space graph nodes that are at depth exactly n within the state space graph (the root node is at depth 0 ).
-
In this problem, you will implement a small library of algorithms for solving the optimization problem we defined above. All the functions you define should be included in the file
hw5/Allocation.hs .
A strategy is an algorithm that can traverse the state space graph; given an existing graph, it chooses some descendant node in the graph and returns the subgraph for which that descendant node is the root.
type Strategy = Graph -> Graph
-
Define a strategy
greedy :: Strategy that takes a graph as an input. It should choose and return the better child of the two children of the root of the graph. -
Define a strategy
patient :: Integer -> Strategy that takes an integer argument n and a graph as its two inputs, and chooses and returns the best descendant of the supplied graph that can be found at depth n (an allocation is best if it is better than all others at that depth). If n is 0 , the function should simply return the graph supplied to it as an input. -
Define a strategy
optimal :: Strategy that takes a state space graph, and returns the leaf node in the state space graph that has the best allocation. Hint: you can define this function in two lines; look carefully over the functions already available to you. -
Define a metastrategy
metaCompose :: Strategy -> Strategy -> Strategy that takes two strategies as arguments. It should apply the first strategy to obtain a new subgraph, and then it should apply the second strategy to that subgraph and return the result.
*> (metaCompose greedy greedy) (graph (Alloc 0 0) [1,2,3]) Branch (Alloc 2 1) (Finish (Alloc 5 1)) (Finish (Alloc 2 4)) *> (metaCompose (patient 2) greedy) (graph (Alloc 0 0) [1,2,3,4]) Branch (Alloc 2 4) (Finish (Alloc 6 4)) (Finish (Alloc 2 8))
-
Define a metastrategy
metaRepeat :: Integer -> Strategy -> Strategy that repeats a given strategy a specified number of times. Your definition of metaRepeat should take an Integer and a Graph argument. If the integer is 0 , it should simply return the graph unchanged. If the integer is positive, it should apply the strategy the specified number of times to the graph and then return the result. -
Define a metastrategy
metaGreedy :: Strategy -> Strategy -> Strategy that takes two strategies as its inputs. It should apply the two strategies to its input graph and should return the better of the two results that it obtains. -
Consider the following strategy:
impatient :: Integer -> Strategy impatient n g = (metaRepeat n greedy) g
Describe one way in which impatient is superior to patient , and one way in which it is inferior.
-
Extra credit: Define a metastrategy
fit :: Graph -> [Strategy] -> Strategy that takes a graph and a list of strategies as inputs; this metastrategy should choose the strategy in the list of strategies that has the best performance on the given graph, and return it. Hint: define a way to compare strategies and use minimumBy .
[link] Programming Language Concepts
This section contains a collection of review problems going over all the course material. These problems are an accurate representation of the kinds of problems you may see on an in-class exam.
Below is a breakdown of what concepts you should udnerstand and small-scale problems you should be able to solve.
- mathematical definitions involved in the definition of a programming language
- strings
- token sequences
- concrete syntax
- BNF notation
- terminals
- non-terminals
- cases
- productions
- non-recursive, recursive, left-recursive
- abstract syntax
- variables
- expressions
- values/constants
- statements
- outputs/output sequences
- type system
- type inference rules
- typical type inference rule for function application
- operational semantics
- abstract interpretation (i.e., restricted/abstract operational semantics measuring some dimensions of interest, often used in static analysis)
- software components (data structures, and algorithms) involved in an implementation of a programming language
- data structure
- string
- token sequence
- abstract syntax tree
- intermediate representation abstrac syntax tree
- machine language instruction sequence
- bytecode instruction sequence
- heap
- stack
- environment/context
- substitution
- algorithms and software components
- tokenizer
- parser
- predictive recursive descent parser
- backtracking recursive descent parser
- static analysis algorithms (always terminate)
- abstract interpreter (cost, size, memory, time, etc.)
- type checker
- optimization algorithms
- constant folding (i.e., evaluation of subexpressions with no variables)
- loop unrolling
- interpreter
- evaluation algorithm
- execution algorithm
- unification
- substitution algorithm
- pattern matching unification algorithm
- compiler (always terminates)
- run-time system/virtual machine
- garbage collector
- testing/validation
- bounded exhaustive testing input generator
- validation algorithm (checks that two components have the same input-output behavior)
- imperative and procedural programming langauge concepts
- control flow
- procedures
- how these are implemented by an interpreter
- how these are supported by a compiler to a machine language (stack)
- declarative and functional programming language concepts and techniques
- how to solve equations involving data values using pattern matching unification
- models/modules and queries
- how can pattern matching unification...
- support matching of function argument patterns and input values
- algebraic data types
- constructors (base cases and inductive constructors with child trees)
- defining non-recursive and recursive functions that operate on algebraic data types
- kinds of polymorphism
- ad hoc polymorphism
- parametric polymorphism
- higher order functions
- types of higher-order functions
- how to define and apply higher-order functions
- how to compose functions
- infinite data structures and applications
- infinite lists, trees, decision trees, state space graphs
- traversing/searching a state space graph
- folds, maps, and filters
- using list comprehensions
- folds on lists and trees
- defining fold functions
- using folds to perform aggregate computations
- using functions with no names (lambda abstractions) in folds
- miscellaneous tasks
- does a parser implement a BNF definition of a grammar?
- does an interpreter implementation conform to an operational semantics?
- does a type checker implementation conform to a type system definition?
- apply an operational semantics or run an interpreter on a given program
- apply type inference rules or run a type checker on a given program
- explain the purpose of a definition or algorithm, or how multiple definitions and algorithms can fit together
Exercise:
Suppose you are given the following type definitions for representing simple English sentences:
type Noun = String type Adjective = String type Verb = String type Adverb = String
data NounPhrase = AN Adjective NounPhrase | N Noun data VerbPhrase = AV Adverb VerbPhrase | V Verb data Sentence = S NounPhrase VerbPhrase
-
Determine the minimal substitution that solves the following equation under pattern-matching unification, or explain why no solution can exist (by demonstrating what base case causes an issue):
| = | S (AN "yellow" (N "cat")) (V "runs")
|
|
The minimal substitution is:
{a ↦ "yellow" , b ↦ "runs" }
|
|
-
Determine the minimal substitution that solves the following equation under pattern-matching unification, or explain why no solution can exist (by demonstrating what base case causes an issue):
S (AN a (AN b (N "cat"))) (V c) | |
| = | S (AN "yellow" (N "cat")) (V "runs")
|
|
There is no substitution because once unification reaches a recursive invocation for the following equation, it will fail:
Since the constructor AN can never be unified with the constructor N , there is no substitution that can be applied to both sides of the original equation to make both sides equivalent.
-
Given the above data type definition, is it possible to write a single Haskell function pattern that will match any sentence in which
"cat" is the subject?
No, this is impossible because the pattern would need to match a tree of arbitrary depth, since the N leaf constructor can only occur at the botton of a NounPhrase tree. In other words, the pattern would need to look something like the following:
f (AN _ (AN _ ( ... (AN _ (N "cat"))))) = ...
|
|
Nothing like the above is supported by Haskell.
Exercise:
Adjust the following grammar definition so that it accepts exactly the same set of token sequences, but is not left-recursive:
We introduce a new production, and we nove to it all the cases in the production for n that begin with a terminal. We then replace all those cases in the production for n with a single case corresponding to the non-terminal for the new production, and we replace all the left-recursive non-terminals with the new non-terminal.
Exercise:
Suppose you are given the following grammar definition and operational semantics:
According to the above operational semantics, to what should change foo evaluate?
According to the second inference rule, change foo should evaluate to is bar. Notice that the above inference rules could be implemented as an evaluation algorithm, e.g. in Haskell:
data Actor = Foo | Bar data Action = Change Actor | Is Actor
eval (Is r ) = Is r eval (Change Foo) = Is Bar eval (Change Bar) = Is Foo
Exercise:
Answer the following questions by drawing diagrams (your diagrams may need to incorporate self-loops).
-
Determine which of the following terms refer to data structures, and which terms refer to algorithms. Draw a flow chart incorporating all of the above components that demonstrates how they might interact in an actual implementation (there may be more than one correct answer, but the way they interact must be reasonable):
- input file;
- abstract syntax trees;
- parser;
- compiler;
- type checker;
- error message;
- loop unroller;
- machine instruction sequences.
input file |
|
loop unroller |
|
|
|
|
|
machine instruction sequences |
⇓ |
|
⇑ ⇓ |
|
|
|
|
|
⇑ |
parser |
⇒ |
abstract syntax trees |
⇒ |
type checker |
⇒ |
abstract syntax trees |
⇒ |
compiler |
⇓ |
|
|
|
⇓ |
|
|
|
|
error messages |
|
|
|
error messages |
|
|
|
|
-
Draw a flow chart incorporating all of the below components that demonstrates how they might interact in an actual implementation:
- exhaustive case generator;
- abstract syntax trees;
- compiler;
- interpreter;
- machine instruction sequences;
- simulator;
- outputs;
- output comparison function.
exhaustive case generator |
|
⇓ |
|
abstract syntax trees |
|
⇓ |
|
⇓ |
|
interpreter |
|
compiler |
⇒ |
machine instruction sequences |
⇓ |
|
|
|
⇓ |
outputs |
|
outputs |
⇐ |
simulator |
⇓ |
|
⇓ |
|
|
output comparison function |
|
|
[link] Programming Language Concepts
This material is no longer available.
[link] Using gsubmit
In this course, you will submit your assignments using gsubmit . This section reproduces and extends some of the instructions already made available by the BU Computer Science Department.[link] Register for a CS account
You must obtain a CS account to use the csa machines maintained by the CS Dept. You will need to physically visit the undergraduate computing lab located at 730 Commonwealth Avenue, on the third floor in room 302.[link] Download SSH/SCP client software
You will need an SCP or SFTP client (such as WinSCP for Windows or CyberDuck for OS X) to copy files from your local computer to your csa home directory. If you are using Windows, you will also need an SSH client (such as PuTTY).[link] Submitting assignments using gsubmit
A typical workflow can be described as follows.- You assemble your assignment solution file(s) on your own computer or device.
local device
|
|
your csa2 /csa3 home directory
|
|
your gsubmit directory for CS 320
|
-
You log into
csa2 or csa3 using an SCP or SSH client and create a directory for your submission in your CS account home directory. Note that in the examples below %> represents a terminal prompt, which may look different on your system.
local device
|
|
your csa2 /csa3 home directory
|
|
your gsubmit directory for CS 320
|
-
If you have not already done so (e.g., if you were using an SSH client in the previous step), you log into
csa2 or csa3 using an SCP client and copy your completed file(s) into that directory.
local device
|
⇒ |
your csa2 /csa3 home directory
|
|
your gsubmit directory for CS 320
|
-
If you have not already done so, you log into
csa2 or csa3 using an SSH client and run the gsubmit commands to copy the files from your CS account home directory to the gsubmit directories to which the course staff has access.
%> cd ~ %> gsubmit cs320 hw1
local device
|
⇒ |
your csa2 /csa3 home directory
|
⇒ |
your gsubmit directory for CS 320
|
-
To view your submitted files, you can use the following command:
To look at a file that has already been submitted, you can use:
%> gsubmit cs320 -cat hw1/hw1.py
After grades are posted (normally, this will be announced on the mailing list and in lecture), you can check your grade using:
%> gsubmit cs320 -cat grade.hw1.txt
[link] Python
The Python programming language will be among the languages we use in this course. This language supports the object-oriented, imperative, and functional programming paradigms, has automatic memory managememt, and natively supports common high-level data structures such as lists and sets. Python is often used as an interpreted language, but it can also be compiled.[link] Assembling a Python module
The simplest Python program is a single file (called a module) with the file extension .py . For example, suppose the following is contained within a file called example.py :
Assuming Python is installed on your system, to run the above program from the command line you can use the following (you may need to use python3 , python3.2 , python3.3 , etc. depending on the Python installation you're using). Note that in the examples below %> represents a terminal prompt, which may look different on your system.
%> python example.py Hello, world.
If you run Python without an argument on the command line, you will enter Python's interactive prompt. You can then evaluate expressions and execute individual statements using this prompt; you can also load and execute a Python module file:
%> python Python 3.2 ... Type "help", "copyright", "credits" or "license" for more information. >>> exec(open("example.py").read()) Hello, world. >>> x = "Hello." >>> print(x) Hello. >>> x 'Hello.' >>> 1 + 2 3
[link] Common data structures (i.e., Python expressions)
Python provides native support for several data structures that we will use throughout this course: integers, strings, lists, tuples, sets, and dictionaries (also known as finite maps). In this subsection, we present how instances of these data structures are represented in Python, as well as the most common operations and functions that can be applied to these data structure instances.-
Booleans consist of two constants:
True and False .- The usual logical operations are available using the operators
and , or , and not .
>>> True True >>> False False >>> True and False or True and (not False) True
-
Integers are written as in most other programming languages (i.e., as a sequence of digits).
- The usual arithmetic operations are available using the operators
+ , * , - , and / . The infix operator // represents integer division, and the infix operators ** represents exponentiation. Negative integers are prefixed with the negation operator - . - The usual relational operators
== , != , < , > , <= , >= are available. - The
int() function can convert a string that looks like an integer into an integer.
>>> 123 123 >>> 1 * (2 + 3) // 4 - 5 -4 >>> 4 * 5 >= 19 True >>> int("123") 123
-
Strings are delimited by either
' or " characters. Strings can be treated as lists of single-character strings. Another way to look at this is that there is no distinction between a character and a string: all characters are just strings of length 1. Multiline strings can be delimited using """ or ''' (i.e., three quotation mark characters at the beginning and end of the string literal).- The empty string is denoted using
'' or "" . - Two strings can be concatenated using
+ . - The function
len() returns the length of a string. - Individual characters in a string can be accessed using the bracketed index notation (e.g.,
s[i] ). These characters are also strings themselves.
>>> 'Example.' 'Example.' >>> "Example." 'Example.' >>> len("ABCD") 4 >>> "ABCD" + "EFG" 'ABCDEFG' >>> "ABCD"[2] 'C'
-
Lists are similar to arrays: they are ordered sequences of objects and/or values. The entries of a list can be of a mixture of different types, and lists containing one or more objects are delimited using
[ and ] , with the individual list entries separated by commas. Lists cannot be members of sets.- The empty list is denoted using
[] . - Two lists can be concatenated using
+ . - The function
len() returns the length of a list. - Individual entries in a list can be accessed using the bracketed index notation (e.g.,
a[i] ). - To check if a value is in a list, use the
in relational operator.
>>> [1,2,"A","B"] [1, 2, 'A', 'B'] >>> [1, 2] + ['A','B'] [1, 2, 'A', 'B'] >>> len([1,2,"A","B"] ) 4 >>> [1,2,"A","B"][0] 1 >>> 1 in [1, 2] True
-
Tuples are similar to lists (they are ordered, and can contain objects of different types), except they are delimited by parentheses
( and ) , with entries separated by commas. The main distinction between lists and tuples is that tuples are hashable (i.e., they can be members of sets).- The empty tuple is denoted using
() . - A tuple containing a single object
x is denoted using (x, ) . - Two tuples can be concatenated using
+ . - A tuple can be turned into a list using the
list() function. - A list can be turned into a tuple using the
tuple() function. - The function
len() returns the length of a tuple. - Individual entries in a tuple can be accessed using the bracketed index notation (e.g.,
t[i] ). - To check if a value is in a tuple, use the
in relational operator.
>>> (1,2,"A","B") (1, 2, 'A', 'B') >>> (1,) (1,) >>> (1, 2) + ('A','B') (1, 2, 'A', 'B') >>> list((1, 2, 'A','B')) [1, 2, 'A', 'B'] >>> tuple([1, 2, 'A','B']) (1, 2, 'A', 'B') >>> len((1,2,"A","B")) 4 >>> (1,2,"A","B")[0] 1 >>> 1 in (1, 2) True
-
Sets are unordered sequences that cannot contain duplicates. They are a close approximation of mathematical sets. Sets cannot be members of sets.
- The empty set is denoted using
set() . - The methods
.union() and .intersect correspond to the standard set operations. - A list or tuple can be turned into a set using the
set() function. - A set can be turned into a list or tuple using the
list() or list() function, respectively. - The function
len() returns the size of a set. - To access individual entries in a set, it is necessary to turn the set into a list or tuple.
- To check if a value is in a set, use the
in relational operator.
>>> {1,2,"A","B"} {1, 2, 'A', 'B'} >>> ({1,2}.union({3,4})).intersection({4,5}) {4} >>> set([1, 2]).union(set(('A','B'))) {'A', 1, 2, 'B'} >>> len({1,2,"A","B"}) 4 >>> 1 in {1,2,"A","B"} True
-
Frozen sets are like sets, except they can be members of other sets. A set can be turned into a frozen set using the
frozenset() function.
>>> frozenset({1,2,3}) frozenset({1, 2, 3}) >>> {frozenset({1,2}), frozenset({3,4})} {frozenset({3, 4}), frozenset({1, 2})}
-
Dictionaries are unordered collections of associations between some set of keys and some set of values. Dictionaries are also known as finite maps.
- The empty dictionary is denoted using
{} . - The list of keys that the dictionary associates with values can be obtained using
list(d.keys()) . - The list of values that the dictionary contains can be obtained using
list(d.values()) . - The function
len() returns the number of entries in the dictionary. - Individual entries in a dictionary can be accessed using the bracketed index notation (e.g.,
d[key] ).
>>> {"A":1, "B":2} {'A': 1, 'B': 2} >>> list({"A":1, "B":2}.keys()) ['A', 'B'] >>> list({"A":1, "B":2}.values()) [1, 2] >>> len({"A":1, "B":2}) 2 >>> {"A":1, "B":2}["A"] 1
[link] Function, procedure, and method invocations
Python provides a variety of ways to supply parameter arguments when invoking functions, procedures, and methods.-
Function calls and method/procedure invocations consist of the function, procedure, or method name followed by a parenthesized, comma-delimited list of arguments. For example, suppose a function or procedure
example() is defined as follows:
def example(x, y, z): print("Invoked.") return x + y + z
To invoke the above definition, we can use one of the following techniques.-
Passing arguments directly involves listing the comma-delimited arguments directly between parentheses.
>>> example(1,2,3) Invoked. 6
-
The argument unpacking operator (also known as the
* -operator, the scatter operator, or the splat operator) involves providing a list to the function, preceded by the * symbol; the arguments will be drawn from the elements in the list.
>>> args = [1,2,3] >>> example(*args) Invoked. 6
-
The keyword argument unpacking operator (also known as the
** -operator) involves providing a dictionary to the function, preceded by the ** symbol; each named paramter in the function definition will be looked up in the dictionary, and the value associated with that dictionary key will be used as the argument passed to that parameter.
>>> args = {'z':3, 'x':1, 'y':2} >>> example(**args) Invoked. 6
-
Default parameter values can be specified in any definition. Suppose the following definition is provided.
def example(x = 1, y = 2, z = 3): return x + y + z
The behavior is then as follows: if an argument corresponding to a parameter is not supplied, the default value found in the definition is used. If an argument is supplied, the supplied argument value is used.
>>> example(0, 0) 3 >>> example(0) 5 >>> example() 6
[link] Comprehensions
Python provides concise notations for defining data structures and performing logical computations. In particular, it support a comprehension notation that can be used to build lists, tuples, sets, and dictionaries.-
List comprehensions make it possible to construct a list by iterating over one or more other data structure instances (such as a list, tuple, set, or dictionary) and performing some operation on each element or combination of elements. The resulting list will contain the result of evaluating the body for every combination.
>>> [ x for x in [1,2,3] ] [1, 2, 3] >>> [ 2 * x for x in {1,2,3} ] [2, 4, 6] >>> [ x + y for x in {1,2,3} for y in (1,2,3) ] [2, 3, 4, 3, 4, 5, 4, 5, 6]
It is also possible to add conditions anywhere after the first for clause. This will filter which combinations are actually used to add a value to the resulting list.
>>> [ x for x in {1,2,3} if x < 3 ] [1, 2] >>> [ x + y for x in {1,2,3} for y in (1,2,3) if x > 2 and y > 1 ] [5, 6]
-
Set comprehensions make it possible to construct a set by iterating over one or more other data structure instances (such as a list, tuple, set, or dictionary) and performing some operation on each element or combination of elements. The resulting list will contain the result of evaluating the body for every combination. Notice that the result will contain no duplicates because the result is a set.
>>> { x for x in [1,2,3,1,2,3] } {1, 2, 3}
-
Dictionary comprehensions make it possible to construct a dictionary by iterating over one or more other data structure instances (such as a list, tuple, set, or dictionary) and performing some operation on each element or combination of elements. The resulting dictionary will contain the result of evaluating the body for every combination.
>>> { key : 2 for key in ["A","B","C"] } {'A': 2, 'C': 2, 'B': 2}
[link] Other useful built-in functions
The built-in function type() can be used to determine the type of a value. Below, we provide examples of how to check whether a given expression has one of the common Python types:
>>> type(True) == @bool True >>> type(123) == int True >>> type("ABC") == str True >>> type([1,2,3]) == list True >>> type(("A",1,{1,2})) == tuple True >>> type({1,2,3}) == set True >>> type({"A":1, "B":2}) == dict True
[link] Common Python definition and control constructs (i.e., Python statements)
A Python program is a sequence of Python statements. Each statement is either a function definition, a variable assignment, a conditional statement (i.e., if , else , and/or elif ), an iteration construct (i.e., a for or while loop), a return statement, or a break or continue statement.-
Variable assignments make it possible to assign a value or object to a variable.
It is also possible to assign a tuple (or any computation that produces a tuple) to another tuple:
-
Function and procedure definitions consist of the
def keyword, followed by the name of the function or procedure, and then by one or more arguments (delimited by parentheses and separated by commas).
def example(a, b, c): return a + b + c
-
Conditional statements consist of one or more branches, each with its own boolean expression as the condition (with the exception of
else ). The body of each branch is an indented sequence of statements.
def fibonacci(n): if n <= 0: return 0 elif n <= 2: return 1 else: return fibonacci(n-1) + fibonacci(n-2)
-
Iteration constructs make it possible to repeat a sequence of statements over and over. The body of an iteration construct is an indented sequence of statements.
-
The while construct has a boolean expression as its condition (much like
if ). The body is executed over and over until the expression in the condition evaluates to False , or a break statement is encountered.
def example1(n): i = 0 sum = 0 while i < n: sum = sum + i i = i + 1 return sum
def example2(n): i = 0 sum = 0 while True: sum = sum + i i = i + 1 if i == n: break return sum
-
The for construct makes it possible to repeat a sequence of statements once for every object in a list, tuple, or set, or once for every key in a dictionary.
def example3(n): sum = 0 for i in range(0,n): sum = sum + i return sum
def example4(d): sum = 0 for key in d: sum = sum + d[key] return sum
[link] Haskell
The Haskell programming language will be among the languages we use in this course. This language supports the functional programming paradigm, has automatic memory managememt, and natively supports algebraic data types. Haskell is both an interpreted and compiled language.[link] Haskell modules and declarations
Haskell code is organized into modules. There is one named module per Haskell file, and the file name should match the module name (e.g., the file Example.hs should contain the Example module definition). The module body consists of a series of declarations, which can appear in any order; the only exception is that if there are multiple declarations for a function, they must all be grouped together.
module Example where
g (x) = f(x) + f(x);
f (0) = 0; f (x) = f(x - 1) + 1;
Each declaration in a Haskell module falls into one of two categories: an expression-level declaration, or a type-level declaration. Any declaration that defines variables or functions is an expression-level declaration; any declaration that defines a type is a type-level declaration.
f(x) = 2 * x
data Tree = Leaf | Node Tree Tree
[link] Common data structures (i.e., Haskell expressions)
The Haskell base library (sometimes also called the "Prelude", and imported implicitly and automatically into every module) has a large number of standard and conventional data type and function declarations. These declarations provide native support for a variety of data structures and operations that we will use throughout this course: integers, strings, lists, and tuples, among others. In this subsection, we present how instances of these data structures are represented in Haskell, as well as the most common operations and functions that can be applied to these data structure instances.-
The Bool data type supports two constructors:
True and False .- The usual logical operations are available using the operators
&& , || , and not . - The built-in syntax
if ... then ... else ... is also available (similar to Python's ... if ... else ... and C/C++/Java's ternary operator ... ? ... : ... .
*> True True *> False False *> True || False && True && (not False) True *> if True then "Yes" else "No" "Yes"
-
Integers are written as in most other programming languages (i.e., as a sequence of digits). Haskell supports a bounded-size type
Int and an unbounded-size type Integer .- The usual arithmetic operations are available using the operators
+ , * , - , / , and ^ . The Prelude function div represents integer division. The Prelude functions min and max make it possible to compute the minimum and maximum of two integers, respectively, and the function minimum and maximum make it possible to compute the minimum and maximum of a list of integers. Negative integers are prefixed with the negation operator - . - The usual relational operators
== , /= (not equal), < , > , <= , >= are available.
*> 123 123 *> 1 * (2 + 3) `div` 4 - 5 -4 *> 4 * 5 >= 19 True *> max 10 20 20 *> minimum [1,2,3] 1
-
Strings are delimited by
" characters (individual characters are always delimited using ' ). Strings can be treated as lists of characters.- The empty string is denoted using
"" . - Two strings can be concatenated using
++ . - The
Prelude function length returns the length of a string.
*> "Example." "Example." *> length "ABCD" 4 *> "ABCD" ++ "EFG" "ABCDEFG"
-
Lists are ordered sequences of expressions. The entries of a list must all be of the same type. Lists are delimited using
[ and ] , with the individual list entries separated by commas.- The empty list is denoted using
[] . - Elements can be added to the front of a list using
: . - Two lists can be concatenated using
++ . - The
Prelude function length returns the length of a list. - To check if a value is in a list, use the
Prelude function elem (note that this relies on the == operation being defined on the type of the elements in the list).
*> [1,2,3] [1,2,3] *> 1 : [2,3] [1,2,3] *> [1,2,3] ++ [4,5] [1,2,3,4,5] *> length [1,2,3,4] 4 *> 1 `elem` [1, 2] True
-
Tuples are ordered, and can contain expressions of different types. They are delimited by parentheses
( and ) , with entries separated by commas.- The empty tuple is denoted using
() . - There are no tuples containing a single element;
(e) is equivalent to e for any Haskell expression e .
|