title | filename | chapternum
---|---|---
Mathematical Background | lec_00_1_math_background | 1
- Recall basic mathematical notions such as sets, functions, numbers, logical operators and quantifiers, strings, and graphs.
- Rigorously define Big-$O$ notation.
- Construct proofs by induction.
- Practice with reading mathematical definitions, statements, and proofs.
- Transform an intuitive argument into a rigorous proof.
"I found that every number, which may be expressed from one to ten, surpasses the preceding by one unit: afterwards the ten is doubled or tripled ... until a hundred; then the hundred is doubled and tripled in the same manner as the units and the tens ... and so forth to the utmost limit of numeration.", Muhammad ibn Mūsā al-Khwārizmī, 820, translation by Fredric Rosen, 1831.
In this chapter we review some of the mathematical concepts that we use in this book. These concepts are typically covered in courses or textbooks on "mathematics for computer science" or "discrete mathematics"; see the "Bibliographical Notes" section (notesmathchap{.ref}) for several excellent resources on these topics that are freely available online.
A mathematician's apology. Some students might wonder why this book contains so much math. The reason is that mathematics is simply a language for modeling concepts in a precise and unambiguous way. In this book we use math to model the concept of computation. For example, we will consider questions such as "is there an efficient algorithm to find the prime factors of a given integer?". (We will see that this question is particularly interesting, touching on areas as far apart as Internet security and quantum mechanics!) To even phrase such a question, we need to give a precise definition of the notion of an algorithm, and of what it means for an algorithm to be efficient. Also, since there is no empirical experiment to prove the nonexistence of an algorithm, the only way to establish such a result is using a mathematical proof.
Depending on your background, you can approach this chapter in two different ways:
- If you have already taken "discrete mathematics", "mathematics for computer science" or similar courses, you do not need to read the whole chapter. You can just take a quick look at secmathoverview{.ref} to see the main tools we will use, notationsec{.ref} for our notation and conventions, and then skip ahead to the rest of this book. Alternatively, you can sit back, relax, and read this chapter just to get familiar with our notation, as well as to enjoy (or not) my philosophical musings and attempts at humor.

- If your background is less extensive, see notesmathchap{.ref} for some resources on these topics. This chapter briefly covers the concepts that we need, but you may find it helpful to see a more in-depth treatment. As usual with math, the best way to get comfortable with this material is to work out exercises on your own.

- You might also want to start brushing up on discrete probability, which we'll use later in this book (see probabilitychap{.ref}).
The main mathematical concepts we will use are the following. We just list these notions below, deferring their definitions to the rest of this chapter. If you are familiar with all of these, then you might want to just skip to notationsec{.ref} to see the full list of notation we use.
- Proofs: First and foremost, this book involves a heavy dose of formal mathematical reasoning, which includes mathematical definitions, statements, and proofs.

- Sets and set operations: We will make extensive use of mathematical sets. We use the basic set relations of membership ($\in$) and containment ($\subseteq$), and set operations, principally union ($\cup$), intersection ($\cap$), and set difference ($\setminus$).

- Cartesian product and Kleene star operation: We also use the Cartesian product of two sets $A$ and $B$, denoted as $A \times B$ (that is, $A \times B$ is the set of pairs $(a,b)$ where $a\in A$ and $b\in B$). We denote by $A^n$ the $n$-fold Cartesian product (e.g., $A^3 = A \times A \times A$) and by $A^*$ (known as the Kleene star) the union of $A^n$ for all $n \in \{0,1,2,\ldots\}$.

- Functions: The domain and codomain of a function, properties such as being one-to-one (also known as injective) or onto (also known as surjective) functions, as well as partial functions (that, unlike standard or "total" functions, are not necessarily defined on all elements of their domain).

- Logical operations: The operations AND ($\wedge$), OR ($\vee$), and NOT ($\neg$) and the quantifiers "there exists" ($\exists$) and "for all" ($\forall$).

- Basic combinatorics: Notions such as $\binom{n}{k}$ (the number of $k$-sized subsets of a set of size $n$).

- Graphs: Undirected and directed graphs, connectivity, paths, and cycles.

- Big-$O$ notation: $O,o,\Omega,\omega,\Theta$ notation for analyzing asymptotic growth of functions.

- Discrete probability: We will use probability theory, and specifically probability over finite sample spaces such as tossing $n$ coins, including notions such as random variables, expectation, and concentration. We will only use probability theory in the second half of this text, and will review it beforehand in probabilitychap{.ref}. However, probabilistic reasoning is a subtle (and extremely useful!) skill, and it's always good to start early in acquiring it.
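Several of these notions map directly onto constructs that can be experimented with in code. The following Python sketch (used purely for illustration; nothing in it is part of the formal development) demonstrates set operations, the Cartesian product, and a finite slice of the Kleene star:

```python
from itertools import product

A = {0, 1}
B = {1, 2}

# Basic set operations.
assert A | B == {0, 1, 2}          # union
assert A & B == {1}                # intersection
assert A - B == {0}                # set difference
assert 0 in A and A <= {0, 1, 2}   # membership and containment

# Cartesian product: A^3 = A x A x A, the set of all triples over A.
A3 = set(product(A, repeat=3))
assert len(A3) == len(A) ** 3      # |A^n| = |A|^n

# A finite slice of the Kleene star A*: all tuples of length 0, 1, 2, 3.
A_star_upto_3 = set()
for n in range(4):
    A_star_upto_3 |= set(product(A, repeat=n))
assert len(A_star_upto_3) == 1 + 2 + 4 + 8  # |A^0|+|A^1|+|A^2|+|A^3|
```

Of course $A^*$ itself is infinite; the loop above only enumerates the lengths up to $3$.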
In the rest of this chapter we briefly review the above notions. This is partly to remind you of, and reinforce, material that might not be fresh in your mind, and partly to introduce our notation and conventions, which might occasionally differ from those you've encountered before.
Mathematicians use jargon for the same reason that it is used in many other professions, such as engineering, law, and medicine. We want to make terms precise and introduce shorthand for concepts that are frequently reused. Mathematical texts tend to "pack a lot of punch" per sentence, and so the key is to read them slowly and carefully, parsing one symbol at a time.
With time and practice you will see that reading mathematical texts becomes easier and jargon is no longer an issue. Moreover, reading mathematical texts is one of the most transferable skills you could take from this book. Our world is changing rapidly, not just in the realm of technology, but also in many other human endeavors, whether it is medicine, economics, law or even culture. Whatever your future aspirations, it is likely that you will encounter texts that use new concepts that you have not seen before (see alphagozerofig{.ref} and zerocashfig{.ref} for two recent examples from current "hot areas"). Being able to internalize and then apply new definitions can be hugely important. It is a skill that's much easier to acquire in the relatively safe and stable context of a mathematical course, where one at least has the guarantee that the concepts are fully specified, and you have access to your teaching staff for questions.
The basic components of a mathematical text are definitions, assertions and proofs.
Mathematicians often define new concepts in terms of old concepts. For example, here is a mathematical definition which you may have encountered in the past (and will see again shortly):
Let
onetoonedef{.ref} captures a simple concept, but even so it uses quite a bit of notation.
When reading such a definition, it is often useful to annotate it with a pen as you're going through it (see onetoonedefannotatedef{.ref}).
For example, when you see an identifier such as
{#onetoonedefannotatedef .margin }
Theorems, lemmas, claims and the like are true statements about the concepts we defined. Deciding whether to call a particular statement a "Theorem", a "Lemma" or a "Claim" is a judgement call, and does not make a mathematical difference. All three correspond to statements that were proven to be true. The difference is that a Theorem refers to a significant result that we would want to remember and highlight. A Lemma often refers to a technical result that is not necessarily important in its own right, but that can often be very useful in proving other theorems. A Claim is a "throwaway" statement that we need in order to prove some bigger result, but do not care so much about for its own sake.
Mathematical proofs are the arguments we use to demonstrate that our theorems, lemmas, and claims are indeed true. We discuss proofs in proofsbackgroundsec{.ref} below, but the main point is that the mathematical standard of proof is very high. Unlike in some other realms, in mathematics a proof is an "airtight" argument that demonstrates that the statement is true beyond a shadow of a doubt. Some examples in this section for mathematical proofs are given in simplepathlemex{.ref} and topsortsec{.ref}. As mentioned in the preface, as a general rule, it is more important you understand the definitions than the theorems, and it is more important you understand a theorem statement than its proof.
In this section we quickly review some of the mathematical objects (the "basic data structures" of mathematics, if you will) we use in this book.
A set is an unordered collection of objects.
For example, when we write
We can define sets by either listing all their elements or by writing down a rule that they satisfy such as
$$
\text{EVEN} = \{ x \;|\; \text{$x$ is an even natural number} \}
$$
Of course there is more than one way to write the same set, and often we will use intuitive notation listing a few examples that illustrate the rule.
For example, we can also define
Note that a set can be either finite (such as the set
Operations on sets: The union of two sets
Tuples, lists, strings, sequences: A tuple is an ordered collection of items. For example
Cartesian product: If
There are several sets that we will use in this book time and again. The set
$$
\N = \{ 0, 1, 2, \ldots \}
$$
contains all natural numbers, i.e., non-negative integers.
For any natural number
We will also occasionally use the set
Strings: Another set we will use time and again is
$$
\{0,1\}^n = \{ (x_0,\ldots,x_{n-1}) \;:\; x_0,\ldots,x_{n-1} \in \{0,1\} \}
$$
which is the set of all
We will write the string
For every string
We will also often talk about the set of binary strings of all lengths, which is
Another way to write this set is as $$ \{0,1\}^* = \{0,1\}^0 \cup \{0,1\}^1 \cup \{0,1\}^2 \cup \cdots $$ or more concisely as
The set
Generalizing the star operation: For every set
Concatenation: The concatenation of two strings
If
Input | Output |
---|---|
0 | 0 |
1 | 1 |
2 | 0 |
3 | 1 |
4 | 0 |
5 | 1 |
6 | 0 |
7 | 1 |
8 | 0 |
9 | 1 |
Table: An example of a function.
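The table above describes the function mapping each of the numbers $0,\ldots,9$ to its remainder modulo $2$. As a quick sketch, the same function can be transcribed and checked programmatically (the dictionary below is just a transcription of the table):

```python
# Transcription of the table: f(x) = x mod 2 on the domain {0,...,9}.
f = {x: x % 2 for x in range(10)}

assert f[4] == 0 and f[7] == 1
# f is onto {0,1} but not one-to-one: distinct inputs share an output.
assert set(f.values()) == {0, 1}
assert f[0] == f[2]
```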
If
Giving a bijection between two sets is often a good way to show they have the same size.
In fact, the standard mathematical definition of the notion that "$S$ and
Partial functions: We will sometimes be interested in partial functions from
The notion of partial functions is a strict generalization of functions, and so every function is a partial function, but not every partial function is a function. (That is, for every non-empty
Basic facts about functions: Verifying that you can prove the following results is an excellent way to brush up on functions:
- If $F:S \rightarrow T$ and $G:T \rightarrow U$ are one-to-one functions, then their composition $H:S \rightarrow U$ defined as $H(s)=G(F(s))$ is also one-to-one.

- If $F:S \rightarrow T$ is one-to-one, then there exists an onto function $G:T \rightarrow S$ such that $G(F(s))=s$ for every $s\in S$.

- If $G:T \rightarrow S$ is onto then there exists a one-to-one function $F:S \rightarrow T$ such that $G(F(s))=s$ for every $s\in S$.

- If $S$ and $T$ are non-empty finite sets then the following conditions are equivalent to one another: (a) $|S| \leq |T|$, (b) there is a one-to-one function $F:S \rightarrow T$, and (c) there is an onto function $G:T \rightarrow S$. These equivalences are in fact true even for infinite $S$ and $T$. For infinite sets the condition (b) (or equivalently, (c)) is the commonly accepted definition for $|S| \leq |T|$.
{#functionsdiagrampng .margin }
You can find the proofs of these results in many discrete math texts, including for example, Section 4.5 in the Lehman-Leighton-Meyer notes.
However, I strongly suggest you try to prove them on your own, or at least convince yourself that they are true by proving special cases of those for small sizes (e.g.,
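In that spirit, checking special cases for small sets can even be automated: since there are only finitely many functions between two finite sets, one can enumerate them all. The sketch below (sizes chosen arbitrarily) spot-checks two of the facts by brute force:

```python
from itertools import product

S, T = range(2), range(3)  # small sets: |S| = 2, |T| = 3

def all_functions(dom, cod):
    """Enumerate every function dom -> cod as a tuple of outputs."""
    return list(product(cod, repeat=len(dom)))

def one_to_one(f):
    return len(set(f)) == len(f)

def onto(f, cod):
    return set(f) == set(cod)

# The composition of two one-to-one functions is one-to-one.
for f in all_functions(S, T):
    for g in all_functions(T, range(4)):
        if one_to_one(f) and one_to_one(g):
            h = tuple(g[f[s]] for s in S)
            assert one_to_one(h)

# (b) <=> (c): a one-to-one F: S -> T exists iff an onto G: T -> S exists,
# and both hold iff |S| <= |T|.
exists_injection = any(one_to_one(f) for f in all_functions(S, T))
exists_surjection = any(onto(g, S) for g in all_functions(T, S))
assert exists_injection == exists_surjection == (len(S) <= len(T))
```

Such exhaustive checks are of course no substitute for a proof, but they are a good way to catch a wrong conjecture early.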
Let us prove one of these facts as an example:
If
Choose some
Graphs are ubiquitous in Computer Science, and many other fields as well. They are used to model a variety of data types including social networks, scheduling constraints, road networks, deep neural nets, gene interactions, correlations between observations, and a great many more. Formal definitions of several kinds of graphs are given next, but if you have not seen graphs before in a course, I urge you to read up on them in one of the sources mentioned in notesmathchap{.ref}.
Graphs come in two basic flavors: undirected and directed.^[It is possible, and sometimes useful, to think of an undirected graph as the special case of a directed graph that has the special property that for every pair
{#graphsexampefig .margin offset="1.5in"}
An undirected graph
Given this definition, we can define several other properties of graphs and their vertices.
We define the degree of
Here are some basic facts about undirected graphs. We give some informal arguments below, but leave the full proofs as exercises (the proofs can be found in many of the resources listed in notesmathchap{.ref}).
In any undirected graph
degreesegeslem{.ref} can be shown by seeing that every edge
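The degree/edge relationship (the sum of all degrees equals twice the number of edges, since every edge is counted once from each endpoint) can be checked on any small example; the graph below is invented for illustration:

```python
# An undirected graph as a set of edges (frozensets of two vertices).
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
vertices = set().union(*edges)

# Degree of v = number of edges touching v.
degree = {v: sum(v in e for e in edges) for v in vertices}

# Sum of degrees = 2 * number of edges.
assert sum(degree.values()) == 2 * len(edges)
```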
The connectivity relation is transitive, in the sense that if
conntranslem{.ref} can be shown by simply attaching a path of the form
For every undirected graph
simplepathlem{.ref} can be shown by "shortcutting" any non-simple path from
::: {.solvedexercise title="Connected vertices have simple paths" #simplepathlemex} Prove simplepathlem{.ref} :::
::: {.solution data-ref="simplepathlemex"}
The proof follows the idea illustrated in shortcutpathfig{.ref}.
One complication is that there can be more than one vertex that is visited twice by a path, and so "shortcutting" might not necessarily result in a simple path; we deal with this by looking at a shortest path between
Let
::: {.remark title="Finding proofs" #comingupwithproofs} simplepathlemex{.ref} is a good example of the process of finding a proof. You start by ensuring you understand what the statement means, and then come up with an informal argument why it should be true. You then transform the informal argument into a rigorous proof. This proof need not be very long or overly formal, but should clearly establish why the conclusion of the statement follows from its assumptions. :::
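The "shortest paths are simple" idea in the solution above is also algorithmic: a breadth-first search finds a shortest, and hence simple, path between two connected vertices. A minimal sketch (the adjacency list is invented for illustration):

```python
from collections import deque

def shortest_path(adj, s, t):
    """BFS from s; returns a shortest path s -> t, or None if disconnected."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:   # walk parent pointers back to s
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:    # visit each vertex at most once
                parent[v] = u
                queue.append(v)
    return None

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
p = shortest_path(adj, 0, 3)
assert p == [0, 2, 3]
assert len(set(p)) == len(p)  # a shortest path never repeats a vertex
```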
The concepts of degrees and connectivity extend naturally to directed graphs, defined as follows.
A directed graph
A directed graph might contain both
We say that
The lemmas we mentioned above have analogs for directed graphs. We again leave the proofs (which are essentially identical to their undirected analogs) as exercises.
In any directed graph
In any directed graph
For every directed graph
::: {.remark title="Labeled graphs" #labeledrem}
For some applications we will consider labeled graphs, where the vertices or edges have associated labels (which can be numbers, strings, or members of some other set).
We can think of such a graph as having an associated (possibly partial) labelling function
If
Suppose that
For example, the following is a formalization of the true statement that there exists a natural number
"For sufficiently large
The following shorthands for summing up or taking products of several numbers are often convenient.
If
and
For example, the sum of the squares of all numbers from
Since summing up over intervals of integers is so common, there is a special notation for it. For every two integers,
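Such sums and products translate directly into code. A small sketch (with $n=10$ chosen arbitrarily; the closed form in the first assertion is the standard identity $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$):

```python
n = 10

# The sum of the squares of all numbers from 1 to n.
total = sum(i * i for i in range(1, n + 1))
assert total == n * (n + 1) * (2 * n + 1) // 6  # == 385 for n = 10

# Product notation works the same way: n! as a running product.
fact = 1
for i in range(1, n + 1):
    fact *= i
assert fact == 3628800  # 10!
```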
In mathematics, as in coding, we often have symbolic "variables" or "parameters".
It is important to be able to understand, given some formula, whether a given variable is bound or free in this formula.
For example, in the following statement
Since
The same issue appears when parsing code. For example, in the following snippet from the C programming language
for (int i=0 ; i<n ; i=i+1) {
printf("*");
}
the variable `i` is bound within the `for` block but the variable `n` is free.
The main property of bound variables is that we can rename them (as long as the new name doesn't conflict with another used variable) without changing the meaning of the statement. Thus for example the statement
is equivalent to aboutnstmt{.eqref} in the sense that it is true for exactly the same set of
Similarly, the code
for (int j=0 ; j<n ; j=j+1) {
printf("*");
}
produces the same result as the code above that used `i` instead of `j`.
::: {.remark title="Aside: mathematical vs programming notation" #notationrem}
Mathematical notation has a lot of similarities with programming language, and for the same reasons.
Both are formalisms meant to convey complex concepts in a precise way.
However, there are some cultural differences.
In programming languages, we often try to use meaningful variable names such as NumberOfVertices
while in math we often use short identifiers such as
One consequence of that is that in mathematics we often end up reusing identifiers, and also "run out" of letters and hence use Greek letters too, as well as distinguish between small and capital letters and different font faces.
Similarly, mathematical notation tends to use quite a lot of "overloading", using operators such as
Both fields have a notion of "types", and in math we often try to reserve certain letters for variables of a particular type.
For example, variables such as
Kun's book [@Kun18] contains an extensive discussion on the similarities and differences between the cultures of mathematics and programming. :::
"$\log\log\log n$ has been proved to go to infinity, but has never been observed to do so.", Anonymous, quoted by Carl Pomerance (2000)
It is often very cumbersome (and also unnecessary) to describe quantities such as running time precisely, since we are typically mostly interested in the "higher order terms".
That is, we want to understand the scaling behavior of the quantity as the input variable grows.
For example, as far as running time goes, the difference between an
Generally (though still informally), if
::: {.definition title="Big-$O$ notation" #bigohdef}
Let
We say that $F =o(G)$ if for every
It's often convenient to use "anonymous functions" in the context of
$O$ is not equality. Using the equality sign for
There are some simple heuristics that can help when trying to compare two functions
- Multiplicative constants don't matter in $O$-notation, and so if $F(n)=O(G(n))$ then $100F(n)=O(G(n))$.

- When adding two functions, we only care about the larger one. For example, for the purpose of $O$-notation, $n^3+100n^2$ is the same as $n^3$, and in general in any polynomial, we only care about the larger exponent.

- For every two constants $a,b>0$, $n^a = O(n^b)$ if and only if $a \leq b$, and $n^a = o(n^b)$ if and only if $a<b$. For example, combining the two observations above, $100n^2+10n+100 = o(n^3)$.

- Polynomial is always smaller than exponential: $n^a = o(2^{n^\epsilon})$ for every two constants $a>0$ and $\epsilon>0$ even if $\epsilon$ is much smaller than $a$. For example, $100n^{100} = o(2^{\sqrt{n}})$.

- Similarly, logarithmic is always smaller than polynomial: $(\log n)^a$ (which we write as $\log^a n$) is $o(n^\epsilon)$ for every two constants $a,\epsilon>0$. For example, combining the observations above, $100n^2 \log^{100} n = o(n^3)$.
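These heuristics can be sanity-checked numerically: if $F(n)=o(G(n))$ then the ratio $F(n)/G(n)$ should shrink toward zero as $n$ grows. A rough sketch (the sample values of $n$ are arbitrary, and no finite check is a proof):

```python
import math

def ratio(F, G, n):
    return F(n) / G(n)

F = lambda n: 100 * n**2 + 10 * n + 100   # 100n^2 + 10n + 100
G = lambda n: n**3

# The ratio decreases, consistent with F(n) = o(G(n)).
r1, r2, r3 = (ratio(F, G, n) for n in (10, 1000, 100000))
assert r1 > r2 > r3

# Logarithmic vs polynomial: log^3(n) = o(n^0.5); the ratio also shrinks.
H = lambda n: math.log(n) ** 3
K = lambda n: n ** 0.5
assert ratio(H, K, 10**6) < ratio(H, K, 10**3)
```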
::: {.remark title="Big
Many people think of mathematical proofs as a sequence of logical deductions that starts from some axioms and ultimately arrives at a conclusion. In fact, some dictionaries define proofs that way. This is not entirely wrong, but at its essence, a mathematical proof of a statement X is simply an argument that convinces the reader that X is true beyond a shadow of a doubt.
To produce such a proof you need to:
- Understand precisely what X means.

- Convince yourself that X is true.

- Write your reasoning down in plain, precise and concise English (using formulas or notation only when they help clarity).
In many cases, the first part is the most important one. Understanding what a statement means is oftentimes more than halfway towards understanding why it is true. In the third part, to convince the reader beyond a shadow of a doubt, we will often want to break down the reasoning to "basic steps", where each basic step is simple enough to be "self-evident". The combination of all steps yields the desired statement.
There is a great deal of similarity between the process of writing proofs and that of writing programs, and both require a similar set of skills. Writing a program involves:
- Understanding what is the task we want the program to achieve.

- Convincing yourself that the task can be achieved by a computer, perhaps by planning on a whiteboard or notepad how you will break it up into simpler tasks.

- Converting this plan into code that a compiler or interpreter can understand, by breaking up each task into a sequence of the basic operations of some programming language.
In programs as in proofs, step 1 is often the most important one. A key difference is that the reader for proofs is a human being and the reader for programs is a computer. (This difference is eroding with time as more proofs are being written in a machine verifiable form; moreover, to ensure correctness and maintainability of programs, it is important that they can be read and understood by humans.) Thus our emphasis is on readability and having a clear logical flow for our proof (which is not a bad idea for programs as well). When writing a proof, you should think of your audience as an intelligent but highly skeptical and somewhat petty reader, who will "call foul" at every step that is not well justified.
A mathematical proof is a piece of writing, but it is a specific genre of writing with certain conventions and preferred styles. As in any writing, practice makes perfect, and it is also important to revise your drafts for clarity.
In a proof for the statement
- Is this sentence or equation stating that some statement is true?

- If so, does this statement follow from the previous steps, or are we going to establish it in the next step?

- What is the role of this sentence or equation? Is it one step towards proving the original statement, or is it a step towards proving some intermediate claim that you have stated before?

- Finally, would the answers to questions 1-3 be clear to the reader? If not, then you should reorder, rephrase, or add explanations.
Some helpful resources on mathematical writing include this handout by Lee, this handout by Hutchings, as well as several of the excellent handouts in Stanford's CS 103 class.
"If it was so, it might be; and if it were so, it would be; but as it isn’t, it ain’t. That’s logic.", Lewis Carroll, Through the looking-glass.
Just like in programming, there are several common patterns of proofs that occur time and again. Here are some examples:
Proofs by contradiction: One way to prove that
There are no natural numbers
Suppose, towards a contradiction that this is false, and so let
Proofs of a universal statement: Often we want to prove a statement
For every natural number
Let
Proofs of an implication: Another common case is that the statement
If
Suppose that
Rearranging the terms of eq:quadeq{.eqref} we get $$ s^2/(4a) + c - b^2/(4a) = (b^2-4ac)/(4a) + c - b^2/(4a) = 0 $$
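Assuming the standard setting of this argument, where $s = \sqrt{b^2-4ac}$ (so $s^2 = b^2-4ac$) and the candidate root is $x=(-b+s)/(2a)$, the algebra can be double-checked numerically. The coefficients below are arbitrary, chosen so that $b^2 \geq 4ac$:

```python
import math

# Hypothetical coefficients with b^2 >= 4ac, so a real root exists.
a, b, c = 2.0, 7.0, 3.0
s = math.sqrt(b * b - 4 * a * c)   # s^2 = b^2 - 4ac; here s = 5

x = (-b + s) / (2 * a)             # candidate root of ax^2 + bx + c
assert abs(a * x * x + b * x + c) < 1e-9

# The rearrangement above: s^2/(4a) + c - b^2/(4a) == 0.
assert abs(s * s / (4 * a) + c - b * b / (4 * a)) < 1e-9
```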
Proofs of equivalence: If a statement has the form "$A$ if and only if
Proofs by combining intermediate claims:
When a proof is more complex, it is often helpful to break it apart into several steps.
That is, to prove the statement
Proofs by case distinction: This is a special case of the above, where to prove a statement
Proofs by induction: We discuss induction and give an example in inductionsec{.ref} below. We can think of such proofs as a variant of the above, where we have an unbounded number of intermediate claims
"Without loss of generality (w.l.o.g)": This term can be initially quite confusing. It is essentially a way to simplify proofs by case distinctions. The idea is that if Case 1 is equal to Case 2 up to a change of variables or a similar transformation, then the proof of Case 1 will also imply the proof of Case 2. It is always a statement that should be viewed with suspicion. Whenever you see it in a proof, ask yourself if you understand why the assumption made is truly without loss of generality, and when you use it, try to see if the use is indeed justified. When writing a proof, sometimes it might be easiest to simply repeat the proof of the second case (adding a remark that the proof is very similar to the first one).
::: {.remark title="Hierarchical Proofs (optional)" #lamportrem} Mathematical proofs are ultimately written in English prose. The well-known computer scientist Leslie Lamport argues that this is a problem, and proofs should be written in a more formal and rigorous way. In his manuscript he proposes an approach for structured hierarchical proofs, that have the following form:
- A proof for a statement of the form "If $A$ then $B$" is a sequence of numbered claims, starting with the assumption that $A$ is true, and ending with the claim that $B$ is true.

- Every claim is followed by a proof showing how it is derived from the previous assumptions or claims.

- The proof for each claim is itself a sequence of subclaims.
The advantage of Lamport’s format is that the role that every sentence in the proof plays is very clear. It is also much easier to transform such proofs into machine-checkable forms. The disadvantage is that such proofs can be tedious to read and write, with less differentiation between the important parts of the arguments versus the more routine ones. :::
In this section we will prove the following: every directed acyclic graph (DAG, see DAGdef{.ref}) can be arranged in layers so that for all directed edges
We start with the following definition. A layering of a directed graph is a way to assign for every vertex
Let
In this section we prove that a directed graph is acyclic if and only if it has a valid layering.
Let
To prove such a theorem, we need to first understand what it means. Since it is an "if and only if" statement, topologicalsortthm{.ref} corresponds to two statements:
For every directed graph
For every directed graph
To prove topologicalsortthm{.ref} we need to prove both acyclictosortlem{.ref} and sorttoacycliclem{.ref}.
sorttoacycliclem{.ref} is actually not that hard to prove.
Intuitively, if
::: {.proof data-ref="sorttoacycliclem"}
Let
acyclictosortlem{.ref} corresponds to the more difficult (and useful) direction. To prove it, we need to show how, given an arbitrary DAG
If you have not seen the proof of this theorem before (or don't remember it), this would be an excellent point to pause and try to prove it yourself.
One way to do it would be to describe an algorithm that given as input a directed acyclic graph
There are several ways to prove acyclictosortlem{.ref}. One approach to do is to start by proving it for small graphs, such as graphs with 1, 2 or 3 vertices (see topsortexamplesfig{.ref}, for which we can check all the cases, and then try to extend the proof for larger graphs). The technical term for this proof approach is proof by induction.
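The inductive idea can also be turned into a concrete procedure: repeatedly remove a vertex with no incoming edges (a "source", which must exist in any DAG) and assign layers so that each vertex's layer exceeds those of all its in-neighbors. The following is a sketch, not the text's official algorithm; the example DAG is invented:

```python
def layer(graph):
    """Assign every vertex of a DAG a layer so that every edge u -> v
    satisfies layer[u] < layer[v].  `graph` maps each vertex to its
    out-neighbors.  Raises ValueError if the graph contains a cycle."""
    in_deg = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            in_deg[v] += 1

    f = {}
    sources = [v for v in graph if in_deg[v] == 0]  # no in-neighbors
    for v in sources:
        f[v] = 0
    while sources:
        u = sources.pop()
        for v in graph[u]:
            # v's layer must exceed that of every in-neighbor.
            f[v] = max(f.get(v, 0), f[u] + 1)
            in_deg[v] -= 1
            if in_deg[v] == 0:
                sources.append(v)
    if len(f) != len(graph):
        raise ValueError("graph has a cycle")
    return f

G = {0: [1, 2], 1: [3], 2: [3], 3: []}
f = layer(G)
assert all(f[u] < f[v] for u in G for v in G[u])
assert f == {0: 0, 1: 1, 2: 1, 3: 2}
```

Note that this particular sketch assigns each vertex the length of the longest path ending at it, which happens to be the smallest valid layering.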
{#topsortexamplesfig .margin }
Induction is simply an application of the self-evident Modus Ponens rule that says that if (a) $P$ is true and (b) $P$ implies $Q$, then $Q$ is true.
In the setting of proofs by induction we typically have a statement
Proofs by induction are closely related to algorithms by recursion.
In both cases we reduce solving a larger problem to solving a smaller instance of itself. In a recursive algorithm to solve some problem P on an input of length
There are several ways to prove acyclictosortlem{.ref} by induction.
We will use induction on the number
$Q(n)$ is "For every DAG $G=(V,E)$ with $n$ vertices, there is a layering of $G$."
The statement for
To do so, we need to somehow find a way, given a graph
The above is the intuition behind the proof of acyclictosortlem{.ref}, but when writing the formal proof below, we use the benefit of hindsight, and try to streamline what was a messy journey into a linear and easy-to-follow flow of logic that starts with the word "Proof:" and ends with "QED" or the symbol
::: {.proof data-ref="acyclictosortlem"}
Let
We make the following claim:
Claim:
Proof of Claim: Suppose otherwise that every vertex
Given the claim, we can let
We claim that
- Case 1: $u \neq v_0$, $v \neq v_0$. In this case the edge $u \rightarrow v$ exists in the graph $G'$ and hence by the inductive hypothesis $f'(u) < f'(v)$, which implies that $f'(u)+1 < f'(v)+1$.

- Case 2: $u=v_0$, $v \neq v_0$. In this case $f(u)=0$ and $f(v) = f'(v)+1>0$.

- Case 3: $u \neq v_0$, $v=v_0$. This case can't happen since $v_0$ does not have in-neighbors.

- Case 4: $u=v_0, v=v_0$. This case again can't happen since it means that $v_0$ is its own in-neighbor --- it is involved in a self-loop, which is a form of cycle that is disallowed in an acyclic graph.
Thus,
Reading a proof is no less of an important skill than producing one.
In fact, just like understanding code, it is a highly non-trivial skill in itself.
Therefore I strongly suggest that you re-read the above proof, asking yourself at every sentence whether the assumption it makes is justified, and whether this sentence truly demonstrates what it purports to achieve.
Another good habit is to ask yourself when reading a proof for every variable you encounter (such as
topologicalsortthm{.ref} guarantees that for every DAG
::: {.theorem title="Minimal layering is unique" #minimallayeruniquethm}
Let
For every layering
The definition of minimality in minimallayeruniquethm{.ref} implies that for every vertex
The idea is to prove the theorem by induction on the layers.
If
::: {.proof data-ref="minimallayeruniquethm"}
Let
We will prove that
::: { .pause } The proof of minimallayeruniquethm{.ref} is fully rigorous, but is written in a somewhat terse manner. Make sure that you read through it and understand why this is indeed an airtight proof of the Theorem's statement. :::
Most of the notation we use in this book is standard and is used in most mathematical texts. The main points where we diverge are:
- We index the natural numbers $\N$ starting with $0$ (though many other texts, especially in computer science, do the same).

- We also index the set $[n]$ starting with $0$, and hence define it as $\{0,\ldots,n-1\}$. In other texts it is often defined as $\{1,\ldots, n\}$. Similarly, we index our strings starting with $0$, and hence a string $x\in \{0,1\}^n$ is written as $x_0x_1\cdots x_{n-1}$.

- If $n$ is a natural number then $1^n$ does not equal the number $1$ but rather is the length-$n$ string $11\cdots 1$ (that is, a string of $n$ ones). Similarly, $0^n$ refers to the length-$n$ string $00 \cdots 0$.

- Partial functions are functions that are not necessarily defined on all inputs. When we write $f:A \rightarrow B$ this means that $f$ is a total function unless we say otherwise. When we want to emphasize that $f$ can be a partial function, we will sometimes write $f: A \rightarrow_p B$.

- As we will see later on in the course, we will mostly describe our computational problems in terms of computing a Boolean function $f: \{0,1\}^* \rightarrow \{0,1\}$. In contrast, many other textbooks refer to the same task as deciding a language $L \subseteq \{0,1\}^*$. These two viewpoints are equivalent, since for every set $L\subseteq \{0,1\}^*$ there is a corresponding function $F$ such that $F(x)=1$ if and only if $x\in L$. Computing partial functions corresponds to the task known in the literature as solving a promise problem. Because the language notation is so prevalent in other textbooks, we will occasionally remind the reader of this correspondence.

- We use $\ceil{x}$ and $\floor{x}$ for the "ceiling" and "floor" operators that correspond to "rounding up" or "rounding down" a number to the nearest integer. We use $(x \mod y)$ to denote the "remainder" of $x$ when divided by $y$. That is, $(x \mod y) = x - y\floor{x/y}$. In contexts where an integer is expected we'll typically "silently round" the quantities to an integer. For example, if we say that $x$ is a string of length $\sqrt{n}$ then this means that $x$ is of length $\lceil \sqrt{n} \rceil$. (We round up for the sake of convention, but in most such cases it will not make a difference whether we round up or down.)

- Like most Computer Science texts, we default to the logarithm in base two. Thus, $\log n$ is the same as $\log_2 n$.

- We will also use the notation $f(n)=poly(n)$ as a shorthand for $f(n)=n^{O(1)}$ (i.e., as shorthand for saying that there are some constants $a,b$ such that $f(n) \leq a\cdot n^b$ for every sufficiently large $n$). Similarly, we will use $f(n)=polylog(n)$ as shorthand for $f(n)=poly(\log n)$ (i.e., as shorthand for saying that there are some constants $a,b$ such that $f(n) \leq a\cdot (\log n)^b$ for every sufficiently large $n$).

- As is often the case in mathematical literature, we use the apostrophe character to enrich our set of identifiers. Typically if $x$ denotes some object, then $x'$, $x''$, etc. will denote other objects of the same type.

- To save on "cognitive load" we will often use round constants such as $10$, $100$, or $1000$ in the statements of both theorems and problem set questions. When you see such a "round" constant, you can typically assume that it has no special significance and was just chosen arbitrarily. For example, if you see a theorem of the form "Algorithm $A$ takes at most $1000\cdot n^2$ steps to compute function $F$ on inputs of length $n$" then probably the number $1000$ is an arbitrary sufficiently large constant, and one could prove the same theorem with a bound of the form $c \cdot n^2$ for a constant $c$ that is smaller than $1000$. Similarly, if a problem asks you to prove that some quantity is at least $n/100$, it is quite possible that in truth the quantity is at least $n/d$ for some constant $d$ that is smaller than $100$.
Like programming, mathematics is full of variables. Whenever you see a variable, it is always important to keep track of what its type is (e.g., whether the variable is a number, a string, a function, a graph, etc.). To make this easier, we try to stick to certain conventions and consistently use certain identifiers for variables of the same type. Some of these conventions are listed in notationtable{.ref} below. These conventions are not immutable laws and we might occasionally deviate from them. Also, such conventions do not replace the need to explicitly declare for each new variable the type of object that it denotes.
---
caption: 'Conventions for identifiers in this book'
alignment: LL
table-width: 1/3
id: notationtable
---
| *Identifier* | *Often denotes object of type* |
|---|---|
| $i$, $j$, $k$, $\ell$, $m$, $n$ | Natural numbers (i.e., in $\mathbb{N} = \{0,1,2,\ldots \}$) |
| $\epsilon,\delta$ | Small positive real numbers (very close to $0$) |
| $x,y,z,w$ | Typically strings in $\{0,1\}^*$ though sometimes numbers or other objects. We often identify an object with its representation as a string. |
| $G$ | A _graph_. The set of $G$'s vertices is typically denoted by $V$. Often $V=[n]$. The set of $G$'s edges is typically denoted by $E$. |
| $S$ | Set |
| $f,g,h$ | Functions. We often (though not always) use lowercase identifiers for _finite functions_, which map $\{0,1\}^n$ to $\{0,1\}^m$ (often $m=1$). |
| $F,G,H$ | Infinite (unbounded input) functions mapping $\{0,1\}^*$ to $\{0,1\}^*$ or $\{0,1\}^*$ to $\{0,1\}^m$ for some $m$. Based on context, the identifiers $G,H$ are sometimes used to denote functions and sometimes graphs. |
| $A,B,C$ | Boolean circuits |
| $M,N$ | Turing machines |
| $P,Q$ | Programs |
| $T$ | A function mapping $\mathbb{N}$ to $\mathbb{N}$ that corresponds to a time bound. |
| $c$ | A positive number (often an unspecified constant; e.g., $T(n)=O(n)$ corresponds to the existence of $c$ s.t. $T(n) \leq c \cdot n$ for every $n>0$). We sometimes use $a,b$ in a similar way. |
| $\Sigma$ | Finite set (often used as the _alphabet_ for a set of strings). |
Mathematical texts often employ certain conventions or "idioms". Some examples of such idioms that we use in this text include the following:
- "Let $X$ be $\ldots$", "let $X$ denote $\ldots$", or "let $X= \ldots$": These are all different ways for us to say that we are defining the symbol $X$ to stand for whatever expression is in the $\ldots$. When $X$ is a property of some objects we might define $X$ by writing something along the lines of "We say that $\ldots$ has the property $X$ if $\ldots$". While we often try to define terms before they are used, sometimes a mathematical sentence reads easier if we use a term before defining it, in which case we add "where $X$ is $\ldots$" to explain how $X$ is defined in the preceding expression.

- Quantifiers: Mathematical texts involve many quantifiers such as "for all" and "exists". We sometimes spell these in words as in "for all $i\in\N$" or "there is $x\in \{0,1\}^*$", and sometimes use the formal symbols $\forall$ and $\exists$. It is important to keep track of which variable is quantified in what way, and of the dependencies between the variables. For example, a sentence fragment such as "for every $k >0$ there exists $n$" means that $n$ can be chosen in a way that depends on $k$. The order of quantifiers is important. For example, the following is a true statement: "for every natural number $k>1$ there exists a prime number $n$ such that $n$ divides $k$." In contrast, the following statement is false: "there exists a prime number $n$ such that for every natural number $k>1$, $n$ divides $k$."

- Numbered equations, theorems, definitions: To keep track of all the terms we define and statements we prove, we often assign them a (typically numeric) label, and then refer back to them in other parts of the text.

- (i.e.,), (e.g.,): Mathematical texts tend to contain quite a few of these expressions. We use $X$ (i.e., $Y$) in cases where $Y$ is equivalent to $X$ and $X$ (e.g., $Y$) in cases where $Y$ is an example of $X$ (e.g., one can use phrases such as "a natural number (i.e., a non-negative integer)" or "a natural number (e.g., $7$)").

- "Thus", "Therefore", "We get that": This means that the following sentence is implied by the preceding one, as in "The $n$-vertex graph $G$ is connected. Therefore it contains at least $n-1$ edges." We sometimes use "indeed" to indicate that the following text justifies the claim that was made in the preceding sentence, as in "The $n$-vertex graph $G$ has at least $n-1$ edges. Indeed, this follows since $G$ is connected."

- Constants: In Computer Science, we typically care about how our algorithms' resource consumption (such as running time) scales with certain quantities (such as the length of the input). We refer to quantities that do not depend on the length of the input as constants and so often use statements such as "there exists a constant $c>0$ such that for every $n\in \N$, Algorithm $A$ runs in at most $c \cdot n^2$ steps on inputs of length $n$." The qualifier "constant" for $c$ is not strictly needed but is added to emphasize that $c$ here is a fixed number independent of $n$. In fact sometimes, to reduce cognitive load, we will simply replace $c$ by a sufficiently large round number such as $10$, $100$, or $1000$, or use $O$-notation and write "Algorithm $A$ runs in $O(n^2)$ time."
- The basic "mathematical data structures" we'll need are numbers, sets, tuples, strings, graphs and functions.
- We can use basic objects to define more complex notions. For example, graphs can be defined as a list of pairs.
- Given precise definitions of objects, we can state unambiguous and precise statements. We can then use mathematical proofs to determine whether these statements are true or false.
- A mathematical proof is not a formal ritual but rather a clear, precise and "bulletproof" argument certifying the truth of a certain statement.
- Big-$O$ notation is an extremely useful formalism to suppress less significant details and allows us to focus on the high-level behavior of quantities of interest.
- The only way to get comfortable with mathematical notions is to apply them in the contexts of solving problems. You should expect to need to go back time and again to the definitions and notation in this chapter as you work through problems in this course.
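As a concrete illustration of how Big-$O$ notation suppresses constants while capturing high-level behavior, the following sketch (our own example) compares $1000 n^2$ with $n^3$: the large constant dominates for small $n$, but the faster-growing function eventually overtakes it.

```python
def f(n):
    return 1000 * n ** 2  # large constant, but quadratic growth

def g(n):
    return n ** 3  # no constant, but cubic growth

assert f(10) > g(10)        # for small n, the constant 1000 dominates
assert f(1000) == g(1000)   # crossover: 1000 * n^2 = n^3 exactly at n = 1000
assert f(10**6) < g(10**6)  # asymptotically n^3 wins: f(n) = O(g(n)), not vice versa
```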
::: {.exercise title="Logical expressions" #logicalex}
a. Write a logical expression
b. Write a logical expression
:::
::: {.exercise title="Quantifiers" #quantifiersex}
Use the logical quantifiers
a. An expression
b. An expression
:::
::: {.exercise }
Describe the following statement in English words:
:::
::: {.exercise title="Set construction notation" #setsdescription} Describe in words the following sets:
a.
b.
:::
::: {.exercise title="Existence of one to one mappings" #cardinalitiesex}
For each one of the following pairs of sets
a. Let
b. Let
c. Let
:::
::: {.exercise title="Inclusion Exclusion" #inclex }
a. Let
b. Let
c. Let
:::
::: {.exercise #onetoonesize }
Prove that if
:::
::: {.exercise #ontosize}
Prove that if
:::
::: {.exercise }
Prove that for every finite
:::
::: {.exercise }
Suppose that $\{ S_n \}_{n\in \N}$ is a sequence such that $S_0 \leq 10$ and for $n>1$, $S_n \leq 5 S_{\lfloor \tfrac{n}{5} \rfloor} + 2n$.
Prove by induction that
:::
::: {.exercise }
Prove that for every undirected graph
:::
::: {.exercise title="$O$-notation" #ohnotationex}
For every pair of functions
a.
b.
c.
d.
e.
:::
::: {.exercise}
Give an example of a pair of functions
:::
::: {.exercise #graphcycleex}
Prove that for every undirected graph
:::
::: {.exercise #indsetex}
Prove that for every undirected graph
:::
The heading "A mathematician's apology" refers to Hardy's classic book [@Hardy41]. Even when Hardy is wrong, he is very much worth reading.
There are many online sources for the mathematical background needed for this book. In particular, the lecture notes for MIT 6.042 "Mathematics for Computer Science" [@LehmanLeightonMeyer] are extremely comprehensive, and videos and assignments for this course are available online. Similarly, Berkeley CS 70: "Discrete Mathematics and Probability Theory" has extensive lecture notes online.
Other sources for discrete mathematics are Rosen [@Rosen19discrete] and Jim Aspens' online book [@AspensDiscreteMath]. Lewis and Zax [@LewisZax19], as well as the online book of Fleck [@Fleck], give a more gentle overview of much of the same material. Solow [@Solow14] is a good introduction to proof reading and writing. Kun [@Kun18] gives an introduction to mathematics aimed at readers with programming backgrounds. Stanford's CS 103 course has a wonderful collection of handouts on mathematical proof techniques and discrete mathematics.
The word graph in the sense of undirgraph{.ref} was coined by the mathematician Sylvester in 1878 in analogy with the chemical graphs used to visualize molecules.
There is an unfortunate confusion between this term and the more common usage of the word "graph" as a way to plot data, and in particular a plot of some function
Carl Pomerance's quote is taken from the home page of Doron Zeilberger.