Chapter 0 – Fundamentals of Mathematics

An overview of this chapter’s contents and take-aways can be found here.

This introductory chapter tries to illustrate the value mathematics holds for the economics profession and deals with fundamental, overarching concepts that may be viewed as prerequisites for any mathematical application and study, regardless of its purpose or context. You might have come across some or even most of them during your undergraduate studies, but even if you have, it may be worthwhile to take the time to review and re-organize them in your mind, and perhaps also refresh your memory a bit.

Why Mathematics?

Usually, it is easier to find motivation for studying abstract and complex matters if you know why you’re doing it. You may ask yourself, “Hey, I signed up for an Economics program! Why should I bother with a Math course, rather than reviews in Micro- or Macroeconomics, or perhaps Econometrics?” So, why is Mathematics of central importance to the economics profession?

As economists we work with mathematical models to describe economic problems, and we use statistical and numerical estimation techniques to infer properties of these models. Even if you do not consider doing research in economics later on in your professional life, you will most likely encounter problems that can only be described and solved using mathematics!

Notation and Logic

Mathematics can be thought of as a language. In this view, the vocabulary of mathematics consists of a large set of commonly agreed-upon mathematical notation (numbers, variables, functions, operators for e.g. addition (+) and multiplication (\times), etc.) and symbols (e.g. \Rightarrow, \subset, \in, etc.). Any combination of items in the vocabulary constitutes a statement, which can be judged as to whether it is meaningful and, if so, whether it is true. These statements can then be combined to form logical arguments that convey information about mathematical facts and relationships.

Mathematical Notation and Symbols (or: Vocabulary)

People “speaking” (or making use of) the mathematical language confine themselves to a rather narrow set of expressions. These expressions help you to write down statements more efficiently. The table below gives a brief overview of those symbols most important for economists. In addition, the sets of numbers \mathbb R, \mathbb N, \mathbb C, \mathbb Q and \mathbb Z, i.e. real, natural, complex, rational numbers and integers, are very important and should be familiar to you. Don’t worry if you can’t remember everything in the table just yet; you will see and use these symbols often enough.

\begin{tabular}{cc|cc}\hline \multicolumn{2}{l|}{\textit{Basics and Quantifiers}} & \multicolumn{2}{|l}{\textit{Logical Statements}}\\ Symbol & Meaning & Symbol & Meaning \\\hline\hline $\exists$ & there exists & $\Rightarrow$ & implies\\ $\exists !$ & there exists exactly one & $\Leftrightarrow$ & is equivalent to\\ $\nexists$ & there does not exist (any) & $\Leftarrow$ & is implied by\\ $\forall$ & for all & $\land$ & logical ``and''\\ $:$ & which/for which/such that & $\lor$ & logical ``or''\\ & (alternatively: ``it holds that'') & $\neg$ & logical ``not''\\ $\in$ & element of & $(\ldots)$ & delimiters of statement\\ $\not\in$ & not an element of & & \\ \hline\hline \end{tabular}

In the first column, the word quantifier, which refers to the first four symbols, may be new to you. They are important since we frequently make use of quantifying statements, that is, expressions of the form

    \begin{equation*}{\color{orange}\text{Quantifier}} + {\color{blue}\text{Considered Elements}} + {\color{red}\text{Property}} \end{equation*}

which indicate whether a certain property holds for all (\forall), some (\exists), exactly one (\exists ! ) or none (\nexists) of the elements considered. To see this in a simple example, consider the mathematical way of saying that all natural numbers are non-negative:

\begin{tabular}{ccc}{\color{orange}$\forall$} {\color{blue}$n \in \mathbb{N}$}: {\color{red}$n\geq 0$}  & \text{or equivalently} &{\color{orange}$\not\exists$} {\color{blue}$n \in \mathbb{N}$}: {\color{red}$n < 0$}.\end{tabular}
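For readers who like to experiment, quantifying statements translate naturally into code: Python’s built-ins all() and any() mirror \forall and \exists over finite collections. A small sketch (a finite range stands in for the infinite set \mathbb N, so this is an illustration, not a proof):

```python
# Quantified statements over a finite sample of natural numbers can be
# checked with Python's all() and any() built-ins.
naturals = range(0, 100)  # a finite stand-in for the (infinite) set N

# "for all n in N: n >= 0"
forall_nonneg = all(n >= 0 for n in naturals)

# "there does not exist n in N with n < 0"
nexists_neg = not any(n < 0 for n in naturals)

print(forall_nonneg, nexists_neg)  # True True
```

As in the mathematical notation, the two formulations express exactly the same fact, and the code confirms they evaluate identically.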

Now let us have a look at the table’s second column. The logical “or” has a meaning slightly different from standard English, and more precisely translates to “and/or”, i.e. it does not preclude that both statements are true, but requires at least one to be true. As an example, consider the true statement (1\in\mathbb R \lor 1\in\mathbb N), which asserts that the number 1 is an element of the real numbers “or” an element of the natural numbers. The logical “not” reads as “it is not the case that” and inverts the meaning of a statement, asserting the exact opposite. We can frequently re-write these statements as more natural expressions, e.g. \neg(1.5\in\mathbb N) as 1.5\not\in\mathbb N, or \neg(\exists n\in\mathbb N: n\cdot \pi\in\mathbb N) as \nexists n\in\mathbb N: n\cdot \pi\in\mathbb N. This simplification follows exactly the same logic that we also use in “normal” English where, for instance, “there are some people who own cats” is a more direct way of saying “not all people do not own cats”.
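The equivalence of the two negated forms can also be checked computationally on a finite sample. The Python sketch below uses floating-point arithmetic and a finite range as stand-ins for \pi and \mathbb N (so it is an illustration rather than a proof) and confirms that “not (\exists\ldots)” and “\nexists\ldots” evaluate identically:

```python
import math

naturals = range(1, 1000)  # a finite stand-in for the natural numbers

# "not (exists n in N: n*pi in N)" versus "nexists n in N: n*pi in N" --
# the second form is just a more compact way of writing the first.
exists_form = not any((n * math.pi).is_integer() for n in naturals)
nexists_form = all(not (n * math.pi).is_integer() for n in naturals)

print(exists_form, nexists_form)  # True True: the two forms agree
```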


In our discussion of the table above, we have already considered statements without explicitly defining them. Formally, a statement is any combination of mathematical vocabulary. A statement may or may not be meaningful: it is meaningful if reading it out verbally gives you a grammatically correct English sentence. This makes assessing meaningfulness relatively intuitive and straightforward.

Furthermore, statements may or may not be true. Note that meaningful mathematical statements need not be true; consider e.g. (5 > 10) or (\forall n \in \mathbb N: n < 5). A statement that is not meaningful can never be true: if there is no meaning, there is nothing to be contrasted against the universe of “true” circumstances. To (re-)familiarize yourself with these notions, consider Table 2, where a denotes “Alaska”, b “Berlin”, and EC the set of European capitals.

 \begin{tabular}{c|c|c}\hline English & Mathematical & meaningful (M) and true (T)?\\\hline\hline Alaska European capital & $a$ $EC$ & not M\\ Alaska is a European capital & $a\in EC$ & M but not T\\ Berlin is a European capital & $b\in EC$ & M and T \\\hline \end{tabular}

Let’s get some more practice with notation and mathematical statements and have a look at the table below. Try to first cover columns 2 and 3 and see whether you can identify the verbal meaning of the mathematical statement and assess whether it is true.


\begin{tabular}{c|c|c}\hline Math. Statement & Verbal Meaning & Statement true?\\\hline\hline $\exists x\in\mathbb R: x\in\mathbb Q$ & There exists a real number $x$ which is a rational number. & $+$ (e.g. 1)\\$\forall x\in\mathbb R: x\in\mathbb Q$ & For all real numbers $x$, it holds that $x$ is a rational number. & $-$ (e.g. $\pi$)\\$(x\geq 5 \land y\leq 4)\Rightarrow x\geq y$ & $x$ being greater than 5 and $y$ being smaller than $4$ & $+$ \\& implies $x$ being greater than $y$. & \\$(x\geq 5 \lor y\leq 4)\Rightarrow x > y$ & $x$ being greater than 5 or $y$ being smaller than $4$ & $-$ \\& implies $x$ being strictly greater than $y$. & \\$\forall x\in\mathbb R:( \exists ! y\in\mathbb R: x=y)$ & For all real numbers $x$ it holds that there exists exactly & $+$\\& one real number $y$ for which $x$ is equal to $y$.\\\hline \end{tabular}

Note that the delimiting brackets are of crucial importance in the third and fourth statement, because only with them is it clear that the implication (\Rightarrow) refers to the whole statement in parentheses rather than just to “y\leq 4”. In the last example, on the other hand, they are just there to increase clarity and would probably be left out by many. Also note that “greater/smaller than” includes equality unless we use the prefix “strictly”. Similarly, we call y\geq x (y>x) a weak (strict) inequality. As a last comment on the table above, while it may take some time to get used to the notation, you should be able to clearly see the notation’s value added by simply comparing the space needed for the mathematical and verbal statements.
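If you want to double-check such statements mechanically, implications can be tested over a grid of values, using the fact that P \Rightarrow Q fails only when P holds and Q does not. The Python sketch below spot-checks statements 3 and 4 from the table; note that the grid of integers from -10 to 10 is an arbitrary finite sample, so it can refute but never prove the universal statements:

```python
import itertools

# Statement 3: (x >= 5 and y <= 4) => (x >= y).
# Statement 4: (x >= 5 or  y <= 4) => (x >  y).
# An implication P => Q is encoded as (not P) or Q.
values = range(-10, 11)

stmt3_holds = all((not (x >= 5 and y <= 4)) or (x >= y)
                  for x, y in itertools.product(values, values))
stmt4_holds = all((not (x >= 5 or y <= 4)) or (x > y)
                  for x, y in itertools.product(values, values))

print(stmt3_holds, stmt4_holds)  # True False
```

For statement 4, the grid quickly finds a counterexample such as x = 0, y = 0: the premise holds (y \leq 4) but the conclusion x > y fails.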

If you feel like testing your understanding of the discussion thus far, you can take a short quiz found here.

From Statements to Arguments and Reasoning

Most of the time, it is rather easily assessed whether individual statements, as we have seen thus far, are true. The more essential part of mathematical analysis is how certain statements relate to each other. We call an argument an assertion of a relationship between two (sets of) statements which we call the premise(s) and conclusion(s), respectively. Typically, the assertion is either that the premises imply (\Rightarrow) the conclusion or that they are equivalent to it (\Leftrightarrow).

In the illustration above, black dots refer to individual statements. Indeed, you have already seen two arguments in the last table, namely statements 3 and 4. There, you have also seen that the individual premises x\geq 5 and y\leq 4 could be combined into a single statement, namely (x\geq 5\land y\leq 4). Indeed, this is what we always do to collect our premises when expressing the argument as a mathematical statement. From the examples above, you can also see that an argument is nothing but a special type of statement, namely one that asserts a relationship between individual sub-statements!

We can use basic logic to internally investigate whether the argument “makes sense”, i.e. whether the asserted relationship between premises and conclusion holds, while remaining agnostic about the plausibility of the premises. If it does, we call the argument valid. The established properties and definitions unambiguously determine whether an argument is valid or not. For instance, the argument that the premises “ants are taller than humans” and “humans are taller than elephants” imply the conclusion that “ants are taller than elephants” is indeed valid, since the conclusion logically follows from the premises.

If a valid argument additionally has true premises, it is called sound. It is worthwhile to stress that validity is always needed for soundness – an invalid argument can never be sound! Unlike with validity, the assessment of soundness, i.e. whether or not some premises are true, may be context-specific. For instance, the argument that if “f is a differentiable function” and if “any differentiable function is continuous”, then “f is a continuous function” is valid. Whether or not it is sound depends on the premises – the latter premise is, as we will see in Chapter 3, a general statement that is always true, whereas the former depends on how the concrete function f is defined in the context we are concerned with.

To familiarize ourselves with arguments, let us consider some examples (to evaluate soundness, assume that the geographical relationships we consider are those of the real world):

\begin{tabular}{c|cc|c|c|cc}\hline Nr. & Premise 1 & Premise 2 & ass. rel. & Conclusion & valid & sound\\\hline\hline1 & Berlin is the German capital & Germany is part of Europe & $\Rightarrow$ & Berlin is a European capital & + & +\\2 & Berlin is the Chinese capital & China is part of Europe & $\Rightarrow$ & Berlin is a European capital & + & -\\3 & Berlin is the French capital & Italy is part of Asia & $\Rightarrow$ & Berlin is a European capital & - & \\4 & Berlin is the German capital & Germany is part of Europe & $\Leftrightarrow$ & Berlin is a European capital & - & \\5 & I was born blind & My blindness has never been healed & $\Leftrightarrow$ & I have always been blind & + & -\\\hline \end{tabular}

First, for Nr. 3, the premises do not preclude the conclusion. Therefore, provided that the premises are true, the conclusion may still be true as well. However, the premises do not give us enough information to assert that the conclusion must be true when the premises are true. The invalidity of Nr. 4 is for a similar reason, try to find out why exactly. Finally, Nr. 5 serves as an example for a valid statement of equivalence and a case where one statement (here: the conclusion) has multiple implications (here: the premises). Depending on who makes the argument (i.e. who “I” refers to precisely), this argument could also be sound, but given that you are reading this text in your browser with your eyes, it cannot be sound here.

Typically, mathematical theory is more concerned with argument validity rather than soundness.  Theory provides us with theorems and propositions that tell us that “if this and that is true, then also some other property will be true”. You can find an abundance of examples in the remainder of the course, but to make the point very clear, let’s consider the so-called Weierstrass Extreme Value Theorem (its content is not important at this point, do not worry if this does not make sense yet), which states that “If (premise 1) f is a continuous function and (premise 2) f has a compact domain then (conclusion) f must assume a global maximum and minimum.” Here, f is an unspecified, hypothetical function. For concrete functions, the premises may or may not be true, but this is not essential for the usefulness of this theorem and the validity of the statement.

On the other hand, if you are working on some exercise problems or writing an exam, you will frequently be given concrete contexts (in the example of the Weierstrass theorem: concrete functions) to work with. Then, you will likely refer to all the valid arguments that you know from your textbooks and try to make sound arguments with them. Say, for instance, you are given some utility function and are asked whether it has a global maximum. Then, if your argument is that by the Weierstrass Extreme Value Theorem, this function must have a global maximum, it depends on the precise function that you are given whether your argument is sound or not.

Excursion: So what exactly do we mean by “basic logic”?

Above, we stated that the rules which determine whether an argument is valid or not come from “basic logic”. While the expression itself may give you some idea of what we mean by this, we have thus far not made explicit how to think about this concept precisely.

Basic logic can be thought of as the fundamental rules that determine whether a certain mathematical argument is valid.

To see this abstract elaboration in action, let us consider how basic logic helps us in the example of ants, humans and elephants, where we can in fact proceed intuitively. We know logically that if one thing is larger than another, and this other thing is again larger than a third thing, then the first thing must also be larger than the third – this is just common sense. Mathematically, all we do in this example is compare positive real numbers with each other: the elements in the sets of ants’ (A), humans’ (H) and elephants’ (E) numerical heights. We consider the argument

(\forall a\in A:(\forall h\in H: a>h)\land \forall h\in H:(\forall e\in E: h>e)) \Rightarrow (\forall a\in A:(\forall e\in E: a>e))

The basic mathematical reason that this relationship is true (i.e. the “mathematical common sense” that justifies the argument) is transitivity of the “strictly-greater-than” relation on the real numbers, namely that if for x,y,z\in\mathbb R, x>y and y>z, then also x>z.

In this rather simple example, transitivity of the “>”-relation is the entire fundamental mathematical reason why this argument is valid. Still, it is already non-obvious how exactly this circumstance justifies validity of the argument. This is especially true for more complex arguments that depend on a multitude of fundamental mathematical facts. This is typically where mathematical proofs come in: they provide a step-wise decomposition of how fundamental mathematical circumstances make certain arguments valid or invalid.

Implications, Equivalence, and Necessary and Sufficient Conditions

Throughout their careers, economists hear the words “necessary” and “sufficient” quite a lot. If you have been thinking thoroughly about the three logical arrows in the notation table, you will not have a hard time understanding what follows.

Suppose we are interested in some statement S. A necessary condition for S must hold for S to be true. It need not guarantee truth of S. Thus, S is true only if the necessary condition is satisfied. A sufficient condition for S guarantees that S is true. However, it need not hold for S to be true. Thus, S is true if the sufficient condition is satisfied. Finally, an equivalent condition for S (i) must hold for S to be true and (ii) guarantees that S is true. Thus, S is true if and only if the equivalent condition is satisfied. By its definition, the equivalent condition is also both a necessary and a sufficient condition.

Let us consider an example, and let us define S:= (\forall x\in\mathbb R: f(x) \geq 0), where f is some function mapping from and to the real numbers \mathbb R, which we do not specify any further for now. Then, a necessary condition for S would be N = (f(3) \geq 0): S can be true only if N is satisfied. A sufficient condition is S' = (\nexists x\in\mathbb R: f(x) < 1). If S' holds, this guarantees that S does as well; however, there are many examples of functions f where S holds even though S' is violated (for example, think about f(x) = x^2). Finally, an equivalent condition is E = (\nexists x\in\mathbb R: f(x) < 0).

If this is not fully clear to you now, think about whether S is true for the following specific examples of f (you may consult the conditions N, S', E defined above):

  1. f(x) = x-4
  2. f(x) = 1 + \max\{x, 0\}
  3. f(x) = x^2 - x


1: S is not true (N violated); 2: S is true (S' is true); 3: S is not true (E violated: e.g. f(0.5) = 0.25 - 0.5 = -0.25 < 0).


In terms of our logical arrows, let C be the condition. If C is necessary for S, then C is implied by S: C\Leftarrow S. If instead, C is sufficient for S, then C implies S: C\Rightarrow S. And if C is an equivalent condition for S, then C is equivalent to S: C\Leftrightarrow S.

If we want to establish S, sufficient and equivalent conditions typically make us happy: their truth is enough to know that S is true. With a necessary condition C, on the other hand, we only know that S cannot be true unless C is also true (C\Leftarrow S). This may help disprove S: if C is not true, or respectively, the opposite of C, \neg C, is true, then S is not true (and \neg S is true): \neg C \Rightarrow \neg S. Thus, violation of the necessary condition implies violation of the statement of interest. Notice the relationship of negation and implication: we have just argued that C\Leftarrow S is equivalent to \neg C \Rightarrow \neg S. This means that, for any given implication, when considering the inverted/negated statements, you can always just “flip” the implication arrow. Go through the argument again and make sure that you logically understand why this works!
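The “flipping” rule can be verified mechanically: an implication P \Rightarrow Q is true exactly when \neg P \lor Q holds, so we can simply exhaust all four truth-value combinations. A small Python sketch (the helper name `implies` is our own):

```python
import itertools

# Verify, by exhausting all truth-value combinations, that "C is implied
# by S" (S => C) is logically equivalent to "not C implies not S".
def implies(p, q):
    return (not p) or q

for C, S in itertools.product([True, False], repeat=2):
    assert implies(S, C) == implies(not C, not S)

print("equivalence holds for all truth values")
```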

To give you some practice, consider the United Kingdom (UK)’s definition of an economic recession, which states that an economy is in a state of recession whenever GDP growth has been negative for at least two consecutive quarters. Consider the following conditions – which ones are necessary, sufficient, equivalent or nothing at all for the German economy currently being in a recession?

  1. German GDP growth was negative in the last quarter.
  2. German GDP growth was at -1% constantly throughout the last year.
  3. The average of German GDP growth during the last two quarters was -0.25%.
  4. The average of German GDP growth during the last two quarters was below zero.
  5. German GDP growth was below zero both in the last and the second-to-last quarter.


1: necessary, 2: sufficient, 3: nothing, 4: necessary, 5: equivalent.
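The interplay of these conditions can also be illustrated computationally. In the Python sketch below (the function name `in_recession` and all growth figures are invented for illustration), condition 4 holds for both hypothetical economies while only one of them is actually in recession – a necessary condition need not be sufficient:

```python
# Each economy's history is a list of quarterly GDP growth rates in
# percent, most recent quarter last.
def in_recession(growth):
    """UK-style definition: negative growth in the last two quarters."""
    return growth[-1] < 0 and growth[-2] < 0

recession_economy = [0.3, -0.5, -0.2]  # last two quarters negative
mixed_economy = [0.3, -1.0, 0.4]       # last-two average negative, but no recession

avg_last_two = lambda g: sum(g[-2:]) / 2

# Condition 4 (average of the last two quarters below zero):
print(in_recession(recession_economy), avg_last_two(recession_economy) < 0)  # True True
print(in_recession(mixed_economy), avg_last_two(mixed_economy) < 0)          # False True
```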


In the mathematical context relevant to economists, you come across necessary and sufficient conditions mostly in optimization, where we frequently deal with them (mostly related to second derivatives) when investigating whether a solution constitutes a maximum, a minimum, or neither. Therefore, they are at the heart of, amongst others, utility or profit maximization, cost minimization, and also error/deviation minimization of statistical estimators.

Excursion: Theorems, Propositions, Lemmas and Corollaries

When reading mathematical texts, you come across a range of “facts” with different names. If you are interested, you can find below a brief overview of what sets apart Theorems, Propositions, Lemmas and Corollaries, which make up almost all of these facts.

The most common “fact” is the proposition. It is a statement that is “interesting” by itself, and usually contains at least an important part or “setup” result for the purpose of the text. Since, by the nature of the word, some fact is proposed, propositions are always expected to come with a proof (and not merely a reference to a proposition in another text, as in “see Proposition 5 of Textbook XY”). Accordingly, all results labeled as “proposition” in the companion script of this course feature a proof allowing you to understand step by step why they are true.

A theorem is similar but distinct, as theorems are typically of greater importance than propositions, either to the text itself or in the relevant mathematical context. For instance, a mathematical paper would probably call its two to three main results theorems and other related, more technical insights propositions. Moreover, any fact of central importance to a mathematical (sub-)field is likely to be called a theorem; take again the example of the Weierstrass Extreme Value Theorem.

Next, a lemma typically has no immediate value for the insights to be taken away from a text, but rather, it provides a “helper fact” that facilitates proving a proposition. As such, lemmas most frequently occur directly before propositions requiring rather complex, multi-step proofs, and their predominant value lies in organizing the structure of the line of reasoning presented as proof.

Finally, a corollary is something that follows rather immediately – without any or at most with one to two lines of proof – from one or more other facts. But just because corollaries are easy to establish given previous considerations does not mean they are not important, and some very important theorems are indeed corollaries!

What is true for all of the concepts mentioned here is that they give you a mathematical fact. These facts can be complex and/or unintuitive, and it may be hard to immediately see why they are true. Naturally, you may therefore ask: when would we expect to see a proof? Well, for any of the concepts, when a text states them for the first time, they are expected to come with a proof immediately below to allow the reader to judge their validity. Further, if the text’s main purpose is educational, proofs are also given for existing results so that they don’t fall from the sky for the reader. If the proof is not too essential for what the text wants to convey, you will frequently see a reference to a resource giving the proof. Only if results are sufficiently well-established in the relevant mathematical context (e.g. the Weierstrass Extreme Value Theorem in the context of optimization) will you find that no proof is given at all.

Set Theory

As any good mathematical text should, let us begin our discussion of sets by defining what precisely we will be studying. Since this will be our first definition, the discussion below also outlines some general key insights into reading mathematical definitions.

Definition: Element, Set. A set is a collection of distinct objects, considered as a whole. An object s in a set S is called an element or member of S, denoted s\in S. For an object s' that is not an element of S, we write s'\not\in S.


It is important to make sure you know the meaning of every word in a definition. The emphasis here is on “the”, for mathematical expressions rarely have several meanings, as that could generate misunderstandings. The converse, however, is not true, as one can readily see from our definition where “element” and “member” are synonyms. The knowledge of these meanings is mostly gained by regular interaction with the words. In the above definition, for instance, the word “object” should be understood as “any entity that is of interest to the modeler.” Therefore, depending on the context, objects can be real numbers, but also functions, matrices, geometrical figures, or even sets themselves!

Moreover, in good mathematical definitions, no word is redundant, and the meaning does not go beyond what is written. In our example, the word “distinct” suggests that sets do not contain duplicates: thus, the collection \{1,2,\pi\} may represent a set, while \{1, 2, \pi, \pi\} may not. Moreover, “considered as a whole” suggests that the set itself should be seen as a distinct object. Conversely, the definition says nothing about the order of elements in a set, so that we may infer that the sets \{1,2,\pi\} and \{2, \pi, 1\} are identical.

In terms of notation, you are likely familiar with the way the sets above are written: two curly braces, and within them the characterization of the elements. The word “characterization” is used deliberately rather than “list” because, typically, the sets we deal with are too big to list all their elements, or even contain infinitely many of them – consider e.g. the set of natural numbers \mathbb N. More generally, we define sets by a mathematical statement, as introduced in the previous section, that characterizes the elements. How this works exactly can readily be seen from the definition of intervals given below. Note that it refers to the extended set of real numbers \bar{\mathbb R} that encompasses all real numbers, \mathbb R, as well as \{-\infty, +\infty\}.

Definition: Real-valued Interval. A real-valued interval is a set that contains all x\in\mathbb R in between two thresholds a,b\in\bar{\mathbb R}, a\leq b. We denote

 \begin{tabular}{cc} $[a,b]:=\{x\in\mathbb R : a\leq x \leq b\}$, & $[a,b):=\{x\in\mathbb R : a\leq x < b\}$, \\ $(a,b]:=\{x\in\mathbb R : a< x \leq b\}$, & $(a,b):=\{x\in\mathbb R : a < x < b\}$. \end{tabular}

If I = (a,b), I is called open, and if I=[a,b], we call I closed. Else, we call I semi-open. If a=-\infty, then the lower bound must be open. Conversely, if b=\infty, the upper bound must be open.


As can be seen, in terms of notation, a round bracket indicates that the threshold value is not included in the interval, whereas a square bracket indicates its inclusion.
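The bracket convention translates directly into inequality checks. Below is a minimal Python sketch (the function `in_interval` and its keyword flags are our own naming) that treats a round bracket as a strict inequality and a square bracket as a weak one, including the mandatory open bound at -\infty:

```python
import math

# Membership test for a real-valued interval between thresholds a and b.
def in_interval(x, a, b, closed_left=True, closed_right=True):
    left_ok = (a <= x) if closed_left else (a < x)
    right_ok = (x <= b) if closed_right else (x < b)
    return left_ok and right_ok

# [0, 1] contains the endpoint 0; (0, 1] does not.
print(in_interval(0, 0, 1))                                 # True
print(in_interval(0, 0, 1, closed_left=False))              # False
# a = -infinity with the (necessarily) open lower bound:
print(in_interval(-1e9, -math.inf, 0, closed_left=False))   # True
```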

In set theory, a key concept is the subset. For the sets A and X, we say that A is a subset of X, denoted by A\subseteq X, whenever all elements of A are contained in X, formally (\forall x\in A: x\in X). A is a proper subset of X, A \subset X, if all elements of A are contained in X but there is at least one element in X that is not an element of A, i.e. (A\subseteq X \land \exists x\in X: x\not\in A).
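Python’s built-in sets implement exactly these relations, which makes for a quick way to experiment: the comparison `<=` corresponds to \subseteq and `<` to the proper subset relation \subset. A small sketch:

```python
# Subset and proper subset relations on Python's frozenset:
# <= is "subset of", < is "proper subset of".
A = frozenset({1, 2})
X = frozenset({1, 2, 3})

print(A <= X)  # True:  every element of A is in X
print(A < X)   # True:  additionally, 3 is in X but not in A
print(X < X)   # False: a set is never a proper subset of itself
```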

The approach we adopt toward set theory is the so-called “naive” approach. It is naive in the sense that it is not axiomatic. For an economist, there are no costs but many benefits to following this simpler approach. In order to avoid paradoxes (the interested reader can have a look at Russell’s paradox), however, one needs to assume that every set we consider is itself a subset of a fixed, all-encompassing set called the universal (super)set, which we denote by X. In addition, one defines an “encompassed-by-all” subset, called the empty set, conventionally denoted \varnothing. For every set A, we thus have \varnothing \subseteq A \subseteq X. The empty set is always the same and contains no elements, while the universal set varies across applications, so that we may have X=\mathbb R when considering sets of real numbers, and X=\mathbb R^n for sets of real-valued vectors of length n\in\mathbb N.

Sets: Basic Concepts

Now it is time to consider some key concepts related to sets. To define them, let A,B be arbitrary sets and X the universal superset.

    1. Set equality. A and B are said to be equal whenever they contain the same elements, i.e. set equality “A=B” is equivalent to \forall x\in X: (x\in A\Leftrightarrow x\in B).
    2. Disjoint sets. A and B are said to be disjoint whenever they have no elements in common, i.e. \forall x\in X: ((x\in A\Rightarrow x\not\in B)\land(x\in B\Rightarrow x\not\in A)) (recall that “\land” is the logical “and”).
    3. Superset. B is a superset of A whenever A is a subset of B: B\supseteq A \Leftrightarrow A\subseteq B.
    4. Complement. B is the complement of A (with respect to X) whenever it contains all those elements of X that are not contained in A: B = \{x\in X: x\notin A\}. We usually denote the complement of A as A^c.

As with real numbers (addition, subtraction, etc.), we can perform operations on sets:

    1. Union. A\cup B := \{x\in X: (x\in A \lor x\in B)\} (with \lor as the logical “or”). The union contains all elements that are contained in A, B, or both.
    2. Intersection. A\cap B := \{x\in X: (x\in A \land x\in B)\}. The intersection contains all elements that are contained in both A and B.
    3. Difference. A\backslash B := \{x\in X: (x\in A \land x\not\in B)\}. The difference of A and B contains all elements of A that are not contained in B.

These operations greatly facilitate our lives in many dimensions: e.g. the somewhat awkward definition of disjoint sets above, where we required that \forall x\in X: ((x\in A\Rightarrow x\not\in B)\land(x\in B\Rightarrow x\not\in A)), can simply be re-written as A\cap B = \varnothing. The symbol “:=” indicates a defining equality, and is used whenever we introduce a new object of interest. (Note that, in accordance with the introduction of “:” in the table on notation and symbols, you can read “Let S:=\{\ldots\}” as “let S (be) such that it is equal to the set \ldots” In this sense, “:=” is not a new symbol, but rather a combination of two familiar ones!) Alternatively, you will sometimes see “\equiv”.
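All three operations, as well as the disjointness criterion just mentioned, are available on Python’s built-in set type, which can be handy for experimenting:

```python
# Union, intersection and difference on Python's built-in set type.
A = {1, 2, 3}
B = {3, 4}

print(A | B)   # union:        {1, 2, 3, 4}
print(A & B)   # intersection: {3}
print(A - B)   # difference:   {1, 2}

# Disjointness expressed as "A intersect B is empty":
print((A & B) == set())  # False, since 3 lies in both sets
```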

Many find it helpful to illustrate sets and operations on them using a “circle approach”. Here is an illustration of the set operations, where the circles denote the sets A and B, respectively:

The set difference A\backslash B corresponds to the red area, and B\backslash A to the blue area. The intersection A\cap B is the purple area, and the union A\cup B contains every colored area, i.e. red, purple and blue. The complement can be drawn only with respect to the superset:

The complement A^c of A (which is now given by the red circle and containing the white area) with respect to the universal superset X corresponds to the golden area.

Before moving to slightly more sophisticated issues related to sets, the following table gives an overview of the set notation discussed thus far:

 \begin{tabular}{cc|cc}\hline $\varnothing$ & the empty set & $\in$ & element of\\ $\subseteq$ & is a subset of & $\not\in$ & not an element of\\ $\subset$ & is a proper subset of & $\backslash$ & set difference\\ $\supseteq$ & is a superset of & $\cup$ & union \\ $\supset$ & is a proper superset of & $\cap$ & intersection \\\hline \end{tabular}

Sets of Sets, Index Sets and Set Operation Properties

So far, we have not yet explicitly addressed that elements of sets may be anything other than standard real numbers. To address this aspect, consider the power set: when A (A\subseteq X) denotes a set, the power set of A is \mathcal P(A) := \{S\subseteq X: S\subseteq A\}, i.e. the set of subsets of A. Note that \varnothing\in \mathcal P(A) for any A, as the empty set is the “encompassed-by-all” set introduced above. To give an example, \mathcal P(\{1,2\}) = \{\varnothing, \{1\}, \{2\}, \{1,2\} \}. Note that this class of sets is subject to a different universal set than \{1,2\}. However, it is easily verified that \mathcal P(X) is a suitable universal set for the power sets of A, A\subseteq X, as

     \[ \mathcal P(A) = \{S\subseteq X: S\subseteq A\} \subseteq \{S\subseteq X\} = \mathcal P(X)\hspace{0.5cm}\forall A\subseteq X. \]
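The power set construction can be mirrored in code; the sketch below (the helper `power_set` is our own, not a standard library function) enumerates the subsets of every possible size, from 0 (the empty set) up to |A|, using `itertools.combinations`:

```python
from itertools import chain, combinations

# The power set of A: all subsets of A, collected size by size.
def power_set(A):
    A = list(A)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))]

P = power_set({1, 2})
print(len(P))            # 4 subsets: {}, {1}, {2}, {1, 2}
print(frozenset() in P)  # True: the empty set belongs to every power set
```

Frozensets are used because Python’s mutable sets cannot themselves be elements of a set, which is a nice practical reminder that “sets of sets” need some care.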

At times, it may be convenient to give the individual objects in the set an index, so that we may write A = \{S_1,S_2,\ldots,S_n\}, n\in\mathbb N, or equivalently A = \{S_i:i\in\{1,\ldots,n\}\} = \{S_i\}_{i\in\{1,\ldots,n\}} (depending on your econometrics background, you may have seen that one writes samples of size n pairs of random variables in similar fashion, namely, \{(Y_1, X_1),\ldots,(Y_n, X_n)\} = \{(Y_i, X_i)\}_{i\in\{1,\ldots,n\}}). Of course, we can use a more general index set, denote it by I, that need not be equal to \{1,\ldots,n\} for an n\in\mathbb N. We distinguish finite, countable and uncountable index sets. The set is finite if (and only if) it contains only finitely many elements. The distinction between “countable” and “uncountable” is not too important here. When the elements of A are indeed sets, we can elegantly use the index set for short notations for multiple intersections or unions:

     \[ \cup A := \bigcup_{i \in I} S_i := \{x\in X : (\exists i\in I: x \in S_i)\},\hspace{0.5cm}\cap A := \bigcap_{i \in I} S_i := \{x\in X : (\forall i\in I: x \in S_i)\}. \]

Finally, we say that the collection A = \{S_i: i\in I\} of sets S_i is pairwise disjoint whenever any two elements of A are disjoint, i.e. \forall i,j\in I: (i\neq j \Rightarrow S_i\cap S_j = \varnothing).

As with the operations on real numbers, it is possible to establish a range of properties that set operations satisfy. Let us have a look at the ones most frequently used in economics:

Theorem: Properties of Set Operations. Let A,B,C\subseteq X for a universal set X and \mathcal S = \{S_i:i\in I\} for an index set I, where S_i\subseteq X \forall i\in I. The following properties hold:

(i) Commutativity: A \cup B = B \cup A and A \cap B = B \cap A.
(ii) Associativity: (A \cup B) \cup C = A \cup (B \cup C) and (A \cap B) \cap C = A \cap (B \cap C).
(iii) Distributivity: A \cup (B \cap C) = (A \cup B) \cap (A \cup C) and A \cap (B \cup C) = (A \cap B) \cup (A \cap C).
(iv) Simple De Morgan Laws: (A \cup B)^c= A^c \cap B^c and (A \cap B)^c= A^c \cup B^c.
(v) General De Morgan Laws: (\bigcup_{i \in I} S_i)^c=\bigcap_{i \in I} S^c_i and (\bigcap_{i \in I} S_i)^c=\bigcup_{i \in I} S^c_i.


These rules are a good opportunity to re-familiarize yourself with the expressions Commutativity, Associativity and Distributivity, and they may also be helpful in developing a better intuition for sets using the circle-approach introduced above – take a piece of paper and see whether you can visually “prove” the simple De Morgan laws!

Functions, Relations and Limits

The last introductory section is concerned with functions and limits. It gives an introduction to functions using the concept of relations, partly for formal precision, but also to remind you that relations, as you may come across in your micro-oriented classes when studying (consumer) preference, are nothing fancy, but just a generalization of the concept of functions.

Functions and Relations

To understand the concept of relations, consider the Cartesian product X\times Y of two sets X and Y, defined as

    \[ X\times Y = \{(x, y): x\in X \land y\in Y\}. \]

Then, a binary relation R from X to Y is nothing but a subset of X\times Y: R\subseteq X\times Y, and if (x,y)\in R, we say that y is an image of x under the relation R. The relation is binary because any (x,y) is either an element of R or not, and there is no (continuous) “degree of relatedness”. We write xRy or y\in R(x), where

(1)    \begin{equation*} R(x) = \{y\in Y: xRy\} = \{y\in Y: (x,y)\in R\} \hspace{0.5cm}\text{for any }x\in X. \end{equation*}

Note that the sets R(x), x\in X, are a complete characterization of the relation R, this will be important in a second. Moreover, for any fixed x\in X, R(x) can be empty or contain multiple arguments. As an example, consider X=Y=[0,1], where the relation R_1 is defined as the set

    \[R_1 = \{(x,y)\in [0,1]^2: x>y\}\]

where we use the common notation [0,1]^2:= [0,1]\times [0,1]. Then, R_1(x) = \{y\in [0,1]: y<x\}, so that R_1(0)=\varnothing and R_1(x) = [0,x) for any x>0. Another example that is frequently discussed in undergraduate economics courses (with varying degree of formality) are preference relations, where X=Y contains vectors of goods quantities, and for a consumer i, the relation is given by \{(x_1,x_2)\in X\times X: x_1 \succsim_i x_2\} and if  x_1 \succsim_i x_2, the consumer (weakly) prefers the consumption vector  x_1 to  x_2.

Intuitively, it should be rather natural that we can view a “function” as introduced in high-school courses as a relation, since the values for  x are related to  y=f(x) through the function. Indeed, this is what we call a function also more formally: any relation that assigns exactly one value  y to every argument  x. So, if we call  f a function, that means that for any  x,  f(x) must be a single object (e.g. real number, but also vectors, matrices, etc., as we will see later), and not a set!

Let us go over the line of reasoning defining a function as a relation step by step. Once you have understood this, you will be familiar with the names and nature of all the fundamental concepts relevant for a function, including the domain, codomain, image and graph, which are very important for everything to follow!

A function f that you are likely well-familiar with is the one of a rule which associates every element x\in X in the domain X of f with a single element y\in Y in the codomain Y of f. We write

    \[f: X\mapsto Y, x\mapsto y=: f(x).\]

This statement is a concise summary of all relevant aspects of f: the domain X, the codomain Y, and the rule f(x) that maps x‘s into y‘s. Note that two functions are identical if and only if the mapping x\mapsto y and the domain X coincide; the codomain may well be different (consider e.g. f_1:\mathbb R\mapsto \mathbb R, x\mapsto x^2 and f_2:\mathbb R\mapsto \mathbb R_+, x\mapsto x^2 where \mathbb R_+ = \{x\in\mathbb R: x\geq 0\} is the set of non-negative reals. Then, f_1 and f_2 are clearly identical). To see the connection to relations, consider the graph G(f) of f,

    \[G(f) = \{(x,y)\in X\times Y: y = f(x)\} = \{(x,f(x)): x\in X\}.\]

Clearly, G(f) is a subset of X\times Y, since it contains only elements in X\times Y and adds the restriction y = f(x), which may exclude some elements. Like this, we can view the graph G(f) as the relation from the domain X to the codomain Y, since the set of y‘s related to any fixed x\in X under G(f), denoted R(x) above (cf. equation (1)), is simply

    \[ \{y\in Y: xG(f)y\} = \{y\in Y: (x,y)\in G(f)\} = \{y\in Y: y = f(x)\} = f(x)\]

where the last equality is because Y is a set and sets do not contain duplicate elements. This highlights that the function assigns only one image f(x) = y to any one x\in X.

Mini-Excursion: We see here that when viewing relations as a generalization of functions, the set R(x) can be interpreted as a generalized image of x under R, in a fashion very similar to standard functions. So, if we wish to define a relation associating multiple images to arguments x\in X, we can use the relation concept in a straightforward fashion to do so (an example are so-called correspondences, where the values are sets).

As take-away, one may summarize

    1. The graph G(f) is a set that defines the function f as a relation. (The graph contains the combinations (x, f(x)) and not just the set/ a picture of f(x)‘s!)
    2. The relation is fully characterized by (i) the rule x \mapsto y = f(x) that maps domain objects x\in X onto codomain objects y\in Y, and (ii) the domain X.


Before moving on, a conceptual note. You may be used to calling “f(x)” a function, e.g. from high school. If so, you should stop doing this now. Indeed, people sometimes do this, especially at lower levels of mathematics, but this is arguably imprecise/wrong. f(x) may refer to a specific element in the codomain of f, the value of f when evaluated at a concrete x\in X, or, when considering x as a variable, as the mapping rule x\mapsto y=f(x) of the function f (You may be familiar with this case from specific representations like “f(x) = x^2 + \sin(x)“, which unambiguously summarizes the mapping rule). However, neither case provides sufficient information to fully characterize f (in the latter, it is still unclear what domain and codomain are), and you run into troubles related to notation when it comes to differentiation (see also the discussion in Chapter 3). To be formally precise, in everything to follow, we will call f the function, x\in X an argument and the object f(x) in the codomain of f the value of f at x, and x\mapsto y = f(x) the mapping rule of f.

Key Concepts related to Functions

To conclude our investigations into functions here, let us consider some further important concepts that you will come across frequently in the function context. Again, you don’t need to memorize this by heart by now – just try to become familiar with the expressions.

For what follows, let f:X\mapsto Y be a function as defined above, and in addition, let g:Y\mapsto Z be another function. Then,

    • For a set A\subseteq X, the image of A under f is f[A]:=\{y\in Y: (\exists x\in A: f(x)=y)\}, i.e. the set of y\in Y to which f maps.
    • For a set B\subseteq Y, the preimage of B under f is f^{-1}[B]:=\{x\in X: (\exists y\in B: f(x)=y)\}, i.e. the set of x\in X that are mapped onto elements in B by f.
    • If for any a,b\in X, f(a) = f(b) implies that a = b, (in quantifier notation: \forall a,b \in X:f(a) = f(b) \implies a = b), we say the function f is injective.
    • If for any y\in Y, there exists one x\in X so that f(x) = y, (in quantifier notation: \forall y\in Y:(\exists  x\in X: x \in f^{-1}[\{y}])), we say the function f is surjective.
    • If for any y\in Y, there exists exactly one x\in X so that f(x) = y, (i.e. the function is both injective and surjective, in quantifier notation: \forall y\in Y:(\exists ! x\in X: f(x) = y)), then we can define the inverse function f^{-1}: Y\mapsto X, y\mapsto x = f^{-1}(y) where f^{-1}(y)\in X is such that f(f^{-1}(y)) = y.
    • The composition h = g\circ f of g and f is defined as h:X\mapsto Z, x\mapsto g(f(x)).
    • f is called monotonically increasing (decreasing), if for any x_1, x_2\in X, x_1\geq x_2 implies f(x_1)\geq f(x_2) (f(x_1)\leq f(x_2)), and strictly monotonically increasing (decreasing) if for any x_1, x_2\in X, x_1> x_2 implies f(x_1)> f(x_2) (f(x_1)< f(x_2)).

The word “range” is frequently used synonymously for the image of X under f (also denoted as im(f)). Further, an alternative name for the preimage is “inverse image”, which may be somewhat misleading and easily confused with the image of the inverse function. Thus, let us not use this label, but be aware that some other texts and courses may do so.

The inverse function will be investigated more thoroughly later, but you can already note that (i) its existence depends crucially on the definition of the codomain Y as well as the mapping x\mapsto y, and (ii) that despite looking quite similar, the expressions f^{-1}(y) and f^{-1}[\{y\}] refer to fundamentally different concepts! One is a set in the domain of f that always exists, whereas the other is a value of the inverse function f^{-1} in the domain X, which is only well-defined if f^{-1} exists in the first place, i.e. if f is invertible (for the condition, see the list above)! To tell them apart more easily, one sometimes uses square brackets for (pre)images of sets and round ones for (inverse) images of single elements, as is done here. Make sure that you understand this difference!

As a last note on functions, the table below gives common rules for derivatives of functions where both domain and codomain are \mathbb R.

 \begin{tabular}{c} \begin{tabular}{c|c||c|c} \multicolumn{4}{l}{\textit{Derivatives of specific functions}}\\\hline Function $f(x)$ & Derivative $f'(x)$ & Function $f(x)$ & Derivative $f'(x)$ \\\hline\hline $c$ & 0 & $c\cdot x$ & $c$\\ $\ln(x)$ & $\frac{1}{x}$ & $\exp(x)$ & $\exp(x)$\\ $\sin(x)$ & $\cos(x)$ & $\cos(x)$ & $-\sin(x)$\\ $c^x$ & $\ln(c)\cdot c^x$ & $x^c$ ($c\neq 0$)& $c\cdot x^{c-1}$\\ \hline \end{tabular}\vspace{0.7cm}\\ \begin{tabular}{ccc} \multicolumn{3}{l}{\textit{Rules for Derivatives}}\\\hline Name&Function & Derivative \\\hline\hline Sums Rule & $f(x) + g(x)$ & $f'(x) + g'(x)$ \\ Product Rule & $f(x)\cdot g(x)$ & $f'(x)\cdot g(x) + f(x)\cdot g'(x)$\\ Quotient Rule & $f(x)/g(x)$ & $(f'(x)\cdot g(x) - f(x)\cdot g'(x))/(g(x))^2$ \\ Chain Rule & $(g\circ f) (x) = g(f(x))$ & $f'(x)\cdot g'(f(x))$\\ \hline \end{tabular} \end{tabular}


Limits and Continuity of uni-dimensional Objects

To conclude our investigations into the fundamental background concepts of mathematics that are relevant to the context of the economist, we consider the limit concept in relation to the real line, both for sequences of numbers and univariate, real-valued functions.

Limits of Sequences

Let \{x_n\}_{n\in\mathbb N} be a sequence of real numbers, i.e. \forall n\in\mathbb N: x_n\in\mathbb R. Then, we call x\in\mathbb R the limit of this sequence if

    \[\forall \varepsilon > 0 \exists N\in\mathbb N: (\forall n\in\mathbb N: (n\geq N \Rightarrow |x_n-x|<\varepsilon)).\]

Verbally, for any, and thus especially any arbitrarily small number \varepsilon, there exists a threshold N after which the sequence elements only deviate from x by less than \varepsilon, such that eventually, as n\to\infty, the sequence elements will lie arbitrarily close to x. If the limit x of the sequence \{x_n\}_{n\in\mathbb N} exists, we write x = \lim_{n\to\infty} x_n. Crucially, we also write that \lim_{n\to\infty} x_n = \infty if

    \[\forall x\in\mathbb R\exists N\in\mathbb N: (\forall n\in\mathbb N: (n\geq N\Rightarrow x_n>x)),\]

i.e. if the sequence elements eventually exceed any arbitrarily large but fixed number x. A similar characterization can be written down for \lim_{n\to\infty} x_n = -\infty. Try to write it down on your own, and click the button below to compare your result.

\forall x\in\mathbb R\exists N\in\mathbb N: (\forall n\in\mathbb N: (n\geq N\Rightarrow x_n<x))


Because function limits usually receive less attention in undergraduate economics programs than sequence limits, let us now study this issue, which is not quite the same, but as you will shortly see still highly similar.

Limits of Functions

When X,Y\subseteq\mathbb R, we call f_a\in\mathbb R the limit of the function f:X\mapsto Y at a\in\mathbb R, if

    \[\forall\varepsilon>0\exists\delta>0:(\forall x\in X:(|x-a|\in(0,\delta)\Rightarrow |f(x)-f_a|<\varepsilon)).\]

The concept is similar to the standard limit of a sequence: for any arbitrarily small \varepsilon>0, there must exist a neighborhood N = (a-\delta, a+\delta), \delta > 0 such that f deviates from f_a by less than \varepsilon on N. In other words, by choosing x sufficiently close to a, one may ensure that f deviates from f_a no more than \varepsilon. We write f_a = \lim_{x\to a} f(x). Note that we need not have a\in X, so that a can either be a boundary point (e.g. a=0 when f is defined on (0,\infty)) or a point where f is not defined (e.g. a = 2 when f(x) = 1/(x-2)). Further, we adopt the convention that if for any sequence \{x_n\}_{n\in\mathbb N}, where x_n\in X \forall n\in\mathbb N, so that \lim_{n\to\infty} x_n = a, it holds that \lim_{n\to\infty} f(x_n) = \infty (\lim_{n\to\infty} f(x_n) = -\infty), then we write \lim_{x\to a} f(x) = \infty (\lim_{x\to a} f(x) = -\infty).

To characterize the asymptotic behavior of a function f with domain \mathbb R or intervals unbounded to one side (e.g. (-\infty, a], (b,\infty), etc.), one frequently considers the limits \lim_{x\to \infty}f(x) and \lim_{x\to -\infty}f(x). Here, it is important to know how these quantities are defined. We write \lim_{x\to\infty}f(x) = c for a c\in\mathbb R if

    \[ \forall \varepsilon > 0 \exists x_\varepsilon \in\mathbb R: (\forall x > x_\varepsilon: |f(x) - c| < \varepsilon) \]

Try to write down the analogous formal statement that defines b\in\mathbb R as the left asymptote of f, \lim_{x\to-\infty}f(x).

\forall \varepsilon > 0 \exists x_\varepsilon \in\mathbb R: (\forall x < x_\varepsilon: |f(x) - b| < \varepsilon)


As with the limit at a point a\in\mathbb R, we write \lim_{x\to \infty} f(x) = \infty (\lim_{x\to \infty} f(x) = -\infty) if for any sequence \{x_n\}_{n\in\mathbb N}, where x_n\in X \forall n\in\mathbb N, so that \lim_{n\to\infty} x_n = \infty, it holds that \lim_{n\to\infty} f(x_n) = \infty (\lim_{n\to\infty} f(x_n) = -\infty), and analogously for \lim_{x\to -\infty} f(x) = \infty (\lim_{x\to -\infty} f(x) = -\infty).

An important point is that \lim_{x\to a} f(x)=f(a) need not necessarily hold. Consider, for instance, a=0 and f(x)=1/x, where f(0) is not even defined (a\notin X). Next, consider the indicator function f(x)=\mathds{1}[x>0] on \mathbb R that is equal to 1 if x>0 and zero else. It is defined at x=0, i.e. a=0\in X, but for any f_a\in\mathbb R and any \varepsilon < 1, there exists no \delta > 0 such that |f(x) - f_a|<\varepsilon for all x\in(-\delta, \delta) because f(x) = 0 for x\in(-\delta, 0] and f(x) = 1 for x\in(0,\delta). Thus, \lim_{x\to 0}f(x) does not exist, and especially, \lim_{x\to a} f(x)=f(a) does not hold. Finally, even if the limit exists, the equation need not hold. Look at the function f with f(x)=\mathds{1}[x = 0] that is equal to 1 at x = 0 and zero else. Then \lim_{x \to 0}f(x)= 0 \neq f(0).

Indeed, if \lim_{x\to a} f(x)=f(a), then f features a desirable property called continuity at a. We will have a more rigorous discussion of it later.

Definition: Continuity of Real Functions.
Consider a function f:X\mapsto Y, X,Y\subseteq\mathbb R. Then,

(i) f is called continuous at a\in X if \lim_{x \rightarrow a}f(x)=f(a).
(ii) f is called continuous on the interval I\subseteq X if \forall a \in I: \lim_{x \rightarrow a}f(x)=f(a).


An further concept that you may come across frequently is the one of left and right limits. The left (right) limit of f at a is the value f takes “when moving towards a from the left (right)”. This is useful for two reasons: (i) we can characterize the behavior of functions like f(x)=\mathds{1}[x>0] at points a, here a=0, where the limit x\to a is undefined, and (ii) the concept provides a rather straightforward method to disprove existence of the limit of f at a. Formally, we say that f_a^+ is the right limit of f at a if

    \[\forall\varepsilon>0\exists\delta^+>0:(\forall x\in X:(x-a\in(0,\delta)\Rightarrow |f(x)-f_a^+|<\varepsilon)),\]

and f_a^- is the left limit of f at a if

    \[\forall\varepsilon>0\exists\delta^->0:(\forall x\in X:(x-a\in(-\delta,0)\Rightarrow |f(x)-f_a^-|<\varepsilon)).\]

We write \lim_{x\to a^+} f(x) = f_a^+ and \lim_{x\to a^-} f(x) = f_a^-. Then, it is easily verified (for \varepsilon>0, choose \delta = \min\{\delta^+,\delta^-\}, or respectively \delta^+ = \delta^- = \delta) that the limit of f at a exists and is equal to f_a if and only if the right and left limits exist and f^+_a = f^-_a = f_a. Conversely, this implies that whenever f^+_a \neq f^-_a or either limit does not exist, then f_a does not exist as well. Try to use this method to show non-existence of the limit of f at a for the specific example of f(x) = \mathds{1}[x>0] and a=0. As a final remark, if they exist, proper limits (f_a) as well as left and right limits (f_a^+, f_a^-) are unique.

To conclude this introductory chapter, let us consider a few rules for limits. Some more simple facts are the following (the right column assumes that the respective limits exist):

\begin{tabular}{c|c||c|c} \hline Function $f(x)$ & Limit $\lim_{x\to a} f(x)$ & Function $h(x)$ & Limit $\lim_{x\to a} h(x)$ \\\hline\hline $c$ & $c$ & $f(x) + g(x)$ & $\lim_{x\to a} f(x)$ + $\lim_{x\to a} g(x)$\\ $c\cdot x$ & $c\cdot a$ & $f(x) \cdot g(x)$ & $\left (\lim_{x\to a} f(x)\right )\cdot\left (\lim_{x\to a} g(x)\right )$\\ \hline \end{tabular}

Further, if f is continuous, then \lim_{x\to a}f(g(x)) = f\left (\lim_{x\to a}g(x)\right ). Thus, if also g is continuous, then \lim_{x\to a}f(g(x)) = f(g(a)). The next important fact is L’Hôpital’s rule for the limit of ratios:

Theorem: L’Hôpital’s Rule.
Let f and g be two real valued differentiable functions on an open interval I and a\in I, or a\in\{\pm\infty\}. Let g'(x)\neq 0 for all x\in I, x \neq a. Suppose that \lim_{x\to a}f(x)=\lim_{x\to a}g(x)=0 or \lim_{x\to a}f(x)=\lim_{x\to a}g(x)= \pm \infty. Then, if \underset{x \rightarrow a}{\text{lim}}\frac{f^{\prime}(x)}{g^{\prime}(x)} exists, it holds that

    \[\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{x\to a}\frac{f^{\prime}(x)}{g^{\prime}(x)}.\]


Thus, we can use derivatives and L’Hôpital’s rule if the product rule does not apply because at least one limit does not exist. Note that when the functions are sufficiently differentiable, you can apply this rule multiple times (i.e., higher order derivatives). An example is \lim_{x\to 0}\frac{-1/x}{\ln(x)}. With x\to 0, both the numerator and denominator approach -\infty, and the limit product rule does not apply. However, by L’Hôpital’s rule, this limit corresponds to the limit of the derivative’s ratio, \lim_{x\to 0}\frac{1/x^2}{1/x} = \lim_{x\to 0} 1/x = \infty.

A final, important rule with a quite memorable name is the following:

Theorem: Sandwich Theorem (Sequences).
Consider three real-valued sequences \{y_n\}_{n\in\mathbb N}, \{z_n\}_{n\in\mathbb N} and \{x_n\}_{n\in\mathbb N} such that \{y_n\}_{n\in\mathbb N} and \{z_n\}_{n\in\mathbb N} are convergent and \lim_{n\to\infty} y_n = \lim_{n\to\infty} z_n = \bar x\in\mathbb R. Further, suppose that there exists N\in\mathbb N such that \forall n\in\mathbb N: (n\geq N \Rightarrow y_n \leq x_n \leq z_n). Then, \{x_n\}_{n\in\mathbb N} is convergent with \lim_{n\to\infty} x_n = \bar x.


This theorem is frequently used to avoid involved mathematical considerations using the \varepsilon/\delta approach from the definition of the limit. Note that y_n or z_n need not necessarily depend on n, for instance, if we have 0\leq x_n\leq z_n with \lim_{n\to\infty} z_n = 0 for all n\geq N\in\mathbb N, then we can also establish \lim_{n\to\infty} x_n = 0 from the sandwich theorem. Finally, the “N\in\mathbb N” part just tells us that it doesn’t matter for the limit if the inequality does not hold for some “early” elements of the sequences, in most applications, you might be lucky enough to choose N=1, i.e. the inequality holds for all n\in\mathbb N. As an example, consider the sequence x_n = -\frac{1}{n^2 + 4n + 25} for n\in\mathbb N. We can bound

    \[ -\frac{1}{n} \leq -\frac{1}{n^2 + 4n + 25} \leq 0 \hspace{0.5cm}\forall n\in\mathbb N\]

and since \lim_{n\to\infty} - 1/n = 0, the sandwich theorem allows us to conclude that \lim_{n\to\infty} -\frac{1}{n^2 + 4n + 25} = 0.

This theorem holds for limits of functions in an analogous way:

Theorem: Sandwich Theorem (Functions).
Consider three real-valued functions, g, h and f such that for a value x_0 in their domain, \lim_{x\to x_0} g(x) and \lim_{x\to x_0} h(x) exist with \lim_{x\to x_0} g(x) =\lim_{x\to x_0} h(x) =f_0. Further, suppose that for any x in proximity to x_0, it holds that  g(x) \leq f(x) \leq h(x). Then, \lim_{x\to x_0} f(x) exists, and \lim_{x\to x_0} f(x) = f_0.


Here, it may not be too clear what “in proximity to x_0” means precisely, at least not formally. To express this fact more formally, we need the distance concept we are to touch upon in the next chapter. As this has not been introduced this point, the vague statement given above shall suffice for now.

If you feel like testing your understanding of the concepts discussed in the second half of this chapter, you can take a short quiz found here.