Chapter 1 – Vector Spaces

An overview of this chapter’s contents and take-aways can be found here.

The topic of vector spaces is one of the most important background concepts, if not the most important one, for economic mathematics and for a large proportion of applied mathematics in general. The discussion to follow is confined to the intuition and gives the most fundamental definitions and results without digging too deep into where they come from. A thorough discussion of vector spaces can be found in the companion script.

Introduction

Let us first develop an intuition for why we need vector spaces. Consider two arbitrary real numbers, for instance, 2 and 5. Then, if you are asked to add them, multiply them, or tell the distance between them, you will not have much of a problem. However, if you are asked the same questions for the functions given by f(x) = x^2 and g(x) = \ln(x), things get tricky. What is the sum of two functions? How, conceptually, would it make sense to think of a distance between functions? Questions like these and many more can be easily addressed once one is familiar with vector spaces.

More generally, math is tractable when we have one or two dimensions of real numbers. When considering a second dimension, we find ourselves in the comfortable situation that we can illustrate a vast number of problems graphically, which is oftentimes very helpful in solving them. A classical example from undergraduate economics which you may remember is the way one typically looks for the utility-maximizing consumption bundle of two goods when given the budget restriction and the indifference curve. However, when considering more complex problems (e.g. more than two goods/inputs or uncertainty through stochastic components), as we tend to do in more advanced studies, a more general concept is needed. The main objective of the theory of vector spaces is sometimes described as follows: the geometrical insights available for 2- or 3-dimensional real vectors are really helpful. Can we, in some way, generalize these insights to other mathematical objects, for which a geometric picture is not available?

To convince you even more of the usefulness of transferring the graphical intuition, consider the example of minimizing the (Euclidean) distance of a point x_0 = (x_{1,0}, x_{2,0}) to a line l defined by x_2 = cx_1 + d, i.e. when choosing a point x^* on l so that the distance d(x^*, x_0) is smaller than d(x_l, x_0) for any point x_l on l, as illustrated graphically here:

[Figure: the point x_0, the line l, and the closest point x^* on l.]

Then, in the 2-D context, clearly, the line through x^* and x_0 must be orthogonal to l. Without this intuitive geometrical representation, however, the result would probably have been very hard to find out. Indeed, the result’s beauty lies within the insight that even when considering higher-dimensional spaces (e.g. of vectors x = (x_1, x_2, \ldots, x_n), n\in\mathbb N, that is, the \mathbb R^n), the least-squares solution that minimizes the Euclidean distance \|x_0 - x_l\|_2 = \left (\sum_{i=1}^n (x_{0,i} - x_{l,i})^2\right )^{1/2} continues to satisfy this orthogonality property!
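To see the orthogonality property at work numerically, here is a minimal Python sketch. The point x_0 = (3, 1) and the line parameters c = 2, d = 1 are purely illustrative values that do not come from the text, and the function name is hypothetical.

```python
# Sketch: closest point on the line x2 = c*x1 + d to a point x0.
# All numbers below are hypothetical and only serve as an illustration.

def closest_point_on_line(x0, c, d):
    """Return the point x* on {(t, c*t + d)} with minimal Euclidean distance to x0."""
    # Minimizing (t - x0[0])**2 + (c*t + d - x0[1])**2 over t yields the
    # first-order condition (t - x0[0]) + c*(c*t + d - x0[1]) = 0.
    t = (x0[0] + c * (x0[1] - d)) / (1 + c ** 2)
    return (t, c * t + d)

x0 = (3.0, 1.0)
c, d = 2.0, 1.0
x_star = closest_point_on_line(x0, c, d)

# The line has direction (1, c); the connecting vector is x0 - x*.
diff = (x0[0] - x_star[0], x0[1] - x_star[1])
inner = diff[0] * 1.0 + diff[1] * c  # (numerically) zero if orthogonal
print(x_star, inner)
```

Up to floating-point rounding, the inner product of the connecting vector with the direction of the line is zero, which is exactly the orthogonality described above.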

Basic Concepts

First things first – to begin, let us define what we mean precisely by a vector.

Definition: Vector.
A row vector x of length n\in\mathbb N is an ordered tuple of elements x_i. We write x = (x_1, x_2, \ldots, x_n). A column vector x stacks the elements in a column, i.e. x = \begin{pmatrix} x_1\\x_2\\\vdots\\x_n\end{pmatrix} = (x_1, x_2, \ldots, x_n)' where (\cdot)' indicates vector transposition. Hence, a row vector x is such that x' is a column vector. A “vector” typically refers to a column vector.

 

The order of elements in a vector matters, such that x = (1,2) and y=(2,1) are distinct! Also, the vector can contain an element multiple times, consider e.g. the origin x = (0,0). The definition does not restrict elements to be real numbers, and may also refer to collections of functions, matrices, sets, vectors, etc. Thus, be aware that even though we predominantly deal with vectors of real numbers, the concept is much broader!

The standard set of vectors that economists typically consider is the one of n-dimensional real (column) vectors:

    \[\mathbb R^n:=\{(x_1,\ldots,x_n)':(\forall i\in\{1,\ldots, n\}:x_i\in\mathbb R)\},\text{ } n\in\mathbb N.\]

It turns out that to generalize a great variety of concepts and insights to arbitrary vector spaces, we only need two things: (1) a well-defined way of adding vectors to each other and of multiplying them by real numbers, and (2) that the set of vectors we consider is closed under these two operations, i.e. that any sum of vectors and any product of a vector and a real number is still an element of the set. Furthermore, when we need more mathematical structure, it is oftentimes helpful to consult the basis of the vector space, an object that allows for compact representations of all elements in the space and provides some insight into its dimension. But let us go over these things more slowly, starting with the intuition for why these two seemingly basic properties are so mathematically powerful.

The Intuition of Vector Spaces

As you may already have done in school, it is useful to think of real vectors as an entity with direction and magnitude. To do so, one writes a vector x=(x_1, x_2)'\in\mathbb R^2 as the product of a direction vector and an augmenting magnitude coefficient: one may write x = (0,4)' as x = 4 \cdot (0, 1)', and y=(2,-4)' as y = 6 \cdot (1/3, -2/3)'. Then, x and y have magnitude coefficients 4 and 6, and directionality (0, 1)' and (1/3, -2/3)', respectively. Indeed, this concept is the first fundamental building block of the structure we assign to sets of vectors to do algebra with them: scalar multiplication (for this course, you can think of scalars and real numbers as being the same thing). This concept gives us a feeling of how much individual vectors extend into some direction, and allows for a spatial comparison of objects within the vector set.

However, for more precision, it is desirable to decompose the extension along the fundamental directions. For our example of the \mathbb R^2, there are two fundamental directions: the horizontal and vertical axes. They can be expressed by the vectors e_1 = (1,0)' and e_2 = (0,1)', respectively. For our examples x=(0,4)' and y = (2,-4)' above, x only extends alongside the second fundamental direction, so that

    \[x = 0\cdot e_1 + 4\cdot e_2.\]

For y, we can decompose

    \[y = 6\cdot (1/3 \cdot e_1 + (-2/3) \cdot e_2) = 2 \cdot e_1 + (-4) \cdot e_2.\]

Notice that we have used the standard properties of multiplication and addition of real numbers here, despite the fact that we are dealing with vectors. This is precisely why we need an appropriate definition of vector addition as the second building block in our vector space definition: we must ensure that addition of vectors is “similar enough” to addition of real numbers.

The collection of (fundamental) directions that completely characterize the dimensions along which vectors can extend within the considered set is typically called a basis. In the exemplary context here, we would call the set \{e_1, e_2\} a basis of the \mathbb R^2. Because the set actually describes the fundamental directions, we would even call this basis the canonical basis of the \mathbb R^2. A basis helps with representation and provides the ground for more complex mathematical analysis. In contrast to the basic operations of vector addition and scalar multiplication, however, its existence and/or structure is not essential for most economic applications.

Defining a Vector Space

In conclusion of the previous section, the operations of vector addition and scalar multiplication are always defined in a way that ensures a behavior “similar to” addition and multiplication of real numbers. Next, we will establish that there are “neutral” elements of these operations. In the real number context, the additive neutral is the zero, as x+0=x for all x\in\mathbb R, and the neutral element of multiplication is 1 since 1\cdot x = x for all x\in\mathbb R. Similarly, for arbitrary vector spaces, we require that 1\cdot x = x for any vector x, and that we have a zero element, that is, some vector \mathbf{0} which gives x+\mathbf{0}=x for all vectors x, and for which 0 \cdot x = \mathbf{0} for all vectors x. Should you be interested in more details, please consult the companion script.

A set alone (think of the \mathbb R^n with arbitrary n\in\mathbb N, or a concrete example such as \mathbb R^3) cannot yet be a vector space. A vector space is the combination of a set of vectors, e.g. \mathbb R^3, with the two “basic operations” of vector addition and scalar multiplication. In consequence, we call the collection \mathbb{V}:=(\mathbb R^3, +, \cdot) a vector space, where + represents addition of real vectors of length 3, and \cdot represents multiplication of real numbers with these vectors. However, people (especially applied mathematicians such as economists) tend to be a bit sloppy with this distinction, and you will oftentimes read that sets, such as the \mathbb R^3, are themselves called vector spaces. For the case of the \mathbb R^n, we usually consider addition element-wise so that for x = (x_1, x_2, \ldots, x_n)'\in\mathbb R^n and y = (y_1, y_2, \ldots, y_n)'\in\mathbb R^n, we have

    \[x + y = \begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} + \begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix} = \begin{pmatrix}x_1 + y_1\\x_2 + y_2\\\vdots\\x_n + y_n\end{pmatrix}.\]

Further, scalar multiplication is also defined element-wise, so that for x = (x_1, x_2, \ldots, x_n)'\in\mathbb R^n and \lambda\in\mathbb R, we have

    \[\lambda\cdot x = \lambda \cdot \begin{pmatrix}x_1 \\ x_2\\\vdots\\ x_n\end{pmatrix} = \begin{pmatrix}\lambda x_1 \\\lambda x_2\\\vdots\\\lambda x_n\end{pmatrix}.\]
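As a small illustration, the two element-wise operations can be sketched in Python with tuples standing in for vectors. The function names add and scale are hypothetical; the example vectors are x, y, e_1 and e_2 from the intuition section above.

```python
# Sketch: element-wise vector addition and scalar multiplication in R^n,
# with plain tuples standing in for (column) vectors.

def add(x, y):
    """Element-wise sum of two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(lam, x):
    """Multiply every element of x by the real number lam."""
    return tuple(lam * xi for xi in x)

e1, e2 = (1, 0), (0, 1)                  # fundamental directions of R^2
x = add(scale(0, e1), scale(4, e2))      # x = 0*e1 + 4*e2
y = add(scale(2, e1), scale(-4, e2))     # y = 2*e1 + (-4)*e2
print(x, y)
```

Both results are again tuples of length 2, in line with the requirement that the set be closed under the two operations.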

There are many more vector spaces than the \mathbb R^n (with addition and scalar multiplication as defined above). The most important ones for economic applications are the following:

(1) The set \mathbb R^\infty = \{(x_n)_{n\in\mathbb N}: (\forall n\in\mathbb N: x_n\in\mathbb R)\} of real-valued sequences, endowed with vector addition and scalar multiplication in a way defined analogously to the corresponding operations for vectors of finite length.

(2) The set F_X = \{f:X\mapsto\mathbb R\} of real-valued functions with domain X, where X may be an arbitrary set, but is the common domain for all functions in F_X, endowed with addition defined by (f + g): X\mapsto\mathbb R, x\mapsto f(x) + g(x) for all f,g\in F_X and scalar multiplication defined by (\lambda\cdot f):X\mapsto\mathbb R, x\mapsto \lambda\cdot f(x) for all f\in F_X, \lambda\in\mathbb R.

(3) The set \mathcal M_{n\times m} of n\times m matrices, n,m\in\mathbb N, endowed with “element-wise” addition and scalar multiplication (see chapter 2 for more detail).

Take a minute to make sure that you understand the definitions of addition and scalar multiplication for functions. First, for vector addition, by the concept’s nature, we need to consider two “vectors”, in this case functions, which we do by picking out (arbitrary) functions f,g\in F_X when defining the operation “+”. The resulting object must again be a function, namely the one that maps any x\in X to the sum of the values f(x) and g(x). Similarly, for scalar multiplication, we need to take a real number \lambda and a “vector”, here a function f, to define which function corresponds to the scalar multiple \lambda\cdot f.
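If it helps, the two operations on functions can be sketched in Python, where functions are ordinary objects. The helper names add_functions and scale_function are hypothetical, and the common domain X is assumed to be the positive real numbers so that both example functions from the introduction are defined.

```python
import math

# Sketch: "vector" addition and scalar multiplication for real-valued functions
# on a common domain X (here assumed to be the positive real numbers).

def add_functions(f, g):
    """Return the function x -> f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale_function(lam, f):
    """Return the function x -> lam * f(x)."""
    return lambda x: lam * f(x)

f = lambda x: x ** 2   # f(x) = x^2
g = math.log           # g(x) = ln(x)

h = add_functions(f, g)      # h = f + g is again a function on X
k = scale_function(3.0, f)   # k = 3 * f is again a function on X
print(h(1.0), k(2.0))
```

Note that add_functions and scale_function return functions rather than numbers: the sum of two elements of F_X is again an element of F_X.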

As they hold little direct value to the remainder of this course, we skip the discussions related to the basis, spans and subspaces here. Should you be interested and/or need these concepts at a later point in your studies, please refer to the companion script for an introduction.

Key Concepts related to Vector Spaces

To conclude our discussion of the basics of vector spaces, let us consider a few concepts that you will come across frequently when dealing with vectors. The first one, we have already used in Chapter 0 without formally defining it:

Definition: Cartesian Product.
Let \mathbb{X}:=(X,+_X,\cdot_X) and \mathbb{Y}:=(Y,+_Y,\cdot_Y) be two real vector spaces. Then, the Cartesian product of \mathbb{X} and \mathbb{Y}, denoted \mathbb{X} \times \mathbb{Y}, is the collection of ordered pairs (x,y) with elements x\in X and y\in Y together with addition and scalar multiplication, respectively defined as (x_1,y_1)+(x_2,y_2)=(x_1+_X x_2,y_1+_Y y_2) and \lambda\cdot (x,y)=(\lambda\cdot_X x, \lambda\cdot_Y y).

 

Indeed, the Cartesian product of two real vector spaces is itself a real vector space. As a very simple example, note that we can write \mathbb R^5 = \mathbb R^3 \times \mathbb R^2.

To conclude the section on definitions, let us define a very special vector operation that is extensively used in all economic disciplines: the scalar product (alternative names are dot product or inner product):

Definition: Scalar product.
Let  x=(x_1, ..., x_n)', y=(y_1,...,y_n)' \in \mathbb{R}^n. Then, the scalar product \cdot is defined as

    \[\cdot: \mathbb R^n \times \mathbb R^n \mapsto \mathbb R, (x,y)\mapsto x \cdot y = \sum_{i=1}^n (x_i\cdot y_i) = x_1 \cdot y_1 + ... + x_n \cdot y_n.\]

 

It tells us how to multiply elements of the \mathbb R^n with each other. Note that the scalar product as stated here is defined only for the \mathbb R^n and not more general vector spaces where vectors potentially contain elements other than real numbers (this is because multiplying elements within vector spaces is too context-specific to be generalized into a broad concept, as we have done with addition and scalar multiplication). But fear not, you will also know how to multiply, among others, functions and matrices with each other by the end of this course.
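A minimal Python sketch of the scalar product on \mathbb R^n, with tuples standing in for vectors, could look as follows. The function name and the example vectors are illustrative only.

```python
# Sketch: scalar product of two vectors in R^n.

def scalar_product(x, y):
    """Sum of the element-wise products x_i * y_i."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same length n")
    return sum(xi * yi for xi, yi in zip(x, y))

x = (1, 2, 3)
y = (4, -5, 6)
print(scalar_product(x, y))  # 1*4 + 2*(-5) + 3*6 = 12
```

The result is a single real number, not a vector, which is where the name scalar product comes from.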

The remaining important concepts refer to combinations of the basic operations of vector addition and scalar multiplication:

Definition: Linear Combination.
Let \mathbb{X}:=(X,+,\cdot) be a real vector space, x,y\in X and \lambda_x,\lambda_y\in\mathbb R. Then, the linear combination z of x and y with coefficients \lambda_x and \lambda_y is z=\lambda_x \cdot x + \lambda_y \cdot y. More generally, for k\in\mathbb N, the linear combination of x_1,\ldots, x_k\in X with coefficients \lambda_1,\ldots,\lambda_k\in\mathbb R is \sum_{j=1}^k\lambda_j\cdot x_j.
We say that \mathbb X is closed under linear combination if \forall x,y\in X\forall\lambda_x,\lambda_y\in\mathbb R:\lambda_x \cdot x + \lambda_y \cdot y\in X.

 

Definition: Convex Combination and Convex Set.
Let \mathbb X be a real vector space based on the set X. A convex combination x^c of the vectors x_1,\ldots, x_n\in X is a linear combination x^c = \sum_{i=1}^n \lambda_i x_i, for which \forall i\in\{1,\ldots,n\}:\lambda_i\geq 0 and \sum_{i=1}^n \lambda_i = 1.
A set A\subseteq X is convex if it contains all convex combinations of any two of its elements, i.e. \forall a_1,a_2\in A\forall\lambda\in[0,1]: \lambda a_1 + (1-\lambda) a_2\in A.

 

Actually, we can show that being closed under linear combination is equivalent to being closed under both vector addition and scalar multiplication (recall: we required this for the vector space property).

Definition: Linear Dependence, Linear Independence.
Let \mathbb{X}:=(X,+,\cdot) be a real vector space, and let S\subseteq X, x\in X. x is said to be linearly dependent upon the set S if it can be expressed as a linear combination of its elements, i.e.

    \[\exists k\in\mathbb N\exists s_1,\ldots, s_k\in S, \lambda_1,\ldots,\lambda_k\in\mathbb R: x = \sum_{j=1}^k \lambda_j\cdot s_j.\]

Otherwise, the vector x is said to be linearly independent of S. A set B\subseteq X is said to be linearly independent if each vector in the set is linearly independent of the remainder of the set, i.e. if \forall b\in B: (b\text{ is lin. indep. of }B\backslash\{b\}).

 

In words, x is linearly dependent on S if the elements in S can be (linearly) combined to obtain x. Then, x does not add a new, independent direction, which is why we call it dependent on (the directions in) S.

Theorem: Testing Linear Independence.
An equivalent condition for linear independence of the set of vectors B = \{b_1,b_2,...,b_k\} is that

    \[\sum_{j=1}^k \lambda_j b_j = \mathbf{0} \Rightarrow (\forall j\in\{1,\ldots,k\}:\lambda_j=0). \tag{1}\]

 

This result is really important and in fact a key take-away of this chapter! Hence, let us consider two simple examples of how we can use it to prove or disprove linear independence.

First, take the set S_1 = \{(1,2)', (0,3)'\} \subseteq \mathbb R^2 and investigate whether S_1 is a linearly independent set using the theorem above.


S_1 is linearly independent. Suppose that \lambda_1, \lambda_2\in\mathbb R are so that \lambda_1 (1,2)' + \lambda_2 (0,3)' = (0,0)', i.e. (i) 1\cdot\lambda_1 + 0\cdot \lambda_2 = 0 and (ii) 2\cdot\lambda_1 + 3\cdot \lambda_2 = 0. From (i), it follows that \lambda_1 = 0. With this and (ii), 3\cdot\lambda_2 = 0 and hence also \lambda_2 = 0. Hence the implication in equation (1) holds, and S_1 is a linearly independent set.

 

Second, take the set S_2 = \{(1,2,1)', (3,0,4)', (1,-4,2)'\} \subseteq \mathbb R^3 and investigate whether S_2 is a linearly independent set using the theorem above.


S_2 is not linearly independent. Suppose that \lambda_1, \lambda_2, \lambda_3\in\mathbb R are so that

    \[\lambda_1 (1,2,1)' + \lambda_2 (3,0,4)' + \lambda_3(1,-4,2)' = (0,0,0)'.\]

This time, we get 3 equations, the second of which postulates that 2\lambda_1 - 4\lambda_3 = 0 or equivalently, \lambda_1 = 2\lambda_3. Plugging this into the remaining two, we get 3\lambda_3 + 3\lambda_2 = 0 and 4\lambda_3 + 4 \lambda_2 = 0. Both are satisfied as long as \lambda_2 = -\lambda_3. For instance, if we set \lambda_2 = 1, \lambda_3 = -1 and, from \lambda_1 = 2\lambda_3, \lambda_1 = -2, then \lambda_1, \lambda_2, \lambda_3 solve \lambda_1 (1,2,1)' + \lambda_2 (3,0,4)' + \lambda_3(1,-4,2)' = (0,0,0)'. Because these \lambda_j are not all zero, the implication in equation (1) does not hold, and S_2 is not a linearly independent set.
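For readers who like to double-check such computations, here is a hedged Python sketch for both examples. It relies on Gaussian elimination and the rank of a matrix, concepts that are not introduced in this chapter: a set of k vectors is linearly independent exactly when the matrix with these vectors as rows has rank k. The function names are hypothetical, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

# Sketch: test linear independence by computing a rank via Gaussian elimination.

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0  # number of pivot rows found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)

S1 = [(1, 2), (0, 3)]
S2 = [(1, 2, 1), (3, 0, 4), (1, -4, 2)]
print(is_linearly_independent(S1), is_linearly_independent(S2))

# The non-trivial coefficients found by hand for S2:
lambdas = (-2, 1, -1)
combination = tuple(sum(l * v[i] for l, v in zip(lambdas, S2)) for i in range(3))
print(combination)
```

The last line recomputes the linear combination with \lambda_1 = -2, \lambda_2 = 1, \lambda_3 = -1 and should return the zero vector.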

 

If you are interested in why the Theorem indeed provides an equivalent condition for linear independence of a set, you can consult the proof in the companion script, which formally establishes the equivalence, that is, it shows you why the condition in equation (1) implies linear independence and vice versa.

As a final note, recall that for the purpose of this course, scalars and real numbers are the same thing. Essentially, this remains true as long as we don’t deal with complex numbers. To indicate that we confine ourselves to studying real numbers, we call the corresponding vector spaces “real vector spaces”. To be formally precise, the following definitions, propositions and theorems will refer only to such spaces.

Normed Vector Spaces and Continuity

The previous section has aimed at conveying how, for very general and abstract sets of vectors, we can define a space in which we find a helpful spatial representation, as with the simple two-dimensional plane \mathbb R^2, by characterizing the “position” of elements in a set of vectors using addition and multiplication in a fashion similar to real numbers. Building on this, this section addresses how we properly define the “length” (or: magnitude) of a vector and, even more importantly, how we assess the distance between two points in general vector spaces. Furthermore, we will learn how to use this distance concept to transfer the standard definition of continuity of simple functions mapping real numbers to real numbers to much more general functions.

Before going into the formal details, let us consider an easy, intuitive example to get a grasp of what will be going on formally and more abstractly below, and to (hopefully) understand that distances, as viewed by mathematicians, are indeed very intuitive and straightforward concepts. As you may know, Mannheim, similar to Manhattan, is organized in squares. Roughly, if you move north, the street names increase in letters (e.g. L1, M1, N1, etc.) whereas when moving east, they increase in numbers (L1, L2, L3, …). Note that a map of Mannheim can be thought of as the \mathbb R^2 with fundamental directions “north” and “east” (south is “negative north” and west “negative east”; if you’re confused, draw it on a piece of paper). So, suppose you are in the econ building in L7 and tired of studying, so you wish to go see a movie in the Cineplex in P4. Then, regardless of how you walk precisely (but abstracting from wrong turns), you will have to go four blocks north and three blocks west, so a total of seven blocks. This simple calculation (going only “zig-zag”) is called the “Manhattan metric”, a commonly used mathematical distance measure! Conversely, if you were a bird and could fly there, you would probably go the direct way (so the minimum distance necessary). Recalling the Pythagorean theorem, this distance is \sqrt{3^2+4^2} = \sqrt{25} = 5 blocks. This is what we call the Euclidean distance, one of the most natural mathematical definitions of a distance, if not the most natural one!
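The walk from L7 to P4 can be reproduced in a few lines of Python. The mapping from square names to grid coordinates below is a stylized assumption that follows the rough description above (letters increase to the north, numbers to the east); it is not meant to reflect the actual street layout, and the function name is hypothetical.

```python
import math

# Sketch: Manhattan vs. Euclidean distance for the stylized Mannheim grid.

def block_coordinates(name):
    """Map a square name such as 'L7' to (east, north) grid coordinates."""
    letter, number = name[0], int(name[1:])
    return (number, ord(letter) - ord("A"))

start = block_coordinates("L7")
goal = block_coordinates("P4")

d_east = goal[0] - start[0]    # negative: three blocks west
d_north = goal[1] - start[1]   # positive: four blocks north

manhattan = abs(d_east) + abs(d_north)   # "zig-zag" distance on foot
euclidean = math.hypot(d_east, d_north)  # direct distance as the bird flies
print(manhattan, euclidean)
```

The two numbers correspond to the seven blocks on foot and the five blocks for the bird computed above.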

The following discusses how we can generalize these intuitive concepts and introduce them to the more abstract framework of vector spaces, to be able to apply them to arbitrary vectors that we might wish to consider – which will allow us to, for instance, assess the distance of two functions.

This concludes our introductory discussion of vector spaces. If you feel like testing your understanding of the concepts discussed thus far, you can take a short quiz found here.

Metric and Norm in a Vector Space

Many basic mathematical concepts are very intuitive; this is especially true for the concept of a metric or distance function. Consider two objects that stand nearby you, and ask yourself what properties you would like the “distance” between these two objects to have. Clearly, the distance should not be below zero, i.e. it should be non-negative, and it should be zero if and only if the objects are in fact in the exact same location (e.g. same building but different level in the maps example). Second, it seems natural that the distance should be the same from object 1 to object 2 as the other way around, i.e. that the distance measure is symmetric. Finally, a third natural requirement is the following: when asked to measure the distance traveled from object 1 to object 2 (i) directly and (ii) while passing by some arbitrarily located object 3, one should hope the outcome from (i) to be no larger than the outcome from (ii). As the following formal definition will show, these three properties are exactly what defines, in the eyes of mathematicians, a distance function.

In the following, we will assume that we consider vectors in some vector space \mathbb X = (X, +, \cdot). Thus, you can assume that vector addition and scalar multiplication are well-defined even when they are not explicitly introduced in definitions.

Definition: Metric and Metric Space.
Let \mathbb{X} = (X, +, \cdot) be a real vector space. Then, a function d: X\times X\mapsto\mathbb R defines a metric on X if it satisfies the following three properties:

\begin{tabular}{cl|r}
 & Condition & Name \\\hline
(i) & $\forall x,y \in X: d(x,y) \geq 0 $, and $d(x,y)=0 \Leftrightarrow x=y$ & non-negativity\\
(ii) & $\forall x,y \in X: d(x,y) = d(y,x)$ & symmetry\\
(iii) & $\forall x,y,z \in X: d(x,y) \leq d(x,z)+d(z,y)$ & triangle inequality\\
\end{tabular}

If d defines a metric on X, we call (\mathbb X, d) a metric space.

 

Note that the metric is defined on the Cartesian product of X with itself, because the metric takes two elements of X and assesses their distance (i.e. the first element must be in X and also the second, and thus the complete input must lie in the Cartesian product)!

The metric is the crudest distance concept that we usually consider. It is crude in the sense that it is based only on a set of minimum requirements, which already rules out a number of erratically behaving functions as distance measures but still leaves a high degree of freedom in how a metric may be defined, and in consequence also some room for properties that may frequently be viewed as inconvenient in applications.

A simplistic example of a metric is the so-called binary metric, defined as d_B(x,y) = \mathds{1}[x\neq y]. It is simply an indicator equal to one if the points considered are not the same; in words, it indicates whether there is a positive distance between the two points or not. As an exercise, verify the three metric properties for this function.
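Before attempting the formal argument, you may find it reassuring to check the three properties numerically on a handful of points. The Python sketch below does this for a few arbitrarily chosen points in \mathbb R^2; such a finite check is of course only a sanity check and does not replace the proof. All names and sample points are illustrative.

```python
from itertools import product

# Sketch: numerical sanity check of the metric properties for the binary metric.

def binary_metric(x, y):
    """Indicator of x != y: 1 if the points differ, 0 otherwise."""
    return 1 if x != y else 0

points = [(0, 0), (1, 2), (3, -1), (1, 2)]  # arbitrary sample, with one repeated point

non_negativity = all(
    binary_metric(x, y) >= 0 and ((binary_metric(x, y) == 0) == (x == y))
    for x, y in product(points, repeat=2)
)
symmetry = all(
    binary_metric(x, y) == binary_metric(y, x)
    for x, y in product(points, repeat=2)
)
triangle_inequality = all(
    binary_metric(x, y) <= binary_metric(x, z) + binary_metric(z, y)
    for x, y, z in product(points, repeat=3)
)
print(non_negativity, symmetry, triangle_inequality)
```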

From the example of the binary metric, it becomes apparent that the metric concept may indeed be too crude to give us what we intuitively want when thinking of a distance: while technically satisfying all requirements of a metric, the binary metric is not helpful in answering “how far” two objects are apart – the answer we get is always only “yes” or “no”.

This means that we need to extend our list of aspects that we desire in a distance measure that makes intuitive sense. First, you would agree that a further natural characteristic of a distance measure is that, when starting from two objects, call their positions x and y, then moving them in the exact same fashion, e.g. \tilde x = x + z, \tilde y = y + z, should also not change the measured distance, right? In terms of a measure d, this means that d(x,y) = d(x+z, y+z). This property is called translation invariance. Note that it is not part of our definition of a metric above, and indeed, it is not ensured to hold for any function that we may call a metric according to this definition.

A further (related but distinct) issue of the metric concept is the one of scaling or “distance from the origin”. Suppose that we are considering some “origin point”; for the \mathbb R^n, this would usually be the zero vector \mathbf 0 = (0,0,\ldots,0)'. The origin point is special because it has no magnitude, that is, it does not extend into any of the vector space’s directions. Typically, we find it practical to think of the length of a vector x as its distance from the origin, i.e. d(x,\mathbf 0). Then, intuitively, when doubling the magnitude (e.g. “zooming in”, if we imagine the \mathbb R^2 as a map) of x, we should double its length, so that d(2\cdot x,\mathbf 0) = 2 d(x,\mathbf 0). Like translation invariance, this is neither part of the definition of a metric nor ensured by it.
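To make the scaling issue concrete, the following Python sketch compares the binary metric with the ordinary Euclidean distance in \mathbb R^2 when a vector is doubled. The vector x = (3, 4) is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math

# Sketch: doubling a vector doubles its Euclidean distance from the origin,
# but leaves its binary-metric distance from the origin unchanged.

def binary_metric(x, y):
    return 1 if x != y else 0

def euclidean_metric(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

origin = (0.0, 0.0)
x = (3.0, 4.0)
two_x = tuple(2 * xi for xi in x)

binary_scales = binary_metric(two_x, origin) == 2 * binary_metric(x, origin)
euclidean_scales = math.isclose(euclidean_metric(two_x, origin),
                                2 * euclidean_metric(x, origin))
print(binary_scales, euclidean_scales)
```

The binary metric fails the scaling requirement here, even though it satisfies all three metric properties.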

These two points motivate the following concept:

Definition: Norm and Normed Vector Space.
Let \mathbb{X} = (X, +, \cdot) be a real vector space. Then, a function \|\cdot\|: X\mapsto\mathbb R defines a norm on X if it satisfies the following three properties:

\begin{tabular}{cl|r}
 & Condition & Name \\\hline
(i) & $\forall x \in X: \|x\| \geq 0 $, and $\|x\|=0 \Leftrightarrow x=\mathbf 0$ & non-negativity\\
(ii) & $\forall x,y \in X: \|x+y\| \leq \|x\| + \|y\|$ & triangle inequality\\
(iii) & $\forall x\in X, \lambda\in\mathbb R: \|\lambda \cdot x\| = |\lambda|\cdot\|x\|$ & absolute homogeneity\\
\end{tabular}

If \|\cdot\| defines a norm on X, we call (\mathbb X, \|\cdot\|) a normed vector space.

 

We can use norms to define distances as follows:

Definition: Norm-induced Metric.
Let (\mathbb X, \|\cdot\|) be a normed vector space. Then, the metric induced by \|\cdot\| is d_N: X\times X\mapsto\mathbb R, (x,y)\mapsto \|x-y\|.

 

It may be a useful exercise for you to verify that the norm-induced metric is, indeed, a metric.


We have to check properties (i)-(iii) in the metric’s definition for the norm-induced metric d_N(x,y) = \|x-y\|. For (i), because \|\cdot\| is a norm, clearly, \forall x,y\in X: d_N(x,y)\geq 0, and d_N(x,y) = 0 \Leftrightarrow \|x-y\| = 0 \Leftrightarrow x-y = \mathbf 0 \Leftrightarrow x=y, where the second equivalence follows from non-negativity of the norm. This establishes (i). Next, let x,y\in X. Then, absolute homogeneity of the norm gives

    \[d_N(x,y) = \|x-y\| = \|(-1)(y-x)\| = |-1|\|y-x\| = d_N(y,x),\]

proving (ii), i.e. symmetry of d_N. Finally, let x,y,z\in X. Then, from the norm triangle inequality, it follows that

    \[d_N(x,z) = \|x-z\| = \|x-y + y-z\| \leq \|x-y\| + \|y-z\| = d_N(x,y) + d_N(y,z),\]

which establishes (iii), the triangle inequality of the norm-induced metric.

 

Make sure that you understand the following distinction conceptually: The norm by itself is not a distance function. Rather, it can be used to define a norm-induced metric which is a more specific sub-concept of the more general metric definition.

The reason we define the norm is that it fixes the two issues of general, unrestricted metrics with respect to a natural distance interpretation that we discussed above, as summarized by the following result:

Theorem: Norm vs. Metric.
Let (\mathbb{X}, \|\cdot\|) be a normed vector space, \mathbb{X} = (X,+,\cdot), and d_N the metric induced by \|\cdot\|. Then, d_N defines a metric on X. Further, d_N exhibits the following extra properties:

\begin{tabular}{cl|r}
 & Property & Name \\\hline
(i) & $\forall x,y \in X$ $\forall \lambda \in \mathbb{R}:$ $d_N(\lambda x, \lambda y)= |\lambda |d_N(x,y)$ & absolute homogeneity\\
(ii) & $\forall x,y,z \in X:$ $d_N(x+z,y+z)=d_N(x,y)$ & translation invariance\\
\end{tabular}

 

In case you want to understand why this is true, note that the theorem directly follows by plugging in the definition of the norm-induced metric, and using absolute homogeneity of the norm for (i).
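The two extra properties can also be illustrated numerically. The Python sketch below builds the metric induced by a norm and checks absolute homogeneity and translation invariance for one arbitrary choice of x, y, z and \lambda. It uses the Euclidean norm on \mathbb R^2 as a concrete example, and all names and numbers are illustrative assumptions.

```python
import math

# Sketch: metric induced by a norm, checked for the two extra properties.

def euclidean_norm(x):
    return math.sqrt(sum(xi ** 2 for xi in x))

def induced_metric(norm):
    """Return the metric d_N(x, y) = norm(x - y)."""
    def d(x, y):
        return norm(tuple(xi - yi for xi, yi in zip(x, y)))
    return d

def add(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(lam, x):
    return tuple(lam * xi for xi in x)

d_N = induced_metric(euclidean_norm)

x, y, z = (1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)
lam = -2.0

homogeneous = math.isclose(d_N(scale(lam, x), scale(lam, y)), abs(lam) * d_N(x, y))
translation_invariant = math.isclose(d_N(add(x, z), add(y, z)), d_N(x, y))
print(d_N(x, y), homogeneous, translation_invariant)
```

Because induced_metric accepts any function as its argument, the same check can be repeated with other norms.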

As we have shown, norm-induced metrics are indeed metrics, and therefore satisfy the intuitive “basic characteristics” of a distance measure as discussed in the introductory paragraph above. Moreover, because of this result, norms are extremely helpful in defining distance functions with a broader range of appealing properties. Indeed, in almost all applications relevant to economists, norm-induced metrics are our go-to way of defining a distance in the mathematical sense.

A further appealing feature of norms is that we can use them (or the metrics induced by them) to define the length of a vector in the desired way: usually, when talking about the length of some vector x, we simply refer to the norm \|x\|. To see why intuitively, recall our decomposition of vectors into magnitude and directionality, x = m_x\cdot d_x, where m_x\in\mathbb R is the magnitude and d_x is the direction vector of the same shape as x. Suppose that for direction vectors d, we have \|d\| = 1 (this is true for the most common norms when we consider the fundamental directions of any \mathbb R^n; any other non-zero direction vector can be rescaled so that its norm is 1). Then, the length is simply the absolute magnitude:

    \[\|x\| = \|m_x\cdot d_x\| = |m_x|\cdot \|d_x\| = |m_x|.\]

The remainder of this subsection is concerned with (i) which norms are natural candidates to consider when defining specific norm-induced metrics, and (ii) which central results help us when handling norms.

We know that norm-induced metrics are a very promising concept for measuring distances in a mathematical way. Still, in practical applications, the general concept does not yet give us sufficient guidance on how we can measure distances – for this, we need specific functions that are norms and can be used to define concrete norm-induced metrics. In the context of the \mathbb R^n, the most commonly used class of functions is the following:

Definition: p-Norm and Euclidean space.
Consider the real vector space (\mathbb R^n, +, \cdot). Then, the p-norm over \mathbb R^n with p\in\mathbb N is the norm \|\cdot\|_p: \mathbb R^n \mapsto\mathbb R, x \mapsto \left (\sum_{k=1}^n|x_k|^p\right )^{1/p}. Moreover, we define \|\cdot\|_\infty: \mathbb R^n \mapsto\mathbb R, x \mapsto\max_{1\leq k\leq n} |x_k| as the maximum norm. When d_N^2 is the metric induced by the 2-norm (“Euclidean norm”), we call ((\mathbb R^n, +, \cdot), d_N^2) the Euclidean space of dimension n.

 

Note that when n=1, i.e. when considering \mathbb R rather than vectors with several elements, all p-norms are simply equal to the absolute value. Indeed, the resulting metric, d(x,y) = |x-y| for x,y\in\mathbb R, is the so-called natural metric of the \mathbb R, and is the common metric used to measure distances between points in \mathbb R. The interested reader may want to verify that the p-norm indeed constitutes a norm. Non-negativity and absolute homogeneity follow directly from the definition. The triangle inequality is immediate for p=1 and for the maximum norm; for p>1, it is known as Minkowski's inequality and requires more work. The classical spaces considered in economics are the vector spaces (\mathbb R^n, +, \cdot) endowed with norm-induced metrics, so that they have all the intuitive properties we are interested in. For instance, the “zig-zag” Manhattan metric discussed earlier corresponds to the metric induced by the 1-norm, and the “direct way” Euclidean metric to the one induced by the 2-norm. Mostly, we are interested in the “direct” or “shortest” distance, so that we consider the Euclidean space.
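To connect the definition with the Mannheim example, here is a small Python sketch of the p-norm and the maximum norm, evaluated at the displacement (3, -4), which is an illustrative way of writing "three blocks one way, four blocks the other". The function names are hypothetical.

```python
import math

# Sketch: p-norm and maximum norm on R^n.

def p_norm(x, p):
    """(sum_k |x_k|^p)^(1/p) for a natural number p."""
    return sum(abs(xk) ** p for xk in x) ** (1.0 / p)

def max_norm(x):
    """Largest absolute element of x."""
    return max(abs(xk) for xk in x)

x = (3.0, -4.0)
one_norm = p_norm(x, 1)   # "zig-zag" length
two_norm = p_norm(x, 2)   # "direct" (Euclidean) length
sup_norm = max_norm(x)
print(one_norm, two_norm, sup_norm)

# For n = 1, every p-norm reduces to the absolute value.
print(p_norm((-2.5,), 1), p_norm((-2.5,), 3))
```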

As a take-away of the discussions thus far, if you define a distance measure (a “metric”) from a norm, you are guaranteed a broad set of appealing, intuitive properties. For the \mathbb R^n, norms are rather easy to come by, and can e.g. be constructed as p-norms. We usually deal with the Euclidean norm, a special p-norm with p=2, because it has an intuitive “direct distance” interpretation in the \mathbb R^2.

Let’s get to the helpful facts related to norms that were promised above. First, we can find a “reversed” version of the triangle inequality:

Proposition: Inverse Triangle Inequality.
Let (\mathbb X, \|\cdot\|) be a normed vector space. Then,

    \[\forall x,y\in X: \|x-y\|\geq | \|x\| - \|y\| |.\]

 

Since this is our first proposition, recall that, as already mentioned, everything labeled a “proposition” is proven in the companion script, and if you are ever interested in digging deeper into some of the presented facts, you can have a look there. Especially students struggling with formality and/or finding the variety of new concepts challenging may benefit from looking into the proofs – most of them are relatively accessible once you have understood the concepts previously discussed, and having studied the proof of a fact is frequently helpful for memorizing and correctly applying the fact itself.

Further, a nice relationship of p-norms that you may want to be aware of is the following:

Proposition: p-Norm and Maximum-Norm.
Consider the vector space (\mathbb R^n, +, \cdot), n\in\mathbb N, and let p<\infty. Then, for any x\in\mathbb R^n,

    \[\|x\|_\infty\leq \|x\|_p \leq n^{1/p}\cdot\|x\|_\infty.\]
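The two-sided bound can be checked numerically. The sketch below draws random vectors (the ranges, the seed and the tolerance `tol` are arbitrary assumptions) and tests both inequalities; this is a sanity check, not a proof.

```python
import random


def p_norm(x, p):
    return sum(abs(x_k) ** p for x_k in x) ** (1.0 / p)


def max_norm(x):
    return max(abs(x_k) for x_k in x)


def sandwich_holds(x, p, tol=1e-9):
    """Check max_norm(x) <= p_norm(x, p) <= n^(1/p) * max_norm(x), up to rounding."""
    n = len(x)
    lower = max_norm(x)
    middle = p_norm(x, p)
    upper = n ** (1.0 / p) * lower
    return lower <= middle + tol and middle <= upper + tol


random.seed(0)  # fixed seed so that the check is reproducible
all_ok = all(
    sandwich_holds([random.uniform(-10, 10) for _ in range(n)], p)
    for n in range(1, 6)
    for p in range(1, 6)
    for _ in range(100)
)
print(all_ok)
```

The lower bound is attained when x has a single non-zero entry, and the upper bound when all entries have the same absolute value.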

 

Open, Closed and Compact Sets

A very important concept is the following:

Definition: \varepsilon-Ball, Neighborhood.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Further, let x_0\in X, and \varepsilon>0. The \varepsilon-open ball or neighborhood B_{\varepsilon}(x_0) centered at x_0 is the set of points whose distance from x_0 is strictly smaller than \varepsilon, that is:

    \[B_{\varepsilon}(x_0)=\left\lbrace x \in X: d(x,x_0) < \varepsilon \right\rbrace.\]

Conversely, the \varepsilon-closed ball \overline{B}_{\varepsilon}(x_0) centered at x_0 is the set of points whose distance from x_0 is not larger than \varepsilon:

    \[ \overline{B}_{\varepsilon}(x_0)=\left\lbrace x \in X: d(x,x_0) \leq \varepsilon \right\rbrace. \]

 

[Figure: the open and the closed \varepsilon-ball around x_0 in \mathbb R^2]
Note that we only call open balls “neighborhoods”. The label “ball” comes from the \mathbb R^2, especially the Euclidean space (where we, recall, use the metric induced by the Euclidean norm). Here, you can check that an \varepsilon-ball around x_0 is merely a circle with radius \varepsilon; you may find it easiest to do so with \varepsilon = 1 (looking up the definition of the unit circle may help). Then, whether the ball is closed or open is just a matter of whether the boundary (defined below) is included in the set, or not.

Recall also that we said that in \mathbb R, all p-norms reduce to the absolute value. Thus, in \mathbb R with its natural metric, that is, the metric induced by the absolute value, d(x,y) = |x-y|, the balls are just intervals around their middle point: B_\varepsilon(x_0) = (x_0-\varepsilon, x_0+\varepsilon) and \bar B_\varepsilon(x_0) = [x_0-\varepsilon, x_0+\varepsilon].
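To make the ball definitions concrete, the following sketch tests membership in open and closed balls of \mathbb R^2 under three norm-induced metrics. All function names and the sample points are hypothetical choices for illustration.

```python
def p_norm(x, p):
    return sum(abs(x_k) ** p for x_k in x) ** (1.0 / p)


def max_norm(x):
    return max(abs(x_k) for x_k in x)


def distance(x, y, norm):
    """Norm-induced metric d(x, y) = ||x - y||."""
    return norm([x_k - y_k for x_k, y_k in zip(x, y)])


def in_open_ball(x, center, eps, norm):
    return distance(x, center, norm) < eps


def in_closed_ball(x, center, eps, norm):
    return distance(x, center, norm) <= eps


def euclid(v):
    return p_norm(v, 2)


def manhattan(v):
    return p_norm(v, 1)


x0 = [0.0, 0.0]
edge = [1.0, 0.0]      # distance exactly 1 from x0 under all three norms
corner = [0.8, 0.8]    # inside the unit ball only under the maximum norm

for name, norm in [("1-norm", manhattan), ("2-norm", euclid), ("max-norm", max_norm)]:
    print(name,
          in_open_ball(edge, x0, 1.0, norm),
          in_closed_ball(edge, x0, 1.0, norm),
          in_open_ball(corner, x0, 1.0, norm))
```

The point `edge` lies in every closed unit ball but in none of the open ones, while `corner` shows that the shape of the "ball" depends on the norm that induces the metric.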

Below, we consider the concepts of interior, boundary and closure points of a set, and set closedness and openness. To develop an intuition for them, which will be the focus of the elaborations below, is rather straightforward using the figure you have just seen. To give you some chance to practice mathematical and formal correctness, the formal definitions of these concepts are also given. But remember, it is all quite intuitive, so don’t freak out about the notation!

In the figure above, you can immediately imagine what we mean by the “interior” and the “boundary” of a ball, right? As we did with addition and scalar multiplication previously, we now extend these concepts to general metric spaces, which allows us to generalize this graphical intuition to more abstract scenarios that we cannot sketch. But the intuition that we will use is precisely the one from the \mathbb R^2: an interior point should have only other interior points “very close” to it, and no matter how “close” we move to a boundary point, it will always be surrounded by points that do and do not belong to the set.

Definition: Interior Point, Interior.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, a\in A is said to be an interior point of A if there exists \varepsilon>0 such that the \varepsilon-open ball centered at a lies entirely inside of A, i.e. \exists\varepsilon>0:B_\varepsilon(a)\subseteq A. The set of interior points of A is called the interior of A, denoted int(A) or \mathring{A}, i.e. int(A) = \{a\in A: (\exists\varepsilon>0:B_\varepsilon(a)\subseteq A)\}.

 

As you have seen, the open ball includes only interior points, i.e. the set itself is in fact equal to its interior! Accordingly, we use the interior concept to define what we mean by openness of a set more generally:

Definition: Open Set.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, A is said to be an open set if A=int(A).

 

Note that trivially, int(A)\subseteq A. Hence, any set A contains its interior, but the converse is true if and only if A is open. Indeed, this is the key take-away also for proofs that seek to establish openness of a set A: it suffices to check that any point a\in A is also contained in int(A)!

The boundary, on the other hand, corresponds to the circle separating the open ball from the points that are not included and lie outside of it in our figure. Accordingly, for a more general set A, we can view it as the line between the interior of A and the points that lie outside A.

Definition: Boundary Point, Boundary.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, x\in X is said to be a boundary point of A if, for every \varepsilon>0, the \varepsilon-open ball centered at x contains both points that belong to A and ones that do not, i.e. \forall\varepsilon>0: (B_\varepsilon(x)\cap A \neq \varnothing \land B_\varepsilon(x)\cap (X\backslash A)\neq \varnothing). The set of boundary points of A is called the boundary of A and denoted \partial{A}, i.e. \partial{A} = \{x\in X: (\forall\varepsilon>0: (B_\varepsilon(x)\cap A \neq \varnothing \land B_\varepsilon(x)\cap (X\backslash A)\neq \varnothing))\}.

 

The closed ball as shown in the figure, beyond its interior points, contains also all points on its boundary. Thus, for a point to be included in the closed ball, it must be either an interior point or a boundary point. We will summarize these two types of points as “closure points” to more easily talk about closed sets:

Definition: Closure Point, Closure.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, x\in X is said to be a closure point of A if, for every \varepsilon>0, the \varepsilon-open ball centered at x contains at least one point a that belongs to A, i.e. \forall \varepsilon > 0 \exists a\in B_\varepsilon(x): a\in A. The set of closure points of A is called the closure of A, denoted \overline{A}, i.e. \overline{A} = \{x\in X: (\forall \varepsilon > 0 \exists a\in B_\varepsilon(x): a\in A)\}.

 

As stated above, verbally, a closure point is either an interior point or a boundary point of the set. To see that the somewhat unwieldy definition of closure points as given here is indeed equivalent to this, you can read it in the following way: closure points are either elements of A, or they lie outside of A but “touch” A in the sense that no matter how small a ball we choose around them, it still contains elements of A. Graphically, the former type of points corresponds to interior points of A, while the latter type corresponds to the boundary.

According to our discussion above, we will call a set closed if it contains all closure points:

Definition: Closed Set.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, A is said to be a closed set if A=\overline{A}.

 

For establishing closedness, note that any set A is included in its closure, but the converse is true if and only if A is closed. Transferring the intuition of the illustrating figure above more directly, we now can characterize the boundary as a set of elements such that, if they all belong to A, then A is closed, and, if none of them belong to A, then A is open.

For our usual metric space (\mathbb{X}, d) and A\subseteq X, we may now rephrase our concepts of open and closed sets as follows:

    • A is open if and only if none of the boundary points of A lie in A: A \cap \partial{A}= \varnothing.
    • A is closed if and only if all the boundary points of A lie in A: A \cap \partial{A}= \partial{A}.

Note that a set may be neither open nor closed, namely, if only some of its boundary points lie in the set! As an example, consider the half-open interval [0,1):=\{x \in \mathbb{R}: 0 \leq x <1\}: its boundary points are 0 and 1, of which only 0 lies in the set. Therefore, make sure to remember that when it comes to closed and open, a set need not always be one and not the other!
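The boundary characterization of openness and closedness can be sketched for bounded intervals in \mathbb R with endpoints a < b, whose boundary consists of exactly the two endpoints. The function name `classify_interval` and its flag arguments are hypothetical and only serve this illustration.

```python
def classify_interval(left_included, right_included):
    """Classify a bounded interval with endpoints a < b in R.

    The boundary of such an interval is {a, b}; the two flags state
    whether each endpoint belongs to the interval.
    """
    boundary_in_set = [left_included, right_included]
    is_open = not any(boundary_in_set)   # no boundary point lies in the set
    is_closed = all(boundary_in_set)     # every boundary point lies in the set
    return is_open, is_closed


print(classify_interval(False, False))  # (a, b)
print(classify_interval(True, True))    # [a, b]
print(classify_interval(True, False))   # [a, b), e.g. [0, 1)
```

The half-open case returns `False` for both properties, because exactly one of the two boundary points belongs to the set.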

Now we have an idea of what closed and open sets and balls are, and it will soon become evident that they are very useful when studying functions and characterizing their behavior and properties. However, when given a specific set (e.g. think about the budget set B(p,y) = \{x = (x_1, x_2)' \in\mathbb R^2_+: p_1 x_1 + p_2 x_2 \leq y\} with income y and prices p = (p_1,p_2)), it is typically not immediately clear whether it is open, closed, or neither, and directly applying the definitions above may be cumbersome. To overcome this issue, there is a wide range of results providing equivalent and sufficient conditions for openness and closedness of sets. Below, you can find the ones that you should be familiar with, as those are the ones used most frequently, at least in the economics context.

Theorem: Properties of Open Sets.
In a metric space (\mathbb{X}, d),
(i) \varnothing and X are open in \mathbb{X}.
(ii) A set A\subseteq X is open if and only if its complement A^c = X\backslash A is closed.
(iii) The union of an arbitrary (possibly infinite) collection of open sets is open.
(iv) The intersection of a finite collection of open sets is open.

 

Theorem: Properties of Closed Sets.
In a metric space (\mathbb{X}, d),
(i) \varnothing and X are closed in \mathbb{X}.
(ii) A set A\subseteq X is closed if and only if its complement A^c = X\backslash A is open.
(iii) The union of a finite collection of closed sets is closed.
(iv) The intersection of an arbitrary (possibly infinite) collection of closed sets is closed.

 

Two more theorems might be helpful at times to establish closedness:

Theorem: Weak Inequalities and the Limit: Functions.
Suppose that \mathbb{X} = (X, +, \cdot) is a real vector space, f:X\mapsto\mathbb R and g: X\mapsto\mathbb R so that \forall x\in X: f(x) \leq g(x) (in function notation, we would write f\leq g). Let x_0\in X, and suppose that \exists f_0, g_0\in \mathbb R so that \lim_{x\to x_0} f(x) = f_0, \lim_{x\to x_0} g(x) = g_0. Then, it holds that f_0\leq g_0.

 

The theorem exists for sequences in an analogous way:

Theorem: Weak Inequalities and the Limit: Sequences.
Suppose that \mathbb{X} = (X, +, \cdot) is a real vector space. Let \{x_n\}_{n\in\mathbb N} and \{y_n\}_{n\in\mathbb N} be convergent sequences over X, i.e. \forall n\in\mathbb N: x_n,y_n\in X, with limits x\in X and y\in X, respectively. If \forall n\in\mathbb N, it holds that x_n\leq y_n, then, we also have x\leq y.

 

This is extremely useful for the context of set closedness and openness because:

Theorem: Closedness and Sequences.
Suppose that \mathbb{X} = (X, +, \cdot) is a real vector space, and let B\subseteq X. Then, B is closed if and only if, for any convergent sequence \{x_n\}_{n\in \mathbb N} over B, i.e. \forall n\in\mathbb N: x_n\in B, it holds that \lim_{n\to\infty} x_n \in B.

 

To see the intuition, consider again the figure illustrating balls in the \mathbb R^2. As n\to\infty, convergent sequences \{x_n\}_{n\in \mathbb N} are restricted to an ever smaller ball around the limit point x = \lim_{n\to\infty} x_n. Therefore, if the sequence is over the set B, it will reduce to an ever smaller ball “in proximity to” points of B, if not in B, which is the precise definition of the closure. This means that the limit point lies either in the interior or on the boundary. However, for the point x to be certainly included in the set, next to all interior points, any boundary point must be contained in the set, which precisely describes closedness!

To see how the last theorem can be used to simplify investigations of closedness, consider the budget set with income y and prices p = (p_1,p_2):

    \[\begin{split} B(p,y) &= \{x = (x_1, x_2)' \in\mathbb R^2_+: p_1 x_1 + p_2 x_2 \leq y\} \\&=\{x\in\mathbb R^2_+: p\cdot x \leq y\}.\end{split}\]

We are yet to discuss convergence and continuity for higher-dimensional functions formally, so let us just discuss the intuition here: consider an arbitrary sequence \{x_n\}_{n\in \mathbb N} over B(p,y) that converges to some point x\in\mathbb R^2. We will see that the dot product is a continuous function (intuitively, it is a generalization of multiplication, which is a continuous operation). Hence, we can pull the limit in, so that

    \[\lim_{n\to\infty} p\cdot x_n = p\cdot \lim_{n\to\infty} x_n = p\cdot x.\]

Because the sequence is over B(p,y), by its definition, it holds that

    \[\forall n\in\mathbb N:\hspace{0.5cm}p\cdot x_n \leq y.\]

Applying the theorem on weak inequalities and the limit:

    \[p\cdot x = \lim_{n\to\infty} p\cdot x_n \leq \lim_{n\to\infty} y = y.\]

Therefore, x is also a point in the budget set! Because we have started from an arbitrary, convergent sequence over the budget set, the last theorem allows us to conclude that the budget set is closed!
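The argument can be illustrated numerically with one particular sequence. The prices, the income and the sequence below are hypothetical values chosen for this sketch; a single sequence illustrates the reasoning but does not replace the general argument.

```python
def spending(x, p):
    """Dot product p . x, i.e. total spending on the bundle x."""
    return sum(p_i * x_i for p_i, x_i in zip(p, x))


def in_budget_set(x, p, y, tol=1e-12):
    """Check x >= 0 componentwise and p . x <= y (up to a rounding tolerance)."""
    return all(x_i >= 0 for x_i in x) and spending(x, p) <= y + tol


p = [2.0, 1.0]   # hypothetical prices
y = 10.0         # hypothetical income

# x_n = (3 - 1/n, 4 - 1/n) spends 10 - 3/n < 10 and converges to (3, 4).
sequence = [[3.0 - 1.0 / n, 4.0 - 1.0 / n] for n in range(1, 1001)]
limit = [3.0, 4.0]   # spends exactly 2*3 + 1*4 = 10

all_members = all(in_budget_set(x_n, p, y) for x_n in sequence)
limit_member = in_budget_set(limit, p, y)

# With a strict inequality p . x < y, the same limit would drop out of the set.
limit_in_strict_set = spending(limit, p) < y

print(all_members, limit_member, limit_in_strict_set)
```

Every element of the sequence and its limit satisfy the weak inequality, whereas the limit violates the strict version, which is why the set defined by the strict inequality is not closed.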

Indeed, we can take this way of thinking about sets characterized by inequalities one step further: whenever we have weak inequalities and a continuous function of x, the set will be closed. For strict inequalities, we can also say something: note that the complement set will be characterized by a weak inequality (e.g. \{x\in\mathbb R^2: x_1<0\}^c =\{x\in\mathbb R^2: x_1\geq 0\}) and is therefore closed. Hence, the set characterized by the strict inequality will be open! Finally, for a set characterized by an equality, the complement can be split into the union of two sets characterized by a strict inequality: e.g.

    \[\{x\in\mathbb R^2: x_1=0\}^c =\{x\in\mathbb R^2: x_1 \neq 0\} = \{x\in\mathbb R^2: x_1 > 0\} \cup \{x\in\mathbb R^2: x_1 < 0\}.\]

Because the RHS sets are characterized by strict inequalities, they are open, and the complement of the set characterized by the equality, as a union of two open sets, is open! As such, the set itself is a closed set. To summarize:

Theorem: Closedness, Openness and Inequalities.
Suppose that \mathbb{X} = (X, +, \cdot) is a real vector space, and let B\subseteq X. Further, let f:X\mapsto\mathbb R be a continuous function, and consider a threshold value c\in\mathbb R. Then,

    • if B = \{x\in X: f(x) \leq c\} or B = \{x\in X: f(x) \geq c\}, then B is closed.
    • if B = \{x\in X: f(x) < c\} or B = \{x\in X: f(x) > c\}, then B is open.
    • if B = \{x\in X: f(x) = c\}, then B is closed.

 

This yields that the budget set is closed also when we impose that all money is spent, i.e. p\cdot x = y. Be cautious that this theorem can only be applied if the function f is continuous, which has to be verified in a first step.

A further important property of sets in metric spaces is the one of boundedness. It is defined as follows:

Definition: Bounded Set.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, A is said to be a bounded set if it is contained in an open ball of finite radius r, i.e.

    \[\exists x_0\in X\exists r: 0\leq r<\infty: A\subseteq B_r(x_0).\]

 

Verbally, the set is bounded if the distance between any two points in the set can not get arbitrarily large, but rather, it is bounded by some finite threshold value. Indeed, the definition above is equivalent to

    \[\exists r^*: 0\leq r^*<\infty: (\forall a_1, a_2\in A: d(a_1,a_2)<r^*),\]

which more explicitly highlights this interpretation. The intuition is that within a ball of finite radius r, no two points in the ball can lie further apart than the diameter 2r. As the bounded set A is contained in the ball, i.e. it is a subset of it, points in A also lie in the ball, and their distance is bounded by r^* = 2r. As usual, should you be interested in the formalities behind the equivalence, you can find them in the companion script.

You will shortly see why boundedness is a very useful property. But first, let us turn to how we can establish it. You can either verify one of the two equivalent definitions above directly, or, should you be working with a (p-)norm-induced metric, as we almost always will, you can refer to the following more easily checked result:

Proposition: Checking Boundedness with a Norm-induced Metric.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space such that d is norm-induced, i.e. for x,y\in X, d(x,y) = \|x-y\|. Let A\subseteq X. Then, A is bounded if the norm is bounded on A, i.e. \exists b<\infty:(\forall x\in A: \|x\|\leq b).

 

This proposition is very important and in fact, it is easily established using the triangle inequality of the metric. For any x,y\in A, the triangle inequality of the metric gives

    \[d(x,y) \leq d(x,0) + d(y,0) = \|x\| + \|y\|.\]

If the norm is bounded by b<\infty on A, then d(x,y)\leq 2b. Thus, for an arbitrary x_0\in A, there exists r = 2b + 1 so that \forall x\in A: d(x,x_0)<r, that is, A\subseteq B_{r}(x_0), which is precisely what we require in the definition of boundedness that we have seen above. Thus, the set A is bounded.

The last key concept discussed in this section is compactness. Don’t worry about the definition, which is rather abstract and stated here only for completeness; the intuition and how we investigate compactness, as discussed below, are far more important.

Definition: Compact Set.
Let \mathbb X = (X, +, \cdot) be a real vector space and (\mathbb{X}, d) be a metric space. Let A\subseteq X. Then, A is said to be compact if every open covering \{U_i\}_{i\in I} with index set I, i.e. \{U_i\}_{i\in I} such that U_i is open \forall i\in I and \bigcup_{i\in I} U_i \supseteq A, has a finite subcovering, i.e. \exists I^*\subseteq I such that I^* contains finitely many elements, and \bigcup_{i\in I^*} U_i \supseteq A.

 

For the \mathbb R^n, the following equivalence holds:

Theorem: Heine-Borel.
Consider the metric space (\mathbb R^n, d), where d is induced by a p-norm, and let A\subseteq \mathbb R^n. Then, A is compact if and only if A is closed and bounded.

 

Indeed, almost all compactness proofs in economics use Heine-Borel’s theorem, so that when asked to show compactness, it is the starting point for you. To apply it, one separately shows closedness and boundedness.

To see the value of compact sets, consider the \mathbb R, where intervals [a,b] are a special form of closed and bounded and thus compact sets. Clearly, any continuous function f defined on the whole interval will assume a maximum and minimum on such a set, either in the interior (a,b), or otherwise at a or b (this is precisely the Weierstrass Extreme Value Theorem that you may remember from chapter 0)! As we will see, similar reasoning applies to more general spaces, and compact sets are a powerful concept for functional analysis and optimization.

To get some feeling for boundedness and compactness, let us re-consider the budget set. If we can show that it is compact, we know that when we look at a continuous, but otherwise unrestricted utility function, there will always be a utility-maximizing consumption bundle given the budget constraint because of the Weierstrass Extreme Value Theorem! This would be quite a general and powerful result, and because we have already shown closedness of the budget set, by the Heine-Borel Theorem, all we need to worry about for this is boundedness of the budget set.

Let us first think about this issue intuitively. When can the distance between two possible consumption bundles x^{(1)} and x^{(2)} in the budget set

    \[B(p,y) = \{x = (x_1, x_2)' \in\mathbb R^2_+: p_1 x_1 + p_2 x_2 \leq y\}\]

get arbitrarily large? For this, we would either have to move the consumption of the first or the second good (or both) in the two consumption bundles infinitely far apart from each other. Because consumption can not be negative for either good, this effectively means that in one consumption bundle, we would have to consume infinitely much of one of the goods. However, this is only possible if the price of the associated good is zero (assuming prices can not be negative), else, infinite consumption is not possible with a finite budget y<\infty.

Indeed, we can formally show that for strictly positive prices p_1>0, p_2>0, the budget set is bounded! To make life simpler, let’s assume that we’re dealing with a p-norm-induced metric (for instance our usual Euclidean metric). Then, using our boundedness proposition above, what we need to show is that the norm we use is bounded on B(p,y), i.e. that there exists some threshold b\in\mathbb R so that \|x\|\leq b for any x\in B(p,y).

Without loss of generality, assume p_1\geq p_2, that is, assume that the goods are labeled in a way that the least expensive one is the second good – if this is not the case, we would re-label the goods and then proceed the same as we do now (that is what “without loss of generality” means: we make an assumption for analytical simplicity that does in no way restrict the generality of the obtained result). Using our proposition on the p-norm and the maximum norm at the first inequality, note that for x\in B(p,y),

    \[\begin{split}\|x\| \leq 2^{1/p}\|x\|_\infty &= 2^{1/p} \max\{x_1, x_2\} = \frac{2^{1/p}}{p_2} \max\{p_2x_1, p_2x_2\} \\&\leq \frac{2^{1/p}}{p_2} (p_2x_1 + p_2x_2) \leq \frac{2^{1/p}}{p_2} (p_1x_1 + p_2x_2) \\&\leq \frac{2^{1/p}}{p_2} y =: b. \end{split}\]

This yields the conclusion that the budget set of the \mathbb R^2 with strictly positive prices is bounded and also closed, as we have shown earlier. Therefore, by Heine-Borel’s theorem, it is compact, a result which is very important for optimization. Indeed, also for budget sets in the \mathbb R^n, we can proceed in an analogous way as we have done here to establish:

Proposition: Compactness of the Budget Set.
Consider the budget set

    \[B(p,y) = \{x \in\mathbb R^n_+: p_1 x_1 + p_2 x_2 + \ldots + p_n x_n \leq y\}.\]

Then, B(p,y) is closed. If \min_{i\in\{1,\ldots,n\}} p_i > 0, then B(p,y) is also bounded and therefore compact.

 

Note that closedness follows directly from the result on sets characterized by inequalities. For boundedness, we just would have to modify the line of reasoning above slightly, assuming without loss of generality that p_n is the smallest price. Should you be motivated, go ahead and try to come up with this more general argument.


For x\in B(p,y),

    \[\begin{split}\|x\| \leq n^{1/p}\|x\|_\infty &= n^{1/p} \max\{x_i: i\in\{1,\ldots,n\}\} = \frac{n^{1/p}}{p_n} \max\{p_nx_i: i\in\{1,\ldots,n\}\} \\&\leq \frac{n^{1/p}}{p_n} \sum_{i=1}^n p_nx_i \leq \frac{n^{1/p}}{p_n} \sum_{i=1}^n p_ix_i \\&\leq \frac{n^{1/p}}{p_n} y =: b. \end{split}\]
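The bound b can be sanity-checked by sampling bundles from a budget set. In the sketch below, the norm exponent is called `q` to keep it apart from the price vector; the prices, the income, the seed and the sampling scheme are all assumptions made for illustration.

```python
import random


def p_norm(x, q):
    return sum(abs(x_k) ** q for x_k in x) ** (1.0 / q)


def norm_bound(prices, income, q):
    """Bound b = n^(1/q) * income / (smallest price) from the derivation above."""
    n = len(prices)
    return n ** (1.0 / q) * income / min(prices)


def random_bundle(prices, income):
    """Draw a bundle in the budget set by splitting a random share of income across goods."""
    weights = [random.random() for _ in prices]
    total = sum(weights)
    spent = random.random() * income             # total spending, at most income
    return [spent * w / total / p_i for w, p_i in zip(weights, prices)]


random.seed(1)                                   # fixed seed for reproducibility
prices = [2.0, 0.5, 1.5]                         # hypothetical strictly positive prices
income = 10.0                                    # hypothetical income
q = 2                                            # Euclidean norm

b = norm_bound(prices, income, q)
bundles = [random_bundle(prices, income) for _ in range(10000)]
all_within = all(p_norm(x, q) <= b + 1e-9 for x in bundles)
print(b, all_within)
```

The bound is not tight in general; it only needs to be finite, which is exactly where strict positivity of the smallest price enters.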

 

Continuity and Convergence

If you remember the preliminary chapter, using the limit concept for real-valued functions, we associated continuity with the requirement that as two points become ever closer, their images should not be too far apart from one another. Now that we know how to mathematically handle distances more generally, it is time to formalize and generalize the continuity concept, which is the purpose of this section.

Start again from the \mathbb R and a function f:X\mapsto Y with X,Y\subseteq\mathbb R, where we call f_a\in\mathbb R the limit of f at a\in\mathbb R, if

    \[\forall\varepsilon>0\exists\delta>0:(\forall x\in X:(|x-a|\in(0,\delta)\Rightarrow |f(x)-f_a|<\varepsilon)).\]

The continuity requirement for f at x_0\in X, f(x_0) = \lim_{x\to x_0} f(x) can be written as

    \[\forall\varepsilon>0\exists\delta>0:(\forall x\in X:(|x-x_0|<\delta\Rightarrow |f(x)-f(x_0)|<\varepsilon)).\]

Now, recall that the common metrics that we use for the \mathbb R^n are p-norm-induced, and that for the \mathbb R, any p-norm is equal to the absolute value, and therefore, we commonly use the so-called natural metric of the real line, d(x,y) = |x-y| for x,y\in\mathbb R. Then, the definition of continuity at x_0\in X is equivalent to

    \[\forall\varepsilon>0\exists\delta>0:(\forall x\in X:(d(x,x_0)<\delta\Rightarrow d(f(x),f(x_0))<\varepsilon)).\]

This step is indeed all that is necessary to generalize the continuity concept to arbitrary metric spaces:

Definition: Continuous Function.
Let (\mathbb{X} , d_{X}) and (\mathbb{Y} , d_{Y}) be metric spaces based on the sets X and Y, respectively. Then, a function f:X\mapsto Y is continuous at x_0 \in X if for every \varepsilon>0, there exists a \delta>0 such that the image of the \delta-open ball around x_0 is contained in the \varepsilon-open ball around f(x_0), i.e.

    \[\forall\varepsilon>0\exists\delta>0:(\forall x\in X:(d_X(x,x_0)<\delta\Rightarrow d_Y(f(x),f(x_0))<\varepsilon)).\]

A function that is continuous at every point of its domain is said to be continuous.

 

Be sure to understand how the statement in quantifiers relates to the verbal statement referring to the open balls. Note also that a function can not be continuous at x_0 if in any \delta-open ball around x_0, there is a point at which f is not defined!

The definition uses two metrics, d_X and d_Y. This is because the distance of x and x_0, two points in the domain X of f, needs to be assessed by a metric defined on X\times X, while f(x) and f(x_0) are points in the codomain Y of f, and assessing their distance requires a metric defined on Y\times Y. Since we intend to be as general as possible and do not want to restrict domain and codomain to be identical here, we have to make sure to use one metric for the domain, d_X, and one for the codomain, d_Y.

Similarly to continuity, we can also straightforwardly generalize convergence of sequences in metric spaces: recall that in \mathbb R, x is the limit of a sequence \{x_n\}_{n\in\mathbb N} if

    \[\forall \varepsilon > 0 \exists N\in\mathbb N: (\forall n\in\mathbb N: (n\geq N \Rightarrow |x_n-x|<\varepsilon)).\]

Using again the natural metric of \mathbb R, the condition in brackets can be equivalently written as \forall n\in\mathbb N: (n\geq N \Rightarrow d(x_n,x)<\varepsilon). Exploiting this intuition, we can define convergence of sequences more generally:

Definition: Convergent Sequence.
Let (\mathbb{X} , d) be a metric space based on the set X, and let \mathbf x:= \{x_n\}_{n\in\mathbb N} be a sequence over X, i.e. \forall n\in\mathbb N: x_n\in X. Then, \mathbf x is said to be convergent if

    \[\exists x\in X: (\forall \varepsilon > 0 \exists N\in\mathbb N: (\forall n\in\mathbb N: (n\geq N \Rightarrow d(x_n,x)<\varepsilon))).\]

If \mathbf x is convergent, the point x satisfying this condition is called the limit of \mathbf x, denoted x = \lim_{n\to\infty} x_n.
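The role of the index N in this definition can be seen in a small sketch. The sequence x_n = (1/n, 1 - 1/n) in \mathbb R^2 with limit (0,1) under the Euclidean metric is a hypothetical example; here d(x_n, x) = \sqrt{2}/n, which decreases in n.

```python
import math


def x(n):
    """n-th element of the illustrative sequence in R^2."""
    return (1.0 / n, 1.0 - 1.0 / n)


limit = (0.0, 1.0)


def euclidean_distance(a, b):
    return math.sqrt(sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b)))


def first_index_within(eps, max_n=10**6):
    """Smallest N with d(x_N, limit) < eps, or None if none is found up to max_n.

    Because d(x_n, limit) = sqrt(2)/n is decreasing in n, every n >= N
    then satisfies the condition as well.
    """
    for n in range(1, max_n + 1):
        if euclidean_distance(x(n), limit) < eps:
            return n
    return None


for eps in (0.1, 0.01, 0.001):
    print(eps, first_index_within(eps))
```

A smaller \varepsilon requires a larger N, but for every \varepsilon>0 some N exists, which is what convergence demands.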

 

To conclude this section, let us consider one last theorem that is very helpful for investigations into continuity that try to avoid dealing with the direct, formal definition. It combines the concepts of limits and continuity, and is perhaps the most important tool to disprove continuity, so you can gain a lot from being familiar with it.

Theorem: Sequence Characterization of Continuity.
Let (\mathbb{X} , d_{X}) and (\mathbb{Y} , d_{Y}) be metric spaces based on the sets X and Y, respectively. Then, the function f:X\mapsto Y is continuous at x_0 \in X if and only if for every sequence \mathbf x:= \{x_n\}_{n\in\mathbb N} over X with \lim_{n\to\infty} x_n = x_0, it holds that f(x_0) = \lim_{n\to\infty} f(x_n).

 

Thus, to establish that f is not continuous at x_0, it suffices to find a sequence \mathbf x:= \{x_n\}_{n\in\mathbb N} so that \lim_{n\to\infty} x_n = x_0 and either \lim_{n\to\infty} f(x_n) does not exist, or it does but \lim_{n\to\infty} f(x_n) \neq f(x_0).
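As a sketch of this disproof strategy, take the hypothetical step function f with f(x) = 0 for x < 0 and f(x) = 1 for x \geq 0, and the sequence x_n = -1/n, which converges to x_0 = 0. The code only inspects finitely many terms, so it illustrates the argument rather than proving it.

```python
def f(x):
    """Step function: 0 for x < 0 and 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


x0 = 0.0
sequence = [-1.0 / n for n in range(1, 10001)]   # x_n = -1/n converges to 0
images = [f(x_n) for x_n in sequence]            # f(x_n) = 0 for every n

print(abs(sequence[-1] - x0))   # distance of the last computed term to x0
print(images[-1], f(x0))        # image of that term next to f(x0)
```

Since every x_n is negative, f(x_n) = 0 for all n, so \lim_{n\to\infty} f(x_n) = 0 \neq 1 = f(0), and f is not continuous at 0.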

Further, the sequence characterization gives us that the very useful tool of “pulling limits into continuous functions” works also for general functions as we have considered them here: simply plugging x_0 = \lim_{n\to\infty} x_n into the theorem above, we get for any continuous function f and any sequence \{x_n\}_{n\in\mathbb N} over X with limit x_0\in X:

    \[ \lim_{n\to\infty} f(x_n) = f(x_0) = f\left (\lim_{n\to\infty} x_n\right ). \]

This concludes our investigations into vector spaces here. To test how well you understood the second half of the discussions on vector spaces, you can do a short quiz found here.