An overview of this chapter’s contents and take-aways can be found here.
Broadly, Chapters 0 and 1 have covered topics of fundamental concern to all mathematical disciplines, while the elaborations of Chapter 1 were already more closely linked to what mathematicians call linear algebra, that is, the study of linear operations such as scalar multiplication and vector addition. Subsequently, Chapter 2 has discussed a central building block of linear algebra: characterizing and solving systems of linear equations. For the rest of this course, we want to move away from linear algebra and instead consider key issues in mathematical analysis, where we are concerned with analyzing mathematical objects, especially functions and related equations, and investigate whether they are continuous, differentiable, invertible, have maxima and minima, and much more. This chapter deals with what is perhaps one of, if not the, most central building blocks in analysis: differentiation and integration of "general" functions, that is, functions mapping between vector spaces, without further restrictions on domain and co-domain.
As always, let us first consider why we as economists should bother with the concepts discussed here. While you certainly know how to e.g. take derivatives of functions mapping from $\mathbb{R}$ to $\mathbb{R}$, being familiar with more general methods of (functional) analysis is invaluable because the typical functions we consider have more than one argument. To give a highly non-exhaustive list, you may think about utility derived from a vector of goods (quantities), the welfare given quantities of a private and a public good, or production cost with multiple inputs. When thinking about optimization of these quantities, we cannot proceed without the insights from multivariate calculus.
Even more fundamentally, in economic problems we rarely care about the choice variables (e.g. quantities of goods consumed) directly, but rather about the outcome they produce (e.g. utility derived from consumption); therefore, there is a function mapping choice into outcome in the background of almost every economic consideration. Thus, it should be clear that functional analysis is a key competence an economist should acquire.
Of the tools of functional analysis, differentiation and integration are especially important for the economist (but of course not only those). Beyond the elaborations above, this is because a number of quantities of fundamental concern to the economist are derived directly from these concepts: consider, for instance, the price of a good in a perfectly competitive market: it corresponds to the demand side's marginal willingness-to-pay and the marginal cost of production for the firm at the last good traded. This is especially relevant when there are multiple markets to consider and consumers and/or firms are active on more than one of them (e.g. households consuming both apples and bananas, and a fruit company producing both these products), so that the marginal quantities of interest are derivatives of functions with several arguments. Also, a social planner will always care about the aggregate welfare in an economy, which has to be computed by integrating over all individual households.
As stated initially, the functions that we study here map between vector spaces. There is some related terminology that one should be familiar with before moving on, so let us turn to it here in a first step. Recall from Chapter 0 that we write a function as
This statement is a complete summary of all of ‘s relevant aspects: the domain
, the codomain
and the rule
that associates elements in the domain with the respective element in the codomain. Depending on what sets the domain and codomain come from, we attach different labels to
. The most important ones are:
To familiarize yourself with these expressions, think about the domain and codomain of
and what labels we attach to
Let us begin with some basics. Here, we will have a brief discussion of invertibility, a concept which translates in a one-to-one fashion from univariate real-valued functions, and then consider the very important concepts of convexity and concavity.
Recall from our introductory discussion of functions in Chapter 0 that we could invert a function $f: X \to Y$ if and only if for any $y \in Y$, there exists exactly one $x \in X$ such that $f(x) = y$. This is the case because then, and only then, can we identify a unique element $x \in X$ that is mapped onto $y$ for any $y \in Y$ — recall that when considering $f^{-1}: Y \to X$ as a candidate for the inverse function, we require $f^{-1}$ to be defined everywhere on $Y$, i.e. for all $y \in Y$, and by definition of $f^{-1}$ as a function, $f^{-1}$ must take exactly one value $x \in X$ for any $y \in Y$, rather than multiple values or no value at all. An alternative way to think about this is that in the case that for any $y \in Y$, there exists exactly one $x \in X$ such that $f(x) = y$, knowing the $y$ is equivalent to knowing the $x$ in terms of identifying the candidate pair $(x, y)$ one considers, and from the rule that associates $x$'s with $y$'s, by knowing the pair $(x, y)$ in the graph of $f$ associated with every $y$, we can define the rule that associates $y$'s with $x$'s to yield the same pairs $(x, y)$ as $f$. To see these abstract elaborations graphically, let's consider two examples of functions where $X$ and $Y$ contain only three elements.
Here, the function is represented by the blue arrows. As the green dotted boxes indicate, there are well-defined pairs $(x, y)$, every one of which can be identified by knowing either the $x$- or the $y$-value. Hence, we can invert the rule of $f$ and define the inverse function as the rule associating $y$'s with $x$'s to obtain the same pairs as we do under $f$, as indicated by the green arrows.
Here, again, the function is represented by the blue arrows. However, now we have two sources of ambiguity in the attempt to invert the mapping: first, one value in the codomain is associated with two $x$-values. By the definition of a function, we can, however, map it only onto one $x$-value when considering a mapping from $Y$ to $X$, so that it is not possible to define a value of the inverse function at this point. Secondly, another value in the codomain is associated with no $x$-value at all, and there is no candidate value for the inverse function to take at this point.
What should be especially clear from these examples is that the conditions under which the inverse function exists or does not, respectively, are not specific to univariate, real-valued functions, but generally refer to any function with arbitrary domain $X$ and codomain $Y$.
We can define invertibility elegantly using the concepts of injectivity and surjectivity.
Definition: Surjective Function.
Let $f: X \to Y$ for some sets $X$ and $Y$. Then, $f$ is said to be surjective if $f(X) = Y$, i.e. for every $y$ in the codomain of $f$, there exists at least one element $x$ in the domain that is mapped onto it.
Surjectivity rules out issues like the second one we faced in the example above. Note that next to the mapping rule of $f$ ($x \mapsto f(x)$), surjectivity crucially depends on the set $Y$ we choose to define $f$. Consider e.g. $f(x) = x^2$ where $x \in \mathbb{R}$. Is $f$ surjective? It depends! If we define $f: \mathbb{R} \to \mathbb{R}$, i.e. $Y = \mathbb{R}$, then any $y < 0$ does not have an $x$ for which $f(x) = y$, so that $f$ is not surjective. On the other hand, if we set $Y = [0, \infty)$, then for any $y \in Y$ there exists $x = \sqrt{y}$ with $f(x) = y$, and $f$ is surjective! This principle holds true more generally: given the domain $X$ that we consider, we can simply define $Y = f(X)$ to "throw out" the values not mapped onto by $f$ and ensure surjectivity (of course, in defining the inverse function, we must then pay special attention to restricting its domain to $f(X)$). As we have seen, surjectivity is the first requirement satisfaction of which tells us that we can find elements in $X$ to map the $y \in Y$ onto when contemplating existence of the inverse function. Now, we just have to know whether the element in $X$ is unique — enter injectivity:
Definition: Injective Function.
Let $f: X \to Y$ for some sets $X$ and $Y$. Then, $f$ is said to be injective if $\forall x, x' \in X: x \neq x' \Rightarrow f(x) \neq f(x')$, i.e. every two different elements in $X$ have a different image under $f$.
For the inverse function, injectivity rules out that for a $y \in Y$, we have two different elements $x, x' \in X$ so that $f(x) = f(x') = y$, like in our example above. Coming back to the example of the square function, is $f: \mathbb{R} \to [0, \infty), x \mapsto x^2$ injective? Clearly not: e.g. $f(-1) = f(1) = 1$, so that two different arguments share the same image. Thus, it may also depend on the domain that we consider whether we can invert a given function — setting e.g. $X = [0, \infty)$ also achieves injectivity because for $x, x' \geq 0$, if $x \neq x'$ then also $x^2 \neq (x')^2$. Thus, if $f$ is defined on $[0, \infty)$ (rather than $\mathbb{R}$), we can invert $f$ on $[0, \infty)$ (rather than $\mathbb{R}$) as $f^{-1}(y) = \sqrt{y}$. Then, indeed for any $x \in [0, \infty)$, $f^{-1}(f(x)) = \sqrt{x^2} = x$.
In terms of language, sometimes, we also call surjective functions onto, because they map onto the whole space $Y$, and injective functions one-to-one, because they map every one element in $X$ to one distinct element in $Y$. Before moving on, a last definition:
Definition: Bijective Function.
Let $f: X \to Y$ for some sets $X$ and $Y$. Then, $f$ is said to be bijective if $f$ is injective and surjective.
Clearly, if we have inverted $f$ to the function $f^{-1}$, then the function $f^{-1}$ is also invertible with $(f^{-1})^{-1} = f$. This allows us to conclude:
Definition: Inverse Function.
Let $f: X \to Y$ for some sets $X$ and $Y$. Then, the function $g: Y \to X$ such that $g(f(x)) = x$ for all $x \in X$ and $f(g(y)) = y$ for all $y \in Y$ is called the inverse function of $f$. We write $g = f^{-1}$.
Theorem: Existence of the Inverse Function.
Let $f: X \to Y$ for some sets $X$ and $Y$. Then, the inverse function $f^{-1}$ of $f$ exists if and only if $f$ is bijective.
Indeed, the proof of this theorem is really simple – it does nothing but put our verbal elaborations into a more formal mathematical argument. If you are curious, you can find it in the companion script.
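To make the finite-set examples above concrete, here is a minimal Python sketch (not part of the original text) that checks injectivity, surjectivity and bijectivity of a mapping stored as a dictionary and, if possible, constructs the inverse; the particular sets and mappings are made up for illustration.

```python
def is_injective(f):
    # no two arguments share the same image
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    # every element of the codomain is hit at least once
    return set(f.values()) == set(codomain)

def inverse(f, codomain):
    # the inverse exists if and only if f is bijective
    if is_injective(f) and is_surjective(f, codomain):
        return {y: x for x, y in f.items()}
    return None

X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}   # bijective: inverse exists
g = {1: "a", 2: "a", 3: "b"}   # neither injective nor surjective

print(inverse(f, Y))  # {'a': 1, 'b': 2, 'c': 3}
print(inverse(g, Y))  # None
```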
The considerations above reveal when a function is invertible, but do not address how we compute the inverse function if it exists. Formally, we are looking for the composition that we need to apply to $f$ to arrive at the initial value again, i.e. $f^{-1}$ such that $f^{-1}(f(x)) = x$ for all $x$ in the domain of $f$. For univariate, real-valued functions such as the square function we discussed in our example, this is relatively straightforward, as we know a variety of inverse relationships — e.g. the square (on $[0, \infty)$) and the square root, or the exponential function and the natural logarithm. For multivariate functions, things get more tricky. There is, however, one exception where we can easily compute the inverse function: a function $f: \mathbb{R}^n \to \mathbb{R}^n$ of the form $f(x) = Ax$ with a square, invertible matrix $A$. In this case, it is easily seen that the inverse function must be characterized by $f^{-1}(y) = A^{-1}y$, as then,

$$f^{-1}(f(x)) = A^{-1}Ax = x \quad \text{and} \quad f(f^{-1}(y)) = AA^{-1}y = y.$$
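As a quick numerical illustration of this special case, the following sketch (with an arbitrary invertible matrix chosen purely for illustration) verifies that applying $A^{-1}$ undoes the linear map $x \mapsto Ax$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # square and invertible (det = 5)
A_inv = np.linalg.inv(A)

x = np.array([1.0, -2.0])
y = A @ x                        # forward map f(x) = Ax
x_recovered = A_inv @ y          # inverse map f^{-1}(y) = A^{-1} y

print(np.allclose(x_recovered, x))  # True
```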
Before moving on, be reminded again to not confuse the inverse function with the preimage of a set $B \subseteq Y$, $f^{-1}(B) = \{x \in X: f(x) \in B\}$! The latter quantity is always defined, but captures a fundamentally different concept – it is a set potentially containing a multitude of values (or none at all), whereas $f^{-1}$ is a function and $f^{-1}(y)$ a concrete value in its codomain, existence of which however depends on bijectivity of $f$.
In this subsection, we consider two elementary properties functions can have: convexity and concavity. We restrict attention to multivariate real-valued functions, i.e. those functions that may take vectors as arguments but map into real numbers. The properties’ importance stems from optimization and will thus be emphasized in the next chapter.
First, be reminded of the definition of a convex set from Chapter 1:
Definition: Convex Set.
A set $X \subseteq \mathbb{R}^n$ is said to be convex if for any $x, y \in X$ and any $\lambda \in [0, 1]$, $\lambda x + (1 - \lambda) y \in X$.

A convex set is thus a set that contains any convex combination of its elements. Verbally, a set is convex if for any two points in the set, the line piece connecting them is fully contained in the set. To develop a bit more intuition for this concept, consider the following figure and determine which of the illustrated subsets of $\mathbb{R}^2$ are convex (think especially about the intuition of the line):
Definition: Convex and Concave Real Valued Function.
Let $X \subseteq \mathbb{R}^n$ be a convex set. A function $f: X \to \mathbb{R}$ is convex if for any $x, y \in X$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y).$$

Moreover, if for any $x, y \in X$ such that $x \neq y$ and $\lambda \in (0, 1)$,

$$f(\lambda x + (1 - \lambda) y) < \lambda f(x) + (1 - \lambda) f(y),$$

we say that $f$ is strictly convex. Moreover, we say that $f$ is (strictly) concave if $-f$ is (strictly) convex.

Note that the definition of a concave real-valued function also requires that the function be defined on a convex domain — i.e. a set $X$ which satisfies $\lambda x + (1 - \lambda) y \in X$ for all $x, y \in X$ and $\lambda \in [0, 1]$. For the most frequent cases, e.g. $X = \mathbb{R}^n$ or $X = [0, \infty)^n$, this is extremely straightforward to verify, and nothing you need to be scared of, but it should be kept in mind nonetheless. We require this in the definition because otherwise, $f(\lambda x + (1 - \lambda) y)$ is not always defined, and we cannot judge the inequality defining convexity/concavity. The definition of concavity using $-f$ may be a bit awkward; to check concavity, you can equivalently consider the defining inequalities

$$f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y)$$

and, for strict concavity ($x \neq y$, $\lambda \in (0, 1)$),

$$f(\lambda x + (1 - \lambda) y) > \lambda f(x) + (1 - \lambda) f(y).$$
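The defining inequalities can also be checked numerically. Below is a small sketch (illustrative only, not a proof technique) that samples pairs of points and values of $\lambda$ and tests whether $f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y)$ holds, here for the square function and the cube function as examples.

```python
import random

def looks_convex(f, lo, hi, trials=10_000, tol=1e-12):
    # random spot-checks of the defining inequality on [lo, hi];
    # passing the check is evidence, not a proof, of convexity
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        lam = random.uniform(0.0, 1.0)
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + tol:
            return False
    return True

print(looks_convex(lambda x: x**2, -10, 10))   # True (convex)
print(looks_convex(lambda x: x**3, -10, 10))   # False (not convex on all of R)
```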
For univariate real-valued functions, you are likely well familiar with the graphical representation of these concepts: note that all points $(\lambda x + (1 - \lambda) y, \ \lambda f(x) + (1 - \lambda) f(y))$ with $\lambda \in [0, 1]$ lie on the line segment connecting $(x, f(x))$ and $(y, f(y))$. Then, convexity (concavity) states that the graph of $f$ must lie below (above) this line segment everywhere between $x$ and $y$. This relationship is illustrated in the figure below.
When considering functions with multiple arguments, the conceptual idea is similar, yet graphically more challenging to display. Let us have a look at a simple convex function defined on $\mathbb{R}^2$, say $f(x_1, x_2) = x_1^2 + x_2^2$. It is composed of two strictly convex univariate functions (because the square function is strictly convex, which can be easily verified using the intuition from the figure above). Indeed, we can formally show that it is strictly convex according to our definition introduced above. Let's see how the graph of this function looks:

The graph of $f$ illustrated here lies in $\mathbb{R}^3$. Recall that we consider real-valued functions that map to $\mathbb{R}$, and that the codomain of $f$ corresponds to the third, or vertical, dimension in the plot.
Like with the univariate function, for any two points $x, y \in \mathbb{R}^2$, i.e. points in the domain of $f$, we want to consider the line connecting $x$ and $y$ and investigate whether on this line, the function lies "below" the line segment connecting $(x, f(x))$ and $(y, f(y))$. The challenge in graphical representation added by multiple dimensions is that the line no longer has the same dimensionality as the domain: for the univariate functions considered above, the domain was a subset of $\mathbb{R}$ with dimensionality 1, as was the line in the domain connecting any two points $x$ and $y$. Here, however, the domain is a subset of $\mathbb{R}^2$ with dimensionality 2, while the line still only extends along one direction, that is, the vector $y - x$, and is therefore still of dimensionality 1.
This suggests that to graphically represent convexity of multivariate functions, for any two candidate points $x$ and $y$, we have to restrict the plotted domain to 1 dimension. Intuitively, this dimension must capture the directionality of the line, and allow further extension to the left and right in order to get a plot like in the univariate case. This is indeed what we do: given the candidates $x$ and $y$ in the domain of $f$, we restrict the plotted domain to

$$L(x, y) = \{x + t(y - x): t \in \mathbb{R}\}.$$

Clearly, this domain features only one direction along which it extends: $y - x$. This is precisely the directionality of the line piece we consider. To see the uni-dimensionality more directly, note that we can define the restriction of $f$ on $L(x, y)$ as

$$g: \mathbb{R} \to \mathbb{R}, \quad t \mapsto f(x + t(y - x)),$$

where points on the line segment between $x$ and $y$ correspond to $x + t(y - x)$ for $t \in [0, 1]$.
A graphical illustration of this technical procedure looks like this:
Clearly, we see that once the function is restricted to the uni-dimensional domain, our graphical investigation of convexity works as usual – we just need to judge upon the convexity of the restricted function as defined above! Note, however, that the restriction is possible only after picking the points $x$ and $y$! For investigations of convexity, we need to consider any possible combination of $x$ and $y$, and most of them will give rise to different directionalities $y - x$ and therefore different uni-dimensional restrictions and different plots!
In other words, (strict) convexity of a function corresponds to the scenario where for any such pair $x, y$, the restricted function $t \mapsto f(x + t(y - x))$ is (strictly) convex. As the following theorem shows, this conclusion holds not only intuitively but also formally:
Theorem: Graphical Characterization of Convexity.
Let $X \subseteq \mathbb{R}^n$ be a convex set and $f: X \to \mathbb{R}$. Then, $f$ is (strictly) convex if and only if for all $x, y \in X$ such that $x \neq y$, the function

$$g: [0, 1] \to \mathbb{R}, \quad t \mapsto f(x + t(y - x))$$

is (strictly) convex.
The proof, given in the companion script, can be helpful for you if you feel the need to practice dealing with formal investigations of convexity; the approach taken there is very similar to the majority of convexity proofs you come across in economic coursework.
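A small sketch of the restriction idea in code, assuming the example function $f(x_1, x_2) = x_1^2 + x_2^2$ used above: for one fixed pair of points, we form the univariate restriction $t \mapsto f(x + t(y - x))$ and spot-check its convexity; in an actual proof, all pairs of points would have to be covered.

```python
import random

def f(v):
    return v[0]**2 + v[1]**2      # example bivariate convex function

def restrict(f, x, y):
    # univariate restriction of f to the line through x and y
    return lambda t: f([x[0] + t*(y[0]-x[0]), x[1] + t*(y[1]-x[1])])

def looks_convex_1d(g, trials=5_000, tol=1e-12):
    # random spot-checks of the univariate convexity inequality on [0, 1]
    for _ in range(trials):
        s, t = random.uniform(0, 1), random.uniform(0, 1)
        lam = random.uniform(0, 1)
        if g(lam*s + (1-lam)*t) > lam*g(s) + (1-lam)*g(t) + tol:
            return False
    return True

x, y = [1.0, -2.0], [3.0, 0.5]    # one arbitrary pair of points
print(looks_convex_1d(restrict(f, x, y)))   # True
```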
Now that we have a proper idea of how convexity (and concavity as its opposite) look in more general vector spaces or, respectively, for general multivariate real-valued functions, we move to a related but weaker concept: quasi-convexity, with the natural opposite quasi-concavity. The reason is that for many applications, requiring convexity in the narrow sense as discussed above is too restrictive: consider the example of monotonic transformations. An increasing transformation of $f$ is $t \circ f$ where $t: \mathbb{R} \to \mathbb{R}$ is increasing, i.e. $a > b \Rightarrow t(a) \geq t(b)$. A decreasing transformation is the opposite, where $a > b \Rightarrow t(a) \leq t(b)$. Strict versions with strict inequalities also exist. See also the definition of a monotonic function in the introductory chapter. Then, for a monotonic transformation of an initially convex function, it is not guaranteed that the resulting function will also be convex. As such, the narrow range of functions convexity (and concavity) applies to restricts our ability to perform general functional analysis. The appealing aspect of considering quasi-convexity instead is that while applying to a much broader class of functions, it preserves most of the convenient properties of convex functions that we are interested in.
As you will see in the next chapter, the convexity of the upper-level set (for concave functions) and convexity of the lower-level set (for convex functions) are the specific characteristics of concave and convex functions one would wish to preserve. As multivariate convexity and concavity can be reduced to the univariate case, let us consider these concepts for the univariate case. For what follows, note that a subset $I$ of the real line is convex if and only if $I$ is an interval, i.e. if there are $a, b \in \mathbb{R} \cup \{-\infty, \infty\}$ with $a \leq b$ such that $I = (a, b)$, $I = [a, b]$, $I = [a, b)$ or $I = (a, b]$ (cf. the definition of a convex set above).
Definition: Lower and Upper Level Set of a Function.
Let $X \subseteq \mathbb{R}^n$ be a convex set and $f: X \to \mathbb{R}$ be a real-valued function. Then, for $c \in \mathbb{R}$, the set $L_c = \{x \in X: f(x) \leq c\}$ is called the lower-level set of $f$ at $c$, and $U_c = \{x \in X: f(x) \geq c\}$ is called the upper-level set of $f$ at $c$.
To understand what follows, it is crucial to note that both the lower and the upper level sets at any level $c$ are subsets of the domain of $f$, collecting arguments of the function that satisfy the respective level restriction ($f(x) \leq c$ or $f(x) \geq c$) – in a two-dimensional plot, they correspond to collections of points on the horizontal axis!
If one considers a convex function and draws a horizontal line (a "level" line), the lower-level set of the function at this level, i.e. the set of elements in the domain with an image below this line, is convex. Similarly, if one considers a concave function and draws the level line, the upper-level set of the function, containing those elements in the domain with an image above this line, is convex. The following figure, plotting a strictly convex function on the left and a strictly concave function on the right, illustrates this relationship graphically:
Quasiconvexity and quasiconcavity are precisely defined so as to preserve these two characteristic properties (and only these):
Definition: Quasiconvexity, Quasiconcavity.
Let $X \subseteq \mathbb{R}^n$ be a convex set. A real-valued function $f: X \to \mathbb{R}$ is called quasiconvex if for all $c \in \mathbb{R}$, the lower-level set of $f$ at $c$ is convex. Alternatively, $f$ is called quasiconcave if for all $c \in \mathbb{R}$, the upper-level set of $f$ at $c$ is convex.
The following is an often more workable characterization:
Theorem: Quasiconvexity, Quasiconcavity.
Let $X \subseteq \mathbb{R}^n$ be a convex set. A real-valued function $f: X \to \mathbb{R}$ is quasiconvex if and only if for all $x, y \in X$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \leq \max\{f(x), f(y)\}.$$

Conversely, $f$ is quasiconcave if and only if for all $x, y \in X$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \geq \min\{f(x), f(y)\}.$$
In the spirit of the definitions above, we further have the following characterizations that can sometimes be helpful:
Definition: Strict Quasiconvexity, Strict Quasiconcavity.
Let $X \subseteq \mathbb{R}^n$ be a convex set. A real-valued function $f: X \to \mathbb{R}$ is called strictly quasiconvex if for all $x, y \in X$ with $x \neq y$ and $\lambda \in (0, 1)$,

$$f(\lambda x + (1 - \lambda) y) < \max\{f(x), f(y)\}.$$

Conversely, $f$ is strictly quasiconcave if for all $x, y \in X$ with $x \neq y$ and $\lambda \in (0, 1)$,

$$f(\lambda x + (1 - \lambda) y) > \min\{f(x), f(y)\}.$$
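The max/min characterizations above lend themselves to the same kind of numerical spot-check as convexity; a sketch (illustrative only, with example functions chosen here for demonstration):

```python
import random

def looks_quasiconvex(f, lo, hi, trials=10_000, tol=1e-12):
    # checks f(lam*x + (1-lam)*y) <= max{f(x), f(y)} at random points
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        lam = random.uniform(0.0, 1.0)
        if f(lam*x + (1-lam)*y) > max(f(x), f(y)) + tol:
            return False
    return True

print(looks_quasiconvex(lambda x: x**3, -5, 5))      # True: monotone, hence quasiconvex
print(looks_quasiconvex(lambda x: -(x**2), -5, 5))   # False: a strictly concave hump is not quasiconvex
```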
When considering quasi-concavity and quasi-convexity, note that we are in fact dealing with a strict broadening of concepts: all convex functions are quasi-convex, and all concave functions are quasi-concave. This stems from the fact that we have defined the concepts from a characteristic feature of convex or, respectively, concave functions.
To practice your understanding of this concept, consider the level sets of the functions graphically illustrated below and determine which are convex, concave, quasi-convex and quasi-concave.
Be sure to remember that, as you have seen in the examples, like concavity and convexity, quasi-concavity and quasi-convexity are not mutually exclusive. However, for the "non-quasi" concepts, the only class of functions satisfying both properties are linear functions. Accordingly, we will call a function that is both quasi-concave and quasi-convex quasi-linear. While linear functions are indeed also quasi-linear, they are not the only functions with this property: for instance, as the example above has already hinted at, monotonic functions are another instance of quasi-linear functions! Of course, unlike the specific example you just saw above, monotonic functions can also be convex or concave; consider e.g. the exponential function or the natural logarithm.
As a final note of caution before moving on, an established result is that convex and concave functions are continuous (at interior points of their domain). This was not the property we wanted to maintain when coming up with our definitions of quasi-convexity, and indeed, there are quasi-convex or quasi-concave functions that are discontinuous, e.g. indicator functions such as $x \mapsto \mathbb{1}\{x \geq 0\}$.
Wikipedia provides a good explanation of what calculus actually is about (https://en.wikipedia.org/wiki/Calculus, accessed August 03, 2019.):
“Calculus […] is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations. It has two major branches, differential calculus and integral calculus. Differential calculus concerns instantaneous rates of change and the slopes of curves. Integral calculus concerns accumulation of quantities and the areas under and between curves. These two branches are related to each other by the fundamental theorem of calculus [stating that differentiation and integration are inverse operations].”
The remainder of this chapter is concerned with introducing you to both of these branches in the context of multivariate functions, with greater emphasis on differential calculus, which surpasses integral calculus in importance in most economic disciplines (an exception is theoretical econometrics) and may indeed be one of, if not the, most important mathematical concepts an economist should be well familiar with. This is because not only is differentiation at the heart of our favorite exercise, namely constrained optimization (as discussed in the next chapter), but also, as pointed out already in the introduction to this chapter, fundamentally important concepts such as marginal utility or marginal costs are based upon the derivative.
When introducing the matter of multivariate differential calculus, there are two central issues of interest: how differentiation of functions mapping between vector spaces works, and why. Given the complexity of the latter issue, the former is fascinatingly simple and relatively easy to understand. Thus, one may be tempted to think that for the purpose of the applied mathematician (such as the economist), we only need to know the "how" and need not worry so much about the "why". This is a premature conclusion for two reasons: first, understanding the "why" promises great returns to one's overall mathematical skill when it comes to precise formal reasoning, intuition for key concepts of functional analysis, calculus in metric spaces and the multivariate limit concept, notation, and much more. Secondly, it facilitates access to concepts building on the derivative (e.g. Taylor expansions and approximations), as without proper understanding of the mechanics behind differential calculus, these methods appear to fall from the sky and are harder both to remember and to apply correctly.
On the other hand, the formal justification of multivariate differential calculus is quite an extensive topic, and may, depending on previous experience in formal reasoning (or rather, the lack of it), be quite tough to absorb. Hence, the approach taken in this course is the following:
The classes preceding your first semester also touch upon the "why". It will depend upon the availability of time how deep we will be able to dig into this issue. But enough talk about what we are going to do, let's get started!
As repeatedly done before, let's start from the simplest case — and the one we are at least somewhat familiar with: univariate real-valued functions where the domain is a subset of the real line, $f: D \to \mathbb{R}$ with $D \subseteq \mathbb{R}$. In the next step, we will again be concerned with the generalization of this concept. Now, when asked what the instantaneous rate of change, or slope, of $f$ at a point $x \in D$ is, how would you go about answering this? Don't think about rules on how to determine the slope, but rather about how to conceptually and formally describe the concept of the slope for a general function $f$ at an arbitrary point $x$!
One common characterization is that the slope tells us how sensitive $f$ is to changes in $x$, i.e. how much $f(x)$ varies relative to the variation in $x$. You may have also heard (e.g. from the Wikipedia quote above) that the slope at $x$ corresponds to the rate of change in $f$ associated with an infinitely small change in the argument — the marginal rate of change in $f$. But how do we write this down formally? Let's consider a fixed real change in the argument from $x$ to a fixed $x'$, where "real" means that the argument indeed changes, so that $x' \neq x$. Let us define the change as $h := x' - x$, so that for the new argument $x'$, $f(x') = f(x + h)$. Then clearly, $h$ is equal to a fixed real number and $h \neq 0$, so that $h$ is not "infinitely small". But this consideration is very helpful because it allows us to characterize the relative change of $f$, i.e. the ratio of the variation in $f$ and the one in the argument when moving from $x$ to $x' = x + h$:

$$\frac{f(x + h) - f(x)}{h}. \quad (1)$$
Now, we know the relative change for any fixed change $h \neq 0$. This suggests that, when concerned with finding the relative change induced by a marginal, i.e. infinitely small variation in $x$, we should be able to derive it from letting $h$ go to zero in equation (1)! Indeed, this is exactly how we proceed to define the derivative — we just have to be careful about one detail: the expression in equation (1) is always well-defined for fixed $h \neq 0$; a limit, on the other hand, is not guaranteed to exist.
Definition: Differentiability and Derivative of a Univariate Real-Valued Function.
Let $D \subseteq \mathbb{R}$ and consider the function $f: D \to \mathbb{R}$. Let $x \in D$. If

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

exists, $f$ is said to be differentiable at $x$, and we call this limit the derivative of $f$ at $x$, denoted by $f'(x)$. If for all $x \in D$, $f$ is differentiable at $x$, $f$ is said to be differentiable over $D$ or differentiable. If $f$ is differentiable, the function $f': D \to \mathbb{R}, \ x \mapsto f'(x)$ is called the derivative of $f$.
Note the following crucial distinction: the derivative of $f$ at $x$, $f'(x)$, is a limit and takes a value in $\mathbb{R}$, i.e. it is a real number. On the other hand, the derivative of $f$, $f'$, like $f$ itself, is a function that maps from $D$ to $\mathbb{R}$!
To summarize in words what we just did, we defined the derivative by first looking at a fixed change $h$ and then studying what happens to the difference quotient in equation (1) if $h$ becomes infinitely small. If (and only if) we arrive at the same, well-defined rate regardless of how we let $h$ approach $0$, this rate of marginal change is unique and well-defined, and we can use it to infer on the function's behavior at $x$. This concept is extremely helpful because it allows us to study the local behavior of functions (i.e. in small neighborhoods around fixed points $x$) even if we cannot graphically represent them anymore — hence, it lets symbols and equations become our eyes when we can no longer draw the objects we are interested in!
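To see the limit definition at work numerically, the following sketch evaluates the difference quotient of an example function for shrinking $h$ and compares it with the known derivative (here $f(x) = x^2$ and $f'(1) = 2$, chosen purely for illustration):

```python
def f(x):
    return x**2

x = 1.0
for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    dq = (f(x + h) - f(x)) / h       # difference quotient from equation (1)
    print(f"h = {h:10.6f}   difference quotient = {dq:.6f}")
# the printed values approach f'(1) = 2 as h shrinks
```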
Above, we have already seen two levels of concepts at the heart of differential calculus: the derivative of a function at a point in the domain (a real number) and the derivative function mapping points onto the derivative of the function at them. The third, and highest-level, concept is the derivative operator, a mapping between function spaces, associating the function $f$ with its derivative $f'$. As the derivative operator can associate a derivative function only with functions $f$ that are indeed differentiable, its domain corresponds to the set of once differentiable functions:
Definition: Set of $k$ times Differentiable Functions.
Let $D \subseteq \mathbb{R}$. Consider a differentiable function $f: D \to \mathbb{R}$. If its derivative $f'$ is also differentiable, we say that $f$ is two times differentiable, and call the derivative $f'' := (f')'$ of $f'$ the second derivative of $f$. In analogy, we define $f^{(k)}$, the $k$-th derivative of $f$, recursively as the derivative of $f^{(k-1)}$, the $(k-1)$-st derivative of $f$. If $f^{(k)}$ exists, we call $f$ $k$ times differentiable. For any $k \in \mathbb{N}$, we define

$$\mathcal{D}^k(D) := \{f: D \to \mathbb{R} \mid f \text{ is } k \text{ times differentiable}\}$$

as the set of univariate real-valued functions with domain $D$ that are $k$ times differentiable. Moreover, we define

$$\mathcal{C}^k(D) := \{f \in \mathcal{D}^k(D) \mid f^{(k)} \text{ is continuous}\}$$

as the set of $k$ times continuously differentiable functions, i.e. $k$ times differentiable functions with continuous $k$-th derivative $f^{(k)}$.
If $k = 1$, we write $\mathcal{D}(D)$ and $\mathcal{C}(D)$.
Now we know the domain of the differential operator. As for the codomain: since we impose no restrictions on the derivative beyond its function property, the codomain of the differential operator is simply the space of all functions mapping from $D$ to $\mathbb{R}$, which we denote as $\mathcal{F}(D) := \{f \mid f: D \to \mathbb{R}\}$.
This gives us everything we need to define the differential operator:
Definition: Differential Operator for Univariate Real-Valued Functions.
Let $D \subseteq \mathbb{R}$. Then, the differential operator is defined as the function

$$\frac{d}{dx}: \mathcal{D}(D) \to \mathcal{F}(D), \quad f \mapsto \frac{d}{dx} f := f',$$

where $f'$ denotes the derivative of $f$.
Take the time to appreciate what this means. While the definition appears rather straightforward, it encompasses two details that are frequently missed even in advanced textbooks and university lecture material, but that anyone claiming to have a proper command of mathematics should be well aware of. First, the derivative and the differential operator are not the same thing; indeed, they are fundamentally different functions, because one maps between function spaces and the other between sets of real numbers (or vectors). The precise relationship is that the derivative of a specific function (an element of the domain $\mathcal{D}(D)$ of the differential operator) is a specific value (in the codomain) of the differential operator!
Secondly, you frequently see the expressions

$$\frac{df}{dx}(x) \quad \text{and} \quad \frac{d f(x)}{dx}. \quad (2)$$

Without further explanation, note that these quantities are not defined! So what should we make of them? Despite their frequent use, things are actually a bit tricky; the deliberations to follow give a thorough discussion. If you care less about these technical details, it is fine if you just memorize the take-away, as summarized below.
The formally correct way of writing the mapping rule of the derivative function, i.e. the rule of the function $f'$, is:

$$x \mapsto \left(\frac{d}{dx} f\right)(x).$$

This states that we first evaluate the differential operator at $f$ to obtain the derivative $f'$: $\frac{d}{dx} f = f'$; the resulting function is then evaluated at $x$. Because this looks a bit weird, one commonly writes $\frac{df}{dx}(x) := \left(\frac{d}{dx} f\right)(x)$, so that the first expression in equation (2) represents a justified notational convention for evaluating the derivative function at specific points $x$, or respectively, when $x$ is interpreted as a variable argument, as the mapping rule $x \mapsto f'(x)$ of the derivative function.
The second expression in equation (2) is supposed to refer to the same object. However, this is arguably problematic: $f(x)$ is not a function (like $f$), but rather a specific value in $\mathbb{R}$, or alternatively, a description of the mapping rule $x \mapsto f(x)$ of $f$, as e.g. in $f(x) = x^2$, and thus not an element in the domain of the differential operator! As such, this notation blurs the lines between the concepts of functions and mapping rules, and may easily cause conceptual misunderstandings. Hence, this course will not make use of this notation.
Still, in concrete applications, it may be more convenient to work with the mapping rules directly as a reduced-form representation of the function, especially when it is clear what the domain $D$ is. For instance, suppose that we want to compute the derivative of $f: \mathbb{R} \to \mathbb{R}, \ x \mapsto x^2$. Here, we know that the derivative of $f$ will, like $f$, have domain $\mathbb{R}$, and to fully characterize it, we only need to compute its mapping rule. We write this rule as $\frac{d}{dx}[x^2]$, where $x^2$ is the mapping rule of $f$. Then, it is formally correct to write:

$$\frac{d}{dx}[x^2] = 2x.$$

To summarize, in practice, write the function (or its mapping rule) behind the differential operator, as in $\frac{d}{dx} f$ or $\frac{df}{dx}(x)$. Generally, to express yourself as unambiguously as possible, you are well advised to omit the argument completely if possible, and in any case not to write it in the numerator of the derivative expression.
To get used to this (perhaps less familiar, but in advanced texts more prominent!) way of handling derivatives, as an exercise, let us re-state the central rules for derivatives of univariate real-valued functions we considered in Chapter 0. Before doing so, because this is the first time we formally deal with function spaces, we first need to define the concept formally and introduce the basis operations we will use.
Theorem: Basis Operations in Function Spaces.
Let $\mathcal{F}$ be a set of functions with domain $D$ and codomain $\mathbb{R}$. Then, the usual basis operations "$+$" and "$\cdot$" are such that for $f, g \in \mathcal{F}$ and $\lambda \in \mathbb{R}$,

$$(f + g)(x) = f(x) + g(x) \quad \forall x \in D$$

and

$$(\lambda \cdot f)(x) = \lambda f(x) \quad \forall x \in D.$$

If $\mathcal{F}$ is closed under these operations, then $(\mathcal{F}, +, \cdot)$ constitutes a vector space.
As a technical note, this is a theorem rather than a definition because it asserts that closure under the operations just defined is sufficient for the vector space property. Further, in this space, vector multiplication is defined as

$$(f \cdot g)(x) = f(x) g(x) \quad \forall x \in D,$$

and vector division as

$$(f / g)(x) = f(x) / g(x) \quad \forall x \in D \text{ with } g(x) \neq 0.$$
Note that all objects considered above ($f + g$, $\lambda \cdot f$, $f \cdot g$ and $f / g$) refer to functions, and the defining statement refers to a property the function has to satisfy at every value in the domain.
Now, for the derivative rules:
Theorem: Rules for Univariate Derivatives.
Let $D \subseteq \mathbb{R}$, $f, g \in \mathcal{D}(D)$ and $\lambda \in \mathbb{R}$. Then,

$$\frac{d}{dx}(\lambda f + g) = \lambda \frac{d}{dx} f + \frac{d}{dx} g \quad \text{(linearity)},$$

$$\frac{d}{dx}(f \cdot g) = \left(\frac{d}{dx} f\right) \cdot g + f \cdot \frac{d}{dx} g \quad \text{(product rule)},$$

$$\frac{d}{dx}(g \circ f) = \left(\left(\frac{d}{dx} g\right) \circ f\right) \cdot \frac{d}{dx} f \quad \text{(chain rule)}.$$

Note that all expressions in the theorem are sums, products and compositions of functions and therefore functions themselves! To make this distinction even more clear, let us re-consider the theorem for derivatives at specific points, which are no longer functions, but values in $\mathbb{R}$ — carefully pay attention to where the argument $x$ is put: never in the numerator of the differential expression!
Theorem: Rules for Univariate Derivatives at Specific Points.
Let $D \subseteq \mathbb{R}$, $f, g \in \mathcal{D}(D)$ and $\lambda \in \mathbb{R}$. Let $x \in D$ and suppose that $f$ and $g$ are differentiable at $x$. Then,

$$\frac{d(\lambda f + g)}{dx}(x) = \lambda \frac{df}{dx}(x) + \frac{dg}{dx}(x),$$

$$\frac{d(f \cdot g)}{dx}(x) = \frac{df}{dx}(x) \, g(x) + f(x) \, \frac{dg}{dx}(x),$$

$$\frac{d(g \circ f)}{dx}(x) = \frac{dg}{dx}(f(x)) \cdot \frac{df}{dx}(x).$$
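If you want to check these rules on concrete functions, a computer algebra system can do the bookkeeping; a minimal sympy sketch with arbitrarily chosen $f$ and $g$ (the specific functions are illustrative only):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)
g = sp.exp(x**2)

# product rule: d(f*g)/dx = f'*g + f*g'
lhs = sp.diff(f * g, x)
rhs = sp.diff(f, x) * g + f * sp.diff(g, x)
print(sp.simplify(lhs - rhs))            # 0

# chain rule: d(g o f)/dx = g'(f(x)) * f'(x)
h = g.subs(x, f)                         # composition g(f(x))
lhs = sp.diff(h, x)
rhs = sp.diff(g, x).subs(x, f) * sp.diff(f, x)
print(sp.simplify(lhs - rhs))            # 0
```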
Before moving on, make sure that you are thoroughly familiar with the three conceptual levels of differential calculus and know the differences between them (here summarized from highest to lowest level): the differential operator $\frac{d}{dx}$, a mapping between function spaces that associates a differentiable function with its derivative; the derivative function $f'$, a function mapping from $D$ to $\mathbb{R}$; and the derivative $f'(x)$ at a specific point $x$, a real number.
Now that we are hopefully on the same page about the concept of derivatives, let's review the most interesting properties of differentiable functions, i.e. let's look at how precisely we can derive insight about the local behavior of a function from computing the derivative. Let $f$ be a function with domain $D \subseteq \mathbb{R}$ and codomain $\mathbb{R}$. Maintaining the following three properties will be one key objective of our generalization to the multivariate case.
The existence and value of the derivative of a function gives us three important pieces of information about $f$:
1. If $f$ is differentiable at $x_0$, then $f$ is also continuous at $x_0$.
To see this, recall that in Chapter 0, we characterized continuity at $x_0$ by the property $\lim_{x \to x_0} f(x) = f(x_0)$! Now, consider the derivative of $f$ at $x_0$,

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}.$$

This gives

$$\lim_{h \to 0} \left(f(x_0 + h) - f(x_0)\right) = \lim_{h \to 0} \left(h \cdot \frac{f(x_0 + h) - f(x_0)}{h}\right) = \lim_{h \to 0} h \cdot \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = 0 \cdot f'(x_0) = 0,$$

where splitting the limit of the product into the product of the limits follows from the product rule of the limit. Because $f(x_0)$ is a constant, $\lim_{h \to 0} f(x_0) = f(x_0)$, and the equation above becomes

$$\lim_{h \to 0} f(x_0 + h) = f(x_0),$$

i.e. $f$ is continuous at $x_0$.
2. If $f$ is differentiable at $x_0$, then there exists a "good" linear approximation to $f$ around $x_0$ (called the Taylor approximation).
We like linear functions because they are simple and we know how they work. Unfortunately, it is not likely that the functions involved in our applications are linear. If a given function is too complex to handle but differentiable around a point of interest, a good solution is often to work with a local linear approximation to the function rather than the function itself, and rely on the following result that ensures that when $x$ is "close enough" to $x_0$, the linear approximation based on the derivative is "sufficiently good": let's make no further restrictions on $f$ than assuming differentiability and otherwise allow it to be any (arbitrarily complex and potentially erratically behaving) function. Consider the following approximation to $f$ at $x_0$:

$$T_1(x) := f(x_0) + f'(x_0)(x - x_0),$$

the so-called Taylor approximation to $f$ at $x_0$ of order 1 (because we only take the first derivative). This expression is a linear function with the known values $f(x_0)$ as intercept and $f'(x_0)$ as slope, with the difference to the point of investigation, $x - x_0$, as the variable argument. Now, what do we mean when we say that this is a "good approximation around $x_0$"? Denote by $R_1(x) := f(x) - T_1(x)$ the error we make when approximating $f$ using $T_1$ at $x$. Because $f$ is an arbitrary function, when $x$ is far away from $x_0$, this error may be quite large — however, as we approach $x_0$, the error becomes negligibly small relative to the distance $x - x_0$! Formally:

$$\lim_{x \to x_0} \frac{R_1(x)}{x - x_0} = \lim_{x \to x_0} \left(\frac{f(x) - f(x_0)}{x - x_0} - f'(x_0)\right) = 0,$$

where the limit result follows from the definition of the derivative.
The graphical intuition is illustrated in the figure below, where the first point considered is too far away from $x_0$ for the approximation quality to be decent, but the second is close enough to $x_0$ that the Taylor approximation and the true function almost coincide — note especially that the approximation error at the second point is much smaller than at the first.
In practice, however, we usually don't know how close is close enough — the Taylor approximation is just a limit statement for moving infinitely close to the point $x_0$, and for specific functions, even at small but fixed distances from $x_0$, the difference may be quite large, so treat this result with a grain of caution.
Note also that, if $f$ is more than once differentiable, we can get an even better approximation by taking higher order derivatives, and computing the Taylor approximation as a polynomial of higher degree — the higher the polynomial order, the more flexible the function and the better the approximation (for instance, our graphical example looks rather close to a quadratic function, so that a second order approximation should fare much better on a wider neighborhood of $x_0$). Because this concept will be helpful repeatedly, let's take the time to consider the formal definition.
Theorem: Taylor Expansion for Univariate Functions.
Let $D \subseteq \mathbb{R}$ and $f \in \mathcal{D}^k(D)$ where $k \in \mathbb{N}$, i.e. $f$ is $k$ times differentiable. Then, for $x_0, x \in D$, the Taylor expansion of order $k$ for $f$ at $x_0$ is

$$f(x) = \sum_{j=0}^{k} \frac{f^{(j)}(x_0)}{j!} (x - x_0)^j + R_k(x),$$

where $R_k(x)$ is the approximation error of the order-$k$ Taylor polynomial for $f$ at $x$, and $j!$ denotes the factorial of $j$. Then, the approximation quality satisfies

$$\lim_{x \to x_0} \frac{R_k(x)}{(x - x_0)^k} = 0.$$

Further, if $f$ is $k + 1$ times differentiable, there exists a $\xi$ between $x_0$ and $x$ such that

$$R_k(x) = \frac{f^{(k+1)}(\xi)}{(k+1)!} (x - x_0)^{k+1}.$$
Some things are worth noting: (i) in contrast to the Taylor approximation for $f$ at $x_0$, the Taylor expansion is always equal to the function value $f(x)$ because it encompasses the approximation error as an unspecified object, (ii) when considering small deviations from $x_0$ rather than arbitrary points $x$, it may at times be more convenient to express the expansion directly in terms of the deviation $h = x - x_0$ rather than $x$:

$$f(x_0 + h) = \sum_{j=0}^{k} \frac{f^{(j)}(x_0)}{j!} h^j + R_k(x_0 + h),$$

and (iii) the limit statement $\lim_{x \to x_0} R_k(x)/(x - x_0)^k = 0$ says that higher $k$ yield better approximations because the error goes "faster" to zero the larger $k$ is. To see this, consider a small deviation, e.g. $x - x_0 = 0.1$, and compute $(x - x_0)^1 = 0.1$, $(x - x_0)^2 = 0.01$, $(x - x_0)^3 = 0.001$, etc. Thus, when considering larger $k$, we can divide the error by ever smaller numbers and still have convergence to zero — indeed, as $k \to \infty$, provided that the approximation error vanishes in the limit (as it does for the well-behaved functions we typically work with), the Taylor approximation of infinite order yields perfect approximation, i.e. $f(x) = \sum_{j=0}^{\infty} \frac{f^{(j)}(x_0)}{j!} (x - x_0)^j$!
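A numerical illustration of how higher-order Taylor polynomials improve the approximation, here for $f(x) = e^x$ around $x_0 = 0$ (an arbitrary textbook choice, all derivatives of $e^x$ at $0$ equal $1$):

```python
import math

def taylor_exp(x, k, x0=0.0):
    # order-k Taylor polynomial of exp around x0
    return sum(math.exp(x0) * (x - x0)**j / math.factorial(j) for j in range(k + 1))

x = 0.5
for k in [1, 2, 4, 8]:
    approx = taylor_exp(x, k)
    print(f"order {k}: approx = {approx:.8f}, error = {abs(math.exp(x) - approx):.2e}")
# the error shrinks rapidly as the order k increases
```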
An immediate corollary, and nonetheless a very useful one, of the Taylor expansion theorem is the following:
Corollary: Mean Value Theorem.
Let $D \subseteq \mathbb{R}$ and $f \in \mathcal{D}(D)$, i.e. $f$ is a differentiable function. Then, for any $a, b \in D$ such that $a < b$ (and $[a, b] \subseteq D$), there exists $\xi \in (a, b)$ such that

$$f'(\xi) = \frac{f(b) - f(a)}{b - a}.$$
The interested reader can check the companion script to see that this is indeed a direct implication of the Taylor expansion theorem. We will see later that this theorem is incredibly helpful for establishing existence of (local) maxima and minima that satisfy a first order condition — indeed, you can already see here that all we need is two different points in $D$ with the same value for the differentiable function $f$ to obtain a point in between at which the derivative is zero.
3. If $f$ is differentiable on the interval $(a, b)$, then
3.1. $f$ is non-decreasing on $(a, b)$ if and only if $f'(x) \geq 0$ for all $x \in (a, b)$,
3.2. $f$ is non-increasing on $(a, b)$ if and only if $f'(x) \leq 0$ for all $x \in (a, b)$,
3.3. if $f'(x) > 0$ (respectively $f'(x) < 0$) for all $x \in (a, b)$, then $f$ is strictly increasing (decreasing) on $(a, b)$.
This fact relates the numeric value of the derivative to the shape of the function's graph, helping us – like, in fact, points 1. and 2. – to get an idea of what the function looks like around some point without referring to the graphical illustration. As initially stated, this is the key motivation for us wanting to preserve these properties: like with the basis operations in the vector space, we strive to ensure that the objects we can no longer visually grasp behave in a fashion "similar to" those we can, allowing us to think more intuitively about highly abstract mathematical objects.
Note that 3.3. only provides a sufficient condition for strict monotonicity. As we will also see in the next chapter, there are strictly monotonic functions (e.g. $x \mapsto x^3$ on $\mathbb{R}$) that have a zero derivative at some points in their domain.
Now that we have convinced ourselves that derivatives of univariate functions are extremely helpful in characterizing them when they exist, we are concerned with transferring these concepts and their intuitions to multivariate functions. In fact, this is all that is left to do for a complete investigation into multivariate differential calculus — next to generalizing the Taylor theorem to higher orders and multivariate functions!
If you feel like testing your understanding of the basics of functional analysis and univariate differential calculus discussed thus far, you can take a short quiz found here.
The last section has given an extensive discussion of the formal nature of the univariate derivative and its usefulness for functional analysis. Our next step is to look at how this concept generalizes to multivariate and vector-valued functions. In doing so, we will go over the very central equations motivating this generalization here, but leave all thorough formal discussion to the companion script.
In the following, let us consider multivariate, but still real-valued functions, that is, functions of the form

$$f: D \to \mathbb{R}, \quad D \subseteq \mathbb{R}^n.$$

You are already familiar with many examples, e.g. the scalar product, vector norms, or utility or cost functions with multiple goods/inputs.
Now, how can we generalize the concept of the derivative? The fundamental issue is that unlike with the $\mathbb{R}$, when considering the $\mathbb{R}^n$, we can move away from a point $x$ in multiple directions (recall that the $\mathbb{R}^n$ has $n$ fundamental directions), and without specification of the direction, it is ex ante ambiguous what precisely we mean by a "small change". Still, you will see shortly that this issue can be resolved with only a slight conceptual twist.
To define the multivariate derivative, we start from the univariate case. Recall that we call $f'(x)$ the derivative of $f$ at $x$ in the domain of $f$ if

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x).$$

Our conceptual problem now is that when the domain is multivariate, i.e. $D \subseteq \mathbb{R}^n$ and the change $h$ is a vector, then there is no clear notion of what we mean by "$h \to 0$", as explained in the paragraph above. So, how can we rephrase this statement to something that does generalize to the $\mathbb{R}^n$? Note that we can re-write the equation above as

$$\lim_{h \to 0} \frac{f(x + h) - f(x) - f'(x) h}{h} = 0. \quad (3)$$
A key fact that we can use now is the following:
Proposition: Continuity of the Norm.
Consider the normed vector space $(V, \|\cdot\|)$ where $V$ is a real vector space. Then, the norm $\|\cdot\|: V \to \mathbb{R}$ is continuous.
While sounding rather abstract, this fact is actually strikingly intuitive: recall the general definition of continuity of a real-valued function $g$ at $x_0$ in its domain $X$:

$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in X: \quad d(x, x_0) < \delta \Rightarrow |g(x) - g(x_0)| < \varepsilon,$$

where we use the absolute value distance as the natural metric of $g$'s codomain, the real line. Verbally, continuity says that if the two arguments $x$ and $x_0$ considered don't lie far apart, the function values don't lie far apart either. Consider now the scenario where $g = \|\cdot\|$. Clearly, for two vectors $x$ and $x_0$ to lie "close" to each other, a necessary condition is that they are of similar length – which is precisely what is measured by the norm of the vectors, which need to be "close" to each other as well in consequence!
To see how this helps us, recall that we can pull the limit in and out of any continuous function. Thus, the proposition above tells us that it can especially be pulled in and out of norms. Recall that on $\mathbb{R}$, the absolute value $|\cdot|$ is a norm (in fact, it is the norm that induces the natural metric of the $\mathbb{R}$, $d(x, y) = |x - y|$!), and hence, the continuity proposition applies to it. Further, the norm property ((i): non-negativity) yields that an expression converges to zero if and only if its absolute value does. Therefore, equation (3) is equivalent to

$$\left|\lim_{h \to 0} \frac{f(x + h) - f(x) - f'(x) h}{h}\right| = 0 \ \Leftrightarrow \ \lim_{h \to 0} \left|\frac{f(x + h) - f(x) - f'(x) h}{h}\right| = 0 \ \Leftrightarrow \ \lim_{h \to 0} \frac{|f(x + h) - f(x) - f'(x) h|}{|h|} = 0,$$

where the second step follows from norm continuity and the third because for any $a, b \in \mathbb{R}$ with $b \neq 0$, $|a/b| = |a|/|b|$.
Finally, because $|\cdot|$ is a norm on $\mathbb{R}$, we can argue (see the mini-excursion below) that the expression derived above is indeed equivalent to

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - f'(x) h\|_a}{\|h\|_b} = 0, \quad (4)$$

where $\|\cdot\|_a$ and $\|\cdot\|_b$ are any two norms on $\mathbb{R}$. This expression appears much more suitable for generalization: regardless of the dimensionality of $x$ and the codomain of $f$, the norms in the numerator and denominator will continue to be real-valued, such that the fraction exists also for higher-dimensional functions.
To summarize the deliberations above, we started from equation (3), an equivalent way to define the univariate derivative. The difficulty in generalizing this expression to the multivariate case was division by a term that would become a vector. With equation (4), we found an equivalent way to express the derivative, where now, we have norms in numerator and denominator, and these norms will always be real-valued, regardless of whether we use real numbers or vectors as inputs. Because all of the reasoning relied on equivalent representations, there is no loss in generality when working with equation (4) rather than equation (3) or respectively, the initial definition of the univariate derivative!
Mini-Excursion: The last step is without loss of generality!
You would be very right to argue that the last step, going from the absolute value as a specific example of a norm on $\mathbb{R}$ to the general norm notation, is quite a stretch – even if the absolute value induces the natural metric of the $\mathbb{R}$, there are of course a great variety of other norms we could potentially use. So is the expression in equation (4) with arbitrary norms on $\mathbb{R}$ really equivalent to the representation with the absolute value?
Yes! This is due to the fact that for any arbitrary norm $\|\cdot\|$ that we can come up with for the $\mathbb{R}$, there will be a positive constant $c$ such that for all $x \in \mathbb{R}$, it holds that $\|x\| = c \cdot |x|$. Then, plugging this into equation (4), we just need to multiply both sides by the appropriate constants to arrive at the statement in absolute values.
But why does this hold? This relationship is ensured by the absolute homogeneity of the norm: note that for any $x \in \mathbb{R}$, $\|x\| = \|x \cdot 1\| = |x| \cdot \|1\|$ (this results from viewing $x$ as the scalar in the norm definition), so that for any norm $\|\cdot\|$ on $\mathbb{R}$, there exists $c = \|1\| > 0$ such that $\|x\| = c \cdot |x|$. Hence, the issue of defining a norm on $\mathbb{R}$ can actually be reduced to defining the "baseline magnitude" that the norm assumes at $1$.
Therefore, not only when replacing $|\cdot|$ by a general norm, but even if we use different norms in the numerator and denominator, say $\|\cdot\|_a$ and $\|\cdot\|_b$, respectively, we get constants $c_a, c_b > 0$ so that

$$\frac{\|f(x + h) - f(x) - f'(x) h\|_a}{\|h\|_b} = \frac{c_a}{c_b} \cdot \frac{|f(x + h) - f(x) - f'(x) h|}{|h|}.$$

Hence,

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - f'(x) h\|_a}{\|h\|_b} = \frac{c_a}{c_b} \cdot \lim_{h \to 0} \frac{|f(x + h) - f(x) - f'(x) h|}{|h|},$$

and one expression is equal to zero if and only if the other is as well.
With the equivalent representation of the univariate derivative given in equation (4), we are (almost) set up for generalizing the derivative concept. For the case of real-valued functions $f: D \to \mathbb{R}$, $D \subseteq \mathbb{R}^n$, we can stick with the absolute value as the norm in the numerator of the limit expression defining the derivative, as the absolute value continues to be an appropriate norm for the codomain $\mathbb{R}$ of $f$, whereas in the denominator, we make use of a norm for the domain (e.g. the Euclidean or a p-norm, but any other works as well). This inspires the following definition:
Definition: Multivariate Derivative of Real-valued Functions.
Let $D \subseteq \mathbb{R}^n$, $f: D \to \mathbb{R}$ and $\|\cdot\|$ a norm on $\mathbb{R}^n$. Further, let $x \in D$ be an interior point of $D$. Then, $f$ is differentiable at $x$ if there exists $c \in \mathbb{R}^{1 \times n}$ such that

$$\lim_{h \to 0} \frac{|f(x + h) - f(x) - c \cdot h|}{\|h\|} = 0.$$

Then, we call $c$ the derivative of $f$ at $x$, denoted $Df(x)$ or $f'(x)$. If $f$ is differentiable at any $x \in D$, we say that $f$ is differentiable, and we call $Df: x \mapsto Df(x)$ the derivative of $f$.
Because $h$ is no longer a scalar but now a vector, to be able to subtract $c \cdot h$ from $f(x + h) - f(x)$, we require $c \cdot h$ to be uni-dimensional, and therefore, $c$ must now be a row vector of length $n$. The only thing that we don't know yet is how we let a norm go to zero, that is, how precisely we can understand $\lim_{h \to 0}$ when $h$ is a vector.
Formally, we call $L$ the limit of a function $g$ for $h \to 0$, written $\lim_{h \to 0} g(h) = L$, if

$$\forall \varepsilon > 0 \ \exists \delta > 0: \quad 0 < \|h\| < \delta \Rightarrow |g(h) - L| < \varepsilon.$$

That is, we consider a ball of radius $\delta$ around $0$, the only element of $\mathbb{R}^n$ with $\|h\| = 0$, and study the function's behavior on this potentially very small ball, which however covers vectors in any of $\mathbb{R}^n$'s directions — this solves the issue of approaching $x$ only from a single direction! Moreover, as a more technical note, you can see from the definition that the derivative is only defined on the interior of $D$. The simple reason is that for any $x \notin \operatorname{int}(D)$, for every $\delta > 0$, there exist points $x + h$ with $\|h\| < \delta$ at which $f$ is not defined, and the limit cannot exist by definition.
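The defining limit can be checked numerically for a candidate derivative. A sketch, using the example $f(x_1, x_2) = x_1^2 + x_2^2$ with candidate $c = (2x_1, 2x_2)$ (both chosen for illustration):

```python
import numpy as np

def f(v):
    return v[0]**2 + v[1]**2

x = np.array([1.0, -2.0])
c = np.array([2*x[0], 2*x[1]])   # candidate derivative (row vector)

rng = np.random.default_rng(0)
for radius in [1.0, 0.1, 0.01, 0.001]:
    h = rng.standard_normal(2)
    h = radius * h / np.linalg.norm(h)         # random direction, fixed length
    ratio = abs(f(x + h) - f(x) - c @ h) / np.linalg.norm(h)
    print(f"||h|| = {radius:7.3f}   ratio = {ratio:.6f}")
# the ratio tends to zero as ||h|| shrinks, in line with the definition
```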
This concludes our definition of the derivative of multivariate real-valued functions. Still, we started out from investigating functions mapping between vector spaces. Hence, the next step is to consider the general case of $\mathbb{R}^m$, $m \geq 1$, as the codomain of our function. Fortunately, although the notation gets a bit more intense, the mechanics are exactly the same as before: consider again our "generalization equation" (4). Because we still consider arguments of length $n$, i.e. a domain $D \subseteq \mathbb{R}^n$, nothing changes in the denominator. However, because $f$ is now vector-valued, the object fed into the numerator norm becomes a vector of length $m$, and we have to use an appropriate norm here as well. Hence, we define the general multivariate derivative as follows:
Definition: Multivariate Derivative of Vector-valued Functions.
Let $D \subseteq \mathbb{R}^n$ and $f: D \to \mathbb{R}^m$. Further, let $x \in D$ be an interior point of $D$. Denote by $\|\cdot\|_{(n)}$ and $\|\cdot\|_{(m)}$ norms on $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Then, $f$ is differentiable at $x$ if there exists a matrix $C \in \mathbb{R}^{m \times n}$ such that

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - C h\|_{(m)}}{\|h\|_{(n)}} = 0.$$

Then, we call $C$ the derivative of $f$ at $x$, denoted $Df(x)$ or $f'(x)$. If $f$ is differentiable at any $x \in D$, we say that $f$ is differentiable, and we call $Df: x \mapsto Df(x)$ the derivative of $f$.
Again, the dimensions of the derivative – which is now a matrix – are given by the dimension conformity requirement in the numerator, that is, by the fact that we need to be able to subtract $Ch$ from $f(x + h) - f(x) \in \mathbb{R}^m$.
Having defined the derivatives formally, it is a good time to make ourselves aware of the three conceptual levels of differential calculus in the context of general derivatives of multivariate, potentially vector-valued functions: consider the class of differentiable functions with domain $D \subseteq \mathbb{R}^n$ and codomain $\mathbb{R}^m$, a subset of the set of all functions mapping $D$ to $\mathbb{R}^m$. Then, the fundamental concepts relevant to differential calculus are the differential operator $D$ (associating with every differentiable function its derivative function), the derivative function $Df$ (mapping interior points of the domain onto the derivative of $f$ at them), and the derivative $Df(x)$ at a specific point $x$ (a matrix in $\mathbb{R}^{m \times n}$, or a row vector if $m = 1$).
Mini-Excursion: Two Notations for the Differential Operator.
You have seen that the definitions above use the notations $Df(x)$ and $f'(x)$ for the derivative interchangeably. While both of these are quite common, classical mathematical textbooks are more prone to using $Df(x)$ as the notation for the derivative of a multivariate function $f$ at a point $x$ in its domain. However, you may think of this object as the same thing as $f'(x)$; it is the exact same concept! The reason why textbooks may hesitate to write the multivariate differential operator as $\frac{d}{dx}$ is that the change $dx$ in the denominator refers to instantaneous variation in a multivariate object, and we don't know how to divide by vectors. Then again, as we will see below, the notation $\frac{\partial f}{\partial x_i}$ for taking the partial derivative with respect to individual elements $x_i$ of $x$ (but not all of them jointly, thus the "partial" operator $\partial$) is widely accepted. Long story short, if the new notation $Df(x)$ confuses you, be clear that it is nothing else but (the generalization of) the regular derivative of $f$ at $x$, $f'(x)$, and sticking to the familiar notation also in the multivariate context is perfectly fine.
We have spent quite some effort on discussing how to conceptually think of a multivariate derivative: we know the condition a candidate vector or matrix must satisfy in order to be called the derivative of a real- or vector-valued function, respectively, at a point in its domain. Yet, in practice, we care less about characterizing the derivative in such an abstract way, but rather want to explicitly compute the function and evaluate it at certain points! So how do we go about this? This is where the online course takes its shortcut: we will not discuss why we may proceed as we do, but just how to compute multivariate derivatives. As already mentioned, the interested reader may find a comprehensive treatment of the mathematical background in the companion script.
First, we need a bunch of definitions:
Definition: Partial Derivative.
Consider a function $f: D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^n$, and let $x \in D$. Then, if for $i \in \{1, \ldots, n\}$, the function

$$g_i: t \mapsto f(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_n)$$

is differentiable at $t = x_i$, we say that $f$ is partially differentiable at $x$ with respect to (the $i$-th argument) $x_i$, and we call $g_i'(x_i)$ the partial derivative of $f$ at $x$ with respect to $x_i$, denoted by $\frac{\partial f}{\partial x_i}(x)$ or $\frac{\partial}{\partial x_i} f(x)$.
Again, note the three conceptual levels: the operator $\frac{\partial}{\partial x_i}$, the function $\frac{\partial f}{\partial x_i}$ and the value $\frac{\partial f}{\partial x_i}(x)$. Verbally, the $i$-th partial derivative looks at the function $f$, considers all arguments but the $i$-th one fixed at their values in $x$, that is, it treats them as constants, and takes the univariate derivative with respect to the $i$-th argument. Intuitively, it can be viewed as the description of how $f$ varies along the $i$-th fundamental direction of the $\mathbb{R}^n$ starting from $x$. If we collect all these partial derivatives in a row vector, we call the resulting object the gradient:
Definition: Gradient.
Consider a function $f: D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^n$, and let $x \in D$. Then, if for all $i \in \{1, \ldots, n\}$, $f$ is partially differentiable with respect to $x_i$ at $x$, then we call the row vector

$$\nabla f(x) := \left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\right)$$

the gradient of $f$ at $x$. If for all $x \in D$ and for all $i \in \{1, \ldots, n\}$, $f$ is partially differentiable with respect to $x_i$ at $x$, then we call the function

$$\nabla f: D \to \mathbb{R}^{1 \times n}, \quad x \mapsto \nabla f(x)$$

the gradient of $f$.
Also the gradient has an operator, function and value level. To get some feeling for the gradient concept, let's consider some example functions $f$, $g$ and $h$. Recall that the $i$-th partial derivative is obtained from differentiating the function as if it had only one variable argument, namely $x_i$, and try to compute the gradients of $f$, $g$ and $h$.
Continue reading only if you are done with the exercise above, as the following paragraph includes a discussion of the results one obtains.
As you see, the partial derivatives can include none, some, or all of the components of the vector $x$! Generally, the point to be made is that simply because we are taking the derivative into one direction (say $x_i$), the other components ($x_j$, $j \neq i$) do not drop out, because they may interact non-linearly with $x_i$ in $f$! Only if the terms containing $x_j$ are strictly linearly separable from $x_i$ will they drop out when taking the partial derivative with respect to a different $x_i$, $i \neq j$.
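Symbolic computation of gradients is a quick way to check your own calculations. A sketch with a made-up example function (not one of the exercise functions above) that also illustrates the point just made — some partial derivatives contain several components, others only one:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * x2 + sp.exp(x3) + x2 * x3   # illustrative function on R^3

grad = [sp.diff(f, var) for var in (x1, x2, x3)]
print(grad)   # [2*x1*x2, x1**2 + x3, exp(x3) + x2]
```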
Verbally, the gradient value $\nabla f(x)$ summarizes how $f$ extends along all the fundamental directions of the space $\mathbb{R}^n$ around the point $x$. As such, it offers a complete characterization of the instantaneous variation $f$ exhibits around $x$, and is intuitively in line with what we would view as a suitable candidate for the derivative of $f$ at $x$. Indeed, the following relationship holds:
Theorem: The Gradient and the Derivative.
Let $D \subseteq \mathbb{R}^n$ and $f: D \to \mathbb{R}$. Further, let $x \in D$ be an interior point of $D$, and suppose that $f$ is differentiable at $x$. Then, all partial derivatives of $f$ at $x$ exist, and $Df(x) = \nabla f(x)$.
As we have ignored the tedious steps that bring about this result, it may not appear too special at first sight. But thinking about it, it is actually quite impressive: we started from an abstract object that we called a "multivariate derivative" but could only characterize through some equation, and were far from computing it. Then, we looked at the next best thing we could actually compute, namely the univariate partial derivatives of $f$, collected them in a vector, and hoped that this object, which captures the intuition we are after quite well, meets the formal requirement of the abstract derivative object we initially wanted to pin down. And it indeed does!
As a technicality, the theorem contains the condition that $f$ is differentiable at $x$. How do we verify this? A quite straightforward sufficient condition is the following:
Theorem: Partial Differentiability and Differentiability.
Let $D \subseteq \mathbb{R}^n$ and $f: D \to \mathbb{R}$. Further, let $x \in D$ be an interior point of $D$. If all the partial derivatives of $f$ exist around $x$ and are continuous at $x$, then $f$ is differentiable at $x$.
Definition: Set of Continuously Differentiable Functions.
Let $D \subseteq \mathbb{R}^n$ and $f: D \to \mathbb{R}$. Then, we define

$$\mathcal{C}^1(D, \mathbb{R}) := \left\{f: D \to \mathbb{R} \ \middle| \ \frac{\partial f}{\partial x_i} \text{ exists and is continuous on } D \text{ for all } i \in \{1, \ldots, n\}\right\}$$

as the set of continuously differentiable real-valued functions over $D$. If the context reveals that the codomain of considered functions is equal to $\mathbb{R}$, it is appropriate to use the alternative notation $\mathcal{C}^1(D)$.
Note that the condition "$f \in \mathcal{C}^1(D)$" also implicitly asserts that the $i$-th partial derivative of $f$ exists at any $x \in D$, as otherwise the partial derivative function would not exist and could especially not be continuous. Then, the theorem above tells us more compactly that "if $f \in \mathcal{C}^1(D)$, then $f$ is differentiable." As the theorem only provides a sufficient condition, there is still room for exceptional cases where not all partial derivatives are continuous, but the function is still differentiable. However, we rarely deal with such functions in economics, such that this technicality should be of little concern to you.
Now that we know how to differentiate multivariate real-valued functions, the last step is to consider vector-valued functions. Indeed, we already more or less know how to do it! The key insight here is that we can re-write the vector-valued function as a collection of real-valued functions that we know how to deal with: note that we may write $f: D \to \mathbb{R}^m$ as

$$f(x) = (f_1(x), f_2(x), \ldots, f_m(x))', \quad (5)$$

where for any $j \in \{1, \ldots, m\}$, $f_j: D \to \mathbb{R}$ is a multivariate real-valued function. Let's see how we stack the partial derivatives of all these functions into a collecting object:
Definition: Jacobian.
Let $D \subseteq \mathbb{R}^n$, $m \in \mathbb{N}$ and $f: D \to \mathbb{R}^m$, and for $j \in \{1, \ldots, m\}$, let $f_j: D \to \mathbb{R}$ such that $f(x) = (f_1(x), \ldots, f_m(x))'$. Let $x \in D$. Then, if at $x$, every $f_j$, $j \in \{1, \ldots, m\}$, is partially differentiable with respect to any $x_i$, $i \in \{1, \ldots, n\}$, we call

$$J_f(x) := \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{pmatrix} = \begin{pmatrix} \nabla f_1(x) \\ \vdots \\ \nabla f_m(x) \end{pmatrix}$$

the Jacobian of $f$ at $x$. If the above holds at any $x \in D$, we call the mapping $J_f: D \to \mathbb{R}^{m \times n}, \ x \mapsto J_f(x)$ the Jacobian of $f$.
Note again the three conceptual levels: the Jacobian operator, the Jacobian function $J_f$ and the value $J_f(x)$. As with the gradient, this object is an intuitive candidate for the derivative: it summarizes how any component function $f_j$ varies along all the fundamental directions of the $\mathbb{R}^n$ starting from $x$. Again, it turns out that the Jacobian precisely meets the theoretical requirements of the derivative:
Theorem: The Jacobian and the Derivative.
Let $f: D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$ and $f_1, \ldots, f_m: D \to \mathbb{R}$ such that equation (5) holds. Further, let $x_0$ be an interior point of $D$, and suppose that $f$ is differentiable at $x_0$. Then, for any $i \in \{1, \ldots, m\}$, all partial derivatives of $f_i$ at $x_0$ exist, and $Df(x_0) = J_f(x_0)$.
While the gradient and the Jacobian may look intimidating at first, they are nothing but collections of partial derivatives, i.e. they describe rules for how we order the partial derivatives when presenting them together. This means that once you have firmly understood what a partial derivative is, these concepts are indeed rather straightforward to grasp, so don't let the intense notation fool you!
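As a quick illustration (with a function chosen here purely for demonstration), consider $f: \mathbb{R}^2 \to \mathbb{R}^2$ with component functions $f_1(x) = x_1 x_2$ and $f_2(x) = x_1 + x_2^2$. Stacking the gradients of the component functions row by row gives
$$ J_f(x) = \begin{pmatrix} \partial f_1 / \partial x_1 & \partial f_1 / \partial x_2 \\ \partial f_2 / \partial x_1 & \partial f_2 / \partial x_2 \end{pmatrix} = \begin{pmatrix} x_2 & x_1 \\ 1 & 2 x_2 \end{pmatrix}, \qquad \text{e.g.} \quad J_f((2, 3)') = \begin{pmatrix} 3 & 2 \\ 1 & 6 \end{pmatrix}. $$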
To conclude our discussion of the first multivariate derivative, note that the rules you are well familiar with (linearity of the derivative, product rule and chain rule) carry over to the multi-dimensional case. An exception is the quotient rule, because the quotient of two vectors is not a well-defined object. That being said, we need to apply some care concerning the order of derivatives: unlike with real-valued functions, where e.g. , matrix multiplication is not commutative! Thus, be sure to respect the order in which the differential expressions appear in the following theorem:
Theorem: Rules for General Multivariate Derivatives.
Let ,
and
. Suppose that
and
are differentiable functions. Then,
In the product rule, note that the prime indicates transposition and does not refer to our notation for the derivative, which is reserved for the univariate context!
As an example for the chain rule, consider where
and
. Then, what is
? Note that we can write
where
and
. Albeit somewhat tedious, taking the partial derivatives of
is rather straightforward, and you should arrive at
. For
, you should obtain
. Then, the chain rule tells us that
This way, you have elegantly avoided dealing with squared expressions in taking the derivative directly.
This result could have also been derived from the product rule with and
so that
, i.e.
. Then, we can rather straightforwardly compute that
and
, and the product rule gives us
To test your understanding, try to apply the product rule again with ; this should in fact be the simplest way of obtaining this result.
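If you would like a fully self-contained illustration of the multivariate chain rule, here is a minimal sketch with functions chosen purely for demonstration (they are not the ones from the example above): let $g(y) = y_1 y_2$ and $f(x) = (x_1 + x_2, \; x_1 x_2)'$, so that $h = g \circ f$ satisfies $h(x) = (x_1 + x_2)\, x_1 x_2$. The chain rule then gives
$$ \nabla h(x) = \nabla g(f(x))\, J_f(x) = \left( x_1 x_2, \; x_1 + x_2 \right) \begin{pmatrix} 1 & 1 \\ x_2 & x_1 \end{pmatrix} = \left( 2 x_1 x_2 + x_2^2, \; x_1^2 + 2 x_1 x_2 \right), $$
which you can verify by differentiating $h(x) = x_1^2 x_2 + x_1 x_2^2$ directly.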
Now we know how to take the first derivative of general functions. But what about higher order derivatives? You may have grasped that when starting from a function where
, taking the derivative comes with an increase in dimension: while for an
,
,
, and for
where
, the derivative
is already a matrix in
. Because we do not touch multi-dimensional matrices here (i.e. spaces of the form
), which you indeed never come across in regular economic studies, this puts a natural limit to the derivatives we consider here: the first derivative for
, which you already know from the last subsection, and the second derivative for
. As with univariate functions, it can be obtained by taking the derivative of the first derivative, provided that the first derivative exists.
Definition: Second Order Partial Derivative.
Let be an open set and
. Further, let
, and suppose that
is differentiable at
. Then, if the
-th partial derivative of
,
is differentiable at
, then we call its
-th partial derivative at
the (i,j)-second order partial derivative at
, denoted
.
Higher order partial derivatives are defined in the exact same way, so that e.g. the fourth order derivative of
at
is
. By requiring the domain to be an open set, we ensure that it coincides with its interior, so that it has only interior points. Recall that like the function
, the partial derivatives
are functions from
to
, so it indeed makes sense to think about their partial derivatives. As an example, take the function given by
with gradient
Should you need practice with the gradient concept, try verifying that this expression is correct. Before turning to the second order partial derivatives of this function, let us first study how we need to order them to obtain a second derivative, and make ourselves familiar with a very powerful rule in computing them.
Definition: Hessian or Hessian Matrix.
Let be an open set and
. Further, let
, and suppose that
is differentiable at
and that all second order partial derivatives of
at
exist. Then, the matrix
is called the Hessian of at
.
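To see the definition at work in a simple case (the function is chosen here purely for illustration and is not the one studied in the text), take $f(x_1, x_2) = x_1^2 x_2 + x_2^2$ with gradient $\nabla f(x) = (2 x_1 x_2, \; x_1^2 + 2 x_2)$. Its diagonal second order partial derivatives are $\partial^2 f / \partial x_1^2 = 2 x_2$ and $\partial^2 f / \partial x_2^2 = 2$, and both mixed second order partial derivatives equal $2 x_1$, so that, in the notation used for these sketches,
$$ H_f(x) = \begin{pmatrix} 2 x_2 & 2 x_1 \\ 2 x_1 & 2 \end{pmatrix}. $$
Note that the matrix is symmetric; the results below tell us when this is guaranteed to happen.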
Note again the three conceptual levels: Hessian operator, function, and value. Requiring to be open ensures that
, which must be the case for any differentiable function.
Now, remember the set that we defined for
to indicate that all partial derivatives of
exist and are continuous? Analogously, we can define more generally:
Definition: Set of k times Continuously Differentiable Functions.
Let and
. Then, we define
as the set of k times continuously differentiable real-valued functions over . If the context reveals that the codomain of considered functions
is equal to
, it is appropriate to use the alternative notation
.
This concept is useful in the given context because:
Theorem: Schwarz’s Theorem/Young’s Theorem.
Let $D \subseteq \mathbb{R}^n$ be an open set and $f: D \to \mathbb{R}$. If $f \in C^k(D)$, then the order in which the derivatives up to order $k$ are taken can be permuted.
Here, permuted simply means to interchange the order, so that e.g. . You can assume that the functions we are typically concerned with are sufficiently well-behaved such that the partial derivatives we consider are continuous, and the order is interchangeable! Nonetheless, if in doubt, continuity of the respective partial derivatives should of course be verified before applying this theorem.
Corollary: Hessian and Gradient.
Let and
. Then, the Hessian is symmetric and corresponds to the derivative, i.e. the Jacobian of the transposed gradient
:
. By its nature as the derivative of the first derivative,
is the second derivative of
at
.
From a technical side, note that is the function that maps
onto the column vector
. The corollary holds because if the second order partial derivatives, i.e. the partial derivatives of the functions in the gradient, are all continuous, then we can take the derivative of the (transposed) gradient
by our sufficient condition for multivariate differentiability studied above. Because the (transposed) gradient is nothing but a vector-valued function, its derivative will coincide with its Jacobian
. However, from the way the second order partial derivatives are arranged in the Hessian
, it follows that these two objects are precisely the same! Note, however, that the Hessian is certainly equal to the second derivative only if
because otherwise, the second derivative may not even be defined! Also note that the Hessian is always a Jacobian (of the transposed gradient), but not every Jacobian is a Hessian — be sure to know the distinction between these two concepts.
Finally, let’s put all this knowledge to work. For the function we considered above, given by , compute the second order partial derivatives at a point
to obtain the Hessian at this point (Hint: the second order partial derivatives are continuous and you can exploit symmetry; this should save you 3 computations). Does the second derivative of
exist? If so, what is it equal to?
At the end of the last subsection, we learned about the second derivative of a multivariate function. Next to its importance in optimization, where it determines the nature of an extreme value, i.e. tells us whether a solution to the first order condition is a maximum, a minimum, or neither, the second derivative, equal to the Hessian matrix, can also be used to improve over the linear first order Taylor approximation of a multivariate function by introducing a "squared" term:
Theorem: Second Order Multivariate Taylor Approximation.
Let $D \subseteq \mathbb{R}^n$ be an open set and consider $f \in C^2(D)$. Let $x_0 \in D$. Then, the second order Taylor approximation to $f$ at $x_0$ is
$$ T_2^{x_0}(x) = f(x_0) + \nabla f(x_0)(x - x_0) + \frac{1}{2}(x - x_0)' H_f(x_0)(x - x_0). $$
The error $R_2(x) = f(x) - T_2^{x_0}(x)$ approaches 0 at a faster rate than $\|x - x_0\|^2$, i.e. $\lim_{x \to x_0} R_2(x) / \|x - x_0\|^2 = 0$.
Again, this theorem tells us how around , we can arrive at a “good” functional approximation of an arbitrary
, where “good” again means that the error becomes negligible relative to the squared distance
to the approximation point
as we move very close to it.
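As a brief worked example (function and expansion point chosen here only for illustration), take $f(x_1, x_2) = e^{x_1} x_2$ and $x_0 = (0, 1)'$. Then $f(x_0) = 1$, $\nabla f(x_0) = (1, 1)$ and $H_f(x_0) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, so that the second order approximation is
$$ T_2^{x_0}(x) = 1 + x_1 + (x_2 - 1) + \tfrac{1}{2} \left( x_1^2 + 2 x_1 (x_2 - 1) \right) = x_2 + x_1 x_2 + \tfrac{1}{2} x_1^2, $$
which approximates $e^{x_1} x_2$ with an error that vanishes faster than $\|x - x_0\|^2$ as $x \to x_0$.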
What about approximations of other orders? Recall that every time we take the derivative, we increase the order of the object to be considered, so that the first derivative of a real-valued function is vector-valued, and the second derivative, i.e. the derivative of the vector-valued first derivative, is matrix-valued. Accordingly, the third derivative requires three dimensions, the fourth requires four, and so forth. Although it is mathematically possible to define Taylor approximations of orders also for multivariate functions, this complication is the reason why applied mathematicians usually stick to approximations of orders
or
.
The approximations of order and
, respectively, are defined in the way we would expect: if
, the first-order Taylor expansion is
and the order approximation is just
.
The multivariate Taylor expansion, adding the error to the Taylor approximation, is also defined in analogy to the univariate case. However, for the second order approximation, due to the dimensional complication, an explicit formula (using the third derivative) is harder to come by. For the first order approximation, if , the error is equal to
for a , and thus a direct generalization of the univariate case. This generalization principle holds also for errors of order
approximations if
, where
for a
.
Recall that in the univariate case, we had the mean value theorem as a corollary of Taylor’s theorem. What about the multivariate case? In analogy to before, for a real-valued function , using the order
expansion (approximating at
and using
), we may arrive at
where is a convex combination of
and
. Now, the issue arises that the RHS is a scalar product, and we can not solve for
. Thus, unfortunately, a multivariate generalization of the mean value theorem does not exist, so that we can not as easily derive a sufficient condition for the existence of a vector
that sets the gradient to zero using the Taylor approach.
The Total Derivative. An object frequently used in economics is the total derivative. It is instructive to discuss it here, since it is closely linked to Taylor’s theorem. Typically, you will read the two-variable version like this:
(6)
So, how do we make sense of this? And how can we use it to derive insights on the function ? The purpose of this expression is to capture the instantaneous rate of change of
as the vector of arguments marginally varies in a specific direction
. That is, it characterizes
when considering changes of the form
with a fixed vector
of relative variation of elements in
. Accordingly,
is a function of
and moreover of the direction components
and
! As such, a more explicit way of writing this concept is
Why should we care about fixing specific ratios for the variation in the argument’s components? After all, we already have the gradient which tells us about the variation along all the fundamental directions of the , isn’t that enough? Well, yes and no. In economics, we are frequently concerned with trade-offs so that increasing one argument (e.g. consumption of the first good) can not go without decreasing the other (e.g. consumption of the second good, leisure, etc.), and the exchange ratio is usually exogenously given, at least when fixing the starting point of variation
. The total derivative tells us more directly what instantaneous variations look like in light of such trade-offs! Indeed, it tells us that computing this variation is as simple as multiplying the gradient by the respective vector of relative variations.
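For a minimal numerical sketch in the notation of equation (6) (function, point and direction chosen purely for illustration): take $f(x_1, x_2) = x_1 x_2$, the point $x = (2, 3)'$ and the direction $(dx_1, dx_2) = (1, -1)$, i.e. the first argument is increased at the same rate at which the second is decreased. Then
$$ df = \frac{\partial f}{\partial x_1}(x)\, dx_1 + \frac{\partial f}{\partial x_2}(x)\, dx_2 = 3 \cdot 1 + 2 \cdot (-1) = 1 > 0, $$
so that, locally around $x$, this trade-off increases the value of $f$.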
The formal justification behind this relationship and some more elaborations can be found in the companion script. Intuitively, you can see that it closely relates to Taylor’s theorem by noting that the RHS of the above equation looks a lot like the middle term you get in the first order Taylor expansion. To see this rather abstract concept in action, consider the following scenario: suppose you care only about studying and sleeping, so that your utility function may be written as
where is the number of pages you read in your favorite economics textbook in a day, and
is the number of hours of sleep you are getting per night. Suppose you are currently getting
hours of sleep and reading
pages. You are thinking about reading just a bit more at the expense of sleeping. Assuming that you can read
pages in
hour, the vector characterizing how you marginally exchange pages for sleep is
. So, how does your utility change as you move towards reading more and sleeping less? Let us consult the total derivative:
Plugging in your current schedule and the variation vector
, you get
Thus, your utility is decreasing, and you should in fact be reading less and sleeping more!
If, on the other hand, you currently manage to read the pages and still get
hours of sleep, and you are more efficient at reading, managing 6 pages per hour, things look different:
The idea we saw above for two arguments can, of course, be generalized to arbitrary functions . Defining the total derivative as a function of the considered location
and the direction of change
, we write
or more compactly
(7)
In economics, these considerations are valuable in theoretical models when we are doing comparative statics, i.e. we consider how some equilibrium state (corresponding to ) and an economic output quantity, e.g.
, marginally respond to exogenous impulses that change economic quantities in fixed ratios, e.g. technology shocks that increase capital productivity twice as much as labor productivity. Oftentimes, as the example just given already hints at, we will not choose these ratios of change ad hoc, but assume that they are driven by some background variable, such as technology shocks. If
denotes the level of technology in the economy, we consider
as the outcome, and
are all relevant determinants of GDP, then, to highlight that the ratios are driven by the technology variable and endogenously depend upon it, we would write:
(8)
Note that this relationship also directly follows from the chain rule: varies with
, such that
can be written as a function of
:
. Then, the outcome we consider is actually a composite function
, and the chain rule gives
so that with :
Before moving on, a note of caution: equations (7) and (8) look quite similar; indeed, the naive mathematician could think that the latter could be obtained by just dividing the former by "". Of course, this is in no way a well-defined operation, as
is not a well-defined mathematical object, but arises only from our notational convention for the derivative of
with respect to
,
. Thus, remember that the total derivative works also when the direction of change is determined implicitly through a background variable
, but that this result does not follow by dividing the total derivative by the change
! Moreover, do not extrapolate the total derivative to non-marginal changes! The concept, by its definition, captures an instantaneous variation relying on the Taylor approximation, and for non-marginal changes, the linear approximation is by no means guaranteed to fare well. To illustrate correct and incorrect interpretation with an example, re-consider our reading-and-sleeping utility: here, we saw that moving marginally in direction
(exchange rate 6 units of pages for one unit of sleep) when starting from the status quo of
could increase utility locally around this point. When considering instead the non-marginal change of reading 6 more pages at the expense of sleeping one less hour, one gets
and thus a loss in utility.
You should take away from this:
captures the same concept and follows directly from the chain rule.
Let us re-consider the valuable properties of differentiable functions we initially highlighted for univariate derivatives: while we saw the linear approximation generalization in the previous subsection, the other two points, i.e. continuity and monotonicity, have not yet been addressed.
Indeed, the following generalizes to the multivariate case:
Theorem: Multivariate Differentiability implies Continuity.
Suppose that $f: D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$, is differentiable at $x_0 \in D$. Then, $f$ is continuous at $x_0$. Accordingly, if $f$ is differentiable, $f$ is also continuous.
However, be careful to note that differentiability and partial differentiability, i.e. existence of all partial derivatives, are not equivalent! Indeed, there may be discontinuous functions where all partial derivatives exist. One such example is
It is partially differentiable with respect to and
with partial derivative
, but it is not continuous in
.
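A standard textbook function with this property (stated here for concreteness; it may or may not coincide with the example displayed above) is
$$ f(x, y) = \begin{cases} \dfrac{x y}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} $$
Along both axes $f$ is identically zero, so $\frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0$ exist, yet $f(x, x) = \tfrac{1}{2}$ for all $x \neq 0$, so $f$ is not continuous at the origin.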
That being said, recall also that partial differentiability with continuous partial derivatives was sufficient for differentiability, so that if all partial derivatives exist and are continuous, then the function itself must be continuous as well.
Next, let us turn to the third important feature of univariate derivatives (not yet generalized): they told us whether a given function was increasing, decreasing or constant on some interval. For multivariate functions, this characterization is no longer too meaningful: if the gradient is zero everywhere, then the function will be "constant" in the same sense as a univariate function, but such functions are typically not too interesting. On the other hand, the concept of monotonicity is difficult to transfer because how evolves along one dimension depends on the position in the other dimensions: e.g.
is monotonically increasing in
if and only if
. Thus, the more convenient concept to characterize multivariate functions is convexity, which, as we have seen, can be (more or less) easily generalized to the
case!
Now, the following results for univariate functions will be immensely helpful:
Proposition: Convexity of Twice Differentiable Univariate Functions.
Let $D$ be a convex, open subset of $\mathbb{R}$ and suppose that $f \in C^2(D)$, i.e. $f$ is a twice differentiable univariate function such that $f''$ is continuous. Then, $f$ is convex if and only if, for all $x \in D$: $f''(x) \geq 0$.
Corollary: Strict Convexity of Univariate Functions.
Let $D$ be a convex subset of $\mathbb{R}$ and suppose that $f \in C^2(\operatorname{int}(D))$. Then, if $f''(x) > 0$ for any $x \in \operatorname{int}(D)$, $f$ is strictly convex.
Note that we again focus only on interior points where the derivative exists. Verbally, the second derivative, i.e. the "slope of the slope", gives us an equivalent condition for convexity and a sufficient condition for strict convexity. With some formal effort (see the companion script), we can generalize this concept to multivariate functions. Conceptually, the issue is how we should think about statements such as "" or "
” when
is a matrix, for instance the second derivative Hessian matrix of a multivariate real-valued function. It turns out that definiteness of a matrix can be viewed as a generalization of its “sign”! To see why, consider again a real number
$a$. Then, if $a \geq 0$, for any $x \in \mathbb{R}$ it holds that $x a x = a x^2 \geq 0$. Similarly, if $A \in \mathbb{R}^{n \times n}$ is positive semi-definite, it holds that for any $x \in \mathbb{R}^n$, $x' A x \geq 0$. Accordingly, we can derive:
Proposition: Multivariate Convexity.
Let $D$ be a convex subset of $\mathbb{R}^n$ and $f \in C^2(D)$. Then, $f$ is convex if and only if, for all $x \in D$, the Hessian $H_f(x)$ is positive semi-definite. Further, if for all $x \in D$, $H_f(x)$ is positive definite, then $f$ is strictly convex.
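For a concrete check (the function is chosen here purely for illustration), consider $f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$ on $D = \mathbb{R}^2$. Its Hessian is constant,
$$ H_f(x) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad v' H_f(x)\, v = 2 v_1^2 + 2 v_1 v_2 + 2 v_2^2 = v_1^2 + v_2^2 + (v_1 + v_2)^2 > 0 \quad \text{for } v \neq 0, $$
so the Hessian is positive definite everywhere and $f$ is strictly convex by the proposition.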
For the last bit on multivariate calculus, let us discuss integration, albeit far less extensively than differentiation. The conceptual perspective here is quite the opposite of that with differentiation: while thus far we were interested in the marginal change of a function , we now care about its accumulation in the codomain. Actually, it is quite intuitive that we should consider integration and differentiation as "inverse" operations also in a narrow sense. This is because
is the instantaneous change of its accumulation, i.e. the rate at which the area under it accumulates! Accumulation is also an issue of frequent interest to economists, e.g. when we care about aggregation of (the choices of) individual firms/households to a national economy, or when forming expectations about outcomes, where we aggregate all possible events and weight them by their probability.
Indeed, also more formally, the idea is to construct the integral as an, in an appropriate sense, inverse operator to the differential operator. Recall that the derivative, , is an operator on the space of differentiable functions, mapping functions
onto their derivative
, or, in our notation,
. Now, we ask ourselves: does this operator have an inverse, i.e. can we find
such that
, or respectively, can we, for any function
, find a unique function
such that
? If we again restrict attention to univariate functions for the moment, you are likely aware that this is not possible, for the reason that constants vanish when taking the derivative. Thus,
and
, such that the function
has more than one function characterized by the feature we are looking for. In other words, the derivative is not injective, and thus, as we discussed earlier, we can not invert it!
However, similar to the non-invertible (because non-injective) function , we can of course define the preimage of
under the differential operator,
of functions that have
as their derivative, just like we can define
as the pre-image of any value
of
. For the case of univariate functions, you likely know the following characterization:
where is the stem function of
. Recall that the reason for ambiguity in the inverse derivative, or antiderivative, was that constants vanish. Thus, up to said constant, we should be able to uniquely pin down the antiderivative through the function
that does not contain a constant… and we indeed can! The object in this equation is called the indefinite integral of
and, in some generalized sense, describes a "function". Note, however, that the expression
under the differential operator, and that we describe a set here, rather than an equation.
Before moving on to the well-defined definite integral, check that you are familiar with the following rules for indefinite integrals:
Theorem: Rules for Indefinite Integrals.
Let be two integrable functions and let
be constants,
. Then
Although there is a more formal definition of integrability, it suffices here to understand an integrable function as one whose integral you can compute with the usual rules. Another important rule, which can be thought of as the reverse of the product rule, is integration by parts:
Theorem: Integration by parts.
Let be two differentiable functions. Then,
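For instance, in a standard worked example (notation chosen here for the sketch), taking $u'(x) = e^x$ and $v(x) = x$ in the rule $\int u'(x) v(x)\, dx = u(x) v(x) - \int u(x) v'(x)\, dx$ gives
$$ \int x e^x \, dx = x e^x - \int e^x \, dx = (x - 1) e^x + c, \qquad c \in \mathbb{R}, $$
which you can confirm by differentiating the right-hand side.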
Remaining with univariate functions for now, we know that a unique function can be attributed to
such that
if we require that
does not contain a constant. For simplicity, let’s focus on convex sets
, i.e. intervals. So, how do we compute
? The idea is quite easy: note that while the antiderivative is not well-defined in general because of the constants
, for any function
, i.e. any function that satisfies
, for specific values
, we have that
Supposing that , this can be used to compute the uniquely defined definite integral that tells us the accumulation of
from
to
, that is, on the interval
,
Definition: Definite Integral.
Let $a, b \in \mathbb{R}$ with $a \leq b$ and consider an integrable function $f$ with stem function $F$. Then, the definite integral of $f$ from $a$ to $b$ is
$$ \int_a^b f(x)\, dx := F(b) - F(a). $$
This gives us the usual rule that you are likely familiar with: to compute , compute the stem function
, and take the difference
. For instance, the stem function of
is
, such that
.
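As one more simple illustration (example chosen here for concreteness): for $f(x) = 3 x^2$ with stem function $F(x) = x^3$,
$$ \int_1^2 3 x^2 \, dx = F(2) - F(1) = 8 - 1 = 7. $$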
Before moving on, keep the following in mind: the inverse of the differential operator is not generally well-defined. However, any function
in the preimage of a function
under
is characterized by a uniquely defined accumulation between any two points
and
, called the definite integral of
. Because we like uniquely defined quantities, we mostly restrict attention to this object; indeed, when we care about accumulation, as we mostly do when considering antiderivatives, it is all we need.
Definition: Infimum and Supremum of a Set.
Let . Then, the infimum
of
is the largest value that is weakly smaller than any element of
, i.e.
, and the supremum
of
is the smallest value that is weakly larger than any element of
, i.e.
.
These concepts are a helpful generalization of maximum and minimum, and exist under much more general conditions. For instance, for an interval , there is no maximum or minimum, but infimum and supremum exist and are equal to
and
, respectively. I need them for the theorem below to ensure that
always defines an interval
as
,
,
or
), regardless of whether the lower bound is open or closed. Note that we may have
.
Theorem: Fundamental Theorem of Calculus.
Let $I$ be an interval in $\mathbb{R}$ and $f: I \to \mathbb{R}$. Let $a \in I$, suppose that $f$ is integrable, and define $F(x) := \int_a^x f(t)\, dt$. Then, $F$ is differentiable, and
$$ F'(x) = f(x). $$
A slightly informal but – especially given the theorem’s importance – conveniently short and manageable proof can be found in the companion script.
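To see the theorem at work in a minimal example (chosen here purely for illustration), let $f(t) = t^2$ on $I = \mathbb{R}$ and $a = 0$. Then
$$ F(x) = \int_0^x t^2 \, dt = \frac{x^3}{3}, \qquad F'(x) = x^2 = f(x), $$
exactly as the theorem asserts.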
As we did with the derivatives, let us extend the notion of the integral to the multivariate case by first looking at a function mapping from to
. If in the univariate case the definite integral measured an area under the graph, it is now only natural to require the definite integral to measure the volume under the graph. In higher dimensions, we have to go on without graphic illustrations, but the concept of "summing up the function values over infinitely small areas of the domain" remains valid. Also, indefinite integrals should still be considered the antiderivative, but now with respect to the multivariate derivative, and again intuition might fail us here. Luckily, for probably all intents and purposes for which you will come across integrals in your master courses, the following theorem will be of practical help:
Theorem: Fubini’s Theorem.
Let $X$ and $Y$ be two intervals in $\mathbb{R}$, let $f: X \times Y \to \mathbb{R}$ and suppose that $f$ is continuous. Then, for any $[a, b] \times [c, d]$ with intervals $[a, b] \subseteq X$ and $[c, d] \subseteq Y$,
$$ \int_{[a, b] \times [c, d]} f(x, y)\, d(x, y) = \int_a^b \left( \int_c^d f(x, y)\, dy \right) dx = \int_c^d \left( \int_a^b f(x, y)\, dx \right) dy, $$
and all the integrals on the right-hand side are well-defined.
It tells us that when concerned with a multi-dimensional integral, we can integrate with respect to each dimension (or fundamental direction) “in isolation” or rather, integrate in an arbitrary succession with respect to all the single variables. The theorem is pretty powerful as it only needs continuity of the function as a prerequisite, and then allows you to reduce a multivariate integral to a lower-dimensional one! You can also apply the theorem repeatedly if you are faced with higher dimensional integrals, so that
Thus, a scheme applies that is very similar to what we have seen for differentiation of multivariate functions: if the operation can be performed, that is, here if we can integrate the function , so long as
satisfies a continuity condition, then the multivariate version of the operation can be computed by repeatedly applying the univariate concept subject to a certain scheme of ordering!
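A quick worked example (integrand and bounds chosen here for illustration): for $f(x, y) = x + y$ on $[0, 1] \times [0, 2]$,
$$ \int_0^1 \int_0^2 (x + y)\, dy\, dx = \int_0^1 (2 x + 2)\, dx = 3, \qquad \int_0^2 \int_0^1 (x + y)\, dx\, dy = \int_0^2 \left( \tfrac{1}{2} + y \right) dy = 3, $$
so both orders of integration deliver the same value, as Fubini's theorem promises.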
For a final property, note that linearity of the integral implies in particular that we can pull constants with respect to the integrating variable , i.e. any expression
that may depend on arbitrary variables
but not on
, out of the integral, so that
. Thus, we obtain the following corollary of Fubini’s theorem:
Corollary: Integration of Multiplicatively Separable Functions.
Let $X, Y \subseteq \mathbb{R}$ and $g: X \to \mathbb{R}$, $h: Y \to \mathbb{R}$ be continuous functions. Then, for any intervals $[a, b] \subseteq X$, $[c, d] \subseteq Y$,
$$ \int_a^b \int_c^d g(x)\, h(y)\, dy\, dx = \left( \int_a^b g(x)\, dx \right) \cdot \left( \int_c^d h(y)\, dy \right). $$
This follows directly from applying Fubini’s Theorem. Note that and
can be multivariate, so that whenever you can separate a function into two factors that depend on disjoint subsets of the variables, you can multiplicatively separate integration! An important economic example is the Cobb-Douglas production function: suppose that firms' stock of capital
and labor
are independently and uniformly distributed on
(it is not too important what this means here, it just ensures that the first equality below holds), so that individual level output is
. Here, the theorem for multiplicatively separable variables can help us determine the aggregated output of the whole economy
as
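For a concrete version of this computation (functional form, exponents and the unit interval are chosen here purely for illustration), suppose individual output is $K^\alpha L^\beta$ with $\alpha, \beta \in (0, 1)$ and that $K$ and $L$ are independently and uniformly distributed on $[0, 1]$. Then multiplicative separability gives
$$ \int_0^1 \int_0^1 K^\alpha L^\beta \, dK\, dL = \left( \int_0^1 K^\alpha \, dK \right) \left( \int_0^1 L^\beta \, dL \right) = \frac{1}{(\alpha + 1)(\beta + 1)}. $$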
To conclude this section on integrals, take away that (i) the differential operator can generally not be inverted, (ii) the definite integral, referring to the accumulation of a function between two points, can be well-defined nonetheless, and corresponds to the usual integral you are familiar with, and (iii) that like differentiation, we can handle multivariate integration by applying techniques for univariate functions according to a certain scheme of ordering, which applies under rather general conditions.
If you feel like testing your understanding of the concepts discussed in this chapter since the last quiz, you can take a short quiz found here.