When investigating whether a given function is injective and/or surjective, we can typically make our lives easier than by working with the raw definitions. In this exercise, we turn to two helpful results which we will use in the exercises to follow.
(i) Show that if a function $f: \mathbb{R} \to \mathbb{R}$ is strictly monotonic, i.e. ($\forall x, y \in \mathbb{R}: x < y \Rightarrow f(x) < f(y)$) or ($\forall x, y \in \mathbb{R}: x < y \Rightarrow f(x) > f(y)$), then $f$ is injective, i.e. that $\forall x, y \in \mathbb{R}: x \neq y \Rightarrow f(x) \neq f(y)$.
Let $x, y \in \mathbb{R}$ such that $x \neq y$. Without loss of generality, assume that $x < y$ (there is no loss of generality, as if this were not the case, we can simply re-label $x$ and $y$ to make $x < y$ true). By strict monotonicity of $f$, either $f(x) < f(y)$ or $f(x) > f(y)$. Either way, $f(x) \neq f(y)$. Hence, $f$ is injective.
A further fact that is helpful also outside the invertibility context is the following:
Theorem: Intermediate Value Theorem.
Let $f: D \to \mathbb{R}$ for some interval $D \subseteq \mathbb{R}$, and assume that $f$ is continuous. Then, for any $a, b \in D$ with $a < b$ and $f(a) < f(b)$ (resp. $f(a) > f(b)$), and for any $y \in (f(a), f(b))$ (resp. $y \in (f(b), f(a))$), there exists $x \in (a, b)$ with $f(x) = y$.
Verbally, this theorem relates to the intuition of being able to draw continuous functions without lifting the pen: if a continuous function attains two different values, it will also reach every value in between along the way. The exercise to follow extends this intuition by establishing that for two continuous functions, if one lies above the other at one point but below it at another point, then the functions must intersect somewhere between the two points.
(ii) Use the intermediate value theorem to show that if two continuous functions $f$ and $g$ with domain and codomain $\mathbb{R}$ are such that $f(x_1) > g(x_1)$ and $f(x_2) < g(x_2)$ for some $x_1, x_2 \in \mathbb{R}$, then there exists a value $x^*$ in between $x_1$ and $x_2$ (i.e., $x^* \in (x_1, x_2)$ when $x_1 < x_2$ and $x^* \in (x_2, x_1)$ else) such that $f(x^*) = g(x^*)$.
Define $h := f - g$. Because $f$ and $g$ are continuous, $h$ is continuous as well.
Then, $h(x_1) = f(x_1) - g(x_1) > 0$ and $h(x_2) = f(x_2) - g(x_2) < 0$.
By the intermediate value theorem, there exists $x^*$ in between $x_1$ and $x_2$ (i.e., $x^* \in (x_1, x_2)$ when $x_1 < x_2$ and $x^* \in (x_2, x_1)$ else) such that $h(x^*) = 0$, i.e. $f(x^*) = g(x^*)$.
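As a quick numerical illustration (a sketch, assuming Python with NumPy/SciPy; the functions $f$ and $g$ below are hypothetical stand-ins), one can locate the intersection point whose existence the argument guarantees by running a root finder on $h = f - g$:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical continuous functions with f > g at x = 0 and f < g at x = 2
f = lambda x: np.cos(x)
g = lambda x: x

h = lambda x: f(x) - g(x)   # continuous, h(0) = 1 > 0, h(2) < 0

# brentq finds a root of h on [0, 2], i.e. an intersection point of f and g
x_star = brentq(h, 0.0, 2.0)
print(x_star, f(x_star) - g(x_star))  # ~0.739, ~0.0
```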
(iii) Is inversion of a function a linear operation? That is, does it hold for any two arbitrary, invertible functions $f$ and $g$ with the same domain and codomain that $(f + g)^{-1} = f^{-1} + g^{-1}$? Give a formal argument if you think this is true, or a counterexample otherwise. Also think about whether $f + g$ is always guaranteed to be invertible when $f$ and $g$ are bijective functions.
Inversion is not linear. For instance, you can think of $f: \mathbb{R} \to \mathbb{R}, x \mapsto x$ and $g: \mathbb{R} \to \mathbb{R}, x \mapsto x$, i.e. both functions are the so-called "identity function" on $\mathbb{R}$.
To determine the inverse, we look at an arbitrary $y$ in the codomain and ask ourselves which $x$ in the domain is mapped onto it, i.e. we look at the mapping rule $y = f(x)$ and try to solve it for $x$.
Here, for any $y \in \mathbb{R}$, $x = y$ is mapped onto $y$ by $f$ and $g$, so that the mapping rules for the inverse functions are $f^{-1}(y) = g^{-1}(y) = y$. Hence, $(f^{-1} + g^{-1})(y) = 2y$.
Conversely, $f + g$ has mapping rule $(f + g)(x) = 2x$; solving $y = 2x$ for $x$ gives
$$(f + g)^{-1}(y) = \frac{y}{2} \neq 2y = (f^{-1} + g^{-1})(y),$$
where the inequality holds for every $y$ except $y = 0$. Hence, $(f + g)^{-1}$ is not equal to $f^{-1} + g^{-1}$.
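A one-line numerical check of this counterexample (a sketch in Python; nothing here is specific beyond the identity functions used above):

```python
# f = g = id on R, so (f + g)(x) = 2x with inverse y -> y / 2,
# while f^{-1} + g^{-1} maps y -> 2y.
inv_of_sum = lambda y: y / 2     # (f + g)^{-1}
sum_of_invs = lambda y: 2 * y    # f^{-1} + g^{-1}

for y in [-1.0, 0.0, 3.0]:
    print(y, inv_of_sum(y), sum_of_invs(y))  # values agree only at y = 0
```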
An alternative example is $f: \mathbb{R} \to \mathbb{R}, x \mapsto x^3$ and $g: \mathbb{R} \to \mathbb{R}, x \mapsto x$. Then, solving for $x$ in the mapping rules of $f$ and $g$ yields the mapping rules for the inverse functions: $f^{-1}(y) = y^{1/3}$ and $g^{-1}(y) = y$. Define $h := f^{-1} + g^{-1}$. If $h$ were the inverse of $f + g$, then for any $x \in \mathbb{R}$, we would have $h((f + g)(x)) = x$. However, e.g. for $x = 1$, we have
$$(f + g)(1) = 1^3 + 1 = 2$$
and
$$h(2) = f^{-1}(2) + g^{-1}(2) = 2^{1/3} + 2 \neq 1.$$
Hence, $h = f^{-1} + g^{-1}$ is not equal to $(f + g)^{-1}$.
Finally, it is also not guaranteed that $f + g$ is invertible even if $f$ and $g$ are bijective (and therefore invertible) functions. To see this, consider an invertible function $f: \mathbb{R} \to \mathbb{R}$, and $g = -f$, i.e. $g(x) = -f(x)$ for any $x$ in the domain of $f$. Then, because $f$ is both injective and surjective, $g$ is as well (if you find this obvious, you need not formally argue why; the elaboration is just given below in case this may not be clear just yet):
Injectivity: Consider $x, y \in \mathbb{R}$ such that $x \neq y$. By injectivity of $f$, $f(x) \neq f(y)$. Hence, it also holds that $-f(x) \neq -f(y)$, i.e. $g(x) \neq g(y)$. Therefore, it holds that $x \neq y \Rightarrow g(x) \neq g(y)$, i.e. $g$ is injective.
Surjectivity: Consider $y \in \mathbb{R}$. By surjectivity of $f$, there exists $x \in \mathbb{R}$ such that $f(x) = -y$, or equivalently, $-f(x) = y$. Therefore, it holds that $g(x) = y$, i.e. $g$ is surjective.
Hence, $g$, like $f$, is an invertible function. However, $(f + g)(x) = f(x) - f(x) = 0$ for every $x$, i.e. $f + g$ is the function that is constant at zero. But we know that constant functions are not injective when the domain has more than one value, and they are not surjective when the codomain has more than one value. Therefore, $f + g$ is not invertible.
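Numerically, this degeneracy is immediate (a sketch; the invertible function f below is a hypothetical choice):

```python
# For any invertible f and g = -f, the sum f + g is constant at zero,
# and a constant function cannot be inverted.
f = lambda x: x**3 + x     # hypothetical invertible (strictly increasing) f
g = lambda x: -f(x)

print([f(x) + g(x) for x in [-1.0, 0.0, 2.5]])  # [0.0, 0.0, 0.0]
```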
b. Concrete Examples
(i) Consider the function $f: \mathbb{R} \to \mathbb{R}$. Is this function invertible? Hint: The results derived in Ex. 1.a.i and 1.a.ii (or alternatively, the intermediate value theorem) may be helpful in investigating injectivity and surjectivity.
Injectivity. Ex. 1.a.i tells us that strict monotonicity is a sufficient condition for injectivity. This property can be investigated using the first derivative: one finds that $f'(x) < 0$ for all $x \in \mathbb{R}$. Hence, $f$ is strictly monotonically decreasing and therefore injective.
Surjectivity. Surjectivity can be shown using Ex. 1.a.ii or the intermediate value theorem (IVT) directly. This solution illustrates both methods.
Using Ex. 1.a.ii: Pick an arbitrary $y \in \mathbb{R}$ and let $g$ be the constant function $g(x) = y$. Since $f$ is strictly decreasing and unbounded above and below, there exist $x_1 < x_2$ with $f(x_1) > y = g(x_1)$ and $f(x_2) < y = g(x_2)$. Hence, by Ex. 1.a.ii, there exists $x^*$ in between $x_1$ and $x_2$ such that $f(x^*) = g(x^*) = y$. Hence, for any $y \in \mathbb{R}$ there exists $x^*$ such that $f(x^*) = y$, and therefore, $f$ is surjective.
Using the IVT: Pick an arbitrary $y \in \mathbb{R}$. Then, $\lim_{x \to -\infty} f(x) = \infty$ and $\lim_{x \to \infty} f(x) = -\infty$, so that there exist $a < b$ with $f(a) > y > f(b)$. Hence, by continuity of $f$ and the IVT, there exists $x^* \in (a, b)$ for which $f(x^*) = y$. Hence, for any $y \in \mathbb{R}$ there exists $x^*$ such that $f(x^*) = y$, and therefore, $f$ is surjective.
Bijectivity. Because $f$ is both injective and surjective, it is bijective, and thus invertible.
(ii) Consider the given linear function $f: \mathbb{R}^2 \to \mathbb{R}^2$. Is this function invertible? If so, can you determine the inverse?
This function can be re-written in matrix notation: for $x \in \mathbb{R}^2$, $f(x) = Ax$ with a suitable matrix $A \in \mathbb{R}^{2 \times 2}$. Hence, $f$ is invertible if and only if $A$ is invertible. We have $\det(A) \neq 0$, so that this is indeed the case.
To determine the inverse, we invert $A$, either using the Gauss-Jordan algorithm or the inversion rule for $2 \times 2$ matrices. Either way, we obtain $A^{-1}$, so that the inverse function's mapping rule is $f^{-1}(y) = A^{-1}y$.
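The same computation can be done numerically (a sketch, assuming Python/NumPy; the matrix A below is a hypothetical stand-in for the one from the exercise):

```python
import numpy as np

A = np.array([[1.0, 2.0],      # hypothetical coefficient matrix of f(x) = A @ x
              [3.0, 4.0]])

print(np.linalg.det(A))         # -2.0 != 0, so A (and hence f) is invertible
A_inv = np.linalg.inv(A)        # inverse matrix

y = np.array([1.0, 1.0])
x = A_inv @ y                   # mapping rule of the inverse: y -> A_inv @ y
print(np.allclose(A @ x, y))    # True: f(f^{-1}(y)) = y
```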
(iii) Consider the given function $f: \mathbb{R}^2 \to \mathbb{R}^2$. Is this function invertible? If so, can you determine the inverse?
The components of this function each use only one distinct element of the argument. Hence, we can test invertibility component-wise. It is straightforward to verify that both component functions $f_1$ and $f_2$ are invertible on $\mathbb{R}$. Hence, $f$ is invertible on $\mathbb{R}^2$. We obtain the mapping rule of the inverse function by solving $y_1 = f_1(x_1)$ and $y_2 = f_2(x_2)$ for $x_1$ and $x_2$, respectively.
Exercise 2: A Caveat on Strict Monotonicity
For univariate real-valued functions, if they are differentiable, it is commonplace to view the sign of the derivative as an equivalent condition for monotonicity. While this is justified for the non-strict versions, it is indeed not the case that for a differentiable function $f: D \to \mathbb{R}$, $D \subseteq \mathbb{R}$, there is equivalence between $\forall x \in D: f'(x) > 0$ (respectively $f'(x) < 0$) and $f$ being a strictly increasing (decreasing) function. The reason for this is the existence of so-called "saddle points" (as will be thoroughly discussed in Chapter 4), which can occur for strictly monotonic functions and feature $f'(x) = 0$.
Now for the task of this exercise: show that for a differentiable function $f: D \to \mathbb{R}$, $D \subseteq \mathbb{R}$, the condition $\forall x \in D: f'(x) > 0$ is a sufficient, but not a necessary condition for $f$ being strictly monotonically increasing. For the latter point, you may consider the function $f: \mathbb{R} \to \mathbb{R}$ with mapping rule $f(x) = x^3$ as a counterexample to necessity. Hint 1: Recall the formal definition of strict monotonicity: $f$ is strictly monotonically increasing if for all $x, y \in D$, $x < y \Rightarrow f(x) < f(y)$. Hint 2: To compare values of $f$ using the derivative, recall that for $x < y$, $f(y) - f(x) = \int_x^y f'(t)\,dt$, and that for a function $g$ with $g(t) > 0$ for all $t \in [a, b] \setminus E$, where $[a, b]$ is an interval of non-zero length and the "exception set" $E \subseteq [a, b]$ contains at most finitely many values, it holds that $\int_a^b g(t)\,dt > 0$. Remark: Showing that for a differentiable function $f: D \to \mathbb{R}$, $D \subseteq \mathbb{R}$, $\forall x \in D: f'(x) < 0$ is a sufficient, but not a necessary condition for $f$ being strictly monotonically decreasing can be done in perfect analogy to the investigation here. To avoid tedious case distinctions, we just focus on the case of strictly monotonically increasing functions in this exercise.
Sufficiency. Suppose that $\forall x \in D: f'(x) > 0$ holds. Then, for any $x, y \in D$ such that $x < y$, we have (cf. Hint 2)
$$f(y) - f(x) = \int_x^y f'(t)\,dt > 0.$$
Hence, $f$ is strictly monotonically increasing. Necessity. If the condition were necessary, we would have
$$f \text{ strictly monotonically increasing} \;\Rightarrow\; \forall x \in D: f'(x) > 0. \tag{1}$$
Hence, to disprove this, we need to find a strictly monotonically increasing $f$ which does not satisfy $\forall x: f'(x) > 0$. As the exercise suggests, let us pick $f(x) = x^3$. This function has derivative $f'(x) = 3x^2$.
Now, let $x, y \in \mathbb{R}$ such that $x < y$. Note that $f'(t) = 3t^2 > 0$ unless $t = 0$, i.e. $f'(t) = 0$ if and only if $t = 0$. Hence, we have two cases:
If $0 \notin (x, y)$, i.e. there exists no $t \in (x, y)$ such that $f'(t) = 0$, then $f'(t) > 0$ for all $t \in (x, y)$, and there is no "exception set" to consider for Hint 2.
If, on the other hand, there exists some $t \in (x, y)$ such that $f'(t) = 0$, then this $t$ must be equal to $0$, and the exception set is $E = \{0\}$. Clearly, $E$ is of finite cardinality, and Hint 2 still applies.
Hence, in both cases, we can conclude that
$$f(y) - f(x) = \int_x^y f'(t)\,dt > 0.$$
Hence, we have $f(x) < f(y)$ for $x < y$. Therefore, for arbitrary $x, y \in \mathbb{R}$, $x < y$ implies $f(x) < f(y)$, and $f$ is strictly monotonically increasing.
However, as we have seen, $f'(0) = 0$, so that $\forall x \in \mathbb{R}: f'(x) > 0$ does not apply to the concrete example of $f(x) = x^3$ considered here. Therefore, the implication in (1) does not hold, and the condition is not necessary.
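A small symbolic/numerical check of this counterexample (a sketch, assuming Python with SymPy):

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3
fprime = sp.diff(f, x)                          # 3*x**2

print(fprime.subs(x, 0))                        # 0: f'(0) = 0, so "f' > 0 everywhere" fails
print(sp.solveset(fprime < 0, x, sp.S.Reals))   # EmptySet: f' is never negative

# ... yet f is strictly increasing: spot check a few ordered pairs
pairs = [(-2, -1), (-0.5, 0), (0, 0.5), (1, 2)]
print(all(b**3 - a**3 > 0 for a, b in pairs))   # True
```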
Exercise 3: Convexity and Concavity
a. Set Convexity
Are the following sets convex? Justify your answer!
An open ball $B_\varepsilon(x_0) = \{x \in \mathbb{R}^n : \|x - x_0\| < \varepsilon\}$ around some $x_0 \in \mathbb{R}^n$, for some $\varepsilon > 0$. What about a closed ball?
1.
The set is convex: let $x$ and $y$ be elements of the set and $\lambda \in [0, 1]$. Then, plugging $\lambda x + (1 - \lambda)y$ into the set's defining condition and expanding, the condition is again satisfied, where the second-to-last equality in this computation holds as $x$ and $y$ themselves satisfy the defining condition. Hence, $\lambda x + (1 - \lambda)y$ lies in the set, and the set is convex.
2.
The set is not convex. To show this, we need to find a counterexample, i.e. elements $x$ and $y$ of the set and $\lambda \in [0, 1]$ such that $\lambda x + (1 - \lambda)y$ does not lie in the set. It turns out that convex combinations which do remain in the set are indeed the exception, such that a counterexample is not hard to come by. Consider e.g. two suitable elements $x \neq y$ of the set; then, for $\lambda = \frac{1}{2}$, the midpoint $\frac{1}{2}(x + y)$ fails the set's defining condition, so that $\frac{1}{2}(x + y)$ is not an element of the set. Hence, the set is not convex.
3.
$\varepsilon$-open balls are always convex sets: let $x, y \in B_\varepsilon(x_0)$ and $\lambda \in [0, 1]$. Then,
$$\begin{aligned} \|\lambda x + (1 - \lambda)y - x_0\| &= \|\lambda(x - x_0) + (1 - \lambda)(y - x_0)\| \\ &\leq \|\lambda(x - x_0)\| + \|(1 - \lambda)(y - x_0)\| \\ &= \lambda\|x - x_0\| + (1 - \lambda)\|y - x_0\| \\ &< \lambda\varepsilon + (1 - \lambda)\varepsilon = \varepsilon. \end{aligned}$$
Hence, $\lambda x + (1 - \lambda)y \in B_\varepsilon(x_0)$, and $B_\varepsilon(x_0)$ is convex.
Graphically, this intuition can be seen from the ball we saw in Chapter 3: any line segment connecting two points in the ball is fully contained in it.
For an -closed ball, analogous reasoning applies to show that it is convex as well – the only adjustment we need to make is to use a weak inequality in the fourth line above.
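For intuition, one can also sanity-check ball convexity by random sampling (a numerical sketch, not a proof; Python/NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
center, eps = np.zeros(2), 1.0

def in_open_ball(z):
    return np.linalg.norm(z - center) < eps

# Draw random pairs of points in the ball and check that every sampled
# convex combination stays inside.
ok = True
for _ in range(1000):
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if in_open_ball(x) and in_open_ball(y):
        lam = rng.uniform()
        ok &= in_open_ball(lam * x + (1 - lam) * y)
print(ok)  # True: no violation found
```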
b. Function Convexity
(i) Investigate the following function with respect to (strict) convexity/concavity:
Hint: Recall that we can use the second derivative to investigate convexity.
As suggested by the hint, let us consider the second derivative, i.e. the Hessian of $f$, to investigate convexity. We have
and therefore
For arbitrary $x$ in the domain, consider the quadratic form $z \mapsto z^\top H_f(x) z$: there is a direction $z$ for which it is strictly positive, and another direction $z'$ for which it is strictly negative. Hence, at any $x$, $H_f(x)$ is indefinite. Therefore, $f$ is neither concave nor convex, and there exists no subset of its domain on which it attains either property.
(ii) Investigate the following function with respect to (strict) convexity/concavity: the norm function $x \mapsto \|x\|$, where $\|\cdot\|$ is a norm on $\mathbb{R}^n$, $n \in \mathbb{N}$. Hint 1: We know that norms are continuous, but they need not be differentiable. Hence, the second-derivative criterion is not useful here, and it is instructive to proceed with the "raw" definition of convexity. Hint 2: If you already solved Exercise 2.a.3., this solution may be helpful here.
Consider $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$. Then,
$$\|\lambda x + (1 - \lambda)y\| \leq \|\lambda x\| + \|(1 - \lambda)y\| = \lambda\|x\| + (1 - \lambda)\|y\|.$$
Hence, the norm is convex. This already rules out strict concavity, as it is mutually exclusive with convexity. Note that imposing $x \neq y$ and $\lambda \in (0, 1)$ will generally not suffice to make the inequality above strict; therefore, we cannot strengthen the result to strict convexity. Indeed, for $x \neq y$ such that either $y = cx$ or $x = cy$ with some $c > 0$ (without loss of generality, assume that $y = cx$), you have
$$\|\lambda x + (1 - \lambda)y\| = \|(\lambda + (1 - \lambda)c)x\| = (\lambda + (1 - \lambda)c)\|x\| = \lambda\|x\| + (1 - \lambda)\|cx\| = \lambda\|x\| + (1 - \lambda)\|y\|,$$
and hence not
$$\|\lambda x + (1 - \lambda)y\| < \lambda\|x\| + (1 - \lambda)\|y\|,$$
such that no norm is strictly convex.
Because the norm is convex, we know that it could be concave only if it were linear. This can never be the case: otherwise, for $x \neq 0$, we would have
$$0 \leq \|{-x}\| = \|(-1) \cdot x\| = -\|x\| < 0,$$
a contradiction. Hence, the norm is not concave.
In conclusion, every norm is convex, but not strictly convex. Convexity also rules out strict concavity. Finally, a norm is never linear, which rules out concavity.
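The collinear-points argument is easy to replicate numerically (a sketch; the 1-norm and the specific vectors are arbitrary choices, any norm behaves the same way):

```python
import numpy as np

norm = lambda z: np.linalg.norm(z, 1)   # any norm works; 1-norm as example
x = np.array([1.0, 2.0])
y = 2 * x                               # collinear with x, but y != x
lam = 0.5

lhs = norm(lam * x + (1 - lam) * y)
rhs = lam * norm(x) + (1 - lam) * norm(y)
print(lhs, rhs)   # 4.5 4.5: the convexity inequality binds with equality,
                  # so strict convexity fails
```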
(iii) Is the following function convex? Is it quasi-convex?
It is relatively straightforward to see that this function is not convex: for suitably chosen $x$, $y$ and $\lambda \in (0, 1)$, one obtains
$$f(\lambda x + (1 - \lambda)y) > \lambda f(x) + (1 - \lambda)f(y),$$
which violates convexity. Of course, you can also come up with a variety of other examples; finding any one such example suffices to disprove convexity. $f$ is quasi-convex if for any $x, y$ in its domain and any $\lambda \in [0, 1]$, $f(\lambda x + (1 - \lambda)y) \leq \max\{f(x), f(y)\}$. To show this, we proceed by repeatedly bounding $f(\lambda x + (1 - \lambda)y)$ from above until we arrive at $\max\{f(x), f(y)\}$; the last inequality in this chain works only because $\lambda$ and $1 - \lambda$ are non-negative.
Admittedly, this computation is quite lengthy and looks ugly at first sight. However, it relies on only two simple mathematical facts related to the maximum that are applied repeatedly, namely that (1) $\max\{ca, cb\} = c\max\{a, b\}$ for $c \geq 0$ and any $a, b \in \mathbb{R}$, and that (2) $\max\{a + b, c + d\} \leq \max\{a, c\} + \max\{b, d\}$ for any $a, b, c, d \in \mathbb{R}$.
In conclusion, $f$ is a quasi-convex function. Thus, this function is an example of a multivariate function that is quasi-convex but not convex.
c. Convexity and Composition
(i) Consider a univariate monotonically increasing, convex "transformation function" $T: \mathbb{R} \to \mathbb{R}$ and a convex, potentially multivariate function $g: \mathbb{R}^n \to \mathbb{R}$, $n \in \mathbb{N}$. Answer the following:
Is the transformed function $T \circ g$, i.e. the function with mapping rule $x \mapsto T(g(x))$, convex?
Does this result depend on $T$ being a monotonically increasing function? What if it were monotonically decreasing – can you change an assumption on $g$ such that $T \circ g$ is convex?
Do analogous results exist for concavity of $T \circ g$?
Summarize your conclusions.
1.
Let $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$. Then,
$$T(g(\lambda x + (1 - \lambda)y)) \leq T(\lambda g(x) + (1 - \lambda)g(y)) \leq \lambda T(g(x)) + (1 - \lambda)T(g(y)),$$
where the first inequality follows from $T$ being a monotonically increasing function and convexity of $g$, and the second from convexity of $T$. Hence, $T \circ g$ is a convex function. Therefore, convex, monotonically increasing transformations preserve convexity.
2.
The reasoning in 1. has explicitly used that $T$ was monotonically increasing. Indeed, if $T$ were monotonically decreasing and $g$ were concave, we could also argue that
$$T(g(\lambda x + (1 - \lambda)y)) \leq T(\lambda g(x) + (1 - \lambda)g(y)),$$
as concavity of $g$ gives $g(\lambda x + (1 - \lambda)y) \geq \lambda g(x) + (1 - \lambda)g(y)$, and applying the decreasing $T$ reverses this inequality. Hence, if $T$ is convex, then $T \circ g$ will also be convex by an argument in analogy to 1. Hence, a convex, monotonically decreasing transformation inverts concavity to convexity.
3.
With a concave, monotonically increasing transformation $T$ and a concave function $g$, with $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$, we can argue that
$$T(g(\lambda x + (1 - \lambda)y)) \geq T(\lambda g(x) + (1 - \lambda)g(y)) \geq \lambda T(g(x)) + (1 - \lambda)T(g(y)),$$
where the first inequality uses concavity of $g$ and monotonicity of $T$, and the second concavity of $T$, so that $T \circ g$ is concave. Therefore, concave, monotonically increasing transformations preserve concavity.
Similar to 2., if $g$ is convex and $T$ is concave and monotonically decreasing, we obtain
$$T(g(\lambda x + (1 - \lambda)y)) \geq T(\lambda g(x) + (1 - \lambda)g(y)) \geq \lambda T(g(x)) + (1 - \lambda)T(g(y)),$$
so that $T \circ g$ is concave. Summarizing the two inversion rules:
Convex, monotonically decreasing transformations invert concavity to convexity.
Concave, monotonically decreasing transformations invert convexity to concavity.
Comment. Strict versions of these statements exist as well; establishing them is however not exactly analogous to what we did above. Hence, we do not discuss them further here.
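Before moving on, a quick numerical spot check of rule 1 (a sketch with hypothetical choices: $T = \exp$ is convex and increasing, $g(x) = x^2$ is convex, so $T \circ g$ should be convex):

```python
import numpy as np

T = np.exp                   # convex, monotonically increasing transformation
g = lambda x: x**2           # convex function
h = lambda x: T(g(x))        # composition T(g(x))

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x, y = rng.uniform(-2, 2, size=2)
    lam = rng.uniform()
    # convexity inequality (small tolerance for floating point noise)
    ok &= h(lam * x + (1 - lam) * y) <= lam * h(x) + (1 - lam) * h(y) + 1e-12
print(ok)  # True: no violation of convexity found in the sample
```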
(ii) Can you use (i) to say something about convexity of the function $f: \mathbb{R}^n \to \mathbb{R}, x \mapsto \sum_{i=1}^n x_i^2$? Is this consistent with the Hessian criterion? Hint: Think about the Euclidean norm.
For $x \in \mathbb{R}^n$, $f(x) = T(\|x\|_2)$ with $T(z) = z^2$, where $\|\cdot\|_2$ is the Euclidean norm. The transformation $T$ is monotonically increasing on $\mathbb{R}_{\geq 0}$, the range of the norm: $T'(z) = 2z \geq 0$ for $z \geq 0$. It is also strictly convex by the second derivative criterion: $T''(z) = 2 > 0$. Finally, from Exercise 2.b.ii, we know that the Euclidean norm is convex.
From the results of (i), because $f$ is a convex, monotonically increasing transformation $T$ applied to a convex function, we conclude that $f$ is convex.
The Hessian criterion should yield something consistent with this result. The Hessian of $f$ at $x$ is
$$H_f(x) = 2I_n,$$
for which, for any $z \in \mathbb{R}^n$ with $z \neq 0$,
$$z^\top H_f(x) z = 2z^\top z = 2\|z\|_2^2 > 0.$$
Hence, $H_f(x)$ is positive definite everywhere, and $f$ is actually not only convex, but indeed strictly convex. Comment 1. To take away, the Hessian criterion may be more informative than the rules we derived in (i). However, for a quick convexity check, they can still be very helpful, as you may use them to avoid computation of a second derivative – especially for higher-dimensional, complex functions, this may come in handy at times. Comment 2. Of course, you can iterate on the rules of (i). For example, if you have convex, monotonically increasing transformations $T_1, \ldots, T_k$ and a convex function $g$, then the function with mapping rule $x \mapsto T_k(T_{k-1}(\cdots T_1(g(x)) \cdots))$ is also convex. To continue the example we just saw, this, for instance, implies that $x \mapsto \exp(c\|x\|_2^2)$ for a constant $c > 0$ is a convex function, because $\exp$, squaring (on the range of the norm) and multiplication by a positive constant are all monotonically increasing, convex transformations.
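To connect the two approaches in code (a sketch under the reading above, i.e. $f(x) = \|x\|_2^2$): a finite-difference Hessian should come out as $2I_n$, with strictly positive eigenvalues.

```python
import numpy as np

f = lambda x: x @ x              # f(x) = ||x||_2^2
n, eps = 3, 1e-5
rng = np.random.default_rng(2)
x0 = rng.normal(size=n)

# central finite differences for the Hessian at x0
H = np.zeros((n, n))
E = np.eye(n)
for i in range(n):
    for j in range(n):
        H[i, j] = (f(x0 + eps*E[i] + eps*E[j]) - f(x0 + eps*E[i] - eps*E[j])
                   - f(x0 - eps*E[i] + eps*E[j]) + f(x0 - eps*E[i] - eps*E[j])) / (4 * eps**2)

print(np.round(H, 4))                     # ~ 2 * identity matrix
print(np.all(np.linalg.eigvalsh(H) > 0))  # True: positive definite everywhere
```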
Exercise 4: Multivariate Differentiation
a. A Concrete Function
Here, we consider one example of a first and second order multivariate derivative. More exercises can be found on Problem Set 3 and in the examples given in Chapter 3.
Consider the function
You can take for granted that, as a composition of infinitely many times differentiable functions (polynomial, logarithm and exponential function), $f$ is infinitely many times differentiable. Compute the first and second derivative of $f$, and evaluate them at the point $x_0$ given in the exercise. Hint: For the Hessian, you can reduce the number of computations by exploiting symmetry and certain interrelationships of the first order partial derivatives.
First derivative
We need to use the chain rule for the logarithm expression in each of the first order partial derivatives, and the product rule where products of the arguments appear. Accordingly, we obtain
Accordingly,
At $x_0$, the expressions simplify considerably; in particular, it was used that the logarithm term vanishes at the point of evaluation.
Second derivative
For the second order derivatives, we can exploit (1) that $f$ is (at least) twice continuously differentiable and therefore, by Schwarz's theorem, the Hessian will be symmetric, and (2) that the first order partial derivatives with respect to $x_1$ and $x_2$ coincide, which also saves some computations. Here, we need to use the quotient rule. Note that the denominator is always strictly positive, so that the rule applies. We have
and
Hence, all entries in the upper 2×2-block of the Hessian are identical and equal to the expression just derived. At $x_0$,
Next,
At $x_0$,
Now for the final second order partial derivative: we already know that
from the last computation, and
from the first order derivative. Hence, all we need to apply is a simple product rule:
At $x_0$,
Hence, the Hessian is
and when evaluated at $x_0$,
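Symbolic software replicates these manipulations; the sketch below (Python/SymPy) uses a stand-in function, not the one from the exercise, but it is built from the same ingredients (logarithm, exponential, products) and exhibits the two shortcuts exploited above: equal first order partials in $x_1$ and $x_2$, and a symmetric Hessian.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)

# Stand-in function (NOT the exercise's f, which is not reproduced here):
f = sp.log(x1 + x2) + x3 * sp.exp(x1 + x2)

grad = [sp.diff(f, v) for v in (x1, x2, x3)]
H = sp.hessian(f, (x1, x2, x3))

print(grad[0] == grad[1])    # True: df/dx1 = df/dx2, as exploited above
print(sp.simplify(H - H.T))  # zero matrix: the Hessian is symmetric
```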
b. Matrix Functions
Consider a matrix $A \in \mathbb{R}^{m \times n}$, $m, n \in \mathbb{N}$.
(i) Show that $\frac{\partial}{\partial x} Ax = A$.
We decompose $f: \mathbb{R}^n \to \mathbb{R}^m, x \mapsto Ax$ into its component functions:
$$f(x) = \begin{pmatrix} a_1^\top x \\ \vdots \\ a_m^\top x \end{pmatrix},$$
where $a_1^\top, \ldots, a_m^\top$ are the row vectors containing the rows of $A$. For the $i$-th entry, $f_i(x) = a_i^\top x = \sum_{j=1}^n a_{ij} x_j$; accordingly, the $j$-th partial derivative of $f_i$ is
$$\frac{\partial f_i}{\partial x_j}(x) = a_{ij},$$
and
$$\frac{\partial f_i}{\partial x}(x) = (a_{i1}, \ldots, a_{in}) = a_i^\top.$$
Therefore,
$$\frac{\partial}{\partial x} Ax = \begin{pmatrix} a_1^\top \\ \vdots \\ a_m^\top \end{pmatrix} = A.$$
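A finite-difference check of this result (a sketch; the matrix A is random, Python/NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 4))        # arbitrary A in R^{3x4}
f = lambda x: A @ x

x0, eps = rng.normal(size=4), 1e-7
# central differences, one column of the Jacobian per coordinate direction
J = np.column_stack([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.allclose(J, A))           # True: the Jacobian of x -> A @ x is A
```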
(ii) What is the derivative of $f(x) = x^\top A x$ (where $m = n$)? Hint: Use (i) and the multivariate product rule.
Define the helper functions $g(x) = x$ and $h(x) = Ax$ (both with domain $\mathbb{R}^n$). Then, $f(x) = g(x)^\top h(x)$, and, with (i), $\frac{\partial g}{\partial x}(x) = I_n$ and $\frac{\partial h}{\partial x}(x) = A$. With the multivariate product rule,
$$\frac{\partial f}{\partial x}(x) = g(x)^\top \frac{\partial h}{\partial x}(x) + h(x)^\top \frac{\partial g}{\partial x}(x) = x^\top A + (Ax)^\top I_n = x^\top(A + A^\top).$$
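Again, a quick finite-difference verification that the gradient of $x \mapsto x^\top A x$ is $x^\top(A + A^\top)$ (a sketch; random A and x):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
f = lambda x: x @ A @ x            # f(x) = x' A x

x0, eps = rng.normal(size=3), 1e-6
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_fd, (A + A.T) @ x0))  # True
```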
(iii) If $n = 2$ and $A = \begin{pmatrix} 1 & a \\ b & 1 \end{pmatrix}$, can you find values for $a$ and $b$ so that the second derivative of $f$ is positive definite everywhere? Can you find an alternative combination where it is positive semi-definite but not positive definite?
From (ii), we know that the first derivative of $f$ is $x \mapsto x^\top(A + A^\top)$. Hence, with (i), the second derivative is
$$\frac{\partial^2 f}{\partial x \partial x^\top}(x) = A + A^\top.$$
For $A = \begin{pmatrix} 1 & a \\ b & 1 \end{pmatrix}$ and $z \in \mathbb{R}^2$,
$$z^\top(A + A^\top)z = 2z_1^2 + 2(a + b)z_1 z_2 + 2z_2^2.$$
For $z \neq 0$, this expression is strictly positive if $a + b = 0$, as it then equals $2\|z\|_2^2 > 0$. Thus, you can choose $a$ arbitrarily and set $b = -a$ to make $A + A^\top$ positive definite.
If you choose $a$ (or $b$, respectively) arbitrarily and the other value such that $a + b = 2$, then you can apply a binomial formula to obtain
$$z^\top(A + A^\top)z = 2z_1^2 + 4z_1 z_2 + 2z_2^2 = 2(z_1 + z_2)^2,$$
which is weakly positive for every $z$, such that $A + A^\top$ is positive semi-definite. It is not positive definite, as e.g. for $z = (1, -1)^\top \neq 0$, $z^\top(A + A^\top)z = 0$ and thus not $> 0$.
Alternatively, you also obtain positive semi-definiteness from $a + b = -2$; here you just apply a different binomial formula, $2z_1^2 - 4z_1 z_2 + 2z_2^2 = 2(z_1 - z_2)^2$. Comment: These are the easiest cases to investigate. Generally, you would need to solve an optimization problem to determine the minimal value of $z^\top(A + A^\top)z$ over all $z$ with $\|z\| = 1$. An exercise in the collection of Chapter 4 addresses this general issue.
Exercise 5: Taylor Approximation
In this exercise, we consider univariate Taylor approximations. Exercises for the multivariate case can be found on Problem Set 3. Using the univariate case, we can keep the notation a bit lighter and focus on getting familiar with the mechanics, investigating approximation quality, and studying some further properties.
(i) Compute the first and second order Taylor approximations to the exponential function around $x_0 = 0$ and $x_0 = 1$. For $x_0 = 0$, illustrate the exponential function and its approximations. Is one approximation globally preferable to the other, i.e. does it yield a weakly superior approximation everywhere?
For a function $f: \mathbb{R} \to \mathbb{R}$, the first order approximation to $f$ around $x_0$ is
$$T_1(x) = f(x_0) + f'(x_0)(x - x_0),$$
and the second order approximation is
$$T_2(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2.$$
For the exponential function, $f(x) = e^x$ and $f'(x) = e^x$. Iterating on this, for the $k$-th derivative of $f$,
$$f^{(k)}(x) = e^x.$$
Hence,
$$T_1(x) = e^{x_0} + e^{x_0}(x - x_0)$$
and
$$T_2(x) = e^{x_0} + e^{x_0}(x - x_0) + \frac{e^{x_0}}{2}(x - x_0)^2.$$
With $x_0 = 0$,
$$T_1(x) = 1 + x$$
and
$$T_2(x) = 1 + x + \frac{x^2}{2}.$$
With $x_0 = 1$,
$$T_1(x) = e + e(x - 1)$$
and
$$T_2(x) = e + e(x - 1) + \frac{e}{2}(x - 1)^2.$$
Next, we compare the approximation quality for $x_0 = 0$ graphically:
As can be seen, the higher order approximation fares better around the point of approximation, and also generally for larger values of $x$. However, for very negative $x$, its approximation quality deteriorates, and it becomes inferior to the first order approximation. This once again draws attention to the issue that Taylor approximations are "good" only locally around the point of approximation, and that even higher order approximations may perform disproportionately badly when we move too far away from $x_0$.
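The comparison can be reproduced with a few lines of code (a sketch for the approximations around $x_0 = 0$):

```python
import numpy as np

f = np.exp
T1 = lambda x: 1 + x                 # first order approximation around 0
T2 = lambda x: 1 + x + x**2 / 2      # second order approximation around 0

for x in [-4.0, -1.0, 0.5, 2.0]:
    print(x, f(x), T1(x), T2(x))
# At x = -4: exp(-4) ~ 0.018, T1(-4) = -3, T2(-4) = 5, so the second order
# approximation has the larger error there, matching the discussion above.
```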
(ii) Compute the $n$-th order Taylor approximation to the exponential function for $x_0 = 0$ and variable $n \in \mathbb{N}$. Can you find an infinite sum representation for the exponential function using polynomial terms?
For $f(x) = e^x$, the $n$-th order Taylor approximation around $x_0$ is
$$T_n(x) = \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = \sum_{k=0}^n \frac{e^{x_0}}{k!}(x - x_0)^k.$$
Plugging in $x_0 = 0$, for which $e^{x_0} = 1$, we obtain
$$T_n(x) = \sum_{k=0}^n \frac{x^k}{k!}.$$
Now recall from Chapter 3 that an approximation of infinite order produces no error, i.e. that for an infinitely many times differentiable function $f$, for any $x$ in its domain, $f(x) = \lim_{n \to \infty} T_n(x)$. Accordingly,
$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}.$$
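The partial sums of this series converge quickly; a minimal sketch:

```python
import math

# Partial sums of exp(x) = sum_{k >= 0} x**k / k!
def exp_series(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

for n in [2, 5, 10, 20]:
    print(n, exp_series(1.0, n))   # approaches e = 2.718281828...
print(math.exp(1.0))
```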
Exercise 6: Integration
Integrate the following functions over the given interval:
over
over
1.
We want to compute the given double integral by integrating iteratively, with respect to $y$ first and then with respect to $x$; this is justified by Fubini's theorem, which tells us that we can iteratively integrate with respect to the individual dimensions. Of course, you can also choose to integrate with respect to the $x$-dimension first; if you do so, beware of the respective integral bounds.
To compute the antiderivative of the integrand with respect to $y$, note that we have to "invert" a chain rule: the inner derivative of the composition with respect to $y$ enters as a multiplicative factor, which is inverted by multiplying with its reciprocal, and the outer function is replaced by its antiderivative. In total, we obtain
The inner integral is therefore given by
Therefore,
This time, the antiderivative of the function to be integrated is
Accordingly,
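The iterated-integration mechanics can be mirrored symbolically (a sketch with a hypothetical integrand standing in for the exercise's; it likewise requires "inverting" a chain rule in the inner integral):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Hypothetical integrand; integrate over y first, then over x (Fubini)
f = 2 * x * y * sp.exp(x * y**2)

inner = sp.integrate(f, (y, 0, 1))     # antiderivative exp(x*y**2) gives exp(x) - 1
outer = sp.integrate(inner, (x, 0, 1))
print(outer)                           # E - 2
```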
2.
For this integral, we can make our lives easier by using linearity of the integral operation:
Further, both sub-functions are multiplicatively separable. Therefore, applying Fubini gives
In fact, we only need to compute two integrals to solve this expression: