Chapter 3 – Exercises

Exercise 1: Inverse Functions

a. Helpful Results

To investigate whether a given function is injective and/or surjective, we can typically make our lives easier than by working with the raw definitions. In this exercise, we turn to two helpful results which we will use in the exercises to follow.

(i) Show that if a function f: X\mapsto\mathbb R where X\subseteq\mathbb R is strictly monotonic, i.e. \forall x,y\in X: (x>y\Rightarrow f(x)>f(y)) or \forall x,y\in X: (x>y\Rightarrow f(x)<f(y)), then f is injective, i.e. that \forall x,y\in X: (x\neq y\Rightarrow f(x)\neq f(y)).

Let x,y\in X such that x\neq y. Without loss of generality, assume that x>y (there is no loss of generality, as if this were not the case, we can simply re-label x and y to make x>y true). By strict monotonicity of f, either f(x) > f(y) or f(x) < f(y). Either way, f(x)\neq f(y). Hence, f is injective.


A further fact that is helpful beyond the invertibility context is the following:

Theorem: Intermediate Value Theorem.
Let f:X\mapsto \mathbb R for some set X\subseteq\mathbb R, and assume that f is continuous. Then, for any a,b\in X with a<b and [a,b]\subseteq X, and any y between f(a) and f(b) (i.e., y\in[f(a), f(b)] if f(a)\leq f(b), and y\in[f(b),f(a)] otherwise), there exists c\in[a,b] with f(c) = y.


Verbally, this theorem relates to the intuition of being able to draw continuous functions without lifting the pen: if the continuous function attains two different values within the codomain, it will also reach every value in between along the way. The exercise to follow extends this intuition by establishing that for two continuous functions, if one lies above the other at one point but below at another point, then the functions must intersect in between the points.

(ii) Use the intermediate value theorem to show that if two continuous functions f,g with domain X\subseteq \mathbb R and codomain \mathbb R are such that f(a)\geq g(a) and f(b)\leq g(b) for some a,b\in X, then there exists a value x\in X in between a and b (i.e., x\in[a,b] when a\leq b and x\in[b,a] else) such that f(x) = g(x).

Define h:= f-g. Because f and g are continuous, h is continuous as well.
Then, h(a) = f(a) - g(a)\geq 0 and h(b) = f(b) - g(b) \leq 0.
By the intermediate value theorem, there exists x in between a and b (i.e., x\in[a,b] when a\leq b and x\in[b,a] else) such that

    \[ 0 = h(x) = f(x) - g(x) \hspace{0.5cm}\Rightarrow\hspace{0.5cm}f(x) = g(x) \]
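
This argument is constructive: it suggests a bisection scheme on h = f - g. The sketch below is a minimal numerical illustration (not part of the exercise); the choice of f = \cos and g = \mathrm{id} on [0,1] is a hypothetical example satisfying the assumptions, since \cos(0) = 1 \geq 0 and \cos(1) \approx 0.54 \leq 1.

```python
import math

def intersect(f, g, a, b, tol=1e-12):
    """Bisection on h = f - g, assuming f(a) >= g(a) and f(b) <= g(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - g(mid) >= 0:   # maintain h(lo) >= 0 >= h(hi)
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# hypothetical example: f = cos, g = identity on [0, 1]
x = intersect(math.cos, lambda t: t, 0.0, 1.0)
print(abs(math.cos(x) - x) < 1e-9)  # True: the functions intersect near 0.739
```
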


(iii) Is inversion of a function a linear operation? That is, does it hold for any two arbitrary, invertible functions f,g with the same domain and codomain that (f+g)^{-1} = f^{-1} + g^{-1}? Give a formal argument if you think this is true, or a counterexample otherwise. Also think about whether f+g is always guaranteed to be invertible when f and g are bijective functions.

Inversion is not linear. For instance, you can think of f:\mathbb R\mapsto\mathbb R, x\mapsto x and g=f, i.e. both functions are the so-called “identity function” on \mathbb R.
To determine the inverse, we look at an arbitrary y in the codomain and ask ourselves which x in the domain is mapped onto it, i.e. we look at the mapping rule y=f(x) and try to solve it for x.
Here, for any y\in\mathbb R, x=y is mapped onto y by f and g so that the mapping rules for the inverse functions are f^{-1}(y) = g^{-1}(y) = y.
On the other hand, for f+g with mapping rule (f+g)(x) = x+x = 2x, for y\in\mathbb R, y=2x\Leftrightarrow x=\frac{1}{2}y, so that

    \[(f+g)^{-1}(y) = \frac{1}{2}y \neq 2y = y + y = f^{-1}(y) + g^{-1}(y)\]

where the inequality holds for every y\in\mathbb R except y=0. Hence, (f+g)^{-1} is not equal to f^{-1} + g^{-1}.

An alternative example is f:\mathbb R_+\mapsto\mathbb R_+, x\mapsto \sqrt{x} and g:\mathbb R_+\mapsto\mathbb R_+, x\mapsto x^2. Then, solving for x in the mapping rules of f and g yields the mapping rules for the inverse functions: f^{-1}(y) = y^2 and g^{-1}(y) = \sqrt{y}. Define h := f^{-1} + g^{-1}. If h were the inverse of f+g, then for any x\in\mathbb R_+, we would have (f+g)(h(x)) = x. However, e.g. for x=1, we have

    \[ h(1) = 1^2 + \sqrt{1} = 1+1 = 2 \]


    \[ (f+g)(h(1)) = (f+g)(2) = \sqrt{2} + 2^2 = 4 + \sqrt{2} \neq 1 \]

Hence, (f+g)^{-1} is not equal to f^{-1} + g^{-1}.
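
As a quick numerical sanity check of this computation, the sketch below simply restates the mapping rules from above in Python and evaluates them at x = 1:

```python
import math

f = lambda x: math.sqrt(x)            # f: R_+ -> R_+
g = lambda x: x ** 2                  # g: R_+ -> R_+
h = lambda y: y ** 2 + math.sqrt(y)   # candidate inverse f^{-1} + g^{-1}
s = lambda x: f(x) + g(x)             # the sum f + g

print(h(1.0))      # 2.0
print(s(h(1.0)))   # 4 + sqrt(2) ~ 5.41, not 1, so h is not the inverse of f + g
```
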

Finally, it is also not guaranteed that f+g is invertible even if f and g are bijective (and therefore invertible) functions. To see this, consider an invertible function f:\mathbb R\mapsto\mathbb R, and g = -f, i.e. g(x) = (-1)\cdot f(x) for any x in the domain of f. Then, because f is both injective and surjective, g is as well (if you find this obvious, you need not formally argue why; the elaboration is given below in case this is not clear just yet):
Consider x_1,x_2\in\mathbb R such that x_1\neq x_2. By injectivity of f, f(x_1)\neq f(x_2). Hence, it also holds that -f(x_1)\neq -f(x_2), i.e. g(x_1)\neq g(x_2). Therefore, it holds that \forall x_1,x_2\in\mathbb R:(x_1\neq x_2\Rightarrow g(x_1)\neq g(x_2)), i.e. g is injective.
Consider y\in\mathbb R. By surjectivity of f, there exists x\in\mathbb R such that f(x) = -y, or equivalently, y = -f(x) = g(x). Therefore, it holds that \forall y\in\mathbb R:(\exists x\in\mathbb R: g(x) = y), i.e. g is surjective.
Hence, g, like f, is an invertible function. However, f + g = f-f = \mathbf 0, where \mathbf 0(x) = 0, i.e. \mathbf 0 is the function that is constant at zero. But we know that constant functions are not injective when the domain has more than one value, and they are not surjective when the codomain has more than one value. Therefore, f+g is not invertible.


b. Concrete Examples

(i) Consider the function f:\mathbb R\mapsto\mathbb R, x\mapsto \sin(x) - \frac{3}{2}x. Is this function invertible?
Hint. The results derived in Ex. 1.a.i and 1.a.ii (or alternatively, the intermediate value theorem) may be helpful in investigating injectivity and surjectivity.

Injectivity. Ex. 1.a.i tells us that strict monotonicity is a sufficient condition for injectivity. This property can be investigated using the first derivative:

    \[ f'(x) = \cos(x) - \frac{3}{2} \leq 1 - \frac{3}{2} = -\frac{1}{2} < 0 \]

Hence, f is strictly monotonically decreasing and therefore injective.
Surjectivity. Surjectivity can be shown using Ex. 1.a.ii or the intermediate value theorem (IVT) directly. This solution illustrates both methods.
Using Ex. 1.a.ii: Pick an arbitrary y\in \mathbb R. Then, y=\sin(x) - \frac{3}{2}x is equivalent to y+\frac{3}{2}x = \sin(x). Clearly, for x^L = -\frac{2}{3}(y + 1), y + \frac{3}{2}x^L = -1 \leq \sin(x^L), and for x^H = -\frac{2}{3}(y - 1), y + \frac{3}{2}x^H = 1 \geq \sin(x^H). Hence, by Ex. 1.a.ii, there exists x\in[x^L,x^H] such that y + \frac{3}{2}x = \sin(x), i.e. y = \sin(x) - \frac{3}{2}x. Hence, for any y\in \mathbb R there exists x\in\mathbb R such that y = \sin(x) - \frac{3}{2}x, and therefore, f is surjective.
Using the IVT: Pick an arbitrary y\in \mathbb R. Then, f(-\frac{2}{3}y - \frac{2}{3}) = \sin(-\frac{2}{3}y - \frac{2}{3}) + y + 1 \geq y and f(-\frac{2}{3}y + \frac{2}{3}) = \sin(-\frac{2}{3}y + \frac{2}{3}) + y - 1 \leq y. Hence, by continuity of f and the IVT, there exists x\in[-\frac{2}{3}y - \frac{2}{3}, -\frac{2}{3}y + \frac{2}{3}] for which f(x) = y. Hence, for any y\in \mathbb R there exists x\in\mathbb R such that y = \sin(x) - \frac{3}{2}x, and therefore, f is surjective.
Bijectivity. Because f is both injective and surjective, it is bijective, and thus invertible.
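
The surjectivity argument is in fact constructive: it brackets a solution of f(x) = y inside [-\frac{2}{3}(y+1), -\frac{2}{3}(y-1)], which a bisection search can exploit since f is strictly decreasing. A minimal Python sketch (the sample values of y are arbitrary choices):

```python
import math

f = lambda x: math.sin(x) - 1.5 * x   # strictly decreasing on R

def solve(y, tol=1e-12):
    """Find x with f(x) = y inside the bracket derived above."""
    lo, hi = -2.0 / 3.0 * (y + 1.0), -2.0 / 3.0 * (y - 1.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) >= y:   # f decreasing: the solution lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for y in [-5.0, 0.0, 3.7]:
    print(abs(f(solve(y)) - y) < 1e-9)  # True for each sample value
```
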


(ii) Consider the function f:\mathbb R^2\mapsto\mathbb R^2, x=(x_1,x_2)'\mapsto \begin{pmatrix}x_1+x_2\\2x_2\end{pmatrix}. Is this function invertible? If so, can you determine the inverse?

This function can be re-written in matrix notation: for x\in \mathbb R^2, f(x) = Ax with A = \begin{pmatrix}1 & 1\\0 & 2\end{pmatrix}. Hence, f is invertible if and only if A is invertible. We have \det(A) = 2 \neq 0, so that this is indeed the case.
To determine the inverse, we invert A, either using Gauss-Jordan elimination or the explicit rule for 2\times 2 matrices. Either way, we obtain

    \[ A^{-1} = \begin{pmatrix}1 & -\frac{1}{2}\\0 & \frac{1}{2}\end{pmatrix} \]

so that the inverse function’s mapping rule is

    \[ f^{-1}(y) = A^{-1}y = \begin{pmatrix}y_1 -\frac{1}{2}y_2\\\frac{1}{2}y_2\end{pmatrix} \]
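
As a check, one can verify numerically that A A^{-1} = \mathbf I and that the inverse mapping rule recovers preimages. A small pure-Python sketch (the test vector y = (3,4)' is an arbitrary choice):

```python
def matmul2(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 1.0], [0.0, 2.0]]
A_inv = [[1.0, -0.5], [0.0, 0.5]]

print(matmul2(A, A_inv))  # [[1.0, 0.0], [0.0, 1.0]]: the identity

# inverse mapping rule f^{-1}(y) = A^{-1} y, e.g. at y = (3, 4)'
y = (3.0, 4.0)
x = (A_inv[0][0] * y[0] + A_inv[0][1] * y[1],
     A_inv[1][0] * y[0] + A_inv[1][1] * y[1])
print(x)  # (1.0, 2.0), and indeed f((1,2)') = (1+2, 2*2)' = (3,4)'
```
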


(iii) Consider the function f:\mathbb R_+^2\mapsto\mathbb R_+^2, x=(x_1,x_2)'\mapsto \begin{pmatrix}2x_2\\4x_1^2\end{pmatrix}. Is this function invertible? If so, can you determine the inverse?

The components of this function each use only one distinct element of the argument. Hence, we can test invertibility component-wise. It is straightforward to verify that both component mapping rules, x\mapsto 4x^2 and x\mapsto 2x, are invertible on \mathbb R_+. Hence, f is invertible on \mathbb R_+^2. We obtain the mapping rule from solving y_1 = 2x_2 and y_2 = 4x_1^2 for x_2 and x_1, respectively. One obtains

    \[ f^{-1}(y) = \frac{1}{2} \begin{pmatrix}\sqrt{y_2}\\y_1\end{pmatrix} \]
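
A quick numerical check that the derived inverse undoes f on \mathbb R_+^2 (the sample points are arbitrary choices):

```python
import math

f = lambda x1, x2: (2.0 * x2, 4.0 * x1 ** 2)                # f: R_+^2 -> R_+^2
f_inv = lambda y1, y2: (0.5 * math.sqrt(y2), 0.5 * y1)      # derived inverse

for x in [(0.0, 0.0), (1.5, 2.0), (3.0, 0.25)]:
    y = f(*x)
    print(f_inv(*y) == x)  # True: f_inv recovers the original point
```
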


Exercise 2: A Caveat on Strict Monotonicity

For univariate real-valued functions, if they are differentiable, it is commonplace to view the sign of the derivative as an equivalent condition for monotonicity. While this is justified for the non-strict versions, it is indeed not the case that for a differentiable function f:X\mapsto\mathbb R, X\subseteq\mathbb R, there is equivalence between f'(x)>0 \forall x\in X (f'(x)<0 \forall x\in X) and f being a strictly increasing (decreasing) function. The reason for this is so-called “saddle points” (as will be thoroughly discussed in Chapter 4), which can occur for strictly monotonic functions and feature f'(x) = 0.
Now for the task of this exercise: show that for a differentiable function f:X\mapsto\mathbb R, X\subseteq\mathbb R, f'(x)>0 \forall x\in X is a sufficient, but not a necessary condition for f being strictly monotonically increasing. For the latter point, you may consider the function with mapping rule f(x) = \sin(x) + x as a counterexample to necessity.
Hint 1: Recall the formal definition of strict monotonicity: f:X\mapsto\mathbb R, X\subseteq\mathbb R, is strictly monotonically increasing if \forall x_1,x_2\in X:(x_2>x_1\Rightarrow f(x_2)>f(x_1)).
Hint 2: To compare values of f using the derivative, recall that for b>a, f(b) - f(a) = \int_{[a,b]}f'(x)dx, and that for a function g with g(x)>0\forall x\in I\backslash N for an interval I of non-zero length and an “exception set” N that contains at most finitely many values, it holds that \int_I g(x)dx > 0.
Remark: Showing that for a differentiable function f:X\mapsto\mathbb R, X\subseteq\mathbb R, f'(x)<0 \forall x\in X is a sufficient, but not a necessary condition for f being strictly monotonically decreasing can be done in perfect analogy to the investigation here. To avoid tedious case distinctions, we just focus on the case of strictly monotonically increasing functions in this exercise.

Sufficiency. Suppose that f'(x)>0 \forall x\in X holds. Then, for any x_1,x_2\in X such that x_2>x_1, we have (cf. Hint 2)

    \[ f(x_2) - f(x_1) = \int_{x_1}^{x_2} f'(x)dx > 0 \hspace{0.5cm}\Rightarrow\hspace{0.5cm}f(x_2) > f(x_1) \]

Hence, f is strictly monotonically increasing.
Necessity. If the condition were necessary, we would have

(1)   \begin{equation*} \text{"}f\text{ is strictly monotonically increasing" }\Rightarrow (f'(x)>0 \forall x\in X) \end{equation*}

Hence, to disprove this, we need to find a strictly monotonically increasing f which does not satisfy f'(x)>0 \forall x\in X. As the exercise suggests, let us pick f:X\mapsto\mathbb R, x\mapsto \sin(x) + x. This function has derivative f'(x) = \cos(x) + 1.
Now, let x_1,x_2\in\mathbb R such that x_2>x_1. Note that f'(x) > 0 unless \cos(x) = -1, i.e. \exists z\in\mathbb Z: x = (2z+1)\cdot \pi. Hence, we have two cases:
If \forall x\in[x_1,x_2]: (\forall z\in\mathbb Z: x\neq (2z+1)\cdot \pi), i.e. there exists no x\in[x_1,x_2] such that \cos(x) = -1, then \forall x\in[x_1,x_2]:f'(x) > 0, and there is no “exception set” to consider for Hint 2.
If, on the other hand, there exists some x\in[x_1,x_2] such that \cos(x) = -1, note that then there exists z\in\mathbb Z with x=(2z+1)\pi, and the exception set is N=\{(2z+1)\pi:z\in\mathbb Z\}\cap [x_1,x_2]. Note that x_1 \leq (2z+1)\pi\leq x_2 applies to only a finite number of z\in\mathbb Z, as when z grows too large (small), the latter (former) inequality fails to hold. Hence, N is of finite cardinality, and Hint 2 still applies.
Hence, in both cases, we can conclude that

    \[ f(x_2) - f(x_1) = \int_{x_1}^{x_2} f'(x)dx > 0 \hspace{0.5cm}\Rightarrow\hspace{0.5cm}f(x_2) > f(x_1) \]

Hence, we have f(x_2) > f(x_1) for x_2>x_1. Therefore, for arbitrary x_1,x_2\in\mathbb R, x_2>x_1 implies f(x_2)>f(x_1), and f is strictly monotonically increasing.
However, as we have seen, f'((2z+1)\pi) = 0 for all z\in\mathbb Z, such that f'(x)>0 \forall x\in X does not apply to the concrete example of f we considered here. Therefore, the implication in (1) does not hold, and the condition is not necessary.
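
Numerically, both parts of the argument can be illustrated at once: the derivative vanishes at odd multiples of \pi, yet the function values still increase strictly along a sample grid. A small Python sketch (grid range and step size are arbitrary choices):

```python
import math

f = lambda x: math.sin(x) + x
fp = lambda x: math.cos(x) + 1.0   # the derivative

# the derivative vanishes at odd multiples of pi ...
print(abs(fp(math.pi)) < 1e-12)      # True, since cos(pi) = -1
print(abs(fp(3 * math.pi)) < 1e-12)  # True

# ... yet f still increases strictly along a sample grid on [-20, 20]
xs = [k * 0.01 for k in range(-2000, 2001)]
print(all(f(a) < f(b) for a, b in zip(xs, xs[1:])))  # True
```
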


Exercise 3: Convexity and Concavity

a. Set Convexity

Are the following sets convex? Justify your answer!

    1. S_1:=\{x\in\mathbb R^2: x_1 + x_2 = 3\}
    2. S_2:=\{x\in\mathbb R^2: x_1 \cdot x_2 = 3\}
    3. S_3:= B_\varepsilon(x_0) = \{x\in X: \|x-x_0\|<\varepsilon\} for some \varepsilon > 0. What about a closed ball?


S_1 is convex: let x,y\in S_1 and \lambda \in [0,1]. Then,

    \[ (\lambda x_1 + (1-\lambda) y_1) + (\lambda x_2 + (1-\lambda) y_2) = \lambda (x_1 + x_2) + (1-\lambda)(y_1 + y_2) = \lambda\cdot 3 + (1-\lambda)\cdot 3 = 3 \]

where the second-to-last equality holds as x,y\in S_1. Hence, \lambda x + (1-\lambda)y \in S_1 and S_1 is convex.


S_2 is not convex. To show this, we need to find a counterexample, i.e. x,y\in S_2 and \lambda \in [0,1] such that \lambda x + (1-\lambda)y \notin S_2. It turns out that combinations where \lambda x + (1-\lambda)y \in S_2 are indeed the exception, such that the counterexample is not hard to come by. Consider e.g. x = (3,1)' and y=(1,3)'. Then, clearly, x,y\in S_2, but for \lambda = \frac{1}{2},

    \[ (\lambda x_1 + (1-\lambda) y_1)\cdot (\lambda x_2 + (1-\lambda) y_2) = \frac{1}{2}(1 + 3)\cdot \frac{1}{2}(3 + 1) = \frac{4^2}{4} = 4 \neq 3 \]

so that \frac{1}{2}x + \frac{1}{2}y\notin S_2. Hence, S_2 is not convex.



Open \varepsilon-balls are always convex sets: let x,y\in S_3 and \lambda\in [0,1]. Then,

    \[\begin{split} \|(\lambda x + (1-\lambda) y) - x_0\| & = \|\lambda (x - x_0) + (1-\lambda) (y - x_0) \| \\& \leq \|\lambda (x - x_0)\| + \|(1-\lambda)(y - x_0)\| \hspace{0.5cm}\text{by the norm's triangle inequality} \\& = \lambda \|x - x_0\| + (1-\lambda)\|y - x_0\|\hspace{0.5cm}\text{by absolute homogeneity of the norm} \\& < \lambda \varepsilon + (1-\lambda)\varepsilon\hspace{0.5cm}\text{as }x,y\in S_3 \\& = \varepsilon \end{split}\]

Hence, \lambda x + (1-\lambda)y \in S_3 and S_3 is convex.
Graphically, this intuition can be seen from the ball we saw in Chapter 3: any line segment connecting two points in the ball is fully contained in it.
For a closed \varepsilon-ball, analogous reasoning shows that it is convex as well – the only adjustment we need to make is to use a weak inequality in the fourth line above.
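
The proof can be mirrored by a randomized spot check: sample points from an open ball, form convex combinations, and verify membership. A Python sketch with the hypothetical choices \varepsilon = 1 and x_0 = \mathbf 0 in \mathbb R^2:

```python
import math, random

random.seed(0)
eps, x0 = 1.0, (0.0, 0.0)
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])

def sample_ball():
    """Rejection-sample a point from the open eps-ball around x0."""
    while True:
        p = (random.uniform(-1, 1), random.uniform(-1, 1))
        if dist(p, x0) < eps:
            return p

ok = True
for _ in range(1000):
    x, y = sample_ball(), sample_ball()
    lam = random.random()
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    ok = ok and dist(z, x0) < eps
print(ok)  # True: every sampled convex combination stays inside the ball
```
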


b. Function Convexity

(i) Investigate the following function with respect to (strict) convexity/concavity:

    \[ f:\mathbb R^2\mapsto\mathbb R, x=(x_1,x_2)'\mapsto \exp(x_1) + x_1x_2 + 5x_1 + 4 \]

Hint: Recall that we can use the second derivative to investigate convexity.

As suggested by the hint, let us consider the second derivative, i.e. the Hessian of f, to investigate convexity. We have

    \[ \nabla f(x) = (\exp(x_1) + x_2 + 5, x_1) \]

and therefore

    \[ H_f(x) = \begin{pmatrix} \exp(x_1) & 1\\ 1 & 0 \end{pmatrix} \]

For v\in\mathbb R^2,

    \[ v'H_f(x) v = (\exp(x_1)v_1 + v_2, v_1) \cdot v = \exp(x_1)v_1^2 + v_1v_2 + v_1v_2 = \exp(x_1)v_1^2 + 2v_1v_2 \]

Consider v_1 = 1 and v_2 \in \{0, -\exp(x_1)\}. For v_2 = 0, v'H_f(x) v = \exp(x_1) > 0, and for v_2 = -\exp(x_1), v'H_f(x) v = -\exp(x_1) < 0. Hence, at any x\in\mathbb R^2, H_f(x) is indefinite. Therefore, f is neither concave nor convex, and there exists no open subset of \mathbb R^2 on which it attains either property.
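
The indefiniteness argument uses only the quadratic form v'H_f(x)v and two test vectors, which is easy to replicate numerically (the sample values of x_1 below are arbitrary):

```python
import math

def quad_form(x1, v):
    """v' H_f(x) v = exp(x1) * v1^2 + 2 * v1 * v2 for the Hessian above."""
    v1, v2 = v
    return math.exp(x1) * v1 ** 2 + 2.0 * v1 * v2

for x1 in [-2.0, 0.0, 3.0]:
    pos = quad_form(x1, (1.0, 0.0))             # equals exp(x1) > 0
    neg = quad_form(x1, (1.0, -math.exp(x1)))   # equals -exp(x1) < 0
    print(pos > 0 > neg)  # True: H_f(x) is indefinite at every sampled x
```
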


(ii) Investigate the following function with respect to (strict) convexity/concavity:

    \[ f:\mathbb R^n\mapsto\mathbb R, x\mapsto \|x\| \]

where \|\cdot\| is a norm on \mathbb R^n, n\in\mathbb N.
Hint 1: We know that norms are continuous, but they need not be differentiable. Hence, the criterion for the second derivative is not useful here, and it is instructive to proceed with the “raw” definition of convexity.
Hint 2: If you already solved Exercise 3.a.3., this solution may be helpful here.

Consider x,y\in \mathbb R^n and \lambda\in[0,1]. Then,

    \[\begin{split} f(\lambda x + (1-\lambda) y) & = \|\lambda x + (1-\lambda) y\| \\&\leq \|\lambda x\| + \|(1-\lambda)y\| \hspace{0.5cm}\text{by the norm's triangle inequality} \\& = \lambda \|x\| + (1-\lambda)\|y\|\hspace{0.5cm}\text{by absolute homogeneity of the norm} \\& = \lambda f(x) + (1-\lambda) f(y) \end{split}\]

Hence, the norm is convex. This already rules out strict concavity as it is mutually exclusive with convexity. Note that imposing \lambda\in (0,1) and x\neq y will generally not suffice to make the inequality in the second line strict; therefore, we cannot strengthen the result to strict convexity. Indeed, for x,y\in\mathbb R^n such that either x = \mathbf 0 or y = \mathbf 0 (without loss of generality, assume that y = \mathbf 0), you have

    \[ \|\lambda x + (1-\lambda) y\| = \|\lambda x\| = \lambda \|x\| = \lambda \|x\| + (1-\lambda)\|y\| \]

and hence not

    \[ \|\lambda x + (1-\lambda) y\| < \lambda \|x\| + (1-\lambda)\|y\| \]

such that no norm is strictly convex.
Because the norm is convex, we know that it can also be concave only if it is linear. This can never be the case: otherwise, for x\neq \mathbf 0, we would have

    \[\begin{split} 0 &\neq 2\|x\| = \|x\| + \|-x\|\hspace{0.5cm}\text{by absolute homogeneity of the norm}\\ & = \|x + (-x)\| \hspace{0.5cm}\text{by linearity}\\ & = \|\mathbf 0\| = 0 \end{split}\]

a contradiction. Hence, the norm is not concave.
In conclusion, every norm is convex, but not strictly convex. Therefore, it is not strictly concave. Also, it is not linear, which rules out concavity.


(iii) Is the following function convex? Is it quasi-convex?

    \[ f:\mathbb R^2_+\mapsto\mathbb R, x\mapsto \sqrt{\max\{x_1, x_2\}} \]

It is relatively straightforward to see that this function is not convex: consider x = (0, 0)', y = (1, 1)' and \lambda = \frac{1}{2}. Then,
f(x) = 0, f(y) = 1 so that \lambda f(x) + (1-\lambda)f(y) = \frac{1}{2}. On the other hand, \lambda x + (1-\lambda) y = (\frac{1}{2}, \frac{1}{2})', so that

    \[ f(\lambda x + (1-\lambda) y) = \sqrt{\frac{1}{2}} = \frac{1}{\sqrt{2}} > \frac{1}{2} = \lambda f(x) + (1-\lambda)f(y) \]

which violates convexity. Of course, you can also come up with a variety of other examples; finding any one such example suffices to disprove convexity.
f is quasi-convex if for any x,y\in\mathbb R^2_+ and \lambda\in[0,1], f(\lambda x + (1-\lambda) y) \leq \max\{f(x),f(y)\}. To show this, we proceed as follows:

    \[\begin{split} f(\lambda x + (1-\lambda) y) & = \sqrt{\max\{\lambda x_1 + (1-\lambda) y_1, \lambda x_2 + (1-\lambda) y_2\}} \\&\leq \sqrt{\max\{\lambda \max\{x_1,x_2\} + (1-\lambda) y_1, \lambda \max\{x_1,x_2\} + (1-\lambda) y_2\}} \\&\leq \sqrt{\max\{\lambda \max\{x_1,x_2\} + (1-\lambda) \max\{y_1, y_2\}, \lambda \max\{x_1,x_2\} + (1-\lambda) \max\{y_1, y_2\}\}} \\& = \sqrt{\lambda \max\{x_1,x_2\} + (1-\lambda) \max\{y_1, y_2\}} \\& = \sqrt{\lambda f(x)^2 + (1-\lambda) f(y)^2} \\& \leq \sqrt{\lambda \max\{f(x), f(y)\}^2 + (1-\lambda)\max\{f(x), f(y)\}^2} \\& = \sqrt{\max\{f(x), f(y)\}^2} \\& = \max\{f(x), f(y)\}. \end{split}\]

Note that the last inequality works only because f is non-negative.
Admittedly, this computation is quite lengthy and looks ugly on first sight. However, it relies on only two simple mathematical facts related to the maximum that are applied repeatedly, namely that (1) for b\leq c and any a\in\mathbb R, \max\{a,b\}\leq \max\{a,c\} and that (2) for any a,b\in\mathbb R, a\leq \max\{a,b\}.
In conclusion, f is a quasi-convex function. Thus, this function is an example of a multivariate function that is quasi-convex but not convex.
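
Both conclusions can be spot-checked numerically: a randomized test of the quasi-convexity inequality, plus the explicit convexity counterexample from above. A Python sketch (sampling range, seed, and tolerance are arbitrary choices):

```python
import math, random

random.seed(1)
f = lambda x: math.sqrt(max(x[0], x[1]))   # f on R_+^2

# randomized check of f(lam*x + (1-lam)*y) <= max(f(x), f(y))
ok = True
for _ in range(2000):
    x = (random.uniform(0, 10), random.uniform(0, 10))
    y = (random.uniform(0, 10), random.uniform(0, 10))
    lam = random.random()
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    ok = ok and f(z) <= max(f(x), f(y)) + 1e-12
print(ok)  # True: no quasi-convexity violation found on the sample

# the convexity counterexample from above: x = (0,0)', y = (1,1)', lam = 1/2
print(f((0.5, 0.5)) > 0.5)  # True: f(midpoint) exceeds the chord value
```
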



c. Convexity and Composition

(i) Consider a univariate monotonically increasing, convex “transformation function” t:\mathbb R\mapsto\mathbb R and a convex, potentially multivariate function f:\mathbb R^n\mapsto\mathbb R, n\in\mathbb N. Answer the following:

    1. Is the transformed function t\circ f, i.e. the function with mapping rule t(f(x)) convex?
    2. Does this result depend on t being a monotonically increasing function? What if it were monotonically decreasing – can you change an assumption on f such that t\circ f is convex?
    3. Do analogous results exist for concavity of f?

Summarize your conclusions.


Let x,y\in\mathbb R^n and \lambda \in [0,1]. Then,

    \[\begin{split} t(f(\lambda x + (1-\lambda) y)) &\leq t(\lambda f(x) + (1-\lambda) f(y)) \\& \leq \lambda t(f(x)) + (1-\lambda)t(f(y)) \end{split}\]

where the first line follows from t being a monotonically increasing function and convexity of f, and the second from convexity of t. Hence, t\circ f is a convex function. Therefore, convex, monotonically increasing transformations preserve convexity.



The reasoning in 1. has explicitly used that t is monotonically increasing. Indeed, if t were monotonically decreasing and f were concave, we could still argue that

    \[ t(f(\lambda x + (1-\lambda) y)) \leq t(\lambda f(x) + (1-\lambda) f(y)) \]

If t is additionally convex, then t\circ f will also be convex by an argument in analogy to 1. Hence, a convex, monotonically decreasing transformation inverts concavity to convexity.



With a concave, monotonically increasing transformation t and a concave function f, with x,y\in\mathbb R^n and \lambda \in [0,1], we can argue that

    \[\begin{split} t(f(\lambda x + (1-\lambda) y)) &\geq t(\lambda f(x) + (1-\lambda) f(y)) \\& \geq \lambda t(f(x)) + (1-\lambda)t(f(y)) \end{split}\]

so that t\circ f is concave. Therefore, concave, monotonically increasing transformations preserve concavity.
Similar to 2., if f is convex and t is monotonically decreasing, we still obtain

    \[ t(f(\lambda x + (1-\lambda) y)) \geq t(\lambda f(x) + (1-\lambda) f(y)) \]

such that with concave t, the function t\circ f is concave.



We can conclude:

    1. Convex, monotonically increasing transformations preserve convexity.
    2. Concave, monotonically increasing transformations preserve concavity.
    3. Convex, monotonically decreasing transformations invert concavity to convexity.
    4. Concave, monotonically decreasing transformations invert convexity to concavity.

Comment. Strict versions of these statements exist as well; establishing them is however not exactly analogous to what we did above. Hence, we do not discuss them further here.


(ii) Can you use (i) to say something about convexity of the function f:\mathbb R^2\mapsto\mathbb R, x=(x_1,x_2)'\mapsto x_1^2 + x_2^2? Is this consistent with the Hessian criterion?
Hint: Think about the Euclidean norm.

For x\in\mathbb R^2, f(x) = (\|x\|_2)^2 where \|\cdot\|_2 is the Euclidean norm. The transformation t(x) = x^2 is monotonically increasing on \mathbb R_+: t'(x) = 2x \geq 0 \forall x\in\mathbb R_+. It is also strictly convex by the second derivative criterion: t''(x) = 2 > 0 \forall x\in\mathbb R_+. Finally, from Exercise 3.b.(ii), we know that the Euclidean norm is convex.
From the results of (i), because f is a convex, monotonically increasing transformation of a convex function, we conclude that f is convex.
The Hessian criterion should yield something consistent with this result. The Hessian of t\circ \|\cdot\|_2 at x\in\mathbb R^2 is:

    \[ H_f(x) = \begin{pmatrix}2 & 0 \\ 0 & 2\end{pmatrix} \]

for which for any v\in\mathbb R^2:

    \[ v'H_f(x)v = 2(v_1^2 + v_2^2) \]

and thus, v'H_f(x)v\geq 0 and for any v\neq \mathbf 0: v'H_f(x)v>0. Hence, H_f is positive definite everywhere, and f is actually not only convex, but indeed strictly convex.
Comment 1. As a takeaway, the Hessian criterion may be more informative than the rules we derived in (i). However, for a quick convexity check, they can still be very helpful, as you may use them to avoid computing a second derivative – especially for higher-dimensional, complex functions, this may come in handy at times.
Comment 2. Of course, you can iterate on the rules of (i). For example, if you have convex, monotonically increasing transformations t_1,\ldots, t_k and a convex function f, then the function with mapping rule t_1(t_2(\ldots t_k(f(x))\ldots)) is also convex. To continue the example we just saw, this, for instance, implies that f(x) = \frac{1}{9}\max\{\exp((\|x\|_2)^2) ,10^3\} is a convex function because \exp(\cdot), \max\{\cdot, c\} for a constant c\in\mathbb R and multiplication by a positive constant are all monotonically increasing, convex transformations.
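
As a sanity check of such iterated compositions, one can sample convex combinations numerically. The sketch below uses the cap \max\{\cdot, 10^3\}, which is a monotonically increasing convex transformation (note that \min\{\cdot, c\}, while increasing, is concave rather than convex); sampling range, seed, and tolerance are arbitrary choices:

```python
import math, random

random.seed(2)
# g(x) = (1/9) * max(exp(||x||_2^2), 1000) on R^2
g = lambda x: (1.0 / 9.0) * max(math.exp(x[0] ** 2 + x[1] ** 2), 1e3)

ok = True
for _ in range(2000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    lam = random.random()
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    rhs = lam * g(x) + (1 - lam) * g(y)
    # relative tolerance guards against rounding at large function values
    ok = ok and g(z) <= rhs + 1e-9 * (1.0 + abs(rhs))
print(ok)  # True: no convexity violation found on the sample
```
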


Exercise 4: Multivariate Differentiation

a. A Concrete Function

Here, we consider one example of a first and second order multivariate derivative. More exercises can be found on Problem Set 3 and in the examples given in Chapter 3.

Consider the function

    \[ f:\mathbb R^3 \mapsto\mathbb R, x=(x_1, x_2, x_3)'\mapsto x_3^2\cdot\ln(\exp(x_1 + x_2) + x_3^2) \]

You can take for granted that as a composition of infinitely many times differentiable functions (polynomial, logarithm and exponential function), f is infinitely many times differentiable. Compute the first and second derivative of f, and evaluate them at x_0 = (1,-1,1)'.
Hint: For the Hessian, you can reduce the number of computations by exploiting symmetry and certain interrelationships of the first order partial derivatives.

First derivative

We need to use the chain rule for the logarithm expression in each of the first order partial derivatives, and the product rule for \frac{\partial f}{\partial x_3}. Accordingly, we obtain

    \[ \frac{\partial f}{\partial x_1}(x) = x_3^2\cdot \frac{\exp(x_1 + x_2)}{\exp(x_1 + x_2) + x_3^2} \]

    \[ \frac{\partial f}{\partial x_2}(x) = x_3^2\cdot \frac{\exp(x_1 + x_2)}{\exp(x_1 + x_2) + x_3^2} \]

    \[\begin{split} \frac{\partial f}{\partial x_3}(x) &= x_3^2\cdot \frac{2x_3}{\exp(x_1 + x_2) + x_3^2} + 2x_3 \cdot \ln(\exp(x_1 + x_2) + x_3^2) \\&= 2x_3 \cdot \left (\frac{x_3^2}{\exp(x_1 + x_2) + x_3^2} + \ln(\exp(x_1 + x_2) + x_3^2) \right ) \end{split}\]


    \[\begin{split} \frac{df}{dx}(x) &= \left (\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \frac{\partial f}{\partial x_3}(x)\right ) \\&= \frac{1}{e^{x_1 + x_2} + x_3^2}\left (x_3^2e^{x_1 + x_2}, x_3^2e^{x_1 + x_2}, 2x_3[x_3^2 + (e^{x_1 + x_2} + x_3^2)\cdot\ln(e^{x_1 + x_2} + x_3^2)]\right ) \end{split}\]

At x_0 = (1,-1,1)',

    \[\begin{split} \frac{df}{dx}(x_0) &=\frac{1}{e^{1 -1} + 1^2}\left (1^2\cdot e^{1 -1}, 1^2\cdot e^{1 -1}, 2\cdot 1 \cdot [1^2 + (e^{1 - 1} + 1^2)\cdot\ln(e^{1 - 1} + 1^2)]\right ) \\&= \frac{1}{2}\left (1, 1, 2\cdot(1 + 2\cdot \ln(2))\right ) \\&= \left (\frac{1}{2}, \frac{1}{2}, \ln(4) + 1\right ) \end{split}\]

where it was used that a\ln(b) = \ln(b^a) for b>0, a\in\mathbb R.

Second derivative

For the second order derivatives, we can exploit (1) that f is (at least) twice continuously differentiable, so that the Hessian will be symmetric, and (2) that \frac{\partial f}{\partial x_1} = \frac{\partial f}{\partial x_2}, which also saves some computations. Here, we need to use the quotient rule. Note that the denominator is always strictly positive, so that the rule applies. We have

    \[\begin{split} \frac{\partial^2 f}{\partial x_1^2}(x) &= \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) \\&= x_3^2\cdot \frac{\exp(x_1 + x_2)\cdot [\exp(x_1 + x_2) + x_3^2] - \exp(x_1 + x_2)\cdot \exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} \\&= x_3^4\cdot \frac{\exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} \end{split}\]


    \[\begin{split} \frac{\partial^2 f}{\partial x_2^2}(x) &= x_3^2\cdot \frac{\exp(x_1 + x_2)\cdot [\exp(x_1 + x_2) + x_3^2] - \exp(x_1 + x_2)\cdot \exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} \\&= x_3^4\cdot \frac{\exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} =: E_1(x) \end{split}\]

Hence, all entries in the upper 2×2-block of the Hessian are identical and equal to expression E_1(x). At x_0 = (1,-1,1)',

    \[ E_1(x_0) = 1^4\cdot \frac{\exp(1 - 1)}{[\exp(1 - 1) + 1^2]^2} = \frac{1}{2^2} = \frac{1}{4} \]


    \[\begin{split} \frac{\partial^2 f}{\partial x_1 \partial x_3}(x) &= \frac{\partial^2 f}{\partial x_3 \partial x_1}(x) = \frac{\partial^2 f}{\partial x_3 \partial x_2}(x) = \frac{\partial^2 f}{\partial x_2 \partial x_3}(x) \\&= \exp(x_1 + x_2) \cdot \frac{\partial}{\partial x_3}\left (\frac{x_3^2}{\exp(x_1 + x_2) + x_3^2}\right ) \\&= \exp(x_1 + x_2) \cdot \left (\frac{2 x_3 \cdot [\exp(x_1 + x_2) + x_3^2] - x_3^2\cdot 2x_3}{[\exp(x_1 + x_2) + x_3^2]^2}\right ) \\&= 2x_3\cdot \exp(x_1 + x_2)\cdot \frac{\exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} =: E_2(x) \end{split}\]

At x_0 = (1,-1,1)',

    \[ E_2(x_0) = 2\cdot 1\cdot \exp(1 - 1)\cdot \frac{\exp(1 - 1)}{[\exp(1 - 1) + 1^2]^2} = \frac{2}{2^2} = \frac{1}{2} \]

Now for the final second order partial derivative: we already know that

    \[\frac{\partial}{\partial x_3}\left (\frac{x_3^2}{\exp(x_1 + x_2) + x_3^2}\right ) = \frac{2x_3\exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2}\]

from the last computation, and

    \[\frac{\partial}{\partial x_3}\left (\ln(\exp(x_1 + x_2) + x_3^2)\right ) = \frac{2x_3}{\exp(x_1 + x_2) + x_3^2} \]

from the first order derivative. Hence, all we need to apply is a simple product rule:

    \[\begin{split} \frac{\partial^2 f}{\partial x_3^2}(x) &= 2\cdot \left (\frac{x_3^2}{\exp(x_1 + x_2) + x_3^2} + \ln(\exp(x_1 + x_2) + x_3^2) \right ) \\& \hspace{0.5cm}+ 2x_3 \cdot \left (\frac{2x_3\exp(x_1 + x_2)}{[\exp(x_1 + x_2) + x_3^2]^2} + \frac{2x_3}{\exp(x_1 + x_2) + x_3^2}\right ) \\&= 2\ln(\exp(x_1 + x_2) + x_3^2) + \frac{2x_3^2}{\exp(x_1 + x_2) + x_3^2}\left (3 + \frac{2\exp(x_1 + x_2)}{\exp(x_1 + x_2) + x_3^2}\right ) \\&=: E_3(x) \end{split}\]

At x_0 = (1,-1,1)',

    \[\begin{split} E_3(x_0) &= 2\ln(\exp(1 - 1) + 1) + \frac{2\cdot 1^2}{\exp(1 - 1) + 1^2}\left (3 + \frac{2\exp(1 - 1)}{\exp(1 - 1) + 1^2}\right ) \\&= 2\ln(2) + \frac{2}{2} \left (3 + \frac{2}{2}\right ) = 4 + \ln(4) \end{split}\]

Hence, the Hessian is

    \[ H_f(x) = \begin{pmatrix} E_1(x) & E_1(x) & E_2(x)\\ E_1(x) & E_1(x) & E_2(x)\\ E_2(x) & E_2(x) & E_3(x)\\ \end{pmatrix} \]

and when evaluated at x_0 = (1,-1,1)',

    \[ H_f(x_0) = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{2}\\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} & 4 + \ln(4)\\ \end{pmatrix} \]
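
The analytic values at x_0 can be cross-checked against central finite differences; a Python sketch (the step size h = 10^{-5} and the tolerances are arbitrary choices):

```python
import math

f = lambda x1, x2, x3: x3 ** 2 * math.log(math.exp(x1 + x2) + x3 ** 2)
x0 = (1.0, -1.0, 1.0)
h = 1e-5

def partial(i, p):
    """Central finite difference for the i-th partial derivative at p."""
    lo, hi = list(p), list(p)
    lo[i] -= h
    hi[i] += h
    return (f(*hi) - f(*lo)) / (2 * h)

def second(i, j, p):
    """Finite-difference approximation of the (i, j) Hessian entry at p."""
    lo, hi = list(p), list(p)
    lo[j] -= h
    hi[j] += h
    return (partial(i, hi) - partial(i, lo)) / (2 * h)

grad = [partial(i, x0) for i in range(3)]
print(all(abs(g - a) < 1e-6
          for g, a in zip(grad, [0.5, 0.5, math.log(4.0) + 1.0])))  # True

print(abs(second(0, 1, x0) - 0.25) < 1e-4)                   # E_1(x0) = 1/4
print(abs(second(2, 2, x0) - (4.0 + math.log(4.0))) < 1e-4)  # E_3(x0) = 4 + ln(4)
```
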

b. Matrix Functions

Consider a matrix A\in\mathbb R^{n\times n}, n\in\mathbb N.

(i) Show that \frac{d}{dx} (Ax) = A.

We decompose Ax into its component functions:

    \[ Ax = \begin{pmatrix} a_1\cdot x \\ a_2\cdot x\\\vdots \\ a_n\cdot x \end{pmatrix} \]

where a_1, a_2,\ldots, a_n\in\mathbb R^{1\times n} are the row vectors containing the rows of A. For the j-th entry,

    \[ a_j\cdot x = a_{j1}x_1 + a_{j2}x_2 + \ldots + a_{jn}x_n = \sum_{i=1}^n a_{ji}x_i \]

accordingly, the i-th partial derivative of a_j\cdot x is

    \[ \frac{\partial}{\partial x_i}(a_j\cdot x) = a_{ji} \]


    \[ \nabla (a_j\cdot x) = (a_{j1}, a_{j2},\ldots, a_{jn}) = a_j \]


    \[ \frac{d}{dx} (Ax) = \begin{pmatrix} \nabla (a_1\cdot x) \\ \nabla (a_2\cdot x)\\\vdots \\ \nabla (a_n\cdot x) \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2\\\vdots \\ a_n \end{pmatrix} = A \]


(ii) What is the derivative of f:\mathbb R^n\mapsto\mathbb R, x\mapsto x'Ax?
Hint: Use (i) and the multivariate product rule.

Define the helper functions h_1(x) = x and h_2(x) = Ax (with domain \mathbb R^n). Then, f = h_1'h_2 and D_{h_1}(x) = \mathbf{I}_n and, with (i), D_{h_2}(x) = A. With the multivariate product rule,

    \[\begin{split} D_f(x) &= h_1'(x) D_{h_2}(x) + h_2'(x) D_{h_1}(x) \\&= x' A + (Ax)' \mathbf{I}_n \\& = x'A + x' A' \\& = x' (A + A') \end{split}\]
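Likewise, the formula x'(A + A') can be checked against a finite-difference gradient of the quadratic form; a short sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

q = lambda v: v @ A @ v  # the quadratic form f(x) = x'Ax

# Central finite-difference gradient, compared against the formula x'(A + A')
h = 1e-6
grad_fd = np.array([(q(x + h*e) - q(x - h*e)) / (2*h) for e in np.eye(n)])
print(np.allclose(grad_fd, x @ (A + A.T)))  # True
```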


(iii) If A = \begin{pmatrix} 1 & \alpha\\ \beta & 4\end{pmatrix}, can you find values for \alpha and \beta so that the second derivative of x'Ax is positive definite everywhere? Can you find an alternative combination where A is positive semi-definite but not positive definite?

From (ii), we know that the first derivative of x'Ax is x' (A + A') = [(A' + A) x]'. Hence, with (i), the second derivative is

    \[(A' + A)' = A' + A = \begin{pmatrix} 2 & \alpha + \beta \\ \alpha + \beta & 8 \end{pmatrix} \]

For v\in\mathbb R^2,

    \[v'(A' + A)v = 2v_1^2 + 2(\alpha + \beta)v_1v_2 + 8v_2^2 = 2 (v_1^2 + (\alpha + \beta)v_1v_2 + 4v_2^2) \]

For v\neq \mathbf 0, this expression is strictly positive if \alpha + \beta = 0. Thus, you can choose \alpha \in\mathbb R arbitrarily and set \beta = -\alpha to make A' + A positive definite.
If you instead choose \alpha + \beta = 4, i.e., \alpha \in\mathbb R arbitrary and \beta = 4 - \alpha, then you can apply a binomial formula to obtain

    \[ v'(A' + A)v = 2 (v_1 + 2v_2)^2 \]

which is weakly positive for every v\in\mathbb R^2, such that A' + A is positive semi-definite. A' + A is not positive definite as e.g. for v = (2, -1) \neq \mathbf 0, v'(A' + A)v = 2 (v_1 + 2v_2)^2 = 2 \cdot 0^2 = 0, so that v'(A' + A)v > 0 fails.
Alternatively, you also obtain positive semi-definiteness from \alpha + \beta = -4; here, a different binomial formula gives v'(A' + A)v = 2(v_1 - 2v_2)^2.
Comment: \alpha + \beta \in \{0, \pm 4\} are the easiest cases to investigate. Generally, you would need to solve an optimization problem to determine the minimal value of v_1^2 + (\alpha + \beta)v_1v_2 + 4v_2^2. An exercise in the collection of Chapter 4 addresses this general issue.
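One way to double-check such choices numerically is via the eigenvalues of A' + A: a symmetric matrix is positive definite if and only if all eigenvalues are strictly positive, and positive semi-definite if and only if they are all weakly positive. A sketch with the (arbitrary) choice \alpha = 3:

```python
import numpy as np

def sym_sum(alpha, beta):
    # A' + A for A = [[1, alpha], [beta, 4]]
    A = np.array([[1.0, alpha], [beta, 4.0]])
    return A + A.T

# alpha + beta = 0 (here alpha = 3, beta = -3): both eigenvalues strictly
# positive -> positive definite
eig_pd = np.linalg.eigvalsh(sym_sum(3.0, -3.0))
print(eig_pd)  # [2. 8.]

# alpha + beta = 4 (here alpha = 3, beta = 1): eigenvalues 0 and 10,
# weakly positive only -> positive semi-definite but not positive definite
eig_psd = np.linalg.eigvalsh(sym_sum(3.0, 1.0))
print(eig_psd)  # approximately [0, 10]
```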


Exercise 5: Taylor Approximation

In this exercise, we consider univariate Taylor Approximations. Exercises for the multivariate case can be found on Problem Set 3. In the univariate case, notation is a bit lighter, so we can focus on getting familiar with the mechanics, investigating approximation quality, and studying some further properties.

(i) Compute the first and second order Taylor Approximations to the exponential function \exp:\mathbb R\mapsto\mathbb R_+, x\mapsto\exp(x) around x_{0,1} = 1 and x_{0,2} = \ln(2). For x_{0,1}, illustrate the exponential function and its approximations. Is one globally preferable to the other, i.e. does it yield a weakly superior approximation everywhere?

For a function f, the first order approximation to f around x_0 is

    \[ T_{1,x_0}(x) = f(x_0) + f'(x_0)(x-x_0) \]

and the second order approximation is

    \[ T_{2,x_0}(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{1}{2}f''(x_0)(x-x_0)^2 \]

For the exponential function,

    \[ \frac{d}{dx}\exp(x) = \exp(x) \]

Iterating on this, for the k-th derivative of \exp(x),

    \[ \frac{d^k}{dx^k}\exp(x) = \exp(x) \]

Hence, the first order approximation is

    \[ T_{1,x_0}(x) = \exp(x_0) + \exp(x_0)(x-x_0) = \exp(x_0)(x + 1 - x_0) \]

and the second order approximation is

    \[\begin{split} T_{2,x_0}(x) &= \exp(x_0) + \exp(x_0)(x-x_0) + \frac{1}{2}\exp(x_0)(x-x_0)^2 \\&= \exp(x_0)\left (1 + x - x_0 + \frac{1}{2}(x-x_0)^2\right ) \\&= \exp(x_0)\left (\frac{1}{2}x^2 + (1 - x_0)\cdot x + \frac{1}{2}x_0^2 - x_0 + 1\right ) \end{split}\]

With x_0=1,

    \[ T_{1,1}(x) = \exp(1)(x + 1 - 1) = e \cdot x \]


    \[ T_{2,1}(x) = \exp(1)\left (\frac{1}{2}x^2 + (1 - 1)\cdot x + \frac{1}{2}1^2 - 1 + 1\right ) = \frac{e}{2}\cdot (x^2 + 1 ) \]

With x_0 = \ln(2),

    \[ T_{1,\ln(2)}(x) = \exp(\ln(2))(x + 1 - \ln(2)) = 2\cdot (x - (\ln(2) - 1)) \]


    \[\begin{split} T_{2,\ln(2)}(x) &= \exp(\ln(2))\left (\frac{1}{2}x^2 + (1 - \ln(2))\cdot x + \frac{1}{2}\ln(2)^2 - \ln(2) + 1\right ) \\&= x^2 - 2(\ln(2) - 1)\cdot x + \ln(2)^2 - \ln(4) + 2 \end{split}\]

Next, we compare the approximation quality at x_0 = 1 graphically:


As can be seen, the higher order approximation fares better around the point of approximation, and also for larger values of x generally. However, for x\to -\infty, its approximation quality deteriorates and becomes inferior to that of the first order approximation. This once again draws attention to the issue that Taylor Approximations are “good” only locally around the point of approximation, and that even higher order approximations may perform disproportionately badly when we move too far away from x_0.
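These observations can also be verified numerically by comparing absolute approximation errors; a small sketch:

```python
import numpy as np

# First and second order Taylor Approximations to exp around x0 = 1,
# as derived above
T1 = lambda x: np.e * x
T2 = lambda x: (np.e / 2) * (x**2 + 1)

# Near the point of approximation, the second order approximation is closer...
print(abs(T2(1.5) - np.exp(1.5)) < abs(T1(1.5) - np.exp(1.5)))  # True

# ...but far below x0, the quadratic term dominates and T2 fares worse
print(abs(T2(-10.0) - np.exp(-10.0)) > abs(T1(-10.0) - np.exp(-10.0)))  # True
```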


(ii) Compute the n-th order Taylor Approximation to the exponential function for x_0 = 0 for variable n\in\mathbb N. Can you find an infinite sum representation for the exponential function using polynomial terms?

For n\in\mathbb N,

    \[\begin{split} T_{n,x_0}(x) &= \exp(x_0) + \exp(x_0)(x-x_0) + \frac{1}{2}\exp(x_0)(x-x_0)^2 + \ldots + \frac{1}{n!}\exp(x_0)(x-x_0)^n \\&= \exp(x_0)\cdot \sum_{k=0}^n \frac{1}{k!} (x-x_0)^k \end{split}\]

Plugging in x_0 = 0 for which \exp(x_0) = 1, we obtain

    \[ T_{n,0}(x) = \sum_{k=0}^n \frac{x^k}{k!} \]

Now recall from Chapter 3 that an approximation of infinite order produces no error, i.e. that for a function f that is differentiable infinitely many times, for any x in its domain, f(x) = \lim_{n\to\infty} T_{n, x_0}(x). Accordingly,

    \[ \exp(x) = \lim_{n\to\infty} T_{n,0}(x) = \sum_{k=0}^\infty \frac{x^k}{k!} \]
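The series representation can be sanity-checked numerically by truncating at a moderate order; a small sketch:

```python
import math

def taylor_exp(x, n):
    """n-th order Taylor polynomial of exp around 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# Already at order 20, the partial sum is very close to exp(2)
print(taylor_exp(2.0, 20), math.exp(2.0))  # both approximately 7.389056
```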


Exercise 6: Integration

Integrate the following functions over the given interval:

    1. f(x) = 9(2x_1 + 3x_2)^{2} over I=[-2,2]\times[0,\frac{2}{3}]
    2. g(x) = \exp(x_1)\cdot x_2 + 2x_1x_2 over I:=[0,1]^2


We want to compute

    \[ \int_{[-2,2]\times[0,\frac{2}{3}]}f(x) dx = \int_{-2}^2 \int_0^{\frac{2}{3}} 9(2x_1 + 3x_2)^{2}dx_2 dx_1 \]

where the equality is due to Fubini’s theorem, which tells us that we can iteratively integrate with respect to the individual dimensions. Of course, you can also choose to integrate via the x_1-dimension first; if you do so, beware of the respective integral bounds.
To compute the antiderivative of 9(2x_1 + 3x_2)^{2} with respect to x_2, note that we have to “invert” a chain rule. The inner derivative with respect to x_2 is \frac{\partial}{\partial x_2}\left (2x_1 + 3x_2\right ) = 3, which is therefore inverted by multiplication with \frac{1}{3}. The antiderivative of the outer function, h(x) = x^2, is H(x) = \frac{1}{3}x^3. In total, we obtain

    \[ 9\cdot \frac{1}{3}\cdot \frac{1}{3}(2x_1 + 3x_2)^{3} = (2x_1 + 3x_2)^{3} \]

The inner integral is therefore given by

    \[\begin{split} \int_0^{\frac{2}{3}} 9(2x_1 + 3x_2)^{2}dx_2 &= [(2x_1 + 3x_2)^{3}]_{x_2 = 0}^{x_2 = \frac{2}{3}} \\&= (2x_1 + 2)^{3} - (2x_1)^3 \\&= 2^3 [(x_1 + 1)^3 - x_1^3] \\& = 8(x_1^3 + 3x_1^2 + 3x_1 + 1 - x_1^3) \\& = 8(3x_1^2 + 3x_1 + 1) \end{split}\]


    \[ \int_{[-2,2]\times[0,\frac{2}{3}]}f(x) dx = 8\cdot\int_{-2}^2 (3x_1^2 + 3 x_1 + 1) dx_1 \]

This time, the antiderivative of the function to be integrated is

    \[3\left (\frac{1}{3}x_1^3\right ) + 3\left (\frac{1}{2}x_1^2\right ) + x_1 = x_1^3 + \frac{3}{2} x_1^2 + x_1\]


    \[\begin{split} \int_{[-2,2]\times[0,\frac{2}{3}]}f(x) dx &= 8\cdot \left [x_1^3 + \frac{3}{2} x_1^2 + x_1\right ]_{x_1 = -2}^{x_1 = 2} \\&= 8\cdot\left [\left (2^3 + \frac{3}{2}\cdot 2^2 + 2\right ) - \left ((-2)^3 + \frac{3}{2}\cdot (-2)^2 + (-2)\right )\right ] \\&= 8\cdot [(8 + 6 + 2) - (-8 + 6 - 2)] \\&= 8\cdot (16 + 4) \\&= 160 \end{split}\]
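As a numerical sanity check, a midpoint-rule Riemann sum over I approximates the same value:

```python
import numpy as np

# Midpoint-rule Riemann sum over I = [-2, 2] x [0, 2/3]
f = lambda x1, x2: 9 * (2 * x1 + 3 * x2) ** 2
n = 1000
x1 = -2 + (np.arange(n) + 0.5) * (4 / n)       # midpoints in [-2, 2]
x2 = (np.arange(n) + 0.5) * ((2 / 3) / n)      # midpoints in [0, 2/3]
X1, X2 = np.meshgrid(x1, x2)
approx = f(X1, X2).sum() * (4 / n) * ((2 / 3) / n)
print(approx)  # approximately 160
```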



For this integral, we can make our lives easier by using linearity of the integral operation:

    \[ \int_{[0,1]^2}g(x) dx = \int_{[0,1]^2}(\exp(x_1)\cdot x_2 + 2x_1x_2) dx = \int_{[0,1]^2}\exp(x_1)\cdot x_2 dx + 2 \int_{[0,1]^2}x_1x_2 dx \]

Further, both sub-functions are multiplicatively separable. Therefore, applying Fubini gives

    \[ \int_{[0,1]^2}g(x) dx = \left (\int_0^1 \exp(x_1) dx_1\right )\left (\int_0^1 x_2 dx_2\right ) + 2 \left (\int_0^1 x_1 dx_1\right )\left (\int_0^1 x_2 dx_2\right ) \]

In fact, we only need to compute two integrals to solve this expression:

    \[ \int_0^1 \exp(x) dx = [\exp(x)]_{x=0}^{x=1} = e^1 - e^0 = e - 1 \]


    \[ \int_0^1 x dx = \left [\frac{1}{2}x^2\right ]_{x=0}^{x=1} = \frac{1}{2} (1^2 - 0^2) = \frac{1}{2} \]

From this, it results that

    \[ \int_{[0,1]^2}g(x) dx = (e - 1)\cdot \frac{1}{2} + 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{2}(e - 1 + 1) = \frac{e}{2} \]
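A midpoint-rule Riemann sum over [0,1]^2 confirms this result as well:

```python
import numpy as np

# Midpoint-rule Riemann sum over [0, 1]^2, compared against e/2
g = lambda x1, x2: np.exp(x1) * x2 + 2 * x1 * x2
n = 1000
m = (np.arange(n) + 0.5) / n   # midpoints of [0, 1]
X1, X2 = np.meshgrid(m, m)
approx = g(X1, X2).sum() / n**2
print(approx, np.e / 2)  # both approximately 1.359141
```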