### Chapter 4 – Optimization

An overview of this chapter’s contents and take-aways can be found here.

We have seen a lot of new concepts in the last three chapters. This chapter is concerned with putting all the pieces together and uses them to give you an in-depth understanding of the economist’s favorite exercise: optimization. As anyone with some background in economics should know by now, optimization is virtually everywhere in economic considerations: households optimize the allocation of their time budget to working for income and leisure, optimally divide their budget into consumption and savings over time, optimize the composition of their consumption bundles, firms optimize input use, output quantities, factor composition (i.e., the production shares of labor and capital, respectively), and governments (or: social planners) seek to maximize welfare through public goods provision, minimize negative externalities through market regulations, and optimize the social value generated from each unit of tax money over a broad portfolio of possible actions. In doing so, the agents we consider are almost always constrained in some form: households do not have infinitely much time, firms face output targets, and governments can not spend more money as is generated from tax revenue. Thus, to be able to track all these issues mathematically, we as economists need to firmly understand the techniques of constrained optimization.

Table of Contents

## Introduction and Key Concepts

As a first step to solving it, let us consider how to write down an optimization problem mathematically.

Here, , and , are functions mapping from the set of ‘s, the domain dom, to (i.e., there is only one quantity to be maximized, not a vector). We call our objective function and the choice variables. In words, we are looking for the in the domain of that yields either the smallest (or largest) value can possibly attain when we require that the functions must attain the value and can not lie strictly above — we want to minimize (or maximize) subject to the equality constraints given by the and the inequality constraints by . You may wonder if it is not a restrictive formulation to have only equality constraints requiring a function to be equal to zero, and “less or equal” inequalities. Indeed, it is not: note that we can always re-write a condition as and as . Lastly, if , we say that the problem is unconstrained.

Throughout the entire chapter, we will most often consider maximization problems, as it is the problem more common in economics. Indeed, every minimization problem with objective can be equivalently solved as the maximization problem of , such that this approach is without loss of generality. Still, conceptually, maximization and minimization problems are of course distinct (one example is the different second order condition for the solution).

Before moving on, a few words on our conceptual approach: as with multivariate differentiation, the elaborations in the online course will focus on more on the “how” of optimization rather than the “why”, which is paid more attention in the companion script and perhaps our course week. Still, we do discuss why the conditions we use in our approaches to solving optimization problems are valid so as to allow you to understand what you are doing when applying them, which should make your applications less error-prone. In doing so, we focus again on intuitive and suggestive reasoning, and omit the formal proofs. While it is of course more important that you know how to correctly apply the methods of constrained optimization and you need not master their formal justification completely just yet, feel encouraged to dig deeper into the formalities behind this issue when you find the time! Due to the importance of constrained optimization in economics, any good economist should be thoroughly aware of why his or her approach to solving these optimization problems works, and once you dig deeper into economic or econometric theory, you will find that formally understanding constrained optimization methods is a necessary condition for being able to deal with the more advanced techniques and more complex issues studied there. So while the details of multivariate differentiation are good to know, the ones of constrained optimization are the ones you are strongly advised to familiarize yourself with at least until the end of your master’s studies – but the sooner the better, since doing so will facilitate any application you see during your coursework and also exams, and also greatly train your skill of dealing with Taylor expansions.

### Definitions

Before digging into the whole process of optimization, it is wise to (i) formally define what we are looking for (i.e. maximum or minimum) and (ii) evaluate how “likely” it is that we will find it by looking at specific properties of the objective function and of its domain.

Definition: Extremum: Minimum and Maximum of a Set.
Let . Then, is called the maximum of , denoted , if and . Conversely, is called the minimum of , denoted , if and . is called an extremum of if or .

Verbally, is the maximum of if is the smallest number greater or equal than all elements of , and it is also contained in . More simply put, it is the largest value in the set if there is any such value. This need not be the case, recall that e.g. open intervals neither have a maximum nor a minimum, but only infimum and supremum. However, for sets of real numbers, if there is a minimum or a maximum, it (i) is unique and (ii) coincides with the infimum or the supremum, respectively.

In terms of our optimization problem, we will be looking for the maximum of the set of attainable values of under the constraints of the problem :

Now, next to the maximum attainable value, we are frequently also (most of the time: even more) interested in the (set of) solution(s), i.e. the arguments that maximize under the constraints of . So, let’s define them in a next step:

Definition: Local and Global Maximizers.
Let , . Then, is

• a global maximizer for if
• a strict global maximizer for if .
• a local maximizer for if there exists such that
• a strict local maximizer for if there exists such that

Verbally, local means that there must be a neighborhood (i.e. an open ball around , restricted to the domain of to ensure that is defined for every considered argument) such that maximizes in this neighborhood, or respectively, it maximizes the restricted function

Strict means that is the unique maximizer of (local: when restricted to a neighborhood), such that all other (in this neighborhood) yield strictly lower values for if is defined in , i.e. . Note that local implies global, because if all elements of yield smaller values for than , then so do especially all those that lie in neighborhoods of . The converse is not always true, and some local maximizers may not be global ones.

In optimization, infrequently care about the whole set , as it contains many points that do not satisfy the given constraints. Let us see how we can formally restrict the problem to only those points consistent with the problem’s constraints:

Definition: Restriction of a Function.
Let be sets and a function. Then, for , the function is called the restriction of on .

Definition: Constraint Set.
Consider an optimization problem with objective function , , equality constraints and inequality constraints . Then, the set

is called the constraint set of .

The constraint set of a problem defines the restriction . Beyond now being able to represent optimization problems more compactly as

the value of having defined the constraint set is that it more formally narrows down what we are indeed after when considering the problem mathematically: finding the maximizers of the restricted function ! Note that for an unconstrained problem, we simply have so that . Beyond the general maximizers defined above, this allows us to define the maximizers we care about in optimization problems:

Definition: Solutions, Maximizing Arguments.
Consider an optimization problem with constraint set . The solutions to are given by the set of global maximizers of ,

We call the maximizing arguments or the arg max of the problem . If this set contains only a single element , we also write .

As indicated by the bold words, it is crucial to note that the solutions typically constitute a set that may contain arbitrarily many elements or none at all (consider e.g. or ). Only if there is a unique maximizer, by convention, the arg max also refers to a number or a vector (but still to the set as well)! Finally, note that regardless of whether the problem has solutions, and even regardless of whether there are any values that satisfy the constraints (i.e., whether ), the arg max is always defined!

Now that we have introduced all key concepts of the optimization context, it is a good time to make sure that we are thoroughly familiar with them. To practice, consider these two objects that you frequently see across all economic fields, and applications of mathematics more generally:

Try to answer the following questions on their differences and their relationship:

1. What types of mathematical objects do the quantities refer to (real number/vector/set/function/etc.)?
2. How do these quantities relate to the domain and codomain of ?
3. Can you characterize the relationship of the quantities by a quantifying statement?

In doing so, be aware that the former expression is shorthand notation for .

1. is a real number, is a set.
2. is an element in the codomain of , is a subset of the domain of .
3. The relationship between the quantities can be described by

### Characterizing the Set of Solutions

For this subsection, make sure to recall the concepts of (i) a bounded set, (ii) a closed set, (iii) supremum and infimum, and (iv) the Heine-Borel theorem for characterization of compactness in that have been introduced throughout the earlier chapters of this script.

First, we need one more definition:

Definition: Bounded Function.
Let be a real-valued function. If , the image of under is bounded, we say that is a bounded function. Moreover, for , we say that is bounded on if is bounded.

Recall that when discussing Heine-Borel, we highlighted the value of compact subsets of for optimization: a continuous function will necessarily assume both a (global) maximum and minimum on them, such that compactness of the set is a sufficient condition for existence of solutions to optimization problems! To transfer this intuition to the , we need to review what Heine-Borel precisely says about compact sets: they are closed and bounded. Boundedness of a set , defined as being contained in some ball where and is immensely helpful because it restricts the degree to which the arguments of a function can extend into any direction in — if they move too far away from , then boundedness tells us that they can no longer be arguments of ! Similar to the univariate case where bounded sets are intervals with finite bounds, this prevents solution-breakers like limits , as e.g. in where if x is allowed to infinitely expand e.g. in direction , then it is allowed to diverge to as . As with univariate functions, we can prevent this by defining on a bounded set. Next, why do we need closedness? For univariate functions, this is also rather straightforward: if e.g. , then the set is bounded and can not diverge due to “too large” arguments, but clearly, for any , and since the domain of does not include its boundary, there are neither minimizing nor maximizing values! This logic extends to the multivariate case in a one-to-one fashion.

Finally, why do we need continuity? To see this, consider the function

the graph of which is illustrated in the figure above. What is the maximizer of ? Clearly, it is not an element of , so we can restrict our search to . But this set is no longer closed, and we may run into our boundary problem again — and the way is defined, we indeed do! More generally, if is discontinuous, the function can “jump”, i.e. it can increase or decrease steadily into one direction, but just before reaching the high/low point, attain an entirely different value. In other words, when approaches , it approaches the level . If this level would be a maximum/minimum, we need to ensure that it lies in the range of by requiring , which is precisely the definition of continuity.

It is possible to show formally that the intuitive line of reasoning above goes through formally:

Theorem: Weierstrass Extreme Value Theorem.
Suppose that is compact, and that is continuous, then, assumes its maximum and minimum on , such that and .

## Unconstrained Optimization

As per our tradition of moving from easier to harder problems, we begin the study of optimization problems with those that are not subject to any constraints, but just seek to maximize a function over its domain, i.e.

where dom and is a real-valued function.

From a technical side, note that when considering constrained problems, we may equivalently consider the “unconstrained” problem with objective , but the key complication is that continuity of and appealing properties of the domain do not transfer easily to this issue, such that we need to consider it later on its own right.

### First and Second Order Necessary Conditions

To solve optimization problems, we typically rely on a combination of necessary and sufficient conditions. Here, we focus on the perhaps most commonly known conditions: the first and second order necessary conditions for a local extremizer.

For univariate functions , , you may know how to approach the problem of unconstrained optimization: set the first derivative of to zero, and check that the second derivative is smaller than — any point that satisfies these conditions is a candidate for an interior solution. Next to the interior solutions, you will have to consider border solutions: points of non-differentiability and points on the boundary of the support of . There are, of course, applications where you need not worry about border solutions: if is differentiable everywhere (especially: , so that the second derivative is easily computed) or the boundary points are either non-existent () or excluded from the domain (). Either way, you evaluate at all candidate points that you found, and the one yielding the highest value is your global maximizer. Here, we focus on how this procedure extends to the multivariate context.

First of all, it is noteworthy that we restrict attention to local maximizers in our analytical search, and only subsequently identify the global maximum from the rather crude and inelegant technique of comparing the value of the objective at all candidates. To find necessary conditions, we restrict attention to the interior, the (finite set of) boundary points are considered in isolation subsequently! Note that we assume away the non-differentiability issue by imposing , otherwise, we would worry about these points too.

To come up with the necessary conditions we use, we imagine for the moment that there is a point that maximizes locally, and search for some (ideally strong) characteristic features must necessarily have. Starting from the univariate case, suppose is a local maximizer, such that for the -ball restricted to with , there are no values of above , i.e. . Then, how does look like around , provided that it is (twice) differentiable and thus especially continuous?

As you can see, must either be flat around , or constitute a “hill” with peak (a mixture also exists, with flatness on one side and a “downhill” part on the other). What does this mean? As illustrated, intuitively, we should have a zero slope of at , or for a multivariate function , a zero gradient, i.e. a zero slope into any direction (generalizing the concepts of flatness and hilltop to the ). Indeed, this intuition holds formally:

Theorem: Unconstrained Interior Maximum – First Order Necessary Condition.
Let and . Suppose that is a local maximizer of . Then, .

The intuition for the formal reason behind this theorem is easy to see from the univariate case: recall that a differentiable function was strictly increasing (decreasing) in if and only if , and because is twice differentiable, is differentiable and thus especially continuous. Thus, if , (), by continuity, () for points “close enough” to , and thus, there lie points slightly to the right (left) of with strictly larger values, and can not be a maximizer! Thus, can not be a local maximizer unless – which gives the necessary condition. For the multivariate case, this reasoning applies along any of the fundamental directions of the domain, which involves slightly more notation, but the principle is the same.

The theorem above suggests that points with a zero gradient are important. Indeed, they deserve an own label:

Definition: Critical Point or Stationary Point.
Let , and . Then, if is differentiable at and , we call a critical point of or a stationary point of .

Note that just because we have defined generally, a critical point need not exist in every specific scenario! There are a broad variety of functions that have a non-zero gradient everywhere, e.g. , just like there are univariate functions with a globally non-zero derivative. Also, it is easily seen that this condition is not sufficient for a local maximizer, because applies also to local minima that are not local maxima. Accordingly, the first order necessary condition is the same for local minima and maxima! Moreover, applies to “saddle points”, as illustrated in the left panel of the figure above, which are neither local minima nor maxima.

As has just emerged, the distinction between local maxima and minima lies beyond the first derivative. Thus, it is generally not sufficient to set the first derivative to zero when looking for a maximum! But as you know, we can consult the second derivative to tell maximizers and minimizers apart. To find a further helpful necessary condition, our thought experiment is again to start from a local maximizer and think about a property it necessary satisfies.

To see the intuition, consider again the univariate case: if is a local maximizer of , in a small-enough neighborhood around , there can be no points so that . Accordingly, it must necessarily be the case that to the left and to the right of – or respectively, must be a decreasing function around . Because is differentiable, we can express this notion using its derivative : in a neighborhood of , we must have that . To extend this to the multivariate case, recall that matrix definiteness can be viewed as a generalization of the sign, so that the second derivative of a multivariate function, its Hessian, is “smaller or equal to zero” if it is negative semi-definite. Indeed, it formally holds that:

Theorem: Unconstrained Interior Maximum — Second Order Necessary Condition.
Let and . Suppose that is a local maximizer of . Then, if is twice continuously differentiable at , is negative semi-definite.

Similarly, for a local minimum, we can of course invert the logic of our intuitive reasoning above. This suggests:

Theorem: Unconstrained Interior Minimum — Second Order Necessary Condition.
Let and . Suppose that is a local minimizer of . Then, if is twice continuously differentiable at , is positive semi-definite.

Indeed, also the formal reasoning behind the second order necessary condition for the minimum is exactly analogous to the one for the maximum.

One can nicely illustrate the grasp of this necessary condition by illustrating critical values of bivariate functions it rules out: consider . The necessary condition of a critical point tells us that any candidate must satisfy

Thus, there is a unique candidate… But is it also a maximizer (or minimizer)? Let’s consult the Hessian — it is straightforward to verify that for all ,

What about its definiteness? Pick . Then,

Thus, there exist and such that

and is neither positive nor negative semi-definite, i.e. it is indefinite. Thus, the solution can neither constitute a local maximum nor a local minimum: we call this a saddle point. To see why, consult the figure below, and think how this would fit on a horse (or your preferred animal to ride).

The crucial thing to take away from multivariate saddle points is that the critical value constitutes a maximum in one direction (here: , i.e. moving along the -axis) but a minimum in the other ().

Also, it is important to know that this condition is indeed only necessary – it still applies to more points than we would ideally hope for, and even combining both necessary conditions introduced above, we still find points that are no local extremizers. To see this in an example, consider :

Clearly, the function has a saddle point at . Since and , we get the unique critical value at with second derivative , which is indeed negative semi-definite! Therefore, both necessary conditions for a local maximum hold at this point. However, so does the positive semi-definiteness condition for the local minimum. The graph will ensure you that the point is neither: for any , the function strictly increases (decreases) in value for infinitely small deviations to the right (left) of . Thus, both the first and second order necessary conditions hold, but we don’t have a maximum (or a minimum). Hence, we need more conditions to rule out points like in the example here, and ideally this time sufficient ones that leave no room for critical points that are not the type of local extremum we want to find.

### Sufficient Conditions and Application Procedure

Continuing our intuitive investigations into the univariate case, we can indeed strengthen our requirements imposed on the second derivative to get a sufficient condition. Above, we said that around a local maximum, must be decreasing, a notion that also allows to be constant. However, if it is not, i.e. if and is strictly decreasing around , then we know that any point in a small-enough neighborhood of will lead to strictly smaller values of the objective – and in such cases, always constitutes a strict local maximizer! From the word “always”, you can immediately see that the condition considered will be sufficient. To generalize to the multivariate case, we again rely on the intuitive link between the sign of a scalar second derivative and the definiteness of a matrix valued one.

Note that we only strengthened the second order condition; the first order necessary condition, i.e. being a critical point, must still be imposed separately. This gives:

Theorem: Unconstrained Interior Strict Local Maximum — Sufficient Conditions.
Let , and . Suppose that is a critical point of , and that is negative definite. Then, is a strict local maximizer of .

Again, analogous reasoning can be applied to obtain the corresponding theorem for the minimum:

Theorem: Unconstrained Interior Strict Local Minimum — Sufficient Conditions.
Let , and . Suppose that is a critical point of , and that is positive definite. Then, is a strict local minimizer of .

While the necessary condition was not sufficiently restrictive to find only local maxima, the sufficient condition faces the converse issue: while it only finds local maxima, it may be even too restrictive to find all of them! This needs to be taken into account when coming up with our solution technique below.

Now that we have established the sufficient condition for a local interior maximum, let us take a moment to think about what to make of the results thus far. Indeed, we are ready to discuss the general approach to solving unconstrained optimization problems. When concerned with interior maxima, we can restrict the candidate set to critical values, because all potential maxima are necessarily critical values! Then, if is “sufficiently smooth” (i.e., it satisfies all required differentiability conditions — here: being twice continuously differentiable at all critical values) we can check the Hessian and determine its definiteness. If at a critical value , is negative (positive) definite, we have sufficient evidence that is a local maximum (minimum)! That may not be necessary — but we can rule out any value where is not at least negative (positive) semi-definite, because they violate the necessary conditions. Thus, we are generally happy if all critical values are associated with either a positive/negative definite Hessian or an indefinite one — but we have to beware semi-definite Hessians, for which no theorem tells us whether or not such points are extrema! As a further note of caution, all theorems thus far have concerned interior points — for boundary points, no theorem exists, and we must either consider them in isolation or simply consider with an open set such that the boundary issue is avoided altogether, which, however, may prevent us to argue for solution existence using the Weierstrass theorem (as typically, an open set is either not closed or not bounded).

Accordingly, the consideration of the Hessian function helps us to categorize all our candidates – the critical points solving the first order condition plus “border solutions” – into the following categories:

The last two colums indicate whether we keep the respective point in the set of candidates for which we compare values in the final step when searching for a global maximum, and why (or why not). Note that indefinite and positive definite Hessians are special cases of Hessians that are not negative definite. The reason that they are listed separately is that we can derive a destinctive characterization from the conditions we discussed.

Now, we know the final set of candidates for which we know, or can not rule out, that they constitute a local maximum. In the last step, it remains to evaluate the function at all these points and see which point yields the largest value to conclude on the global maximum. When doing so, one thing has to be kept in mind: we need some justification that the local maximum we found indeed constitutes a global maximum, that is, that the global maximum does indeed exist (in which case, recall, it will definitely be also a local maximum). This is not ex ante guaranteed! In most cases, we can establish this either through the Weierstrass theorem, or by considering the limit behavior of the function – in the multivariate context, along any of its (fundamental) directions.

To see the intuition in the univariate case, consider

Here, the only critical point is , where . Thus, we identify the point as a strict local maximum per our conditions. Since , the only boundary point here, , can be ruled out as the global maximum. Still, ex ante, it is not guaranteed that , which we have to additionally establish to argue for the global maximum. This is straightforward here, since

where the last equality follows from the quotient rule of the limit.

If instead, the function had been defined as

we knew that the global maxiumum must exist by the Weierstrass theorem (closed intervals of are compact), and here, it is sufficient to compare the interior solution with the boundary points, i.e. to and .

In summary, we can summarize our approach to finding the global maximizer as follows:

Algorithm: Solving the Unconstrained Maximization Problem.

1. Find the set of initial candidates: interior points that satisfy the first order necessary condition (“critical points”) and border solutions – boundary points and points of non-differentiability.
2. Compute the Hessian at every interior point and classify the set of candidates according to the table above.
3. Keep all points that can not be ruled out as local maximizers.
4. Find the candidate(s) that yield the largest value for among all remaining candidates.
5. In case of uncertainty about existence of the global maximizer, study the limit behavior of .

Indeed, strictly speaking, this “cookbook recipe” for the solution of the unconstrained problem is everything you need to know in practice, and many undergraduate programs indeed teach nothing but this. However, as always, the person knowing the reasons for why some procedure works are much more likely to correctly perform all steps than the one who has only memorized the sequence, especially when things get tricky and non-standard. Also, the correct procedure itself is of course much easier to memorize if you know how it comes to be. Thus, feel encouraged to understand at least the intuition-based elaborations above, if not the formal details outlined in the companion script.

When discussing convexity of functions in the last chapter, we briefly noted that the property is also quite useful for optimization. Let’s see how precisely this is:

Theorem: Sufficiency for the Global Unconstrained Maximum.
Let be a convex set, and . Then, if is concave and for , it holds that , then is a global maximizer of .

Analogously, it holds that:

Theorem: Sufficiency for the Global Unconstrained Minimum.
Let be a convex set, and . Then, if is convex and for , it holds that , then is a global minimizer of .

The reason for this fact can be seen intuitively again from the univariate case: the second derivative of a concave (convex) function is non-positive (non-negative) everywhere. As such, the first derivative is non-increasing (non-decreasing), and any critical point will only have points with () to its left and only points with () to its right. Thus, increases (decreases) before the point and decreases thereafter – which defines a global maximum. However, be careful to note that there may be a plateau of global maxima (minima): it is not ruled out that remains at on some interval in the domain!

This is extremely powerful: when we have a concave objective function (and many classical economic functions are concave, e.g. most standard utility or production functions), then the critical point criterion (before only necessary for a local maximizer) is now sufficient for a global maximizer! This reduces our multi-step procedure defined in the algorithm above to just looking at the first-order condition, which represents a dramatic reduction in the effort necessary to solve the problem. Thus, it may oftentimes be worthwhile to check convexity and concavity of the objective function in a first step.

If you feel like testing your understanding of the optimization basics and our discussion of unconstrained optimization, you can take the a short quiz found here.

## Optimization with Equality Constraints

Let us take the next step to solving the general optimization problem by now allowing for equality constraints:

Alternatively, you may read the problem in forms like

Perhaps, you are familiar with the Lagrangian method — but unless you have had some lectures explicitly devoted to mathematical analysis in your undergraduate studies, I believe you are not familiar with its formal justification, and perhaps also not how we may apply it for multivariate functions. Thus, the following addresses precisely these issues, while restricting attention to the intuition when it comes to the formal details.

Before moving on, make yourself aware that what we are doing here is actually already quite useful in a broad range of applications, since the loss in generality from omitting inequality constraints from the problem formulation, depending on the application in mind, may not be too severe: consider for instance utility maximization with the budget constraint , meaning that the consumer can not spend more on consumption than his income . Now, if there is no time dimension (and thus no savings motive) and he has a utility function which strictly increases in both arguments, as we typically assume (“strictly more is strictly better”), the consumer will never find it optimal to not spend all his income, so that the problem is entirely equivalent to one with the equality constraint !

### Level Sets and Implicit Functions for Optimization

Do you recall our definition of the lower- and upper-level sets when we discussed the generalization of convexity to the multivariate case? Here, we need a closely related concept: the level set, which is both a lower and the upper level set given a certain value!

Definition: Level Set.
Let , , and . Then, we call the -level set of .

This definition may look indeed familiar, in the sense that you should be able to describe with vocabulary that has been introduced very early on: think about what we typically would call a set when is a value in the codomain of . As a hint, we denote it by . … if you have practiced your notation, you will know from this that the level set is nothing but the pre-image of under ! To see why we care about level sets is analytically straightforward: in our optimization problem, when there is only one constraint , we are looking for solutions on the level set , i.e. , in words, the constraint set is nothing but the zero-level set of ! With multiple constraints, note that

and that we are looking for solutions in the intersections of the level sets.

From a geometrical perspective, level sets are oftentimes useful to map three-dimensional objects onto a two-dimensional space. The most famous practical example is perhaps the one of contour lines as used for geographical maps, as shown below (figure taken from https://www.ordnancesurvey.co.uk/blog/2015/11/map-reading-skills-making-sense-of-contour-lines/, accessed at June 16, 2020):

Also in economics, we have various important examples: for instance budget sets (with the restriction that all money is spent) where the level is equal to the disposable income , and utility functions with level sets for the . For utility, we usually call sets indifference sets, and their graphical representation in the the indifference curve, which represents all points that yield the same utility level . For the Cobb-Douglas utility with equal coefficients, i.e. , this relationship is shown here (the RHS plot displays utility on the vertical axis):

Coming back to our pre-image interpretation, make sure to understand that one fixes a value of interest, , that the function may take, for instance either the altitude of a coordinate point, or the utility level associated with a consumption choice. Then, the level set is the collection of all the points that yield this value when evaluating at , i.e. . Note that level sets may just consist of one element, be empty altogether, but conversely may also encompass even the whole domain of , namely when is a “flat” function.

Now that we have convinced ourselves that equality-constrained maximization is nothing but maximization on level sets, let us think about whether we can use this insight to reduce our new issue of solving the equality-constrained problem to the old one that we already know, the unconstrained problem. There is good news and bad news. The first piece of good news is that precisely this is indeed possible! The (full set of) bad news, however, is that things will get quite heavy in notation. To somewhat mitigate this problem, the following restricts attention to the scenario of one equality constraint — but the generalization entirely works in a one to one fashion, the respective result is given at the end of this section. Finally, the last piece of good news, and perhaps the most re-assuring fact is that while mathematically somewhat tedious to write down, the intuition of the approach is really simple.

To understand how we generally proceed to dealing with equality constrained problems, consider the following simple exemplary problem:

Our solution approach is to reduce the problem to one without constraints, which we know how to solve from the previous section. The level set that we consider is for . Here, we can express the condition through an explicit function for with argument :

Thus, the constraint indeed fixes given , and we are only truly free to choose . Of course, you could also have gone the other way around to express as a function of ; the key take-away is that the constraint reduces the number of free variables by one. Imposing the restriction of the constraint onto the objective, the problem becomes

The constraint need not be imposed separately, as it has already been plugged in for and is guaranteed to hold if we set after solving for the optimal . Thus, being able to plug in the explicit function representing the constraint into the objective has helped us to re-write the problem explicitly as an unconstrained one! If you need practice in unconstrained optimization, go ahead and solve it, as the only local maximizer (which is not a global maximizer because the objective diverges to as ).

The issue to transferring this idea to general applications is that coming up with the explicit function, depending on how looks like, can be prohibitively complex or simply infeasible. In fact, this approach is very unlikely to work outside the context of linear constraints as we have seen it in the example above; already if we introduce square terms, e.g.

it is no longer possible to solve for as a function of explicitly, as you arrive at , that is, two values rather than one that give a value for consistent with the constraint given , but per our definition of a function, we can have only one value. This is why we need the concept of implicit functions.

To understand the concept of implicit functions, let us think again about why the explicit function was valuable. Starting from a point in the constraint set , when we varied slightly, we were able to derive an explicit rule on how to adjust in response in order to remain on the level set. That is, when we write the explicit function as , for any movement from , we knew that

and the point would also be an element of the level set. So what was the key condition for this to work? Well, we can only “pull back towards ” by adjusting if the function indeed moves with , else, the variation in induced by variation in can not be counterbalanced by adjustment of (e.g. if and ). Formally, it is necessary that . This is precisely the approach we want to generalize now. To be applicable to more general problems, we will weaken the two following strong requirements of the explicit function: (i) being able to derive an explicit mathematical expression for , and (ii) “global” validity of the adjustment rule, that is, applicability to arbitrarily large changes . The motivation of (ii) is that we will again be concerned with local extremizers in our search, so that we are already happy with adjustment rules in small neighborhoods of some value for .

The intuition above should help understand the rather abstract implicit function theorem presented below. To make it even more accessible, before introducing it, let us re-state the above in terms of a general problem in arguments, i.e. and , and think about the conditions that we need to locally express the adjustment in one argument in response to variation in all others necessary to remain on the level set in this context.

Suppose that and are “smooth” in an appropriate sense, for now at least once continuously differentiable, and start from a point on the level set . We are looking for a way to express a rule for how to stay on this level set locally, that is, in some neighborhood of , by defining a function of one variable in all others. Now, suppose that we start from and only marginally vary the components at positions of , but not the first component, say to where and where small. Because is continuous, should still lie “really close” to ( holds because !). By continuous differentiability of , the first partial derivative of is continuous, and we should be able to vary the first argument by to move back to zero, i.e. cancel any potential change in induced by the marginal variation in the other arguments through a function such that

Of course, we typically need not pick the first element to do this, so that we may instead simply decompose where is the univariate variable that we want to adjust, and are all the other variables that are allowed to move freely (note that although we write , we don’t require to be the first element of ! You may e.g. have seen this in decompositions like that says that is composed of the -th element and all other elements). Rather, we can pick any variable for that is able to induce variation in in a neighborhood of , i.e. any with , as by continuity of ‘s partial derivatives, this is sufficient to ensure that will remain non-zero in some small neighborhood around .

To complete the theorem, consider one more interesting observation: if we define for , then around , we have , so that in the neighborhood we consider, . Assuming for now that is differentiable, even though we can not compute itself, we can actually use this to derive an expression for the derivative of using the chain rule or respectively, the total derivative of with respect to :

where denotes the vector of partial derivatives with respect to elements of of length , and the second line follows as is equal to , the identity matrix of dimension . Re-arranging the expression, noting that is a non-zero real number around and we can thus divide by it, we get

Finally, we are ready to put everything together by stating the result justifying our approach above:

Theorem: Univariate Implicit Function Theorem.
Let , and , and . Suppose that , and that for a , . Then, if , there exists an open set such that and for which and . Moreover, it holds that with derivative

Hopefully, the elaborations above help you in understanding the rather abstract statement of this theorem. The next section sheds light on how we end up using it for our optimization context.

### The Lagrangian Method for One Constraint

The idea that we employ to make use of the implicit function theorem in our constrained optimization context is the following: we consider again a local maximizer as the starting point and contemplate how to characterize it, first through necessary conditions. For this, we locally re-write the constrained problem to an unconstrained one by making use of the implicit function. So, let us focus again on a local maximizer , this time of the constrained problem

This means that and for an , for any , . If we now meet the requirement of the implicit function theorem, i.e. , then we can define our implicit function , for which especially , and we also have a such that:

Don’t worry too much about the here, it is certainly , and represents the room of variation left for when imposing that and simultaneously that . The important message of this statement is that if is a local maximizer in the constrained problem and , then is a local maximizer in the unconstrained problem

(1)

But as a local maximizer in an unconstrained problem, we know how to characterize through necessary and sufficient conditions from the previous section! Hence, it should be possible to derive the same kind of conditions also for the constrained problem, and it indeed is. The following discusses how this transfer works.

Let us start with the first order condition: must be a critical point of where as we already defined above. This means that . To obtain a workable first order condition for the constrained problem from this, we need to get rid of the unknown function . In analogy to the derivation above, this can be achieved by applying the total derivative:

Now, we just plug in what we know about from the implicit function theorem, and use that :

Re-arranging terms and plugging in that , we get

where we define as the expression in square brackets. Moreover, it holds that

Thus, we can summarize more compactly

This is indeed what we were after: a characterization of the local maximizer in the constrained problem using first derivatives. Note that this result is an implication of being a local maximizer, such that it is a necessary condition for a local maximum. Let us write this down:

Theorem: Lagrange’s Necessary First Order Condition.
Consider the constrained problem where and . Let and suppose that . Then, is a local maximizer of the constrained problem only if there exists . If such exists, we call it the Lagrange multiplier associated with .

The condition “” ensures that we can find the implicit function to express how we may remain within the level set around . Conversely, this means that in comparison to the unconstrained problem, we need to take care of one more “border solution” candidate: singularities of the level set, i.e. points where and .

Let’s take a moment to consider and formally justify the Lagrangian function, or simply, the Lagrangian. The necessary condition is ““. Thus, for said , it is an easy exercise to verify that evaluated at , the gradient of the following function is equal to :

We call this function the Lagrangian. Importantly, from our considerations in the previous section, the necessary condition for a solution to unconstrained maximization of the Lagrangian is , which is however equivalent to our necessary condition for the constrained problem we had just derived. Finally, note that the derivative of with respect to is just , and requiring is thus equivalent to requiring . Thus, any interior solution candidate in the constrained problem is associated with a so that is a critical point of the Lagrangian function! However, as a technical comment, as you can convince yourself with the help of the companion script, the Lagrangian function never has maxima or minima, but only saddle points. Thus, be careful not to say that constrained maximization is equivalent to maximization of the Lagrangian, as the Lagrangian has no local maximizers and the latter problem never has a solution!

Still, the equivalence of critical points gives the usual Lagrangian first order conditions for a solution that you may be familiar with:

This gives a system of equations is unknowns, representing our starting point for searching our solution candidates of any specific equality-constrained problem with one constraint. In the unconstrained case, we could consult the second derivative to further narrow down the set of candidates we needed to compare, and were able to also find a sufficient condition for local maximizers. Let us see how this generalizes to the constrained case. Doing so, we will confine ourselves to studying the theorem and not care too much how precisely this comes to be.

Definition: Leading Principal Minor.
Consider a symmetric matrix . Then, for , the -th leading principal minor of , or the leading principal minor of of order is the matrix obtained from eliminating all rows and columns with index above from , i.e. the matrix .

In words, the -th leading principal minor of corresponds to the upper block of . Let’s consider an example. Let

Then, the leading principal minors of order and , respectively, are

Because it will important for the theorem, answer the following: what are the last two leading principal minors of ?

and .

The last concept that we need for our “second order condition” is the Bordered Hessian Matrix, given by the Hessian of the Lagrangian function:

where and denote the Hessians of and , respectively.

Theorem: Sufficient Conditions for the Constrained Problem.
Consider the constrained problem where and . Let and such that is a critical point of the Lagrangian function, i.e. and . If is the number of equality constraints, denote by the last principal minors of . If

• , then is a local minimizer of the constrained problem.
• , then is a local maximizer of the constrained problem.

Here, sgn denotes the sign function equal to if , to for and to else.

Note that we introduced the variable to indicate the generalization to multiple constraints. Further, note that (i) here, we only have a second order sufficient, but not a second order necessary condition, so our ability to rule out critical values based on failure of a necessary condition is comparatively limited (this is not too bad, usually, there are not too many critical values anyway) and (ii) as with the second order condition in the unconstrained case, we require the functions and to be , i.e. twice continuously differentiable, such that again, the second order condition is subject to a stronger smoothness regularity. Typically in economic problems, this regularity assumption will be satisfied.

Finally, note that the search for global maxima, because the level set is usually not convex, the theorem for the unconstrained case does not transfer, such that a concave objective is not sufficient for the global maximum.

### Lagrangian with One Equality Constraint: Example and Interpretation

To get a feeling for the Lagrangian method, before moving on to multiple constraints, let us consider an example here: consider the problem

That is, we seek the vector with minimum Euclidean length in the that satisfies (for economic context, you may think of the length as a cost function and as a production target). How do we approach this issue? First, a trick to get rid of the unwieldy square root; let’s consider the equivalent problem

This may always be worthwhile to think about — can you re-write the objective to a simpler expression without changing the solutions? Just remember to plug the solutions into the original objective in the end, then this approach works just fine and may save you a lot of work!

In step zero, we think about points that the Lagrangian method won’t find: boundary points and singularities! However, because is an open set, boundary points are not an issue. Next, at any , the gradient of the constraint function is , such that there are no singularities. Thus, we can restrict attention to critical values as found by the Lagrangian method. Make sure that you always think about this step!

In the first step, we look for our candidates using the first order necessary conditions, or the Lagrangian first order conditions (FOCs), given by

So, let’s compute the gradient of (the one of , we already took care of above):

Thus, the Lagrangian FOC for tells us that

This gives either or . Considering the FOC for , i.e. the constraint, the former case can be ruled out; plugging in the latter gives

Hence, there is just a unique candidate to consider: . Let’s consult the second order sufficient condition and see whether we can decide upon the type of extremum: recall the form of the Bordered Hessian above. Computing it for an arbitrary gives

Thus, it is the same irrepsective of and , and we don’t need to plug in our candidate solution . Now, let’s apply the theorem. We have variables and constraints, so , and indeed, we only have to compute the determinant of the last leading principal minor, so the Bordered Hessian itself. Before doing so, let’s form expectations: the point is (i) a maximum if the determinant’s sign is , i.e. , (ii) a minimum if the determinant’s sign is , i.e. , and (iii) may be either of the two, or a saddle point if .

Applying our determinant rule, all the right-diagonals yield a zero product, while we twice get on the left-diagonal, such that , and we have indeed found a maximum!

To conclude, minimizes the Euclidean length of vectors in that satisfy , and it has length
.

To conclude our investigations on the case of one equality constraint, let us focus on some intuition and interpretation of the method. First, for our weird and abstract second order sufficient condition that more or less fell from the sky, there is something intuitive to note about it: the number of principal minors to consider. Recall that the gradient of , , was ruled out to have a “row rank deficit”, which for row vectors is a fancy way of saying that they are not zero (generally, a matrix has a row rank deficit if its rank is below ; this concept will be important in the multivariate extension below). In general, this means that all constraints in move in independent directions at the solution, in our case of just one constraint, it is sufficient that the function moves at all with variation in around the solution, i.e. we have not included a nonsense constraint such as , or even worse, (also the seemingly smart attempt to re-write inequality constraints using indicator functions, e.g. as falls victim to this issue at every point where is differentiable). As such, the constraint restricts directions of the in maximization of , but it leaves free the remaining , which is exactly the number of leading principal minors that we have to consider! Intuitively, you can think about this in analogy to our discussions of the saddle point graph — we need to rule out that we have a minimum in one and a maximum into the other direction.

The second, and perhaps far more important piece of interpretation concerns the Lagrangian multiplier. Have you ever heard the expression “shadow price of the budget constraint”? Then you may know that we associate it with the Lagrangian multiplier in constrained utility maximization subject to the budget constraint. But what does it mean? Let’s consider a general constraint , but to ease interpretation, keep in mind that may be the budget and we may have as the sum of expenditures on individual goods. Note that given , we can express the optimal solution (assuming for now that it is unique) as , i.e. it will depend only on the value of (the budget available) that constrains the problem. Denote as the objective function’s value at the optimum (the implicit utility function ), and let be the implicit constraint function (expressing expenditures at the optimum, ). Then, note that by the chain rule,

and thus

In case of the utility maximization problem, tells us the ratio of change in utility and change in expenditures. Because the household always spends all his income (provided that the utility function is strictly increasing in consumption), we have so that , i.e. it resembles the utility’s responsiveness to increases in . Hence, we can conversely interpret it as the marginal utility cost of being budget-constrained! For the general problem, the intuition is similar, because if we continue to require , then trivially , and we can interpret as the marginal increase in the objective function associated with marginally relaxing the constraint. Note that this means that the multiplier MUST BE non-negative, i.e. ! If , we are in an interesting special case: here, the constraint does not matter, in the sense that is the solution to the unconstrained problem as well.

Indeed, this intuition will be crucial when we generalize our insights to inequality constraints: if an inequality constraint “matters”, in the set of such that , it constrains our optimization opportunities, then it must be the case that it has a non-zero “cost” in terms of the objective, that is, it is associated with a multiplier unequal to zero, and the constraint is binding at in the sense that . Otherwise, if the , we say that the constraint is slack, we could deviate marginally from and still satisfy the constraint — it thus seems not to have a value cost for the objective, and must be associated with a multiplier . This suggests that when is the multiplier associated with the constraint , we should have an equality of the form , called a complementary slackness condition (think about why this equality encompasses the two cases before). Indeed, next to the constraint holding, this is almost all we need to generalize the method to inequality constraints!

Consider again the Lagrangian (with constraint ):

Suppose now that we marginally relax the (budget) constraint, that is, assume that marginally increases (the household gets more money). Then, note that

### Lagrangian Optimization: Multiple Constraints

To conclude our studies of equality-constrained optimization problems, we need to briefly consider the method’s generalization to multiple constraints. There is no need for much discussion, all formal ideas and the intuition transfer from the case of one constraint. The only tricky part in generalization here is the rank condition for the Jacobian of the constraint function , the intuition of which we have already considered above. The reason for the close-to one-to-one transferability is that when starting from a problem of the form

we can always define as above so that

which looks very much like the constraint set of the one-constraint problem, only that we are now dealing with a vector-valued function that must be equal to the zero vector. But, as you may recall from Chapter 1, vector spaces are endowed with very similar algebraic structures as real numbers, that in many cases, including this one, allow for convenient and relatively straightforward generalizations. Indeed, we may rely on the multivariate implicit function theorem:

Theorem: Multivariate Implicit Function Theorem.
Let , and , and . Suppose that , and that for a , . Then, if , there exists an open set such that and for which and . Moreover, it holds that with derivative

The important changes from the univariate theorem are underlined. You can see that the only thing that has really changed is the rank condition, which encompasses the univariate case, as when , is equivalent to , the rest is just adjustment to the increased dimensionality of . Note that in the general case, is the matrix-valued Jacobian. The intuition is the same as before: for every constraint , we need to vary independently from the other constraints so as to ensure that it holds in a neighborhood of the point of interest. As in the univariate case, we can not have , i.e. more (meaningful) constraints than directions of variation.

In analogy to above, this theorem allows us to derive the necessary first order condition that gives us the set of “interior” candidates for extrema:

Theorem: Lagrange’s Necessary First Order Conditions for Multiple Constraints.
Consider the constrained problem where and . Let and suppose that . Then, is a local maximizer of the constrained problem only if there exists . If such exists, we call the Lagrange multiplier associated with for the -th constraint.

Here, instead of , you may be more familiar with the condition

You can check that these two characterizations are equivalent using the definition of the Jacobian. As before, this theorem gives us all the candidates for local extrema that we have to consider when searching for the global maximum/minimum. As with the univariate case, the -th multiplier can be interpreted as the (non-negative) value cost of the -th constraint at .

Ruling out extreme values with the second order necessary condition is exactly analogous to before (because we had already formulated the second order condition quite generally earlier):

Theorem: Second Order Sufficient Conditions for Multiple Constraints.
Consider the constrained problem where and . Let and such that is a critical point of the Lagrangian function, i.e. and . Denote by the last principal minors of . If

• , then is a local minimizer of the constrained problem.
• , then is a local maximizer of the constrained problem.

Next to the slightly different first order condition, the only difference is the form of the Bordered Hessian. It still corresponds to the Hessian of the Lagrangian function, however, due to the multitude of constraints, this Hessian now looks slightly different:

You can try to compute this yourself using the definition of the Hessian and appreciating the vector structure of . Note again that we need a stronger smoothness assumption for the second order necessary condition, as and need to be .

### Equality Constraints and the Lagrangian: A Recipe

We have done quite some math to establish our approach to equality-constrained problems using the Lagrangian function. Having a good understanding of this analytic justification will greatly aid your general knowledge in econ-related math, however, it may be a bit overwhelming, such that the clear path to approach a specific problem may not be crystal clear just yet. Thus, in our last discussion of problems with equality constraints, as we did for unconstrained problems, let us summarize our findings in a “cookbook recipe” to follow when confronted with specific applications.

Starting Point. Consider a problem

and suppose that and are sufficiently smooth, i.e. where supp and . This ensures that both our first and second order conditions are applicable. If there are points of non-differentiability, they have to considered as border solution points separately.

Analytical Steps.

1. Collect all candidate points that can not be identified via the “interior” (local extremum) method, i.e. boundary points of the support of (if any), and singularities of the level set , i.e. with rk. Next, turn to the interior candidates.
2. (First Order Necessary Condition/”FOC”) Compute the set of interior candidates, i.e. the critical points of the Lagrangian function and the vectors of associated multipliers .
3. (Second Order Sufficient Condition/”SOC”) Can we identify the type of all candidates using the Bordered Hessian Criterion? Rule out Minima!
4. (Value Comparison; only in case of multiple candidates) For all candidates that remain until here (including those of step 1), compute and determine which point(s) remain(s) as candidates for the global extremizer.
5. (Solution Existence) Argue that the global extremizer candidate you have found indeed yields the global extremum of interest by consulting the limits of (within the constraint set) or using the Weierstrass theorem.

For the last step, for economic applications, it is oftentimes helpful to recall that the budget set, and sets defined in a very similar way, are compact, as are the intersections of a finite number of compact sets (as closedness and boundedness are preserved under finite set intersections). The “solution existence” part can also be taken care of in the first step, however, if you have to rely on a limits argument, it may be helpful to know for your global extremizer candidate .

## Inequality Constraints

The logic we employ to solve problems featuring also inequality constraints is the one outlined above: a solution candidate will be required to satisfy the Lagrangian conditions for all equality constraints, and further, for the inequality constraints we will impose (i) feasibility, i.e. being contained in the constraint set for a solution candidate, and (ii) our complementary slackness condition. The result we rely on is the Karush-Kuhn-Tucker theorem that tells us the necessary and sufficient conditions for an optimum. Here, we only state it, as the application procedure is more or less analogous to the “Lagrangian cookbook formula”, and the central take-away from this chapter should be the general foundation of unconstrained optimization and how we formally/intuitively justify the Lagrangian approach.

For the problem with equality and inequality constraints (but potentially or )

where are all functions, the result may be summarized as follows:

Theorem: Karush-Kuhn-Tucker Theorem.
For and , consider the optimality conditions

• (Feasibility) and ,
• (FOC for ) ,
• (Complementary Slackness) .

Then, if is a local maximum of the constrained problem for which the set is linearly independent, there exist and such that satisfy the optimality conditions.

In contrast to the Lagrange Theorem discussed earlier, what has changed is that the rank condition has yet become more complex again. Intuitively, when varying candidates , the set of “relevant” equality constraints changes because at different , different will hold with equality. In consequence, we have a different rank condition at different , expressed by the more unwieldy characterization above. Nonetheless, the concept is the same as before, and as there, we must separately identify singularity candidates from points that violate this condition, if there are any. The rest of the theorem follows the logic discussed earlier: points must be feasible, and either an inequality constraint holds with equality , or it does not constrain our ability to maximize , i.e. the value of the objective function, locally , which we summarize in the complementary slackness condition. Finally, note that this theorem, like the Lagrangian method, only gives (first-order) necessary conditions, such that existence must be ensured in a first step! Sufficient conditions may be obtained when considering concave objective functions and only quasi-convex inequality constraints, an issue called “convex optimization”. However, this issue is beyond the scope of this class. most problems you face will be “nice enough”, in the sense that you have very few candidates survive the first order condition anyways, and this issue is less severe than it sounds.

### Problem Simplification

In the remaining discussion on inequality-constrained problems, we will actually be concerned with re-writing seemingly inequality-constrained problems as equality-constrained, which greatly facilitates analytical solution. To start off, consider again the general constrained maximization problem,

Instead of solving it using the Kuhn-Tucker method, we may be able to re-write it as

which we can apply more familiar and straightforward the Lagrangian method to. Generally, there are two approaches to simplifying the problem: (i) imposing that an inequality constraint is binding, i.e. that it necessarily holds with equality (and thus is “not really an inequality constraint at all”), or (ii) dropping inequality constraints.

Let us study these methods in the context of the budget-constrained utility maximization problem we already introduced,

with price vector such that for all and . Utility functions that we typically consider are such that is strictly increasing in all arguments, i.e. , an assumption that we are usually ready to impose because strictly more is typically strictly better in the view of the consumer. Moreover, we assume that the marginal utility of the first bit is infinitely high, i.e. . Common examples that satisfy these conditions are Cobb-Douglas or Constant Elasticity of Substitution utility functions.

The first simplification (“not really an inequality constraint”) can be applied to the budget constraint: a point with can never be a solution, because the consumer can spend more to increase utility! Thus, all solutions of the initial problem will be solutions of the re-written problem

Whenever this condition in bold holds, the simplification is justified. Next, note that the marginal utility from spending money on good is (because 1 dollar gives units of the good). For any such that and , we can marginally decrease spending on and increase spending on – and, by our zero limit assumption for the marginal utility, generate a large leap in utility. Thus, the point with can not be an optimum. Thus, will never bind, and if solves our problem, it is also a local maximizer in the problem

The Lagrangian approach of this problem gives , and we can ex-post impose that , i.e. that all entries in must be non-negative. In case of a bivariate Cobb-Douglas function , , it may be a good exercise to formally verify that the optimal input ratio is given by , pay special attention to the line of arguments that ensures that this is indeed the unique global maximum. Thus, we can restrict the role of inequality constraints to feasibility of points in the Lagrangian problem, making them a further criterion in the “cookbook recipe” outlined above according to which we may rule out candidates identified from the FOC.

Accordingly, we may be able to reduce the problem to either a standard Lagrangian problem if we are able to argue that the inequality constraints are actually equality constraints, or solve an equivalent Lagrangian problem with an additional feasibility condition that rules out some of the critical values. Indeed, most optimization problems you face during the Master’s classes that feature inequality constraints can be reduced to such Lagrangian problems that you can solve using the methods discussed above.

As a final note on problem simplification, recall our motivation for deriving our one-equality-constraint procedure from the implicit function theorem: it may not always be possible to represent constraints using global, explicit functions. On the other hand, sometimes, this may indeed be the case. If so, you can always get rid of the respective constraint – solve for the explicit function representing a constraint and plug it into the problem! This usually makes the problem much easier to solve, as you reduce its dimensionality and the number of constraints. As an example, you can consult the elaborations on the explicit function in the section “Level Sets and Implicit Functions for Optimization” that we used to re-write the two-dimensional constrained problem as a one-dimensional, unconstrained one. When applying this trick, be sure to plug the explicit function not only in to the objective, but also to the other constraints, if any.

### Exploiting Intuition: Lagrangian Multipliers as a Sufficient Condition

The intuition of Lagrangian multipliers, when combined with our re-writing approach presented above, yields an extremely simplistic alternative to the computation-intensive second order condition approach to determining the type of extreme values. For simplicity, we consider again the scenario of one constraint.

Suppose that we start from some problem (either the initial one, or a re-written inequality-constrained problem)

To use Lagrangian multipliers as a necessary condition, as stated above, we need a clear notion of how the constraint can be “relaxed”. For this, a first requirement must hold: the problem must be equivalent to the inequality-constrained problem

For instance, this is the case for utility maximization subject to a budget constraint, but this equivalence also applies to a broad range of further economic optimization problems. Generally, as discussed in the previous section, this equivalence is more easily established when starting out from the inequality-constrained problem, and holds e.g. when both the objective and constraint are strictly monotonic in one variable. The equivalence to the inequality-constrained problem ensures that we have a well-defined idea of what “relaxing” the constraint means. For instance, in the example of the budget constraint, this notion would be to increase the budget .
On this, make sure that your equality-constrained problem is equivalent to a problem of the form “” rather than ““. The intuition of relaxing the constraint critically depends on this form, and if you assume a constraint “” you are actually considering the problem with ““, therefore inverting the sign of Lagrangian multipliers and confusing the intuition! Of course, in this case the method presented below continues to be applicable, but with constraint function and the first order condition (if you instead write the FOC as , you get , introducing the sign error on the Lagrangian multiplier).

Now, let’s turn to how to put this Lagrangian multiplier intuition to use in our context: if at a critical point, the value cost of the constraint imposed on the objective function, equal to the Lagrangian multiplier, is strictly larger zero at some candidate , then at , what is constrained is our ability to increase the objective, and the candidate should be a local maximizer! Indeed, one can show formally that the following holds:

Theorem: Lagrangian Multipliers and Type of Extremum.
Let be an equality-constrained problem with objective and constraint function (constraint: ), and suppose that is equivalent to the inequality-constrained problem with objective and constraint function (constraint: ). Then, if and are such that , ,

• if , then is a local maximizer of
• if , then is a local minimizer of

The reason why this works has to do with the directional derivative of at with direction :

If is a direction pointing to the interior of the constraint set (i.e., ) , then

so that

Therefore, if (), moving marginally to the interior of the constraint set , we obtain (). Thus, for points on around , it holds that (). To see this, note that points on around can be interpreted as the limit of a sequence around with , i.e. . As weak inequalities are preserved under the limit, and since is continuous, for all implies . Therefore, is a constrained local maximizer (minimizer) of .

For the multivariate case, an analogous variant of the theorem applies so long as you can re-write all inequality constraints as equality constraints ( to ) and vice versa, in which case is sufficient for a constrained local maximizer. As a note of caution, this condition classifies with certainty only critical points with . Thus, points with are not classified, and must be kept as candidate solutions for comparison in the final step. Intuitively, tells us that the constraint does not matter, as relaxing it has no effect on the solution. This scenario can occur in both minimization and maximization problems, so that gives you no information of the type of the extremizer.

This Lagrangian multiplier condition may be extremely helpful at times, as you can convince yourself of in the exercise problems and on the problem sets. The reason is that it offers an alternative, and in comparison to the Bordered Hessian criterion involving computation of multiple determinants for each candidate, fascinatingly simple way to determine the kind of local extremum a certain critical value of the Lagrangian corresponds to. The only “cost” this simplification comes at is that it applies only when there is equivalence in the equality- and inequality-constrained problem, which may be non-obvious to verify in some applications. Even if you can use it, you should be very careful to apply the trick correctly — one mess-up with the sign, and you’re accidentally throwing away all candidates that you actually care about! To prevent this from happening, keeping in mind:

(1) Convince yourself thoroughly that the equality- and inequality-constrained problems are equivalent. Also make sure to have the correct form for the inequality-constrained problem, i.e. . If you have , multiply the constraint function by before proceeding.

(2) The proper way to write down the Lagrangian is with a minus: . Only like this, we get the FOC , and you obtain the correct sign for .

For (2), if you use plus instead of minus, you’re multiplying the constraint function by and lose equivalence to the inequality-constrained problem! Thus, it may be advisable to simply proceed without the Lagrangian and start from the FOC directly. So, before using the Lagrange multiplier trick, ask yourself: “Is the associated inequality-constrained problem equivalent? And am I using the correct FOC?” If your answer is a definitive yes, you’re good to go!

## Conclusion

We have started by formally studying unconstrained optimization problems, which has allowed us to highlight these problems’ key aspects and properties, to formally justify our first-and second order condition solution approach, and to make clear how we may generalize the respective insights to the case where optimization is subject to equality constraints. For notational simplicity, we have proceeded to a rigorous formal study of the problem with one equality constraint, where we outlined necessary and sufficient conditions for interior solutions. Thereafter, we generalized this approach to multiple constraints exploiting the structural similarity of the vector space to the space of real numbers, and identified a stepwise procedure to solving general equality-constrained problems using the Lagrangian method. Subsequently, we investigated problems that feature inequality constraints (such as budget constraints and non-negativity constraints) and outlined possible approaches to reduce them to standard Lagrangian problems. In case this reduction is not applicable, we may consult the Karush-Kuhn-Tucker conditions. Finally, we saw that when we have an intuitive way of thinking about what “relaxing the constraint” refers to, we can utilize a mathematically simplistic and elegant way to determine the type of an extremizer that avoids computing a large amount of matrices and their determinants. From our discussions in this chapter, it has emerged that thorough understanding of the Lagrangian method and especially the intuition for the Lagrangian multipliers suffices to understand the approach to most economic optimization problems.

To test how well you understood the second half of the discussions on optimization, you can do a short quiz found here.