An overview of this chapter’s contents and take-aways can be found here.
Table of Contents
Very frequently as economists, we have to deal with matrices because they are at the heart of (constrained) optimization problems where we have more than one variable, e.g. utility maximization given budget sets, cost-efficient production given an output target with multiple goods or deviation minimization in statistical analysis when considering multiple parameters. To engage in such analysis, we need to know a variety of matrix properties, what operations we can apply to one or multiple matrices, and very importantly, how to use matrices to solve systems of linear equations.
When considering such systems, a number of (frequently non-obvious) questions are
We will shortly define how matrices can be multiplied with each other and with vectors formally. For now, let us confine to the intuition of why matrices are a useful concept in the context of linear equation systems. Consider the system
which, by stacking the equations into a vector, we can re-write as
where and is the matrix on the LHS of the last equation. We will worry about why the equivalence holds after formally introducing matrix multiplication. Here, we see that generally, a system of equations in variables has a matrix representation with a coefficient matrix of dimension and a solution vector . You may be familiar with a few characterizations of that help us to determine the number of solutions: In many cases, if , we can find at least one vector that solves the system, and if , then there are generally infinitely many solutions. However, there is much more we can say about the solutions from looking at , and how exactly this works will be an central aspect of this chapter.
Because we want to do mathematical analysis with matrices, a first crucial step is to make ourselves aware of the algebraic structure that we attribute to a set of matrices with given dimensions that allow to perform mathematical basis operations (addition and scalar multiplication), that serve as ground for any further analysis we will eventually engage in. As a first step, let us formally consider what a matrix is:
Definition: Matrix of dimension .
Let be a collection of elements from a basis set , i.e. . Then, the matrix of these elements is the ordered two-dimensional collection
We call the row dimension of and the column dimension of . We write . As an alternative notation, one may write , or, if , .
In the following, we restrict attention to , so that the are real numbers. Note however that this need not be the case; indeed, an important concept in econometrics are so-called block-matrices where the are matrices of real numbers, and for derivatives, we frequently consider matrices of (derivative) operators, that is, functions, as opposed to numbers. Moreover, the entries can be random variables (in econometrics), functions, and even operators, as we will see in chapter 3.
To apply the vector space concept to matrices, note that matrices of real numbers can be viewed as a generalization of real vectors: a vector is simply a matrix of dimension . We now consider objects that may have multiple columns, or respectively, stack multiple vectors in an ordered fashion. Thus, when are the columns of a matrix , we may also write (an alternative, yet less frequently used notation stacks the rows of on top of each other). Therefore, a natural way of defining addition and scalar multiplication for matrices is to apply the operators of the real vector context “element”-wise, i.e. for each column separately:
Definition: Addition of Matrices.
For two matrices and of identical dimension, i.e. and , their sum is obtained from addition of their elements, that is
Note that to conveniently define addition, we have to restrict attention to matrices of the same dimension! This already means that we will consider not the whole universe of matrices as a vector space, but each potential specific combination of dimensions separately.
Definition: Scalar Multiplication of Matrices.
Let and . Then, scalar multiplication of with is defined element-wise, that is
From chapter 1, we know that for graphical and also formal intuition, it is very useful if some algebraic structure constitutes a vector space, because then, we can deal with it very “similar” to the way we work with real numbers. Indeed, the set of matrices of given dimensions , together with addition and scalar multiplication as we just defined, constitutes a real vector space:
Theorem: Vector Space of Matrices.
The set of all matrices, together with the algebraic operations matrix addition and multiplication with a scalar as defined above defines a vector space.
While we will not be concerned much with distances of matrices in this course, defining them in accordance with the previous chapter is indeed possible: the matrix norm commonly used is very similar to the maximum-norm defined earlier:
Should you be curious or feel that you need some more practice with the norm concept, you can try to verify that this indeed constitutes a norm, checking all 3 parts of the norm definition.
Before we move to deeper analysis of matrices and their usefulness for the purpose of the economist, some review and introduction of important vocabulary is necessary. In this section, you can find a collection of the most central terminology for certain special characteristics matrices may have.
First, when characterizing matrices, it may be worthwhile to think about when we say that two matrices are equal.
Definition: Matrix Equality.
The matrices and are said to be equal if and only if (i) they have the same dimension, i.e. , and (ii) all their elements are equal, i.e. .
Note especially that it is thus not sufficient that e.g. all elements equal the same value, so that the matrices and are not equal.
Definition: Transposed Matrix.
Let . Then, the transpose of is the matrix where , . Alternative notations are or .
Thus, the transpose “inverts” the dimensions, or respectively, it stacks the columns of in its rows (or equivalently, the rows of in its columns). Note that dimensions “flip”, i.e. that if is of dimension , then is of dimension . An example is given below in equation (1).
Definition: Row and Column Vectors.
Let . If , is called a row vector, and a column vector if . By convention, one also calls a column vector simply vector.
According to this definition, as we also did before with vectors in , when you just read the word “vector”, you should think of a column vector.
Definition: Zero Matrices.
The zero matrix of dimension , denoted , is the matrix where every entry is equal to zero.
Definition: Square Matrix.
Let . Then, if , is said to be a square matrix.
Definition: Diagonal Matrix.
A square matrix is said to be a diagonal matrix if all of its off-diagonal elements are zero, i.e. . We write .
Note that the diagonal elements , need not be non-zero for to be labeled “diagonal”, and thus, e.g. the zero matrix is diagonal, and so is .
Definition: Upper and Lower Triangular Matrix.
A square matrix is said to be upper triangular if , i.e. when the entry equals zero whenever it lies below the diagonal. Conversely, is said to be lower triangular if is upper triangular, i.e. .
Rather than studying the definition, the concept may be more straightforward to grasp by just looking at an upper triangular matrix:
(1)
From its transpose, you can see why the transposition concept is used in the definition for the lower triangular matrix.
Definition: Symmetric Matrix.
A square matrix is said to be symmetric if .
Equivalently, symmetry is characterized by coincidence of and its transpose , i.e. .
Definition: Identity Matrices.
The identity matrix, denoted , is a diagonal matrix with all its diagonal elements equal to 1.
Again, the concept may be more quickly grasped visually:
To check your understanding and to get more familiar with the concepts, think about the following questions:
Now that we have laid the formal foundation by introducing the vector spaces of matrices of certain dimension and made ourselves familiar with a variety of important matrices, it is time to take a closer look on how we do calculus with matrices beyond the basis operations. Similar to the scalar product discussed for vectors, we first should know how to multiply two elements of the vector space with each other, rather than just one element with a scalar:
Definition: Matrix Product.
Consider two matrices and so that the column dimension of is equal to the row dimension of . Then, the matrix of column dimension equal to the one of and row dimension equal to the one of is called the product of and , denoted , if .
As made clear by the bold text, matrix multiplication is subject to a compatibility condition, that differs from the one discussed before for addition. Thus, not all matrices that can be multiplied with each other can also be added, and the converse is also true.
To see just how closely this concept relates to the scalar product, write in row notation and in column notation, i.e. let be row vectors such that and column vectors so that . Then,
and the matrix product emerges just as an ordered collection of the scalar products of ‘s rows with ‘s columns!
It may require some practice to confidently handle matrix multiplication, but if you aren’t already, you should become familiar with it rather quickly. Here is an example that visually illustrates how the entries in the matrix product come to be:
If you try to multiply this expression the other way round, you will quickly see that this doesn’t work: recall that the “inner” dimensions needed to coincide, so if is , must be for the product to exist. Thus, and exist if and only if the matrices are square and of equal dimension! And even then, it will generally not hold that .
Besides its complicated look, matrix multiplication does have some desirable properties:
Theorem: Associativity and Distributivity of the Product.
Assuming that are matrices of appropriate dimension, the product for matrices is
In terms of computing a matrix product, this especially means that the order in which you multiply matrices with each other does not matter. The theorem tells us that standard rules related to addition and multiplication continue to hold for matrices, e.g. when and are matrices of appropriate dimension, then . An exception is of course that, as discussed above, commutativity of multiplication is not guaranteed.
It is noteworthy that the zero and identity element in the matrix space can be dealt with in a fashion very similar to the numbers 0 and 1 in :
Theorem: Zero and Identity matrix.
Let . Then,
Be sure to carefully think about where the dimensions of the zero and identity matrices come from, i.e. why they are chosen like this in the theorem! From this, take away that zero and identity matrix work in the way you would expect them to, and that there are no extraordinary features to be taken into account. Further useful properties of matrix operations are
Theorem: Transposition, Sum, and Product.
While the former two points are more or less obviously useful, the third may appear odd; isn’t this obvious?! Why should it be part of a theorem? Well, the practical use is that frequently, this can be used to achieve a more convenient representation of complex expressions. For instance, in econometrics, when denotes a vector of linear regression model coefficients (if you are not familiar with this expression, it’s not too important what precisely this quantity is right now, just note that , where is the number of model variables), the squared errors in the model ( random variable, random vector of length ) that are of interest to estimator choice are
Now, when taking expectations of such an expression (summand-wise), we want the non-random parameters (here: ) to either be at the beginning or the end of an expression. For the last one, this is not immediately the case: . However, noting that is scalar, (with Point 2. in the Theorem), and thus, , an expression that we can handle well also when considering the expected value.
As a final remark on notation, note that we can use matrices (and vectors as special examples of them) for more compact representations. Consider the problem of estimating the parameter that we just considered above. Here, our standard go-to choice is the ordinary least squares (OLS) estimator , which minimizes the sum of squared residuals in prediction of using the information and the prediction vector over all possible prediction vectors ‘s:
When defining , we can simply write . Generally, note that the scalar product of a vector with itself is just the sum of squares:
Similarly, this applies to sums of matrices and sums of vector products.
This concludes our introductory discussion of matrices and matrix algebra. If you feel like testing your understanding of the concepts discussed thus far, you can take a short quiz found here.
Re-consider the system of linear equations discussed earlier in this chapter. Here, you saw that stacking the equations into a vector, one could arrive at a matrix representation with just one equality sign, characterized by
(2)
where is a matrix of LHS coefficients of the variables stacked in , and is the vector of RHS “result” coefficients. In the case of the example above, the specific system is
As an exercise of how matrix multiplication works, you can multiply out in this example and verify that is indeed equivalent to the system of individual equations.
Recall that our central questions to this equation system were:
Thus, the issue at hand is to characterize the solution(s) for , i.e. the vectors that satisfy equation (2), ideally in a computationally tractable way.
As always, let us get an intuition for the things we don’t know starting from something we know: if and were real numbers and was unequal to zero, you immediately know how to solve for : just bring to the other side by dividing by it. If instead (i.e. can not be “inverted”), you know that there is no solution for if , but if , there are a great variety of solutions for — indeed, every would solve the equation. The idea is very similar with matrix equations, we just need a slightly different or respectively more general notion of “dividing by” and “invertibility”.
Similar to calculus with real numbers, we can define a multiplicative inverse for some :
Definition: Inverse Matrices.
Consider two invertible square matrices . Then, is called the inverse of if . We write so that .
As with real numbers, we can show that the multiplicative inverse is unique, i.e. that for every , there exists at most one inverse matrix :
Proposition: Uniqueness of the Inverse Matrix.
Let and suppose that the inverse of exists. Then, the inverse of is unique.
However, existence is not guaranteed, and in contrast to the real numbers, more than a single element () will be non-invertible in the space . Existence of the inverse is rigorously discussed below. For now, you should take away that the easiest case of a system of linear equations is one where the matrix is invertible. Indeed, the following sufficient condition is what economists rely on most of the time when solving linear equation systems:
Proposition: Invertibility of Unique Solution Existence.
Consider a system of linear equations with unknowns and coefficients , . Then, if and is invertible, the system has a unique solution given by .
To see this, suppose is invertible, and that solves the system. Then,
As discussed above, if we can invert , we can just bring it to the other side, this is exactly the same principle as with a single equation with real numbers. For this sufficient condition to be applicable, we must have that for to be square, i.e. we must have as many equations as unknowns. It may also be worthwhile to keep in mind that the converse of the Proposition above is also true: If has a unique solution and is square, then is invertible.
While we have just seen the value of the inverse matrix in solving linear equation systems, we do not yet know how we determine whether can be inverted and if so, how to determine the inverse — so let us focus on these issues now.
First, some helpful relationships for inverse matrices are:
Proposition: Invertibility of Derived Matrices.
Suppose that are invertible. Then,
To get more familiar with inverse matrices, let us briefly consider why this proposition is true:
For the first point, note that . Thus,
Therefore, is the inverse matrix of .
For the second point, by associativity of the matrix product,
For the third, for any , is a matrix, and we know that it is invertible with . Thus, the third point is a direct implication of the second.
An important corollary is obtained by iterating on the second point:
Corollary: Invertibility of Long Matrix Products.
For any , if are invertible, then is invertible with inverse .
While this proposition and its corollary are very helpful, they still assume invertibility of some initial matrices, and therefore do not fundamentally solve the “when-and-how” problem of matrix inversion. In this course, we first focus on the “when”, i.e. the conditions under which matrices can be inverted. Here, there are four common possible approaches: the determinant, the rank, the eigenvalues and the definiteness of the matrix.
Before introducing and discussing these concepts, it is instructive to make oneself familiar with the elementary operations for matrices. Not only are they at the heart of solving systems of linear equations, but they also interact closely with all concepts discussed next.
Let’s begin with the definition:
Definition: Elementary Matrix Operations.
For a given matrix with rows , consider an operation on that changes the matrix to , i.e. where . The three elementary matrix operations are
To increase tractability of what we do, the following is very helpful:
Proposition: Matrix Representation of Elementary Operations.
Every elementary operation on a matrix can be expressed as left-multiplication a square matrix to .
To see these rather abstract characterizations in a specific example, consider a system with 4 rows and suppose that we use (E1) to exchange rows 1 and 3, (E2) to multiply the fourth row by 5 and (E3) to add row 2, multiplied by 2, to row 3. Then, the associated matrices are
If you have some time to spare, it may be a good exercise to come up with some examples and verify that this matrix-approach indeed does what we want it to do.
As stated above, the practical value of this proposition lies in the fact that when we bring a matrix to another matrix using the elementary operations , where the index of indicates that was the -th operation applied, then we can write
This increases tractability of the process of going from to , a fact which we will repeatedly exploit below.
The key feature of the elementary operations is that one may use them to bring any square matrix to upper triangular form as defined in the previous section (more generally, any arbitrary matrix can be brought to a “generalized” upper triangular form – since we only deal with square matrices here, we need not bother too much with this result, though):
Theorem: Triangulizability of a Matrix.
Consider a matrix . Then, if , can be brought to upper triangular form applying only elementary operations to .
This is helpful because, as will emerge, both the determinant and rank invertibility condition hold for an initial matrix if and only if they hold for an upper triangular matrix obtained from applying elementary operations to . You can find the details of why this theorem applies in the companion script. For now, it is enough to know that this works. How a given matrix can be triangularized will be studied once we turn to the Gauss-Jordan algorithm, our go-to procedure that we use for matrix inversion.
So, how do elementary operations help us when thinking about the inverse of a matrix? The connection is stunningly simple: suppose that we can bring a square matrix not only to triangular, but diagonal form with an all non-zero diagonal using the operations , i.e.
where . Then, multiplying all columns by , summarized by , with , one has . This is very convenient: the transformation matrix is precisely what we call the inverse matrix of , i.e. ! Thus, we can summarize the following:
Proposition: Invertibility and Identity Transformation.
If we can use elementary operations to bring a matrix to the identity matrix, then is invertible with inverse where .
Indeed, we will see later that the converse is also true. As you will see in the last section, the ensuing result is what we base upon our method of computing inverse matrices, the Gauß-Jordan algorithm: to compute the matrix , note that , so that when bringing an invertible matrix to identity form, we just have to apply the same operations in the same order to the identity matrix to arrive at the inverse! Don’t worry if this sounds technical, it’s not, hopefully, the examples we will see later will convince you of this.
Before moving on, let us briefly consider why we cared about the upper triangular matrix in the first place – after all, it is the diagonal matrix with the ‘s that we want, isn’t it?! Well, the thing is, once we arrive at the triangular matrix we are basically done: suppose that, starting from some arbitrary matrix , we have arrived at the triangular matrix
Then, multiplying the last row with and subsequently adding (1) times the last row to row 2 and (2) times the last row to row 1 gives
Next, multiplying the second row with and subsequently adding it times to the first gives the desired diagonal. With a bit more technicality and notation, this line of reasoning generalizes to upper triangular matrices of any dimension.
Thus, we can summarize: If a matrix can be brought to upper triangular form with an all non-zero diagonal using elementary operations, it can also be brought to the identity matrix using elementary operations, and therefore inverted. Because any matrix can be triangularized, we can test invertability by checking if the associated triangular form has zeros on the diagonal.
Now, it is time to turn to our most common matrix invertibility check, the determinant. The very first thing to note about the determinant concept is that ONLY square matrices have a determinant!, i.e. that for any non-square matrix, this quantity is NOT defined!
To give some intuition on the determinant, it may be viewed as a kind of “matrix magnitude” coefficient similar to the one we studied in the vector case, which augments a vector’s directionality. This intuition is helpful because it turns out that as with real numbers, we can invert anything that is of non-zero magnitude — i.e. any matrix with non-zero determinant! However, note that unlike with a vector, a matrix with non-zero entries can have a zero determinant and thus have “zero magnitude”, so that this reasoning should be viewed with some caution.
The general definition of the determinant is unnecessarily complex for our purposes. We will define the determinant recursively here, that is, we first define it for simple matrices, and then express the determinant of an matrix as a function of that of several matrices, which each can be expressed with the help of determinants of matrices, and so on. It is conceptually more straightforward and sufficient for the use you will make of them.
Definition: Determinant.
Let . Then, for the determinant of , denoted or , we define
with and as the matrix resulting from eliminating row and column from , i.e.
This definition allows to obtain the determinant for any arbitrary matrix by decomposing the ‘s into smaller matrices until we have just matrices, i.e. scalars, where we can determine the determinant using 1. The reason why there is the index in 2. is the following relationship:
Theorem: Laplace Expansion.
For any , it holds that
Typically, we would write down the definition without the use of the star, but I put it there to emphasize that the respective index is fixed. If we fix a row index to calculate , we call the computation method a Laplace expansion by the -th row, if we fix the column index , we call it a column expansion by the -th column. This method is the general way how we compute determinants. However, it is quite computationally extensive, and luckily, most matrices that we deal with analytically (rather than with a computer who doesn’t mind lengthy calculations) are of manageable size where we have formulas for the determinant.
Proposition: Determinants of “small” Matrices.
For 1., we can understand the rule using e.g. the Laplace expansion by the first row: note that and . Thus, we have
As an exercise and to convince you of the Laplace expansion, try expanding by the second column and verify that you get the same formula. You can also try to verify point 2. of this proposition using the Laplace expansion method if you desire some practice.
Graphically, we can think about the 3×3-determinant as adding the right-diagonal products and subtracting the left-diagonal products:
These two results are typically enough for most of our applications, and if not, they at least allow us to break the determinant down to rather than matrices using the Laplace expansion method, for which we can directly compute the determinants.
Equipped with these rules, two comments on the Laplace method deem worthwhile. First, when we have a row or column containing only one non-zero entry, we can reduce the dimension of determinant computation avoiding a sum: consider the lower-triangular matrix
The Laplace-expansion for the first column is . However, for any , , so that the expression reduces to . Applying the -rule, it results that . Second, for triangular matrices generally, the determinant is given by the trace.
Definition: Trace of a Square Matrix.
For a matrix , the trace is given by the product of diagonal elements, i.e. .
Proposition: Determinant of a Triangular Matrix.
The determinant of an upper or upper triangular matrix is given by its trace, i.e. .
The proposition follows from simply iterating on what we have done in the example above for a general matrix: consider the upper triangular matrix
Expanding iteratively by the first column, it follows that
For the lower-triangular matrix, the procedure would be analogous, here, we just have to expand for the first row instead of column iteratively.
The two key take-aways are that when you have to compute the determinant of big matrices by hand, look for rows and/or columns with many zeros to apply the Laplace-expansion, or if you’re lucky and face a lower or upper triangular matrix, you can directly compute the determinant from the diagonal (this is of course especially true for diagonal matrices, since they are both upper and upper triangular!). The latter fact is nice especially because the elementary matrix operations introduced above affect the determinant in a tractable way, so that it is possible to avoid Laplace altogether:
Theorem: Determinant and Elementary Operations.
Let and the resulting matrix for the respective elementary operation. Then,
Another important fact that will be helpful frequently is the following:
Theorem: Determinant of the Product.
Let . Then, .
Note that in contrast to the product, for the sum, it does not hold in general that .
Now that we now how to compute a determinant, we care about its role in existence and determination of inverse matrices. As already stated above, the rule we will rely on is inherently simple:
Theorem: Determinant and Invertibility.
Let . Then, is invertible if and only if .
The “only if” part is rather simple: suppose that is invertible. Note that (cf. Proposition “Determinant of a Triangular Matrix”). Then,
Therefore, . Moreover, this equation immediately establishes the following corollary:
Corollary: Determinant of the Inverse Matrix.
Let and suppose that is invertible. Then, .
The key take-away here is that invertibility is equivalent to a non-zero determinant. Consequently, because invertibility implies unique existence of the solution, so does a non-zero determinant. While the general determinant concept is a little notation-intensive, computation is easy for smaller matrices. Thus, the determinant criterion represents the most common invertibility check for “small” matrices or respectively, the most common unique solution check for “small” systems of linear equations when we have as many unknowns as equations.
As the take-away on how to compute determinants, we can summarize the following:
Clearly, we don’t always find ourselves in the comfortable situation that we have as many equations as unknowns. To determine whether these systems have a unique solution, we need to understand the rank of a matrix. The rank of a matrix links closely to our discussion of linear dependence in Chapter 1.
To limit the formal complexity of the discussion, in contrast to the companion script, we still focus on square systems here. This will allow us to talk about the role the rank of a matrix plays for it’s invertibility. In an excursion into non-square systems at the end of this section, yyou will learn what the discussion below implies for solving non-square systems.
Let’s again consider the equation system in matrix notation, , and assume that is square. When writing in column notation as , for all , it is straightforward to check that (try to verify it for the case).
Thus, the LHS of the system is nothing but a linear combination of the columns of with coefficients ! Therefore, the problem of solving the system of linear equations can be rephrased as as looking for the linear combination coefficients for the columns of our coefficient matrix that yields the vector ! Let us illustrate this abstract characterization with an example.
Consider the following system:
with associated matrix form
Recall that the linear combination coefficients can be viewed as combination magnitudes (or weights) of the vectors . Thus, less formally, we are looking for the distance we need to go in every direction indicated by a column of to arrive at the point . Geometrically, you can imagine the problem like this:
Feel encouraged to repeat this graphical exercise for other points ; you should always arrive at a unique solution for the linear combination coefficients. The fundamental reason is that the columns of , and are linearly independent. Think a second about what it would mean geometrically if they were linearly dependent before continuing.
Done? Good! In case of linear dependence, the points lie on the same line through the origin, and either, does not lie on this line and we never arrive there, i.e. there are no solutions, or it does, and an infinite number of combinations of the vectors can be used to arrive at . Either way, there is no unique solution to the system.
Now, it is time to formalize the idea of the “number of directions” that are reached by a matrix and whether “a point can be reached combining the columns of “.
Definition: Column Space of a Matrix.
Let with columns , i.e. . Then, the column space Co of is the space “spanned by” these columns, i.e.
Analogously, we define the row space as the space spanned by the rows of . As a technical detail, note that we called the column space a “space”, and recall from chapter 1 that formally, this means that we need to think about the basis operations that we use. Here, we assume that as a space of real-valued vectors, we use the same operations as for the .
Definition: Rank of a Matrix.
Let . Then, the column (row) rank of is the dimension of the column (row) space of . It coincides with the number of linearly independent columns (rows) of and is denoted as rk, rank or rk .
You may wonder why we use the same notation for the column and row rank, respectively. This is due to the following fact:
Theorem: Column Rank = Row Rank.
Let . Then, the column rank and the row rank of coincide.
Thus, “the rank” of , rk is a well-defined object, and it does not matter whether we compute it from the columns or rows.
Definition: Full Rank.
Consider a matrix . Then, has full row rank if rk and has full column rank if rk. If the matrix is square, has full rank if rk.
Since column and row rank and at most rows and columns can be linearly independent, we get the following result:
Corollary: A Bound for the Rank.
Let . Then, rk.
From the definitions above, you should be able to observe that the rank captures the number of directions into which we can move using the columns of , and that the column space is the set of all points we can reach using linear combinations of ‘s columns. Consequently, a solution exists if and only if , and it is unique, i.e. there is at most one way to “reach” it, if and only if does not have more columns than , that is, if every column captures an independent direction not reached by the remaining columns. In the case of a square system, this reduces to .
Let us rephrase this: if (and only if) , then (i) every column captures an independent direction which ensures that there is no multitude of solutions, and (ii) the matrix reaches different directions, that is, it allows extension alongside every direction in the , so that every point can be reached: ! To summarize:
Theorem: Rank Condition for Square Systems.
Let and . Then, is invertible or respectively, the system has a unique solution if and only if has full rank, i.e. .
Above, we had seen that inverting a matrix can be done by bringing it to identity form using elementary operations. Note that this means that any intermediate matrix that we get in the process of applying the elementary operations is invertible as well, since it can also be brought to identity form using only elementary operations. This means that for an invertible matrix with full rank , all intermediate matrices of the inversion procedure must also have full rank! This is indeed true:
Theorem: Rank and Elementary Operations.
Let . Then, rk is invariant to elementary operations, i.e. for any associated with operations (E1) to (E3), rkrk.
To test your understanding, think about the following question: Consider an upper triangular matrix
When are the columns of linearly independent (think of the linear independence test, and go over the rows backwards)? What does this mean for the determinant?
As already stated above, this course restricts attention to square systems. Of course, in applications, nothing guarantees that we will always work with such systems. This excursion is concerned with the intuition of non-square systems, without going into any formal, mathematical details.
Above, we had just seen that unique solutions exist in “square systems” with an invertible (square) matrix . Indeed, it turns out that more generally, unless the system can be “reduced” to a square system with invertible matrix , there can be no unique solution! This follows from isolated consideration of the multiple cases of general equation systems. Here, it is helpful to think of the number of rows of the matrix in as the “information content” of the equation system, that is, the number of pieces of information that we have for the vector .
First, if we have less equations than unknowns (an “under-identified” system, ), there are always more columns than dimensions, so that it remains ambiguous which columns to combine to arrive at . We can think about this as that the system can never have enough information to uniquely pin down the solution, as we would require at least restrictions on the value should take, one for every entry, but we only have . Still, if multiple columns are linearly dependent, it can be the case that , and certain vectors may not be reached using the columns of , such that there might be no solution at all.
Conversely, with more equations than unknowns (an “over-identified” system, ), it can never be that as the columns of can reach at most independent directions of the . This means that some rows contain either contradictory or redundant information given the statements in the other rows. Roughly, the -th row of is contradictory if the corresponding entry in , , can not be reached through choosing in a way consistent with the other rows, and it is redundant if is already implied by the restrictions imposed through the other rows (i.e., the other equations). Redundant information can be left out of the system without changing the result, but reducing the row dimension by 1. To see conflicting and redundant information in very simplistic examples, consider the following two equation systems:
In both systems, we know per the second equation that . In the first, we get and from the first and third equations, respectively, an information conflict. In the second, equations 1 and 3 give and , one of which is redundant because the other is already sufficient to conclude that .
A unique solution may potentially be obtained if (i) there is no conflicting information and (ii) we can eliminate enough redundant information to arrive at an equivalent square system. For this system, we can perform our usual determinant check.
The concepts of eigenvalues and -vectors and matrix definiteness have a purpose far beyond the context of invertibility, and presumably, you will come across them frequently throughout your master studies. However, as they don’t really belong to any of the overarching topics discussed in this course, and since they are also quite handy to check for invertability in square systems, their introduction is placed here. Before getting started, as with the determinant, make sure to keep in mind that only square matrices have eigenvalues and definiteness! Thus, unlike the rank, the associated invertibility criteria do not generalize to non-square systems!
Definition: Eigenvectors and -values.
Consider a square matrix . Then, is said to be an eigenvector of if there exists a such that . is then called an eigenvalue of .
To practice again some quantifier notation, you can try to write down the definition of an eigenvalue: We call an eigenvalue of if
Let’s think about intuitively what it means if : clearly, and are linearly dependent: , and thus, and lie on the same line through the origin! Note that if , then trivially, for any it holds that , so that any would constitute an eigenvalue and make the definition meaningless. This is why we require that . On the other hand, can indeed be an eigenvalue, namely if there exists so that . Then, geometrically, is indeed equal to the origin.
The following focuses on the answers to two questions: (i) how can we find eigenvalues (and associated eigenvectors)? and (ii) what do the eigenvalues tell us about invertibility of the matrix?
Starting with (i), we can re-write the search for an eigenvector of for an eigenvalue candidate as a special system of linear equations: If is an eigenvector of for ,
Thus, we have a square system of equations with coefficient matrix and solution vector . Now, how does this help? Note that if there is an eigenvector of , it is not unique: if , for any , , and is also an eigenvector of associated with ! Thus, we are looking precisely for the situation where the square system does not have a unique solution, i.e. where and is not invertible! This suggests that we can find the eigenvalues of by solving
i.e. by setting the characteristic polynomial of to zero, or respectively, by finding its roots. You may already have seen applications of this method in different contexts.
Let us consider an example here to facilitate understanding. Let . Then,
Solving can be done with the p-q-formula: is equivalent to
Consequently, our eigenvalue candidates are and . To find the eigenvectors, we have to solve the equation system: for , . Clearly, you can see that this matrix does not have full rank and thus a multitude of solutions. is equivalent to or respectively, . Thus, the eigenvectors of are multiples of . The set of all these vectors is is the so-called eigenspace of . For , the eigenvectors are multiples of and the eigenspace of is . Note that an eigenvalue may occur “more than once” and generally be associated with multiple linearly independent eigenvectors. In such a case, we still define the eigenspace as the set of linear combinations of all linearly independent eigenvectors associated with the eigenvalue.
To the second question, how do eigenvalues help in determining invertibility? This is very simple:
Proposition: Eigenvalues and Invertibility.
Let . Then, is invertible if and only if all eigenvalues of are non-zero.
The simple reason for this relationship is that is invertible if and only if
which is the case if and only if is not an eigenvalue of .
The practical value is that sometimes, you may have already computed the eigenvalues of a matrix before investigating its invertibility. Then, this proposition can help you avoid the additional step of computing the determinant.
Now, coming to the last concept: definiteness. Let’s first look at the definition:
Definition: Definiteness of a Matrix.
A symmetric square matrix is called
Otherwise, it is called indefinite.
Note that the concept not only applies only to square matrices, but also requires them to be symmetric! Further, we exclude the zero vector from definiteness because for all matrices . The concept’s relation to invertibility is the given through the following characterization:
Proposition: Definiteness and Eigenvalues.
A symmetric square matrix is
To see this relationship intuitively, note that for any eigenvalue with eigenvector , it holds that
where is the Euclidean norm (recall the sum-of-squares property of the scalar product of a vector with itself). Since the norm is non-negative, the sign of corresponds to the sign of .
Thus, with the two previous Propositions, the following corollary emerges:
Corollary: Definiteness and Invertibility.
If is symmetric and positive definite or negative definite, it is invertible.
This follows because positive and negative definiteness rule out zero eigenvalues. Thus, positive and negative definiteness are sufficient conditions for invertibility!
Definiteness is a useful criterion especially for matrices of the form with a general matrix of dimension . The form ensures that the matrix is both square and symmetric, and hence, its definiteness is defined. Further, it is at least positive semi-definite, as for any ,
To achieve positive definiteness, it must be ruled out that , as otherwise, the sum of squares is always strictly positive. Suppose that . Then, what does mean? We can re-write in column notation. Then, it is easily verified that . Thus, if for , there exists a linear combination of the columns of with non-zero coefficients that is equal to zero. Thus, some columns of must be linearly dependent, i.e. .
Therefore, any matrix where has full column rank is invertible by the definiteness criterion. This comes in handy for instance in the context of the OLS-estimator that we had already briefly considered above, . Indeed, in the OLS context, we make a specific “no-multi-collinearity” assumption that ensures .
After the extensive discussion of invertibility above, let’s finally discuss how, if we have established invertibility, we can actually compute the inverse matrix. Indeed, if you managed to follow what we did thus far, you actually know already how this works: we can apply the same elementary operations that we use to transform an invertible matrix to the identity matrix to an identity matrix and arrive at the inverse.
Before turning to an example on how this works, we need to briefly worry about whether this always works! Thus far, we only know that when we can apply the suggested procedure, then we have found the inverse, but it remains to establish formally that it also holds that whenever there exists an inverse, the suggested procedure will identify it. This is indeed true:
Theorem: Gauß-Jordan Algorithm Validity.
Suppose that is an invertible matrix. Then, we can apply elementary operations in ascending order of the index to to arrive at the identity matrix , and the inverse can be determined as .
The interested reader can find the proof in the companion script to develop a deeper understanding of why this relationship holds.
In applying the procedure in practice, to keep things tractable, we write the identity matrix next to the matrix to be inverted and perform operations iteratively. To convert to the identity, you will first want to bring it to upper triangular form; as we have seen, from there on out, it’s really simple.
Let us consider an example: Start from the matrix
First, we want to know whether it is invertible — a quick check of the determinant criterion (that we can apply because the matrix is square), using e.g. our formula (do it!), yields , so the matrix is indeed invertible. So, let’s start the procedure by writing the matrix next to an identity matrix of appropriate dimension:
Our first goal is to get a triangular block in the upper left corner — this is always a good start. For this, because the (1,1) entry is non-zero, we add times row 1 to row 2 to eliminate the at position (2,1) (if the (1,1) entry was zero, we could simply exchange rows). Applying this transformation to both matrices gives
Now, note that we want ones on the diagonal. Thus, we multiply rows 1 and 2 by and , respectively:
One more step to get to upper triangular form — add row 2 to row 3:
Let’s get our last one on the diagonal by multiplying the last column with :
Now, we have our triangular form and we’re almost there. First, get rid of the non-zero entry in position (2,3) by adding times row 3 to row 2:
Finally, it remains to add times row 2 to row :
Thus, our algorithm tells us that . If you’re suspicious and don’t fully trust the abstract concept, verify that !
The example given here was very extensive, usually, to save space and time, you would produce the zeros for the triangular form in the same step where you set the diagonal elements to one. If doing so, you may only need three steps, or even less, depending on how many transformations you manage to track in one step. Being less extensive will help you save time in exams or at problem sets, but when going fast you are also more prone to errors, so watch out!
Finally, when considering a matrix, there is a rule that allows us to avoid the algorithm:
Proposition: Inverse of a Matrix.
Consider the matrix where . Then,
This fact gives the arguably quickest way to compute the inverse for any matrix and it is worthwhile memorizing.
Let us summarize what we have concluded for equation systems and invertibility:
Before moving on, if you feel like testing your understanding of the concepts discussed since the last quiz, you can take a short quiz found here.