College Math Teaching

July 23, 2023

Every Vector Space has a basis

Note: I will probably make a video for this. AND this post (and video to come) is for beginners. The pace will be too slow for the seasoned math student.

Let V be a vector space with some scalars \alpha \in F . I’ll assume you know what a spanning set is and what it means to be linearly independent.

Recall: a basis is a linearly independent spanning set. We’d like to prove that every vector space HAS a basis.

Those new to linear algebra might wonder why this even requires proof. So, I’ll say a few words about some vector spaces whose bases contain an infinite number of vectors.

  1. Polynomial space: here the vectors are polynomials in x , say, 3 + 2x -x^2 +\pi x^3 . The coefficients are real numbers and the set of scalars F will be the real numbers. (Note: the set of scalars is sometimes called a “field”.) So one basis (there are many others) is 1, x, x^2, \dots, x^k, \dots and since there is no limit to the degree of the polynomial, the basis must have an infinite number of vectors.

Note: vector addition is FINITE. So, though you may have learned in calculus class that { 1 \over 1-x} = 1+x+x^2 + x^3 + \dots + x^k + x^{k+1} + \dots for x \in (-1, 1) , that expansion depends on infinite series, which in turn requires a limiting process, which then requires a notion of “being close” (the delta-epsilon stuff). In some vector spaces there IS a way to do that, but you need the notion of an “inner product” to define size and closeness (remember the dot product from calculus?) and then you can introduce ideas like convergence of an infinite sum. For example, check out Hilbert spaces. But such operations can be thought of as an extension of vector space addition; they are NOT the addition operation itself. (A short sketch of polynomials as finite formal sums appears after these two examples.)

2. The vector space of all real-valued functions f that are continuous on [0,1] . The scalars will be the real numbers. Now this is a weird vector space. Remember that what is included are things like polynomials, rational functions without roots in [0,1] , exponential functions, trig functions whose domain includes the unit interval, functions like ln(2+x) and even piecewise-defined functions whose pieces join up continuously. And that is only the beginning.

Any basis of this beast will be impossible to list exactly, and, in fact, no basis can be put into one-to-one correspondence with the positive integers (we say the basis is uncountably infinite).

But this vector space indeed has a basis, as we shall see.
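To make example 1 (and the remark about finite addition) concrete, here is a minimal Python sketch, purely for illustration: a polynomial is stored as a finite table of coefficients against the basis 1, x, x^2, \dots , and vector addition only ever combines finitely many terms. The helper names poly_add and poly_scale are hypothetical, not from any library.

```python
import math

# A polynomial is a dict {k: coefficient of x^k}; the basis vector x^k is {k: 1.0}.
# Only finitely many coefficients are stored, mirroring the fact that vector
# addition in polynomial space is a FINITE operation.

def poly_add(p, q):
    """Vector addition: combine the finitely many coefficients of p and q."""
    result = dict(p)
    for k, coeff in q.items():
        result[k] = result.get(k, 0.0) + coeff
    return {k: c for k, c in result.items() if c != 0.0}

def poly_scale(a, p):
    """Scalar multiplication by the real number a."""
    return {k: a * c for k, c in p.items()} if a != 0 else {}

# 3 + 2x - x^2 + pi x^3, written against the basis {1, x, x^2, x^3, ...}
p = {0: 3.0, 1: 2.0, 2: -1.0, 3: math.pi}
q = {1: -2.0, 4: 5.0}          # -2x + 5x^4
print(poly_add(p, q))          # {0: 3.0, 2: -1.0, 3: 3.14159..., 4: 5.0}
print(poly_scale(2.0, q))      # {1: -4.0, 4: 10.0}
```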

So, how do we prove our assertion that every vector space has a basis?

Let V be our non-empty vector space and suppose it contains more than just the zero vector. There is some non-zero vector in it, say \vec{v_1} . Let V_1 denote the span of this vector. Now if the span is not all of V we can find \vec{v_2} not in the span. Let the span of \{\vec{v_1}, \vec{v_2} \} be denoted by V_2 . If V_2 = V we have our basis and we are done. Otherwise, we continue on.

We continue indefinitely. And here is where some set theory comes in: our index set might well become infinite. But that is ok; by letting V_{\gamma} denote the span of the vectors \{\vec{v_{\delta}} : \delta \leq \gamma \} chosen so far, we obtain a chain of nested subspaces V_1 \subset V_2 \subset \dots \subset V_k \subset \dots \subset V_{\gamma} \subset \dots and this chain has an upper bound, namely V , the given vector space itself.

Now we have to use some set theory. Zorn’s Lemma (which is equivalent to the axiom of choice) says that a partially ordered set in which every chain has an upper bound contains a maximal element; that is, an element that is not strictly contained in any other element of the order. Applying this to the collection of subspaces constructed in this way gives a maximal one; call it V^*.

Now we claim that the set of vectors we collected to span V^* is linearly independent and spans all of V.

Proof of claim: remembering that vector addition allows only a FINITE number of summands, consider any FINITE collection of the chosen vectors, say

\vec{v_{n_1}}, \vec{v_{n_2}}, \dots, \vec{v_{n_k}}.

These all lie in some V_{\gamma} and, by construction, they are linearly independent (order these vectors and remember how we got them: at each stage we added a vector that was NOT in the span of the previous vectors).

Now to show that they span: suppose \vec{x} is NOT in the span. Then let W be the span of V^{*} \cup \{\vec{x} \} . This is a subspace of the same kind (obtained by adjoining a vector not in the previous span) and it strictly contains V^{*} , which violates the fact that V^{*} is maximal.

So, we now have our basis.
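For reference, here is a compact restatement of the same argument in the phrasing usually used with Zorn’s Lemma; the partially ordered set of linearly independent subsets is made explicit here, something the informal discussion above glosses over.

1. Let P be the collection of all linearly independent subsets of V , partially ordered by inclusion. If V \neq \{\vec{0}\} then P is non-empty, since \{\vec{v_1}\} \in P for any non-zero \vec{v_1} .
2. Every chain C in P has an upper bound in P , namely \cup_{S \in C} S : any finite subset of this union already lies in a single S \in C , so the union is linearly independent.
3. Zorn’s Lemma then gives a maximal element B of P .
4. If some \vec{x} \in V were not in the span of B , then B \cup \{\vec{x}\} would be a linearly independent set strictly containing B , contradicting maximality. Hence B spans V and is a basis.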

May 26, 2012

Eigenvalues, Eigenvectors, Eigenfunctions and all that….

The purpose of this note is to give a bit of direction to the perplexed student.

I am not going to go into all the possible uses of eigenvalues, eigenvectors, eigenfunctions and the like; I will say that these are essential concepts in areas such as partial differential equations, advanced geometry and quantum mechanics:

Quantum mechanics, in particular, is a specific yet very versatile implementation of this scheme. (And quantum field theory is just a particular example of quantum mechanics, not an entirely new way of thinking.) The states are “wave functions,” and the collection of every possible wave function for some given system is “Hilbert space.” The nice thing about Hilbert space is that it’s a very restrictive set of possibilities (because it’s a vector space, for you experts); once you tell me how big it is (how many dimensions), you’ve specified your Hilbert space completely. This is in stark contrast with classical mechanics, where the space of states can get extraordinarily complicated. And then there is a little machine — “the Hamiltonian” — that tells you how to evolve from one state to another as time passes. Again, there aren’t really that many kinds of Hamiltonians you can have; once you write down a certain list of numbers (the energy eigenvalues, for you pesky experts) you are completely done.

(emphasis mine).

So it is worth understanding the eigenvector/eigenfunction and eigenvalue concept.

First note: “eigen” is German for “self”; one should keep that in mind. That is part of the concept as we will see.

The next note: “eigenfunctions” really are a type of “eigenvector” so if you understand the latter concept at an abstract level, you’ll understand the former one.

The third note: if you are reading this, you are probably already familiar with some famous eigenfunctions! We’ll talk about some examples prior to giving the formal definition. This remark might sound cryptic at first (hang in there), but remember when you learned \frac{d}{dx} e^{ax} = ae^{ax} ? That is, you learned that the derivative of e^{ax} is a scalar multiple of itself? (emphasis on SELF). So you already know that the function e^{ax} is an eigenfunction of the “operator” \frac{d}{dx} with eigenvalue a because that is the scalar multiple.

The basic concept of eigenvectors (eigenfunctions) and eigenvalues is really no more complicated than that. Let’s do another one from calculus:
the function sin(wx) is an eigenfunction of the operator \frac{d^2}{dx^2} with eigenvalue -w^2 because \frac{d^2}{dx^2} sin(wx) = -w^2sin(wx). That is, the second derivative of sin(wx) is a scalar multiple of sin(wx) . Can you think of more eigenfunctions for the operator \frac{d^2}{dx^2} ?

Answer: cos(wx) and e^{ax} are two others (with eigenvalues -w^2 and a^2 , respectively), if we only allow for non-zero eigenvalues (scalar multiples).
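As a quick sanity check, here is a minimal symbolic sketch using sympy (the symbols a and w are just illustrative parameters); it simply verifies the two calculus facts quoted above.

```python
import sympy as sp

x, a, w = sp.symbols('x a w')

# d/dx e^(a x) = a e^(a x): e^(a x) is an eigenfunction of d/dx with eigenvalue a.
print(sp.simplify(sp.diff(sp.exp(a * x), x) - a * sp.exp(a * x)))        # 0

# d^2/dx^2 sin(w x) = -w^2 sin(w x): eigenfunction of d^2/dx^2, eigenvalue -w^2.
print(sp.simplify(sp.diff(sp.sin(w * x), x, 2) + w**2 * sp.sin(w * x)))  # 0
```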

So hopefully you are seeing the basic idea: we have a collection of objects called vectors (can be traditional vectors or abstract ones such as differentiable functions) and an operator (linear transformation) that acts on these objects to yield a new object. In our example, the vectors were differentiable functions, and the operators were the derivative operators (the thing that “takes the derivative of” the function). An eigenvector (eigenfunction)-eigenvalue pair for that operator is a vector (function) that is transformed to a scalar multiple of itself by the operator; e. g., the derivative operator takes e^{ax} to ae^{ax} which is a scalar multiple of the original function.

Formal Definition
We will give the abstract, formal definition. Then we will follow it with some examples and hints on how to calculate.

First we need the setting. We start with a set of objects called “vectors” and a set of “scalars”; the usual rules of arithmetic (addition, multiplication, subtraction, division, the distributive property) hold for the scalars, there is an addition for the vectors, and the scalars and vectors “work together” in the intuitive way. Example: in the set of, say, differentiable functions, the scalars will be real numbers and we have rules such as a (f + g) =af + ag , etc. We could also use real numbers for scalars and, say, three-dimensional vectors such as [a, b, c] . More formally, we start with a vector space (sometimes called a linear space), which is defined as a set of vectors and scalars which obey the vector space axioms.

Now, we need a linear transformation, which is sometimes called a linear operator. A linear transformation (or operator) is a function L that obeys the following laws: L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w} ) and L(a\vec{v}) = aL(\vec{v}) . Note that I am using \vec{v} to denote the vectors and the undecorated variable to denote the scalars. Also note that this linear transformation L might take one vector space to a different vector space.
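As a small numerical illustration of these two laws, here is a sketch using numpy (the matrix A and the test vectors are arbitrary choices made for this example):

```python
import numpy as np

# The linear transformation "multiply on the left by a fixed matrix A".
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
L = lambda v: A @ v

v = np.array([1.0, -2.0])
w = np.array([0.5, 4.0])
a = 3.0

print(np.allclose(L(v + w), L(v) + L(w)))   # True: L(v + w) = L(v) + L(w)
print(np.allclose(L(a * v), a * L(v)))      # True: L(a v) = a L(v)
```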

Common linear transformations (and there are many others!) and their eigenvectors and eigenvalues.
Consider the vector space of two-dimensional vectors with real numbers as scalars. We can create a linear transformation by matrix multiplication:

L([x,y]^T) = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} ax+ by \\ cx+dy \end{array} \right]  (note: [x,y]^T is the transpose of the row vector; we need to use a column vector for the usual rules of matrix multiplication to apply).

It is easy to check that multiplying a vector on the left by a fixed matrix yields a linear transformation.
Here is a concrete example: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} x+ 2y \\ 3y \end{array} \right]

So, does this linear transformation HAVE real eigenvalues and eigenvectors? (Not every one does.)
Let’s see if we can find the eigenvectors and eigenvalues, provided they exist at all.

For [x,y]^T to be an eigenvector for L , remember that we need L([x,y]^T) = \lambda [x,y]^T for some real number \lambda .

So, using the matrix we get: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]= \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] . So doing some algebra (subtracting the vector on the right hand side from both sides) we obtain \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

At this point it is tempting to try to use a distributive law to factor out \left[ \begin{array}{c} x \\ y \end{array} \right] from the left side. But, while the expression makes sense prior to factoring, it wouldn’t AFTER factoring, as we’d be subtracting the scalar \lambda from a 2 by 2 matrix! But there is a way out of this: one can insert the 2 x 2 identity matrix between \lambda and the vector in the second term of the left hand side:
\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Notice that by doing this, we haven’t changed anything except now we can factor out that vector; this would leave:
(\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right]  - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] )\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Which leads to:

(\left[ \begin{array}{cc} 1-\lambda & 2 \\ 0 & 3-\lambda \end{array} \right] ) \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Now we use a fact from linear algebra: if [x,y]^T is not the zero vector, we have a matrix times a non-zero vector yielding the zero vector. This means that the matrix is singular. In linear algebra class, you learn that singular matrices have determinant equal to zero. Here the matrix is upper triangular, so its determinant is (1-\lambda)(3-\lambda) ; setting (1-\lambda)(3-\lambda) = 0 means that \lambda = 1, \lambda = 3 are the eigenvalues. Note: when we do this procedure with any 2 by 2 matrix, we always end up with a quadratic with \lambda as the variable; if this quadratic has real roots then the linear transformation (or matrix) has real eigenvalues. If it doesn’t have real roots, the linear transformation (or matrix) doesn’t have real eigenvalues.

Now to find the associated eigenvectors: if we start with \lambda = 1 we get
\left[ \begin{array}{cc} 0 & 2 \\ 0 & 2 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] (or any non-zero scalar multiple of it). So that is an eigenvector associated with eigenvalue 1.
If we next try \lambda = 3 we get
\left[ \begin{array}{cc} -2 & 2 \\ 0 & 0 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] (again, up to a non-zero scalar multiple). So that is an eigenvector associated with the eigenvalue 3.
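A quick numerical check of these two pairs (a numpy sketch; nothing here beyond the numbers computed above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

v1, lam1 = np.array([1.0, 0.0]), 1.0   # eigenpair found for lambda = 1
v2, lam2 = np.array([1.0, 1.0]), 3.0   # eigenpair found for lambda = 3

print(np.allclose(A @ v1, lam1 * v1))  # True
print(np.allclose(A @ v2, lam2 * v2))  # True
```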

In the general “k-dimensional vector space” case, the recipe for finding the eigenvectors and eigenvalues is the same.
1. Find the matrix A for the linear transformation.
2. Form the matrix A - \lambda I which is the same as matrix A except that you have subtracted \lambda from each diagonal entry.
3. Note that det(A - \lambda I) is a polynomial in the variable \lambda ; find its roots \lambda_1, \lambda_2, \dots, \lambda_k . These will be the eigenvalues.
4. Start with \lambda = \lambda_1 . Substitute this into the matrix-vector equation (A - \lambda_1 I) \vec{v_1} = \vec{0} and solve for \vec{v_1} . That will be an eigenvector associated with the first eigenvalue. Do this for each eigenvalue, one at a time. Note: you can get up to k “linearly independent” eigenvectors in this manner; that will be all of them.
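In practice, steps 2 through 4 are packaged in standard linear algebra libraries. Here is a minimal sketch using numpy’s eig routine on the 2 by 2 example above; note that the library returns each eigenvector scaled to length 1, so its answers are scalar multiples of the ones computed by hand.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)           # contains 1.0 and 3.0
print(eigenvectors)          # column i is a (unit-length) eigenvector for eigenvalues[i]

# Each column satisfies A v = lambda v:
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))   # True
```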

Practical note
Yes, this should work “in theory” but practically speaking, there are many challenges. For one: for polynomial equations of degree 5 or higher, it is known that there is no general formula in radicals that will find the roots of every equation of that degree (the Abel-Ruffini theorem; Galois theory explains why, and this is a good reason to take an abstract algebra course!). Hence one must use a numerical method of some sort. Also, calculation of the determinant involves many round-off-error-inducing calculations; hence one must often use sophisticated numerical techniques to get the eigenvalues (a good reason to take a numerical analysis course!).

Consider a calculus/differential equation related case of eigenvectors (eigenfunctions) and eigenvalues.
Our vectors will be, say, infinitely differentiable functions and our scalars will be real numbers. We will define the operator (linear transformation) D^n = \frac{d^n}{dx^n} , that is, the process that takes the n’th derivative of a function. You learned that the derivative of a sum is the sum of the derivatives and that you can pull out a constant when you differentiate. Hence D^n is a linear operator (transformation); we use the term “operator” when we talk about the vector space of functions, but it is really just a type of linear transformation.

We can also use these operators to form new operators; that is, (D^2 + 3D)(y) = D^2(y) + 3D(y) = \frac{d^2y}{dx^2} + 3\frac{dy}{dx} . We see that such a “linear combination” of linear operators is again a linear operator.

So, what does it mean to find eigenvectors and eigenvalues of such beasts?

Suppose we wish to find the eigenvectors and eigenvalues of (D^2 + 3D) . An eigenvector is a twice differentiable function y (ok, we said “infinitely differentiable”) such that (D^2 + 3D)y = \lambda y , or \frac{d^2y}{dx^2} + 3\frac{dy}{dx} = \lambda y , which means \frac{d^2y}{dx^2} + 3\frac{dy}{dx} - \lambda y = 0 . You might recognize this from your differential equations class; the only “tweak” is that we don’t know what \lambda is. But if you had a differential equations class, you’d recognize that the solution to this differential equation depends on the roots of the characteristic equation m^2 + 3m - \lambda = 0 , which has solutions m = -\frac{3}{2} \pm \frac{\sqrt{9+4\lambda}}{2} , and the solution takes the form e^{m_1 x}, e^{m_2 x} if the roots are real and distinct, e^{ax}sin(bx), e^{ax}cos(bx) if the roots are complex conjugates a \pm bi , and e^{mx}, xe^{mx} if there is a real, repeated root. In any event, those functions are the eigenfunctions, and they very much depend on the eigenvalues.
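A small symbolic sketch (sympy; the symbol m is just an illustrative parameter) shows the computation driving all of this: applying D^2 + 3D to y = e^{mx} returns (m^2 + 3m)e^{mx} , so e^{mx} is an eigenfunction with eigenvalue \lambda = m^2 + 3m .

```python
import sympy as sp

x, m = sp.symbols('x m')
y = sp.exp(m * x)

Ly = sp.diff(y, x, 2) + 3 * sp.diff(y, x)        # (D^2 + 3D) applied to y
print(sp.simplify(Ly - (m**2 + 3*m) * y))        # 0, so the eigenvalue is m^2 + 3m
```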

Of course, reading this little note won’t make you an expert, but it should get you started on studying.

I’ll close with a link on how these eigenfunctions and eigenvalues are calculated (in the context of solving a partial differential equation).

February 16, 2012

The “equals” sign: identities, equations to be solved and all that…

Here is the sort of thing that got me thinking about this topic: a colleague had a student complain about how one of her quiz problems was scored. The problem stated: “show that \sqrt{2} + \sqrt{3} \neq \sqrt{5} “. She was offended that her saying “\sqrt{x} + \sqrt{y} \neq \sqrt{x+y} ” wasn’t enough to receive credit and would NOT take his word for it. In fact, she took this to the student ombudsman!!!

But that raised the question: “what do we mean when we tell our students “\sqrt{x} + \sqrt{y} \neq \sqrt{x+y} “?

Of course, there are some central issues here. The first issue is that our “sure of herself” student thought that “\sqrt{x} + \sqrt{y} \neq \sqrt{x+y} ” meant that this relation is NEVER true for any choice of x, y , which, of course, is false (e. g. let y = 0 and x \ge 0 .) In fact, \sqrt{x} + \sqrt{y} \neq \sqrt{x+y} is the logical negation of the statement \sqrt{x} + \sqrt{y} = \sqrt{x+y} ; the latter means “this statement is true for ALL x, y ” and its negation means “there is at least one choice of x, y for which the statement is not true.” “Equal” and “not equal” are not symmetric statements when it comes to identities, which can be thought of as elements in the vector space of functions.

So, \sqrt{x} + \sqrt{y} \neq \sqrt{x+y} means that \sqrt{x} + \sqrt{y} and \sqrt{x+y} are not equal in function space, though they might evaluate to the same number for certain choices in the domain.

So, what is the big deal?

Well, what about equations such as x^2 + 3x + 2 = 0 or y^{\prime \prime} + y = 0 ?
These are NOT equalities in the space of functions; the first asks “what is f^{-1}(0) , where f(x)=x^2 + 3x + 2 ?” and the second asks one to find the inverse image of the zero function under the operator D^2+1 , where the domain is the set of all, say, twice differentiable functions.
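For instance (a worked restatement of these two examples, nothing beyond them): x^2 + 3x + 2 = 0 asks for f^{-1}(0) = \{x : x^2+3x+2 = 0 \} = \{-1, -2 \} , while y^{\prime \prime} + y = 0 asks for the set of functions sent to the zero function by D^2 + 1 , namely \{y : y = c_1 \cos x + c_2 \sin x, \ c_1, c_2 \in \mathbb{R} \} .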

But, but…would the average undergraduate student understand ANY of this? My experience tells me “no”; hence I intentionally allow for this vagueness and only address it as I need to.

December 9, 2011

Striking a balance between precision and being intelligible

Ok, what do we mean by: x + 2 = 1 ? Now, what do we mean by (A+B)x + (B-A) = 1 ? Of course, the answer is “it depends”. The most common use of the first “equation” is “find the real number x such that that number added to 2 equals 1.” In the second case, the most common use is “find real numbers A, B such that this equation is true for all real x .”

In short, we are using the equal sign very differently: in the first case we are using it as the equivalence relation in the field of real numbers. In the second case, we are really talking about vector space equivalence.

We see this multiple use in calculus all the time; for example \int \int_{A} f dx dy = \int \int_{A} f dy dx but \int \int_{A} f dx\wedge dy = -\int \int_{A} f dy\wedge dx . Of course, the first is the usual non-oriented integral that we talk about in calculus courses (absolute values of the Jacobians!) and the latter is the oriented integral that we use for 2-forms, which, when you think about it, is the logical extension of the usual calculus I definite integral.
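A concrete instance, using f(x,y) = xy on the unit square A = [0,1] \times [0,1] with its standard orientation: \int \int_{A} xy \ dx dy = \int \int_{A} xy \ dy dx = \frac{1}{4} , while \int \int_{A} xy \ dx \wedge dy = \frac{1}{4} = -\int \int_{A} xy \ dy \wedge dx .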

There are certainly more examples.

What got me to thinking about this was an office hour encounter I had with a numerical methods student (a good student who is doing solid work in the course). We were talking about various methods of solving the matrix problem AX = B where X is a column vector of variables and B is the “answer” vector of numbers. We were discussing the number of operations (multiplications/divisions and additions/subtractions) required to obtain a solution if we had that A = LDU where D was a diagonal matrix with non-zero entries, L, U are lower and upper triangular matrices (respectively) with 1’s on the diagonal.

She kept on being off by a peculiar factor on the multiplication count.

Eventually we figured out the problem. When we converted the matrix equation into a system of equations, she was counting a matrix entry multiplied by a not-yet-solved-for variable as a multiplication. Why does that matter? The convention is to count operations involving a variable only AFTER it has been “solved for”, that is, once it holds a numerical value. Example: given a_{1,1}x_1 +a_{1,2}x_2 = 3, a_{2,2}x_2=5 we don’t count the “coefficient times the variable” as a multiplication. But once we solve and obtain x_2  = \frac{5}{a_{2,2}} we then count operations involving x_2 . (Of course, the diagonal elements are non-zero.)

It is clear why we do this: prior to being solved for, the variables are really storage locations, and we are interested in counting the numerical operations that can contribute to round off error. But when we think about it, we are actually distinguishing between several types of multiplications: matrix multiplication, scalar multiplication in a vector space between a vector and a scalar, and the scalar (numerical) multiplication.
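Here is a small sketch of that counting convention in Python (the 2 by 2 upper triangular system mirrors the example above; the function name and the exact bookkeeping are just an illustration of the convention, not a standard routine):

```python
def back_substitute_2x2(a11, a12, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a22*x2 = b2, counting numerical operations.

    Convention: a coefficient times a not-yet-solved-for variable is symbolic
    bookkeeping and is NOT counted; once a variable holds a numerical value,
    operations involving it ARE counted (they can contribute round-off error).
    """
    mults = divs = add_subs = 0

    x2 = b2 / a22            # one division
    divs += 1

    t = a12 * x2             # x2 is now numerical, so this multiplication counts
    mults += 1
    x1 = (b1 - t) / a11      # one subtraction and one division
    add_subs += 1
    divs += 1

    return (x1, x2), {"mult": mults, "div": divs, "add/sub": add_subs}

print(back_substitute_2x2(a11=2.0, a12=1.0, a22=4.0, b1=3.0, b2=5.0))
# ((0.875, 1.25), {'mult': 1, 'div': 2, 'add/sub': 1})
```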

However, explaining that in class might lead to confusion among the students; it is probably best to bring this up only if someone is confused about it.

The language of mathematics can be so subtle that sometimes it is probably good pedagogy to speak a bit informally, at least to beginning students.
