College Math Teaching

January 30, 2015

Nilpotent ring elements

Filed under: advanced mathematics, algebra, matrix algebra, ring theory — Tags: , — collegemathteaching @ 3:12 am

I’ve been trying to brush up on ring theory; it has been a long time since I studied rings in any depth and I need some ring theory to do some work in topology. In a previous post, I talked about ideal topologies and I might discuss divisor toplogies (starting with the ring of integers).

So, I grabbed an old text, skimmed the first part and came across an exercise:

an element x \in R is nilpotent if there is some positive integer n such that x^n = 0 . So, given x, y nilpotent in a commutative ring R one has to show that x+y is also nilpotent and that this result might not hold if R is not a commutative ring.

Examples: in the ring Z_9, 3^2 =0 so 3 is nilpotent. In the matrix ring of 2 by 2 matrices,

\left( \begin{array}{cc}  0 & 0 \\   1 & 0 \end{array} \right) and \left( \begin{array}{cc}  0 & 1 \\   0 & 0 \end{array} \right) are both nilpotent elements, though their sum:

\left( \begin{array}{cc}  0 & 1 \\   1 & 0 \end{array} \right) is not; the square of this matrix is the identity matrix.

Immediately I thought to let m, n be the smallest integers for x^m =y^n = 0 and thought to apply the binomial theorem to (x+y)^{mn} (of course that is overkill; it is simpler to use (x+y)^{m+n} . Lets use (x+y)^{m+n} . I could easily see why x^{m+n} = y^{m+n} =0 but why were the middle terms {m+n \choose k} x^{(m+n)-k}y^k also zero?

Then it dawned on me: x^n=0 \rightarrow x^{n+k}=0 for all k \geq 0 . Duh. Now it made sense. 🙂

May 4, 2014

How to create tridiagonal matrices in Matlab (any size)

Filed under: linear albegra, matrix algebra, numerical methods, pedagogy — Tags: , , — collegemathteaching @ 1:38 am

Suppose we wanted to create a tridiagonal matrix in Matlab and print it to a file so it would be used in a routine. For demonstration purposes: let’s create this 10 x 10 matrix:

\left( \begin{array}{cccccccccc}  1 & e^{-1} &0 & 0 & 0 &0 &0 &0 &0 &0\\  \frac{1}{4} & \frac{1}{2} & e^{-2} &0 &0 &0 &0 &0 &0 &0\\  0 &  \frac{1}{9} & \frac{1}{3} & e^{-3} & 0 &0 &0 &0 &0 & 0 \\  0 & 0 &  \frac{1}{16}  & \frac{1}{4} & e^{-4} & 0 &0 &0 &0 &0 \\  0 & 0 & 0 & \frac{1}{25} & \frac{1}{5} & e^{-5} & 0 &0 &0 &0 \\  0 & 0 & 0 & 0 & \frac{1}{36} & \frac{1}{6} & e^{-6} & 0 & 0 &0 \\  0 & 0 & 0 & 0 & 0 & \frac{1}{49} & \frac{1}{7} & e^{-7} & 0 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{64} & \frac{1}{8} & e^{-8} & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{81} & \frac{1}{9} & e^{-9} \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{100} & \frac{1}{10}   \end{array} \right)

To take advantage of Matlab’s “sparse” command we should notice the pattern of the entries.
The diagonals: m_{(i,i)} = \frac{1}{i} for i \in \{1, 2, ...10 \} .
The “right of diagonal” entries: m_{(i, i+1)} = e^{-i} for i \in \{1, 2, ...9 \}
The “left of diagonal” entries: m_{(i-1, i)} = \frac{1}{i^2} for i \in \{2, 3, ...10 \}

Now we need to set up Matlab vectors to correspond to these indices.

First, we need to set up a vector for the “row index” entries for: the diagonal entries, the right of diagonal entries, then the left of diagonal entries.
One way to do this is with the command “i = [1:10 1:9 2:10]; ” (without the quotes, of course).
What this does: it creates a list of indices for the row value of the entries: 1, 2, 3,…10 for the diagonals, 1, 2, 3, …9 for the right of diagonals, and 2, 3, …10 for the left of diagonals.

Now, we set up a vector for the “column index” entries for: the diagonal entries, the right of diagonal entries and the left of diagonal entries.
We try: “j = [1:10 2:10 1:9]; ”
What this does: it creates a list of indices for the column value of the entries: 1, 2, 3, ..10 for the diagonals, 2, 3, …10 for the right of diagonals and 1, 2, …9 for the left of diagonals.

As a pair, (i,j) goes (1,1), (2,2), …(10,10), (1,2), (2,3), ….(9, 10), (2,1), (3,2)…..(10, 9).

Now, of course, we need to enter the desired entries in the matrix: we want to assign \frac{1}{i} to entry (i,i), \frac{1}{i^2} to entry (i, i-1) and e^{-i} to entry (i, i+1).

So we create the following vectors: I’ll start with “MD = 1:10 ” to get a list 1, 2, 3, ..10 and then “D = 1./M” to get a vector with the reciprocal values. To get the left of diagonal values, use “ML = 2:10” and then “L = 1./ML.^2 “. Now get the list of values for the right of diagonal entries: “MU = 1:9” then “U = exp(-MU)”.

Now we organize the values as follows: “s = [D L U]”. This provides a vector whose first 10 entries are D, next 9 are L and next 9 are U (a list concatenation) which is in one to one correspondence with (i,j).

We can then generate the matrix with the command (upper/lower cases are distinguished in Matlab): “S =sparse(i, j, s)”.

What this does: this creates a matrix S and assigns each (i,j) entry the value stored in s that corresponds to it. The remaining entries are set to 0 (hence the name “sparse”).

Let’s see how this works:


(click to see a larger size)

Now what does this put into the matrix S?


Note how the non-zero entries of S are specified; nothing else is.

Now suppose you want to store this matrix in a file that can be called by a Matlab program. One way is to write to the file with the following command:

“dlmwrite(‘c:\matlabr12\work\tridag.dat’, S , ‘ ‘)”

The first entry tells what file to send the entries to (this has to be created ahead of time). The next is the matrix (in our case, S) and the last entry tells how to delineate the entries; the default is a “comma” and the Matlab programs I am using requires a “space” instead, hence the specification of ‘ ‘ .

Now this is what was produced by this process:


Suppose now, you wish to produce an augmented matrix. We have to do the following:
add row indices (which, in our case, range from 1 to 10), column indices (column 11 for each row) and the augmented entries themselves, which I’ll call 1, 2, 3, …10.

Here is the augmenting vector:

>> B = [1 2 3 4 5 6 7 8 9 10];

Here is how we modify i, j, and s:

>> i =[1:10 2:10 1:9 1:10];
>> j = [1:10 1:9 2:10 11*ones(1,10)];
>> s = [D L U B];
>> S = sparse(i, j, s);

For “i”: we append 1:10 which gives rows 1 through 10.
For “j”: we created a vector of all 11’s as each augmented entry will appear in the 11’th column of each row.
That is, to our (i,j) list we added (1,11), (2,11), (3, 11), ….(10, 11) to the end.
Now we add B to our S: the list of non-zero matrix entries.


“dlmwrite(‘c:\matlabr12\work\tridage.dat’, S , ‘ ‘)”



So now you are ready to try out matrix manipulation and algebra with (relatively) large size matrices; at least sparse ones.

May 26, 2012

Eigenvalues, Eigenvectors, Eigenfunctions and all that….

The purpose of this note is to give a bit of direction to the perplexed student.

I am not going to go into all the possible uses of eigenvalues, eigenvectors, eigenfuntions and the like; I will say that these are essential concepts in areas such as partial differential equations, advanced geometry and quantum mechanics:

Quantum mechanics, in particular, is a specific yet very versatile implementation of this scheme. (And quantum field theory is just a particular example of quantum mechanics, not an entirely new way of thinking.) The states are “wave functions,” and the collection of every possible wave function for some given system is “Hilbert space.” The nice thing about Hilbert space is that it’s a very restrictive set of possibilities (because it’s a vector space, for you experts); once you tell me how big it is (how many dimensions), you’ve specified your Hilbert space completely. This is in stark contrast with classical mechanics, where the space of states can get extraordinarily complicated. And then there is a little machine — “the Hamiltonian” — that tells you how to evolve from one state to another as time passes. Again, there aren’t really that many kinds of Hamiltonians you can have; once you write down a certain list of numbers (the energy eigenvalues, for you pesky experts) you are completely done.

(emphasis mine).

So it is worth understanding the eigenvector/eigenfunction and eigenvalue concept.

First note: “eigen” is German for “self”; one should keep that in mind. That is part of the concept as we will see.

The next note: “eigenfunctions” really are a type of “eigenvector” so if you understand the latter concept at an abstract level, you’ll understand the former one.

The third note: if you are reading this, you are probably already familiar with some famous eigenfunctions! We’ll talk about some examples prior to giving the formal definition. This remark might sound cryptic at first (but hang in there), but remember when you learned \frac{d}{dx} e^{ax} = ae^{ax} ? That is, you learned that the derivative of e^{ax} is a scalar multiple of itself? (emphasis on SELF). So you already know that the function e^{ax} is an eigenfunction of the “operator” \frac{d}{dx} with eigenvalue a because that is the scalar multiple.

The basic concept of eigenvectors (eigenfunctions) and eigenvalues is really no more complicated than that. Let’s do another one from calculus:
the function sin(wx) is an eigenfunction of the operator \frac{d^2}{dx^2} with eigenvalue -w^2 because \frac{d^2}{dx^2} sin(wx) = -w^2sin(wx). That is, the function sin(wx) is a scalar multiple of its second derivative. Can you think of more eigenfunctions for the operator \frac{d^2}{dx^2} ?

Answer: cos(wx) and e^{ax} are two others, if we only allow for non zero eigenvalues (scalar multiples).

So hopefully you are seeing the basic idea: we have a collection of objects called vectors (can be traditional vectors or abstract ones such as differentiable functions) and an operator (linear transformation) that acts on these objects to yield a new object. In our example, the vectors were differentiable functions, and the operators were the derivative operators (the thing that “takes the derivative of” the function). An eigenvector (eigenfunction)-eigenvalue pair for that operator is a vector (function) that is transformed to a scalar multiple of itself by the operator; e. g., the derivative operator takes e^{ax} to ae^{ax} which is a scalar multiple of the original function.

Formal Definition
We will give the abstract, formal definition. Then we will follow it with some examples and hints on how to calculate.

First we need the setting. We start with a set of objects called “vectors” and “scalars”; the usual rules of arithmetic (addition, multiplication, subtraction, division, distributive property) hold for the scalars and there is a type of addition for the vectors and scalars and the vectors “work together” in the intuitive way. Example: in the set of, say, differentiable functions, the scalars will be real numbers and we have rules such as a (f + g) =af + ag , etc. We could also use things like real numbers for scalars, and say, three dimensional vectors such as [a, b, c] More formally, we start with a vector space (sometimes called a linear space) which is defined as a set of vectors and scalars which obey the vector space axioms.

Now, we need a linear transformation, which is sometimes called a linear operator. A linear transformation (or operator) is a function L that obeys the following laws: L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w} ) and L(a\vec{v}) = aL(\vec{v}) . Note that I am using \vec{v} to denote the vectors and the undecorated variable to denote the scalars. Also note that this linear transformation L might take one vector space to a different vector space.

Common linear transformations (and there are many others!) and their eigenvectors and eigenvalues.
Consider the vector space of two-dimensional vectors with real numbers as scalars. We can create a linear transformation by matrix multiplication:

L([x,y]^T) = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} ax+ by \\ cx+dy \end{array} \right]  (note: [x,y]^T is the transpose of the row vector; we need to use a column vector for the usual rules of matrix multiplication to apply).

It is easy to check that the operation of matrix multiplying a vector on the left by an appropriate matrix is yields a linear transformation.
Here is a concrete example: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} x+ 2y \\ 3y \end{array} \right]

So, does this linear transformation HAVE non-zero eigenvectors and eigenvalues? (not every one does).
Let’s see if we can find the eigenvectors and eigenvalues, provided they exist at all.

For [x,y]^T to be an eigenvector for L , remember that L([x,y]^T) = \lambda [x,y]^T for some real number \lambda

So, using the matrix we get: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]= \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] . So doing some algebra (subtracting the vector on the right hand side from both sides) we obtain \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

At this point it is tempting to try to use a distributive law to factor out \left[ \begin{array}{c} x \\ y \end{array} \right] from the left side. But, while the expression makes sense prior to factoring, it wouldn’t AFTER factoring as we’d be subtracting a scalar number from a 2 by 2 matrix! But there is a way out of this: one can then insert the 2 x 2 identity matrix to the left of the second term of the left hand side:
\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Notice that by doing this, we haven’t changed anything except now we can factor out that vector; this would leave:
(\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right]  - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] )\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Which leads to:

(\left[ \begin{array}{cc} 1-\lambda & 2 \\ 0 & 3-\lambda \end{array} \right] ) \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Now we use a fact from linear algebra: if [x,y]^T is not the zero vector, we have a non-zero matrix times a non-zero vector yielding the zero vector. This means that the matrix is singular. In linear algebra class, you learn that singular matrices have determinant equal to zero. This means that (1-\lambda)(3-\lambda) = 0 which means that \lambda = 1, \lambda = 3 are the respective eigenvalues. Note: when we do this procedure with any 2 by 2 matrix, we always end up with a quadratic with \lambda as the variable; if this quadratic has real roots then the linear transformation (or matrix) has real eigenvalues. If it doesn’t have real roots, the linear transformation (or matrix) doesn’t have non-zero real eigenvalues.

Now to find the associated eigenvectors: if we start with \lambda = 1 we get
(\left[ \begin{array}{cc} 0 & 2 \\ 0 & 2 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] . So that is the eigenvector associated with eigenvalue 1.
If we next try \lambda = 3 we get
(\left[ \begin{array}{cc} -2 & 2 \\ 0 & 0 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . So that is the eigenvector associated with the eigenvalue 3.

In the general “k-dimensional vector space” case, the recipe for finding the eigenvectors and eigenvalues is the same.
1. Find the matrix A for the linear transformation.
2. Form the matrix A - \lambda I which is the same as matrix A except that you have subtracted \lambda from each diagonal entry.
3. Note that det(A - \lambda I) is a polynomial in variable \lambda ; find its roots \lambda_1, \lambda_2, ...\lambda_n . These will be the eigenvalues.
4. Start with \lambda = \lambda_1 Substitute this into the matrix-vector equation det(A - \lambda I) \vec{v_1} = \vec{0} and solve for \vec({v_1} . That will be the eigenvector associated with the first eigenvalue. Do this for each eigenvalue, one at a time. Note: you can get up to k “linearly independent” eigenvectors in this manner; that will be all of them.

Practical note
Yes, this should work “in theory” but practically speaking, there are many challenges. For one: for equations of degree 5 or higher, it is known that there is no formula that will find the roots for every equation of that degree (Galios proved this; this is a good reason to take an abstract algebra course!). Hence one must use a numerical method of some sort. Also, calculation of the determinant involves many round-off error-inducing calculations; hence sometimes one must use sophisticated numerical techniques to get the eigenvalues (a good reason to take a numerical analysis course!)

Consider a calculus/differential equation related case of eigenvectors (eigenfunctions) and eigenvalues.
Our vectors will be, say, infinitely differentiable functions and our scalars will be real numbers. We will define the operator (linear transformation) D^n = \frac{d^n}{dx^n} , that is, the process that takes the n’th derivative of a function. You learned that the sum of the derivatives is the derivative of the sums and that you can pull out a constant when you differentiate. Hence D^n is a linear operator (transformation); we use the term “operator” when we talk about the vector space of functions, but it is really just a type of linear transformation.

We can also use these operators to form new operators; that is (D^2 + 3D)(y) = D^2(y) + 3D(y) = \frac{d^2y}{dx^2} + 3\frac{dy}{dx} We see that such “linear combinations” of linear operators is a linear operator.

So, what does it mean to find eigenvectors and eigenvalues of such beasts?

Suppose we with to find the eigenvectors and eigenvalues of (D^2 + 3D) . An eigenvector is a twice differentiable function y (ok, we said “infinitely differentiable”) such that (D^2 + 3D) = \lambda y or \frac{d^2y}{dx^2} + 3\frac{dy}{dx} = \lambda y which means \frac{d^2y}{dx^2} + 3\frac{dy}{dx} - \lambda y = 0 . You might recognize this from your differential equations class; the only “tweak” is that we don’t know what \lambda is. But if you had a differential equations class, you’d recognize that the solution to this differential equation depends on the roots of the characteristic equation m^2 + 3m - \lambda = 0 which has solutions: m = -\frac{3}{2} \pm \frac{\sqrt{9-4\lambda}}{2} and the solution takes the form e^{m_1}, e^{m_2} if the roots are real and distinct, e^{ax}sin(bx), e^{ax}cos(bx) if the roots are complex conjugates a \pm bi and e^{m}, xe^{m} if there is a real, repeated root. In any event, those functions are the eigenfunctions and these very much depend on the eigenvalues.

Of course, reading this little note won’t make you an expert, but it should get you started on studying.

I’ll close with a link on how these eigenfunctions and eigenvalues are calculated (in the context of solving a partial differential equation).

May 3, 2012

Composing a non-constant analytic function with a non-analytic one, part II

Filed under: advanced mathematics, analysis, calculus, complex variables, matrix algebra — collegemathteaching @ 6:40 pm

I realize that what I did in the previous post was, well, lame.
The setting: let g be continuous but non-analytic in some disk D in the complex plane, and let f be analytic in g(D) which, for the purposes of this informal note, we will take to contain an open disk. If g(D) doesn’t contain an open set or if the partials of g fail to exist, the question of f(g) being analytic is easy and uninteresting.

Let f(r + is ) = u(r,s) + iv(r,s) and g(x+iy) = r(x,y) + is(x,y) where u, v, r, s are real valued functions of two variables which have continuous partial derivatives. Assume that u_r = v_s and u_s = -v_r (the standard Cauchy-Riemann equations) in the domain of interest and that either r_x \neq s_y or r_y \neq -s_x in our domain of interest.

Now if the composition f(g) is analytic, then the Cauchy-Riemann equations must hold; that is:
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Now use the chain rule and do some calculation:
From the first of these equations:
u_r r_x + u_s s_x = v_r r_y + v_s s_y
u_r r_y + u_s s_y = -v_r r_x - v_s s_x
By using the C-R equations for u, v we can substitute:
u_r r_x + u_s s_x = -u_s r_y + u_r s_y
u_r r_y + u_s s_y = u_s r_x - u_r s_x
This leads to the following system of equations:
u_r(r_x -s_y) + u_s(s_x + r_y) = 0
u_r(r_y + s_x) + u_s(s_y - r_x) = 0
This leads to the matrix equation:
\left( \begin{array}{cc}(r_x -s_y) & (s_x + r_y)  \\(s_x + r_y) & (s_y - r_x)  \end{array} \right)\  \left(\begin{array}{c}u_r \\u_s \end{array}\right)\ = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\

The coefficient matrix has determinant -((r_x - s_y)^2 + (s_x + r_y)^2) which is zero when BOTH (r_x - s_y) and (s_x + r_y) are zero, which means that the Cauchy-Riemann equations for g hold. Since that is not the case, the system of equations has only the trivial solution which means u_r = u_s = 0 which implies (by C-R for f ) that v_r = v_s = 0 which implies that f is constant.

This result includes the “baby result” in the previous post.

December 9, 2011

Striking a balance between precision and being intelligible

Ok, what do we mean by: x + 2 = 1 ? Now, what do we mean by (A+B)x + (B-A) = 1 ? Of course, the answer is “it depends”. The most common use of the first “equation” is “find the real number x such that that number added to 2 equals 1.” In the second case, the most common use is “find real numbers A, B such that this equation is true for all real x .

In short, we are using the equal sign very differently: in the first case we are using it as the equivalence relation in the field of real numbers. In the second case, we are really talking about vector space equivalence.

We see this multiple use in calculus all the time; for example \int \int_{A} f dx dy = \int \int_{A} f dy dx but \int \int_{A} f dx\wedge dy = -\int \int_{A} f dy\wedge dx Of course, the first is the usual non-oriented integral that we talk about in calculus courses (absolute values of the Jacobians!) and the latter is the oriented integral that we use for 2-forms, which, when you think about it, is the logical extension of the usual calculus I definite integral.

There are certainly more examples.

What got me to thinking about this was an office hour encounter I had with a numerical methods student (a good student who is doing solid work in the course). We were talking about various methods of solving the matrix problem AX = B where X is a column vector of variables and B is the “answer” vector of numbers. We were discussing the number of operations (multiplications/divisions and additions/subtractions) required to obtain a solution if we had that A = LDU where D was a diagonal matrix with non-zero entries, L, U are lower and upper triangular matrices (respectively) with 1’s on the diagonal.

She kept on being off by a peculiar factor on the multiplication count.

Eventually we figured out the problem. When we converted the matrix equations to equations, she was counting the matrix entry multiplied by the unsolved for variables as a multiplication. Why? Well, once we solved for the variable we then counted operations with it AFTER it had been “solved for”. Example: given a_{1,1}x_1 +a_{1,2}x_2 = 3, a_{2,2}x_2=5 we don’t count the “coefficient times the variable” as a multiplication. But once we solve and obtain x_2  = \frac{5}{a_{2,2}} we then count operations involving x_2 . (of course, the diagonal elements are non-zero).

It is clear why we do this: prior to being solved for, the variables are really storage locations, and we are interested in counting the numerical operations that can contribute to round off error. But when we think about it, we are actually distinguishing between several types of multiplications: matrix multiplication, scalar multiplication in a vector space between a vector and a scalar, and the scalar (numerical) multiplication.

However, explaining that in class might lead to confusion among the students; it is probably best to bring this up only if someone is confused about it.

The language of mathematics can be so subtle that sometimes it probably good pedagogy to speak a bit informally, at least to beginning students.

Blog at