College Math Teaching

October 4, 2016

Linear Transformation or not? The vector space operations matter.

Filed under: calculus, class room experiment, linear algebra, pedagogy — collegemathteaching @ 3:31 pm

This is nothing new; it is an example for undergraduates.

Consider the set R^+ = \{x| x > 0 \} endowed with the “vector addition” x \oplus y = xy where xy represents ordinary real number multiplication and “scalar multiplication” r \odot x = x^r where r \in R and x^r is ordinary exponentiation. It is clear that \{R^+, R | \oplus, \odot \} is a vector space with 1 being the vector “additive” identity, 0 playing the role of the scalar zero and 1 playing the role of the scalar multiplicative identity. Verifying the various vector space axioms is a fun, if trivial, exercise.

Now consider the function L(x) = ln(x) with domain R^+ . (here: ln(x) is the natural logarithm function). Now ln(xy) = ln(x) + ln(y) and ln(x^a) = aln(x) . This shows that L:R^+ \rightarrow R (the range has the usual vector space structure) is a linear transformation.

What is even better: ker(L) =\{x|ln(x) = 0 \} which shows that ker(L) = \{1 \} so L is one to one (of course, we know that from calculus).

And, given z \in R, ln(e^z) = z so L is also onto (we knew that from calculus or precalculus).

So, R^+ = \{x| x > 0 \} is isomorphic to R with the usual vector operations, and of course the inverse linear transformation is L^{-1}(y) = e^y .
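For readers who want to experiment, here is a minimal numerical sanity check in Python. It is only a sketch: the function names `vec_add`, `scalar_mul` and `L` and the sample values are made up for illustration, and checking a few numbers is of course not a proof.

```python
import math

def vec_add(x, y):
    """The "vector addition" on R^+: x (+) y = xy."""
    return x * y

def scalar_mul(r, x):
    """The "scalar multiplication" on R^+: r (.) x = x^r."""
    return x ** r

def L(x):
    """The candidate linear transformation L(x) = ln(x)."""
    return math.log(x)

# Arbitrary sample values (assumptions chosen for illustration).
x, y, r = 2.5, 7.0, -3.2

# Additivity: L(x (+) y) = L(x) + L(y), with ordinary addition in the range R.
additive = math.isclose(L(vec_add(x, y)), L(x) + L(y))

# Homogeneity: L(r (.) x) = r L(x), with ordinary multiplication in the range R.
homogeneous = math.isclose(L(scalar_mul(r, x)), r * L(x))

# The "zero vector" of R^+ is 1, and L sends it to the zero of R.
zero_to_zero = math.isclose(L(1.0), 0.0, abs_tol=1e-15)

print(additive, homogeneous, zero_to_zero)
```

Swapping in ordinary addition for `vec_add` makes the additivity check fail, which is the point of the post: the answer depends on the operations, not just on the sets.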

Upshot: when one asks “is F a linear transformation or not”, one needs information about not only the domain set but also the vector space operations.

March 13, 2015

Moving from “young Turk” to “old f***”

Filed under: calculus, class room experiment, editorial, pedagogy — Tags: , , — collegemathteaching @ 9:09 pm

Today, one of our hot “young” (meaning: new here) mathematicians came to me and wanted to inquire about a course switch. He noted that his two course load included two different courses (two preparations) and that I was teaching different sections of the same two courses…was I interested in doing a course swap so that he had only one preparation (he is teaching 8 hours) and I’d only have two?

I said: “when I was your age, I minimized the number of preparations. But at my age, teaching two sections of the same low level course makes me want to bash my head against the wall”. That is, by my second lesson of the same course in the same day, I just want to be just about anywhere else on campus; I have no interest, no enthusiasm, etc.

I specifically REQUESTED 3 preparations to keep myself from getting bored; that is what 24 years of teaching this stuff does to you.

Every so often, someone has the grand idea to REFORM the teaching of (whatever) and the “reformers” usually get at least a few departments to go along with it.

The common thing said is that it gets professors to reexamine their teaching of (whatever).

But I wonder if many try these things….just out of pure boredom. Seriously, read the buzzwords of the “reform paper” I linked to; there is really nothing new there.

January 23, 2015

Making a math professor happy…

Filed under: calculus, class room experiment, elementary mathematics — Tags: , — collegemathteaching @ 10:28 pm

Calculus III: we are talking about polar curves. I give the usual lesson about how to graph r = sin(2 \theta) and r = sin(3 \theta) and give the usual “if n is even, the graph of r = sin (n \theta) has 2n petals and if n is odd, it has n petals”.

Question: “does that mean it is impossible to have a graph with 6 petals then”? 🙂

Yes, one can have intersecting petals, and one can try: r = |sin(3 \theta) | . But you aren’t going to get it without a trick of some sort.


November 22, 2014

One upside to a topologist teaching numerical analysis…

Yes, I was glad when we hired people with applied mathematics expertise; though I am enjoying teaching numerical analysis, it is killing me. My training is in pure mathematics (in particular, topology) and so class preparation is very intense for me.

But I so love being able to show the students the very real benefits that come from the theory.

Here is but one example: right now, I am talking about numerical solutions to “stiff” differential equations; basically, a differential equation is “stiff” if the magnitude of the derivative is several orders of magnitude larger than the magnitude of the solution.

A typical example is the differential equation y' = -\lambda y , y(0) = 1 for \lambda > 0 . Example: y' = -20y, y(0) = 1 . Note that the solution y(t) = e^{-20t} decays very quickly to zero though the derivative is 20 times larger in magnitude than the solution.

One uses such an equation to test a method to see if it works well for stiff differential equations. One such method is the Euler method: w_{i+1} = w_{i} + h f(t_i, w_i) which becomes w_{i+1} = w_i - h \lambda w_i = (1 - h\lambda) w_i . There is a way of assigning a polynomial to a method; in this case the polynomial is p(\mu) = \mu - (1-h\lambda) and if the roots of this polynomial have modulus less than 1, then the method will converge (the computed values decay to zero, as the true solution does). Well here, the root is (1-h\lambda) and calculating: -1 < 1- h \lambda < 1 which implies that 0 < h \lambda < 2 . This is a good reference.

So for \lambda = 20 we find that h has to be less than \frac{1}{10} . And so I ran Euler’s method for the initial value problem on [0,1] and showed that the numerical solution diverged wildly when using 9 intervals, oscillated back and forth (with equal magnitudes) when using 10 intervals, and slowly converged when using 11 intervals. It is just plain fun to see the theory in action.
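The experiment is easy to reproduce. Below is a minimal Python sketch (not the code used in class; the function name `euler_test_equation` is made up for illustration) that runs Euler’s method on y' = -20y, y(0) = 1 over [0,1] and reports the final value for 9, 10 and 11 intervals.

```python
def euler_test_equation(lam, n_intervals, t_end=1.0, w0=1.0):
    """Euler's method for y' = -lam * y, y(0) = w0 on [0, t_end].

    Returns the list of approximations w_0, w_1, ..., w_n.
    """
    h = t_end / n_intervals
    w = w0
    values = [w]
    for _ in range(n_intervals):
        # w_{i+1} = w_i + h f(t_i, w_i) with f(t, y) = -lam * y
        w = w + h * (-lam * w)
        values.append(w)
    return values

for n in (9, 10, 11):
    h = 1.0 / n
    final = euler_test_equation(20.0, n)[-1]
    # The amplification factor per step is 1 - h * lambda.
    print(f"n = {n:2d}, 1 - h*lambda = {1 - 20.0 * h:+.4f}, w_n = {final:+.6f}")
```

With 9 intervals the factor 1 - h\lambda has modulus larger than 1, with 10 intervals it is -1, and with 11 intervals its modulus is less than 1, which matches the three behaviors described above.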

April 7, 2014

Numerical integration: why the brain is still required…

Filed under: class room experiment, integrals, numerical methods, pedagogy — Tags: — collegemathteaching @ 4:59 pm

I gave the following demonstration in class today: \int^1_0 sin^2(512 \pi x) dx =

Now, of course, even a C student in calculus II would be able to solve this exactly using sin^2(u) = \frac{1}{2} - \frac{1}{2}cos(2u) to obtain: \int^1_0 sin^2(512 \pi x) dx=\frac{1}{2}

But what about the “just bully” numerical methods we’ve learned?

Romberg integration fails miserably, at least at first:


(for those who don’t know about Romberg integration: the first column gives trapezoid rule approximations, the second gives Simpson’s rule approximations and the third gives Boole’s rule; the value of \Delta x gets cut in half as the rows go down).

I said “at first” as if one goes to, say, 20 rows, one can start to get near the correct answer.
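To see why the early rows fail, here is a minimal Python sketch of the composite trapezoid rule (the first column of a Romberg table). This is an illustration with made-up function names, not the routine used in class. When the number of subintervals is a power of 2 up to 512, every node is a multiple of \frac{1}{512} , where the integrand vanishes.

```python
import math

def integrand(x):
    """sin^2(512 pi x), whose integral over [0, 1] is exactly 1/2."""
    return math.sin(512.0 * math.pi * x) ** 2

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for j in range(1, n):
        total += f(a + j * h)
    return h * total

# Halve the step size row by row, as Romberg integration does.
for k in range(0, 12):
    n = 2 ** k
    print(f"n = {n:5d}  trapezoid = {trapezoid(integrand, 0.0, 1.0, n):.3e}")
```

The approximations are zero (up to round off) through n = 512; only when the nodes stop landing exactly on the zeros of the integrand does the rule start to see the function.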

Adaptive quadrature is an even bigger fail:


The problem here is that this routine quits when the refined Simpson’s rule approximation agrees with the less refined approximation (to within a certain tolerance), and here, the approximations are both zero, hence there is perfect agreement, very early in the process.

So, what to do?

One should note, of course, that the integrand is positive except for a finite number of points where it is zero. Hence one knows right away that the results are bogus.

One quick way to get closer: just tweak the limits of integration by a tiny amount and calculate, say, \int^{.999}_{.001} sin^2(512 \pi x) dx and do some mathematics!


The point: the integration routines cannot replace thinking.

March 30, 2014

About that “viral” common core meme

Filed under: class room experiment, editorial, pedagogy — Tags: , — collegemathteaching @ 10:09 pm

This is making the rounds on social media:


Now a good explanation as to what is going on can be found here; it is written by an experienced high school math teacher.

I’ll give my take on this; I am NOT writing this for other math professors; they would likely be bored by what I am about to say.

My take
First of all, I am NOT defending the mathematics standards of Common Core. For one: I haven’t read them. Another: I have no experience teaching below the college level. What works in my classroom would probably not work in most high school and grade school classrooms.

But I think that I can give some insight as to what is going on with this example (in the photo).

When one teaches mathematics, one often teaches BOTH how to calculate and the concepts behind the calculation techniques. Of course, one has to learn the calculation technique; no one (that I know) disputes that.

What is going on in the photo
The second “calculation” is an exercise designed to help students learn the concept of subtraction and NOT “this is how you do the calculation”.

Suppose one wants to show the students that subtracting two numbers yields “the distance on the number line between those numbers”. So, “how far away from 12 is 32?” Well, one moves 3 units to get to 15, then 5 to get to 20. Now that we are at 20 (a multiple of 10), it is easy to move one unit of 10 to get to 30, then 2 more units to get to 32. So we’ve moved 3 + 5 + 10 + 2 = 20 units total.

Think of it this way: in the days prior to Google Maps and GPS systems, imagine you are taking a trip from, say, Morton, IL to Chicago and you want to take interstate highways all of the way. You want to figure the mileage.

You notice (I am making these numbers up) that the “distance between big cities” map lists 45 miles from Peoria to Bloomington and 150 miles from Bloomington to Chicago. Then you look at the little numbers on the map to see that Morton is between Peoria and Bloomington: 10 miles away from Peoria.

So, to find the distance, you calculate (45-10) + 150 = 185 miles; you used the “known mileages” as guide posts and used the little map numbers as a guide to get from the small town (Morton) to the nearest city for which the “table mileage” was calculated.

That is what is going on in the photo.

Why the concept is important

There are many reasons. The “distance between nodes” concept is heavily used in graph theory and in operations research. But I’ll give a demonstration in numerical methods:

Suppose one needs a numerical approximation of \int^{48}_0 \sqrt{1 + cos^2(x)} dx . Now if one just approaches this with a Newton-Cotes method (say, Simpson’s rule) or by Romberg, or even by a quadrature method, one runs into problems. The reason: the integrand is oscillatory and the range of integration is very long.

But one notices that the integrand is periodic; there is no need to integrate along the entire range.

Note that there are 7 complete periods of 2 \pi between 0 and 48. So one merely needs to calculate 7 \int^{2 \pi}_0 \sqrt{1+cos^2(x)} dx + \int^{48 - 14 \pi}_0 \sqrt{1+ cos^2(x)} dx and these two integrals are much more readily approximated.

In fact, why not approximate 30 \int^{\frac{\pi}{2}}_0 \sqrt{1+cos^2(x)} dx + \int^{48 - 15 \pi}_0 \sqrt{1 + cos^2(x)}dx which is even better?
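Here is a minimal Python sketch comparing the two approaches. The composite Simpson’s rule below is a standard textbook version with a made-up function name, and the numbers of subintervals are arbitrary choices for illustration. The decomposition uses the facts that the integrand has period \pi and is symmetric about \frac{\pi}{2} .

```python
import math

def integrand(x):
    return math.sqrt(1.0 + math.cos(x) ** 2)

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0

# Brute force: one long oscillatory range, many nodes.
direct = simpson(integrand, 0.0, 48.0, 4800)

# Use the structure: 30 "quarter wave" pieces plus a short leftover piece.
leftover = 48.0 - 15.0 * math.pi
split = (30.0 * simpson(integrand, 0.0, math.pi / 2.0, 200)
         + simpson(integrand, 0.0, leftover, 200))

print(direct, split, abs(direct - split))
```

The split version uses 400 function evaluations in place of 4800 and, in this sketch, agrees with the brute force value to many digits.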

The concept of calculating distance in terms of set segment lengths comes in handy.

Or, one can think of it this way
When we teach derivatives, we certainly teach how to calculate using the standard differentiation rules. BUT we also teach the limit definition as well, though one wouldn’t use that definition in the middle of, say, “find the maximum and minimum of f(x) = x-\frac{1}{x} on the interval [\frac{1}{4}, 3] ” Of course, one uses the rules.

But if you saw some kid’s homework and saw f'(x) being calculated by the limit definition, would you assume that the professor was some idiot who wanted to turn a simple calculation into something more complicated?

March 25, 2014

The error term and approximation of derivatives

I’ll go ahead and work with the common 3 point derivative formulas:

This is the three-point endpoint formula: (assuming that f has 3 continuous derivatives on the appropriate interval)

f'(x_0) = \frac{1}{2h}(-3f(x_0) + 4f(x_0+h) -f(x_0 + 2h)) + \frac{h^2}{3} f^{(3)}(\omega) where \omega is some point in the interval.

The three point midpoint formula is:

f'(x_0) = \frac{1}{2h}(f(x_0 + h) -f(x_0 -h)) -\frac{h^2}{6}f^{(3)}(\omega) .

These formulas can be derived either by using the Taylor series centered at x_0 or by using the Lagrange polynomial through the given points and differentiating.

That isn’t the point of this note though.

The point: how can one demonstrate, by an example, the role the error term plays?

I suggest trying the following: let x vary from, say, 0 to 3 and let h = .25 . Now use the three point derivative estimates on the following functions:

1. f(x) = e^x .

2. g(x) = e^x + 10sin(\frac{\pi x}{.25}) .

Note one: the three point estimates for the derivatives will be exactly the same for both f(x) and g(x) . It is easy to see why.

Note two: the “errors” will be very, very different. It is easy to see why: look at the third derivative term: for f(x) it is e^x , while for g(x) it is e^x -10(\frac{\pi}{.25})^3cos(\frac{\pi x}{.25}) .

The graphs show the story.


Clearly, the 3 point derivative estimates cannot distinguish these two functions for these “sample values” of x , but one can see how in the case of g , the degree that g wanders away from f is directly related to the higher order derivative of g .
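A minimal Python sketch of the comparison at a single sample point follows. The choice x_0 = 1 and the function names are assumptions made for illustration; any sample point that is a multiple of .25 behaves the same way.

```python
import math

H = 0.25  # the step size used in the text

def f(x):
    return math.exp(x)

def g(x):
    return math.exp(x) + 10.0 * math.sin(math.pi * x / H)

def g_prime(x):
    """Exact derivative of g, for comparison."""
    return math.exp(x) + 10.0 * (math.pi / H) * math.cos(math.pi * x / H)

def midpoint_estimate(func, x0, h=H):
    """Three point midpoint estimate of func'(x0), without the error term."""
    return (func(x0 + h) - func(x0 - h)) / (2.0 * h)

x0 = 1.0
est_f = midpoint_estimate(f, x0)
est_g = midpoint_estimate(g, x0)

print("estimate for f'(1):", est_f, " exact:", math.exp(x0))
print("estimate for g'(1):", est_g, " exact:", g_prime(x0))
```

The two estimates agree (up to round off) because the sine term vanishes at the sample points, yet the estimate for g is off by roughly 40\pi , while the estimate for f is off by only a few hundredths.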

March 14, 2014

Approximating the derivative and round off error: class demonstration

In numerical analysis we are covering “approximate differentiation”. One of the formulas we are using: f'(x_0) = \frac{f(x_0 + h) -f(x_0 -h)}{2h} - \frac{h^2}{6} f^{(3)}(\zeta) where \zeta is some number in [x_0 -h, x_0 + h] ; of course we assume that the third derivative is continuous in this interval.

The derivation can be done in a couple of ways: one can either use the degree 2 Lagrange polynomial through x_0-h, x_0, x_0 + h and differentiate or one can use the degree 2 Taylor polynomial expanded about x = x_0 and use x = x_0 \pm h and solve for f'(x_0) ; of course one runs into some issues with the remainder term if one uses the Taylor method.

But that isn’t the issue that I want to talk about here.

The issue: “what should we use for h ?” In theory, we should get a better approximation if we make h as small as possible. But if we are using a computer to make a numerical evaluation, we have to concern ourselves with round off error. So what we actually calculate will NOT be \frac{f(x_0 + h) -f(x_0 -h)}{2h} but rather \frac{\hat{f}(x_0 + h) -\hat{f}(x_0 -h)}{2h} where \hat{f}(x_0 \pm h) = f(x_0 \pm h) - e(x_0 \pm h) and e(x_0 \pm h) is the round off error incurred in calculating the function at x = x_0 \pm h (respectively).

So, it is an easy algebraic exercise to show that:

f'(x_0) - \frac{\hat{f}(x_0 + h) -\hat{f}(x_0 -h)}{2h} = - \frac{h^2}{6} f^{(3)}(\zeta)+\frac{e(x_0 +h) -e(x_0 -h)}{2h} and the magnitude of the actual error is bounded by \frac{h^2 M}{6} + \frac{\epsilon}{h} where M = max\{|f^{(3)}(\eta)|\} on some small neighborhood of x_0 and \epsilon is a bound on the round-off error of representing f(x_0 \pm h) .

It is an easy calculus exercise (“take the derivative and set equal to zero and check concavity” easy) to see that this error bound is a minimum when h = (\frac{3\epsilon}{M})^{\frac{1}{3}} .

Now, of course, it is helpful to get a “ball park” estimate for what \epsilon is. Here is one way to demonstrate this to the students: solve for \epsilon and obtain \frac{M h^3}{3} = \epsilon and then do some experimentation to determine \epsilon .

That is: find the best value of h experimentally by using this “3 point midpoint” estimate for a known derivative near a value of x_0 for which M (a bound for the 3’rd derivative) is easy to obtain, and then use it to obtain an educated guess for \epsilon .

Here are a couple of examples: one uses Excel and one uses MATLAB. I used f(x) = e^x at x = 0; of course f'(0) = 1 and M = 1 is reasonable here (just a tiny bit off). I did the 3-point estimation calculation for various values of h and saw where the error started to increase again.

Here is the Excel output for f(x) = e^x at x =0 and at x = 1 respectively. In the first case, use M = 1 and in the second M = e

In the x = 0 case, we see that the error starts to increase again at about h = 10^{-5} ; the same sort of thing appears to happen for x = 1 .

So, in the first case, \epsilon is about \frac{1}{3} \times (10^{-5})^3 = 3.333 \times 10^{-16} ; it is roughly 10^{-15} at x =1 .
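The same experiment can be sketched in a few lines of Python. This is not the Excel or MATLAB work described in this post, just an illustrative stand-in with made-up function names; the exact errors will depend on the floating point arithmetic of the machine.

```python
import math

def central_difference(f, x0, h):
    """Three point midpoint estimate of f'(x0), without the error term."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

def error_table(f, exact, x0, exponents):
    """Absolute error of the estimate for h = 10^(-k), for each k given."""
    return {k: abs(central_difference(f, x0, 10.0 ** (-k)) - exact)
            for k in exponents}

# f(x) = e^x at x0 = 0, where f'(0) = 1 and M is about 1.
errors = error_table(math.exp, 1.0, 0.0, range(1, 13))
for k, err in errors.items():
    print(f"h = 1e-{k:02d}   error = {err:.3e}")

best_k = min(errors, key=errors.get)
h_best = 10.0 ** (-best_k)
# Ball park estimate from epsilon = M h^3 / 3 with M = 1.
print("best h:", h_best, " estimated epsilon:", h_best ** 3 / 3.0)
```

The error first shrinks as h decreases (the \frac{h^2 M}{6} term) and then grows again once the round off term takes over.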

Note: one can also approach h by using powers of \frac{1}{2} instead; something interesting happens in the x = 0 case; the x = 1 case gives results similar to what we’ve shown. Reason (I think): 1 is easy to represent in base 2 and the powers of \frac{1}{2} can be represented exactly.

Now we turn to MATLAB and here we do something slightly different: we graph the error for different values of h . Since the values of h are very small, we use a -log_{10} scale by doing the following (approximating f'(0) for f(x) = e^x )

[Figure: the MATLAB commands]

By design, N = -log_{10}(H) . The graph looks like:


Now, the small error scale makes things hard to read, so we turn to using the log scale, this time on the y axis: let LE = -log_{10}(E) and run plot(N, LE):

[Figure: plot of LE against N]

And sure enough, you can see where the peak is: about 10^{-5} , which is the same as Excel.

December 4, 2012

Teaching Linear Regression and ANOVA: using “cooked” data with Excel

During the linear regression section of our statistics course, we do examples with spreadsheets. Many spreadsheets have data processing packages that will do linear regression and provide output which includes things such as confidence intervals for the regression coefficients, the r, r^2 values, and an ANOVA table. I sometimes use this output as motivation to plunge into the study of ANOVA (analysis of variance) and have found “cooked” linear regression examples to be effective teaching tools.

The purpose of this note is NOT to provide an introduction to the type of ANOVA that is used in linear regression (one can find a brief introduction here or, of course, in most statistics textbooks) but to show a simple example using the “random number generation” features in Excel (with the data analysis pack loaded into it).

I’ll provide some screen shots to show what I did.

If you are familiar with Excel (or spread sheets in general), this note will be too slow-paced for you.

Brief Background (informal)

I’ll start the “ANOVA for regression” example with a brief discussion of what we are looking for: suppose we have some data which can be thought of as a set of n points in the plane (x_i, y_i). Of course the set of y values has a variance which is calculated as \frac{1}{n-1} \sum^n_{i=1}(y_i - \bar{y})^2 = \frac{1}{n-1}SS

It turns out that the “sum of squares” SS = \sum^n_{i=1} (y_i - \hat{y_i})^2 + \sum^n_{i=1}(\hat{y_i} - \bar{y})^2 where the first term is called “sum of squares error” and the second term is called “sum of squares regression”; or: SS = SSE + SSR. Here is an informal way of thinking about this: SS is what you use to calculate the “sample variation” of the y values (one divides this term by “n-1” ). This “grand total” can be broken into two parts: the first part comes from the differences between the actual y values and the y values predicted by the regression line. The second comes from the differences between the predicted y values (from the regression) and the average y value. Now imagine if the regression slope term \beta_1 was equal to zero; then the SSE term would be, in effect, the SS term and the second term SSR would be, in effect, zero (\bar{y} - \bar{y} ). If we denote the standard deviation of the y’s by \sigma then \frac{SSR/\sigma^2}{SSE/((n-2)\sigma^2)} is a ratio of chi-square random variables (each divided by its degrees of freedom) and is therefore F with 1 numerator and n-2 denominator degrees of freedom. If \beta_1 = 0 or was not statistically significant, we’d expect the ratio to be small.

For example: if the regression line fit the data perfectly, the SSE terms would be zero and the SSR term would equal the SS term as the predicted y values would be the y values. Hence the ratio of (SSR/constant) over (SSE/constant) would be infinite.

That is, the ratio that we use roughly measures the percentage of variation of the y values that comes from the regression line versus the percentage that comes from the error from the regression line. Note that it is customary to denote SSE/(n-2) by MSE and SSR/1 by MSR. (Mean Square Error, Mean Square Regression).

The smaller the numerator relative to the denominator the less that the regression explains.
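For those who prefer code to spreadsheets, here is a minimal Python sketch of the decomposition SS = SSE + SSR and the F ratio on “cooked” data. It is only an illustration: the x values, the seed, and the function name `simple_regression_anova` are assumptions and do not come from any particular spreadsheet.

```python
import random

def simple_regression_anova(xs, ys):
    """Least squares line plus the ANOVA sums of squares and the F ratio."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope estimate
    b0 = y_bar - b1 * x_bar      # intercept estimate
    y_hat = [b0 + b1 * x for x in xs]
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss = sum((y - y_bar) ** 2 for y in ys)
    f_ratio = (ssr / 1) / (sse / (n - 2))   # MSR / MSE
    return {"b0": b0, "b1": b1, "SS": ss, "SSE": sse, "SSR": ssr, "F": f_ratio}

rng = random.Random(0)                       # fixed seed so the run is repeatable
xs = [0.5 * i for i in range(21)]            # hypothetical x values from 0 to 10

# "Cooked" linear data: y = 4 + 5x plus normal residuals (mean 0, sd 1).
linear = simple_regression_anova(xs, [4 + 5 * x + rng.gauss(0, 1) for x in xs])

# y values that have nothing to do with x: uniform on [0, 54].
unrelated = simple_regression_anova(xs, [rng.uniform(0, 54) for _ in xs])

for name, result in (("linear", linear), ("unrelated", unrelated)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In the linear case SSR dominates and F is huge; in the unrelated case SSE is nearly all of SS and F is small.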

The following examples using Excel spread sheets are designed to demonstrate these concepts.

The examples are as follows:

Example one: a perfect regression line with “perfect” normally distributed residuals (remember that the usual hypothesis tests on the regression coefficients depend on the residuals being normally distributed).

Example two: a regression line in which the y-values have a uniform distribution (and are not really related to the x-values at all).

Examples three and four: show what happens when the regression line is “perfect” and the residuals are normally distributed, but have greater standard deviations than they do in Example One.

First, I created some x values and then came up with the line y = 4 + 5x . I then used the formula bar as shown to create that “perfect line” of data in the column called “fake” as shown. Excel allows one to copy and paste formulas such as these.


This is the result after copying:


Now we need to add some residuals to give us a non-zero SSE. This is where the “random number generation” feature comes in handy. One goes to the data tab and then to “data analysis”


and clicks on “random number generation”:


This gives you a dialogue box. I selected “normal distribution”; then I selected “0” for the mean and “1” for the standard deviation. Note: the assumption underlying the confidence interval calculation for the regression parameters is that the residuals are normally distributed and have an expected value of zero.


I selected a column for output (as many rows as x-values) which yields a column:


Now we add the random numbers to the column “fake” to get a simulated set of y values:


That yields the column Y as shown in this next screenshot. Also, I used the random number generator to generate random numbers in another column; this time I used the uniform distribution on [0,54]; I wanted the “random set of potential y values” to have roughly the same range as the “fake data” y-values.


Y holds the “non-random” fake data and YR holds the data for the “Y’s really are randomly distributed” example.


I then decided to generate two more “linear” sets of data; in these cases I used the random number generator to generate normal residuals of larger standard deviation and then create Y data to use as a data set; the columns of residuals are labeled “mres” and “lres” and the columns of new data are labeled YN and YVN.

Note: in the “linear trend data” I added the random numbers to the exact linear model y’s labeled “fake” to get the y’s to represent data; in the “random-no-linear-trend” data column I used the random number generator to generate the y values themselves.

Now it is time to run the regression package itself. In Excel, simple linear regression is easy. Just go to the data analysis tab and click, then click “regression”:


This gives a dialogue box. Be sure to tell the routine that you have “headers” to your columns of numbers (non-numeric descriptions of the columns) and note that you can select confidence intervals for your regression parameters. There are other things you can do as well.


You can select where the output goes. I selected a new data sheet.


Note the output: the r value is very close to 1, the p-values for the regression coefficients are small and the calculated regression line (used to generate the \hat{y_i}'s) is y = 3.70 + 5.01x . Also note the ANOVA table: the SSR (sum squares regression) is very, very large compared to the SSE (sum squares residuals), as expected. The variance in y values is almost completely explained by the variance in the predicted y values from the regression line. Hence we obtain an obscenely large F value; we easily reject the null hypothesis (that \beta_1 = 0 ).

This is what a plot of the calculated regression line with the “fake data” looks like:


Yes, this is unrealistic, but this is designed to demonstrate a concept. Now let’s look at the regression output for the “uniform y values” (y values generated at random from a uniform distribution of roughly the same range as the “regression” y-values):


Note: r^2 is nearly zero, we fail to reject the null hypothesis that \beta_1 = 0 and note how the SSE is roughly equal to the SS; the reason, of course, is that the regression line is close to y = \bar{y} . The calculated F value is well inside the “fail to reject” range, as expected.

A plot looks like:


The next two examples show what happens when one “cooks” up a regression line with residuals that are normally distributed, have mean equal to zero, but have larger standard deviations. Watch how the r values change, as well as how the SSR and SSE values change. Note how the routine fails to come up with a statistically significant estimate for the “constant” part of the regression line but the slope coefficient is handled easily. This demonstrates the effect of residuals with larger standard deviations.





June 11, 2012

Well, what do you mean by…..

Filed under: class room experiment, mathematics education, statistics, well posed problem — collegemathteaching @ 12:09 pm

Often seemingly simple questions don’t have simple answers; in fact, a seemingly simple question can be ambiguous.

I’ll give two examples:

1. Next week, Peoria, IL has the Steamboat 4 mile/15 km running race. So one question is: which race is more competitive?
The answer is: “it depends on what you mean by “more competitive”.”

On one hand, the 4 mile race offers prize money and attracts Olympic caliber runners, current world record holders in the marathon and the like. The typical winning time for males is under 18 minutes and the first woman sometimes breaks 20 minutes. There are also a large number of university runners chasing them. So, at the very front of the pack, the 4 mile race is much more competitive.

But the “typical” 15 Km runner is far more serious than the “typical” 4 mile runner. Here is what I mean:

(2011 statistics) 4 mile race had 3346 finishers, median runner (half faster, half slower) was 39:58 (9:59.5 minutes per mile). The 15K race had 836 finishers; the median time was 1:23:25 (8:57 minutes per mile) and that was LONGER and on a much more difficult course (4 mile course is pancake flat).

If you wonder about the mix of men and women, I went ahead and compared the male and female age groups (50-54; my group):
4 mile men: 138 finishers, median time 37:05, median pace: 9:16
15K men: 45 finishers, median time 1:19:50, median pace: 8:34

4 mile women: 128 finishers median time 46:10, median pace: 11:32
15K women: 27 finishers, median time: 1:28:41, median pace: 9:32

That is, the typical 15 km runner will run a course that is over twice as long and much, much, much hillier at a faster pace than the typical 4 mile runner. So in this sense, the 15 km race is far more competitive.

In other words, I’ll be faster than the median pace for my age group if I ran the 4 mile but will be slower (much) in the 15K.

So, for this question “which race is more competitive”, the answer depends on “what you mean by “more competitive””.

Example two: this is the Bertrand paradox:

Inscribe an equilateral triangle into a circle. Now pick a random chord in the circle (a line from one point on the circle to some other point). What is the probability that the length of this chord is longer than the length of one of the sides of the triangle?

Answer: it depends on what you mean by “randomly pick”!

Method 1. If you just pick some point “p” on the circle and then some second random point “q” on the circle then:
you can arrange for a vertex of the triangle to coincide with “p”. Then the chord will be longer if the chord pq lies in that 60 degree angle at the vertex; hence the probability of that happening is 1/3.

Method 2. Pick the line as follows: draw a random radius (segment from the center to the edge) and then randomly pick some point on the radius and construct a perpendicular to that. Arrange for the inscribed triangle to have one angle bisector to overlap the radius. Now the chord will be longer than the side of the triangle if the second point is between the center and the opposite edge of the triangle. Since the side bisects the radius, the probability is 1/2.

Method 3. Choose a point anywhere in the circle and let that be the midpoint of the random chord. Then the chord is longer than a side of the inscribed triangle if and only if the point happens to lie inside the circle that is inscribed INSIDE the equilateral triangle. Since that area is 1/4 of the area inside the circle, the probability is 1/4.
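The three answers can be checked by simulation. Here is a minimal Monte Carlo sketch in Python, assuming a unit circle (so the triangle has side \sqrt{3} ); the function names, the number of trials and the seed are arbitrary choices for illustration.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle

def chord_method_1(rng):
    """Two random points on the circle, chosen by independent uniform angles."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    b = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * abs(math.sin((a - b) / 2.0))

def chord_method_2(rng):
    """Uniform point on a radius; the chord is perpendicular to the radius there."""
    d = rng.uniform(0.0, 1.0)            # distance of the chord from the center
    return 2.0 * math.sqrt(1.0 - d * d)

def chord_method_3(rng):
    """Uniform point in the disk, used as the midpoint of the chord."""
    d = math.sqrt(rng.random())          # radius of a point uniform in the disk
    return 2.0 * math.sqrt(max(0.0, 1.0 - d * d))

def estimate(method, trials=200_000, seed=0):
    """Fraction of random chords that are longer than the triangle's side."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if method(rng) > SIDE)
    return hits / trials

for name, method in (("method 1", chord_method_1),
                     ("method 2", chord_method_2),
                     ("method 3", chord_method_3)):
    print(name, estimate(method))
```

The three estimates should land near 1/3, 1/2 and 1/4 respectively: same question, three different meanings of “random chord”.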

For more on what is the “best method”: read the article:

In his 1973 paper The Well-Posed Problem,[1] Edwin Jaynes proposed a solution to Bertrand’s paradox, based on the principle of “maximum ignorance”—that we should not use any information that is not given in the statement of the problem. Jaynes pointed out that Bertrand’s problem does not specify the position or size of the circle, and argued that therefore any definite and objective solution must be “indifferent” to size and position. In other words: the solution must be both scale invariant and translation invariant.

It turns out that method 2 is both scale and translation invariant.
