College Math Teaching

February 18, 2019

An easy fact about least squares linear regression that I overlooked

The background: I was making notes about the ANOVA table for “least squares” linear regression and reviewing how to derive the “sum of squares” equality:

Total Sum of Squares = Sum of Squares Regression + Sum of Squares Error or…

If $y_i$ is the observed response, $\bar{y}$ the sample mean of the responses, and $\hat{y}_i$ are the responses predicted by the best fit line (simple linear regression here) then:

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i -\bar{y})^2+ \sum (y_i - \hat{y}_i)^2$ (where each sum is $\sum_{i=1}^n$ over the $n$ observations).

Now for each $i$ it is easy to see that $(y_i - \bar{y}) = (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i)$, but the equation still holds when these terms are squared, provided you sum them up!
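This is easy to check numerically. Here is a minimal sketch in Python (the data values are invented for illustration) that fits the least squares line and confirms that the identity holds once you sum:

```python
# Numerically check SST = SSR + SSE for a least squares fit.
# The data below are arbitrary illustration values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Simple linear regression coefficients (from the normal equations).
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)          # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)       # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error sum of squares

# Pointwise, (y_i - ybar)^2 != (yhat_i - ybar)^2 + (y_i - yhat_i)^2
# in general; only the sums balance.
print(abs(sst - (ssr + sse)) < 1e-9)  # True
```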

And it was going over the derivation of this that reminded me about an important fact about least squares that I had overlooked when I first presented it.

If you go into the derivation and calculate: $\sum ( (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i))^2 = \sum ((\hat{y}_i -\bar{y})^2 + (y_i - \hat{y}_i)^2 +2 (\hat{y}_i -\bar{y})(y_i - \hat{y}_i))$

Which equals $\sum (\hat{y}_i -\bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i)$, and the proof is completed by showing that:

$\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i) = \sum \hat{y}_i(y_i - \hat{y}_i) - \bar{y}\sum (y_i - \hat{y}_i)$ and that BOTH of these sums are zero.

But why?

Let’s go back to how the least squares equations were derived:

Given that $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

$\frac{\partial}{\partial \hat{\beta}_0} \sum (\hat{y}_i -y_i)^2 = 2\sum (\hat{y}_i -y_i) =0$ yields that $\sum (\hat{y}_i -y_i) =0$. That is, under the least squares equations, the sum of the residuals is zero.

Now $\frac{\partial}{\partial \hat{\beta}_1} \sum (\hat{y}_i -y_i)^2 = 2\sum x_i(\hat{y}_i -y_i) =0$ which yields that $\sum x_i(\hat{y}_i -y_i) =0$

That is, the sum of the residuals, weighted by the corresponding $x$ values (inputs), is also zero. Note: this holds for multiple linear regression as well.

Really, that is what the least squares process does: it sets the sum of the residuals and the sum of the weighted residuals equal to zero.

Yes, there is a linear algebra formulation of this.
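In matrix form, least squares solves the normal equations $X^TX\hat{\beta} = X^Ty$, which is exactly the statement $X^T(y - X\hat{\beta}) = 0$: the residual vector is orthogonal to every column of the design matrix, including the column of ones. A sketch with invented data, using numpy's `lstsq` solver:

```python
import numpy as np

# Invented data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (for beta_0) and the x values (for beta_1).
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# X^T r = 0: entry 0 is the sum of the residuals,
# entry 1 is the x-weighted sum of the residuals.
print(np.allclose(X.T @ residuals, 0.0))  # True
```

The same check works verbatim for multiple regression: just add more columns to $X$.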

Anyhow, returning to our sum:

$\sum (\bar{y})(y_i - \hat{y}_i) = \bar{y}\sum(y_i - \hat{y}_i) = 0$. Now for the other term:

$\sum \hat{y}_i(y_i - \hat{y}_i) = \sum (\hat{\beta}_0+\hat{\beta}_1 x_i)(y_i - \hat{y}_i) = \hat{\beta}_0\sum (y_i - \hat{y}_i) + \hat{\beta}_1 \sum x_i (y_i - \hat{y}_i)$

Now $\hat{\beta}_0\sum (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals, and $\hat{\beta}_1 \sum x_i (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals weighted by the $x_i$.
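Both sums are easy to verify numerically. A self-contained sketch (data values invented for illustration):

```python
# Fit a least squares line, then check that both cross-term sums vanish.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
resid = [yi - yh for yi, yh in zip(y, yhat)]

# sum of yhat_i * (y_i - yhat_i) and ybar * sum of (y_i - yhat_i):
print(abs(sum(yh * r for yh, r in zip(yhat, resid))) < 1e-9)  # True
print(abs(ybar * sum(resid)) < 1e-9)                          # True
```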

That was pretty easy, wasn’t it?

But the role that the basic least squares equations played in this derivation went right over my head!

March 21, 2014

Projections, regressions and Anscombe’s quartet…

Data and its role in journalism is a hot topic among some of the bloggers that I regularly follow. See: Nate Silver on what he hopes to accomplish with his new website, and Paul Krugman’s caveats on this project. The debate is, as I see it, about the role of data and the role of having expertise in a subject when it comes to providing the public with an accurate picture of what is going on.

Then I saw this meme on a Facebook page:

These two things (the discussion and the meme) led me to make this post.

First the meme: I thought of this meme as a way to explain volume integration by “cross sections”. 🙂 But for this post, I’ll focus on this meme showing an example of a “projection map” in mathematics. I can even provide some equations: imagine the following set in $R^3$ described as follows: $S= \{(x,y,z) | (y-2)^2 + (z-2)^2 \le 1, 1 \le x \le 2 \}$. Now the projection map onto the $y-z$ plane is given by $p_{yz}(x,y,z) = (0,y,z)$ and the image set is $S_{yz} = \{(0,y,z)| (y-2)^2 + (z-2)^2 \le 1 \}$, which is a disk (in yellow).

The projection onto the $x-z$ plane is given by $p_{xz}(x,y,z) = (x,0,z)$ and the image is $S_{xz} = \{(x,0,z)| 1 \le x \le 2, 1 \le z \le 3 \}$ which is a rectangle (in the blue).

The issue raised by this meme is that neither projection, in and of itself, determines the set $S$. In fact, both of these projections, taken together, do not determine the object: the “hollow can” in the shape of our $S$ would have the same projections, and there are literally uncountably many sets with these same two images. Example: imagine a rectangle in the shape of the blue projection joined to one end disk parallel to the yellow plane.
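In code, these projection maps are one-liners. The sketch below (coordinates chosen to match the set $S$ above) also shows two distinct points of $S$ collapsing to the same image, which is the information loss in action:

```python
# Projection maps onto the coordinate planes.
def p_yz(pt):
    x, y, z = pt
    return (0.0, y, z)

def p_xz(pt):
    x, y, z = pt
    return (x, 0.0, z)

# A point of S: (y-2)^2 + (z-2)^2 <= 1 and 1 <= x <= 2.
pt = (1.5, 2.0, 2.5)

print(p_yz(pt))  # (0.0, 2.0, 2.5) -- lands in the disk S_yz
print(p_xz(pt))  # (1.5, 0.0, 2.5) -- lands in the rectangle S_xz

# Two distinct points of S with the same y-z projection:
print(p_yz((1.0, 2.0, 2.5)) == p_yz((2.0, 2.0, 2.5)))  # True
```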

Of course, one can put some restrictions on candidates for $S$ (the preimage of both projections taken together); say, one might want $S$ to be a manifold of dimension 2 or 3, or to meet some other criteria. But THAT would be adding more information to the mix and thereby, in a sense, providing yet another projection map.

Projections, by design, lose information.

In statistics, a statistic is, by definition, a type of projection. Consider, for example, linear regression. I discussed linear regression, and using “fake data” to teach it, here. The linear regression process inputs data points and produces numbers: the means and standard deviations of the $x$ and $y$ values, the correlation coefficient, and the regression coefficients.

But one loses information in the process. A good demonstration of this comes from Anscombe’s quartet: one has 4 very different data sets producing identical regression coefficients (and yes, correlation coefficients, confidence intervals, etc.). Here are the plots of the data:
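A quick sketch that reproduces this: running the quartet’s published values (Anscombe, 1973) through the standard simple regression formulas gives essentially identical intercepts, slopes, and correlation coefficients for all four data sets:

```python
import math

# Anscombe's quartet. The first three data sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
     [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

def fit(x, y):
    """Return (intercept, slope, correlation) for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    r = sxy / math.sqrt(sxx * syy)
    return b0, b1, r

for x, y in quartet:
    b0, b1, r = fit(x, y)
    # Each set gives approximately intercept 3.00, slope 0.500, r 0.816.
    print(f"intercept={b0:.2f}  slope={b1:.3f}  r={r:.3f}")
```

Despite the matching summary statistics, the four scatterplots look nothing alike.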

And here is the data:

The Wikipedia article I quoted is pretty good; they even provide a link to a paper that gives an algorithm to generate different data sets with the same regression values (and yes, the paper defines what is meant by “different”).

Moral: when one crunches data, one has to be aware of the loss of information that is involved.