College Math Teaching

March 14, 2019

Sign test for matched pairs, Wilcoxon Signed Rank test and Mann-Whitney using a spreadsheet

Filed under: statistics, Uncategorized — Tags: , , , — collegemathteaching @ 10:33 pm

Our goal: perform non-parametric statistical tests for two samples, both paired and independent. We only assume that both samples come from similar distributions, possibly shifted.

I’ll show the steps with just a bit of discussion of what the tests are doing; the text I am using is Mathematical Statistics (with Applications) by Wackerly, Mendenhall and Scheaffer (7’th ed.) and Mathematical Statistics and Data Analysis by John Rice (3’rd ed.).

First the data: 56 students took a final exam. The professor gave some questions and a committee gave some questions. Student performance was graded and the student performance was graded as a “percent out of 100” on each set of questions (committee graded their own questions, professor graded his questions).

The null hypothesis: student performance was the same on both sets of questions. Yes, this data was close enough to being normal that a paired t-test would have been appropriate and one was done for the committee. But because I am teaching a section on non-parametric statistics, I decided to run a paired sign test and a Wilcoxon signed rank test (and then, for the heck of it, a Mann-Whitney test which assumes independent samples..which these were NOT (of course)). The latter was to demonstrate the technique for the students.

There were 56 exams and “pi” was the score on my questions, “pii” the score on committee questions. The screen shot shows a truncated view.

The sign test for matched pairs.
The idea behind this test: take each pair and score it +1 if sample 1 is larger and score it -1 if the second sample is larger. Throw out ties (use your head here; too many ties means we can’t reject the null hypothesis ..the idea is that ties should be rare).

Now set up a binomial experiment where n is the number of pairs. We’d expect that if the null hypothesis is true, p = .5 where p is the probability that the pair gets a score of +1. So the expectation would be np = \frac{n}{2} and the standard deviation would be \frac{1}{2} \sqrt{n} , that is, \sqrt{npq}

This is easy to do in a spreadsheet. Just use the difference in rows:

Now use the “sign” function to return a +1 if the entry from sample 1 is larger, -1 if the entry from sample 2 is larger, or 0 if they are the same.

I use “copy, paste, delete” to store the data from ties, which show up very easily.

Now we need to count the number of “+1”. That can be a tedious, error prone process. But the “countif” command in Excel handles this easily.

Now it is just a matter of either using a binomial calculator or just using the normal approximation (I don’t bother with the continuity correction)

Here we reject the null hypothesis that the scores are statistically the same.

Of course, this matched pairs sign test does not take magnitude of differences into account but rather only the number of times sample 1 is bigger than sample 2…that is, only “who wins” and not “by what score”. Clearly, the magnitude of the difference could well matter.

That brings us to the Wilcoxon signed rank test. Here we list the differences (as before) but then use the “absolute value” function to get the magnitudes of such differences.

Now we need to do an “average rank” of these differences (throwing out a few “zero differences” if need be). By “average rank” I mean the following: if there are “k” entries between ranks n, n+1, n+2, ..n+k-1, then each of these gets a rank \frac{n + n+1 + n+2 +...+ n+k-1}{k} = n + \frac{(k-1)}{2}

(use \sum^n_{k=1} k = \frac{n(n+1)}{2} to work this out).

Needless to say, this can be very tedious. But the “rank.avg” function in Excel really helps.

Example: rank.avg(di, $d$2:$d$55, 1) does the following: it ranks the entry in cell di versus the cells in d2: d55 (the dollar signs make the cell addresses “absolute” references, so this doesn’t change as you move down the spreadsheet) and the “1” means you rank from lowest to highest.

Now the test works in the following manner: if the populations are roughly the same, the larger or smaller ranked differences will each come from the same population roughly half the time. So we denote T^{-} the sum of the ranks of the negative differences (in this case, where “pii” is larger) and T^{+} is the sum of the positive differences.

One easy way to tease this out: T^{+} + T^{-1} = \frac{1}{2}n(n+1) and T^{+} - T^{-} can be computed by summing the entries in which the larger differences in “pii” get a negative sign. This is easily done by multiplying the absolute value of the differences by the sign of the differences. Now note that \frac{1}{2}((T^{+} + T^{-1}) + (T^{+} - T^{-1})) = T^{+} and \frac{1}{2}((T^{+} + T^{-1}) - (T^{+} +-T^{-1})) = T^{-}

One can use a T table (this is a different T than “student T”) or one can use the normal approximation (if n is greater than, say, 25) with
E(T^{+})  = \frac{n(n+1)}{2}, V(T^{+})  = \frac{n(n+1)(2n+1)}{24} and use the normal approximation.

How these are obtained: the expectation is merely the one half the sum of all the ranks (what one would expect if the distributions were the same) and the variance comes from n Bernouilli random variables I_k (one for each pair) with p = \frac{1}{2} where the variance is \frac{1}{4} \sum^n_{k=1} k^2 = \frac{1}{4} \frac{n(n+1)(2n+1)}{6}

Here is a nice video describing the process by hand:

Mann-Whitney test
This test doesn’t apply here as the populations are, well, anything but independent, but we’ll pretend so we can crunch this data set.

Here the idea is very well expressed:

Do the following: label where the data comes from, and rank it all together. Then add the ranks of the population, of say, the first sample. If the samples are the same, the sums of the ranks should be the same for both populations.

Again, do a “rank average” and yes, Excel can do this over two different columns of data, while keeping the ranks themselves in separate columns.

And one can compare, using either column’s rank sum: the expectation would be E = \frac{n_1(n_1 +n_2 + 1}{2} and variance would be V = \frac{n_1n_2(n_1+n_2+1)}{12}

Where this comes from: this is really a random sample of since n_1 drawn without replacement from a population of integers 1, 2, ... n_1+n_2 (all possible ranks…and how they are ordered and the numbers we get). The expectation is n_1 \mu and the variance is n_1 \sigma^2 \frac{n_1+n_2-n_1}{n_1+n_2 -1} where \mu = \frac{n_1+n_2 +1}{2}, \sigma^2 \frac{(n_1+n_2)^2-1}{12} (should remind you of the uniform distribution). The rest follows from algebra.

So this is how it goes:

Note: I went ahead and ran the “matched pairs” t-test to contrast with the matched pairs sign test and Wilcoxon test, and the “two sample t-test with unequal variances” to contrast to the Mann-Whitney test..use the “unequal variances” assumption as the variance of sample pii is about double that of pi (I provided the F-test).

Advertisements

February 1, 2014

Numerical methods class: round off error using geometric series and spreadsheets…

Of course computers store numbers in binary; that is numbers are represented by \sum^n_{k=-m} a_k 2^{k} = 2^{n}+a_{n-1}2^{n-1} +....+a_1 2 + a_0 1 + a_{-1}\frac{1}{2} + ...+a_{-m} \frac{1}{2^m} where each a_j \in {0,1} (of course the first coefficient is 1).

We should probably “warm up” by showing some binary expansions. First: someone might ask “how do I know that a number even HAS a binary expansion? The reason: the dyatic rationals are dense in the number line. So just consider a set of nested partitions of the number line, where each partition in the family has width \frac{1}{2^k} and it isn’t hard to see a sequence that converges to a given number. That sequence leads to the binary expansion.

Example: What is the binary expansion for 10.25?

Answer: Break 10.25 down into 10 + 0.25.

10 = 8 + 2 = 2^3 + 2^1 so the “integer part” of the binary expansion is 1010 = 2^3 + 0*2^2 + 2^1 +0*2^0 .

Now for the “faction part”: 0.25 = \frac{1}{4} = \frac{1}{2^2} = 0*\frac{1}{2} + \frac{1}{2^2} = .01 in binary.

Hence 10.25 in base 10 = 1010.01 in binary.

So what about something harder? Integers are easy, so lets look at \frac{3}{7} = \sum_{k=1}a_k\frac{1}{2^k} where each a_k \in \{0,1\} .

Clearly a_1 =0 since \frac{1}{2} is greater than \frac{3}{7} . Now a_2 = 1 so multiply both sides of \frac{3}{7} = \sum_{k=1}a_k\frac{1}{2^k} by 4 to obtain \frac{12}{7} = 1 + a_3 \frac{1}{2} + a_4 \frac{1}{4} + ...

Now subtract 1 from both sides and get \frac{5}{7} =  a_3 \frac{1}{2} + a_4 \frac{1}{4} + ... Here a_3 = 1 so multiply both sides by 2 and subtract 1 and get \frac{3}{7} =   a_4 \frac{1}{2} + a_5 \frac{1}{4} ...

Note that we are back where we started. The implication is that \frac{3}{7} = (\frac{1}{4} + \frac{1}{8})(1 + \frac{1}{2^3} + \frac{1}{2^6} +....) and so the base 2 decimal expansion for \frac{3}{7} is .\overline{011}

Of course there is nothing special about \frac{3}{7} ; a moment’s thought reveals that if one starts with \frac{p}{q} where p is less than q , the process stops (we arrive at zero on the left hand side) or we return to some other \frac{m}{q} where m is less than q ; since there are only a finite number of m we have to arrive at a previous fraction eventually (pigeonhole principle). So every rational number has a terminating or repeating binary decimal expansion.

Now it might be fun to check that we did the expansion correctly; the geometric series formula will help:

(\frac{1}{4} + \frac{1}{8})(1 + \frac{1}{2^3} + \frac{1}{2^6} +....) = (\frac{3}{8})(\frac{1}{1-\frac{1}{2^3}})=\frac{3}{8-1} =\frac{3}{7}

Now about how the computer stores a number: a typical storage scheme is what I call the 1-11-52 scheme. There are 64 bits used for the number. The first is for positive or negative. The next 11 bits are for the exponent. Now 2^{12} - 1 = 2047 so technically the exponent could be as large as 2047 . But we want to be able to store small numbers as well, so 1023 is subtracted from this number so we could get exponents that range from 1024 to -1023 . Also remember these are exponents for base 2.

This leaves 52 digits for the mantissa; that is all the computer can store. This is the place where one can look at round off error.

Let’s see two examples:

10.25 is equal to 2^3 (1 + \frac{1}{4} + \frac{1}{32}) . The first bit is a 0 because the number is positive. The exponent is 3 and so is represented by 3 + 1023 = 1026 which is 10000000011 . Now the mantissa is assumed to start with 1 every time; hence we get 0100100000000000000000000000000000000000000000000000.

So, let’s look at the \frac{3}{7} example. The first digit is 0 since the number is positive. We write the binary expansion as: 2^{-2}(1+\frac{1}{2} + \frac{1}{8} + \frac{1}{16} + ....) . The exponent is -2 which is stored as -2 + 1023 = 1021 = 01111111101 . Now the fun starts: we need an infinite number of bits for an exact representation but we have 52. .011 is what is repeated so we can repeat this 51 times plus the next 0. So the number that serves as a proxy for \frac{3}{7} is really \frac{3}{7} - (\frac{1}{4} + \frac{1}{8})^{18}\sum^{\infty}_{k=0} (\frac{1}{8})^k = \frac{3}{7}- (\frac{3}{8})^{18}(\frac{1}{1 - \frac{1}{8}}) = \frac{3}{7} - \frac{3}{7}(\frac{3}{8})^{18} = \frac{3}{7}(1-(\frac{3}{8})^{18}))

Ok, about spreadsheets:

Of course, it is difficult to use base 2 directly to demonstrate round off error, so many texts use regular decimals and instruct the students to perform calculations using “n-digit rounding” and “n-digit chopping” to show how errors can build up with iterated operations.

One common example: use the quadratic formula to find the roots of a quadratic equation. Of course the standard formula for the roots of ax^2 + bx + c =0 is \frac{-b \pm \sqrt{b^2 -4ac}}{2a} and there is an alternate formula \frac{-2c}{b \pm\sqrt{b^2 -4ac}} that leads to less round off error in the case when the “alternative formula” denominator has large magnitude.

Now the computations, when done by hand, can be tedious and more of an exercise in repeated calculator button punching than anything else.

But a spreadsheet can help, provided one can find a way to use the “ROUND(N, M)” command and the “TRUNC(N, M)” commands to good use. In each case, N is the number to be rounded or truncated and M is the decimal place.

A brief review of these commands: the ROUND command takes a decimal and rounds to the nearest \frac{1}{10^N} place; the TRUNC truncates the decimal to the nearest \frac{1}{10^N} place.
The key: M can be any integer; positive, negative or zero .

Examples: TRUNC(1234.5678, 2) = 1234.56, TRUNC(1234.5678, -1) = 1230, ROUND(1234.5678, 3) = 1234.569, ROUND(1234.5678, -2) = 1200.

Formally: if one lets a_i \in {0, 1, 2, ...9} for all i and if x = \sum_{i=-m}^{k} a_i (10)^i = a_k 10^k + a_{k-1} 10^{k-1} +....a_1 10 + a_0 + a_{-1}10^{-1} +...a_{-m}10^{-m} then TRUNC(x, M) = \sum_{i=-M}^k a_i (10)^i =a_i (10)^i = a_k 10^k + a_{-k-1} 10^{-k-1}+...+ a_{-M}10^{-M}

So if M is negative, this process stops at a positive power of 10.

That observation is the key to getting the spread sheet to round or truncate to a desired number of significant digits.

Recall that the base 10 log picks of the highest power of 10 that appears in a number; for example log10(1234.5678) = 3.xxxxx So let’s exploit that: we can modify our commands as follows:

for TRUNC we can use TRUNC(N, M -1 – INT(log10(abs(N)))) and for ROUND we can use ROUND(N, M -1 – INT(log10(abs(N)))). Subtracting the 1 and the integer part of the base 10 log moves the “start point” of the truncation or the rounding to the left of the decimal and M moves the truncation point back to the right to the proper place.

Of course, this can be cumbersome to type over and over again, so I recommend putting the properly typed formula in some “out of the way” cell and using “cut and paste” to paste the formula in the proper location for the formula.

Here is an example:
This spread sheet shows the “4 digit rounding” calculation for the roots of the quadratics x^2 - (60 + \frac{1}{60})x + 10 and 1.002x^2 - 11.01x +.01265 respectively.

roundspreadsheet

(click for a larger view).

Note that one has to make a cell for EVERY calculation because we have to use 4 digit arithmetic at EACH step. Note also the formula pasted as test in the upper right hand cell.

One can cut an paste as a cell formula as shown below:

round2
Here one uses cell references as input values.

Here is another example:

round3

Note the reference to the previous cells.

Blog at WordPress.com.