## Classical Biases in Common Statistics

Written 2001

Formatted 2009

Statistics can be very valuable, but we have to be clear about what they can tell us and what they can't. Understanding where the biases come from helps keep that distinction in view. As you look at these examples, ask, "How can I be sure not to read in more than is actually there?"

This page is laid out in a form that would make an easy discussion for a classroom lesson.

### Example 1: Economic Comparisons

Look at the following two groups and ask yourself which group is better off. Each group has four members.

| Group A | Group B |
|---|---|
| dead | earns \$10,000 |
| in jail | earns \$20,000 |
| earns \$30,000 | earns \$30,000 |
| earns \$50,000 | earns \$50,000 |

Which group did you choose? Let's look at what will be reported to us. For Group A, the average income is \$40,000, because the dead person and the imprisoned person have been removed from the statistical process. For Group B, the average income is \$27,500, about 2/3 the average income of Group A.

If you look only at the average incomes, which group do you consider better off? Group A has the higher average. Is Group A really better off? Which is better: poor or dead? Poor or in jail?
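The reported figures can be reproduced in a few lines of Python. This is only a sketch: marking the dead and jailed members as `None` and the helper name `reported_average` are illustrative assumptions, not part of the original example.

```python
from statistics import mean

# None marks a member with no reported income (dead or in jail).
# This encoding is an assumption made for illustration.
group_a = [None, None, 30_000, 50_000]
group_b = [10_000, 20_000, 30_000, 50_000]

def reported_average(incomes):
    """Average over members who have an income; everyone else drops out."""
    earners = [x for x in incomes if x is not None]
    return mean(earners), len(earners)

for name, group in (("A", group_a), ("B", group_b)):
    avg, counted = reported_average(group)
    print(f"Group {name}: average {avg:,.0f} over {counted} of {len(group)} members")
```

Printing how many members were actually counted next to each average makes the exclusion visible instead of hidden.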


### Example 2: More Economic Comparisons

How do these two groups compare?

#### Group A

• earns \$10,000
• earns \$10,000
• earns \$10,000
• earns \$1,000,000

Average = \$257,500

#### Group B

• earns \$20,000
• earns \$30,000
• earns \$30,000
• earns \$40,000

Average = \$30,000

Is Group A, with an average income of \$257,500, better off than Group B, with an average income of \$30,000? Group A's average income is more than eight times Group B's. But three quarters of Group A earn less than every member of Group B! Did the average really give you useful information? What measure would have been more informative than the average?
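One candidate answer to the last question is the median. As a rough sketch, using the incomes listed above and Python's standard `statistics` module, the two measures can be compared side by side:

```python
from statistics import mean, median

# Incomes as listed in Example 2.
group_a = [10_000, 10_000, 10_000, 1_000_000]
group_b = [20_000, 30_000, 30_000, 40_000]

for name, incomes in (("A", group_a), ("B", group_b)):
    print(
        f"Group {name}: mean {mean(incomes):,.0f}, "
        f"median {median(incomes):,.0f}"
    )
```

The mean of Group A is pulled up by a single large income, while its median stays at the level most of its members actually earn. Whether the median is the best summary here is still open for discussion.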

### Example 3: Economic Combined Effects

What happens to the numbers when you combine the two effects above?

| Group A | Group B |
|---|---|
| dead | earns \$10,000 |
| in jail | earns \$20,000 |
| earns \$10,000 | earns \$20,000 |
| earns \$1,000,000 | earns \$50,000 |

The average incomes are Group A: \$505,000, and Group B: \$25,000. The averages say that Group A earns about 20 times as much as Group B. Do you consider this claim accurate, considering that 3/4 of the members of Group B are better off than 3/4 of the members of Group A? What measure would have been more informative than the average?

Statisticians frequently talk about normalizing data, that is, correcting for intrinsic errors. How would you normalize this data to account for the dead and jailed persons being removed from the data set?
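As one possible starting point for that discussion, the sketch below keeps the removed members in the data set and counts them as earning nothing. Treating them as zero-income members is purely an assumption for illustration, and the helper name `summarize` is hypothetical; other conventions are defensible.

```python
from statistics import mean, median

# None marks a member removed from the reported data (dead or in jail).
group_a = [None, None, 10_000, 1_000_000]
group_b = [10_000, 20_000, 20_000, 50_000]

def summarize(incomes):
    """Compare the reported mean with summaries that keep every member."""
    earners = [x for x in incomes if x is not None]
    # Assumption for illustration: removed members count as earning 0.
    everyone = [0 if x is None else x for x in incomes]
    return {
        "reported_mean": mean(earners),
        "mean_all_members": mean(everyone),
        "median_all_members": median(everyone),
    }

for name, group in (("A", group_a), ("B", group_b)):
    print(f"Group {name}:", summarize(group))
```

Under this assumption Group A's mean is halved, and its median falls below Group B's, which is closer to the impression the raw membership lists give.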

### Example 4: Life expectancy - Remote Location

Imagine a 40-year-old pregnant woman, discouraged in life, retreating to a remote desert and dying immediately after labor, along with her newborn. Only two people, mother and child, have settled in this place, so it is easy to calculate the life expectancy: (40 + 0) / 2 = 20. If you go to that location, should you expect to die when you are 20?

### Example 5: Life Expectancy - Childhood Illness

Imagine a small town with a high infant mortality rate. The recorded deaths have occurred at these ages: ten infants have died in their first year, and four adults have died at the ages of 60, 70, 75, and 80. This town's life expectancy will be calculated as about 20. Should the young start worrying as they approach the age of 20?

Do life expectancy numbers describe something that any individual within that group should expect?
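The arithmetic behind Example 5 can be sketched as follows. It follows this page's simplification of life expectancy as the mean age at death, uses the four adult ages listed above, and records each infant death as age 0, which is an assumption.

```python
from statistics import mean

# Ten deaths in the first year, recorded as age 0 (assumption).
infant_deaths = [0] * 10
# The adult ages at death listed in Example 5.
adult_deaths = [60, 70, 75, 80]

all_deaths = infant_deaths + adult_deaths

print(f"Mean age at death, everyone: {mean(all_deaths):.1f}")
print(f"Mean age at death, those who survived infancy: {mean(adult_deaths):.2f}")
```

Nobody in this town died anywhere near the age of 20. The overall figure is a blend of two very different groups, and the second number describes what a person who survived infancy actually experienced.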

### Example 6: Generational Changes

How do these two generations compare?

| Generation 1 | Generation 2 |
|---|---|
| Parent of 1 earning \$60,000 | 1 grown child earning \$60,000 |
| Parent of 5 earning \$20,000 | 5 grown children each earning \$20,000 |
| Average income: \$40,000 | Average income: \$27,000 |

The average income dropped from \$40,000 to \$27,000 from one generation to the next, yet the offspring grew up to earn the same as their parents. Is it correct to say that incomes dropped? Is it correct to say that incomes stayed the same? How could this data be presented in a clearer way than average income?
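The drop in the average comes entirely from the change in head count at each income level, as this short sketch shows. The `(income, count)` representation and the helper name `average_income` are illustrative choices, not part of the original example.

```python
# (income, number of people at that income) for each generation
generation_1 = [(60_000, 1), (20_000, 1)]  # the two parents
generation_2 = [(60_000, 1), (20_000, 5)]  # their six grown children

def average_income(groups):
    """Weighted average: total income divided by total head count."""
    total = sum(income * count for income, count in groups)
    people = sum(count for _, count in groups)
    return total / people

print(f"Generation 1: {average_income(generation_1):,.0f}")
print(f"Generation 2: {average_income(generation_2):,.0f}")
```

The income levels are identical in both generations; only the weights change, because the lower-earning parent had more children.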

### Comparing real data to our examples

In many locations and times in history, life expectancy has been reported to be less than 30. How should we interpret this? Did most people really die by the time they were 30? How would this have affected families? How old would most children have been when their parents died? Who would have raised the children?

Many observers make powerful claims about the members of different groups after average incomes are compared. Do averages really represent the individuals? Imagine what the average income data will look like for any group that Bill Gates is a member of. During the first decade of the 21st century, the average income rose, but the median income stagnated. What did this mean?

How would you generalize these examples to other statistical data sets? What alternatives to averaging would be more informative? How would you represent the difference between the lowest and the average? Or the highest and the average? Do some biased data sets lend themselves well to normalizing? If so, how would you do it?