## Snacking on Statistics and Variability

One of my goals this year in Algebra 2 has been to include more discrete math, statistics, and probability when I can. I’ve been convinced by all sorts of smart people that as **traditional** as it may be to have Calculus as the ultimate goal for math students, statistics and probability are the math that people are more likely to need to use. That compels me to include them in my courses as more than a separate unit.

As if I didn’t need another reason, we are also in a spell of reviewing properties of radicals, and it’s refreshing to get my students thinking differently after a period of simplifying, multiplying, and rationalizing.

I gave them the following scenario:

- Imagine yourself in twenty years – you are, of course, rich and famous. You are hiring someone to fly your personal jet. Your last pilot fell asleep on the job, though he was luckily parked at the gate when it happened.
Two pilots have applied for the position, both equally qualified as pilots. In order to help you make your decision (and avoid the previous situation), you have asked them to keep track of how many hours of sleep they get over a two-week period before the interview.

Two weeks later, they return to you with the following data:

- What differences do you notice about the two pilots?
- What calculations would you make to describe any quantitative differences between them?
- Which one would you hire? Why?

Note: This data is completely made up. My new semi-obsession is in using normal distributions to mess up clean functions and force my students (in physics and math) to deal with messy data.
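For anyone who wants to try the same trick outside of GeoGebra, here is a minimal Python sketch of generating a noisy two-week sleep log from a normal distribution. The function name `noisy_sleep_log`, the means, the spreads, and the seeds are all assumptions chosen for illustration; they are not the values behind the data I gave my students.

```python
import random

def noisy_sleep_log(mean_hours, spread_hours, days=14, seed=None):
    """Return `days` made-up nightly sleep totals drawn from a normal distribution."""
    rng = random.Random(seed)  # a seeded generator keeps the log reproducible
    log = []
    for _ in range(days):
        hours = rng.gauss(mean_hours, spread_hours)
        # Round to the nearest quarter hour and clamp at zero so it reads like a sleep log.
        log.append(max(0.0, round(hours * 4) / 4))
    return log

# Hypothetical parameters: a consistent sleeper and an erratic one with a higher mean.
pilot_a = noisy_sleep_log(7.0, 0.3, seed=1)
pilot_b = noisy_sleep_log(7.8, 1.5, seed=2)
print(pilot_a)
print(pilot_b)
```

Changing the spread while leaving the mean alone is a quick way to produce data sets that look similar on average but behave very differently from night to night.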

The students almost immediately started calculating means – exactly what I would have expected them to do given what they have been taught to do when faced with a table of data like this. Some did so manually, others used the GeoGebra file that generated the data to make their calculation.

The results were fairly consistent – everyone chose the second pilot. When asked why, they said the pilot gets more sleep on average, and so would be the better choice.

When I asked who was more consistent in their sleep, they were easily able to identify the first pilot. When asked why, many had explanations that correctly danced around how most of the data was closer to the average. No students brought this up before I asked, though, which leads me to believe one of two things was going on:

- They decided that consistency doesn’t really matter given the difference in the means for how much sleep the pilots got.
- They didn’t think to look at consistency at all.
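The comparison between "more sleep on average" and "more consistent" can be made concrete with a mean and a standard deviation. The sketch below uses Python's `statistics` module on two hypothetical sleep logs; the numbers are invented for this example and are not the data my students worked with.

```python
from statistics import mean, stdev

# Hypothetical two-week sleep logs in hours (illustrative values only).
pilot_1 = [7.0, 7.25, 6.75, 7.0, 7.5, 6.75, 7.0,
           7.25, 7.0, 6.5, 7.25, 7.0, 6.75, 7.0]
pilot_2 = [9.5, 5.0, 8.0, 10.0, 6.0, 9.0, 7.5,
           4.5, 9.5, 8.5, 6.5, 10.0, 5.5, 9.0]

for name, log in (("Pilot 1", pilot_1), ("Pilot 2", pilot_2)):
    # stdev() is the sample standard deviation: the typical distance from the mean.
    print(f"{name}: mean = {mean(log):.2f} h, "
          f"std dev = {stdev(log):.2f} h, "
          f"shortest night = {min(log):.2f} h")
```

In this made-up case the second pilot wins on the mean, while the first pilot has the smaller standard deviation and never has a very short night, which is exactly the tension the hiring question is meant to raise.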

Some other interesting tidbits:

- None of the students thought to construct a histogram to look at the data. When asked, about half of the class said they knew how to construct a histogram. I didn’t dig any deeper to flesh this out. I was going to throw one together in GeoGebra, but decided that might be something we should look at with more time available.
- Half of the class that is taking AP Psychology didn’t think about finding standard deviation. Again, I didn’t dig any deeper to find out whether this was because they didn’t know that it might apply here, or because they thought the values of the means were more important.
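A histogram does not need much machinery. As a rough sketch, assuming one-hour bins and the same kind of invented sleep logs as before (the helper name `text_histogram` and all values are hypothetical), a plain-text version in Python might look like this:

```python
import math
from collections import Counter

def text_histogram(hours, bin_width=1.0):
    """Return one text line per bin, with a '#' for each night that falls in that bin."""
    # Assign each value to an integer bin index, e.g. 7.25 h -> bin 7 when bin_width is 1.
    counts = Counter(math.floor(h / bin_width) for h in hours)
    lines = []
    for i in range(min(counts), max(counts) + 1):
        start = i * bin_width
        lines.append(f"{start:4.1f}-{start + bin_width:4.1f} h | {'#' * counts.get(i, 0)}")
    return lines

# Hypothetical two-week sleep logs in hours (illustrative values only).
pilot_1 = [7.0, 7.25, 6.75, 7.0, 7.5, 6.75, 7.0,
           7.25, 7.0, 6.5, 7.25, 7.0, 6.75, 7.0]
pilot_2 = [9.5, 5.0, 8.0, 10.0, 6.0, 9.0, 7.5,
           4.5, 9.5, 8.5, 6.5, 10.0, 5.5, 9.0]

for name, log in (("Pilot 1", pilot_1), ("Pilot 2", pilot_2)):
    print(name)
    print("\n".join(text_histogram(log)))
```

Seeing one log pile up in two neighboring bins while the other smears across six or seven makes the consistency question hard to miss, even before any standard deviation is calculated.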

There is plenty here to generate discussion, but the one thing I wonder about is whether variation about a mean is a concept that students naturally consider when given a set of 1-D data. One of my professors mentioned offhand in an experimental design class that any measurement you take is a distribution, a point which I have never forgotten. Up to that moment, I had never really thought much about it either.

Sure, I had collected data in my biology, chemistry, and physics classes before and knew I had to take multiple data points. All I knew then was that doing so made my data “better”. More data makes things better. Get it? My understanding in high school science was also that you never measure the same quantity at the exact same value ten times in a row because someone in your lab group is always messing it up or doing it wrong. Averaging things together smooths that out. I don’t recall ever discussing in either math or science class that the true beauty of statistics comes from managing, communicating, and understanding variability in data that will never really go away. I have always shuddered when students write lab report conclusions that discuss how “the data are/is wrong because” rather than focusing on what the data reveals about an experiment. We definitely want to work to minimize experimental error, but sometimes the variation in the data is an important characteristic of what is being measured.

Maybe this is something that needs to be explicitly taught in the way we present statistics to our students. It seems like something that needs to be drawn out over time, rather than in one big statistics unit of a course that focuses on other things. I think using technology to handle the mechanics of calculating statistical quantities allows students to focus more on what the statistics say and develop their intuition about it. We risk letting the important ideas of variation and statistics collect dust and stagnate as another box of content for students to throw in the closet of their busy, distracted brains.