(Students) thinking like computer scientists

It generally isn't too difficult to program a computer to do exactly what you want it to do. This requires, however, that you know exactly what you want it to do. In the course of doing this, you make certain assumptions because you think you know beforehand what you want.

You set the thermostat to 68° because you think that will be warm enough. Then when you realize that it isn't, you turn it up, then down, and eventually settle on a temperature. This process requires you as a human to constantly sense your environment, evaluate the conditions, and adjust an input (such as turning the heat on or off) to improve them. It is a continuous process that requires constant input. While the computer can maintain room temperature pretty effectively, deciding whether that temperature is a good one is something it cannot do without human input.

The difficulty is figuring out exactly what you want. I can't necessarily say what temperature I want the house to be. I can easily say 'I'm too warm' or 'I'm too cold' at any given time. A really smart house would be able to take those simple inputs and figure out what temperature I want.

I had an idea a couple of years ago for a project exploring this. I could try to tell the computer, using levels of red, green, and blue, exactly what I thought would define something that looks 'green' to me. In reality, that's completely backwards. The way I recognize something as green never has anything to do with RGB, or hue or saturation - I look at it and say 'yes' or 'no'. Given enough data points of what is and is not green, the computer should be able to find the pattern itself.

With the things I've learned recently programming in Python, I was finally able to make this happen last night: a page with a randomly selected color presented on each load:
[Screenshot: the site presenting a randomly selected color]

Sharing the website on Twitter, Facebook, and email last night, I was able to get friends, family, and students hammering the website with their own perceptions of what green does and does not look like. When I woke up this morning, there were 1,500 responses. By the time I left for school, there were more than 3,000, and tonight when my home router finally went offline (as it tends to do frequently here) there were more than 5,000. That's plenty of data points to use.

I decided this was a perfect opportunity to get students finding their own patterns and rules for a classification problem like this. There was a clearly defined problem that was easy to communicate, and I had lots of real data to check a theoretical rule against. I wrote a Python program that would take an arbitrary rule, apply it to the entire set of 3,000+ responses from the website, and compare its classifications of green/not green to those of the actual data set. A perfect rule for the data set would correctly predict the human data 100% of the time.
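
Here's a minimal sketch of the idea (the file name and storage format are stand-ins, not necessarily what I actually used):

```python
import csv

def agreement(rule, responses):
    """Fraction of the human responses that a candidate rule reproduces."""
    matches = sum(1 for r, g, b, is_green in responses
                  if rule(r, g, b) == is_green)
    return matches / float(len(responses))

# Hypothetical storage: one response per row as r, g, b, is_green (1 or 0)
with open('responses.csv') as f:
    responses = [(int(r), int(g), int(b), int(label) == 1)
                 for r, g, b, label in csv.reader(f)]

# A deliberately naive first rule: 'green' whenever there's a lot of green
rule = lambda r, g, b: g > 200
print('Agreement: {0:.1%}'.format(agreement(rule, responses)))
```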

I was really impressed with how quickly the students got into it. I first had them go to the website and classify a string of colors as green or not green - some of them were instantly entranced by the unexpected therapeutic effect of clicking the buttons in response to the colors. I soon convinced them to move on to the more active role of trying to figure out their own patterns. I pushed them to the http://www.colorpicker.com website to choose several colors that clearly were green, and others that were not, and to try to identify a rule that described the RGB values for the green ones.

When they were ready, they started categorizing their examples and being explicit about the patterns they wanted to try. As they came up with their rules (e.g. green has the greatest level), we talked about writing them mathematically and symbolically - suddenly the students were quite naturally thinking about inequalities and how to write them correctly. (How often does that happen?) I showed them where the rule goes in my Python script, and soon they were telling me what to type.
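
As an example of that translation, 'green has the greatest level' becomes a pair of inequalities, which is just a one-line predicate in the script (an illustration, not the exact line we used):

```python
def greatest_level(r, g, b):
    # 'Green has the greatest level': g must beat both other channels
    return g > r and g > b
```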

In the end, they figured out that the difference between the green level and each of the other two channels was the important element, something I hadn't tried when I was playing with it on my own earlier in the day. They really got into it. We had a spirited discussion about whether G+40>B or G>B+40 is the correct way to compare the levels of green and blue.
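
The argument was worth having, because the two versions really are different rules: G > B + 40 says green must beat blue by more than 40 levels, while G + 40 > B only says green can't trail blue by more than 40. A quick check makes the difference concrete (the threshold of 40 is just the number from our discussion):

```python
def g_beats_b_by_40(r, g, b):
    return g > b + 40      # green exceeds blue by more than 40 levels

def g_near_or_above_b(r, g, b):
    return g + 40 > b      # green merely isn't more than 40 below blue

# A color where the two disagree: green above blue, but not by 40
print(g_beats_b_by_40(0, 130, 120))     # False
print(g_near_or_above_b(0, 130, 120))   # True
```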

In the end, their rule agreed with 93.1% of the human responses from the website, which beat my personal best of 92.66%. They clearly got a kick out of knowing that they had not only improved upon my answer, but that their logical thinking and mathematically defined rules did a good job of describing the judgments of thousands of people on this question. This was an abstract task, but they handled it beautifully - a tribute both to the simplicity of the task and to their own willingness to persist and figure it out. That's perplexity as it is supposed to be.

Other notes:

  • One of the most powerful applications of computers in the classroom is getting real data into students' hands - gobs of it. There is a visible level of satisfaction when students can talk about what they have done with thousands of data points that have meaning they understand.
  • I happened upon the perceptron learning algorithm on Wikipedia and was even more excited to find that the article included Python code for it. I tweaked that code to work with my data and had it train using just the first 20 responses to the website (a sketch of the idea appears after this list). Applying the resulting rule in the checking script I used with the students, it correctly predicted 88% of the human responses. That impresses me no end.
  • A relative suggested that I should have included a field on the front page for gender. While it might have cut down on the volume of responses, I'm kicking myself for not thinking to collect that sort of thing, just for analysis.
  • A student also pointed out that there are many other interesting bits of data that could be collected this way. First on her list was color-blindness: what does someone who is color blind actually see? Is it possible to use this concept to collect data that might help answer that question? She was genuinely curious about this, and I'm intrigued and excited by the level of interest she expressed.
  • I plan to take a deeper look at this data soon enough - there are a lot of different aspects of it that interest me. Any suggestions?
  • Anyone that can help me apply other learning algorithms to this data gets a beer on me when we can meet in person.
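
As mentioned in the perceptron note above, the core of that experiment fits in a few lines. Here's a sketch in the same spirit (not the Wikipedia code verbatim; the training colors below are invented for illustration, where the real run used the first 20 website responses):

```python
def train_perceptron(samples, epochs=100, rate=0.1):
    """Learn weights [bias, wr, wg, wb] from ((r, g, b), is_green) samples."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (r, g, b), is_green in samples:
            x = [1.0, r / 255.0, g / 255.0, b / 255.0]  # 1.0 feeds the bias weight
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            error = (1 if is_green else 0) - predicted
            # Nudge each weight toward the correct answer (error is -1, 0, or 1)
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
    return w

def classify(w, r, g, b):
    x = [1.0, r / 255.0, g / 255.0, b / 255.0]
    return sum(wi * xi for wi, xi in zip(w, x)) > 0

# Toy training colors, invented for illustration only
training = [((30, 200, 40), True), ((10, 150, 20), True),
            ((200, 30, 40), False), ((120, 110, 200), False)]
w = train_perceptron(training)
print(classify(w, 50, 180, 60))   # a clearly green color: should print True
```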