(Students) thinking like computer scientists

It generally isn’t too difficult to program a computer to do exactly what you want it to do. This requires, however, that you know exactly what you want it to do. In the course of doing this, you make certain assumptions because you think you know beforehand what you want.

You set the thermostat to be 68º because you think that will be warm enough. Then when you realize that it isn’t, you continue to turn it up, then down, and eventually settle on a temperature. This process requires you as a human to constantly sense your environment, evaluate the conditions, and change an input such as the heat turning on or off to improve them. This is a continuous process that requires constant input. While the computer can maintain room temperature pretty effectively, deciding whether the temperature is a good one or not is something that cannot be done without human input.

The difficulty is figuring out exactly what you want. I can’t necessarily say what temperature I want the house to be. I can easily say ‘I’m too warm’ or ‘I’m too cold’ at any given time. A really smart house would be able to take those simple inputs and figure out what temperature I want.

I had an idea for a project for exploring this a couple of years ago. I could try to tell the computer using levels of red, green, and blue exactly what I thought would define something that looks ‘green’ to me. In reality, that’s completely backwards. The way I recognize something as being green never has anything to do with RGB, or hue or saturation – I look at it and say ‘yes’ or ‘no’. Given enough data points of what is and is not green, the computer should be able to find the pattern itself.

With the things I’ve learned recently programming in Python, I was finally able to make this happen last night: a page with a randomly selected color presented on each load:
Screen Shot 2013-04-18 at 9.51.51 PM

Sharing the website on Twitter, Facebook, and email last night, I was able to get friends, family, and students hammering the website with their own perceptions of what green does and does not look like. When I woke up this morning, there were 1,500 responses. By the time I left for school, there were more then 3,000, and tonight when my home router finally went offline (as it tends to do frequently here) there were more than 5,000. That’s plenty of data points to use.

I decided this was a perfect opportunity to get students finding their own patterns and rules for a classification problem like this. There was a clearly defined problem that was easy to communicate, and I had lots of real data data to use to check a theoretical rule against. I wrote a Python program that would take an arbitrary rule, apply it to the entire set of 3,000+ responses from the website, and compare its classifications of green/not green to that of the actual data set. A perfect rule for the data set would correctly predict the human data 100% of the time.

I was really impressed with how quickly the students got into it. I first had them go to the website and classify a string of colors as green or not green – some of them were instantly entranced b the unexpected therapeutic effect of clicking the buttons in response to the colors. I soon convinced them to move forward to the more active role of trying to figure out their own patterns. I pushed them to the http://www.colorpicker.com website to choose several colors that clearly were green, and others that were not, and try to identify a rule that described the RGB values for the green ones.

When they were ready, they started categorizing their examples and being explicit in the patterns they wanted to try. As they came up with their rules (e.g. green has the greatest level) we talked about writing that mathematically and symbolically – suddenly the students were quite naturally thinking about inequalities and how to write them correctly. (How often does that happen?) I showed them where I typed it into my Python script, and soon they were telling me what to type.

rgbwork

In the end, they figured out that the difference of the green compared to each of the other colors was the important element, something that I hadn’t tried when I was playing with it on my own earlier in the day. They really got into it. We had a spirited discussion about whether G+40>B or G>B+40 is correct for comparing the levels of green and blue.

In the end, their rule agreed with 93.1% of the human responses from the website, which beat my personal best of 92.66%. They clearly got a kick out of knowing that they had not only improved upon my answer, but that their logical thinking and mathematically defined rules did a good job of describing the thinking of thousands of people’s responses on this question. This was an abstract task, but they handled it beautifully, both a tribute to the simplicity of the task and to their own willingness to persist and figure it out. That’s perplexity as it is supposed to be.

Other notes:

  • One of the most powerful applications of computers in the classroom is getting students hands on real data – gobs of it. There is a visible level of satisfaction when students can talk about what they have done with thousands of data points that have meaning that they understand.
  • I happened upon the perceptron learning algorithm on Wikipedia and was even more excited to find that the article included Python code for the algorithm. I tweaked it to work with my data and had it train using just the first 20 responses to the website. Applying this rule to the checking script I used with the students, it correctly predicted 88% of the human responses. That impresses me to no end.
  • A relative suggested that I should have included a field on the front page for gender. While I think it may have cut down on the volume of responses, I am hitting myself for not thinking to do that sort of thing, just for analysis.
  • A student also indicated that there were many interesting bits of data that could be collected this way that interested her. First on the list was color-blindness. What does someone that is color blind see? Is it possible to use this concept to collect data that might help answer this question? This was something that was genuinely interesting to this student, and I’m intrigued and excited by the level of interest she expressed in this.
  • I plan to take a deeper look at this data soon enough – there are a lot of different aspects of it that interests me. Any suggestions?
  • Anyone that can help me apply other learning algorithms to this data gets a beer on me when we can meet in person.

7 thoughts on “(Students) thinking like computer scientists

  1. Hi Evan, great idea indeed! I first found out about this from Dan’s website and I participated in the first round of the contest. When I realised that this was a real Machine Learning problem (with a complex/noisy target function) and a large number of “hidden” data points I was really excited! A few months ago I took Caltech’s online course “Learning From Data” which I really enjoyed (and worked hard for). “Do you know blue” was an excellent example to revisit some of the material and test my knowledge in a practical problem.
    I run the perceptron algorithm too, regression with non-linear transforms, and even support vectors machines (SVM) with RBF kernel (probably all these sound like gibberish to you). The main problem was the few data points given (30 points/colours to quickly test your rule). With 30 points you cannot hope for much in machine learning. I tried to gather more points by doing the tests again, but after a few days I was getting the same blue points. So at the end I had 140 points/colours of which 44 where blue.
    I run different non-linear regressions and even SVM. The best approach was regression with a cubic non-linear transform. It scored 2nd in the overall board and it seemed to come closer to 1st with more points added in the database. The SVM approach did not seem to work, as it was merely fitting the 140 points and would not generalize well to the overall database.
    The first contest ended and then the site opened again after a few days. I decided to play a bit more with it. I saw that the original points/colours were kept. So I entered the same cubic rule and it scored slightly better than before. Then I thought I’d try something major. You have a page (/blueis) that you give all the colors in the database in little boxes, separated in two regions blue and not blue. I saved these regions as images and then wrote a program to parse them and get r,g,b, values for all the different colours. So now I have the entire database (4090 points at the time). Now I could run the algorithm on the entire database and see what are the best results I could get. This is not really machine learning, it is more like fitting 🙂
    Curiously enough the cubic regression on the 4090 points works slightly worse than the cubic regression on the 140 points I initially had!
    Then I noticed some things that made me scratch my head even more.
    I tried a 4th degree plane to separate the blue from non blue, as I was expecting it would do better than cubic with 4090 points. It did. With my calculations it was giving me an accuracy of 93.4%. But when I applied the rule with your website I got 80%. I assumed 80% is what you get if your rule compute always false. So maybe my rule was not parsed/calculated correctly (it is a long rule afterall). I checked this assumption by entering an always-false rule (r=300) and noticed that although it was close, it was not identical.
    Moreover I start noticing that accuracy scores were changing in the standings *without* the number of colours changing. The number of responses are changing, but I assume, that the colours in the database are already fixed, i.e, when a color enters the database, it does not appear in the test and it does not change its blueness value.
    If you have thoughts on the last points I’d love to hear them.

  2. I found a mistake in my code, that explains the poor performance of the 4-degree plane. After correction, I should have seen 93.75%. Still the webpage give me 91.36%. My calculations are based on all 4304 colours currently in the database. Maybe there are rounding errors as many terms in my rule are very small (10^-9)

    1. This is very cool – thanks for your comments, and I’m sorry about the delay in responding. We’re in the final stages of our school year here.

      Your steps in actually using the machine learning algorithms from the CS class are interesting to me here – I’d like to know the details. What programming language did you use? (PLEASE let it be python!) I find it fascinating that these algorithms can figure out patterns so well when there are multiple variables as in this case.

      If I understand Dave’s programming structure correctly, the colors are all voted on 10 times before being entered into the database. When your rule is tested, it is tested against the 10 rules you voted on, and ten other colors from the database. Once a color has been voted on 10 times, its blueness percentage shouldn’t change. The changes you see in the percentage are from new colors entering the database after receiving 10 votes.

      I’ll need some more time to sift through your process, but your analysis is exactly the sort of thing I want to learn more about. Binary classification is really cool, and I like the idea of being able to train a learning algorithm from saying yes/no based on a set of examples. Thanks for participating!

      1. Yes, it’s python 🙂 But I am using a few libraries, like numpy and libsvm.
        I would be glad to share my code. I am not sure how readable it is though 🙂
        “Once a color has been voted on 10 times, its blueness percentage shouldn’t change. The changes you see in the percentage are from new colors entering the database after receiving 10 votes.”
        Yes, this is/was my understanding too, but it does not seem to add up. The standings page give you the total number of colours in the database. I am claiming that the accuracy changed *without* the total number of colours changing.
        Also there seem to be inaccuracies in the way the rules are evaluated, I left a comment at Dan’s blog (the last comment I believe).

Leave a Reply

Your email address will not be published. Required fields are marked *