Comments on: (Students) thinking like computer scientists

By: Do You Know Blue? | Mathy McMatherson

Do You Know Blue? | Mathy McMatherson — Tue, 23 Jun 2015 22:38:01 +0000

[…] http://evanweinberg.com/2013/04/19/students-thinking-like-computer-scientists/ […]

By: Bob

Bob — Fri, 21 Jun 2013 22:05:39 +0000

http://blog.xkcd.com/2010/05/03/color-survey-results/
Something vaguely related.

By: Thanassis

Thanassis — Mon, 10 Jun 2013 01:48:50 +0000

In reply to Evan Weinberg.

Yes, it’s Python 🙂 But I am using a few libraries, like NumPy and libsvm.
I would be glad to share my code. I am not sure how readable it is though 🙂
“Once a color has been voted on 10 times, its blueness percentage shouldn’t change. The changes you see in the percentage are from new colors entering the database after receiving 10 votes.”
Yes, this is/was my understanding too, but it does not seem to add up. The standings page gives you the total number of colours in the database. I am claiming that the accuracy changed *without* the total number of colours changing.
Also, there seem to be inaccuracies in the way the rules are evaluated; I left a comment on Dan’s blog (the last comment, I believe).

By: Evan Weinberg

Evan Weinberg — Mon, 10 Jun 2013 00:28:39 +0000

In reply to Thanassis.

This is very cool – thanks for your comments, and I’m sorry about the delay in responding. We’re in the final stages of our school year here.

Your steps in actually using the machine learning algorithms from the CS class are interesting to me here – I’d like to know the details. What programming language did you use? (PLEASE let it be python!) I find it fascinating that these algorithms can figure out patterns so well when there are multiple variables as in this case.

If I understand Dave’s programming structure correctly, the colors are all voted on 10 times before being entered into the database. When your rule is tested, it is tested against the 10 colors you voted on and ten other colors from the database. Once a color has been voted on 10 times, its blueness percentage shouldn’t change. The changes you see in the percentage come from new colors entering the database after receiving 10 votes.
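The testing procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the site's actual code: it assumes a database color counts as blue when more than 50% of its votes said blue, and the names `is_blue`, `score_rule` and `quick_test` are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Color = Tuple[int, int, int]
# A rule takes r, g, b and returns True for "blue".
Rule = Callable[[int, int, int], bool]

def is_blue(blueness_pct: float, threshold: float = 50.0) -> bool:
    """Hypothetical labeling: 'blue' if more than `threshold` percent of votes said blue."""
    return blueness_pct > threshold

def score_rule(rule: Rule, labeled: List[Tuple[Color, bool]]) -> float:
    """Fraction of labeled colors on which the rule agrees with the label."""
    hits = sum(rule(*rgb) == label for rgb, label in labeled)
    return hits / len(labeled)

def quick_test(rule: Rule, my_votes: List[Tuple[Color, bool]],
               database: List[Tuple[Color, float]], k: int = 10,
               seed: int = 0) -> float:
    """Score a rule on the user's own votes plus k random database colors."""
    rng = random.Random(seed)
    sample = [(rgb, is_blue(pct)) for rgb, pct in rng.sample(database, k)]
    return score_rule(rule, my_votes + sample)
```

For example, `quick_test(lambda r, g, b: b > r and b > g, my_votes, database)` would score a simple "blue channel dominates" rule.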

I’ll need some more time to sift through your process, but your analysis is exactly the sort of thing I want to learn more about. Binary classification is really cool, and I like the idea of training a learning algorithm to say yes/no based on a set of examples. Thanks for participating!
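Training an algorithm to say yes/no from examples is exactly what the perceptron mentioned in Thanassis's comment does. Here is a minimal sketch of the standard perceptron learning algorithm, assuming (r, g, b) inputs scaled to [0, 1] and labels of +1 for blue and -1 for not blue; it is an illustration, not Thanassis's code.

```python
from typing import List, Sequence

def perceptron(points: Sequence[Sequence[float]], labels: Sequence[int],
               max_epochs: int = 1000) -> List[float]:
    """Perceptron learning algorithm.

    points: feature vectors, e.g. (r, g, b) scaled to [0, 1]
    labels: +1 for 'blue', -1 for 'not blue'
    Returns weights [w0, w1, ...], where w0 is the bias term.
    """
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            xb = [1.0, *x]  # prepend a constant 1 so w0 acts as the bias
            s = sum(wi * xi for wi, xi in zip(w, xb))
            if (1 if s > 0 else -1) != y:
                # Misclassified: move the weights toward the correct side.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # every training point is now classified correctly
            break
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1
```

The loop only stops early when the examples can be split by a flat plane; noisy data like the blue votes usually cannot, which is why `max_epochs` caps the run.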

By: Thanassis

Thanassis — Fri, 31 May 2013 05:28:33 +0000

I found a mistake in my code, which explains the poor performance of the 4th-degree plane. After the correction, I should see 93.75%. Still, the webpage gives me 91.36%. My calculations are based on all 4304 colours currently in the database. Maybe there are rounding errors, as many terms in my rule are very small (around 10^-9).
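A quick magnitude check shows why such small numbers can still matter. Assuming r, g and b run from 0 to 255 (an assumption; the always-false rule r=300 mentioned in another comment suggests this scale) and that the 4th-degree rule contains monomials such as r^4:

```latex
r^4 \le 255^4 = 4\,228\,250\,625 \approx 4.2 \times 10^{9},
\qquad
10^{-9} \cdot 255^4 \approx 4.2
```

So a coefficient around 10^-9 can still shift the rule's value by a few units for saturated colors, enough to flip borderline predictions if the site rounds or truncates such numbers (whether it does is not known).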

By: Thanassis

Thanassis — Fri, 31 May 2013 04:21:02 +0000

Hi Evan, great idea indeed! I first found out about this from Dan’s website, and I participated in the first round of the contest. When I realised that this was a real Machine Learning problem, with a complex/noisy target function and a large number of “hidden” data points, I was really excited! A few months ago I took Caltech’s online course “Learning From Data”, which I really enjoyed (and worked hard for). “Do you know blue” was an excellent example to revisit some of the material and test my knowledge on a practical problem.
I ran the perceptron algorithm too, regression with non-linear transforms, and even support vector machines (SVM) with an RBF kernel (probably all of this sounds like gibberish to you). The main problem was how few data points were given (30 points/colours to quickly test your rule). With 30 points you cannot hope for much in machine learning. I tried to gather more points by doing the tests again, but after a few days I was getting the same blue points. So in the end I had 140 points/colours, of which 44 were blue.
I ran different non-linear regressions and even SVM. The best approach was regression with a cubic non-linear transform. It scored 2nd on the overall leaderboard, and it seemed to come closer to 1st as more points were added to the database. The SVM approach did not seem to work, as it was merely fitting the 140 points and would not generalize well to the overall database.
The first contest ended, and then the site opened again after a few days. I decided to play a bit more with it. I saw that the original points/colours were kept, so I entered the same cubic rule and it scored slightly better than before. Then I thought I’d try something major. You have a page (/blueis) that shows all the colors in the database in little boxes, separated into two regions: blue and not blue. I saved these regions as images and then wrote a program to parse them and get the r, g, b values for all the different colours. So now I have the entire database (4090 points at the time). Now I could run the algorithm on the entire database and see what the best results I could get were. This is not really machine learning; it is more like fitting 🙂
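A "non-linear transform" here means feeding the learner products of r, g and b instead of the raw values. As a sketch, assuming the transform keeps every monomial up to the chosen degree (the exact feature set Thanassis used isn't stated), a cubic transform could be built like this; `poly_features` is a hypothetical name.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List, Sequence

def poly_features(x: Sequence[float], degree: int = 3) -> List[float]:
    """All monomials of the inputs up to `degree`, including the constant 1.

    For x = (r, g, b) and degree 3 this yields 1 + 3 + 6 + 10 = 20 features:
    1, r, g, b, r*r, r*g, ..., b*b*b.
    """
    feats = []
    for d in range(degree + 1):
        # Each multiset of d input indices gives one monomial of degree d.
        for combo in combinations_with_replacement(range(len(x)), d):
            feats.append(prod(x[i] for i in combo))  # prod(()) == 1 for d == 0
    return feats
```

With 140 training colours, the cubic transform already means fitting 20 weights; a 4th-degree transform of (r, g, b) raises that to 35.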
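One common way to use regression as a yes/no classifier is to fit weights to +1/-1 labels by least squares and then classify by the sign of the fitted value. The sketch below assumes that approach (the comment doesn't spell out the exact method) and works on already-transformed feature vectors, such as the 20 cubic features of each colour. The small `ridge` term is an added assumption to keep the system solvable; all names are hypothetical.

```python
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        x[row] = (m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))) / m[row][row]
    return x

def fit_regression_classifier(features: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              ridge: float = 1e-9) -> List[float]:
    """Least squares on +/-1 labels: solve (Z^T Z + ridge*I) w = Z^T y."""
    k = len(features[0])
    ztz = [[sum(z[i] * z[j] for z in features) for j in range(k)] for i in range(k)]
    for i in range(k):
        ztz[i][i] += ridge
    zty = [sum(z[i] * y for z, y in zip(features, labels)) for i in range(k)]
    return solve(ztz, zty)

def classify(w: Sequence[float], z: Sequence[float]) -> int:
    """+1 ('blue') if the fitted value is positive, else -1."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1
```

Fitting only 140 points with a very flexible model (like an RBF-kernel SVM) can match the training set closely yet do worse on the full database, which is the overfitting the comment describes.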
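Parsing the saved /blueis regions can be sketched as collecting the distinct pixel colours of each image. This is a hypothetical reconstruction, assuming each box is a single solid colour, that everything else on the page (background, borders) uses colours you list in `background`, and that the images were saved losslessly (e.g. PNG, since JPEG would alter colours). Pixel tuples could come from an image library such as Pillow after `Image.open(path).convert("RGB")`; dropping colours that appear in both regions is an added design choice, not something the comment describes.

```python
from typing import Iterable, List, Set, Tuple

RGB = Tuple[int, int, int]

def distinct_box_colors(pixels: Iterable[RGB], background: Set[RGB]) -> Set[RGB]:
    """Distinct colours in a screenshot of colour boxes, ignoring background pixels."""
    return {p for p in pixels if p not in background}

def build_dataset(blue_pixels: Iterable[RGB], not_blue_pixels: Iterable[RGB],
                  background: Set[RGB]) -> List[Tuple[RGB, int]]:
    """Label colours +1 (blue region) or -1 (not-blue region)."""
    blue = distinct_box_colors(blue_pixels, background)
    not_blue = distinct_box_colors(not_blue_pixels, background)
    overlap = blue & not_blue  # same RGB in both regions: ambiguous, so drop it
    return ([(c, 1) for c in sorted(blue - overlap)]
            + [(c, -1) for c in sorted(not_blue - overlap)])
```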
Curiously enough the cubic regression on the 4090 points works slightly worse than the cubic regression on the 140 points I initially had!
Then I noticed some things that made me scratch my head even more.
I tried a 4th-degree plane to separate the blue from the non-blue, as I was expecting it would do better than cubic with 4090 points. It did. By my calculations it gave an accuracy of 93.4%. But when I applied the rule on your website I got 80%. I assumed 80% is what you get if your rule always computes false, so maybe my rule was not parsed/calculated correctly (it is a long rule, after all). I checked this assumption by entering an always-false rule (r=300) and noticed that its score, although close, was not identical.
Moreover, I started noticing that accuracy scores were changing in the standings *without* the number of colours changing. The number of responses is changing, but I assume that the colours in the database are already fixed, i.e., when a color enters the database, it does not appear in the test and its blueness value does not change.
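The always-false check works because a rule that never says "blue" scores exactly the fraction of colours labelled not blue. A minimal sketch, assuming boolean labels and r, g, b in 0 to 255 (so `r == 300` is never true); the function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

RGB = Tuple[int, int, int]
Rule = Callable[[int, int, int], bool]

def always_false_accuracy(labels: Sequence[bool]) -> float:
    """Accuracy of a rule that never says 'blue' = fraction labelled not blue."""
    return sum(1 for blue in labels if not blue) / len(labels)

def rule_accuracy(rule: Rule, colors: Sequence[RGB], labels: Sequence[bool]) -> float:
    """Fraction of colours where the rule's answer matches the label."""
    return sum(rule(*c) == blue for c, blue in zip(colors, labels)) / len(labels)

def predicts_any_blue(rule: Rule, colors: Sequence[RGB]) -> bool:
    """Sanity check: False means the rule acts like 'always false' on these colours."""
    return any(rule(*c) for c in colors)
```

On Thanassis's 140 hand-collected points, 44 of them blue, an always-false rule would score 96/140, about 68.6%; a site score of 80% for such a rule would imply that roughly 20% of the database is labelled blue. If a long rule scores almost exactly the baseline, `predicts_any_blue` is a quick way to check whether it is being evaluated as false everywhere.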
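One way to test the "colours are fixed" assumption is to parse the /blueis page on two different days and compare the snapshots. A hypothetical sketch, assuming each snapshot maps an (r, g, b) colour to its blue/not-blue region: if `added` and `removed` are empty but `flipped` is not, colours already in the database changed class.

```python
from typing import Dict, Set, Tuple

RGB = Tuple[int, int, int]

def diff_snapshots(before: Dict[RGB, bool],
                   after: Dict[RGB, bool]) -> Tuple[Set[RGB], Set[RGB], Set[RGB]]:
    """Return (added, removed, flipped) colours between two snapshots."""
    added = set(after.keys() - before.keys())
    removed = set(before.keys() - after.keys())
    # Colours present in both snapshots whose blue/not-blue label changed.
    flipped = {c for c in before.keys() & after.keys() if before[c] != after[c]}
    return added, removed, flipped
```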
If you have thoughts on the last points I’d love to hear them.

By: dy/dan » Blog Archive » Great Lessons: Evan Weinberg’s “Do You Know Blue?”

Thu, 23 May 2013 22:13:33 +0000

[…] Weinberg posted "(Students) Thinking Like Computer Scientists" a month ago and the lesson idea haunted me since. It realizes the promise of digital, networked […]