Scaling in Education

From today's New York Times article, The Broken Promises of Choice in New York City Schools.

"Ultimately, there just are not enough good schools to go around. And so it is a system in which some children win and others lose because of factors beyond their control — like where they live and how much money their families have."

The structures of education do not scale well. This is because good lessons, good classrooms, and good schools are all sourced from people, and people do not scale well. People cannot be copied. The human mind is exceedingly, beautifully complex - a fact that underlies the wonderful challenge of teaching. The talents, ideas, and experience of people who understand this reality are essential to making a school what it can be.

The work that must be done centers on building a culture that acknowledges and values the human basis of our profession. It takes energy and time from human beings to turn an empty room into a learning space. Budgeting for all of the costs of the inputs, financial or otherwise, is necessary to do this work.

Ideas scale easily because it costs virtually nothing to share them. Cultivating the relationships that are necessary to use those ideas to make opportunities for children needs to be our focus.

People matter. We should be skeptical of anyone who seeks to minimize this reality.

Probability, Spreadsheets, and the Citizen Database

I've grown tired of the standard probability questions involving numbers of red, blue, and green marbles. Decks of cards are culturally biased and require a lot of background information to get in the game, as I wrote about a while ago. It seems that if there's any place where computational thinking should come into play, it's with probability and statistics. There are lots of open data sets out there, but few of them are (1) easy to parse for what a student might be looking for and (2) in a form that allows students to easily make queries.

If you know of some that you've used successfully with classes, by all means let me know.

A couple of years ago, I built a web programming exercise to use to teach students about database queries. Spreadsheets are a lot more accessible though, so I re-wrote it to generate a giant spreadsheet of data for my Precalculus students to dig into as part of a unit on counting principle, probability, and statistics. I call it the Citizen Database, and you can access it here.

I wanted a set of data that could prompt all sorts of questions that could only be answered easily with a spreadsheet counting command. The citizens in the database can be described as follows:

  • Each citizen belongs to one of twelve districts, numbered 1 - 12.
  • Citizens are male or female.
  • Citizens have their ages recorded in the database. Citizens 18 and below are considered minors. Citizens older than 18 and younger than 70 are adults. All citizens aged 70 and above are called seniors.
  • Citizens each prefer one of the two sports teams: the Crusaders or the Orbiters.
  • If a citizen is above the age of 18, they can vote for Mayor. There are two families that always run for mayor: the Crenshaw family and the Trymenaark family.
  • Each citizen lives in either a home, apartment, villa, or mansion.
  • A citizen above the age of 18 also uses some type of vehicle for transportation. They may rent a car, own a car, have a limousine, or take a helicopter.
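For readers who would rather build a data set like this in Python, here is a minimal sketch that follows the rules listed above. It is not the code from the CodePen. The function names, the age range of 0 to 95, and the choice to pick every attribute uniformly at random are all assumptions made for illustration.

```python
import random

HOUSING = ["home", "apartment", "villa", "mansion"]
VEHICLES = ["rented car", "owned car", "limousine", "helicopter"]
TEAMS = ["Crusaders", "Orbiters"]
MAYOR_FAMILIES = ["Crenshaw", "Trymenaark"]


def age_group(age):
    """Classify an age using the cutoffs in the list above."""
    if age <= 18:
        return "minor"
    if age < 70:
        return "adult"
    return "senior"


def make_citizen(rng):
    """Build one random citizen. Uniform choices are an assumption."""
    age = rng.randint(0, 95)  # assumed age range
    over_18 = age > 18
    return {
        "district": rng.randint(1, 12),
        "gender": rng.choice(["male", "female"]),
        "age": age,
        "age_group": age_group(age),
        "team": rng.choice(TEAMS),
        # Only citizens above 18 vote for mayor or use a vehicle.
        "mayor_vote": rng.choice(MAYOR_FAMILIES) if over_18 else None,
        "housing": rng.choice(HOUSING),
        "vehicle": rng.choice(VEHICLES) if over_18 else None,
    }


def make_database(size, seed=0):
    """Return a list of citizens; a fixed seed makes the data repeatable."""
    rng = random.Random(seed)
    return [make_citizen(rng) for _ in range(size)]


citizens = make_database(1000)
```

Each dictionary corresponds to one row of the spreadsheet, so the list could be written out with the standard `csv` module and opened in any spreadsheet program.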

I wrote another document showing how to do queries on a spreadsheet of data using some commands here. My students asked for some more help on creating queries using the COUNTIFS command on Google Sheets, so I also created the video below.
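To show the kind of counting a multi-condition query does, here is a small Python sketch that mimics COUNTIFS on a made-up six-row table. The rows, the field names, and the `count_ifs` helper are all hypothetical. In a sheet with district in column A and team in column D (an assumed layout), the matching formula would look something like `=COUNTIFS(A:A, 1, D:D, "Crusaders")`.

```python
def count_ifs(rows, **criteria):
    """Count the rows that match every field=value pair, like COUNTIFS."""
    return sum(
        all(row[field] == wanted for field, wanted in criteria.items())
        for row in rows
    )


# Made-up sample rows, only for illustration.
sample = [
    {"district": 1, "team": "Crusaders", "housing": "home"},
    {"district": 1, "team": "Orbiters", "housing": "apartment"},
    {"district": 1, "team": "Crusaders", "housing": "villa"},
    {"district": 2, "team": "Crusaders", "housing": "home"},
    {"district": 2, "team": "Orbiters", "housing": "mansion"},
    {"district": 3, "team": "Orbiters", "housing": "home"},
]

in_district_1 = count_ifs(sample, district=1)
crusaders_in_district_1 = count_ifs(sample, district=1, team="Crusaders")

# Conditional probability: P(prefers Crusaders | lives in district 1).
p_crusaders_given_district_1 = crusaders_in_district_1 / in_district_1
```

Dividing one count by another is all it takes to turn a pair of queries into a conditional probability, which is the step students otherwise do by hand.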

The fun thing has been seeing students acknowledge the fact that answering these questions would be a really poor use of the human brain, particularly given how quickly the computer comes up with an answer. One student went so far as to call this side-trip into spreadsheet usage "really actually useful", a comment which I decided only to appreciate.

Programming in Javascript, Python, Swift, whatever is great, but it takes a while to get to the point where you can do something that is actually impressive. Spreadsheets are an easy way in to computational thinking, and they are already installed on most student (and teacher) computers. We should be using them more frequently than we probably are in our practice.

If you are interested in how I generated the database, you can check out the code here at CodePen:

See the Pen CitizenDatabaseCreator by Evan Weinberg (@emwdx) on CodePen.

Making Groups - A Genetic Algorithm Experiment

I've wanted to experiment with genetic algorithms for a long time, but never quite found the time to make it happen. I guess part of it was that I never believed the algorithm would work and find what I wanted.  I decided recently to pick this project back up after a long time (with some prodding from good people) and actually make it work. I think the major push came when I realized that I wanted the ability to make mixed groups, or homogeneous groups, and balance gender, and prevent certain students from being together. Was I over-constraining? Was the possibility that I was over-constraining keeping me from even trying to do this in the first place?

Thus, I decided to actually make this happen. I also decided I wanted to make it look much nicer than the Python version I've been using now for over four years.

You can see the code directly here at CodePen, or play with it below.

See the Pen GroupMaker by Evan Weinberg (@emwdx) on CodePen.

The basic algorithm is this:

  • Fill in a list of students with names, genders, skill levels, and an optional list of names with whom a given student should not be grouped. Line 2
  • Generate a bunch of random groups of students with these properties. For each group, calculate a series of metrics that determine the fitness of a given group. Lines 45-156
  • Calculate the score of a grouping, which consists of a full set of groups that contain all of the students of the class. Line 200
  • Generate a bunch of groupings, and sort them according to score. Take the top 10 groupings, and make swaps of students between groups to make a long list of new groupings. (This is lines 214-224.) This is the mutation step of the genetic algorithm. Sort them again according to score.
  • Repeat for a few generations, then take the top grouping.
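The steps above can be sketched in a few dozen lines of Python. This is not the CodePen code, and the line numbers in the list refer to that code, not to this sketch. The roster, the fixed group size of two, the scoring weights, and every function name here are assumptions chosen to keep the example short.

```python
import random

# Hypothetical roster: (name, gender, skill level 1-3, names to keep apart).
STUDENTS = [
    ("Ana", "F", 3, {"Ben"}),
    ("Ben", "M", 1, {"Ana"}),
    ("Cal", "M", 2, set()),
    ("Dee", "F", 1, set()),
    ("Eli", "M", 3, set()),
    ("Fay", "F", 2, set()),
    ("Gus", "M", 2, set()),
    ("Hoa", "F", 3, set()),
]
GROUP_SIZE = 2  # assumed; the original supports several group sizes


def random_grouping(rng):
    """Shuffle the class and cut it into groups that cover every student."""
    order = STUDENTS[:]
    rng.shuffle(order)
    return [order[i:i + GROUP_SIZE] for i in range(0, len(order), GROUP_SIZE)]


def group_score(group):
    """Reward mixed gender and mixed skill; penalize forbidden pairings."""
    names = {student[0] for student in group}
    conflicts = sum(len(student[3] & names) for student in group)
    genders = len({student[1] for student in group})
    skills = [student[2] for student in group]
    return genders + (max(skills) - min(skills)) - 10 * conflicts


def grouping_score(grouping):
    """Score a full grouping as the sum of its group scores."""
    return sum(group_score(group) for group in grouping)


def mutate(grouping, rng):
    """Copy a grouping and swap one student between two different groups."""
    new = [group[:] for group in grouping]
    a, b = rng.sample(range(len(new)), 2)
    i, j = rng.randrange(len(new[a])), rng.randrange(len(new[b]))
    new[a][i], new[b][j] = new[b][j], new[a][i]
    return new


def evolve(generations=30, population=40, keep=10, seed=1):
    """Keep the best groupings each generation and refill with mutants."""
    rng = random.Random(seed)
    pool = [random_grouping(rng) for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=grouping_score, reverse=True)
        parents = pool[:keep]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - keep)]
        pool = parents + children
    return max(pool, key=grouping_score)


best = evolve()
```

Because the top groupings are carried over unchanged into each new generation, the best score can never get worse from one generation to the next. Changing the weights in `group_score` is the place to ask for homogeneous groups instead of mixed ones.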

It's pretty fascinating to watch it work. I made the scoring step uniform for gender by tweaking the coefficients in Line 200. You could also make this score value gender balance, a range of abilities, or anything else.

This is screaming for a nicer UI, which I have in the works. For now, it's out there in its fairly un-commented state. If you want to hack this to use with your own students, you'll want to tweak the student list in Line 2, the numbers of groups of each size (groups of 1, groups of 2, groups of 3, and so on) in Line 242, and possibly the values I use for score generation in Line 200.

An Experiment: Swapping Numerical Grades for Skill-Levels and Emoji

I decided to try something different for my pre-Calculus class for the past three weeks. There was a mix of factors that led me to do this when I did:

  • The quarter ended one week, with spring break beginning at the end of the next. Not a great time to start a full unit.
  • I knew I wanted to include some conic sections content in the course since it appears on the SAT II, and since the graphs appear in IB and AP questions. Some familiarity might be useful. In addition, conic sections also appear as plus standards within CCSS.
  • The topic provides a really interesting opportunity to connect the worlds of geometry and algebra. Much of this connection, historically, is wrapped up in algebraic derivations. I wanted to use technology to do much of the heavy lifting here.
  • Students were exhibiting pretty high levels of stress around school in general, and I wanted to provide a bit of a break from that.
  • We are not in a hurry in this class.

Before I share the details of what I did, I have to share the other side to this. A long time ago, I was intrigued by the conversation started around the Twitter hashtag #emojigrading, a conversational fire stoked by Jon Smith, among many others. I like the idea of using emoji to communicate, particularly given my frustrations over the past year with how communicating grades as numbers distorts their meaning and implies precision that doesn't exist. Emoji can be used to communicate quickly, but can't be averaged.

I was also very pleased to find out that PowerSchool comments can contain emoji, and will display them correctly based on the operating system being used.

So here's the idea I pitched to students:

  • Unit 7 standards on conic sections would not be assessed with numerical grades, ever. As a result, these grades would not affect their numerical average.
  • We would still have standards quizzes and a unit exam, but instead of grades of 6, 8, and 10, there would be some other designation that students could help select. I would grade the quizzes and give feedback during the class, as with the rest of the units this year.
  • Questions related to Unit 7 would still appear on the final exam for the semester, where scores will be point based.

I also let students submit some examples of an appropriate scale. Here's what I settled on based on their recommendations:

I also asked them for their feedback before this all began. Here's what they said:

  • Positive Feedback:
    • Fourteen students made some mention of a reduction in stress or pressure. Some also mentioned that a less specific grade would be a good thing.
    • Three students talked about being able to focus more on learning as a result. Note that since I already use a standards based grading system, my students are pretty aware of how much I value learning being reflected in the grade book.
  • Constructive Feedback:
    • Students were concerned about their own motivation to study or reassess, knowing that the grades would not be part of the numerical average.
    • Some students were concerned about not having knowledge about where they are relative to the boundaries of the grades. Note: I don't see this by itself as a bad thing, but perhaps as the start of a different conversation. Instead of how to raise my grade, it becomes how I develop the skills needed to reach a higher level.
    • There were also mentions of 'objectivity' and how I would measure their performance relative to standards. I explained during class that I would probably do what I always do: calculate scores on individual standards, and use those scores to inform my decisions on standards levels. I was careful to explain that I wasn't going to change how I generate the standards scores (which students have previously agreed are fair) but how I communicate them.

I asked an additional question about what their parents would think about the change. My plan was to send out an email to all parents informing them of the specifics of the change, and I wanted students to think proactively about how their parents would respond. Their response in general: "They won't care much." This was surprising to me.

So I proceeded with the unit. I used a mix of direct instruction, some Trello style lists of tasks from textbooks, websites, and Desmos, and lots of circulating and helping students individually where they needed it. I tried to keep the only major change to this unit to be the communication of the scores through the grade book using the emoji and verbal designation of beginner, intermediate, expert. As I also said earlier, I gave skills quizzes throughout.

The unit exam was a series of medium level questions that I wanted to use to gauge where students were when everything was together. As with my other units, I gave a review class after the spring break where students could work on their own and in groups, asking questions where they needed it. Anecdotally, the class was as focused and productive as for any other unit this year.

I was able to ask one group some questions about this after their unit test, and here's how they responded:

The fact that the stress level was the same, if not less, was good to see. The effort level did drop in the case of a couple of students here, but for the most part, there isn't any major change. This class as a whole values working independently, so I'm not surprised that none reported working harder during this unit.

I also asked them to give me general feedback about the no-numerical-grades policy. Some of them deleted their responses before I could take a look, but here's some of what they shared:

  • Three students confirmed a lower stress level. One student explained that since there was no numerical grade, she "...couldn't force/motivate [her]self to study."
  • Five students said the change made little to no difference to them. One student summed it up nicely: "It wasn't much different than the numerical grades, but it definitely wasn't worse."
  • One student said this: "The emojis seemed abstract so I wasn't as sure of where I was within the unit compared to numbers." This is one of a couple of the students that had concerns about knowing how to move from one level to the next, so the unit didn't change this particular student's mind.


This was a really thought-provoking exercise. A move away from numerical grades is a compelling proposition, but a frequent argument against it is that grades motivate students. By no means have I disproven this claim with the results of my small case study. If a move like this can have a minimal effect on motivation, and students get the feedback they need to improve, it offers an opportunity for considering similar experiments in my other classes.

There are a couple of questions I still have on this. The first is whether students will choose to reassess on the learning standards from unit 7, given that they won't change the numerical average when we return to numerical grades for unit 8. The second involves the longer term retention of this material. How will students do on these questions when they appear on the final exam?

I'll return to this when I have more answers.


Trello for Class Organization

Our school hosted the Vietnam Technology Conference this past February.

(Yes, I'm just getting around to talking about it. Don't judge.)

One of the sessions I attended was about agile development in education, specifically as a way to organize the classroom into a room of independently functioning teams that are all trying to reach the goal of learning content. The full details on the philosophy can be found at [link]. I most certainly am not following the full implementation described there.

My interest was piqued by the possibility of using a Trello board to organize tasks for my classroom. I always make a digital handout for each class that consists of a series of tasks, links, problems, and chunks of information. Within the class block, I weave these together in a mix of direct instruction, group tasks, PearDeck activities, Desmos explorations, and so on. I advise students not to just do every problem on my handouts from start to finish because there is an order to my madness. I have a plan for students to go through the different activities, but I don't always clearly indicate that plan on these handouts.

This is where Trello came in. For my past two units in PreCalculus, I broke up the tasks on my digital handout into tasks on a Trello board. This consists of a list of tasks, and then three columns labeled 'to-do', 'in progress', and 'completed'.

I put students in groups, and then shared this Trello board here with them. Their group needed to make a Trello board for their group, and then copy the day's tasks onto their group's board. I told students how long a 'sprint' (an agile development term) was going to be, and the group would decide which tasks they would collectively (or individually) do during that time. They moved these tasks into the appropriate column of the board. As I continued to use the system, I realized that I could color code tasks according to the learning standards, and identify them according to the day of class. This helped students to understand the context of individual tasks later on.

The thing I liked the most about this experiment was that it actually enabled students to take charge of what they were doing during the lesson. I sometimes said that I was going to go over a concept at a particular time during the class block, and that teams could pay attention or not depending on their needs. This could be replaced by videos of this direct instruction to allow for more asynchronous learning for the students that weren't ready at that time. There were some great conversations between students about what they did and didn't understand. I could circulate and interject when I saw the need.

This put me in the position of curating interesting and productive tasks related to the course content, which is a lot more fun than planning a lecture. The students also liked being able to talk to each other and work at their own pace. Here's some feedback from the students:

What they liked:

  • "I think it was nice how I could do things by whatever pace I felt more comfortable with to an extent since we did things as a small group rather than as an entire class."
  • "It kept me working and thinking the whole class. It also helped me work out problems independently which helped my understanding."
  • "I liked the ability to keep track of all my work, as well as knowing all the problems beforehand. I also like being able to have more time to discuss with friends to understand how we each came up with various solutions."

What could be improved:

  • "Maybe I rather stick with traditional teaching methods. This is borderline self-taught and it's not so much better with group of people that I don't know well."
  • "I think it would be better to go through the theory and concepts of the standard first, meaning how to do a problem as a class before splitting into smaller groups for individual/team work."
  • "For future classes, I would also like informative videos to be included so that we can learn new topics this way."

This feedback made it easy to adjust for the next classes, and I continued to tweak in the next unit. The students really like the act of moving tasks between the different columns on the Trello board too. I really like the ease with which students can copy tasks, move them around, and plan their time independently. There are some good habits here that I'll be thinking about expanding to other classes later this semester or for the next school year.

Generating the Mandelbrot Set with PearDeck

One of the benefits of being a digital packrat (or, put more kindly, having a digital file cabinet) is that every old file can be a starting point for something new.

In PreCalculus, I decided to do a short conic sections unit to fill the awkward two weeks between the end of the third quarter and the start of spring break. We've centered all of our conversations around the idea of a locus of points. I realized yesterday afternoon that the Algebra 2 activity I described here would be a great way to have some inquiry and experimentation on the day before break.

The online collaborative tools have improved considerably since 2012 when I first did this. I put much of the lesson into Google Docs and PearDeck, which made sharing answers for the final reveal much easier. Here's what the students had for values that either "escaped" or were "trapped" in the complex plane:

I compared this to the pixelated Mandelbrot set I hacked together in Processing from Daniel Shiffman's code five years ago. Still works!
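For anyone who wants to check the "escaped" and "trapped" values by computer, here is a minimal Python sketch of the usual Mandelbrot test: start at zero, repeatedly square and add the constant c, and see whether the result ever leaves a circle of radius 2. It is not the Processing code mentioned above. The function names, the iteration limit of 50, and the text-grid dimensions are assumptions.

```python
def escapes(c, max_iterations=50, bound=2.0):
    """Return True if iterating z -> z*z + c from 0 ever gives |z| > bound."""
    z = 0j
    for _ in range(max_iterations):
        z = z * z + c
        if abs(z) > bound:
            return True
    return False  # treated as trapped after max_iterations steps


def render(width=40, height=15):
    """Build a pixelated text picture: '#' for trapped, '.' for escaped."""
    rows = []
    for r in range(height):
        imaginary = 1.2 - 2.4 * r / (height - 1)
        row = ""
        for col in range(width):
            real = -2.1 + 3.0 * col / (width - 1)
            row += "." if escapes(complex(real, imaginary)) else "#"
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

A value that has not escaped after the iteration limit is only assumed to be trapped, so points very close to the boundary can be misclassified when the limit is small.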

You can access the entire digital lesson with links as a Google Doc here.

My Reassessment Queue

We're almost at the end of the third quarter over here. Here's the current plot of number of reassessments over time for this semester:

I'm energized though that the students have bought into the system, and that my improved workflow from last semester is making the process manageable. My pile of reassessment papers grows faster than I'd like, but I've also improved the physical process of managing the paperwork.

While I'm battling performance issues on the site now that there's a lot of data moving around on there, the thing I'm more interested in is improving participation. Who are the students that aren't reassessing? How do I get them involved? Why aren't they doing so?

There are lots of issues at play here. I'm loving how I've been experimenting a lot lately with new ways of assessing, structuring classes, rethinking the grade book, and just plain trying new activities out on students. I'll do a better job of sharing out in the weeks to come.

SBG and Leveling Up, Part 3: The Machine Thinks!

Read the first two posts in this series here:

SBG and Leveling Up, Part 1
SBG and Leveling Up, Part 2: Machine Learning

...or you can read this quick review of where I've been going with this:

  • When a student asks to be reassessed on a learning standard, the most important inputs that contribute to the student's new achievement level are the student's previously assessed level, the difficulty of a given reassessment question, and the nature of any errors made during the reassessment.
  • Machine learning offers a convenient way to find patterns that I might not otherwise notice in these grading patterns.

Rather than design a flow chart that arbitrarily figures out the new grade given these inputs, my idea was to simply take different combinations of these inputs, and use my experience to determine what new grade I would assign. Any patterns that exist there (if there are any) would be determined by the machine learning algorithm.

I trained the neural network methodically. These were the general parameters:

  • I only did ten or twenty grades at any given time to avoid the effects of fatigue.
  • I graded in the morning, in the afternoon, before lunch, and after lunch, and also some at night.
  • I spread this out over a few days to minimize the effects of any one particular day on the training.
  • When I noticed there weren't many grades at the upper end of the scale, I changed the program to generate instances of just those grades.
  • The permutation-fanatics among you might be interested in the fact that there are 5*3*2*2*2 = 120 possibilities for numerical combinations. I ended up grading just over 200. Why not just grade every single possibility? Simple - I don't pretend to think I'm really consistent when I'm doing this. That's part of the problem. I want the algorithm to figure out what, on average, I tend to do in a number of different situations.
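The count of 120 can be checked by listing every combination. In the sketch below, the mapping of the factors to inputs is an assumption: five previous levels, three question difficulties, and a yes/no flag for each of the three error types. The specific level values 6 through 10 are also assumed.

```python
from itertools import product

PREVIOUS_LEVELS = [6, 7, 8, 9, 10]         # assumed: five possible levels
DIFFICULTIES = ["low", "middle", "high"]   # assumed: three difficulties
ERROR_FLAGS = [False, True]                # assumed: error present or not

# One tuple per scenario:
# (previous level, difficulty, conceptual, algebraic, arithmetic)
scenarios = list(product(
    PREVIOUS_LEVELS, DIFFICULTIES, ERROR_FLAGS, ERROR_FLAGS, ERROR_FLAGS
))

total = len(scenarios)  # 5 * 3 * 2 * 2 * 2
```

Grading just over 200 scenarios drawn from 120 possibilities means some combinations are graded more than once, which is what lets the inconsistencies average out.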

After training for a while, I was ready to have the network make some predictions. I made a little visualizer to help me see the results:

You can also see this in action by going to the CodePen, clicking on the 'Load Trained Data' button, and playing around with it yourself. There's no limit to the values in the form, so some crazy results can occur.

The thing that makes me happiest about the results is that there's nothing surprising about them.

  • Conceptual errors are the most important ones that limit students from making progress from one level to the next. This makes sense. Once a student has made a conceptual error, I generally don't let that student increase their proficiency level.
  • Students with low scores that ask for the highest difficulty problems probably shouldn't.
  • Students that have an 8 can get to a 9 by doing a middle difficulty level problem, but can't get to a 10 in one reassessment without doing the highest difficulty level problem. On the other hand, a student who is a 9 and makes a conceptual error on a middle difficulty problem is brought back to a 7.

When I shared this with students, the thing they seemed most interested in using it for was deciding what sort of problem they want for a given reassessment. Some students with a 6 have come in asking for the simplest level question so they can be guaranteed a rise to a 7 if they answer correctly. A lot of level 8 students want to become a 10 in one go, but often make a conceptual error along the way and are limited to a 9. I clearly have the freedom to classify these different types of errors as I see fit when a student comes to meet with me. When I ask students what they think about having this tool available to them, the response is usually that it's a good way to be fair. I'm pretty happy about that.

I'll continue playing with this. It was an interesting way to analyze my thinking around something that I consider to still be pretty fuzzy, even this long after getting involved with SBG in my classes.

SBG and Leveling Up - Part 2: Machine Learning

In my 100-point scale series last June, I wrote about how our system does a pretty cruddy job of classifying students based on raw point percentages. In a later post in that series, I proposed that machine learning might serve as a way to make sense of our intuition around student achievement levels and help provide insight into refining a rubric to better reflect a student's ability.

In my last post, I wrote about my desire to become more methodical about my process of deciding how a student moves from one standard level to the next. I typically know what I'm looking for when I see it. Observing students and their skill levels relative to a given set of tasks is often required to identify the level of a student. Defining the characteristics of different levels is crucial to communicating those levels to students and parents, and for being consistent among different groups. This is precisely what we intend to do when we define a rubric or grading scale.

I need help relating my observations of different factors to a numerical scale. I want students to know clearly what they might expect to get in a given session. I want them to understand my expectations of what is necessary to go from a level 6 to a level 8. I don't believe I have the ability to design a simple grid rubric that describes all of this to them though. I could try, sure, but why not use some computational thinking to do the pattern finding for me?

In my last post, I detailed some elements that I typically consider in assigning a level to a student: previously recorded level, question difficulty, number of conceptual errors, and numbers of algebraic and arithmetic errors. I had the goal of creating a system that lets me go through the following process:

  • I am presented with a series of scenarios with different initial scores, arithmetic errors, conceptual errors, and so on.
  • I decide what new numerical level I think is appropriate given this information. I enter that into the system.
  • The system uses these examples to make predictions for what score it thinks I will give a different set of parameters. I can choose to agree, or assign a different level.
  • With sufficient training, the computer should be able to agree with my assessment a majority of the time.
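The loop above (label some scenarios, then let the computer suggest a level for a new one) can be illustrated without a neural network. The sketch below swaps in a much simpler technique, a nearest-neighbor lookup over labeled examples, purely to show the idea of predicting from past decisions. It is not the React and neural network implementation in the CodePen. The example rows are made up, loosely following patterns described in this series, and the numeric encoding of difficulty as 1 to 3 is an assumption.

```python
# Each example: ((previous level, difficulty 1-3, conceptual errors,
#                 algebraic errors, arithmetic errors), new level).
# These rows are hypothetical training data, not real grades.
EXAMPLES = [
    ((6, 1, 0, 0, 0), 7),
    ((8, 2, 0, 0, 0), 9),
    ((8, 3, 0, 0, 0), 10),
    ((9, 2, 1, 0, 0), 7),
    ((8, 3, 1, 0, 0), 9),
]


def distance(a, b):
    """Squared distance between two scenarios, treating inputs as numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(examples, scenario, k=1):
    """Suggest a level by averaging the k most similar labeled examples."""
    nearest = sorted(examples, key=lambda ex: distance(ex[0], scenario))[:k]
    return round(sum(level for _, level in nearest) / len(nearest))


def train(examples, scenario, my_level):
    """Record my decision for a scenario so later suggestions can use it."""
    return examples + [(scenario, my_level)]


suggestion = predict(EXAMPLES, (8, 3, 0, 0, 0))
updated = train(EXAMPLES, (7, 2, 0, 1, 0), 8)
```

A lookup like this only echoes the closest examples it has seen, whereas a neural network tries to generalize a smoother pattern across all of them, so the two can disagree on scenarios far from the training data.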

After a lot of trial and error, more learning about React, and figuring out how to use a different machine learning library than I used previously, I was able to piece together a working prototype.

You can play with my implementation yourself by visiting the CodePen that I used to write this. The first ten suggested scores are generated by increasing the input score by one, but the next ten use the neural network to generate the suggested scores.

In my next post in this series, I'll discuss the methodology I followed for training this neural network and how I've been sharing the results with my students.

Exploring Dan Meyer's Boat Dock with PearDeck

In PreCalculus, I tend to be application heavy whenever possible. This unit, which has focused on analytic trigonometry, has been pretty high on the abstraction ladder. I try to emphasize right triangle trigonometry in nearly everything we do so that students have a way in, but that's still pretty abstract. I decided it was time to do something more on the application side.

Enter Dan Meyer's Boat Dock, a makeover concept he put together a year ago on his blog.

I decided to put some of it into Pear Deck to allow for efficient collection of student responses. The start of my activity was the same as what Dan suggested in his blog post:

After collecting the data, I asked students to clarify what they meant by 'best' and 'worst'. Student comments were focused on safety, cost, and limiting the movement of the ramp.

I shared that the maximum safe angle for the ramp was 18°, and then called upon PearDeck to use one of its best features to see what the class was thinking visually. I asked students to draw the best ramp.

After having them draw it, I had them calculate the length of the best ramp. This is where some of the best conflict arose. Not everyone responded, for a number of reasons, but the spread was pretty awesome in terms of stoking conversation. Check it out:

The source of some of the conflict was this commonly drawn triangle, which prompted lots of productive discussion.

When students built their safest ramp using the Boat Dock simulator, it prompted the modelling cycle to return to the start, which is always great to have the ability to do.

I then asked students to create a tool using a spreadsheet, program, or algorithm by hand for finding the safest ramp of least cost for every random length of the ramp in the simulator. This open-ended request led to a lot of students nodding their heads about concepts learned in their programming classes being applied in a new context. It also led to a lot of confusion, but productive confusion.
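One possible shape for such a tool is sketched below in Python. It rests on two assumptions that are not from the simulator itself: that cost grows with ramp length, so the cheapest safe ramp is the shortest one, and that the quantity given is the largest vertical drop the ramp has to span. The function names are made up for this example. Under those assumptions, right triangle trigonometry gives the shortest safe length as the drop divided by sin(18°).

```python
import math

MAX_SAFE_ANGLE_DEGREES = 18  # the maximum safe ramp angle shared in class


def shortest_safe_ramp(vertical_drop):
    """Shortest ramp whose angle is exactly the maximum safe angle."""
    return vertical_drop / math.sin(math.radians(MAX_SAFE_ANGLE_DEGREES))


def ramp_angle_degrees(ramp_length, vertical_drop):
    """Angle a ramp makes with the horizontal for a given vertical drop."""
    return math.degrees(math.asin(vertical_drop / ramp_length))


def is_safe(ramp_length, vertical_drop):
    """A ramp is safe if it is long enough to stay at or under 18 degrees."""
    if ramp_length < vertical_drop:
        return False  # the ramp cannot even reach
    angle = ramp_angle_degrees(ramp_length, vertical_drop)
    return angle <= MAX_SAFE_ANGLE_DEGREES + 1e-9


length_for_3_units = shortest_safe_ramp(3.0)
```

The same calculation fits in a single spreadsheet cell, which is part of why the task works with whatever tool a student is most comfortable using.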

This was a lot of fun - I need to do this more often. I say that a lot about things like this though, so I also hope I follow my own advice.