# Hacking the 100-point Scale - Part 4: Playing with Neural Networks

First, a review of where we've been in the series:

• The 100-point scale suffers from issues rooted in its historical use and the difficulty of communicating what it means.
• It might be beneficial to have a solid link between the 100-point scale (since it likely isn't going anywhere) and the idea of achievement levels. This does not need to be rigidly defined as 90-100 = A, 80-89 = B, and so on.
• I asked you to help me collect some data. I gave you a made-up rubric with three categories and three descriptors for each, and asked you to categorize sets of scores as achievement levels 1 through 4. Thank you to everyone who participated!

This brings us to today's post, where I try to bring these ideas together.

In case you only have time for a quick overview, here's the tl;dr:

I used the rubric scores you all sent me after the previous post to train a neural network. I then used that network to grade all possible rubric score combinations and generate achievement levels of 1, 2, 3, or 4.

Scroll down to the image to see the results.

Now to the meat of the matter.

Rubric design is not easy. It takes quite a bit of careful thought to decide on descriptors and point values, and much of the time we don't have a team of experts on the payroll to do this for us.

On the other hand, we're asked to make judgements on students all the time. These judgements are difficult and subjective at times. Mathematical tools like averages help reduce the workload, but they do this at the expense of reducing the information available.

The data you all gave me was the result of educational judgement, and that judgement comes from what you prioritize. In the final step of my Desmos activity, I asked what you typically use to relate a rubric score to a numerical grade. Here are some of the responses.

From @aknauft:

I need to see a consistent pattern of top rubric scores before I assign the top numerical grade. Similarly, if the student does *not* have a consistent set of low rubric scores, I will *not* give them the low numerical grade.
Here specifically, I was looking for:
3 scores of 1 --> skill level 1
2 scores of 2 or 1 score of 3 --> skill level 2 or more
2 scores of 3 --> skill level 3 or more
3 scores of 3 --> skill level 4
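
For concreteness, aknauft's pattern could be sketched as a small function. This is my reading of the rule, not aknauft's own code; cases the rule leaves unstated (such as two 1s and a 2) fall through to level 1 here.

```javascript
// aknauft's rule as code (my interpretation, names mine).
// scores: an array of three rubric scores, each 1-3.
function aknauftLevel(scores) {
  const count = n => scores.filter(s => s === n).length;
  if (count(3) === 3) return 4;                 // three 3s -> level 4
  if (count(3) >= 2) return 3;                  // two 3s -> level 3 or more
  if (count(2) >= 2 || count(3) >= 1) return 2; // two 2s or one 3 -> level 2 or more
  return 1;                                     // e.g. three 1s -> level 1
}

console.log(aknauftLevel([1, 1, 1])); // 1
console.log(aknauftLevel([3, 3, 3])); // 4
```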

From Will:

Sum 'points'
3 or 4 points= 1
5 or 6 points = 2
7 points= 3
8 or 9 points = 4
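
Will's mapping is even easier to automate, since it depends only on the sum. Here's a sketch (function name mine, not Will's):

```javascript
// Will's sum-of-points mapping as code.
// scores: three rubric scores of 1-3 each, so sums run from 3 to 9.
function willLevel(scores) {
  const sum = scores.reduce((a, b) => a + b, 0);
  if (sum <= 4) return 1;  // 3 or 4 points
  if (sum <= 6) return 2;  // 5 or 6 points
  if (sum === 7) return 3; // 7 points
  return 4;                // 8 or 9 points
}

console.log(willLevel([1, 2, 3])); // sum of 6 -> level 2
```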

From Clara:

1 is 60-70
2 is 70-80
3 is 80-90
4 is 90-100
However, 4 is not achievable based on your image.
Also to finely split each point into 10 gradients feels too subjective.
Equivalency to 100 (proportion) would leave everyone except those scoring 3s on the 4-point scale failing.

Participant Paul also shared a helpful set of percentages that directly relate the 1 - 4 scale to numerical grades, perhaps drawn from his school's grading policy. I'd love to know more. Dennis (on the previous post) commented that a multi-component analysis should be done to set the relative weights of the different categories. I agree that this is important and that it can easily be done in a spreadsheet. The difficulty is setting the weights.

Assigning grades using percentages saves time, and its long history makes it feel easy. Generating scales as the contributors above did is helpful for relating how a student did on a task to their level. My suggestion is that the percentages we use for achievement levels should be an output of the rubric design process, not an input. In other words, we've got it all backwards.

I fed the data you all gave me into a neural network - a way of teaching a computer to make decisions based on a set of example data. I wanted the network to learn how you all thought a particular set of rubric scores should relate to achievement level, and then see how the network would score the sets of rubric scores you hadn't graded.

Based solely on the six example grades I asked you to give, here are the achievement levels the neural network spit out:

I was impressed with how the network scored the twenty-one permutations (out of 27 possible) that you didn't score. It might not be perfect, and you might not agree with every one. The amazing part of this process, however, is that any result you disagree with could be tagged with the score you prefer, and the network retrained on that additional data. You (or a department of teachers) could go through this process and train your own rubric fairly quickly.
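
For readers curious what "training" actually involves, the shape of the process looks roughly like the sketch below. This is a toy one-hidden-layer network in plain JavaScript, with made-up example labels standing in for the reader-submitted data - not the actual library or data I used.

```javascript
// Toy network: 3 rubric scores in, 4 achievement levels out.
const sigmoid = x => 1 / (1 + Math.exp(-x));

// Seeded pseudo-random initial weights so runs are repeatable.
let seed = 42;
const rand = () => {
  seed = (seed * 1103515245 + 12345) % 2147483648;
  return seed / 2147483648 - 0.5;
};

const IN = 3, HID = 6, RATE = 0.5, EPOCHS = 8000;
const w1 = Array.from({ length: HID }, () => Array.from({ length: IN + 1 }, rand));
const w2 = Array.from({ length: 4 }, () => Array.from({ length: HID + 1 }, rand));

function forward(scores) {
  const x = scores.map(v => v / 3); // scale 1-3 scores into (0, 1]
  const h = w1.map(row => sigmoid(row[IN] + x.reduce((s, v, i) => s + v * row[i], 0)));
  const o = w2.map(row => sigmoid(row[HID] + h.reduce((s, v, i) => s + v * row[i], 0)));
  return { x, h, o };
}

const examples = [ // [scores, level] pairs -- hypothetical labels
  [[1, 1, 1], 1], [[1, 2, 2], 2], [[2, 2, 2], 2],
  [[2, 3, 2], 3], [[3, 3, 2], 3], [[3, 3, 3], 4],
];

// Train: backpropagate squared error through the two sigmoid layers.
for (let e = 0; e < EPOCHS; e++) {
  for (const [scores, level] of examples) {
    const target = [0, 0, 0, 0];
    target[level - 1] = 1;
    const { x, h, o } = forward(scores);
    const dOut = o.map((v, k) => (target[k] - v) * v * (1 - v));
    const dHid = h.map((v, j) =>
      v * (1 - v) * dOut.reduce((s, d, k) => s + d * w2[k][j], 0));
    dOut.forEach((d, k) => {
      h.forEach((v, j) => { w2[k][j] += RATE * d * v; });
      w2[k][HID] += RATE * d;
    });
    dHid.forEach((d, j) => {
      x.forEach((v, i) => { w1[j][i] += RATE * d * v; });
      w1[j][IN] += RATE * d;
    });
  }
}

function classify(scores) {
  const { o } = forward(scores);
  return o.indexOf(Math.max(...o)) + 1; // achievement level 1-4
}

// Label all 27 possible score permutations, including the unlabeled ones.
for (let a = 1; a <= 3; a++)
  for (let b = 1; b <= 3; b++)
    for (let c = 1; c <= 3; c++)
      console.log(`${a},${b},${c} -> level ${classify([a, b, c])}`);
```

Retraining on corrections is then just a matter of appending the corrected pairs to `examples` and running the loop again.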

I was also curious about the sums of the scores that led to a given achievement level. This is after all what we usually do with these rubrics and record in the grade book. I graphed the rounded results in Desmos. Achievement level is on the vertical axis, and sum is on the horizontal.

One thing that struck me is the fuzziness around certain sum values. A sum of 6, for example, leads to a 1, 2, or a 3. I thought there might be some clear sum values that could serve as good thresholds for the different levels, but this isn't the case. This means that simply taking the percentage of points earned and scaling into the ten-point ranges for A, B, C, and D removes some important information about what a student actually did on the rubric.

A better way to translate these rubric scores might be to simply give numerical grades that indicate the levels, and communicate the levels that way as part of the score in the grade book. "A score of 75 indicates the student was a level 2."

Where do we go from here? I'm not sure. I'm not advocating that a computer do our grading for us. Along the lines of many of my posts here, I think the computer can help alleviate some of the busy work and increase our efficiency. We're the ones saying what's important. I built another data set where I went through the same process, but treated the third category as less important than the other two. Here's the result of using that modified training data:

It's interesting how this changed the results, but I haven't dug into them very deeply.

I just know that something needs to change. I had students come to me after final exam grades were put in last week (which, by the way, were raw percentage grades), confused about what their grades meant. The floor for failing grades is a 50, and some students interpreted this to mean that they started with a 50, and that any additional points they earned were added on top of that grade. In fact, I use the 50 as a floor, meaning that a 30% raw score is listed as a 50% in the final exam grade. We need to improve our communication, and there's a lot of work to do if the scale isn't going away.
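
To be explicit about the mechanics the students misread: the floor is a clamp on the reported grade, not a head start. In code (names mine):

```javascript
// The grade floor as I apply it: raw percentages below 50 are reported
// as 50; everything at or above 50 is reported unchanged.
const reportedGrade = raw => Math.max(50, Math.round(raw));

console.log(reportedGrade(30)); // 50 -- not 50 + 30
console.log(reportedGrade(83)); // 83
```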

I'm interested in the idea of a page that would let you train any rubric of any size through a series of clicks. What thoughts do you have at the end of this exploration?

#### Technical Details:

I used the Javascript implementation of a neural network here to do the training. The visualizations were all made using the Raphael JS library.

# Rubrics and Numerical Grades - Hacking the 100-Point Scale, Part 3

As part of thinking through my 100-point scale redesign, I'd like you to share some of your thoughts on a rubric scenario.

Rubrics are great for how they clearly classify different components of assessment for a given task. They also use language that, ideally, gives students the feedback to know what they did well, and where they fell short on that assessment. Here's an example rubric with three performance levels and three categories for a generic assignment:

I realize some of you might be craving some details of the task and associated descriptors for each level. I'm looking for something here that I think might be independent of the task details.

The student shown above has scores of 1, 2, and 3 respectively for the three categories on this assignment, and all three categories are equally important. Suppose also that in my assessment system, I need to identify a student as being a 1, 2, 3, or 4 in the associated skills based on this assessment.

More generally, I want to be able to take a set of three scores on the rubric and generate a performance level of the student that earned them. I'd like to get your sense of classifying students into the four levels this way.

Here are the rubrics I'd like your help with:

I've created a Desmos Activity using Activity Builder to collect your thoughts. I chose Activity Builder because (a) Desmos is awesome, and (b) the internet is keeping me from Google Docs.

You can access that activity here.

I'll be using the results as an input for a prototype idea I have to make this process a bit easier for all involved. Thanks in advance!

# Hacking the 100-Point Scale - Part 2

My previous post focused on the main weakness of the 100-point scale: the imprecision with which it is defined. Is it the percentage of material mastered? The percentage of homework completed? Total points earned? It might be all of these things, or none of them, depending on the details of one person's grade book.

Individual departments or schools might try to define uniformity in grading policies, give common final assessments, or spread the grading of final exams among all teachers to ensure fairness. This might make it easier to compare two students across a course, but it still does not clearly define what the grade means. What does it signify, for example, that a student in an AP course has an 80 while a student in a regular section of the same course has a 90?

Part of the answer here is based in curriculum. A clear picture of what students are learning, and in what order, would add some needed information for comparing the AP and regular students just mentioned. The other part is assessment: a well-crafted assessment policy based in learning objectives and communicated to students helps them understand their progress during the school year. I hope it goes without saying that both components must be present for a teacher to craft and communicate a measure of student learning that students, teachers, parents, and administrators can understand.

At this point, I think the elementary teachers have the right idea. I've been in two different school systems now that use a 1 - 4 scale for different skills, with clear descriptors that signify the meaning of each level. Together with detailed written comments, these can paint a picture of what knowledge, skills, and understanding a student has developed during a block of the school year. These levels might describe the understanding of grade level benchmarks using labels such as limited, basic, good, and thorough understanding. These might classify a student using the state of their progress with terms like novice/beginner/intermediate/advanced. The point is that these descriptors are attached to a student and ideally are assigned after reviewing the learning that the student has done over a period of time. I grant that the language can be vague, but this also demands that a teacher must put time into understanding the criteria at his or her school in order to assign grades to a particular student.

When it comes to the 100-point scale, it's all too easy to avoid this deliberate process. I can report assignments as a series of total point values, and then report a student's grade as a percentage of the total using grade book software. Why is a student failing? He didn't earn enough points. How can he do better? Earn more points. How can he do that? Bonus assignments, improving test scores, or developing better work habits. The ease of generating grades cheapens the deliberate process that may (or may not) have been involved in generating them. Some of the imprecision of the meaning of this grade comes, ironically, from an assumption that the precision of a numerical grade makes it a better indicator. It actually requires more of the teacher to define the components of the grade clearly using numerical indicators, and defining these in a way that avoids unintended consequences takes a lot of work to get right.

Numerical grades inform a student's progress, but don't tell the whole story. The A-B-C-D-F grading system hasn't been in use in any of the schools where I've taught, but it escapes some of the baggage of the numerical grade in that it requires that the school report somehow what each letter grade represents. An A might be mapped from a 90-100% average in the class, or 85-100 depending on the school. As with a verbal description, there needs to be some deliberate conversation and communication about the meaning of those grades, and this process opens the door for descriptors for what grades might represent. Numerical grades on the 100 point scale lack this specificity because grades on this scale can be generated with nothing more than a calculation. That isn't to say that a teacher can't put in the time to make that calculation meaningful, but it does mean it's easy to give the impression of precision that isn't there.

Compounding the challenge of its imprecision is the reality that we use this scale for many purposes. Honor roll or merit roll are often based in having a minimum average over courses taken in a given semester. Students on probation, often measured by having a grade below a cut-off score, might not be able to participate in sports or activities. Students with a given GPA have automatic admission to some universities.

I'm not proposing breaking away from grading, and I don't think the 100-point scale is going away. I want to hack the 100-point scale to do a better job of what it is supposed to do. While technology makes it easier to generate a grade than it used to be, I believe it also provides opportunities to do things that weren't feasible for a teacher in the past. We can improve the process of generating a grade that measures learning, and of communicating that measure to all stakeholders.

Some ideas on this have been brewing as I've started grading finals and packing for the end of the year. Summer is a great time to reflect on what we do, isn't it?

# Hacking The 100-Point Scale - Part 1

One highlight of teaching at an international school is the intersection of many different philosophies in one place. The most striking example is students comparing their experiences. It's impressive how quickly experienced students who have moved around learn the system of the school they are currently attending and adjust accordingly. What unites these particularly successful students is their awareness that they must understand the system they are in if they are to thrive there.

The same is true of teachers, as we share with each other just as much. We discuss different school systems and structures, traditions, and assessment methods. Identifying the similarities and differences is an engaging exercise. In general, these conversations lead to a better understanding of why we do what we do in the classroom, and they often end with specific ideas for what we might do differently in our next meeting with students.

There is one important exception. No single conversation topic has caused more argument, debate, and unresolved conflict at the end of a staff meeting than the use of the 100-point scale.

The reason it's so prevalent is that it's easy to use. Multiply the total points earned by 100, and then divide by the total possible points. What could go wrong with a system that has been used for so long by so many?

There are a number of conversation threads that have been particularly troublesome in our international context, and I'd like to share one here.

### "A 75 isn't a bad score."

For a course that is difficult, this might be true. Depending on the Advanced Placement course, you can earn the top score of 5 on the exam by earning anywhere between around 65% and 100% of the possible points. The International Baccalaureate exams work the same way. I took a modern physics exam during university on which I earned a 75 right on the nose. The professor said that considering the content, that was excellent, and that I would probably end up with an A in the course.

The difference between these courses and typical school report cards is that the International Baccalaureate Organization (IBO), College Board, and college professor all did some sort of scaling to map their raw percentages to what shows up on the report card. They have specific criteria for setting up the scaling that goes from a raw score to the 1 - 5 or 1 - 7 scores for AP or IB grades respectively.

What are these criteria? The IBO, to its credit, has a document that describes what each score indicates about a student with remarkable specificity. Here is their description of a student that receives a score of 3 in mathematics:

Demonstrates some knowledge and understanding of the subject; a basic sense of structure that is not sustained throughout the answers; a basic use of terminology appropriate to the subject; some ability to establish links between facts or ideas; some ability to comprehend data or to solve problems.

Compare this to their description of a score of 7:

Demonstrates conceptual awareness, insight, and knowledge and understanding which are evident in the skills of critical thinking; a high level of ability to provide answers which are fully developed, structured in a logical and coherent manner and illustrated with appropriate examples; a precise use of terminology which is specific to the subject; familiarity with the literature of the subject; the ability to analyse and evaluate evidence and to synthesize knowledge and concepts; awareness of alternative points of view and subjective and ideological biases, and the ability to come to reasonable, albeit tentative, conclusions; consistent evidence of critical reflective thinking; a high level of proficiency in analysing and evaluating data or problem solving.

I believe the IBO uses statistical and norm-referenced methods to determine the cut scores between certain score bands. I'm also reasonably sure the College Board has a similar process. The point, however, is that these bands are determined so that a given score matches its published description.

The college professor used his professional judgement (or a bell curve, I don't actually know) to make his scaling. This connects the raw score to the 'A' on my report card that indicated I knew what I was doing in physics.

The reason this causes trouble in discussions of grades at our school, and I imagine at other schools as well, is the much murkier meaning of percentage grades on the report card. Put quite simply, does a 90% on the report card mean the student has mastered 90% of the material? Completed 90% of the assignments? Behaved appropriately 90% of the time? If there are different weights assigned to categories of assignments in the grade book, what does an average of 90% mean?

This is obviously an important discussion for a school to have. Understanding the meaning of individual percentage grades and what they indicate about student learning should be clear to administrators, teachers, parents, and most importantly, the students themselves. This is a tough conversation.

Who decided that 60% is the percentage of the knowledge I need to get credit? On a quiz on tool safety in the maker space, is 60% an appropriate cut score for someone to know enough? I say no. On the report card, I'd indicate that a student has a 50 as their grade until they demonstrate they can get 100% of the safety questions correct. Here, however, I've redefined the grade in the grade book as being different from the percentage of points earned. In other words, I've done the work of relating a performance measure to a grade indicator. These should not be assumed to be the same thing, but being explicit about this requires a conversation defining it to be the case, and communication of that definition to students and to teachers sharing sections of the same course.

Most of the time, I don't think there is time for this conversation to happen, which is the first reason I believe this issue exists. The second is the fact that a percentage calculation is mathematically simple and understood as a concept by students, teachers, and parents alike. Grades have been done this way for so long that a grade on the 100-point scale is generally assumed to be this percentage-mastered or percentage-completed concept.

This is too important to be left to assumption. I'll share more about the dangers of this assumption in a future post.

# Building Functions - Thinking Ahead to Calculus

My ninth graders are working on building functions and modeling in the final unit of the year. There is plenty of good material out there for doing these tasks as a way to master the Common Core standards that describe these skills.

I had a sudden realization that a great source for these types of tasks might be my Calculus materials. Related rates, optimization, and applications of integrals in a Calculus course generally require students to write models of functions and then apply their differentiation or integration knowledge to arrive at a result. The first step in these questions usually involves writing a function, with subsequent question parts requiring Calculus methods to be applied to that function.

I dug into my resources for these topics and found that these questions might be excellent modeling tasks for the ninth grade students if I simply pull out the steps that require Calculus. Today's lesson using these adapted questions was really smooth, and felt good from a vertical planning standpoint.

I could be late to this party. My apologies if you realized this well before I did.

# Problems vs. Exercises

My high school mathematics teacher, Mr. Davis, classified all learning tasks in our classroom into two categories: problems and exercises. The distinction between the two is pretty simple. Problems set up a non-routine mathematical conflict. Once that conflict is resolved, problems cease to be problems - they become exercises. Exercises tend to develop content skills or the application of knowledge; problems serve to develop one's habits of mathematical practice and understanding.

I tend to give a mixture of the two types to my students. The immediate question in an assessment context is whether my students have a particular skill or can apply concepts. Sometimes this can be established by doing several problems of the same or similar type. This is usually the situation when students sign up for a reassessment on a learning standard. In cases where I believe my students have overfit their understanding to a particular question type, I might throw them a problem - a new task that requires higher levels of understanding. I might also give them a task that I know is similar to a question they had wrong last time, with a twist. What I have found over time is that there needs to be a difference between what I give them on a subsequent assessment, or I won't get a good reading on their mastery level.

The difficulty I've encountered over the past few years learning to use SBG (standards-based grading) has been curating my own set of problems and exercises for assessment. I have textbooks, both electronic and hard copy, and I've noted down the locations of good problems in analog and digital forms. I've always felt the need to guard these and not share them with students so that they don't become exercises. My sense is that good problems are hard to find. Good exercises, on the other hand, are all over the place. This also means that if I've given Student A a particular problem, I have to find an entirely different one for Student B in case the two pool their resources. In other words, Student A's problem becomes Student B's exercise. I haven't found that students end up thinking that way, but I still feel weird about using the same problem multiple times.

What I've always wanted was a source of problems that somehow straddled the two categories. I want to be able to give Student A a specific problem that I carefully designed for assessing a particular standard, and student B a different manifestation of that same problem. This might mean different numbers, or a slight variation that still assesses the same thing. I don't want to have to reinvent the problem every single time - there must be a way to avoid repeating that effort. By carefully designing a problem once, and letting, say, a computer make randomized changes to different instances of that problem, I've created a task I can use with different students. Even if I'm in the market for exercises, it would be nice to be able to create those quickly and efficiently too. Being able to share that initial effort with other teachers who also share a need would be a bonus.
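
As a sketch of what I mean by designing a problem once and letting a computer vary it, here's a hypothetical template for a two-step equation. All names and number ranges here are invented for illustration:

```javascript
// Generate a randomized instance of a two-step equation a*x + b = c,
// constructed so the solution x is always a whole number.
function randomInt(lo, hi) {
  return lo + Math.floor(Math.random() * (hi - lo + 1));
}

function twoStepEquation() {
  const a = randomInt(2, 9);                              // coefficient
  const x = randomInt(-10, 10);                           // intended solution
  const b = randomInt(1, 9) * (randomInt(0, 1) ? 1 : -1); // nonzero constant
  const c = a * x + b;                                    // right side, so x is exact
  const sign = b < 0 ? "-" : "+";
  return { a, b, c, solution: x, text: `${a}x ${sign} ${Math.abs(b)} = ${c}` };
}

// Student A and Student B get the same problem design, different numbers.
console.log(twoStepEquation().text);
console.log(twoStepEquation().text);
```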

I think I've made an initial stab at creating something to fit that need.

# Taking Time Learning Math: A Student's Perspective

Yesterday was our school's student-led conference day. I've written previously on how proud these days make me as an educator. When students do genuine reflection on their learning and share the ups and downs of their school days, it's hard not to see the value of the exercise.

During one conference, a student shared a fascinating perspective on her learning in math. This is not the usual level of specificity that we get from our students, so I am eager to share her thinking. Here's the student's comment during the conference:

“It isn’t that I don’t like math. Learning takes time in math, and I don’t always get the time it takes to really understand it.”

I asked her for further clarification, and this was her response:

...Math is such an interesting subject that can be “explored” in so many different ways, however, in school here I don’t really get to learn it to a point where I say yeah this is what I know, I fully understand it. We move on from topic to topic so quickly that the process of me creating links is interrupted and I practice only for the test in order to get high grades.

It's certainly striking to get this sort of feedback from a student who is doing all the things we ask her to do. The activities this student is doing in class are not day-after-day repetitions of "I do, we do, you do" - we do a range of class activities that involve exploring, questioning, and interacting with other students.

This student's comment is about the limitations of time. She isn't saying that we aren't doing enough of X, Y, or Z - quite the contrary, she is just asking for time to let it sink in. She doesn't answer the question of what that time looks like, but that's not her job; it's ours.

I know I always feel compelled to nudge a class forward in some way. This doesn't mean moving through material more quickly, but I do push for increased depth, intuition, or quality conversation about the content in every class period. Her comment makes me realize that something still stands to be improved. Great food for thought for the weekend.

# My Journey with Meteor as a Teacher-Coder

Many of you may know about my love for Meteor, the Javascript framework that I've used for a number of projects in and around the classroom. I received an email this morning alerting me (and the many other users) that the free hosting service they have generously offered since inception would be shutting down over the next month.

To be honest, I'm cool with this decision. I think it's important to explain why and express my appreciation for having access to the tool for as long as I have.

I started writing programs to use in my classroom in earnest in 2012. Most of these tended to be pretty hacky - a simple group generator and a program to randomly generate practice questions on geometric transformations were among these early ones. The real power I saw for these was the ability to collect, store, and filter information that would be useful for teaching so that I could focus my time on using that information to decide on the next steps for my students. I took a Udacity course on programming self-driving cars and on web applications and loved what I learned. I learned to use some Python to take some of the programs I had written early on and run them within web pages. I built some nifty online activities inspired by the style of Dan Meyer and put them out for others across the world to try out. (Links for these Half-Full and Shapes tasks are below.) It was astounding how powerful I felt being able to take something I created and get it out into the internet wilderness for others to see.

It was also astounding how much time it took. I learned Javascript to manage the interactivity in the web page, and then once that was working, I switched to Python on the server to manage the data coming from users. For those that have never done this sort of switching, it involves a lot of misplaced semicolons, tabs, and error messages. I accepted that this was the way the web worked - Javascript in front, and Python (or PHP, Rails, Perl, etc.) on the back end. That extra work was what kept someone like me from starting a project on a whim and putting it together. That cost, in the midst of continuing to do my actual job of teaching and assessing students five days a week, was too great.

This was right around the summer of 2013 when a programmer named Dave Major introduced me to Meteor. I did not know the lingo of reactivity or isomorphic Javascript - I just saw the demonstration video on YouTube and thought it was cool. It made the connection between the web page and the server seamless, eliminating the headaches I mentioned earlier. Dave planned to put together some videos and tutorials to help teachers code tools for the classroom using Meteor, and I was obviously on board. Unfortunately, things got in the way, and the video series didn't end up happening. Still, with Dave's help, I learned a bit about Meteor and was able to see how easy it was to go from an idea to a working application. I was also incredibly impressed that Meteor made it easy to get an application online with one line: `meteor deploy <application-name>`. No FTP, no hostname settings - one line of code in the terminal, and I could share with anybody.

With that server-configuration friction eliminated, I had the time to focus on learning to build truly useful tools for myself. I created my standards-based grading system, WeinbergCloud, which lets students sign up for reassessments, earn credit for the homework and practice they did outside of class, and see the different learning objectives for my course. I created a system my colleagues could use to award house points for the great things students did during the school day. I made a registration and timing system for our school's annual charity 5K run that reduced the paperwork and time required of our all-volunteer staff to manage hundreds of registrants. I spoke at a Meteor DevShop about this a year and a half ago and have continued to learn more since then.

Most importantly to me, it gave me knowledge to share with a class of web programming students, who have learned to create their own apps. One student from last year's class learned about our library media specialist's plan to hold a read-a-thon, and asked if he could create an interactive website to show the progress of each class using, you guessed it, Meteor. Here's a screenshot of the site he created in his spare time:

And yes, all of these apps have been hosted on the free deploy server at *.meteor.com, and yes, I will have to do the work of moving these sites to a new place. The public stance from Meteor has been that the free site should not really be used for production apps, something I've clearly been doing for over two years now. I re-read that line on the documentation website back in January and asked myself what I would do if I no longer had access to that site. The result: I did what I am paid to do as a master learner, and learned to host a site on my personal server. That learning was not easy. The process definitely had me scratching my head. But it also meant that I had a better understanding of the value that the free site had given me over my time using it.

The reality is that Meteor has clearly and publicly shifted away from being just that framework with a free one-line deployment. The framework has so much going for it, and the ability to create interesting apps is not going away. The shift toward doing what one does best requires hard choices, and the free site clearly was something that did not serve that purpose. It means that those of us who value the free deploy as a teaching tool can seek other options for making it as easy for others to get in the game as it was for us.

Meteor has helped me be better at my job, and I appreciate their work.

As promised, here are those learning task sites I mentioned before:

# Choosing the Next Question

If a student can solve $3x - 1 = 5$ for x, how convinced are we of that student's ability to solve two step equations?

If that same student can also solve $14 = 3x + 2$, how does our assessment of their ability change, if at all?

What about $-2 - 3x = 5$?

Ideally, our class activities push students toward ever-increasing levels of generalization and robustness. If a student's method for solving a problem is so algorithmic that it fails when a slight change is made to the original problem, that method is clearly not robust enough. We need sufficiently different problems for assessing students so that we know their method works in all cases we might throw their way.

In solving $3x-1 = 5$ , for example, we might suggest to a student to first add the constant to both sides, and then divide both sides by the coefficient. If the student is not sure what 'constant' or 'coefficient' mean, he or she might conclude that the constant is the number to the right of the x, and the coefficient is the number to the left. This student might do fine with $10 =2x-4$ , but would run into trouble solving $-2-3x = 5$ . Each additional question gives more information.
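The general method is immune to layout because it operates on the structure of the equation, not the written order of its terms. Here's a minimal sketch (my own illustration, not from any post) in Python, representing each equation abstractly as $ax + b = c$ so that the position of the constant on the page can't mislead anyone:

```python
from fractions import Fraction

def solve_two_step(a, b, c):
    """Solve a*x + b = c: subtract the constant from both sides,
    then divide both sides by the coefficient."""
    if a == 0:
        raise ValueError("not a linear equation in x")
    return Fraction(c - b, a)

# 3x - 1 = 5    ->  a = 3,  b = -1, c = 5
print(solve_two_step(3, -1, 5))    # 2
# 14 = 3x + 2   ->  a = 3,  b = 2,  c = 14  (sides swapped)
print(solve_two_step(3, 2, 14))    # 4
# -2 - 3x = 5   ->  a = -3, b = -2, c = 5   (constant written first)
print(solve_two_step(-3, -2, 5))   # -7/3
```

The student who learned "the constant is the number to the right of the x" is effectively pattern-matching on the written form; the third call shows the same two steps succeed where that surface rule breaks down.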

The three equations look different. The operation that is done as a first step to solving all three is the same, though the position of the constant is different in all three. Students that are able to solve all three are obviously proficient. What does it mean that a student can solve the first and last equations, but not the middle one? Or just the first two? If a student answers a given question correctly, what does that reveal about the student's skills related to that question?

It's the norm to consider these issues in choosing questions for an assessment. The more interesting question to me these days is this: having seen what a student does on one question, what should the next question be? Adaptive learning software tries to do this based on having a large data set that maps student abilities to right/wrong answers. I'm not sure that it succeeds yet. I still think the human mind has the advantage in this task.

Often this next step involves scanning a textbook or thinking up a new question on the spot. We often know the next question we want when we see it. The key then is having those questions readily available or easy to generate so we can get them in front of students.
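Making those questions easy to generate is itself a small programming exercise. As a hedged sketch (the function name and the specific templates are my invention), here's one way to produce two-step equations that are structurally the same but vary the written position of the constant, as in the three examples above:

```python
import random

# Three written forms of the same underlying equation a*x + b = c.
FORMS = [
    "{a}x {b:+d} = {c}",    # e.g.  3x - 1 = 5
    "{c} = {a}x {b:+d}",    # e.g.  14 = 3x + 2   (sides swapped)
    "{b} {a:+d}x = {c}",    # e.g.  -2 - 3x = 5   (constant first)
]

def next_question(rng=random):
    """Generate a two-step equation with a small integer solution."""
    a = rng.choice([-5, -4, -3, -2, 2, 3, 4, 5])
    x = rng.randint(-6, 6)          # the intended answer
    b = rng.randint(-9, 9)
    c = a * x + b                   # guarantees x solves the equation
    text = rng.choice(FORMS).format(a=a, b=b, c=c)
    return text, x

question, answer = next_question()
print(question, "-> x =", answer)
```

Working backward from a chosen answer keeps every generated equation solvable with small integers, which matters when the point of the question is the method rather than the arithmetic.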

# Standards Based Grading & Streamlining Assessments

I give quizzes at the beginning of most of my classes. These quizzes are usually on a single standard for the course, and are predictably on whatever we worked on two classes before. I also give unit exams as ways to assess student mastery of the standards all together. Giving grades after exams usually consists of me looking at a single student's exam, going standard by standard through the entire paper, and then adjusting their standards grades accordingly. There's nothing groundbreaking happening here.

The two downsides to this process are that it is (a) tedious and (b) subject to my discretion at a given time. I'm not confident that I'm consistent between students. While I do go back and check myself when I'm not sure, I decided to try a better way. If you're a frequent reader of my blog, you know that either a spreadsheet or programming is involved. This time, it's the former.

One sheet contains what I'm calling a standards map, and you can see this above. This relates a given question to the different standards on an exam. You can see above that question 1 is on only standard 1, while question 4 spans both standards 2 and 3.

The other sheet contains test results, and looks a lot like what I used to do when I was grading on percentages, with one key difference. You can see this below:

Rather than writing in the number of points for each question, I simply rate a student's performance on that question as a 1, 2, or 3. The columns S1 through S5 then tally up those performance levels according to the standards that are associated with each question, and then scale those values to be a value from zero to one.
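The tallying logic behind those columns is simple enough to sketch outside the spreadsheet. Here's a rough Python equivalent (the question labels and the exact standards assignments are invented for illustration, not copied from the actual exam): each question maps to one or more standards, and a standard's value is the sum of the 1-3 ratings on its questions divided by the maximum possible sum.

```python
# Standards map: which standards each exam question assesses.
# Q1 is on standard S1 only; Q4 spans standards S2 and S3.
STANDARDS_MAP = {
    "Q1": ["S1"],
    "Q2": ["S1", "S2"],
    "Q3": ["S2"],
    "Q4": ["S2", "S3"],
}

def standard_scores(ratings, standards_map=STANDARDS_MAP):
    """Scale each standard's tally of 1-3 ratings to a 0-1 value."""
    totals, maxima = {}, {}
    for question, standards in standards_map.items():
        for s in standards:
            totals[s] = totals.get(s, 0) + ratings[question]
            maxima[s] = maxima.get(s, 0) + 3   # 3 = top rating per question
    return {s: totals[s] / maxima[s] for s in totals}

# One student's ratings, one entry per question:
print(standard_scores({"Q1": 3, "Q2": 2, "Q3": 3, "Q4": 1}))
```

A question that spans two standards contributes its rating to both, which is exactly what the standards map buys you: you rate the question once and the association does the rest.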

This information was really useful when going through the last exam with my ninth graders. The spreadsheet does the association between questions and standards through the standards map, so I can focus my time going through each exam and deciding how well a student completed a given question rather than remembering which standard I'm considering. I also found it much easier to make decisions on what to do with a student's standard level. Student 2 was an 8 on standard 1 before the exam, so it was easy to justify raising her to a 10 after the exam. Student 12 was a 7 on standard 4, and I left him right where he was.

I realize that there's a subtlety here that needs to be mentioned - some questions that are based on two or three standards might not effectively communicate a student's level with a single 1, 2, or 3. If a question is on solving systems graphically, a student might graph the lines correctly but completely forget to identify the intersection. This situation is easy to address, though - questions like this can be broken down into multiple entries on the standards map. I could give a student a 3 on the entry for this question on the standard for graphing lines, and a 1 for the entry related to solving systems. Not a big deal.
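In spreadsheet terms, that split is just two rows in the standards map instead of one. A tiny sketch of the idea (the entry names "Q4a"/"Q4b" and the standard labels are hypothetical):

```python
# A question on solving systems graphically, split into two
# standards-map entries so each part gets its own 1-3 rating.
standards_map = {
    "Q4a": "graphing-lines",     # did the student graph both lines?
    "Q4b": "solving-systems",    # did they identify the intersection?
}

# Graphed the lines well (3), forgot the intersection entirely (1):
ratings = {"Q4a": 3, "Q4b": 1}

for entry, standard in standards_map.items():
    print(f"{standard}: rating {ratings[entry]} from {entry}")
```

The exam paper still shows one question; only the grading sheet sees two entries, so the single question can move two standards independently.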

I spend a lot of time thinking about what information I need in order to justify raising a student's mastery level. Having the sort of information that is generated in this spreadsheet makes it much clearer what my next steps might be.

You can check out the live spreadsheet here:

Standards Assessment - Unit 5 Exam