## Analyzing IB Physics Exam Language Programmatically

I just gave my IB physics students an exam consisting entirely of IB questions. I’ve styled my questions after IB questions on other exams and on homework. I’ve also looked at (and assigned) plenty of example questions from IB textbooks.

Just before the exam, students came to me with some questions on vocabulary that had never come up before. It could be that they hadn’t looked at the problems as closely as they had before this exam. What struck me was that their questions were not on physics words. They were on regular English words that, used in a physics context, can mean something very different from their everyday sense. For students who often use online translators to help decode problems, I suddenly saw this as a bigger problem than I had previously imagined. An example: a student asked what it meant for an object to be ‘stationary’. This was easily explained, but the student shook her head and smiled because she had understood its other meaning. On the exam, I saw this same student making mistakes because she did not understand the word ‘negligible’, though we had talked about it before in the context of multiple ways to say that energy was conserved. Clearly, I need to do more, but I need more information about vocabulary.

It got me wondering: what non-content-related vocabulary occurs frequently enough on IB exams to warrant exposing students to it in some form?

I decided to use a computational solution because I didn’t have time to go through multiple exams and circle words I thought students might not get. I wanted to know what words were most common across a number of recent exams.

Here’s what I did:

• I opened both paper 1 and paper 2 from May 2014, 2013, 2012 (two time zones for each) as well as both papers from November 2013. I cut and pasted the entire text from each test into a text file – over 25,000 words.
• I wrote a Python script using the pandas library to do the heavy lifting. It was my first time using it, so no haters please. You can check out the code here. The basic idea is that the pandas DataFrame object lets you count up the number of occurrences of each element in the list.
• Part of this process was stripping out words that wouldn’t be useful data. I took out the 100 most common words in English from Wikipedia. I also removed some other exam-specific words like instructions, names, and artifacts from cutting and pasting from a PDF file. Finally, I took out the command terms like ‘define’, ‘analyze’, ‘state’, and the like. This would leave the words I was looking for.
• You can see the resulting data in this spreadsheet, the top 300 words sorted by frequency. On a quick run through, I marked the third column if a word was likely to appear in development of a topic. This list can then be sorted to identify words that might be worth including in my problem sets so that students have seen them before.
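
The counting-and-filtering steps above can be sketched in a few lines. This is not the script linked above: it is a minimal illustration that swaps in the standard library’s `collections.Counter` for pandas so it runs without any installs. The word lists, the sample text, and the `count_words` helper are all made up for the example; the real run used the full Wikipedia list of 100 common words and the text of the actual exams.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated exclusion lists. The post used the
# 100 most common English words from Wikipedia plus exam-specific terms.
COMMON_WORDS = {"the", "a", "of", "is", "and", "to", "in", "that", "it", "on", "at"}
COMMAND_TERMS = {"define", "analyze", "state", "calculate", "determine"}


def count_words(text, excluded):
    """Return a Counter of lowercase words in text, skipping excluded ones."""
    # Lowercase first, then keep runs of letters only; this drops
    # punctuation, digits, and most PDF copy-and-paste debris.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in excluded)


# Made-up sample text standing in for the pasted exam papers.
sample = (
    "State the energy of the stationary object. "
    "Air resistance is negligible. Calculate the energy of the object at rest."
)

counts = count_words(sample, COMMON_WORDS | COMMAND_TERMS)

# Print the most frequent remaining words, highest count first.
for word, n in counts.most_common(3):
    print(word, n)
```

If you prefer to stay in pandas, putting the same filtered word list into a `Series` and calling `value_counts()` should give an equivalent frequency table.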

There are a number of words here that are mathematics terms. Luckily, I have most of these physics students for mathematics as well, so I’ll be able to make sure those aren’t surprises. The physics-related words (such as energy, which appeared 177 times) will be practiced through doing homework problems. Students tend to learn the content-specific vocabulary without too much trouble, as they learn those words in context. I also encourage students to create glossaries in their notebooks to help them remember these terms.

The bigger question is what to do with those words that aren’t as common – a much more difficult one. My preliminary ideas:

• Make sure that I use this vocabulary repeatedly in my own practice problems. Insist that students write out the equivalent word in their own language, once they understand the context in which it is used in physics.
• Introduce and use vocabulary in the prerequisite courses as well, and share these words with colleagues, whether they are teaching the IB courses or not.
• Share these words with the ESOL teachers as a list of general words students need to know. These (I think) cut across at least math and science courses, but I’m pretty sure many of them apply to language and social studies as well.

I wish I had thought to do this earlier in the year, but I wouldn’t have had time to do this then, nor would I have thought it would be useful. As the semester draws to a close and I reflect, I’m finding that the free time I’ll have coming up will be really valuable moving forward.

I’m curious what you all think in the comments, folks. Help me out if you can.