Although university professors aren’t trained as teachers, many parts of teaching — lecturing, answering questions, creating tests — come naturally to most of us. Assigning grades, on the other hand, is a part of the game that I’ve never felt very comfortable about. Part of the problem is that as a student I found grades to be kind of irrelevant, so it’s disconcerting to realize how important grades are for many students.
It’s easy to just use a standard 90/80/70/60 scale but this often feels unsatisfactory. One problem is that the desirable grades are compressed in the top 17% of the range of scores: the statistical significance of the median A- vs. the median B+ seems seriously questionable, for example. Grades aren’t required to be statistically significant, but certainly we should avoid grading practices that are especially sensitive to noise.
A different grading approach is to decide beforehand that a fixed fraction of the class will get As, another fraction will get Bs, etc. While this doesn’t reduce the fundamental noisiness in students’ scores, it enables me to use the entire range of scores and it avoids the (I would argue artificial) coupling between students’ scores and specific grades. However, students really hate this approach: they feel much more comfortable being judged against an objective standard (never mind that this doesn’t really exist). Also, if students feel they are competing with each other, they’re less likely to help each other out; that is undesirable since ideally, they learn as much or more from each other as they do from me. Finally, I’ve seen instances where a small class is abnormally strong or weak: in these cases it makes less sense to apply a bell curve.
My sense is that most professors mix these approaches, for example by adjusting the A/B/C/D scale to something other than 90/80/70/60. I’ve talked to several people who make a scatter plot of scores and then put the grade boundaries between clusters of students. This approach has a certain logic to it, but one of my colleagues argues that it’s statistical nonsense. When pressed, he failed to describe a convincing model of student behavior under which one could argue about the relative amounts of signal and noise in grades, but my feeling is that his intuition is sound. An even worse curving practice is “curve to highest” where everyone is graded against the highest achieved score instead of the maximum possible score. This is bad because the highest score is by definition an outlier — we should try to minimize the effect of outliers, not amplify it.
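To make the outlier problem with “curve to highest” concrete, here is a minimal sketch. The function name `curve_to_highest` and the sample scores are made up for illustration, and it assumes scores are percentages out of 100 with at least one nonzero score.

```python
def curve_to_highest(scores, max_possible=100.0):
    """Rescale every score so that the highest achieved score maps to max_possible."""
    top = max(scores)
    if top <= 0:
        raise ValueError("need at least one positive score")
    return [s / top * max_possible for s in scores]


# Hypothetical class where the best raw score is 75.
without_outlier = curve_to_highest([50, 60, 70, 75])

# Same class plus one unusually strong student who scored 98.
with_outlier = curve_to_highest([50, 60, 70, 75, 98])

# The student with a raw 70 sits at index 2 in both lists.
print(round(without_outlier[2], 1), round(with_outlier[2], 1))
```

In this made-up example the student with a raw 70 is curved to roughly 93 when the top score is 75, but only to roughly 71 once a single 98 joins the class, so one student's result moves everyone else's grade.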
Over the years I’ve evolved a pretty simple approach to assigning grades in “real classes” (as opposed to graduate seminars and such): I look at the grades that would be assigned using a 90/80/70/60 scale and decide, based on what I know from watching students work, talking to them, looking at their code, etc., if the grades are appropriate or if they are too low. In the latter case, I curve students’ scores by giving them back a fixed fraction of the points they missed. For example, I may give back 1 point for every 3 points missed, turning a 70% in the course into an 80%, a 0% into a 33%, etc. This is simple and has a pleasing rationale: my assumption is that some missed points are because the student failed to study or whatever, but other missed points are because I worded a question poorly, bungled a lecture, or similar. This approach also doesn’t compress scores near the top very much. In principle, if the 90/80/70/60 scale leads to scores that are too high, I should apply a reverse curve. In practice, this would piss off the students too much, and anyway I’m not sure that I’ve seen it happen.
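The give-back curve is easy to write down. The sketch below is only an illustration: the names `curve` and `letter` are made up, scores are assumed to be percentages out of 100, and treating everything below 60 as an F is an assumption (the text only names the 90/80/70/60 cutoffs).

```python
def curve(score, give_back=1 / 3, max_score=100.0):
    """Return the curved score after giving back a fixed fraction of missed points."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    missed = max_score - score
    return score + give_back * missed


def letter(score):
    """Map a percentage to a letter on the 90/80/70/60 scale (F below 60 is assumed)."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"


# Raw score, curved score, and the letter each would earn.
for raw in (0, 55, 75, 85, 100):
    curved = curve(raw)
    print(raw, round(curved, 1), letter(raw), "->", letter(curved))
```

With one point back for every three missed, the curved score is `max_score / 3 + (2 / 3) * score`. A perfect score stays at 100, rank order is preserved, and the gap between any two students shrinks by the same factor of 2/3.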
I’d be interested to hear others’ rationales for assigning grades in a particular way. Probably this discussion is very US-specific — due to working on graduate admissions I’m well aware that grading conventions differ considerably around the world.
7 responses to “Grading on a Curve”
I really struggle with this one! I’m afraid I’ve contributed to the problem of grade inflation since I really find it hard to give bad grades to students who are nice and work pretty hard.
You definitely have to have a fixed scale, students hate it if they can’t figure out exactly “what do I need to get an A.” But I have found that if I make the scale too difficult I end up giving easier tests towards the end because I don’t want to be forced to give everyone a ‘D’. But if I make the scale too easy then the better students will slack off and/or I’ll be forced to give everyone an ‘A’ (It is a lot easier to curve ‘up’ than to curve ‘down’!)
I have found the best way is to work backwards. I first think of how hard I want to make the questions and then set a scale so that a good student could get a B+/A- on this scale. If the class is unusually smart/diligent they can all get As but the outliers don’t overly influence the process.
I realized at one point that marking papers was not really grading, but ranking. After an initial calibration on obviously good, passable, and mediocre samples, I’d figure out who’s an A/B/C/D/F, give them a subjective numerical grade in that range, then add an offset to everyone’s grades so that the numerical average is at 70%. This numerical grade then becomes their final grade. Surprisingly, each time the initial average was very close to 70% (within 9 points one time, usually within 2), and the resulting final grades do reflect the actual performance and quality I observed in the students.
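As a rough sketch of that offset step (the function name `shift_to_mean` and the sample marks are hypothetical, and no clamping to the 0 to 100 range is attempted):

```python
from statistics import mean


def shift_to_mean(scores, target=70.0):
    """Add one constant offset to every score so that the class average equals target."""
    offset = target - mean(scores)
    return [s + offset for s in scores]


# Hypothetical subjective marks averaging 75, so everyone is shifted down by 5.
shifted = shift_to_mean([60, 70, 80, 90])
print(shifted, mean(shifted))
```

Because the same offset is added to everyone, the ranking and the gaps between students are unchanged; only the location of the average moves.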
How do you “give back” points? You’re grading for the entire semester, so you have a number of assignments/exams/projects to deal with, right?
I do the clustering thing and spend a lot of time staring at the students at the borders so that I feel comfortable with the breaks. I also try to reserve some small percent of their grade (5%) as a subjective measure which gives me the ability to push a grade up. This tends to get used for a student that either had a bad exam day or a bad homework.
Suresh, I just compute a “real score” for the class and then (if needed) also compute a higher “curved score” that is used for purposes of assigning grades. So it was a poor choice of words to say that I give back points.
Sounds good Greg. I’ve thought a few times about having a discretionary part of the grades, but never end up doing it. Part of the reason is selfish: I have no wish to be accused of favoritism or to have students spend even more energy trying to get me to change their grades. But also I’m not sure that my rationale for adjusting grades would be sound, as opposed to being biased towards students I know better or who are more vocal.
It’s interesting to me how the 90/80/70 scale relates (or doesn’t relate) to the scale we use on each individual assignment.
Personally, I don’t end up curving too much in the final grade (except for finding “clumps”: I have a hard time giving someone with a 90.1 an A- and someone else with an 89.8 a B+), because my grading on tests and homeworks is “innately curved”. I try to make (say) an 85 on a homework or test mean “it looks like you got the main points, but are missing enough things that you either need to be more careful, or work harder to understand the hard concepts”. Which, to me, is what a B in the course should say (but I know that’s a whole other discussion).
Contrast that with some of my colleagues, who take off _lots_ of points on assignments (so that the highest grade is often a 70%), and then have to fit it together at the end. On the one hand, you can argue that they’re more honest about what the student is doing wrong (because every little thing is found and penalized). On the other, it’s hard for a student used to the US high school model of grades to accept that a 70% is ok (really good, even). Lots of them get frustrated.
It’s taken me several years to get my system working, but now I have a very good picture of what a 10/10 “looks like” on a standard test question (or a 9/10, or a 6/10).
It also lets me detect when I messed a question up (“Hey-why does this homework have so many low grades? Oh, because what I said to do and what I _meant_ for them to do were different”), and give back points if needed. I’d prefer to do it on a question-by-question basis rather than by giving an across-the-board rebate like I think John is describing, but it is more work.
I’m curious, what is the rationale for scaling marks at all?
I’m from Australia, and at my university, for most of the courses I have done, the marks were not scaled.
I can think of two exceptions, one where everyone’s final exam result was multiplied by 45/35, and another where the lecturer said that he would identify the high distinction and pass boundaries (usually 85 and 50) by looking at students’ assessment results near the boundary and then determining how much to add to everyone’s score to ensure that everyone he thought deserved to pass did so and everyone he thought deserved a HD received one.
In the first case we were told this was because otherwise 9 out of 15 students would have failed the final exam without the scaling and supposedly this was an indication of a final exam that was too hard (also the course outline had said that anyone who failed the final exam would fail the course, lol). Personally, given my unscaled mark was around 92-93, I didn’t see it as particularly difficult and was kind of annoyed that my exam mark got capped at 100 since it meant that anyone who got 78+ would’ve gotten the same scaled mark as me; I found out there was no-one else in that range, but I was still disappointed that the cap stopped my overall course mark being 100 :p
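The arithmetic behind that cap can be checked with a small sketch. The function name `scale_capped` is made up, and it assumes the exam was marked out of 100 and that the scaling was a plain multiplication by 45/35 followed by a cap at 100.

```python
def scale_capped(mark, factor=45 / 35, cap=100.0):
    """Multiply a raw exam mark by factor, then cap the result at cap."""
    return min(cap, mark * factor)


# Smallest raw mark that already reaches the cap after scaling.
threshold = 100 * 35 / 45
print(round(threshold, 2))

# Raw marks on either side of that threshold.
for raw in (50, 77, 78, 92):
    print(raw, round(scale_capped(raw), 1))
```

Under these assumptions the threshold works out to about 77.8, which matches the observation that any raw mark of 78 or more ends up with the same scaled mark of 100.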
On a somewhat related note, I didn’t really care about my marks a whole lot (I had several courses I didn’t care about which I just scraped through with marks in the 50’s or 60’s), but I found out in my third year that having good marks (plus some random whatever the hell they were trying to promote in students, leadership blah blah) meant that people were willing to give you money for doing nothing (aka scholarships).