Comprehensive Exams and Curves

As you can see from the mean (47) and standard deviation (22) of my comprehensive exam results, the numbers are a lot lower than I expected. I thought the exam would be tough: fair, but tough. I did get comments from students saying things like that, but nevertheless the numbers are lower than expected and basically show that more than two-thirds of the class would be failing.
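
As a rough check on the "more than two-thirds" figure, here is a minimal sketch that assumes the scores are normally distributed with the stated mean and standard deviation, and that 70 is the passing mark. Both are simplifying assumptions, and the variable names are only illustrative.

```python
from statistics import NormalDist

# Assumed model: scores ~ Normal(mean=47, sd=22); assumed passing mark of 70.
exam = NormalDist(mu=47, sigma=22)
PASS_MARK = 70

# Fraction of the class expected to score below the passing mark.
failing = exam.cdf(PASS_MARK)
print(f"Estimated fraction below {PASS_MARK}: {failing:.0%}")
```

Under those assumptions the estimate comes out well above two-thirds; a real class with a different score shape would give a different number.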

I mentioned a few weeks ago at GEARS that grades in engineering programs are skewing towards the 80-90 range. I would prefer a system that grades 0-100 (or 0-10) where above a 50% is passing. While that won’t fly academically, it’s probably a true representation of the material learned by the average student. But because educators must deal with this grade inflation, and dissatisfied students in a course are more likely to write negative reviews than satisfied students are to write positive ones, this presents a significant problem. It can be stated simply as follows:

How can the results of any exam with extremely low and unacceptable grades (for example, this 47 mean grade) be transformed into the 70-90% Gaussian that is expected (and needed)? And as a follow-up question, how can this be done effectively from the educator’s perspective and the students’ perspective without removing incentives for the high-performing students to work hard?

I’m going to focus on three different methods that are typically used. If any of you readers out there have more, please share.

  1. The Curve – One of the most common questions I received about this exam was “what’s going to be on the exam?”. The second most common question was “Is there going to be a curve?”. My answer to that is a laugh and an emphatic NO. Curving an exam is pointless for many reasons, but I’ll just point out a few. It doesn’t force students to learn from their mistakes. If you curve upward to fit some ideal distribution that you think the class should be at, then you must also curve downward when everyone does very well. Curving is basically the easy button and a lazy way of educating. And so on. From a workload point of view, I totally get why profs curve. It’s difficult to make a fair but comprehensive test that challenges the students but is doable. Plus, with all the other requirements (remember, teaching is only 30% of the job at major research universities), taking the time to do this right drains time from other endeavors. So I see why some profs do it.
  2. The Busy Work – Essentially, this is recognizing ahead of time that students will fail the exam. By giving them a ton of assignments (HW, projects, in-class quizzes), the weight of the exam is diminished. Even though a student may fail the exam, it’s only 5% of the overall grade, making it insignificant. I don’t like this approach for three reasons: 1) the real world isn’t like this. You really only get a few opportunities (if that) to make something count. 2) busy work is pointless for the truly bright students. They should be putting their time to better use. 3) it often places a lot of the grading on TAs. I think TAs are great for helping out with lab classes and grading HWs, but that’s where it stops in my opinion. If you have a ton of other assignments, you may be too reliant on them.
  3. The Make Up – The third possibility is offering some form of make up work to earn back points. For this exam, rather than giving the students another assignment or additional problems, I gave the students the opportunity to rework the exam. If the student corrected all of their mistakes, they could earn back 50% of the points lost. I allowed this for a few reasons. 1) It will bring up the average grade, not to the 80-90% but better than a 47%. 2) This will force students who did poorly to actually learn the material. 3) the better performing students still earn a higher grade than the poorer performing students (no incentive to do well is lost).
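
The make-up rule in option 3 is simple arithmetic, and a minimal sketch may make its effect clearer. It assumes a 100-point exam and a student who corrects every mistake; the function name `regrade` and the sample scores are hypothetical.

```python
def regrade(original: float, max_points: float = 100.0, recovery: float = 0.5) -> float:
    """Score after a fully corrected rework that earns back a fraction of the lost points."""
    lost = max_points - original
    return original + recovery * lost


# Hypothetical raw scores, including the class mean of 47.
raw_scores = [30, 47, 70, 90]
reworked = [regrade(s) for s in raw_scores]
print(reworked)  # [65.0, 73.5, 85.0, 95.0]
```

Because the rule is a straight line with positive slope, the ranking of students is unchanged: whoever scored higher on the first pass still ends up higher after the rework.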

Obviously, the curve is the easiest from the students’ perspective and the professor’s time allocation. But which is better for educating in general? Is there a better solution out there? I feel like my method for making up the work is a fair way to improve grades while still making the students work. Are there any other schools of thought on this?


I tend to agree that option 3 is the best for the development of the students and puts minimal work on the professor and TAs (no need to write more questions/solutions).

When I was an undergrad (EE) I know I took many tests where the main contributing factor for getting low grades was the lack of time. The professor threw in some tricky problem that if you started off down the wrong track at first, it was impossible to recover from before the end of the hour.

At least with the chance to look at the problem again with more time, you can prove to the instructor that you’ve learned the material, even if you haven’t mastered it like the students who aced the test on the first pass.

There are two that I’ve used in classes I’ve taught. One isn’t applicable to you at this point, since you’re just getting started at , and have obviously taught this particular class a total of 0.5 times so far. 🙂

1) From previous experience with the class, scale the difficulty of the exam relative to the rest of the course so that you don’t get a 47% exam mean. In my experience, any time I’ve done a ‘hard but fair’ exam, it’s been more emphasis on the ‘hard’, less emphasis on the ‘fair’, no matter how hard I’ve tried to make it fair. My typical rule now has become ‘make it easier than you think you need to’, and it works out around the right spot. I don’t trust myself to remember what the proper difficulty should be for the course, even though I’m the one who taught it.

2) Scale the exam to the highest grade. This obviously only works if everyone did more poorly than expected — if you have a couple of wunderkind in your class, this will do absolutely nothing. I personally experienced several classes back in my undergraduate education where the professor gave apologetically brutal tests and exams … but scaled them to the highest mark in the class. On our midterm, the highest score was an 89: the test was therefore out of 89. On the final exam, the highest score was somewhere in the high 70s: that was what your personal score was out of. As I remember, I did quite badly on the midterm, but with the scaling, still got a ~ 60% or so.

It works well if your grade shape is correct, but scaled poorly (in more than just mean). In the case where your shape is right, but linearly scaled downwards, you can do a so-called ‘vertical translation’ and just bump everyone up a certain number of points.
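
The two adjustments described here, scaling to the highest mark and a flat "vertical translation", can be sketched as follows. This is only an illustration: the function names, the cap at 100, and the sample scores are assumptions, not anything a particular professor used.

```python
def scale_to_top(scores, out_of=100.0):
    """Rescale so the highest raw score becomes full marks."""
    top = max(scores)
    if top <= 0:
        raise ValueError("need at least one positive score to scale against")
    return [s / top * out_of for s in scores]


def shift(scores, points, cap=100.0):
    """Add the same number of points to everyone, capped at full marks."""
    return [min(s + points, cap) for s in scores]


# Hypothetical midterm where the best raw score was 89.
raw = [53, 70, 89]
print([round(s, 1) for s in scale_to_top(raw)])
print(shift(raw, 10))
```

Note that scaling stretches the gaps between students while shifting leaves them as they were, which is why the shift suits a distribution whose shape is already right.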

With respect to grade distribution and engineers (I mostly have taught engineers, and was one in undergrad), I have tended *not* to aim for a Gaussian distribution of grades. Most classes that have worked out well tend toward more of a bimodal distribution, with a peak in the mid-to-high-80s, and a peak in the mid-to-low-70s (maybe high 60s). These are (respectively) the students who ‘got it’ and did all the work and earned their ~ 90%, and the students who didn’t really ‘get it’, but did all the work and did enough work to pass with a non-51% grade. Emphasizing the exact shape of the distribution of grades was way too much hassle, and almost never worked out as intended.

Indeed, the distribution was much more bimodal. There was a group in the 70-90s and then another group in the 30s. So the distribution wasn’t even close to Gaussian. I was using that just to make it a little simpler.

I disagree with grading to the highest grade. That’s just a straight curve of X points. Plus, I have been in classes where students have tried to team up and collectively do poorly so that everyone passes. I’d put that in the Curve category, which comes with an emphatic “no”.

I’m not sure what your problem is. A well-designed test has a mean around 50% and a standard deviation around 20%—that gives you a lot of information about where each student is, without wasting a lot of time on questions that no one can answer or everyone can answer. It sounds like your test was at exactly the right level for the class.

How you interpret the test results depends on what level of performance you were expecting.

If you really thought that the test was so easy that everyone should get 70% or better, then maybe 2/3 of the class is failing. I’ve taken a class where 80% of the class failed the first midterm and the high grade was a C. It was a wake-up call for everyone that we were not performing at the level expected, and almost everyone stepped up their studying. (I ended the class with an A.)

On the other hand, if you expected the test to be tough, then it may very well be that getting 30% right was a perfectly acceptable pass level. I have taken week-long take-home tests (a qualifying exam in grad school) where getting any one question right out of ten was enough to pass. Getting 2 right was an exceptional performance.

Why do you feel that the pass level on every test must be at 70%? Especially on tests that have never been calibrated?

I thought the average would be in the 60-70 range. But, as I said in another comment, the people who I thought knew the material did very well (70s-90s) and the people who I thought would do poorly were in the 30s.

The pass level should be a 70 because most engineering programs limit the number of Cs you can get in required courses. Also, students can only fail a course once. While fundamentally, I agree with not letting underperforming students move on, in practice no filter is perfect.

Because of things like grade inflation and the structure of the department, the average grade in a class is supposed to be about an 80, with the brighter students higher and the rest lower.

I don’t see how you can have 50% be the average but say the pass rate is a 70% without a curve.

“I don’t see how you can have 50% be the average but say the pass rate is a 70% without a curve.”

I don’t know why you believe that 70% right on a test defines passing. As the writer of the test you can set the threshold where it needs to be given the difficulty of the test. If the students who know enough to meet the criteria of the class are getting 70% or better, then 70% is an appropriate pass level. But that 70% is a completely arbitrary number: you could equally well write a harder test where the appropriate pass level is 30% right. In fact, if you want grades to be meaningful, the pass level on a test should be fairly low, so that the limited number of questions can be meaningfully used to separate the Cs from the Bs from the As. Most tests are set up so that the questions primarily separate the D- students from the F students: a totally useless distinction.

If your school has some stupid rules about mapping percentage correct to grades, then you do have to run your raw scores through a monotone function to rescale them to their arbitrary scale.
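
One simple choice of monotone function is a piecewise-linear map through a few anchor points. The sketch below is only an illustration: the anchor points in `KNOTS` (for instance, a raw 30 reported as 70) are made up, and a real course would pick its own.

```python
def rescale(raw, knots):
    """Piecewise-linear map through (raw, reported) knots.

    Knots must be sorted and increasing in both coordinates,
    which is what keeps the mapping monotone.
    """
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= raw <= x1:
            return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("raw score outside the knot range")


# Hypothetical anchors: raw pass level of 30 maps to a reported 70.
KNOTS = [(0, 0), (30, 70), (60, 90), (100, 100)]

for raw in (20, 30, 45, 100):
    print(raw, "->", round(rescale(raw, KNOTS), 1))
```

Since every segment slopes upward, a student with a higher raw score never ends up with a lower reported score.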

I’d be inclined to go a little further than ‘gasstation’. As an underachiever from the get go, I never did all that well in engineering school if you measured my grades.

After 4 years of a BSEE degree and 3 years of an MEng (separated by 5 years of work experience), I discovered that very little of what I supposedly learned was of any use to the people I worked for.

They would have preferred that I understood relay control arrangements, the correct way to size and test electric motors and why it was that commutator bars built up an unwanted patina after a few months of operation.

Maybe I was sick the day these subjects were covered in one of the many math classes I took.

The point? Your students will not “grok” all the course material in quite the way you’d like. For some, this may be a problem in their careers. For the other 99%, not so much. I know that this doesn’t solve your grading problem, but it seems that grades are …
“a tale,
Told by an idiot, full of sound and fury,
Signifying nothing.”

If you really feel that the average of 0-100 points should be 75 percent, then you should just make half the questions so easy that a monkey could do them.
Or you could just add 50 points to everyone’s result.

Personally I think having a distribution like this is ideal. If you have 50 of 100 points you are in the middle, that is how tests should be.

Btw, the system where I am is: curves like you had, and lower than 30-40 percent fails. If you fail, you can repeat the test at the next exam period (half a year later), with no impact on your other studies.

Let’s assume the average grade in the class is a 50, with that same standard deviation. As a prof, how do you assign grades at the end of the semester without curving (assuming 90-100 is A, 80-89 is B, and so on)? I don’t see how it’s possible.

Based on that metric, the average is a 50, 70 is “passing”, so that leaves only 30% of the class with a passing grade. The rest have to retake the course. When’s the last time you’ve seen 70% of a class fail in college? I bet it happens pretty rarely, if ever.

Your mistake is “(assuming 90-100 is A, 80-89 is B, and so on).”

That assumption is where you are making your error. Once you make that assumption, you get ridiculous results. This is a standard proof by contradiction—clearly the assumption is false.

Here is a grading scheme from a test from last semester:
P≤40 5.0
40<P 4.0
44<P 3.7
50<P 3.3
56<P 3.0
62<P 2.7
66<P 2.3
74<P 2.0
80<P 1.7
86<P 1.3
92<P 1.0

In the German grading system 1.0 is the best grade and 4.0 is the worst passing grade.
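
A threshold table like this is easy to apply mechanically. The sketch below assumes that P is the points out of 100, that each row means "more than this many points earns this grade", and that 40 points or fewer is a 5.0 (the first and last rows of the table are read that way here). The function name `german_grade` is hypothetical.

```python
from bisect import bisect_left

# Lower bounds from the table; a score must be strictly above a bound to earn its grade.
THRESHOLDS = [40, 44, 50, 56, 62, 66, 74, 80, 86, 92]
# One more grade than thresholds: index 0 is the assumed failing grade for P <= 40.
GRADES = [5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0]


def german_grade(points):
    """Look up the grade for a score by counting the thresholds strictly below it."""
    return GRADES[bisect_left(THRESHOLDS, points)]


print(german_grade(47))  # 3.7
print(german_grade(93))  # 1.0
```

With this table a raw 47, the mean of the exam discussed in the post, would already be a comfortable pass.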

My MS advisor used a 10 point curve. He said he reserved the right to move the scale down, e.g. he may change it so that 85% became an A, but he would never move the scale up.

But he also didn’t give as much weight to tests as most other professors. We had two or three tests for each class, and they were worth approximately 15 percent each.

We had boatloads of homework. We had to review journal articles. We also had to do projects of some sort.

In other words, we worked our asses off, so that if the tests didn’t come back as nice as we would have liked, we had other ways to get a decent grade.

And I will say that I learned more in his classes than in all of my others in grad school, simply because there was so much work that we became immersed in the topic.

Therefore, I don’t think it’s unreasonable to use that kind of scale, but I also don’t know that you can expect individual assignments or exams to follow the distribution…you have to look at the sum of contributing factors and not think a single test will be a reflection of class performance.

And yes, I have heard of 70% of a class being failed (and by that, I mean Ds and Fs since you have to retake anything with a D). The worst class I ever had left me with a C, and I was one of 20 people who managed to pass…there were about 60-70 people in the class altogether, and that was after a slew of people dropped.

Best class I ever had from a grading perspective was an advanced analog class where the tests were 300 points, the mean was always in the 120-180 range, and nobody really got below 20 or above 280 (unless you got really [un]lucky or were possessed by Bob Widlar’s ghost (depending on BAC)). The “curve” then didn’t care about making a 70% mean, but just put the B-C cutoff around the mean (50%) and shifted grades from there.