
"Feels like a... 60, maybe a 63?" Let's knock the informed guessing game out of assigning grades.

  • Writer: CP Moore
  • Apr 12, 2022
  • 11 min read

Updated: Apr 19, 2022



Marking is a royal pain. There. I said it. No one (at least - gulps at the risk of offending someone - in their right mind) has marking at the top of their "favourite things about the job" list. The end result can be rewarding if it's clear your students listened, learned, and applied their knowledge and skill in the way you intended. It can be demoralising when the outcome is at the other end of that particular spectrum. But it is a necessary part of the job, just as it is a necessary part of the student journey (at least until the world of work stops using grades as the sole metric for determining employment worth and begins to also look at the soft skills we can now more easily collect data on - a sort of... engagement scorecard). But it's also a hugely subjective one.


It has to be, of course. Assessments at university can't be designed and marked according to rubrics that anyone can work through, allocating marks for having included the right keyword. University is supposed to be the place of learning where, having learned the skills of learning (primary school), the fundamentals of specific subjects and the beginnings of the skills used in those subjects (secondary school), and having focussed your knowledge into several key subjects to build a foundation in a discipline or field (further education/college/sixth form), you drive your own future by becoming an independent learner who can apply a wide range of skills to a specific area of study - who can specialise. So the assessments at university, at least once you get past the knowledge and understanding that your first year of study will provide, need to be marked less "by the book" and more interpretively: building arguments, evaluating information, demonstrating fluidity of knowledge, and crafting a compelling critique. But if marks are going to continue to be a metric, a measure of success, then there needs to be some consistency in how they are applied. Students study multiple modules (or units, or whatever terminology your place uses) and are assessed by multiple markers (sometimes each marking their own question within a single piece of work, sometimes each marking a batch of the same thing). Assuming that all markers are created equal (they aren't), and assuming all questions are set and marked with the same fairness, difficulty, and level of expectation (they aren't), the only variable in the marks from submission to submission must surely be the individual who wrote it, right? Maybe not...


Now, we all have marking criteria. And if I were to suggest for even a moment that marking often goes along the lines of "read it through, make some comments, read it again, look at the brief, look at the marking criteria, think about the work as a whole, then assign a grade based on where you feel it roughly falls within the grade boundaries and then roughly how much into that boundary and towards the next one you want to move it," there'd be an uproar and I'd be strung up (because we're all infallible, right?). But marking criteria usually fall into one of two camps. They are either:

  1. So generic that they in no way align with the work, so they can't easily be applied and the work can't meaningfully be measured against them - but they're approved by your course/school/faculty, so you can hold them up as proof of having followed protocol.

  2. So specific that the expectations by which the work is being measured surely cannot be applied to any other work, even one that approximates the same skill, and therefore the feedback becomes somewhat meaningless unless the student is doing this exact piece of work again.

Feedback templates are a shocking way of trying to get around both of these camps. The idea that standardising how feedback is provided/framed/boxed will somehow give the student more to work with because they can "see" it in a nicely labelled space on a Word document is just nonsense (it won't, because the nature of that box hugely limits the scope and depth of feedback you can fit into it). Making a feedback template "three things done well, three things to improve on" is so pedestrian that one might as well break out the keyword list from A level and tell them which ones to include next time for a better mark. Equally though, asking academic staff to provide such huge levels of feedback and instructional tutelage that the student cannot possibly perform to a lower standard next time (and this is assuming that the marking criteria and the expectations of the work can be transferred to work elsewhere) is going to knock your staff off the twig so fast you'd think winter came early.


The two need to be separated - marks and feedback as two separate entities. The one is the measure, the other is the aid. Conflating them, under the misguided belief that students want feedback more than their marks, is where we often come unstuck: we devote enormous amounts of time to giving lots of high-quality feedback only to find that none of it was applied to the next assignment. Every university on the planet gets hammered in the NSS on the "marks and feedback" question (okay, just every university in the country, as the NSS is a national thing), with qualitative comments and anecdotes from students laying heavily into the timeliness of marks returned and the quality of the feedback. But when extra effort goes into improving the feedback, the scores don't seem to get better. Indeed, when we do a little detective work on the matter, we invariably find that feedback is rarely collected or read once the marks are out (I generalise, of course, just in case any well-intentioned undergrads who always read through and learn from their feedback are reading this - I know there are exceptions). A work friend of mine ran an experiment over several years with a paper-based coursework assignment. They released the marks online and informed the students that all work, with its feedback, could be picked up from the hub, and found that year on year only around 25% of that work was collected. The rest sat there until almost the start of the next academic year, when it had to be removed because the box was needed for another assignment. The students had the marks, and that, as far as the majority seemed to feel, was the important bit.


Worse is when feedback (which we provide specifically to help the student do better what they did not do so well, and to do more of what they did well) is used to justify (or argue) the final grade for the work. Comments like "why, if it said 'great' in the feedback, did I only get 50%?" Well, that's because that specific sentence/diagram/observation/whatever was great. Doesn't mean the rest was. Doesn't necessarily mean you even addressed the brief, let alone turned out some wondrous prose and mind-blowing revelations. So don't determine where your mark came from based on the feedback given - that's not what it is there for.


So I say again - the two need to be separated so that it is clear the feedback is there purely to help you, and the mark has been calculated by more standardised means; but standardised means that can be applied to other assessments, yet not so generic and vague that they mean nothing against the brief of the work. And that means taking a step back and looking at what skill the assessment is asking the student to put into practice. Is it an essay (and even then, is it a true essay, or a case study, or a report, or a critical evaluation of something? Essays are notoriously vague and wonderfully diverse)? Is it a lab report, is it a paper, is it a presentation (and then, is it visual, or live, or a debate?), is it an analysis of something (which then risks falling back into essay territory if you aren't clear what they are analysing)? This is something I tried several years ago in order to combat this very problem, as well as to design out the frustrating levels of subjectivity I was seeing from academic to academic on a particular assignment when the whole team was marking a batch of submissions on the same thing (the brief was deliberately open enough that our individual specialties within the module weren't necessary to "get" the content being presented). Here's what I did:

  • I took what was, at the time, the approved faculty marking criteria. This criteria wasn't terrible as criteria go, largely because it was at least distinct to a particular level of study, it was tailored to say something different for each grade boundary, and it was based, in the main, on a simple medium-to-long generic essay. It risked straying into the "too generic" category, but I could work with it.

  • As I mentioned, this marking criteria was fairly generic. Each boundary had a paragraph (that we could add to for some specificity, but could not alter or omit) that talked in vague statements about what would and wouldn't be present. Four or five lines saying things like "Reasoning and argument generally relevant but could be further developed," which in themselves might be useful, but which, when shoved in among "appropriately referenced" or "Ability to relate theory and concepts to discussion", became intangible as an aid to the reader after the fact. Plus, it was more for us than the students, who would therefore have to deduce which of those fragmented phrases applied to which bit of their work in order to know where they landed. So I started by turning those paragraphs into more distinct sentences.

  • As I did so, I realised three things. Firstly, with some serious re-wording, they began to sound like categories of things one might expect (part of the original design, I'm sure). Secondly (and making me question the likelihood of that previous parenthesised statement), they did not always build on one another as the boundaries went up - there was no obvious increase in expectation to justify the reward. And thirdly, it became clear how unclear the route to the final mark was: were those paragraphs saying the work needed to do everything they asked to reach the top of that boundary, or was one mishap enough to deny that upper limit forever? Or was it pure guesswork and academic judgement based on little more than "kinda feels like a..."?

  • So I re-wrote and re-ordered them so that each sentence I had extracted and edited now fed on from the boundary before and led towards the next. In doing so, the categories I had spied from a distance in step two began to coalesce into something I could work with: things that the work, if a "traditional essay", might be expected to have no matter what the brief or the context; things that a level three (FHEQ6) student might be expected to achieve and to demonstrate. These were structure, knowledge, argument, evaluation, and referencing. Referencing was immediately changed to reading, to give more wiggle room in assessment design (not every bloody piece of work, even in science, needs endless citations that were clearly unread but vaguely conform, via a quick Google search, to the point you wanted to prove).

  • A good buddy of mine, as I put these on the whiteboard in my office, pointed out that they nearly formed a word when written in list format: an acrostic. But the K in "knowledge" was problematic. So we changed that to subject knowledge, reordered the list, and the following was born:

Argument

Reading

Subject Knowledge

Evaluation

Structure

  • It was good, but still just better organised and more meaningful descriptors of things to achieve. It needed the separation from the feedback. It needed to be given back to the student! So I made it as simple as a calculation. I added a tickbox at the end of each category (each boundary now having a lower or higher expectation of that category). The idea being that you go over the work and provide your feedback (by whatever means is best - the audio feedback from my "Time is on my side" blog being one I prefer, and whose ease I wish my team could get to grips with), then tick the box for each category in the boundary you feel the work as a whole maps to.

  • You now have five ticks, one for each category. Unless the work is a dog's dinner, they should not be too spread out across the boundaries. The mark is then calculated by taking the boundary within which the most ticks sit (for the sake of example, we'll go with two ticks in 50-59%). Your starting point is the bottom of that boundary plus 2% for each tick within it - which is why the maximum possible for that boundary is 60%, had all five ticks landed there. From that starting point it becomes a process of addition and subtraction. A tick in the boundary above earns 2% for having definitely hit the expectation of the starter boundary, plus a bonus 1% for being one boundary above (a bonus 2% if two boundaries above, and so on). A tick in the boundary below is a 1% deduction (a 2% deduction for two boundaries below, as with the bonuses).

  • So in this example: two ticks in 50-59, one tick in 40-49, one tick in 60-69, and one tick in 70-79. That's a starting point of 54 (bottom of 50-59 plus two ticks at 2% each); deduct 1% for the 40-49, add 3% for the 60-69, add 4% for the 70-79. That gives a grand total of 60% (54 - 1 + 3 + 4). The sketch just after this list walks through the same arithmetic in code.
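For anyone who prefers to see that arithmetic end to end, here is a minimal sketch in Python of the tick-box calculation as described above. The function name, the list-of-boundaries input, and the assumption of fixed 10% bands are mine for illustration - the real thing is just ticks on a criteria sheet.

```python
# Minimal sketch of the tick-box mark calculation described above.
# Assumes fixed 10% bands and that one boundary clearly holds the
# most ticks (as it should, unless the work is a dog's dinner).

def calculate_mark(ticks):
    """ticks: one boundary lower bound per category,
    e.g. [50, 50, 40, 60, 70] for the worked example."""
    # The starting boundary is the one holding the most ticks.
    base = max(set(ticks), key=ticks.count)
    # Bottom of that boundary, plus 2% per tick sitting inside it.
    mark = base + 2 * ticks.count(base)
    for t in ticks:
        bands_away = (t - base) // 10
        if bands_away > 0:
            # 2% for securely hitting the starter boundary,
            # plus a 1% bonus per band above it.
            mark += 2 + bands_away
        elif bands_away < 0:
            # 1% deduction per band below the starter boundary.
            mark += bands_away
    return mark

# The worked example: two ticks in 50-59, one each in 40-49, 60-69, 70-79.
print(calculate_mark([50, 50, 40, 60, 70]))  # 60
```

Run on the worked example it lands on 60, matching the hand calculation; five ticks in 50-59 gives the 60% ceiling mentioned above.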




The student can now see it, because they get that ticked marking criteria returned to them. They know exactly which criteria/categories they did better on than others (which to work on, which to relax about), and it means they should no longer use the feedback on the actual submission (or template, if you really love them) to work out where their mark came from - the feedback becomes purely feedback, not a justification of the grade. It means I can say "excellent" all I want, because I mean literally that about that bit, not that the work was excellent and you should therefore get 70%+ because that's the descriptor for the 70-79% grade boundary.


That was a very long account of what in reality is a very simple tool. Not so much a digital tool as I would normally demonstrate here. But it greatly simplifies the marking process, creates opportunities for more feedback of a higher quality, and takes a lot of the subjectivity out of assigning grades. Not all of it, because as I mentioned right back at the start, subjectivity is all part of academic judgement and rigour, and of students learning to operate and write within a set of parameters while still having room to move - working beyond the confines of the rubrics and syllabi of their earlier stages of education.


Perhaps more importantly, it forms a foundation on which other forms of assessment can be built. The framework is the key to its success, with the categories being interchangeable depending on the skill being assessed. Doing a live presentation? Then maybe style or communication is more important than reading. Writing a lab report? Then structure may not matter as much as it did for a long-form essay, since you could be operating from a set model, but analysis or data handling might be the skill to focus on instead. It's important to be consistent in the categories mapped to the main skills that form the assignment, because that allows for the subjectivity within the subject that is all-important to the discipline. But we're talking maybe five or six of these things to cover those main areas. And if implementing it at level two? Just move all the descriptors one banding down to lower the expectation while still expecting the same things to be there - you simply do better for having achieved them earlier. And that's the message to make sure students receive: just because you got a 70% at level two doesn't mean the same quality of work will get you a 70% next year, because the expectations are greater. But now, you can see them.
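To make that interchangeability concrete, here is a hypothetical sketch of how the framework might be configured. The category names beyond the five in the acrostic, and the idea of holding descriptors in a dictionary keyed by boundary, are my illustrative assumptions, not part of any approved criteria.

```python
# Hypothetical configuration: the five categories swap out per
# assessment type while the marking mechanics stay the same.
CATEGORIES = {
    "essay": ["Argument", "Reading", "Subject Knowledge",
              "Evaluation", "Structure"],
    "presentation": ["Argument", "Communication", "Subject Knowledge",
                     "Evaluation", "Style"],
    "lab_report": ["Argument", "Reading", "Subject Knowledge",
                   "Data Handling", "Analysis"],
}

def shift_down_a_level(level_three_descriptors):
    """Produce level-two descriptors by moving each descriptor one
    band down in expectation: the same quality of work now earns
    the boundary above (edge bands ignored for simplicity)."""
    return {boundary + 10: text
            for boundary, text in level_three_descriptors.items()}

level_three = {
    40: "descriptor for 40-49 at level three",
    50: "descriptor for 50-59 at level three",
    60: "descriptor for 60-69 at level three",
    70: "descriptor for 70-79 at level three",
}
level_two = shift_down_a_level(level_three)
print(level_two[70])  # the level-three 60-69 descriptor, earned a year early
```

The same tick-and-calculate mechanics then sit on top, whichever set of categories the assessment calls for.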


So there you have it. Simple, but hopefully addresses a very real problem. And if we're really lucky, by taking some of the guesswork out of the actual mark allocation, maybe marking itself will become less of a burden and more of an exercise in giving students a genuine opportunity to learn from their work.