object made of multiple components. In surgical education, you are trying to measure
something complex – a skill or attitude or body of knowledge made up of many
parts. Measuring that construct means you must define its component parts.
In the clearest terms possible, you must be able to answer the question, “What
are you trying to measure?” The following are a few examples of how your assessment
will be shaped by how you answer that question. An assessment of a construct
is, in essence, an assessment of its component parts:
Operative performance: The American Board of Surgery (ABS) requires that general
surgery residents be rated on their performance during operative procedures.
Rating tools for several different procedures, developed by researchers at
Southern Illinois University School of Medicine, are available on the ABS
website [10]. Using these tools, evaluators observe a resident during an
operation and rate the performance on a standardized form.
The construct being measured is “performance during an operative procedure.” This
is a complex construct – because operations are complex procedures – made up
of several key components. The rating form for a laparoscopic cholecystectomy,
for example, asks evaluators to score residents on incision and port placement,
exposure, cystic duct dissection, cystic artery dissection, and gallbladder dissection.
Residents are also scored on more general criteria: instrument handling,
respect for tissue, time and motion, and operation flow.
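One way to see that a construct is assessed through its parts is to write the form down as data. The sketch below is illustrative only: it lists the components named above for a laparoscopic cholecystectomy and reports which ones an evaluator has not yet rated. The names PROCEDURE_ITEMS, GENERAL_ITEMS, and missing_components are hypothetical, and the rating values are treated as opaque because the scale used on the actual form is not described here.

```python
# Components of "performance during an operative procedure" as listed in the
# text for a laparoscopic cholecystectomy. All names here are hypothetical.
PROCEDURE_ITEMS = (
    "incision and port placement",
    "exposure",
    "cystic duct dissection",
    "cystic artery dissection",
    "gallbladder dissection",
)
GENERAL_ITEMS = (
    "instrument handling",
    "respect for tissue",
    "time and motion",
    "operation flow",
)


def missing_components(ratings):
    """Return the components that have no rating yet, in form order.

    `ratings` maps a component name to whatever value the evaluator recorded.
    The values are not interpreted; the real form's scale is an unknown here.
    """
    return [item for item in PROCEDURE_ITEMS + GENERAL_ITEMS if item not in ratings]
```

An assessment of the construct is complete only when missing_components returns an empty list, which mirrors the point that no single rating stands in for the whole performance.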
Performance in the operating room is, in many ways, an ideal example of a complex
construct. Performance in this context cannot be reduced to a single question.
The same is usually true when measuring attitudes or psychological traits.
Grit: The concept of “grit” has gained a great deal of attention. Developed and
popularized by personality psychologist Angela Duckworth, grit is defined as
“perseverance and passion for long-term goals” [5, 6]. Note the word
“and” – grit has two components. In order to measure grit, one must measure
both perseverance and passion for long-term goals.
An eight-item “short grit scale” is publicly available on the Duckworth Lab website.
Four of the items focus on perseverance. Four focus on passion for long-term goals. The
items use a “Likert-type” format, meaning that response options are aligned
along a continuum, in this case from “very much like me” to “not like me at all.”
This format allows items to be scored numerically, from 1 to 5 in this instance.
Scores across all items can be averaged to form an overall Grit score.
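The scoring described above can be sketched in a few lines. This is a minimal illustration, not the official scoring procedure: it assumes each response has already been converted to its 1-to-5 score so that a higher number always means more grit, and the assignment of item positions to the two components (PERSEVERANCE_ITEMS and PASSION_ITEMS) is hypothetical. The function name score_grit is likewise invented for this example.

```python
from statistics import mean

# Hypothetical item positions for each component; the order of items on the
# published scale may differ.
PERSEVERANCE_ITEMS = (0, 1, 2, 3)
PASSION_ITEMS = (4, 5, 6, 7)


def score_grit(responses):
    """Average eight item scores (each 1 to 5) by component and overall.

    Assumes every score is already oriented so that higher means more grit.
    """
    if len(responses) != 8:
        raise ValueError("expected exactly 8 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return {
        "perseverance": mean(responses[i] for i in PERSEVERANCE_ITEMS),
        "passion": mean(responses[i] for i in PASSION_ITEMS),
        "overall": mean(responses),
    }
```

Reporting the two component averages alongside the overall score keeps both halves of the definition visible: a respondent can be high on perseverance and low on passion, and the overall average alone would hide that.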
The Grit scale has multiple items because grit is a multifaceted construct. But
there is another reason why the Grit scale and other scales like it ask multiple questions:
no single question perfectly captures a construct. Language is messy.
And since no single question is perfect, researchers ask about the same construct in
several different ways, with the hope that a common pattern will emerge across
answers.
4 Measurement in Education: A Primer on Designing Assessments
