the different criteria used by crowdworkers and the simulation software to generate their respective assessments. Given the existing literature on crowd-based feedback, this form of surgical evaluation may in fact be superior to current artificial intelligence-based models, in that it more closely approximates the assessments of expert human surgeons.
Large-scale evaluation using crowdworkers has also been shown to be economically efficient. Across multiple published studies, crowdworkers such as those recruited through Amazon Mechanical Turk are compensated small sums, ranging from $0.50 to $1.00 per task [27, 47, 49, 50, 55]. For crowdworkers, the amount of remuneration has been linked to how quickly feedback is returned, which may provide further opportunities to improve the efficiency of this method of technical assessment [47]. In contrast, the cost of an expert surgeon's assessment of a 5–10-min video is estimated at $54 to $108, or about $10 per minute of video, assuming an annual surgeon salary of $340,000 and a 2000-h work year [50]. Aggregate calculations based on the available data in the literature estimate that feedback provided by surgeons is 1.15 to 8.38 times more expensive than that provided by crowds for the same task [57]. Thus, crowd-based evaluation of surgical task performance videos is consistently more cost-effective, particularly when scaled to multiple videos for an entire group of residents.
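As a rough illustration only, the arithmetic behind these estimates can be traced from the assumptions quoted above (a back-of-the-envelope sketch, not a reanalysis of the primary data in [50] or [57]):

\[
\frac{\$340{,}000 \text{ per year}}{2000 \text{ h per year}} = \$170 \text{ per surgeon-hour} \approx \$2.83 \text{ per surgeon-minute}
\]
\[
\$10 \text{ per minute of video} \times (5\text{--}10 \text{ min of video}) \approx \$50\text{--}\$100 \text{ per expert review}
\]

Taken at face value, the roughly $10-per-video-minute figure implies that reviewing and scoring a clip occupies several times its running length in surgeon time, whereas each crowdworker rating of the same clip costs only $0.50–$1.00.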
A 2016 systematic review highlighted that crowdworkers consistently completed evaluation tasks 9 to 144 times faster than experts [57]. In the published literature, the time required for crowdworkers to return feedback for a specific task ranges from as little as 2 h and 50 min to 5 days, varying with the length of the task video and the complexity of the task. In contrast, the time for expert surgeons to return feedback for the same tasks ranges from 26 h to 60 days [27, 47, 49, 51, 52, 54–56].
Though the value of crowd-based evaluation in dry lab simulation settings has been repeatedly validated, the impact of crowdsourced feedback on live intraoperative performance has been less extensively studied. Systematic reviews investigating skills transfer for laparoscopic and endoscopic simulation tasks suggest that such training improves operative performance [58], but data on skills transfer from robotic simulation tasks remain limited [59]. Unlike simulation tasks, intraoperative performance is not limited to a single skill or task segment; successful performance requires nuanced surgical judgment in addition to basic technical fluency. Thus, the equivalence between crowd-based and expert evaluation demonstrated for a narrow set of simulation tasks may not hold for more complex operative procedures performed in real time.
To address this issue, several groups have investigated the use of crowd-based feedback in the assessment of live operative video. Powers and colleagues generated fourteen 10-min video clips of renal hilar dissection performed at varying skill levels by 5 postgraduate year 3 or 4 urology residents and surgical attendings. The videos were assessed by Amazon Mechanical Turk workers and by urologic surgeons with expertise in robotic-assisted laparoscopic surgery and robotic partial nephrectomy, with both groups using the validated GEARS tool plus a novel renal artery dissection question.
Complete ratings were returned by 14 expert surgeons in 13 days, as compared to