
more discriminating than the others. Watanabe et al. used item response theory to determine which GOALS items are the most difficult, drawing on a total of 396 evaluations collected over an impressive 12-year period [43]. The bimanual dexterity, efficiency, and autonomy items proved the most difficult, and scoring on these items was also nonlinear at the upper end of the scale. In other words, not only are certain items harder than others, but it becomes progressively more difficult to earn each additional point toward the right end of the global rating scale [43].
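To make that nonlinearity concrete, consider a graded response model, a common item response theory formulation in which each step up an ordinal scale has its own difficulty threshold. The Python sketch below uses invented thresholds (not Watanabe et al.'s fitted parameters) to show how widely spaced upper thresholds make each additional point progressively harder to earn:

```python
# Sketch: why higher anchors on a 5-point global rating scale can be
# "increasingly hard" under item response theory. Illustrative graded
# response model with made-up thresholds, not Watanabe et al.'s data.
import numpy as np

def category_probs(theta, thresholds):
    """P(score = k) for one ordinal item under a graded response model.

    theta: trainee ability (logits); thresholds: ordered difficulty cut
    points for scoring >= 2, >= 3, ... on the scale.
    """
    # P(score >= k) follows a logistic curve at each threshold
    p_ge = [1.0] + [1 / (1 + np.exp(-(theta - b))) for b in thresholds]
    p_ge.append(0.0)
    # Adjacent differences give the probability of each exact score
    return [p_ge[k] - p_ge[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical thresholds that spread out toward the top of the scale:
# moving from a 4 to a 5 demands a bigger ability jump than from 2 to 3.
thresholds = [-2.0, -0.5, 1.0, 3.0]
for theta in (-1.0, 0.0, 1.0, 2.0):
    probs = category_probs(theta, thresholds)
    print(f"ability {theta:+.1f}:", [f"{p:.2f}" for p in probs])
```

Running the sketch shows probability mass shifting only slowly into the top category even as ability rises, which is the pattern described above.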


GEARS
Robotic-assisted surgery comes with new challenges for the operating surgeon and thus for assessment. At present, robotic platforms do not provide haptic feedback, though this will likely change soon. As a consequence, developing force sensitivity and robotic handling is a critical yet difficult skill set that differentiates robotic from laparoscopic surgery. Another distinguishing feature is 3D stereoscopic visualization, in contrast to the 2D endoscopes of conventional laparoscopy.
To assess these differences, Goh and his team modeled a global rating scale on GOALS that measures robotic-specific skills, shown below in Table 5.3 [44].
Since its development, GEARS has undergone extensive validation. Construct validity has been demonstrated repeatedly across multiple studies [21, 44-47], as have face, content, and concurrent validity [21]. Most GEARS research has involved in vivo cases, though construct validity has also been extended to dry lab simulation [21]. Interrater reliability has likewise been demonstrated across a number of different grading groups.
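Interrater reliability in such studies is commonly summarized with an intraclass correlation coefficient. As a minimal sketch, the following Python computes a two-way ICC(2,1) (absolute agreement, single rater) from a small fabricated matrix of ratings; the scores are illustrative, not data from the cited studies:

```python
# Sketch: estimating interrater reliability with a two-way ICC(2,1).
# The matrix below is fabricated: rows are performances, columns are
# raters scoring a single GEARS item on its 5-point anchor.
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)            # between-performance variance
    msc = ss_cols / (k - 1)            # between-rater variance
    mse = ss_err / ((n - 1) * (k - 1)) # residual variance
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = np.array([[3, 3, 2],
                    [4, 5, 4],
                    [2, 2, 2],
                    [5, 4, 5],
                    [3, 4, 3]], dtype=float)
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```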
Nabhani et al. took GEARS further, evaluating surgeons immediately after robotic prostatectomies and robotic partial nephrectomies [48]. They took a slightly different approach to completing evaluations, collecting ratings from faculty, fellows, residents, and surgical technicians. The findings were not surprising: ratings from the more experienced evaluators correlated better than those from the other groups, particularly resident self-evaluations and the surgical technicians' scores. Overall, though, GEARS performed well as an assessment for live surgery [48].
GEARS can be used not only to evaluate skill acquisition during surgery or dry lab rehearsal but also to assess progress during dry lab or VR robotic simulation, including full-length procedures [46, 49]. VR simulators each grade performance with their own proprietary metrics, but GEARS can serve as a common baseline score against which the different devices can be compared for viability as training tools.
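As a minimal sketch of that idea, the Python below correlates each simulator's proprietary composite score with blinded GEARS totals for the same trainees; the platform names and all numbers are invented for illustration:

```python
# Sketch: using GEARS as a common yardstick across VR simulators whose
# built-in metrics are not directly comparable to one another. All
# scores here are hypothetical.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Blinded GEARS totals for the same ten trainees on each platform
gears = [14, 18, 22, 25, 19, 16, 23, 21, 17, 24]

simulators = {
    # hypothetical simulator name -> its own composite score per trainee
    "sim_A": [52, 61, 75, 80, 66, 55, 78, 70, 60, 79],
    "sim_B": [40, 48, 41, 55, 60, 38, 47, 52, 44, 51],
}
for name, scores in simulators.items():
    print(f"{name}: r = {pearson_r(gears, scores):.2f} vs GEARS")
```

A simulator whose internal metric tracks GEARS closely is, on this logic, a more defensible training tool than one whose metric does not.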


C-SATS
Crowdsourcing is a plausible solution to the notoriously long wait times for expert review. In a study evaluating possible means of scoring BLUS tasks and earlier robotic adaptations of FLS modules, researchers found that Amazon's Mechanical Turk workers could rate exercises as well as both expert reviewers and motion capture technology [50, 51]. (Why "Turks"? The story goes that Napoleon Bonaparte, brilliant military tactician, was nonetheless beaten at chess by the Mechanical Turk, an eighteenth-century "automaton" that secretly concealed a human chess master.)
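A minimal sketch of the aggregation idea behind crowd review: many untrained raters each score a performance, and their pooled mean is compared against an expert's rating. The simulation below invents all numbers and simply assumes crowd ratings are noisy but unbiased, which is roughly the pattern the cited studies report:

```python
# Sketch: pooling many noisy crowd ratings toward an expert benchmark.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
expert_scores = np.array([2.0, 3.5, 4.0, 2.5, 5.0])  # one clip per entry

# Simulate 30 crowd workers: noisy but unbiased around the expert rating
crowd = expert_scores + rng.normal(0.0, 1.0, size=(30, len(expert_scores)))
crowd_means = crowd.clip(1, 5).mean(axis=0)  # keep scores on the 1-5 scale

for e, c in zip(expert_scores, crowd_means):
    print(f"expert {e:.1f}  crowd mean {c:.2f}")
```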

