Re: Manual Evaluation of Algorithm Performance on Identifying OA
On Mon, 23 Jan 2006, Phil Davis wrote:
> It would be much more constructive if Stevan spent time trying
> to find problems in their methodology and analysis...
As I said, the discrepancies between our test of the robot's accuracy
and Goodman et al's prompted us to try to find the basis for the
discrepancy, and we think we have found it:
The robot sends ISI reference queries to several search engines and then
tests up to the first 60 hits to see if any of them is OA, stopping
and returning "OA" as soon as its algorithm judges that a hit is OA,
and returning "NOA" if none of the (up to) 60 hits is OA.
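
To make that stopping rule concrete, here is a minimal sketch in Python. It is not the robot's actual code: the function name `classify_reference`, the `looks_oa` predicate (standing in for the robot's own OA-detection algorithm) and the `MAX_HITS` constant are hypothetical, introduced only for illustration. The sketch also returns the hits that were actually examined, since those are the ones that would need to be saved for hand-checking.

```python
from typing import Callable, Iterable, List, Tuple

MAX_HITS = 60  # the "(up to) 60 hits" per reference described above


def classify_reference(
    hits: Iterable[str],
    looks_oa: Callable[[str], bool],
    max_hits: int = MAX_HITS,
) -> Tuple[str, List[str]]:
    """Return ("OA" or "NOA", hits actually examined).

    `looks_oa` is a placeholder for whatever test the robot applies to a
    single hit; it is an assumption of this sketch, not a known API.
    """
    examined: List[str] = []
    for hit in hits:
        if len(examined) >= max_hits:
            break  # never look past the first max_hits hits
        examined.append(hit)
        if looks_oa(hit):
            # Stop at the first hit judged OA.
            return "OA", examined
    # None of the examined hits was judged OA.
    return "NOA", examined
```

The point of returning `examined` alongside the verdict is that an "OA" verdict may rest on only a few hits, whereas a "NOA" verdict rests on every hit examined, up to the limit.
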
The right way to check the robot's accuracy is to save all the hits, and
hand-check all of them for a sample that the robot judged
"OA" and a sample the robot judged "NOA". What we instead did in our
own small test sample was to do a search by hand for a sample of 100
references that the robot had judged to be OA and 100 references it had
judged NOA (in Biology). Goodman et al. did the same for a sample
about three times as big in Biology, as well as in Sociology.
All three tests found very different robot accuracies. The reason now
seems clear: When one hand-checks the accuracy of a device, that has to be
done on the *device*'s original sample, not on a different sample. All of
us had used a different sample (and even different search engines). The
right test of the robot's accuracy requires hand-checking the (up to)
60 hits that the robot actually sampled and processed and judged OA or
NOA. We are now re-doing both the searches and the tests, saving the
hits for doing this hand-checking.
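
A sketch of what the corrected test could look like, once the hits have been saved, is given below. It assumes that each saved record is a pair of the robot's verdict and the list of hits it examined, and that `hand_is_oa` represents a human checker's judgment of a single hit; both names, and the function `robot_accuracy`, are hypothetical and do not describe our actual scripts.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def robot_accuracy(
    records: Iterable[Tuple[str, List[str]]],
    hand_is_oa: Callable[[str], bool],
) -> Dict[str, Optional[float]]:
    """Agreement between robot and hand-check, on the robot's own hits.

    records    -- (robot verdict, saved hits) pairs; verdict is "OA" or "NOA"
    hand_is_oa -- placeholder for the human judgment of one saved hit
    """
    counts = {"OA": [0, 0], "NOA": [0, 0]}  # verdict -> [agreements, total]
    for verdict, saved_hits in records:
        # The hand-checker looks only at the hits the robot itself examined,
        # not at a fresh search from a different source.
        truth = "OA" if any(hand_is_oa(h) for h in saved_hits) else "NOA"
        counts[verdict][1] += 1
        if truth == verdict:
            counts[verdict][0] += 1
    return {
        verdict: (agree / total if total else None)
        for verdict, (agree, total) in counts.items()
    }
```

Reporting the agreement separately for the "OA" and "NOA" verdicts keeps false positives and false negatives apart, which a single pooled figure would hide.
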
In other words, all three tests were biased against the robot --
being based on different samples, from different sources, united only
by whether or not the robot had judged the reference item to have an OA
version somewhere among the (up to) 60 hits in its own *first* sample. We
had not noticed that bias earlier, because our own test had yielded such
high accuracy despite the (unnoticed) bias.
As I said before, I am glad Goodman et al. did the further test, whose
finding of much weaker and more variable accuracy alerted us to the fact
that something was amiss. We think we have found what was amiss, and it
was not in the robot's accuracy but in our test of the robot's accuracy.
Stay tuned for the results for both Biology and Sociology, which are being
completely re-done by the robot, but this time saving all the hits; the
robot accuracy test will be available soon for a still larger subsample
of these same data. We are also saving all the hits (for all of Biology
and Sociology, not just this larger sample), so anyone else can hand-check
them if they wish.
Stevan Harnad
> At 08:41 PM 1/22/2006, you wrote:
> >Before anyone gets too excited about the tiny Goodman et al. test
> >result, may I suggest waiting a couple of weeks, when we will be
> >reporting the results of a far bigger and more accurate test of
> >the robot's accuracy?
> >
> >Those who (for some reason) were hoping that the robot would
> >prove too inaccurate and that the findings on the OA advantage
> >would prove invalid may be disappointed with the outcome. I can
> >already say that overinterpretations of the tiny Goodman et al.
> >test as showing that the OA/OAA findings to date are "worthless"
> >are rather overstated even on the meagre evidence to date,
> >especially since two thirds of the published findings on the OA
> >citation advantage are not even robot-based!
> >
> >(This shrillness also seems to me to be trying to make rather
> >much out of having actually done rather little!)
> >
> >As to the separate issue of how to treat the OA journal article
> >counts (as opposed to the counts for the self-archived non-OA
> >journal articles): We count it all, of course, but only use the
> >non-OA journal article counts in calculating the OA advantage,
> >because those are (necessarily) within-journal ratios, and
> >citation ratios of zero and infinity are meaningless. Think about
> >it.
>
> [SNIP]
>
> >Stevan Harnad
>
>
Received on Tue Jan 24 2006 - 21:01:40 GMT