As most scientists do, I occasionally find myself talking to colleagues about scientific impact. This discussion often centers on the prestige of journals and their Impact Factors (for a very good discussion on this subject see this blog post). The other topic that comes up is the number of citations papers receive. Although the brilliance of science does not perfectly correlate with the number of times colleagues cite you, it is a much better indication of one’s success (which is not the same as one’s abilities!) than being able to report having published in a high IF journal. Again, this measure is not perfect. For instance, some research fields are smaller and consequently there is a smaller pool of people who could cite you. Also, it seems that by citing more papers, you will receive more citations yourself in what seems to be a tit-for-tat game.
My colleagues and I had noted that the number of citations given by Google Scholar was always higher than that given by Web of Science. There were some suspicions of Google Scholar algorithms including dodgy references and Web of Science being more limited in the number of journals it included (it goes without saying that we were all in favour of Google Scholar…). Although this is far from putting an end to the debate, I thought that it could be illuminating to take a single paper and check where the differences in citation counts lie. (I have not looked at the other two main resources, Scopus and PubMed, as I don’t use these myself.) I used my best-cited paper, a meta-analysis of homologous recombination rates in bacteria which was a collaboration with Xavier Didelot, now at Imperial College London. This paper re-analyzed published data; such a meta-analysis lies somewhere between a paper on one’s own data and a review paper. Review papers are usually better cited than data papers and are therefore excluded from some impact measurements (such as ‘the REF’ here in the UK). Web of Science says it has been cited 89 times, Google Scholar says it was cited 125 times: 40% more!
I downloaded all 89 WoS references into Endnote to check them against the GS list and to export the 36 missing references to the GS ‘My Library’ list. The strange thing was that the numbers did not add up: I was five references short (I first blamed myself but after rechecking the list they were still not there). Then I noticed that there were twelve search results pages displaying ten papers each: 120 hits, not the 125 citations listed below the paper. Google thus was contradicting itself*. This instantly reminded me of a conversation I had with my ECEHH colleague Marco Palomino who researches ‘horizon scanning’: the identification of emerging technologies or risks using internet searches (combining keywords of interest with phrases such as ‘break through’ and ‘cutting edge’ for instance). Marco found out that there are huge discrepancies between actual search results and the total number of hits reported by Google search algorithms (see here). Perhaps Google’s ‘Don’t be Evil’ should be followed by ‘but it’s OK if you’re slightly disingenuous’…
Now for the remaining 31 (120-89) GS references not covered by WoS. First I checked whether there were any references found by WoS and not by GS. There was one (a 2011 paper in the American Journal of Clinical Pathology). GS also linked to a pre-published paper whereas WoS had a complete reference. Only one GS false negative, how about the WoS false negatives? Frustratingly, I found 33 GS references not present in WoS instead of the expected 32 (120 minus the 88 references shared with WoS). Re-checking did not solve this and I hope the reader will forgive me for one reference gone astray. Among these 33 references were four book chapters and eleven dissertations. It is perhaps debatable whether these (especially the latter) should be counted, as their availability might be more limited compared to papers and they go through a more informal type of peer review (and also the contents of dissertations should find their way into papers). I personally think they represent proper scientific output and should be counted. One reference was a comment in Science which is not peer-reviewed (and very brief) and one could argue whether to include that one as well (I’ll include it!). One reference was in Chinese. I would argue that all references should be in the lingua franca English (I am not a native English speaker so I can say that…).
A total of 16 papers were missed by WoS, including ones in well-known journals such as Environmental Microbiology, Nature Reviews Microbiology, PLoS ONE and BMC Evolutionary Biology. Other journals missed were less well known, e.g. Mobile Genetic Elements and Applied Microbiology and Biotechnology; the former is not indexed by WoS but the latter is (you can check for WoS indexed journals here). Nine papers were from 2013 and so might have been missed by WoS due to a time lag, but that should not be an excuse.
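For readers who want to follow the bookkeeping, here is a minimal sketch in Python that redoes the arithmetic with the counts quoted above. The variable names are my own, and it assumes that the five categories of GS-only references do not overlap (i.e. the Chinese reference and the Science comment are not among the 16 missed papers).

```python
# Counts quoted in the text above.
wos_total = 89   # citations listed by Web of Science
gs_listed = 120  # citations actually displayed by Google Scholar
wos_only = 1     # found by WoS but not by GS

# References both sources should share, and what GS should add on top.
shared = wos_total - wos_only           # 88
expected_gs_only = gs_listed - shared   # 32

# Breakdown of the GS-only references actually found
# (assumed to be non-overlapping categories).
gs_only_found = {
    "book chapters": 4,
    "dissertations": 11,
    "comment in Science": 1,
    "reference in Chinese": 1,
    "papers missed by WoS": 16,
}
found_total = sum(gs_only_found.values())  # 33

print(f"expected GS-only: {expected_gs_only}, found: {found_total}")
print(f"unexplained references: {found_total - expected_gs_only}")
```

The categories do sum to the 33 references I found, so the one stray reference sits somewhere in the comparison of the two lists rather than in this breakdown.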
Of course I have cited myself on occasion. Some funders require a reference list with (WoS) citation counts minus self-citations. It is true that researchers can to an extent inflate their citation counts by citing themselves and that a self-citation on average is worth less than a citation of your work by someone else. However, it must also be said that these same funders usually insist on seeing an overarching research theme by the applicant and it makes complete sense that your work is built on your previous (published) work (see here for an interesting discussion on the topic of self-citations). Anyway, the odd thing is that I have cited this paper both in a Trends in Microbiology paper in 2009 and in a paper in the same journal in 2011 and WoS has only found the 2011 one: very strange!
Below I have plotted the Google Scholar and Web of Science citation counts, as well as the ‘real’ count (125 minus 5 false positives plus one false negative minus one non-English paper). (A ‘real’ papers-only count would amount to 105 citations (120 minus the 15 book chapters and dissertations).) Despite evident faults, it is clear that GS is a better predictor of actual citation numbers than is WoS. Perhaps some of the confusion will clear in the near future now it seems that Google Scholar and Web of Science/Knowledge will start to cooperate…
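As a rough check on that conclusion, the following sketch recomputes the ‘real’ count from the numbers in this post and shows how far each database is from it. The variable names are mine and the ‘real’ count follows my own inclusion choices described above, so treat it as an illustration rather than a definitive measure.

```python
# Reported totals from the two databases.
gs_reported = 125
wos_reported = 89

# Adjustments described in the text.
gs_false_positives = 5   # counted by GS but never displayed
gs_false_negatives = 1   # found by WoS only
non_english = 1          # the reference in Chinese

real_count = gs_reported - gs_false_positives + gs_false_negatives - non_english  # 120

# Signed deviation of each database from the 'real' count.
deviation = {
    "Google Scholar": gs_reported - real_count,   # +5
    "Web of Science": wos_reported - real_count,  # -31
}

for source, diff in deviation.items():
    share = abs(diff) / real_count
    print(f"{source}: {diff:+d} citations ({share:.0%} off the 'real' count)")
```

On these numbers Google Scholar overshoots by a handful of citations, whereas Web of Science undershoots by roughly a quarter.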
* I see that the citation count has increased to 129 while writing, but still only 120 citations show up.
P.S. I found out that (of course) I wasn’t the first to make this comparison, see here for another blog post:
Google Scholar vs. Scopus and Web of Science