Citation counts: Google Scholar vs. Web of Science

As most scientists do, I occasionally find myself talking to colleagues about scientific impact. This discussion often centers on the prestige of journals and their Impact Factors (for a very good discussion on this subject see this blog post). The other topic that comes up is the number of citations papers receive. Although the brilliance of one's science does not correlate perfectly with the number of times colleagues cite you, citation counts are a much better indication of one's success (which is not the same as one's abilities!) than being able to report having published in a high-IF journal. Again, this measure is not perfect: some research fields are smaller, and consequently there is a smaller pool of people who could cite you. Also, it appears that by citing more papers you will receive more citations yourself, in what looks like a tit-for-tat game.

My colleagues and I had noted that the number of citations given by Google Scholar was always higher than that given by Web of Science. There were suspicions that Google Scholar's algorithms include dodgy references and that Web of Science covers a more limited set of journals (it goes without saying that we were all in favour of Google Scholar…). Although far from putting an end to this debate, I thought it could be illuminating to take a single paper and check where the differences in citation counts lie. (I have not looked at the other two main resources, Scopus and PubMed, as I don't use these myself.) I used my best-cited paper, a meta-analysis of homologous recombination rates in bacteria, which was a collaboration with Xavier Didelot, now at Imperial College London. This paper re-analyzed published data; such a meta-analysis lies somewhere between a paper on one's own data and a review paper. Review papers are usually better cited than data papers and are therefore excluded from some impact measurements (such as 'the REF' here in the UK). Web of Science says the paper has been cited 89 times; Google Scholar says 125 times: 40% more!
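The 40% figure is simply the relative difference between the two counts; a minimal sketch of that arithmetic in Python (only the two counts come from the post itself):

    wos, gs = 89, 125                        # citation counts from Web of Science and Google Scholar
    extra = gs - wos                         # citations GS reports on top of the WoS count
    print(extra, round(100 * extra / wos))   # -> 36 40, i.e. GS reports roughly 40% more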

I downloaded all 89 WoS references into Endnote to check them against the GS list and to export the 36 missing references to the GS 'My Library' list. The strange thing was that the numbers did not add up: five references were missing (I first blamed myself, but after rechecking the list they were still not there). Then I noticed that there were twelve pages of search results displaying ten papers each: 120 hits, not the 125 citations listed below the paper. Google was thus contradicting itself*. This instantly reminded me of a conversation I had with my ECEHH colleague Marco Palomino, who researches 'horizon scanning': the identification of emerging technologies or risks using internet searches (combining keywords of interest with phrases such as 'breakthrough' and 'cutting edge', for instance). Marco found that there are huge discrepancies between the actual search results and the total number of hits reported by Google's search algorithms (see here). Perhaps Google's 'Don't be Evil' should be followed by 'but it's OK if you're slightly disingenuous'…
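Doing this cross-check by hand in Endnote is tedious. For anyone wanting to repeat it, the comparison boils down to a set difference between two exported reference lists; here is a minimal Python sketch, assuming both lists have been exported as CSV files with a 'Title' column (the file names and column name are my own illustration, not an actual WoS or GS export format):

    import csv

    def load_titles(path):
        """Return a set of crudely normalised titles from a CSV export."""
        with open(path, newline="", encoding="utf-8") as handle:
            return {row["Title"].strip().lower() for row in csv.DictReader(handle)}

    wos = load_titles("wos_citations.csv")   # hypothetical Web of Science export
    gs = load_titles("gs_citations.csv")     # hypothetical Google Scholar export

    print("WoS records:", len(wos), "GS records:", len(gs))
    print("In WoS but not in GS:", len(wos - gs))   # GS false negatives
    print("In GS but not in WoS:", len(gs - wos))   # candidate WoS misses

Exact title matching is fragile (capitalisation, punctuation and preprint versus final titles differ between databases), so some manual checking remains unavoidable.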

Now for the remaining 31 (120 − 89) GS references not covered by WoS. First I checked whether there were any references found by WoS but not by GS. There was one (a 2011 paper in the American Journal of Clinical Pathology); GS also linked to a pre-publication version of one paper for which WoS had the complete reference. Only one GS false negative, then; how about the WoS false negatives? Frustratingly, I found 33 GS references not present in WoS instead of the expected 32 (120 − 89 + 1). Re-checking did not solve this, and I hope the reader will forgive me for one reference gone astray. Among these 33 references were four book chapters and eleven dissertations. It is perhaps debatable whether these (especially the latter) should be counted, as their availability might be more limited compared to papers and they go through a more informal type of peer review (and the contents of dissertations should find their way into papers anyway). I personally think they represent proper scientific output and should be counted. One reference was a comment in Science, which is not peer-reviewed (and very brief), and one could argue whether to include that one as well (I'll include it!). One reference was in Chinese. I would argue that all references should be in English, the lingua franca of science (I am not a native English speaker, so I can say that…).

A total of 16 papers were missed by WoS, including ones in well-known journals such as Environmental Microbiology, Nature Reviews Microbiology, PLoS ONE and BMC Evolutionary Biology. Other missed journals were less well known, e.g. Mobile Genetic Elements and Applied Microbiology and Biotechnology; the former is not indexed by WoS but the latter is (you can check for WoS-indexed journals here). Nine of these papers were from 2013 and so might have been missed by WoS due to a time lag, but that should not be an excuse.

Of course I have cited myself on occasion. Some funders require a reference list with (WoS) citation counts minus self-citations. It is true that researchers can, to an extent, inflate their citation counts by citing themselves, and that a self-citation is on average worth less than a citation of your work by someone else. However, it must also be said that these same funders usually insist on seeing an overarching research theme from the applicant, and it makes complete sense that your work builds on your previous (published) work (see here for an interesting discussion on the topic of self-citations). Anyway, the strange thing is that I have cited this paper both in a Trends in Microbiology paper in 2009 and in a paper in the same journal in 2011, yet WoS has only found the 2011 one: very strange!

Below I have plotted the Google Scholar and Web of Science citation counts, as well as the 'real' count (125 minus 5 false positives, plus one false negative, minus one non-English paper: 120). (A 'real' papers-only count would amount to 105 citations: 120 minus the 15 book chapters and dissertations.) Despite its evident faults, it is clear that GS is a better predictor of actual citation numbers than WoS. Perhaps some of the confusion will clear up in the near future, now that it seems Google Scholar and Web of Science/Knowledge will start to cooperate.

[Figure: Google Scholar, Web of Science and 'real' citation counts for the paper.]
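For the record, the bookkeeping behind these adjusted counts is simple arithmetic; a minimal Python sketch (the variable names and grouping are mine):

    gs_reported = 125      # citation count Google Scholar displays below the paper
    false_positives = 5    # citations GS claims but never actually lists
    false_negatives = 1    # the WoS-indexed citation that GS missed
    non_english = 1        # the citation in Chinese
    book_chapters = 4
    dissertations = 11

    real_count = gs_reported - false_positives + false_negatives - non_english
    papers_only = real_count - book_chapters - dissertations
    print(real_count, papers_only)   # -> 120 105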

Michiel

* I see that the citation count has increased to 129 while writing this, but still only 120 citations show up.

 

P.S. I found out that (of course) I wasn't the first to make this comparison; see here for another blog post:

Google Scholar vs. Scopus and Web of Science

 


9 Responses to Citation counts: Google Scholar vs. Web of Science

  1. Tim Vines says:

    The peril that I see with Google Scholar is that it’s free – Google are under no obligation to keep providing it, and, just like Google Reader, they may just announce that they’re stopping Scholar. If by that point they’ve driven WoS out of business, what will we do?

  2. Hopefully the newly announced cooperation between WoS and Google Scholar will resolve these differences, but the reasons for the differences seem relatively benign, namely that WoS is a little behind, that the source types are different and that some journals are not listed by WoS (it seems this is an ongoing issue of journals still being added). Both sources are quite opaque about their methodologies, despite several attempts at reverse-engineering.

    In any case, the main differences between the two relate to the bottom end of citations, or "dead-end" citations. For example, citations in papers in obscure/iffy journals, or in dissertations or abstracts, are unlikely to accrue citations themselves (especially non-self citations) due to their reduced accessibility. Perhaps WoS has made this calculation in excluding these sources?

    Whatever the reasons, we should not become too enamoured of bigger numbers. It's their meaning that is important, lest we slip into the laziness of attaching significance to large numbers, as with JIFs.

    • Hi Jim,
      thanks for your thoughts. I agree that a case could be made that dissertations should be omitted from citation counts. However, that still leaves a good number of papers missed by WoS, and I find that just isn't good enough. Most of the missing citations were from journals that WoS does index. Why should there be a lag? This stuff is computer-automated: when a PLoS ONE paper is online, it can be added to WoS's databases a minute later…

      cheers, Michiel

  3. Rex says:

    In the humanities, it is not unusual to see differences of one, two, or even three orders of magnitude between GS and WoS counts, because TR's coverage of the humanities is very sketchy.
    So, the outcome of your test crucially depends on the research domain.

  4. Eve Carlson says:

    You may be interested in software called Publish or Perish that uses Google Scholar to determine citation metrics.

    I very much enjoyed your article, as I have thought of doing the same myself when curious about why my Google Scholar citation count comes out 63% higher than my WoS citation count. Part of the answer in my case is that my work is cited in non-English journals. My field is traumatic stress, which happens everywhere – probably most frequently in places where English is not the primary language. I find it interesting that you think those citations should not be counted. While some of those publications do not have standards as high as those of English-language journals, surely not all! While I am an admittedly biased observer, such citations seem to me to reflect some kind of impact. If citations in the context of criticism get counted, why not citations in other languages?

    • Dear Eve,

      Many thanks for your comment. I will check out Publish or Perish; it sounds interesting! I guess in the field of biology, non-English-language journals are very rare nowadays (the few non-English citations I've encountered are usually theses). I agree with you that a researcher with 10 'English-language' (or, in a much broader sense, 'traditional') publications and 10 'foreign-language' (or non-mainstream) publications (i.e. the ones I chose to ignore) is doing a better job than one who has just the 10 traditional publications. How to weight this difference is not straightforward though… I haven't got a clue in any case!

      cheers, Michiel

  5. Interesting post. I think Google Scholar is better in the end, but indeed only when used in a proper way, as it is vulnerable to 'gaming'. Nevertheless, I have only been publishing for the last three years and therefore my overall citation count is low. My Scopus count is considerably lower still, which is difficult for young scientists if seniors pay too much attention to the Scopus count.
    Cheers, Jolle
