
As it turns out, my genetics class was just starting to discuss genomes and genomics when this fact-checking project began, and the first fact presented in "A User's Guide: Your Genes - 100 Things You Never Knew" (a Time Inc. Specials publication, by National Geographic) happened to be relevant to that topic:
1. We humans share 99% of our genes with chimpanzees and bonobos
Initial Reaction
I knew that this putative Fact would be tricky to ask students to work on, because it combines an easily vetted numerical value with two subtle, perhaps trivial, issues of definition:
- what does "share" mean?
- what definition of "gene" should we use?
Also, I anticipated that this topic could be rejected out of hand (without a fair trial based on investigation and critical thinking) by creationists or others who don't want to consider whether we humans are most closely related to non-human primate species like chimpanzees.
Initial Student Responses
Of all of the ~60 written summaries by students:
A few cited a review paper by Khodosevich, Lebedev and Sverdlov, "Endogenous retroviruses and convergent evolution" (2002) Comp Funct Genom, which states in the second sentence of the introduction, "The average DNA sequence difference between human and chimpanzee is only 1.24% [7] and probably only 0.5% in active coding regions [9]," but some students didn't even make it that far, stopping at the first sentence of the abstract, "Humans share about 99% of their genomic DNA with chimpanzees and bonobos."
Another review, by Varki and Altheide, "Comparing the human and chimpanzee genomes: Searching for needles in a haystack" (2005)
Genome Research, was often-cited to refute the Fact, perhaps because its abstract states, "The difference between the two genomes is actually not ∼1%, but ∼4%—comprising ∼35 million single nucleotide differences and ∼90 Mb of insertions and deletions."
Britten (2002) PNAS says it all in the title, "Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels."
In Ebersberger et al. (2002) "Genomewide Comparison of DNA Sequences between Humans and Chimpanzees" Am J Hum Genet., the authors studied about 1.9 million nucleotides and found an average sequence difference of 1.24%.
Most cited Prüfer et al. "The bonobo genome compared with the chimpanzee and human genomes" (2012) Nature, in which the authors state that, in the single-copy autosomal regions they analyzed, the average percent identity between human and bonobo genomes is 98.7%, with bonobo-chimpanzee percent identity at 99.6%.
My Feedback to the Students

If you'd like to hear our in-class discussion of this process (and see the slides I used), please visit this link to YouTube.
Much student discussion focused on whether it was reasonable to round 98.7% (e.g. from Prüfer et al.) up to 99% (as stated in the Fact). This may seem like a relatively trivial point, but deciding whether such rounding matters when judging a statement to be fact is not easy!

I raised one concern with using the Khodosevich paper as evidence supporting the Fact. This is a review paper, meaning that it didn't technically adhere to my requirement of a primary research article (such as Prüfer et al.) that would be the original source of data that could be used to support or refute a conclusion. If students had found, read, and directly cited reference [7] from the above quote from Khodosevich, then that would be more appropriate. This assumes, of course, that reference 7 really does provide evidence that supports the Fact. The practice of citing literature based only on its title, or perhaps after only reading the abstract, does occur in science. Hence, some practical skepticism is warranted! Don't blindly believe that a citation supports an author's position until you read the citation.
I also offered students two technical, genetics-related viewpoints.
First, the Fact, as stated, discusses the percentage of genes shared between the species. However, most of the students cited research that tallied not genes shared in common but nucleotides (the letters comprising a chromosome's sequence) shared in common. It is not an obvious logical step to conclude that 98.7% identity at the DNA sequence level means that 98.7% (or 99%, with rounding) of genes are shared between humans and chimpanzees.
Second, many of the genomics studies that students cited analyzed different regions of chromosomes. Some looked only at single-copy regions of chromosomes. For these DNA sequences, the corresponding (homologous) chimpanzee version is easy to locate, so the two sequences can be directly compared to find percent identity.
ATACATAG (Human)
ATAGATAG (Chimp)
Of these eight nucleotides (which I invented for the purpose of this example), only one is different between humans and chimps, so 7/8 are identical (87.5% identity). This was the type of approach used by Ebersberger et al. and by Prüfer et al.
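The percent identity arithmetic for the invented eight-nucleotide example can be sketched in a few lines of Python. This is only an illustration of the position-by-position comparison, not the pipeline used in any of the cited studies; the function name `percent_identity` is made up for this example, and it assumes the two sequences are already aligned, ungapped, and the same length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two equal-length, ungapped sequences match.

    Hypothetical helper for illustration only; real genome comparisons
    must first find and align the homologous regions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the two letters are the same.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


human = "ATACATAG"  # invented example sequence from the text
chimp = "ATAGATAG"  # invented example sequence from the text
print(percent_identity(human, chimp))  # 87.5
```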
Other chromosomal differences exist, like insertions and deletions (indels), and duplications of large (or small) stretches of DNA. Scientists can interpret these sorts of differences in various ways.
ATAC-----ATAG (Human)
ATAGGGGGGATAG (Chimp)
Here, in the middle of the same eight nucleotides as the first example, there is a five-nucleotide insertion of G in chimpanzee (or, just as likely, a five-nucleotide deletion of G in human). If scientists align the two sequences like this, then they might analyze only the alignable sequences, which would result in the same calculation as above: 7 of 8 alignable nucleotides are identical, so 87.5% identity. However, if the structural variants (like indels) are included in the calculation, then only seven of the thirteen total positions match (53.8% identity). This latter approach was employed by Varki and Altheide and also by Britten. Notably, Britten found 1.4% divergence in alignable nucleotides, with an additional 3.4% divergence based on indels. The sum is thus 4.8%, which was rounded to 5% in the manuscript title (probably not deceptively, but also not accurately!).
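The two ways of counting can be compared side by side with a small Python sketch. It is a simplified illustration under stated assumptions: the sequences are the invented ones above, gaps are written as "-", every gapped column is counted as a mismatch in the second calculation, and the function name `identity_both_ways` is hypothetical. The cited studies used their own, more involved methods.

```python
GAP = "-"


def identity_both_ways(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (identity over alignable columns, identity over all columns).

    Hypothetical helper for illustration; assumes the inputs are already
    aligned to equal length, with '-' marking indel positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")

    matches = 0
    alignable = 0  # columns where neither sequence has a gap
    for a, b in zip(aligned_a, aligned_b):
        if a == GAP or b == GAP:
            continue  # indel column: excluded from the alignable count
        alignable += 1
        if a == b:
            matches += 1

    if alignable == 0:
        raise ValueError("no alignable columns to compare")

    total = len(aligned_a)  # every column, including indel columns
    return 100.0 * matches / alignable, 100.0 * matches / total


human = "ATAC-----ATAG"  # invented example from the text
chimp = "ATAGGGGGGATAG"  # invented example from the text
alignable_only, with_indels = identity_both_ways(human, chimp)
print(round(alignable_only, 1), round(with_indels, 1))  # 87.5 53.8
```

The same seven matching nucleotides give very different percentages depending only on the denominator, which is the heart of the disagreement between the studies.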

Thus, different studies can reach different conclusions because of the analysis methods used. And, of course, those subtle differences don't make it into the paper titles (and sometimes not into the abstracts) - and they definitely don't trickle down into popular science and media coverage of these types of studies!
Comparing research studies that arrive at apparently different conclusions
I made two final points for the students. I suggested that they might consider how much DNA sequence was analyzed in each study. This could be used to decide the relative importance of each study when arriving at a conclusion about which of the various calculations of human-chimp percent divergence (here we've seen citations reporting 0.5, 1, 1.24, 4, and 5%) is perhaps most accurate. Britten's analysis of 779,000 nucleotides resulted in a conclusion of 5% divergence, while Ebersberger et al. analyzed 1,900,000 nucleotides and found 1.24% divergence.
I also suggested that they might consider the ages of the studies. It might be important, if you really want to be sure of your facts, that you not cite an old study that might have been conducted with perhaps less precise methods than we have at present, or that was since corrected in more recent studies. This is not to say that old studies using old approaches or tools are necessarily flawed and inherently worse than more current research, but some people (I understand) still think the earth is flat, citing really old literature, despite a plethora of more recent work that has pretty much eroded confidence in the flat-earth stance. When performing a literature review, it is best practice to read studies spanning the time when a particular topic has been researched. I know that my audience here doesn't necessarily have that sort of time, which is why I hope that this information literacy project will fill that need!
Student Decision: Fact or Fiction?

Ultimately, 30 of 51 students (58.8%) agreed that it is a fact that "We humans share 99% of our genes with chimpanzees and bonobos."
From my perspective, it seems like the top two aspects that caused less-than-overwhelming support for this Fact, as stated, were that
- the specific value of 99% did not appear in research literature
- the studies students found did not assess shared genes (as in the Fact) but instead shared DNA sequence
In other words, I think that this Fact didn't garner more support because of rounding and because the Fact, as written, misconstrues the actual basis of the research supporting it.
What resources did students use to find supporting research literature?
The week before this information literacy project began, I had started showing students in class how to use the NCBI PubMed literature database to find research publications. So, I also asked students to report how they identified the research literature they cited for this first fact-checking assignment.
29% PubMed
17% Google Scholar
29% Other form of search via web browser
2% EasyBib
2% Public Library of Science website
3% EBSCO
and, to my delight and surprise:
19% used some form of resource at the Henry Madden Library on our campus. I only teach upper-division and graduate classes, so I'm not familiar with how much exposure students get to using our library resources. I'm glad they're taking advantage of what our library has to offer!
Literature Cited