Applying Genetic Research To My 23andMe Data
To calculate your risk of various diseases, 23andMe scours the medical research literature for studies that correlate incidence rates for those diseases with variations at specific locations in the human genome. Common single-base variations at these locations are referred to as single-nucleotide polymorphisms, or SNPs.
In this post, I show how I applied the findings of this study about caffeine-induced anxiety to discover more about myself. I have no genetic research background whatsoever, and my knowledge of genetics is minimal, so it's amazing that this kind of analysis is slowly becoming accessible to a wider audience.
The Genetic Culprit
From the study abstract:
At the 150 mg dose of caffeine, we found a significant association between caffeine-induced anxiety (Visual Analog Scales, VAS) and ADORA2A rs5751876 (1976C/T), rs2298383 (intron 1a) and rs4822492 (3′-flank), and DRD2 rs1110976 (intron 6).
The abstract names rs4822492 and a few other SNPs as being associated with caffeine-induced anxiety. Reading further, Figures 4 and 5 show which variations are correlated with higher anxiety. Here's a quick summary of the high-anxiety variations listed in those figures:
rs5751876: T/T (vs. C/C, C/T)
rs2298383: C/C (vs. T/T, C/T)
rs4822492: C/C (vs. G/G, G/C)
rs1110976: G/G (vs. G/-)
Armed with this information, I can check my own genome for variations at those SNPs.
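The summary above is small enough to capture as a lookup table. What follows is only a sketch: the names HIGH_ANXIETY_GENOTYPES, normalize and is_high_anxiety are hypothetical (they are not part of my script), and it assumes a genotype arrives as a two-letter string such as "CT" or "C/T" in which allele order doesn't matter.

```python
# High-anxiety genotypes, copied from the summary of Figures 4 and 5 above.
HIGH_ANXIETY_GENOTYPES = {
    "rs5751876": "TT",
    "rs2298383": "CC",
    "rs4822492": "CC",
    "rs1110976": "GG",
}


def normalize(genotype):
    """Strip any slash and sort the alleles so "T/C", "TC" and "CT" compare equal."""
    return "".join(sorted(genotype.replace("/", "").upper()))


def is_high_anxiety(rsid, genotype):
    """Return True or False for a SNP in the table.

    Returns None when the SNP is not in the table or there is no genotype
    to compare (for example, because the data set doesn't include that SNP).
    """
    expected = HIGH_ANXIETY_GENOTYPES.get(rsid)
    if expected is None or not genotype:
        return None
    return normalize(genotype) == normalize(expected)
```

Returning None rather than False for missing data keeps "not the high-anxiety variation" separate from "no data available", which matters when a genotyping service doesn't cover every SNP in a study.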
Checking My Genome
In the appendix of this blog post, I discuss how to retrieve your genetic data from the 23andMe API. I followed those directions with two changes:
- I used the authorization scope genomes
- I accessed the API endpoint https://api.23andme.com/1/genomes/<id>
<id> is my internal ID. This ID is returned with every response from their API.
I downloaded the genomic data as genome.json:
curl https://api.23andme.com/1/genomes/<id>/ -H "Authorization: Bearer <access_token>" > genome.json
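If you'd rather stay in Python than shell out to curl, the same request can be sketched with the standard library. The function names build_genome_request and download_genome are hypothetical, and the sketch assumes you already hold a valid access token and that the endpoint behaves as the curl command above implies.

```python
import urllib.request

# Same endpoint as the curl command above.
API_BASE = "https://api.23andme.com/1/genomes/"


def build_genome_request(profile_id, access_token):
    """Build (but don't send) a GET request equivalent to the curl command."""
    return urllib.request.Request(
        f"{API_BASE}{profile_id}/",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def download_genome(profile_id, access_token, path="genome.json"):
    """Send the request and save the raw response body to `path`."""
    request = build_genome_request(profile_id, access_token)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Splitting request construction from the network call makes it possible to inspect the URL and headers without touching the API.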
To help extract the specific SNPs listed above, I wrote a quick Python script.
Running it, I get my results:
$ python extractSNP.py rs5751876 rs2298383 rs4822492 rs1110976 < genome.json
23andMe doesn't provide data on the
rs1110976 SNPs, but I can see that I have the high-anxiety variations at
The script uses 23andMe's SNP list, which identifies the locations of all SNPs that they look for. The first time you run extractSNP.py, it will generate an index of all 23andMe SNPs in .snps-index to make subsequent runs faster. Once you have your genome.json data and have built the .snps-index index, you can look up any SNP or group of SNPs in about a second.
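To give a feel for the build-once, look-up-many idea, here is a rough sketch of an index with an on-disk cache. It is not the actual extractSNP.py, and it rests on several assumptions: that the SNP list is tab-separated text with rows of index, rsid, chromosome and position; that the genome data is a single string holding two characters per SNP in that same order; and that the cache is stored as JSON. The function names are hypothetical.

```python
import json
import os


def build_index(snp_list_lines):
    """Map each rsid to its row index in the SNP list.

    ASSUMPTION: tab-separated rows of `index, rsid, chromosome, position`.
    Comment lines, blank lines and any header row (whose first field is not
    a number) are skipped.
    """
    index = {}
    for line in snp_list_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 2 or not fields[0].isdigit():
            continue
        index[fields[1]] = int(fields[0])
    return index


def load_index(snp_list_path, cache_path=".snps-index"):
    """Load the cached index if it exists; otherwise build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path) as cache:
            return json.load(cache)
    with open(snp_list_path) as snp_list:
        index = build_index(snp_list)
    with open(cache_path, "w") as cache:
        json.dump(index, cache)  # ASSUMPTION: JSON is used as the cache format
    return index


def lookup(genome, index, rsid):
    """Return the two-character call for `rsid`, or None if it isn't indexed.

    ASSUMPTION: `genome` holds two characters per SNP, in index order.
    """
    position = index.get(rsid)
    if position is None:
        return None
    return genome[2 * position : 2 * position + 2]
```

Parsing the full SNP list is the slow part, so caching the rsid-to-index mapping means later runs only need one dictionary lookup and one string slice per SNP.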
To apply genetic research findings to your genomic data:
- Use the process described in this blog post with scope genomes and API endpoint https://api.23andme.com/1/genomes/<id> to download your genetic data. Save it as genome.json.
- Download my script into the same folder as genome.json.
- Find a paper, journal article, etc. that mentions specific SNPs (those funny rsXXX identifiers).
- Run python extractSNP.py rsXXX < genome.json to search for those SNPs in your genome.
This is still a highly manual process, and you'll need some familiarity with the command line to pull it off. We haven't quite reached the promised near-future where the deepest insights about our personal genetics are easily available.
23andMe's data is incomplete: they only had two of the four listed SNPs. That said, it's far better than nothing!
Also, there's room for skepticism regarding the underlying research. Only 102 participants took part, all of European-American descent. The subjects all had stated caffeine intake levels of less than 300 mg per day, or about three cups' worth; that's a pretty wide range of potential tolerances. Subjects rated their anxiety on a subjective scale. The researchers also took measurements of heart rate and blood pressure, but did not incorporate that data into the anxiety ratings.
No single study is perfect. For more reliable results, you'll want to dig up findings that are supported by multiple studies.