
Should we even sequence genomes for disease diagnostics?

It's always important to come back to fundamentals.  Before we decide on the ethics of "incidental findings" (see previous post), before we opine on the value of genomic data, before we even pick up the pipette, we must first establish whether we should sequence genomes at all in the context of disease diagnostics.  And of course, the answer is no.
Why? Well... the genome doesn't really DO anything.  It's the blueprint, the instruction manual, the cake recipe.  If there's something wrong with your cake, you can look at your recipe to see if it says "rat poison" instead of "butter", but you're better off looking at the actual cake and checking for arsenic.  If your oven's broken, if your eggs are rotten, if your flour has beetles in it, you won't see that in the recipe, but your cake will still be terrible.  Likewise, in humans, it's the proteins (mostly) that DO things, that ARE who we are.  Thus, if something is wrong, it's the proteins that are affected, and we should look at the proteins directly to see what's wrong with them (The proof of the genome is in
This, sadly, is harder than it sounds.  There are a number of ways to go about it: RPPA, mass spectrometry, ELISA, and various optical methods as well, but these methods are either very specific to the protein you're evaluating, expensive, difficult, or not especially thorough at showing us whether the protein is functioning normally.  This is one of the great advantages of sequencing.  You can interrogate every gene in the exact same manner and, with some error, determine all the changes present in that gene that are not present in healthy people, all at relatively low cost.  Compare this to generating either an enzymatic assay or monoclonal antibodies for all 20,000 proteins the body makes and you'll come to the conclusion that everything should be sequenced.  There are problems, however...
Our understanding of how a gene becomes a protein is fairly complete, but our ability to understand how changes at the DNA level affect protein function or DNA binding sites is much more primitive, despite great efforts in the field.  For example, exactly how many CpG sites need to be methylated upstream of a gene to affect the expression of that gene is poorly understood for even the best-studied genes.  It's much easier to just measure the amount of RNA present!  Further, RNA editing allows "mutations" to be introduced into the transcripts of genes with otherwise normal genomic DNA!
Right!  So let's sequence RNA.  We'll get all the RNA-edited sites, we'll be able to quantify expression, pick up on transcript isoform biases, etc.  Well, there are problems here, too.  RNA levels don't always correlate with protein levels as well as we'd like to think.  It's also not always clear that having, say, half the amount of a certain enzyme means you have disease (indeed, in many cases, you may show no overt phenotype even with fairly low, but non-zero, levels of enzyme activity).  BUT, that objection aside, yes, you should sequence RNA.  It's a good idea.  Except... there's always an "except": it's not always possible for us to know that we're sequencing the correct tissue to detect a change in the RNA.  Imagine a disease that causes some kind of craniofacial deformity.  Is the gene that caused this even expressed in the blood (the most common source of DNA to sequence)?  Even if we take a sample of the skin or bone from the face, we do not know if the gene that caused the disease is still expressed (perhaps it was only expressed during early development).  These uncertainties might drive you back to sequencing the genome, which, more or less, is the same in every cell.
However, on the plus side, there are a few diseases where you always know the affected tissue.  Most notably, cancer.  The tumor is the disease and is readily sequenceable (time to plug an early paper of mine, one of the first times HTS was used to sequence RNA).  The fact that RNA sequencing isn't more prevalent in cancer treatment intrigues me.  There are some possible objections to this approach (it might matter WHY/HOW a gene is repressed, not just that it IS repressed), but for trying to establish treatment regimens, you are almost certainly better off sequencing RNA than DNA in cancer, and I look forward to seeing more RNA-seq data from TCGA.  So why do we sequence genomes?  Because it's the least bad thing we can do, and until whole-cell mass spec can start making diagnoses, we're stuck with it.  It's not so bad really, but I suspect in 20 or 30 years people might think sequencing a genome is a trivial pursuit.

Notes to self:  What about the 1000-transcriptomes?  And maybe something on open data.


About the author

Matthew Bainbridge is President and CEO of Codified Genomics, an analyst, and sometimes a scientist.