Next-generation DNA sequencing is ubiquitous in modern biomedical research, while mass-spectrometry proteomics remains far less common. In fact, mass-spectrometry proteomics is conspicuously missing from projects that desperately need it.

Why is DNA sequencing better integrated with biomedical research? This question comes up often in my conversations with colleagues. A commonly suggested answer is the difference in cost. I am not convinced by this answer, so I decided to evaluate it with a bit more rigour than in my usual casual conversation. The metric of merit for the comparison will be the cost of quantifying 10,000 genes in a sample at the transcriptome and the proteome level.

This simple metric has many dimensions, such as quantification of proteoforms and transcript isoforms, that are beyond the scope of my comparison. Also, both the RNA and the protein analysis can use different analytical methods, and the cost will vary somewhat between methods. So, my estimates will be based on high-quality, economical options and on representative facility fees charged in Boston, MA.

Proteomics

A good and economical option for quantifying >10,000 proteins in a sample is TMT 16-plex with offline fractionation and DDA analysis, which needs about 1-2 hours of instrument time per sample. Another up-and-coming option is label-free DIA analysis, which also needs about 1-2 hours of instrument time per sample but at present will struggle to quantify 10,000 proteins / sample. With facility fees of about 100 – 200 USD / hour, the cost of analysis is about 200 – 400 USD / sample. This cost does not include sample preparation, which is relatively simple and can be performed in house by almost any biology lab. The reagents for sample prep cost less than 100 USD, but I will add 100 USD to the cost of instrument time (which also covers offline fractionation). So the final estimate is 300 – 500 USD / sample.
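The arithmetic behind this estimate can be sketched in a few lines of Python. This is only a rough illustration: the function name is hypothetical, and I am assuming about 2 hours of instrument time per sample (the upper end of the quoted 1-2 hours), which is the reading that reproduces the 200 – 400 USD figure for instrument time.

```python
def proteomics_cost_per_sample(hours, rate_usd_per_hour, prep_usd=100):
    """Instrument time times the hourly facility fee, plus a flat allowance
    for sample prep and offline fractionation (100 USD, as in the text)."""
    return hours * rate_usd_per_hour + prep_usd


# Assumption: 2 hours of instrument time per sample.
low = proteomics_cost_per_sample(hours=2, rate_usd_per_hour=100)
high = proteomics_cost_per_sample(hours=2, rate_usd_per_hour=200)

print(f"{low} - {high} USD / sample")  # prints "300 - 500 USD / sample"
```

Plugging in 1 hour instead of 2 lowers the estimate, so the range above is, if anything, on the conservative side.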

 

Transcriptomics

For RNA sequencing, I will base the estimate on an Illumina NextSeq 500 run with 150-cycle paired-end reads. The cost for a run is about 3000 – 4000 USD, which can provide 200 – 400 million reads. The number of samples that can be analyzed in a run depends on the number of reads per sample. For RNA data quality comparable with the quality of the mass-spec data, I will assume that we need 15 million reads per sample and that we can analyze about 20 samples per run. Fewer reads per sample can reduce the cost while still providing usable, albeit less quantitative, data. Again, sample prep can be performed in house for much less than 100 USD / sample and is more expensive if performed at a facility. So the final cost estimate is lower than but comparable to the one for mass-spec proteomics, about 250 – 350 USD / sample.
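Here is a similar rough sketch for the sequencing numbers. The function names are hypothetical and the inputs are just the figures quoted above; it covers only the sequencing run itself, so the sample prep allowance has to be added on top to arrive at the final estimate.

```python
def samples_per_run(reads_per_run, reads_per_sample):
    """How many samples fit in one run at a given read depth (whole samples only)."""
    return reads_per_run // reads_per_sample


def sequencing_cost_per_sample(run_cost_usd, n_samples):
    """Cost of the run split evenly across the multiplexed samples."""
    return run_cost_usd / n_samples


READS_PER_SAMPLE = 15_000_000  # read depth assumed in the text

# Samples per run at the low and high end of the run yield.
for reads_per_run in (200_000_000, 400_000_000):
    print(reads_per_run, samples_per_run(reads_per_run, READS_PER_SAMPLE))

# Sequencing cost per sample at about 20 samples per run.
for run_cost_usd in (3000, 4000):
    print(run_cost_usd, sequencing_cost_per_sample(run_cost_usd, n_samples=20))
```

At 15 million reads per sample, a run fits between 13 and 26 samples, so 20 samples per run is a middle-of-the-range assumption. The sequencing itself then comes to 150 – 200 USD / sample before sample prep.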

Figure: Cost of transcriptomics and proteomics analysis of a sample.

Data analysis

The above estimates consider only the cost of data generation, while the cost of analysis (human hours) is the dominant expense for many studies. The analysis cost is quite similar for DNA sequencing and mass-spectrometry data; it varies mostly with the number of samples and the aims of the analysis.

Why the difference?

The above rough estimates suggest that cost may not be the main reason why research projects that need mass-spectrometry do not use it. If not cost, then what? These are my main hypotheses.

  • Availability of service: While good mass-spec labs can quantify > 10,000 proteins as sketched above, relatively few facilities can accomplish that. Finding a DNA sequencing facility is easier than finding a mass-spec facility that can perform the analysis outlined above.
  • Knowledge: Many biologists are familiar with next-generation sequencing and its capabilities, while fewer are familiar with mass-spectrometry. This reflects both how long each technology has had to mature and sociological factors. This is why I am passionate about making mass-spec technology accessible and about teaching its basic concepts.

What do you think? Why the difference?

 
