
Ralph Haygood, Ph.D.

Data scientist, software developer, & population biologist


  • Translating business and scientific questions into statistical and mathematical terms.

  • Assembling, curating, and processing large data sets.

  • Applying a wide variety of statistical and machine-learning methods.

  • Interpreting quantitative results and communicating their implications to diverse audiences.

  • Programming in many languages, including C, FORTRAN, JavaScript, Prolog, Python, R, Ruby, and SQL.

  • Managing distributed computing using tools such as Luigi and Spark.

  • Developing web applications with Ruby on Rails and other frameworks.

  • Familiarity with a broad range of topics in evolution, ecology, genetics, genomics, physics, and applied mathematics.


  • As the Data Science Developer at ReverbNation, I collaborate with executives, product managers, and marketers to understand and predict our users' responses to our communications and services. Typically, my work begins with my colleagues' questions and intuitions, which I translate into descriptive statistics, graphs, hypotheses, and statistical models. For each analysis, I prepare a suitable data set, often from multiple data sources. Simple, well-chosen descriptive statistics and graphs can be highly instructive, but when appropriate, I also use more elaborate statistical and machine-learning methods. In every case, I strive to present the results in lucid, practical terms. My primary computational tools are SQL and Python, including Jupyter, NumPy, pandas, scikit-learn, Luigi, and PySpark.

  • As a freelance software developer with my own company Haygoodness L.L.C., I developed a laboratory information management system (LIMS) for the Duke University Genome Sequencing Shared Resource, which serves customers both at Duke and around the world. This system, known as DUGSIM, enables customers to get self-serve estimates, request quotes from staff, place orders, download results, and receive invoices for DNA sequencing on several platforms (Illumina, Pacific Biosciences, etc.). It also enables staff members to prepare quotes, process orders, schedule and track library preparations and sequencing runs, distribute results, and issue invoices. It has been in use since mid-2013 and has handled over 4000 orders as of early 2017. DUGSIM is built with Ruby on Rails, Sencha Ext JS, MySQL, Apache, and Phusion Passenger.

  • As a Postdoctoral Fellow in the Duke University Biology Department, I conducted research in evolutionary genetics and genomics. For example, colleagues and I fitted (MLE, MCMC) statistical models to DNA sequences from non-protein-coding, putatively gene-regulatory regions of the human, chimpanzee, and macaque genomes. We found evidence for many adaptive changes in the human lineage, particularly in noncoding regions adjacent to coding regions for proteins involved in neural development and function (Haygood et al., 2007). Subsequently, we performed a meta-analysis of surveys for adaptive changes in the human lineage, and we found that neural-related genes were prominent in surveys of noncoding regions but not in surveys of coding regions (Haygood et al., 2010). These findings affirm a long-standing conjecture that human cognition evolved mainly through changes in gene regulation. My primary computational tools were Ruby, R, and C.

  • As a Quantitative Analyst at Hydrologic Consultants, Inc. (of Sacramento, CA, later acquired by Bookman-Edmonston Engineering, Inc., later acquired by GEI Consultants, Inc.) and Timothy J. Durbin, Inc., I analyzed hydrologic data and situations for several clients. For example, I applied statistical methods (ANCOVA, MAP estimation) to streamflow measurements in order to reveal trends in water use within the North Platte River watershed, despite climatic fluctuations. Other projects were less statistical and more mathematical. For example, I extended and applied proprietary numerical software (PDE solution via FEM) for modeling groundwater flow and solute transport in order to elucidate salt-water intrusion into an aquifer beneath Lompoc, CA. These analyses were implemented using FORTRAN, Excel, and Access.


Data Science Developer, ReverbNation, 2014–present.
Applied statistics and machine learning in support of online services used by over four million musicians.

Founder, Haygoodness L.L.C., 2012–present.
Freelance development of web applications.

Founder, CardVine, 2009–2011.
Development, promotion, and operation of a web application replacing business cards.

Postdoctoral Fellow, Biology Department, Duke University, 2005–2009.
Research in evolution, ecology, genetics, and genomics.
National Science Foundation Postdoctoral Fellowship in Biological Informatics, 2005–2006.

Postdoctoral Fellow, Department of Zoology, University of Wisconsin–Madison, 2002–2004.
Research in evolution, ecology, and genetics.

Graduate Student, Section of Evolution and Ecology, University of California, Davis, 1997–2002.
Coursework, research, and teaching in evolution, ecology, and genetics.
Merton Love Award for best dissertation on ecology, ethology, or evolution at UC Davis in 2002.

Quantitative Analyst, Hydrologic Consultants, Inc. / Timothy J. Durbin, Inc., 1996–2000.
Statistical and numerical analyses of surface-water and groundwater flows.
(This position was part-time, supplementing my graduate-student stipend.)

Graduate Student, Department of Mathematics, University of California, Davis, 1994–1997.
Coursework and teaching in mathematics.
(I fulfilled all the requirements for a Ph.D. in mathematics except the dissertation before transferring into population biology.)

Guest Researcher, Swedish Institute of Computer Science, 1992–1994.
Research and development in compilation techniques for logic programming languages.

Consulting Programmer, Department of Electrical Engineering, University of Southern California, 1991–1992.
Research and development in compilation techniques for logic programming languages.

Programmer/Analyst II, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, 1988–1991.
Research and development in compilation techniques for logic programming languages.

Graduate Student, Department of Physics, University of California, Santa Barbara, 1986–1988.
Coursework and teaching in physics.


C. C. Babbitt, R. Haygood, W. J. Nielsen, L. W. Pfefferle, J. Horvath, O. Fedrigo, and G. A. Wray. The impact of positive selection on gene expression during human evolution. In review.

A. R. Ives, C. Paull, A. Hulthen, S. Downes, D. A. Andow, R. Haygood, M. P. Zalucki, and N. A. Schellhorn, 2017. Spatio-temporal variation in landscape composition may speed resistance evolution of pests to Bt crops. PLOS ONE 12:e0169167.

D. A. Garfield, D. E. Runcie, C. C. Babbitt, R. Haygood, W. J. Nielsen, and G. A. Wray, 2013. The impact of gene expression variation on the robustness and evolvability of a developmental gene regulatory network. PLOS Biology 11:e1001696.

D. Garfield, R. Haygood, W. J. Nielsen, and G. A. Wray, 2012. Population genetics of cis-regulatory sequences that operate during embryonic development in the sea urchin Strongylocentrotus purpuratus. Evolution and Development 14:152–167.

O. Fedrigo, A. D. Pfefferle, C. C. Babbitt, R. Haygood, C. E. Wall, and G. A. Wray, 2011. A potential role for glucose transporters in the evolution of human brain size. Brain, Behavior and Evolution 78:315–326.

T. A. Oliver, D. A. Garfield, M. K. Manier, R. Haygood, G. A. Wray, and S. R. Palumbi, 2010. Whole-genome positive selection and habitat-driven evolution in a shallow and a deep-sea urchin. Genome Biology and Evolution 2:800–814.

R. Haygood, C. C. Babbitt, O. Fedrigo, and G. A. Wray, 2010. Contrasts between adaptive coding and noncoding changes during human evolution. Proceedings of the National Academy of Sciences of the United States of America 107:7853–7857.

C. C. Babbitt, J. S. Silverman, R. Haygood, J. M. Reininga, M. V. Rockman, and G. A. Wray, 2010. Multiple functional variants in cis modulate PDYN expression. Molecular Biology and Evolution 27:465–479.

L. R. Warner, C. C. Babbitt, A. E. Primus, T. F. Severson, R. Haygood, and G. A. Wray, 2009. Functional consequences of genetic variation in primates on tyrosine hydroxylase (TH) expression in vitro. Brain Research 1288:1–8.

J. Tung, O. Fedrigo, R. Haygood, S. Mukherjee, and G. A. Wray, 2009. Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation. Molecular Biology and Evolution 26:2047–2059.

R. Haygood and M. Turelli, 2009. Evolution of incompatibility-inducing microbes in subdivided host populations. Evolution 63:432–447.

J. L. Walters, E. M. Binkley, R. Haygood, and L. A. Romano, 2008. Evolutionary analysis of the cis-regulatory region of the spicule matrix gene SM50 in strongylocentrotid sea urchins. Developmental Biology 315:567–578.

C. C. Babbitt, R. Haygood, and G. A. Wray, 2007. When two is better than one. Cell 131:225–227.

R. Haygood, O. Fedrigo, B. Hanson, K.-D. Yokoyama, and G. A. Wray, 2007. Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution. Nature Genetics 39:1140–1144.

B. W. Spitzer and R. Haygood, 2007. Migration load and the coexistence of ecologically similar sexuals and asexuals. American Naturalist 170:567–572.

Sea Urchin Genome Sequencing Consortium, 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314:941–952.

R. Haygood, 2006. Mutation rate and the cost of complexity. Molecular Biology and Evolution 23:957–963.

R. Haygood, 2004. Sexual conflict and protein polymorphism. Evolution 58:1414–1423.

R. Haygood, A. R. Ives, and D. A. Andow, 2004. Population genetics of transgene containment. Ecology Letters 7:213–220.

R. Haygood, A. R. Ives, and D. A. Andow, 2003. Consequences of recurrent gene flow from crops to wild relatives. Proceedings of the Royal Society of London Series B, Biological Sciences 270:1879–1886.

R. Haygood, 2002. Coexistence in MacArthur-style consumer–resource models. Theoretical Population Biology 61:215–223.

R. Haygood, 1994. Native code compilation in SICStus Prolog. P. Van Hentenryck (editor), Proceedings of the Eleventh International Conference on Logic Programming, MIT Press, pp. 190–204.

B. K. Holmer, B. Sano, M. Carlton, P. Van Roy, R. Haygood, W. R. Bush, A. M. Despain, J. M. Pendleton, and T. P. Dobry, 1990. Fast Prolog with an extended general purpose architecture. Proceedings of the 17th International Symposium on Computer Architecture, IEEE Computer Society Press, pp. 282–291.

Selected talks

Selected software

(I've recently begun compiling various software I've created that may be useful to other people and that I'm legally entitled to distribute. The following is a sample.)

sklearn-gbmi, which provides a Python module for computing Friedman and Popescu's H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models.

Haygood et al., 2007 HyPhy-ware, which includes the HyPhy Batch Language files used to compute the results in Haygood et al., 2007 and an example of their use.


Q & A

Q1: Why did you leave academia?

A1: I didn’t have to. My position at Duke was “soft money” but in no immediate danger. I’d applied for several faculty jobs, done one interview, and scheduled another. Deciding to leave wasn’t easy, but after considering it for quite a while, I concluded that although I’d been a mostly happy and fairly productive student and postdoc, I’d almost surely be neither happy nor productive, in any sense that matters to me, as a faculty member.

The crux of the matter is that faculty members at major universities are now employed not so much to do research as to manage it and, above all, to get money for it. As Paul Graham observed, "Professors nowadays seem to have become professional fundraisers who do a little research on the side." Ultimately, there are several reasons why, including a decline in federal research funding precipitated by the end of the Cold War, so-called tax revolts that have left state universities cash-strapped, and other trends in American society and government. Proximately, the driving force is that in many fields, the available dollars have been dwindling for years, at least per researcher, if not for the field as a whole. As the pie has gotten ever smaller, professional survival has demanded ever more strenuous efforts to get a piece of it. I foresaw a future in which however much I struggled to concentrate on science, my thoughts would be dominated by money and its concomitants, politics and bureaucracy.

And I foresaw that the resulting science — steered by me but largely done by my students and postdocs — would probably be, like most academic science, of little consequence. Thomas Merton remarked, “There is always a temptation to diddle around in the contemplative life, making itsy-bitsy statues.” It isn’t only in the contemplative life. Most academic research is of marginal interest even when it’s first published, let alone 10 or 20 years later. Many academic publications aren’t cited even a dozen times. Genuinely innovative thinking is never easy, but certain characteristics of academia make it harder. Money, politics, and bureaucracy are severely distracting. Moreover, as Stuart Rojstaczer observed, “With so little money available, funding agencies have become very cautious in the type of work they are supporting. They want ‘proven results’ [and] a ‘high probability of success’ for their money.” So they fund proposals that go just a little bit beyond what’s already been done.

I don’t consider myself to have abandoned science by leaving academia. Indeed, I’ve continued to collaborate and contribute. At present, I’m mainly occupied with commercial work, but I’m determined to return to basic research in due course. People who doubt the feasibility of high-quality basic research outside academia should recall that Charles Darwin was never a faculty member, Albert Einstein did much of his best work while employed by the Swiss patent office, etc. Of course, I’m not claiming to be the next Darwin or Einstein. I may never do any science of much interest outside academia. However, I think I stand a better chance outside than I would inside. That may well not be true of other people — some people are better at fighting off distractions, and some kinds of science need more institutional support — but I’m pretty sure it’s true of me.

Q2: Given your background, why haven’t you started a biotech company?

A2: I’ve thought about it but decided against it, at least for now. Biotech companies tend to need several years and several million dollars to develop a product. I’m not terrifically patient, and having to sell an idea to venture capitalists before even starting to realize it sounds an awful lot like the grant grind I left academia to get away from (see Q1).

Q3: Were you involved in that boating disaster in the Gulf of California?

A3: Yes. It was an ecological research expedition in March 2000. I was in a small boat that capsized in a wind-driven swell. Of my eight companions, five died, including the leader of the expedition. I could easily have died too, but with help from another survivor, I got to shore.