Sticky Statistics: Getting Started with Stats in the Lab

Courtesy: xkcd

A strong grasp of statistics is essential for any analytical laboratory worker. I think it is immensely important to understand the limitations of the process by which data are measured, and the precision and accuracy of the instruments used to measure them. Beyond these analytical constraints, the samples from which data are measured aren't perfect indicators of the true population (the true values), so sampling uncertainty (e.g. sampling bias) must be dealt with carefully as well.

In most cases, analytical (or measurement) uncertainty and sampling uncertainty are equally influential on the outcome of a hypothesis test. In some cases analytical uncertainty is the more pivotal of the two; in others, sampling uncertainty proves more influential. Regardless, both must be accounted for when testing (and conceiving) a hypothesis.

Consider a paleoclimate example where we measure stable oxygen isotopes in planktic foraminiferal shells with a mass spectrometer whose precision is 0.08‰ (that's 0.08 parts per 1000), based on known standards. With foraminifera, we take a certain number of shells (say, n) from a discrete depth in a marine sediment core and obtain a single δ18O number for that particular depth interval. This depth interval represents Y years, where Y can represent decades to millennia depending on the sedimentation rate at the site where the core was collected. The lifespan of foraminifera is about a month (Spero, 1998). Therefore the measurement represents the mean of n months in Y years. It does not give you the mean of the continuous δ18O during that time interval (true value). Naturally, as n increases and/or Y decreases, the sampling uncertainty decreases. There may be several additional sampling complications such as the productivity and habitat of the analyzed species' shells that may bias the data to say, summer months (as opposed to a mean annual measurement), or deeper water δ18O (as opposed to sea-surface water) etc. Hence, both foraminiferal sampling uncertainty (first introduced by Schiffelbein and Hills, 1984) along with the analytical uncertainty must be considered while testing a hypothesis (e.g. "mean annual δ18O signal remains constant from age A to age D" - the signal-to-noise ratio invoked by your hypothesis will determine which uncertainty plays a bigger role).
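As a toy illustration of this point (all numbers below are hypothetical, not from any real core), one can simulate how the spread of possible pooled δ18O values shrinks as more shells are combined, while the analytical floor of the mass spectrometer remains:

```python
import random
import statistics

random.seed(42)

def simulated_d18O(n_shells, analytical_sd=0.08, seasonal_sd=0.5, true_mean=1.0):
    """Simulate one pooled delta-18O measurement from n_shells foraminifera.

    Each shell records roughly one month of the seasonal cycle, modeled
    here as Gaussian scatter (seasonal_sd, in permil) around the true
    mean; the pooled measurement also carries the mass spectrometer's
    analytical precision (analytical_sd = 0.08 permil, as in the text).
    The seasonal_sd and true_mean values are made up for illustration.
    """
    shells = [random.gauss(true_mean, seasonal_sd) for _ in range(n_shells)]
    pooled = statistics.mean(shells)  # shells are homogenized before analysis
    return pooled + random.gauss(0, analytical_sd)  # add measurement noise

# Repeat the "experiment" many times to see the spread of possible results
for n in (1, 5, 30):
    runs = [simulated_d18O(n) for _ in range(10_000)]
    print(f"n = {n:2d} shells: sd of pooled measurement = "
          f"{statistics.stdev(runs):.3f} permil")
```

With one shell, the measurement is dominated by seasonal scatter; by 30 shells, the spread approaches the analytical precision, which no amount of extra picking can beat.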

Here are two recent papers that are great starting points for working with experimental statistics in the laboratory (shoot me an email if you want pdf copies):

  1. Know when your numbers are significant - David Vaux

  2. Importance of being uncertain - Martin Krzywinski and Naomi Altman

Both first authors have backgrounds in biology, a field in which, I am led to believe, heinous statistical crimes are committed on a weekly (journal) basis. Nonetheless, statistical crimes occur in paleoclimatology and the geosciences too (and in a myriad of other fields, I'm sure). The first paper urges experimentalists to use error bars on independent data only:

Simply put, statistics and error bars should be used only for independent data, and not for identical replicates within a single experiment.

What does this mean? Arvind Singh, a friend and co-author at GEOMAR (whom I have to thank for bringing these papers to my attention), and I had an interesting discussion that I think highlights what Vaux is talking about:

Arvind: On the basis of Vaux's article, error bars should be the standard deviation of 'independent' replicates. However, it is difficult (and almost impossible) to do this for my work, e.g., I take 3 replicates from the same Niskin bottle for measuring chlorophyll, but then they would be dependent replicates, so I cannot have error bars based on those samples. And as per Vaux's statistics, it appears to me that I should've taken replicates from different depths or from different locations, but then those error bars would be based on the variation in chlorophyll due to light, nutrients, etc., which is not what I want. So tell me how I would take true replicates of independent samples in such a situation. I've discussed this with a few colleagues of mine who do similar experiments and they also have no clue on this.

Me: I think when Vaux says "Simply put, statistics and error bars should be used only for independent data, and not for identical replicates within a single experiment." - he is largely talking about the experimental, hypothesis-driven, laboratory-based bio. community, where errors such as analytical error may or may not be significant in altering the outcome of the result. In the geo/geobio community at least, we have to quantify how well we think we can measure parameters, especially field-based measurements, whose uncertainty easily has the potential to alter the outcome of an experiment. In your case, first, what is the hypothesis you are trying to put forth with the chlorophyll and water samples? Are you simply trying to see how well you can measure it at a certain depth/location such that an error bar may be obtained, which will subsequently be used to test certain hypotheses? If so, I think you are OK in measuring the replicates and obtaining a std. dev. However, even here, what Vaux says applies to your case, because a 'truly independent' measurement would be a chlorophyll measurement on a water sample from another Niskin bottle from the same depth and location. This way, you are removing codependent measurement error/bias which could potentially arise due to sampling from the same bottle. So, in my opinion, putting an error bar to constrain the chlorophyll mean from a particular depth/location can be done using x measurements of water samples from n Niskin bottles, where x can even be 1.

While Vaux's article focuses on analytical uncertainty, the second paper details the importance of sampling uncertainty and the central limit theorem. The Krzywinski and Altman article introduced me to the Monty Hall game show problem, which highlights that statistics can be deceptive at first glance!

Always keep in mind that your measurements are estimates, which you should not endow with “an aura of exactitude and finality”. The omnipresence of variability will ensure that each sample will be different.
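For the skeptical, the Monty Hall result is easy to verify by brute force. A quick sketch (my own, not from either paper) simulating many rounds of the game:

```python
import random

random.seed(0)

def monty_hall(switch, trials=100_000):
    """Return the empirical win rate over many Monty Hall games."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)   # the prize is behind one door
        pick = random.choice(doors)  # the contestant's initial choice
        # The host opens a door that is neither the pick nor the car
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")  # ≈ 1/3
print(f"switch: {monty_hall(switch=True):.3f}")   # ≈ 2/3
```

Switching wins about two-thirds of the time - exactly the counterintuitive answer the problem is famous for.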

In closing, another paper that I would highly recommend for beginners is David Streiner's 1996 paper, Maintaining Standards: Differences between the Standard Deviation and Standard Error, and When to Use Each, which has certainly proven handy many times for me!
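To make Streiner's distinction concrete, here is a minimal sketch (with made-up replicate values): the standard deviation describes the spread of individual measurements, while the standard error (SD/√n) describes how well the mean itself is constrained.

```python
import math
import statistics

# Hypothetical replicate chlorophyll measurements (arbitrary units)
replicates = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8]

n = len(replicates)
mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)  # spread of individual measurements
se = sd / math.sqrt(n)             # uncertainty of the mean itself

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

Use the SD when describing variability among samples; use the SE as the error bar when reporting how precisely the mean is known. Note that the SE shrinks as n grows, while the SD does not.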

A Liquid Nitrogen Incident

A couple of Tuesdays ago, Marissa Vara, an undergraduate researcher in our lab, and I were expecting our weekly liquid nitrogen delivery from Air Products. Before the delivery arrived, I remembered an errand I had to run in the other building. Before leaving, I told Marissa to text me when our delivery person showed up. Unfortunately, Marissa did not know that I had changed my phone number and I hadn't bothered to tell her either. So naturally she texted my old phone number, which, as we later figured out, now belongs to someone else. Comedy ensued (Marissa in blue; owner of my old ph. no. in silver):

Of course, with my penchant for eccentricity, Marissa thought this was weird but not outside the 2σ realm of possibilities. So she didn't mention the incident until yesterday when I told her that my phone number had changed. The eureka moment was, "Wait, then who was I texting about liquid nitrogen?!"

One of the more bizarre papers I have come across...

...is written by Oleg McNoleg, published in the peer-reviewed journal, Computers & Geosciences. Oleg is affiliated with the prestigious Brigadoon University of Longitudinal Learning, School of Holistic Information Technology, situated in Noplace, Neverland. The title of the paper: The Integration Of GIS, Remote Sensing, Expert Systems And Adaptive Co-Kriging for Environmental Habitat Modeling of the Highland Haggis using Object-Oriented, Fuzzy-Logic and Neural-Network Techniques (phew).

So, what does Oleg McNoleg have to say about the habitat of the Highland Haggis? But firstly, what is a Haggis? A Haggis is a mythological Scottish creature that vaguely reminds me of the misconceptions associated with lemmings. McNoleg writes:

The Highland Haggis is unique amongst all mammals in that it has a pair of legs (either left or right) that are shorter (longer) than the other pair... It is a sad consequence that each year, many fledgling Haggis die whilst attempting to move upslope...

McNoleg then dives into the theoretical aspects of incorporating various geographical techniques to model the habitat of the Highland Haggis. This, of course, includes the insertion of data from a digital elevation model (DEM) that is hierarchically decomposed (?) into a Polymorphic Euclidean Adaptive Region tree (PEARtree - see figure) - of course. Then, McNoleg provides a mathematical framework for modeling Haggis habitats using geophysical data because "It has become customary for papers to contain copious quantities of gratuitous mathematics (Heckbert, 1987; Well and Du, 1993; Rull, 1993)", where Heckbert (1987) is titled 'Ray tracing in jello brand gelatin', Rull (1993) is titled 'BARRY: An autonomous train-spotter', and there is no reference for Well and Du (1993).

After the theory has been established, what are the results of this study?

Honestly, this may be the most glorious Academia Bizarro entry thus far. Hats off to Oleg McNoleg for this wonderfully entertaining paper, chock-full of ridiculous and bizarre references/ideas. You must read it in its entirety to fully grasp the depth of this article. Also, hats off to the editor(s?) of Computers & Geosciences for okaying publication (full disclosure - I was rejected from this journal!). Funnily enough, the Haggis paper has been cited 29 times, and I have a hunch as to whom the nom de plume, Oleg McNoleg, belongs. Hat tip to Lars Beierlein for bringing this article to my attention!