Book Review: Indica by Pranay Lal


Balancing the nuanced and involved intricacies of the scientific method against proselytizing the fantastic “factoids” of popular science is a tough act. Straddling this line while focusing on the geology and geobiological history of the Indian subcontinent, an ambitiously multidisciplinary topic on which there are scant accessible texts (popular science or not), is tougher still. Fortunately, Pranay Lal manages to achieve such a balance and conveys his infectious enthusiasm about the subject matter rather effectively for most of Indica’s ~400 pages.

It was refreshing and enjoyable to learn new geological and paleontological information about the Indian subcontinent, a topic dear to my heart. The detailed place-markers and the McPhee-esque narratives of sites where geological features are found scattered throughout India were highly interesting. The accompanying photographs and schematics are also very nicely done. You can quickly see that Lal put hours and hours of (non-book-based) research into Indica, and it shows. It felt as if Indica was an attempt to channel Sagan or Bryson or Winchester but with a focus on the history of the Indian subcontinent, which is a fantastic idea. However, it becomes apparent through Lal’s reporting that it is challenging to piece together and chronicle information on such a vastly “big-picture” topic, especially when construction, urban expansion, and apathy are on their way to eroding many of India’s geological marvels.

Lal is a geneticist by training and his disposition towards anthropology, biology, and paleontology becomes discernible as his writing on these topics shines. For example, his narrative on the evolutionary history of the recently discovered Indian purple frog (Nasikabatrachus sahyadrensis), its evolutionary ties to another frog found in Seychelles, and its parallels to the tuatara or kiwi was a treat to read. Moreover, the lengthy descriptions of India’s Phanerozoic paleoenvironment and the medley of dinosaurs that walked on the subcontinent were entertaining. The closing chapters on hominid evolution and India’s potential contribution to this story were thought-provoking.

As a downside to Indica, many small inaccuracies are stated with more certainty than the evidence warrants. My friend Suvrat Kher has an excellent blog post on many problematic sections dealing with sedimentology, tectonics, and mantle dynamics. I can echo Suvrat’s concerns in the paleomonsoon and paleoclimate domain where, amongst other things, Lal makes it seem as if we have a more concrete picture of the vagaries of the monsoon, its initiation, and its intensification than we actually do. Many of these points amount to more than sheer nitpicking. Ultimately, these inaccuracies are a significant downside to Indica, and I wonder about errors in geobiology and other realms removed from my own field. Nevertheless, they did not stop me from reading on: I would puzzle over them for a few minutes and move on, driven by Lal’s ardor (one day, on my second read, I might find the time to write down my concerns as well and as thoroughly as Suvrat did).

As a closing statement, Indica is for anyone and everyone interested in the geological natural history of the Indian subcontinent. For students/workers who do read it, I recommend trying to spot the inaccuracies and perhaps making a list.

Pubsplained #1: How to fit a straight line through a set of points with uncertainty in both directions?

Publication

Thirumalai, K., Singh, A., & Ramesh, R. (2011). A MATLAB™ code to perform weighted linear regression with (correlated or uncorrelated) errors in bivariate data. Journal of the Geological Society of India, 77(4), 377–380. 
doi: 10.1007/s12594-011-0044-1

Summary

We present a code that fits a line through a set of points (“linear regression”). It is based on math first described in 1966 that provides general and exact solutions to the multitude of linear regression methods out there. Here is a link to our code.

Pubsplainer

Fitting a straight line through a bunch of points with X and Y uncertainty.

My first peer-reviewed publication in the academic literature described a procedure to perform linear regression, or, in other words, build a straight line (of “best fit”) through a set of points. We wrote our code in MATLAB and applied it to a classic dataset from Pearson (1901).

“Why?”, you may ask, perhaps followed by “doesn’t MATLAB have linear regression built into it already?” or “wait a minute, what about polyfit?!”

Good questions, but here’s the kicker: our code numerically solves this problem when there are errors in both x and y variables… and… get this, even when those errors might be correlated! And if someone tells you that there is no error in the x measurement or that errors are rarely correlated - I can assure you that they are most probably erroneous.

York was the first to find general solutions for the “line of best fit” problem when he was working with isochron data where the abscissa (x) and ordinate (y) axis variables shared a common term (and hence resulted in correlated errors). He first published the general solutions to this problem in 1966 and subsequently published the solutions to the correlated-error problem in 1969.
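To see why a shared term produces correlated errors, here is a minimal Python sketch (not taken from our paper). It assumes an isochron-style setup where x = a/c and y = b/c share the denominator c; the true values, the relative errors, and all variable names are hypothetical choices for illustration. It compares a Monte Carlo estimate of the error correlation with the value expected from first-order error propagation.

```python
import math
import random


def pearson_r(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical true values and relative (1-sigma) measurement errors.
a_true, b_true, c_true = 10.0, 5.0, 2.0
rel_a, rel_b, rel_c = 0.01, 0.01, 0.02

rng = random.Random(42)  # fixed seed so the run is repeatable
x_vals, y_vals = [], []
for _ in range(100_000):
    a = rng.gauss(a_true, rel_a * a_true)
    b = rng.gauss(b_true, rel_b * b_true)
    c = rng.gauss(c_true, rel_c * c_true)  # the shared denominator
    x_vals.append(a / c)
    y_vals.append(b / c)

r_simulated = pearson_r(x_vals, y_vals)

# First-order error propagation for x = a/c and y = b/c with
# independent errors on a, b, and c.
r_predicted = rel_c**2 / math.sqrt(
    (rel_a**2 + rel_c**2) * (rel_b**2 + rel_c**2)
)

print(f"simulated r = {r_simulated:.3f}, predicted r = {r_predicted:.3f}")
```

Even though a, b, and c are measured independently here, the errors on x and y end up strongly correlated because both inherit the error on c.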

If these solutions were published so long ago, why are there so many different regression techniques detailed in the literature? Well, it’s always useful to have different approaches to solving numerical problems, but as Wehr & Saleska (2017) point out in a nifty paper from last year, the York solutions have largely remained internal to the geophysics community (in spite of 2000+ citations), escaping even the famed “Numerical Recipes” textbooks. Furthermore, they state that there is abundant confusion in the isotope ecology & biogeochemistry community about the myriad available linear regression techniques and which one to use when. I can somewhat echo that feeling when it comes to calibration exercises in the (esp. coral) paleoclimate community. A short breakdown of these methods follows.

Ordinary Least Squares (OLS) or Orthogonal Distance Regression (ODR) or Geometric Mean Regression (GMR): which one to use?!

Although each one of these techniques might be more appropriate for certain sets of data versus others, the ultimate take-home message here is that all of these methods are approximations of York’s general solutions, when particular criteria are met (or worse, unknowingly assumed).

  • OLS provides unbiased slope and intercept estimates only when the x variable has negligible errors and when the y error is normally distributed and does not change from point to point (i.e. no heteroscedasticity).

  • ODR, formulated by Pearson (1901), works only when the variances of the x and y errors do not change from point-to-point, and when the errors themselves are not correlated. ODR also fails to handle scaled data i.e. slopes and intercepts devised from ODR do not scale if the x or y data are scaled by some factor. Note that ODR is also called “major axis regression”.

  • GMR transforms x and y data and can thus scale estimates of the slope and intercept, but it works only when the ratio of the standard deviation of x to the standard deviation of the error on x equals the same ratio for y.
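The three slope estimators above can be sketched in a few lines of Python using their standard closed-form expressions. This is only an illustration: the data are made up, the function names are mine, and the ODR formula used is the equal-error-variance (major axis) case. The last lines rescale y by a factor of 10 to show the scaling behaviour described in the list.

```python
import math


def centered_sums(x, y):
    """Return Sxx, Syy, Sxy: sums of squares and products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares: all error assumed to be in y."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx


def major_axis_slope(x, y):
    """ODR / major axis: equal, uncorrelated error variances in x and y."""
    sxx, syy, sxy = centered_sums(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def gmr_slope(x, y):
    """Geometric mean regression: ratio of standard deviations, signed."""
    sxx, syy, sxy = centered_sums(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.5, 5.1, 8.6, 9.0, 12.4]
y_scaled = [10.0 * yi for yi in y]  # e.g. a change of units

for name, slope in [("OLS", ols_slope),
                    ("ODR", major_axis_slope),
                    ("GMR", gmr_slope)]:
    b = slope(x, y)
    b_scaled = slope(x, y_scaled)
    print(f"{name}: slope = {b:.4f}, slope after scaling y by 10 = {b_scaled:.4f}")
```

The OLS and GMR slopes simply pick up the factor of 10, whereas the ODR slope does not, which is the scaling problem noted for ODR above. Note also that none of the three functions takes a measurement uncertainty as input.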

Most importantly, and perhaps quite shockingly, NONE of these methods involve the actual measurement uncertainty from point-to-point in the construction of the ensuing regression. Essentially, each method is an algebraic approximation of York’s equations, and whereas his equations have to be solved numerically in their most general form, they provide the least biased estimates of the slope and intercept for a straight line. In 2004, York and colleagues showed that his 1969 equations (based on least-squares estimation) were also consistent with (newer) methods based on maximum likelihood estimation when dealing with (correlated or uncorrelated) bivariate errors. Our paper in 2011 provides a relatively fast way to iteratively solve for the slope and intercept.
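For readers who want to see what such an iteration looks like, below is a rough Python sketch of the slope update as I understand it from York and colleagues' 2004 formulation. It is not our MATLAB code: the function name, argument names, tolerance, and example data are all my own assumptions for illustration, and it returns only the intercept and slope (no uncertainties on either).

```python
def york_fit(x, y, sx, sy, r=None, tol=1e-12, max_iter=200):
    """Fit y = a + b*x with per-point errors in both x and y.

    sx, sy: 1-sigma uncertainties of each x and y value.
    r:      error correlation coefficient of each point (default 0).
    Returns (a, b), the intercept and slope.
    """
    n = len(x)
    if r is None:
        r = [0.0] * n

    def weights_and_means(b):
        # Each weight combines both error variances and their covariance
        # for the current slope estimate b.
        w = [1.0 / (syi**2 + (b * sxi)**2 - 2.0 * b * ri * sxi * syi)
             for sxi, syi, ri in zip(sx, sy, r)]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        return w, xbar, ybar

    # Starting guess: the ordinary least-squares slope.
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))

    for _ in range(max_iter):
        w, xbar, ybar = weights_and_means(b)
        num = den = 0.0
        for i in range(n):
            u, v = x[i] - xbar, y[i] - ybar
            beta = w[i] * (u * sy[i]**2 + b * v * sx[i]**2
                           - (b * u + v) * r[i] * sx[i] * sy[i])
            num += w[i] * beta * v
            den += w[i] * beta * u
        b_new = num / den
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break

    # Recompute the weighted means with the final slope.
    _, xbar, ybar = weights_and_means(b)
    return ybar - b * xbar, b


# Hypothetical data with different uncertainties at every point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.5, 5.1, 8.6, 9.0, 12.4]
sx = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4]
sy = [0.3, 0.3, 0.5, 0.4, 0.6, 0.5]
r = [0.5] * len(x)  # assumed error correlation for every point

a_fit, b_fit = york_fit(x, y, sx, sy, r)
print(f"intercept = {a_fit:.4f}, slope = {b_fit:.4f}")
```

Two limiting cases make the "approximation" point above tangible: setting every sx to zero with a constant sy gives back the OLS slope, and setting sx equal to sy with no correlation gives back the ODR (major axis) slope.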

In our publication, besides the Pearson data, we also applied our algorithm to perform “force-fit” regression - a unique case where one point is almost exactly known (i.e. very little error and near-infinite weight) - on meteorite data and showed that our results were consistent with published data.
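The idea behind a "force-fit" can be illustrated with a much simpler stand-in than our algorithm: a weighted least-squares fit with errors in y only, where one point is given a tiny uncertainty (and hence a near-infinite weight). The sketch below uses made-up numbers rather than the meteorite data, and the function name is hypothetical.

```python
def weighted_line_fit(x, y, sy):
    """Weighted least squares for y = a + b*x with errors in y only."""
    w = [1.0 / s**2 for s in sy]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical data; the first point is the one treated as almost exactly known.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.7, 5.4, 6.6, 9.5]

sy_equal = [0.5] * 5            # every point equally uncertain
sy_forced = [1e-6] + [0.5] * 4  # first point has a near-infinite weight

a_free, b_free = weighted_line_fit(x, y, sy_equal)
a_forced, b_forced = weighted_line_fit(x, y, sy_forced)

# Misfit of each line at the well-known point.
miss_free = y[0] - (a_free + b_free * x[0])
miss_forced = y[0] - (a_forced + b_forced * x[0])
print(f"free fit misses the point by {miss_free:.3g}")
print(f"force-fit misses the point by {miss_forced:.3g}")
```

With equal uncertainties the line is free to miss the first point, while the heavily weighted version is effectively pinned to it and only the slope is left to be determined by the remaining data.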

All in all, if you want to fit a line through a bunch of points in an X-Y space, you won’t be steered too far off course by using our algorithm.

#Pubsplained

I am introducing a new series on this blog called Pubsplained, where I plan on breaking down my peer-reviewed publications into (more) digestible blog-posts. The motivation for this is threefold:

  1. To see if it’s possible to broaden the audience of some of these manuscripts.
  2. To be more productive on Paleowave.
  3. To “keep in touch” with my older publications.

The idea is to provide an accessible summary (perhaps a tweet-length synopsis) on our publications, and also provide a little more background on the topic, including problems and challenges, for those who might be interested.