MATLAB: Handling NetCDF files & HadISST data

This is a rehashed post from my old blog that proved popular. It is a set of basic instructions for handling NetCDF files in MATLAB - something that can be very handy in climate science. There are various instrumental records (thermometer- and satellite-based measurements) of global temperature variability, each specified by different parameters (sea-surface temperatures, marine air temperatures, land temperatures, combined land-sea, 5°x5° gridded, 1°x1° gridded and so on). Most of these are open for public use and require citation in scientific publications. Careful consideration is required in choosing the data set (each with specific inherent errors) that you want to work with, depending on the question you want to answer. Recently, I've been working with the Hadley Centre Sea Ice and Sea Surface Temperature (HadISST) data set. This data set gives you global, 1°x1° gridded, sea-surface temperature (SST) data from 1870 to the present (updated on the 2nd of every month). The provided array has three dimensions (longitude, latitude and time), storing SSTs as the data values.

For my use, I needed the complete SST time series from 1870 up to the present, but for only one 1°x1° grid point. As you can imagine, this requires some (basic) manipulation of the given data set.

The problem is that many data sets are distributed as ASCII text (.txt) files. These are tough to work with on a non-Linux system, and it takes a long time to edit and optimize them for statistical/computational use in most software. The good news is that most of the data sets are also provided in NetCDF (.nc) format. The Network Common Data Form (NetCDF) is an open standard - a set of software libraries and machine-independent data formats - that supports the creation, access, and sharing of array-oriented scientific data. The project was initiated by the University Corporation for Atmospheric Research (UCAR). Even so, I couldn't find a simple method online for manipulating these big files (~400 MB-4 GB), be it in .txt or .nc form.

Without going into the intricacies of netCDF libraries and formats, here is the easiest way of manipulating netCDF (and hence, global temperature) data sets in basic MATLAB (no fancy toolboxes required!):

  • Download the netCDF version of the data set (or the .nc format).
  • If you have a recent version of MATLAB (R2008b or later), there are built-in functions capable of reading netCDF files; otherwise, you can download the required functions/libraries here.
  • Create a netCDF object for the data file using the netcdf.open function. Pass the 'NC_NOWRITE' constant to open the file in read-only mode (you typically don't want to tamper with the original .nc file).
  • Figure out the structure of the file through the netcdf.inq and netcdf.inqVar functions, which tell you what variables the creator of the file used and the dimensions of each variable (on newer releases, ncinfo does this in one call - see the sketch after this list).
  • Assign the complete data set of the particular variable you want (usually this is the last variable in the netCDF file - the one containing every single data point in the array) to a MATLAB array.
  • This new array takes the dimensions of the complete data set.
  • Now you are ready to go - you can manage the huge data set through simple array manipulation.
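As an aside, if you are on R2011a or later, ncinfo can show you the whole file structure in one call. Here is a minimal sketch, assuming the file is named HadISST.nc as in the example below:

info = ncinfo('HadISST.nc');      % read all the file metadata at once
disp({info.Variables.Name})       % names of every variable in the file
disp(info.Variables(end).Size)    % size of the last variable (here, the SST array)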

For example, using the low-level interface:

had = netcdf.open('HadISST.nc','NC_NOWRITE');               % open the file read-only
[varname, xtype, varDimIDs, varAtts] = netcdf.inqVar(had,4) % 4 is a variable ID (SST here); no semicolon, so the result is displayed
varid = netcdf.inqVarID(had,varname);                       % recover the ID from the variable name
data = netcdf.getVar(had,varid);                            % this is the full data set
netcdf.close(had);                                          % close the file when you are done

In the case of the HadISST data set, changing the variable ID (i.e. the 4 in the second line) yields different parameters (0 - longitude, 1 - latitude, 2 - time, 3 - specific months, 4 - SST). Since you ultimately want to work with the SSTs, variable ID 4 yields the complete SST data set. Now it is a matter of simple array manipulation to obtain the subset you require, be it a particular time slice, a particular range of latitudes, a single spatial point or a single month's global data. To key in on a particular point, read the coordinate variables (longitude, latitude, time) with netcdf.getVar as well, and use them to index into the SST array. Once you gather more experience with this basic method, you can look at netCDF handling toolboxes (I would recommend mexcdf - particularly, nc_dump comes in handy).
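For instance, here is a minimal sketch of pulling out the full time series at a single grid point (30°N, 60°W, chosen purely for illustration). The variable names 'longitude', 'latitude' and 'sst' are what the inquiry step reports for HadISST - verify them, along with the fill-value convention, against your own file's attributes; if the data are stored packed, apply the scale_factor/add_offset attributes to the raw values:

had = netcdf.open('HadISST.nc','NC_NOWRITE');
lon = netcdf.getVar(had, netcdf.inqVarID(had,'longitude'));   % 360 grid columns
lat = netcdf.getVar(had, netcdf.inqVarID(had,'latitude'));    % 180 grid rows
sst = netcdf.getVar(had, netcdf.inqVarID(had,'sst'));         % lon x lat x time
[dum, iLon] = min(abs(lon - (-60)));    % nearest grid column to 60°W
[dum, iLat] = min(abs(lat - 30));       % nearest grid row to 30°N
series = squeeze(sst(iLon, iLat, :));   % complete SST time series at that point
series(series <= -1000) = NaN;          % mask land/ice fill values (check the _FillValue attribute)
netcdf.close(had);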

The specific data set, finally obtained as an array, can easily be written into any required format (.xls, .xlsx, .dat, .xml etc.) through MATLAB. This is probably the easiest way of extracting data from a global temperature database, fit for use in programs such as Excel or SigmaPlot.
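As a minimal sketch, assuming 'series' is the single-point time series from the sketch above (the output file names are arbitrary):

csvwrite('hadisst_point.csv', series);           % comma-separated text, opens directly in Excel or SigmaPlot
xlswrite('hadisst_point.xls', series);           % native Excel format (requires Excel on Windows)
save('hadisst_point.dat', 'series', '-ascii');   % whitespace-delimited ASCII

Any corrections/suggestions for improving this method are welcome.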

Paleo-CO2

Watch this video! It is an amazing display of atmospheric carbon dioxide trends. My favourite portion kicks in after the 1:40 mark - so make sure to watch it till the end:

Carbon dioxide content in the atmosphere is a crucial parameter that plays a major role in mediating the surface temperature of the Earth through radiative forcing. Its molecular makeup lets it absorb the outgoing longwave radiation (OLR) that the Earth emits. The balance between incoming solar radiation and OLR (the radiative budget, in heat transfer terminology) is vital for the climate system of the Earth, which encompasses the atmosphere, biosphere, cryosphere, hydrosphere etc.

In 1958, Charles Keeling, from the Scripps Institution of Oceanography, started collecting and monitoring carbon dioxide in the atmosphere at the Mauna Loa Observatory in Hawaii. Shown in the figure above, this data set represents the longest continuous record of direct atmospheric CO2 measurements - a mere 54 years. How do we know what CO2 was doing in the past, when the Earth system was very different from what it is now (e.g. the ice ages, prolonged warm periods)? For this, we turn to proxies.

Here is a list of some proxies (to the best of my knowledge) that are useful in reconstructing atmospheric carbon dioxide content, with links to a few articles that deal with them (most are pdfs, though some links are paywalled - I will be happy to send particular papers if requested):

  • Bulk inorganic/organic carbon content in marine sediments [refs 1, 2]
  • Air bubbles trapped in ice cores [refs 1, 2]
  • Paleosols - ancient soils buried under sedimentary deposits [refs 1, 2]
  • Boron isotopes in planktic foraminifera, a proxy for paleo-pH levels in the ocean [refs 1, 2, 3]
  • Alkenone & lipid biomarkers derived from haptophyte algae [refs 1, 2]
  • Fossil plants and leaves [refs 1, 2]

This is an incomplete list, but it covers most of the widely used proxies for paleo-CO2 reconstructions (please post more examples in the comments if you know 'em). Each record obtained from a different proxy varies in terms of the resolution of the estimates (decadal, centennial, millennial etc.) and the length of the record (how far back can it go?). Together, though, they paint a cohesive picture - as can be seen in the second half of the video above.