Conversion to time series format

For many applications it is useful to convert the image-based format into one that is optimized for fast time series retrieval. This is often needed for validation studies, for example. There are several ways to do this, such as stacking the images into a netCDF file with suitable chunk sizes. We have chosen the following approach:

  • Store the grid points in a 1D array. This also allows reduction of the data volume by e.g. only saving the points over land.

  • Store the time series in netCDF4 format, using the orthogonal multidimensional array representation of the Climate and Forecast (CF) conventions.

  • Store the time series in 5x5 degree cells. This means there will be 2566 cell files and a file called grid.nc which contains the information about which grid point is stored in which file. This allows us to read a whole 5x5 degree area into memory and iterate over the time series quickly.

    [Figure: 5x5 degree cell partitioning]
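The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by the conversion itself: the function names cell_index and group_by_cell, the cell numbering (counting from the south-west corner, north first and then east) and the sample coordinates are all made up for this example, and the actual numbering used in the cell files may differ.

```python
import math
from collections import defaultdict


def cell_index(lon, lat, cellsize=5.0):
    """Return the index of the cellsize x cellsize degree cell containing a point.

    Assumed numbering (hypothetical): cells are counted from the
    south-west corner (-180, -90), moving north first and then east.
    """
    n_rows = int(180 / cellsize)  # number of cells from south to north
    n_cols = int(360 / cellsize)  # number of cells from west to east
    # Clamp so that lon == 180 or lat == 90 falls into the last cell.
    col = min(int(math.floor((lon + 180.0) / cellsize)), n_cols - 1)
    row = min(int(math.floor((lat + 90.0) / cellsize)), n_rows - 1)
    return col * n_rows + row


def group_by_cell(points, cellsize=5.0):
    """Map each cell index to the grid point indices it contains.

    `points` is a 1D sequence of (lon, lat, is_land) tuples. The position
    in the sequence serves as the grid point index. Points that are not
    over land are skipped, which reduces the data volume.
    """
    cells = defaultdict(list)
    for gpi, (lon, lat, is_land) in enumerate(points):
        if not is_land:
            continue
        cells[cell_index(lon, lat, cellsize)].append(gpi)
    return dict(cells)


# Hypothetical 1D grid: (lon, lat, is_land)
points = [
    (16.37, 48.21, True),
    (17.10, 48.15, True),
    (-30.00, 0.00, False),  # ocean point, dropped
    (-180.0, -90.0, True),
]
cells = group_by_cell(points)
```

With a mapping like this, all time series that belong to one cell can be written to a single file and later read back together.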

SPL3SMP

This conversion can be performed using the smap_repurpose command line program. An example would be:

smap_repurpose /SPL3SMP_data /timeseries/data 2015-04-01 2015-04-02 soil_moisture soil_moisture_error --overpass AM

This command takes the SMAP SPL3SMP data stored in /SPL3SMP_data from April 1st 2015 to April 2nd 2015 and stores the parameters soil_moisture and soil_moisture_error for the AM overpass as time series in the folder /timeseries/data. When the PM overpass is selected, the time series variables are renamed with the suffix _pm.
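If the conversion needs to be started from a Python script, for example to process both overpasses, the command line can be assembled and passed to subprocess. The sketch below is a hedged example: the helper names build_repurpose_command and run_repurpose are hypothetical, the argument order simply mirrors the example above, and passing PM to --overpass is assumed from the description of the PM overpass.

```python
import subprocess


def build_repurpose_command(input_root, output_root, start, end,
                            parameters, overpass="AM"):
    """Assemble the argument list in the same order as the example above."""
    return [
        "smap_repurpose",
        input_root,
        output_root,
        start,
        end,
        *parameters,
        "--overpass",
        overpass,
    ]


def run_repurpose(cmd):
    """Run the command; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


# Same paths, dates and parameters as above, but for the PM overpass
# (the value "PM" is an assumption based on the text).
cmd = build_repurpose_command(
    "/SPL3SMP_data",
    "/timeseries/data",
    "2015-04-01",
    "2015-04-02",
    ["soil_moisture", "soil_moisture_error"],
    overpass="PM",
)
# run_repurpose(cmd)  # uncomment once smap_repurpose is installed
```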

Conversion to time series is performed by the repurpose package in the background. For custom settings or other options see the repurpose documentation and the code in smap_io.reshuffle.

Note: If the error "RuntimeError: NetCDF: Bad chunk sizes." appears during reshuffling, consider downgrading the netcdf4 library with:

conda install -c conda-forge netcdf4=1.2.2