This library contains classes and functions to generate datasets corresponding to spatial features from a time-series of satellite images. The impetus for this project was to develop an easy-to-use, high-level interface to numerous Python modules for the clustering and classification of land cover/land use (LULC) types, with an initial focus on classifying individual crop types in challenging geographies using a time-series of multi-spectral earth observation (EO) images. A time-series of EO images better captures how the appearance of crops and other LULC classes changes through a growing season, enabling more accurate model predictions. The functions and methods provided in this library can be used to generate EO reflectance time-series datasets and models for arbitrary vector data, e.g. points or polygons.
The library is divided into several components:
- `tsmask`: provides functions to create masked numpy arrays corresponding to areas of interest, as well as a `BandTimeSeries` object initialized using the masked array. Specific functions and objects include:
  - `rasterize` utilizes the `osgeo` library and the underlying `gdal` functionality to rasterize vector features from a shapefile and output a .tif file sharing the relevant metadata and dimensions of the reference image from which it was created. A `check_rasterize` function is also provided to confirm that the features were correctly "burned" into the raster layer. The resulting image can be characterized as a land cover "mask".
  - `mask_to_array` generates a 3D numpy array from the output of `rasterize`. Each element of the 3D array is a 2D array representing band reflectance values for a given date. Values in the 3D array that are not no-data values correspond to a land cover class burned in using `rasterize`.
  - `BandTimeSeries` objects contain information about time-series of reflectance values for samples in a given land cover class, and methods to operate on and format the reflectance time-series. `BandTimeSeries` objects are initialized using an output from the `mask_to_array` function, along with arguments specifying the land cover class of the object and the variable (band) name of the reflectance time-series. The `time_series_data_frame` method allows for interpolation of the time-series.
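
The masking idea behind `tsmask` can be illustrated with plain numpy. This is a minimal sketch on synthetic data, not the library's implementation: the `mask_stack` helper, the `NO_DATA` value of 0, and the array shapes are all assumptions made for illustration.

```python
import numpy as np

NO_DATA = 0  # assumed no-data value in the rasterized mask

# Hypothetical 4x4 land cover mask: 0 = no data, 1 and 2 = class labels
mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 0, 0],
])

# Synthetic reflectance stack with shape (n_dates, rows, cols)
rng = np.random.default_rng(0)
stack = rng.uniform(0.0, 1.0, size=(3, 4, 4))


def mask_stack(stack, mask, land_cover_class):
    """Hide every pixel that does not belong to land_cover_class (hypothetical helper)."""
    hide = np.broadcast_to(mask != land_cover_class, stack.shape).copy()
    return np.ma.masked_array(stack, mask=hide)


class_1 = mask_stack(stack, mask, land_cover_class=1)

# One row per date, one column per pixel belonging to class 1
n_dates = class_1.shape[0]
series = class_1.reshape(n_dates, -1).compressed().reshape(n_dates, -1)
```

Because the same 2D mask is applied to every date, the unmasked values can be reshaped into a regular (n_dates, n_pixels) table, which is the kind of structure a per-class reflectance time-series needs.
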
- `tsclust`: provides a `TimeSeriesSample` class that is useful for generating a dataset from all or a subset of the data contained in a `BandTimeSeries` and formatting it for direct use in the functions and classes provided in the `tslearn` library.
  - `TimeSeriesSample` takes `n_samples` of the data in a `BandTimeSeries` and optionally smooths the time-series using Savitzky-Golay (Savgol) signal smoothing. The `ts_dataset` method generates an object that can be used directly in the time-series clustering and classification algorithms provided in the `tslearn` library.
  - `cluster_time_series` performs either `GlobalAlignmentKernelKMeans` or `TimeSeriesKMeans` (both from the `tslearn` library) on a `TimeSeriesSample` object. The user specifies the number of clusters, as well as the distance metric (dynamic time warping or soft dynamic time warping) if the clustering algorithm is `TimeSeriesKMeans`. Silhouette scores computed on the resulting clusters can optionally be returned. Alternative sets of hyperparameters for `cluster_time_series` can be tested using the `cluster_grid_search` function.
  - `cluster_mean_quantiles` and `plot_clusters` provide methods for inspecting and visualizing cluster results.
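
The sampling and smoothing step described for `tsclust` might look roughly like the following. It is a sketch under stated assumptions rather than the library's code: the data are synthetic, the window length and polynomial order are illustrative, and only `scipy.signal.savgol_filter` and numpy are used. The final array follows the (n_ts, sz, d) layout that `tslearn` estimators expect.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(42)

# Synthetic stand-in for one band of one land cover class:
# 50 pixels, 12 dates, a seasonal curve plus noise
n_pixels, n_dates = 50, 12
t = np.linspace(0.0, np.pi, n_dates)
series = np.sin(t) + rng.normal(0.0, 0.05, size=(n_pixels, n_dates))

# Draw n_samples time-series without replacement
n_samples = 10
idx = rng.choice(n_pixels, size=n_samples, replace=False)
sample = series[idx]

# Savitzky-Golay smoothing along the time axis
# (window_length and polyorder are illustrative choices)
smoothed = savgol_filter(sample, window_length=5, polyorder=2, axis=1)

# tslearn datasets are 3D arrays of shape (n_ts, sz, d); here d = 1 band
dataset = smoothed[:, :, np.newaxis]
```

An array shaped this way could then be passed to a `tslearn` clustering estimator such as `TimeSeriesKMeans`.
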
- `tstrain`: provides functions for extracting training datasets comprising time-series of band reflectance values at known locations (x, y numpy array indices) from satellite scenes.
  - `random_ts_samples` takes `n_samples` from .csv files containing reflectance time-series data for a given land cover class.
  - `get_training_data` reads satellite scenes into numpy arrays using functionality from `gippy`, e.g. scenes corresponding to an area of interest specified with `sat-search` and downloaded and saved using the default directory structure of `sat-search load`. The output is a long-form `pandas` dataframe with columns for date, feature (band), band reflectance value, the 2D array index, and a label corresponding to a sample's land cover class.
  - `format_training_data` takes the output of `get_training_data` and reshapes it into a 3D numpy array of shape (n_samples, n_timesteps, n_features) suitable for use in a `Keras` Sequential model. Both x and y (optionally one-hot encoded) are returned.
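
The reshaping from a long-form dataframe to a (n_samples, n_timesteps, n_features) array can be sketched as below. This is an illustration only: the column names (`sample`, `date`, `feature`, `value`, `label`), the crop labels, and the values are hypothetical and do not reflect the actual output schema of `get_training_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical long-form table: one row per (sample, date, band) observation
labels = {0: "maize", 1: "rice", 2: "maize"}
dates = ["2018-01-01", "2018-01-11", "2018-01-21", "2018-01-31"]
bands = ["red", "nir"]

records = []
value = 0.0
for sample_id, label in labels.items():
    for date in dates:
        for band in bands:
            records.append({"sample": sample_id, "date": date,
                            "feature": band, "value": value, "label": label})
            value += 1.0
df = pd.DataFrame(records)

# Pivot to wide form: rows indexed by (sample, date), one column per band.
# pivot_table sorts the index and the columns, so bands come out as nir, red.
wide = df.pivot_table(index=["sample", "date"], columns="feature", values="value")

n_samples = df["sample"].nunique()
n_timesteps = df["date"].nunique()
n_features = df["feature"].nunique()

# Assumes every sample has a value for every date and band
x = wide.to_numpy().reshape(n_samples, n_timesteps, n_features)

# One label per sample, integer-encoded and then one-hot encoded
y_labels = df.groupby("sample")["label"].first().to_numpy().astype(str)
classes, y_int = np.unique(y_labels, return_inverse=True)
y = np.eye(len(classes))[y_int]
```

The reshape relies on the table being complete and sorted by sample and then date; gaps in the time-series would need to be filled or interpolated first.
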
Coming soon: two Jupyter notebook tutorials showcasing the functionality in this library.