5.2 Tutorials summary#

In this book, we worked through tutorials that access two satellite remote sensing datasets, prepare them for analysis, and perform exploratory data analysis and visualization.

Reading and writing data#

Data used in the tutorials include Zarr data cubes and cloud-optimized GeoTIFFs (COGs), which were accessed from cloud object stores such as AWS S3 and Microsoft Planetary Computer. In addition, we analyzed data from GeoTIFF files read from local storage by creating a virtual representation of the data on disk. We also wrote xr.Dataset objects to disk as Zarr data cubes for later reuse.
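The read-and-write pattern described above can be sketched with Xarray. This is a minimal illustration using a synthetic in-memory cube; the S3 URL and Zarr store name in the comments are hypothetical placeholders, not paths from the tutorials.

```python
import numpy as np
import xarray as xr

# In the tutorials, cubes like this are opened directly from cloud object
# storage, e.g. (hypothetical URL):
#   ds = xr.open_dataset("s3://some-bucket/some-dataset.zarr", engine="zarr")
# Here we build a small synthetic stand-in instead, so the sketch runs offline.
ds = xr.Dataset(
    {"v": (("time", "y", "x"), np.random.rand(3, 4, 5))},
    coords={"time": np.arange(3), "y": np.arange(4), "x": np.arange(5)},
)

# Writing the cube to disk as Zarr for later reuse (commented out here so the
# sketch does not touch the filesystem; the store name is a placeholder):
# ds.to_zarr("my_cube.zarr", mode="w")
print(ds["v"].shape)
```

The same `open_dataset` / `to_zarr` round trip works identically whether the store is a local directory or a cloud object store.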

See: Reading Zarr data cubes stored in S3 buckets with Xarray, Handling encoding information so that Xarray objects can be written to disk, Accessing data from Microsoft Planetary Computer, Reading and writing vector data cubes.

Larger-than-memory data#

We encountered many situations where the memory required to perform operations on a dataset exceeded that available on a standard laptop. This prompted us to explore strategies for more efficient memory usage, such as parallelizing operations and virtualizing data.
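One such strategy is chunked, lazy computation: the array is split into chunks, each chunk is processed independently, and peak memory stays bounded by the chunk size rather than the full array. A minimal sketch, assuming Dask is installed alongside Xarray (the dataset here is small and synthetic, standing in for a larger-than-memory one):

```python
import numpy as np
import xarray as xr

# A synthetic stand-in for a large dataset; real tutorials open data lazily,
# e.g. xr.open_dataset(..., chunks={...}).
ds = xr.Dataset({"v": (("time", "y", "x"), np.random.rand(10, 100, 100))})

# chunk() wraps the underlying NumPy array in Dask arrays (requires dask).
chunked = ds.chunk({"time": 2})

# Operations on chunked data are lazy: nothing is computed yet.
mean = chunked["v"].mean("time")

# compute() triggers the chunked (and potentially parallel) evaluation.
result = mean.compute()
print(result.shape)
```

With a genuinely large dataset, only the chunks needed for the current operation are resident in memory at any one time.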

See: Strategies for reading data into memory, Creating a virtual copy to read a larger dataset into memory.

Joining raster and vector data#

In both tutorials, we started with large spatial footprints of satellite imagery represented by raster data cubes. We explored ways to:

  • Efficiently view the spatial footprint of a raster data cube by creating a vector data frame of the footprint,

  • Clip raster data cubes by the extent of a polygon represented by a vector data frame, and

  • Create vector data cubes that hold time series data for a set of spatial areas of interest represented by polygons.
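The clipping step above can be sketched in a simplified, dependency-light form. The tutorials use GeoPandas geometries with rioxarray's `clip` for true polygon clipping; here we approximate with the polygon's bounding box only, using plain Xarray label-based selection, and the bounds are made-up example values:

```python
import numpy as np
import xarray as xr

# A small synthetic raster standing in for a satellite image.
raster = xr.DataArray(
    np.random.rand(10, 10),
    coords={"y": np.arange(10), "x": np.arange(10)},
    dims=("y", "x"),
)

# Hypothetical polygon bounds (xmin, ymin, xmax, ymax). With rioxarray this
# would instead be raster.rio.clip([polygon], crs=...).
xmin, ymin, xmax, ymax = 2, 3, 6, 8

# Label-based slicing is inclusive of both endpoints.
clipped = raster.sel(x=slice(xmin, xmax), y=slice(ymin, ymax))
print(clipped.shape)
```

Bounding-box selection like this is often a useful first pass even when a precise polygon clip follows, since it discards most of the raster cheaply.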

See: Handling raster and vector data and Creating vector data cubes.

Data visualization#

We used a number of visualization tools in these tutorials, each of which is appropriate for different situations and use-cases. These include interactive and static visualizations, and plotting tools optimized for n-d array and vector datasets.
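As one example of a static, array-oriented tool, Xarray's faceted plotting produces one panel per step along a chosen dimension, which is convenient for eyeballing a stack of images. A minimal sketch with synthetic data, assuming Matplotlib is installed (the non-interactive `Agg` backend is set so the sketch runs headless):

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display needed
import xarray as xr

# Synthetic image stack: 4 time steps of an 8x8 raster.
da = xr.DataArray(
    np.random.rand(4, 8, 8),
    dims=("time", "y", "x"),
    coords={"time": np.arange(4)},
)

# Facet along 'time': one 2-d panel per time step, wrapped into 2 columns.
fg = da.plot(col="time", col_wrap=2)
print(type(fg).__name__)
```

Interactive equivalents (e.g. hvplot from the holoviz ecosystem) follow a similar pattern but return pannable, zoomable figures instead of static panels.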

See: Plotting raster and vector data, Adding basemaps to plots of raster and vector data, Interactive visualization of vector data cubes with Xvec and GeoPandas, Interactive visualization of 2-d data with Xarray and holoviz, Sentinel-1 nb3, Using xr.FacetGrid for data inspection and cleaning, Different ways of plotting 2-d data side-by-side, Interactive visualization of time series data with Xarray and holoviz, and Visual dataset comparison.

Making analysis-ready data cubes#

Several notebooks focused on organizing data cubes so that they appropriately represent physical observables, and on attaching relevant metadata to the right variables and dimensions.

This process, often called ‘data tidying,’ refers to the steps required to prepare a dataset for analysis. Depending on your data and how you want to use it, it can become quite involved. Data tidying requires a detailed understanding of both the data that you want to analyze and the ‘data model’ of the tools you are using to work with the data. The following pages contain more discussion of tidying steps for n-dimensional array datasets.
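A small part of this tidying is attaching metadata so that variables are self-describing. A minimal sketch: the attribute names follow the CF-style conventions Xarray understands, but the variable name and values here are illustrative examples, not taken from the tutorial datasets.

```python
import numpy as np
import xarray as xr

# A synthetic variable standing in for a physical observable.
da = xr.DataArray(np.random.rand(3, 4), dims=("y", "x"), name="velocity")

# Attach metadata to the variable itself...
da.attrs["units"] = "m/yr"
da.attrs["long_name"] = "ice surface velocity magnitude"

# ...and to the appropriate coordinate dimensions.
da["y"] = ("y", np.arange(3), {"units": "m"})
da["x"] = ("x", np.arange(4), {"units": "m"})

print(da.attrs["units"])
```

Once attributes like `units` and `long_name` are in place, Xarray's plotting and many downstream tools pick them up automatically for labels and checks.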

See: Inspection and exploratory analysis of ice velocity data, Wrangle metadata of a Sentinel-1 RTC dataset and Comparing two datasets.