4.3 Exploratory analysis of ASF S1 imagery#

Now that we have read in and organized the stack of Sentinel-1 RTC images, let’s take a look at the data.

ASF data access options

The steps shown in this notebook involve downloading and extracting large volumes of data. It is not necessary to do this to follow the rest of the content in the tutorial; we include the demonstration for completeness and to support users who are working with locally downloaded data.

For more information on different options for downloading data locally, see the Introduction.

A. Read and prepare data

    1. Clip to spatial area of interest

B. Layover-shadow map

    1. Interactive visualization of layover-shadow maps

C. Orbital direction

    1. Is a pass ascending or descending?

    2. Assign orbital direction as a coordinate variable

D. Duplicate time steps

    1. Identify duplicate time steps

    2. Visualize duplicates

    3. Drop duplicates

E. Examine coverage over time series

F. Data visualization

    1. Mean backscatter over time

    2. Seasonal backscatter variability

    3. Backscatter time series

Concepts

  • Perform spatial joins of raster and vector data.

  • Visualize raster data.

  • Use raster metadata to aid interpretation of backscatter imagery.

  • Examine data quality using provided layover-shadow maps.

  • Identify and remove duplicate time step observations.

Techniques

  • Clip a raster data cube using vector data with rio.clip().

  • Use xr.Dataset.groupby() for grouped statistics.

  • Reorganize data with xr.Dataset.reindex().

  • Visualize multiple facets of the data using FacetGrid.

ASF Data Access

You can download the RTC-processed backscatter time series here. For more detail, see tutorial data and the notebook on reading ASF Sentinel-1 RTC data into memory.

# %xmode minimal
import geopandas as gpd
import hvplot.xarray
import matplotlib.pyplot as plt
import numpy as np
import pathlib
import warnings
import xarray as xr

import s1_tools

warnings.filterwarnings("ignore", category=FutureWarning)
cwd = pathlib.Path.cwd()
tutorial2_dir = pathlib.Path(cwd).parent

A. Read and prepare data#

We’ll go through the same steps shown in the metadata wrangling notebook, but this time they are combined into a single function from s1_tools.

Attention

If you are following along on your own computer, you must specify 'timeseries_type' below:

  1. Set timeseries_type to 'full' or 'subset' depending on whether you are using the full time series (103 files) or the subset time series (5 files).

timeseries_type = "full"
vv_vrt_path = f"../data/raster_data/{timeseries_type}_timeseries/vrt_files/s1_stack_vv.vrt"
vh_vrt_path = f"../data/raster_data/{timeseries_type}_timeseries/vrt_files/s1_stack_vh.vrt"
ls_vrt_path = f"../data/raster_data/{timeseries_type}_timeseries/vrt_files/s1_stack_ls_map.vrt"
asf_data_cube = s1_tools.metadata_processor(
    vv_path=vv_vrt_path,
    vh_path=vh_vrt_path,
    ls_path=ls_vrt_path,
    timeseries_type=timeseries_type,
)
asf_data_cube
<xarray.Dataset> Size: 289GB
Dimensions:        (x: 17452, y: 13379, acq_date: 103)
Coordinates:
  * x              (x) float64 140kB 3.833e+05 3.833e+05 ... 9.068e+05 9.068e+05
  * y              (y) float64 107kB 3.309e+06 3.309e+06 ... 2.907e+06 2.907e+06
    spatial_ref    int64 8B 0
    ls             (acq_date, y, x) float32 96GB dask.array<chunksize=(11, 1536, 1536), meta=np.ndarray>
  * acq_date       (acq_date) datetime64[ns] 824B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043309' '043309'
Data variables:
    vh             (acq_date, y, x) float32 96GB dask.array<chunksize=(11, 1536, 1536), meta=np.ndarray>
    vv             (acq_date, y, x) float32 96GB dask.array<chunksize=(11, 1536, 1536), meta=np.ndarray>
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

1) Clip to spatial area of interest#

Until now, we’ve kept the full spatial extent of the dataset. This hasn’t been a problem because all of our operations have been lazy. Now, we’d like to visualize the dataset in ways that require eager rather than lazy computation, so we subset the data cube to a smaller area of interest to make those computations less intensive.

Later notebooks use a different Sentinel-1 RTC dataset that is accessed for a smaller area of interest. Clip the current data cube to that spatial footprint:

# Read vector data
pc_aoi = gpd.read_file("../data/vector_data/hma_rtc_aoi.geojson")

Visualize location

pc_aoi.explore()

Check the CRS and ensure it matches that of the raster data cube:

assert asf_data_cube.rio.crs == pc_aoi.crs, f"Expected: {asf_data_cube.rio.crs}, received: {pc_aoi.crs}"
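
If the assertion fails, the vector data can be reprojected to match the raster's CRS before clipping. A minimal sketch, using the same variable names as above:

# Reproject the AOI to the raster's CRS if the two do not already match
if pc_aoi.crs != asf_data_cube.rio.crs:
    pc_aoi = pc_aoi.to_crs(asf_data_cube.rio.crs)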

Clip the raster data cube by the extent of the vector:

clipped_cube = asf_data_cube.rio.clip(pc_aoi.geometry, pc_aoi.crs)
clipped_cube
<xarray.Dataset> Size: 142MB
Dimensions:        (x: 290, y: 396, acq_date: 103)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 47MB dask.array<chunksize=(11, 396, 290), meta=np.ndarray>
  * acq_date       (acq_date) datetime64[ns] 824B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043309' '043309'
    spatial_ref    int64 8B 0
Data variables:
    vh             (acq_date, y, x) float32 47MB dask.array<chunksize=(11, 396, 290), meta=np.ndarray>
    vv             (acq_date, y, x) float32 47MB dask.array<chunksize=(11, 396, 290), meta=np.ndarray>
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

Use xr.Dataset.persist(); this method is part of Xarray’s integration with Dask. It triggers computation, in the background, of the operations we’ve so far executed lazily. Persist is similar to compute, but it keeps the underlying data as Dask-backed arrays instead of converting them to NumPy arrays.

clipped_cube = clipped_cube.persist()
clipped_cube.compute()
<xarray.Dataset> Size: 142MB
Dimensions:        (x: 290, y: 396, acq_date: 103)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 47MB nan nan nan nan ... nan nan nan
  * acq_date       (acq_date) datetime64[ns] 824B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043309' '043309'
    spatial_ref    int64 8B 0
Data variables:
    vh             (acq_date, y, x) float32 47MB nan nan nan nan ... nan nan nan
    vv             (acq_date, y, x) float32 47MB nan nan nan nan ... nan nan nan
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

Great, we’ve gone from an object where each 3-d variable is ~96 GB to one where each 3-d variable is ~47 MB; this will be much easier to work with.
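
You can confirm the in-memory footprint of the clipped object directly, for example:

# Approximate size of the clipped data cube (all variables and coordinates)
print(f"{clipped_cube.nbytes / 1e6:.0f} MB")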

B. Layover-shadow map#

As discussed in previous notebooks, every Sentinel-1 RTC scene comes with an associated layover-shadow mask GeoTIFF file. This map describes the layover, shadow, and slope-angle conditions that can affect backscatter values in a scene, which is especially important to consider in high-relief settings with more potential for geometric distortion.

The following information is copied from the README file that accompanies each scene:

Layover-shadow mask

The layover/shadow mask indicates which pixels in the RTC image have been affected by layover and shadow. This layer is tagged with _ls_map.tif

The pixel values are generated by adding the following values together to indicate which layover and shadow effects are impacting each pixel:
0.  Pixel not tested for layover or shadow
1.  Pixel tested for layover or shadow
2.  Pixel has a look angle less than the slope angle
4.  Pixel is in an area affected by layover
8.  Pixel has a look angle less than the opposite of the slope angle
16. Pixel is in an area affected by shadow

There are 17 possible different pixel values, indicating the layover, shadow, and slope conditions present added together for any given pixel.

The values in each cell can range from 0 to 31:
0.  Not tested for layover or shadow
1.  Not affected by either layover or shadow
3.  Look angle < slope angle
5.  Affected by layover
7.  Affected by layover; look angle < slope angle
9.  Look angle < opposite slope angle
11. Look angle < slope and opposite slope angle
13. Affected by layover; look angle < opposite slope angle
15. Affected by layover; look angle < slope and opposite slope angle
17. Affected by shadow
19. Affected by shadow; look angle < slope angle
21. Affected by layover and shadow
23. Affected by layover and shadow; look angle < slope angle
25. Affected by shadow; look angle < opposite slope angle
27. Affected by shadow; look angle < slope and opposite slope angle
29. Affected by shadow and layover; look angle < opposite slope angle
31. Affected by shadow and layover; look angle < slope and opposite slope angle

The ASF RTC image product guide has detailed descriptions of how the data is processed and what is included in the processed dataset.
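
Because each pixel value in the layover-shadow map is a sum of the bit flags listed above, a value can be decomposed back into its individual conditions with bitwise operations. Below is a small sketch of this idea; the dictionary of flag labels is our own shorthand, not part of the ASF product:

# Bit flags used to build layover-shadow pixel values (see the README excerpt above)
LS_FLAGS = {
    1: "tested for layover or shadow",
    2: "look angle < slope angle",
    4: "affected by layover",
    8: "look angle < opposite of slope angle",
    16: "affected by shadow",
}


def decode_ls_value(value):
    """Return the list of conditions encoded in a layover-shadow pixel value."""
    if value == 0:
        return ["not tested for layover or shadow"]
    return [label for bit, label in LS_FLAGS.items() if int(value) & bit]


decode_ls_value(21)  # ['tested for layover or shadow', 'affected by layover', 'affected by shadow']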

The layover-shadow variable provides categorical information so we’ll use a qualitative colormap to visualize it.

clipped_cube.isel(acq_date=1).ls.plot()
[Figure: layover-shadow map for a single time step, plotted with the default colormap]
cat_cmap = plt.get_cmap("tab20b", lut=32)
time_step1 = "2021-06-07"
time_step2 = "2021-06-10"

if timeseries_type == "subset":
    time_step1 = "2021-05-05T00:03:07"
    time_step2 = "2021-05-14T12:13:49"

fig, axs = plt.subplots(ncols=2, figsize=(12, 7), layout="constrained")

clipped_cube.sel(acq_date=time_step1).ls.plot(ax=axs[0], cmap=cat_cmap, cbar_kwargs=({"label": None}), vmin=0, vmax=31)

clipped_cube.sel(acq_date=time_step2).ls.plot(ax=axs[1], cmap=cat_cmap, cbar_kwargs=({"label": None}), vmin=0, vmax=31)

fig.suptitle(f"Layover shadow map of two time steps: {time_step1}, and {time_step2}", y=1.05)
for i in range(len(axs)):
    axs[i].set_xlabel(None)
    axs[i].set_ylabel(None)
    axs[i].tick_params(axis="x", labelrotation=45)

axs[0].set_title(time_step1)
axs[1].set_title(time_step2)
fig.supylabel("y coordinate of projection (m)")
fig.supxlabel("x coordinate of projection (m)");
[Figure: layover-shadow maps for the two time steps, plotted side by side]

It looks like different areas are affected by different types of distortion on different dates. For example, in the lower left quadrant, there is a region that is blue (5 - affected by layover) on 6/7/2021, but much of that area appears yellow (17 - affected by radar shadow) on 6/10/2021. This pattern is present throughout much of the scene: portions of the area affected by layover in one acquisition are in shadow in the next acquisition. This is not due to any real changes on the ground between the two acquisitions; rather, it reflects the different viewing geometries of the orbital passes. One of the above scenes was collected during an ascending pass of the satellite and one during a descending pass. Since Sentinel-1 always looks to the same side, ascending and descending passes view the same area on the ground from opposing perspectives.

Attention

If you’re following along using the subset time series, your plot will look different than the plot above. That plot should display the layover-shadow map from 05/05/2021 on the left and 05/14/2021 on the right. In this plot, you’ll see that the areas in shadow on 05/05/2021 are similar to the areas affected by layover on 05/14/2021.

1) Interactive visualization of layover-shadow maps#

We can use Xarray’s integration with hvplot (a library within the holoviz ecosystem) to look at the time series of layover-shadow maps interactively. Read more about interactive plots with Xarray and hvplot here.

To do this, we need to demote 'ls' from a coordinate variable to a data variable because of how hvplot.xarray expects the data to be structured. We can do this with xr.reset_coords(). First, recreate the above layover-shadow plot using hvplot:

ls_var = clipped_cube.reset_coords("ls")

(
    ls_var.ls.sel(acq_date=time_step1)
    .squeeze()
    .hvplot(
        cmap="tab20b",
        width=400,
        height=350,
        clim=(0, 32),  # specify the limits of the colorbar to match original
        title=f"Acq date: {time_step1}",
    )
    + ls_var.ls.sel(acq_date=time_step2)
    .squeeze()
    .hvplot(
        cmap="tab20b",
        width=400,
        height=350,
        clim=(0, 32),
        title=f"Acq date: {time_step2}",
    )
)

C. Orbital direction#

Sentinel-1 is a right-looking sensor; it images areas on Earth’s surface both when it is moving N-S (a descending orbit) and S-N (an ascending orbit). For a given scene, it images the same footprint on both passes but from different directions. The data coverage map below (available online here) illustrates these directional passes, with ascending passes moving from southeast to northwest and descending passes moving from northeast to southwest.

../../_images/slc_coverage_asf.png

ASF Sentinel-1 Cumulative coverage map.

In areas of high-relief topography such as the area we’re observing, there can be strong terrain distortion effects such as layover and shadow. These are some of the distortions that RTC processing corrects, but sometimes it is not possible to reliably extract backscatter in the presence of strong distortions. The layover-shadow maps shown earlier compare an ascending and a descending acquisition side by side, which is why different areas are affected by layover (5) and shadow (17) in each.

Thanks to all the setup work we did in the previous notebook, we can quickly confirm that all of the observations were taken at two times of day, corresponding to ascending and descending passes of the satellite, and that the time steps shown above were taken at different times of day.

Note

The acquisition time of Sentinel-1 images is given in UTC, not local time.

print(
    f"Hour of day of acquisition {time_step1}: ",
    clipped_cube.sel(acq_date=time_step1).acq_date.dt.hour.data,
)
print(
    f"Hour of day of acquisition {time_step2}: ",
    clipped_cube.sel(acq_date=time_step2).acq_date.dt.hour.data,
)
clipped_cube.acq_date.dt.hour
Hour of day of acquisition 2021-06-07:  [12]
Hour of day of acquisition 2021-06-10:  [0]
<xarray.DataArray 'hour' (acq_date: 103)> Size: 824B
array([12,  0, 12, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,
       12,  0, 12, 12, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12,
       12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0,
       12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,
        0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12,
       12,  0, 12, 12, 12,  0, 12, 12,  0, 12, 12,  0, 12, 12, 12, 12, 12,
       12])
Coordinates:
  * acq_date       (acq_date) datetime64[ns] 824B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043309' '043309'
    spatial_ref    int64 8B 0

1) Is a pass ascending or descending?#

In this example, it was relatively simple to tell one pass from the other, but it is less straightforward to know whether a given pass is ascending or descending. The timing of these passes depends on where on Earth the image was acquired.

In the location covered by this dataset, ascending passes correspond to an acquisition time of roughly 00:00 UTC and descending passes correspond to roughly 12:00 UTC.

2) Assign orbital direction as a coordinate variable#

This is another example of time-varying metadata, so it should be stored as a coordinate variable. Use xr.where() to assign the correct orbital direction value depending on an observation’s acquisition time and then assign it as a coordinate variable to the clipped raster data cube.

clipped_cube.coords["orbital_dir"] = (
    "acq_date",
    xr.where(clipped_cube.acq_date.dt.hour.data == 0, "asc", "desc"),
)
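
We can sanity-check the new coordinate by counting how many acquisitions fall into each category, for example:

# Count ascending vs. descending acquisitions
vals, counts = np.unique(clipped_cube.orbital_dir.data, return_counts=True)
dict(zip(vals, counts))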

D. Duplicate time steps#

Note

If you’re working with the subset time series, skip this section.

If we take a closer look at the ASF dataset, we can see that there are a few scenes from identical acquisitions (this is apparent in acq_date and, more specifically, in data_take_ID). Let’s examine these and see what’s going on.


1) Identify duplicate time steps#

First, we’ll look at the data_take_ID coordinate, which was extracted from the Sentinel-1 granule IDs in the previous notebook:

clipped_cube.data_take_ID.data
array(['047321', '047463', '047676', '047898', '047898', '0479A9',
       '047BBD', '047DE5', '047EF4', '0480FD', '048318', '04841E',
       '04862F', '04884C', '04895A', '048B6C', '048D87', '048D87',
       '048E99', '0490AD', '0492D4', '0492D4', '0492D4', '0493DB',
       '0495EC', '04980F', '04991E', '049B1F', '049D70', '049EAC',
       '04A0FF', '04A383', '04A4B9', '04A6FC', '04A972', '04AAB3',
       '04AD0D', '04AF85', '04B0C4', '04B2FB', '04B566', '04B6AB',
       '04B8FF', '04BB88', '04BCB8', '04BF12', '04C195', '04C2D6',
       '04C52E', '04C7A8', '04C8E9', '04CB46', '04CDC8', '04CEFF',
       '04D14F', '04D3CC', '04D50E', '04D761', '04D9E8', '04DB21',
       '04DD71', '04DFDD', '04E110', '04E340', '04E5AB', '04E6D7',
       '04E931', '04EB9B', '04ECCD', '04EEE8', '04F156', '04F294',
       '04F4D6', '04F754', '04F88B', '04FAF5', '04FD6F', '04FEB2',
       '050108', '050376', '0504AF', '0506F7', '050962', '050AA2',
       '050CE5', '050F59', '051096', '0512D7', '05154A', '05154A',
       '051682', '0518C0', '051B30', '051C63', '051E96', '0520EE',
       '05221E', '05245F', '0526C4', '0529DD', '052C00', '052C00',
       '052C00'], dtype='<U6')

Let’s look at the number of unique elements using np.unique().

data_take_ids_ls = clipped_cube.data_take_ID.data.tolist()
data_take_id_set = np.unique(clipped_cube.data_take_ID)
len(data_take_id_set)
96

Interesting - it looks like there are only 96 unique elements. Let’s figure out which are duplicates:

def duplicate(input_ls):
    """Return the values that appear more than once in the input list."""
    return list(set([x for x in input_ls if input_ls.count(x) > 1]))


duplicate_ls = duplicate(data_take_ids_ls)
duplicate_ls
['047898', '048D87', '052C00', '05154A', '0492D4']

These are the data take IDs that are duplicated in the dataset. We now want to subset the xarray object to only include these data take IDs:

asf_duplicate_cond = clipped_cube.data_take_ID.isin(duplicate_ls)
asf_duplicate_cond
<xarray.DataArray 'data_take_ID' (acq_date: 103)> Size: 103B
array([False, False, False,  True,  True, False, False, False, False,
       False, False, False, False, False, False, False,  True,  True,
       False, False,  True,  True,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False,  True,  True,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True])
Coordinates:
  * acq_date       (acq_date) datetime64[ns] 824B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043309' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (acq_date) <U4 2kB 'desc' 'asc' 'desc' ... 'desc' 'desc'
duplicates_cube = clipped_cube.where(asf_duplicate_cond == True, drop=True)
duplicates_cube
<xarray.Dataset> Size: 17MB
Dimensions:        (acq_date: 12, y: 396, x: 290)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
  * acq_date       (acq_date) datetime64[ns] 96B 2021-05-14T12:13:49 ... 2022...
    product_id     (acq_date) <U4 192B '971C' 'FA4F' '0031' ... '1380' 'E5B6'
    data_take_ID   (acq_date) <U6 288B '047898' '047898' ... '052C00' '052C00'
    abs_orbit_num  (acq_date) <U6 288B '037884' '037884' ... '043309' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (acq_date) <U4 192B 'desc' 'desc' 'desc' ... 'desc' 'desc'
Data variables:
    vh             (acq_date, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
    vv             (acq_date, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

2) Visualize duplicates#

Great, now we have a 12-time-step Xarray object that contains only the duplicate data takes. Let’s see what it looks like. We can use Xarray’s FacetGrid plotting to display all of the arrays at once.

Before we make a FacetGrid plot, we need to make a change to the dataset. FacetGrid takes a column and expands the levels of the provided dimension into individual sub-plots (a small multiples plot). We’re looking at the duplicate time steps, meaning the elements of the acq_date dimension are non-unique. FacetGrid expects unique values along the specified coordinate array. If we were to directly call:

fg = duplicates_cube.vv.plot(col="acq_date", col_wrap=4)

We would receive the following error:

ValueError: Coordinates used for faceting cannot contain repeated (nonunique) values.

Renaming the dimensions of duplicates_cube with xr.Dataset.rename_dims() demotes acq_date to a non-dimensional coordinate and replaces the dimension with step, which has no coordinate labels of its own, so integer positions are used instead. Because these positions are unique, we can make a FacetGrid plot along the step dimension.

duplicates_cube.rename_dims({"acq_date": "step"})
<xarray.Dataset> Size: 17MB
Dimensions:        (step: 12, y: 396, x: 290)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (step, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
  * acq_date       (step) datetime64[ns] 96B 2021-05-14T12:13:49 ... 2022-05-...
    product_id     (step) <U4 192B '971C' 'FA4F' '0031' ... 'CA1B' '1380' 'E5B6'
    data_take_ID   (step) <U6 288B '047898' '047898' ... '052C00' '052C00'
    abs_orbit_num  (step) <U6 288B '037884' '037884' ... '043309' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (step) <U4 192B 'desc' 'desc' 'desc' ... 'desc' 'desc' 'desc'
Dimensions without coordinates: step
Data variables:
    vh             (step, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
    vv             (step, y, x) float32 6MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V
fg = duplicates_cube.rename_dims({"acq_date": "step"}).vv.plot(col="step", col_wrap=4)
[Figure: FacetGrid of VV backscatter for the 12 duplicate time steps]

Interesting, it looks like there is only really data in elements 0, 2, 4, 7, and 9 of the list of duplicates. It could be that the processing of these files was interrupted and then restarted, producing extra empty arrays.
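
Rather than reading this off the plot, we could also check programmatically which of the duplicate time steps actually contain data; a quick sketch:

# Count the valid (non-NaN) pixels in each duplicate time step;
# non-zero entries correspond to the time steps worth keeping
duplicates_cube.vv.count(dim=["x", "y"]).compute()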

3) Drop duplicates#

To drop these arrays, extract the product ID (the only variable that is unique among the duplicates) of each array we’d like to remove.

drop_ls = [1, 3, 5, 6, 8, 10, 11]

We can use xarray’s .isel() method, xr.DataArray.isin(), and xr.Dataset.where() to efficiently subset to the time steps we want to keep:

drop_product_id_ls = duplicates_cube.isel(acq_date=drop_ls).product_id.data
drop_product_id_ls
array(['FA4F', '65E0', 'E113', '24B8', '57F2', '1380', 'E5B6'],
      dtype='<U4')

Using this list, we can drop every time step of clipped_cube whose product ID is one of the values in the list.

duplicate_cond = ~clipped_cube.product_id.isin(drop_product_id_ls)
clipped_cube = clipped_cube.where(duplicate_cond == True, drop=True)
clipped_cube
<xarray.Dataset> Size: 132MB
Dimensions:        (acq_date: 96, y: 396, x: 290)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 44MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
  * acq_date       (acq_date) datetime64[ns] 768B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '7418' 'CA1B'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '0529DD' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043236' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (acq_date) <U4 2kB 'desc' 'asc' 'desc' ... 'desc' 'desc'
Data variables:
    vh             (acq_date, y, x) float32 44MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
    vv             (acq_date, y, x) float32 44MB dask.array<chunksize=(10, 396, 290), meta=np.ndarray>
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

E. Examine coverage over time series#

The previous section showed that there are some scenes in the time series with no data over the area of interest. Let’s see if there are any data coverage characteristics we should examine more closely in the dataset. There are a number of ways to do this, but the approach we’ll use here is to first calculate, for each time step, the percentage of pixels containing data relative to the entire footprint:

# Maximum possible number of valid pixels: pixels that hold data in at least one time step
max_pixels = clipped_cube.ls.notnull().any("acq_date").sum(["x", "y"])
# Number of valid pixels in each time step
valid_pixels = clipped_cube.ls.count(dim=["x", "y"])
# Percent coverage per time step
clipped_cube["cov"] = (valid_pixels / max_pixels) * 100
fig, ax = plt.subplots()
fig.suptitle("Distribution of percentage of valid pixels for \n each time step of backscatter time series")
clipped_cube["cov"].plot.hist(ax=ax)
ax.set_xlabel("% Coverage")
ax.set_title(None)
ax.set_ylabel("Count");
[Figure: histogram of percent valid-pixel coverage per time step]

In addition to the empty duplicate time steps we removed above, there are many other time steps with minimal coverage. This is likely because the original time series contains scenes from multiple satellite footprints, and some may have minimal coverage over the area of interest that was used to clip the time series. To check this, we can again use Xarray’s interactive visualization tools, this time to create an animation that loops through the time series. We’ll get a bunch of warnings and errors if we try to make an animation of a time series with empty time steps, so we’ll first remove any empty time steps (likely also caused by satellite footprints in the original time series that don’t cover the smaller area of interest) and look just at scenes with some data:

Masking with dask-backed arrays

If we tried to use xr.Dataset.where() to mask and drop time steps where coverage is zero right now, we would get the following error:

KeyError: 'Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray.Please compute the indexer first using .compute()'

Because we only called .persist() earlier, the arrays of the data variables are still Dask arrays instead of NumPy arrays. Calling compute() or .load() will resolve this issue and allow us to drop empty time steps:

clipped_cube.load()
<xarray.Dataset> Size: 132MB
Dimensions:        (acq_date: 96, y: 396, x: 290)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 44MB nan nan nan nan ... 1.0 1.0 1.0
  * acq_date       (acq_date) datetime64[ns] 768B 2021-05-02T12:14:14 ... 202...
    product_id     (acq_date) <U4 2kB '1424' '54B1' '8A4F' ... '7418' 'CA1B'
    data_take_ID   (acq_date) <U6 2kB '047321' '047463' ... '0529DD' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037709' '037745' ... '043236' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (acq_date) <U4 2kB 'desc' 'asc' 'desc' ... 'desc' 'desc'
Data variables:
    vh             (acq_date, y, x) float32 44MB nan nan nan ... 0.0375 0.03803
    vv             (acq_date, y, x) float32 44MB nan nan nan ... 0.3252 0.3992
    cov            (acq_date) float64 768B 0.0 100.0 14.06 ... 100.0 13.96 100.0
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V
clipped_cube = clipped_cube.where(clipped_cube.cov > 0.0, drop=True)

Now that we’ve dropped the empty time steps, we can look at the layover shadow map at a few time steps:

clipped_cube.ls.isel(acq_date=slice(0, 10)).plot(col="acq_date", col_wrap=5, cmap="tab20b", vmin=0, vmax=32);
[Figure: layover-shadow maps for the first ten time steps after dropping empty scenes]

In the small multiples plot above, it looks like there are two main viewing geometries in the time series, and that one of them covers the area of interest well while the other does not. Let’s remove the time steps where coverage is less than 100. We won’t need the 'cov' variable anymore so we can drop it as well:

clipped_cube = clipped_cube.where(clipped_cube.cov == 100, drop=True).drop_vars("cov")
clipped_cube
<xarray.Dataset> Size: 87MB
Dimensions:        (acq_date: 63, y: 396, x: 290)
Coordinates:
  * x              (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y              (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    ls             (acq_date, y, x) float32 29MB 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0
  * acq_date       (acq_date) datetime64[ns] 504B 2021-05-05T00:03:07 ... 202...
    product_id     (acq_date) <U4 1kB '54B1' '971C' 'D35A' ... '33F5' 'CA1B'
    data_take_ID   (acq_date) <U6 2kB '047463' '047898' ... '0526C4' '052C00'
    abs_orbit_num  (acq_date) <U6 2kB '037745' '037884' ... '043134' '043309'
    spatial_ref    int64 8B 0
    orbital_dir    (acq_date) <U4 1kB 'asc' 'desc' 'asc' ... 'asc' 'desc' 'desc'
Data variables:
    vh             (acq_date, y, x) float32 29MB 0.01556 0.01491 ... 0.03803
    vv             (acq_date, y, x) float32 29MB 0.1062 0.1219 ... 0.3252 0.3992
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V
clipped_cube.ls.isel(acq_date=slice(0, 10)).plot(col="acq_date", col_wrap=5, cmap="tab20b", vmin=0, vmax=32);
[Figure: layover-shadow maps for the first ten time steps after removing low-coverage scenes]

If we remake the plots of the layover-shadow maps for the first ten time steps, we see that they now all have the same viewing geometry.

F. Data visualization#

Now that we’ve visualized and taken a closer look at metadata such as the layover-shadow map and orbital direction, let’s focus on backscatter variability. In the plotting calls in this notebook, we’ll use a function in s1_tools.py that applies a logarithmic transformation to the data. This makes it easier to visualize variability; however, it is important to apply it only as a final step before visualizing and not to perform statistical calculations on the log-transformed data.

A note on visualizing SAR data

The measurements provided in the RTC dataset are on the intensity, or power, scale. Often, to visualize SAR backscatter, the data are converted from power to normalized radar cross section (the backscatter coefficient) in decibel (dB) units, meaning a log transform has been applied. This transformation makes it easier to visualize variability, but it is important not to calculate summary statistics on log-transformed data, as they will be distorted. You can read more about these concepts here.
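
As a point of reference, here is a minimal sketch of such a transformation, assuming the conventional 10 * log10(power) conversion; the actual s1_tools.power_to_db used below may differ in details such as how it handles zero or negative values:

def power_to_db_sketch(power):
    """Convert backscatter from the power (intensity) scale to decibels (dB)."""
    return 10 * np.log10(power)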

1) Mean backscatter over time#

This section examines different approaches to plotting VV and VH backscatter side by side; the choice of approach can have important consequences for how the data are visualized and interpreted.

Currently, VV and VH are two data variables of the data cube that both exist along the x, y, and time dimensions. If we plot them as individual subplots, we’ll see two panels, each with its own color scale. If you look closely, you can see that the color scales do not cover the same range, so we need to be careful when comparing backscatter between the VV and VH images in this situation.

In addition, we know from the layover-shadow maps that different areas are affected by shadow and masked from the dataset in both ascending and descending passes. For a careful examination of backscatter, we’ll look at the mean over time of ascending scenes and mean over time of descending scenes.

# Subset time series by ascending and descending passes
asc_pass_cond = clipped_cube.orbital_dir == "asc"
asc_pass_obs = clipped_cube.sel(acq_date=asc_pass_cond)

desc_pass_cond = clipped_cube.orbital_dir == "desc"
desc_pass_obs = clipped_cube.sel(acq_date=desc_pass_cond)
fig, ax = plt.subplots(ncols=2, nrows=2, figsize=(12, 12), layout="constrained")

s1_tools.power_to_db(asc_pass_obs["vv"].mean(dim="acq_date")).plot(
    ax=ax[0][0], cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"})
)
s1_tools.power_to_db(asc_pass_obs["vh"].mean(dim="acq_date")).plot(
    ax=ax[0][1], cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"})
)

s1_tools.power_to_db(desc_pass_obs["vv"].mean(dim="acq_date")).plot(
    ax=ax[1][0], cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"})
)
s1_tools.power_to_db(desc_pass_obs["vh"].mean(dim="acq_date")).plot(
    ax=ax[1][1], cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"})
)

for i in range(len(ax[0])):
    for j in range(len(ax)):
        ax[j][i].set_ylabel(None)
        ax[j][i].set_xlabel(None)
        ax[j][i].tick_params(axis="x", labelrotation=45)
fig.suptitle(
    "Mean backscatter over time broken up by polarization and orbital direction",
    fontsize=16,
)
ax[0][0].set_title("VV - Ascending passes")
ax[0][1].set_title("VH - Ascending passes")
ax[1][0].set_title("VV - Descending passes")
ax[1][1].set_title("VH - Descending passes")
fig.supylabel("y coordinate of projection (m)")
fig.supxlabel("x coordinate of projection (m)");
[Figure: mean backscatter over time by polarization and orbital direction (independent color scales)]

There are two ways to fix the problem of mismatched color map scales:

Specify min and max values in plotting call#

We could normalize the backscatter ranges for both variables by manually specifying a minimum and a maximum across both.

global_min = np.array(
    [
        s1_tools.power_to_db(asc_pass_obs["vv"].mean(dim="acq_date")).min(),
        s1_tools.power_to_db(asc_pass_obs["vh"].mean(dim="acq_date")).min(),
        s1_tools.power_to_db(desc_pass_obs["vv"].mean(dim="acq_date")).min(),
        s1_tools.power_to_db(desc_pass_obs["vh"].mean(dim="acq_date")).min(),
    ]
).min()

global_max = np.array(
    [
        s1_tools.power_to_db(asc_pass_obs["vv"].mean(dim="acq_date")).max(),
        s1_tools.power_to_db(asc_pass_obs["vh"].mean(dim="acq_date")).max(),
        s1_tools.power_to_db(desc_pass_obs["vv"].mean(dim="acq_date")).max(),
        s1_tools.power_to_db(desc_pass_obs["vh"].mean(dim="acq_date")).max(),
    ]
).max()

fig, ax = plt.subplots(ncols=2, nrows=2, figsize=(12, 12), layout="constrained")

s1_tools.power_to_db(asc_pass_obs["vv"].mean(dim="acq_date")).plot(
    ax=ax[0][0],
    cmap=plt.cm.Greys_r,
    vmin=global_min,
    vmax=global_max,
    cbar_kwargs=({"label": "dB"}),
)
s1_tools.power_to_db(asc_pass_obs["vh"].mean(dim="acq_date")).plot(
    ax=ax[0][1],
    cmap=plt.cm.Greys_r,
    vmin=global_min,
    vmax=global_max,
    cbar_kwargs=({"label": "dB"}),
)
s1_tools.power_to_db(desc_pass_obs["vv"].mean(dim="acq_date")).plot(
    ax=ax[1][0],
    cmap=plt.cm.Greys_r,
    vmin=global_min,
    vmax=global_max,
    cbar_kwargs=({"label": "dB"}),
)

s1_tools.power_to_db(desc_pass_obs["vh"].mean(dim="acq_date")).plot(
    ax=ax[1][1],
    cmap=plt.cm.Greys_r,
    vmin=global_min,
    vmax=global_max,
    cbar_kwargs=({"label": "dB"}),
)

for i in range(len(ax[0])):
    for j in range(len(ax)):
        ax[j][i].set_ylabel(None)
        ax[j][i].set_xlabel(None)
        ax[j][i].tick_params(axis="x", labelrotation=45)
fig.suptitle(
    "Mean backscatter over time broken up by polarization and orbital direction",
    fontsize=16,
)
ax[0][0].set_title("VV - Ascending passes")
ax[0][1].set_title("VH - Ascending passes")
ax[1][0].set_title("VV - Descending passes")
ax[1][1].set_title("VH - Descending passes")
fig.supylabel("y coordinate of projection (m)")
fig.supxlabel("x coordinate of projection (m)");
[Figure: mean backscatter over time by polarization and orbital direction (shared color scale)]

Expand dimensions#

We could convert the VV and VH data from separate data variables into elements of a band dimension. This lets us use Xarray’s FacetGrid plotting, which creates small multiples along a given dimension.

clipped_cube_da = clipped_cube.to_array(dim="band")

Since orbital_dir is a coordinate that exists along the time dimension, we do not want to expand the dimensions of the dataset. Instead, use groupby() to separate those observations:

f = s1_tools.power_to_db(clipped_cube_da.groupby("orbital_dir").mean(dim="acq_date")).plot(
    col="band", row="orbital_dir", cmap=plt.cm.Greys_r, cbar_kwargs={"label": "dB"}
)
f.axs[1][0].tick_params(axis="x", labelrotation=45)
f.axs[1][1].tick_params(axis="x", labelrotation=45)
f.fig.supylabel("y coordinate of projection (m)")
f.fig.supxlabel("x coordinate of projection (m)")
f.fig.suptitle(
    "Mean backscatter over time for VV and VH polarizations and \n ascending and descending passes with FacetGrid",
    fontsize=16,
    y=1.05,
)
f.fig.set_figheight(12)
f.fig.set_figwidth(12);
[Figure: FacetGrid of mean backscatter over time for VV and VH polarizations and ascending and descending passes]

The above plot shows the areas affected by shadow in both the ascending and descending returns. The time of day of the acquisition and the orientation of the imaged terrain with respect to the sensor are the only differences between the ascending and descending backscatter images. Some studies have even exploited the different timing of ascending and descending passes to examine diurnal processes such as melt-freeze cycles in an alpine snowpack [35].

Creating a gap-filled backscatter image#

In the above plots, we separate the observations by orbital direction in order to compare specific imaging conditions (e.g. time of day) and to observe the timing and location of conditions like shadow that affect backscatter. This is a helpful approach when we want a detailed understanding of scattering conditions in a given area. At other times, we may be more interested in building a complete backscatter image. Below, we show the same plots as above, but combining ascending and descending pass scenes instead of keeping them separate:

f = s1_tools.power_to_db(clipped_cube_da.mean(dim="acq_date")).plot(
    col="band", cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"})
)

f.fig.suptitle("Mean backscatter over time for VV and VH polarizations with FacetGrid")
f.fig.set_figheight(7)
f.fig.set_figwidth(12);
[Figure: mean backscatter over time for VV and VH polarizations, ascending and descending passes combined]

The area that we’re looking at is in the mountainous region on the border between the Sikkim region of India and China. There are four north-facing glaciers visible in the image, each with a lake at its toe. Bodies of water like lakes tend to appear dark in C-band SAR images because water is smooth relative to the wavelength of the signal, meaning that most of the emitted signal is scattered away from the sensor. Where surfaces are rough at the scale of the C-band wavelength, more signal is returned to the sensor and the backscatter image appears brighter. For much more detail on interpreting SAR imagery, see the resources linked in the Sentinel-1 section of the tutorial data page.

2) Seasonal backscatter variability#

Note

If you’re working with the subset time series, this plot will not display correctly because only one season is represented.

Now let’s look at how backscatter may vary seasonally for a single polarization (for more on time-related GroupBy operations see the Xarray User Guide). This is an example of a ‘split-apply-combine’ operation, where a dataset is split into groups (in this case, time steps are split into seasons), an operation is applied (in this case, the mean is calculated) and then the groups are combined into a new object.

clipped_cube_gb = clipped_cube.groupby("acq_date.season").mean()

The temporal dimension of the new object has an element for each season rather than an element for each time step.

clipped_cube_gb
<xarray.Dataset> Size: 4MB
Dimensions:      (season: 4, y: 396, x: 290)
Coordinates:
  * x            (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y            (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
    spatial_ref  int64 8B 0
  * season       (season) object 32B 'DJF' 'JJA' 'MAM' 'SON'
Data variables:
    vh           (season, y, x) float32 2MB 0.02098 0.0221 ... 0.04669 0.04661
    vv           (season, y, x) float32 2MB 0.1108 0.1239 ... 0.2862 0.2956
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V
# Reorder the seasons chronologically (groupby returns them in alphabetical order)
clipped_cube_gb = clipped_cube_gb.reindex({"season": ["DJF", "MAM", "JJA", "SON"]})
clipped_cube_gb
<xarray.Dataset> Size: 4MB
Dimensions:      (x: 290, y: 396, season: 4)
Coordinates:
  * x            (x) float64 2kB 6.194e+05 6.195e+05 ... 6.281e+05 6.281e+05
  * y            (y) float64 3kB 3.102e+06 3.102e+06 ... 3.09e+06 3.09e+06
  * season       (season) <U3 48B 'DJF' 'MAM' 'JJA' 'SON'
    spatial_ref  int64 8B 0
Data variables:
    vh           (season, y, x) float32 2MB 0.02098 0.0221 ... 0.04669 0.04661
    vv           (season, y, x) float32 2MB 0.1108 0.1239 ... 0.2862 0.2956
Attributes: (12/13)
    area_or_clipped:                   e
    sensor:                            S1A
    orbit_type:                        P
    beam_mode:                         IW
    processing_software:               G
    output_type:                       g
    ...                                ...
    notfiltered_or_filtered:           n
    deadreckoning_or_demmatch:         d
    output_unit:                       p
    terrain_correction_pixel_spacing:  RTC30
    polarization_type:                 D
    primary_polarization:              V

Visualize mean backscatter in each season:

fg_vv = s1_tools.power_to_db(clipped_cube_gb.vv).plot(col="season", cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"}));
[Figure: seasonal mean VV backscatter]
fg_vh = s1_tools.power_to_db(clipped_cube_gb.vh).plot(col="season", cmap=plt.cm.Greys_r, cbar_kwargs=({"label": "dB"}))
[Figure: seasonal mean VH backscatter]

The glacier surfaces appear much darker during the summer months than in other seasons, especially in the lower reaches of the glaciers. Like the lake surfaces above, this suggests largely specular reflection, where little of the incident signal returns to the sensor. It could be that during the summer months, enough liquid water is present at the glacier surface to produce this kind of scattering.

3) Backscatter time series#

fig, ax = plt.subplots(figsize=(12, 4))

s1_tools.power_to_db(clipped_cube["vv"].sel(acq_date=asc_pass_cond).mean(dim=["x", "y"])).plot.scatter(
    ax=ax, label="VV - asc", c="b", marker="o"
)
s1_tools.power_to_db(clipped_cube["vv"].sel(acq_date=desc_pass_cond).mean(dim=["x", "y"])).plot.scatter(
    ax=ax, label="VV - desc", c="b", marker="x"
)
s1_tools.power_to_db(clipped_cube["vh"].sel(acq_date=asc_pass_cond).mean(dim=["x", "y"])).plot.scatter(
    ax=ax, label="VH - asc", c="r", marker="o"
)
s1_tools.power_to_db(clipped_cube["vh"].sel(acq_date=desc_pass_cond).mean(dim=["x", "y"])).plot.scatter(
    ax=ax, label="VH - desc", c="r", marker="x"
)
fig.legend(loc="center right")

fig.suptitle("Backscatter variability over time by polarization and orbital direction")
ax.set_title(None)
ax.set_ylabel("dB")
ax.set_xlabel("Time");
[Figure: mean backscatter time series by polarization and orbital direction]

To make an interactive plot of backscatter variability, use hvplot:

vv_plot = s1_tools.power_to_db(clipped_cube["vv"].mean(dim=["x", "y"])).hvplot.scatter(x="acq_date")
vh_plot = s1_tools.power_to_db(clipped_cube["vh"].mean(dim=["x", "y"])).hvplot.scatter(x="acq_date")

vv_plot * vh_plot

Conclusion#

In this notebook, we demonstrated how to use the data cube that we assembled in the previous notebooks. We saw various ways that having metadata accessible and attached to the correct dimensions of the data cube made learning about the dataset much smoother and more efficient than it would otherwise be.

In the next notebook, we’ll work with a different Sentinel-1 RTC dataset. We’ll write this dataset to disk in order to use it in the final notebook of the tutorial, a comparison of two datasets.

if timeseries_type == "full":
    clipped_cube.to_zarr(
        f"../data//raster_data/{timeseries_type}_timeseries/intermediate_cubes/s1_asf_clipped_cube.zarr",
        mode="w",
    )
else:
    pass
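
In later work, the saved cube can be re-opened lazily from the Zarr store, for example:

# Re-open the saved data cube (assumes the store was written in the cell above)
reopened_cube = xr.open_zarr(
    f"../data/raster_data/{timeseries_type}_timeseries/intermediate_cubes/s1_asf_clipped_cube.zarr"
)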