3.4 Exploratory data analysis of a single glacier#
Introduction#
In the previous notebook, we walked through initial steps to read and organize a large raster dataset and to understand it in the context of spatial areas of interest represented by vector data. In this notebook, we continue the initial data inspection and exploratory analysis, this time focusing on velocity data clipped to an individual glacier. We examine velocity variability at the scale of a single glacier while demonstrating Xarray functionality for common computations and visualizations.
A. Data exploration
1) Load raster data and visualize with vector data
2) Examine data coverage along the time dimension
3) Look at data by sensor
B. Comparing different satellites
1) DataTree approach
2) GroupBy approach
C. Examine velocity variability
1) Histograms and summary statistics
2) Spatial velocity variability
3) Temporal velocity variability
D. Computations along time dimension
1) Temporal resampling
2) Grouped analysis by season
Concepts
Examining metadata and interpreting a physical observable in the context of available metadata
Sub-setting and visualizing raster datasets
‘Split-apply-combine’ workflows
Calculating summary statistics across given dimensions of a dataset
Performing reductions across multi-dimensional datasets
Techniques
Managing groups of Xarray objects with xarray.DataTree
Grouped computations with GroupBy
Using matplotlib to visualize raster and vector data with satellite imagery basemaps
Combining Xarray with statistical packages like SciPy
Expand the next cell to see specific packages used in this notebook and relevant system and version information.
%xmode minimal
import warnings
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import pathlib
import numpy as np
import scipy.stats
import xarray as xr
warnings.simplefilter(action="ignore", category=FutureWarning)
Exception reporting mode: Minimal
Like in the previous notebook, we will write some objects to disk. Create the same variable to store the path to the root directory for this tutorial.
cwd = pathlib.Path.cwd()
tutorial1_dir = cwd.parent
A. Data exploration#
1) Load raster data and visualize with vector data#
# Read raster
single_glacier_raster = xr.open_zarr(
"../data/raster_data/single_glacier_itslive.zarr", decode_coords="all", chunks=None
)
# Read vector
single_glacier_vector = gpd.read_file("../data/vector_data/single_glacier_vec.json")
We can also drop variables we won’t be using:
vars_to_keep = [
"v",
"vx",
"vy",
"v_error",
"vy_error",
"vx_error",
"acquisition_date_img1",
"acquisition_date_img2",
"satellite_img1",
"satellite_img2",
]
single_glacier_raster = single_glacier_raster[vars_to_keep]
# Take a look at raster data object
single_glacier_raster
<xarray.Dataset> Size: 1GB Dimensions: (mid_date: 47892, y: 37, x: 40) Coordinates: mapping int64 8B ... * mid_date (mid_date) datetime64[ns] 383kB 1986-09-11T03:31:1... spatial_ref int64 8B ... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 Data variables: v (mid_date, y, x) float32 284MB ... vx (mid_date, y, x) float32 284MB ... vy (mid_date, y, x) float32 284MB ... v_error (mid_date, y, x) float32 284MB ... vy_error (mid_date) float32 192kB ... vx_error (mid_date) float32 192kB ... acquisition_date_img1 (mid_date) datetime64[ns] 383kB ... acquisition_date_img2 (mid_date) datetime64[ns] 383kB ... satellite_img1 (mid_date) <U2 383kB ... satellite_img2 (mid_date) <U2 383kB ... Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
single_glacier_raster.nbytes / 1e9
1.136382008
np.unique(single_glacier_raster.satellite_img1)
array(['1A', '2A', '2B', '4', '5', '7', '8', '9'], dtype='<U2')
np.unique(single_glacier_raster.satellite_img2)
array(['1A', '2A', '2B', '4', '5', '7', '8', '9'], dtype='<U2')
The above code cells show us that this dataset contains observations from the Sentinel-1, Sentinel-2, and Landsat 4, 5, 7, 8, and 9 satellite sensors. The dataset is about 1.1 GB.
single_glacier_raster
<xarray.Dataset> Size: 1GB Dimensions: (mid_date: 47892, y: 37, x: 40) Coordinates: mapping int64 8B ... * mid_date (mid_date) datetime64[ns] 383kB 1986-09-11T03:31:1... spatial_ref int64 8B ... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 Data variables: v (mid_date, y, x) float32 284MB ... vx (mid_date, y, x) float32 284MB ... vy (mid_date, y, x) float32 284MB ... v_error (mid_date, y, x) float32 284MB ... vy_error (mid_date) float32 192kB ... vx_error (mid_date) float32 192kB ... acquisition_date_img1 (mid_date) datetime64[ns] 383kB ... acquisition_date_img2 (mid_date) datetime64[ns] 383kB ... satellite_img1 (mid_date) <U2 383kB '5' '5' '5' ... '2B' '2B' '2B' satellite_img2 (mid_date) <U2 383kB '5' '5' '5' ... '2B' '2B' '2B' Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
As a quick sanity check, we’ll convince ourselves that the clipping operation in the previous notebook worked correctly. We also show that we can plot both Xarray raster data and GeoPandas vector data overlaid on the same plot with a satellite image basemap as a background.
A few notes about the following figure:
single_glacier_raster is a 3-dimensional object. We want to plot it in 2-d space with the RGI glacier outline, so we perform a reduction (in this case, the mean along the time dimension) to reduce the dataset from 3-d to 2-d. Another option would be to select a single time step.
We could make this plot with a white background, but it is also nice to be able to add a basemap to the image. Here, we’ll use contextily to do so. This requires converting the coordinate reference system (CRS) of both objects to the Web Mercator projection (EPSG:3857).
# Check that CRS of vector and raster data are the same
assert single_glacier_raster.rio.crs == single_glacier_vector.crs
AssertionError
The assertion fails because the raster does not currently have a CRS set; this information likely was not carried through when the data was written to and read back from disk. We can set it from the 'projection' attribute stored on the dataset:
single_glacier_raster = single_glacier_raster.rio.write_crs(single_glacier_raster.attrs["projection"])
# Check that CRS of vector and raster data are the same
assert single_glacier_raster.rio.crs == single_glacier_vector.crs
# Reproject both objects to web mercator
single_glacier_vector_web = single_glacier_vector.to_crs("EPSG:3857")
single_glacier_raster_web = single_glacier_raster.rio.reproject("EPSG:3857")
fig, ax = plt.subplots(figsize=(8, 5))
# Plot objects
single_glacier_raster_web.v.mean(dim="mid_date").plot(ax=ax, cmap="viridis", alpha=0.75, add_colorbar=True)
single_glacier_vector_web.plot(ax=ax, facecolor="None", edgecolor="red", alpha=0.75)
# Add basemap
cx.add_basemap(ax, crs=single_glacier_vector_web.crs, source=cx.providers.Esri.WorldImagery)
fig.suptitle('Mean velocity over the time series (m/y) and RGI 7 glacier outline (red) with satellite image basemap from ESRI World Imagery')
Text(0.5, 0.98, 'Mean velocity over the time series (m/y) and RGI 7 glacier outline (red) with satellite image basemap from ESRI World Imagery')

We sorted along the time dimension in the previous notebook, so it should be in chronological order.
single_glacier_raster.mid_date[:6].data
array(['1986-09-11T03:31:15.003252992', '1986-10-05T03:31:06.144750016',
'1986-10-21T03:31:34.493249984', '1986-11-22T03:29:27.023556992',
'1986-11-30T03:29:08.710132992', '1986-12-08T03:29:55.372057024'],
dtype='datetime64[ns]')
2) Examine data coverage along the time dimension#
A wide variety of factors can impact both satellite imagery and the ability of ITS_LIVE’s feature-tracking algorithm to extract velocity estimates from satellite image pairs. For these reasons, there are at times both gaps in coverage and ranges in the estimated error associated with different observations. The following section demonstrates how to calculate and visualize coverage of the dataset over time. Part 2 will include a discussion of uncertainty and error estimates.
When first investigating a dataset, it is helpful to be able to quickly scan and visualize coverage along a given dimension. To create the data needed for such a visualization, we first need a mask that tells us all possible ‘valid’ pixels; in other words, we need to differentiate between pixels in our 2-d rectangular array that represent ice vs. non-ice. Then, for every time step, we can calculate the proportion of possible ice pixels that contain an estimated velocity value.
# calculate number of valid pixels
valid_pixels = single_glacier_raster.v.count(dim=["x", "y"])
# calculate max. number of valid pixels
valid_pixels_max = single_glacier_raster.v.notnull().any("mid_date").sum(["x", "y"])
# add cov proportion to dataset as variable
single_glacier_raster["cov"] = valid_pixels / valid_pixels_max
Now we can visualize coverage over time:
fig, ax = plt.subplots(figsize=(20, 5))
# Plot object
single_glacier_raster["cov"].plot(ax=ax, linestyle="None", marker="x", alpha=0.75)
# Specify axes labels and title
fig.suptitle("Velocity data coverage over time", fontsize=16)
ax.set_ylabel("Coverage (proportion)", x=-0.05, fontsize=12)
ax.set_xlabel("Date", fontsize=12);

3) Look at data by sensor#
In this dataset, we have a dense time series of velocity observations for a given glacier (~48,000 observations from 1986-2024). However, we know that the ability of satellite image pairs to capture ice displacement (and, by extension, velocity) can be impacted by conditions such as cloud cover, which obscures Earth’s surface from optical sensors. ITS_LIVE is a multi-sensor ice velocity dataset, meaning that it is composed of ice velocity observations derived from a number of satellites, including both optical and Synthetic Aperture Radar (SAR) imagery. Currently, Sentinel-1 is the only SAR sensor included in ITS_LIVE; all others are optical.
While optical imagery requires solar illumination and can be impacted by cloud cover, Sentinel-1 is an active sensor that images at a longer wavelength (C-band, ~5 cm). This means that Sentinel-1 imagery does not require solar illumination and can penetrate cloud cover. Because of these sensors’ differing sensitivities to Earth’s surface conditions, there can sometimes be discrepancies in velocity data observed by different sensors.
Note
There are many great resources available for understanding the principles of SAR and working with SAR imagery. Check out the SAR section of the tutorial data for more detail and links to additional resources.
Let’s first look at what sensors are represented in the time series:
sensors = np.unique(single_glacier_raster.satellite_img1.values).tolist()
sensors
['1A', '2A', '2B', '4', '5', '7', '8', '9']
To extract observations from a single satellite sensor, we will use Xarray indexing and selection methods such as .where() and .sel(). The following cells demonstrate different selection approaches and briefly discuss the pros and cons of each when working with large and/or sparse datasets.
Landsat 8#
First, looking at velocity observations from Landsat 8 data only:
%%time
l8_data = single_glacier_raster.where(single_glacier_raster["satellite_img1"] == "8", drop=True)
l8_data
CPU times: user 107 ms, sys: 117 ms, total: 224 ms
Wall time: 239 ms
<xarray.Dataset> Size: 64MB Dimensions: (mid_date: 2688, y: 37, x: 40) Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 22kB 2013-05-20T04:09:12... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 spatial_ref int64 8B 0 Data variables: v (mid_date, y, x) float32 16MB nan nan nan ... nan nan vx (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy (mid_date, y, x) float32 16MB nan nan nan ... nan nan v_error (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy_error (mid_date) float32 11kB 40.2 10.2 15.1 ... 26.5 159.6 vx_error (mid_date) float32 11kB 78.2 9.5 24.5 ... 17.5 100.5 acquisition_date_img1 (mid_date) datetime64[ns] 22kB 2013-04-30T04:12:14... acquisition_date_img2 (mid_date) datetime64[ns] 22kB 2013-06-09T04:06:09... satellite_img1 (mid_date) object 22kB '8' '8' '8' ... '8' '8' '8' satellite_img2 (mid_date) object 22kB '7' '8' '7' ... '9' '9' '9' cov (mid_date) float64 22kB 0.0 0.5969 ... 0.0 0.3494 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
Another approach: using .sel():
%%time
l8_condition = single_glacier_raster.satellite_img1 == "8"
l8_data_alt = single_glacier_raster.sel(mid_date=l8_condition)
l8_data_alt
CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 9.89 ms
<xarray.Dataset> Size: 64MB Dimensions: (mid_date: 2688, y: 37, x: 40) Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 22kB 2013-05-20T04:09:12... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 spatial_ref int64 8B 0 Data variables: v (mid_date, y, x) float32 16MB nan nan nan ... nan nan vx (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy (mid_date, y, x) float32 16MB nan nan nan ... nan nan v_error (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy_error (mid_date) float32 11kB ... vx_error (mid_date) float32 11kB ... acquisition_date_img1 (mid_date) datetime64[ns] 22kB ... acquisition_date_img2 (mid_date) datetime64[ns] 22kB ... satellite_img1 (mid_date) <U2 22kB '8' '8' '8' '8' ... '8' '8' '8' satellite_img2 (mid_date) <U2 22kB '7' '8' '7' '8' ... '9' '9' '9' cov (mid_date) float64 22kB 0.0 0.5969 ... 0.0 0.3494 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
The approach using .sel() takes much less time than .where(). This is because .sel() queries the dataset using the mid_date index. Xarray dimensions have associated Index objects, which are built on pandas indexes; they are very powerful for quickly and efficiently querying large datasets. In contrast, .where() must check every element of the satellite_img1 variable against the specified condition ('8') and mask each data variable, which is not as efficient.
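If you’re curious, you can inspect the index object associated with the mid_date dimension directly (a quick optional check; the exact output depends on your Xarray version):
# Peek at the pandas-backed index that .sel() uses to query the mid_date dimension
single_glacier_raster.indexes["mid_date"]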
What about a sensor with multiple identifiers?#
For Landsat 8 observations, we only needed to identify elements of the dataset where the satellite_img1 variable matched a single unique identifier, '8'. Sentinel-1 is a two-satellite constellation, meaning that it has sensors on board multiple satellites. For this, we need to select all observations where satellite_img1 matches a list of possible values.
%%time
s1_condition = single_glacier_raster.satellite_img1.isin(["1A", "1B"])
s1_data = single_glacier_raster.sel(mid_date=s1_condition)
s1_data
CPU times: user 7.34 ms, sys: 0 ns, total: 7.34 ms
Wall time: 7.2 ms
<xarray.Dataset> Size: 9MB Dimensions: (mid_date: 362, y: 37, x: 40) Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 3kB 2014-10-16T11:41:02.... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 spatial_ref int64 8B 0 Data variables: v (mid_date, y, x) float32 2MB nan nan nan ... nan nan vx (mid_date, y, x) float32 2MB nan nan nan ... nan nan vy (mid_date, y, x) float32 2MB nan nan nan ... nan nan v_error (mid_date, y, x) float32 2MB nan nan nan ... nan nan vy_error (mid_date) float32 1kB ... vx_error (mid_date) float32 1kB ... acquisition_date_img1 (mid_date) datetime64[ns] 3kB ... acquisition_date_img2 (mid_date) datetime64[ns] 3kB ... satellite_img1 (mid_date) <U2 3kB '1A' '1A' '1A' ... '1A' '1A' '1A' satellite_img2 (mid_date) <U2 3kB '1A' '1A' '1A' ... '1A' '1A' '1A' cov (mid_date) float64 3kB 0.9463 0.9604 ... 0.9958 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
B. Comparing different satellites#
We’ve seen how to use Xarray’s labeled dimensions and indexing and selection capabilities to subset the dataset based on its metadata. What if we wanted to use those subsets, for example to visualize data or perform computations for each subset group? We could do this manually, but it would be quite time-consuming and involve repeating the same steps for each group. Luckily, there are a few different ways that we can approach this without writing a lot of duplicate code.
Below, we show two different approaches to split the dataset into subsets based on a metadata condition, perform a calculation on each subset, and then visualize the results for each subset side-by-side.
Start by making a dict of each sensor and its identifying string(s):
sensor_conditions = {
"Landsat 4": ["4"],
"Landsat 5": ["5"],
"Landsat 7": ["7"],
"Landsat 8": ["8"],
"Landsat 9": ["9"],
"Sentinel 1": ["1A", "1B"],
"Sentinel 2": ["2A", "2B"],
}
1) DataTree approach#
Rather than go through the above steps for each sensor, we write a function that subsets the dataset by sensor, returning a dict of Xarray datasets holding velocity time series for each sensor.
def separate_ds_by_sensor(ds, sensor_conditions):
# Make empty lists to hold sensor IDs and subsetted datasets
keys_ls, vals_ls = [], []
# Iterate through each sensor in dict
for sensor in sensor_conditions.keys():
condition = ds.satellite_img1.isin(sensor_conditions[sensor])
# Use .sel to subset data based on sensor
sensor_data = ds.sel(mid_date=condition)
keys_ls.append(f"{sensor}")
vals_ls.append(sensor_data)
# Return dict of sensor IDs and subsetted datasets
ds_dict = dict(zip(keys_ls, vals_ls))
return ds_dict
sensor_ds_dict = separate_ds_by_sensor(single_glacier_raster, sensor_conditions)
print(sensor_ds_dict.keys())
dict_keys(['Landsat 4', 'Landsat 5', 'Landsat 7', 'Landsat 8', 'Landsat 9', 'Sentinel 1', 'Sentinel 2'])
Take a look at one subset to see that it matches the one we made manually above:
sensor_ds_dict["Landsat 8"]
<xarray.Dataset> Size: 64MB Dimensions: (mid_date: 2688, y: 37, x: 40) Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 22kB 2013-05-20T04:09:12... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 spatial_ref int64 8B 0 Data variables: v (mid_date, y, x) float32 16MB nan nan nan ... nan nan vx (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy (mid_date, y, x) float32 16MB nan nan nan ... nan nan v_error (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy_error (mid_date) float32 11kB ... vx_error (mid_date) float32 11kB ... acquisition_date_img1 (mid_date) datetime64[ns] 22kB ... acquisition_date_img2 (mid_date) datetime64[ns] 22kB ... satellite_img1 (mid_date) <U2 22kB '8' '8' '8' '8' ... '8' '8' '8' satellite_img2 (mid_date) <U2 22kB '7' '8' '7' '8' ... '9' '9' '9' cov (mid_date) float64 22kB 0.0 0.5969 ... 0.0 0.3494 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
Now that we’ve separated the dataset into subsets, we’d like a way to work with those subsets as a group, rather than applying operations to each one individually. For this, we can use a newer structure within the Xarray data model, xr.DataTree, which facilitates working with collections of Xarray data objects.
The use-case in this section is that we would like to efficiently compute the mean along the time dimension of the ITS_LIVE dataset for each satellite sensor, and then visualize the results.
Before the implementation of Xarray DataTree, we would need to either perform the sequence of operations on each sensor-specific dataset individually, or create a dict of the sensor-specific datasets, write a function to perform the operations, and either update that dictionary or create another one to hold the results; both options are quite clunky and inefficient.
We can create an xr.DataTree object by using the from_dict() method:
sensor_ds_tree = xr.DataTree.from_dict(sensor_ds_dict)
sensor_ds_tree
<xarray.DatasetView> Size: 0B Dimensions: () Data variables: *empty*
The DataTree has a parent node and groups (shown above) that contain individual xr.Dataset objects:
sensor_ds_tree["Landsat 8"].ds
<xarray.DatasetView> Size: 64MB Dimensions: (mid_date: 2688, y: 37, x: 40) Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 22kB 2013-05-20T04:09:12... * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.311e+06 spatial_ref int64 8B 0 Data variables: v (mid_date, y, x) float32 16MB nan nan nan ... nan nan vx (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy (mid_date, y, x) float32 16MB nan nan nan ... nan nan v_error (mid_date, y, x) float32 16MB nan nan nan ... nan nan vy_error (mid_date) float32 11kB ... vx_error (mid_date) float32 11kB ... acquisition_date_img1 (mid_date) datetime64[ns] 22kB ... acquisition_date_img2 (mid_date) datetime64[ns] 22kB ... satellite_img1 (mid_date) <U2 22kB '8' '8' '8' '8' ... '8' '8' '8' satellite_img2 (mid_date) <U2 22kB '7' '8' '7' '8' ... '9' '9' '9' cov (mid_date) float64 22kB 0.0 0.5969 ... 0.0 0.3494 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
If we want to perform a set of operations on every node of the DataTree, we can define a function to pass to xr.DataTree.map_over_datasets():
def calc_temporal_mean(ds):
"""I'm a function that calculates the temporal mean of a dataset with (x,y,mid_date) dimensions.
I return a new dataset with (x,y,sensor) dimensions"""
# Skip parent node
if len(ds.data_vars) == 0:
return None
else:
# Calc mean
tmean = ds.mean(dim="mid_date")
# Add a sensor dimension -- this will be used for combining them back together later
sensor = np.unique(ds.satellite_img1.data)
if len(sensor) > 1:
sensor = [sensor[0]]
# Expand dims to add sensor
tmean = tmean.expand_dims({"sensor": sensor})
return tmean
temp_mean_tree = sensor_ds_tree.map_over_datasets(calc_temporal_mean)
Now, we can take just the descendant nodes of this DataTree:
child_tree = temp_mean_tree.descendants
and concatenate them into a single xr.Dataset along the 'sensor' dimension we created above:
sensor_mean_ds = xr.concat([child_tree[i].ds for i in range(len(child_tree))], dim="sensor")
sensor_mean_ds
<xarray.Dataset> Size: 167kB Dimensions: (sensor: 7, y: 37, x: 40) Coordinates: * sensor (sensor) object 56B '4' '5' '7' '8' '9' '1A' '2A' mapping int64 8B 0 * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.888e+05 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.312e+06 3.311e+06 spatial_ref int64 8B 0 Data variables: v (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vx (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vy (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan v_error (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vy_error (sensor) float32 28B 44.77 25.64 15.53 14.73 14.62 63.18 9.096 vx_error (sensor) float32 28B 51.84 25.74 22.2 18.88 10.68 108.9 6.172 cov (sensor) float64 56B 0.4351 0.254 0.07283 ... 0.5346 0.06737
Finally, we can use Xarray’s FacetGrid plotting to visualize them side-by-side:
def calc_v_magnitude(ds):
"""I'm a function that calculates the magnitude of a velocity displacement vector given velocity component vectors.
I return the same dataset object with a new variable called 'vmag'"""
ds["vmag"] = np.sqrt(ds["vx"] ** 2 + ds["vy"] ** 2)
return ds
# First, calculate magnitude of velocity from the temporal mean of the component vectors
sensor_mean_ds = calc_v_magnitude(sensor_mean_ds)
a = sensor_mean_ds["vmag"].plot(col="sensor", col_wrap=4, cbar_kwargs={"label": "Meters / year"})
a.fig.suptitle("Temporal mean of velocity magnitude", fontsize=16, y=1.05)
a.fig.supylabel("Y-coordinate of projection (meters)", fontsize=12, x=-0.02)
a.fig.supxlabel("X-coordinate of projection (meters)", fontsize=12, y=-0.05)
# Remove individual axes labels
for i in range(len(a.axs[0])):
a.axs[0][i].set_ylabel(None)
a.axs[0][i].set_xlabel(None)
for i in range(len(a.axs[1])):
a.axs[1][i].set_ylabel(None)
a.axs[1][i].set_xlabel(None)
a.axs[1][i].tick_params(axis="x", labelrotation=45)

It’s important to keep in mind that in addition to having different spectral properties and imaging resolutions, the sensors included in the ITS_LIVE dataset have been active during different, and sometimes overlapping periods of time. There is additional discussion of inter-sensor bias in the ITS_LIVE Known Issues documentation.
2) GroupBy approach#
The previous section is a great example of a split-apply-combine workflow, where we split the dataset into multiple datasets by sensor, applied the mean computation, and then combined the results back together. This is the GroupBy paradigm.
An alternate approach to the above figure is to first use the sensor_conditions mapping to create a variable to group by:
sensor = single_glacier_raster.satellite_img1
sensor.name = "sensor"
for label, values in sensor_conditions.items():
sensor = xr.where(sensor.isin(values), label, sensor)
sensor
<xarray.DataArray 'sensor' (mid_date: 47892)> Size: 2MB array(['Landsat 5', 'Landsat 5', 'Landsat 5', ..., 'Sentinel 2', 'Sentinel 2', 'Sentinel 2'], dtype='<U10') Coordinates: mapping int64 8B 0 * mid_date (mid_date) datetime64[ns] 383kB 1986-09-11T03:31:15.00325299... spatial_ref int64 8B 0
Now group the data by sensor:
grouped = single_glacier_raster.groupby(sensor)
Apply the mean reduction to each group:
sensor_means = grouped.mean()
We now have a 3D Dataset:
sensor_means
<xarray.Dataset> Size: 167kB Dimensions: (sensor: 7, y: 37, x: 40) Coordinates: mapping int64 8B 0 * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.888e+05 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.312e+06 3.311e+06 spatial_ref int64 8B 0 * sensor (sensor) object 56B 'Landsat 4' 'Landsat 5' ... 'Sentinel 2' Data variables: v (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vx (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vy (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan v_error (sensor, y, x) float32 41kB nan nan nan nan ... nan nan nan nan vy_error (sensor) float32 28B 44.77 25.64 15.53 14.73 14.62 63.18 9.096 vx_error (sensor) float32 28B 51.84 25.74 22.2 18.88 10.68 108.9 6.172 cov (sensor) float64 56B 0.4351 0.254 0.07283 ... 0.5346 0.06737 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
Now we will make the same figure as earlier:
sensor_means = calc_v_magnitude(sensor_means)
a = sensor_means["vmag"].plot(col="sensor", col_wrap=4, cbar_kwargs={"label": "Meters / year"})
a.fig.suptitle("Temporal mean of velocity magnitude", fontsize=16, y=1.05)
a.fig.supylabel("Y-coordinate of projection (meters)", fontsize=12, x=-0.02)
a.fig.supxlabel("X-coordinate of projection (meters)", fontsize=12, y=-0.05)
# Remove individual axes labels
for i in range(len(a.axs[0])):
a.axs[0][i].set_ylabel(None)
a.axs[0][i].set_xlabel(None)
for i in range(len(a.axs[1])):
a.axs[1][i].set_ylabel(None)
a.axs[1][i].set_xlabel(None)
a.axs[1][i].tick_params(axis="x", labelrotation=45)

Choosing an approach#
The two paradigms, hierarchical DataTree and GroupBy, are equivalent ways of solving the problem. Choose the approach that makes the most sense for your analysis. In this case our analysis is a simple operation (mean) and the result is a regular 3D array, so the GroupBy approach is less complex. For more complicated datasets, for example those where each sensor is a separate cube on a different grid, the DataTree approach is more appropriate and ergonomic.
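As an optional sanity check (a quick sketch, assuming the sensor ordering matches between the two results, which it does here because both were built in the same order), we can confirm that the two approaches produce the same values:
# Compare the DataTree-based result with the GroupBy-based result (NaN-aware comparison)
print(np.allclose(sensor_mean_ds["v"].values, sensor_means["v"].values, equal_nan=True))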
C. Examine velocity variability#
1) Histograms and summary statistics#
First, we plot histograms of the v, vx, and vy variables to examine their distributions. To construct these plots, we use a combination of Xarray plotting functionality and matplotlib object-oriented plotting. In addition, we use Xarray’s .reduce() method with scipy.stats.skew() to calculate the skew of each variable (inset in each sub-plot).
To make things easier, we write a function that calculates summary statistics for a given variable and returns them in a dictionary:
def calc_summary_stats(ds: xr.Dataset, variable: str):
"""I'm a function that calculates summary statistics for a given data variable and returns them as a dict to be used in a plot"""
skew = ds[f"{variable}"].reduce(func=scipy.stats.skew, nan_policy="omit", dim=["x", "y", "mid_date"]).data
mean = ds[f"{variable}"].mean(dim=["x", "y", "mid_date"], skipna=True).data
median = ds[f"{variable}"].median(dim=["x", "y", "mid_date"], skipna=True).data
stats_dict = {"skew": skew, "mean": mean, "median": median}
return stats_dict
stats_vy = calc_summary_stats(single_glacier_raster, "vy")
stats_vx = calc_summary_stats(single_glacier_raster, "vx")
stats_v = calc_summary_stats(single_glacier_raster, "v")
fig, axs = plt.subplots(ncols=3, figsize=(20, 5))
# VY
hist_y = single_glacier_raster.vy.plot.hist(ax=axs[0], bins=100)
cumulative_y = np.cumsum(hist_y[0])
axs[0].plot(hist_y[1][1:], cumulative_y, color="orange", linestyle="-", alpha=0.5)
# VY stats text
axs[0].text(x=-2000, y=2e6, s=f"Skew: {stats_vy['skew']:.3f}", fontsize=12, color="black")
axs[0].text(x=-2000, y=1.5e6, s=f"Mean: {stats_vy['mean']:.3f}", fontsize=12, color="black")
axs[0].text(x=-2000, y=1e6, s=f"Median: {stats_vy['median']:.3f}", fontsize=12, color="black")
# VX
hist_x = single_glacier_raster.vx.plot.hist(ax=axs[1], bins=100)
cumulative_x = np.cumsum(hist_x[0])
axs[1].plot(hist_x[1][1:], cumulative_x, color="orange", linestyle="-", alpha=0.5)
# VX stats text
axs[1].text(x=-2000, y=2e6, s=f"Skew: {stats_vx['skew']:.3f}", fontsize=12, color="black")
axs[1].text(x=-2000, y=1.5e6, s=f"Mean: {stats_vx['mean']:.3f}", fontsize=12, color="black")
axs[1].text(x=-2000, y=1e6, s=f"Median: {stats_vx['median']:.3f}", fontsize=12, color="black")
# V
hist_v = single_glacier_raster.v.plot.hist(ax=axs[2], bins=100)
cumulative_v = np.cumsum(hist_v[0])
axs[2].plot(hist_v[1][1:], cumulative_v, color="orange", linestyle="-", alpha=0.5)
# V stats text
axs[2].text(x=2000, y=2e6, s=f"Skew: {stats_v['skew']:.3f}", fontsize=12, color="black")
axs[2].text(x=2000, y=1.5e6, s=f"Mean: {stats_v['mean']:.3f}", fontsize=12, color="black")
axs[2].text(x=2000, y=1e6, s=f"Median: {stats_v['median']:.3f}", fontsize=12, color="black")
# Formatting and labeling
axs[0].set_title("VY")
axs[1].set_title("VX")
axs[2].set_title("V")
for i in range(len(axs)):
axs[i].set_xlabel(None)
axs[i].set_ylabel(None)
fig.supylabel("# Observations", x=0.08, fontsize=12)
fig.supxlabel("Meters / year", fontsize=12)
fig.suptitle(
"Histogram (blue) and cumulative distribution function (orange) of velocity components and magnitude",
fontsize=16,
y=1.05,
);

The histograms and summary statistics show that the vx and vy distributions are relatively Gaussian, while v is positively skewed and Rician. This is due to the non-linear relationship between the component vectors and the magnitude of the displacement vector. In datasets such as this one, where the signal-to-noise ratio can be low, calculating velocity magnitude from smoothed or averaged component vectors can help to suppress noise (for a bit more detail, refer to this comment). For this reason, we will usually calculate velocity magnitude after the dataset has been reduced over space or time dimensions.
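To build intuition for why this matters, here is a small self-contained sketch using synthetic numbers (not the ITS_LIVE data): with noisy components, averaging magnitudes is biased high, while taking the magnitude of the averaged components suppresses the noise:
# Synthetic example: a true velocity of (10, 0) m/yr observed with zero-mean noise
rng = np.random.default_rng(0)
vx_obs = 10 + rng.normal(0, 20, size=10_000)  # noisy x-component observations
vy_obs = 0 + rng.normal(0, 20, size=10_000)  # noisy y-component observations
# Mean of magnitudes: the noise inflates the result well above the true 10 m/yr
print(np.mean(np.hypot(vx_obs, vy_obs)))
# Magnitude of the mean components: close to the true 10 m/yr
print(np.hypot(vx_obs.mean(), vy_obs.mean()))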
2) Spatial velocity variability#
Now that we have a better understanding of how velocity component variability shapes our interpretation of velocity variability, let’s examine these variables, along with the estimated error provided in the dataset, by reducing along the temporal dimension so that we can visualize the data over the x and y dimensions.
# Calculate min, max for color bar
vmin_y = single_glacier_raster.vy.mean(dim=["mid_date"]).min().data
vmax_y = single_glacier_raster.vy.mean(dim=["mid_date"]).max().data
vmin_x = single_glacier_raster.vx.mean(dim=["mid_date"]).min().data
vmax_x = single_glacier_raster.vx.mean(dim=["mid_date"]).max().data
vmin = min([vmin_x, vmin_y])
vmax = max([vmax_x, vmax_y])
fig, axs = plt.subplots(ncols=2, figsize=(17, 7))
x = single_glacier_raster.vx.mean(dim="mid_date").plot(ax=axs[0], vmin=vmin, vmax=vmax, cmap="RdBu_r")
y = single_glacier_raster.vy.mean(dim="mid_date").plot(ax=axs[1], vmin=vmin, vmax=vmax, cmap="RdBu_r")
axs[0].set_title("x-component velocity", fontsize=12)
axs[1].set_title("y-component velocity", fontsize=12)
fig.suptitle("Temporal mean of velocity components", fontsize=16, y=1.02)
x.colorbar.set_label("m/yr", rotation=270)
y.colorbar.set_label("m/yr", rotation=270)
for i in range(len(axs)):
axs[i].set_ylabel(None)
axs[i].set_xlabel(None)
fig.supylabel("Y-coordinate of projection (meters)", x=0.08, fontsize=12)
fig.supxlabel("X-coordinate of projection (meters)", fontsize=12);

In addition to visualizing components (above), plotting velocity vectors is helpful for understanding magnitude and direction of flow:
First, calculate and visualize mean velocity magnitude over time (we will use the function defined in Part 1), and the mean estimated error over time:
ds_v = calc_v_magnitude(single_glacier_raster.mean(dim="mid_date", skipna=True))
fig, axs = plt.subplots(ncols=2, figsize=(20, 7))
single_glacier_vector.plot(ax=axs[0], facecolor="none", edgecolor="red")
single_glacier_raster.mean(dim="mid_date").plot.quiver("x", "y", "vx", "vy", ax=axs[1], angles="xy", robust=True)
single_glacier_vector.plot(ax=axs[1], facecolor="none", edgecolor="red")
a = ds_v["vmag"].plot(ax=axs[0], alpha=0.6, vmax=45, vmin=5)
a.colorbar.set_label("meter/year")
fig.supylabel("Y-coordinate of projection (meters)", x=0.08, fontsize=12)
fig.supxlabel("X-coordinate of projection (meters)", fontsize=12)
fig.suptitle(
"Velocity vectors (R) and magntiude of velocity (L), averaged over time",
fontsize=16,
y=0.98,
)
for i in range(len(axs)):
axs[i].set_xlabel(None)
axs[i].set_ylabel(None)
axs[i].set_title(None)

Visualize magnitude of velocity overlaid with velocity vectors next to velocity error:
fig, ax = plt.subplots(figsize=(22, 6), ncols=2)
vmag = ds_v.vmag.plot(ax=ax[0], vmin=0, vmax=52, alpha=0.5)
single_glacier_raster.mean(dim="mid_date").plot.quiver("x", "y", "vx", "vy", ax=ax[0], angles="xy", robust=True)
err = ds_v.v_error.plot(ax=ax[1], vmin=0, vmax=52)
vmag.colorbar.set_label("m/y")
err.colorbar.set_label("m/y")
for i in range(len(ax)):
ax[i].set_ylabel(None)
ax[i].set_xlabel(None)
ax[i].set_title(None)
fig.supxlabel("X-coordinate of projection (meters)", fontsize=12)
fig.supylabel("Y-coordinate of projection (meters)", x=0.08, fontsize=12)
fig.suptitle(
"Mean velocity magnitude over time (L), mean error over time (R)",
fontsize=16,
y=1.02,
);

v_error is large relative to the magnitude of velocity, suggesting that this data is pretty noisy.
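One rough way to quantify this (an optional sketch using the time-averaged fields computed above, not a formal uncertainty analysis) is to look at the ratio of mean velocity magnitude to mean reported error across the glacier:
# Ratio of time-averaged velocity magnitude to time-averaged reported error, summarized over space
snr = ds_v["vmag"] / ds_v["v_error"]
print(snr.median().item())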
3) Temporal velocity variability#
Reduce over the spatial dimensions (this time we will switch it up and choose a different reduction function) and visualize variability over time:
fig, ax = plt.subplots(figsize=(20, 5))
vmag_med = calc_v_magnitude(single_glacier_raster.median(dim=["x", "y"]))
vmag_med.plot.scatter(x="mid_date", y="vmag", ax=ax, marker="o", edgecolors="None", alpha=0.5)
fig.suptitle("Spatial median magnitude of velocity over time")
ax.set_title(None)
ax.set_ylabel("m/y")
ax.set_xlabel("Time");

This helps get a sense of velocity variability over time, but also shows how many outliers there are, even after taking the median over the x and y dimensions. In the final section of this notebook, we explore different approaches for changing the resolution of the temporal dimension.
D. Computations along time dimension#
In Part 2, we saw that the time series is dense in places and quite noisy. This section demonstrates different approaches for looking at a temporal signal in the dataset.
1) Temporal resampling#
Use Xarray’s resample() method to coarsen the temporal resolution of the dataset. With resample(), you can upsample or downsample the data, choosing both the resample frequency and the type of reduction.
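For example (an illustrative sketch; the frequency strings follow pandas offset aliases, and these particular choices are arbitrary), you could compute annual means or weekly maxima instead of the 2-month medians used below:
# Annual mean velocity
annual_mean = single_glacier_raster["v"].resample(mid_date="1YE").mean()
# Weekly maximum velocity
weekly_max = single_glacier_raster["v"].resample(mid_date="7D").max()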
# Coarsen dataset to 2-month intervals
resample_obj = single_glacier_raster.resample(mid_date="2ME")
# Calculate the 2-month median
glacier_resample_1mo = resample_obj.median(dim="mid_date")
The mid_date dimension is now much less dense:
glacier_resample_1mo
<xarray.Dataset> Size: 5MB Dimensions: (mid_date: 230, y: 37, x: 40) Coordinates: mapping int64 8B 0 * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.888e+05 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.312e+06 3.311e+06 spatial_ref int64 8B 0 * mid_date (mid_date) datetime64[ns] 2kB 1986-09-30 ... 2024-11-30 Data variables: v (mid_date, y, x) float32 1MB nan nan nan nan ... nan nan nan vx (mid_date, y, x) float32 1MB nan nan nan nan ... nan nan nan vy (mid_date, y, x) float32 1MB nan nan nan nan ... nan nan nan v_error (mid_date, y, x) float32 1MB nan nan nan nan ... nan nan nan vy_error (mid_date) float32 920B 30.2 31.45 23.8 28.5 ... 8.0 16.65 26.8 vx_error (mid_date) float32 920B 27.4 40.6 24.2 29.9 ... 5.6 9.4 17.5 cov (mid_date) float64 2kB 0.0 0.2461 0.0 0.0 ... 0.0 0.0 0.0 0.0 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
Compare the 2-month resampled median time series to the full time series we plotted at the end of Part 2:
fig, ax = plt.subplots(figsize=(20, 5))
# Calculate magnitude of velocity after the temporal reduction
vmag_1mo = calc_v_magnitude(glacier_resample_1mo)
# Calculate spatial median
vmag_1mo["vmag"].median(dim=["x", "y"]).plot(ax=ax)
# Plot full time series (spatial median) magnitude of velocity
vmag_med.plot.scatter(
x="mid_date",
y="vmag",
ax=ax,
marker="o",
edgecolors="None",
alpha=0.5,
color="orange",
)
# Labels and formatting
fig.suptitle(
"2-month median magnitude of velocity (blue), full time series (orange)",
fontsize=16,
)
ax.set_title(None)
ax.set_ylabel("m/y")
ax.set_xlabel("Time")
ax.set_ylim(0, 750);

We can make a few observations:
Computing two-month median velocities makes it easier to see a somewhat periodic velocity signal.
As expected, the median is highly sensitive to the density of observations; as such, it displays greater amplitude during periods with sparser observations. When the density of observations increases significantly in 2014, the amplitude of variability of the 2-month median curve decreases dramatically.
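To see this density effect directly, one option (a quick optional sketch) is to count the number of observations falling in each 2-month bin and plot that alongside the figure above:
# Number of non-null coverage values (i.e. time steps) in each 2-month bin
obs_per_bin = single_glacier_raster["cov"].resample(mid_date="2ME").count()
obs_per_bin.plot(figsize=(20, 3));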
2) Grouped analysis by season#
Xarray’s groupby() functionality allows us to segment the dataset into different groups along given dimensions. Here, we use it to analyze seasonal velocity variability patterns:
seasons_gb = single_glacier_raster.groupby(single_glacier_raster.mid_date.dt.season).median()
# add attrs to gb object
seasons_gb.attrs = single_glacier_raster.attrs
# Reorder seasons
seasons_gb = seasons_gb.reindex({"season": ["DJF", "MAM", "JJA", "SON"]})
seasons_gb
<xarray.Dataset> Size: 95kB Dimensions: (x: 40, y: 37, season: 4) Coordinates: * x (x) float64 320B 7.843e+05 7.844e+05 ... 7.888e+05 7.889e+05 * y (y) float64 296B 3.316e+06 3.316e+06 ... 3.312e+06 3.311e+06 * season (season) <U3 48B 'DJF' 'MAM' 'JJA' 'SON' mapping int64 8B 0 spatial_ref int64 8B 0 Data variables: v (season, y, x) float32 24kB nan nan nan nan ... nan nan nan nan vx (season, y, x) float32 24kB nan nan nan nan ... nan nan nan nan vy (season, y, x) float32 24kB nan nan nan nan ... nan nan nan nan v_error (season, y, x) float32 24kB nan nan nan nan ... nan nan nan nan vy_error (season) float32 16B 8.2 4.8 4.2 9.1 vx_error (season) float32 16B 5.4 3.4 3.3 6.0 cov (season) float64 32B 0.0 0.0 0.0 0.0 Attributes: (12/19) Conventions: CF-1.8 GDAL_AREA_OR_POINT: Area author: ITS_LIVE, a NASA MEaSUREs project (its-live.j... autoRIFT_parameter_file: http://its-live-data.s3.amazonaws.com/autorif... datacube_software_version: 1.0 date_created: 25-Sep-2023 22:00:23 ... ... s3: s3://its-live-data/datacubes/v2/N30E090/ITS_L... skipped_granules: s3://its-live-data/datacubes/v2/N30E090/ITS_L... time_standard_img1: UTC time_standard_img2: UTC title: ITS_LIVE datacube of image pair velocities url: https://its-live-data.s3.amazonaws.com/datacu...
This is cool; we’ve gone from our 3-d object with a very dense mid_date dimension to a 3-d object where the temporal aspect of the data is represented by 4 seasons.
In the above cell, we defined how we wanted to group our data (single_glacier_raster.mid_date.dt.season) and the reduction we wanted to apply to each group (median()). After the apply step, Xarray automatically combines the groups into a single object with a season dimension.
If you’d like to see another example of this with more detailed explanations, go here.
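The same pattern works with other datetime accessors; for instance (an illustrative sketch, not used further in this notebook), grouping by calendar month would give a 12-step climatology instead of 4 seasons:
# Median velocity for each calendar month (1-12)
monthly_gb = single_glacier_raster.groupby(single_glacier_raster.mid_date.dt.month).median()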
# Calculate magnitude on seasonal groupby object
seasons_vmag = calc_v_magnitude(seasons_gb)
Use another FacetGrid plot to visualize the seasons side-by-side:
fg = seasons_vmag.vmag.plot(col="season", cbar_kwargs={"label": "Meters / year"})
fg.fig.suptitle("Seasonal median velocity magnitude", fontsize=16, y=1.05)
fg.fig.supxlabel("X-coordinate of projection (meters)", fontsize=12, y=-0.05)
fg.fig.supylabel("Y-coordinate of projection (meters)", fontsize=12, x=-0.02)
for i in range(len(fg.axs[0])):
fg.axs[0][i].set_ylabel(None)
fg.axs[0][i].set_xlabel(None)
fg.axs[0][i].tick_params(axis="x", labelrotation=45)

From the above FacetGrid plot, it appears that some regions of the glacier are very active (show high velocities) throughout the entire year. In other areas, it appears that glacier flow may be much more seasonal.
Conclusion#
This was a primer in exploratory data analysis at the scale of an individual spatial area of interest (in this case, a glacier). The last notebook in this chapter will demonstrate exploratory analysis at a larger spatial scale.