3. Exploratory data analysis of a single glacier

Introduction

Overview

In the previous notebook, we walked through initial steps to read and organize a large raster dataset, and to understand it in the context of spatial areas of interest represented by vector data.

The previous notebook mainly focused on high-level processing operations:

  1. reading a multi-dimensional dataset,

  2. re-organizing along a given dimension,

  3. reducing a large raster object to smaller areas of interest represented by vector data, and

  4. strategies for completing these steps with larger-than-memory datasets.

In this notebook, we will continue performing initial data inspection and exploratory analysis, but this time focused on velocity data clipped to an individual glacier. We demonstrate Xarray functionality for common computations and visualizations.

Outline

A. Data exploration

    1. Load raster data into memory and visualize with vector data

    2. Examine data coverage

    3. Break down by sensor

    4. Combine sensor-specific subsets

B. Examine velocity variability

    1. Histograms and summary statistics

    2. Spatial velocity variability

    3. Temporal velocity variability

C. Dimensional computations

    1. Temporal resampling

    2. Grouped analysis by season

Learning goals

Concepts

  • Examining metadata, interpreting physical observables in the context of available metadata

Techniques

  • Using matplotlib to visualize raster and vector data with satellite imagery basemaps


Expand the next cell to see specific packages used in this notebook and relevant system and version information.

%xmode minimal
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import rioxarray as rio
import scipy.stats
import warnings
import xarray as xr

warnings.simplefilter(action='ignore', category=FutureWarning)
Exception reporting mode: Minimal

A. Data exploration#

1) Load raster data into memory and visualize with vector data#

single_glacier_raster = xr.open_zarr('../data/single_glacier_itslive.zarr')
single_glacier_vector = gpd.read_file('../data/single_glacier_vec.json')
single_glacier_raster.nbytes/ 1e9
3.326255548

The above code cells show that this dataset is about 3.3 GB. It contains observations derived from the Sentinel-1, Sentinel-2, and Landsat 4, 5, 7, 8 & 9 satellite sensors.

Next, we want to perform computations that require us to load this object into memory. To do this, we use the .compute() method, which turns a ‘lazy’ Dask-backed object into an in-memory object. If you try to run compute on too large an object, your computer may run out of RAM and the kernel used in this Python session will die (if this happens, click ‘restart kernel’ from the kernel drop-down menu above).

single_glacier_raster = single_glacier_raster.compute()

Now, if you expand the data object to look at the variables, you will see that they no longer hold dask.array objects.

single_glacier_raster
<xarray.Dataset> Size: 3GB
Dimensions:                     (mid_date: 47892, y: 37, x: 40)
Coordinates:
  * mid_date                    (mid_date) datetime64[ns] 383kB 1986-09-11T03...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 284MB nan nan ... nan
    M11_dr_to_vr_factor         (mid_date) float32 192kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 284MB nan nan ... nan
    M12_dr_to_vr_factor         (mid_date) float32 192kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 383kB 1986-07-25T03...
    acquisition_date_img2       (mid_date) datetime64[ns] 383kB 1986-10-29T03...
    ...                          ...
    vy_error_modeled            (mid_date) float32 192kB 97.0 64.6 ... 930.7
    vy_error_slow               (mid_date) float32 192kB 30.3 18.9 ... 61.7 50.1
    vy_error_stationary         (mid_date) float32 192kB 30.2 18.9 ... 61.6 50.1
    vy_stable_shift             (mid_date) float32 192kB 4.1 -6.9 ... 91.2 91.2
    vy_stable_shift_slow        (mid_date) float32 192kB 4.2 -6.9 ... 91.2 91.2
    vy_stable_shift_stationary  (mid_date) float32 192kB 4.1 -6.9 ... 91.2 91.2
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
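
If you want to verify this programmatically rather than by inspecting the repr, a minimal check (a hypothetical snippet, not part of the original notebook) is to look at the type of the array underlying one of the variables:

#Check the underlying array type of the 'v' variable
type(single_glacier_raster['v'].data)
#Returns numpy.ndarray now; before calling .compute() it would be a dask array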

As a quick sanity check, we’ll convince ourselves that the clipping operation in the previous notebook worked correctly. We also show that we can plot both Xarray raster data and GeoPandas vector data overlaid on the same plot with a satellite image basemap as a background.

A few notes about the following figure:

  • single_glacier_raster is a 3-dimensional object. We want to plot it in 2-d space with the RGI glacier outline. So, we perform a reduction (in this case, compute the mean), in order to reduce the dataset from 3-d to 2-d (Another option would be to select a single time step).

  • We could make this plot with a white background, but it is also nice to be able to add a basemap to the image. Here, we’ll use contextily to do so. This will require converting the coordinate reference system (CRS) of both objects to the Web Mercator projection (EPSG:3857). We first need to use rio.write_crs() to assign a CRS to the raster object (note: the data is already projected in the correct CRS; the object is just not ‘aware’ of its CRS, and that awareness is necessary for reprojection operations. For more, see Rioxarray’s CRS Management documentation).

#Write CRS of raster data
single_glacier_raster = single_glacier_raster.rio.write_crs(single_glacier_raster.attrs['projection'])
#Check that CRS of vector and raster data are the same
assert single_glacier_raster.rio.crs == single_glacier_vector.crs
#Reproject both objects to web mercator
single_glacier_vector_web = single_glacier_vector.to_crs('EPSG:3857')
single_glacier_raster_web = single_glacier_raster.rio.reproject('EPSG:3857')

fig, ax = plt.subplots(figsize=(8,5))

#Plot objects
single_glacier_raster_web.v.mean(dim='mid_date').plot(ax=ax, cmap='viridis', alpha=0.75, add_colorbar=True)
single_glacier_vector_web.plot(ax=ax, facecolor='None', edgecolor='red', alpha=0.75)
#Add basemap
cx.add_basemap(ax, crs=single_glacier_vector_web.crs, source=cx.providers.Esri.WorldImagery);
../_images/f4b832637fc88802a8ed38197e320518683d4a95c04dd7a3f8f8f23057a9dc38.png

We sorted along the time dimension in the previous notebook, so it should be in chronological order.

single_glacier_raster.mid_date
<xarray.DataArray 'mid_date' (mid_date: 47892)> Size: 383kB
array(['1986-09-11T03:31:15.003252992', '1986-10-05T03:31:06.144750016',
       '1986-10-21T03:31:34.493249984', ..., '2024-10-29T04:18:09.241024000',
       '2024-10-29T04:18:09.241024000', '2024-10-29T04:18:09.241024000'],
      dtype='datetime64[ns]')
Coordinates:
    mapping      int64 8B 0
  * mid_date     (mid_date) datetime64[ns] 383kB 1986-09-11T03:31:15.00325299...
    spatial_ref  int64 8B 0
Attributes:
    description:    midpoint of image 1 and image 2 acquisition date and time...
    standard_name:  image_pair_center_date_with_time_separation

2) Examine data coverage#

A wide variety of factors can impact both satellite imagery and the ability of ITS_LIVE’s feature tracking algorithm to extract velocity estimates from satellite image pairs. For these reasons, there are at times both gaps in coverage and ranges in the estimated error associated with different observations. The following section will demonstrate how to calculate and visualize coverage of the dataset over time. Part 2 will include a discussion of uncertainty and error estimates.

When first investigating a dataset, it is helpful to be able to quickly visualize coverage along a given dimension. To create the data needed for such a visualization, we first need a mask that tells us all possible ‘valid’ pixels; in other words, we need to differentiate between pixels in our 2-d rectangular array that represent ice versus non-ice. Then, for every time step, we can calculate the proportion of possible ice pixels that contain an estimated velocity value.

#calculate number of valid pixels
valid_pixels = single_glacier_raster.v.count(dim=['x','y'])
#calculate max. number of valid pixels
valid_pixels_max = single_glacier_raster.v.notnull().any('mid_date').sum(['x','y'])
#add cov proportion to dataset as variable
single_glacier_raster['cov'] = valid_pixels/ valid_pixels_max

Now we can visualize coverage over time:

fig, ax = plt.subplots(figsize=(20,5))

#Plot object
single_glacier_raster['cov'].plot(ax=ax, linestyle='None', marker='x',alpha=0.75)

#Specify axes labels and title
fig.suptitle('Velocity data coverage over time', fontsize=16)
ax.set_ylabel('Coverage (proportion)', x=-0.05, fontsize=12)
ax.set_xlabel('Date', fontsize=12);
../_images/80e596a55113ad722eaee26dc353f126a4b8aaeecf2285e89efd69a8b728c0bf.png

3) Break down by sensor#

In this dataset, we have a dense time series of velocity observations for a given glacier (~48,000 observations from 1986-2024; note: this notebook was last updated in 2025). However, we know that the ability of satellite image pairs to capture ice displacement (and, by extension, velocity) can be impacted by conditions such as cloud cover, which obscures Earth’s surface from optical sensors. ITS_LIVE is a multi-sensor ice velocity dataset, meaning that it is composed of ice velocity observations derived from a number of satellites, which include both optical and Synthetic Aperture Radar (SAR) imagery. Currently, Sentinel-1 is the only SAR sensor included in ITS_LIVE; all others are optical.

While optical imagery requires solar illumination and can be impacted by cloud cover, Sentinel-1 carries an active (radar) sensor that images at a longer wavelength (C-band, ~5 cm). This means that Sentinel-1 imagery does not require solar illumination and can penetrate cloud cover. Because of these sensors’ differing sensitivities to Earth’s surface conditions, there can sometimes be discrepancies in velocity data observed by different sensors.

Note

There are many great resources available for understanding the principles of SAR and working with SAR imagery. The appendix at the bottom of this notebook lists a few of them. In addition, Chapter 2 of this tutorial focuses on working with a dataset of Sentinel-1 imagery.

Let’s first look at what sensors are represented in the time series:

sensors = list(set(single_glacier_raster.satellite_img1.values))
sensors
['5', '7', '2A', '8', '9', '2B', '1A', '4']

To extract observations from a single satellite sensor, we will use Xarray indexing and selection methods such as .where() and .sel(). The following cells demonstrate different selection approaches and briefly discuss the pros and cons of each when working with large and/or sparse datasets.

Landsat 8#

First, looking at velocity observations from Landsat 8 data only:

%%time
l8_data = single_glacier_raster.where(single_glacier_raster['satellite_img1'] == '8', drop=True)
l8_data
CPU times: user 383 ms, sys: 3.54 s, total: 3.92 s
Wall time: 4.43 s
<xarray.Dataset> Size: 208MB
Dimensions:                     (mid_date: 2688, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
  * mid_date                    (mid_date) datetime64[ns] 22kB 2013-05-20T04:...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 16MB nan nan ... nan
    M11_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 16MB nan nan ... nan
    M12_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 22kB 2013-04-30T04:...
    acquisition_date_img2       (mid_date) datetime64[ns] 22kB 2013-06-09T04:...
    ...                          ...
    vy_error_slow               (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_error_stationary         (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_stable_shift             (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    vy_stable_shift_slow        (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.3
    vy_stable_shift_stationary  (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    cov                         (mid_date) float64 22kB 0.0 0.5969 ... 0.3494
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

Another approach: using .sel():

%%time
l8_condition = single_glacier_raster.satellite_img1 == '8'
l8_data_alt = single_glacier_raster.sel(mid_date=l8_condition)
l8_data_alt
CPU times: user 29.9 ms, sys: 0 ns, total: 29.9 ms
Wall time: 34.1 ms
<xarray.Dataset> Size: 187MB
Dimensions:                     (mid_date: 2688, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
  * mid_date                    (mid_date) datetime64[ns] 22kB 2013-05-20T04:...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 16MB nan nan ... nan
    M11_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 16MB nan nan ... nan
    M12_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 22kB 2013-04-30T04:...
    acquisition_date_img2       (mid_date) datetime64[ns] 22kB 2013-06-09T04:...
    ...                          ...
    vy_error_slow               (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_error_stationary         (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_stable_shift             (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    vy_stable_shift_slow        (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.3
    vy_stable_shift_stationary  (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    cov                         (mid_date) float64 22kB 0.0 0.5969 ... 0.3494
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

The approach using .sel() takes much less time than .where(). This is because .sel() queries the dataset using the mid_date index. Xarray dimensions have associated Index objects, which are built on Pandas indexes; these are very powerful for quickly and efficiently querying large datasets. In contrast, .where() evaluates the condition and applies it across every data variable in the dataset, which is much less efficient.
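
If you want to see the index backing a dimension, you can inspect it directly; a minimal sketch (not part of the original workflow):

#Inspect the pandas index associated with the mid_date dimension
single_glacier_raster.indexes['mid_date']
#This returns a pandas DatetimeIndex, which is what makes .sel() lookups fast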

What about a sensor with multiple identifiers?#

For Landsat 8 observations, we only needed to identify elements of the dataset where the satellite_img1 variable matched a single identifier, ‘8’. Sentinel-1 is a two-satellite constellation, meaning that identical sensors fly on two satellites (identified here as ‘1A’ and ‘1B’). For this, we need to select all observations where satellite_img1 matches any value in a list of possible identifiers.

%%time
s1_condition = single_glacier_raster.satellite_img1.isin(['1A','1B'])
s1_data = single_glacier_raster.sel(mid_date=s1_condition)
s1_data
CPU times: user 6.84 ms, sys: 8.81 ms, total: 15.7 ms
Wall time: 15.4 ms
<xarray.Dataset> Size: 25MB
Dimensions:                     (mid_date: 362, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
  * mid_date                    (mid_date) datetime64[ns] 3kB 2014-10-16T11:4...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 2MB nan nan ... nan nan
    M11_dr_to_vr_factor         (mid_date) float32 1kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 2MB nan nan ... nan nan
    M12_dr_to_vr_factor         (mid_date) float32 1kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 3kB 2014-10-10T11:4...
    acquisition_date_img2       (mid_date) datetime64[ns] 3kB 2014-10-22T11:4...
    ...                          ...
    vy_error_slow               (mid_date) float32 1kB 64.9 56.2 ... 45.5 49.3
    vy_error_stationary         (mid_date) float32 1kB 64.9 56.2 ... 45.5 49.3
    vy_stable_shift             (mid_date) float32 1kB -6.5 -13.0 ... 0.5 0.5
    vy_stable_shift_slow        (mid_date) float32 1kB -6.5 -13.0 ... 0.5 0.5
    vy_stable_shift_stationary  (mid_date) float32 1kB -6.5 -13.0 ... 0.5 0.5
    cov                         (mid_date) float64 3kB 0.9463 0.9604 ... 0.9958
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

Rather than go through the above steps for each sensor, let’s write a function that will subset the dataset by sensor, returning a dict of Xarray datasets holding velocity time series for each sensor.

#Make dict of each sensor and identifying string(s)
sensor_conditions = {'Landsat 4': '4',
                     'Landsat 5': '5',
                     'Landsat 7': '7',
                     'Landsat 8': '8',
                     'Landsat 9': '9',
                     'Sentinel 1': ['1A','1B'],
                     'Sentinel 2': ['2A','2B']}

def separate_ds_by_sensor(ds, sensor_conditions):
    #Make empty lists to hold sensor IDs and subsetted datasets
    keys_ls, vals_ls = [],[]

    #Iterate through each sensor in dict
    for sensor in sensor_conditions.keys():

        #If there are two identifying conditions, use isin
        if isinstance(sensor_conditions[sensor], list):

            condition = ds.satellite_img1.isin(sensor_conditions[sensor])

        #If there's only one condition, use == 
        else: 
            
            condition = ds.satellite_img1 == sensor_conditions[sensor]

        #Use .sel to subset data based on sensor
        sensor_data = ds.sel(mid_date = condition)

        keys_ls.append(f'{sensor}')
        vals_ls.append(sensor_data)

    #Return dict of sensor IDs and subsetted datasets
    ds_dict = dict(zip(keys_ls, vals_ls))
    return ds_dict
sensor_ds_dict = separate_ds_by_sensor(single_glacier_raster, sensor_conditions)
print(sensor_ds_dict.keys())
dict_keys(['Landsat 4', 'Landsat 5', 'Landsat 7', 'Landsat 8', 'Landsat 9', 'Sentinel 1', 'Sentinel 2'])
sensor_ds_dict['Landsat 8']
<xarray.Dataset> Size: 187MB
Dimensions:                     (mid_date: 2688, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
  * mid_date                    (mid_date) datetime64[ns] 22kB 2013-05-20T04:...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 16MB nan nan ... nan
    M11_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 16MB nan nan ... nan
    M12_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 22kB 2013-04-30T04:...
    acquisition_date_img2       (mid_date) datetime64[ns] 22kB 2013-06-09T04:...
    ...                          ...
    vy_error_slow               (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_error_stationary         (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_stable_shift             (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    vy_stable_shift_slow        (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.3
    vy_stable_shift_stationary  (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    cov                         (mid_date) float64 22kB 0.0 0.5969 ... 0.3494
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

4) Combine sensor-specific subsets#

In the previous section we looked at different ways to subset the ITS_LIVE time series. Now, we will look at a new structure within the Xarray data model (xr.DataTree) that facilitates working with collections of data.

Our use-case in this section is that we would like to efficiently compute the mean along the time dimension of the ITS_LIVE dataset for each satellite sensor, and then visualize these results side-by-side.

Before the implementation of Xarray DataTree, we would need to either perform the sequence of operations on each sensor-specific dataset individually, or create a dict of the sensor-specific datasets, write a function to perform the operations, and then either update that dictionary or create another one to hold the results. Both of these options are clunky and inefficient (see the sketch below).
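
For comparison, a dict-based version of the first step might look like the following sketch (reusing the sensor_ds_dict created above). It works, but every subsequent operation needs its own loop or comprehension, and the results never live in a single labeled object:

#Dict-based alternative (sketch): reduce each sensor-specific dataset separately
sensor_mean_dict = {sensor: ds.mean(dim='mid_date') for sensor, ds in sensor_ds_dict.items()}
#Plotting or concatenating these results still requires another pass over the dictionary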

We can create an xr.DataTree object by using the from_dict() method:

sensor_ds_tree = xr.DataTree.from_dict(sensor_ds_dict)
sensor_ds_tree
<xarray.DatasetView> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

The DataTree has a parent node and child groups that contain individual xr.Datasets:

sensor_ds_tree['Landsat 8'].ds
<xarray.DatasetView> Size: 187MB
Dimensions:                     (mid_date: 2688, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
  * mid_date                    (mid_date) datetime64[ns] 22kB 2013-05-20T04:...
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/60)
    M11                         (mid_date, y, x) float32 16MB nan nan ... nan
    M11_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 16MB nan nan ... nan
    M12_dr_to_vr_factor         (mid_date) float32 11kB nan nan nan ... nan nan
    acquisition_date_img1       (mid_date) datetime64[ns] 22kB 2013-04-30T04:...
    acquisition_date_img2       (mid_date) datetime64[ns] 22kB 2013-06-09T04:...
    ...                          ...
    vy_error_slow               (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_error_stationary         (mid_date) float32 11kB 40.2 10.2 ... 26.5 159.6
    vy_stable_shift             (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    vy_stable_shift_slow        (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.3
    vy_stable_shift_stationary  (mid_date) float32 11kB -40.2 3.6 ... 17.1 152.1
    cov                         (mid_date) float64 22kB 0.0 0.5969 ... 0.3494
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

If we want to perform a set of operations on every node of the datatree, we can define a function to pass to xr.DataTree.map_over_datasets():

def calc_temporal_mean(ds):
    """I'm a function that calculates the temporal mean of a dataset with (x,y,mid_date) dimensions. 
    I return a new dataset with (x,y,sensor) dimensions """
    
    #Skip parent node
    if len(ds.data_vars) == 0:
        return None
    
    else:
        #Calc mean
        tmean = ds.mean(dim='mid_date')
        #Add a sensor dimension -- this will be used for combining them back together later
        sensor = np.unique(ds.satellite_img1.data)
        if len(sensor) > 1: 
            sensor = [sensor[0]]
        #Expand dims to add sensor
        tmean = tmean.expand_dims({'sensor':sensor})
        return tmean
temp_mean_tree = sensor_ds_tree.map_over_datasets(calc_temporal_mean)

Now, we can take just the descendent nodes of this datatree:

child_tree = temp_mean_tree.descendants

and concatenate them into a single xr.Dataset along the 'sensor' dimension we created above:

sensor_mean_ds = xr.concat([child_tree[i].ds for i in range(len(child_tree))], dim='sensor')
sensor_mean_ds
<xarray.Dataset> Size: 541kB
Dimensions:                     (sensor: 7, y: 37, x: 40)
Coordinates:
  * sensor                      (sensor) object 56B '4' '5' '7' ... '1A' '2A'
    mapping                     int64 8B 0
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
Data variables: (12/49)
    M11                         (sensor, y, x) float32 41kB nan nan ... nan nan
    M11_dr_to_vr_factor         (sensor) float32 28B nan nan nan nan nan nan nan
    M12                         (sensor, y, x) float32 41kB nan nan ... nan nan
    M12_dr_to_vr_factor         (sensor) float32 28B nan nan nan nan nan nan nan
    chip_size_height            (sensor, y, x) float32 41kB nan nan ... nan nan
    chip_size_width             (sensor, y, x) float32 41kB nan nan ... nan nan
    ...                          ...
    vy_error_slow               (sensor) float32 28B 44.73 25.63 ... 63.17 9.095
    vy_error_stationary         (sensor) float32 28B 44.77 25.64 ... 63.18 9.096
    vy_stable_shift             (sensor) float32 28B -6.333 -0.09127 ... 1.125
    vy_stable_shift_slow        (sensor) float32 28B -6.317 -0.08736 ... 1.122
    vy_stable_shift_stationary  (sensor) float32 28B -6.333 -0.09127 ... 1.125
    cov                         (sensor) float64 56B 0.4351 0.254 ... 0.06737

Finally, we can use Xarray’s FacetGrid plotting to visualize them side-by-side:

def calc_v_magnitude(ds):
    """I'm a function that calculates the magnitude of a velocity displacement vector given velocity component vectors.
    I return the same dataset object with a new variable called 'vmag'""" 
    
    ds['vmag'] = np.sqrt(ds['vx']**2 + ds['vy']**2)
    return ds

#First, calculate magnitude of velocity from the temporal mean of the component vectors
sensor_mean_ds = calc_v_magnitude(sensor_mean_ds)

a = sensor_mean_ds['vmag'].plot(col='sensor', col_wrap=4, cbar_kwargs={'label':'Meters / year'})
a.fig.suptitle('Temporal mean of velocity magnitude', fontsize=16, y=1.05)
a.fig.supylabel('Y-coordinate of projection (meters)', fontsize=12, x=-0.02)
a.fig.supxlabel('X-coordinate of projection (meters)', fontsize=12, y=-0.05)

#Remove individual axes labels
for i in range(len(a.axs[0])):

    a.axs[0][i].set_ylabel(None)
    a.axs[0][i].set_xlabel(None)

for i in range(len(a.axs[1])):
    a.axs[1][i].set_ylabel(None)
    a.axs[1][i].set_xlabel(None)
    a.axs[1][i].tick_params(axis='x', labelrotation=45)
../_images/248401e1cbc34d2870de77148f107e70593fd66a0128071d96ac18539af75e8e.png

It’s important to keep in mind that in addition to having different spectral properties and imaging resolutions, the sensors included in the ITS_LIVE dataset have been active during different, and sometimes overlapping periods of time. There is additional discussion of inter-sensor bias in the ITS_LIVE Known Issues documentation.

B. Examine velocity variability#

1) Histograms and summary statistics#

First, we plot histograms of the v, vx, and vy variables to examine their distributions. To construct these plots, we use a combination of Xarray plotting functionality and matplotlib object-oriented plotting. In addition, we use Xarray’s .reduce() and scipy.stats.skew() to calculate the skew of each variable (inset in each sub-plot).

To make things easier, write a function that calculates summary statistics for each variable and returns them in a dictionary:

def calc_summary_stats(ds: xr.Dataset, variable:str):

    """ I'm a function that calculates summary statistics for a given data variable and returns them as a dict to be used in a plot"""

    skew = ds[f'{variable}'].reduce(func=scipy.stats.skew, nan_policy='omit', dim=['x','y','mid_date']).data
    mean = ds[f'{variable}'].mean(dim=['x','y','mid_date'], skipna=True).data
    median = ds[f'{variable}'].median(dim=['x','y','mid_date'], skipna=True).data

    stats_dict = {'skew':skew, 'mean':mean, 'median':median}
    return stats_dict
stats_vy = calc_summary_stats(single_glacier_raster, 'vy')
stats_vx = calc_summary_stats(single_glacier_raster, 'vx')
stats_v = calc_summary_stats(single_glacier_raster, 'v')
fig,axs=plt.subplots(ncols=3, figsize=(20,5))
#VY
hist_y = single_glacier_raster.vy.plot.hist(ax=axs[0], bins=100)
cumulative_y = np.cumsum(hist_y[0])
axs[0].plot(hist_y[1][1:], cumulative_y, color='orange', linestyle='-', alpha=0.5)
# VY stats text
axs[0].text(x=-2000, y=2e6, s=f"Skew: {stats_vy['skew']:.3f}", fontsize=12, color='black')
axs[0].text(x=-2000, y=1.5e6, s=f"Mean: {stats_vy['mean']:.3f}", fontsize=12, color='black')
axs[0].text(x=-2000, y=1e6, s=f"Median: {stats_vy['median']:.3f}", fontsize=12, color='black')

#VX
hist_x = single_glacier_raster.vx.plot.hist(ax=axs[1], bins=100)
cumulative_x = np.cumsum(hist_x[0])
axs[1].plot(hist_x[1][1:], cumulative_x, color='orange', linestyle='-', alpha=0.5)
#VX stats text
axs[1].text(x=-2000, y=2e6, s=f"Skew: {stats_vx['skew']:.3f}", fontsize=12, color='black')
axs[1].text(x=-2000, y=1.5e6, s=f"Mean: {stats_vx['mean']:.3f}", fontsize=12, color='black')
axs[1].text(x=-2000, y=1e6, s=f"Median: {stats_vx['median']:.3f}", fontsize=12, color='black')

#V
hist_v = single_glacier_raster.v.plot.hist(ax=axs[2], bins=100)
cumulative_v = np.cumsum(hist_v[0])
axs[2].plot(hist_v[1][1:], cumulative_v, color='orange', linestyle='-', alpha=0.5)
#V stats text
axs[2].text(x=2000, y=2e6, s=f"Skew: {stats_v['skew']:.3f}", fontsize=12, color='black')
axs[2].text(x=2000, y=1.5e6, s=f"Mean: {stats_v['mean']:.3f}", fontsize=12, color='black')
axs[2].text(x=2000, y=1e6, s=f"Median: {stats_v['median']:.3f}", fontsize=12, color='black')

#Formating and labeling
axs[0].set_title('VY')
axs[1].set_title('VX')
axs[2].set_title('V')

for i in range(len(axs)):
    axs[i].set_xlabel(None)
    axs[i].set_ylabel(None)

fig.supylabel('# Observations', x=0.08, fontsize=12)
fig.supxlabel('Meters / year', fontsize=12)
fig.suptitle('Histogram (blue) and cumulative distribution function (orange) of velocity components and magnitude', fontsize=16, y=1.05);
../_images/8f3af118af081507129e031551fe2d909d858be1e36b73fbf325ebc45ba45368.png

The histograms and summary statistics show that the vx and vy distributions are relatively Gaussian, while v is positively skewed and Rician. This is due to the non-linear relationship between the component and displacement vectors. In datasets such as this one, where the signal-to-noise ratio can be low, calculating velocity magnitude on smoothed or averaged component vectors can help to suppress noise (for a bit more detail, refer to this comment). For this reason, we will usually calculate velocity magnitude after the dataset has been reduced over space or time dimensions.
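
To see why the order of operations matters, here is a minimal synthetic example (plain NumPy, illustration only, not ITS_LIVE data): for zero-mean noisy components, the magnitude of the mean components stays near zero, while the mean of the per-observation magnitudes is biased high.

#Synthetic zero-mean component 'velocities' (illustration only)
rng = np.random.default_rng(0)
vx_noise = rng.normal(0, 50, 10_000)
vy_noise = rng.normal(0, 50, 10_000)

#Magnitude of the mean components: close to 0
mag_of_means = np.sqrt(vx_noise.mean()**2 + vy_noise.mean()**2)
#Mean of the per-observation magnitudes: ~63 (sigma * sqrt(pi/2)), biased high
mean_of_mags = np.sqrt(vx_noise**2 + vy_noise**2).mean()
print(f'magnitude of means: {mag_of_means:.2f}, mean of magnitudes: {mean_of_mags:.2f}')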

2) Spatial velocity variability#

Now that we have a better understanding of how the velocity components relate to velocity magnitude, let’s examine these variables, as well as the estimated error provided in the dataset, by reducing along the temporal dimension so that we can visualize the data along the x and y dimensions.

#Calculate min, max for color bar 
vmin_y = single_glacier_raster.vy.mean(dim=['mid_date']).min().data
vmax_y = single_glacier_raster.vy.mean(dim=['mid_date']).max().data

vmin_x = single_glacier_raster.vx.mean(dim=['mid_date']).min().data
vmax_x = single_glacier_raster.vx.mean(dim=['mid_date']).max().data

vmin = min([vmin_x, vmin_y])
vmax = max([vmax_x, vmax_y])
fig, axs = plt.subplots(ncols =2, figsize=(17,7))

x = single_glacier_raster.vx.mean(dim='mid_date').plot(ax=axs[0], vmin=vmin, vmax=vmax, cmap='RdBu_r')
y = single_glacier_raster.vy.mean(dim='mid_date').plot(ax=axs[1], vmin=vmin, vmax=vmax, cmap='RdBu_r')
axs[0].set_title('x-component velocity', fontsize=12)
axs[1].set_title('y-component velocity', fontsize=12)
fig.suptitle('Temporal mean of velocity components', fontsize=16, y=1.02)

x.colorbar.set_label('m/yr', rotation=270)
y.colorbar.set_label('m/yr', rotation=270)

for i in range(len(axs)):
    axs[i].set_ylabel(None)
    axs[i].set_xlabel(None)
fig.supylabel('Y-coordinate of projection (meters)', x=0.08, fontsize=12)
fig.supxlabel('X-coordinate of projection (meters)', fontsize=12);
../_images/5c92972442e2ec5616d3dd2a1210110d4d4223fa7b4d41f0ff1c2c83369cd6b3.png

In addition to visualizing components (above), plotting velocity vectors is helpful for understanding magnitude and direction of flow:

First, take the temporal mean of the dataset (this also averages the estimated error, which we will use shortly) and calculate the mean velocity magnitude using the function defined in Part 1:

ds_v = calc_v_magnitude(single_glacier_raster.mean(dim='mid_date',skipna=True))
fig, axs= plt.subplots(ncols=2, figsize=(20,7))

single_glacier_vector.plot(ax=axs[0], facecolor='none', edgecolor='red')
single_glacier_raster.mean(dim='mid_date').plot.quiver('x','y','vx','vy', ax=axs[1], angles='xy', robust=True)

single_glacier_vector.plot(ax=axs[1], facecolor='none', edgecolor='red')
a = ds_v['vmag'].plot(ax=axs[0], alpha=0.6, vmax=45, vmin=5)
a.colorbar.set_label('meter/year')

fig.supylabel('Y-coordinate of projection (meters)', x=0.08, fontsize=12)
fig.supxlabel('X-coordinate of projection (meters)', fontsize=12)

fig.suptitle('Velocity vectors (R) and magnitude of velocity (L), averaged over time', fontsize=16, y=0.98)
for i in range(len(axs)):
    axs[i].set_xlabel(None)
    axs[i].set_ylabel(None)
    axs[i].set_title(None);
../_images/721cd5a53feb2bc90cfb8c15de3d13cafef12de46778b38e4a7303d3e6945565.png

Visualize magnitude of velocity overlaid with velocity vectors next to velocity error:

fig, ax = plt.subplots(figsize=(22,6), ncols=2)

vmag = ds_v.vmag.plot(ax=ax[0], vmin=0, vmax=52,alpha=0.5)
single_glacier_raster.mean(dim='mid_date').plot.quiver('x','y','vx','vy', ax=ax[0], angles='xy', robust=True)

err = ds_v.v_error.plot(ax=ax[1], vmin=0, vmax=52)


vmag.colorbar.set_label('m/yr')
err.colorbar.set_label('m/yr')

for i in range(len(ax)):
    ax[i].set_ylabel(None)
    ax[i].set_xlabel(None)
    ax[i].set_title(None)

fig.supxlabel('X-coordinate of projection (meters)', fontsize=12)
fig.supylabel('Y-coordinate of projection (meters)', x=0.08, fontsize=12)
fig.suptitle('Mean velocity magnitude over time (L), mean error over time (R)', fontsize=16, y=1.02);
../_images/760ee7144880fa854d2887472e6dce5a963455e01a4bb7fd29861d00dc5a8e19.png

v_error is large relative to the magnitude of velocity, suggesting that this data is pretty noisy.
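
One rough way to quantify this (a sketch reusing ds_v from above, not part of the original workflow) is the ratio of the temporally averaged error to the temporally averaged speed at each pixel:

#Median over the glacier of the per-pixel error-to-speed ratio (dimensionless)
error_to_speed = (ds_v['v_error'] / ds_v['vmag']).median()
float(error_to_speed)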

3) Temporal velocity variability#

Reduce over the spatial dimensions (this time we will switch it up and choose a different reduction function) and visualize variability over time:

fig, ax = plt.subplots(figsize=(20,5))

vmag_med = calc_v_magnitude(single_glacier_raster.median(dim=['x','y']))

vmag_med.plot.scatter(x='mid_date', y='vmag',ax=ax, marker='o', edgecolors='None',alpha=0.5)
fig.suptitle('Spatial median magnitude of velocity over time')
ax.set_title(None)
ax.set_ylabel('m/y')
ax.set_xlabel('Time');
../_images/025f989aa453eb7a409b855afc013ff848feba1d602b5670f73dfeb12ea91de6.png

This helps get a sense of velocity variability over time, but also shows how many outliers there are, even after taking the median over the x and y dimensions. In the final section of this notebook, we explore different approaches for changing the resolution of the temporal dimension.
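
Before moving on, a rough check on how heavy-tailed this series is (a sketch; the 1,000 m/yr cutoff is an arbitrary choice):

#Count time steps whose spatial-median speed exceeds an arbitrary threshold
n_outliers = int((vmag_med['vmag'] > 1000).sum())
print(f"{n_outliers} of {vmag_med.sizes['mid_date']} time steps exceed 1,000 m/yr")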

C. Dimensional computations#

In Part 2, we saw that the time series is dense in places and quite noisy. This section demonstrates different approaches for looking at a temporal signal in the dataset.

1) Temporal resampling#

Use Xarray’s resample() method to coarsen the dataset. With resample(), you can upsample or downsample the data, choosing both the resample frequency and the type of reduction.
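
The same pattern works for other frequencies and reductions. For example, an annual mean (a sketch, not used below) would look like:

#Annual mean as an alternative resampling (sketch)
glacier_annual_mean = single_glacier_raster.resample(mid_date='1YE').mean(dim='mid_date')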

#Coarsen dataset to a 2-month frequency
resample_obj = single_glacier_raster.resample(mid_date='2ME')
#Calculate the 2-month median
glacier_resample_2mo = resample_obj.median(dim='mid_date')

The mid_date dimension is now much less dense:

glacier_resample_2mo
<xarray.Dataset> Size: 18MB
Dimensions:                     (mid_date: 230, y: 37, x: 40)
Coordinates:
    mapping                     int64 8B 0
    spatial_ref                 int64 8B 0
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
  * mid_date                    (mid_date) datetime64[ns] 2kB 1986-09-30 ... ...
Data variables: (12/49)
    M11                         (mid_date, y, x) float32 1MB nan nan ... nan nan
    M11_dr_to_vr_factor         (mid_date) float32 920B nan nan nan ... nan nan
    M12                         (mid_date, y, x) float32 1MB nan nan ... nan nan
    M12_dr_to_vr_factor         (mid_date) float32 920B nan nan nan ... nan nan
    chip_size_height            (mid_date, y, x) float32 1MB nan nan ... nan nan
    chip_size_width             (mid_date, y, x) float32 1MB nan nan ... nan nan
    ...                          ...
    vy_error_slow               (mid_date) float32 920B 30.3 31.4 ... 16.65 26.8
    vy_error_stationary         (mid_date) float32 920B 30.2 31.45 ... 26.8
    vy_stable_shift             (mid_date) float32 920B 4.1 -1.0 ... -2.75 71.0
    vy_stable_shift_slow        (mid_date) float32 920B 4.2 -1.0 ... -2.75 71.0
    vy_stable_shift_stationary  (mid_date) float32 920B 4.1 -1.0 ... -2.75 71.0
    cov                         (mid_date) float64 2kB 0.0 0.2461 ... 0.0 0.0
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

Compare the 2-month resampled median time series to the full time series we plotted at the end of Part 2:

fig, ax = plt.subplots(figsize=(20,5))

#Calculate magnitude of velocity after the temporal reduction
vmag_2mo = calc_v_magnitude(glacier_resample_2mo)

#Calculate spatial median
vmag_2mo['vmag'].median(dim=['x','y']).plot(ax=ax)
#Plot full time series (spatial median) magnitude of velocity
vmag_med.plot.scatter(x='mid_date', y='vmag',ax=ax, marker='o', edgecolors='None',alpha=0.5, color='orange');

#Labels and formatting
fig.suptitle('2-month median magnitude of velocity (blue), full time series (orange)', fontsize=16)
ax.set_title(None)
ax.set_ylabel('m/y')
ax.set_xlabel('Time')
ax.set_ylim(0, 750);
../_images/90a40f88a80460665e8da3e49dde553697b84ef9ad383fa4ad4ac2445d5c12c3.png

We can make a few observations:

  • Computing two-month median velocities makes it easier to see a somewhat periodic velocity signal

  • As expected, the median is highly sensitive to the density of observations; the 2-month median curve displays greater amplitude during periods with sparser observations, and when the density of observations increases significantly in 2014, its amplitude of variability decreases dramatically (a quick check of observation density follows below).
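
We can look at that observation density directly (a minimal sketch reusing the resample_obj created above):

fig, ax = plt.subplots(figsize=(20,3))
#Number of image-pair time steps falling in each 2-month bin
resample_obj.count(dim='mid_date')['cov'].plot(ax=ax)
ax.set_ylabel('# observations per 2-month bin');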

2) Grouped analysis by season#

Xarray’s groupby() functionality allows us to segment the dataset into different groups along given dimensions. Here, we use that to analyze seasonal velocity variability patterns:

seasons_gb = single_glacier_raster.groupby(single_glacier_raster.mid_date.dt.season).median()
#add attrs to gb object
seasons_gb.attrs = single_glacier_raster.attrs 
#Reorder seasons
seasons_gb=seasons_gb.reindex({'season':['DJF','MAM','JJA','SON']})
seasons_gb
<xarray.Dataset> Size: 309kB
Dimensions:                     (x: 40, y: 37, season: 4)
Coordinates:
  * x                           (x) float64 320B 7.843e+05 ... 7.889e+05
  * y                           (y) float64 296B 3.316e+06 ... 3.311e+06
  * season                      (season) <U3 48B 'DJF' 'MAM' 'JJA' 'SON'
    mapping                     int64 8B 0
    spatial_ref                 int64 8B 0
Data variables: (12/49)
    M11                         (season, y, x) float32 24kB nan nan ... nan nan
    M11_dr_to_vr_factor         (season) float32 16B nan nan nan nan
    M12                         (season, y, x) float32 24kB nan nan ... nan nan
    M12_dr_to_vr_factor         (season) float32 16B nan nan nan nan
    chip_size_height            (season, y, x) float32 24kB nan nan ... nan nan
    chip_size_width             (season, y, x) float32 24kB nan nan ... nan nan
    ...                          ...
    vy_error_slow               (season) float32 16B 8.2 4.8 4.2 9.1
    vy_error_stationary         (season) float32 16B 8.2 4.8 4.2 9.1
    vy_stable_shift             (season) float32 16B 0.1 0.0 0.0 0.5
    vy_stable_shift_slow        (season) float32 16B 0.1 0.0 0.0 0.5
    vy_stable_shift_stationary  (season) float32 16B 0.1 0.0 0.0 0.5
    cov                         (season) float64 32B 0.0 0.0 0.0 0.0
Attributes: (12/19)
    Conventions:                CF-1.8
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               25-Sep-2023 22:00:23
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    skipped_granules:           s3://its-live-data/datacubes/v2/N30E090/ITS_L...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

This is cool; we’ve gone from our 3-d object with a very dense mid_date dimension to a 3-d object where the temporal aspect of the data is represented by 4 seasons.

In the above cell, we defined how we wanted to group our data (single_glacier_raster.mid_date.dt.season) and the reduction we wanted to apply to each group (median()). After the apply step, Xarray automatically combines the groups into a single object with a season dimension.

If you’d like to see another example of this with more detailed explanations, go here.
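
It can also be useful to sanity-check how many time steps fell into each seasonal group (a minimal sketch, not part of the original workflow):

#Number of mid_date time steps in each season
single_glacier_raster.mid_date.dt.season.to_series().value_counts()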

#Calculate magnitude on seasonal groupby object
seasons_vmag = calc_v_magnitude(seasons_gb)

Use another FacetGrid plot to visualize the seasons side-by-side:

fg = seasons_vmag.vmag.plot(
    col='season',cbar_kwargs={'label':'Meters / year'}
)
fg.fig.suptitle('Seasonal median velocity magnitude', fontsize=16, y=1.05)
fg.fig.supxlabel('X-coordinate of projection (meters)', fontsize=12, y=-0.05)
fg.fig.supylabel('Y-coordinate of projection (meters)', fontsize=12, x=-0.02)
for i in range(len(fg.axs[0])):
    fg.axs[0][i].set_ylabel(None)
    fg.axs[0][i].set_xlabel(None)
    fg.axs[0][i].tick_params(axis='x', labelrotation=45);
../_images/65f0faf08da6ecd9d7f335ac69a20498f7efbcad2714638f819d7c194ede431a.png

From the above FacetGrid plot, it appears that some regions of the glacier are very active (show high velocities) throughout the entire year. In other areas, it appears that glacier flow may be much more seasonal.
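
One way to put a number on that impression (a sketch, not part of the original workflow) is to map the per-pixel range of the seasonal medians; larger values indicate more seasonal flow:

#Per-pixel range of seasonal median speed (max season minus min season)
seasonal_range = seasons_vmag['vmag'].max(dim='season') - seasons_vmag['vmag'].min(dim='season')
seasonal_range.plot(cbar_kwargs={'label': 'Meters / year'});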

Conclusion

This was a deep dive into exploratory data analysis at the scale of an individual glacier. The last notebook in this chapter will demonstrate a regional-scale analysis.