Cascadia Landslide Data Post-Processing#

ShakeMap Integration and Unique ID Addition#

This notebook performs final post-processing on Cascadia landslide dataset:

  • Adding USGS M9 scenario ShakeMap seismic intensity data and adding unique identifiers to create the final dataset for visualization and analysis.

  • rainfall data from the rainfall notebook

ShakeMap Integration#

This notebook integrates USGS ShakeMap intensity data with Cascadia landslide location data. The ShakeMap data is available at https://earthquake.usgs.gov/scenarios/eventpage/cszm9ensemble_se/shakemap/intensity.

The process involves several key steps:

1. Data Loading and Parsing#

  • Load ShakeMap XML data containing seismic intensity measurements

  • Parse XML structure to extract grid fields and data values

  • Convert raw XML data into structured pandas DataFrame

2. Data Processing#

  • Extract seismic parameters: MMI (Modified Mercalli Intensity), PGA (Peak Ground Acceleration), PGV (Peak Ground Velocity), and PSA03

  • Create spatial interpolators using nearest neighbor interpolation

  • Analyze PGA value distributions across different intensity ranges

3. Landslide Data Integration#

  • Calculate centroids for landslide geometries

  • Interpolate seismic values at landslide locations using spatial interpolation

4. Visualization and Analysis#

  • Create choropleth maps showing PGA values at landslide locations

  • Generate statistical summaries of seismic intensities at landslide sites

  • Export final integrated dataset with both geometric and seismic attributes

import pandas as pd
import geopandas as gpd 
import numpy as np
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
from pyproj import Transformer
xml_file = 'shaking/grid.xml'
namespace = {'shakemap': 'http://earthquake.usgs.gov/eqcenter/shakemap'}

# Parse the XML file
tree = ET.parse(xml_file)
root = tree.getroot()
# Get the column names from the <grid_field> tags
fields = root.findall('shakemap:grid_field', namespace)
column_names = [field.get('name') for field in fields]
print(column_names)
['LON', 'LAT', 'MMI', 'PGA', 'PGV', 'PSA03', 'PSA10', 'PSA30', 'SVEL']
# Get the grid data, which is a long string of numbers
grid_data_text = root.find('shakemap:grid_data', namespace).text.strip()
# Use Numpy to convert the string into a numerical array
data_array = np.fromstring(grid_data_text, dtype=float, sep=' ')
# Reshape the 1D array into a 2D table (rows x columns)
num_columns = len(column_names)
data_array_2d = data_array.reshape(-1, num_columns)
# Create a pandas DataFrame from the 2D array
shakemap_df = pd.DataFrame(data_array_2d, columns=column_names)
print(shakemap_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438288 entries, 0 to 438287
Data columns (total 9 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   LON     438288 non-null  float64
 1   LAT     438288 non-null  float64
 2   MMI     438288 non-null  float64
 3   PGA     438288 non-null  float64
 4   PGV     438288 non-null  float64
 5   PSA03   438288 non-null  float64
 6   PSA10   438288 non-null  float64
 7   PSA30   438288 non-null  float64
 8   SVEL    438288 non-null  float64
dtypes: float64(9)
memory usage: 30.1 MB
None
print(len(shakemap_df))
print(shakemap_df.head())
438288
        LON      LAT  MMI    PGA    PGV  PSA03  PSA10  PSA30   SVEL
0 -127.2671  52.5574  4.3  3.148  3.693  7.551  3.124  1.259  900.0
1 -127.2472  52.5574  4.3  3.149  3.694  7.553  3.125  1.260  900.0
2 -127.2273  52.5574  4.3  3.150  3.695  7.555  3.126  1.261  866.7
3 -127.2073  52.5574  4.3  3.142  3.701  7.572  3.130  1.262  600.0
4 -127.1874  52.5574  4.3  3.131  3.710  7.596  3.137  1.263  738.5
# Define the bins for PGA values
# Bins are (0, 2], (2, 4], ..., (48, 50], (50, inf]
bins = list(range(0, 51, 2)) + [float('inf')]

# Create labels for the bins
labels = [f"{bins[i]}-{bins[i+1]}" for i in range(len(bins)-2)] + [f">50"]

# Use pd.cut to categorize PGA values into the defined bins
pga_binned = pd.cut(shakemap_df['PGA'], bins=bins, labels=labels, right=True, include_lowest=True)

# Count the number of values in each bin and sort by the bin range
pga_counts = pga_binned.value_counts().sort_index()

# Print the counts
print("PGA Value Counts in Different Ranges:")
print(pga_counts)
PGA Value Counts in Different Ranges:
PGA
0-2      135275
2-4       83712
4-6       41216
6-8       22319
8-10      18737
10-12     14483
12-14     12017
14-16      9822
16-18      8715
18-20      6961
20-22      6344
22-24      6000
24-26      6352
26-28      5276
28-30      4784
30-32      4371
32-34      3504
34-36      3141
36-38      3076
38-40      2748
40-42      2570
42-44      2509
44-46      2499
46-48      2589
48-50      2534
>50       26734
Name: count, dtype: int64

open dataset#

landslide_ds_path = "../PreProcessing/processed_geojson/washington_landslides_processed_v2.geojson"
ds_name = "washington"
landslides_gdf = gpd.read_file(landslide_ds_path)
from scipy.interpolate import NearestNDInterpolator
import numpy as np

points = np.column_stack((shakemap_df['LON'], shakemap_df['LAT']))

interpolators = {
    field: NearestNDInterpolator(points, shakemap_df[field].values)
    for field in ['MMI', 'PGA', 'PGV', 'PSA03']
}
# Calculate the centroid for all geometries. For points, the centroid is the point itself.
landslides_gdf['centroid'] = landslides_gdf.geometry.centroid

# Ensure the centroid coordinates are in WGS84 (EPSG:4326)
landslides_gdf['centroid'] = landslides_gdf['centroid'].to_crs(epsg=4326)

# Extract longitude and latitude from the centroid
cr = landslides_gdf['centroid']
landslides_gdf['lon'] = cr.x
landslides_gdf['lat'] = cr.y

# Drop rows where longitude or latitude is missing
landslides_gdf.dropna(subset=['lon', 'lat'], inplace=True)

# Prepare the query points for interpolation
query_points_combined = landslides_gdf[['lon', 'lat']].values

# Interpolate PGA, MMI, and PGV values using the existing interpolators
landslides_gdf['filter_PGA'] = interpolators['PGA'](query_points_combined)
landslides_gdf['filter_MMI'] = interpolators['MMI'](query_points_combined)
landslides_gdf['filter_PGV'] = interpolators['PGV'](query_points_combined)
landslides_gdf['filter_PSA03'] = interpolators['PSA03'](query_points_combined)

# Display the head of the updated dataframe with the new columns
print(landslides_gdf[['lon', 'lat', 'filter_PGA', 'filter_MMI', 'filter_PGV']].head())
          lon        lat  filter_PGA  filter_MMI  filter_PGV
0 -122.892431  46.993964       29.33         7.0       27.67
3 -122.942050  47.120019       24.45         7.3       34.02
5 -122.903901  47.131744       27.47         7.5       38.16
6 -123.009570  46.796240       27.92         6.7       21.45
7 -122.986060  47.137527       33.31         7.5       39.01
import contextily as cx

# Ensure the geometry is set to the 'centroid' column for plotting
landslides_gdf = landslides_gdf.set_geometry("centroid")
# Re-project the GeoDataFrame to Web Mercator (EPSG:3857) which is required by contextily
landslides_gdf_wm = landslides_gdf.to_crs(epsg=3857)    

# Create the plot
fig, ax = plt.subplots(figsize=(12, 12))
# Plot the landslide data
landslides_gdf_wm.plot(column='filter_PGA',
                                ax=ax, 
                                legend=True, 
                                cmap='viridis', 
                                alpha=0.7,
                                markersize=20,
                                legend_kwds={'label': "Peak Ground Acceleration (PGA %g)",
                                            'orientation': "horizontal"})   

# Add the basemap
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)

# Set title and remove axis labels for a cleaner map look
ax.set_title('PGA Values at Landslide Locations')
ax.set_axis_off()
plt.show()
../../_images/1958238b30f13b56058ab53961e4fee85731488e6616b6e9a833b76a6f9cee86.png
landslides_gdf.head()
LANDSLIDE_ID MATERIAL MOVEMENT MOVE_CODE CONFIDENCE RELATIVE_AGE YEAR_MOVE FIELD_VERIFIED SLOPE_DEG HS_HEIGHT_FT ... filter_MATERIAL filter_CONFIDENCE geometry centroid lon lat filter_PGA filter_MMI filter_PGV filter_PSA03
0 18414 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 11.0 3.0 ... Earth or debris Low (1-10) MULTIPOLYGON (((1043516.638 614643.978, 104351... POINT (-122.89243 46.99396) -122.892431 46.993964 29.33 7.0 27.67 70.45
3 20661 Earth or debris Topple ET Moderate (11-29) Pre-historic (>150 years) None No 61.0 24.0 ... Earth or debris Moderate (11-29) MULTIPOLYGON (((1032543.812 660981.578, 103254... POINT (-122.94205 47.12002) -122.942050 47.120019 24.45 7.3 34.02 64.60
5 20392 Earth or debris Slide-Rotational ES-R Low (1-10) Pre-historic (>150 years) None No 34.0 15.0 ... Earth or debris Low (1-10) MULTIPOLYGON (((1042187.543 664959.22, 1042188... POINT (-122.9039 47.13174) -122.903901 47.131744 27.47 7.5 38.16 78.29
6 18573 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 29.0 6.0 ... Earth or debris Low (1-10) MULTIPOLYGON (((1011999.104 543485.112, 101200... POINT (-123.00957 46.79624) -123.009570 46.796240 27.92 6.7 21.45 81.83
7 20357 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 20.0 9.0 ... Earth or debris Low (1-10) MULTIPOLYGON (((1021808.655 667705.559, 102181... POINT (-122.98606 47.13753) -122.986060 47.137527 33.31 7.5 39.01 79.19

5 rows × 58 columns

Rainfall#

30 years average annual rainfall data from: 30-Year (1990-2019) Annual Average of DAYMET Precipitation and Temperature for North America https://doi.org/10.5066/P9E0JZ82

import rasterio
from rasterio.transform import rowcol

rain_tif = "./rainfall-data/PPT9019.tif"

with rasterio.open(rain_tif) as src:
    # Transform lon/lat (EPSG:4326) -> raster CRS (NAD83 / Albers)
    tf = Transformer.from_crs("EPSG:4326", src.crs, always_xy=True)
    xs, ys = tf.transform(landslides_gdf["lon"].values, landslides_gdf["lat"].values)

    # Identify which transformed points fall inside the raster bounds
    b = src.bounds
    in_bounds = (
        (xs >= b.left) & (xs <= b.right) &
        (ys >= b.bottom) & (ys <= b.top)
    )

    # Prepare output array, default to NaN
    vals = np.full(len(xs), np.nan, dtype="float64")

    # Sample only the points that are in-bounds
    coords_in = np.column_stack([xs[in_bounds], ys[in_bounds]])
    if len(coords_in) > 0:
        sampled = list(src.sample(coords_in))  # each is an array of band values
        sampled = np.array([s[0] for s in sampled], dtype="float64")

        # Replace raster NoData with NaN, if defined
        if src.nodata is not None:
            sampled[sampled == src.nodata] = np.nan

        vals[in_bounds] = sampled

# Add the column to your GeoDataFrame
landslides_gdf["filter_RAINFALL"] = vals
landslides_gdf.head()
LANDSLIDE_ID MATERIAL MOVEMENT MOVE_CODE CONFIDENCE RELATIVE_AGE YEAR_MOVE FIELD_VERIFIED SLOPE_DEG HS_HEIGHT_FT ... filter_CONFIDENCE geometry centroid lon lat filter_PGA filter_MMI filter_PGV filter_PSA03 filter_RAINFALL
0 18414 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 11.0 3.0 ... Low (1-10) MULTIPOLYGON (((1043516.638 614643.978, 104351... POINT (-122.89243 46.99396) -122.892431 46.993964 29.33 7.0 27.67 70.45 1384.876221
3 20661 Earth or debris Topple ET Moderate (11-29) Pre-historic (>150 years) None No 61.0 24.0 ... Moderate (11-29) MULTIPOLYGON (((1032543.812 660981.578, 103254... POINT (-122.94205 47.12002) -122.942050 47.120019 24.45 7.3 34.02 64.60 1503.936035
5 20392 Earth or debris Slide-Rotational ES-R Low (1-10) Pre-historic (>150 years) None No 34.0 15.0 ... Low (1-10) MULTIPOLYGON (((1042187.543 664959.22, 1042188... POINT (-122.9039 47.13174) -122.903901 47.131744 27.47 7.5 38.16 78.29 1421.002563
6 18573 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 29.0 6.0 ... Low (1-10) MULTIPOLYGON (((1011999.104 543485.112, 101200... POINT (-123.00957 46.79624) -123.009570 46.796240 27.92 6.7 21.45 81.83 1401.459595
7 20357 Earth or debris Flow EFL Low (1-10) Pre-historic (>150 years) None No 20.0 9.0 ... Low (1-10) MULTIPOLYGON (((1021808.655 667705.559, 102181... POINT (-122.98606 47.13753) -122.986060 47.137527 33.31 7.5 39.01 79.19 1567.274048

5 rows × 59 columns

import contextily as cx

# Ensure the geometry is set to the 'centroid' column for plotting
landslides_gdf = landslides_gdf.set_geometry("centroid")
# Re-project the GeoDataFrame to Web Mercator (EPSG:3857) which is required by contextily
landslides_gdf_wm = landslides_gdf.to_crs(epsg=3857)    

# Create the plot
fig, ax = plt.subplots(figsize=(12, 12))
# Plot the landslide data
landslides_gdf_wm.plot(column='filter_RAINFALL',
                                ax=ax, 
                                legend=True, 
                                cmap='viridis', 
                                alpha=0.7,
                                markersize=20,
                                legend_kwds={'label': "rainfall in mm/year",
                                            'orientation': "horizontal"})   

# Add the basemap
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)

# Set title and remove axis labels for a cleaner map look
ax.set_title('rainfall Values at Landslide Locations')
ax.set_axis_off()
plt.show()
../../_images/b4a5985fde790d9017b05bd1dd97726ec458d0f0727b9898cea36347d36c0d69.png

Add Unique ID#

Generate sequential unique identifiers for all landslide records

# Add a unique ID column to the landslides_gdf
landslides_gdf['viewer_id'] = [
    f"{origin}_{i+1}" for i, origin in enumerate(landslides_gdf['filter_ORIGIN'])
]

print(f"Added unique_id column with {len(landslides_gdf)} unique IDs")
print(landslides_gdf[['filter_ORIGIN', 'viewer_id']].head())
Added unique_id column with 60919 unique IDs
  filter_ORIGIN     viewer_id
0    WASHINGTON  WASHINGTON_1
3    WASHINGTON  WASHINGTON_2
5    WASHINGTON  WASHINGTON_3
6    WASHINGTON  WASHINGTON_4
7    WASHINGTON  WASHINGTON_5

Save the Landslides#

# Keep the original geometry column and remove centroid
gdf_export_original = landslides_gdf.copy()
gdf_export_original = gdf_export_original.set_geometry('geometry')  # Set original geometry as active
gdf_export_original = gdf_export_original.drop('centroid', axis=1)  # Remove the centroid geometry column

# Export with original geometry
gdf_export_original.to_file(f"final_{ds_name}_v2.geojson", driver="GeoJSON")
print(f"✓ Successfully saved with original geometry to: final_{ds_name}_v2.geojson")
✓ Successfully saved with original geometry to: final_washington_v2.geojson