Cascadia Landslide Data Post-Processing

Cascadia Landslide Data Post-Processing#

ShakeMap Integration and Unique ID Addition#

This notebook performs final post-processing on Cascadia landslide dataset:

Adding USGS M9 scenario ShakeMap seismic intensity data and adding unique identifiers to create the final dataset for visualization and analysis.
rainfall data from the rainfall notebook

ShakeMap Integration#

This notebook integrates USGS ShakeMap intensity data with Cascadia landslide location data. The ShakeMap data is available at https://earthquake.usgs.gov/scenarios/eventpage/cszm9ensemble_se/shakemap/intensity.

The process involves several key steps:

1. Data Loading and Parsing#

Load ShakeMap XML data containing seismic intensity measurements
Parse XML structure to extract grid fields and data values
Convert raw XML data into structured pandas DataFrame

2. Data Processing#

Extract seismic parameters: MMI (Modified Mercalli Intensity), PGA (Peak Ground Acceleration), PGV (Peak Ground Velocity), and PSA03
Create spatial interpolators using nearest neighbor interpolation
Analyze PGA value distributions across different intensity ranges

3. Landslide Data Integration#

Calculate centroids for landslide geometries
Interpolate seismic values at landslide locations using spatial interpolation

4. Visualization and Analysis#

Create choropleth maps showing PGA values at landslide locations
Generate statistical summaries of seismic intensities at landslide sites
Export final integrated dataset with both geometric and seismic attributes

import pandas as pd
import geopandas as gpd 
import numpy as np
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
from pyproj import Transformer

xml_file = 'shaking/grid.xml'
namespace = {'shakemap': 'http://earthquake.usgs.gov/eqcenter/shakemap'}

# Parse the XML file
tree = ET.parse(xml_file)
root = tree.getroot()

# Get the column names from the <grid_field> tags
fields = root.findall('shakemap:grid_field', namespace)
column_names = [field.get('name') for field in fields]

print(column_names)

['LON', 'LAT', 'MMI', 'PGA', 'PGV', 'PSA03', 'PSA10', 'PSA30', 'SVEL']

# Get the grid data, which is a long string of numbers
grid_data_text = root.find('shakemap:grid_data', namespace).text.strip()

# Use Numpy to convert the string into a numerical array
data_array = np.fromstring(grid_data_text, dtype=float, sep=' ')

# Reshape the 1D array into a 2D table (rows x columns)
num_columns = len(column_names)
data_array_2d = data_array.reshape(-1, num_columns)

# Create a pandas DataFrame from the 2D array
shakemap_df = pd.DataFrame(data_array_2d, columns=column_names)

print(shakemap_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438288 entries, 0 to 438287
Data columns (total 9 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   LON     438288 non-null  float64
 1   LAT     438288 non-null  float64
 2   MMI     438288 non-null  float64
 3   PGA     438288 non-null  float64
 4   PGV     438288 non-null  float64
 5   PSA03   438288 non-null  float64
 6   PSA10   438288 non-null  float64
 7   PSA30   438288 non-null  float64
 8   SVEL    438288 non-null  float64
dtypes: float64(9)
memory usage: 30.1 MB
None

print(len(shakemap_df))
print(shakemap_df.head())

438288
        LON      LAT  MMI    PGA    PGV  PSA03  PSA10  PSA30   SVEL
-127.2671  52.5574  4.3  3.148  3.693  7.551  3.124  1.259  900.0
-127.2472  52.5574  4.3  3.149  3.694  7.553  3.125  1.260  900.0
-127.2273  52.5574  4.3  3.150  3.695  7.555  3.126  1.261  866.7
-127.2073  52.5574  4.3  3.142  3.701  7.572  3.130  1.262  600.0
-127.1874  52.5574  4.3  3.131  3.710  7.596  3.137  1.263  738.5

# Define the bins for PGA values
# Bins are (0, 2], (2, 4], ..., (48, 50], (50, inf]
bins = list(range(0, 51, 2)) + [float('inf')]

# Create labels for the bins
labels = [f"{bins[i]}-{bins[i+1]}" for i in range(len(bins)-2)] + [f">50"]

# Use pd.cut to categorize PGA values into the defined bins
pga_binned = pd.cut(shakemap_df['PGA'], bins=bins, labels=labels, right=True, include_lowest=True)

# Count the number of values in each bin and sort by the bin range
pga_counts = pga_binned.value_counts().sort_index()

# Print the counts
print("PGA Value Counts in Different Ranges:")
print(pga_counts)

PGA Value Counts in Different Ranges:
PGA
0-2      135275
2-4       83712
4-6       41216
6-8       22319
8-10      18737
10-12     14483
12-14     12017
14-16      9822
16-18      8715
18-20      6961
20-22      6344
22-24      6000
24-26      6352
26-28      5276
28-30      4784
30-32      4371
32-34      3504
34-36      3141
36-38      3076
38-40      2748
40-42      2570
42-44      2509
44-46      2499
46-48      2589
48-50      2534
>50       26734
Name: count, dtype: int64

open dataset#

landslide_ds_path = "../PreProcessing/processed_geojson/washington_landslides_processed_v2.geojson"
ds_name = "washington"
landslides_gdf = gpd.read_file(landslide_ds_path)

from scipy.interpolate import NearestNDInterpolator
import numpy as np

points = np.column_stack((shakemap_df['LON'], shakemap_df['LAT']))

interpolators = {
    field: NearestNDInterpolator(points, shakemap_df[field].values)
    for field in ['MMI', 'PGA', 'PGV', 'PSA03']
}

# Calculate the centroid for all geometries. For points, the centroid is the point itself.
landslides_gdf['centroid'] = landslides_gdf.geometry.centroid

# Ensure the centroid coordinates are in WGS84 (EPSG:4326)
landslides_gdf['centroid'] = landslides_gdf['centroid'].to_crs(epsg=4326)

# Extract longitude and latitude from the centroid
cr = landslides_gdf['centroid']
landslides_gdf['lon'] = cr.x
landslides_gdf['lat'] = cr.y

# Drop rows where longitude or latitude is missing
landslides_gdf.dropna(subset=['lon', 'lat'], inplace=True)

# Prepare the query points for interpolation
query_points_combined = landslides_gdf[['lon', 'lat']].values

# Interpolate PGA, MMI, and PGV values using the existing interpolators
landslides_gdf['filter_PGA'] = interpolators['PGA'](query_points_combined)
landslides_gdf['filter_MMI'] = interpolators['MMI'](query_points_combined)
landslides_gdf['filter_PGV'] = interpolators['PGV'](query_points_combined)
landslides_gdf['filter_PSA03'] = interpolators['PSA03'](query_points_combined)

# Display the head of the updated dataframe with the new columns
print(landslides_gdf[['lon', 'lat', 'filter_PGA', 'filter_MMI', 'filter_PGV']].head())

          lon        lat  filter_PGA  filter_MMI  filter_PGV
-122.892431  46.993964       29.33         7.0       27.67
-122.942050  47.120019       24.45         7.3       34.02
-122.903901  47.131744       27.47         7.5       38.16
-123.009570  46.796240       27.92         6.7       21.45
-122.986060  47.137527       33.31         7.5       39.01

import contextily as cx

# Ensure the geometry is set to the 'centroid' column for plotting
landslides_gdf = landslides_gdf.set_geometry("centroid")
# Re-project the GeoDataFrame to Web Mercator (EPSG:3857) which is required by contextily
landslides_gdf_wm = landslides_gdf.to_crs(epsg=3857)    

# Create the plot
fig, ax = plt.subplots(figsize=(12, 12))
# Plot the landslide data
landslides_gdf_wm.plot(column='filter_PGA',
                                ax=ax, 
                                legend=True, 
                                cmap='viridis', 
                                alpha=0.7,
                                markersize=20,
                                legend_kwds={'label': "Peak Ground Acceleration (PGA %g)",
                                            'orientation': "horizontal"})   

# Add the basemap
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)

# Set title and remove axis labels for a cleaner map look
ax.set_title('PGA Values at Landslide Locations')
ax.set_axis_off()
plt.show()

../../_images/1958238b30f13b56058ab53961e4fee85731488e6616b6e9a833b76a6f9cee86.png

landslides_gdf.head()

	LANDSLIDE_ID	MATERIAL	MOVEMENT	MOVE_CODE	CONFIDENCE	RELATIVE_AGE	YEAR_MOVE	FIELD_VERIFIED	SLOPE_DEG	HS_HEIGHT_FT	...	filter_MATERIAL	filter_CONFIDENCE	geometry	centroid	lon	lat	filter_PGA	filter_MMI	filter_PGV	filter_PSA03
0	18414	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	11.0	3.0	...	Earth or debris	Low (1-10)	MULTIPOLYGON (((1043516.638 614643.978, 104351...	POINT (-122.89243 46.99396)	-122.892431	46.993964	29.33	7.0	27.67	70.45
3	20661	Earth or debris	Topple	ET	Moderate (11-29)	Pre-historic (>150 years)	None	No	61.0	24.0	...	Earth or debris	Moderate (11-29)	MULTIPOLYGON (((1032543.812 660981.578, 103254...	POINT (-122.94205 47.12002)	-122.942050	47.120019	24.45	7.3	34.02	64.60
5	20392	Earth or debris	Slide-Rotational	ES-R	Low (1-10)	Pre-historic (>150 years)	None	No	34.0	15.0	...	Earth or debris	Low (1-10)	MULTIPOLYGON (((1042187.543 664959.22, 1042188...	POINT (-122.9039 47.13174)	-122.903901	47.131744	27.47	7.5	38.16	78.29
6	18573	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	29.0	6.0	...	Earth or debris	Low (1-10)	MULTIPOLYGON (((1011999.104 543485.112, 101200...	POINT (-123.00957 46.79624)	-123.009570	46.796240	27.92	6.7	21.45	81.83
7	20357	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	20.0	9.0	...	Earth or debris	Low (1-10)	MULTIPOLYGON (((1021808.655 667705.559, 102181...	POINT (-122.98606 47.13753)	-122.986060	47.137527	33.31	7.5	39.01	79.19

5 rows × 58 columns

Rainfall#

30 years average annual rainfall data from: 30-Year (1990-2019) Annual Average of DAYMET Precipitation and Temperature for North America https://doi.org/10.5066/P9E0JZ82

import rasterio
from rasterio.transform import rowcol

rain_tif = "./rainfall-data/PPT9019.tif"

with rasterio.open(rain_tif) as src:
    # Transform lon/lat (EPSG:4326) -> raster CRS (NAD83 / Albers)
    tf = Transformer.from_crs("EPSG:4326", src.crs, always_xy=True)
    xs, ys = tf.transform(landslides_gdf["lon"].values, landslides_gdf["lat"].values)

    # Identify which transformed points fall inside the raster bounds
    b = src.bounds
    in_bounds = (
        (xs >= b.left) & (xs <= b.right) &
        (ys >= b.bottom) & (ys <= b.top)
    )

    # Prepare output array, default to NaN
    vals = np.full(len(xs), np.nan, dtype="float64")

    # Sample only the points that are in-bounds
    coords_in = np.column_stack([xs[in_bounds], ys[in_bounds]])
    if len(coords_in) > 0:
        sampled = list(src.sample(coords_in))  # each is an array of band values
        sampled = np.array([s[0] for s in sampled], dtype="float64")

        # Replace raster NoData with NaN, if defined
        if src.nodata is not None:
            sampled[sampled == src.nodata] = np.nan

        vals[in_bounds] = sampled

# Add the column to your GeoDataFrame
landslides_gdf["filter_RAINFALL"] = vals
landslides_gdf.head()

	LANDSLIDE_ID	MATERIAL	MOVEMENT	MOVE_CODE	CONFIDENCE	RELATIVE_AGE	YEAR_MOVE	FIELD_VERIFIED	SLOPE_DEG	HS_HEIGHT_FT	...	filter_CONFIDENCE	geometry	centroid	lon	lat	filter_PGA	filter_MMI	filter_PGV	filter_PSA03	filter_RAINFALL
0	18414	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	11.0	3.0	...	Low (1-10)	MULTIPOLYGON (((1043516.638 614643.978, 104351...	POINT (-122.89243 46.99396)	-122.892431	46.993964	29.33	7.0	27.67	70.45	1384.876221
3	20661	Earth or debris	Topple	ET	Moderate (11-29)	Pre-historic (>150 years)	None	No	61.0	24.0	...	Moderate (11-29)	MULTIPOLYGON (((1032543.812 660981.578, 103254...	POINT (-122.94205 47.12002)	-122.942050	47.120019	24.45	7.3	34.02	64.60	1503.936035
5	20392	Earth or debris	Slide-Rotational	ES-R	Low (1-10)	Pre-historic (>150 years)	None	No	34.0	15.0	...	Low (1-10)	MULTIPOLYGON (((1042187.543 664959.22, 1042188...	POINT (-122.9039 47.13174)	-122.903901	47.131744	27.47	7.5	38.16	78.29	1421.002563
6	18573	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	29.0	6.0	...	Low (1-10)	MULTIPOLYGON (((1011999.104 543485.112, 101200...	POINT (-123.00957 46.79624)	-123.009570	46.796240	27.92	6.7	21.45	81.83	1401.459595
7	20357	Earth or debris	Flow	EFL	Low (1-10)	Pre-historic (>150 years)	None	No	20.0	9.0	...	Low (1-10)	MULTIPOLYGON (((1021808.655 667705.559, 102181...	POINT (-122.98606 47.13753)	-122.986060	47.137527	33.31	7.5	39.01	79.19	1567.274048

5 rows × 59 columns

import contextily as cx

# Ensure the geometry is set to the 'centroid' column for plotting
landslides_gdf = landslides_gdf.set_geometry("centroid")
# Re-project the GeoDataFrame to Web Mercator (EPSG:3857) which is required by contextily
landslides_gdf_wm = landslides_gdf.to_crs(epsg=3857)    

# Create the plot
fig, ax = plt.subplots(figsize=(12, 12))
# Plot the landslide data
landslides_gdf_wm.plot(column='filter_RAINFALL',
                                ax=ax, 
                                legend=True, 
                                cmap='viridis', 
                                alpha=0.7,
                                markersize=20,
                                legend_kwds={'label': "rainfall in mm/year",
                                            'orientation': "horizontal"})   

# Add the basemap
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)

# Set title and remove axis labels for a cleaner map look
ax.set_title('rainfall Values at Landslide Locations')
ax.set_axis_off()
plt.show()

../../_images/b4a5985fde790d9017b05bd1dd97726ec458d0f0727b9898cea36347d36c0d69.png

Add Unique ID#

Generate sequential unique identifiers for all landslide records

# Add a unique ID column to the landslides_gdf
landslides_gdf['viewer_id'] = [
    f"{origin}_{i+1}" for i, origin in enumerate(landslides_gdf['filter_ORIGIN'])
]

print(f"Added unique_id column with {len(landslides_gdf)} unique IDs")
print(landslides_gdf[['filter_ORIGIN', 'viewer_id']].head())

Added unique_id column with 60919 unique IDs
  filter_ORIGIN     viewer_id
  WASHINGTON  WASHINGTON_1
  WASHINGTON  WASHINGTON_2
  WASHINGTON  WASHINGTON_3
  WASHINGTON  WASHINGTON_4
  WASHINGTON  WASHINGTON_5

Save the Landslides#

# Keep the original geometry column and remove centroid
gdf_export_original = landslides_gdf.copy()
gdf_export_original = gdf_export_original.set_geometry('geometry')  # Set original geometry as active
gdf_export_original = gdf_export_original.drop('centroid', axis=1)  # Remove the centroid geometry column

# Export with original geometry
gdf_export_original.to_file(f"final_{ds_name}_v2.geojson", driver="GeoJSON")
print(f"✓ Successfully saved with original geometry to: final_{ds_name}_v2.geojson")

✓ Successfully saved with original geometry to: final_washington_v2.geojson