Oregon Landslide Data Preprocessing

Oregon Landslide Data Preprocessing#

Overview#

This notebook processes landslide data from Oregon’s Statewide Landslide Information Database for Oregon (SLIDO) into a standardized format for integration with other regional datasets.

The workflow includes:

Data Loading: Load GeoDataFrame layers from SLIDO database
Deposits Analysis: Examine the main deposits layer structure and data quality
References Processing: Load and integrate citation references from the References layer. Merge reference information with landslide records using REF_ID_COD
Data Standardization: Standardize confidence levels, movement types, and material classifications
Column Selection: Select and rename columns to match unified schema
Quality Assurance: Validate data completeness and geometry integrity
Export: Save processed data as GeoJSON for further analysis

Key Data Layers:

Deposits: Landslide records with detailed attributes
References: 376 citation records linked via REF_ID_COD

Obtain the State Statewide Landslide Information Database for Oregon (SLIDO) from the official DOGAMI website: https://www.oregon.gov/dogami/slido/Pages/data.aspx

Create a directory ‘Data’ in your projects root folder and place the downloaded files inside it.

Initialization#

import fiona
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

gdb_dir = "../data/Oregon/SLIDO_Release_4p5_wMetadata.gdb"
layers = fiona.listlayers(gdb_dir)
print("Available layers in the GDB:")
for lyr in layers:
    df = gpd.read_file(gdb_dir, layer=lyr)
    print(f"{lyr:35s} {len(df):6d} data points")

Available layers in the GDB:
Index_LS_Studies                        72 data points

C:\Users\loicb\anaconda3\envs\landslide\Lib\site-packages\pyogrio\raw.py:198: RuntimeWarning: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow.  You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined
  return ogr_read(

Detailed_Deep_Landslide_Susceptibility      4 data points
References                             376 data points
Detailed_Susceptibility_Map_Index       12 data points
Historic_Landslide_Points            15386 data points
Scarp_Flanks                         28650 data points
Scarps                               55824 data points
Deposits                             71318 data points
Inventory_Map_Index                   1569 data points
Inventory_Map_Index_2                  355 data points
Inventory_Map_Index_3                  355 data points
References_Check_In_Map_Index_3         21 data points

deposits = gpd.read_file(gdb_dir, layer="Deposits")

deposits.plot(figsize=(8, 6), edgecolor="k", linewidth=0.2)
plt.title("Oregon Landslide Inventory")
plt.axis("off")
plt.show()

../../_images/52a01deaabb9183d1ac3a72c67a86d5d9053c2a34ad15046c4b64b39e02aede1.png

Initial Inspection#

deposits = gpd.read_file(gdb_dir, layer="Deposits")

print("Total number of deposits:", len(deposits))

Total number of deposits: 71318

print(deposits.shape)
print(deposits.dtypes)

(71318, 33)
UNIQUE_ID         object
TYPE_MOVE         object
MOVE_CLASS        object
MOVE_CODE         object
CONFIDENCE        object
AGE               object
DATE_MOVE         object
NAME              object
GEOL              object
SLOPE            float32
HS_HEIGHT        float32
FAN_HEIGHT       float32
FAIL_DEPTH       float32
DEEP_SHAL         object
HS_IS1           float32
IS1_IS2          float32
IS2_IS3          float32
IS3_IS4          float32
HD_AVE           float32
DIRECT           float32
AREA             float32
VOL              float32
REF_ID_COD        object
MAP_UNIT_L        object
DESCRIPTION       object
YEAR             float64
DATE_RANGE        object
REACTIVATION      object
MONTH             object
DAY               object
Shape_Length     float64
Shape_Area       float64
geometry        geometry
dtype: object

import pandas as pd

def completeness(df: pd.DataFrame):
    total = len(df)
    comp = df.count()               # non-null counts per column
    pct = (comp / total * 100).round(1)
    return pd.DataFrame({"non_null": comp, "% filled": pct})


df_completeness = completeness(deposits)

df_completeness.sort_values(by='% filled', ascending=False)

	non_null	% filled
UNIQUE_ID	71318	100.0
REF_ID_COD	71318	100.0
Shape_Area	71318	100.0
Shape_Length	71318	100.0
DESCRIPTION	71318	100.0
geometry	71318	100.0
AREA	61271	85.9
SLOPE	60079	84.2
VOL	58717	82.3
DIRECT	58062	81.4
FAIL_DEPTH	50489	70.8
HS_HEIGHT	50453	70.7
HD_AVE	46342	65.0
FAN_HEIGHT	44436	62.3
AGE	42388	59.4
MOVE_CLASS	42357	59.4
TYPE_MOVE	42383	59.4
MOVE_CODE	42271	59.3
CONFIDENCE	42127	59.1
GEOL	41432	58.1
HS_IS1	38187	53.5
IS1_IS2	37122	52.1
IS2_IS3	36299	50.9
IS3_IS4	35940	50.4
MAP_UNIT_L	34199	48.0
DEEP_SHAL	29868	41.9
NAME	4088	5.7
DATE_MOVE	3932	5.5
DATE_RANGE	386	0.5
YEAR	265	0.4
MONTH	81	0.1
DAY	42	0.1
REACTIVATION	21	0.0

New Columns#

For filtering and aligning with the other cascadia datasets, we are computing some new columns. They will all start with the prefix “filter_”:

Material: [Debris, Earth, Rock, Complex, Water, Submarine]
Mouvement: [Flow, Complex, Slide, Slide-Rotational, Slide-Translational, Avalanche, Flood, Deformation, Topple, Spread, Submarine]
Confidence: [High, Medium, Low]
Dataset link: URL to dataset
Reference: Actual reference from the original data collection

the original dataset is concerved

Later in the post processing notebook, we will add more columns from external datasets

numerical_cols = deposits.select_dtypes(include=['number']).columns.tolist()
non_numerical_cols = deposits.select_dtypes(exclude=['number']).columns.tolist()

print("Numerical Columns:")
for col in numerical_cols:
    print(f"  - {col}")

print("\nNon-Numerical Columns:")
for col in non_numerical_cols:
    print(f"  - {col}")

Numerical Columns:
  - SLOPE
  - HS_HEIGHT
  - FAN_HEIGHT
  - FAIL_DEPTH
  - HS_IS1
  - IS1_IS2
  - IS2_IS3
  - IS3_IS4
  - HD_AVE
  - DIRECT
  - AREA
  - VOL
  - YEAR
  - Shape_Length
  - Shape_Area

Non-Numerical Columns:
  - UNIQUE_ID
  - TYPE_MOVE
  - MOVE_CLASS
  - MOVE_CODE
  - CONFIDENCE
  - AGE
  - DATE_MOVE
  - NAME
  - GEOL
  - DEEP_SHAL
  - REF_ID_COD
  - MAP_UNIT_L
  - DESCRIPTION
  - DATE_RANGE
  - REACTIVATION
  - MONTH
  - DAY
  - geometry

Out of the above columns we have to change MOVE_CLASS, MOVE_CODE, CONFIDENCE,

Changed Columns#

Confidence#

Old Confidence Level#

print("\nValue counts for 'CONFIDENCE':")
print(deposits['CONFIDENCE'].value_counts())

plt.figure(figsize=(8, 6))
confidence_counts = deposits['CONFIDENCE'].value_counts()

sns.barplot(x=confidence_counts.index, y=confidence_counts.values, palette='coolwarm', hue=confidence_counts.index)

plt.title('Distribution of Confidence Levels', fontsize=16)
plt.xlabel('Confidence Level', fontsize=12)
plt.ylabel('Number of Deposits', fontsize=12)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Value counts for 'CONFIDENCE':
CONFIDENCE
Moderate (11-29)     21774
High (=>30)          14541
Low (=<10)            5723
Moderate (20-30)        46
High (>30)              26
Low (<20)               10
Moderate (11-29)         5
High (=<30)              1
Moderate (15-29)         1
Name: count, dtype: int64

../../_images/7e71d82d0d2cc49620573e077a8c12ad171634e0a2c956fac4f887e63475ad3a.png

Here we have a lot of different confidence ranges. We only consider the major class not the ranges. So, values having low will be low and moderate will be moderate even if there is no class conflicts.

We make this decision because there are only a few data points that fall outside the three major classes. So, we try to align this extremely small minority into major classes.

New Confidence Level#

deposits['filter_CONFIDENCE'] = deposits['CONFIDENCE'].str.extract(r'^(High|Moderate|Low)')

print("Value counts for 'NEW_CONFIDENCE':")
print(deposits['filter_CONFIDENCE'].value_counts())

Value counts for 'NEW_CONFIDENCE':
filter_CONFIDENCE
Moderate    21826
High        14568
Low          5733
Name: count, dtype: int64

A quick math show that we have assigned the correct number of landslide deposits to each class.

plt.figure(figsize=(8,6))

new_confidence_counts = deposits['filter_CONFIDENCE'].value_counts()

sns.barplot(x=new_confidence_counts.index, y=new_confidence_counts.values, palette='coolwarm', hue=new_confidence_counts.index)

plt.title('Distribution of New Confidence Levels', fontsize=16)
plt.xlabel('New Confidence Level', fontsize=12)
plt.ylabel('Number of Deposits', fontsize=12)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

../../_images/5b99dbda8404784a0b78f2b7d6bdceb7570a5da91062b38146139437f401fd3c.png

MOVE CODE#

# print("\nValue counts for 'MOVE_CODE':")
# print(deposits['MOVE_CODE'].value_counts())

print("MOVE_CODE counts (sorted by frequency):")
move_code_counts = deposits['MOVE_CODE'].value_counts()

sorted_move_codes = move_code_counts.sort_values(ascending=False)

for move_code, count in sorted_move_codes.items():
    print(f"{move_code}: {count}")
    print()  # Empty line for readability

MOVE_CODE counts (sorted by frequency):
DFL: 12752

EFL: 6919

RS-R+EFL: 4518

RS-R: 4407

RFL+EFL: 2948

RS-T: 2830

DS-T: 1800

RFL: 1665

RS-T+EFL: 798

ES-R: 777

ES-R+EFL: 736

RF: 733

ES-T: 270

ES-R + EFL: 145

DS-R: 132

RS-T+RS-R: 73

RS-R+RS-T+EFL: 68

RS-R+RS-T: 66

C: 61

RS-T+RS-R+EFL: 51

ES-R-EFL: 36

RS-T+ES-R+EFL: 34

ES-T+EFL: 29

EF: 26

RFL+RF: 25

EFL+RS-R: 19

RT+EFL: 18

RS-R+RFL: 15

RF+DS-T: 15

DS-R-DFL: 13

RFL+EF: 13

C-EFL: 12

RFL+DFL: 12

RS-T+RFL: 12

EFL+DFL: 12

C+RS-R+EFL: 11

DS-R+DFL: 11

DS-T-DFL: 9

C+RS-T+EFL: 9

RF+EF: 9

RS-R-DFL: 9

RS-R+EFL+RF: 8

RFL+RS-R: 8

EFL+RS-T: 7

RF+RS-T: 7

RS-R+DFL: 7

RFL+RS-T: 5

RS-R-EFL: 4

DS-T+DFL: 4

RS-R+ES-T+EFL: 4

RS-T+RF: 3

EFL+RF: 3

RS-R+ES-R: 3

ES-T+ES-R: 3

RS-T+RS-T: 3

DFL+EFL: 3

ES-R-DFL: 3

RF : 3

EFL+EFL: 3

RS-R+RF+EFL: 2

EFL
EFL
EFL: 2

DS-T+RS-T: 2

RS-T+EFL+RF: 2

ELF: 2

RF+EFL: 2

ES-R+EF: 2

RS-R_EFL: 2

RS-T+DFL: 2

DSP: 2

RS_T: 2

RS-R+RFL+EFL: 2

ES-R+DFL: 1

RS-R-RS-T: 1

RS-T+RS-R-EFL: 1

DFL-ES-R: 1

RS-T-DFL: 1

RS-R+RS-T-EFL: 1

ESL+EFL: 1

EFL
EFL: 1

DS-R-EFL: 1

RS-T+EFFL: 1

RF+DS-T+DF: 1

ES-R+ES-T+EFL: 1

RFL+RS-T+EFL: 1

RS-T+RS-R+RFL: 1

DS_R: 1

RS-T+RF+EFL: 1

RFL+EFL+RS-R: 1

RFL+RS-R+EFL: 1

RS-R+RS-T+DFL: 1

DS-R+DS-T: 1

C-ES+EFL: 1

EFL+ES-T: 1

EFL+RS_R: 1

DS-R+RS-R: 1

RS-T+RFL+EFL: 1

RS-R+RS_T+EFL: 1

RS-R+EFL+RFL: 1

DS-T+EFL: 1

ES-T+RS-T: 1

RS-R+DS-R: 1

RFL+RS-R+RS-T: 1

RFL+EFL_RS-R: 1

DFL-DS-R: 1

EFL : 1

RS-T+EFL_RS-R: 1

ES-R+RFL: 1

RS-T+EFL
RS-R+EFL: 1

RS-T+DS-T: 1

RT+ET: 1

EFL_RS-R: 1

EFL
RS-R+EFL: 1

RS-T+RS-R_EFL: 1

RS-T+RS-R+EFL+RF: 1

EFS-R+EFL: 1

DFL+RS+RF: 1

EFL-RFL: 1

TS-T: 1

DF+RF: 1

ES-R+ES-T: 1

RS-T+EFL+RS-R: 1

TS-T+RS-R+EFL: 1

RS-T+RS-R+EF : 1

RS-T : 1

RS-R+RS-R: 1

ES: 1

ES-R_EFL: 1

RS-T+EFLS: 1

C+RS-R+ES-T: 1

RS-T+RS-R+EF: 1

C+RS-R+DFL: 1

ES-R : 1

RS-R+RFL+DFL: 1

RS-R+DSP: 1

DFL
: 1

RF+RFL: 1

RF+RS-R: 1

RS-R : 1

RS-R + EFL: 1

deposits['MOVE_CODE'] = deposits['MOVE_CODE'].str.replace(r'\s+', '', regex=True)

# Get value counts for the cleaned codes
print("\nCleaned MOVE_CODE counts (sorted by frequency):")
clean_move_code_counts = deposits['MOVE_CODE'].value_counts()
sorted_clean_codes = clean_move_code_counts.sort_values(ascending=False)

for move_code, count in sorted_clean_codes.items():
    print(f"{move_code}: {count}")
    print()  # Empty line for readability

Cleaned MOVE_CODE counts (sorted by frequency):
DFL: 12753

EFL: 6920

RS-R+EFL: 4519

RS-R: 4408

RFL+EFL: 2948

RS-T: 2831

DS-T: 1800

RFL: 1665

ES-R+EFL: 881

RS-T+EFL: 798

ES-R: 778

RF: 736

ES-T: 270

DS-R: 132

RS-T+RS-R: 73

RS-R+RS-T+EFL: 68

RS-R+RS-T: 66

C: 61

RS-T+RS-R+EFL: 51

ES-R-EFL: 36

RS-T+ES-R+EFL: 34

ES-T+EFL: 29

EF: 26

RFL+RF: 25

EFL+RS-R: 19

RT+EFL: 18

RS-R+RFL: 15

RF+DS-T: 15

DS-R-DFL: 13

RFL+EF: 13

EFL+DFL: 12

RFL+DFL: 12

C-EFL: 12

RS-T+RFL: 12

C+RS-R+EFL: 11

DS-R+DFL: 11

RS-R-DFL: 9

RF+EF: 9

DS-T-DFL: 9

C+RS-T+EFL: 9

RS-R+EFL+RF: 8

RFL+RS-R: 8

RS-R+DFL: 7

RF+RS-T: 7

EFL+RS-T: 7

RFL+RS-T: 5

DS-T+DFL: 4

RS-R-EFL: 4

RS-R+ES-T+EFL: 4

ES-R-DFL: 3

EFL+EFL: 3

RS-T+RS-T: 3

DFL+EFL: 3

EFL+RF: 3

RS-R+ES-R: 3

RS-T+RF: 3

ES-T+ES-R: 3

ELF: 2

DS-T+RS-T: 2

RS-T+RS-R+EF: 2

RS-R+RF+EFL: 2

EFLEFLEFL: 2

RF+EFL: 2

ES-R+EF: 2

RS-T+EFL+RF: 2

RS_T: 2

RS-R+RFL+EFL: 2

RS-R_EFL: 2

RS-T+DFL: 2

DSP: 2

DS-R-EFL: 1

DS_R: 1

RS-R+RS-T+DFL: 1

EFL-RFL: 1

DFL-ES-R: 1

DFL-DS-R: 1

RF+DS-T+DF: 1

EFLEFL: 1

ES-R+ES-T+EFL: 1

DS-R+RS-R: 1

RS-T+RF+EFL: 1

RT+ET: 1

RS-T-DFL: 1

RS-T+EFFL: 1

RS-R+RS-T-EFL: 1

ES-R+DFL: 1

RS-R+DSP: 1

RFL+RS-T+EFL: 1

RS-T+RS-R+RFL: 1

RFL+EFL+RS-R: 1

RS-R+RS_T+EFL: 1

RFL+RS-R+RS-T: 1

RS-R+DS-R: 1

ES-T+RS-T: 1

DS-T+EFL: 1

RS-R+EFL+RFL: 1

RS-T+RFL+EFL: 1

RFL+RS-R+EFL: 1

RS-T+EFL+RS-R: 1

EFL+RS_R: 1

RS-R+RS-R: 1

RS-T+RS-R-EFL: 1

DF+RF: 1

RS-T+EFLS: 1

ES-R_EFL: 1

RS-T+DS-T: 1

RS-T+EFLRS-R+EFL: 1

EFS-R+EFL: 1

RS-R+RFL+DFL: 1

RFL+EFL_RS-R: 1

ESL+EFL: 1

RS-T+EFL_RS-R: 1

DS-R+DS-T: 1

EFL_RS-R: 1

EFLRS-R+EFL: 1

C+RS-R+ES-T: 1

TS-T: 1

C+RS-R+DFL: 1

RS-T+RS-R_EFL: 1

EFL+ES-T: 1

RF+RFL: 1

ES-R+RFL: 1

RS-T+RS-R+EFL+RF: 1

TS-T+RS-R+EFL: 1

RF+RS-R: 1

DFL+RS+RF: 1

ES-R+ES-T: 1

C-ES+EFL: 1

RS-R-RS-T: 1

ES: 1

MOVE CLASS#

The original Move Class contains the kind of materials involved and the mode of movement. We seperate those into seperate columns to make it more consistent with other existing datasets.

print("\nValue counts for 'MOVE_CLASS':")

move_class_counts = deposits['MOVE_CLASS'].value_counts()
move_class_counts = move_class_counts.sort_values(ascending=False)
for move_class, count in move_class_counts.items():
    print(f"{move_class}: {count}")
    print()  

Value counts for 'MOVE_CLASS':
Debris Flow: 12753

Earth Flow: 7044

Complex: 6007

Rock Slide-Rotational: 5509

Rock Slide-Translational: 2445

Rock Flow: 1729

Debris Slide-Translational: 1650

Complex-Earth Slide-Rotational+Earth Flow: 826

Rock Fall: 732

Rock Slide - Translational: 511

Earth Slide - Rotational: 456

Rock Slide - Rotational: 443

Earth Slide-Rotational: 342

Complex-Rock Slide-Rotational+Earth Flow: 303

Earth Slide-Translational: 250

Rock Slide - Rotational+Earth Flow: 182

Complex-Rock Slide-rotational and Earth Flow: 171

Debris Slide - Translational: 150

Complex-Rock Slide-Translational+Earth Flow: 113

Complex Rock Slide - Translational & Earth Flow: 111

Debris Slide-Rotational: 110

Complex Rock Slide - Rotational & Earth Flow: 63

Rock Slide-Rotational+Earth Flow: 57

Combined Slump - Earth Flow: 37

Complex Earth Slide - Rotational & Earth Flow: 36

Earth Slide - Translational: 34

Rock Slide-Translational+Earth Flow: 33

Earth Fall: 25

Rock Slide - Translational+Earth Flow: 22

Complex-Rock Slide-R+Rock Slide-T+Earth Flow: 19

Debris Slide- Rotational: 18

Earth Slide- Rotational: 18

Rock Slide- Rotational: 14

Rock Fall + Debris Slide-Translational: 13

Debris Slide-Rotational+Debris Flow: 10

Combined Earth Slide-Rotational + Earth Flow: 10

Earth Slide-Rotational+Earth Flow: 9

Channelized Debris Flow Deposition: 8

Debris Slide - Rotational: 7

Rock Slide- Translational: 6

Rock Slide-Rotational and Earth Flow: 5

Earth Slide- Translational: 4

Rock Slide-Translational+ Earth Flow : 4

Rock Slide-Translational and Earthflow: 4

Rock Slide-Translational+ Earth Flow: 3

Debris Slide-Translational+Debris Flow: 3

Rock Slide- Rotational + Earth Flow: 3

Combined Rock Slide-Rotational + Earth Flow: 3

Rock Slide - Translational + Earth Flow: 3

Earth Flow+Earth Flow: 3

Earth Slide - Rotational+Earth Flow: 3

Complex- Rock Slide-Rotational+Earth Flow: 3

Earth Slide-Translational+Earth Flow: 2

Rock Slide- Rotational+Earth Flow: 2

Earth Flow+Rock Slide - Rotational: 2

Debris Spread: 2

Debris Slide- Translational: 2

Rock Fall + Debris Slide-Tanslational: 1

Rock Fall + Rock Slide- Translational: 1

Debris Flow : 1

Rock Slide - Rotational+Rock Slide - Translational: 1

Rock Slide - Rotational+Debris Spread: 1

Complex-Earth Flow+Rock Slide Rotational: 1

Rotational Slide (slump): 1

Rock Fall + Debris Slide: 1

Rock Fall + Debris Slide-Translational + DFlow: 1

Debris Fall: 1

Translational Slide: 1

Complex-Debris Slide-Rotational+Debris Flow: 1

Rock Slide - Rotational+Rock Slide - Rotational: 1

Debris Slide-Transtional+Debris Flow: 1

Rock Slide-Translational+Rotational+Earth Flow: 1

Rock Slide-Rotational+Rock Flow: 1

Debris Slide-Rotiational: 1

Rock Slide- Translational+Rock Slide-Rotational: 1

Rock Slide-Translational and Rock Fall: 1

Debris Flow+ Earth Flow: 1

Debrs Flow: 1

Rock Fall and Rock Slide-Rotational: 1

Complex- Earth Flow+Debris Flow: 1

Earth Slide- Rotational : 1

Earth Topple: 1

Earth Spread: 1

Earth Slide - Rotational : 1

Rockslide translational+Earth Flow: 1

Rock Slide-Translational, Earth flow: 1

Earth flow: 1

Seperate out Materials involved#

import re
import pandas as pd

deposits['filter_MATERIAL'] = ''

def extract_materials(move_class_str):
    # Return NaN for empty or null values instead of 'Other'
    # This is needed since a lot of the MOVE_CLASS values are empty
    if pd.isna(move_class_str) or str(move_class_str).strip() == '':
        return pd.NA
    
    move_class_str = str(move_class_str).lower()
    
    # Fix common typos 
    move_class_str = move_class_str.replace('debrs', 'debris')
    move_class_str = move_class_str.replace('rockslide', 'rock slide')
    
    materials = []
    
    # There is one Rotational Slide (slump): 1 that will be labelled as Earth
    earth_pattern = r'earth|slump'  

    # From PMags.com Talus are bigger rocks providing slightly better traction
    rock_pattern = r'rock|talus'
    
    debris_pattern = r'debr'
    
    # Check for each material type
    if re.search(earth_pattern, move_class_str, re.IGNORECASE):
        materials.append('Earth')
    if re.search(rock_pattern, move_class_str, re.IGNORECASE):
        materials.append('Rock')
    if re.search(debris_pattern, move_class_str, re.IGNORECASE):
        materials.append('Debris')
    
    # Handle special case for "Complex" without specific material
    if 'complex' in move_class_str.lower() and len(materials) == 0:
        return 'Complex'
    
    # If no materials found but the field is not empty, categorize as 'Other'
    if not materials:
        return 'Other'
    
    # Join all found materials with '+'
    # Sort to ensure consistent ordering
    return '+'.join(sorted(materials))  

# Apply the function to create a new column
deposits['filter_MATERIAL'] = deposits['MOVE_CLASS'].apply(extract_materials)

print("\nValue counts for 'MATERIAL' (including nulls):")
material_counts_final = deposits['filter_MATERIAL'].value_counts(dropna=False)
print(material_counts_final)

# Plot distribution excluding NaN values
plt.figure(figsize=(10, 6))
non_null_counts = deposits['filter_MATERIAL'].value_counts()
sns.barplot(x=non_null_counts.index, y=non_null_counts.values, 
            palette='viridis', hue=non_null_counts.index, legend=False)
plt.title('Distribution of Materials in Landslides', fontsize=16)
plt.xlabel('Material Type', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Value counts for 'MATERIAL' (including nulls):
filter_MATERIAL
<NA>            28961
Debris          14719
Rock            11396
Earth            9105
Complex          6007
Earth+Rock       1110
Debris+Rock        17
Debris+Earth        2
Other               1
Name: count, dtype: int64

../../_images/2dc105149118fca39a750d4ec4c2be0c6f0de01ad931c6c0f2f3c74c1bf05102.png

Seperate out Modes of Movement#

import re
import pandas as pd

def extract_movement_class(move_class_str):
    # Return NaN for empty or null values
    # This is needed since a lot of the MOVE_CLASS values are empty
    if pd.isna(move_class_str) or str(move_class_str).strip() == '':
        return pd.NA
    
    move_class_str = str(move_class_str).lower()
    
    # Fix some typos and standardize formatting
    move_class_str = move_class_str.replace('debrs', 'debris')
    move_class_str = move_class_str.replace('rotiational', 'rotational')
    move_class_str = move_class_str.replace('transtional', 'translational')
    move_class_str = move_class_str.replace('tanslational', 'translational')
    
    
    movement_classes = []
    
    # Check for complex first (may contain other movement types)
    if 'complex' in move_class_str:
        movement_classes.append('Complex')
    
    # Check for slide types
    if 'slide' in move_class_str:
        if 'rotational' in move_class_str or 'slump' in move_class_str:
            movement_classes.append('Slide-Rotational')
        elif 'translational' in move_class_str:
            movement_classes.append('Slide-Translational')
        else:
            movement_classes.append('Slide')
            
    # Check for flow
    if 'flow' in move_class_str:
        movement_classes.append('Flow')
        
    # Check for fall
    if 'fall' in move_class_str:
        movement_classes.append('Fall')
        
    # Check for spread
    if 'spread' in move_class_str:
        movement_classes.append('Spread')
        
    # Check for topple
    if 'topple' in move_class_str:
        movement_classes.append('Topple')
        
    # Check for avalanche
    if 'avalanche' in move_class_str:
        movement_classes.append('Avalanche')
    
    # If no movement class was identified but the string isn't empty
    if not movement_classes:
        return 'Other'
    
    # Join all identified movement classes with '+'
    return '+'.join(movement_classes)

# Apply the function to create a new column
deposits['filter_MOVEMENT'] = deposits['MOVE_CLASS'].apply(extract_movement_class)


print("\nValue counts for 'MOVEMENT':")
movement_counts = deposits['filter_MOVEMENT'].value_counts(dropna=False)
print(movement_counts)

plt.figure(figsize=(12, 6))
non_null_counts = deposits['filter_MOVEMENT'].value_counts()
sns.barplot(x=non_null_counts.index, y=non_null_counts.values, 
            palette='viridis', hue=non_null_counts.index, legend=False)
plt.title('Distribution of Movement Classes', fontsize=16)
plt.xlabel('Movement Class', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Value counts for 'MOVEMENT':
filter_MOVEMENT
<NA>                                28961
Flow                                21578
Slide-Rotational                     6924
Complex                              6007
Slide-Translational                  5053
Complex+Slide-Rotational+Flow        1404
Fall                                  758
Slide-Rotational+Flow                 288
Complex+Slide-Translational+Flow      224
Slide-Translational+Flow               77
Complex+Slide+Flow                     19
Slide-Translational+Fall               16
Spread                                  3
Slide-Rotational+Spread                 1
Slide+Fall                              1
Slide-Translational+Flow+Fall           1
Slide-Rotational+Fall                   1
Complex+Flow                            1
Topple                                  1
Name: count, dtype: int64

../../_images/bba8c26eba0ac695a35a5623fec06c301defa578c01db3ad84023b672ac94bc9.png

Check all new columns#

print(deposits.dtypes)

UNIQUE_ID              object
TYPE_MOVE              object
MOVE_CLASS             object
MOVE_CODE              object
CONFIDENCE             object
AGE                    object
DATE_MOVE              object
NAME                   object
GEOL                   object
SLOPE                 float32
HS_HEIGHT             float32
FAN_HEIGHT            float32
FAIL_DEPTH            float32
DEEP_SHAL              object
HS_IS1                float32
IS1_IS2               float32
IS2_IS3               float32
IS3_IS4               float32
HD_AVE                float32
DIRECT                float32
AREA                  float32
VOL                   float32
REF_ID_COD             object
MAP_UNIT_L             object
DESCRIPTION            object
YEAR                  float64
DATE_RANGE             object
REACTIVATION           object
MONTH                  object
DAY                    object
Shape_Length          float64
Shape_Area            float64
geometry             geometry
filter_CONFIDENCE      object
filter_MATERIAL        object
filter_MOVEMENT        object
dtype: object

Create a new DataFrame#

# Create a new GeoDataFrame
oregon_landslides = gpd.GeoDataFrame(deposits, geometry='geometry', crs=deposits.crs)

Inspect the new DataFrame#

# Verify the structure of the new GeoDataFrame
print("New GeoDataFrame shape:", oregon_landslides.shape)
print("\nColumn names:")
for col in oregon_landslides.columns:
    print(f"  - {col}")

# Display the first few rows to verify
oregon_landslides.head()

New GeoDataFrame shape: (71318, 36)

Column names:
  - UNIQUE_ID
  - TYPE_MOVE
  - MOVE_CLASS
  - MOVE_CODE
  - CONFIDENCE
  - AGE
  - DATE_MOVE
  - NAME
  - GEOL
  - SLOPE
  - HS_HEIGHT
  - FAN_HEIGHT
  - FAIL_DEPTH
  - DEEP_SHAL
  - HS_IS1
  - IS1_IS2
  - IS2_IS3
  - IS3_IS4
  - HD_AVE
  - DIRECT
  - AREA
  - VOL
  - REF_ID_COD
  - MAP_UNIT_L
  - DESCRIPTION
  - YEAR
  - DATE_RANGE
  - REACTIVATION
  - MONTH
  - DAY
  - Shape_Length
  - Shape_Area
  - geometry
  - filter_CONFIDENCE
  - filter_MATERIAL
  - filter_MOVEMENT

	UNIQUE_ID	TYPE_MOVE	MOVE_CLASS	MOVE_CODE	CONFIDENCE	AGE	GEOL	SLOPE	...	DATE_RANGE	REACTIVATION	MONTH	DAY	Shape_Length	Shape_Area	geometry	filter_CONFIDENCE	filter_MATERIAL	filter_MOVEMENT
0	WASH_CO2	Flow	Debris Flow	DFL	High (=>30)	Historic (<150yrs)	vol.M.cr.gr.nd.nd.bas	4.256579	...	None	None	None	None	447.239405	11291.764767	MULTIPOLYGON (((699810 1325184.192, 699822.282...	High	Debris	Flow
1	WASH_CO3	Flow	Debris Flow	DFL	High (=>30)	Historic (<150yrs)	vol.M.cr.gr.nd.nd.bas	3.350513	...	None	None	None	None	455.775817	13701.014637	MULTIPOLYGON (((699860.473 1325401.027, 699873...	High	Debris	Flow
2	WASH_CO4	Flow	Debris Flow	DFL	High (=>30)	Historic (<150yrs)	vol.M.cr.gr.nd.nd.bas	5.811321	...	None	None	None	None	568.959463	14879.315815	MULTIPOLYGON (((700295.984 1325083.866, 700283...	High	Debris	Flow
3	WASH_CO5	Flow	Debris Flow	DFL	High (=>30)	Historic (<150yrs)	sed.Q.qsd.mf.nd.nd.fine	5.413578	...	None	None	None	None	591.097110	20034.804942	MULTIPOLYGON (((700696.889 1325351.537, 700659...	High	Debris	Flow
4	WASH_CO6	Flow	Debris Flow	DFL	High (=>30)	Historic (<150yrs)	vol.M.cr.gr.nd.nd.bas	13.638928	...	None	None	None	None	391.627031	8462.716893	MULTIPOLYGON (((699242.636 1321687.547, 699279...	High	Debris	Flow

5 rows × 36 columns

Inspect References inside Deposits Layer#

print("Inspecting references in oregon_landslides:")
print(f"Total records: {len(oregon_landslides)}")
print(f"Records with REF_ID_COD: {oregon_landslides['REF_ID_COD'].notna().sum()}")
print(f"Records without REF_ID_COD: {oregon_landslides['REF_ID_COD'].isna().sum()}")

print("\nTop 10 most common references:")
ref_counts = oregon_landslides['REF_ID_COD'].value_counts().head(10)
for ref_id, count in ref_counts.items():
    print(f"{ref_id}: {count}")

print(f"\nTotal unique references: {oregon_landslides['REF_ID_COD'].nunique()}")

print("\nSample of reference IDs:")
sample_refs = oregon_landslides['REF_ID_COD'].dropna().unique()[:10]
for ref in sample_refs:
    print(f"  - {ref}")

Inspecting references in oregon_landslides:
Total records: 71318
Records with REF_ID_COD: 71318
Records without REF_ID_COD: 0

Top 10 most common references:
HairRW2021: 5017
CalhNC2020a: 4096
BurnWJ2017a: 2986
BurnWJ2012a: 2837
BurnWJ2023: 2693
BurnWJ2014: 2483
BurnWJ2021: 2228
McClJD2021b: 2122
CALHNC2020: 2098
BurnWJ2018: 1969

Total unique references: 346

Sample of reference IDs:
  - HairRW2021
  - BurnWJ2023
  - BurnWJ2010a
  - BurnWJ2010b
  - BurnWJ2010c
  - BurnWJ2009b
  - BurnWJ2010d
  - MertSA2007a
  - McClJD2010
  - HladFR2007

Inspect References Layer#

# Load and inspect the References layer
references = gpd.read_file(gdb_dir, layer="References")

print("References layer shape:", references.shape)
print("\nReferences layer columns:")
for col in references.columns:
    print(f"  - {col}")

print("\nFirst few rows of References:")
print(references.head())

print("\nData types:")
print(references.dtypes)

# Check for common reference identifiers between deposits and references
print("\nUnique REF_ID_COD values in deposits (first 10):")
print(deposits['REF_ID_COD'].unique()[:10])

print("\nUnique REF_ID_COD values in references:")
print(references['REF_ID_COD'].unique())

# Check if there's overlap
common_refs = set(deposits['REF_ID_COD'].dropna()) & set(references['REF_ID_COD'].dropna())
print(f"\nNumber of common reference IDs: {len(common_refs)}")
print("Sample common reference IDs:", list(common_refs)[:5])

References layer shape: (376, 4)

References layer columns:
  - REF_ID_COD
  - SCALE
  - REFERENCE
  - DATE

First few rows of References:
   REF_ID_COD     SCALE                                          REFERENCE  \
0   AchJA1991  1:24,000  Ach, J.A., Bateson, J.T., 1991, Geologic map o...   
1  AllaJC2001  1:25,000  Allan, J.C.; Priest, G.R., 2001, Evaluation of...   
2  AlleJE1988  1:24,000  Allen, J.E., 1988, Geologic Hazards in the Col...   
3  AndeJL1978  1:24,000  Anderson, J. L., 1978, The stratigraphy and st...   
4  AshlRP1966  1:21,100  Ashley, R.P., 1966, Metamorphic petrology and ...   

   DATE  
0  1991  
1  2001  
2  1988  
3  1978  
4  1966  

Data types:
REF_ID_COD    object
SCALE         object
REFERENCE     object
DATE          object
dtype: object

Unique REF_ID_COD values in deposits (first 10):
['HairRW2021' 'BurnWJ2023' 'BurnWJ2010a' 'BurnWJ2010b' 'BurnWJ2010c'
 'BurnWJ2009b' 'BurnWJ2010d' 'MertSA2007a' 'McClJD2010' 'HladFR2007']

Unique REF_ID_COD values in references:
['AchJA1991' 'AllaJC2001' 'AlleJE1988' 'AndeJL1978' 'AshlRP1966'
 'BacoCRUnpub' 'BaldEM1952' 'BaldEM1955' 'BaldEM1956' 'BaldEM1961'
 'BaldEM1969' 'BaldEM1973' 'BaraP1991' 'BarnCG1978' 'BeauJD1973hazards'
 'BeauJD1974ahazards' 'BeauJD1974bhazards' 'BeauJD1975hazards'
 'BeauJD1976hazards' 'BeauJD1977ahazards' 'BeauJD1977bhazards'
 'BelaJL1981' 'BelaJL1982' 'BestEA2002Figure3' 'BingNJ1984' 'BlacGL1987'
 'BlacGL1995' 'BlacGL2000' 'BrikTH1983' 'BroeL1995' 'BrooHC1976'
 'BrooHC1977a' 'BrooHC1977cplateA' 'BrooHC1977cplateB' 'BrooHC1979a'
 'BrooHC1979b' 'BrooHC1979c' 'BrooHC1982a' 'BrooHC1982b' 'BrooHC1983'
 'BrooHC1984' 'BrooHCUnpubB' 'BrowCE1966a' 'BrowCE1966b' 'BrowCE1977'
 'BrowDE1980aplate1' 'BrowDE1980aplate2' 'BrowDE1980aplate3'
 'BrowDE1980aplate4' 'BrowDE1980b' 'BrowDE1980c' 'BrowDE1980d'
 'BrowDE1982' 'BrowME1982' 'BurnSF1997' 'BurnWJ2009b' 'BurnWJ2010a'
 'BurnWJ2010b' 'BurnWJ2010c' 'BurnWJ2010d' 'BurnWJ2011' 'BussC2006'
 'CampVCunpubA' 'CampVCunpubB' 'CampVCunpubC' 'CampVCunpubD' 'ConrRM1991'
 'CoxGM1977' 'CummJA1958' 'CunnCT1979' 'DickWR1965easthalf'
 'DickWR1965westhalf' 'DiggMF1991' 'DonaMM1992' 'DonaMM1993' 'ErikA2002'
 'EvanJG1986a' 'EvanJG1987' 'EvanJG1989' 'EvanJG1990' 'EvanJG1991'
 'EvanJG1993' 'EvanJG1995' 'EvanJG2001' 'EvarRC2002' 'EvarRC2004'
 'FEMA2006' 'FernML1983' 'FernML1984' 'FernML1987' 'FernML1993a'
 'FernML1993b' 'FernML2001a' 'FernML2001b' 'FernML2006a' 'FernML2006b'
 'FernML2006c' 'FernML2010' 'FernMLUnpubB' 'GrayF1980' 'GrayF1982'
 'GreeRC1972a' 'HaddGH1959' 'HalePO1975' 'HammPE1982' 'HampER1972'
 'HaucSM1962' 'HeriCW1981' 'HewiSL1970' 'HladFR1992' 'HladFR1993'
 'HladFR1994' 'HladFR1996' 'HladFR1998' 'HladFR1999' 'HladFR2006'
 'HladFR2007' 'HofmRJ2000' 'HoodNF1997' 'HoovL1963' 'HuggJW1978'
 'JackJS1979' 'JacoR1985' 'JenkMDUnpub' 'JohaNP1972' 'JohnJA1994'
 'JohnJA1995' 'JohnJA1998a' 'JohnJA1998b' 'KoroMA1987' 'LangVW1991'
 'MacLJW1994plate1' 'MacLJW1994plate2' 'MacLNS1995sheet1'
 'MacLNS1995sheet2' 'MaddT1965' 'MadiIP2006a' 'MadiIP2006b' 'MadiIP2007'
 'MathAC1993' 'McclJD2006a' 'McclJD2006b' 'McClJD2010' 'McCoVSUnpubA'
 'McCoVSUnpubB' 'MertSA2007a' 'MertSA2007b' 'MertSA2008' 'MertSA2009'
 'MertSAUnpub' 'MillPR1984' 'MinoSA1987b' 'MinoSA1990a' 'MinoSA1990b'
 'Misc_unpub' 'MoorR2004' 'MuntJK1969' 'MuntSR1978' 'MurrRB2001'
 'MurrRB2006a' 'MurrRBUnpubA' 'MurrRBunpubB' 'NappJE1958' 'NichDK1989'
 'NiemAR1985' 'NiemAR1990' 'NoblJB1979' 'OConJE2001' 'ODOT2011'
 'OlesKF1971' 'OlivJA1981' 'OrrWN1986a' 'OrrWN1986b' 'OwenPC1977'
 'PageNJ1977' 'PageNJ1981' 'PattRL1965' 'PertRK1976' 'PeteNV1976Redmond'
 'PeteNV1980' 'PoguKR1999' 'Portland2011' 'PrieGR1982plate5'
 'PrieGR1982plate6' 'PrieGR1983plate2' 'PrieGR1983plate3' 'PrieGR1987'
 'PrieGR1988' 'PrieGR2000' 'PrieGR2004a' 'PrieGR2004b' 'ProsHJ1967'
 'RampL1961' 'RampL1986' 'RitcB1987' 'RobaRC1987' 'RobiPT1975'
 'RobyTL1977' 'RytuJJ1982a' 'RytuJJ1983b' 'RytuJJ1983c' 'RytuJJ1983e'
 'RytuJJ1983g' 'RytuJJ1983h' 'SchlHG1967Plate3' 'SchlHG1972hazards'
 'SchlHG1973' 'SchlHG1974' 'SchlHG1974hazards' 'SchlHG1975'
 'SchlHG1979hazards' 'SchnDR1958' 'SchuMG1980' 'ScotWE1992' 'SherDR1988'
 'SherDR1989' 'SherDR1991' 'SherDR1992' 'SherDR1994' 'SherDR1995'
 'SherDR2000' 'SherDR2004' 'SmitGA1987a' 'SmitGA1987b' 'SmitGA1987c'
 'SmitJG1982' 'SnavPD1976b' 'SnavPD1976c' 'SnavPD1996' 'SobiS2010'
 'StimJP1988' 'StocAO1982' 'StroHR1980' 'SwanDA1969' 'SwanDA1981sheet1'
 'SwanDA1981sheet2' 'SwanDA1981sheet3' 'SwanDA1983' 'SwinCM1968'
 'ThayTP1956' 'ThayTP1966' 'ThayTP1967b' 'ThayTP1967c' 'ThayTP1981'
 'ThomTH1981' 'ThorDJ1984' 'TolaTL1982' 'TolaTL1999sheet1'
 'TolaTL2000sheet2' 'TravPL1977' 'TuckER1975' 'VandDB1988' 'WalkGW1963'
 'WalkGW1965' 'WalkGW1966' 'WalkGW1967' 'WalkGW1968figure2'
 'WalkGW1968figure3' 'WalkGW1979' 'WalkGW1980b' 'WalkGW1989' 'WalkGW2002'
 'WangZ2001' 'WateAC1968a' 'WateAC1968b' 'WateAC1968c' 'WateAC1968d'
 'WeidJP1980' 'WeisKW1984' 'WellFG1949' 'WellFG1953' 'WellRE1975'
 'WellRE1983' 'WellRE1995' 'WellRE2000' 'WendDW1988' 'WethCE1960'
 'WhitDL1994' 'WhitWH1964' 'WilcRE1966' 'WileTJ1991' 'WileTJ1993a'
 'WileTJ1993bplate1' 'WileTJ2006a' 'WileTJ2006b' 'WileTJ2006c'
 'WileTJ2007' 'WileTJunpub' 'WilkRM1986' 'WittRC2007' 'WittRC2009'
 'WoodJD1976' 'YeatRS1996plate2a' 'YeatRS1996plate2b' 'YogoGM1985'
 'YuleJD1996plate10' 'YuleJD1996plate11' 'YuleJD1996plate2'
 'YuleJD1996plate8' 'YuleJD1996plate9' 'BurnWJ2011b' 'BurnWJ2011c'
 'BurnWJ2012a' 'BurnWJ2012b' 'BurnWJ2012c' 'BurnWJ2012d' 'BurnWJ2012f'
 'BurnWJ2012e' 'BurnWJ2012g' 'BurnWJ2012h' 'BurnWJ2012i' 'BurnWJ2012j'
 'BurnWJ2012k' 'BurnWJ2012l' 'BurnWJ2012m' 'BurnWJ2012n' 'BurnWJ2012o'
 'BurnWJ2012p' 'BurnWJ2011d' 'BurnWJ2013a' 'BurnWJ2012q' 'BurnWJ2012r'
 'BurnWJunpub2013a' 'BurnWJ2013c' 'McClJD2012' 'BurnWJunpub2014'
 'McClJDunpub2013' 'MickKA2012' 'WileTJ2011' 'BurnWJ2013d'
 'MickKAunpub2012' 'BurnWJunpub2013b' 'McClJD2013' 'BurnWJunpub2011'
 'BurnWJ2013b' 'WileTJ2014' 'MickKAunpub2014' 'BurnWJ2014' 'BurnWJ2013e'
 'BurnWJ2015a' 'WileTJ2015' 'DirrS2015' 'BurnWJ2018' 'CalhNC2018'
 'BurnWJ2009c' 'BurnWJ2015b' 'NiewCA2018' 'HousRA2018' 'MickKA2011'
 'KossEunpub2019' 'BurnWJunpub2019' 'BurnWJ2010e' 'McClJD2019'
 'MadiIP2019' 'BurnWJ2017a' 'BurnWJ2017' 'BeauJD1976' 'CalhNC2020'
 'CalhNC2020a' 'BurnWJunpub2021' 'BurnWJ2021' 'BurnWJ2021a' 'BurnWJ2021b'
 'BurnWJ2021c' 'BurnWJ2021d' 'BurnWJ2023' 'McClJD2021a' 'HousRA2021'
 'BurnWJ2022' 'HairRW2021' 'McClJD2020' 'McClJD2020b' 'McClJD2023'
 'McClJD2021b' 'WellRE2020' 'CalhNC2023']

Number of common reference IDs: 345
Sample common reference IDs: ['WileTJ2006a', 'CalhNC2018', 'WileTJ2014', 'BlacGL2000', 'BurnWJ2011d']

# Merge the references information into oregon_landslides
oregon_landslides = oregon_landslides.merge(
    references[['REF_ID_COD', 'REFERENCE']], 
    on='REF_ID_COD', 
    how='left'
)

print("Successfully added REFERENCE column to oregon_landslides")
print(f"Records with references: {oregon_landslides['REFERENCE'].notna().sum()}")
print(f"Records without references: {oregon_landslides['REFERENCE'].isna().sum()}")

# Display a sample of the new column
print("\nSample references:")
sample_refs = oregon_landslides[oregon_landslides['REFERENCE'].notna()]['REFERENCE'].head(3)
for i, ref in enumerate(sample_refs):
    print(f"{i+1}. {ref[:100]}...")

Successfully added REFERENCE column to oregon_landslides
Records with references: 69220
Records without references: 2098

Sample references:
1. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...
2. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...
3. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...

print("Columns in oregon_landslides:")
for i, col in enumerate(oregon_landslides.columns):
    print(f"{i+1:2d}. {col}")

Columns in oregon_landslides:
UNIQUE_ID
TYPE_MOVE
MOVE_CLASS
MOVE_CODE
CONFIDENCE
AGE
DATE_MOVE
NAME
GEOL
SLOPE
HS_HEIGHT
FAN_HEIGHT
FAIL_DEPTH
DEEP_SHAL
HS_IS1
IS1_IS2
IS2_IS3
IS3_IS4
HD_AVE
DIRECT
AREA
VOL
REF_ID_COD
MAP_UNIT_L
DESCRIPTION
YEAR
DATE_RANGE
REACTIVATION
MONTH
DAY
Shape_Length
Shape_Area
geometry
filter_CONFIDENCE
filter_MATERIAL
filter_MOVEMENT
REFERENCE

oregon_landslides['filter_DATASET_LINK'] = 'https://www.oregon.gov/dogami/slido/Pages/data.aspx'

oregon_landslides['filter_ORIGIN'] = 'OR'

oregon_landslides = oregon_landslides.rename(columns={'REFERENCE': 'filter_REFERENCE'})

Save into GeoJSON#

oregon_landslides.to_file("./processed_geojson/oregon_landslides_processed.geojson", driver="GeoJSON")

# oregon_landslides.to_file("oregon_landslides_processed.shp")

oregon_landslides.plot(figsize=(8, 6), edgecolor="k", linewidth=0.2)
plt.title("Oregon Landslide Inventory")
plt.axis("off")
plt.show()

oregon_landslides.columns

Index(['UNIQUE_ID', 'TYPE_MOVE', 'MOVE_CLASS', 'MOVE_CODE', 'CONFIDENCE',
       'AGE', 'DATE_MOVE', 'NAME', 'GEOL', 'SLOPE', 'HS_HEIGHT', 'FAN_HEIGHT',
       'FAIL_DEPTH', 'DEEP_SHAL', 'HS_IS1', 'IS1_IS2', 'IS2_IS3', 'IS3_IS4',
       'HD_AVE', 'DIRECT', 'AREA', 'VOL', 'REF_ID_COD', 'MAP_UNIT_L',
       'DESCRIPTION', 'YEAR', 'DATE_RANGE', 'REACTIVATION', 'MONTH', 'DAY',
       'Shape_Length', 'Shape_Area', 'geometry', 'filter_CONFIDENCE',
       'filter_MATERIAL', 'filter_MOVEMENT', 'filter_REFERENCE',
       'filter_DATASET_LINK', 'filter_ORIGIN'],
      dtype='object')