Oregon Landslide Data Preprocessing#

Overview#

This notebook processes landslide data from the Statewide Landslide Information Database for Oregon (SLIDO) into a standardized format for integration with other regional datasets.

The workflow includes:

  1. Data Loading: Load GeoDataFrame layers from SLIDO database

  2. Deposits Analysis: Examine the main deposits layer structure and data quality

  3. References Processing: Load citation references from the References layer and merge them with the landslide records via REF_ID_COD

  4. Data Standardization: Standardize confidence levels, movement types, and material classifications

  5. Column Selection: Select and rename columns to match unified schema

  6. Quality Assurance: Validate data completeness and geometry integrity

  7. Export: Save processed data as GeoJSON for further analysis

Key Data Layers:

  • Deposits: Landslide records with detailed attributes

  • References: 376 citation records linked via REF_ID_COD

Obtain the Statewide Landslide Information Database for Oregon (SLIDO) from the official DOGAMI website: https://www.oregon.gov/dogami/slido/Pages/data.aspx

Create a directory ‘Data’ in your project’s root folder and place the downloaded files inside it.
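Before loading anything, you can verify the geodatabase is where the cells below expect it (a sketch; adjust `gdb_dir` to wherever you unpacked the download):

```python
from pathlib import Path

# Path assumed by the loading cells below; adjust if your layout differs
gdb_dir = Path("../data/Oregon/SLIDO_Release_4p5_wMetadata.gdb")
if gdb_dir.exists():
    print("SLIDO geodatabase found")
else:
    print(f"Place the downloaded .gdb at {gdb_dir}")
```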

Initialization#

import fiona
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
gdb_dir = "../data/Oregon/SLIDO_Release_4p5_wMetadata.gdb"
layers = fiona.listlayers(gdb_dir)
print("Available layers in the GDB:")
for lyr in layers:
    df = gpd.read_file(gdb_dir, layer=lyr)
    print(f"{lyr:35s} {len(df):6d} data points")
Available layers in the GDB:
Index_LS_Studies                        72 data points
Detailed_Deep_Landslide_Susceptibility      4 data points
References                             376 data points
Detailed_Susceptibility_Map_Index       12 data points
Historic_Landslide_Points            15386 data points
Scarp_Flanks                         28650 data points
Scarps                               55824 data points
Deposits                             71318 data points
Inventory_Map_Index                   1569 data points
Inventory_Map_Index_2                  355 data points
Inventory_Map_Index_3                  355 data points
References_Check_In_Map_Index_3         21 data points
deposits = gpd.read_file(gdb_dir, layer="Deposits")

deposits.plot(figsize=(8, 6), edgecolor="k", linewidth=0.2)
plt.title("Oregon Landslide Inventory")
plt.axis("off")
plt.show()
[Figure: map of the Oregon landslide inventory deposits]

Initial Inspection#

deposits = gpd.read_file(gdb_dir, layer="Deposits")

print("Total number of deposits:", len(deposits))
Total number of deposits: 71318
print(deposits.shape)
print(deposits.dtypes)
(71318, 33)
UNIQUE_ID         object
TYPE_MOVE         object
MOVE_CLASS        object
MOVE_CODE         object
CONFIDENCE        object
AGE               object
DATE_MOVE         object
NAME              object
GEOL              object
SLOPE            float32
HS_HEIGHT        float32
FAN_HEIGHT       float32
FAIL_DEPTH       float32
DEEP_SHAL         object
HS_IS1           float32
IS1_IS2          float32
IS2_IS3          float32
IS3_IS4          float32
HD_AVE           float32
DIRECT           float32
AREA             float32
VOL              float32
REF_ID_COD        object
MAP_UNIT_L        object
DESCRIPTION       object
YEAR             float64
DATE_RANGE        object
REACTIVATION      object
MONTH             object
DAY               object
Shape_Length     float64
Shape_Area       float64
geometry        geometry
dtype: object
import pandas as pd

def completeness(df: pd.DataFrame):
    total = len(df)
    comp = df.count()               # non-null counts per column
    pct = (comp / total * 100).round(1)
    return pd.DataFrame({"non_null": comp, "% filled": pct})


df_completeness = completeness(deposits)
df_completeness.sort_values(by='% filled', ascending=False)
non_null % filled
UNIQUE_ID 71318 100.0
REF_ID_COD 71318 100.0
Shape_Area 71318 100.0
Shape_Length 71318 100.0
DESCRIPTION 71318 100.0
geometry 71318 100.0
AREA 61271 85.9
SLOPE 60079 84.2
VOL 58717 82.3
DIRECT 58062 81.4
FAIL_DEPTH 50489 70.8
HS_HEIGHT 50453 70.7
HD_AVE 46342 65.0
FAN_HEIGHT 44436 62.3
AGE 42388 59.4
MOVE_CLASS 42357 59.4
TYPE_MOVE 42383 59.4
MOVE_CODE 42271 59.3
CONFIDENCE 42127 59.1
GEOL 41432 58.1
HS_IS1 38187 53.5
IS1_IS2 37122 52.1
IS2_IS3 36299 50.9
IS3_IS4 35940 50.4
MAP_UNIT_L 34199 48.0
DEEP_SHAL 29868 41.9
NAME 4088 5.7
DATE_MOVE 3932 5.5
DATE_RANGE 386 0.5
YEAR 265 0.4
MONTH 81 0.1
DAY 42 0.1
REACTIVATION 21 0.0
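The `completeness` helper can be sanity-checked on a small hand-made frame (toy data, not part of SLIDO):

```python
import numpy as np
import pandas as pd

def completeness(df: pd.DataFrame) -> pd.DataFrame:
    total = len(df)
    comp = df.count()               # non-null counts per column
    pct = (comp / total * 100).round(1)
    return pd.DataFrame({"non_null": comp, "% filled": pct})

# Toy frame: 'a' is 75% filled, 'b' is 25% filled
toy = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0],
                    "b": [np.nan, np.nan, np.nan, 1.0]})
print(completeness(toy))
```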

New Columns#

For filtering and aligning with the other Cascadia datasets, we compute several new columns, all prefixed with “filter_”:

  • Material: [Debris, Earth, Rock, Complex, Water, Submarine]

  • Movement: [Flow, Complex, Slide, Slide-Rotational, Slide-Translational, Avalanche, Flood, Deformation, Topple, Spread, Submarine]

  • Confidence: [High, Medium, Low]

  • Dataset link: URL to dataset

  • Reference: Actual reference from the original data collection

The original columns are kept unchanged alongside the new ones.

Later, in the post-processing notebook, we will add more columns from external datasets.

numerical_cols = deposits.select_dtypes(include=['number']).columns.tolist()
non_numerical_cols = deposits.select_dtypes(exclude=['number']).columns.tolist()

print("Numerical Columns:")
for col in numerical_cols:
    print(f"  - {col}")

print("\nNon-Numerical Columns:")
for col in non_numerical_cols:
    print(f"  - {col}")
Numerical Columns:
  - SLOPE
  - HS_HEIGHT
  - FAN_HEIGHT
  - FAIL_DEPTH
  - HS_IS1
  - IS1_IS2
  - IS2_IS3
  - IS3_IS4
  - HD_AVE
  - DIRECT
  - AREA
  - VOL
  - YEAR
  - Shape_Length
  - Shape_Area

Non-Numerical Columns:
  - UNIQUE_ID
  - TYPE_MOVE
  - MOVE_CLASS
  - MOVE_CODE
  - CONFIDENCE
  - AGE
  - DATE_MOVE
  - NAME
  - GEOL
  - DEEP_SHAL
  - REF_ID_COD
  - MAP_UNIT_L
  - DESCRIPTION
  - DATE_RANGE
  - REACTIVATION
  - MONTH
  - DAY
  - geometry

Of the columns above, we need to standardize MOVE_CLASS, MOVE_CODE, and CONFIDENCE.

Changed Columns#

Confidence#

Old Confidence Level#

print("\nValue counts for 'CONFIDENCE':")
print(deposits['CONFIDENCE'].value_counts())

plt.figure(figsize=(8, 6))
confidence_counts = deposits['CONFIDENCE'].value_counts()

sns.barplot(x=confidence_counts.index, y=confidence_counts.values, palette='coolwarm', hue=confidence_counts.index)

plt.title('Distribution of Confidence Levels', fontsize=16)
plt.xlabel('Confidence Level', fontsize=12)
plt.ylabel('Number of Deposits', fontsize=12)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Value counts for 'CONFIDENCE':
CONFIDENCE
Moderate (11-29)     21774
High (=>30)          14541
Low (=<10)            5723
Moderate (20-30)        46
High (>30)              26
Low (<20)               10
Moderate (11-29)         5
High (=<30)              1
Moderate (15-29)         1
Name: count, dtype: int64
[Figure: bar chart of the distribution of confidence levels]

Here the CONFIDENCE field mixes several numeric ranges under the same three major labels. We keep only the major class and discard the ranges, so every “Low (...)” variant maps to Low and every “Moderate (...)” variant to Moderate, even where the numeric ranges conflict.

We make this choice because only a handful of records fall outside the three dominant variants, so we fold that small minority into the major classes.

New Confidence Level#

deposits['filter_CONFIDENCE'] = deposits['CONFIDENCE'].str.extract(r'^(High|Moderate|Low)')

print("Value counts for 'filter_CONFIDENCE':")
print(deposits['filter_CONFIDENCE'].value_counts())
Value counts for 'filter_CONFIDENCE':
filter_CONFIDENCE
Moderate    21826
High        14568
Low          5733
Name: count, dtype: int64

A quick tally shows that we have assigned the correct number of landslide deposits to each class.
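Writing the tally out with the raw counts from the value_counts output above, the three sums match the new class counts, and their total matches the 42127 non-null CONFIDENCE values found during the completeness check:

```python
# Sum the raw CONFIDENCE variants by major class
# (counts copied from the value_counts output above)
moderate = 21774 + 46 + 5 + 1   # all "Moderate (...)" variants
high     = 14541 + 26 + 1       # all "High (...)" variants
low      = 5723 + 10            # all "Low (...)" variants
print(moderate, high, low)      # 21826 14568 5733
```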

plt.figure(figsize=(8,6))

new_confidence_counts = deposits['filter_CONFIDENCE'].value_counts()

sns.barplot(x=new_confidence_counts.index, y=new_confidence_counts.values, palette='coolwarm', hue=new_confidence_counts.index)

plt.title('Distribution of New Confidence Levels', fontsize=16)
plt.xlabel('New Confidence Level', fontsize=12)
plt.ylabel('Number of Deposits', fontsize=12)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
[Figure: bar chart of the distribution of new confidence levels]

MOVE CODE#

print("MOVE_CODE counts (sorted by frequency):")
move_code_counts = deposits['MOVE_CODE'].value_counts()

sorted_move_codes = move_code_counts.sort_values(ascending=False)

for move_code, count in sorted_move_codes.items():
    print(f"{move_code}: {count}")
    print()  # Empty line for readability
MOVE_CODE counts (sorted by frequency):
DFL: 12752

EFL: 6919

RS-R+EFL: 4518

RS-R: 4407

RFL+EFL: 2948

RS-T: 2830

DS-T: 1800

RFL: 1665

RS-T+EFL: 798

ES-R: 777

ES-R+EFL: 736

RF: 733

ES-T: 270

ES-R + EFL: 145

DS-R: 132

RS-T+RS-R: 73

RS-R+RS-T+EFL: 68

RS-R+RS-T: 66

C: 61

RS-T+RS-R+EFL: 51

ES-R-EFL: 36

RS-T+ES-R+EFL: 34

ES-T+EFL: 29

EF: 26

RFL+RF: 25

EFL+RS-R: 19

RT+EFL: 18

RS-R+RFL: 15

RF+DS-T: 15

DS-R-DFL: 13

RFL+EF: 13

C-EFL: 12

RFL+DFL: 12

RS-T+RFL: 12

EFL+DFL: 12

C+RS-R+EFL: 11

DS-R+DFL: 11

DS-T-DFL: 9

C+RS-T+EFL: 9

RF+EF: 9

RS-R-DFL: 9

RS-R+EFL+RF: 8

RFL+RS-R: 8

EFL+RS-T: 7

RF+RS-T: 7

RS-R+DFL: 7

RFL+RS-T: 5

RS-R-EFL: 4

DS-T+DFL: 4

RS-R+ES-T+EFL: 4

RS-T+RF: 3

EFL+RF: 3

RS-R+ES-R: 3

ES-T+ES-R: 3

RS-T+RS-T: 3

DFL+EFL: 3

ES-R-DFL: 3

RF : 3

EFL+EFL: 3

RS-R+RF+EFL: 2

EFL
EFL
EFL: 2

DS-T+RS-T: 2

RS-T+EFL+RF: 2

ELF: 2

RF+EFL: 2

ES-R+EF: 2

RS-R_EFL: 2

RS-T+DFL: 2

DSP: 2

RS_T: 2

RS-R+RFL+EFL: 2

ES-R+DFL: 1

RS-R-RS-T: 1

RS-T+RS-R-EFL: 1

DFL-ES-R: 1

RS-T-DFL: 1

RS-R+RS-T-EFL: 1

ESL+EFL: 1

EFL
EFL: 1

DS-R-EFL: 1

RS-T+EFFL: 1

RF+DS-T+DF: 1

ES-R+ES-T+EFL: 1

RFL+RS-T+EFL: 1

RS-T+RS-R+RFL: 1

DS_R: 1

RS-T+RF+EFL: 1

RFL+EFL+RS-R: 1

RFL+RS-R+EFL: 1

RS-R+RS-T+DFL: 1

DS-R+DS-T: 1

C-ES+EFL: 1

EFL+ES-T: 1

EFL+RS_R: 1

DS-R+RS-R: 1

RS-T+RFL+EFL: 1

RS-R+RS_T+EFL: 1

RS-R+EFL+RFL: 1

DS-T+EFL: 1

ES-T+RS-T: 1

RS-R+DS-R: 1

RFL+RS-R+RS-T: 1

RFL+EFL_RS-R: 1

DFL-DS-R: 1

EFL : 1

RS-T+EFL_RS-R: 1

ES-R+RFL: 1

RS-T+EFL
RS-R+EFL: 1

RS-T+DS-T: 1

RT+ET: 1

EFL_RS-R: 1

EFL
RS-R+EFL: 1

RS-T+RS-R_EFL: 1

RS-T+RS-R+EFL+RF: 1

EFS-R+EFL: 1

DFL+RS+RF: 1

EFL-RFL: 1

TS-T: 1

DF+RF: 1

ES-R+ES-T: 1

RS-T+EFL+RS-R: 1

TS-T+RS-R+EFL: 1

RS-T+RS-R+EF : 1

RS-T : 1

RS-R+RS-R: 1

ES: 1

ES-R_EFL: 1

RS-T+EFLS: 1

C+RS-R+ES-T: 1

RS-T+RS-R+EF: 1

C+RS-R+DFL: 1

ES-R : 1

RS-R+RFL+DFL: 1

RS-R+DSP: 1

DFL
: 1

RF+RFL: 1

RF+RS-R: 1

RS-R : 1

RS-R + EFL: 1
deposits['MOVE_CODE'] = deposits['MOVE_CODE'].str.replace(r'\s+', '', regex=True)

# Get value counts for the cleaned codes
print("\nCleaned MOVE_CODE counts (sorted by frequency):")
clean_move_code_counts = deposits['MOVE_CODE'].value_counts()
sorted_clean_codes = clean_move_code_counts.sort_values(ascending=False)

for move_code, count in sorted_clean_codes.items():
    print(f"{move_code}: {count}")
    print()  # Empty line for readability
Cleaned MOVE_CODE counts (sorted by frequency):
DFL: 12753

EFL: 6920

RS-R+EFL: 4519

RS-R: 4408

RFL+EFL: 2948

RS-T: 2831

DS-T: 1800

RFL: 1665

ES-R+EFL: 881

RS-T+EFL: 798

ES-R: 778

RF: 736

ES-T: 270

DS-R: 132

RS-T+RS-R: 73

RS-R+RS-T+EFL: 68

RS-R+RS-T: 66

C: 61

RS-T+RS-R+EFL: 51

ES-R-EFL: 36

RS-T+ES-R+EFL: 34

ES-T+EFL: 29

EF: 26

RFL+RF: 25

EFL+RS-R: 19

RT+EFL: 18

RS-R+RFL: 15

RF+DS-T: 15

DS-R-DFL: 13

RFL+EF: 13

EFL+DFL: 12

RFL+DFL: 12

C-EFL: 12

RS-T+RFL: 12

C+RS-R+EFL: 11

DS-R+DFL: 11

RS-R-DFL: 9

RF+EF: 9

DS-T-DFL: 9

C+RS-T+EFL: 9

RS-R+EFL+RF: 8

RFL+RS-R: 8

RS-R+DFL: 7

RF+RS-T: 7

EFL+RS-T: 7

RFL+RS-T: 5

DS-T+DFL: 4

RS-R-EFL: 4

RS-R+ES-T+EFL: 4

ES-R-DFL: 3

EFL+EFL: 3

RS-T+RS-T: 3

DFL+EFL: 3

EFL+RF: 3

RS-R+ES-R: 3

RS-T+RF: 3

ES-T+ES-R: 3

ELF: 2

DS-T+RS-T: 2

RS-T+RS-R+EF: 2

RS-R+RF+EFL: 2

EFLEFLEFL: 2

RF+EFL: 2

ES-R+EF: 2

RS-T+EFL+RF: 2

RS_T: 2

RS-R+RFL+EFL: 2

RS-R_EFL: 2

RS-T+DFL: 2

DSP: 2

DS-R-EFL: 1

DS_R: 1

RS-R+RS-T+DFL: 1

EFL-RFL: 1

DFL-ES-R: 1

DFL-DS-R: 1

RF+DS-T+DF: 1

EFLEFL: 1

ES-R+ES-T+EFL: 1

DS-R+RS-R: 1

RS-T+RF+EFL: 1

RT+ET: 1

RS-T-DFL: 1

RS-T+EFFL: 1

RS-R+RS-T-EFL: 1

ES-R+DFL: 1

RS-R+DSP: 1

RFL+RS-T+EFL: 1

RS-T+RS-R+RFL: 1

RFL+EFL+RS-R: 1

RS-R+RS_T+EFL: 1

RFL+RS-R+RS-T: 1

RS-R+DS-R: 1

ES-T+RS-T: 1

DS-T+EFL: 1

RS-R+EFL+RFL: 1

RS-T+RFL+EFL: 1

RFL+RS-R+EFL: 1

RS-T+EFL+RS-R: 1

EFL+RS_R: 1

RS-R+RS-R: 1

RS-T+RS-R-EFL: 1

DF+RF: 1

RS-T+EFLS: 1

ES-R_EFL: 1

RS-T+DS-T: 1

RS-T+EFLRS-R+EFL: 1

EFS-R+EFL: 1

RS-R+RFL+DFL: 1

RFL+EFL_RS-R: 1

ESL+EFL: 1

RS-T+EFL_RS-R: 1

DS-R+DS-T: 1

EFL_RS-R: 1

EFLRS-R+EFL: 1

C+RS-R+ES-T: 1

TS-T: 1

C+RS-R+DFL: 1

RS-T+RS-R_EFL: 1

EFL+ES-T: 1

RF+RFL: 1

ES-R+RFL: 1

RS-T+RS-R+EFL+RF: 1

TS-T+RS-R+EFL: 1

RF+RS-R: 1

DFL+RS+RF: 1

ES-R+ES-T: 1

C-ES+EFL: 1

RS-R-RS-T: 1

ES: 1
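Note one side effect of stripping all whitespace: codes that were separated only by line breaks in the raw data are fused rather than joined, which is where entries like EFLEFLEFL above come from. A minimal illustration (the '+'-joining variant is a hypothetical alternative, not what this notebook does):

```python
import re

raw = "EFL\nEFL"
# What the cleaning above does: all whitespace removed, codes fused
print(re.sub(r"\s+", "", raw))   # EFLEFL
# Hypothetical alternative that keeps the boundary explicit
print("+".join(p for p in re.split(r"\s+", raw) if p))  # EFL+EFL
```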

MOVE CLASS#

The original MOVE_CLASS combines the kind of material involved and the mode of movement. We separate these into distinct columns for consistency with the other existing datasets.

print("\nValue counts for 'MOVE_CLASS':")

move_class_counts = deposits['MOVE_CLASS'].value_counts()
move_class_counts = move_class_counts.sort_values(ascending=False)
for move_class, count in move_class_counts.items():
    print(f"{move_class}: {count}")
    print()  
Value counts for 'MOVE_CLASS':
Debris Flow: 12753

Earth Flow: 7044

Complex: 6007

Rock Slide-Rotational: 5509

Rock Slide-Translational: 2445

Rock Flow: 1729

Debris Slide-Translational: 1650

Complex-Earth Slide-Rotational+Earth Flow: 826

Rock Fall: 732

Rock Slide - Translational: 511

Earth Slide - Rotational: 456

Rock Slide - Rotational: 443

Earth Slide-Rotational: 342

Complex-Rock Slide-Rotational+Earth Flow: 303

Earth Slide-Translational: 250

Rock Slide - Rotational+Earth Flow: 182

Complex-Rock Slide-rotational and Earth Flow: 171

Debris Slide - Translational: 150

Complex-Rock Slide-Translational+Earth Flow: 113

Complex Rock Slide - Translational & Earth Flow: 111

Debris Slide-Rotational: 110

Complex Rock Slide - Rotational & Earth Flow: 63

Rock Slide-Rotational+Earth Flow: 57

Combined Slump - Earth Flow: 37

Complex Earth Slide - Rotational & Earth Flow: 36

Earth Slide - Translational: 34

Rock Slide-Translational+Earth Flow: 33

Earth Fall: 25

Rock Slide - Translational+Earth Flow: 22

Complex-Rock Slide-R+Rock Slide-T+Earth Flow: 19

Debris Slide- Rotational: 18

Earth Slide- Rotational: 18

Rock Slide- Rotational: 14

Rock Fall + Debris Slide-Translational: 13

Debris Slide-Rotational+Debris Flow: 10

Combined Earth Slide-Rotational + Earth Flow: 10

Earth Slide-Rotational+Earth Flow: 9

Channelized Debris Flow Deposition: 8

Debris Slide - Rotational: 7

Rock Slide- Translational: 6

Rock Slide-Rotational and Earth Flow: 5

Earth Slide- Translational: 4

Rock Slide-Translational+ Earth Flow : 4

Rock Slide-Translational and Earthflow: 4

Rock Slide-Translational+ Earth Flow: 3

Debris Slide-Translational+Debris Flow: 3

Rock Slide- Rotational + Earth Flow: 3

Combined Rock Slide-Rotational + Earth Flow: 3

Rock Slide - Translational + Earth Flow: 3

Earth Flow+Earth Flow: 3

Earth Slide - Rotational+Earth Flow: 3

Complex- Rock Slide-Rotational+Earth Flow: 3

Earth Slide-Translational+Earth Flow: 2

Rock Slide- Rotational+Earth Flow: 2

Earth Flow+Rock Slide - Rotational: 2

Debris Spread: 2

Debris Slide- Translational: 2

Rock Fall + Debris Slide-Tanslational: 1

Rock Fall + Rock Slide- Translational: 1

Debris Flow : 1

Rock Slide - Rotational+Rock Slide - Translational: 1

Rock Slide - Rotational+Debris Spread: 1

Complex-Earth Flow+Rock Slide Rotational: 1

Rotational Slide (slump): 1

Rock Fall + Debris Slide: 1

Rock Fall + Debris Slide-Translational + DFlow: 1

Debris Fall: 1

Translational Slide: 1

Complex-Debris Slide-Rotational+Debris Flow: 1

Rock Slide - Rotational+Rock Slide - Rotational: 1

Debris Slide-Transtional+Debris Flow: 1

Rock Slide-Translational+Rotational+Earth Flow: 1

Rock Slide-Rotational+Rock Flow: 1

Debris Slide-Rotiational: 1

Rock Slide- Translational+Rock Slide-Rotational: 1

Rock Slide-Translational and Rock Fall: 1

Debris Flow+ Earth Flow: 1

Debrs Flow: 1

Rock Fall and Rock Slide-Rotational: 1

Complex- Earth Flow+Debris Flow: 1

Earth Slide- Rotational : 1

Earth Topple: 1

Earth Spread: 1

Earth Slide - Rotational : 1

Rockslide translational+Earth Flow: 1

Rock Slide-Translational, Earth flow: 1

Earth flow: 1

Separate out Materials Involved#

import re
import pandas as pd

def extract_materials(move_class_str):
    # Return NaN for empty or null values instead of 'Other'
    # This is needed since a lot of the MOVE_CLASS values are empty
    if pd.isna(move_class_str) or str(move_class_str).strip() == '':
        return pd.NA
    
    move_class_str = str(move_class_str).lower()
    
    # Fix common typos 
    move_class_str = move_class_str.replace('debrs', 'debris')
    move_class_str = move_class_str.replace('rockslide', 'rock slide')
    
    materials = []
    
    # The single "Rotational Slide (slump)" record is labelled as Earth
    earth_pattern = r'earth|slump'

    # Talus (accumulated rock fragments) counts as rock material
    rock_pattern = r'rock|talus'
    
    debris_pattern = r'debr'
    
    # Check for each material type
    if re.search(earth_pattern, move_class_str, re.IGNORECASE):
        materials.append('Earth')
    if re.search(rock_pattern, move_class_str, re.IGNORECASE):
        materials.append('Rock')
    if re.search(debris_pattern, move_class_str, re.IGNORECASE):
        materials.append('Debris')
    
    # Handle special case for "Complex" without specific material
    if 'complex' in move_class_str.lower() and len(materials) == 0:
        return 'Complex'
    
    # If no materials found but the field is not empty, categorize as 'Other'
    if not materials:
        return 'Other'
    
    # Join all found materials with '+'
    # Sort to ensure consistent ordering
    return '+'.join(sorted(materials))  

# Apply the function to create a new column
deposits['filter_MATERIAL'] = deposits['MOVE_CLASS'].apply(extract_materials)

print("\nValue counts for 'MATERIAL' (including nulls):")
material_counts_final = deposits['filter_MATERIAL'].value_counts(dropna=False)
print(material_counts_final)

# Plot distribution excluding NaN values
plt.figure(figsize=(10, 6))
non_null_counts = deposits['filter_MATERIAL'].value_counts()
sns.barplot(x=non_null_counts.index, y=non_null_counts.values, 
            palette='viridis', hue=non_null_counts.index, legend=False)
plt.title('Distribution of Materials in Landslides', fontsize=16)
plt.xlabel('Material Type', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Value counts for 'MATERIAL' (including nulls):
filter_MATERIAL
<NA>            28961
Debris          14719
Rock            11396
Earth            9105
Complex          6007
Earth+Rock       1110
Debris+Rock        17
Debris+Earth        2
Other               1
Name: count, dtype: int64
[Figure: bar chart of the distribution of materials in landslides]
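The sorted join in `extract_materials` guarantees that a given mixture always maps to one canonical label, whichever order the materials appear in MOVE_CLASS:

```python
# Both orderings collapse to the same canonical label
print("+".join(sorted(["Rock", "Earth"])))  # Earth+Rock
print("+".join(sorted(["Earth", "Rock"])))  # Earth+Rock
```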

Separate out Modes of Movement#

import re
import pandas as pd

def extract_movement_class(move_class_str):
    # Return NaN for empty or null values
    # This is needed since a lot of the MOVE_CLASS values are empty
    if pd.isna(move_class_str) or str(move_class_str).strip() == '':
        return pd.NA
    
    move_class_str = str(move_class_str).lower()
    
    # Fix some typos and standardize formatting
    move_class_str = move_class_str.replace('debrs', 'debris')
    move_class_str = move_class_str.replace('rotiational', 'rotational')
    move_class_str = move_class_str.replace('transtional', 'translational')
    move_class_str = move_class_str.replace('tanslational', 'translational')
    
    
    movement_classes = []
    
    # Check for complex first (may contain other movement types)
    if 'complex' in move_class_str:
        movement_classes.append('Complex')
    
    # Check for slide types
    if 'slide' in move_class_str:
        if 'rotational' in move_class_str or 'slump' in move_class_str:
            movement_classes.append('Slide-Rotational')
        elif 'translational' in move_class_str:
            movement_classes.append('Slide-Translational')
        else:
            movement_classes.append('Slide')
            
    # Check for flow
    if 'flow' in move_class_str:
        movement_classes.append('Flow')
        
    # Check for fall
    if 'fall' in move_class_str:
        movement_classes.append('Fall')
        
    # Check for spread
    if 'spread' in move_class_str:
        movement_classes.append('Spread')
        
    # Check for topple
    if 'topple' in move_class_str:
        movement_classes.append('Topple')
        
    # Check for avalanche
    if 'avalanche' in move_class_str:
        movement_classes.append('Avalanche')
    
    # If no movement class was identified but the string isn't empty
    if not movement_classes:
        return 'Other'
    
    # Join all identified movement classes with '+'
    return '+'.join(movement_classes)

# Apply the function to create a new column
deposits['filter_MOVEMENT'] = deposits['MOVE_CLASS'].apply(extract_movement_class)


print("\nValue counts for 'MOVEMENT':")
movement_counts = deposits['filter_MOVEMENT'].value_counts(dropna=False)
print(movement_counts)

plt.figure(figsize=(12, 6))
non_null_counts = deposits['filter_MOVEMENT'].value_counts()
sns.barplot(x=non_null_counts.index, y=non_null_counts.values, 
            palette='viridis', hue=non_null_counts.index, legend=False)
plt.title('Distribution of Movement Classes', fontsize=16)
plt.xlabel('Movement Class', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Value counts for 'MOVEMENT':
filter_MOVEMENT
<NA>                                28961
Flow                                21578
Slide-Rotational                     6924
Complex                              6007
Slide-Translational                  5053
Complex+Slide-Rotational+Flow        1404
Fall                                  758
Slide-Rotational+Flow                 288
Complex+Slide-Translational+Flow      224
Slide-Translational+Flow               77
Complex+Slide+Flow                     19
Slide-Translational+Fall               16
Spread                                  3
Slide-Rotational+Spread                 1
Slide+Fall                              1
Slide-Translational+Flow+Fall           1
Slide-Rotational+Fall                   1
Complex+Flow                            1
Topple                                  1
Name: count, dtype: int64
[Figure: bar chart of the distribution of movement classes]
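One behaviour of the classifier worth noting: because the rotational check comes before the translational one in the if/elif chain, a MOVE_CLASS mentioning both (e.g. "Rock Slide-Translational+Rotational+Earth Flow") contributes only Slide-Rotational. A stripped-down sketch of that branch:

```python
def slide_label(s: str) -> str:
    # Mirrors the if/elif order inside extract_movement_class
    s = s.lower()
    if "rotational" in s or "slump" in s:
        return "Slide-Rotational"
    elif "translational" in s:
        return "Slide-Translational"
    return "Slide"

print(slide_label("Rock Slide-Translational+Rotational+Earth Flow"))  # Slide-Rotational
```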

Check all new columns#

print(deposits.dtypes)
UNIQUE_ID              object
TYPE_MOVE              object
MOVE_CLASS             object
MOVE_CODE              object
CONFIDENCE             object
AGE                    object
DATE_MOVE              object
NAME                   object
GEOL                   object
SLOPE                 float32
HS_HEIGHT             float32
FAN_HEIGHT            float32
FAIL_DEPTH            float32
DEEP_SHAL              object
HS_IS1                float32
IS1_IS2               float32
IS2_IS3               float32
IS3_IS4               float32
HD_AVE                float32
DIRECT                float32
AREA                  float32
VOL                   float32
REF_ID_COD             object
MAP_UNIT_L             object
DESCRIPTION            object
YEAR                  float64
DATE_RANGE             object
REACTIVATION           object
MONTH                  object
DAY                    object
Shape_Length          float64
Shape_Area            float64
geometry             geometry
filter_CONFIDENCE      object
filter_MATERIAL        object
filter_MOVEMENT        object
dtype: object

Create a new DataFrame#

# Create a new GeoDataFrame
oregon_landslides = gpd.GeoDataFrame(deposits, geometry='geometry', crs=deposits.crs)

Inspect the new DataFrame#

# Verify the structure of the new GeoDataFrame
print("New GeoDataFrame shape:", oregon_landslides.shape)
print("\nColumn names:")
for col in oregon_landslides.columns:
    print(f"  - {col}")

# Display the first few rows to verify
oregon_landslides.head()
New GeoDataFrame shape: (71318, 36)

Column names:
  - UNIQUE_ID
  - TYPE_MOVE
  - MOVE_CLASS
  - MOVE_CODE
  - CONFIDENCE
  - AGE
  - DATE_MOVE
  - NAME
  - GEOL
  - SLOPE
  - HS_HEIGHT
  - FAN_HEIGHT
  - FAIL_DEPTH
  - DEEP_SHAL
  - HS_IS1
  - IS1_IS2
  - IS2_IS3
  - IS3_IS4
  - HD_AVE
  - DIRECT
  - AREA
  - VOL
  - REF_ID_COD
  - MAP_UNIT_L
  - DESCRIPTION
  - YEAR
  - DATE_RANGE
  - REACTIVATION
  - MONTH
  - DAY
  - Shape_Length
  - Shape_Area
  - geometry
  - filter_CONFIDENCE
  - filter_MATERIAL
  - filter_MOVEMENT
UNIQUE_ID TYPE_MOVE MOVE_CLASS MOVE_CODE CONFIDENCE AGE DATE_MOVE NAME GEOL SLOPE ... DATE_RANGE REACTIVATION MONTH DAY Shape_Length Shape_Area geometry filter_CONFIDENCE filter_MATERIAL filter_MOVEMENT
0 WASH_CO2 Flow Debris Flow DFL High (=>30) Historic (<150yrs) vol.M.cr.gr.nd.nd.bas 4.256579 ... None None None None 447.239405 11291.764767 MULTIPOLYGON (((699810 1325184.192, 699822.282... High Debris Flow
1 WASH_CO3 Flow Debris Flow DFL High (=>30) Historic (<150yrs) vol.M.cr.gr.nd.nd.bas 3.350513 ... None None None None 455.775817 13701.014637 MULTIPOLYGON (((699860.473 1325401.027, 699873... High Debris Flow
2 WASH_CO4 Flow Debris Flow DFL High (=>30) Historic (<150yrs) vol.M.cr.gr.nd.nd.bas 5.811321 ... None None None None 568.959463 14879.315815 MULTIPOLYGON (((700295.984 1325083.866, 700283... High Debris Flow
3 WASH_CO5 Flow Debris Flow DFL High (=>30) Historic (<150yrs) sed.Q.qsd.mf.nd.nd.fine 5.413578 ... None None None None 591.097110 20034.804942 MULTIPOLYGON (((700696.889 1325351.537, 700659... High Debris Flow
4 WASH_CO6 Flow Debris Flow DFL High (=>30) Historic (<150yrs) vol.M.cr.gr.nd.nd.bas 13.638928 ... None None None None 391.627031 8462.716893 MULTIPOLYGON (((699242.636 1321687.547, 699279... High Debris Flow

5 rows × 36 columns

Inspect References inside Deposits Layer#

print("Inspecting references in oregon_landslides:")
print(f"Total records: {len(oregon_landslides)}")
print(f"Records with REF_ID_COD: {oregon_landslides['REF_ID_COD'].notna().sum()}")
print(f"Records without REF_ID_COD: {oregon_landslides['REF_ID_COD'].isna().sum()}")

print("\nTop 10 most common references:")
ref_counts = oregon_landslides['REF_ID_COD'].value_counts().head(10)
for ref_id, count in ref_counts.items():
    print(f"{ref_id}: {count}")

print(f"\nTotal unique references: {oregon_landslides['REF_ID_COD'].nunique()}")

print("\nSample of reference IDs:")
sample_refs = oregon_landslides['REF_ID_COD'].dropna().unique()[:10]
for ref in sample_refs:
    print(f"  - {ref}")
Inspecting references in oregon_landslides:
Total records: 71318
Records with REF_ID_COD: 71318
Records without REF_ID_COD: 0

Top 10 most common references:
HairRW2021: 5017
CalhNC2020a: 4096
BurnWJ2017a: 2986
BurnWJ2012a: 2837
BurnWJ2023: 2693
BurnWJ2014: 2483
BurnWJ2021: 2228
McClJD2021b: 2122
CALHNC2020: 2098
BurnWJ2018: 1969

Total unique references: 346

Sample of reference IDs:
  - HairRW2021
  - BurnWJ2023
  - BurnWJ2010a
  - BurnWJ2010b
  - BurnWJ2010c
  - BurnWJ2009b
  - BurnWJ2010d
  - MertSA2007a
  - McClJD2010
  - HladFR2007

Inspect References Layer#

# Load and inspect the References layer
references = gpd.read_file(gdb_dir, layer="References")

print("References layer shape:", references.shape)
print("\nReferences layer columns:")
for col in references.columns:
    print(f"  - {col}")

print("\nFirst few rows of References:")
print(references.head())

print("\nData types:")
print(references.dtypes)

# Check for common reference identifiers between deposits and references
print("\nUnique REF_ID_COD values in deposits (first 10):")
print(deposits['REF_ID_COD'].unique()[:10])

print("\nUnique REF_ID_COD values in references:")
print(references['REF_ID_COD'].unique())

# Check if there's overlap
common_refs = set(deposits['REF_ID_COD'].dropna()) & set(references['REF_ID_COD'].dropna())
print(f"\nNumber of common reference IDs: {len(common_refs)}")
print("Sample common reference IDs:", list(common_refs)[:5])
References layer shape: (376, 4)

References layer columns:
  - REF_ID_COD
  - SCALE
  - REFERENCE
  - DATE

First few rows of References:
   REF_ID_COD     SCALE                                          REFERENCE  \
0   AchJA1991  1:24,000  Ach, J.A., Bateson, J.T., 1991, Geologic map o...   
1  AllaJC2001  1:25,000  Allan, J.C.; Priest, G.R., 2001, Evaluation of...   
2  AlleJE1988  1:24,000  Allen, J.E., 1988, Geologic Hazards in the Col...   
3  AndeJL1978  1:24,000  Anderson, J. L., 1978, The stratigraphy and st...   
4  AshlRP1966  1:21,100  Ashley, R.P., 1966, Metamorphic petrology and ...   

   DATE  
0  1991  
1  2001  
2  1988  
3  1978  
4  1966  

Data types:
REF_ID_COD    object
SCALE         object
REFERENCE     object
DATE          object
dtype: object

Unique REF_ID_COD values in deposits (first 10):
['HairRW2021' 'BurnWJ2023' 'BurnWJ2010a' 'BurnWJ2010b' 'BurnWJ2010c'
 'BurnWJ2009b' 'BurnWJ2010d' 'MertSA2007a' 'McClJD2010' 'HladFR2007']

Unique REF_ID_COD values in references:
['AchJA1991' 'AllaJC2001' 'AlleJE1988' 'AndeJL1978' 'AshlRP1966'
 'BacoCRUnpub' 'BaldEM1952' 'BaldEM1955' 'BaldEM1956' 'BaldEM1961'
 'BaldEM1969' 'BaldEM1973' 'BaraP1991' 'BarnCG1978' 'BeauJD1973hazards'
 'BeauJD1974ahazards' 'BeauJD1974bhazards' 'BeauJD1975hazards'
 'BeauJD1976hazards' 'BeauJD1977ahazards' 'BeauJD1977bhazards'
 'BelaJL1981' 'BelaJL1982' 'BestEA2002Figure3' 'BingNJ1984' 'BlacGL1987'
 'BlacGL1995' 'BlacGL2000' 'BrikTH1983' 'BroeL1995' 'BrooHC1976'
 'BrooHC1977a' 'BrooHC1977cplateA' 'BrooHC1977cplateB' 'BrooHC1979a'
 'BrooHC1979b' 'BrooHC1979c' 'BrooHC1982a' 'BrooHC1982b' 'BrooHC1983'
 'BrooHC1984' 'BrooHCUnpubB' 'BrowCE1966a' 'BrowCE1966b' 'BrowCE1977'
 'BrowDE1980aplate1' 'BrowDE1980aplate2' 'BrowDE1980aplate3'
 'BrowDE1980aplate4' 'BrowDE1980b' 'BrowDE1980c' 'BrowDE1980d'
 'BrowDE1982' 'BrowME1982' 'BurnSF1997' 'BurnWJ2009b' 'BurnWJ2010a'
 'BurnWJ2010b' 'BurnWJ2010c' 'BurnWJ2010d' 'BurnWJ2011' 'BussC2006'
 'CampVCunpubA' 'CampVCunpubB' 'CampVCunpubC' 'CampVCunpubD' 'ConrRM1991'
 'CoxGM1977' 'CummJA1958' 'CunnCT1979' 'DickWR1965easthalf'
 'DickWR1965westhalf' 'DiggMF1991' 'DonaMM1992' 'DonaMM1993' 'ErikA2002'
 'EvanJG1986a' 'EvanJG1987' 'EvanJG1989' 'EvanJG1990' 'EvanJG1991'
 'EvanJG1993' 'EvanJG1995' 'EvanJG2001' 'EvarRC2002' 'EvarRC2004'
 'FEMA2006' 'FernML1983' 'FernML1984' 'FernML1987' 'FernML1993a'
 'FernML1993b' 'FernML2001a' 'FernML2001b' 'FernML2006a' 'FernML2006b'
 'FernML2006c' 'FernML2010' 'FernMLUnpubB' 'GrayF1980' 'GrayF1982'
 'GreeRC1972a' 'HaddGH1959' 'HalePO1975' 'HammPE1982' 'HampER1972'
 'HaucSM1962' 'HeriCW1981' 'HewiSL1970' 'HladFR1992' 'HladFR1993'
 'HladFR1994' 'HladFR1996' 'HladFR1998' 'HladFR1999' 'HladFR2006'
 'HladFR2007' 'HofmRJ2000' 'HoodNF1997' 'HoovL1963' 'HuggJW1978'
 'JackJS1979' 'JacoR1985' 'JenkMDUnpub' 'JohaNP1972' 'JohnJA1994'
 'JohnJA1995' 'JohnJA1998a' 'JohnJA1998b' 'KoroMA1987' 'LangVW1991'
 'MacLJW1994plate1' 'MacLJW1994plate2' 'MacLNS1995sheet1'
 'MacLNS1995sheet2' 'MaddT1965' 'MadiIP2006a' 'MadiIP2006b' 'MadiIP2007'
 'MathAC1993' 'McclJD2006a' 'McclJD2006b' 'McClJD2010' 'McCoVSUnpubA'
 'McCoVSUnpubB' 'MertSA2007a' 'MertSA2007b' 'MertSA2008' 'MertSA2009'
 'MertSAUnpub' 'MillPR1984' 'MinoSA1987b' 'MinoSA1990a' 'MinoSA1990b'
 'Misc_unpub' 'MoorR2004' 'MuntJK1969' 'MuntSR1978' 'MurrRB2001'
 'MurrRB2006a' 'MurrRBUnpubA' 'MurrRBunpubB' 'NappJE1958' 'NichDK1989'
 'NiemAR1985' 'NiemAR1990' 'NoblJB1979' 'OConJE2001' 'ODOT2011'
 'OlesKF1971' 'OlivJA1981' 'OrrWN1986a' 'OrrWN1986b' 'OwenPC1977'
 'PageNJ1977' 'PageNJ1981' 'PattRL1965' 'PertRK1976' 'PeteNV1976Redmond'
 'PeteNV1980' 'PoguKR1999' 'Portland2011' 'PrieGR1982plate5'
 'PrieGR1982plate6' 'PrieGR1983plate2' 'PrieGR1983plate3' 'PrieGR1987'
 'PrieGR1988' 'PrieGR2000' 'PrieGR2004a' 'PrieGR2004b' 'ProsHJ1967'
 'RampL1961' 'RampL1986' 'RitcB1987' 'RobaRC1987' 'RobiPT1975'
 'RobyTL1977' 'RytuJJ1982a' 'RytuJJ1983b' 'RytuJJ1983c' 'RytuJJ1983e'
 'RytuJJ1983g' 'RytuJJ1983h' 'SchlHG1967Plate3' 'SchlHG1972hazards'
 'SchlHG1973' 'SchlHG1974' 'SchlHG1974hazards' 'SchlHG1975'
 'SchlHG1979hazards' 'SchnDR1958' 'SchuMG1980' 'ScotWE1992' 'SherDR1988'
 'SherDR1989' 'SherDR1991' 'SherDR1992' 'SherDR1994' 'SherDR1995'
 'SherDR2000' 'SherDR2004' 'SmitGA1987a' 'SmitGA1987b' 'SmitGA1987c'
 'SmitJG1982' 'SnavPD1976b' 'SnavPD1976c' 'SnavPD1996' 'SobiS2010'
 'StimJP1988' 'StocAO1982' 'StroHR1980' 'SwanDA1969' 'SwanDA1981sheet1'
 'SwanDA1981sheet2' 'SwanDA1981sheet3' 'SwanDA1983' 'SwinCM1968'
 'ThayTP1956' 'ThayTP1966' 'ThayTP1967b' 'ThayTP1967c' 'ThayTP1981'
 'ThomTH1981' 'ThorDJ1984' 'TolaTL1982' 'TolaTL1999sheet1'
 'TolaTL2000sheet2' 'TravPL1977' 'TuckER1975' 'VandDB1988' 'WalkGW1963'
 'WalkGW1965' 'WalkGW1966' 'WalkGW1967' 'WalkGW1968figure2'
 'WalkGW1968figure3' 'WalkGW1979' 'WalkGW1980b' 'WalkGW1989' 'WalkGW2002'
 'WangZ2001' 'WateAC1968a' 'WateAC1968b' 'WateAC1968c' 'WateAC1968d'
 'WeidJP1980' 'WeisKW1984' 'WellFG1949' 'WellFG1953' 'WellRE1975'
 'WellRE1983' 'WellRE1995' 'WellRE2000' 'WendDW1988' 'WethCE1960'
 'WhitDL1994' 'WhitWH1964' 'WilcRE1966' 'WileTJ1991' 'WileTJ1993a'
 'WileTJ1993bplate1' 'WileTJ2006a' 'WileTJ2006b' 'WileTJ2006c'
 'WileTJ2007' 'WileTJunpub' 'WilkRM1986' 'WittRC2007' 'WittRC2009'
 'WoodJD1976' 'YeatRS1996plate2a' 'YeatRS1996plate2b' 'YogoGM1985'
 'YuleJD1996plate10' 'YuleJD1996plate11' 'YuleJD1996plate2'
 'YuleJD1996plate8' 'YuleJD1996plate9' 'BurnWJ2011b' 'BurnWJ2011c'
 'BurnWJ2012a' 'BurnWJ2012b' 'BurnWJ2012c' 'BurnWJ2012d' 'BurnWJ2012f'
 'BurnWJ2012e' 'BurnWJ2012g' 'BurnWJ2012h' 'BurnWJ2012i' 'BurnWJ2012j'
 'BurnWJ2012k' 'BurnWJ2012l' 'BurnWJ2012m' 'BurnWJ2012n' 'BurnWJ2012o'
 'BurnWJ2012p' 'BurnWJ2011d' 'BurnWJ2013a' 'BurnWJ2012q' 'BurnWJ2012r'
 'BurnWJunpub2013a' 'BurnWJ2013c' 'McClJD2012' 'BurnWJunpub2014'
 'McClJDunpub2013' 'MickKA2012' 'WileTJ2011' 'BurnWJ2013d'
 'MickKAunpub2012' 'BurnWJunpub2013b' 'McClJD2013' 'BurnWJunpub2011'
 'BurnWJ2013b' 'WileTJ2014' 'MickKAunpub2014' 'BurnWJ2014' 'BurnWJ2013e'
 'BurnWJ2015a' 'WileTJ2015' 'DirrS2015' 'BurnWJ2018' 'CalhNC2018'
 'BurnWJ2009c' 'BurnWJ2015b' 'NiewCA2018' 'HousRA2018' 'MickKA2011'
 'KossEunpub2019' 'BurnWJunpub2019' 'BurnWJ2010e' 'McClJD2019'
 'MadiIP2019' 'BurnWJ2017a' 'BurnWJ2017' 'BeauJD1976' 'CalhNC2020'
 'CalhNC2020a' 'BurnWJunpub2021' 'BurnWJ2021' 'BurnWJ2021a' 'BurnWJ2021b'
 'BurnWJ2021c' 'BurnWJ2021d' 'BurnWJ2023' 'McClJD2021a' 'HousRA2021'
 'BurnWJ2022' 'HairRW2021' 'McClJD2020' 'McClJD2020b' 'McClJD2023'
 'McClJD2021b' 'WellRE2020' 'CalhNC2023']

Number of common reference IDs: 345
Sample common reference IDs: ['WileTJ2006a', 'CalhNC2018', 'WileTJ2014', 'BlacGL2000', 'BurnWJ2011d']
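The overlap count above can be reproduced with a plain set intersection; a minimal sketch with toy IDs (the real check would use `oregon_landslides['REF_ID_COD']` and `references['REF_ID_COD']`):

```python
# Toy stand-ins for the REF_ID_COD columns of the deposits and references layers
deposit_ids = {"BurnWJ2009b", "BurnWJ2010d", "MertSA2007a", "UnknownID"}
reference_ids = {"BurnWJ2009b", "BurnWJ2010d", "MertSA2007a", "HladFR2007"}

# IDs present in both layers will pick up a citation in the left join;
# deposit IDs missing from the references layer end up with a null REFERENCE
common = deposit_ids & reference_ids
unmatched = deposit_ids - reference_ids

print(f"Number of common reference IDs: {len(common)}")
print(f"Deposit IDs without a matching reference: {sorted(unmatched)}")
```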
# Merge the references information into oregon_landslides
oregon_landslides = oregon_landslides.merge(
    references[['REF_ID_COD', 'REFERENCE']], 
    on='REF_ID_COD', 
    how='left'
)

print("Successfully added REFERENCE column to oregon_landslides")
print(f"Records with references: {oregon_landslides['REFERENCE'].notna().sum()}")
print(f"Records without references: {oregon_landslides['REFERENCE'].isna().sum()}")

# Display a sample of the new column
print("\nSample references:")
sample_refs = oregon_landslides[oregon_landslides['REFERENCE'].notna()]['REFERENCE'].head(3)
for i, ref in enumerate(sample_refs):
    print(f"{i+1}. {ref[:100]}...")
Successfully added REFERENCE column to oregon_landslides
Records with references: 69220
Records without references: 2098

Sample references:
1. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...
2. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...
3. Hairston-Porter, R., Madin, I., Burns, W., and Appleby, C., 2021, Landslide, coseismic liquefaction ...
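The matched/unmatched counts above can also be obtained directly from the join itself: pandas' `indicator=True` flags each row's merge status. A minimal sketch with toy frames (not the SLIDO data):

```python
import pandas as pd

# Toy stand-ins for the deposits and references tables
landslides = pd.DataFrame({"UNIQUE_ID": [1, 2, 3],
                           "REF_ID_COD": ["BurnWJ2009b", "BurnWJ2010d", "NoSuchRef"]})
references = pd.DataFrame({"REF_ID_COD": ["BurnWJ2009b", "BurnWJ2010d"],
                           "REFERENCE": ["Burns, W.J., 2009, ...", "Burns, W.J., 2010, ..."]})

# indicator=True adds a _merge column recording which rows found a match
merged = landslides.merge(references, on="REF_ID_COD", how="left", indicator=True)
print(merged["_merge"].value_counts())
```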
print("Columns in oregon_landslides:")
for i, col in enumerate(oregon_landslides.columns):
    print(f"{i+1:2d}. {col}")
Columns in oregon_landslides:
 1. UNIQUE_ID
 2. TYPE_MOVE
 3. MOVE_CLASS
 4. MOVE_CODE
 5. CONFIDENCE
 6. AGE
 7. DATE_MOVE
 8. NAME
 9. GEOL
10. SLOPE
11. HS_HEIGHT
12. FAN_HEIGHT
13. FAIL_DEPTH
14. DEEP_SHAL
15. HS_IS1
16. IS1_IS2
17. IS2_IS3
18. IS3_IS4
19. HD_AVE
20. DIRECT
21. AREA
22. VOL
23. REF_ID_COD
24. MAP_UNIT_L
25. DESCRIPTION
26. YEAR
27. DATE_RANGE
28. REACTIVATION
29. MONTH
30. DAY
31. Shape_Length
32. Shape_Area
33. geometry
34. filter_CONFIDENCE
35. filter_MATERIAL
36. filter_MOVEMENT
37. REFERENCE
oregon_landslides['filter_DATASET_LINK'] = 'https://www.oregon.gov/dogami/slido/Pages/data.aspx'
oregon_landslides['filter_ORIGIN'] = 'OR'
oregon_landslides = oregon_landslides.rename(columns={'REFERENCE': 'filter_REFERENCE'})
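Before exporting, a quick schema check catches a missing `filter_` column early; a minimal sketch with a hypothetical helper and a toy frame (the real check would run on `oregon_landslides`):

```python
import pandas as pd

# Unified-schema columns this notebook produces
REQUIRED = ["filter_CONFIDENCE", "filter_MATERIAL", "filter_MOVEMENT",
            "filter_REFERENCE", "filter_DATASET_LINK", "filter_ORIGIN"]

def missing_filter_columns(df, required=REQUIRED):
    """Return the unified-schema columns absent from df."""
    return [c for c in required if c not in df.columns]

# Toy frame deliberately missing the last column
toy = pd.DataFrame(columns=REQUIRED[:-1])
print(missing_filter_columns(toy))
```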

Save to GeoJSON#

import os

# Ensure the output directory exists before writing
os.makedirs("./processed_geojson", exist_ok=True)
oregon_landslides.to_file("./processed_geojson/oregon_landslides_processed.geojson", driver="GeoJSON")
# oregon_landslides.to_file("oregon_landslides_processed.shp")  # optional shapefile export
oregon_landslides.plot(figsize=(8, 6), edgecolor="k", linewidth=0.2)
plt.title("Oregon Landslide Inventory")
plt.axis("off")
plt.show()
[Map output: Oregon Landslide Inventory]
oregon_landslides.columns
Index(['UNIQUE_ID', 'TYPE_MOVE', 'MOVE_CLASS', 'MOVE_CODE', 'CONFIDENCE',
       'AGE', 'DATE_MOVE', 'NAME', 'GEOL', 'SLOPE', 'HS_HEIGHT', 'FAN_HEIGHT',
       'FAIL_DEPTH', 'DEEP_SHAL', 'HS_IS1', 'IS1_IS2', 'IS2_IS3', 'IS3_IS4',
       'HD_AVE', 'DIRECT', 'AREA', 'VOL', 'REF_ID_COD', 'MAP_UNIT_L',
       'DESCRIPTION', 'YEAR', 'DATE_RANGE', 'REACTIVATION', 'MONTH', 'DAY',
       'Shape_Length', 'Shape_Area', 'geometry', 'filter_CONFIDENCE',
       'filter_MATERIAL', 'filter_MOVEMENT', 'filter_REFERENCE',
       'filter_DATASET_LINK', 'filter_ORIGIN'],
      dtype='object')