Catalog Wrangling Exercise#
:auth: Nate Stevens
:email: ntsteven@uw.edu
:org: Pacific Northwest Seismic Network
:license: GPLv3
:purpose:
In this exercise we’ll use ObsPy and ObsPlus to create a highly portable earthquake catalog from the events we located in earlier analyses.
The notebook demonstrates tools for getting our (meta)data into two well-defined schemas:
the ANSS schema (https://ncedc.org/db/Documents/NewSchemas/PI/v1.6.4/PI.1.6.4/index.htm)
the QuakeML schema (https://quake.ethz.ch/quakeml)
Finally, we’ll make an EventBank that lets us save, update, and query our event catalog in a systematic manner.
# pip install ObsPlus, if it isn't already installed
! pip install obsplus
import os
from pathlib import Path
from glob import glob
import pandas as pd
from obspy import read_events, UTCDateTime
import obsplus
from obspy.geodetics import gps2dist_azimuth, kilometer2degrees
from obspy.clients.fdsn import Client
from scipy.spatial import KDTree
# TODO: Make sure this points at wherever you saved your HypoDD outputs
ROOT = Path.cwd()
# Example data
DATA = ROOT/'data'
# Where to save our eventbank
CATD = ROOT/'catalog_files'
# Where to find our raw waveforms
WFDIR = ROOT.parent/'data'
os.makedirs(str(CATD), exist_ok=True)
print(f'The data directory is registered as {DATA}')
The data directory is registered as /Users/nates/Code/GitHub/2025_ML_TSC/notebooks/Nate/data
# Load the HypoDD output into an ObsPy `Catalog` object
flist = glob(str(DATA/'*.pha'))
for _e, _f in enumerate(flist):
    if _e == 0:
        cat = read_events(_f)
    else:
        cat += read_events(_f)
# Use ObsPlus to show a DataFrame representation of events (takes a little time)
df_events = cat.to_df()
Compare this table to the ORIGIN table in the ANSS schema#
…and look at all those empty fields, just waiting for you to populate them!#
# Display our new table (conveniently formatted in nearly ANSS EVENT table format!)
display(df_events)
time | latitude | longitude | depth | magnitude | event_description | associated_phase_count | azimuthal_gap | event_id | horizontal_uncertainty | ... | standard_error | used_phase_count | station_count | vertical_uncertainty | updated | author | agency_id | creation_time | version | stations | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-12-20 00:57:25.304 | 40.6448 | -124.3003 | 8156.0 | 0.459 | 28.0 | NaN | smi:local/event/1 | NaN | ... | 0.503 | 28.0 | 17.0 | 166.0 | NaT | NaT | 1023.NP, 1582.NP, B045.PB, B046.PB, B047.PB, B... | ||||
1 | 2022-12-20 01:21:51.303 | 39.7714 | -122.5413 | 22506.0 | 1.941 | 12.0 | NaN | smi:local/event/2 | NaN | ... | 0.398 | 12.0 | 8.0 | 569.0 | NaT | NaT | GHGB.NC, GROB.NC, GSR.NC, GTSB.BK, GVA.NC, KBN... | ||||
2 | 2022-12-20 02:44:52.382 | 40.4189 | -124.2338 | 23295.0 | 0.012 | 15.0 | NaN | smi:local/event/3 | NaN | ... | 0.319 | 15.0 | 10.0 | 385.0 | NaT | NaT | B045.PB, B046.PB, B047.PB, B049.PB, B932.PB, B... | ||||
3 | 2022-12-20 03:06:59.875 | 40.3154 | -124.3610 | 20037.0 | 0.914 | 35.0 | NaN | smi:local/event/4 | NaN | ... | 0.333 | 35.0 | 20.0 | 193.0 | NaT | NaT | 1586.NP, 89688.CE, B045.PB, B046.PB, B047.PB, ... | ||||
4 | 2022-12-20 03:18:27.700 | 40.3648 | -124.2772 | 28226.0 | 0.909 | 36.0 | NaN | smi:local/event/5 | NaN | ... | 0.477 | 36.0 | 21.0 | 219.0 | NaT | NaT | 89688.CE, B045.PB, B046.PB, B047.PB, B049.PB, ... | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1156 | 2022-12-20 23:54:08.504 | 40.5428 | -123.9475 | 19852.0 | 0.568 | 24.0 | NaN | smi:local/event/1157 | NaN | ... | 0.453 | 24.0 | 14.0 | 209.0 | NaT | NaT | B046.PB, B049.PB, B932.PB, B935.PB, BJES.BK, D... | ||||
1157 | 2022-12-20 23:55:52.393 | 40.5971 | -124.3612 | 20997.0 | 0.752 | 17.0 | NaN | smi:local/event/1158 | NaN | ... | 0.408 | 17.0 | 10.0 | 605.0 | NaT | NaT | B046.PB, B047.PB, B932.PB, B935.PB, KCR.NC, KM... | ||||
1158 | 2022-12-20 23:56:38.054 | 40.5076 | -124.2864 | 19549.0 | 0.735 | 19.0 | NaN | smi:local/event/1159 | NaN | ... | 0.501 | 19.0 | 13.0 | 347.0 | NaT | NaT | B046.PB, B047.PB, B049.PB, B932.PB, B935.PB, K... | ||||
1159 | 2022-12-20 23:58:40.340 | 40.6331 | -123.9790 | 20756.0 | 1.130 | 32.0 | NaN | smi:local/event/1160 | NaN | ... | 0.190 | 32.0 | 17.0 | 232.0 | NaT | NaT | 1581.NP, B046.PB, B047.PB, B049.PB, B932.PB, B... | ||||
1160 | 2022-12-20 23:59:34.450 | 40.6240 | -124.0055 | 16070.0 | 0.617 | 16.0 | NaN | smi:local/event/1161 | NaN | ... | 0.515 | 16.0 | 9.0 | 352.0 | NaT | NaT | B046.PB, B049.PB, B932.PB, B935.PB, DMOR.BK, K... |
1161 rows × 28 columns
Although the ObsPlus documentation is sometimes sparse on examples, their code is quite good!#
Let’s turn all of our picks into a dataframe
cat.events[0].picks
[Pick
resource_id: ResourceIdentifier(id="smi:local/ff6baaec-8ae2-4104-be00-6be63e07e6e0")
time: UTCDateTime(2022, 12, 20, 0, 57, 27, 960000)
waveform_id: WaveformStreamID(network_code='', station_code='1582.NP', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/22196f41-be0e-4fdb-a747-3354acb20252")
time: UTCDateTime(2022, 12, 20, 0, 57, 30, 173000)
waveform_id: WaveformStreamID(network_code='', station_code='KCT.NC', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/79d0278e-8f35-4293-bbed-7b8528dae957")
time: UTCDateTime(2022, 12, 20, 0, 57, 32, 924000)
waveform_id: WaveformStreamID(network_code='', station_code='KNEE.BK', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/11846bf0-a032-426e-b5fb-3c299bc20d98")
time: UTCDateTime(2022, 12, 20, 0, 57, 33, 118000)
waveform_id: WaveformStreamID(network_code='', station_code='B046.PB', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/4b85c280-ddf2-451d-a49d-af4b1207d052")
time: UTCDateTime(2022, 12, 20, 0, 57, 33, 938000)
waveform_id: WaveformStreamID(network_code='', station_code='B932.PB', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/68537cf6-691b-4742-af15-9c1c218350d6")
time: UTCDateTime(2022, 12, 20, 0, 57, 34, 410000)
waveform_id: WaveformStreamID(network_code='', station_code='PETL.BK', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/9c94ee93-ef6c-4302-b2e7-69af23dbfc15")
time: UTCDateTime(2022, 12, 20, 0, 57, 34, 528000)
waveform_id: WaveformStreamID(network_code='', station_code='B047.PB', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/9ec31755-9246-4a9f-a51d-cea1974d5fe1")
time: UTCDateTime(2022, 12, 20, 0, 57, 34, 760000)
waveform_id: WaveformStreamID(network_code='', station_code='DMOR.BK', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/8a31c156-d49a-4fa8-b538-065c05377b13")
time: UTCDateTime(2022, 12, 20, 0, 57, 35, 530000)
waveform_id: WaveformStreamID(network_code='', station_code='KCR.NC', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/7adfcb68-1647-487d-ac5d-8b3215cf3f28")
time: UTCDateTime(2022, 12, 20, 0, 57, 37, 438000)
waveform_id: WaveformStreamID(network_code='', station_code='B049.PB', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/7bdaf617-da67-48c7-840f-65d6d1da1c7a")
time: UTCDateTime(2022, 12, 20, 0, 57, 37, 573000)
waveform_id: WaveformStreamID(network_code='', station_code='KHMB.NC', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/7ef03acd-d91d-41b8-acf5-4e0a62472fc6")
time: UTCDateTime(2022, 12, 20, 0, 57, 38, 128000)
waveform_id: WaveformStreamID(network_code='', station_code='B935.PB', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/c621a1e1-d87a-45f4-bfa9-8123def3e4f1")
time: UTCDateTime(2022, 12, 20, 0, 57, 39, 83000)
waveform_id: WaveformStreamID(network_code='', station_code='KMR.NC', channel_code='Z')
phase_hint: 'P',
Pick
resource_id: ResourceIdentifier(id="smi:local/56bc9eb9-41a5-4e54-9967-0e0b63c37770")
time: UTCDateTime(2022, 12, 20, 0, 57, 30, 900000)
waveform_id: WaveformStreamID(network_code='', station_code='1582.NP', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/963c64b4-e705-4d11-8f1a-d2466e17909a")
time: UTCDateTime(2022, 12, 20, 0, 57, 31, 150000)
waveform_id: WaveformStreamID(network_code='', station_code='1023.NP', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/1a9dd320-2098-4f7c-b7a4-797fc5d9ca02")
time: UTCDateTime(2022, 12, 20, 0, 57, 34, 333000)
waveform_id: WaveformStreamID(network_code='', station_code='KCT.NC', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/44d3058d-03d5-47f6-b2a0-f91b20293b59")
time: UTCDateTime(2022, 12, 20, 0, 57, 37, 193000)
waveform_id: WaveformStreamID(network_code='', station_code='KMPB.NC', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/963d6b3f-7faa-414c-adf3-9f56c0eccccf")
time: UTCDateTime(2022, 12, 20, 0, 57, 38, 744000)
waveform_id: WaveformStreamID(network_code='', station_code='KNEE.BK', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/7f4689bb-9691-4cdf-a22b-f850ac44aaa6")
time: UTCDateTime(2022, 12, 20, 0, 57, 38, 848000)
waveform_id: WaveformStreamID(network_code='', station_code='B045.PB', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/3b8da5c4-54ba-47d6-8c9b-f6e76e93f1be")
time: UTCDateTime(2022, 12, 20, 0, 57, 39, 658000)
waveform_id: WaveformStreamID(network_code='', station_code='B046.PB', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/bf30537c-7402-4289-98fa-325911eba845")
time: UTCDateTime(2022, 12, 20, 0, 57, 41, 160000)
waveform_id: WaveformStreamID(network_code='', station_code='PETL.BK', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/b0fdc515-051c-4d2c-b1e5-ba9b15f1b1c3")
time: UTCDateTime(2022, 12, 20, 0, 57, 41, 558000)
waveform_id: WaveformStreamID(network_code='', station_code='B932.PB', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/6a5d2478-4062-4df3-a329-ac5484859370")
time: UTCDateTime(2022, 12, 20, 0, 57, 42, 610000)
waveform_id: WaveformStreamID(network_code='', station_code='DMOR.BK', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/c5cbdd22-1cdb-4fc5-8ab4-91b462085dda")
time: UTCDateTime(2022, 12, 20, 0, 57, 42, 738000)
waveform_id: WaveformStreamID(network_code='', station_code='B047.PB', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/b8027637-e311-4cbd-a834-917c23c83552")
time: UTCDateTime(2022, 12, 20, 0, 57, 45, 113000)
waveform_id: WaveformStreamID(network_code='', station_code='KHMB.NC', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/195af882-8bd7-457c-84e4-52bee9ed5075")
time: UTCDateTime(2022, 12, 20, 0, 57, 46, 968000)
waveform_id: WaveformStreamID(network_code='', station_code='B049.PB', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/4e3d9b49-7723-44fb-9cf5-3d02b65c62fa")
time: UTCDateTime(2022, 12, 20, 0, 57, 50, 293000)
waveform_id: WaveformStreamID(network_code='', station_code='KMR.NC', channel_code='N')
phase_hint: 'S',
Pick
resource_id: ResourceIdentifier(id="smi:local/32df4b29-605c-4f96-a9ce-861adbea83e5")
time: UTCDateTime(2022, 12, 20, 0, 57, 54, 853000)
waveform_id: WaveformStreamID(network_code='', station_code='KHBB.NC', channel_code='N')
phase_hint: 'S']
# # We need to re-introduce some information into our pick objects
# # Specifically location and channel codes
# # Get a nested dictionary of all channel codes to
# mseed_list = glob(str(WFDIR/'*'))
# sncl_set = set(['.'.join(os.path.split(_f)[-1].split('.')[:4]) for _f in mseed_list])
# display(sncl_set)
# mapping = {}
# for sncl in sncl_set:
# s,n,c,l = sncl.split('.')
# nslc = '.'.join([n,s,l,c])
# sn = s + '.' + n
# if sn not in mapping.keys():
# mapping.update({sn:{}})
# if c[-1] not in mapping[sn].keys():
# mapping[sn].update({c[-1]: nslc})
# display(mapping)
set()
{}
# Run small corrections: the HypoDD phase files store station and network
# together as 'STA.NET', so split them back into separate codes on each pick
for event in cat.events:
    for pick in event.picks:
        sn = pick.waveform_id.station_code
        pick.waveform_id.station_code = sn.split('.')[0]
        pick.waveform_id.network_code = sn.split('.')[1]
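The correction above boils down to splitting a composite `'STA.NET'` code back into its parts. A minimal, standalone sketch (the helper name `split_sta_net` is ours, not an ObsPy function):

```python
def split_sta_net(code: str) -> tuple:
    """Split a composite 'STA.NET' code into (station, network)."""
    station, network = code.split('.')
    return station, network

# Example: composite codes seen in the raw picks above
print(split_sta_net('1582.NP'))   # → ('1582', 'NP')
print(split_sta_net('B046.PB'))   # → ('B046', 'PB')
```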
df_picks = cat.arrivals_to_df()
display(df_picks)
resource_id | seed_id | pick_id | phase | time_correction | azimuth | distance | takeoff_angle | time_residual | horizontal_slowness_residual | ... | earth_model_id | creation_time | author | agency_id | network | station | location | channel | origin_id | origin_time | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | smi:local/2488ad0e-e912-40f6-b29b-e091b37e1a6a | NP.1582..Z | smi:local/ff6baaec-8ae2-4104-be00-6be63e07e6e0 | P | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | NP | 1582 | Z | smi:local/origin/1 | 2022-12-20 00:57:25.304 | ||||
1 | smi:local/6744ccc8-dc85-4d68-9593-63da1bb6a5ec | NC.KCT..Z | smi:local/22196f41-be0e-4fdb-a747-3354acb20252 | P | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | NC | KCT | Z | smi:local/origin/1 | 2022-12-20 00:57:25.304 | ||||
2 | smi:local/43640bc3-52ba-44f9-95e6-31fc15f72e47 | BK.KNEE..Z | smi:local/79d0278e-8f35-4293-bbed-7b8528dae957 | P | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | BK | KNEE | Z | smi:local/origin/1 | 2022-12-20 00:57:25.304 | ||||
3 | smi:local/3593da20-5a39-422b-b14c-d0b541650ab0 | PB.B046..Z | smi:local/11846bf0-a032-426e-b5fb-3c299bc20d98 | P | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | PB | B046 | Z | smi:local/origin/1 | 2022-12-20 00:57:25.304 | ||||
4 | smi:local/6d710f7d-e638-4a9d-83cd-3d0e9700fe60 | PB.B932..Z | smi:local/4b85c280-ddf2-451d-a49d-af4b1207d052 | P | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | PB | B932 | Z | smi:local/origin/1 | 2022-12-20 00:57:25.304 | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39378 | smi:local/eece0436-e21d-4a3f-ae64-81fb681ec3ee | PB.B932..N | smi:local/d14b33ca-d0e3-4ab9-b9e1-548056d4620f | S | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | PB | B932 | N | smi:local/origin/1161 | 2022-12-20 23:59:34.450 | ||||
39379 | smi:local/58da6a92-ef3a-4ba3-b7e6-dd33a7003b86 | PB.B049..N | smi:local/a4194de0-23f5-4010-afb0-766e272189e4 | S | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | PB | B049 | N | smi:local/origin/1161 | 2022-12-20 23:59:34.450 | ||||
39380 | smi:local/1cbdbc61-ea63-4a8e-ae53-ad3012ad50c8 | NC.KMR..N | smi:local/d86b0279-bee6-49e3-b267-7094c342f582 | S | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | NC | KMR | N | smi:local/origin/1161 | 2022-12-20 23:59:34.450 | ||||
39381 | smi:local/397e79fc-0cf2-490c-b747-8d4ccc639239 | BK.PRDS..N | smi:local/9fe27e89-59b9-425e-b404-a0af61752a94 | S | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | BK | PRDS | N | smi:local/origin/1161 | 2022-12-20 23:59:34.450 | ||||
39382 | smi:local/70cb69fb-ad72-489f-b4a0-b68371614938 | NC.KHBB..N | smi:local/50e9b984-02e2-4f98-9bc2-cd4893e650a2 | S | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaT | NC | KHBB | N | smi:local/origin/1161 | 2022-12-20 23:59:34.450 |
39383 rows × 24 columns
Compare this to the ARRIVAL and ASSOCARO tables in the ANSS schema#
ARRIVAL#
ASSOCARO (Association of Arrivals and Origins)#
# Display table
display(df_picks)
Now that we’ve populated an ObsPy Catalog object, we can write it to a bunch of different formats#
QuakeML is a well-described, extensible schema for seismic event (meta)data exchange:
https://quake.ethz.ch/quakeml
ObsPy saves Catalog objects in this format by default.
BUT before you go saving everything as one big QuakeML file, be warned that they can get large and slow to read from disk.
You can find one of my past sins against easily accessible (albeit well formatted) data here:
https://zenodo.org/records/8393876
Instead, let’s put our catalog into a tidy directory structure with a client interface! Another place where ObsPlus shines!
# Initialize an event bank
ebank = obsplus.EventBank(base_path=CATD/'EventBank',
                          path_structure='{year}/{month}/{day}/{hour}',
                          name_structure='{event_id_end}',
                          format='quakeml')
# Add events to eventbank, and take a look at your file directory!
ebank.put_events(cat)
# Get a summary of events in your event bank
display(ebank.read_index())
Now let’s prove to ourselves that this EventBank thingy is persistent#
# DELETE the EventBank Object in our session
del ebank
try:
    display(ebank)
except NameError:
    print('ebank object does not exist, as expected')
# Re-initialize connection to the EventBank
ebank = obsplus.EventBank(base_path=CATD/'EventBank')
display(ebank)
# Note that the `path_structure` or `name_structure` key-word arguments we defined are saved!
print('Our Event Bank values')
display(ebank.path_structure)
display(ebank.name_structure)
print('Default values')
print('{year}/{month}/{day}')
print('{time}_{event_id_short}')
# Query a subset of events
# Read the index (a pandas DataFrame)
df_index = ebank.read_index()
# Subset by origin times
_df_index = df_index[(df_index.time >= pd.Timestamp('2022-12-20T20:00:00')) & (df_index.time <= pd.Timestamp('2022-12-20T21:40:00'))]
# Get events from your event bank
cat = ebank.get_events(event_id=_df_index.event_id.values)
display(cat)
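The origin-time subset above is just the standard pandas boolean-mask pattern on a datetime column. Here is a self-contained toy version of the same filter (the column names and event IDs are illustrative, not from the real index):

```python
import pandas as pd

# Toy event-bank index with a few origin times
df = pd.DataFrame({
    'event_id': [f'smi:local/event/{i}' for i in range(1, 5)],
    'time': pd.to_datetime(['2022-12-20T19:30:00', '2022-12-20T20:15:00',
                            '2022-12-20T21:05:00', '2022-12-20T22:00:00']),
})
# Keep events between 20:00 and 21:40 (inclusive), exactly as done above
mask = (df.time >= pd.Timestamp('2022-12-20T20:00:00')) & \
       (df.time <= pd.Timestamp('2022-12-20T21:40:00'))
subset = df[mask]
print(subset.event_id.tolist())  # → ['smi:local/event/2', 'smi:local/event/3']
```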
Let’s modify some event metadata and submit it to our EventBank#
In this case, let’s add source-receiver distances and event-to-station azimuths to associated phases
# Let's populate some source-receiver geometry information
client = Client('IRIS')
nets = ','.join(list(df_picks.network.unique()))
stas = ','.join(list(df_picks.station.unique()))
inv = client.get_stations(network=nets, station=stas, level='channel',starttime=UTCDateTime('20221220'), endtime=UTCDateTime('20221221'))
# Use ObsPlus added methods to convert the inventory into a dataframe
df_stations = inv.to_df()
display(df_stations)
# Add the maximum azimuthal gap to each origin
# Here's a starting point:
origin_gaps = []
# Iterate across events
for event in cat.events:
    # Iterate across origins
    for origin in event.origins:
        olon = origin.longitude
        olat = origin.latitude
        # Iterate across associated arrivals
        bazs = set()
        for arrival in origin.arrivals:
            # Get pick observations
            pick = arrival.pick_id.get_referred_object()
            # Get station location
            network = pick.waveform_id.network_code
            station = pick.waveform_id.station_code
            _df_sta = df_stations[(df_stations.network==network) & (df_stations.station==station)][['station','network','latitude','longitude']]
            try:
                slon = _df_sta.longitude.values[0]
                slat = _df_sta.latitude.values[0]
            except IndexError:
                # Station not present in the inventory -- skip this arrival
                continue
            # Get distance and azimuths (station-to-event, event-to-station)
            dist_m, seaz, esaz = gps2dist_azimuth(slat, slon, olat, olon)
            # Convert distance to degrees
            arrival.distance = kilometer2degrees(dist_m*1e-3)
            # Assign event-to-station azimuth (QuakeML Arrival.azimuth is the
            # azimuth of the station as seen from the epicenter)
            arrival.azimuth = esaz
            ## A task for the HACK-A-THON: get azimuthal gaps into your EventBank
            # bazs.add(esaz)
        # # Calculate gaps
        # bazs = sorted(bazs)
        # gaps = [bazs[_e+1] - bazs[_e] for _e in range(len(bazs)-1)] + [360 - bazs[-1] + bazs[0]]
        # # Get maximum azimuthal gap
        # maxgap = max(gaps)
        # # Associate with resource ID
        # origin_gaps.append([origin.resource_id.id, maxgap])
# # An exercise for users to incorporate 'gap' values into their preferred schema
# display(pd.DataFrame(origin_gaps, columns=['resource_id','gap']))
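The commented hack-a-thon sketch above reduces to a small pure function: sort the event-to-station azimuths, take consecutive differences, and close the circle with the wrap-around gap. A standalone version (the name `max_azimuthal_gap` is ours):

```python
def max_azimuthal_gap(azimuths) -> float:
    """Maximum azimuthal gap (degrees) from event-to-station azimuths in [0, 360)."""
    az = sorted(azimuths)
    if len(az) < 2:
        # A single station (or none) leaves the whole circle uncovered
        return 360.0
    # Gaps between consecutive azimuths, plus the wrap-around gap
    gaps = [az[i + 1] - az[i] for i in range(len(az) - 1)]
    gaps.append(360.0 - az[-1] + az[0])
    return max(gaps)

# Stations due N, E, and S leave a 180-degree hole to the west
print(max_azimuthal_gap([0, 90, 180]))  # → 180.0
```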
# Show that the geometry data stuck
display(cat.events[0].origins[0].arrivals)
# Submit the catalog back to the event bank to update
ebank.put_events(cat)
# Delete `cat` and re-load to prove to ourselves that the geometry information was saved
del cat
try:
    display(cat)
except NameError:
    print('cat does not exist, as expected')
# Re-load the sub-catalog
cat = ebank.get_events(event_id = _df_index.event_id)
display(cat)
# View that the geometry data persist on events we modified
display(cat.events[0].origins[0].arrivals)
# Load all the events and check an unmodified event
cat = ebank.get_events()
# Display catalog (should see everything)
display(cat)
# Display arrivals for the first event's origin, to which we did not add azimuths or distances.
display(cat.events[0].origins[0].arrivals)
Ok, this is all well and good - now let’s see how our catalog compares to what’s come before#
Pick and association data are not uniformly available across all ObsPy FDSN Client servers. In our example here, we’ll pull data from the NCEDC (Northern California Earthquake Data Center). In Yiyu’s earlier example we retrieved PNSN event data from the USGS client.
If you want something closer to a one-stop shop, take a look at usgs-libcomcat.
# Get an FDSN data client for the Northern California Earthquake Data Center
client = Client('NCEDC')
# extract a box around our catalog
df_llzt = df_events[['latitude','longitude','depth','time']]
_s_min = df_llzt.min()
_s_max = df_llzt.max()
# with a bit of padding
pad = 10/111.2 # [10 km in great-circle distance degrees]
rcat = client.get_events(minlatitude=_s_min.latitude - pad, minlongitude=_s_min.longitude - pad,
                         maxlatitude=_s_max.latitude + pad, maxlongitude=_s_max.longitude + pad,
                         starttime=UTCDateTime(_s_min.time.timestamp()),
                         endtime=UTCDateTime(_s_max.time.timestamp()), includearrivals=True)
# Let's use obspy + obsplus to get ANSS-like data frames for ORIGIN, ARRIVAL, and ASSOCARO
# Get an ANSS-like ORIGIN dataframe
ncedc_origin = rcat.to_df()
# Get an ANSS-like ASSOCARO dataframe (arrival-origin associations)
ncedc_assoc = rcat.arrivals_to_df()
# Get an ANSS-like ARRIVAL dataframe (phase picks)
ncedc_arrival = rcat.picks_to_df()
# Show them
display(ncedc_origin)
display(ncedc_arrival)
display(ncedc_assoc)
ourcat_origin = cat.to_df()
display(ourcat_origin)
display(ncedc_origin)
# Get hypocentral parameters into float values
# For NCEDC catalog
ncedc_xyzt = ncedc_origin[['longitude','latitude','depth']]
ncedc_xyzt = ncedc_xyzt.assign(time=[_t.time.timestamp() for _, _t in ncedc_origin.iterrows()])
# For our catalog
ourcat_xyzt = ourcat_origin[['longitude','latitude','depth']]
ourcat_xyzt = ourcat_xyzt.assign(time=[_t.time.timestamp() for _, _t in ourcat_origin.iterrows()])
# Get the mean and standard deviation of each of these from one catalog
ncedc_mu_xyzt = ncedc_xyzt.mean()
ncedc_sig_xyzt = ncedc_xyzt.std()
# Apply the mean and StDev as standard scalars to BOTH catalogs
oc_scaled = (ourcat_xyzt.values - ncedc_mu_xyzt.values)/(ncedc_sig_xyzt.values)
nc_scaled = (ncedc_xyzt.values - ncedc_mu_xyzt.values)/(ncedc_sig_xyzt.values)
# Create a KD-Tree to do an efficient nearest neighbor comparison
# between a reference feature set and a new dataset
# Initialize the KD-Tree on our catalog
ckdt = KDTree(oc_scaled)
# Run query with points in the NCEDC catalog
nn = ckdt.query(nc_scaled)
# Convert into a dataframe
comparison = pd.DataFrame(dict(zip(['nn_distance','ourcat_idx'], nn)))
comparison.index.name='ncedc_idx'
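The scale-then-query pattern above can be demonstrated on toy 2-D data: `KDTree.query` returns parallel arrays of nearest-neighbor distances and indices, which is exactly what the `dict(zip(...))` call packs into the comparison DataFrame. The coordinate values below are illustrative, not catalog data:

```python
import numpy as np
from scipy.spatial import KDTree

# Reference 'catalog' of feature vectors (toy 2-D values)
ref = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
# Points to match against the reference
qry = np.array([[9., 9.], [1., 0.]])

# Scale BOTH sets with the reference mean/std, as done with the NCEDC stats above
mu, sig = ref.mean(axis=0), ref.std(axis=0)
tree = KDTree((ref - mu) / sig)
dist, idx = tree.query((qry - mu) / sig)

print(idx.tolist())  # → [3, 0]: each query point's nearest reference row
```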
# Associate NCEDC origin data
comparison = pd.concat(
[comparison, ncedc_origin[['event_id','longitude','latitude','depth','time']]],
axis=1,
ignore_index=False)
# Join nearest neighbor origins from our catalog
comparison = comparison.join(
ourcat_origin[['event_id','longitude','latitude','depth','time']],
how='left',
on='ourcat_idx',
lsuffix='_ncedc',
rsuffix='_ourcat')
# Calculate dt, dh, and dz columns
comparison = comparison.assign(dt=[(row.time_ncedc - row.time_ourcat).total_seconds() for _, row in comparison.iterrows()])
comparison = comparison.assign(dh=[
gps2dist_azimuth(row.latitude_ncedc, row.longitude_ncedc, row.latitude_ourcat, row.longitude_ourcat)[0]
for _, row in comparison.iterrows()])
comparison = comparison.assign(dz=[row.depth_ncedc - row.depth_ourcat for _, row in comparison.iterrows()])
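`gps2dist_azimuth` computes the geodesic distance on the WGS84 ellipsoid. As a rough sanity check on the `dh` values, a spherical haversine (sketched below with a 6371 km mean Earth radius; the function name is ours) lands within about half a percent of the ellipsoidal result at these latitudes:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, r=6_371_000.0):
    """Great-circle distance in meters on a sphere of radius r."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# One degree of longitude at the equator is ~111.2 km
print(round(haversine_m(0, 0, 0, 1) / 1e3, 1))  # → 111.2
```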
# Display the comparison table
display(comparison)
# Apply filtering for maximum offsets
display(comparison[(comparison.dt.abs() < 20) &
                   (comparison.dh.abs() < 10e3) &
                   (comparison.dz.abs() < 5e3)])