Catalog Wrangling Exercise#

:auth: Nate Stevens :email: ntsteven@uw.edu
:org: Pacific Northwest Seismic Network
:license: GPLv3
:purpose:
In this exercise we’ll use ObsPy and ObsPlus to create a highly translatable earthquake catalog from the events we located in the preceding analyses.

The notebook will demonstrate some tools for getting our (meta)data into two well-defined schemas:

  • the ANSS relational database schema
  • QuakeML

Finally, we’ll make an EventBank that lets us save, update, and query our event catalog in a systematic manner.

# pip install ObsPlus, if it isn't already installed
! pip install obsplus
import os
from pathlib import Path
from glob import glob
import pandas as pd
from obspy import read_events, UTCDateTime
import obsplus
from obspy.geodetics import gps2dist_azimuth, kilometer2degrees
from obspy.clients.fdsn import Client
from scipy.spatial import KDTree
# TODO: Make sure this points at wherever you saved your HypoDD outputs
ROOT = Path.cwd()
# Example data
DATA = ROOT/'data'
# Where to save our eventbank
CATD = ROOT/'catalog_files'
# Where to find our raw waveforms
WFDIR = ROOT.parent/'data'
os.makedirs(str(CATD), exist_ok=True)
print(f'The data directory is registered as {DATA}')
The data directory is registered as /Users/nates/Code/GitHub/2025_ML_TSC/notebooks/Nate/data
# Load the HypoDD output into an ObsPy `Catalog` object
flist = glob(str(DATA/'*.pha'))
for _e, _f in enumerate(flist):
    if _e == 0:
        cat = read_events(_f)
    else:
        cat += read_events(_f)
# Use ObsPlus to show a DataFrame representation of events (takes a little time)
df_events = cat.to_df()

Compare this table to the ORIGIN table in the ANSS schema#

https://ncedc.org/db/Documents/NewSchemas/PI/v1.6.4/PI.1.6.4/Content/Tbl_388b5374f81611d6bcce00c04f794c81.htm

…and look at all those empty fields, just waiting for you to populate them!#

# Display our new table (conveniently formatted in nearly ANSS EVENT table format!)
display(df_events)
time latitude longitude depth magnitude event_description associated_phase_count azimuthal_gap event_id horizontal_uncertainty ... standard_error used_phase_count station_count vertical_uncertainty updated author agency_id creation_time version stations
0 2022-12-20 00:57:25.304 40.6448 -124.3003 8156.0 0.459 28.0 NaN smi:local/event/1 NaN ... 0.503 28.0 17.0 166.0 NaT NaT 1023.NP, 1582.NP, B045.PB, B046.PB, B047.PB, B...
1 2022-12-20 01:21:51.303 39.7714 -122.5413 22506.0 1.941 12.0 NaN smi:local/event/2 NaN ... 0.398 12.0 8.0 569.0 NaT NaT GHGB.NC, GROB.NC, GSR.NC, GTSB.BK, GVA.NC, KBN...
2 2022-12-20 02:44:52.382 40.4189 -124.2338 23295.0 0.012 15.0 NaN smi:local/event/3 NaN ... 0.319 15.0 10.0 385.0 NaT NaT B045.PB, B046.PB, B047.PB, B049.PB, B932.PB, B...
3 2022-12-20 03:06:59.875 40.3154 -124.3610 20037.0 0.914 35.0 NaN smi:local/event/4 NaN ... 0.333 35.0 20.0 193.0 NaT NaT 1586.NP, 89688.CE, B045.PB, B046.PB, B047.PB, ...
4 2022-12-20 03:18:27.700 40.3648 -124.2772 28226.0 0.909 36.0 NaN smi:local/event/5 NaN ... 0.477 36.0 21.0 219.0 NaT NaT 89688.CE, B045.PB, B046.PB, B047.PB, B049.PB, ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1156 2022-12-20 23:54:08.504 40.5428 -123.9475 19852.0 0.568 24.0 NaN smi:local/event/1157 NaN ... 0.453 24.0 14.0 209.0 NaT NaT B046.PB, B049.PB, B932.PB, B935.PB, BJES.BK, D...
1157 2022-12-20 23:55:52.393 40.5971 -124.3612 20997.0 0.752 17.0 NaN smi:local/event/1158 NaN ... 0.408 17.0 10.0 605.0 NaT NaT B046.PB, B047.PB, B932.PB, B935.PB, KCR.NC, KM...
1158 2022-12-20 23:56:38.054 40.5076 -124.2864 19549.0 0.735 19.0 NaN smi:local/event/1159 NaN ... 0.501 19.0 13.0 347.0 NaT NaT B046.PB, B047.PB, B049.PB, B932.PB, B935.PB, K...
1159 2022-12-20 23:58:40.340 40.6331 -123.9790 20756.0 1.130 32.0 NaN smi:local/event/1160 NaN ... 0.190 32.0 17.0 232.0 NaT NaT 1581.NP, B046.PB, B047.PB, B049.PB, B932.PB, B...
1160 2022-12-20 23:59:34.450 40.6240 -124.0055 16070.0 0.617 16.0 NaN smi:local/event/1161 NaN ... 0.515 16.0 9.0 352.0 NaT NaT B046.PB, B049.PB, B932.PB, B935.PB, DMOR.BK, K...

1161 rows × 28 columns
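Notice that the author, agency_id, and creation_time columns are empty because we never attached CreationInfo to our events. A minimal sketch of populating them (the agency and author strings below are placeholders, not required values):

from obspy.core.event import CreationInfo

# Stamp each event with provenance info so the empty ANSS-like columns fill in
for event in cat.events:
    event.creation_info = CreationInfo(agency_id='PNSN',
                                       author='ntsteven@uw.edu',
                                       creation_time=UTCDateTime())
# Regenerate the dataframe to see the newly populated columns
df_events = cat.to_df()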

Although the ObsPlus documentation is sometimes sparse on examples, the code itself is quite good!#

Let’s turn all of our picks into a dataframe

cat.events[0].picks
[Pick
	 resource_id: ResourceIdentifier(id="smi:local/ff6baaec-8ae2-4104-be00-6be63e07e6e0")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 27, 960000)
	 waveform_id: WaveformStreamID(network_code='', station_code='1582.NP', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/22196f41-be0e-4fdb-a747-3354acb20252")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 30, 173000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KCT.NC', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/79d0278e-8f35-4293-bbed-7b8528dae957")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 32, 924000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KNEE.BK', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/11846bf0-a032-426e-b5fb-3c299bc20d98")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 33, 118000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B046.PB', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/4b85c280-ddf2-451d-a49d-af4b1207d052")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 33, 938000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B932.PB', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/68537cf6-691b-4742-af15-9c1c218350d6")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 34, 410000)
	 waveform_id: WaveformStreamID(network_code='', station_code='PETL.BK', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/9c94ee93-ef6c-4302-b2e7-69af23dbfc15")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 34, 528000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B047.PB', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/9ec31755-9246-4a9f-a51d-cea1974d5fe1")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 34, 760000)
	 waveform_id: WaveformStreamID(network_code='', station_code='DMOR.BK', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/8a31c156-d49a-4fa8-b538-065c05377b13")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 35, 530000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KCR.NC', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/7adfcb68-1647-487d-ac5d-8b3215cf3f28")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 37, 438000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B049.PB', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/7bdaf617-da67-48c7-840f-65d6d1da1c7a")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 37, 573000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KHMB.NC', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/7ef03acd-d91d-41b8-acf5-4e0a62472fc6")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 38, 128000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B935.PB', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/c621a1e1-d87a-45f4-bfa9-8123def3e4f1")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 39, 83000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KMR.NC', channel_code='Z')
	  phase_hint: 'P',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/56bc9eb9-41a5-4e54-9967-0e0b63c37770")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 30, 900000)
	 waveform_id: WaveformStreamID(network_code='', station_code='1582.NP', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/963c64b4-e705-4d11-8f1a-d2466e17909a")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 31, 150000)
	 waveform_id: WaveformStreamID(network_code='', station_code='1023.NP', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/1a9dd320-2098-4f7c-b7a4-797fc5d9ca02")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 34, 333000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KCT.NC', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/44d3058d-03d5-47f6-b2a0-f91b20293b59")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 37, 193000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KMPB.NC', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/963d6b3f-7faa-414c-adf3-9f56c0eccccf")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 38, 744000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KNEE.BK', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/7f4689bb-9691-4cdf-a22b-f850ac44aaa6")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 38, 848000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B045.PB', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/3b8da5c4-54ba-47d6-8c9b-f6e76e93f1be")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 39, 658000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B046.PB', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/bf30537c-7402-4289-98fa-325911eba845")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 41, 160000)
	 waveform_id: WaveformStreamID(network_code='', station_code='PETL.BK', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/b0fdc515-051c-4d2c-b1e5-ba9b15f1b1c3")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 41, 558000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B932.PB', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/6a5d2478-4062-4df3-a329-ac5484859370")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 42, 610000)
	 waveform_id: WaveformStreamID(network_code='', station_code='DMOR.BK', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/c5cbdd22-1cdb-4fc5-8ab4-91b462085dda")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 42, 738000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B047.PB', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/b8027637-e311-4cbd-a834-917c23c83552")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 45, 113000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KHMB.NC', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/195af882-8bd7-457c-84e4-52bee9ed5075")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 46, 968000)
	 waveform_id: WaveformStreamID(network_code='', station_code='B049.PB', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/4e3d9b49-7723-44fb-9cf5-3d02b65c62fa")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 50, 293000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KMR.NC', channel_code='N')
	  phase_hint: 'S',
 Pick
	 resource_id: ResourceIdentifier(id="smi:local/32df4b29-605c-4f96-a9ce-861adbea83e5")
	        time: UTCDateTime(2022, 12, 20, 0, 57, 54, 853000)
	 waveform_id: WaveformStreamID(network_code='', station_code='KHBB.NC', channel_code='N')
	  phase_hint: 'S']
# # We need to re-introduce some information into our pick objects
# # Specifically location and channel codes
# # Build a nested dictionary mapping each station.network code to its full N.S.L.C code
# mseed_list = glob(str(WFDIR/'*'))
# sncl_set = set(['.'.join(os.path.split(_f)[-1].split('.')[:4]) for _f in mseed_list])
# display(sncl_set)

# mapping = {}
# for sncl in sncl_set:
#     s,n,c,l = sncl.split('.')
#     nslc = '.'.join([n,s,l,c])
#     sn = s + '.' + n
#     if sn not in mapping.keys():
#         mapping.update({sn:{}})
#     if c[-1] not in mapping[sn].keys():
#         mapping[sn].update({c[-1]: nslc})

# display(mapping)
# Run small corrections: the HypoDD .pha picks pack 'STA.NET' into the
# station code, so split it back out into separate station and network codes
for event in cat.events:
    for pick in event.picks:
        sta, net = pick.waveform_id.station_code.split('.')
        pick.waveform_id.station_code = sta
        pick.waveform_id.network_code = net
# Use ObsPlus to flatten every arrival (pick-origin association) into a dataframe
df_picks = cat.arrivals_to_df()
display(df_picks)
resource_id seed_id pick_id phase time_correction azimuth distance takeoff_angle time_residual horizontal_slowness_residual ... earth_model_id creation_time author agency_id network station location channel origin_id origin_time
0 smi:local/2488ad0e-e912-40f6-b29b-e091b37e1a6a NP.1582..Z smi:local/ff6baaec-8ae2-4104-be00-6be63e07e6e0 P NaN NaN NaN NaN NaN NaN ... NaT NP 1582 Z smi:local/origin/1 2022-12-20 00:57:25.304
1 smi:local/6744ccc8-dc85-4d68-9593-63da1bb6a5ec NC.KCT..Z smi:local/22196f41-be0e-4fdb-a747-3354acb20252 P NaN NaN NaN NaN NaN NaN ... NaT NC KCT Z smi:local/origin/1 2022-12-20 00:57:25.304
2 smi:local/43640bc3-52ba-44f9-95e6-31fc15f72e47 BK.KNEE..Z smi:local/79d0278e-8f35-4293-bbed-7b8528dae957 P NaN NaN NaN NaN NaN NaN ... NaT BK KNEE Z smi:local/origin/1 2022-12-20 00:57:25.304
3 smi:local/3593da20-5a39-422b-b14c-d0b541650ab0 PB.B046..Z smi:local/11846bf0-a032-426e-b5fb-3c299bc20d98 P NaN NaN NaN NaN NaN NaN ... NaT PB B046 Z smi:local/origin/1 2022-12-20 00:57:25.304
4 smi:local/6d710f7d-e638-4a9d-83cd-3d0e9700fe60 PB.B932..Z smi:local/4b85c280-ddf2-451d-a49d-af4b1207d052 P NaN NaN NaN NaN NaN NaN ... NaT PB B932 Z smi:local/origin/1 2022-12-20 00:57:25.304
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
39378 smi:local/eece0436-e21d-4a3f-ae64-81fb681ec3ee PB.B932..N smi:local/d14b33ca-d0e3-4ab9-b9e1-548056d4620f S NaN NaN NaN NaN NaN NaN ... NaT PB B932 N smi:local/origin/1161 2022-12-20 23:59:34.450
39379 smi:local/58da6a92-ef3a-4ba3-b7e6-dd33a7003b86 PB.B049..N smi:local/a4194de0-23f5-4010-afb0-766e272189e4 S NaN NaN NaN NaN NaN NaN ... NaT PB B049 N smi:local/origin/1161 2022-12-20 23:59:34.450
39380 smi:local/1cbdbc61-ea63-4a8e-ae53-ad3012ad50c8 NC.KMR..N smi:local/d86b0279-bee6-49e3-b267-7094c342f582 S NaN NaN NaN NaN NaN NaN ... NaT NC KMR N smi:local/origin/1161 2022-12-20 23:59:34.450
39381 smi:local/397e79fc-0cf2-490c-b747-8d4ccc639239 BK.PRDS..N smi:local/9fe27e89-59b9-425e-b404-a0af61752a94 S NaN NaN NaN NaN NaN NaN ... NaT BK PRDS N smi:local/origin/1161 2022-12-20 23:59:34.450
39382 smi:local/70cb69fb-ad72-489f-b4a0-b68371614938 NC.KHBB..N smi:local/50e9b984-02e2-4f98-9bc2-cd4893e650a2 S NaN NaN NaN NaN NaN NaN ... NaT NC KHBB N smi:local/origin/1161 2022-12-20 23:59:34.450

39383 rows × 24 columns

Compare this to the ARRIVAL and ASSOCARO tables in the ANSS schema#

ARRIVAL#

https://ncedc.org/db/Documents/NewSchemas/PI/v1.6.4/PI.1.6.4/Content/Tbl_388b5400f81611d6bcce00c04f794c81.htm

ASSOCARO (Association of Arrivals and Origins)#

https://ncedc.org/db/Documents/NewSchemas/PI/v1.6.4/PI.1.6.4/Content/Tbl_388b542ef81611d6bcce00c04f794c81.htm

# Display table
display(df_picks)
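Note the correspondence: ObsPy Arrival objects (which tie picks to origins) map onto the ASSOCARO table, while ObsPy Pick objects map onto the ARRIVAL table. ObsPlus also attaches a picks_to_df method, so we can pull the ARRIVAL-like view directly (a quick sketch):

# Get an ANSS ARRIVAL-like dataframe of the raw pick observations
df_arrival_like = cat.picks_to_df()
display(df_arrival_like)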

Now that we’ve populated an ObsPy Catalog object, we can write into a bunch of different formats#

QuakeML is a well-described, extensible schema for seismic event (meta)data exchange
https://quake.ethz.ch/quakeml

  • ObsPy saves Catalog objects in this format by default

BUT before you go saving everything as one big QuakeML file, be warned that these files can get large and slow to read from disk.

You can find one of my past sins against easily accessible (albeit well formatted) data here:
https://zenodo.org/records/8393876
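For reference, writing the whole catalog to one QuakeML file is a one-liner (a minimal sketch; the file name is arbitrary):

# Write the full catalog to a single (potentially large!) QuakeML file
cat.write(str(CATD/'full_catalog.xml'), format='QUAKEML')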

Instead, let’s put our catalog into a tidy directory structure with a client interface!
Another place where ObsPlus shines!

# Initialize an event bank
ebank = obsplus.EventBank(base_path=CATD/'EventBank',
                          path_structure='{year}/{month}/{day}/{hour}',
                          name_structure='{event_id_end}',
                          format='quakeml')
# Add events to eventbank, and take a look at your file directory!
ebank.put_events(cat)
# Get a summary of events in your event bank
display(ebank.read_index())

Now let’s prove to ourselves that this EventBank thingy is persistent#

# DELETE the EventBank Object in our session
del ebank
try:
    display(ebank)
except NameError:
    print('ebank object does not exist, as expected')
# Re-initialize connection to the EventBank
ebank = obsplus.EventBank(base_path=CATD/'EventBank')
display(ebank)
# Note that the `path_structure` and `name_structure` keyword arguments we defined are saved!
print('Our Event Bank values')
display(ebank.path_structure)
display(ebank.name_structure)
print('Default values')
print('{year}/{month}/{day}')
print('{time}_{event_id_short}')
# Query a subset of events
# Read the index (a pandas DataFrame)
df_index = ebank.read_index()
# Subset by origin times
_df_index = df_index[(df_index.time >= pd.Timestamp('2022-12-20T20:00:00')) & (df_index.time <= pd.Timestamp('2022-12-20T21:40:00'))]
# Get events from your event bank
cat = ebank.get_events(event_id=_df_index.event_id.values)

display(cat)
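Filtering the index by hand works fine, but the same query can also be passed straight to get_events as keyword arguments (a sketch based on the ObsPlus query interface; see the EventBank docs for the full parameter set):

# Equivalent query expressed directly against the EventBank
cat_direct = ebank.get_events(starttime=UTCDateTime('2022-12-20T20:00:00'),
                              endtime=UTCDateTime('2022-12-20T21:40:00'))
print(f'Direct query returned {len(cat_direct)} events')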

Let’s modify some event metadata and submit it to our EventBank#

In this case, let’s add distances and back-azimuths to associated phases

# Let's populate some source-receiver geometry information
client = Client('IRIS')
nets = ','.join(list(df_picks.network.unique()))
stas = ','.join(list(df_picks.station.unique()))
inv = client.get_stations(network=nets, station=stas, level='channel',starttime=UTCDateTime('20221220'), endtime=UTCDateTime('20221221'))

# Use ObsPlus added methods to convert the inventory into a dataframe
df_stations = inv.to_df()

display(df_stations)
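Since the geometry loop below looks up station coordinates once per arrival, it can be worth pre-building a plain dict instead of filtering the dataframe each time (an optional optimization sketch; the loop below keeps the dataframe filtering for clarity):

# Hypothetical fast lookup: (network, station) -> (latitude, longitude)
sta_locs = {(row.network, row.station): (row.latitude, row.longitude)
            for _, row in df_stations.drop_duplicates(['network','station']).iterrows()}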
# Add the maximum azimuthal gap to each origin
# Here's a starting point:

# Iterate across events
origin_gaps = []
for event in cat.events:
    # Iterate across origins
    for origin in event.origins:
        olon = origin.longitude
        olat = origin.latitude
        # Iterate across associated arrivals
        bazs = set([])
        for arrival in origin.arrivals:
            # Get pick observations
            pick = arrival.pick_id.get_referred_object()
            # Get station location
            network = pick.waveform_id.network_code
            station = pick.waveform_id.station_code
            _df_sta = df_stations[(df_stations.network==network) & (df_stations.station==station)][['station','network','latitude','longitude']]
            try:
                slon = _df_sta.longitude.values[0]
                slat = _df_sta.latitude.values[0]
            except IndexError:
                # Skip picks from stations that are missing from the inventory
                continue
            # Get distances
            dist_m, seaz, esaz = gps2dist_azimuth(slat, slon, olat, olon)
            # Convert distance to degrees
            arrival.distance = kilometer2degrees(dist_m*1e-3)
            # Assign back-azimuth
            arrival.azimuth = esaz

## A task for the HACK-A-THON: get azimuthal gaps into your EventBank

#             bazs.add(esaz)

        
#         # Calculate gaps
#         bazs = list(bazs)
#         bazs.sort()
#         gaps = [bazs[_e+1] - bazs[_e] for _e in range(len(bazs)-1)] + [360 - bazs[-1] + bazs[0]]
#         # Get maximum azimuthal gap
#         maxgap = max(gaps)
#         # associate with resourceID
#         origin_gaps.append([origin.resource_id.id, maxgap])

# # An exercise for users to incorporate 'gap' values into their preferred schema
# display(pd.DataFrame(origin_gaps, columns=['resource_id','gap']))
# Show that the geometry data stuck
display(cat.events[0].origins[0].arrivals)
# Submit the catalog back to the event bank to update
ebank.put_events(cat)
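If you take on the hack-a-thon task, one possible route (a sketch, not the only answer) is to finish the commented gap calculation above and store the result on each origin's OriginQuality so it survives the round trip through the EventBank:

from obspy.core.event import OriginQuality

for event in cat.events:
    for origin in event.origins:
        # Collect the event-to-station azimuths we just populated
        bazs = sorted(a.azimuth for a in origin.arrivals if a.azimuth is not None)
        if len(bazs) < 2:
            continue
        # Gaps between consecutive azimuths, plus the wrap-around gap
        gaps = [b - a for a, b in zip(bazs[:-1], bazs[1:])] + [360 - bazs[-1] + bazs[0]]
        if origin.quality is None:
            origin.quality = OriginQuality()
        origin.quality.azimuthal_gap = max(gaps)
# Re-submit so the gap values land in the EventBank
ebank.put_events(cat)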
# Delete `cat` and re-load to prove to ourselves that the geometry information was saved
del cat
try:
    display(cat)
except NameError:
    print('cat does not exist, as expected')
# Re-load the sub-catalog
cat = ebank.get_events(event_id=_df_index.event_id.values)
display(cat)
# View that the geometry data persist on events we modified
display(cat.events[0].origins[0].arrivals)
# Load all the events and check an unmodified event
cat = ebank.get_events()
# Display catalog (should see everything)
display(cat)
# Display arrivals for the first event's origin, which we did not add back-azimuths or distances to
display(cat.events[0].origins[0].arrivals)

Ok, this is all well and good - now let’s see how our catalog compares to what’s come before#

Pick and association data are not uniformly available across all ObsPy FDSN Client servers. In our example here, we’ll pull data from the NCEDC (Northern California Earthquake Data Center). In Yiyu’s earlier example we retrieved PNSN event data from the USGS client.

If you want something closer to a one-stop-shop, take a look at usgs-libcomcat.

# Get an FDSN data client for the Northern California Earthquake Data Center
client = Client('NCEDC')
# extract a box around our catalog
df_llzt = df_events[['latitude','longitude','depth','time']]
_s_min = df_llzt.min()
_s_max = df_llzt.max()
# with a bit of padding 
pad = 10/111.2 # [10 km in great-circle distance degrees]
rcat = client.get_events(minlatitude=_s_min.latitude - pad, minlongitude=_s_min.longitude - pad,
                         maxlatitude=_s_max.latitude + pad, maxlongitude=_s_max.longitude + pad,
                         starttime=UTCDateTime(_s_min.time.timestamp()),
                         endtime=UTCDateTime(_s_max.time.timestamp()), includearrivals=True)
# Let's use obspy + obsplus to get ANSS-like data frames for ORIGIN, ASSOCARO, and ARRIVAL
# Get an ANSS-like ORIGIN dataframe
ncedc_origin = rcat.to_df()
# Get an ANSS-like ASSOCARO dataframe (Arrivals associate picks with origins)
ncedc_assoc = rcat.arrivals_to_df()
# Get an ANSS-like ARRIVAL dataframe (Picks are the raw phase observations)
ncedc_arrival = rcat.picks_to_df()

# Show them
display(ncedc_origin)
display(ncedc_arrival)
display(ncedc_assoc)
ourcat_origin = cat.to_df()
display(ourcat_origin)
display(ncedc_origin)
# Get hypocentral parameters into float values
# For NCEDC catalog
ncedc_xyzt = ncedc_origin[['longitude','latitude','depth']]
ncedc_xyzt = ncedc_xyzt.assign(time=[_t.time.timestamp() for _, _t in ncedc_origin.iterrows()])
# For our catalog
ourcat_xyzt = ourcat_origin[['longitude','latitude','depth']]
ourcat_xyzt = ourcat_xyzt.assign(time=[_t.time.timestamp() for _, _t in ourcat_origin.iterrows()])

# Get the mean and standard deviation of each of these from one catalog
ncedc_mu_xyzt = ncedc_xyzt.mean()
ncedc_sig_xyzt = ncedc_xyzt.std()
# Apply the mean and StDev as a standard scaler to BOTH catalogs, putting them in the same normalized coordinate frame
oc_scaled = (ourcat_xyzt.values - ncedc_mu_xyzt.values)/(ncedc_sig_xyzt.values)
nc_scaled = (ncedc_xyzt.values - ncedc_mu_xyzt.values)/(ncedc_sig_xyzt.values)
# Create a KD-Tree to do an efficient nearest-neighbor comparison
# between a reference feature set and a new dataset
# Initialize the KD-Tree on our catalog's scaled hypocenters
ckdt = KDTree(oc_scaled)
# Query with the scaled hypocenters from the NCEDC catalog
nn = ckdt.query(nc_scaled)
# Convert into a dataframe
comparison = pd.DataFrame(dict(zip(['nn_distance','ourcat_idx'], nn)))
comparison.index.name='ncedc_idx'

# Associate NCEDC origin data
comparison = pd.concat(
    [comparison, ncedc_origin[['event_id','longitude','latitude','depth','time']]],
    axis=1,
    ignore_index=False)
# Join nearest neighbor origins from our catalog
comparison = comparison.join(
    ourcat_origin[['event_id','longitude','latitude','depth','time']],
    how='left',
    on='ourcat_idx',
    lsuffix='_ncedc',
    rsuffix='_ourcat')
# Calculate dt, dh, and dz columns
comparison = comparison.assign(dt=[(row.time_ncedc - row.time_ourcat).total_seconds() for _, row in comparison.iterrows()])
comparison = comparison.assign(dh=[
    gps2dist_azimuth(row.latitude_ncedc, row.longitude_ncedc, row.latitude_ourcat, row.longitude_ourcat)[0]
    for _, row in comparison.iterrows()])
comparison = comparison.assign(dz=[row.depth_ncedc - row.depth_ourcat for _, row in comparison.iterrows()])
# Display the full comparison table
display(comparison)
# Apply filtering for maximum offsets
display(comparison[(comparison.dt.abs() < 20) &
                   (comparison.dh.abs() < 10e3) &
                   (comparison.dz.abs() < 5e3)])
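As a quick sanity check, we can also report what fraction of NCEDC events found a counterpart within those (illustrative) thresholds:

# Fraction of NCEDC events with a plausible match in our catalog
matched = comparison[(comparison.dt.abs() < 20) &
                     (comparison.dh.abs() < 10e3) &
                     (comparison.dz.abs() < 5e3)]
print(f'{len(matched)} of {len(comparison)} NCEDC events '
      f'({len(matched)/len(comparison):.1%}) matched our catalog')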