API Reference
Complete reference documentation auto-generated from the opdi source code.
All modules use Google-style docstrings.
Modules
- Configuration – opdi.config
- Ingestion – opdi.ingestion
- Reference Data – opdi.reference
  - AirportDetectionZoneGenerator, AirportLayoutGenerator, AirspaceH3Generator, generate_circle_polygon(), hexagonify_airport(), retrieve_osm_data(), fill_geometry(), fill_geometry_compact()
  - H3 Airport Detection Zones – opdi.reference.h3_airport_zones
  - H3 Airport Layouts – opdi.reference.h3_airport_layouts
  - H3 Airspaces – opdi.reference.h3_airspaces
- Pipeline – opdi.pipeline
- Output – opdi.output
- Monitoring – opdi.monitoring
- Utilities – opdi.utils
  - generate_months(), generate_intervals(), get_start_end_of_month(), get_data_within_timeframe(), get_data_within_interval(), SparkSessionManager, get_spark(), haversine_distance(), add_cumulative_distance(), calculate_bearing(), destination_point(), meters_to_flight_level(), flight_level_to_meters(), h3_list_prep(), get_h3_coords(), compact_h3_set(), uncompact_h3_set(), h3_distance(), k_ring(), hex_ring(), polyfill_geojson(), is_valid_h3_index()
  - DateTime Helpers – opdi.utils.datetime_helpers
  - Spark Helpers – opdi.utils.spark_helpers
  - Geospatial – opdi.utils.geospatial
  - H3 Helpers – opdi.utils.h3_helpers
Package-Level Exports
OPDI - Open Performance Data Initiative
A Python package for processing OpenSky Network aviation data through modular ETL pipelines. Provides ingestion, transformation, and output layers for European air-traffic data analysis.
Quick start:

```python
from opdi.config import OPDIConfig
from opdi.utils.spark_helpers import get_spark

config = OPDIConfig.for_environment("dev")
spark = get_spark("dev")
```
Full pipeline:

```python
from datetime import date

from opdi.runner import run_pipeline

run_pipeline(env="live", start_date=date(2024, 1, 1), end_date=date(2024, 6, 1))
```
- class opdi.OPDIConfig(project=<factory>, spark=<factory>, h3=<factory>, ingestion=<factory>)
Bases: object
Main OPDI configuration container.
- Parameters:
project (ProjectConfig)
spark (SparkConfig)
h3 (H3Config)
ingestion (IngestionConfig)
- project: ProjectConfig
- spark: SparkConfig
- h3: H3Config
- ingestion: IngestionConfig
- classmethod for_environment(env='dev')
Create a configuration for a specific environment.
- Parameters:
env (str) – Environment name (“dev”, “live”, or “local”)
- Returns:
OPDIConfig instance with environment-specific settings
- Return type:
OPDIConfig
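The <factory> defaults in the OPDIConfig signature are how Python's dataclasses module reports a field declared with field(default_factory=...), which suggests these configuration classes are dataclasses. The sketch below uses two hypothetical stand-ins (ProjectSettings and Settings, not the opdi classes) to show the practical effect: every container instance builds its own sub-config, so mutating one instance never leaks into another.

```python
import inspect
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration only; these are NOT the opdi classes.
@dataclass
class ProjectSettings:
    project_name: str = "project_opdi"


@dataclass
class Settings:
    # default_factory runs once per Settings() call, so instances never share one ProjectSettings
    project: ProjectSettings = field(default_factory=ProjectSettings)


a = Settings()
b = Settings()
a.project.project_name = "scratch"  # only affects a's own ProjectSettings

print(b.project.project_name)        # project_opdi
print(inspect.signature(Settings))   # the printed signature contains "= <factory>"
```

A shared mutable default would be rejected by dataclasses anyway, which is why nested configuration objects are typically declared this way.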
- class opdi.ProjectConfig(project_name='project_opdi', warehouse_path='abfs://storage-fs@cdpdllive.dfs.core.windows.net/data/project/opdi.db/unmanaged', hadoop_filesystem='abfs://storage-fs@cdpdllive.dfs.core.windows.net/data/project/opdi.db/unmanaged')
Bases: object
Project-level configuration.
- class opdi.SparkConfig(app_name='OPDI Pipeline', driver_cores='1', driver_memory='8G', driver_max_result_size='6g', executor_memory='12G', executor_memory_overhead='3G', executor_cores='2', executor_instances='3', dynamic_allocation_max_executors='10', network_timeout='800s', executor_heartbeat_interval='400s', shuffle_compress='true', shuffle_spill_compress='true', ui_show_console_progress='false', iceberg_jar_path='/opt/spark/optional-lib/iceberg-spark-runtime-3.5_2.12-1.5.2.1.23.17218.0-1.jar', handle_timestamp_without_timezone='true', hadoop_group='eur-app-opdi')
Bases: object
Spark session configuration.
- Parameters:
app_name (str)
driver_cores (str)
driver_memory (str)
driver_max_result_size (str)
executor_memory (str)
executor_memory_overhead (str)
executor_cores (str)
executor_instances (str)
dynamic_allocation_max_executors (str)
network_timeout (str)
executor_heartbeat_interval (str)
shuffle_compress (str)
shuffle_spill_compress (str)
ui_show_console_progress (str)
iceberg_jar_path (str)
handle_timestamp_without_timezone (str)
hadoop_group (str)
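Most SparkConfig field names closely mirror standard Spark configuration properties. This page does not document how opdi applies them, so the sketch below is an assumption: a hypothetical, trimmed-down SparkSettings dataclass (using the documented defaults) and an assumed field-to-property mapping. The property keys themselves are standard Spark keys, plus Iceberg's spark.sql.iceberg.handle-timestamp-without-timezone. hadoop_group is deliberately left unmapped because it has no obvious Spark property counterpart.

```python
from dataclasses import asdict, dataclass


@dataclass
class SparkSettings:
    # Hypothetical subset of SparkConfig, using its documented default values.
    app_name: str = "OPDI Pipeline"
    driver_memory: str = "8G"
    driver_max_result_size: str = "6g"
    executor_memory: str = "12G"
    executor_memory_overhead: str = "3G"
    executor_cores: str = "2"
    executor_instances: str = "3"
    dynamic_allocation_max_executors: str = "10"
    network_timeout: str = "800s"
    executor_heartbeat_interval: str = "400s"
    handle_timestamp_without_timezone: str = "true"
    hadoop_group: str = "eur-app-opdi"  # not a Spark property; skipped below


# ASSUMED mapping from config fields to Spark property keys; opdi's own wiring may differ.
PROPERTY_NAMES = {
    "app_name": "spark.app.name",
    "driver_memory": "spark.driver.memory",
    "driver_max_result_size": "spark.driver.maxResultSize",
    "executor_memory": "spark.executor.memory",
    "executor_memory_overhead": "spark.executor.memoryOverhead",
    "executor_cores": "spark.executor.cores",
    "executor_instances": "spark.executor.instances",
    "dynamic_allocation_max_executors": "spark.dynamicAllocation.maxExecutors",
    "network_timeout": "spark.network.timeout",
    "executor_heartbeat_interval": "spark.executor.heartbeatInterval",
    "handle_timestamp_without_timezone": "spark.sql.iceberg.handle-timestamp-without-timezone",
}


def to_spark_properties(cfg: SparkSettings) -> dict:
    """Translate config fields into Spark property keys; unmapped fields are dropped."""
    return {PROPERTY_NAMES[k]: v for k, v in asdict(cfg).items() if k in PROPERTY_NAMES}


props = to_spark_properties(SparkSettings())
for key, value in sorted(props.items()):
    print(f"{key}={value}")
```

With PySpark installed, each resulting pair could then be applied through SparkSession.builder.config(key, value) before calling getOrCreate().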
- class opdi.H3Config(airport_detection_resolution=7, airport_layout_resolution=12, track_resolutions=<factory>, airspace_resolution=7)
Bases: object
H3 hexagonal indexing configuration.
- Parameters:
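H3 is an aperture-7 hierarchy: each hexagonal cell has seven children at the next finer resolution, and the grid has 2 + 120·7^r cells at resolution r (valid resolutions are 0 to 15). The plain-arithmetic sketch below (no h3 library needed; the helper names are illustrative) uses this to compare the default airport_detection_resolution (7) with airport_layout_resolution (12). The descendant count applies to hexagons only, because the 12 pentagons present at every resolution have fewer children.

```python
def cell_count(res: int) -> int:
    """Total number of H3 cells (hexagons plus the 12 pentagons) at resolution res."""
    if not 0 <= res <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** res


def hex_descendants(parent_res: int, child_res: int) -> int:
    """Number of child_res cells under one hexagon at parent_res (aperture 7)."""
    if child_res < parent_res:
        raise ValueError("child_res must be >= parent_res")
    return 7 ** (child_res - parent_res)


detection_res, layout_res = 7, 12  # documented H3Config defaults
print(cell_count(detection_res))                   # 98825162
print(hex_descendants(detection_res, layout_res))  # 16807
```

So a single resolution-7 detection cell spans 16,807 resolution-12 layout cells, which is why the coarse resolution suits airport detection zones and the fine one suits runway and taxiway layouts.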
- class opdi.IngestionConfig(minio_endpoint='https://s3.opensky-network.org', osn_aircraft_db_url='https://s3.opensky-network.org/data-samples/metadata/aircraft-database-complete-2024-10.csv', ourairports_base_url='https://ourairports.com/data/', ourairports_datasets=<factory>, batch_size=250, track_gap_threshold_minutes=30, track_gap_low_altitude_minutes=15, track_gap_low_altitude_meters=1524.0, max_vertical_rate_mps=25.4, altitude_smoothing_window_minutes=5)
Bases: object
Data ingestion configuration.
- Parameters:
- osn_aircraft_db_url: str = 'https://s3.opensky-network.org/data-samples/metadata/aircraft-database-complete-2024-10.csv'
URL for OpenSky Network aircraft database.
- track_gap_low_altitude_minutes: int = 15
Time gap threshold at low altitude for splitting tracks (minutes).
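Two IngestionConfig defaults are round aviation units expressed in SI: track_gap_low_altitude_meters=1524.0 is 5,000 ft, and max_vertical_rate_mps=25.4 is 5,000 ft/min. The sketch below checks that arithmetic and then shows one plausible way the two gap thresholds could combine when splitting a time-ordered position series into tracks: start a new track after a gap longer than track_gap_threshold_minutes, or longer than track_gap_low_altitude_minutes when the previous point is below track_gap_low_altitude_meters. That rule, the split_tracks function, and the (epoch_seconds, altitude_m) input format are assumptions for illustration; the actual opdi splitting logic is not shown on this page.

```python
import math

FT_TO_M = 0.3048
print(math.isclose(5000 * FT_TO_M, 1524.0))     # True: 5,000 ft in metres
print(math.isclose(5000 * FT_TO_M / 60, 25.4))  # True: 5,000 ft/min in m/s


def split_tracks(points, track_gap_threshold_minutes=30,
                 track_gap_low_altitude_minutes=15,
                 track_gap_low_altitude_meters=1524.0):
    """Hypothetical splitter for time-ordered (epoch_seconds, altitude_m) points.

    ASSUMED rule: the altitude of the point before a gap decides which
    threshold applies (low-altitude threshold below the altitude limit).
    """
    tracks, current = [], []
    for t, alt in points:
        if current:
            prev_t, prev_alt = current[-1]
            gap_minutes = (t - prev_t) / 60.0
            limit = (track_gap_low_altitude_minutes
                     if prev_alt < track_gap_low_altitude_meters
                     else track_gap_threshold_minutes)
            if gap_minutes > limit:
                tracks.append(current)  # close the current track
                current = []
        current.append((t, alt))
    if current:
        tracks.append(current)
    return tracks


# A 20-minute gap at cruise keeps the track; a 20-minute gap below 1,524 m splits it.
pts = [(0, 10000.0), (20 * 60, 10000.0), (25 * 60, 900.0), (45 * 60, 900.0)]
print([len(tr) for tr in split_tracks(pts)])  # [3, 1]
```

Using a shorter threshold near the ground reflects that a long silence at low altitude more likely marks a landing and a new flight than a coverage gap en route.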