Configuration – opdi.config
Configuration management for the OPDI pipeline.
Provides centralized configuration, built on dataclasses, for project settings, Spark session settings, H3 indexing parameters, and data ingestion settings.
- class opdi.config.ProjectConfig(project_name='project_opdi', warehouse_path='abfs://storage-fs@cdpdllive.dfs.core.windows.net/data/project/opdi.db/unmanaged', hadoop_filesystem='abfs://storage-fs@cdpdllive.dfs.core.windows.net/data/project/opdi.db/unmanaged')
Bases: object
Project-level configuration.
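The default `warehouse_path` and `hadoop_filesystem` values are ABFS URIs of the form `abfs://<container>@<account>.dfs.core.windows.net/<path>`. When overriding them, a quick standard-library check can confirm that a URI splits into the expected container, storage account, and path. This is a minimal sketch; `split_abfs_uri` and `AbfsLocation` are hypothetical helpers, not part of `opdi.config`.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class AbfsLocation(NamedTuple):
    """Components of an ABFS URI (hypothetical helper type, not part of opdi.config)."""
    container: str
    account: str
    path: str


def split_abfs_uri(uri: str) -> AbfsLocation:
    """Split abfs[s]://<container>@<account>.dfs.core.windows.net/<path> into parts."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    host = parsed.hostname or ""
    suffix = ".dfs.core.windows.net"
    # The container is the "user" part before '@'; the account prefixes the host.
    if not parsed.username or not host.endswith(suffix):
        raise ValueError(f"unexpected ABFS authority in {uri!r}")
    return AbfsLocation(
        container=parsed.username,
        account=host[: -len(suffix)],
        path=parsed.path,
    )


# Default value documented for ProjectConfig.warehouse_path
default_warehouse = (
    "abfs://storage-fs@cdpdllive.dfs.core.windows.net"
    "/data/project/opdi.db/unmanaged"
)
print(split_abfs_uri(default_warehouse))
# AbfsLocation(container='storage-fs', account='cdpdllive', path='/data/project/opdi.db/unmanaged')
```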
- class opdi.config.SparkConfig(app_name='OPDI Pipeline', driver_cores='1', driver_memory='8G', driver_max_result_size='6g', executor_memory='12G', executor_memory_overhead='3G', executor_cores='2', executor_instances='3', dynamic_allocation_max_executors='10', network_timeout='800s', executor_heartbeat_interval='400s', shuffle_compress='true', shuffle_spill_compress='true', ui_show_console_progress='false', iceberg_jar_path='/opt/spark/optional-lib/iceberg-spark-runtime-3.5_2.12-1.5.2.1.23.17218.0-1.jar', handle_timestamp_without_timezone='true', hadoop_group='eur-app-opdi')
Bases: object
Spark session configuration.
- Parameters:
app_name (str)
driver_cores (str)
driver_memory (str)
driver_max_result_size (str)
executor_memory (str)
executor_memory_overhead (str)
executor_cores (str)
executor_instances (str)
dynamic_allocation_max_executors (str)
network_timeout (str)
executor_heartbeat_interval (str)
shuffle_compress (str)
shuffle_spill_compress (str)
ui_show_console_progress (str)
iceberg_jar_path (str)
handle_timestamp_without_timezone (str)
hadoop_group (str)
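Most SparkConfig field names correspond to standard Spark properties. The mapping below is an assumption about how such fields would typically be translated (it is not taken from the opdi source), and `hadoop_group` is left out because it has no standard Spark property. The sketch also checks Spark's documented requirement that `spark.executor.heartbeatInterval` be significantly less than `spark.network.timeout`; `to_spark_conf` and `to_seconds` are hypothetical helpers.

```python
# Assumed mapping from SparkConfig field names to Spark/Iceberg property keys.
SPARK_PROPERTY_FOR_FIELD = {
    "app_name": "spark.app.name",
    "driver_cores": "spark.driver.cores",
    "driver_memory": "spark.driver.memory",
    "driver_max_result_size": "spark.driver.maxResultSize",
    "executor_memory": "spark.executor.memory",
    "executor_memory_overhead": "spark.executor.memoryOverhead",
    "executor_cores": "spark.executor.cores",
    "executor_instances": "spark.executor.instances",
    "dynamic_allocation_max_executors": "spark.dynamicAllocation.maxExecutors",
    "network_timeout": "spark.network.timeout",
    "executor_heartbeat_interval": "spark.executor.heartbeatInterval",
    "shuffle_compress": "spark.shuffle.compress",
    "shuffle_spill_compress": "spark.shuffle.spill.compress",
    "ui_show_console_progress": "spark.ui.showConsoleProgress",
    "iceberg_jar_path": "spark.jars",
    "handle_timestamp_without_timezone": "spark.sql.iceberg.handle-timestamp-without-timezone",
}


def to_spark_conf(fields: dict[str, str]) -> dict[str, str]:
    """Translate SparkConfig-style field values into Spark property keys.

    Fields without a known Spark property (e.g. hadoop_group) are skipped.
    """
    return {
        SPARK_PROPERTY_FOR_FIELD[name]: value
        for name, value in fields.items()
        if name in SPARK_PROPERTY_FOR_FIELD
    }


def to_seconds(duration: str) -> int:
    """Parse a duration written in whole seconds, e.g. '800s' (other units not handled)."""
    if not duration.endswith("s") or duration.endswith("ms"):
        raise ValueError(f"expected a value like '800s', got {duration!r}")
    return int(duration[:-1])


# Documented SparkConfig defaults
defaults = {
    "app_name": "OPDI Pipeline",
    "driver_cores": "1",
    "driver_memory": "8G",
    "driver_max_result_size": "6g",
    "executor_memory": "12G",
    "executor_memory_overhead": "3G",
    "executor_cores": "2",
    "executor_instances": "3",
    "dynamic_allocation_max_executors": "10",
    "network_timeout": "800s",
    "executor_heartbeat_interval": "400s",
    "shuffle_compress": "true",
    "shuffle_spill_compress": "true",
    "ui_show_console_progress": "false",
    "iceberg_jar_path": "/opt/spark/optional-lib/iceberg-spark-runtime-3.5_2.12-1.5.2.1.23.17218.0-1.jar",
    "handle_timestamp_without_timezone": "true",
    "hadoop_group": "eur-app-opdi",
}

conf = to_spark_conf(defaults)
# Heartbeats must arrive well within the network timeout (400s < 800s here).
assert to_seconds(defaults["executor_heartbeat_interval"]) < to_seconds(defaults["network_timeout"])
print(conf["spark.executor.memoryOverhead"])  # 3G
```

With PySpark, each resulting pair could then be applied through `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.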
- class opdi.config.H3Config(airport_detection_resolution=7, airport_layout_resolution=12, track_resolutions=<factory>, airspace_resolution=7)
Bases: object
H3 hexagonal indexing configuration.
- Parameters:
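H3 defines resolutions 0 (coarsest) to 15 (finest), and each hexagon has 7 children at the next finer resolution. The documented defaults therefore put the airport layout grid (resolution 12) five levels below the detection grid (resolution 7): one resolution-7 hexagon has 7^5 = 16,807 resolution-12 descendants. The sketch below validates the documented integer defaults and performs that arithmetic; `check_resolution` and `descendants_per_hexagon` are hypothetical helpers, and `track_resolutions` is omitted because its default is not shown here.

```python
H3_MIN_RES, H3_MAX_RES = 0, 15  # H3 resolutions: 0 (coarsest) to 15 (finest)
H3_APERTURE = 7  # each hexagon has 7 children one resolution finer


def check_resolution(name: str, res: int) -> None:
    """Raise ValueError if res is outside the range H3 supports."""
    if not H3_MIN_RES <= res <= H3_MAX_RES:
        raise ValueError(f"{name}={res} is outside H3 range {H3_MIN_RES}-{H3_MAX_RES}")


def descendants_per_hexagon(coarse: int, fine: int) -> int:
    """Number of cells at `fine` resolution descending from one hexagon at `coarse`.

    The 12 pentagons per resolution have fewer descendants and are ignored here.
    """
    check_resolution("coarse", coarse)
    check_resolution("fine", fine)
    if fine < coarse:
        raise ValueError("fine must not be coarser than coarse")
    return H3_APERTURE ** (fine - coarse)


# Documented integer defaults of H3Config
defaults = {
    "airport_detection_resolution": 7,
    "airport_layout_resolution": 12,
    "airspace_resolution": 7,
}
for field_name, res in defaults.items():
    check_resolution(field_name, res)

print(descendants_per_hexagon(defaults["airport_detection_resolution"],
                              defaults["airport_layout_resolution"]))  # 16807
```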
- class opdi.config.IngestionConfig(minio_endpoint='https://s3.opensky-network.org', osn_aircraft_db_url='https://s3.opensky-network.org/data-samples/metadata/aircraft-database-complete-2024-10.csv', ourairports_base_url='https://ourairports.com/data/', ourairports_datasets=<factory>, batch_size=250, track_gap_threshold_minutes=30, track_gap_low_altitude_minutes=15, track_gap_low_altitude_meters=1524.0, max_vertical_rate_mps=25.4, altitude_smoothing_window_minutes=5)
Bases: object
Data ingestion configuration.
- Parameters:
- osn_aircraft_db_url: str = 'https://s3.opensky-network.org/data-samples/metadata/aircraft-database-complete-2024-10.csv'
URL for OpenSky Network aircraft database.
- track_gap_low_altitude_minutes: int = 15
Time gap threshold at low altitude for splitting tracks (minutes).
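The gap parameters suggest a two-threshold track split: a new segment starts when consecutive position reports are more than `track_gap_threshold_minutes` (30) apart, or more than `track_gap_low_altitude_minutes` (15) apart at low altitude, below `track_gap_low_altitude_meters` (1524.0 m, which is exactly 5,000 ft; likewise `max_vertical_rate_mps=25.4` is exactly 5,000 ft/min). The sketch below is an assumption about that logic, not the pipeline's implementation: in particular, it applies the low-altitude threshold when the last report before the gap is below the altitude limit. `Report` and `split_on_gaps` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Report:
    """One position report (hypothetical minimal type)."""
    time: datetime
    altitude_m: float


def split_on_gaps(
    reports: list[Report],
    gap_minutes: float = 30,
    low_alt_gap_minutes: float = 15,
    low_alt_m: float = 1524.0,  # 1524 m = 5,000 ft
) -> list[list[Report]]:
    """Split a time-ordered list of reports into segments at large time gaps.

    Assumption: the shorter low-altitude threshold applies when the last
    report before the gap is below low_alt_m.
    """
    segments: list[list[Report]] = []
    current: list[Report] = []
    for report in reports:
        if current:
            prev = current[-1]
            limit = low_alt_gap_minutes if prev.altitude_m < low_alt_m else gap_minutes
            if report.time - prev.time > timedelta(minutes=limit):
                segments.append(current)  # close the segment at the gap
                current = []
        current.append(report)
    if current:
        segments.append(current)
    return segments


t0 = datetime(2024, 10, 1, 12, 0)
reports = [
    Report(t0, 11000.0),
    Report(t0 + timedelta(minutes=20), 11000.0),  # 20-min gap at cruise: kept
    Report(t0 + timedelta(minutes=25), 900.0),
    Report(t0 + timedelta(minutes=45), 900.0),    # 20-min gap below 1524 m: split
]
print([len(s) for s in split_on_gaps(reports)])  # [3, 1]
```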
- class opdi.config.OPDIConfig(project=<factory>, spark=<factory>, h3=<factory>, ingestion=<factory>)
Bases: object
Main OPDI configuration container.
- Parameters:
project (ProjectConfig)
spark (SparkConfig)
h3 (H3Config)
ingestion (IngestionConfig)
- project: ProjectConfig
- spark: SparkConfig
- h3: H3Config
- ingestion: IngestionConfig
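In these signatures, `<factory>` is how Sphinx renders a dataclass field declared with `field(default_factory=...)`. Nested config objects need this: since Python 3.11, dataclasses reject unhashable defaults (including a plain dataclass instance) with `ValueError`, and a factory also gives every container its own section objects, so mutating one instance cannot leak into another. The sketch uses hypothetical stand-ins (`SectionConfig`, `Container`), not the real opdi classes; its `for_environment` only validates the documented environment names, because the real method's per-environment overrides are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class SectionConfig:
    """Hypothetical stand-in for one section such as H3Config."""
    track_resolutions: list[int] = field(default_factory=list)  # mutable default needs a factory
    resolution: int = 7


@dataclass
class Container:
    """Hypothetical stand-in for OPDIConfig's nesting pattern."""
    section: SectionConfig = field(default_factory=SectionConfig)  # fresh object per instance

    @classmethod
    def for_environment(cls, env: str = "dev") -> "Container":
        # Only the documented environment names are checked here; any
        # environment-specific overrides are out of scope for this sketch.
        if env not in ("dev", "live", "local"):
            raise ValueError(f"unknown environment: {env!r}")
        return cls()


a = Container()
b = Container.for_environment("live")
a.section.track_resolutions.append(8)
print(a.section.track_resolutions, b.section.track_resolutions)  # [8] []
```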
- classmethod for_environment(env='dev')
Create a configuration for a specific environment.
- Parameters:
env (str) – Environment name (“dev”, “live”, or “local”)
- Returns:
OPDIConfig instance with environment-specific settings
- Return type: