Pipeline Overview

The OPDI pipeline transforms raw ADS-B surveillance data into structured aviation performance datasets through a sequence of numbered stages. Each stage corresponds to one or more modules in the opdi package.

Stage 00 – Reference Data & Ingestion Setup

Before processing flight data, several reference datasets must be prepared.

Airport detection zones (opdi.reference.h3_airport_zones)

Generate H3 hexagonal zones around airport reference points at resolution 7 (average cell area of roughly 5.2 km²). These zones are later used to detect departures and arrivals.

Airport ground layouts (opdi.reference.h3_airport_layouts)

Build H3 representations of runways, taxiways, and aprons at resolution 12 (average cell area of roughly 307 m²) using OpenStreetMap data retrieved via osmnx.

Airspace boundaries (opdi.reference.h3_airspaces)

Encode FIR / UIR / TMA airspace polygons into H3 hexagons for efficient spatial joins.

OurAirports (opdi.ingestion.ourairports)

Download and ingest airport metadata (ICAO codes, coordinates, runway info) from the OurAirports open dataset.

Aircraft database (opdi.ingestion.osn_aircraft_db)

Download the OpenSky Network aircraft database containing registration, model, and operator information keyed by ICAO 24-bit transponder address.

Stage 01 – State Vector Ingestion

opdi.ingestion.osn_statevectors

Raw ADS-B state vectors are downloaded from the OpenSky Network MinIO server in batched Parquet files. The ingestion module:

  • Authenticates via MinIO using OSN_USERNAME / OSN_KEY.

  • Downloads files in configurable batches (default 250).

  • Normalises column names from camelCase to snake_case.

  • Writes to an Iceberg table partitioned by date.

  • Tracks progress so that re-runs skip already-processed files.
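
The bookkeeping in the steps above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code in opdi.ingestion.osn_statevectors: the helper names (camel_to_snake, batches, pending_files) and the file names are hypothetical, and the download and Iceberg write are left out.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase column name to snake_case."""
    # Put an underscore at every lowercase/digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def pending_files(all_files, processed):
    """Return the files that have not been processed yet, keeping their order."""
    return [f for f in all_files if f not in processed]


def batches(items, size=250):
    """Yield consecutive slices of at most `size` items (250 is the default in the text)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file listing; the first 100 files were handled by an earlier run.
files = [f"states_{i:04d}.parquet" for i in range(600)]
processed = set(files[:100])

todo = pending_files(files, processed)
batch_sizes = [len(b) for b in batches(todo)]
```

With 500 files left to process, the default batch size yields two batches of 250.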

Stage 02 – Track Creation

opdi.pipeline.tracks

State vectors are grouped into flight tracks by:

  1. Sorting by (icao24, event_time).

  2. Splitting where the time gap exceeds a configurable threshold (default 30 minutes).

  3. Assigning a deterministic track_id via SHA-256 hashing.

  4. Encoding each position into H3 hexagons at resolutions 7 and 12.

  5. Computing segment and cumulative distances using the Haversine formula.

  6. Cleaning altitude outliers using a configurable maximum vertical rate.
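
A minimal sketch of steps 1, 2, 3 and 5, using only the standard library. The row layout (icao24, event_time in Unix seconds, latitude, longitude) and the hash input (icao24 plus the first timestamp of the track) are assumptions made for illustration; the text states only that SHA-256 is used, not what is hashed. H3 encoding and altitude cleaning are omitted.

```python
import hashlib
import math

GAP_SECONDS = 30 * 60  # default gap threshold quoted in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def split_tracks(rows, gap_seconds=GAP_SECONDS):
    """Group (icao24, event_time, lat, lon) rows into tracks.

    A new track starts when the aircraft changes or when the gap to the
    previous point exceeds gap_seconds.
    """
    tracks, current = [], []
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        if current:
            prev = current[-1]
            if row[0] != prev[0] or row[1] - prev[1] > gap_seconds:
                tracks.append(current)
                current = []
        current.append(row)
    if current:
        tracks.append(current)
    return tracks


def track_id(track):
    """Deterministic ID: SHA-256 of icao24 and first timestamp (assumed recipe)."""
    icao24, first_time = track[0][0], track[0][1]
    return hashlib.sha256(f"{icao24}_{first_time}".encode("utf-8")).hexdigest()


def cumulative_distances(track):
    """Cumulative Haversine distance (km) at each point of a track."""
    total, out = 0.0, [0.0]
    for prev, cur in zip(track, track[1:]):
        total += haversine_km(prev[2], prev[3], cur[2], cur[3])
        out.append(total)
    return out
```

Because the ID depends only on the track's own data, re-running the stage on the same input reproduces the same identifiers.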

Stage 03 – Flight List Generation

opdi.pipeline.flights

Tracks are classified into flights:

  • Departures – the first track point falls inside an airport H3 zone.

  • Arrivals – the last track point falls inside an airport H3 zone.

  • Overflights – no match with any airport zone.

Each flight record is enriched with aircraft metadata from the aircraft database (registration, model, operator) and written to the OPDI flight list table.
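
The classification rule can be sketched as a dictionary lookup on the first and last cell of a track. The function name, the record layout, and the cell identifiers below are placeholders (they are not real H3 indexes), and the sketch assumes a precomputed mapping from zone cell to airport code.

```python
def classify_track(cells, zone_to_airport):
    """Classify one track from its time-ordered resolution-7 cell IDs.

    zone_to_airport maps a cell ID inside an airport zone to that airport's
    ICAO code. Returns the departure and arrival airports (None when there
    is no match) and whether the track is an overflight.
    """
    adep = zone_to_airport.get(cells[0])    # first point inside a zone -> departure
    ades = zone_to_airport.get(cells[-1])   # last point inside a zone -> arrival
    return {"adep": adep, "ades": ades, "overflight": adep is None and ades is None}


# Placeholder cell IDs standing in for H3 indexes.
zones = {"cell_ebbr": "EBBR", "cell_eham": "EHAM"}

full_flight = classify_track(["cell_ebbr", "cell_x", "cell_eham"], zones)
arrival_only = classify_track(["cell_x", "cell_eham"], zones)
overflight = classify_track(["cell_x", "cell_y"], zones)
```

A track can match on both ends, in which case it carries both a departure and an arrival airport.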

Stage 04 – Flight Event Detection

opdi.pipeline.events

Detailed events are extracted from each track:

Horizontal segment events

Flight phase classification (ground, climb, descent, cruise, level) using vertical-rate thresholds. Key milestones such as top-of-climb and top-of-descent are identified.
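
As a rough illustration of threshold-based phase classification, the sketch below labels each sample from its vertical rate. The threshold values (300 ft/min, 20,000 ft), the on_ground flag, and the simplified definitions of top-of-climb and top-of-descent are all assumptions for the example and are not taken from opdi.pipeline.events.

```python
def classify_phase(altitude_ft, vertical_rate_fpm, on_ground=False,
                   rate_threshold_fpm=300.0, cruise_floor_ft=20000.0):
    """Label one sample as ground, climb, descent, cruise, or level.

    All thresholds are illustrative defaults, not values from the pipeline.
    """
    if on_ground:
        return "ground"
    if vertical_rate_fpm > rate_threshold_fpm:
        return "climb"
    if vertical_rate_fpm < -rate_threshold_fpm:
        return "descent"
    # Neither climbing nor descending: cruise when high, level-off when low.
    return "cruise" if altitude_ft >= cruise_floor_ft else "level"


def top_of_climb_and_descent(phases):
    """Simplified milestones: indexes of the first and last cruise sample."""
    cruise = [i for i, phase in enumerate(phases) if phase == "cruise"]
    if not cruise:
        return None, None
    return cruise[0], cruise[-1]


# (altitude_ft, vertical_rate_fpm, on_ground) samples in time order.
samples = [
    (0, 0, True),
    (5000, 1500, False),
    (10000, 0, False),
    (35000, 50, False),
    (35000, -20, False),
    (20000, -1800, False),
]
phases = [classify_phase(alt, rate, ground) for alt, rate, ground in samples]
toc_index, tod_index = top_of_climb_and_descent(phases)
```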

Vertical crossing events

Detection of flight-level crossings at FL50, FL70, FL100, and FL245 – important thresholds for performance analysis.
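
One way to detect such crossings is to compare consecutive altitude samples against each level, where FL50 corresponds to 5,000 ft, FL245 to 24,500 ft, and so on. The sketch below assumes time-ordered (event_time, altitude_ft) samples and reports the time of the first sample past the level; whether the pipeline interpolates the crossing time is not stated in the text.

```python
FLIGHT_LEVELS = (50, 70, 100, 245)  # levels named in the text


def detect_crossings(samples, flight_levels=FLIGHT_LEVELS):
    """Return (event_time, flight_level, direction) for every level crossing.

    samples is a time-ordered list of (event_time, altitude_ft).
    """
    events = []
    for (_, alt0), (t1, alt1) in zip(samples, samples[1:]):
        for fl in flight_levels:
            level_ft = fl * 100  # a flight level is altitude in hundreds of feet
            if alt0 < level_ft <= alt1:
                events.append((t1, fl, "up"))
            elif alt0 >= level_ft > alt1:
                events.append((t1, fl, "down"))
    return events


# A climb through FL50, FL70 and FL100 followed by a descent back below FL100.
profile = [(0, 0), (60, 6000), (120, 11000), (180, 9000)]
crossings = detect_crossings(profile)
```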

Airport surface events

Entry and exit of runway, taxiway, and apron areas detected by matching H3 positions against the airport layout reference.

Measurement records

Distance flown and time elapsed are computed between consecutive events.
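
If each event carries its timestamp and the cumulative track distance at that point, a measurement is the difference between neighbouring events. The tuple layout, the event names, and the output field names below are hypothetical.

```python
def measurements(events):
    """Compute time and distance between consecutive events.

    events is a time-ordered list of
    (event_type, event_time_s, cumulative_distance_km).
    """
    out = []
    for (type0, t0, d0), (type1, t1, d1) in zip(events, events[1:]):
        out.append({
            "from": type0,
            "to": type1,
            "time_s": t1 - t0,
            "distance_km": d1 - d0,
        })
    return out


events = [
    ("take-off", 0, 0.0),
    ("FL100 crossing", 300, 40.0),
    ("top-of-climb", 900, 150.0),
]
records = measurements(events)
```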

Stage 05 – Data Extraction

opdi.output.parquet_exporter

The flight list, events, and measurements tables are exported to Parquet files with configurable time intervals (typically 10-day windows). This stage produces the files that are published as the OPDI open dataset.
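
The export windows can be generated with plain date arithmetic. The sketch below assumes half-open windows (start inclusive, end exclusive) and a shortened final window; neither convention is confirmed by the text, and export_windows is a hypothetical helper.

```python
from datetime import date, timedelta


def export_windows(start, end, days=10):
    """Yield (window_start, window_end) pairs covering [start, end).

    Each window spans `days` days; the last one is cut off at `end`.
    """
    step = timedelta(days=days)
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


windows = list(export_windows(date(2024, 1, 1), date(2024, 2, 1)))
```

For January 2024 this gives three full 10-day windows and a final one-day window starting on 31 January.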

Stage 06 – Cleanup

opdi.output.csv_exporter

Exported Parquet files are deduplicated and converted to compressed CSV for distribution. The clean_and_save_data() function removes duplicate rows and saves the result as gzip-compressed CSV.
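
The deduplicate-then-compress step can be illustrated with the standard library alone. This is not the body of clean_and_save_data(); the helper dedupe_to_gzip_csv, the column names, and the in-memory rows (standing in for a Parquet file) are assumptions made for the example.

```python
import csv
import gzip
import os
import tempfile


def dedupe_to_gzip_csv(rows, fieldnames, path):
    """Drop exact duplicate rows (first occurrence wins) and write gzip CSV.

    Returns the number of rows written.
    """
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[name] for name in fieldnames)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(unique)
    return len(unique)


rows = [
    {"track_id": "t1", "icao24": "abc123"},
    {"track_id": "t1", "icao24": "abc123"},  # exact duplicate, dropped
    {"track_id": "t2", "icao24": "def456"},
]
out_path = os.path.join(tempfile.mkdtemp(), "flight_list.csv.gz")
n_written = dedupe_to_gzip_csv(rows, ["track_id", "icao24"], out_path)

# Read the file back to confirm it is a valid gzip-compressed CSV.
with gzip.open(out_path, "rt", newline="", encoding="utf-8") as fh:
    read_back = list(csv.DictReader(fh))
```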

Stage 07 – Basic Statistics

opdi.monitoring.basic_stats

Row counts and basic summary statistics are collected across all OPDI tables to monitor pipeline health and data completeness.

Stage 08 – Advanced Statistics

opdi.monitoring.advanced_stats

In-depth data quality analysis:

  • Daily row-count trend analysis

  • Anomaly detection (suspiciously low counts)

  • Known outage tracking

  • MinIO bucket availability monitoring

  • Interactive Plotly visualisations for exploration
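
A simple form of low-count detection compares each day with the median of the preceding days and skips dates already recorded as outages. The window length, the 50% ratio, and the function name below are illustrative choices, not the rule used by opdi.monitoring.advanced_stats.

```python
from statistics import median


def low_count_days(daily_counts, window=7, ratio=0.5, known_outages=frozenset()):
    """Flag days whose row count is below ratio * median of the previous window.

    daily_counts is a date-ordered list of (day, count). Days listed in
    known_outages are not flagged, since they are already explained.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        day, count = daily_counts[i]
        baseline = median(c for _, c in daily_counts[i - window:i])
        if count < ratio * baseline and day not in known_outages:
            flagged.append(day)
    return flagged


counts = [(f"2024-03-{d:02d}", 100) for d in range(1, 8)]
counts += [("2024-03-08", 30), ("2024-03-09", 95), ("2024-03-10", 20)]

unexplained = low_count_days(counts, known_outages={"2024-03-10"})
all_low = low_count_days(counts)
```

Using the median rather than the mean keeps one bad day from dragging down the baseline for the days that follow it.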

Data Flow Diagram

OpenSky Network              OurAirports         OpenStreetMap
     |                           |                     |
     v                           v                     v
[01] State Vectors          [00] Airports         [00] Layouts
     |                           |                     |
     v                           v                     v
[02] Tracks  <--------  [00] H3 Airport Zones / Airspaces
     |
     v
[03] Flight List  <----  Aircraft Database
     |
     v
[04] Flight Events
     |
     v
[05] Parquet Export  -->  [06] CSV Export
     |
     v
[07] Basic Stats  -->  [08] Advanced Stats

See Also