Reference Data – opdi.reference

Reference data generators for OPDI pipeline.

Provides H3 hexagonal encoding for airport detection zones, airport ground layouts, and airspace boundaries.

class opdi.reference.AirportDetectionZoneGenerator(spark, config, resolution=None, num_points=720, radii_nm=None)[source]

Bases: object

Generates H3 hexagonal detection zones around airports.

Creates concentric rings around each airport at configurable radii, encoded as H3 hexagons. These zones are used by the flight list pipeline to detect departures and arrivals.

The default configuration creates five concentric rings at 0-5, 5-10, 10-20, 20-30, and 30-40 NM from the airport reference point.

Parameters:
  • spark (pyspark.sql.SparkSession) – Active SparkSession.

  • config (OPDIConfig) – OPDI configuration object.

  • resolution (int | None) – H3 resolution for hexagons (default: from config).

  • num_points (int) – Number of points for circle polygon approximation.

  • radii_nm (List[float] | None) – List of ring boundary radii in nautical miles.

Example

>>> generator = AirportDetectionZoneGenerator(spark, config)
>>> df = generator.generate()
>>> generator.save_to_parquet("data/airport_hex/zones_res7.parquet")

BBOX_OFFSET = 3
LAT_MIN = 26.74617
LAT_MAX = 70.25976
LON_MIN = -25.86653
LON_MAX = 49.65699
AIRPORT_SCHEMA = [pyspark.sql.types.StructField, ...] (list of 18 StructField entries)

generate(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv')[source]

Generate H3 detection zones for all airports in the European bounding box.

Each airport-ring combination produces a set of H3 hex IDs formed by subtracting the inner circle hexagons from the outer circle hexagons, creating a ring-shaped detection zone.

Parameters:

airports_url (str) – URL to OurAirports airports CSV.

Returns:

Pandas DataFrame with airport detection zone hex IDs. Columns include all airport metadata plus area_type and hex_id.

Return type:

pandas.DataFrame
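
The ring construction described above (outer disc minus inner disc) can be sketched without Spark or H3. This is a minimal illustration, not the library's implementation: `build_rings` and `toy_fill` are hypothetical names, the boundary list is assumed from the default ranges quoted in the class description, the ring labels are made up, and integer grid cells stand in for H3 hex IDs.

```python
def build_rings(radii_nm, fill_circle):
    """Pair consecutive boundary radii and subtract inner-disc cells from outer-disc cells."""
    rings = {}
    for inner, outer in zip(radii_nm[:-1], radii_nm[1:]):
        outer_cells = fill_circle(outer)
        # The innermost zone is a full disc, so nothing is subtracted from it.
        inner_cells = fill_circle(inner) if inner > 0 else set()
        rings[f"{inner:g}-{outer:g}"] = outer_cells - inner_cells
    return rings


def toy_fill(radius):
    """Stand-in for an H3 polyfill: integer grid cells whose centre lies within the radius."""
    r = int(radius)
    return {
        (x, y)
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        if x * x + y * y <= radius * radius
    }


# Assumed boundaries: six radii give five rings.
rings = build_rings([0, 5, 10, 20, 30, 40], toy_fill)
```

Because each disc contains the previous one, the resulting rings are disjoint and their union is the outermost disc.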

save_to_parquet(output_path)[source]

Save generated detection zones to a parquet file.

Parameters:

output_path (str) – Path to output parquet file.

Raises:

RuntimeError – If generate() has not been called first.

Return type:

None

prepare_for_flight_list(max_radius_nm=30.0, airport_types=None)[source]

Prepare detection zone data for use by the flight list pipeline.

Filters zones to the specified maximum radius and airport types, explodes hex arrays, adds H3 coordinates, and computes distance from airport center.

Parameters:
  • max_radius_nm (float) – Maximum detection radius in nautical miles.

  • airport_types (List[str] | None) – List of airport types to include (default: large, medium, small).

Returns:

Pandas DataFrame ready for use in flight list generation. Columns: apt_ident, apt_hex_id, distance_from_center, apt_latitude_deg, apt_longitude_deg.

Raises:

RuntimeError – If generate() has not been called first.

Return type:

pandas.DataFrame
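
The source does not say how distance_from_center is computed or in which unit. As one plausible approach, the sketch below uses the haversine great-circle distance in nautical miles and then applies a maximum-radius filter. The function name, the Earth radius constant, and the sample cell coordinates are assumptions for illustration only.

```python
import math

# Assumed mean Earth radius (6371 km) expressed in nautical miles.
EARTH_RADIUS_NM = 6371.0 / 1.852


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


# Hypothetical hex-cell centres around an airport at (50.90, 4.48).
airport = (50.90, 4.48)
cells = [(50.95, 4.50), (51.30, 4.48), (51.60, 4.48)]
max_radius_nm = 30.0

# Keep only the cells whose centre lies within the maximum radius.
kept = [c for c in cells if haversine_nm(*airport, *c) <= max_radius_nm]
```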

class opdi.reference.AirportLayoutGenerator(spark, config, resolution=None, log_dir='OPDI_live/logs')[source]

Bases: object

Generates H3 hexagonal representations of airport ground infrastructure.

Retrieves aeroway features (runways, taxiways, aprons, hangars) from OpenStreetMap for each airport, converts them to H3 hexagons at resolution 12, and writes the results to an Iceberg table.

Processing is resumable: airports that have already been processed are tracked in a progress log and skipped on subsequent runs.

Parameters:
  • spark (pyspark.sql.SparkSession) – Active SparkSession.

  • config (OPDIConfig) – OPDI configuration object.

  • resolution (int | None) – H3 resolution (default: from config, typically 12).

  • log_dir (str) – Directory for progress tracking files.

Example

>>> generator = AirportLayoutGenerator(spark, config)
>>> generator.process_all()

BBOX_OFFSET = 3
LAT_MIN = 26.74617
LAT_MAX = 70.25976
LON_MIN = -25.86653
LON_MAX = 49.65699
fetch_airport_list(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None)[source]

Fetch and filter the list of airports to process.

Parameters:
  • airports_url (str) – URL to OurAirports airports CSV.

  • airport_types (List[str] | None) – Airport types to include (default: large_airport, medium_airport).

Returns:

DataFrame with columns ident, latitude_deg, longitude_deg, elevation_ft, type.

Return type:

DataFrame
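
The filtering step can be illustrated with the standard library alone. This is a sketch under assumptions: the bounding box is taken from the class constants and treated as inclusive, the role of BBOX_OFFSET is not modelled, `filter_airports` is a hypothetical helper, and the CSV rows are approximate sample values rather than real OurAirports data.

```python
import csv
import io

# Bounding box copied from the class constants listed above.
LAT_MIN, LAT_MAX = 26.74617, 70.25976
LON_MIN, LON_MAX = -25.86653, 49.65699

# Illustrative sample rows (approximate values, not authoritative data).
SAMPLE_CSV = """ident,type,latitude_deg,longitude_deg,elevation_ft
EBBR,large_airport,50.9014,4.48444,184
KJFK,large_airport,40.6398,-73.7789,13
XX01,small_airport,48.0,10.0,1500
"""


def filter_airports(csv_text, airport_types=("large_airport", "medium_airport")):
    """Keep airports of the requested types that fall inside the bounding box."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["latitude_deg"])
        lon = float(row["longitude_deg"])
        in_box = LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
        if row["type"] in airport_types and in_box:
            kept.append(
                {
                    "ident": row["ident"],
                    "latitude_deg": lat,
                    "longitude_deg": lon,
                    "elevation_ft": row["elevation_ft"],
                    "type": row["type"],
                }
            )
    return kept


airports = filter_airports(SAMPLE_CSV)
```

Here KJFK is dropped because its longitude lies outside the box, and XX01 is dropped because its type is not in the requested list.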

process_airport(apt_icao)[source]

Process a single airport and write results to Iceberg.

Parameters:

apt_icao (str) – ICAO airport code.

Returns:

DataFrame with H3 layout data, or None on failure.

Return type:

pandas.DataFrame | None

process_all(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None, troublesome_airports=None)[source]

Process all airports, skipping already-processed ones.

Parameters:
  • airports_url (str) – URL to OurAirports airports CSV.

  • airport_types (List[str] | None) – Airport types to include.

  • troublesome_airports (List[str] | None) – ICAO codes to skip (known problematic).

Returns:

Tuple of (successful_airports, failed_airports) lists.

Return type:

Tuple[List[str], List[str]]
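
The resumable behaviour can be pictured as a loop around a progress log. The sketch below assumes a plain text file with one ICAO code per line; the real log format, file name, and error handling are not documented here, and `process_all_sketch` and `fake_process` are hypothetical names.

```python
import os
import tempfile


def process_all_sketch(airports, process_one, log_path, skip=()):
    """Process airports once each, recording successes in a progress log."""
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as fh:
            done = {line.strip() for line in fh if line.strip()}

    successful, failed = [], []
    for icao in airports:
        if icao in done or icao in skip:
            continue  # already processed, or flagged as troublesome
        try:
            process_one(icao)
        except Exception:
            failed.append(icao)
            continue
        # Record progress immediately so an interrupted run can resume.
        with open(log_path, "a") as fh:
            fh.write(icao + "\n")
        successful.append(icao)
    return successful, failed


def fake_process(icao):
    """Stand-in for per-airport processing; 'XXXX' simulates a failure."""
    if icao == "XXXX":
        raise ValueError("no OSM data")


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "processed.txt")
    first_run = process_all_sketch(["EBBR", "XXXX", "EHAM"], fake_process, log)
    second_run = process_all_sketch(["EBBR", "XXXX", "EHAM"], fake_process, log)
```

On the second run only the previously failed airport is retried, since the two successes are already in the log.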

create_table_if_not_exists()[source]

Create the hexaero_airport_layouts Iceberg table if it doesn’t exist.

Return type:

None

class opdi.reference.AirspaceH3Generator(spark, config, compact=False)[source]

Bases: object

Converts airspace polygon definitions to H3 hexagonal grids.

Processes three types of airspaces:
  • ANSP (Air Navigation Service Provider) boundaries

  • FIR (Flight Information Region) boundaries

  • Country boundaries

Each airspace polygon is converted to H3 hexagons at resolution 7 and stored in an Iceberg table with validity dates from AIRAC cycles.
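
How the validity dates are derived is not documented here. For orientation only, AIRAC cycles follow a fixed 28-day cadence, so a validity window for a given day can be computed from a known effective date. The sketch below assumes 2 January 2020 (cycle 2001) as the reference date; `airac_validity` is a hypothetical helper, not part of opdi.reference.

```python
from datetime import date, timedelta

# Reference effective date: AIRAC cycle 2001 became effective on 2020-01-02.
AIRAC_EPOCH = date(2020, 1, 2)
CYCLE = timedelta(days=28)


def airac_validity(day):
    """Return (valid_from, valid_to) of the 28-day cycle containing `day`.

    valid_to is exclusive: it is the effective date of the next cycle.
    """
    cycles_since_epoch = (day - AIRAC_EPOCH).days // 28
    valid_from = AIRAC_EPOCH + cycles_since_epoch * CYCLE
    return valid_from, valid_from + CYCLE


window = airac_validity(date(2020, 1, 15))
```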

Example

>>> generator = AirspaceH3Generator(spark, config)
>>> generator.process_all()

process_ansp(urls=None)[source]

Process ANSP airspace boundaries.

Parameters:

urls (List[str] | None) – List of parquet URLs for ANSP data. Defaults to PRU Atlas ANSP datasets.

Return type:

None

process_fir(urls=None)[source]

Process FIR (Flight Information Region) boundaries.

Parameters:

urls (List[str] | None) – List of parquet URLs for FIR data. Defaults to PRU Atlas FIR datasets.

Return type:

None

process_countries(urls=None)[source]

Process country boundary airspaces.

Parameters:

urls (List[str] | None) – List of parquet URLs for country boundary data. Defaults to PRU Atlas countries dataset.

Return type:

None

process_all(ansp_urls=None, fir_urls=None, country_urls=None)[source]

Process all airspace types (ANSP, FIR, countries).

Parameters:
  • ansp_urls (List[str] | None) – URLs for ANSP data.

  • fir_urls (List[str] | None) – URLs for FIR data.

  • country_urls (List[str] | None) – URLs for country boundary data.

Return type:

None

create_table_if_not_exists()[source]

Create the opdi_h3_airspace_ref Iceberg table if it doesn’t exist.

Return type:

None

opdi.reference.generate_circle_polygon(lon, lat, radius_nautical_miles, num_points=360)[source]

Generate a GeoJSON polygon approximating a circle around a point.

Uses the destination-point formula to compute points along a circle at a given radius from a center coordinate.

Parameters:
  • lon (float) – Center longitude in decimal degrees.

  • lat (float) – Center latitude in decimal degrees.

  • radius_nautical_miles (float) – Circle radius in nautical miles.

  • num_points (int) – Number of polygon vertices (higher = smoother circle).

Returns:

GeoJSON Polygon string.

Return type:

str
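
As a rough illustration of the destination-point formula mentioned above, the following sketch walks a bearing around the compass and projects each point at a fixed angular distance from the centre on a spherical Earth. The function name and the Earth radius constant are assumptions; the library's own vertex order, closing behaviour, and radius value may differ.

```python
import json
import math

# Assumed mean Earth radius (6371 km) expressed in nautical miles.
EARTH_RADIUS_NM = 6371.0 / 1.852


def circle_polygon_sketch(lon, lat, radius_nm, num_points=360):
    """Return a GeoJSON Polygon string approximating a circle (illustrative only)."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    ang = radius_nm / EARTH_RADIUS_NM  # angular distance in radians

    ring = []
    for i in range(num_points):
        bearing = 2.0 * math.pi * i / num_points
        # Destination point given start, bearing, and angular distance.
        lat2 = math.asin(
            math.sin(lat1) * math.cos(ang)
            + math.cos(lat1) * math.sin(ang) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(ang) * math.cos(lat1),
            math.cos(ang) - math.sin(lat1) * math.sin(lat2),
        )
        ring.append([math.degrees(lon2), math.degrees(lat2)])

    ring.append(ring[0])  # GeoJSON linear rings must be closed
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


polygon = json.loads(circle_polygon_sketch(4.48, 50.90, 10, num_points=36))
```

GeoJSON positions are ordered longitude first, which is why each vertex is stored as [lon, lat].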

opdi.reference.hexagonify_airport(apt_icao, resolution=12)[source]

Process a single airport: fetch OSM data, create polygons, convert to H3.

Parameters:
  • apt_icao (str) – ICAO airport code.

  • resolution (int) – H3 resolution (default: 12, where a hexagon covers roughly 307 m² on average).

Returns:

DataFrame with H3 hexagons for the airport’s infrastructure. Columns: hexaero_apt_icao, hexaero_h3_id, hexaero_latitude, hexaero_longitude, hexaero_res, hexaero_aeroway, hexaero_length, hexaero_ref, hexaero_surface, hexaero_width, hexaero_osm_id, hexaero_type.

Return type:

pandas.DataFrame

opdi.reference.retrieve_osm_data(icao_code)[source]

Retrieve aeroway features from OpenStreetMap for an airport.

Parameters:

icao_code (str) – ICAO airport code (e.g., ‘EBBR’).

Returns:

GeoDataFrame of aeroway features.

Raises:

ValueError – If no data is returned for the airport.

Return type:

gpd.GeoDataFrame

opdi.reference.fill_geometry(geometry_wkt, res=7)[source]

Convert a WKT geometry string to H3 hexagon sets (full polyfill).

Parameters:
  • geometry_wkt (str) – WKT representation of the geometry (may be MultiPolygon).

  • res (int) – H3 resolution.

Returns:

List of sets of H3 indices, one per sub-geometry.

Return type:

list

opdi.reference.fill_geometry_compact(geometry_wkt, res=7)[source]

Convert a WKT geometry string to compact H3 hexagon sets.

Parameters:
  • geometry_wkt (str) – WKT representation of the geometry.

  • res (int) – H3 resolution.

Returns:

List of compacted sets of H3 indices.

Return type:

list

H3 Airport Detection Zones – opdi.reference.h3_airport_zones

H3 airport detection zone generator.

Creates concentric hexagonal rings around airports for flight detection. Each airport gets rings at configurable radii (default: 0-40 NM) encoded as H3 hexagons at resolution 7.

Ported from: OPDI-live/python/v2.0.0/00_create_h3_airport_detection_areas.py

opdi.reference.h3_airport_zones.generate_circle_polygon(lon, lat, radius_nautical_miles, num_points=360)[source]

Generate a GeoJSON polygon approximating a circle around a point.

Uses the destination-point formula to compute points along a circle at a given radius from a center coordinate.

Parameters:
  • lon (float) – Center longitude in decimal degrees.

  • lat (float) – Center latitude in decimal degrees.

  • radius_nautical_miles (float) – Circle radius in nautical miles.

  • num_points (int) – Number of polygon vertices (higher = smoother circle).

Returns:

GeoJSON Polygon string.

Return type:

str

class opdi.reference.h3_airport_zones.AirportDetectionZoneGenerator(spark, config, resolution=None, num_points=720, radii_nm=None)[source]

Bases: object

Generates H3 hexagonal detection zones around airports.

Creates concentric rings around each airport at configurable radii, encoded as H3 hexagons. These zones are used by the flight list pipeline to detect departures and arrivals.

The default configuration creates five concentric rings at 0-5, 5-10, 10-20, 20-30, and 30-40 NM from the airport reference point.

Parameters:
  • spark (pyspark.sql.SparkSession) – Active SparkSession.

  • config (OPDIConfig) – OPDI configuration object.

  • resolution (int | None) – H3 resolution for hexagons (default: from config).

  • num_points (int) – Number of points for circle polygon approximation.

  • radii_nm (List[float] | None) – List of ring boundary radii in nautical miles.

Example

>>> generator = AirportDetectionZoneGenerator(spark, config)
>>> df = generator.generate()
>>> generator.save_to_parquet("data/airport_hex/zones_res7.parquet")

BBOX_OFFSET = 3
LAT_MIN = 26.74617
LAT_MAX = 70.25976
LON_MIN = -25.86653
LON_MAX = 49.65699
AIRPORT_SCHEMA = [pyspark.sql.types.StructField, ...] (list of 18 StructField entries)

__init__(spark, config, resolution=None, num_points=720, radii_nm=None)[source]

generate(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv')[source]

Generate H3 detection zones for all airports in the European bounding box.

Each airport-ring combination produces a set of H3 hex IDs formed by subtracting the inner circle hexagons from the outer circle hexagons, creating a ring-shaped detection zone.

Parameters:

airports_url (str) – URL to OurAirports airports CSV.

Returns:

Pandas DataFrame with airport detection zone hex IDs. Columns include all airport metadata plus area_type and hex_id.

Return type:

pandas.DataFrame

save_to_parquet(output_path)[source]

Save generated detection zones to a parquet file.

Parameters:

output_path (str) – Path to output parquet file.

Raises:

RuntimeError – If generate() has not been called first.

Return type:

None

prepare_for_flight_list(max_radius_nm=30.0, airport_types=None)[source]

Prepare detection zone data for use by the flight list pipeline.

Filters zones to the specified maximum radius and airport types, explodes hex arrays, adds H3 coordinates, and computes distance from airport center.

Parameters:
  • max_radius_nm (float) – Maximum detection radius in nautical miles.

  • airport_types (List[str] | None) – List of airport types to include (default: large, medium, small).

Returns:

Pandas DataFrame ready for use in flight list generation. Columns: apt_ident, apt_hex_id, distance_from_center, apt_latitude_deg, apt_longitude_deg.

Raises:

RuntimeError – If generate() has not been called first.

Return type:

pandas.DataFrame

H3 Airport Layouts – opdi.reference.h3_airport_layouts

H3 airport ground layout generator.

Retrieves airport infrastructure data (runways, taxiways, aprons, hangars) from OpenStreetMap via osmnx, converts geometries to H3 hexagons at resolution 12, and stores the results for use in airport event detection.

Ported from: OPDI-live/python/v2.0.0/00_create_h3_airport_layouts.py

opdi.reference.h3_airport_layouts.retrieve_osm_data(icao_code)[source]

Retrieve aeroway features from OpenStreetMap for an airport.

Parameters:

icao_code (str) – ICAO airport code (e.g., ‘EBBR’).

Returns:

GeoDataFrame of aeroway features.

Raises:

ValueError – If no data is returned for the airport.

Return type:

gpd.GeoDataFrame

opdi.reference.h3_airport_layouts.buffer_geometry(line, width_m, always_xy=True)[source]

Buffer a LineString geometry to create a Polygon with a specified width.

Projects to EPSG:3395 (meters), applies the buffer, then projects back to EPSG:4326 (degrees).

Parameters:
  • line (shapely.geometry.LineString) – LineString geometry to buffer.

  • width_m (float) – Buffer width in meters.

  • always_xy (bool) – Ensure transformer follows (x, y) coordinate order.

Returns:

Buffered geometry as a Polygon.

Return type:

shapely.geometry.Polygon

opdi.reference.h3_airport_layouts.is_number(s)[source]

Check if a value is a valid number.

Return type:

bool

opdi.reference.h3_airport_layouts.safe_convert_to_float(s)[source]

Safely convert a string to float, stripping non-numeric characters.

Parameters:

s – Value to convert.

Returns:

Float value or None if conversion fails.

Return type:

float | None
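
One way to read "stripping non-numeric characters" is a regular-expression cleanup followed by a guarded float() call. The sketch below is an assumption about the behaviour, not the library's code, and `safe_float_sketch` is a hypothetical name. It keeps digits, signs, and the decimal point, so scientific notation such as "1e3" would not survive intact.

```python
import re


def safe_float_sketch(value):
    """Strip everything except digits, sign, and decimal point, then try float()."""
    if value is None:
        return None
    cleaned = re.sub(r"[^0-9.+-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        # Nothing numeric left (e.g. "n/a") or a malformed remainder (e.g. "1.2.3").
        return None


examples = [safe_float_sketch(v) for v in ("45 m", "3.5m", "n/a", None)]
```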

opdi.reference.h3_airport_layouts.fill_missing_width(aeroway, width_m)[source]

Fill missing width values based on aeroway type defaults.

Parameters:
  • aeroway (str) – Type of aeroway (runway, taxiway, etc.).

  • width_m – Existing width value (may be None/NaN).

Returns:

Width in meters, using default if original is missing.

Return type:

float | None
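
The default-width lookup might resemble the sketch below. The default values are placeholders chosen for illustration; the widths the library actually applies are not stated in this reference, and `fill_missing_width_sketch` is a hypothetical name.

```python
import math

# Hypothetical defaults in meters; the real per-aeroway values are not documented here.
DEFAULT_WIDTH_M = {"runway": 45.0, "taxiway": 20.0}


def fill_missing_width_sketch(aeroway, width_m):
    """Return the given width, or a per-aeroway default when it is None/NaN."""
    missing = width_m is None or (isinstance(width_m, float) and math.isnan(width_m))
    if not missing:
        return float(width_m)
    # Unknown aeroway types have no default and yield None.
    return DEFAULT_WIDTH_M.get(aeroway)


widths = [
    fill_missing_width_sketch("runway", None),
    fill_missing_width_sketch("runway", 60),
    fill_missing_width_sketch("taxiway", float("nan")),
    fill_missing_width_sketch("helipad", None),
]
```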

opdi.reference.h3_airport_layouts.convert_to_polygon(geometry, geom_type, width_m)[source]

Convert a geometry to a list of Polygons, buffering if necessary.

Parameters:
  • geometry – Input geometry (Polygon, MultiPolygon, LineString, or Point).

  • geom_type (str) – Geometry type string.

  • width_m (float) – Buffer width for non-polygon geometries.

Returns:

List of Polygon geometries.

Return type:

List[shapely.geometry.Polygon]

opdi.reference.h3_airport_layouts.polygon_to_h3(poly, resolution)[source]

Convert a Polygon to a set of H3 hexagon indices.

Parameters:
  • poly (shapely.geometry.Polygon) – Shapely Polygon.

  • resolution (int) – H3 resolution.

Returns:

Set of H3 index strings covering the polygon.

Return type:

Set[str]

opdi.reference.h3_airport_layouts.clean_str(s)[source]

Clean a string value, converting feet/miles to meters.

Parameters:

s – Input string potentially containing unit suffixes.

Returns:

Cleaned string with values converted to meters.

Return type:

str
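
A unit conversion of this kind could look like the following sketch. The accepted suffixes, the rounding, and the pass-through behaviour for unrecognised input are assumptions, and `clean_str_sketch` is a hypothetical name. The conversion factors themselves are exact: 1 ft = 0.3048 m and 1 statute mile = 1609.344 m.

```python
import re

FT_TO_M = 0.3048    # exact, international foot
MI_TO_M = 1609.344  # exact, statute mile

# A number followed by an optional space and a feet or miles suffix.
_UNIT_PATTERN = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(ft|feet|mi|miles?)")


def clean_str_sketch(s):
    """Convert values like '150 ft' or '1 mi' to a meter string; pass others through."""
    text = str(s).strip().lower()
    match = _UNIT_PATTERN.fullmatch(text)
    if match is None:
        return text  # no recognised unit suffix, return the trimmed input
    value = float(match.group(1))
    factor = FT_TO_M if match.group(2) in ("ft", "feet") else MI_TO_M
    return str(round(value * factor, 2))


converted = [clean_str_sketch(v) for v in ("150 ft", "1 mi", "45")]
```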

opdi.reference.h3_airport_layouts.hexagonify_airport(apt_icao, resolution=12)[source]

Process a single airport: fetch OSM data, create polygons, convert to H3.

Parameters:
  • apt_icao (str) – ICAO airport code.

  • resolution (int) – H3 resolution (default: 12, where a hexagon covers roughly 307 m² on average).

Returns:

DataFrame with H3 hexagons for the airport’s infrastructure. Columns: hexaero_apt_icao, hexaero_h3_id, hexaero_latitude, hexaero_longitude, hexaero_res, hexaero_aeroway, hexaero_length, hexaero_ref, hexaero_surface, hexaero_width, hexaero_osm_id, hexaero_type.

Return type:

pandas.DataFrame

class opdi.reference.h3_airport_layouts.AirportLayoutGenerator(spark, config, resolution=None, log_dir='OPDI_live/logs')[source]

Bases: object

Generates H3 hexagonal representations of airport ground infrastructure.

Retrieves aeroway features (runways, taxiways, aprons, hangars) from OpenStreetMap for each airport, converts them to H3 hexagons at resolution 12, and writes the results to an Iceberg table.

Processing is resumable: airports that have already been processed are tracked in a progress log and skipped on subsequent runs.

Parameters:
  • spark (pyspark.sql.SparkSession) – Active SparkSession.

  • config (OPDIConfig) – OPDI configuration object.

  • resolution (int | None) – H3 resolution (default: from config, typically 12).

  • log_dir (str) – Directory for progress tracking files.

Example

>>> generator = AirportLayoutGenerator(spark, config)
>>> generator.process_all()

BBOX_OFFSET = 3
LAT_MIN = 26.74617
LAT_MAX = 70.25976
LON_MIN = -25.86653
LON_MAX = 49.65699
__init__(spark, config, resolution=None, log_dir='OPDI_live/logs')[source]

fetch_airport_list(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None)[source]

Fetch and filter the list of airports to process.

Parameters:
  • airports_url (str) – URL to OurAirports airports CSV.

  • airport_types (List[str] | None) – Airport types to include (default: large_airport, medium_airport).

Returns:

DataFrame with columns ident, latitude_deg, longitude_deg, elevation_ft, type.

Return type:

DataFrame

process_airport(apt_icao)[source]

Process a single airport and write results to Iceberg.

Parameters:

apt_icao (str) – ICAO airport code.

Returns:

DataFrame with H3 layout data, or None on failure.

Return type:

pandas.DataFrame | None

process_all(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None, troublesome_airports=None)[source]

Process all airports, skipping already-processed ones.

Parameters:
  • airports_url (str) – URL to OurAirports airports CSV.

  • airport_types (List[str] | None) – Airport types to include.

  • troublesome_airports (List[str] | None) – ICAO codes to skip (known problematic).

Returns:

Tuple of (successful_airports, failed_airports) lists.

Return type:

Tuple[List[str], List[str]]

create_table_if_not_exists()[source]

Create the hexaero_airport_layouts Iceberg table if it doesn’t exist.

Return type:

None

H3 Airspaces – opdi.reference.h3_airspaces

H3 airspace encoding module.

Converts airspace definitions (ANSPs, FIRs, country boundaries) from PRU Atlas into H3 hexagonal grids at resolution 7. Supports both compact and full polyfill modes.

Ported from: OPDI-live/python/v2.0.0/00_create_h3_airspaces.py

opdi.reference.h3_airspaces.fill_geometry(geometry_wkt, res=7)[source]

Convert a WKT geometry string to H3 hexagon sets (full polyfill).

Parameters:
  • geometry_wkt (str) – WKT representation of the geometry (may be MultiPolygon).

  • res (int) – H3 resolution.

Returns:

List of sets of H3 indices, one per sub-geometry.

Return type:

list

opdi.reference.h3_airspaces.fill_geometry_compact(geometry_wkt, res=7)[source]

Convert a WKT geometry string to compact H3 hexagon sets.

Parameters:
  • geometry_wkt (str) – WKT representation of the geometry.

  • res (int) – H3 resolution.

Returns:

List of compacted sets of H3 indices.

Return type:

list

opdi.reference.h3_airspaces.get_coords(h)[source]

Get geographic coordinates for an H3 index.

Parameters:

h (str) – H3 index string.

Returns:

Tuple of (latitude, longitude).

Return type:

Tuple[float, float]

class opdi.reference.h3_airspaces.AirspaceH3Generator(spark, config, compact=False)[source]

Bases: object

Converts airspace polygon definitions to H3 hexagonal grids.

Processes three types of airspaces:
  • ANSP (Air Navigation Service Provider) boundaries

  • FIR (Flight Information Region) boundaries

  • Country boundaries

Each airspace polygon is converted to H3 hexagons at resolution 7 and stored in an Iceberg table with validity dates from AIRAC cycles.

Example

>>> generator = AirspaceH3Generator(spark, config)
>>> generator.process_all()

__init__(spark, config, compact=False)[source]

process_ansp(urls=None)[source]

Process ANSP airspace boundaries.

Parameters:

urls (List[str] | None) – List of parquet URLs for ANSP data. Defaults to PRU Atlas ANSP datasets.

Return type:

None

process_fir(urls=None)[source]

Process FIR (Flight Information Region) boundaries.

Parameters:

urls (List[str] | None) – List of parquet URLs for FIR data. Defaults to PRU Atlas FIR datasets.

Return type:

None

process_countries(urls=None)[source]

Process country boundary airspaces.

Parameters:

urls (List[str] | None) – List of parquet URLs for country boundary data. Defaults to PRU Atlas countries dataset.

Return type:

None

process_all(ansp_urls=None, fir_urls=None, country_urls=None)[source]

Process all airspace types (ANSP, FIR, countries).

Parameters:
  • ansp_urls (List[str] | None) – URLs for ANSP data.

  • fir_urls (List[str] | None) – URLs for FIR data.

  • country_urls (List[str] | None) – URLs for country boundary data.

Return type:

None

create_table_if_not_exists()[source]

Create the opdi_h3_airspace_ref Iceberg table if it doesn’t exist.

Return type:

None