Reference Data – opdi.reference
Reference data generators for the OPDI pipeline.
Provides H3 hexagonal encoding for airport detection zones, airport ground layouts, and airspace boundaries.
- class opdi.reference.AirportDetectionZoneGenerator(spark, config, resolution=None, num_points=720, radii_nm=None)
Bases: object
Generates H3 hexagonal detection zones around airports.
Creates concentric rings around each airport at configurable radii, encoded as H3 hexagons. These zones are used by the flight list pipeline to detect departures and arrivals.
The default configuration creates five concentric rings, spanning 0-5, 5-10, 10-20, 20-30, and 30-40 NM from the airport reference point.
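A minimal sketch of how a list of ring boundary radii (the radii_nm parameter) can be paired into ring intervals. It assumes the default boundaries are [0, 5, 10, 20, 30, 40] NM, which is inferred from the ring list above rather than read from the code; DEFAULT_RADII_NM and radii_to_rings are hypothetical names.

```python
from typing import List, Tuple

# Assumed default boundaries, inferred from the documented rings (hypothetical).
DEFAULT_RADII_NM: List[float] = [0.0, 5.0, 10.0, 20.0, 30.0, 40.0]


def radii_to_rings(radii_nm: List[float]) -> List[Tuple[float, float]]:
    """Pair consecutive boundary radii into (inner, outer) ring intervals."""
    radii = sorted(radii_nm)
    if len(radii) < 2:
        raise ValueError("need at least two boundary radii to form a ring")
    # N boundary radii yield N - 1 rings.
    return list(zip(radii[:-1], radii[1:]))


rings = radii_to_rings(DEFAULT_RADII_NM)
print(rings)
```

Six boundaries give five rings, which is why the ring count is one less than the number of radii supplied.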
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
resolution (int | None) – H3 resolution for hexagons (default: from config).
num_points (int) – Number of points for circle polygon approximation.
radii_nm (List[float] | None) – List of ring boundary radii in nautical miles.
Example
>>> generator = AirportDetectionZoneGenerator(spark, config)
>>> df = generator.generate()
>>> generator.save_to_parquet("data/airport_hex/zones_res7.parquet")
- BBOX_OFFSET = 3
- LAT_MIN = 26.74617
- LAT_MAX = 70.25976
- LON_MIN = -25.86653
- LON_MAX = 49.65699
- AIRPORT_SCHEMA = list of 18 pyspark.sql.types.StructField objects
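The LAT_*/LON_* constants describe a latitude/longitude box around Europe. The sketch below shows how such a box could filter OurAirports rows, assuming (this page does not confirm it) that BBOX_OFFSET is a margin in degrees added on every side. The helper in_bbox is hypothetical, and the sample coordinates are approximate.

```python
LAT_MIN, LAT_MAX = 26.74617, 70.25976
LON_MIN, LON_MAX = -25.86653, 49.65699
BBOX_OFFSET = 3  # assumption: margin in degrees applied to every side of the box


def in_bbox(lat: float, lon: float, offset: float = BBOX_OFFSET) -> bool:
    """Return True if (lat, lon) lies inside the box widened by `offset` degrees."""
    return (LAT_MIN - offset <= lat <= LAT_MAX + offset
            and LON_MIN - offset <= lon <= LON_MAX + offset)


# Sample rows shaped like OurAirports records (approximate coordinates).
airports = [
    {"ident": "EBBR", "latitude_deg": 50.9014, "longitude_deg": 4.48444},
    {"ident": "KJFK", "latitude_deg": 40.6398, "longitude_deg": -73.7789},
]
selected = [a["ident"] for a in airports if in_bbox(a["latitude_deg"], a["longitude_deg"])]
print(selected)  # ['EBBR']
```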
- generate(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv')
Generate H3 detection zones for all airports in the European bounding box.
Each airport-ring combination produces a set of H3 hex IDs formed by subtracting the inner circle hexagons from the outer circle hexagons, creating a ring-shaped detection zone.
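The ring construction described above is a set difference: cells covering the outer disc minus cells covering the inner disc. To keep the sketch runnable with the standard library alone, it uses a toy square grid instead of H3; cells_within is a hypothetical stand-in for polyfilling a circle polygon, and treating a zero inner radius as "no hole" is an assumption.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def cells_within(radius: float) -> Set[Cell]:
    """Toy stand-in for an H3 polyfill: grid cells whose center lies within `radius`."""
    r = int(radius)
    return {
        (x, y)
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        if x * x + y * y <= radius * radius
    }


def ring_cells(inner: float, outer: float) -> Set[Cell]:
    """Ring = outer disc cells minus inner disc cells."""
    if inner <= 0:
        # Assumption: the innermost ring is a full disc with no hole.
        return cells_within(outer)
    return cells_within(outer) - cells_within(inner)


a = ring_cells(0, 2)
b = ring_cells(2, 4)
print(len(a), len(b))
```

Because each ring subtracts exactly the disc that the previous ring covered, adjacent rings never share a cell and together cover the outermost disc.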
- Parameters:
airports_url (str) – URL to OurAirports airports CSV.
- Returns:
pandas DataFrame with airport detection zone hex IDs. Columns include all airport metadata plus area_type and hex_id.
- Return type:
pandas.DataFrame
- save_to_parquet(output_path)
Save generated detection zones to a parquet file.
- Parameters:
output_path (str) – Path to output parquet file.
- Raises:
RuntimeError – If generate() has not been called first.
- Return type:
None
- prepare_for_flight_list(max_radius_nm=30.0, airport_types=None)
Prepare detection zone data for use by the flight list pipeline.
Filters zones to the specified maximum radius and airport types, explodes hex arrays, adds H3 coordinates, and computes distance from airport center.
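- Parameters:
max_radius_nm (float) – Maximum detection radius in nautical miles (default: 30.0).
airport_types – Airport types to include (default: None).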
- Returns:
Pandas DataFrame ready for use in flight list generation. Columns: apt_ident, apt_hex_id, distance_from_center, apt_latitude_deg, apt_longitude_deg.
- Raises:
RuntimeError – If generate() has not been called first.
- Return type:
pandas.DataFrame
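A rough pure-Python sketch of the filter, explode, and distance steps, producing rows with the documented output columns. Several things are assumptions: input records use hypothetical keys (ident, type, lat, lon, outer_nm, hex_ids), cell-center coordinates are supplied in a dict (real code would derive them from H3), "filter to the maximum radius" is read as "keep rings whose outer bound is within max_radius_nm", and distance uses the haversine formula on a sphere of mean Earth radius. prepare_zone_rows is a hypothetical name.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

EARTH_RADIUS_NM = 6371.0088 / 1.852  # mean Earth radius in nautical miles


def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in nautical miles on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def prepare_zone_rows(zones: Iterable[Dict],
                      cell_centers: Dict[str, Tuple[float, float]],
                      max_radius_nm: float = 30.0,
                      airport_types: Optional[List[str]] = None) -> List[Dict]:
    rows = []
    for z in zones:
        # Keep only rings that lie entirely within max_radius_nm (assumed semantics).
        if z["outer_nm"] > max_radius_nm:
            continue
        if airport_types is not None and z["type"] not in airport_types:
            continue
        # "Explode": one output row per hex ID in the ring.
        for hex_id in z["hex_ids"]:
            lat, lon = cell_centers[hex_id]
            rows.append({
                "apt_ident": z["ident"],
                "apt_hex_id": hex_id,
                "distance_from_center": haversine_nm(z["lat"], z["lon"], lat, lon),
                "apt_latitude_deg": z["lat"],
                "apt_longitude_deg": z["lon"],
            })
    return rows
```

One arcminute of latitude is close to one nautical mile, so a cell center 1/60 degree north of the airport should come out at roughly 1 NM.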
- class opdi.reference.AirportLayoutGenerator(spark, config, resolution=None, log_dir='OPDI_live/logs')
Bases: object
Generates H3 hexagonal representations of airport ground infrastructure.
Retrieves aeroway features (runways, taxiways, aprons, hangars) from OpenStreetMap for each airport, converts them to H3 hexagons at resolution 12, and writes the results to an Iceberg table.
Processing is resumable: airports that have already been processed are tracked in a progress log and skipped on subsequent runs.
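One way to get the resumable behaviour described above, sketched with a plain-text log holding one processed ICAO code per line. The log format, file name, and the process_one callback are hypothetical; the real progress-log format in log_dir is not documented here. A None result is treated as a failure and is not logged, so that airport is retried on the next run, in line with process_airport returning None on failure.

```python
import os
import tempfile
from typing import Callable, Iterable, List, Optional, Set


def load_processed(log_path: str) -> Set[str]:
    """Read ICAO codes already recorded in the progress log (empty set if none)."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def process_all(airports: Iterable[str], log_path: str,
                process_one: Callable[[str], Optional[object]]) -> List[str]:
    """Process airports not yet in the log; append each success to the log."""
    done = load_processed(log_path)
    newly_done = []
    for icao in airports:
        if icao in done:
            continue  # handled on a previous run
        if process_one(icao) is None:
            continue  # failure: keep it out of the log so it is retried later
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(icao + "\n")
        done.add(icao)
        newly_done.append(icao)
    return newly_done


# Demo: EHAM "fails" on the first run and is picked up on the second.
log = os.path.join(tempfile.mkdtemp(), "progress.log")
first = process_all(["EBBR", "EHAM", "LFPG"], log, lambda c: None if c == "EHAM" else c)
second = process_all(["EBBR", "EHAM", "LFPG"], log, lambda c: c)
print(first, second)  # ['EBBR', 'LFPG'] ['EHAM']
```

Appending after each success, rather than writing once at the end, means an interrupted run loses at most the airport in progress.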
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
resolution (int | None) – H3 resolution (default: from config, typically 12).
log_dir (str) – Directory for progress tracking files.
Example
>>> generator = AirportLayoutGenerator(spark, config)
>>> generator.process_all()
- BBOX_OFFSET = 3
- LAT_MIN = 26.74617
- LAT_MAX = 70.25976
- LON_MIN = -25.86653
- LON_MAX = 49.65699
- fetch_airport_list(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None)
Fetch and filter the list of airports to process.
- process_airport(apt_icao)
Process a single airport and write results to Iceberg.
- Parameters:
apt_icao (str) – ICAO airport code.
- Returns:
DataFrame with H3 layout data, or None on failure.
- Return type:
pandas.DataFrame | None
- class opdi.reference.AirspaceH3Generator(spark, config, compact=False)
Bases: object
Converts airspace polygon definitions to H3 hexagonal grids.
Processes three types of airspace:
- ANSP (Air Navigation Service Provider) boundaries
- FIR (Flight Information Region) boundaries
- Country boundaries
Each airspace polygon is converted to H3 hexagons at resolution 7 and stored in an Iceberg table with validity dates from AIRAC cycles.
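AIRAC cycles follow a fixed 28-day schedule, so a validity window can be derived arithmetically from any known cycle start. The sketch assumes AIRAC 2401 (effective 2024-01-25) as the reference date and returns the [start, end) window containing a given day. How AirspaceH3Generator actually assigns validity dates is not shown in this reference; airac_window is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Tuple

AIRAC_REFERENCE = date(2024, 1, 25)  # assumed reference: effective date of AIRAC 2401
CYCLE = timedelta(days=28)


def airac_window(d: date) -> Tuple[date, date]:
    """Return (start, end) of the AIRAC cycle containing `d`; `end` is exclusive."""
    # Floor division also handles dates before the reference correctly.
    n = (d - AIRAC_REFERENCE).days // 28
    start = AIRAC_REFERENCE + n * CYCLE
    return start, start + CYCLE


print(airac_window(date(2024, 3, 1)))
# (datetime.date(2024, 2, 22), datetime.date(2024, 3, 21))
```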
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
compact (bool) – Whether to use compact H3 representation.
Example
>>> generator = AirspaceH3Generator(spark, config)
>>> generator.process_all()
- opdi.reference.generate_circle_polygon(lon, lat, radius_nautical_miles, num_points=360)
Generate a GeoJSON polygon approximating a circle around a point.
Uses the destination-point formula to compute points along a circle at a given radius from a center coordinate.
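- Parameters:
lon (float) – Longitude of the center point, in degrees.
lat (float) – Latitude of the center point, in degrees.
radius_nautical_miles (float) – Circle radius in nautical miles.
num_points (int) – Number of vertices used to approximate the circle.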
- Returns:
GeoJSON Polygon string.
- Return type:
str
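A minimal sketch of the destination-point approach on a spherical Earth. It steps through num_points bearings around the center, computes each destination point, closes the ring, and serializes it as a GeoJSON Polygon with [lon, lat] coordinates. The Earth radius and vertex order used by the library are assumptions; circle_polygon_geojson is a hypothetical name, not the module's function.

```python
import json
import math

EARTH_RADIUS_NM = 6371.0088 / 1.852  # assumed spherical Earth, mean radius in NM


def circle_polygon_geojson(lon: float, lat: float, radius_nm: float, num_points: int = 360) -> str:
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    delta = radius_nm / EARTH_RADIUS_NM  # angular distance in radians
    coords = []
    for i in range(num_points):
        theta = 2 * math.pi * i / num_points  # bearing, clockwise from north
        phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                         + math.cos(phi1) * math.sin(delta) * math.cos(theta))
        lmb2 = lmb1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                                 math.cos(delta) - math.sin(phi1) * math.sin(phi2))
        lon2 = (math.degrees(lmb2) + 540.0) % 360.0 - 180.0  # normalize to [-180, 180)
        coords.append([lon2, math.degrees(phi2)])
    coords.append(coords[0])  # GeoJSON linear rings must be closed
    return json.dumps({"type": "Polygon", "coordinates": [coords]})


polygon = json.loads(circle_polygon_geojson(4.48444, 50.9014, 10.0, num_points=36))
```

The first vertex (bearing 0) sits due north of the center, about 10 arcminutes of latitude away for a 10 NM radius.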
- opdi.reference.hexagonify_airport(apt_icao, resolution=12)
Process a single airport: fetch OSM data, create polygons, convert to H3.
- Parameters:
apt_icao (str) – ICAO airport code.
resolution (int) – H3 resolution (default: 12).
- Returns:
DataFrame with H3 hexagons for the airport’s infrastructure. Columns: hexaero_apt_icao, hexaero_h3_id, hexaero_latitude, hexaero_longitude, hexaero_res, hexaero_aeroway, hexaero_length, hexaero_ref, hexaero_surface, hexaero_width, hexaero_osm_id, hexaero_type.
- Return type:
pandas.DataFrame
- opdi.reference.retrieve_osm_data(icao_code)
Retrieve aeroway features from OpenStreetMap for an airport.
- Parameters:
icao_code (str) – ICAO airport code (e.g., ‘EBBR’).
- Returns:
GeoDataFrame of aeroway features.
- Raises:
ValueError – If no data is returned for the airport.
- Return type:
gpd.GeoDataFrame
- opdi.reference.fill_geometry(geometry_wkt, res=7)
Convert a WKT geometry string to H3 hexagon sets (full polyfill).
- opdi.reference.fill_geometry_compact(geometry_wkt, res=7)
Convert a WKT geometry string to compact H3 hexagon sets.
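Full polyfill returns every cell at the target resolution, while compaction replaces each complete set of sibling cells with their shared parent, repeating up the hierarchy. The sketch uses a toy index (a string of digits 0-6, one digit per level, so each parent has seven children like H3's aperture-7 grid) instead of real H3 IDs; with h3-py you would call its own compaction function (compact_cells in v4, compact in v3). It also assumes all input cells share one resolution.

```python
from collections import defaultdict
from typing import Iterable, Set

APERTURE = 7  # toy hierarchy: every parent has 7 children, as in H3


def compact(cells: Iterable[str]) -> Set[str]:
    """Replace every complete group of 7 siblings by its parent, repeatedly."""
    result = set(cells)
    changed = True
    while changed:
        changed = False
        by_parent = defaultdict(set)
        for c in result:
            if c:  # the empty string is the toy root and has no parent
                by_parent[c[:-1]].add(c)
        for parent, kids in by_parent.items():
            if len(kids) == APERTURE:
                result -= kids
                result.add(parent)
                changed = True
    return result


cells = {"3" + d for d in "0123456"} | {"40", "41"}
print(sorted(compact(cells)))  # ['3', '40', '41']
```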
H3 Airport Detection Zones – opdi.reference.h3_airport_zones
H3 airport detection zone generator.
Creates concentric hexagonal rings around airports for flight detection. Each airport gets rings at configurable radii (default: 0-40 NM) encoded as H3 hexagons at resolution 7.
Ported from: OPDI-live/python/v2.0.0/00_create_h3_airport_detection_areas.py
- opdi.reference.h3_airport_zones.generate_circle_polygon(lon, lat, radius_nautical_miles, num_points=360)
Generate a GeoJSON polygon approximating a circle around a point.
Uses the destination-point formula to compute points along a circle at a given radius from a center coordinate.
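- Parameters:
lon (float) – Longitude of the center point, in degrees.
lat (float) – Latitude of the center point, in degrees.
radius_nautical_miles (float) – Circle radius in nautical miles.
num_points (int) – Number of vertices used to approximate the circle.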
- Returns:
GeoJSON Polygon string.
- Return type:
str
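To gauge how num_points affects accuracy: a regular polygon with n vertices on a circle of radius r falls short of the circle by at most the sagitta r(1 - cos(pi/n)), at each edge midpoint. The calculation below treats the circle as planar, a fair approximation at tens of nautical miles. It is an illustration, not part of the module; max_chord_error_m is a hypothetical helper.

```python
import math

NM_TO_M = 1852.0  # exact: one international nautical mile in meters


def max_chord_error_m(radius_nm: float, num_points: int) -> float:
    """Largest gap, in meters, between a regular num_points-gon and its circumscribed circle."""
    return radius_nm * NM_TO_M * (1.0 - math.cos(math.pi / num_points))


for n in (360, 720):
    print(n, round(max_chord_error_m(40.0, n), 2))
```

At a 40 NM radius this gives about 2.8 m for 360 points and about 0.7 m for 720 points. Doubling the vertex count roughly quarters the error, which is consistent with the class using num_points=720 where the standalone function defaults to 360.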
- class opdi.reference.h3_airport_zones.AirportDetectionZoneGenerator(spark, config, resolution=None, num_points=720, radii_nm=None)
Bases: object
Generates H3 hexagonal detection zones around airports.
Creates concentric rings around each airport at configurable radii, encoded as H3 hexagons. These zones are used by the flight list pipeline to detect departures and arrivals.
The default configuration creates five concentric rings, spanning 0-5, 5-10, 10-20, 20-30, and 30-40 NM from the airport reference point.
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
resolution (int | None) – H3 resolution for hexagons (default: from config).
num_points (int) – Number of points for circle polygon approximation.
radii_nm (List[float] | None) – List of ring boundary radii in nautical miles.
Example
>>> generator = AirportDetectionZoneGenerator(spark, config)
>>> df = generator.generate()
>>> generator.save_to_parquet("data/airport_hex/zones_res7.parquet")
- BBOX_OFFSET = 3
- LAT_MIN = 26.74617
- LAT_MAX = 70.25976
- LON_MIN = -25.86653
- LON_MAX = 49.65699
- AIRPORT_SCHEMA = list of 18 pyspark.sql.types.StructField objects
- __init__(spark, config, resolution=None, num_points=720, radii_nm=None)
- Parameters:
spark (pyspark.sql.SparkSession)
config (OPDIConfig)
resolution (int | None)
num_points (int)
radii_nm (List[float] | None)
- generate(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv')
Generate H3 detection zones for all airports in the European bounding box.
Each airport-ring combination produces a set of H3 hex IDs formed by subtracting the inner circle hexagons from the outer circle hexagons, creating a ring-shaped detection zone.
- Parameters:
airports_url (str) – URL to OurAirports airports CSV.
- Returns:
Pandas DataFrame with airport detection zone hex IDs. Columns include all airport metadata plus area_type and hex_id.
- Return type:
pandas.DataFrame
- save_to_parquet(output_path)
Save generated detection zones to a parquet file.
- Parameters:
output_path (str) – Path to output parquet file.
- Raises:
RuntimeError – If generate() has not been called first.
- Return type:
None
- prepare_for_flight_list(max_radius_nm=30.0, airport_types=None)
Prepare detection zone data for use by the flight list pipeline.
Filters zones to the specified maximum radius and airport types, explodes hex arrays, adds H3 coordinates, and computes distance from the airport center.
- Parameters:
max_radius_nm (float) – Maximum detection radius in nautical miles (default: 30.0).
airport_types – Airport types to include (default: None).
- Returns:
Pandas DataFrame ready for use in flight list generation. Columns: apt_ident, apt_hex_id, distance_from_center, apt_latitude_deg, apt_longitude_deg.
- Raises:
RuntimeError – If generate() has not been called first.
- Return type:
pandas.DataFrame
H3 Airport Layouts – opdi.reference.h3_airport_layouts
H3 airport ground layout generator.
Retrieves airport infrastructure data (runways, taxiways, aprons, hangars) from OpenStreetMap via osmnx, converts geometries to H3 hexagons at resolution 12, and stores the results for use in airport event detection.
Ported from: OPDI-live/python/v2.0.0/00_create_h3_airport_layouts.py
- opdi.reference.h3_airport_layouts.retrieve_osm_data(icao_code)
Retrieve aeroway features from OpenStreetMap for an airport.
- Parameters:
icao_code (str) – ICAO airport code (e.g., ‘EBBR’).
- Returns:
GeoDataFrame of aeroway features.
- Raises:
ValueError – If no data is returned for the airport.
- Return type:
gpd.GeoDataFrame
- opdi.reference.h3_airport_layouts.buffer_geometry(line, width_m, always_xy=True)
Buffer a LineString geometry to create a Polygon with a specified width.
Projects to EPSG:3395 (World Mercator, units in meters), applies the buffer, then projects back to EPSG:4326 (WGS 84 longitude/latitude, in degrees).
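A caveat worth checking when buffering in EPSG:3395: World Mercator is conformal but not equidistant, and its linear scale grows roughly as 1/cos(latitude). If a width in meters is applied directly as a projected buffer distance, the resulting ground width shrinks by about cos(latitude). The spherical approximation below quantifies the effect; whether buffer_geometry compensates for it is not stated in this reference, and ground_width_m is a hypothetical helper.

```python
import math


def ground_width_m(projected_width_m: float, lat_deg: float) -> float:
    """Approximate true ground width of a width measured in Mercator meters (spherical model)."""
    return projected_width_m * math.cos(math.radians(lat_deg))


for lat in (0.0, 40.0, 50.0, 60.0):
    print(lat, round(ground_width_m(45.0, lat), 1))
```

At 50 degrees north, a 45 m projected width corresponds to roughly 29 m on the ground, so the distinction matters at European latitudes.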
- opdi.reference.h3_airport_layouts.is_number(s)
Check if a value is a valid number.
- Return type:
bool
- opdi.reference.h3_airport_layouts.safe_convert_to_float(s)
Safely convert a string to float, stripping non-numeric characters.
- Parameters:
s – Value to convert.
- Returns:
Float value or None if conversion fails.
- Return type:
float | None
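A sketch of the two numeric helpers as described: is_number accepts anything float() can parse, and safe_convert_to_float strips every character other than digits, the decimal point, and the minus sign before converting. The exact set of characters the real function keeps is an assumption.

```python
import re
from typing import Optional


def is_number(s) -> bool:
    """True if float(s) succeeds."""
    try:
        float(s)
        return True
    except (TypeError, ValueError):
        return False


def safe_convert_to_float(s) -> Optional[float]:
    """Strip non-numeric characters and convert; return None if nothing usable remains."""
    if s is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(s))
    try:
        return float(cleaned)
    except ValueError:  # e.g. empty string or "1.2.3"
        return None
```

Typical OSM width tags such as "45 m" or "~30m" become 45.0 and 30.0, while values like "n/a" yield None.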
- opdi.reference.h3_airport_layouts.fill_missing_width(aeroway, width_m)
Fill missing width values based on aeroway type defaults.
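A minimal sketch of a type-based fallback. The default widths below are placeholders chosen only so the example runs; they are not the values used by fill_missing_width. Both None and NaN count as missing, because pandas represents absent numeric values as NaN.

```python
import math
from typing import Optional

# Placeholder defaults (hypothetical values, not taken from the module).
DEFAULT_WIDTH_M = {"runway": 45.0, "taxiway": 23.0}


def fill_missing_width(aeroway: str, width_m: Optional[float]) -> Optional[float]:
    """Return width_m if present, otherwise a default for the aeroway type (or None)."""
    missing = width_m is None or (isinstance(width_m, float) and math.isnan(width_m))
    if not missing:
        return width_m
    return DEFAULT_WIDTH_M.get(aeroway)
```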
- opdi.reference.h3_airport_layouts.convert_to_polygon(geometry, geom_type, width_m)
Convert a geometry to a list of Polygons, buffering if necessary.
- opdi.reference.h3_airport_layouts.polygon_to_h3(poly, resolution)
Convert a Polygon to a set of H3 hexagon indices.
- opdi.reference.h3_airport_layouts.clean_str(s)
Clean a string value, converting feet/miles to meters.
- Parameters:
s – Input string potentially containing unit suffixes.
- Returns:
Cleaned string with values converted to meters.
- Return type:
str
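A sketch of unit normalization for OSM tag values such as width or length. It assumes values look like a number followed by an optional unit suffix; the recognized suffixes and the output string format of the real clean_str are assumptions. The conversion factors are exact: 1 ft = 0.3048 m and 1 statute mile = 1609.344 m. Values with an unrecognized unit are passed through unchanged rather than guessed.

```python
import re

# Exact conversion factors to meters.
UNIT_TO_M = {"ft": 0.3048, "feet": 0.3048, "'": 0.3048,
             "mi": 1609.344, "miles": 1609.344, "m": 1.0}

_VALUE_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([a-zA-Z']*)\s*$")


def clean_str(s):
    """Convert strings like '150 ft' or '0.5 mi' to a meter value as a string."""
    if s is None:
        return None
    m = _VALUE_RE.match(str(s))
    if not m:
        return str(s).strip()  # not "number + unit": leave as-is
    value, unit = float(m.group(1)), m.group(2).lower()
    if unit and unit not in UNIT_TO_M:
        return str(s).strip()  # unknown unit: leave untouched
    return str(round(value * UNIT_TO_M.get(unit, 1.0), 3))
```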
- opdi.reference.h3_airport_layouts.hexagonify_airport(apt_icao, resolution=12)
Process a single airport: fetch OSM data, create polygons, convert to H3.
- Parameters:
apt_icao (str) – ICAO airport code.
resolution (int) – H3 resolution (default: 12).
- Returns:
DataFrame with H3 hexagons for the airport’s infrastructure. Columns: hexaero_apt_icao, hexaero_h3_id, hexaero_latitude, hexaero_longitude, hexaero_res, hexaero_aeroway, hexaero_length, hexaero_ref, hexaero_surface, hexaero_width, hexaero_osm_id, hexaero_type.
- Return type:
pandas.DataFrame
- class opdi.reference.h3_airport_layouts.AirportLayoutGenerator(spark, config, resolution=None, log_dir='OPDI_live/logs')
Bases: object
Generates H3 hexagonal representations of airport ground infrastructure.
Retrieves aeroway features (runways, taxiways, aprons, hangars) from OpenStreetMap for each airport, converts them to H3 hexagons at resolution 12, and writes the results to an Iceberg table.
Processing is resumable: airports that have already been processed are tracked in a progress log and skipped on subsequent runs.
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
resolution (int | None) – H3 resolution (default: from config, typically 12).
log_dir (str) – Directory for progress tracking files.
Example
>>> generator = AirportLayoutGenerator(spark, config)
>>> generator.process_all()
- BBOX_OFFSET = 3
- LAT_MIN = 26.74617
- LAT_MAX = 70.25976
- LON_MIN = -25.86653
- LON_MAX = 49.65699
- __init__(spark, config, resolution=None, log_dir='OPDI_live/logs')
- Parameters:
spark (pyspark.sql.SparkSession)
config (OPDIConfig)
resolution (int | None)
log_dir (str)
- fetch_airport_list(airports_url='https://davidmegginson.github.io/ourairports-data/airports.csv', airport_types=None)
Fetch and filter the list of airports to process.
- process_airport(apt_icao)
Process a single airport and write results to Iceberg.
- Parameters:
apt_icao (str) – ICAO airport code.
- Returns:
DataFrame with H3 layout data, or None on failure.
- Return type:
pandas.DataFrame | None
H3 Airspaces – opdi.reference.h3_airspaces
H3 airspace encoding module.
Converts airspace definitions (ANSPs, FIRs, country boundaries) from PRU Atlas into H3 hexagonal grids at resolution 7. Supports both compact and full polyfill modes.
Ported from: OPDI-live/python/v2.0.0/00_create_h3_airspaces.py
- opdi.reference.h3_airspaces.fill_geometry(geometry_wkt, res=7)
Convert a WKT geometry string to H3 hexagon sets (full polyfill).
- opdi.reference.h3_airspaces.fill_geometry_compact(geometry_wkt, res=7)
Convert a WKT geometry string to compact H3 hexagon sets.
- class opdi.reference.h3_airspaces.AirspaceH3Generator(spark, config, compact=False)
Bases: object
Converts airspace polygon definitions to H3 hexagonal grids.
Processes three types of airspace:
- ANSP (Air Navigation Service Provider) boundaries
- FIR (Flight Information Region) boundaries
- Country boundaries
Each airspace polygon is converted to H3 hexagons at resolution 7 and stored in an Iceberg table with validity dates from AIRAC cycles.
- Parameters:
spark (pyspark.sql.SparkSession) – Active SparkSession.
config (OPDIConfig) – OPDI configuration object.
compact (bool) – Whether to use compact H3 representation.
Example
>>> generator = AirspaceH3Generator(spark, config)
>>> generator.process_all()
- __init__(spark, config, compact=False)
- Parameters:
spark (pyspark.sql.SparkSession)
config (OPDIConfig)
compact (bool)