openNEM facility data
Module for downloading and parsing openNEM facility data
This is a simple set of functions for downloading and parsing station and DUID metadata from openNEM.
It works as follows:
- gets the master list of stations from openNEM
- iteratively downloads and saves the json data for each of the stations within this list (about 400)
- parses the downloaded data into a flat dataframe
The json data is stored locally, so that you do not have to re-download every station each time you adapt the parser or change the data you want to record.
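The local-storage idea can be sketched as a "load from disk, download only if missing" helper. This is a minimal illustration, not the module's code: `load_or_fetch` and `fake_fetch` are hypothetical names, the standard-library `json` module stands in for simplejson, and the network call is replaced by a stub so the snippet runs offline.

```python
import json
import os
import tempfile


def load_or_fetch(code, local_dir, fetch):
    """Return the cached JSON for `code`, calling `fetch(code)` only on a cache miss."""
    # Same filename rule as the module: "/" is not safe in a filename
    path = os.path.join(local_dir, code.replace("/", "_") + ".json")
    try:
        with open(path, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        data = fetch(code)
        with open(path, "w") as f:
            json.dump(data, f, indent=2)
        return data


calls = []


def fake_fetch(code):
    # Stand-in for the real HTTP request; records each call
    calls.append(code)
    return {"code": code, "facilities": []}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_fetch("A/B", tmp, fake_fetch)   # cache miss: fetches and saves
    second = load_or_fetch("A/B", tmp, fake_fetch)  # cache hit: reads the saved file
    cached_files = os.listdir(tmp)
```

After the second call, `calls` still holds a single entry, which is the point of keeping the json on disk.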
The json is validated with pydantic (to deal with missing fields and other irregularities in the openNEM json). There is probably a smarter way to flatten the validated data to pandas than what I have now, but it does the job.
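To illustrate how pydantic copes with missing fields, here is a small sketch. The `Unit` model is a cut-down, hypothetical version of the module's `DispatchUnit`, and the records are made up for the example; only the `Optional[...] = None` pattern is taken from the code further down.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Unit(BaseModel):
    code: str
    fueltech: str
    # Declared optional with a default, so a record without it still validates
    capacity_registered: Optional[float] = None


# Made-up record: no capacity_registered, plus a key the model does not declare
raw = {"code": "EXAMPLE1", "fueltech": "wind", "unexpected_key": 1}
unit = Unit(**raw)

# A record missing a required field is rejected instead of silently accepted
rejected = False
try:
    Unit(**{"code": "EXAMPLE2"})
except ValidationError:
    rejected = True
```

Here `unit.capacity_registered` is `None`, the undeclared key is ignored (pydantic's default behaviour), and the second record raises a `ValidationError` because `fueltech` is required.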
Note that two stations (MWPS and SLDCBLK, which the code skips) are missing or have some other issue.
Requirements
Written using Python 3.11. Uses pandas, requests, simplejson and pydantic (for json data validation).
Usage
Before using the module, there is a global variable (LOCALDIR) that needs to be set to specify where the station json data is stored.
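One way to set it without editing the file is to assign to the module attribute after importing. This is a sketch under assumptions: the directory name `opennem_stations` is arbitrary, and the import is guarded only so the snippet also runs where `opennem_facilities.py` is not on the path.

```python
import os
import tempfile

# Arbitrary example location; any writable directory works
local_dir = os.path.join(tempfile.gettempdir(), "opennem_stations")
os.makedirs(local_dir, exist_ok=True)  # open(..., "w") will not create the directory

try:
    import opennem_facilities

    # Assign on the module itself: the functions look up LOCALDIR at call time.
    # "from opennem_facilities import LOCALDIR" followed by a reassignment would
    # only rebind a local name and leave the module's value unchanged.
    opennem_facilities.LOCALDIR = local_dir
except ImportError:
    pass  # module not available here; the directory is still prepared
```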
To download all the station json:
import opennem_facilities
opennem_facilities.download_all_stations()
To parse the station data:
import opennem_facilities
df = opennem_facilities.parse_station_data()
This should return a dataframe as follows (where the code column here is the DUID):
 | network_region | code | fueltech | capacity_registered | lat | lon | station_name | station_code |
---|---|---|---|---|---|---|---|---|
0 | NSW1 | APPIN | gas_wcmg | 55 | -34.2109 | 150.793 | Appin | APPIN |
1 | NSW1 | AVLSF1 | solar_utility | 245 | -34.9191 | 146.61 | Avonlie | AVLSF |
2 | NSW1 | AWABAREF | bioenergy_biogas | 1 | -33.0233 | 151.551 | Awaba | AWABAREF |
3 | NSW1 | BANGOWF2 | wind | 84.8 | -34.7672 | 148.921 | Bango | BANGOWF |
… | … | … | … | … | … | … | … | … |
Extending / adapting
To parse additional details or metadata, you would have to adapt the Station pydantic model (i.e. add the fields you want to parse) and also adapt the function that flattens the data to pandas (flatten_station) as appropriate.
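As a rough sketch of such an extension, the models below mirror the ones in the code, with one extra field added. The field name `example_unit_field` is purely hypothetical: check the downloaded json for the real key names. The `as_dict` helper is also an addition, there only so the snippet runs on both pydantic v1 and v2.

```python
from typing import List, Optional

from pydantic import BaseModel


class DispatchUnit(BaseModel):
    network_region: str
    code: str
    fueltech: str
    capacity_registered: Optional[float] = None
    example_unit_field: Optional[str] = None  # hypothetical new field


class Location(BaseModel):
    lat: Optional[float] = None
    lng: Optional[float] = None


class Station(BaseModel):
    name: str
    code: str
    location: Location
    facilities: List[DispatchUnit]


def as_dict(model):
    # pydantic v2 calls this model_dump(); v1 calls it dict()
    return model.model_dump() if hasattr(model, "model_dump") else model.dict()


def flatten_station_rows(station):
    """Return one flat dict per dispatch unit (pass the list to pd.DataFrame)."""
    rows = []
    for du in station.facilities:
        row = as_dict(du)  # unit-level fields, including new ones, come along here
        row["lat"] = station.location.lat
        row["lon"] = station.location.lng
        row["station_name"] = station.name
        row["station_code"] = station.code
        rows.append(row)
    return rows


# Made-up station record for illustration only
raw = {
    "name": "Example Station",
    "code": "EXAMPLE",
    "location": {"lat": -34.0, "lng": 150.0},
    "facilities": [
        {"network_region": "NSW1", "code": "EX1", "fueltech": "wind",
         "example_unit_field": "some value"},
        {"network_region": "NSW1", "code": "EX2", "fueltech": "wind"},
    ],
}
rows = flatten_station_rows(Station(**raw))
```

Fields added to the dispatch-unit model flow through automatically, because each row starts from the unit's own dict. A new station-level field needs an explicit line in the flattening function, like `station_name` and `station_code` above.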
Code
The code can be downloaded from here: opennem_facilities.py, and is also shown below:
# Basic python script to download and restructure DUID and station data
# from the openNEM facilities dataset
#
# Copyright (C) 2023 Dylan McConnell
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
import os
from typing import List, Optional
import pandas as pd
import requests
import simplejson
from pydantic import BaseModel

GEOJSON = "https://data.opennem.org.au/v3/geo/au_facilities.json"
LOCALDIR = "/path/to/local/dir/"
STATION_URL = "https://api.opennem.org.au/station/au/NEM/{}"

def get_master():
    """
    Download master geojson file from openNEM, returning JSON
    """
    response = requests.get(GEOJSON)
    return simplejson.loads(response.content)

def get_station(station_code: str = "LIDDELL"):
    """
    Download and store station json from openNEM
    """
    response = requests.get(STATION_URL.format(station_code))
    json = simplejson.loads(response.content)

    filename = station_filename(json["code"])
    with open(os.path.join(LOCALDIR, filename), "w") as f:
        simplejson.dump(json, f, indent=2)

def station_filename(code: str):
    """
    Simple function to replace problematic characters in station codes
    and return a filename
    """
    clean_code = code.replace("/", "_")
    return f"{clean_code}.json"

def load_station(station_code: str):
    """
    Load station json from local directory
    """
    filename = station_filename(station_code)
    with open(os.path.join(LOCALDIR, filename), "r") as f:
        return simplejson.load(f)

def station_generator(master_json):
    """
    Generator that yields the station code for every station in the NEM
    """
    for station in master_json["features"]:
        if station["properties"]["network"] == "NEM":
            yield station["properties"]["station_code"]

def download_all_stations():
    """
    Downloads all the station json data from the master list.
    """
    master_json = get_master()
    for station_code in station_generator(master_json):
        if station_code != "SLDCBLK":
            try:
                # Skip stations that have already been saved locally
                load_station(station_code)
            except FileNotFoundError:
                print("downloading ", station_code)
                get_station(station_code)

"""
Some pydantic models for validating openNEM data
"""

class DispatchUnit(BaseModel):
    network_region: str
    code: str
    fueltech: str
    capacity_registered: Optional[float] = None

class Location(BaseModel):
    lat: Optional[float] = None
    lng: Optional[float] = None

class Station(BaseModel):
    name: str
    code: str

    location: Location
    facilities: List[DispatchUnit]

def parse_station_data():
    """
    Parses all station data from the master list.
    Assumes all station json already downloaded.
    """
    master_json = get_master()
    data = []

    for station_code in station_generator(master_json):
        if station_code not in ["MWPS", "SLDCBLK"]:
            station_json = load_station(station_code)
            valid_station = Station(**station_json)

            data.append(flatten_station(valid_station))
    return pd.concat(data).reset_index(drop=True)

def flatten_station(valid_station: Station):
    """
    Simple function to convert a validated station to pandas dataframe
    (probably could be done neater / cleaner with pd.json_normalize)
    """
    d = []
    station_dict = valid_station.dict()
    for du in valid_station.facilities:
        data = du.dict()
        data["lat"] = station_dict["location"]["lat"]
        data["lon"] = station_dict["location"]["lng"]
        data["station_name"] = station_dict["name"]
        data["station_code"] = station_dict["code"]

        d.append(data)
    return pd.DataFrame(d)
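
The docstring above suggests the flattening could be done more neatly with pandas' json normalisation. A possible sketch with `pd.json_normalize` follows; it is an alternative, not the code the module uses. The station dict is made up, and `meta_prefix="station_"` is needed because both the station and its units have a `code` key.

```python
import pandas as pd

# Made-up station dict with the same shape as the validated Station model
station = {
    "name": "Example Station",
    "code": "EXAMPLE",
    "location": {"lat": -34.0, "lng": 150.0},
    "facilities": [
        {"network_region": "NSW1", "code": "EX1", "fueltech": "wind",
         "capacity_registered": 10.0},
        {"network_region": "NSW1", "code": "EX2", "fueltech": "wind",
         "capacity_registered": None},
    ],
}

df = pd.json_normalize(
    station,
    record_path="facilities",  # one row per dispatch unit
    meta=["name", "code", ["location", "lat"], ["location", "lng"]],
    meta_prefix="station_",    # avoids the clash between station and unit "code"
).rename(columns={"station_location.lat": "lat", "station_location.lng": "lon"})
```

The result has the same columns as `flatten_station` produces, although in a different order (the station name and code come before `lat` and `lon`).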