EDP (Energy Data Platform)¶
Database¶
The Energy Data Platform (EDP) is a comprehensive, research-grade dataset of high-resolution (5-minute) energy data from 951 Australian sites, spanning 2018–2025. It integrates:
-
Energy measurements: solar PV generation, household consumption, battery charging/discharging, voltage, and current.
-
Site metadata: system specifications, inverter/subarray details, battery capacity, and connection type.
-
Survey responses: household characteristics, occupancy, appliances, heating/cooling, and energy-related behaviors.
Key Research Applications¶
-
Energy Behavior Analysis
-
Study consumption patterns, appliance usage, and load profiles.
-
Investigate temporal trends, daily routines, and peak demand behavior.
-
-
Distributed Energy Resources (DER) Research
-
Evaluate solar PV generation and battery storage performance.
-
Model and optimize energy systems at household or community scale.
-
-
Socio-Technical Insights
-
Link energy behavior to household demographics, dwelling type, and occupancy.
-
Explore the influence of lifestyle and building characteristics on energy use.
-
Structure & Data¶
It consists of 5-minute energy data for 951 sites, spanning from 2018 to the end of October 2025. The database includes four types of tables, with one set being monthly partitions of all the data.
| Database Table | Columns |
|---|---|
| edp_data_{year}_{month} | edp_site_id, unix_time, datetime, edp_device_and_circuit, circuit_label, edp_circuit_label, real_energy, real_energy_negative, real_energy_positive, current_avg, current_min, current_max, voltage_avg, voltage_min, voltage_max |
The data is provided by Solar Analytics and Wattwatchers and de-identified by UNSW. Solar Analytics site IDs start with “S”, while Wattwatchers site IDs start with “W”.
Due to the large volume of time-series data, energy tables are partitioned by month (except for edp_data_2018). From 2019 onwards, data is stored in monthly partitions such as edp_data_2019_01 for January 2019, with each partition ranging between 1–2 GB (2018 has ~4 GB). Partitions are accessible via the data downloader as regular tables.
Each site has different temporal coverage. For the latest updates on first and last available dates, refer to the tableedp_data_first_and_last_dates
| Database Table | Columns |
|---|---|
| edp_data_first_and_last_dates | edp_site_id, date_of_first_data, date_of_last_data |
| edp_sites_metadata | first_date_metadata_received, last_date_metadata_received, edp_site_id, state, site_time_zone, monitoring_hardware, inverter_manufacturer, inverter_model, subarray_manufacturer, subarray_model, inverter_ac_rating_kw, subarray_orientation, subarray_tilt, subarray_strings, subarray_modules_per_string, subarray_dc_rating_kw, islandable, has_battery, battery_size_make_model, limit_enabled, limit_amount, limit_applied |
| edp_survey_answers | edp_site_id, survey_date, consent_edp, consent_spt, consent_contact, consent_contact_method, over_18, postcode, state, account_holder, property_use, property_year, property_construction, property_construction_other, property_star_rating, num_floors, dwelling_type, dwelling_type_other, num_bedrooms, tenure, tenure_not_occupied, tenure_other, tenure_rented_from, tenure_rented_other, num_occupants, num_children, num_occupants_70plus, has_pets, num_type_pets, has_livestock, num_type_livestock, num_days_occupied, income_weekly, has_gas, has_gas_heating, has_gas_cooking, has_gas_hot_water, has_aircon, aircon_type, num_rooms_aircon, num_rooms_heated, num_refrigerators, has_pool_pump, dryer_usage, has_ev, ev_type, ev_charger_type, significant_loads_other, commercial_loads, business_hours, connection_type, has_controlled_load, islandable, property_power_outage_types, property_power_outage_other, area_power_outage_types, area_power_outage_other, power_outage_latest, hot_water_heat_type, hot_water_heat_type_other, rooms_heat_type, rooms_heat_type_other, has_battery, battery_size_make_model |
Appendix A: Original Data and Preparation¶
EDP consists of 5-minute energy data from Solar Analytics and Wattwatchers. Both provide energy data through their APIs and send us the survey responses and some metadata as excel files.
1. Solar Analytics¶
The first batch of Solar Analytics customers that participate in the EDP project includes 129 sites from all around the country. The first survey for this group was performed in December 2021. The second batch we received data for completed the survey in April 2022 (a few in June 2022) and included 485 sites. For each batch, Solar Analytics has provided us with two excel files, one the survey responses, and one the system details (sites metadata). A few sites from each batch opted out soon after the project commenced, leaving us with a total of 599 sites.
1.1. Survey Responses¶
The survey questions and the answers for all sites were provided in an excel file. Complete questions were the headers of the columns and answers for each site were provided as a row. We have mapped each survey question to a column name. Below is the list of column mapping used for the survey questions. The actual questions and the answer options for each can be found on the Column Mapping spreadsheet.
| # | Field Name | # | Field Name | # | Field Name | ||
|---|---|---|---|---|---|---|---|
| 1 | CONSENT_EDP | 23 | DWELLING_TYPE | 45 | AIRCON_TYPE | ||
| 2 | CONSENT_SOLA | 24 | DWELLING_TYPE_OTHER | 46 | NUM_ROOMS_AIRCON | ||
| 3 | CONSENT_SPT | 25 | NUM_BEDROOMS | 47 | NUM_ROOMS_HEATED | ||
| 4 | CONSENT_CONTACT | 26 | TENURE | 48 | NUM_REFRIGERATORS | ||
| 5 | CONSENT_CONTACT_METHOD | 27 | TENURE_NOT_OCCUPIED | 49 | HAS_POOLPUMP | ||
| 6 | NAME | 28 | TENURE_OTHER | 50 | DRYER_USAGE | ||
| 7 | 29 | TENURE_RENTED_FROM | 51 | HAS_EV | |||
| 8 | PHONE | 30 | TENURE_RENTED_OTHER | 52 | EV_TYPE | ||
| 9 | OVER_18 | 31 | NUM_OCCUMPANTS | 53 | EV_CHARGER_TYPE | ||
| 10 | SOLA_SITE_ID | 32 | NUM_CHILDREN | 54 | SIGNIFICANT_LOADS_OTHER | ||
| 11 | SOLA_SITE_NAME | 33 | NUM_OCCUPANTS_70PLUS | 55 | COMMERCIAL_LOADS | ||
| 12 | STREET | 34 | HAS_PETS | 56 | BUSINESS_HOURS | ||
| 13 | SUBURB | 35 | NUM_TYPE_PETS | 57 | CONNECTION_TYPE | ||
| 14 | POSTCODE | 36 | HAS_LIVESTOCK | 58 | HAS_CONTROLLED_LOAD | ||
| 15 | STATE | 37 | NUM_TYPE_LIVESTOCK | 59 | ISLANDABLE | ||
| 16 | ACCOUNT_HOLDER | 38 | NUM_DAYS_OCCUPIED | 60 | PROPERTY_POWER_OUTAGE_TYPES | ||
| 17 | PROPERTY_USE | 39 | INCOME_WEEKLY | 61 | PROPERTY_POWER_OUTAGE_OTHER | ||
| 18 | PROPERTY_YEAR | 40 | HAS_GAS | 62 | AREA_POWER_OUTAGE_TYPES | ||
| 19 | PROPERTY_CONSTRUCTION | 41 | HAS_GAS_HEATING | 63 | AREA_POWER_OUTAGE_OTHER | ||
| 20 | PROPERTY_CONSTRUCTION_OTHER | 42 | HAS_GAS_COOKING | 64 | POWER_OUTAGE_LATEST | ||
| 21 | PROPERTY_STAR_RATING | 43 | HAS_GAS_HOT_WATER | ||||
| 22 | NUM_FLOORS | 44 | HAS_AIRCON |
Questions 31 to 38 were asked twice; the second set (all having if known) had no answers, hence was not included.
1.2. Systems Metadata¶
Solar Analytics has provided us with a second excel sheet containing the sites’ metadata. The list of the column names is as follows:
| # | Field Name |
|---|---|
| 1 | Site ID |
| 2 | State |
| 3 | Monitoring Hardware |
| 4 | Inverter Manufacturer |
| 5 | Inverter Model |
| 6 | Inverter AC Rating (kW) |
| 7 | Subarray Manufacturer |
| 8 | Subarray Model |
| 9 | Subarray Orientation |
| 10 | Subarray Tilt |
| 11 | Subarray Strings |
| 12 | Subarray Modules per String |
| 13 | Subarray DC Rating (kW) |
| 14 | Has Battery |
| 15 | Opted in |
| 16 | Survey email |
Each site has had one or more rows in this file. The data was mostly clean, most answers to each question in a unified format, number values in numeric format, etc.
1.3. Energy Data¶
Energy timeseries data with the granularity of 5 minutes has been retrieved from Solar Analytic API. Then for each circuit we extract device name, circuit label, and data which includes the following fields:
-
time
-
site id
-
energy
-
negative energy
-
positive energy
-
reactive energy
-
power
-
power factor
-
reactive power
-
apparent power
-
voltage
-
current
The returned timestamp values are in Unix format in site’s local time. We convert the time to datetime format and keep both the original Unix time and the corresponding datetime. Also renamed the fields as below:
| # | Field Name |
|---|---|
| 1 | SolA Field Name |
| 2 | EDP Field Name |
| 3 | energy |
| 4 | realEnergy |
| 5 | energyNeg |
| 6 | realEnergyNegative |
| 7 | energyPos |
| 8 | realEnergyPositive |
| 9 | power |
| 10 | realPower |
1.4. Deidentification¶
We de-identify all the sites and energy data. Site data includes site metadata and survey. We de-identify site data by removing name, full address, email, phone, and Solar Analytics customer ID. From the address fields we only keep postcode and state as they are required for some analyses but by themselves are not identifying. To de-identify the energy data we remove site ID and device names. For each site we generate and allocate a unique EDP site ID. This new id is then attached to all the data of the site, i.e., site metadata, survey, and energy timeseries data. Since we remove device name to deidentify the energy data, for each device we allocate a letter which we concatenate at the end of circuit label and add channel number after that. For example, if in the original data we have (typical for Solar Analytics energy data):
| Site ID | Device Name | Circuit Label |
|---|---|---|
| 98765 | D111122233_1 | ac_net_load |
After the process of deidentification we will have:
| edp_site_id | edp_device_and_circuit | edp_circuit_label |
|---|---|---|
| S1234 | A1 | ac_net_load_A1 |
In this example, device name D111122233 is replaced with letter A and together with the channel number becomes A1, is named edp_device_and_circuit, and also added to the circuit label to differentiate it from the data of other devices/circuits (becoming edp_circuit_label).
2. Wattwatchers¶
We have been receiving data and metadata from 352 Wattwatchers sites, each site having 6 devices. This section will be completed with data extraction and de-identification processes, however, the major differences between data from Solar Analytics and Wattwatchers are explained in the next section.
3. Unifying Data from Solar Analytics and Wattwatchers Sites¶
The aim has been to have the data from Solar Analytics and Wattwatchers in the same database tables: one table for all the energy data, one table for metadata, and one table for survey responses.
3.1. Unifying Data Fields¶
Energy data¶
There were no power columns in Wattwatchers data. Additionally, power fields (real power, apparent power, reactive power, power factor) are calculated from other fields and do not add any new information. Therefore, it was decided not to have those fields in the database, that is not to upload power fields of the Solar Analytics energy timeseries to the database. However, the deidentified csv files still include those fields. Additionally, Wattwatchers data had minimum and maximum voltage and current, while Solar Analytics data had average voltage and average current. It was decided to have three fields for each measure (min, max, average), fill the ones that have available values and keep the rest empty.
Metadata and survey¶
To be completed.
3.2. Unifying Energy Units¶
The energy data units were different for the two companies. Solar Analytics energy data unit was Wh while Wattwatchers energy data was in Joules. Therefore, we convert energy values of Wattwatchers from Joules to Wh.
3.3. Unifying Time¶
The Solar Analytics data is in local time but as UTC. We have converted it to local date and time and stored both values in the database (as timestamp and datetime). According to Solar Analytics because it is in local time, daylight saving is applied. The data for the one hour that the clocks are wound back is lost at Solar Analytics’ end and currently there is not a way to get that data. Because it happens overnight though, we would expect no production and (they expect) very little constant consumption. For analysis, the data could probably be interpolated to re-create that hour if needed (or just be ignored if appropriate). Wattwatchers data comes as Unix timestamps in local time. We store timezone of each site in the metadata and survey tables.
SolA¶
| Timestamp | Date & Time |
|---|---|
| 1648950900 | 3/04/2022 1:55 |
| 1648951200 | 3/04/2022 2:00 |
| 1648951500 | 3/04/2022 2:05 |
| ... | ... |
| 1648954200 | 3/04/2022 2:50 |
| 1648954500 | 3/04/2022 2:55 |
| 1648954800 | 3/04/2022 3:00 |
| 1648955100 | 3/04/2022 3:05 |
WW¶
| Timestamp | Date & Time |
|---|---|
| 1648911300 | 3/04/2022 1:55 |
| 1648911600 | 3/04/2022 2:00 |
| 1648911900 | 3/04/2022 2:05 |
| ... | ... |
| 1648914600 | 3/04/2022 2:50 |
| 1648914900 | 3/04/2022 2:55 |
| 1648915200 | 3/04/2022 2:00 |
| 1648915500 | 3/04/2022 2:05 |
| 1648915800 | 3/04/2022 2:10 |
| ... | ... |
| 1648918500 | 3/04/2022 2:55 |
| 1648918800 | 3/04/2022 3:00 |
| 1648919100 | 3/04/2022 3:05 |
SolA¶
| Timestamp | Date & Time |
|---|---|
| 1664675700 | 2/10/2022 1:55 |
| 1664676000 | 2/10/2022 2:00 |
| 1664676300 | 2/10/2022 2:05 |
| ... | ... |
| 1664679000 | 2/10/2022 2:50 |
| 1664679300 | 2/10/2022 2:55 |
| 1664679600 | 2/10/2022 3:00 |
| 1664679900 | 2/10/2022 3:05 |
WW¶
| Timestamp | Date & Time |
|---|---|
| 1664639700 | 2/10/2022 1:55 |
| 1664640000 | 2/10/2022 3:00 |
| 1664640300 | 2/10/2022 3:05 |