This dataset is a copy of the Energy Performance Certificates for England and Wales, made available in a cloud-native geospatial geoparquet format. The original dataset is distributed as a zipped collection of CSVs and is available for download from here.
The data has been made available as admin-partitioned geoparquet. You can query the entire dataset using DuckDB like so.
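As a rough illustration of querying the whole dataset, here is a minimal Python sketch using the `duckdb` package. The bucket location `s3://example-bucket/epc` is a placeholder and not the real path. The glob assumes one directory per partition, and the helper names are hypothetical.

```python
def build_count_query(base_url):
    """Return SQL that counts certificates across every partition under base_url."""
    root = base_url.rstrip("/")
    # Assumed layout: <root>/<partition directory>/<file>.parquet
    return (
        "SELECT count(*) AS certificates "
        f"FROM read_parquet('{root}/*/*.parquet', hive_partitioning = true)"
    )


def run_query(sql):
    """Run sql in an in-memory DuckDB database and return the rows."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")  # needed for s3:// and https:// paths
    con.execute("LOAD httpfs")
    return con.execute(sql).fetchall()


# Example call (placeholder location, replace with the real bucket path):
# rows = run_query(build_count_query("s3://example-bucket/epc"))
```

Depending on how the bucket is configured, you may also need to set an S3 region in DuckDB before the query will run.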
Alternatively, you can use the aws cli to access the data directly from the S3 bucket:
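For example, a small Python wrapper around `aws s3 sync` might look like the sketch below. The S3 URI is a placeholder, and passing `--no-sign-request` assumes the bucket allows anonymous reads. The function names are hypothetical.

```python
import subprocess


def build_sync_command(s3_uri, dest_dir, anonymous=True):
    """Build the aws cli argument list that mirrors s3_uri into dest_dir."""
    cmd = ["aws", "s3", "sync", s3_uri, dest_dir]
    if anonymous:
        # Skip credential signing; only works if the bucket is publicly readable.
        cmd.append("--no-sign-request")
    return cmd


def download(s3_uri, dest_dir):
    """Run the sync; raises CalledProcessError if the aws cli exits with an error."""
    subprocess.run(build_sync_command(s3_uri, dest_dir), check=True)


# Example call (placeholder URI, replace with the real bucket path):
# download("s3://example-bucket/epc/", "./epc")
```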
Or if you want the data for just one local authority then choose one from the browse section of this page.
The geoparquet format is supported by GDAL 3.5 onwards and is readable in QGIS and many other platforms. See the geoparquet website for more information.
These steps should help you emulate the process of extracting the EPC data. The source data exists here. You will need to sign up. Once you have done this, you should find the data available to download as a ZIP. This ZIP contains a directory for each local authority in England and Wales. I've created this script to extract just the certificates.csv from each directory.
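The extraction step could be sketched in Python roughly as follows. This is not the original script. It assumes each `certificates.csv` sits inside a directory named after its local authority, and it names each output file after that directory, which is a choice made here for illustration.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_certificates(zip_path, out_dir):
    """Copy every <directory>/certificates.csv out of the ZIP as <directory>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)
            # Skip other files (and any certificates.csv at the ZIP root).
            if member.name != "certificates.csv" or len(member.parts) < 2:
                continue
            target = out / f"{member.parent.name}.csv"
            with zf.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return sorted(written)
```

Streaming each member with `shutil.copyfileobj` avoids unpacking the whole archive to disk or holding a large CSV in memory.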
The next steps involve using the DuckDB client to load the data, geocoding each row by joining to the OS Open UPRN dataset, and then exporting the entire dataset to admin-partitioned parquet files. Then we'll need to convert these to geoparquet using the gpq tool. There are 347 CSVs totalling 23.4 GB in size. You can convert the data to parquet using various tools. Once you've done that, the total combined file size of the parquet files is only 3.7 GB. The steps below create the views needed for this.
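A minimal sketch of the load-and-geocode step, written as DuckDB SQL driven from Python, is shown here. The view names (`epc`, `uprn`, `epc_geocoded`) and file paths are hypothetical. The column names `UPRN`, `LATITUDE` and `LONGITUDE` are assumptions about the EPC and OS Open UPRN CSV headers. Storing the point as WKT text is one option chosen for illustration, not necessarily what the original process did.

```python
def build_view_statements(epc_glob, uprn_csv):
    """Return SQL that loads the CSVs and joins each certificate to a UPRN point."""
    return [
        # One view over all extracted certificate CSVs.
        "CREATE OR REPLACE VIEW epc AS "
        f"SELECT * FROM read_csv_auto('{epc_glob}', union_by_name = true)",
        # Only the columns needed for geocoding (assumed OS Open UPRN headers).
        "CREATE OR REPLACE VIEW uprn AS "
        f"SELECT UPRN, LATITUDE, LONGITUDE FROM read_csv_auto('{uprn_csv}')",
        # Inner join: certificates without a matching UPRN are dropped here.
        "CREATE OR REPLACE VIEW epc_geocoded AS "
        "SELECT epc.*, "
        "concat('POINT (', uprn.LONGITUDE, ' ', uprn.LATITUDE, ')') AS geometry "
        "FROM epc JOIN uprn ON TRY_CAST(epc.UPRN AS BIGINT) = uprn.UPRN",
    ]


def run_statements(statements, database=":memory:"):
    """Execute the statements in order and return the open connection."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect(database)
    for sql in statements:
        con.execute(sql)
    return con


# Example call (hypothetical paths):
# con = run_statements(build_view_statements("csv/*.csv", "osopenuprn.csv"))
```

`TRY_CAST` is used so that blank or malformed UPRN values become NULL and simply fail to match, rather than aborting the join.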
Once all the views have been created, we can export to partitioned parquet files. The partition used here is the local authority name.
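The export can be sketched with DuckDB's `COPY ... (FORMAT PARQUET, PARTITION_BY ...)`, here wrapped in Python. The view name `epc_geocoded`, the output directory, and the partition column `LOCAL_AUTHORITY_LABEL` are assumptions for illustration. Use whichever column holds the local authority name in your data.

```python
def build_export_statement(view, out_dir, partition_column="LOCAL_AUTHORITY_LABEL"):
    """Return SQL that writes view to out_dir with one directory per partition value."""
    return (
        f"COPY (SELECT * FROM {view}) TO '{out_dir}' "
        f"(FORMAT PARQUET, PARTITION_BY ({partition_column}))"
    )


def export(con, view, out_dir):
    """Run the export on an existing DuckDB connection that already has the view."""
    con.execute(build_export_statement(view, out_dir))


# Example call (hypothetical names), given an open duckdb connection `con`:
# export(con, "epc_geocoded", "epc_parquet")
```

DuckDB writes the result as hive-style directories, one per distinct value of the partition column.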
It is currently not possible to export to geoparquet using DuckDB, so I created this script to convert all the output parquet files to geoparquet and remove the original parquet files.
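A conversion pass along these lines could be sketched in Python as below. It is not the original script. It assumes the `gpq` binary is on your PATH and that `gpq convert <input> <output>` accepts a plain parquet file with a WKT or WKB geometry column. The directory names are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_conversions(src_root, dst_root):
    """Pair every parquet file under src_root with a mirrored path under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    return [
        (src, dst_root / src.relative_to(src_root))
        for src in sorted(src_root.rglob("*.parquet"))
    ]


def convert_all(src_root, dst_root):
    """Convert each file with gpq, deleting the source only after gpq succeeds."""
    for src, dst in plan_conversions(src_root, dst_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises if gpq fails, so the source file is kept in that case.
        subprocess.run(["gpq", "convert", str(src), str(dst)], check=True)
        src.unlink()


# Example call (hypothetical directories):
# convert_all("epc_parquet", "epc_geoparquet")
```

Writing into a separate output tree keeps the partition directory layout intact and means a re-run never picks up files that were already converted.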
Data is licensed under the Open Government Licence v3.0. More details can be found here.
This is an experimental dataset and I hope to add more attributes in the future. If you have any questions about the process, you can contact me at matt@addresscloud.com