This dataset merges Google's V3 Open Buildings, Microsoft's GlobalMLFootprints, and OpenStreetMap building footprints. It contains 2,705,459,584 footprints, divided into 200 partitions. Each footprint is labeled with its source: Google, Microsoft, or OpenStreetMap. It can be accessed in cloud-native geospatial formats such as GeoParquet, FlatGeobuf, and PMTiles.
You can use Observable to get a quick overview of the dataset, or go to VIDA to see it in action.
The original Google V3 Open Buildings dataset can be downloaded from this link as gzipped CSV files. Here are some key details about the original dataset:
The dataset contains 1.8 billion building detections across an inference area of 58M km² within Africa, South Asia, South-East Asia, Latin America and the Caribbean.
Each building in the dataset has a polygon defining its footprint on the ground, a confidence score indicating how certain we are that this is a building, and a Plus Code corresponding to the centre of the building. There is no information about the type of building, its street address, or any details other than its geometry.
For more comprehensive information, please visit the description page. You can also check out the FAQ section for additional information.
The latest version of Microsoft's building footprints can be downloaded from Microsoft Planetary Computer as gzipped partitioned files.
The Microsoft Global Open Buildings dataset was generated by Bing Maps, which detected a total of 1.24 billion buildings in Bing Maps imagery collected between 2014 and 2023, including imagery from Maxar, Airbus, and IGN France.
For more detailed information, please visit the GitHub page.
The OpenStreetMap building footprints are sourced from OpenStreetMap, which is updated regularly and contains building footprints from all over the world.
OpenStreetMap is a collaborative project that creates a free, editable map of the world. Its data is collected by volunteers and can be used for many purposes, including extracting building footprints. At the time of writing, OpenStreetMap contains more than 615 million building footprints.
For more detailed information, please visit the OpenStreetMap Wiki.
The data is available in the following formats:
This extensive dataset is organized into 182 root partitions. Each partition typically corresponds to a country's administrative boundary, as defined by the Comprehensive Global Administrative Zones (CGAZ) at the ADM0 level, which can be accessed here. There is also a sub-partition available, based on the S2 grid.
Both the FlatGeobuf and GeoParquet files are partitioned by country, following the ADM0 level of the CGAZ geoboundary definition, so building footprints are separated by country within each format. Each partition is named after the country's ISO code.
/geoparquet/by_country/country_iso={ISO}/{ISO}.parquet
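As a minimal sketch, the helper below builds the by-country GeoParquet path for a given code. The function name is hypothetical, the example codes assume three-letter ISO codes such as KEN, and `root` is a placeholder for wherever you access the files, since the full storage URL is not given here.

```python
def by_country_path(iso: str, root: str = "/geoparquet/by_country") -> str:
    """Return the path of one country's GeoParquet partition (hypothetical helper).

    Mirrors the pattern /geoparquet/by_country/country_iso={ISO}/{ISO}.parquet.
    `root` is a placeholder: prefix it with your own bucket or mirror URL.
    """
    if not iso:
        raise ValueError("ISO code must be non-empty")
    return f"{root}/country_iso={iso}/{iso}.parquet"


# Assumed three-letter ISO codes, used here only as examples.
for code in ["KEN", "IDN", "BRA"]:
    print(by_country_path(code))
```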
Note: There is a partition labeled country_iso=None, which represents a MULTIPOLYGON made up of geoboundaries (POLYGONs) that CGAZ has not explicitly named. These geoboundaries are still captured by CGAZ at the ADM0 level, but because they lack specific names they are labeled null. As a result, building footprints located within these geoboundaries are included in the country_iso=None partition. For instance, the area between Sudan and South Sudan includes a piece of land known as "Abyei", whose status remains unresolved due to recurring conflicts, and which therefore lacks an assigned name.
To enhance performance, particularly with GeoParquet files, we've introduced an S2 sub-partitioning strategy. Each ISO partition is further divided using an S2 grid ID, ensuring a cap of 20 million building footprints per grid ID. This S2 grid partitioning is exclusive to GeoParquet files.
/geoparquet/by_country_s2/country_iso={ISO}/{S2_GRID_ID}.parquet
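S2 cells form a hierarchy in which each cell splits into four children, so one way to enforce a per-partition cap is to keep subdividing a region until every cell is under the limit. The sketch below illustrates that idea with a simplified planar quadtree over longitude/latitude boxes rather than real S2 cells. The function name, the digit-string cell keys, and the recursive strategy are assumptions for illustration, not a description of how the dataset's partitions were actually produced.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]              # (lon, lat)
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def split_until_capped(points: List[Point], box: Box, cap: int,
                       key: str = "", max_depth: int = 30) -> Dict[str, List[Point]]:
    """Recursively quarter `box` until every cell holds at most `cap` points.

    Returns a mapping from a cell key (a string of quadrant digits 0-3, a
    simplified stand-in for an S2 cell ID) to the points in that cell.
    Empty cells are omitted; `max_depth` stops runaway recursion when many
    points share the same location.
    """
    if len(points) <= cap or len(key) >= max_depth:
        return {key or "root": points} if points else {}
    min_x, min_y, max_x, max_y = box
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    children = [
        (min_x, min_y, mid_x, mid_y),  # 0: south-west
        (mid_x, min_y, max_x, mid_y),  # 1: south-east
        (min_x, mid_y, mid_x, max_y),  # 2: north-west
        (mid_x, mid_y, max_x, max_y),  # 3: north-east
    ]
    buckets: List[List[Point]] = [[], [], [], []]
    for x, y in points:
        # Points on a midline go to the east/north child, so each lands in exactly one child.
        i = (1 if x >= mid_x else 0) + (2 if y >= mid_y else 0)
        buckets[i].append((x, y))
    result: Dict[str, List[Point]] = {}
    for i, (child, pts) in enumerate(zip(children, buckets)):
        result.update(split_until_capped(pts, child, cap, key + str(i), max_depth))
    return result


# Toy example with cap=2; the documented cap per grid ID is 20 million footprints.
pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.8, 0.1), (0.15, 0.3)]
cells = split_until_capped(pts, (0.0, 0.0, 1.0, 1.0), cap=2)
for key, members in sorted(cells.items()):
    print(key, len(members))  # prints: 00 2, 02 1, 1 1, 3 1 (one per line)
```

Only dense cells are split further, so sparse regions stay as large partitions while dense ones are broken up until they fit under the cap.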
Each row in the dataset represents a single building footprint, with its associated attributes stored in the following columns:
We invite you to read our blog post for more detailed information on our dataset merging approach, which includes insights into the optimization techniques we investigated and the query performance on BigQuery. In this section, we provide a high-level summary of the merging process, highlighting its crucial aspects.
We imported both datasets into BigQuery for further processing. From the Google dataset, we excluded columns such as full_plus_code, latitude, and longitude. For the Microsoft dataset, we did not drop any columns.
We then matched each building footprint with a boundary ID, determined by the intersection of its centroid with the country geoboundaries in the CGAZ ADM0 dataset. Footprints whose centroids didn't overlap with any country geoboundary were mapped to the nearest geoboundary based on their centroid's position.
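The centroid-based assignment can be sketched in plain Python as follows. This is a simplified planar version under stated assumptions: each boundary is a single ring of (lon, lat) vertices (real CGAZ ADM0 boundaries are multipolygons), distances are Euclidean in degrees rather than geodesic, and all helper names are hypothetical. It is not the BigQuery SQL used to build the dataset.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)
Ring = List[Point]           # polygon vertices; the closing vertex is optional


def polygon_centroid(ring: Ring) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)


def contains(ring: Ring, p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_ring(ring: Ring, p: Point) -> float:
    """Smallest planar distance from p to any edge of the ring."""
    px, py = p
    best = math.inf
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        # Project p onto the segment and clamp to its endpoints.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / seg_len2))
        best = min(best, math.hypot(px - (x0 + t * dx), py - (y0 + t * dy)))
    return best


def assign_boundary(footprint: Ring, boundaries: Dict[str, Ring]) -> str:
    """Return the ID of the boundary containing the footprint's centroid,
    falling back to the boundary nearest to the centroid."""
    c = polygon_centroid(footprint)
    for boundary_id, ring in boundaries.items():
        if contains(ring, c):
            return boundary_id
    return min(boundaries, key=lambda b: dist_to_ring(boundaries[b], c))


boundaries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_boundary([(1, 1), (2, 1), (2, 2), (1, 2)], boundaries))      # A (centroid inside A)
print(assign_boundary([(21, 5), (22, 5), (22, 6), (21, 6)], boundaries))  # B (nearest boundary)
```

A centroid lying exactly on a shared border is assigned to the first boundary that reports containing it; a geometry library with geodesic distances would handle real multipolygon boundaries more robustly.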
If you'd like more information about the dataset or the processing steps, feel free to write an email to darell@vida.place.
Current version: 2.0
The data is shared under the Creative Commons Attribution 4.0 (CC BY 4.0) license and the Open Data Commons Open Database License (ODbL) v1.0. As the user, you can pick whichever of the two licenses you prefer and use the data under its terms.