Google-Microsoft-OSM Open Buildings - combined by VIDA

This dataset merges Google's V3 Open Buildings, Microsoft's GlobalMLFootprints, and OpenStreetMap building footprints. It contains 2,705,459,584 footprints and is divided into 200 partitions. Each footprint is labelled with its respective source, either Google, Microsoft, or OpenStreetMap. It can be accessed in cloud-native geospatial formats such as GeoParquet, FlatGeobuf and PMTiles.

Product Details

Visibility: Public
Owner: VIDA
Created: 26 Aug 2024
Last Updated: 21 Aug 2025

Overview

See it in action

You can use Observable to get a quick overview of the dataset or go to VIDA to see it in action.

Original datasets

Google

The original Google V3 Open Buildings dataset is downloadable from this link as gzipped CSV files. Here are some key details about the original dataset:

The dataset contains 1.8 billion building detections, across an inference area of 58M km² within Africa, South Asia, South-East Asia, Latin America and the Caribbean.

Each building in the dataset has a polygon defining its footprint on the ground, a confidence score indicating how certain we are that this is a building, and a Plus Code corresponding to the centre of the building. There is no information about the type of building, its street address, or any details other than its geometry.

For more comprehensive information, please visit the description page. You can also check out the FAQ section for additional information.

Microsoft

The latest version of Microsoft's building footprints can be downloaded from Microsoft Planetary Computer as gzipped partitioned files.

The Microsoft Global Open Buildings dataset contains a total of 1.24 billion building detections. The buildings were identified from Bing Maps imagery collected between 2014 and 2023, including images from Maxar, Airbus, and IGN France.

For more detailed information, please visit the GitHub page.

OpenStreetMap

The OpenStreetMap building footprints are sourced from OpenStreetMap. The dataset is updated regularly and contains building footprints from all over the world.

OpenStreetMap is a collaborative project that creates a free, editable map of the world. The data is collected by volunteers and covers many kinds of map features, including building footprints. At the time of writing, the OpenStreetMap dataset contains more than 615 million building footprints.

For more detailed information, please visit the OpenStreetMap Wiki.

Data Formats

The data is available in the following formats:

GeoParquet 1.1.0
- By country - single file
- By country - S2 partitioned
FlatGeobuf
- By country - single file
- By country - S2 partitioned
PMTiles
- Global - single layer
- Global - layer per country based on the 3-letter ISO code
- By country

Partitioning

This extensive dataset is organized into 182 root partitions. Each partition typically corresponds to a country's administrative boundary, as defined by the Comprehensive Global Administrative Zones (CGAZ) at the ADM0 level, which can be accessed here. There is also a sub-partition available, based on the S2 grid.

By country

Both FlatGeobuf and GeoParquet files are split by country boundary, following the ADM0 level of the CGAZ geoboundary definition, so each file contains the building footprints of a single country. Files and partitions are named after the country's ISO code.

/geoparquet/by_country/country_iso={ISO}/{ISO}.parquet

Note: There is a partition labeled country_iso=None, which represents a MULTIPOLYGON containing geoboundaries (POLYGONS) that have not been explicitly defined or named by CGAZ. These geoboundaries are still captured by CGAZ at the ADM0 level, but they lack specific names and are therefore labelled null. As a result, building footprints located within these geoboundaries are included in the country_iso=None partition. For instance, the area between Sudan and South Sudan includes a piece of land known as "Abyei" whose status remains disputed due to recurring conflicts, and it therefore lacks an assigned name.
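As a small illustration of the naming convention above, the following sketch builds the object key for a country file. The `base` prefix and the `KEN` code are placeholders chosen for this example and are not taken from the dataset documentation; only the path pattern itself comes from the README.

```python
def country_parquet_path(iso: str, base: str = "") -> str:
    """Build the by-country GeoParquet path from the documented pattern.

    `base` is a hypothetical prefix (bucket or URL root) and is an assumption;
    the README only documents the part starting at /geoparquet/.
    """
    if not iso:
        raise ValueError("iso must be a non-empty ISO code")
    return f"{base}/geoparquet/by_country/country_iso={iso}/{iso}.parquet"


# Illustrative ISO code only; check the bucket listing for available partitions.
print(country_parquet_path("KEN"))
```

The resulting string can be passed to any GeoParquet-capable reader once the real prefix for the bucket is known.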

By country + S2 grid

To enhance performance, particularly with GeoParquet files, we've introduced an S2 sub-partitioning strategy. Each ISO partition is further divided using an S2 grid ID, ensuring a cap of 20 million building footprints per grid ID. This S2 grid partitioning is exclusive to GeoParquet files.

/geoparquet/by_country_s2/country_iso={ISO}/{S2_GRID_ID}.parquet
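The README does not describe how the S2 cells were chosen, so the following is only a sketch of one way a per-cell cap could be enforced. It uses a generic hierarchical key (a tuple such as face, child, child, ...) as a stand-in for real S2 cell IDs, and the function name `cap_partitions` is hypothetical. A group is split one level deeper whenever it holds more items than the cap.

```python
from collections import defaultdict


def cap_partitions(keys, cap, level=1):
    """Group hierarchical keys by prefix so that no group exceeds `cap`.

    `keys` are equal-length tuples standing in for S2 cell paths (an
    assumption; this is not the S2 library). Returns {prefix: item_count}.
    A group at full key depth cannot be split further and may exceed the cap.
    """
    buckets = defaultdict(list)
    for key in keys:
        buckets[key[:level]].append(key)

    result = {}
    for prefix, members in buckets.items():
        if len(members) > cap and level < len(members[0]):
            # Too many items in this cell: descend to its child cells.
            result.update(cap_partitions(members, cap, level + 1))
        else:
            result[prefix] = len(members)
    return result


cells = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(cap_partitions(cells, cap=2))
```

With the documented cap, `cap` would be 20 million; dense regions end up in smaller cells and sparse regions stay in large ones.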

Schema

Each row in the dataset describes a single building footprint, with the following columns:

boundary_id (INTEGER): A unique ID linking the CGAZ level 0 boundary ISO to an integer, created for partitioning the datasets within BigQuery.
confidence (FLOAT): A metric denoting the model's confidence about the accuracy of the building footprint. Microsoft-sourced footprints set this column to null since the original dataset doesn't feature this attribute.
bf_source (STRING): Indicates the footprint's origin - Google, Microsoft, or OpenStreetMap.
area_in_meters (FLOAT): Represents the polygon's area in square meters.
s2_id (INT): Exclusive to the S2 partitioning scheme, it represents the S2 grid ID.
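To make the null handling of the confidence column concrete, here is a minimal sketch that models the columns listed above as a Python dataclass and filters rows by confidence. The class name, the helper, and the `bf_source` strings in the sample rows are assumptions for illustration; the exact values stored in the files are not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Footprint:
    boundary_id: int
    bf_source: str
    area_in_meters: float
    confidence: Optional[float] = None  # null for sources without a score
    s2_id: Optional[int] = None         # only present in the S2 scheme


def filter_footprints(rows: Iterable[Footprint],
                      min_confidence: float,
                      min_area: float = 0.0) -> List[Footprint]:
    """Keep rows above the area threshold whose confidence is either
    missing (null) or at least `min_confidence`."""
    kept = []
    for row in rows:
        if row.area_in_meters < min_area:
            continue
        if row.confidence is not None and row.confidence < min_confidence:
            continue
        kept.append(row)
    return kept


rows = [
    Footprint(1, "google", 42.0, confidence=0.9),
    Footprint(1, "google", 30.0, confidence=0.6),
    Footprint(1, "microsoft", 55.0),  # confidence is null
]
print(len(filter_footprints(rows, min_confidence=0.75)))
```

Treating a null confidence as "keep" is a design choice in this sketch; a stricter filter could drop those rows instead.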

Data Processing

We invite you to read our blog post for more detailed information on our dataset merging approach, which includes insights into the optimization techniques we investigated and the query performance on BigQuery. In this section, we provide a high-level summary of the merging process, highlighting its crucial aspects.
We imported both datasets into BigQuery for further processing. From the Google dataset, we excluded columns like full_plus_code, latitude, and longitude. For the Microsoft dataset we did not drop any columns. We then matched each building footprint with a boundary ID, determined by the intersection of its centroid with the country geoboundaries in the CGAZ ADM0 dataset. Footprints whose centroids didn't overlap with any country geoboundary were mapped to the nearest geoboundary based on their centroid's position.
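The boundary matching described above can be sketched in plain Python. This is not the BigQuery implementation: it assumes planar coordinates, simple polygons without holes, and hypothetical helper names, and it is meant only to show the two steps (centroid containment first, nearest boundary as a fallback).

```python
import math


def centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...]."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a == 0:  # degenerate ring: fall back to the vertex mean
        return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
    return (cx / (3 * a), cy / (3 * a))


def contains(ring, pt):
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def distance_to_ring(ring, pt):
    """Shortest planar distance from a point to the ring's edges."""
    px, py = pt
    best = math.inf
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        len2 = dx * dx + dy * dy
        t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / len2))
        best = min(best, math.hypot(px - (x0 + t * dx), py - (y0 + t * dy)))
    return best


def assign_boundary(footprint, boundaries):
    """Return the ID of the boundary containing the footprint's centroid,
    or the nearest boundary if none contains it."""
    c = centroid(footprint)
    for boundary_id, ring in boundaries.items():
        if contains(ring, c):
            return boundary_id
    return min(boundaries, key=lambda b: distance_to_ring(boundaries[b], c))


boundaries = {
    1: [(0, 0), (10, 0), (10, 10), (0, 10)],
    2: [(20, 0), (30, 0), (30, 10), (20, 10)],
}
print(assign_boundary([(1, 1), (2, 1), (2, 2), (1, 2)], boundaries))
print(assign_boundary([(17, 4), (18, 4), (18, 5), (17, 5)], boundaries))
```

The second footprint lies between the two squares, so it is assigned through the nearest-boundary fallback rather than by containment.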

Contact details

If you'd like more information about the dataset or the processing steps, feel free to write an email to support@vida.place.

Changelog

Current version: 2.0

Version 2.0 - 2024-09-04

Initial release!
- 2,705,459,584 building footprints, produced by merging Google's V3 Open Buildings, Microsoft's GlobalMLFootprints, and OpenStreetMap building footprints.
- Using GeoParquet schema version 1.1.0.
- Data is partitioned by country and S2 grid.
- Data is available in GeoParquet, FlatGeobuf, and PMTiles formats.
- Google Open Buildings last updated 2023-10-02
- Microsoft GlobalMLFootprints last updated 2024-05-28
- OpenStreetMap last updated 2024-05-28

Dataset Licenses

The data is shared under the Open Data Commons Open Database License (ODbL) v1.0.
