A collection of datasets from various Dutch institutions to demonstrate a Spatial Data Infrastructure built on Portolan.
# Buildings (INSPIRE Harmonized) — Kadaster / Netherlands
## What This Dataset Is
All buildings in the Netherlands — 24.2 million footprint polygons — from the BAG
(Basisregistratie Adressen en Gebouwen, the Dutch national building registry), harmonized
to the EU INSPIRE Buildings schema by Kadaster and published via PDOK.
This is the authoritative Dutch building dataset. Every building that has ever been
registered in the BAG is here, including historical versions. Construction dates range
from before 1800 to present, with ~1 million entries carrying the placeholder date
9999-01-01 (planned or unknown).
The dataset includes both **current** and **historical** building records. To get only
current buildings, filter for `endLifespanVersion IS NULL`.
**No building heights or floor counts.** The INSPIRE Buildings 2D profile from PDOK only
includes footprints, construction dates, and identifiers. For heights, use
[3DBAG](https://3dbag.nl) from TU Delft, which derives building heights from the AHN
point cloud and includes fields like `h_dak_50pct` (median roof height).
**Source:** https://www.pdok.nl/introductie/-/article/gebouwen-inspire-geharmoniseerd-
**Provider:** Kadaster (Netherlands Cadastre, Land Registry and Mapping Agency)
**License:** CC0 (public domain)
## How to Access
This is a GeoParquet 1.1.0 file with bbox covering columns and Hilbert spatial sorting,
making it efficient for spatial range queries. Use DuckDB with the spatial extension.
```python
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
# Load from the STAC asset 'data' link:
URL = 'https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet'
df = con.execute(f"""
SELECT * FROM read_parquet('{URL}')
LIMIT 5
""").df()
```
The file is 2.1 GB. For large queries, DuckDB can stream it efficiently using HTTP range
requests — you don't need to download the full file. The bbox covering columns and Hilbert
sorting enable DuckDB to skip irrelevant row groups for spatial filters.
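To illustrate why spatial sorting plus bbox columns reduce the data read, here is a minimal pure-Python sketch of row-group pruning. The statistics are made-up values, and `RowGroupStats` and `groups_to_read` are hypothetical names; DuckDB's actual pruning happens inside its Parquet reader and may differ in detail.

```python
from dataclasses import dataclass

@dataclass
class RowGroupStats:
    """Min/max statistics of the bbox columns for one row group (illustrative values)."""
    xmin_min: float
    xmax_max: float
    ymin_min: float
    ymax_max: float

def groups_to_read(stats, qxmin, qymin, qxmax, qymax):
    """Return indexes of row groups that could hold features overlapping the query box."""
    keep = []
    for i, s in enumerate(stats):
        # A group is skipped when every bbox in it must lie outside the query box.
        disjoint = (s.xmax_max < qxmin or s.xmin_min > qxmax
                    or s.ymax_max < qymin or s.ymin_min > qymax)
        if not disjoint:
            keep.append(i)
    return keep

# Hypothetical statistics for three spatially sorted row groups.
stats = [
    RowGroupStats(3.3, 3.9, 51.2, 51.6),
    RowGroupStats(4.8, 5.0, 52.3, 52.4),
    RowGroupStats(6.5, 7.2, 53.0, 53.5),
]
selected = groups_to_read(stats, 4.88, 52.36, 4.92, 52.38)
print(selected)  # [1]
```

Because Hilbert sorting keeps nearby buildings in the same row groups, a small query box typically matches few groups, and only those byte ranges need to be fetched.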
## Schema — Field Meanings
| Field | Type | Meaning |
|-------|------|---------|
| `geometry` | WKB Polygon | Building footprint in **EPSG:4258** (ETRS89). Practically identical to WGS84 — no transform needed for most uses. |
| `localId` | int64 | **BAG pand ID** — the key identifier. Links to the Dutch national building registry. Multiple rows may share a `localId` (version history). |
| `gml_id` | string | Full INSPIRE ID: `nl-imbag-bu.<localId>.<version>`. Unique per row. |
| `anyPoint` | string | **Construction date** (INSPIRE `dateOfConstruction.anyPoint`). ISO 8601 format like `1975-01-01T00:00:00+01:00`. A value of `9999-01-01` means planned or unknown. |
| `beginLifespanVersion` | string | When this version of the record entered the dataset. ISO 8601 datetime. |
| `endLifespanVersion` | string | When this version was superseded. **NULL means current** — this is how you filter for current buildings. Non-null means this is a historical version. |
| `namespace` | string | Always `nl-imbag-bu`. |
| `OGC_FID` | int64 | Internal row ID from WFS extraction. Not meaningful. |
| `referenceGeometry` | bool | Always `true`. |
| `horizontalGeometryEstimatedAccuracy` | string | Not populated in this dataset. |
| `verticalGeometryEstimatedAccuracy` | string | Not populated in this dataset. |
| `beginning` | string | Not populated — use `anyPoint` for construction date. |
| `end` | string | Not populated — use `anyPoint` for construction date. |
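
As a small illustration of the `gml_id` layout described in the table, the sketch below splits an identifier into its three parts. The helper name `parse_gml_id` is hypothetical and the version suffix in the example is a made-up value; the real version strings may look different.

```python
def parse_gml_id(gml_id: str):
    """Split 'nl-imbag-bu.<localId>.<version>' into (namespace, localId, version)."""
    # maxsplit=2 keeps any further dots inside the version part intact.
    namespace, local_id, version = gml_id.split(".", 2)
    return namespace, int(local_id), version

# The ".1" version suffix is an assumed placeholder, not a value from the dataset.
parts = parse_gml_id("nl-imbag-bu.363100000000197.1")
print(parts)  # ('nl-imbag-bu', 363100000000197, '1')
```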
## Important Columns
The columns you'll actually use are:
- **`localId`** — BAG building ID (the key identifier)
- **`anyPoint`** — construction date
- **`endLifespanVersion`** — null means current, non-null means historical
- **`geometry`** — building footprint polygon
Everything else is either constant, unpopulated, or internal.
## Geometry Notes
- CRS is **EPSG:4258** (ETRS89) — this is the EU standard CRS, practically identical to WGS84 (EPSG:4326). The difference is sub-meter, so you can treat coordinates as WGS84 for most purposes.
- Geometry column is named `geometry` (WKB encoded)
- All geometries are simple Polygons (no MultiPolygon)
- Bounding box: longitude 3.16° to 7.23°, latitude 50.75° to 53.54° (all of Netherlands)
- The file has bbox covering columns and Hilbert spatial sorting — DuckDB can use these for efficient spatial filtering
## Current vs Historical Records
The dataset contains 24.2M rows total:
- **12.6M current** records (`endLifespanVersion IS NULL`)
- **11.5M superseded** records (`endLifespanVersion IS NOT NULL`)
Multiple rows with the same `localId` represent the version history of a single building.
Always filter for current records unless you specifically need the history.
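
The relationship between versions and current records can be sketched in plain Python. The rows below are made-up examples; only the column names come from the schema, and `current_versions` and `versions_per_building` are hypothetical helper names.

```python
from collections import Counter

# Made-up rows: building 1 has a superseded and a current version, building 2 only a current one.
rows = [
    {"localId": 1, "beginLifespanVersion": "2010-01-01T00:00:00", "endLifespanVersion": "2015-06-01T00:00:00"},
    {"localId": 1, "beginLifespanVersion": "2015-06-01T00:00:00", "endLifespanVersion": None},
    {"localId": 2, "beginLifespanVersion": "2020-03-01T00:00:00", "endLifespanVersion": None},
]

def current_versions(rows):
    """Keep only rows with no end date, mirroring `endLifespanVersion IS NULL`."""
    return [r for r in rows if r["endLifespanVersion"] is None]

def versions_per_building(rows):
    """Count how many versions (rows) exist per localId."""
    return Counter(r["localId"] for r in rows)

current = current_versions(rows)
print(len(rows), len(current))         # 3 2
print(versions_per_building(rows)[1])  # 2
```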
## Construction Date Distribution
| Era | Count |
|-----|-------|
| Pre-1800 | 112,724 |
| 1800–1899 | 361,319 |
| 1900–1949 | 3,235,036 |
| 1950–1999 | 11,476,444 |
| 2000–2024 | 7,640,565 |
| 2025+ | 309,834 |
| Planned/unknown (9999) | 1,050,839 |
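The counts sum to about 24.2M, so they cover all rows, including superseded versions;
a building with several versions is counted once per version.
Peak construction years: 2021 (486K), 2020 (479K), 2019 (446K). The 1950–1999 period
dominates, reflecting post-war reconstruction and suburban expansion.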
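
The era buckets in the table can be reproduced from the year prefix of `anyPoint`. This is a minimal sketch: the helper name `era_of` is hypothetical, and the bucket boundaries are inferred from the table labels rather than taken from the query that produced the counts.

```python
def era_of(any_point: str) -> str:
    """Map an ISO 8601 `anyPoint` string to one of the era labels used in the table."""
    year = int(any_point[:4])  # same idea as LEFT(anyPoint, 4) in SQL
    if year == 9999:
        return "Planned/unknown (9999)"
    if year < 1800:
        return "Pre-1800"  # assumption: year-0001 outliers would also land here
    if year < 1900:
        return "1800–1899"
    if year < 1950:
        return "1900–1949"
    if year < 2000:
        return "1950–1999"
    if year < 2025:
        return "2000–2024"
    return "2025+"

print(era_of("1975-01-01T00:00:00+01:00"))  # 1950–1999
print(era_of("9999-01-01T00:00:00+01:00"))  # Planned/unknown (9999)
```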
## Useful Query Patterns
### Get current buildings only (most common starting point)
```sql
SELECT localId, anyPoint, geometry
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
LIMIT 100
```
### Count current vs historical records
```sql
SELECT
CASE WHEN endLifespanVersion IS NULL THEN 'current' ELSE 'historical' END AS status,
COUNT(*) AS count
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
GROUP BY 1
```
### Construction year distribution
```sql
SELECT
LEFT(anyPoint, 4) AS year,
COUNT(*) AS buildings_built
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
AND LEFT(anyPoint, 4) != '9999'
GROUP BY 1
ORDER BY 1
```
### Find buildings in a specific area (DuckDB spatial)
```sql
INSTALL spatial; LOAD spatial;
SELECT localId, anyPoint,
ST_AsText(ST_GeomFromWKB(geometry)) AS wkt
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
AND bbox.xmin >= 4.88 AND bbox.xmax <= 4.92
AND bbox.ymin >= 52.36 AND bbox.ymax <= 52.38
```
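This uses the bbox covering columns for fast filtering: DuckDB skips row groups whose
bbox ranges cannot match. As written, the predicate keeps only buildings whose bbox lies
entirely inside the query rectangle, so buildings straddling its edge are excluded. For
bounding-box queries this is much faster than `ST_Intersects`.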
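
The comparison direction in the bbox predicate decides whether the query means "inside" or "overlapping". The sketch below contrasts the two on made-up boxes; `bbox_within` and `bbox_overlaps` are hypothetical helper names, with boxes given as `(xmin, ymin, xmax, ymax)`.

```python
def bbox_within(b, q):
    """True when box b lies entirely inside query box q (the form used in the SQL above)."""
    return b[0] >= q[0] and b[1] >= q[1] and b[2] <= q[2] and b[3] <= q[3]

def bbox_overlaps(b, q):
    """True when box b shares any area or edge with query box q."""
    return b[2] >= q[0] and b[0] <= q[2] and b[3] >= q[1] and b[1] <= q[3]

query = (4.88, 52.36, 4.92, 52.38)
inside = (4.890, 52.370, 4.891, 52.371)      # made-up building fully inside
straddling = (4.879, 52.370, 4.881, 52.371)  # made-up building crossing the west edge

print(bbox_within(inside, query), bbox_overlaps(inside, query))          # True True
print(bbox_within(straddling, query), bbox_overlaps(straddling, query))  # False True
```

To include buildings that cross the rectangle's edge, the overlap form translates to SQL as `bbox.xmax >= 4.88 AND bbox.xmin <= 4.92 AND bbox.ymax >= 52.36 AND bbox.ymin <= 52.38`.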
### Spatial intersection with a point (e.g., Amsterdam Centraal)
```sql
INSTALL spatial; LOAD spatial;
SELECT localId, anyPoint
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
AND bbox.xmin <= 4.9003 AND bbox.xmax >= 4.9003
AND bbox.ymin <= 52.3792 AND bbox.ymax >= 52.3792
AND ST_Intersects(
ST_GeomFromWKB(geometry),
ST_Point(4.9003, 52.3792)
)
```
### Buildings constructed before 1800 (still standing)
```sql
INSTALL spatial; LOAD spatial;
SELECT localId, anyPoint, ST_AsText(ST_Centroid(ST_GeomFromWKB(geometry))) AS center
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
AND LEFT(anyPoint, 4) < '1800'
AND LEFT(anyPoint, 4) != '0001'
ORDER BY anyPoint
LIMIT 20
```
### Version history of a specific building
```sql
SELECT localId, gml_id, anyPoint, beginLifespanVersion, endLifespanVersion
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE localId = 363100000000197
ORDER BY beginLifespanVersion
```
### Count buildings per municipality (using BAG ID prefix)
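A zero-padded BAG ID has 16 digits: a 4-digit municipality code (gemeentecode), a 2-digit
object type (`10` for buildings), and a 10-digit sequence number. Because `localId` is stored
as an int64, leading zeros are dropped, so integer-divide by 10^12 to recover the code.
```sql
SELECT
localId // 1000000000000 AS gemeente_code,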
COUNT(*) AS building_count
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
GROUP BY 1
ORDER BY building_count DESC
LIMIT 20
```
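
The digit arithmetic can be checked in Python. This sketch assumes the standard 16-digit BAG identifier layout (4-digit gemeente code, 2-digit object type, 10-digit sequence number); `gemeente_code` is a hypothetical helper name.

```python
def gemeente_code(local_id: int) -> str:
    """Return the 4-digit municipality code from an int64 BAG pand ID."""
    # Restore the leading zeros that the integer type dropped, then take the first 4 digits.
    return f"{local_id:016d}"[:4]

code = gemeente_code(363100000000197)
print(code)  # 0363

# Equivalent integer arithmetic: drop the last 12 digits.
print(363100000000197 // 10**12)  # 363
```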
### Compute building footprint area
```sql
INSTALL spatial; LOAD spatial;
SELECT localId, anyPoint,
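-- ST_Area_Spheroid expects [latitude, longitude] axis order; the file stores lon/lat, so flip first
ST_Area_Spheroid(ST_FlipCoordinates(ST_GeomFromWKB(geometry))) AS area_m2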
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet')
WHERE endLifespanVersion IS NULL
ORDER BY area_m2 DESC
LIMIT 20
```
### Load into GeoPandas
```python
import geopandas as gpd
# For the full dataset (2.1 GB download):
gdf = gpd.read_parquet(
'https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/buildings.parquet',
columns=['localId', 'anyPoint', 'endLifespanVersion', 'geometry']
)
gdf = gdf[gdf['endLifespanVersion'].isna()] # current buildings only
```
## INSPIRE Schema Context
This dataset follows the INSPIRE Buildings 2D profile (bu-core2d). The INSPIRE schema
defines many more fields (currentUse, conditionOfConstruction, heightAboveGround,
numberOfFloorsAboveGround, etc.) but the Dutch PDOK service only populates the core
identity and lifecycle fields listed above. For richer building attributes (height, roof
type, floor count, etc.), use the 3DBAG dataset from TU Delft (3dbag.nl), which uses
native Dutch BAG field names instead of INSPIRE names.
## Caveats
- **Construction dates as strings:** The `anyPoint` field is an ISO 8601 string, not a
proper date type. Use `LEFT(anyPoint, 4)` to extract the year, or cast with
`CAST(LEFT(anyPoint, 4) AS INT)` for numeric comparisons.
- **9999 dates:** About 1 million records have `anyPoint = '9999-01-01T00:00:00+01:00'`,
meaning the construction date is unknown or the building is planned. Filter these out
for historical analysis.
- **0001 dates:** A small number of records have year 0001, likely data quality issues.
- **Version history inflates row count:** The 24.2M total includes 11.5M superseded
records. If you just want current buildings, filter for `endLifespanVersion IS NULL`
to get ~12.6M records.
- **No building attributes beyond dates:** This INSPIRE harmonization only includes
construction dates and identifiers. There's no height, use type, or condition data.
For those, see 3DBAG.
- **Municipality code extraction:** The zero-padded 16-digit BAG ID starts with the 4-digit
gemeente code (e.g., localId 363100000000197 → `0363100000000197` → gemeente 0363 = Amsterdam).
Because `localId` is an int64 without leading zeros, integer-divide by 10^12 and pad the result:
`LPAD(CAST(localId // 1000000000000 AS VARCHAR), 4, '0')`.
## Visualization Styles
Two Mapbox GL v8 style files are available for interactive map visualization of the PMTiles file.
They are JSON documents with relative PMTiles source paths and can be
used with MapLibre GL JS, OpenLayers (via ol-mapbox-style), or any Mapbox GL v8-compatible renderer.
- **`styles/default.json`** — Gray building footprints with darker outlines. Neutral monochrome rendering, visually distinct from the BAG light version.
- **`styles/by-construction-date.json`** — **Construction era analysis.** Parses year from the ISO 8601 `anyPoint` field and applies a brown-to-yellow age ramp. Same analytical purpose as the BAG by-age style but using INSPIRE-harmonized date fields.
Style files are at: `https://data.source.coop/cholmes/portolan-nl/kadaster/inspire_buildings/styles/`
## Also Available As
- **GML (INSPIRE format):** `buildings.gml.gz` (2.7 GB compressed, ~45 GB uncompressed)
— the original INSPIRE-harmonized GML from PDOK's ATOM download service.
- **PMTiles (vector tiles):** `buildings.pmtiles` — for web map visualization with
MapLibre GL JS or similar. Features are dropped at lower zoom levels for performance.
- **WFS:** Live access via `https://service.pdok.nl/kadaster/bu/wfs/v1_0` (slower, paginated).