A collection of datasets from various Dutch institutions to demonstrate a Spatial Data Infrastructure built on Portolan.
# CBS Wijken en Buurten 2025 (Neighborhoods and Districts with Demographics)
## What This Dataset Is
The definitive Dutch demographic geodataset. Published annually by CBS (Statistics Netherlands),
it provides population, age, household, marital status, migration background, and urbanity
statistics at three nested geographic levels covering the entire Netherlands:
- **Buurten** (neighborhoods): 14,823 areas -- the finest granularity
- **Wijken** (districts): 3,423 areas -- each aggregates multiple buurten
- **Gemeenten** (municipalities): 424 areas -- top-level administrative units
Statistics reflect the situation as of **1 January 2025**. All three layers share the same
~42 demographic columns. The dataset is the go-to source for any neighborhood-level
demographic analysis in the Netherlands.
**Source:** https://service.pdok.nl/cbs/wijkenbuurten/2025/atom/index.xml
**Provider:** CBS (Centraal Bureau voor de Statistiek / Statistics Netherlands)
**License:** CC-BY-4.0
**CRS:** EPSG:28992 (Amersfoort / RD New)
## CRITICAL: Sentinel Value -99997
**The value -99997 means "data not available or privacy-suppressed."** You MUST filter it out
before computing any averages, sums, medians, or other aggregations. If you don't, your
results will be wildly wrong.
Reasons for -99997:
- **Privacy protection:** Areas with fewer than ~50 residents have statistics suppressed to
prevent identification of individuals.
- **Water areas:** Water-only areas (where `water = 'JA'`) have no meaningful demographics.
- **Buitenland entry:** A boundary marker entry labeled 'Buitenland' (Foreign) in each layer
with all statistics set to -99997.
**Always add `WHERE column_name != -99997` (or `> -99997`) for every statistical column your query aggregates, sorts, or compares.**
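To make the effect concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as an offline stand-in for DuckDB. The table `buurten_demo` and its three rows are made up for illustration; only the column name and the sentinel value come from the dataset. `NULLIF(col, -99997)` is shown as an alternative to the `WHERE` filter: it keeps the row but turns the sentinel into NULL, which aggregates skip. DuckDB supports the same function.

```python
import sqlite3

SENTINEL = -99997

# Hypothetical mini-table: two real neighborhoods and one suppressed row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE buurten_demo (buurtnaam TEXT, aantal_inwoners INTEGER)")
con.executemany(
    "INSERT INTO buurten_demo VALUES (?, ?)",
    [("A", 1200), ("B", 800), ("Suppressed", SENTINEL)],
)

# Naive average: the sentinel is treated as a real population count.
naive = con.execute("SELECT AVG(aantal_inwoners) FROM buurten_demo").fetchone()[0]

# Option 1: exclude sentinel rows with WHERE.
filtered = con.execute(
    "SELECT AVG(aantal_inwoners) FROM buurten_demo WHERE aantal_inwoners != ?",
    (SENTINEL,),
).fetchone()[0]

# Option 2: turn the sentinel into NULL, which AVG ignores.
nulled = con.execute(
    "SELECT AVG(NULLIF(aantal_inwoners, ?)) FROM buurten_demo", (SENTINEL,)
).fetchone()[0]

print(naive, filtered, nulled)
```

With these made-up rows the naive average is negative, while both filtered variants return 1000.0.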
## How to Access
Three separate GeoParquet files, one per geographic level. Use DuckDB with the spatial extension.
```python
import duckdb

# In-memory database; the spatial extension provides the GEOMETRY type and ST_* functions.
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

BASE = 'https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/'
BUURTEN = BASE + 'buurten.parquet' # 60 MB, 14,823 features
WIJKEN = BASE + 'wijken.parquet' # 42 MB, 3,423 features
GEMEENTEN = BASE + 'gemeenten.parquet' # 29 MB, 424 features

# Preview the first five neighborhoods as a pandas DataFrame (requires pandas).
df = con.execute(f"""
SELECT * FROM read_parquet('{BUURTEN}')
LIMIT 5
""").df()
```
The files are moderately sized (29-60 MB). DuckDB reads remote Parquet files through HTTP range
requests, so a query that selects only a few columns fetches just those column chunks instead of
downloading the whole file.
## Geographic Hierarchy and Code System
The three layers form a **strict spatial hierarchy**. Each buurt belongs to exactly one wijk,
and each wijk belongs to exactly one gemeente. The CBS area codes encode this relationship:
```
BU00140000 --> buurt 00 within wijk 001400
WK001400 --> wijk 00 within gemeente 0014
GM0014 --> gemeente Groningen
```
**Extracting parent codes from a buurtcode:**
- Gemeente code: characters 3-6 of buurtcode, counting from 1 (e.g., BU**0014**0000 -> GM0014)
- Wijk code: characters 3-8 of buurtcode, counting from 1 (e.g., BU**001400**00 -> WK001400)
In SQL:
```sql
-- Extract gemeente code from buurtcode
'GM' || SUBSTRING(buurtcode, 3, 4) -- GM0014
-- Extract wijk code from buurtcode
'WK' || SUBSTRING(buurtcode, 3, 6) -- WK001400
```
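The same extraction can be sketched in Python, for example when working with a GeoPandas frame rather than SQL. The helper name `parent_codes` is hypothetical; the slicing mirrors the `SUBSTRING` calls above (Python slices are 0-based, so SQL position 3 becomes index 2).

```python
def parent_codes(buurtcode: str) -> tuple[str, str]:
    """Return (gemeentecode, wijkcode) for a code like 'BU00140000'."""
    if (
        len(buurtcode) != 10
        or not buurtcode.startswith("BU")
        or not buurtcode[2:].isdigit()
    ):
        raise ValueError(f"not a buurtcode: {buurtcode!r}")
    digits = buurtcode[2:]        # '00140000'
    gemeente = "GM" + digits[:4]  # first 4 digits identify the gemeente
    wijk = "WK" + digits[:6]      # first 6 digits identify the wijk
    return gemeente, wijk


print(parent_codes("BU00140000"))
```

For `BU00140000` this returns `('GM0014', 'WK001400')`, matching the SQL comments above.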
## Schema -- All Columns
All three layers share the same statistical columns. Only the area code and name columns differ
by layer (buurtcode/buurtnaam vs wijkcode/wijknaam vs gemeentecode/gemeentenaam), and the lower
layers additionally carry the codes of their parent areas.
### Identity & Classification Columns
| Column | Type | Meaning |
|--------|------|---------|
| `buurtcode` / `wijkcode` / `gemeentecode` | string | CBS area code with prefix BU/WK/GM |
| `buurtnaam` / `wijknaam` / `gemeentenaam` | string | Human-readable area name |
| `wijkcode` | string | Parent district code (buurten layer only) |
| `gemeentecode` | string | Parent municipality code (buurten and wijken layers) |
| `gemeentenaam` | string | Parent municipality name (buurten and wijken layers) |
| `indelingswijziging_wijken_en_buurten` | int | Boundary change flag: 1 = changed vs previous year, 0 = unchanged |
| `water` | string | Water area indicator: 'JA' = water body, 'NEE' = land |
| `meest_voorkomende_postcode` | string | Most common 4-digit postal code (PC4) in the area |
| `dekkingspercentage` | int | % of addresses covered by the most common postal code |
| `jrstatcode` | string | Composite key: year + area code (e.g., '2025BU00140000') |
| `jaar` | int | Reference year (2025) |
### Population & Density
| Column | Type | Meaning |
|--------|------|---------|
| `aantal_inwoners` | int | **Total population** (number of residents) |
| `mannen` | int | Male population count |
| `vrouwen` | int | Female population count |
| `bevolkingsdichtheid_inwoners_per_km2` | int | Population density (residents per km2 of land) |
| `omgevingsadressendichtheid` | int | Avg addresses within 1 km radius of each address |
| `stedelijkheid_adressen_per_km2` | int | CBS urbanity class derived from surrounding address density (addresses per km2): 1 = very highly urban (>=2500), 2 = highly urban (1500-2500), 3 = moderately urban (1000-1500), 4 = slightly urban (500-1000), 5 = rural (<500) |
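The class thresholds in the table can be written out as a small function. This is only an illustration of how the classes relate to address density, since the dataset already ships the class in `stedelijkheid_adressen_per_km2`. The function name `urbanity_class` is hypothetical, and treating each lower bound as inclusive is an assumption based on the `>=2500` notation in the table.

```python
SENTINEL = -99997


def urbanity_class(address_density: float):
    """Map surrounding address density (addresses per km2) to class 1-5.

    Returns None for the -99997 sentinel.
    Assumption: lower bounds are inclusive (1500 falls in class 2).
    """
    if address_density == SENTINEL:
        return None
    if address_density >= 2500:
        return 1  # most urban class
    if address_density >= 1500:
        return 2
    if address_density >= 1000:
        return 3
    if address_density >= 500:
        return 4
    return 5  # rural


print([urbanity_class(d) for d in (3200, 1500, 999, 120, SENTINEL)])
```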
### Age Distribution (percentages, sum to ~100)
| Column | Meaning |
|--------|---------|
| `percentage_personen_0_tot_15_jaar` | % aged 0-14 (children) |
| `percentage_personen_15_tot_25_jaar` | % aged 15-24 (young adults/students) |
| `percentage_personen_25_tot_45_jaar` | % aged 25-44 (young working age) |
| `percentage_personen_45_tot_65_jaar` | % aged 45-64 (older working age) |
| `percentage_personen_65_jaar_en_ouder` | % aged 65+ (elderly/retired) |
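Because the five shares are rounded integers, a quick consistency check can flag rows where they do not add up to roughly 100. The sketch below is one way to do that on a row represented as a dict. The helper name `age_shares_ok`, the default tolerance of 2, and the example values are assumptions for illustration, not part of the dataset.

```python
SENTINEL = -99997

AGE_COLUMNS = [
    "percentage_personen_0_tot_15_jaar",
    "percentage_personen_15_tot_25_jaar",
    "percentage_personen_25_tot_45_jaar",
    "percentage_personen_45_tot_65_jaar",
    "percentage_personen_65_jaar_en_ouder",
]


def age_shares_ok(row: dict, tolerance: int = 2):
    """True/False if the shares sum to 100 within tolerance; None if suppressed."""
    values = [row[c] for c in AGE_COLUMNS]
    if SENTINEL in values:
        return None  # suppressed rows cannot be checked
    return abs(sum(values) - 100) <= tolerance


# Made-up row whose rounded shares sum to 99.
example = dict(zip(AGE_COLUMNS, [16, 12, 27, 26, 18]))
print(age_shares_ok(example))
```

Returning `None` for suppressed rows keeps them distinguishable from rows that genuinely fail the check.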
### Marital Status (percentages, sum to ~100)
| Column | Meaning |
|--------|---------|
| `percentage_ongehuwd` | % never married |
| `percentage_gehuwd` | % currently married (incl. registered partnerships) |
| `percentage_gescheid` | % divorced |
| `percentage_verweduwd` | % widowed |
### Households
| Column | Type | Meaning |
|--------|------|---------|
| `aantal_huishoudens` | int | Total number of private households |
| `percentage_eenpersoonshuishoudens` | int | % single-person households |
| `percentage_huishoudens_zonder_kinderen` | int | % multi-person households without children |
| `percentage_huishoudens_met_kinderen` | int | % households with children |
| `gemiddelde_huishoudsgrootte` | float | Average household size (persons per household) |
### Migration Background (percentages, sum to ~100)
| Column | Meaning |
|--------|---------|
| `percentage_met_herkomstland_nederland` | % with Dutch background (both parents born in NL) |
| `percentage_met_herkomstland_uit_europa_excl_nl` | % with European background (excl. NL) |
| `percentage_met_herkomstland_buiten_europa` | % with non-European background |
### Migration Background -- Detailed (generation breakdown)
| Column | Meaning |
|--------|---------|
| `percentage_geb_in_nl_met_herkomstland_nederland` | % born in NL, both parents born in NL (native Dutch) |
| `perc_geb_in_nl_met_herkomstland_in_europa_ex_nl` | % born in NL, European background (2nd gen European) |
| `perc_geb_in_nl_met_herkomstland_buiten_europa` | % born in NL, non-European background (2nd gen non-European) |
| `perc_geb_buiten_nl_met_herkomstlnd_in_europa_ex_nl` | % born abroad, European background (1st gen European) |
| `perc_geb_buiten_nl_met_herkomstlnd_buiten_europa` | % born abroad, non-European background (1st gen non-European) |
### Area Measurements
| Column | Type | Meaning |
|--------|------|---------|
| `oppervlakte_totaal_in_ha` | int | Total area in hectares (land + water) |
| `oppervlakte_land_in_ha` | int | Land area in hectares |
| `oppervlakte_water_in_ha` | int | Water area in hectares |
### Geometry
| Column | Type | Meaning |
|--------|------|---------|
| `geometry` | WKB MultiPolygon | Area boundary in **EPSG:28992** (Amersfoort / RD New) |
## Useful Query Patterns
All examples use the base URL:
```
BASE = 'https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/'
```
### Population by municipality (top 20 largest)
```sql
SELECT gemeentenaam, aantal_inwoners, bevolkingsdichtheid_inwoners_per_km2
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/gemeenten.parquet')
WHERE aantal_inwoners != -99997
ORDER BY aantal_inwoners DESC
LIMIT 20
```
### Densest neighborhoods in the Netherlands
```sql
SELECT buurtnaam, gemeentenaam, bevolkingsdichtheid_inwoners_per_km2, aantal_inwoners
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet')
WHERE bevolkingsdichtheid_inwoners_per_km2 != -99997
AND water = 'NEE'
ORDER BY bevolkingsdichtheid_inwoners_per_km2 DESC
LIMIT 20
```
### Age distribution for a specific municipality
```sql
SELECT gemeentenaam,
aantal_inwoners,
percentage_personen_0_tot_15_jaar AS pct_0_14,
percentage_personen_15_tot_25_jaar AS pct_15_24,
percentage_personen_25_tot_45_jaar AS pct_25_44,
percentage_personen_45_tot_65_jaar AS pct_45_64,
percentage_personen_65_jaar_en_ouder AS pct_65_plus
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/gemeenten.parquet')
WHERE gemeentenaam = 'Amsterdam'
```
### Neighborhoods with highest percentage of elderly
```sql
SELECT buurtnaam, gemeentenaam,
percentage_personen_65_jaar_en_ouder AS pct_65_plus,
aantal_inwoners
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet')
WHERE percentage_personen_65_jaar_en_ouder != -99997
AND aantal_inwoners > 100
ORDER BY percentage_personen_65_jaar_en_ouder DESC
LIMIT 20
```
### Filter out water areas and suppressed data properly
```sql
-- Standard filter for meaningful land-based demographic data:
SELECT *
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet')
WHERE water = 'NEE'
AND aantal_inwoners != -99997
AND buurtnaam != 'Buitenland'
```
### Average household size by urbanity class
```sql
SELECT
CASE stedelijkheid_adressen_per_km2
WHEN 1 THEN 'Very highly urban'
WHEN 2 THEN 'Highly urban'
WHEN 3 THEN 'Moderately urban'
WHEN 4 THEN 'Slightly urban'
WHEN 5 THEN 'Rural'
END AS urbanity,
ROUND(AVG(gemiddelde_huishoudsgrootte), 2) AS avg_household_size,
COUNT(*) AS num_neighborhoods
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet')
WHERE gemiddelde_huishoudsgrootte > 0
AND stedelijkheid_adressen_per_km2 BETWEEN 1 AND 5
GROUP BY stedelijkheid_adressen_per_km2
ORDER BY stedelijkheid_adressen_per_km2
```
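The query above gives every neighborhood equal weight, regardless of how many households it contains. If the quantity of interest is persons per household across all households in a class, weighting by `aantal_huishoudens` is one option. The sketch below contrasts both on made-up rows, using Python's built-in `sqlite3` as an offline stand-in for DuckDB; the table `buurten_demo` and its values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE buurten_demo ("
    "stedelijkheid_adressen_per_km2 INTEGER, "
    "gemiddelde_huishoudsgrootte REAL, "
    "aantal_huishoudens INTEGER)"
)
con.executemany(
    "INSERT INTO buurten_demo VALUES (?, ?, ?)",
    [
        (1, 1.5, 3000),       # large urban neighborhood
        (1, 2.5, 1000),       # small urban neighborhood
        (5, -99997, -99997),  # suppressed row, excluded by the filter below
    ],
)

rows = con.execute(
    """
    SELECT stedelijkheid_adressen_per_km2,
           AVG(gemiddelde_huishoudsgrootte) AS unweighted,
           SUM(gemiddelde_huishoudsgrootte * aantal_huishoudens) * 1.0
               / SUM(aantal_huishoudens) AS weighted
    FROM buurten_demo
    WHERE gemiddelde_huishoudsgrootte > 0
      AND aantal_huishoudens > 0
    GROUP BY stedelijkheid_adressen_per_km2
    """
).fetchall()
print(rows)
```

With these rows the unweighted mean is 2.0 and the household-weighted mean is 1.75, because the larger neighborhood has the smaller households.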
### Single-person household concentration
```sql
SELECT buurtnaam, gemeentenaam,
percentage_eenpersoonshuishoudens AS pct_single,
aantal_huishoudens,
percentage_personen_15_tot_25_jaar AS pct_youth
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet')
WHERE percentage_eenpersoonshuishoudens != -99997
AND aantal_huishoudens > 50
ORDER BY percentage_eenpersoonshuishoudens DESC
LIMIT 20
```
### Migration background diversity by municipality
```sql
SELECT gemeentenaam,
aantal_inwoners,
percentage_met_herkomstland_nederland AS pct_dutch,
percentage_met_herkomstland_uit_europa_excl_nl AS pct_european,
percentage_met_herkomstland_buiten_europa AS pct_non_european
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/gemeenten.parquet')
WHERE aantal_inwoners != -99997
ORDER BY percentage_met_herkomstland_buiten_europa DESC
LIMIT 20
```
### Join buurten with gemeenten names using code hierarchy
```sql
-- Each buurt row carries its parent gemeentecode (BU00140000 -> GM0014), so the join needs no SUBSTRING
SELECT
b.buurtnaam,
b.gemeentenaam,
b.aantal_inwoners AS buurt_pop,
g.aantal_inwoners AS gemeente_pop,
ROUND(100.0 * b.aantal_inwoners / g.aantal_inwoners, 2) AS pct_of_gemeente
FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet') b
JOIN read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/gemeenten.parquet') g
ON b.gemeentecode = g.gemeentecode
WHERE b.aantal_inwoners != -99997
AND g.aantal_inwoners != -99997
ORDER BY pct_of_gemeente DESC
LIMIT 20
```
### Load into GeoPandas
```python
import geopandas as gpd
BASE = 'https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/'
# Buurten (finest level, 60 MB)
buurten = gpd.read_parquet(BASE + 'buurten.parquet')
# Filter to land areas with valid demographic data
buurten = buurten[
(buurten['water'] == 'NEE') &
(buurten['aantal_inwoners'] != -99997)
]
# CRS is EPSG:28992 -- convert to WGS84 for web mapping:
buurten_wgs84 = buurten.to_crs(epsg=4326)
```
## Geometry Notes
- CRS is **EPSG:28992** (Amersfoort / RD New) -- the standard Dutch national coordinate
system, with coordinates in metres. Reproject to EPSG:4326 before using the data in web maps or combining it with WGS84 data.
- Geometry type is MultiPolygon across all three layers.
- Bounding box in WGS84: approximately [3.21, 50.73, 7.24, 53.58] (all of Netherlands).
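- For spatial filtering with DuckDB, the query coordinates and the geometry must be in the same CRS: either convert the query coordinates to EPSG:28992 or transform the geometry column to the CRS of the query coordinates.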
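As a sketch of the first option, the helper below builds a DuckDB query that selects buurten intersecting a bounding box given directly in RD New coordinates. The function name `rd_bbox_query` and the example box are hypothetical. The query relies on `ST_Intersects` and `ST_MakeEnvelope` from DuckDB's spatial extension and assumes the extension exposes the `geometry` column as a GEOMETRY value. The snippet only builds the SQL string; it does not contact the remote file.

```python
BUURTEN = (
    "https://data.source.coop/cholmes/portolan-nl/cbs/wijken_buurten/buurten.parquet"
)


def rd_bbox_query(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Build a DuckDB query selecting buurten that intersect an RD New bbox."""
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bbox must satisfy xmin < xmax and ymin < ymax")
    return (
        "SELECT buurtcode, buurtnaam, aantal_inwoners\n"
        f"FROM read_parquet('{BUURTEN}')\n"
        "WHERE ST_Intersects(geometry, "
        f"ST_MakeEnvelope({xmin}, {ymin}, {xmax}, {ymax}))\n"
        "  AND water = 'NEE'\n"
        "  AND aantal_inwoners != -99997"
    )


# Roughly a 2 km box around central Amsterdam, in metres (approximate values).
sql = rd_bbox_query(120000, 486000, 122000, 488000)
print(sql)
```

The resulting string can be passed to `con.execute(sql)` on a connection where the spatial extension is loaded.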
## Caveats
- **Sentinel value -99997:** Always filter this out. It appears in ALL numeric columns for
privacy-suppressed, water, and boundary-marker areas. Forgetting to filter will corrupt
aggregations (averages, sums, etc.) dramatically.
- **Percentages are integers:** All percentage columns are rounded to whole numbers. The five
age groups, four marital statuses, and three migration backgrounds each sum to approximately
100, but rounding may cause +/-1 deviations.
- **424 gemeenten vs 342 current municipalities:** The gemeenten layer has more rows than there
are active municipalities because of historical/special entries. The extra rows include
the 'Buitenland' marker and possibly entries for recent municipal mergers, so a raw row count should not be read as the number of municipalities.
- **Water areas:** Areas where `water = 'JA'` are water bodies with suppressed demographics.
Filter with `WHERE water = 'NEE'` for land-only analysis.
- **Municipal mergers:** The Netherlands regularly merges municipalities. When this happens,
all buurten and wijken within those municipalities get renumbered. The
`indelingswijziging_wijken_en_buurten` flag (1 = changed) marks areas whose boundaries
changed vs the previous year; year-over-year comparisons are unreliable for those areas.
- **Population density denominator:** `bevolkingsdichtheid_inwoners_per_km2` uses land area
only (`oppervlakte_land_in_ha`), not total area. This is the correct measure but means
areas with large water surfaces may appear denser than expected.
## Related Datasets
- **CBS Gebiedsindelingen:** Boundary geometries without demographic data, covering additional
geographic levels (COROP regions, provinces, labor market regions, etc.). Use Wijken en
Buurten when you need statistics; use Gebiedsindelingen when you only need boundaries or
need geographic levels beyond gemeente/wijk/buurt.
- **Bestuurlijke Gebieden (Kadaster):** Authoritative administrative boundaries for
municipalities, provinces, and national territory. CBS boundaries are derived from these
but may differ slightly.
## Also Available As
- **PMTiles (vector tiles):** `wijken_buurten.pmtiles` -- for web map visualization with
MapLibre GL JS. Shows buurten boundaries.
- **Source GeoPackage:** Available from PDOK at
`https://service.pdok.nl/cbs/wijkenbuurten/2025/atom/downloads/wijkenbuurten_2025.gpkg`
- **OGC API Features:** `https://api.pdok.nl/cbs/wijken-en-buurten-2025/ogc/v1`