A collection of datasets from various Dutch institutions to demonstrate a Spatial Data Infrastructure built on Portolan.
# CBS Wijken en Buurten (Districts and Neighborhoods) — Statistics Netherlands
## What This Dataset Is
Geometry of all **municipalities (gemeenten)**, **districts (wijken)**, and **neighborhoods
(buurten)** in the Netherlands with statistical key figures (kerncijfers) as attributes.
Published annually by **CBS** (Centraal Bureau voor de Statistiek / Statistics Netherlands)
via PDOK. Versions are available for 2021-2025.
The three geographic levels form a hierarchy:
- **Gemeente** (municipality) — the top administrative level (~340 municipalities)
- **Wijk** (district) — subdivisions of a municipality (~3,000 districts)
- **Buurt** (neighborhood) — the finest level, subdivisions of a district (~14,000 neighborhoods)
Each feature has demographic, household, and area statistics as attributes. Boundaries are
derived from the BRK cadastral registration (municipalities), municipality submissions
(neighborhoods), and Bestand Bodemgebruik (water/land borders).
**Source:** https://www.pdok.nl/introductie/-/article/cbs-wijken-en-buurten
**CBS page:** https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/wijk-en-buurtkaart-2024
**Provider:** CBS (Centraal Bureau voor de Statistiek / Statistics Netherlands)
**License:** CC0 (public domain)
**Update frequency:** Annual
**Attribution:** CBS and Kadaster
## How to Access
Three GeoParquet files (one per geographic level) and a combined PMTiles file for
web visualization are available from Source Cooperative.
| File | Features | Size | Contents |
|------|----------|------|----------|
| `buurten.parquet` | 14,823 | 58 MB | Neighborhoods with statistics |
| `wijken.parquet` | 3,505 | 39 MB | Districts with statistics |
| `gemeenten.parquet` | 424 | 26 MB | Municipalities with statistics |
| `wijken_en_buurten.pmtiles` | — | 99 MB | All three layers as vector tiles |
**Base URL:** `https://data.source.coop/cholmes/portolan-nl/cbs/wijken_en_buurten/`
```python
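import duckdb  # .df() below also requires pandas

con = duckdb.connect()
# Load the spatial extension so the geometry column can be used in queries
con.execute("INSTALL spatial; LOAD spatial;")

URL = 'https://data.source.coop/cholmes/portolan-nl/cbs/wijken_en_buurten/buurten.parquet'
# Read the first five neighborhoods straight from the remote GeoParquet file
df = con.execute(f"SELECT * FROM read_parquet('{URL}') LIMIT 5").df()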
```
Also available via WFS and WMS from PDOK:
- **WFS (2025):** `https://service.pdok.nl/cbs/wijkenbuurten/2025/wfs/v1_0`
- **WMS (2025):** `https://service.pdok.nl/cbs/wijkenbuurten/2025/wms/v1_0`
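As a rough sketch of how the WFS endpoint above might be queried, the snippet below builds request URLs from the standard OGC WFS key-value parameters. The helper name `wfs_url` is hypothetical, and the layer name `buurten` passed to `typeNames` is an assumption; the GetCapabilities response lists the layer names the service actually exposes.

```python
from urllib.parse import urlencode

# WFS endpoint listed above
WFS_BASE = "https://service.pdok.nl/cbs/wijkenbuurten/2025/wfs/v1_0"

def wfs_url(request, **params):
    """Build a WFS key-value-pair request URL (hypothetical helper)."""
    query = {"service": "WFS", "version": "2.0.0", "request": request}
    query.update(params)
    return f"{WFS_BASE}?{urlencode(query)}"

# Lists the layers and output formats the service offers
capabilities_url = wfs_url("GetCapabilities")

# "buurten" is an assumed layer name; confirm it in the capabilities document
features_url = wfs_url("GetFeature", typeNames="buurten", count=5)

print(capabilities_url)
print(features_url)
```

Keeping `count` small is a sensible default while exploring, since a full layer download is much larger than the GeoParquet files.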
## Schema — Field Meanings
| Field | Type | Meaning |
|-------|------|---------|
| `buurtcode` | string | **Neighborhood code** (e.g. "BU09989999"). Unique ID for each buurt. Prefix "BU" + 8 digits. |
| `buurtnaam` | string | **Neighborhood name** (Dutch). |
| `wijkcode` | string | **District code** (e.g. "WK099899"). Unique ID for each wijk. Prefix "WK" + 6 digits. |
| `gemeentecode` | string | **Municipality code** (e.g. "GM0998"). Unique ID for each gemeente. Prefix "GM" + 4 digits. |
| `gemeentenaam` | string | **Municipality name** (Dutch). |
| `water` | string | Water classification. "B" indicates a water body area. |
| `meestVoorkomendePostcode` | string | Most common 4-digit postcode in the area. |
| `omgevingsadressendichtheid` | int | **Address density**: average number of addresses within 1 km radius per address. |
| `stedelijkheidAdressenPerKm2` | int | **Urbanity class** (1 = very strongly urban to 5 = non-urban), derived by CBS from address density per km². |
| `bevolkingsdichtheidInwonersPerKm2` | int | **Population density**: inhabitants per km². |
| `aantalInwoners` | int | **Number of inhabitants**. |
| `mannen` | int | Number of male inhabitants. |
| `vrouwen` | int | Number of female inhabitants. |
| `percentagePersonen0Tot15Jaar` | int | % of population aged 0 to 14 ("tot" means up to, not including). |
| `percentagePersonen15Tot25Jaar` | int | % of population aged 15 to 24. |
| `percentagePersonen25Tot45Jaar` | int | % of population aged 25 to 44. |
| `percentagePersonen45Tot65Jaar` | int | % of population aged 45 to 64. |
| `percentagePersonen65JaarEnOuder` | int | % of population aged 65+. |
| `percentageOngehuwd` | int | % of population unmarried. |
| `percentageGehuwd` | int | % of population married. |
| `percentageGescheid` | int | % of population divorced. |
| `percentageVerweduwd` | int | % of population widowed. |
| `aantalHuishoudens` | int | **Number of households**. |
| `percentageEenpersoonshuishoudens` | int | % single-person households. |
| `percentageHuishoudensZonderKinderen` | int | % households without children. |
| `percentageHuishoudensMetKinderen` | int | % households with children. |
| `gemiddeldeHuishoudsgrootte` | float | **Average household size** (persons per household). |
| `oppervlakteTotaalInHa` | int | Total area in hectares. |
| `oppervlakteLandInHa` | int | Land area in hectares. |
| `oppervlakteWaterInHa` | int | Water area in hectares. |
| `jaar` | int | **Reference year** for the statistics. |
| `jrstatcode` | string | Combined year + statistical code identifier. |
| `geom` | WKB MultiPolygon | Area boundary in **EPSG:28992** (RD New / Amersfoort). |
## Important Columns
The columns you'll most often use:
- **`buurtcode` / `wijkcode` / `gemeentecode`** — the key identifiers at each level
- **`buurtnaam` / `gemeentenaam`** — human-readable names
- **`aantalInwoners`** — total population
- **`bevolkingsdichtheidInwonersPerKm2`** — population density
- **`aantalHuishoudens`** — number of households
- **`gemiddeldeHuishoudsgrootte`** — average household size
- **`jaar`** — the reference year (for filtering when multiple years are combined)
- **`geom`** — the area boundary polygon
## Geographic Hierarchy
The digits in each code encode the hierarchy: a wijkcode extends its gemeentecode by two digits, and a buurtcode extends its wijkcode by two more:
```
GM0363 = Amsterdam (gemeente)
  WK036300 = Wijk 00 Centrum (district within Amsterdam)
    BU03630000 = Buurt Burgwallen-Oude Zijde (neighborhood within Centrum)
    BU03630001 = Buurt Burgwallen-Nieuwe Zijde
  WK036301 = Wijk 01 Westelijk havengebied
    BU03630100 = Buurt Westelijk havengebied
```
To aggregate neighborhoods to districts: group by `wijkcode`.
To aggregate neighborhoods to municipalities: group by `gemeentecode`.
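The files already carry `wijkcode` and `gemeentecode` columns, but when only a `buurtcode` is at hand the parent codes can be derived by slicing. The sketch below assumes every code follows the pattern in the Amsterdam example above ("BU" + 8 digits); the function name `parent_codes` is made up for illustration.

```python
def parent_codes(buurtcode):
    """Split a buurtcode such as 'BU03630001' into its parent codes."""
    if not (buurtcode.startswith("BU") and len(buurtcode) == 10
            and buurtcode[2:].isdigit()):
        raise ValueError(f"not a buurtcode: {buurtcode!r}")
    digits = buurtcode[2:]
    return {
        "gemeentecode": "GM" + digits[:4],  # first 4 digits: municipality
        "wijkcode": "WK" + digits[:6],      # first 6 digits: district
        "buurtcode": buurtcode,
    }

print(parent_codes("BU03630001"))
# {'gemeentecode': 'GM0363', 'wijkcode': 'WK036300', 'buurtcode': 'BU03630001'}
```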
## Statistical Categories
The dataset covers these statistical domains:
1. **Demographics**: population count, gender split, age distribution (5 brackets)
2. **Marital status**: unmarried, married, divorced, widowed percentages
3. **Households**: count, single-person %, with/without children %, average size
4. **Urbanity**: address density, urbanity classification
5. **Area**: total, land, and water surface in hectares
6. **Population density**: inhabitants per km²
Note: The WFS service contains additional fields for heritage/origin percentages
and other socioeconomic indicators not listed here.
## Useful Query Patterns
### Load and explore neighborhoods
```sql
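INSTALL spatial; LOAD spatial;
-- Register the remote file as a view; the queries below refer to it as `buurten`
CREATE OR REPLACE VIEW buurten AS
SELECT * FROM read_parquet('https://data.source.coop/cholmes/portolan-nl/cbs/wijken_en_buurten/buurten.parquet');

SELECT buurtcode, buurtnaam, gemeentenaam, aantalInwoners, bevolkingsdichtheidInwonersPerKm2
FROM buurten
ORDER BY aantalInwoners DESC
LIMIT 20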
```
### Most populated neighborhoods
```sql
SELECT buurtnaam, gemeentenaam, aantalInwoners, bevolkingsdichtheidInwonersPerKm2
FROM buurten
WHERE aantalInwoners > 0
ORDER BY aantalInwoners DESC
LIMIT 20
```
### Age distribution by municipality
```sql
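-- Unweighted mean of neighborhood percentages; suppressed (negative) values are skipped
SELECT gemeentenaam,
       SUM(aantalInwoners) AS totaal,
       AVG(percentagePersonen0Tot15Jaar) FILTER (WHERE percentagePersonen0Tot15Jaar >= 0) AS avg_pct_0_15,
       AVG(percentagePersonen65JaarEnOuder) FILTER (WHERE percentagePersonen65JaarEnOuder >= 0) AS avg_pct_65plus
FROM buurten
WHERE aantalInwoners > 0
GROUP BY gemeentenaam
ORDER BY avg_pct_65plus DESC
LIMIT 20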
```
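The query above averages neighborhood percentages without weighting, so a neighborhood with 100 inhabitants counts as much as one with 10,000. A population-weighted mean is often closer to the municipality-wide share. The sketch below illustrates the difference with made-up numbers; `weighted_pct` is a hypothetical helper, not part of the dataset or of DuckDB.

```python
def weighted_pct(rows):
    """Population-weighted mean of a percentage.

    rows: iterable of (inhabitants, percentage) pairs, one per neighborhood.
    Pairs with a non-positive population or a negative (suppressed)
    percentage are skipped.
    """
    valid = [(n, p) for n, p in rows if n > 0 and p >= 0]
    total = sum(n for n, _ in valid)
    if total == 0:
        return None
    return sum(n * p for n, p in valid) / total

# Made-up example: a large young neighborhood and a small older one
rows = [(9000, 10), (1000, 40)]
unweighted = sum(p for _, p in rows) / len(rows)

print(unweighted)          # 25.0
print(weighted_pct(rows))  # 13.0
```

The same idea can be expressed in SQL by dividing `SUM(aantalInwoners * pct)` by `SUM(aantalInwoners)` over the rows where both values are non-negative.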
### Urbanity classification
CBS uses address density to classify urbanity into 5 levels:
```sql
SELECT
    CASE
        WHEN omgevingsadressendichtheid >= 2500 THEN '1 - Zeer sterk stedelijk'
        WHEN omgevingsadressendichtheid >= 1500 THEN '2 - Sterk stedelijk'
        WHEN omgevingsadressendichtheid >= 1000 THEN '3 - Matig stedelijk'
        WHEN omgevingsadressendichtheid >= 500 THEN '4 - Weinig stedelijk'
        ELSE '5 - Niet stedelijk'
    END AS stedelijkheid,
    COUNT(*) AS buurten,
    -- Skip suppressed (negative) inhabitant counts
    SUM(aantalInwoners) FILTER (WHERE aantalInwoners >= 0) AS inwoners
FROM buurten
WHERE omgevingsadressendichtheid >= 0
GROUP BY 1
ORDER BY 1
```
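When the data is already loaded into Python, the same thresholds can be applied without SQL. This is a minimal sketch that mirrors the `CASE` expression above; the function name `urbanity_class` is an assumption, and negative inputs are treated as suppressed values.

```python
# (lower bound in addresses per km², label), checked from highest to lowest
URBANITY_LEVELS = [
    (2500, "1 - Zeer sterk stedelijk"),
    (1500, "2 - Sterk stedelijk"),
    (1000, "3 - Matig stedelijk"),
    (500, "4 - Weinig stedelijk"),
    (0, "5 - Niet stedelijk"),
]

def urbanity_class(address_density):
    """Map an omgevingsadressendichtheid value to its urbanity label."""
    if address_density is None or address_density < 0:
        return None  # suppressed or missing value
    for lower_bound, label in URBANITY_LEVELS:
        if address_density >= lower_bound:
            return label

print(urbanity_class(3100))  # 1 - Zeer sterk stedelijk
print(urbanity_class(499))   # 5 - Niet stedelijk
```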
### Household composition analysis
```sql
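-- Unweighted means across neighborhoods; suppressed (negative) values are skipped
SELECT gemeentenaam,
       SUM(aantalHuishoudens) AS huishoudens,
       AVG(percentageEenpersoonshuishoudens) FILTER (WHERE percentageEenpersoonshuishoudens >= 0) AS avg_pct_eenpersoons,
       AVG(percentageHuishoudensMetKinderen) FILTER (WHERE percentageHuishoudensMetKinderen >= 0) AS avg_pct_met_kinderen,
       AVG(gemiddeldeHuishoudsgrootte) FILTER (WHERE gemiddeldeHuishoudsgrootte >= 0) AS avg_grootte
FROM buurten
WHERE aantalHuishoudens > 0
GROUP BY gemeentenaam
ORDER BY huishoudens DESC
LIMIT 20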
```
### Land vs water area
```sql
SELECT gemeentenaam,
SUM(oppervlakteTotaalInHa) AS totaal_ha,
SUM(oppervlakteLandInHa) AS land_ha,
SUM(oppervlakteWaterInHa) AS water_ha,
ROUND(100.0 * SUM(oppervlakteWaterInHa) / NULLIF(SUM(oppervlakteTotaalInHa), 0), 1) AS pct_water
FROM buurten
WHERE oppervlakteTotaalInHa > 0
GROUP BY gemeentenaam
ORDER BY pct_water DESC
LIMIT 20
```
### Spatial join with other datasets
```sql
INSTALL spatial; LOAD spatial;
-- Find which neighborhood a point falls in (convert WGS84 lon/lat to RD New)
WITH target AS (
    -- always_xy := true makes ST_Transform read the point as (lon, lat)
    SELECT ST_Transform(ST_Point(4.9003, 52.3792), 'EPSG:4326', 'EPSG:28992', always_xy := true) AS pt
)
SELECT buurtcode, buurtnaam, gemeentenaam, aantalInwoners
FROM buurten, target
WHERE ST_Intersects(geom, pt)
```
## Geometry Notes
- CRS is **EPSG:28992** (RD New / Amersfoort) — coordinates are in **meters**, NOT
degrees. This is the Dutch national coordinate system.
- To convert to WGS84 for web mapping (`always_xy := true` keeps the output in lon/lat order):
```sql
ST_Transform(geom, 'EPSG:28992', 'EPSG:4326', always_xy := true)
```
- All geometries are **MultiPolygon** (some areas consist of multiple parts)
- Bounding box in WGS84, as [min lon, min lat, max lon, max lat]: [3.37, 50.73, 7.24, 53.55] (all of the Netherlands)
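Mixing up meters and degrees is an easy mistake when combining this data with other sources. The heuristic below guesses which CRS a coordinate pair is in from its magnitude. The WGS84 extent is the bounding box listed above; the RD New extent is a rough assumption for this sketch, and `guess_crs` is a hypothetical helper.

```python
# (min x, min y, max x, max y)
WGS84_BBOX = (3.37, 50.73, 7.24, 53.55)               # lon/lat, from the notes above
RD_NEW_BBOX = (0.0, 300_000.0, 300_000.0, 625_000.0)  # meters, assumed rough extent

def _inside(x, y, bbox):
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y

def guess_crs(x, y):
    """Return the EPSG code a coordinate pair plausibly belongs to, or None."""
    if _inside(x, y, WGS84_BBOX):
        return "EPSG:4326"
    if _inside(x, y, RD_NEW_BBOX):
        return "EPSG:28992"
    return None  # outside both extents, e.g. lat/lon given in swapped order

print(guess_crs(4.9003, 52.3792))   # EPSG:4326
print(guess_crs(121000, 487000))    # EPSG:28992
print(guess_crs(52.3792, 4.9003))   # None
```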
## Visualization Styles
Three styles are available for interactive map visualization of the PMTiles file; all render the `buurten` (neighborhood) layer.
The style files are Mapbox GL v8 JSON with relative PMTiles source paths. They can be
used with MapLibre GL JS, OpenLayers (via ol-mapbox-style), or any Mapbox GL v8-compatible renderer.
- **`styles/default.json`** — **Population density.** Red sequential ramp on `bevolkingsdichtheidInwonersPerKm2`. White (sparse) to dark red (10,000+ per km²). Shows urban density gradients within cities.
- **`styles/by-aging.json`** — **Aging population analysis.** Purple sequential ramp on `percentagePersonen65JaarEnOuder` (% aged 65+). Light lavender to deep purple. Reveals aging patterns — retirement communities, historic villages with older populations, vs. young family suburbs.
- **`styles/by-household-size.json`** — **Household composition.** Diverging blue-to-red ramp on `gemiddeldeHuishoudsgrootte` (average household size). Blue = small households (singles, city apartments), red = large households (family suburbs). Shows the spatial sorting of household types across Dutch neighborhoods.
Style files are at: `https://data.source.coop/cholmes/portolan-nl/cbs/wijken_en_buurten/styles/`
## Caveats
- **Suppressed values**: The value **-99997** means the statistic is suppressed or
unavailable, typically for privacy reasons when the area has too few inhabitants.
Suppression applies per statistic, so filter out -99997 on every column you aggregate,
not only on `aantalInwoners`. For example:
```sql
WHERE aantalInwoners >= 0
```
- **Annual updates**: Each year has its own version. Municipality mergers and boundary
changes mean codes and boundaries change between years. Do not assume buurtcodes are
stable across years.
- **EPSG:28992 coordinates**: Coordinates are in meters (RD New), not degrees. You must
transform to WGS84 for web maps or combining with global data.
- **Three feature types**: The WFS service provides `buurten`, `wijken`, and `gemeenten`
as separate layers. Each has the same statistical fields but aggregated to different
levels.
- **Attribution**: Credit CBS (Centraal Bureau voor de Statistiek) and Kadaster as data
sources when publishing derived work.
- **Water areas**: Some "neighborhoods" are actually water bodies (e.g., parts of the
IJsselmeer). Check the `water` field ("B") to identify these.
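To illustrate the suppressed-values caveat above, the sketch below shows how a single -99997 distorts a sum and how dropping negative values avoids that. The numbers are made up, and `clean` and `safe_sum` are hypothetical helpers that mirror the `>= 0` filter used in the SQL examples.

```python
SUPPRESSED = -99997  # sentinel described in the caveats above

def clean(value):
    """Return None for suppressed or missing values (anything negative)."""
    return None if value is None or value < 0 else value

def safe_sum(values):
    """Sum only the values that were actually published."""
    return sum(v for v in map(clean, values) if v is not None)

# Made-up inhabitant counts; the second neighborhood is suppressed
counts = [1200, SUPPRESSED, 300]

print(sum(counts))       # -98497
print(safe_sum(counts))  # 1500
```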
## Related Datasets in This Catalog
- **Bestuurlijke Gebieden** (`kadaster/bestuurlijke_gebieden/`) — administrative boundaries
from Kadaster (municipalities, provinces, water boards) without the statistical attributes
- **BAG Light** (`kadaster/bag_light/`) — all 11.4M buildings, can be spatially joined with
neighborhoods for building-level analysis
- **Bestand Bodemgebruik** (`cbs/bestand_bodemgebruik_2017/`) — land use data from CBS,
used to derive water/land borders in this dataset