A cloud-native repackaging of Meta's DINOv3 Global Canopy Height Map v2 (ml3):
a global ~1.19 m canopy-height raster (213,109 Web-Mercator tiles, ~23.8 TB of COGs).
This package does not copy the 23.8 TB of pixels. Instead it adds the three
things the original distribution lacks for streaming + analysis, and references
Meta's COGs in place:
Tile index: fast spatial lookup of which tile covers an AOI; drives maplibre/deck.gl tile loaders and DuckDB spatial queries. Replaces the original 56 MB tiles.geojson.
stac/: discovery and access via STAC tooling (pystac, stac-fastapi, TiTiler, odc-stac). Asset hrefs point at Meta's COGs (s3://dataforgood-fb-data/...).
zarr/chm.zarr.icechunk: xarray/dask analytics and Zarr-native multiscale visualization (Earthmover Arraylake, ZarrViz). Opens as a 6-level pyramid; reads translate to byte-range GETs against Meta's COGs, so no pixels are duplicated here.
CRS & grid
EPSG:3857 (Web Mercator) — matches the source; zero reprojection for web maps.
10-character Bing/Microsoft quadkey tile grid (zoom 10). Native pixel ≈ 1.19 m at the equator.
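The tile addressing described here can be reproduced with the standard Bing Maps tile-system formulas. The following is a minimal sketch, not code from this package: the function names are hypothetical, and `ground_resolution` assumes the ≈1.19 m pixel corresponds to a zoom-17 grid of 256-pixel tiles, which matches the quoted figure but is not stated by the source.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) tile containing a lon/lat point at the given zoom."""
    n = 2 ** zoom
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limit
    sin_lat = math.sin(math.radians(lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    # Clamp so points on the far edge still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of x and y into a base-4 quadkey, one digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def ground_resolution(lat: float, zoom: int = 17, tile_px: int = 256) -> float:
    """Meters per pixel at a latitude (assumed zoom-17, 256 px pixel grid)."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return math.cos(math.radians(lat)) * circumference / (tile_px * 2 ** zoom)


# Zoom-10 quadkey of the tile covering central Berlin (illustrative point).
berlin_quadkey = tile_to_quadkey(*lonlat_to_tile(13.4, 52.5, 10), 10)
```

A zoom-10 quadkey always has ten digits, one per zoom level. The ground resolution shrinks with the cosine of latitude, so pixels away from the equator cover less than 1.19 m on the ground.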
Pixel values are canopy height in meters (uint8, 0–255). 0 is both true-zero and
no-data (the source sets no explicit nodata mask).
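Because 0 carries two meanings, any summary statistic depends on which reading you choose. The sketch below uses a hypothetical helper and made-up toy values (not real pixels) to show how far the two readings can diverge.

```python
def canopy_stats(pixels: list[int]) -> dict[str, float]:
    """Summarize uint8 heights (meters) under both readings of the value 0."""
    nonzero = [p for p in pixels if p > 0]
    return {
        # Reading 1: every 0 is real bare ground and counts toward the mean.
        "mean_all": sum(pixels) / len(pixels),
        # Reading 2: every 0 is no-data and is excluded from the mean.
        "mean_nonzero": sum(nonzero) / len(nonzero) if nonzero else 0.0,
        # Share of pixels whose meaning is ambiguous.
        "zero_fraction": 1 - len(nonzero) / len(pixels),
    }


sample = [0, 0, 12, 18, 0, 30]  # toy values for illustration only
stats = canopy_stats(sample)
```

Neither reading is correct in general; which one fits depends on whether the AOI includes water, tile edges, or other areas where the source had no observation.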
This is a static STAC catalog: a collection.json plus a stac-geoparquet
items.parquet (213,109 Items). No STAC API server is required. The parquet is
written in 54 quadkey-sorted (Z-order) row groups with per-group bbox statistics, so a
spatial query reads only the matching byte ranges over HTTP — e.g. a city-scale bbox
touches ~3 of 54 row groups, skipping ~94% of the file. Point any of these tools straight
at the public URL:
Also works: geopandas read_parquet(..., bbox=...), Polars scan_parquet, and the
stac-geoparquet library. For teams that specifically need the pystac-client API
(Client.open(...).search(...)), serve items.parquet behind
stac-fastapi-geoparquet; no database is required.
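The row-group pruning described above amounts to a bounding-box intersection test against per-group statistics. The sketch below illustrates the idea in plain Python; the function names and the four toy row-group boxes are hypothetical, and real readers (DuckDB, geopandas, and so on) do this internally from the Parquet footer.

```python
BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matching_row_groups(query: BBox, group_bboxes: list[BBox]) -> list[int]:
    """Indices of row groups whose bbox statistics intersect the query."""
    return [i for i, g in enumerate(group_bboxes) if intersects(query, g)]


# Toy statistics for four row groups, one per quadrant of the globe.
toy_groups: list[BBox] = [
    (-180.0, -90.0, 0.0, 0.0),  # south-west
    (0.0, -90.0, 180.0, 0.0),   # south-east
    (-180.0, 0.0, 0.0, 90.0),   # north-west
    (0.0, 0.0, 180.0, 90.0),    # north-east
]

berlin: BBox = (13.30, 52.45, 13.45, 52.55)
hits = matching_row_groups(berlin, toy_groups)

# The figure quoted above: reading 3 of 54 row groups skips the rest.
skipped_fraction = 1 - 3 / 54
```

Sorting Items by quadkey before writing keeps spatially close tiles in the same row group, which is what makes each group's bbox tight enough for this test to reject most of the file.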
Search → xarray (odc.stac): turn found Items into a lazily-loaded, dask-backed
xarray cube reading the COGs directly (EPSG:3857, no reprojection):
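```python
import os

import odc.stac  # for the load step; see examples/odc_stac_load.py
import pystac
from rustac import DuckdbClient

# ITEMS_URL (not defined here) is the public URL of items.parquet.
os.environ.update(AWS_NO_SIGN_REQUEST="YES", AWS_REGION="us-east-1")  # Meta's COGs are anonymous
bbox = [13.30, 52.45, 13.45, 52.55]  # min lon, min lat, max lon, max lat
items = [pystac.Item.from_dict(d) for d in DuckdbClient().search(ITEMS_URL, bbox=bbox)]
```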
See examples/search_and_read.py and examples/odc_stac_load.py in the
chm-zarr repo.
The VirtualiZarr store contains only chunk references (offsets/lengths) into
Meta's public COGs — if Meta's bucket moves, the Zarr reads break by design.
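To make the reference mechanism concrete, the sketch below shows how one chunk reference maps to an HTTP byte-range request. `ChunkRef` and `range_header` are hypothetical names for illustration, and the offset and length are made-up values; the actual on-disk reference format belongs to the Zarr store and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical in-memory form of one virtual chunk reference."""

    href: str    # location of the source COG
    offset: int  # first byte of the chunk within that file
    length: int  # number of bytes in the chunk


def range_header(ref: ChunkRef) -> dict[str, str]:
    """Build the HTTP Range header that fetches exactly this chunk."""
    last_byte = ref.offset + ref.length - 1  # HTTP ranges are inclusive
    return {"Range": f"bytes={ref.offset}-{last_byte}"}


# Path left elided as in the text above; offset and length are toy values.
ref = ChunkRef("s3://dataforgood-fb-data/...", offset=1024, length=512)
headers = range_header(ref)
```

Since the store holds only `href`, `offset`, and `length`, a change to the location or byte layout of a source COG invalidates the reference, which is the breakage described above.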