This data repository contains global raster and vector outputs of the WaterNet model at 20-meter resolution, and links to the code for three Python modules used to generate the data. This research was supported by Bridges to Prosperity and conducted by Better Planet Laboratory. If you use this data, please cite the original WaterNet paper:
Pierson, Matthew, and Mehrabi, Zia. 2024. Mapping waterways worldwide with deep learning. arXiv. https://doi.org/10.48550/arXiv.2412.00050
Projection: All files use EPSG:4326.
A live demo of the raster data can be found at The Fika Map Website. A STAC catalog is coming soon!
The raster outputs are in the raster folder. The file names follow the format {xtile}_{ytile}.tif and correspond to zoom level 6 XYZ map tiles. A GeoJSON file in the root directory, waterway_model_outputs_20m_raster_overview.geojson, contains the bounding box of each raster file. The raster files are Cloud-Optimized GeoTIFFs (COGs).
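As a rough illustration of the naming scheme above, the sketch below converts between a raster file name and its geographic extent. It assumes the tiles follow the common Web Mercator XYZ ("slippy map") tiling convention, which this README does not state explicitly; the function names are illustrative and not part of the WaterNet code. The overview GeoJSON remains the authoritative source for each file's bounding box.

```python
import math

ZOOM = 6  # zoom level stated in this README


def tile_bounds(xtile, ytile, zoom=ZOOM):
    """Return (west, south, east, north) in degrees for an XYZ tile.

    Assumes the standard Web Mercator XYZ tiling scheme.
    """
    n = 2 ** zoom

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    # Tile rows count downward from the north, so ytile is the northern edge.
    return lon(xtile), lat(ytile + 1), lon(xtile + 1), lat(ytile)


def tile_for_point(lon_deg, lat_deg, zoom=ZOOM):
    """Return the (xtile, ytile) containing a longitude/latitude point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def raster_name(lon_deg, lat_deg):
    """Build the {xtile}_{ytile}.tif name expected for a point."""
    x, y = tile_for_point(lon_deg, lat_deg)
    return f"{x}_{y}.tif"


print(raster_name(2.0, -2.0))  # 32_32.tif
print(tile_bounds(32, 32))
```

Whether a given tile actually exists in the raster folder (for example, open-ocean tiles) should be checked against the overview file rather than inferred from the arithmetic.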
There are a few known issues with the raster product. Accuracy is lower in swampy areas and deserts, although alternative feature weightings of the model can capture swamps better, and our vectorization process aims to resolve the issues in deserts. Noise is higher in areas where it is difficult to create cloud-free composites (coastal areas and areas near the equator); future integration of SAR data may help alleviate this, particularly in a near-real-time deployment. There is also an artifact in Greenland (missing cells) that we expect is due to the Sentinel-2 feature inputs, and which likely requires further investigation and backfilling.
Vector data is available as a series of GeoParquet files and as a set of PMTiles files. A GeoJSON file in the root directory, waterway_model_outputs_vector_overview.geojson, contains the bounding box of each GeoParquet vector file. The individual GeoParquet files are in the vector folder and the PMTiles files are in the pmtiles folder. There is also a joined PMTiles file named waterway_model_outputs_20m_vector.pmtiles. Clicking on an individual PMTiles file shows a live preview of the data. A live demo of the vector data can be found at The Fika Map Website.
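The overview files can be used to decide which data files to download for an area of interest. The following is a minimal sketch using only the standard library; it assumes each overview feature is a GeoJSON Polygon or MultiPolygon, and the sample features and their `file_name` property are hypothetical, since this README does not document the overview file's properties.

```python
import json


def load_overview(path):
    """Load an overview GeoJSON file, e.g.
    waterway_model_outputs_vector_overview.geojson."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def geometry_bounds(geometry):
    """Return (west, south, east, north) of a Polygon or MultiPolygon."""
    def walk(coords):
        # A position is [lon, lat, ...]; anything else is a nested list.
        if isinstance(coords[0], (int, float)):
            yield coords[0], coords[1]
        else:
            for part in coords:
                yield from walk(part)

    xs, ys = zip(*walk(geometry["coordinates"]))
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def features_in_bbox(overview, query_bbox):
    """Return overview features whose bounds intersect query_bbox."""
    return [
        feature
        for feature in overview["features"]
        if boxes_intersect(geometry_bounds(feature["geometry"]), query_bbox)
    ]


# Hypothetical stand-in for the real overview file; "file_name" is assumed.
SAMPLE_OVERVIEW = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"file_name": "a.parquet"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [5, 0], [5, 5], [0, 5], [0, 0]]]}},
        {"type": "Feature", "properties": {"file_name": "b.parquet"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[10, 10], [15, 10], [15, 15], [10, 15], [10, 10]]]}},
    ],
}

for feature in features_in_bbox(SAMPLE_OVERVIEW, (4, 4, 6, 6)):
    print(feature["properties"])
```

With the real data, `load_overview` would replace the sample, and inspecting `feature["properties"]` should reveal which property identifies the file. The same approach applies to the raster overview file.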
The vector files are in the vector folder and follow the naming format {hydrobasins level 2 id}_{part}.parquet. The level 2 HydroBasins can be obtained from their website. As the naming convention suggests, each file corresponds to part of a HydroBasins level 2 basin. This data set is also available on Harvard Dataverse for academic purposes. Since many of the files were larger than 2.5 GB, they were split into parts to satisfy the Harvard Dataverse individual file size limit. All files are GeoParquet files.
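Because a basin may be split across several files, it can help to group the parts by basin before loading them. The sketch below parses the naming format described above; the example IDs and part labels are illustrative only, and the exact form of the part suffix is an assumption (the pattern accepts any text without underscores or dots).

```python
import re
from collections import defaultdict

# {hydrobasins level 2 id}_{part}.parquet; the character classes are assumptions.
PART_PATTERN = re.compile(r"^(?P<basin>[^_]+)_(?P<part>[^_.]+)\.parquet$")


def part_key(file_name):
    """Sort numeric parts numerically, and any other parts after them."""
    part = PART_PATTERN.match(file_name)["part"]
    return (0, int(part)) if part.isdigit() else (1, part)


def group_parts(file_names):
    """Map each basin ID to its ordered list of part files."""
    groups = defaultdict(list)
    for name in file_names:
        match = PART_PATTERN.match(name)
        if match is None:
            continue  # skip anything that does not follow the naming format
        groups[match["basin"]].append(name)
    return {basin: sorted(names, key=part_key) for basin, names in groups.items()}


# Illustrative file names, not a listing of the actual vector folder.
example = [
    "1020000010_0.parquet",
    "1020000010_10.parquet",
    "1020000010_2.parquet",
    "2020000010_0.parquet",
    "notes.txt",
]
for basin, parts in group_parts(example).items():
    print(basin, parts)
```

Each group could then be read file by file, for example with `geopandas.read_parquet`, and concatenated if the whole basin fits in memory.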
Vectorization of any large body of water (lakes, swamps, wide rivers, etc.) can result in multiple geometries where there should be a single geometry. One method for identifying such artifacts is to search for streams of order 1 whose target streams are of high order. Since these artifacts often occur in lakes, we have included the column "intersects_lake" in the output, which indicates whether the geometry intersects a lake in HydroLakes.
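The search described above can be sketched in a few lines. Only "intersects_lake" is a column name confirmed by this README; `stream_id`, `stream_order`, and `target_stream_id` are hypothetical names used for illustration, and the threshold for what counts as a "high" order is an arbitrary assumption to be tuned. Plain dictionaries stand in for rows of the GeoParquet files.

```python
def find_artifact_candidates(streams, high_order=4):
    """Return order-1 streams that flow into a stream of order >= high_order.

    Field names other than "intersects_lake" are hypothetical, and
    high_order=4 is an arbitrary illustrative threshold.
    """
    order_by_id = {s["stream_id"]: s["stream_order"] for s in streams}
    candidates = []
    for s in streams:
        if s["stream_order"] != 1:
            continue
        target_order = order_by_id.get(s["target_stream_id"])
        if target_order is not None and target_order >= high_order:
            candidates.append(s)
    return candidates


# Made-up rows for illustration only.
rows = [
    {"stream_id": 1, "stream_order": 1, "target_stream_id": 3, "intersects_lake": True},
    {"stream_id": 2, "stream_order": 1, "target_stream_id": 4, "intersects_lake": False},
    {"stream_id": 3, "stream_order": 5, "target_stream_id": None, "intersects_lake": True},
    {"stream_id": 4, "stream_order": 2, "target_stream_id": 3, "intersects_lake": False},
]

candidates = find_artifact_candidates(rows)
in_lakes = [s["stream_id"] for s in candidates if s["intersects_lake"]]
print(in_lakes)  # [1]
```

Combining the order test with "intersects_lake" narrows the candidates to those most likely to be lake-related artifacts; the remaining candidates may be genuine small tributaries of large rivers.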
Code for model creation, training, and vectorization is linked in the GitHub repositories below. Global inference code is not included, because the large amount of data required makes such code difficult to generalize across compute hardware.
The research behind the models is detailed in the following papers:
The data is also available on Harvard Dataverse with a permanent DOI at https://doi.org/10.7910/DVN/YY2XMG
This research was funded by Bridges to Prosperity and carried out by Better Planet Laboratory.
This data is licensed under the Creative Commons Attribution 4.0 International License. Please cite the original WaterNet paper and Bridges to Prosperity if you use this data.