This repository hosts the AI-ready datasets for the paper "Global 3D Reconstruction of Clouds & Tropical Cyclones." It contains paired 2D geostationary imagery (GOES, MSG, Himawari) and 3D vertical profiles (CloudSat/CALIPSO) used to train ML models for 3D cloud reconstruction, with a dedicated dataset for tropical cyclones.
This is the GOES fine-tuning subset of the Global 3D Cloud Reconstruction Dataset. It contains colocated pairs of GOES/ABI geostationary imagery and CloudSat radar profiles for supervised 3D cloud structure reconstruction. Each sample includes multispectral GOES imagery (16 spectral channels plus satellite and solar angles), CloudSat vertical profiles as ground truth, and a colocation mask marking the pixels that fall on a valid CloudSat footprint. Samples are 256x256-pixel patches stored in Cloud-Optimized GeoTIFF format.
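One way to read the colocation mask: CloudSat observes only a narrow track through each patch, so a supervised error should be computed only on the pixels the mask marks as valid. The sketch below illustrates that idea on a toy 2D patch in plain Python. The function name `masked_mae`, the choice of mean absolute error, and the flattened 2D layout are all assumptions for illustration; the dataset description does not state which loss the authors use, and real profiles also carry a vertical dimension.

```python
def masked_mae(pred, target, mask):
    """Mean absolute error over pixels where mask is truthy.

    pred, target, mask: 2D lists with identical shapes.
    Returns None when the mask selects no pixel at all.
    """
    total = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:  # pixel lies on a valid CloudSat footprint
                total += abs(p - t)
                count += 1
    return total / count if count else None


# Toy 2x3 patch (hypothetical values): only the middle column is on the track.
pred = [[0.0, 1.0, 5.0], [0.0, 3.0, 5.0]]
target = [[9.0, 2.0, 9.0], [9.0, 3.0, 9.0]]
mask = [[0, 1, 0], [0, 1, 0]]

mae = masked_mae(pred, target, mask)
```

Pixels off the track contribute nothing, however far the prediction is from the placeholder target there. In practice the same masking would be applied to NumPy or PyTorch arrays rather than nested lists.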
Version: 0.1.0
License: CC-BY-4.0
Keywords: cloud microphysics, 3d reconstruction, geostationary satellites, GOES-16, CloudSat, remote sensing, tropical cyclones, deep learning
Tasks: regression, foundation-model
Partitions: 95 files
Spatial coverage: [-122.94, -42.83, -27.30, 42.90] (WGS84)
Temporal coverage: 2000-01-01 to 2020-08-26
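The spatial coverage above can be used as a quick pre-filter before downloading anything. The following is a minimal sketch that assumes the four numbers are ordered `[min_lon, min_lat, max_lon, max_lat]` in degrees; that ordering is the usual bounding-box convention but is not stated in the metadata, and `in_bbox` is a hypothetical helper name.

```python
# Bounding box copied from the metadata; the (west, south, east, north)
# ordering is an assumption.
BBOX = (-122.94, -42.83, -27.30, 42.90)


def in_bbox(lon, lat, bbox=BBOX):
    """Return True when (lon, lat) falls inside the box, edges included."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


inside = in_bbox(-80.0, 25.0)   # a point near Florida
outside = in_bbox(10.0, 50.0)   # a point in central Europe
```

The plain comparison is enough here because the box does not cross the antimeridian.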
Root: FOLDER (31,046 samples)
Hierarchy:
NOAA (producer): https://www.noaa.gov
European Space Agency (ESA) (licensor): https://www.esa.int
source.coop (host): https://source.coop
If you use this dataset in your research, please cite:
DOI: 10.48550/arXiv.2511.04773
Ermis, S., Aybar, C., Freischem, L., Girtsou, S., Bintsi, K.-M., Diaz Salas-Porras, E., Eisinger, M., Jones, W., Jungbluth, A., & Tremblay, B. (2025). Global 3D Reconstruction of Clouds & Tropical Cyclones. Tackling Climate Change with Machine Learning Workshop at NeurIPS 2025.
Primary publication describing the dataset and methodology
Generated with ❤️ using TacoToolbox v0.23.3
| Field | Type | Description |
| --- | --- | --- |
| stac:time_middle | timestamp[us] | Middle timestamp (μs since Unix epoch, UTC) |
| split | string | Dataset partition identifier (train, test, or validation) |
| cloud3d:cyclone | bool | Whether this sample is from a tropical cyclone observation |
| cloud3d:satellite | string | Geostationary satellite source (GOES, Himawari, MSG) |
| cloud3d:geostationary_id | string | Original geostationary satellite file identifier |
| cloud3d:cloudsat_id | string | CloudSat granule/profile identifier |
| cloud3d:has_flxhr | bool | Whether 2B-FLXHR radiative flux/heating rate data is available |
| majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R) with ~dist_km spacing |
| geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM) |
| geoenrich:precipitation | float | Mean annual precipitation in mm estimated from GPM data |
| geoenrich:temperature | float | Mean annual temperature in °C estimated from MODIS LST data |
| geoenrich:admin_countries | string | Country name at centroid location |
| internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
| internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
|  |  | Relative path from DATA/ directory. Format: {parent_path}/{id} or {id} for level0 (ZIP, FOLDER, TACOCAT). |
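The metadata fields listed above lend themselves to a few routine operations: decoding the microsecond timestamp, splitting a grid code into its parts, selecting samples by `split` and `cloud3d:cyclone`, and following `internal:parent_id` from one level to the next. The sketch below shows these on hand-made rows using only the standard library. Everything in it is illustrative: the helper names, the toy rows, and the assumption that the timestamp arrives as an integer are not taken from the dataset, and the grid-code pattern is inferred from the single example `0100km_0003U_0005R` (the `D` and `L` alternatives are a guess).

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Pattern inferred from one example code; the D and L variants are assumed.
MAJORTOM_RE = re.compile(r"^(\d+)km_(\d+)([UD])_(\d+)([LR])$")


def micros_to_utc(micros):
    """Convert integer microseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


def parse_majortom(code):
    """Split a grid code such as '0100km_0003U_0005R' into its components."""
    match = MAJORTOM_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized grid code: {code!r}")
    spacing, row, row_dir, col, col_dir = match.groups()
    return {"spacing_km": int(spacing), "row": int(row), "row_dir": row_dir,
            "col": int(col), "col_dir": col_dir}


def select(rows, split, cyclone=None):
    """Keep rows of one split, optionally restricted by the cyclone flag."""
    return [r for r in rows
            if r["split"] == split
            and (cyclone is None or r["cloud3d:cyclone"] == cyclone)]


def children_of(parent, child_rows):
    """Rows in the next level whose parent_id points at this parent's position."""
    pid = parent["internal:current_id"]
    return [c for c in child_rows if c["internal:parent_id"] == pid]


# Hypothetical rows, not real dataset records.
level0 = [
    {"internal:current_id": 0, "split": "train", "cloud3d:cyclone": True,
     "stac:time_middle": 1_598_400_000_000_000,
     "majortom:code": "0100km_0003U_0005R"},
    {"internal:current_id": 1, "split": "test", "cloud3d:cyclone": False,
     "stac:time_middle": 1_598_400_000_000_000,
     "majortom:code": "0100km_0004U_0005R"},
]
level1 = [
    {"internal:current_id": 0, "internal:parent_id": 0},
    {"internal:current_id": 1, "internal:parent_id": 0},
    {"internal:current_id": 2, "internal:parent_id": 1},
]

train_cyclones = select(level0, "train", cyclone=True)
first = train_cyclones[0]
when = micros_to_utc(first["stac:time_middle"])
cell = parse_majortom(first["majortom:code"])
kids = children_of(first, level1)
```

Adding a `timedelta` to a fixed epoch keeps the conversion in integer arithmetic, which avoids the rounding that can occur when dividing microseconds into a float of seconds. If a reader library already returns `stac:time_middle` as a datetime, the conversion step is unnecessary.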