This repository hosts the AI-ready datasets for the paper "Global 3D Reconstruction of Clouds & Tropical Cyclones." It contains paired 2D geostationary imagery (GOES, MSG, Himawari) and 3D vertical profiles (CloudSat/CALIPSO) used to train ML models for 3D cloud reconstruction, with a dedicated dataset for tropical cyclones.
This is the Himawari finetuning subset of the Global 3D Cloud Reconstruction Dataset. It contains colocated pairs of Himawari-8/9 AHI geostationary imagery and CloudSat radar profiles for supervised 3D cloud structure reconstruction. Each sample includes multispectral Himawari imagery (16 spectral channels plus satellite and solar angles), CloudSat vertical profiles as ground truth, and a colocation mask indicating valid CloudSat footprint pixels. Samples are 256x256 pixel patches stored in Cloud-Optimized GeoTIFF format.
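Because CloudSat observes only a narrow track through each patch, a supervised loss would typically be evaluated only where the colocation mask is valid. The following is a minimal sketch of that idea, not the authors' training code: the function name `masked_mse`, the toy 2x3 arrays, and the convention that a truthy mask value marks a valid footprint pixel are all assumptions made for illustration.

```python
def masked_mse(prediction, target, mask):
    """Mean squared error over the pixels where `mask` is truthy.

    All three arguments are 2D lists with the same shape (rows x columns).
    Assumption: a truthy mask value marks a valid CloudSat footprint pixel.
    """
    total = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(prediction, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no valid pixels")
    return total / count


# Toy 2x3 patch: only the middle column lies on the hypothetical CloudSat track.
prediction = [[0.0, 1.0, 5.0], [0.0, 3.0, 5.0]]
target = [[9.0, 2.0, 9.0], [9.0, 3.0, 9.0]]
mask = [[0, 1, 0], [0, 1, 0]]

loss = masked_mse(prediction, target, mask)
print(loss)
```

Pixels outside the mask never enter the sum, so arbitrary values in the unobserved part of the target do not affect the result.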
Version: 0.1.0
License: CC-BY-4.0
Keywords: cloud microphysics, 3d reconstruction, geostationary satellites, Himawari-8, Himawari-9, CloudSat, remote sensing, tropical cyclones, deep learning
Tasks: regression, foundation-model
Partitions: 103 files
Spatial coverage: [-180.00, -43.62, 180.00, 43.67] (WGS84)
Temporal coverage: 2015-07-07 to 2020-08-26
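To check quickly whether a location and date fall inside the stated coverage, a small helper like the one below may be useful. It is a sketch only: the function name `in_coverage` is hypothetical, and the bounding-box order (min_lon, min_lat, max_lon, max_lat) is an assumption, since the coverage statement does not spell it out.

```python
from datetime import date

# Values copied from the coverage statement above.
# Assumption: the box is ordered (min_lon, min_lat, max_lon, max_lat).
BBOX = (-180.00, -43.62, 180.00, 43.67)
START, END = date(2015, 7, 7), date(2020, 8, 26)


def in_coverage(lon, lat, day):
    """Return True if the point and date lie within the dataset coverage."""
    min_lon, min_lat, max_lon, max_lat = BBOX
    inside_box = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    inside_time = START <= day <= END
    return inside_box and inside_time


print(in_coverage(140.0, 35.0, date(2018, 1, 1)))
print(in_coverage(140.0, 60.0, date(2018, 1, 1)))
```

Being inside the coverage box does not guarantee that a sample exists at that place and time; it only rules out queries that cannot match.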
Root: FOLDER (57,373 samples)
Hierarchy:
JMA (producer): https://www.jma.go.jp
European Space Agency (ESA) (licensor): https://www.esa.int
source.coop (host): https://source.coop
If you use this dataset in your research, please cite:
DOI: 10.48550/arXiv.2511.04773
Ermis, S., Aybar, C., Freischem, L., Girtsou, S., Bintsi, K.-M., Diaz Salas-Porras, E., Eisinger, M., Jones, W., Jungbluth, A., & Tremblay, B. (2025). Global 3D Reconstruction of Clouds & Tropical Cyclones. Tackling Climate Change with Machine Learning Workshop at NeurIPS 2025.
Primary publication describing the dataset and methodology
Generated with ❤️ using TacoToolbox v0.23.3
| Field | Type | Description |
|---|---|---|
| stac:time_middle | timestamp[us] | Middle timestamp (μs since Unix epoch, UTC) |
| split | string | Dataset partition identifier (train, test, or validation) |
| cloud3d:cyclone | bool | Whether this sample is from a tropical cyclone observation |
| cloud3d:satellite | string | Geostationary satellite source (GOES, Himawari, MSG) |
| cloud3d:geostationary_id | string | Original geostationary satellite file identifier |
| cloud3d:cloudsat_id | string | CloudSat granule/profile identifier |
| cloud3d:has_flxhr | bool | Whether 2B-FLXHR radiative flux/heating rate data is available |
| majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R) with ~dist_km spacing |
| geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM) |
| geoenrich:precipitation | float | Mean annual precipitation in mm estimated from GPM data |
| geoenrich:temperature | float | Mean annual temperature in °C estimated from MODIS LST data |
| geoenrich:admin_countries | string | Country name at centroid location |
| internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
| internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
| | | Relative path from DATA/ directory. Format: {parent_path}/{id} or {id} for level0 (ZIP, FOLDER, TACOCAT). |
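The fields listed above can be used to select samples once the metadata has been loaded into memory. The sketch below uses only the Python standard library and works on hand-written rows; it does not use the TacoToolbox reader. The row values, the helper names (`to_utc`, `select`, `parse_majortom`), the treatment of `stac:time_middle` as a raw integer, and the identifier pattern inferred from the single example `0100km_0003U_0005R` are all assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical rows using the documented column names; the values are made up.
rows = [
    {"internal:current_id": 0, "split": "train", "cloud3d:cyclone": True,
     "cloud3d:satellite": "Himawari", "majortom:code": "0100km_0003U_0005R",
     "stac:time_middle": 1_500_000_000_000_000},
    {"internal:current_id": 1, "split": "train", "cloud3d:cyclone": False,
     "cloud3d:satellite": "Himawari", "majortom:code": "0100km_0003U_0006R",
     "stac:time_middle": 1_500_000_600_000_000},
    {"internal:current_id": 2, "split": "test", "cloud3d:cyclone": True,
     "cloud3d:satellite": "Himawari", "majortom:code": "0100km_0004U_0005R",
     "stac:time_middle": 1_500_001_200_000_000},
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed pattern: spacing in km, then row and column labels with a direction letter.
MAJORTOM = re.compile(r"^(\d+)km_(\d+[UD])_(\d+[LR])$")


def to_utc(microseconds):
    """Convert microseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=microseconds)


def select(records, split, cyclone):
    """Keep records from one split with the given cyclone flag."""
    return [r for r in records
            if r["split"] == split and r["cloud3d:cyclone"] == cyclone]


def parse_majortom(code):
    """Split a grid code into (spacing_km, row_label, column_label)."""
    match = MAJORTOM.match(code)
    if match is None:
        raise ValueError(f"unrecognized grid code: {code!r}")
    spacing, row, column = match.groups()
    return int(spacing), row, column


train_cyclones = select(rows, "train", True)
for r in train_cyclones:
    print(r["internal:current_id"], to_utc(r["stac:time_middle"]).isoformat(),
          parse_majortom(r["majortom:code"]))
```

If the metadata is read through an Arrow-based loader, `stac:time_middle` will likely arrive as a timestamp object already, in which case the `to_utc` step is unnecessary.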