This repository hosts the AI-ready datasets for the paper "Global 3D Reconstruction of Clouds & Tropical Cyclones." It contains paired 2D geostationary imagery (GOES, MSG, Himawari) and 3D vertical profiles (CloudSat/CALIPSO) used to train ML models for 3D cloud reconstruction, with a dedicated dataset for tropical cyclones.
This is the Himawari-8 imagery subset of the Global 3D Cloud Reconstruction Dataset. It contains multispectral geostationary satellite imagery from Himawari-8/AHI for 3D cloud structure reconstruction. Each sample is a 512x512 pixel patch with 16 spectral channels, stored in Cloud-Optimized GeoTIFF format.
Version: 0.1.0
License: CC-BY-4.0
Keywords: cloud microphysics, 3d reconstruction, geostationary satellites, Himawari-8, remote sensing, tropical cyclones, deep learning
Tasks: regression, foundation-model
Partitions: 93 files
Spatial coverage: [-180.00, -41.19, 180.00, 51.02] (WGS84)
Temporal coverage: 2015-07-07 to 2022-12-12
Root: FILE (88,711 samples)
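The coverage values above can be used to check whether a point and date of interest could be represented in this subset. The following is a minimal sketch, not part of the dataset tooling: it assumes the bounding box is ordered as (min_lon, min_lat, max_lon, max_lat), and `in_coverage` is a hypothetical helper name.

```python
from datetime import date

# Values copied from the coverage lines above.
# Assumption: the bounding box order is (min_lon, min_lat, max_lon, max_lat).
BBOX = (-180.00, -41.19, 180.00, 51.02)
TIME_RANGE = (date(2015, 7, 7), date(2022, 12, 12))


def in_coverage(lon: float, lat: float, day: date) -> bool:
    """Return True if a WGS84 point and a date fall inside the stated coverage."""
    min_lon, min_lat, max_lon, max_lat = BBOX
    inside_bbox = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    inside_time = TIME_RANGE[0] <= day <= TIME_RANGE[1]
    return inside_bbox and inside_time


if __name__ == "__main__":
    print(in_coverage(140.0, 35.0, date(2019, 10, 12)))  # inside box and period
    print(in_coverage(140.0, 60.0, date(2019, 10, 12)))  # north of max latitude
```

A positive result only means the point lies inside the overall extent; it does not guarantee that a sample exists at that location and time.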
Japan Meteorological Agency (JMA) — producer
https://www.jma.go.jp
European Space Agency (ESA) — licensor
https://www.esa.int
source.coop — host
https://source.coop
If you use this dataset in your research, please cite:
DOI: 10.48550/arXiv.2511.04773
Ermis, S., Aybar, C., Freischem, L., Girtsou, S., Bintsi, K.-M., Diaz Salas-Porras, E., Eisinger, M., Jones, W., Jungbluth, A., & Tremblay, B. (2025). Global 3D Reconstruction of Clouds & Tropical Cyclones. Tackling Climate Change with Machine Learning Workshop at NeurIPS 2025.
Primary publication describing the dataset and methodology
Generated with ❤️ using TacoToolbox v0.23.3
timestamp[us] | Middle timestamp (μs since Unix epoch, UTC) |
geotiff:stats | list<item: list<item: float>> | Per-band statistics (List[List[Float32]]): categorical mode returns class probabilities, continuous mode returns [min, max, mean, std, valid%, p25, p50, p75, p95] |
cloud3d:satellite | string | Geostationary satellite platform (GOES, HIMAWARI, or MSG) |
cloud3d:cyclone | bool | Whether the sample contains tropical cyclone imagery |
majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R); the leading field gives the approximate grid spacing in km |
geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM) |
geoenrich:precipitation | float | Mean annual precipitation in mm estimated from GPM data |
geoenrich:temperature | float | Mean annual temperature in °C estimated from MODIS LST data |
geoenrich:admin_countries | string | Country name at centroid location |
internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
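
Several of the fields above need a small decoding step before they are convenient to use. The sketch below shows one way to do that in plain Python, assuming the values have already been read into native Python objects. All function names are hypothetical, and the split of `majortom:code` into spacing, row, and column parts is an assumption inferred from the example identifier, not a documented parser.

```python
from datetime import datetime, timedelta, timezone

# Order of the continuous-mode statistics as listed for geotiff:stats.
STAT_NAMES = ["min", "max", "mean", "std", "valid_pct", "p25", "p50", "p75", "p95"]


def label_band_stats(stats):
    """Turn per-band continuous-mode lists into one dict per band."""
    labeled = []
    for band in stats:
        if len(band) != len(STAT_NAMES):
            # Categorical mode stores class probabilities instead, so the
            # length will generally differ from the nine continuous values.
            raise ValueError("expected 9 continuous-mode values per band")
        labeled.append(dict(zip(STAT_NAMES, band)))
    return labeled


def micros_to_utc(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=micros)


def parse_majortom_code(code: str) -> dict:
    """Split a code such as '0100km_0003U_0005R' into its three parts.

    Assumption: the parts are '<spacing>km', a row label, and a column label.
    """
    spacing, row, col = code.split("_")
    if not spacing.endswith("km"):
        raise ValueError("unexpected spacing field: " + spacing)
    return {"spacing_km": int(spacing[:-2]), "row": row, "col": col}


if __name__ == "__main__":
    band = [0.0, 1.0, 0.4, 0.2, 98.5, 0.25, 0.4, 0.55, 0.8]  # made-up values
    print(label_band_stats([band])[0]["mean"])
    print(micros_to_utc(1_436_227_200_000_000).isoformat())
    print(parse_majortom_code("0100km_0003U_0005R"))
```

Using integer microseconds with `timedelta` avoids the floating-point rounding that can occur when dividing by one million and calling `datetime.fromtimestamp`.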