This repository hosts the AI-ready datasets for the paper "Global 3D Reconstruction of Clouds & Tropical Cyclones." It contains paired 2D geostationary imagery (GOES, MSG, Himawari) and 3D vertical profiles (CloudSat/CALIPSO) used to train ML models for 3D cloud reconstruction, with a dedicated dataset for tropical cyclones.
Cesar Aybar¹ · Shirin Ermis² · Lilli Freischem² · Stella Girtsou³⁴ · Kyriaki-Margarita Bintsi⁵ · Emiliano Diaz Salas-Porras¹ · Michael Eisinger⁶ · William Jones² · Anna Jungbluth⁶ · Benoit Tremblay⁷
¹Universitat de València · ²University of Oxford · ³National Observatory of Athens · ⁴National Technical University of Athens · ⁵Harvard Medical School & Massachusetts General Hospital · ⁶European Space Agency · ⁷Environment and Climate Change Canada
Cloud3DTACO pairs 2D multispectral geostationary imagery from GOES-16, Himawari-8/9, and MSG/SEVIRI with co-located 3D vertical cloud profiles from the CloudSat radar. The collection contains roughly 500,000 samples across 10 datasets, organized into a self-supervised pretraining stage (no labels) and a supervised finetuning stage (CloudSat profiles as ground truth). Each stage covers the three operational geostationary regions and includes a tropical-cyclone variant. The dataset is published as TACO v3 and follows FAIR principles.
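As a rough illustration of how the two kinds of sample could be told apart on disk, the sketch below checks which files exist for a given sample index. It relies on the path patterns this README gives (DATA/{idx}.tif, and DATA/{idx}/geo_patch.tif with DATA/{idx}/cloudsat_aligned.tif) and assumes the files have been extracted into a local directory with exactly that layout, which may not match how TACO v3 actually packages them. The function name `sample_files`, the `root` argument, and the "single"/"paired" labels are hypothetical and are not part of any Cloud3DTACO or TACO API.

```python
from pathlib import Path
from typing import Optional


def sample_files(root: Path, idx: str) -> Optional[dict]:
    """Locate the files for sample `idx` under `root` (hypothetical helper).

    Returns a dict describing the layout that was found, or None when
    neither of the two path patterns exists on disk.
    """
    data_dir = root / "DATA"

    # Pattern with labels: DATA/{idx}/geo_patch.tif + DATA/{idx}/cloudsat_aligned.tif
    geo = data_dir / idx / "geo_patch.tif"
    labels = data_dir / idx / "cloudsat_aligned.tif"
    if geo.is_file() and labels.is_file():
        return {"layout": "paired", "image": geo, "labels": labels}

    # Pattern without labels: DATA/{idx}.tif
    single = data_dir / f"{idx}.tif"
    if single.is_file():
        return {"layout": "single", "image": single, "labels": None}

    return None
```

Calling `sample_files(Path("/path/to/extracted/dataset"), "0")` returns the paths without opening the rasters, so reading the GeoTIFFs themselves is left to whichever raster library is preferred.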
Samples follow one of two file layouts: DATA/{idx}.tif, or the pair DATA/{idx}/geo_patch.tif (geostationary imagery) and DATA/{idx}/cloudsat_aligned.tif (vertical cloud profile labels).

This work was enabled by the Frontier Development Lab (FDL) Earth Systems Lab, a public-private partnership between the European Space Agency (ESA), Trillium Technologies, and the University of Oxford. We thank the teams behind the GOES, Himawari, and Meteosat satellites for their invaluable data. This research was supported by computational resources from Google Cloud.