Fields of the World · Wherobots · Source Cooperative
Fields of the World
High-resolution Sentinel-2 mosaics and AI-derived field-boundary probability maps for Japan, Mexico, Rwanda, South Africa, and Switzerland. This repository contains cloud-optimized Zarr datasets produced using the WherobotsAI platform during the FTW Phase 2 Model Bakeoff. Includes bi-temporal mosaics (Planting and Harvest seasons) and raw softmax model outputs for agricultural field segmentation.
Fields of the World: Country-Scale Field Boundary Predictions
Country-scale Sentinel-2 mosaics and AI-derived field boundary probability maps for Japan, Mexico, Rwanda, South Africa, and Switzerland.
This dataset contains the raster data generated with the WherobotsAI platform while evaluating models from the FTW Phase 2 Model Bakeoff. It includes both the input imagery (seasonally optimized mosaics) and the raw model outputs (softmax predictions), hosted in cloud-optimized Zarr format.
The model architecture and training pipeline are fully open source. You can run inference, reproduce our baselines, or train your own models using the Fields of the World (FTW) repository and the FTW CLI tool.
Installation:
pip install ftw-tools
Documentation & Source Code:
For tutorials and usage instructions, visit the ftw-baselines repository.
🌍 Dataset Coverage
The dataset covers five agricultural systems across two seasons (planting and harvest) and two years (2023 and 2024), totaling 4.76 million km². These data are the inference outputs of models produced during the FTW Phase 2 Model Bakeoff.
| Country | Area Processed (million km²) | Median Predicted Field Area (ha) |
|---|---|---|
| Mexico | 2.39 | 0.09 |
| South Africa | 1.60 | 0.07 |
| Japan | 0.65 | 0.19 |
| Switzerland | 0.09 | 0.28 |
| Rwanda | 0.02 | 0.06 |
🏗 Model Architecture
These predictions were generated with models from the FTW Phase 2 Model Bakeoff, which include U-Net architectures with EfficientNet encoders, optimized for robust deployment at scale.
Robustness: Trained with Log-Cosh Dice Loss and targeted augmentations (channel shuffling, brightness, resizing) to handle atmospheric variations in large-scale mosaics.
Input Processing: Mosaics use latitude-based heuristics to select optimal planting and harvest windows, producing cloud-free, seasonally aligned composites.
Scale: The inference pipeline processes millions of tiles, using Gaussian-weighted averaging to merge overlapping patches; this reduces tiling artifacts and produces seamless probability maps across country borders.
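The Log-Cosh Dice Loss named above wraps the soft Dice loss in log(cosh(·)). The sketch below is a minimal, single-class illustration in plain Python over flattened per-pixel probabilities and a binary target mask; the `smooth` constant, the function names, and the single-class form are assumptions, not the FTW training code (for the three FTW classes one might, for example, average the loss over classes).

```python
import math
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], smooth: float = 1.0) -> float:
    """Soft Dice loss: 1 - (2 * sum(P*T) + smooth) / (sum(P) + sum(T) + smooth)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


def log_cosh_dice_loss(pred: Sequence[float], target: Sequence[float], smooth: float = 1.0) -> float:
    """Log-Cosh Dice loss: log(cosh(Dice loss)), a smoothed variant of the Dice loss."""
    return math.log(math.cosh(soft_dice_loss(pred, target, smooth)))


# Flattened per-pixel "field" probabilities compared against a binary field mask.
perfect = log_cosh_dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
poor = log_cosh_dice_loss([0.0, 1.0, 0.0], [1.0, 0.0, 1.0])
```

Because log(cosh(x)) behaves like x²/2 for small x and like |x| minus log 2 for large x, the wrapper keeps the Dice objective while smoothing its gradient near zero loss.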
⚙️ Inference Pipeline
The high-throughput inference pipeline runs on the WherobotsAI platform and prioritizes spatial consistency and cloud-native efficiency.
Seasonally-Aligned Mosaicking: Uses latitude-dependent heuristics to identify optimal Planting and Harvest windows for each tile. A greedy selection algorithm constructs cloud-free composites from Sentinel-2 imagery, minimizing temporal redundancy while maximizing coverage.
Sliding Window Inference with Apodization: Processes data in 256×256 patches with 25% overlap. Gaussian-weighted averaging (apodization) blends the overlapping patches, down-weighting lower-quality predictions near patch edges in favor of spatially consistent predictions from patch centers; this reduces tiling artifacts.
Parallel Zarr Construction: Handles reprojection to Web Mercator (EPSG:3857) on the fly. Results are written in parallel directly to Zarr stores, producing seamless, cloud-optimized probability maps for entire countries in minutes.
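The sliding-window step with Gaussian apodization can be sketched in plain Python as below. The 256×256 patch size and 25% overlap come from the description above; the Gaussian width (`sigma_frac=0.125`, i.e. σ = patch/8), the handling of the final patch (shifted so it ends flush with the image edge), and the `predict_patch` callback standing in for the model are illustrative assumptions, not the WherobotsAI implementation.

```python
import math
from typing import Callable, List

Grid = List[List[float]]


def window_starts(length: int, patch: int, stride: int) -> List[int]:
    """Patch start offsets along one axis; the last patch is shifted to end at the edge."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def gaussian_window(patch: int, sigma_frac: float = 0.125) -> Grid:
    """2D Gaussian weights that peak at the patch centre and decay toward the edges."""
    sigma = sigma_frac * patch
    centre = (patch - 1) / 2.0
    w1d = [math.exp(-((i - centre) ** 2) / (2.0 * sigma**2)) for i in range(patch)]
    return [[wy * wx for wx in w1d] for wy in w1d]


def sliding_window_predict(
    image_h: int,
    image_w: int,
    predict_patch: Callable[[int, int, int], Grid],
    patch: int = 256,
    overlap: float = 0.25,
) -> Grid:
    """Blend overlapping patch predictions with Gaussian weights (apodization)."""
    if image_h < patch or image_w < patch:
        raise ValueError("image must be at least one patch in each dimension (pad it first)")
    stride = max(1, int(patch * (1.0 - overlap)))  # 25% overlap -> stride 192 for 256 px
    weights = gaussian_window(patch)
    acc = [[0.0] * image_w for _ in range(image_h)]   # weighted sum of predictions
    norm = [[0.0] * image_w for _ in range(image_h)]  # sum of weights per pixel
    for y0 in window_starts(image_h, patch, stride):
        for x0 in window_starts(image_w, patch, stride):
            pred = predict_patch(y0, x0, patch)  # patch x patch grid of probabilities
            for dy in range(patch):
                for dx in range(patch):
                    w = weights[dy][dx]
                    acc[y0 + dy][x0 + dx] += w * pred[dy][dx]
                    norm[y0 + dy][x0 + dx] += w
    # Every pixel is covered by at least one patch and Gaussian weights are positive.
    return [[a / n for a, n in zip(row_a, row_n)] for row_a, row_n in zip(acc, norm)]


# Toy run: a "model" that predicts 0.7 everywhere on a 10 x 12 image with 4 px patches.
blended = sliding_window_predict(10, 12, lambda y0, x0, p: [[0.7] * p for _ in range(p)], patch=4)
```

Because each output pixel is a weighted average, a model that returns a constant value yields that value back (up to floating-point rounding); where patches overlap, the low edge weights let the neighboring patch's centre dominate. A production version would vectorize these loops (for example with NumPy or PyTorch) and blend each class's probability array.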
💾 Data Structure (Zarr)
The data is hosted as cloud-optimized Zarr stores. Each store corresponds to a country and contains high-resolution raster arrays with time coordinates for 2023 and 2024:
Mosaics: "valid-pixel" composites, filtered by cloud cover.
Model Predictions:
Raw Softmax Probabilities: Pixel-wise probabilities for the background, field, and field-boundary classes.
The raw probabilities let you apply your own thresholds, watershed segmentation, or vectorization parameters.
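As an example of custom post-processing, the sketch below turns per-pixel (background, field, boundary) probabilities into labelled field instances: a pixel counts as field interior when its field probability passes a threshold and exceeds its boundary probability, and 4-connected interior pixels are grouped into one field. The channel order, the `field_threshold` value, and the use of simple connected-component labelling in place of watershed segmentation are assumptions for illustration.

```python
from collections import deque
from typing import List, Sequence, Tuple

# probs[row][col] -> (p_background, p_field, p_boundary); channel order is assumed.
Probs = Sequence[Sequence[Tuple[float, float, float]]]


def field_mask(probs: Probs, field_threshold: float = 0.5) -> List[List[bool]]:
    """Mark pixels whose field probability passes the threshold and beats the boundary class."""
    return [
        [p_field >= field_threshold and p_field > p_boundary for _, p_field, p_boundary in row]
        for row in probs
    ]


def label_fields(mask: List[List[bool]]) -> Tuple[List[List[int]], int]:
    """4-connected component labelling; returns (labels, count), where 0 means 'not a field'."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1  # start a new field at this unlabelled interior pixel
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over 4-neighbours
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


F, B, G = (0.1, 0.8, 0.1), (0.1, 0.2, 0.7), (0.9, 0.05, 0.05)  # field, boundary, background
labels, n_fields = label_fields(field_mask([[F, F, B, F], [F, F, B, F], [G, G, G, G]]))
```

In the toy grid, the column of boundary pixels separates two fields. Raising `field_threshold` gives more conservative interiors, which tends to split adjacent fields at the cost of shrinking them.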
🚀 Usage (Data Access)
The dataset uses the cloud-native Zarr format, so you can read subsets for analysis with Python libraries such as xarray and zarr without downloading the entire dataset.
1. Setup and Define AOI
Define a sample Area of Interest (AOI) over Japan:
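A minimal sketch of this step, assuming the AOI is given as a lon/lat bounding box: because the stores are written in Web Mercator (EPSG:3857), the box is projected to EPSG:3857 metres with the standard spherical Mercator formulas before slicing. The `BBox` type, the helper names, and the sample AOI around the Tokachi plain in Hokkaido are illustrative choices, not part of the dataset.

```python
import math
from typing import NamedTuple, Tuple

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857
MAX_LAT = 85.05112878       # latitude where Web Mercator's square extent ends


class BBox(NamedTuple):
    min_x: float  # min longitude (degrees) or min easting (metres)
    min_y: float  # min latitude (degrees) or min northing (metres)
    max_x: float
    max_y: float


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project WGS 84 lon/lat in degrees to EPSG:3857 metres (spherical Mercator)."""
    if not -MAX_LAT <= lat <= MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def aoi_to_web_mercator(aoi: BBox) -> BBox:
    """Convert a lon/lat bounding box to EPSG:3857 bounds."""
    min_x, min_y = lonlat_to_web_mercator(aoi.min_x, aoi.min_y)
    max_x, max_y = lonlat_to_web_mercator(aoi.max_x, aoi.max_y)
    return BBox(min_x, min_y, max_x, max_y)


# Hypothetical sample AOI around the Tokachi plain, Hokkaido (lon/lat degrees).
JAPAN_AOI = BBox(min_x=142.9, min_y=42.8, max_x=143.3, max_y=43.1)
japan_bounds = aoi_to_web_mercator(JAPAN_AOI)
```

Assuming a store exposes `x`/`y` coordinates in EPSG:3857 metres, the bounds could then be used with xarray after `xarray.open_zarr(...)`, e.g. `ds.sel(x=slice(b.min_x, b.max_x), y=slice(b.max_y, b.min_y))`; the dimension names and the descending `y` order are assumptions to check against the store's metadata.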
If you use this data in your research, please cite:
@inproceedings{kerner2025fields,
  title={Fields of the world: A machine learning benchmark dataset for global agricultural field boundary segmentation},
  author={Kerner, Hannah and Chaudhari, Snehal and Ghosh, Aninda and Robinson, Caleb and Ahmad, Adeel and Choi, Eddie and Jacobs, Nathan and Holmes, Chris and Mohr, Matthias and Dodhia, Rahul and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={27},
  pages={28151--28159},
  year={2025}
}
📄 License
The model checkpoints currently used to produce these predictions were trained on the full FTW dataset, which includes CC-BY-NC labels, so the predictions are licensed CC-BY-NC. We will release predictions from models trained only on the CC-BY-licensed labels in the next few weeks.