Major TOM ELLIOT-Pretrain

ELLIOT-Pretrain is a Major TOM expansion focused on fast pre-training of multi-modal AI foundation models on Earth Observation data.

Built on the original Major TOM global 10 km grid
10.56 km tiles with harmonised 10 m resolution across all sensors for fast access
Four co-registered modalities in every tile: optical, thermal, elevation, and land cover
Three sub-datasets covering complementary temporal regimes

Tile locations were selected using hierarchical spherical k-means clustering over AlphaEarth Foundation embeddings to maximise global environmental diversity. The dataset follows the TACO v3 specification.
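To make the selection idea concrete, here is a minimal sketch of spherical k-means (k-means using cosine similarity on unit-normalised vectors) applied in two levels. It is an illustration under assumptions, not the authors' pipeline: the random vectors stand in for real embeddings, the function names, the 64 dimensions, and the cluster counts are all hypothetical, and picking the point nearest each cluster's mean direction is only one plausible way to turn clusters into locations.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit sphere."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def spherical_kmeans(x, k, n_iter=20, seed=0):
    """Cluster rows of x by cosine similarity; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    x = l2_normalize(x)
    # Initialise centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the centroid with the highest cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:
                # New centroid: mean direction, re-projected onto the sphere.
                centroids[c] = l2_normalize(members.sum(axis=0, keepdims=True))[0]
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids


def hierarchical_spherical_kmeans(x, k_top, k_sub, seed=0):
    """Two levels: split into k_top groups, then split each group into k_sub."""
    top, _ = spherical_kmeans(x, k_top, seed=seed)
    labels = np.zeros(len(x), dtype=np.int64)
    for c in range(k_top):
        idx = np.flatnonzero(top == c)
        if len(idx) == 0:
            continue
        sub, _ = spherical_kmeans(x[idx], min(k_sub, len(idx)), seed=seed + 1 + c)
        labels[idx] = c * k_sub + sub  # unique id per leaf cluster
    return labels


def pick_representatives(x, labels):
    """Per leaf cluster, pick the point closest to the cluster's mean direction."""
    x = l2_normalize(x)
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        center = l2_normalize(x[idx].sum(axis=0, keepdims=True))[0]
        picks.append(idx[np.argmax(x[idx] @ center)])
    return np.array(picks)


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 64))  # hypothetical stand-in for real embeddings
leaf_labels = hierarchical_spherical_kmeans(embeddings, k_top=4, k_sub=5)
selected = pick_representatives(embeddings, leaf_labels)
print(len(selected), "representative locations selected")
```

Because every point and centroid is kept on the unit sphere, the dot product equals cosine similarity, so clusters group embeddings by direction rather than magnitude.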

Sub-datasets

Part	Name	Tiles	Timesteps	Cadence	Purpose
A	Monotemporal	250,000	1	n/a	Global coverage, spatial diversity
B	Monthly	12,500	12	monthly	Seasonal dynamics, phenology
C	Burst	16,666	6	~5-day	Rapid events, floods, fires, landslides

All three parts share the same four modalities.

Modality	Source	Resolution	Bands
Sentinel-2 L1C	ESA Copernicus	10 m	13 spectral
Landsat 8/9 OLI-TIRS	USGS	30 m	11 (9 OLI + 2 TIRS)
Copernicus DEM GLO-30	ESA / TanDEM-X	30 m	1 (elevation)
ESA WorldCover	ESA	10 m	1 (land cover class)
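The figures in the two tables imply a few sizes that are handy when budgeting storage or batch shapes. The short calculation below is a sketch that assumes tile edges are exactly 10.56 km, that every modality is delivered on the harmonised 10 m grid, and that every tile has all of its timesteps; the variable names are illustrative.

```python
# Figures copied from the tables above.
TILE_SIZE_M = 10_560  # 10.56 km tile edge
PIXEL_SIZE_M = 10     # harmonised resolution

PARTS = {
    "A": {"tiles": 250_000, "timesteps": 1},
    "B": {"tiles": 12_500, "timesteps": 12},
    "C": {"tiles": 16_666, "timesteps": 6},
}
BANDS = {"s2": 13, "l8": 11, "dem": 1, "lc": 1}

# Pixels along one tile edge at the harmonised resolution.
pixels_per_side = TILE_SIZE_M // PIXEL_SIZE_M

# Number of (tile, timestep) pairs per part for the time-varying modalities.
tile_timesteps = {name: p["tiles"] * p["timesteps"] for name, p in PARTS.items()}

# Channels available when all four modalities are stacked for one timestep.
total_bands = sum(BANDS.values())

print(pixels_per_side)               # 1056
print(tile_timesteps)                # {'A': 250000, 'B': 150000, 'C': 99996}
print(sum(tile_timesteps.values()))  # 499996
print(total_bands)                   # 26
```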

Quick Start

Pick a tile index from Part A (monotemporal) and visualize all four modalities.

Reproducible Example

Full notebook covering all three parts, metadata queries with filtering, and a streaming PyTorch DataLoader with parallel fetching.

Dataset Structure

The join key across the metadata files is internal:parent_id, which points back to internal:current_id in collection.parquet.
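As an illustration of that join, the sketch below merges a per-modality sample table onto the collection table with pandas. Only the two key column names come from the text above; the toy rows and the `tile` and `cloud_cover` columns are hypothetical, and the real schema may differ.

```python
import pandas as pd

# Hypothetical toy rows; the real tables live in METADATA/*.parquet.
collection = pd.DataFrame({
    "internal:current_id": [0, 1, 2],
    "tile": ["tile_a", "tile_b", "tile_c"],  # hypothetical column
})
sample_s2 = pd.DataFrame({
    "internal:parent_id": [0, 1, 2],
    "cloud_cover": [3.0, 41.5, 12.2],        # hypothetical column
})

# Each sample row points at exactly one collection row via its parent id.
joined = sample_s2.merge(
    collection,
    left_on="internal:parent_id",
    right_on="internal:current_id",
    how="left",
    validate="many_to_one",  # fail loudly if collection ids are not unique
)

# Example filter on the joined table.
clear = joined[joined["cloud_cover"] < 20]
print(clear[["tile", "cloud_cover"]])
```

With the real files, `pd.read_parquet` on the METADATA paths would take the place of the toy frames; that requires a parquet engine such as pyarrow.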

Citation

License

CC-BY-SA-4.0

Acknowledgements

ELLIOT-Pretrain has been made possible thanks to Asterisk Labs, the ELLIOT project (European Commission, Horizon Europe, Grant 101214398), and the Image and Signal Processing Group (ISP) at Universitat de Valencia.

elliot-pretrain/
├── monotemporal/          # Part A, 250k tiles, 1 timestep
│   ├── COLLECTION.json
│   ├── METADATA/
│   │   ├── collection.parquet
│   │   ├── sample__s2.parquet
│   │   ├── sample__l8.parquet
│   │   ├── sample__dem.parquet
│   │   └── sample__lc.parquet
│   └── DATA/{tile_id}/
│       ├── s2/data.tif
│       ├── l8/data.tif
│       ├── dem/data.tif
│       └── lc/data.tif
├── monthly/               # Part B, 12.5k tiles, 12 timesteps
│   ├── COLLECTION.json
│   ├── METADATA/          # same 5 parquet files
│   └── DATA/{tile_id}/
│       ├── s2/img_00.tif ... img_11.tif
│       ├── l8/img_00.tif ... img_11.tif
│       ├── dem/main.tif
│       └── lc/main.tif
└── burst/                 # Part C, 16.7k tiles, 6 timesteps (~5-day)
    ├── COLLECTION.json
    ├── METADATA/          # same 5 parquet files
    └── DATA/{tile_id}/
        ├── s2/img_00.tif ... img_05.tif
        ├── l8/img_00.tif ... img_05.tif
        ├── dem/main.tif
        └── lc/main.tif
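The layout above can be turned into file paths programmatically. The helper below is a hypothetical sketch (`tile_urls` is not part of any published API): it assumes timestep files are numbered with two-digit zero padding, as the `img_00.tif ... img_11.tif` range suggests, and reuses the base URL and the GDAL `/vsicurl/` prefix from the Quick Start code.

```python
BASE = "https://data.source.coop/major-tom/elliot-pretrain"

# Timesteps per part, read off the directory layout.
TIMESTEPS = {"monotemporal": 1, "monthly": 12, "burst": 6}
TEMPORAL_MODS = ("s2", "l8")  # one file per timestep
STATIC_MODS = ("dem", "lc")   # one file per tile


def tile_urls(part, tile_id, base=BASE):
    """Return {modality: [GDAL /vsicurl/ paths]} for one tile (hypothetical helper)."""
    if part not in TIMESTEPS:
        raise ValueError(f"unknown part: {part!r}")
    root = f"/vsicurl/{base}/{part}/DATA/{tile_id}"
    if part == "monotemporal":
        # Part A stores a single data.tif per modality.
        return {m: [f"{root}/{m}/data.tif"] for m in TEMPORAL_MODS + STATIC_MODS}
    # Parts B and C store one image per timestep for the time-varying modalities.
    urls = {
        m: [f"{root}/{m}/img_{t:02d}.tif" for t in range(TIMESTEPS[part])]
        for m in TEMPORAL_MODS
    }
    # Static layers are stored once per tile as main.tif.
    urls.update({m: [f"{root}/{m}/main.tif"] for m in STATIC_MODS})
    return urls


for path in tile_urls("burst", 42)["s2"]:
    print(path)
```

Each returned path can be passed to `rasterio.open` in the same way as in the Quick Start code.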

import numpy as np
import rasterio
import matplotlib.pyplot as plt

BASE = "https://data.source.coop/major-tom/elliot-pretrain"
IDX = 42  # change this to explore different tiles

MODS = ["s2", "l8", "dem", "lc"]
fig, axes = plt.subplots(1, 4, figsize=(20, 5))

for j, mod in enumerate(MODS):
    # The /vsicurl/ prefix lets GDAL read the remote GeoTIFF over HTTP
    url = f"/vsicurl/{BASE}/monotemporal/DATA/{IDX}/{mod}/data.tif"
    with rasterio.open(url) as src:
        if mod in ("s2", "l8"):
            # Bands 4, 3, 2 are red, green, blue, assuming native band order
            rgb = src.read([4, 3, 2]).astype(np.float32)
            # Scale to [0, 1] for display and move bands to the last axis
            axes[j].imshow(np.clip(rgb / 3000, 0, 1).transpose(1, 2, 0))
        elif mod == "dem":
            # Single-band elevation
            axes[j].imshow(src.read(1), cmap="terrain")
        else:
            # Single-band land cover classes
            axes[j].imshow(src.read(1), cmap="tab20")
    axes[j].set_title(mod)
    axes[j].axis("off")

plt.tight_layout()
plt.show()

@inproceedings{francis2024majortom,
  title={Major TOM: Expandable Datasets for Earth Observation},
  author={Francis, Alistair and Czerkawski, Mikolaj},
  booktitle={IGARSS 2024},
  pages={2935--2940},
  year={2024},
  doi={10.1109/IGARSS53475.2024.10640760}
}
