Index
Global metadata catalog for the MajorTOM 10 km grid. Over 5 million tiles enriched with terrain, climate, soil, socioeconomic, and administrative attributes. GeoParquet format.
The Major TOM Index is a global metadata catalog for the Major TOM grid at 10 km resolution. It provides a single entry point to discover, filter, and select tiles across sensors, locations, and time without downloading any imagery.
The index covers over 5 million tiles spanning the entire Earth. Each tile corresponds to a 1056 × 1056 px patch (10.56 × 10.56 km) aligned to Sentinel-2 MGRS tiles at 10 m resolution. Every tile is enriched with terrain, climate, soil, socioeconomic, and administrative attributes derived from public Earth Engine datasets.
What can you do with this index?
Find tiles by location. Filter by country, state, MGRS tile code, or bounding box using the GeoParquet geometry column.
Select tiles by environmental criteria. Want arid, high-elevation tiles? Filter by climate:precipitation < 200 and terrain:elevation > 3000.
Stratify sampling for training sets. Use the enrichment columns to build geographically and environmentally balanced splits for foundation model pretraining.
Link to imagery. The land_s2 and land_l8 files include sensor-specific image IDs (s2:id_gee, l8:id_gee) that point directly to the source products in Google Earth Engine.
Use the ELLIOT splits. The elliot.parquet file provides pre-built monotemporal and temporal splits designed for multi-sensor, multi-temporal EO research.
All files are self-contained GeoParquet with ZSTD compression, sorted by majortom:code_1000km → majortom:code_100km → id for efficient spatial predicate pushdown.
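As a minimal sketch of the stratified-sampling idea above, the following groups tiles into coarse strata using two enrichment columns and draws the same number of tiles from each. The tile ids, values, thresholds, and the helper names `stratum` and `stratified_sample` are all hypothetical; real rows would come from one of the index files.

```python
import random
from collections import defaultdict

# Toy rows standing in for index records (values are made up for illustration).
tiles = [
    {"id": "tile_a", "climate:precipitation": 120.0, "terrain:elevation": 3400.0},
    {"id": "tile_b", "climate:precipitation": 80.0, "terrain:elevation": 4100.0},
    {"id": "tile_c", "climate:precipitation": 150.0, "terrain:elevation": 3900.0},
    {"id": "tile_d", "climate:precipitation": 1800.0, "terrain:elevation": 200.0},
    {"id": "tile_e", "climate:precipitation": 950.0, "terrain:elevation": 600.0},
    {"id": "tile_f", "climate:precipitation": 60.0, "terrain:elevation": 300.0},
]


def stratum(tile):
    """Assign a tile to an (aridity, elevation) stratum; thresholds are illustrative."""
    aridity = "arid" if tile["climate:precipitation"] < 200 else "humid"
    height = "high" if tile["terrain:elevation"] > 3000 else "low"
    return (aridity, height)


def stratified_sample(rows, per_stratum, seed=0):
    """Draw up to `per_stratum` tiles from every stratum, reproducibly."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[stratum(row)].append(row)
    sample = []
    for key in sorted(groups):  # sorted so the result does not depend on input order of strata
        members = groups[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


picked = stratified_sample(tiles, per_stratum=2)
print([t["id"] for t in picked])
```

Strata with fewer tiles than requested contribute everything they have, so rare environments are never dropped, only under-filled.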
Schema
Columns are organized into namespaces. Each namespace groups related attributes.
Grid (majortom:)
Tile identity and spatial reference within the Major TOM grid system.
| Column | Type | Description |
|---|---|---|
| id | string | Unique tile identifier (e.g. MT10_770U_395R). |
| majortom:code_100km | string | Parent 100 km grid cell. Used for spatial grouping. |
| majortom:code_1000km | string | Parent 1000 km grid cell. Used for coarse-level partitioning. |
| majortom:crs | string | Native UTM CRS of the tile (e.g. EPSG:32647). |
| majortom:mgrs_tile | string | MGRS tile code (e.g. 47WNS). Links to Sentinel-2 tiling grid. |
| majortom:mgrs_n | uint8 | Number of overlapping MGRS tiles (1 after deduplication). |
| majortom:mgrs_candidates | list<string> | All candidate MGRS tiles before deduplication. |
| majortom:footprint_pct | float | Percentage of tile covered by the assigned MGRS tile. |
STAC (stac:)
Spatial and temporal reference following STAC conventions. Present in land_s2 and land_l8 only, where it replaces the majortom: grid columns.
| Column | Type | Description |
|---|---|---|
| stac:crs | string | Coordinate reference system. |
| stac:geotransform | list<int64> | Affine geotransform for the image patch. |
| stac:tensor_shape | list<int32> | Shape of the image tensor [bands, height, width]. |
| stac:time_start | int64 | Acquisition start time (Unix timestamp). |
| stac:time_end | int64 | Acquisition end time (Unix timestamp). |
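The geotransform and tensor shape together locate a patch in its CRS. The sketch below derives the bounding box of a patch, assuming the six values follow the GDAL ordering (x origin, pixel width, row rotation, y origin, column rotation, pixel height); the index does not state the ordering here, so check it against a real row first. The sample values and the band count are hypothetical.

```python
def patch_bounds(geotransform, tensor_shape):
    """Return (x_min, y_min, x_max, y_max) of a patch in CRS units.

    Assumes GDAL ordering: x = x0 + col*dx + row*rx, y = y0 + col*ry + row*dy.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    _, height, width = tensor_shape  # [bands, height, width]
    xs, ys = [], []
    # Evaluate the affine transform at the four outer pixel corners.
    for col, row in [(0, 0), (width, 0), (0, height), (width, height)]:
        xs.append(x0 + col * dx + row * rx)
        ys.append(y0 + col * ry + row * dy)
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical north-up patch: 10 m pixels, 1056 x 1056 px, 4 bands.
bounds = patch_bounds([500000, 10, 0, 4000000, 0, -10], [4, 1056, 1056])
print(bounds)  # (500000, 3989440, 510560, 4000000)
```

With 10 m pixels the box is 10,560 m on each side, which matches the 10.56 × 10.56 km patch size given for the grid.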
Sentinel-2 (s2:)
Sensor metadata for the assigned Sentinel-2 image. Present in land_s2 only.
| Column | Type | Description |
|---|---|---|
| s2:id_gee | string | Google Earth Engine image ID. Use this to fetch the actual imagery. |
| s2:product_id | string | ESA product identifier. |
| s2:spacecraft | string | Spacecraft name (Sentinel-2A or Sentinel-2B). |
| s2:processing_baseline | string | Processing baseline version. |
| s2:orbit_number | uint16 | Relative orbit number. |
| s2:mean_solar_azimuth | float | Mean solar azimuth angle, averaged across all bands and detectors (degrees). |
| s2:mean_solar_zenith | float | Mean solar zenith angle, averaged across all bands and detectors (degrees). |
| s2:mean_view_azimuth | float | Mean viewing azimuth angle from band B8 (degrees). |
| s2:mean_view_zenith | float | Mean viewing zenith angle from band B8 (degrees). |
Note on solar vs viewing angles. The sun has a single position relative to the scene, so ESA provides one solar azimuth and one solar zenith averaged across all bands. Viewing angles are different: Sentinel-2 uses a pushbroom sensor where each spectral band has its own detector array in the focal plane, each observing from a slightly different angle. That is why GEE provides per-band viewing angles (MEAN_INCIDENCE_*_ANGLE_B1 through _B12). We use band B8 (NIR, 10 m) as the reference because it is at native 10 m resolution and sits near the center of the focal plane, making it a representative proxy for the viewing geometry of the 10 m and 20 m bands.
Landsat 8/9 (l8:)
Sensor metadata for the assigned Landsat image. Present in land_l8 only.
| Column | Type | Description |
|---|---|---|
| l8:id_gee | string | Google Earth Engine image ID. Use this to fetch the actual imagery. |
| l8:product_id | string | USGS product identifier. |
| l8:spacecraft | string | Spacecraft name (Landsat 8 or Landsat 9). |
| l8:collection_number | uint8 | USGS Collection number. |
| l8:collection_category | string | Collection category (T1, T2, RT). |
| l8:processing_software | string | Processing software version. |
| l8:wrs_path | uint16 | WRS-2 path number. |
| l8:wrs_row | uint16 | WRS-2 row number. |
| l8:cloud_cover | float | Scene cloud cover percentage. |
Terrain (terrain:)
| Column | Type | Range | Description |
|---|---|---|---|
| terrain:elevation | float | ~-420 to 8,849 (m) | Mean elevation in meters from the Copernicus GLO-30 DEM, a 30 m resolution Digital Surface Model derived from TanDEM-X radar satellite data (2011 to 2015). Includes buildings, infrastructure, and vegetation. Uses the EGM2008 vertical datum. |
Climate (climate:)
| Column | Type | Range | Description |
|---|---|---|---|
| climate:precipitation | float | 0+ (mm/year) | Mean annual precipitation estimated from GPM (Global Precipitation Measurement) satellite data, aggregated as a long-term annual mean. |
| climate:temperature | float | ~-40 to 50 (°C) | Mean annual land surface temperature estimated from MODIS LST satellite data, aggregated as a long-term annual mean. |
Soil (soil:)
Surface-layer soil properties from the OpenLandMap dataset, derived from machine learning predictions on global soil survey data at 250 m resolution.
| Column | Type | Range | Description |
|---|---|---|---|
| soil:clay | float | 0 to 100 (%) | Clay content weight fraction at 0 cm depth. Source. |
| soil:sand | float | 0 to 100 (%) | Sand content weight fraction at 0 cm depth. Source. |
| soil:carbon | float | 0+ (g/kg) | Soil organic carbon content at 0 cm depth. Source. |
Socioeconomic (socio:)
| Column | Type | Range | Description |
|---|---|---|---|
|  |  |  | GDP per capita at purchasing power parity (PPP, constant 2021 USD) for the year 2022. From the Kummu et al. (2025) gridded dataset, downscaled to admin-2 level (43,501 units) at 5 arc-min resolution. GEE catalog. |
| socio:population | float | 0+ (people) | Estimated number of people per grid cell from the Meta High Resolution Settlement Layer (HRSL). Uses satellite imagery and census data at ~30 m resolution. |
| socio:human_modification | float | 0.0 to 1.0 | Cumulative degree of human modification of terrestrial ecosystems from the Global Human Modification v3 (Theobald et al. 2025). Combines the spatial footprint and intensity of 13 stressors across five categories: settlement, agriculture, transportation, mining/energy, and electrical infrastructure. 0 = no modification, 1 = fully modified. 300 m resolution. GEE catalog. |
Administrative (admin:)
Human-readable administrative boundary names resolved from rasterized boundary datasets.
| Column | Type | Description |
|---|---|---|
| admin:country | string | Country name. Tiles over ocean/lakes are labeled Ocean/Sea/Lakes. |
| admin:state | string | State or province name. |
| admin:district | string | District or county name. |
Other
| Column | Type | Description |
|---|---|---|
| geometry | binary (WKB) | Tile geometry. All files include GeoParquet metadata for spatial queries. |
| split | string | ELLIOT split assignment: monotemporal or temporal. Present in elliot.parquet only. |
Files
| File | Rows | Columns | Size | Description |
|---|---|---|---|---|
| global.parquet | 5,055,204 | 26 | 146 MB | Every 10 km tile on Earth. The complete grid with all enrichment columns. |
| land.parquet | 2,767,104 | 26 | 91 MB | Tiles covered by land-observing sensors (Sentinel-2 and Landsat). Same schema as global. |
| land_s2.parquet | 2,547,253 | 34 | 127 MB | Land tiles with a Sentinel-2 image assigned. Adds stac: and s2: sensor metadata. |
| land_l8.parquet | 2,255,537 | 38 | 97 MB | Land tiles with a Landsat 8/9 image assigned. Adds stac: and l8: sensor metadata. |
| elliot.parquet | 279,166 | 27 | 14 MB | ELLIOT subset with monotemporal and temporal split assignments. Same enrichment as global plus split column. |
Namespace availability per file
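| Namespace | global | land | land_s2 | land_l8 | elliot |
|---|---|---|---|---|---|
| majortom: | ✓ | ✓ |  |  | ✓ |
| stac: |  |  | ✓ | ✓ |  |
| s2: |  |  | ✓ |  |  |
| l8: |  |  |  | ✓ |  |
| terrain: | ✓ | ✓ | ✓ | ✓ | ✓ |
| climate: | ✓ | ✓ | ✓ | ✓ | ✓ |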
Quick Start
DuckDB
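A minimal sketch of a DuckDB query against the index, written as a small Python helper that assembles the SQL. Namespaced columns contain a colon, so they need double quotes in SQL. The local path `global.parquet` and the helper names `quote_ident` and `build_query` are assumptions for illustration, not part of the dataset.

```python
def quote_ident(name):
    """Double-quote a column name so SQL accepts the ':' in namespaced columns."""
    return '"' + name.replace('"', '""') + '"'


def build_query(path, columns, filters):
    """Build a SELECT over a Parquet file; filters are (column, operator, number) triples."""
    select = ", ".join(quote_ident(c) for c in columns)
    where = " AND ".join(
        f"{quote_ident(col)} {op} {value}" for col, op, value in filters
    )
    return f"SELECT {select} FROM read_parquet('{path}') WHERE {where}"


# Arid, high-elevation tiles, using the thresholds from the example above.
query = build_query(
    "global.parquet",  # assumed local copy of the file
    ["id", "admin:country"],
    [("climate:precipitation", "<", 200), ("terrain:elevation", ">", 3000)],
)
print(query)
```

With the `duckdb` Python package installed and a local copy of the file, `duckdb.sql(query).df()` would run the query and return a pandas DataFrame. The helper interpolates numeric literals directly, so it is only suitable for trusted, hard-coded thresholds.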
Pandas
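A minimal pandas sketch of the same kind of filtering. To stay self-contained it uses a small made-up DataFrame with the index's column names; in practice the frame would come from something like `pd.read_parquet("global.parquet", columns=[...])`, where the local path is an assumption.

```python
import pandas as pd

# Toy frame standing in for a few index rows (ids and values are made up).
df = pd.DataFrame(
    {
        "id": ["tile_a", "tile_b", "tile_c"],
        "admin:country": ["Chile", "Chile", "Brazil"],
        "climate:precipitation": [90.0, 150.0, 1800.0],
        "terrain:elevation": [4100.0, 800.0, 300.0],
    }
)

# Arid, high-elevation tiles: combine boolean masks with & (parentheses required).
arid_high = df[(df["climate:precipitation"] < 200) & (df["terrain:elevation"] > 3000)]

# Tile counts per country, useful as a first check before building balanced splits.
per_country = df.groupby("admin:country")["id"].count()

print(arid_high["id"].tolist())
print(per_country.to_dict())
```

Reading only the columns you need keeps memory use low, since Parquet is a columnar format.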
ELLIOT Splits
The elliot.parquet file contains 279,166 tiles selected for the ELLIOT project multi-temporal dataset extension. Tile locations were sampled using hierarchical spherical k-means (530 × 528 = 279,840 clusters) over AlphaEarth Foundation embeddings to ensure global environmental diversity.
The split column defines two subsets:
Monotemporal (250,000 tiles). One cloud-free image per sensor per location. Designed for tasks where spatial coverage matters more than temporal depth: land cover classification, feature extraction, or pretraining foundation models on diverse global scenes.
Temporal (29,166 tiles). Multiple observations per location across time. Designed for tasks that require temporal context: change detection, phenology tracking, seasonal compositing, or training models that learn from multi-temporal sequences. This subset is further divided into monthly cadence (12,500 tiles × 12 timesteps) and five-daily cadence (16,666 tiles × 6 timesteps).
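The counts above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers stated in this section; the variable names are illustrative.

```python
monotemporal_tiles = 250_000
monthly_tiles, monthly_steps = 12_500, 12
five_daily_tiles, five_daily_steps = 16_666, 6

# The two cadences together make up the temporal subset.
temporal_tiles = monthly_tiles + five_daily_tiles

# Both subsets together make up elliot.parquet.
total_tiles = monotemporal_tiles + temporal_tiles

# Number of clusters used when sampling tile locations.
clusters = 530 * 528

# Tile-timestep pairs in the temporal subset.
temporal_steps = monthly_tiles * monthly_steps + five_daily_tiles * five_daily_steps

print(temporal_tiles, total_tiles, clusters, temporal_steps)
```

The totals agree with the 29,166 temporal tiles and 279,166 rows reported for elliot.parquet.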
Critical Infrastructure Spatial Index (Nirandjan et al. 2022). Aggregates OpenStreetMap data on 39 types of critical infrastructure across seven systems: transportation, energy, telecommunication, waste, water, education, and health. 0 = no infrastructure, 1 = highest density. 0.10° resolution. GEE catalog.