A deduplicated time series of Clay v1.5 embeddings, selecting one representative embedding per Major TOM grid cell per month. This removes redundancy from overlapping scene coverage at MGRS tile borders, normalizes observation frequency across regions with varying revisit rates, and simplifies time series analysis by providing consistent monthly snapshots.
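The one-embedding-per-cell-per-month property can be treated as an invariant that downstream code checks before building a time series. The sketch below assumes rows have already been loaded as dictionaries carrying the `cell_id` and `datetime` columns from the schema. The function name `duplicate_cell_months` and the sample rows are hypothetical and not part of any dataset tooling.

```python
from collections import Counter
from datetime import datetime


def duplicate_cell_months(rows):
    """Return the (cell_id, year, month) keys that occur more than once."""
    counts = Counter(
        (row["cell_id"], row["datetime"].year, row["datetime"].month)
        for row in rows
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up rows for illustration only; real cell IDs will look different.
sample_rows = [
    {"cell_id": "cell-A", "datetime": datetime(2024, 6, 3, 10, 15)},
    {"cell_id": "cell-A", "datetime": datetime(2024, 7, 8, 10, 20)},
    {"cell_id": "cell-B", "datetime": datetime(2024, 6, 21, 9, 50)},
]

# An empty list means every cell has at most one embedding per month.
print(duplicate_cell_months(sample_rows))
```

If the list is non-empty, the returned keys point at the cells and months that would need a closer look.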
Partitioning
The data is hive-partitioned by model version, collection, chip size, embedding dimensions, geohash, year, and month:
monthly-aggregated/
  model_version=v1.5/
    collection=sentinel-2-l2a/
      chip_size=1280m/
        dims=256/
          geohash=ab/year=2024/month=06/{hash}.parquet
aws s3 ls --no-sign-request s3://us-west-2.opendata.source.coop/clay/lgnd-embeddings/monthly-aggregated/model_version=v1.5/collection=sentinel-2-l2a/chip_size=1280m/dims=256/geohash=9y/year=2025/month=01/ --summarize --human-readable
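Because the partition keys appear in a fixed order, the prefix for any partition can be assembled from its field values. The following is a minimal sketch, assuming the key order and zero-padded month shown in the listing above. The helper name `partition_prefix` and its default arguments are illustrative, not an official API.

```python
# Root of the monthly-aggregated dataset, taken from the listing command above.
BUCKET_PREFIX = (
    "s3://us-west-2.opendata.source.coop/clay/lgnd-embeddings/monthly-aggregated"
)


def partition_prefix(geohash, year, month, model_version="v1.5",
                     collection="sentinel-2-l2a", chip_size="1280m", dims=256):
    """Build the hive-style S3 prefix for one geohash/year/month partition."""
    parts = [
        f"model_version={model_version}",
        f"collection={collection}",
        f"chip_size={chip_size}",
        f"dims={dims}",
        f"geohash={geohash}",
        f"year={year:04d}",
        f"month={month:02d}",  # months are zero-padded, e.g. month=01
    ]
    return BUCKET_PREFIX + "/" + "/".join(parts) + "/"


print(partition_prefix("9y", 2025, 1))
```

The resulting string can be passed to `aws s3 ls --no-sign-request` or to any reader that accepts an S3 prefix.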
Schema
| Column | Type | Description |
| --- | --- | --- |
| chips_id | string | Spatio-temporally unique identifier for each embedding |
| cell_id | string | Spatially unique identifier for the Major TOM cell |
| rasters_id | string | Unique, internal LGND identifier for the source raster |
| stac_item_id | string | STAC item ID of the source Sentinel-2 scene, from Earth Search. Join key to the rasters dataset. |
| collection | string | STAC collection name (e.g. sentinel-2-l2a) |
| datetime | timestamp | Acquisition timestamp of the source imagery |
| embedding | list<float32> | 256-dimensional Clay v1.5 embedding vector |
| bbox | struct | Bounding box (xmin, ymin, xmax, ymax) for geoparquet spatial filtering |
| geometry | binary (WKB) | Polygon geometry of the chip footprint in EPSG:4326 |
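The `bbox` column makes it possible to discard chips outside an area of interest without decoding the WKB `geometry`. The sketch below shows the intersection test on rows that have already been loaded as dictionaries. The helper names and the sample rows are hypothetical, and the test does not handle boxes that cross the antimeridian.

```python
def bbox_intersects(bbox, query):
    """True if two boxes with xmin/ymin/xmax/ymax keys overlap or touch."""
    return (
        bbox["xmin"] <= query["xmax"] and bbox["xmax"] >= query["xmin"]
        and bbox["ymin"] <= query["ymax"] and bbox["ymax"] >= query["ymin"]
    )


def filter_by_bbox(rows, query):
    """Keep only rows whose bbox struct intersects the query box."""
    return [row for row in rows if bbox_intersects(row["bbox"], query)]


# Made-up rows for illustration only; coordinates are lon/lat in EPSG:4326.
sample_rows = [
    {"chips_id": "chip-1",
     "bbox": {"xmin": -97.02, "ymin": 32.70, "xmax": -97.00, "ymax": 32.72}},
    {"chips_id": "chip-2",
     "bbox": {"xmin": 10.00, "ymin": 50.00, "xmax": 10.02, "ymax": 50.02}},
]
area_of_interest = {"xmin": -98.0, "ymin": 32.0, "xmax": -96.0, "ymax": 33.0}

hits = filter_by_bbox(sample_rows, area_of_interest)
print([row["chips_id"] for row in hits])
```

A Parquet reader that supports predicate pushdown could apply the same four comparisons to the `bbox` fields at read time, so that only matching row groups are fetched.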
Data Format
GeoParquet 1.1 with geo and covering metadata
Hilbert curve spatial ordering within each partition
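Hilbert ordering places rows that are close on the ground close together in the file, which helps bounding-box filters skip whole row groups. The sketch below uses the standard iterative Hilbert index computation to sort a few points. The grid resolution (`order=16`) and the lon/lat quantization are assumptions for illustration, since the parameters used to order this dataset are not stated here.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order by 2**order grid to its Hilbert distance."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(lon, lat, order=16):
    """Quantize lon/lat onto the grid and return the Hilbert distance."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return hilbert_index(order, x, y)


# Arbitrary example points as (lon, lat) pairs.
points = [(-122.4, 37.8), (151.2, -33.9), (-122.3, 37.7), (2.35, 48.86)]
ordered = sorted(points, key=lambda p: hilbert_key(*p))
print(ordered)
```

Sorting by this key before writing is one common way to produce spatially ordered Parquet files; other implementations may quantize coordinates differently.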