A STAC GeoParquet catalog of every satellite scene processed to produce the embeddings in this repository, organized by collection. Use this dataset to look up source imagery metadata (scene footprints, acquisition times, and COG asset URLs) and join it back to the embeddings by matching id in this catalog to stac_item_id in the embeddings datasets.
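The join described above can be sketched in plain Python. This is a minimal illustration, not the repository's own tooling: the sample rows, the datetime values, and the chip_id column are hypothetical stand-ins, and only id and stac_item_id come from this document.

```python
# Hypothetical catalog rows; real rows would be read from stac.parquet.
catalog_rows = [
    {"id": "S2B_33TVH_20170930_0_L2A", "datetime": "2017-09-30T00:00:00Z", "gsd": 10.0},
]

# Hypothetical embedding rows; chip_id is an invented column for illustration.
embedding_rows = [
    {"chip_id": 0, "stac_item_id": "S2B_33TVH_20170930_0_L2A"},
    {"chip_id": 1, "stac_item_id": "S2B_33TVH_20170930_0_L2A"},
    {"chip_id": 2, "stac_item_id": "NOT_IN_CATALOG"},
]


def join_embeddings_to_scenes(embeddings, catalog):
    """Inner join: attach scene metadata where stac_item_id equals catalog id."""
    scenes_by_id = {row["id"]: row for row in catalog}
    joined = []
    for emb in embeddings:
        scene = scenes_by_id.get(emb["stac_item_id"])
        if scene is None:
            continue  # no catalog entry for this embedding, so skip it
        joined.append(
            {**emb, "scene_datetime": scene["datetime"], "scene_gsd": scene["gsd"]}
        )
    return joined


joined = join_embeddings_to_scenes(embedding_rows, catalog_rows)
```

Many embeddings typically map to one scene, so the catalog is indexed once by id and each embedding row does a single lookup.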
File
One STAC GeoParquet file per collection:
rasters/
  collection=sentinel-2-l2a/
    stac.parquet
  collection=naip/ (future)
    stac.parquet
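The layout above follows a key=value directory convention, so the path to a collection's file can be derived from the collection name. A small sketch, assuming the root directory is named rasters as shown; the helper names stac_path and collection_from_path are hypothetical:

```python
from pathlib import PurePosixPath


def stac_path(collection: str, root: str = "rasters") -> str:
    """Build the relative path to the STAC GeoParquet file of one collection."""
    return str(PurePosixPath(root) / f"collection={collection}" / "stac.parquet")


def collection_from_path(path: str) -> str:
    """Recover the collection name from a path that uses the layout above."""
    for part in PurePosixPath(path).parts:
        if part.startswith("collection="):
            return part.split("=", 1)[1]
    raise ValueError(f"no collection=<name> segment in {path!r}")


sentinel_path = stac_path("sentinel-2-l2a")
```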
Schema (sentinel-2-l2a)
| Column | Type | Description |
| --- | --- | --- |
| id | string | Scene identifier (e.g. S2B_33TVH_20170930_0_L2A). Join key to stac_item_id in the embeddings datasets. |
| collection | string | STAC collection name (e.g. sentinel-2-l2a) |
| datetime | timestamp | Acquisition timestamp |
| geometry | binary (WKB) | Scene footprint polygon in EPSG:4326 |
| bbox | struct | Bounding box (xmin, ymin, xmax, ymax) |
| assets | struct | COG asset URLs per band (see below) |
| gsd | double | Ground sample distance in meters (10.0) |
| file_last_modified | string | Last modified timestamp of source file |
| stac_version | string | STAC spec version (1.0.0) |
| stac_extensions | list | STAC extensions used (empty) |
| links | list | STAC links (empty) |
| type | string | GeoJSON feature type (Feature) |
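To illustrate how the geometry and bbox columns relate, the sketch below decodes a 2D WKB polygon with the standard library and computes its bounds in the same (xmin, ymin, xmax, ymax) order. The sample footprint is a made-up rectangle, not a real scene, and the parser handles only plain 2D Polygon geometries; a geometry library such as Shapely is the more practical choice for real data.

```python
import struct


def polygon_wkb_bounds(wkb: bytes):
    """Return (xmin, ymin, xmax, ymax) for a 2D WKB Polygon."""
    endian = "<" if wkb[0] == 1 else ">"  # byte 0: 1 = little endian, 0 = big endian
    geom_type, n_rings = struct.unpack_from(endian + "II", wkb, 1)
    if geom_type != 3:
        raise ValueError(f"expected WKB Polygon (type 3), got type {geom_type}")
    offset = 9  # 1 byte order + 4 geometry type + 4 ring count
    xs, ys = [], []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from(endian + "I", wkb, offset)
        offset += 4
        coords = struct.unpack_from(f"{endian}{2 * n_points}d", wkb, offset)
        offset += 16 * n_points  # two 8-byte doubles per point
        xs.extend(coords[0::2])
        ys.extend(coords[1::2])
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical footprint: a closed rectangle in lon/lat, encoded as little-endian WKB.
ring = [(14.0, 41.0), (15.0, 41.0), (15.0, 42.0), (14.0, 42.0), (14.0, 41.0)]
flat = [value for point in ring for value in point]
sample_wkb = (
    struct.pack("<BII", 1, 3, 1)  # byte order, type 3 (Polygon), one ring
    + struct.pack("<I", len(ring))
    + struct.pack(f"<{len(flat)}d", *flat)
)

bounds = polygon_wkb_bounds(sample_wkb)
```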
Data Format
STAC GeoParquet 1.1.0 with geo and stac-geoparquet metadata
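In GeoParquet, the geo entry is a JSON string stored in the Parquet file's key-value metadata. The sketch below parses such a string to find the primary geometry column and its encoding. The sample value is hypothetical and only mirrors the fields defined by the GeoParquet specification; it was not read from this dataset, and the contents of the stac-geoparquet entry are not shown here.

```python
import json

# Hypothetical value of the "geo" metadata entry, shaped per the GeoParquet spec.
sample_geo_metadata = json.dumps(
    {
        "version": "1.1.0",
        "primary_column": "geometry",
        "columns": {
            "geometry": {"encoding": "WKB", "geometry_types": ["Polygon"]},
        },
    }
)


def describe_geo_metadata(raw):
    """Summarize the primary geometry column declared in a "geo" metadata value."""
    meta = json.loads(raw)  # accepts str or bytes
    primary = meta["primary_column"]
    column = meta["columns"][primary]
    return {
        "version": meta["version"],
        "primary_column": primary,
        "encoding": column["encoding"],
    }


geo_summary = describe_geo_metadata(sample_geo_metadata)
```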