This dataset is a read-ready Rasteret collection derived from the AlphaEarth Foundations (AEF) Embeddings COG files and the STAC-GeoParquet index in the Source Coop TGE-labs AEF repository, enriched with COG header metadata.
It covers the full set of 2018-2024 COGs available in the Source Coop buckets and will be updated when the 2025 embeddings are released.
It is intended for Rasteret users who want to skip the build/enrich step and start directly from a filterable Parquet index using Rasteret, DuckDB, Polars, or PyArrow.
This dataset is a read-ready Rasteret collection derived from the public AlphaEarth Foundations (AEF) annual index and enriched with Rasteret's per-band Cloud Optimized GeoTIFF header metadata.
Each row corresponds to one AEF annual tile and includes:
- index.parquet: a narrow GeoParquet 1.1 index with fid, crs, path, year, utm_zone, bbox_utm, bbox, geom, and location
- data/part-xxxxx.parquet: read-ready Rasteret collection shards with canonical runtime columns id, datetime, geometry, bbox, assets, proj:epsg, year, utm_zone, plus A00_metadata through A63_metadata

This dataset does not store embedding pixel arrays inside Parquet. It stores metadata that lets Rasteret fetch pixels directly from the original COGs.
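The per-band metadata columns listed above follow a regular naming pattern, so they can be generated rather than typed out. The following is a minimal sketch, assuming the names run from A00_metadata to A63_metadata with two-digit zero padding; `band_metadata_columns` and `missing_columns` are illustrative helpers and not part of Rasteret.

```python
# Canonical runtime columns listed in this card for data/part-xxxxx.parquet.
RUNTIME_COLUMNS = [
    "id", "datetime", "geometry", "bbox", "assets", "proj:epsg", "year", "utm_zone",
]


def band_metadata_columns(n_bands: int = 64) -> list[str]:
    """Return A00_metadata ... A63_metadata (assumes two-digit zero padding)."""
    return [f"A{i:02d}_metadata" for i in range(n_bands)]


def missing_columns(schema_names: list[str]) -> list[str]:
    """Hypothetical helper: list expected columns absent from a shard's schema."""
    expected = RUNTIME_COLUMNS + band_metadata_columns()
    present = set(schema_names)
    return [name for name in expected if name not in present]


cols = band_metadata_columns()
print(cols[0], cols[-1], len(cols))  # A00_metadata A63_metadata 64
```

Passing the column names of a downloaded shard to `missing_columns` (for example, the `names` attribute of the schema returned by `pyarrow.parquet.read_schema`) gives a quick sanity check before handing the collection to Rasteret.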
The source AEF STAC-GeoParquet index is already public, but Rasteret normally needs one more step: COG header enrichment. This collection publishes that work once so users do not have to repeat it.
This derived metadata index is licensed under CC-BY-SA-4.0. It contains no pixel data — only Parquet metadata and COG headers of the imagery.
This dataset was built from the public AlphaEarth Foundations Satellite Embedding Dataset published on Source Cooperative by Taylor Geospatial Engine Labs:
The underlying AlphaEarth Foundations Satellite Embedding data is produced by Google and Google DeepMind and is licensed under CC-BY-4.0. This Terrafloww dataset is derived from the Source Cooperative AEF index and the Source Cooperative-hosted AEF COGs.
When using this derived index or the underlying embeddings, include the required upstream attribution text:
"The AlphaEarth Foundations Satellite Embedding dataset is produced by Google and Google DeepMind."
Reference links:
Use this citation for the derived Terrafloww dataset:
Also acknowledge the upstream data source in accompanying text:
Source Cooperative data is stored on S3. Use s3:// URIs for glob support and predicate pushdown.
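As an illustration of that advice, the sketch below builds a DuckDB query that reads the shards through an `s3://` glob and filters on `year` and `utm_zone`, so the engine can skip data that cannot match. The bucket and prefix are placeholders because the actual location is not given here, and `build_query` is a hypothetical helper rather than a Rasteret or DuckDB function.

```python
def build_query(uri_glob: str, columns: list[str]) -> str:
    """Build a DuckDB SQL string that scans Parquet shards matching a glob."""
    if "'" in uri_glob:
        raise ValueError("URI must not contain single quotes")
    # Double-quote column names so identifiers such as proj:epsg parse correctly.
    select_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f"SELECT {select_list} "
        f"FROM read_parquet('{uri_glob}') "
        "WHERE year = ? AND utm_zone = ?"
    )


# Placeholder location: substitute the real Source Cooperative bucket and prefix.
SHARDS = "s3://<bucket>/<prefix>/data/*.parquet"

query = build_query(SHARDS, ["id", "year", "utm_zone", "proj:epsg"])
print(query)

# To run it (needs the duckdb package, its httpfs extension, and network access):
#   import duckdb
#   con = duckdb.connect()
#   con.execute("INSTALL httpfs")
#   con.execute("LOAD httpfs")
#   rows = con.execute(query, [2024, "<utm_zone value>"]).fetchall()
```

The filter values are passed as query parameters, so the sketch makes no assumption about how `utm_zone` is typed in the files. Depending on the bucket, DuckDB may also need an S3 region or credential setting before the query succeeds.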
- index.parquet: narrow GeoParquet 1.1 metadata index for remote filtering
- data/part-xxxxx.parquet: read-ready Rasteret collection shards
- year, utm_zone, and WGS84 spatial order

This dataset is for metadata filtering, selection, and read-time handoff into Rasteret. It is especially useful when you want to combine AEF tiles with your own Arrow-native metadata workflows before reading embeddings.
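The kind of selection described above usually comes down to a year match plus a bounding-box intersection test. Here is a minimal pure-Python sketch of that logic, assuming each row exposes `year` and a WGS84 `bbox` as (xmin, ymin, xmax, ymax). The sample tiles, the `Tile` type, and the tuple layout are invented for illustration, and the actual bbox encoding in the Parquet files may differ.

```python
from typing import NamedTuple

BBox = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed WGS84


class Tile(NamedTuple):
    id: str
    year: int
    bbox: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True when two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_tiles(tiles: list[Tile], year: int, aoi: BBox) -> list[Tile]:
    """Keep tiles for the given year whose bbox intersects the area of interest."""
    return [t for t in tiles if t.year == year and intersects(t.bbox, aoi)]


# Made-up rows standing in for index records.
tiles = [
    Tile("tile-a", 2023, (10.0, 45.0, 10.5, 45.5)),
    Tile("tile-b", 2023, (20.0, 50.0, 20.5, 50.5)),
    Tile("tile-c", 2022, (10.0, 45.0, 10.5, 45.5)),
]
aoi = (10.2, 45.2, 10.8, 45.8)

selected = select_tiles(tiles, 2023, aoi)
print([t.id for t in selected])  # ['tile-a']
```

In practice the same predicate would be expressed in DuckDB, Polars, or PyArrow so that it runs against the Parquet index directly, and only the selected rows would be handed to Rasteret for reading.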