GDELT Saudi Arabia Daily News Events
Enriched GDELT 2.0 event records related to Saudi Arabia, updated daily. Each record is deduplicated, geo-enriched with nearest-city matching, and includes scraped article full text when available.
Demo: geonews.tabaqat.dev, an interactive map for exploring the data.
Bucket Structure
s3://us-west-2.opendata.source.coop/tabaqat/gdelt-sa/
└── country=SA/
    └── year=YYYY/
        ├── YYYY_MM_DD.parquet
        ├── YYYY_MM_DD.parquet
        └── ...
One Parquet file per day (ZSTD compressed). New files are synced daily around midnight UTC via GitHub Actions.
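Because the layout is regular, the path for any given day can be built from its date. The following is a minimal Python sketch, assuming the prefix and naming pattern shown above; `daily_file_uri` and `year_glob` are hypothetical helper names, not part of the dataset or its pipeline.

```python
from datetime import date

# Prefix copied from the bucket structure above.
BUCKET_PREFIX = "s3://us-west-2.opendata.source.coop/tabaqat/gdelt-sa"


def daily_file_uri(day: date, country: str = "SA") -> str:
    """Build the S3 URI of the Parquet file for one day (hypothetical helper)."""
    return (
        f"{BUCKET_PREFIX}/country={country}/year={day.year:04d}/"
        f"{day.year:04d}_{day.month:02d}_{day.day:02d}.parquet"
    )


def year_glob(year: int, country: str = "SA") -> str:
    """Build a glob pattern matching every daily file in one year (hypothetical helper)."""
    return f"{BUCKET_PREFIX}/country={country}/year={year:04d}/*.parquet"


if __name__ == "__main__":
    print(daily_file_uri(date(2026, 1, 1)))
    print(year_glob(2026))
```

A file for a particular date exists only if the daily sync has already run for that day, so a reader may want to handle a missing object gracefully.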
What's in Each File
Every row is a single GDELT event. Columns fall into four groups, with about 76 columns in total. Full schema details are in the source repo.
Processing Pipeline
Raw GDELT 15-minute dumps pass through five stages:
1. Filter: keep only events involving Saudi Arabia
2. Clean: remove nulls, empty URLs, and invalid tone values
3. Deduplicate: keep one best record per source URL (scored by data completeness)
4. Geo-enrich: match coordinates to the nearest city using a 33K+ world cities reference via Haversine distance
5. Scrape: extract article text using a Trafilatura → Newspaper4k → Playwright fallback chain (up to 500/day)
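The deduplication and geo-enrichment steps can be illustrated with a small, standard-library Python sketch. This is not the pipeline's actual code: the completeness score (a count of non-empty fields), the linear scan over cities, the `SOURCEURL` key, and all function names are assumptions made for illustration. With a 33K+ city reference, a real implementation might use a spatial index instead of a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_city(lat, lon, cities):
    """Return (name, distance_km) of the closest city.

    `cities` is an iterable of (name, lat, lon) tuples; linear scan (assumption).
    """
    name, clat, clon = min(
        cities, key=lambda c: haversine_km(lat, lon, c[1], c[2])
    )
    return name, haversine_km(lat, lon, clat, clon)


def completeness(record):
    """Assumed score: number of fields that are neither None nor empty."""
    return sum(1 for value in record.values() if value not in (None, ""))


def dedupe_by_url(records, url_key="SOURCEURL"):
    """Keep the most complete record per source URL; skip records without a URL."""
    best = {}
    for rec in records:
        url = rec.get(url_key)
        if not url:
            continue
        if url not in best or completeness(rec) > completeness(best[url]):
            best[url] = rec
    return list(best.values())


if __name__ == "__main__":
    cities = [("Riyadh", 24.7136, 46.6753), ("Jeddah", 21.4858, 39.1925)]
    print(nearest_city(24.7, 46.7, cities))
```

Ties in the completeness score keep the first record seen; the source does not say how the pipeline breaks ties.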
Quick Query (DuckDB)
-- read a single day
SELECT * FROM 's3://us-west-2.opendata.source.coop/tabaqat/gdelt-sa/country=SA/year=2026/2026_01_01.parquet' LIMIT 10;

-- count events with articles across all days
SELECT count(*) AS total,
       count(ArticleTitle) AS with_articles
FROM 's3://us-west-2.opendata.source.coop/tabaqat/gdelt-sa/country=SA/**/*.parquet';

-- top mentioned cities
SELECT NearestCity, count(*) AS n
FROM 's3://us-west-2.opendata.source.coop/tabaqat/gdelt-sa/country=SA/**/*.parquet'
WHERE NearestCity IS NOT NULL
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
License & Attribution
Data is derived from the GDELT Project. Refer to GDELT's terms of use for redistribution rights. The pipeline source code is MIT licensed.