Source Stats
Weekly statistics about Source Cooperative's data storage and usage, generated from S3 inventory reports.
Available Data
This repository contains three types of CSV reports, updated weekly:
Account Statistics (accounts/)
Storage metrics grouped by account (data contributor).
Filename format: accounts/YYYYMMDD.csv
Columns:
account - Account identifier
repositories - Number of repositories per account
objects - Total file count
storage_gb - Storage used in gigabytes
avg_object_size_mb - Average file size in megabytes
oldest_file - Timestamp of oldest file
newest_file - Timestamp of newest file
Repository Statistics (repositories/)
Detailed breakdown by individual repository.
Filename format: repositories/YYYYMMDD.csv
Columns:
account - Account identifier
repository - Repository name
objects - Total file count
storage_gb - Storage used in gigabytes
avg_object_size_mb - Average file size in megabytes
oldest_file - Timestamp of oldest file
newest_file - Timestamp of newest file
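Because the repository report carries the account column, it can be rolled up to per-account totals. The sketch below does this with the standard library only. It assumes the CSV has a header row using the column names listed above, and the sample rows are made up for illustration. Whether the result matches the accounts/ report exactly is not stated by the source, so treat it as a cross-check rather than a guarantee.

```python
import csv
import io
from collections import defaultdict


def rollup_by_account(repo_csv_text):
    """Sum repository rows into per-account totals (hypothetical helper)."""
    totals = defaultdict(lambda: {"repositories": 0, "objects": 0, "storage_gb": 0.0})
    for row in csv.DictReader(io.StringIO(repo_csv_text)):
        entry = totals[row["account"]]
        entry["repositories"] += 1
        entry["objects"] += int(row["objects"])
        entry["storage_gb"] += float(row["storage_gb"])
    return dict(totals)


# Made-up sample rows; real files also carry the remaining columns listed above.
sample_repos = (
    "account,repository,objects,storage_gb\n"
    "alpha,r1,10,1.5\n"
    "alpha,r2,5,2.5\n"
    "beta,r3,1,0.25\n"
)
per_account = rollup_by_account(sample_repos)
print(per_account["alpha"])
```

The same roll-up is a one-line groupby in pandas. The standard-library version is shown only to keep the example free of dependencies.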
Platform Summary (source/)
High-level metrics for the entire Source Cooperative platform.
Filename format: source/YYYYMMDD.csv
Columns:
metric - Metric name
value - Metric value
Metrics included:
Total Accounts
Total Repositories
Total Objects
Total Storage (TB)
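Since the platform summary is a long-format table with one metric per row, it is often easier to work with as a lookup keyed by metric name. The following is a minimal sketch. It assumes a header row of metric,value, that every value parses as a number, and that the metric names appear exactly as listed above. The figures in the sample are invented.

```python
import csv
import io


def summary_to_dict(csv_text: str) -> dict[str, float]:
    """Turn a two-column metric/value CSV into a {metric: value} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["metric"]: float(row["value"]) for row in reader}


# Invented sample values, for illustration only.
sample_summary = (
    "metric,value\n"
    "Total Accounts,10\n"
    "Total Repositories,25\n"
    "Total Objects,1000\n"
    "Total Storage (TB),1.5\n"
)
totals = summary_to_dict(sample_summary)
print(totals["Total Storage (TB)"])
```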
Data Notes
Update Frequency: Weekly, typically generated when new S3 inventory reports are available
Date Format: Files are named YYYYMMDD.csv, based on the report generation date
Data Scope: Covers AWS S3 storage only (datasets hosted on Microsoft Azure are excluded)
Structure: Analyzes data following Source's [account]/[repository] folder structure
Deduplication: Inventory records are automatically deduplicated to ensure accurate counts
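The naming convention above makes it straightforward to move between a calendar date and a report path. Here is a small sketch of both directions. The helper names report_url and report_date_from_name are hypothetical, the base URL is the one used in the examples in this document, and the date in the usage lines is arbitrary rather than a known report date.

```python
from datetime import date, datetime

BASE_URL = "https://us-west-2.opendata.source.coop/source/source-stats"
REPORT_TYPES = ("accounts", "repositories", "source")


def report_url(report_type: str, report_date: date) -> str:
    """Build the HTTPS URL for one report type and generation date."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return f"{BASE_URL}/{report_type}/{report_date:%Y%m%d}.csv"


def report_date_from_name(filename: str) -> date:
    """Parse the generation date out of a name such as accounts/YYYYMMDD.csv."""
    stem = filename.rsplit("/", 1)[-1].removesuffix(".csv")
    return datetime.strptime(stem, "%Y%m%d").date()


# Arbitrary example date; a report may not exist for it.
print(report_url("accounts", date(2025, 1, 6)))
print(report_date_from_name("repositories/20250106.csv"))
```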
Usage Examples
List Available Files
# List all account statistics files
aws s3 ls s3://us-west-2.opendata.source.coop/source/source-stats/accounts/ --no-sign-request

# List all repository statistics files
aws s3 ls s3://us-west-2.opendata.source.coop/source/source-stats/repositories/ --no-sign-request

# List all platform summary files
aws s3 ls s3://us-west-2.opendata.source.coop/source/source-stats/source/ --no-sign-request
Download Files
# Download latest account statistics (replace YYYYMMDD with actual date)
aws s3 cp s3://us-west-2.opendata.source.coop/source/source-stats/accounts/YYYYMMDD.csv . --no-sign-request

# Download via direct URL
curl -O https://us-west-2.opendata.source.coop/source/source-stats/accounts/YYYYMMDD.csv
Programmatic Access
import pandas as pd

# Load latest account statistics (replace YYYYMMDD with actual date)
df = pd.read_csv('https://us-west-2.opendata.source.coop/source/source-stats/accounts/YYYYMMDD.csv')

# Load platform summary
summary = pd.read_csv('https://us-west-2.opendata.source.coop/source/source-stats/source/YYYYMMDD.csv')
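The examples above require substituting a real date for YYYYMMDD. One way to find it is to take the output of a listing command and pick the newest file name. The sketch below does only that selection step and makes no network calls. latest_report is a hypothetical helper, and the sample lines imitate typical aws s3 ls output with invented dates and sizes.

```python
import re

# Matches a trailing YYYYMMDD.csv token at the start of a line, after whitespace, or after a slash.
REPORT_NAME = re.compile(r"(?:^|\s|/)(\d{8}\.csv)$")


def latest_report(listing_lines):
    """Return the newest YYYYMMDD.csv name found in the given lines, or None."""
    names = []
    for line in listing_lines:
        match = REPORT_NAME.search(line.strip())
        if match:
            names.append(match.group(1))
    # YYYYMMDD names sort chronologically when compared as strings.
    return max(names, default=None)


# Invented listing lines, for illustration only.
sample_listing = [
    "2025-01-06 12:00:00       1234 20250106.csv",
    "2025-01-13 12:00:00       1300 20250113.csv",
    "2025-01-13 12:00:01          0 notes.txt",
]
print(latest_report(sample_listing))
```

The returned name can then be appended to the accounts/, repositories/, or source/ URL before calling pd.read_csv.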
Data Generation
These statistics are automatically generated from S3 inventory reports using AWS Athena queries. The source code for the generation process is available at github.com/source-cooperative/source-stats.
Questions or Issues
For questions about this data or to report issues, please contact Source Cooperative support or create an issue in the source-stats repository.