This dataset contains a comprehensive archive of Reddit comments and submissions spanning June 2005 through December 2025.
The dataset is distributed as highly compressed Zstandard (.zst) files containing newline-delimited JSON objects (NDJSON), one object per line.
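The following is a minimal Python sketch of streaming one of these files. It assumes the third-party `zstandard` package (`pip install zstandard`). The helper names `iter_ndjson`, `iter_zst_records` and `created_at` are hypothetical and are not taken from the PushshiftDumps scripts. Records are decoded one line at a time, so a large file never has to be fully decompressed to disk. The `max_window_size=2**31` setting assumes the files may have been compressed with a long (up to 2 GiB) window, and the `created_utc` field follows Reddit's JSON schema.

```python
import io
import json
from datetime import datetime, timezone
from typing import IO, Iterator


def iter_ndjson(binary_stream: IO[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a UTF-8 byte stream."""
    # TextIOWrapper decodes incrementally, so a multi-byte UTF-8 character
    # that straddles a read boundary is still decoded correctly.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8")
    for line in text:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def iter_zst_records(path: str) -> Iterator[dict]:
    """Stream records from a .zst NDJSON file without writing it to disk.

    Assumption: the third-party `zstandard` package is installed. The large
    max_window_size lets it decode files compressed with a long window.
    """
    import zstandard  # imported lazily so iter_ndjson works without it

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
        yield from iter_ndjson(reader)


def created_at(record: dict) -> datetime:
    """Convert a record's Unix `created_utc` timestamp to an aware UTC datetime."""
    # Coerce via float() in case the timestamp is stored as a string or float.
    return datetime.fromtimestamp(int(float(record["created_utc"])), tz=timezone.utc)
```

For example, `sum(1 for r in iter_zst_records("comments.zst") if r.get("subreddit") == "python")` would count the records from a single subreddit; the filename here is a placeholder. Keeping the NDJSON parsing separate from decompression makes it easy to test on an in-memory byte stream.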
Example Python scripts and tools for parsing and processing this data are available in the GitHub repository Watchful1/PushshiftDumps.
No license specified. Please note that the content consists of user-generated data scraped from Reddit. The underlying textual work may be protected by copyright. Researchers should use this data responsibly and consider Reddit's API and data usage guidelines.
If you use this dataset in your research or academic work, you can reference it using the following BibTeX entry: