CitiBike is New York City's official bike-sharing system. The dataset includes bike station locations, pickup times, ride durations, and other trip attributes, making it a useful resource for analyzing CitiBike ridership patterns across New York City.
Description:
This dataset is a copy of the New York City Citibike data (available at https://citibikenyc.com/system-data), made available in the cloud-native GeoParquet format and partitioned by year and month.
The original data schema exhibits inconsistencies across source files, particularly in data type definitions.
This dataset has been standardized and divided into two distinct schemas ('new' and 'old') to best reflect the official Citibike data schema variations.
Specific data transformation steps are documented in my Medium blog post: https://medium.com/@calvinluozhengpei/from-wall-street-to-central-park-decoding-nycs-citi-bike-patterns-part-1-3155569278cf
Feel free to read and provide feedback on how to make the process more efficient!
Example Usage:
Load data with DuckDB in Python:
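A minimal sketch of one way to read a single year/month partition. The base path `data/citibike`, the hive-style folder layout (`year=YYYY/month=M/`), and the helper names `build_query` and `load_trips` are assumptions for illustration, not the dataset's confirmed layout or API; adjust them to wherever the files actually live.

```python
# Hypothetical location of the dataset; replace with the real local path or bucket URL.
BASE_PATH = "data/citibike"


def build_query(base_path, year, month, limit=None):
    """Build a SQL query that reads one year/month partition.

    Assumes a hive-style layout (year=YYYY/month=M/*.parquet), which is a
    guess at how the partitions are laid out on disk.
    """
    pattern = f"{base_path}/year={int(year)}/month={int(month)}/*.parquet"
    query = f"SELECT * FROM read_parquet('{pattern}', hive_partitioning = true)"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


def load_trips(base_path, year, month, limit=None):
    """Run the query with DuckDB and return the result as a pandas DataFrame."""
    import duckdb  # imported lazily so build_query works without DuckDB installed

    con = duckdb.connect()  # in-memory database
    try:
        return con.execute(build_query(base_path, year, month, limit)).df()
    finally:
        con.close()


if __name__ == "__main__":
    # Print the generated SQL; call load_trips(...) once the path points at real files.
    print(build_query(BASE_PATH, 2024, 6, limit=5))
```

Reading one partition at a time keeps memory use small. Depending on the DuckDB version, the geometry column may come back as raw WKB bytes unless the spatial extension is loaded.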
Basic EDA in Python:
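A minimal sketch of a few first-pass summaries with pandas. The column names (`started_at`, `ended_at`, `start_station_name`, `member_casual`) follow the newer official Citi Bike trip schema and are assumed to be preserved in this dataset's 'new' schema; the rows below are synthetic placeholders, not real trips, and `summarize_rides` is a hypothetical helper.

```python
import pandas as pd

# Synthetic rows for illustration only; these are not real Citi Bike trips.
rides = pd.DataFrame(
    {
        "started_at": pd.to_datetime(
            ["2024-06-01 08:05", "2024-06-01 08:40", "2024-06-01 17:15", "2024-06-02 09:00"]
        ),
        "ended_at": pd.to_datetime(
            ["2024-06-01 08:20", "2024-06-01 09:10", "2024-06-01 17:25", "2024-06-02 09:45"]
        ),
        "start_station_name": ["Station A", "Station B", "Station A", "Station C"],
        "member_casual": ["member", "casual", "member", "casual"],
    }
)


def summarize_rides(df):
    """Return a few basic ridership summaries as a dict."""
    # Ride duration in minutes, derived from the start and end timestamps.
    duration_min = (df["ended_at"] - df["started_at"]).dt.total_seconds() / 60
    return {
        "n_rides": len(df),
        "median_duration_min": float(duration_min.median()),
        # Number of rides starting in each hour of the day.
        "rides_by_hour": df["started_at"].dt.hour.value_counts().sort_index().to_dict(),
        # Three busiest start stations by ride count.
        "top_start_stations": df["start_station_name"].value_counts().head(3).to_dict(),
        # Split between members and casual riders.
        "rides_by_rider_type": df["member_casual"].value_counts().to_dict(),
    }


summary = summarize_rides(rides)
print(summary)
```

The same function can be applied to a DataFrame loaded from the dataset, provided the timestamp columns are already parsed as datetimes.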