Description:
This dataset is a copy of the New York City Citibike data published at https://citibikenyc.com/system-data, made available in the cloud-native GeoParquet format and partitioned by year and month.
The original data schema exhibits inconsistencies across source files, particularly in data type definitions.
This dataset has been standardized and divided into two distinct schemas ('new' and 'old') to best reflect the official Citibike data schema variations.
Specific data transformation steps are documented in my Medium blog post: https://medium.com/@calvinluozhengpei/from-wall-street-to-central-park-decoding-nycs-citi-bike-patterns-part-1-3155569278cf
Feel free to read and provide feedback on how to make the process more efficient!
Example Usage:
Load data with DuckDB in Python:
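The snippet below is a minimal sketch of one way to load the data, not the author's original code. The dataset location (DATASET_ROOT) is a placeholder, and the directory layout <root>/<schema>/year=YYYY/month=M/*.parquet is an assumption about how the year and month partitions are stored. The helper names partition_glob, source_expr and load_trips are hypothetical. Adjust all of these to match the actual files.

```python
from typing import Optional

# Hypothetical location: replace with the real local path or object-store URL.
DATASET_ROOT = "path/to/citibike"


def partition_glob(root: str, schema: str,
                   year: Optional[int] = None,
                   month: Optional[int] = None) -> str:
    """Glob for the Parquet files of one schema, optionally one year/month.

    Assumes <root>/<schema>/year=YYYY/month=M/*.parquet (a guessed layout).
    """
    if schema not in ("new", "old"):
        raise ValueError("schema must be 'new' or 'old'")
    year_part = "year=*" if year is None else f"year={year}"
    month_part = "month=*" if month is None else f"month={month}"
    return f"{root.rstrip('/')}/{schema}/{year_part}/{month_part}/*.parquet"


def source_expr(pattern: str) -> str:
    """SQL table expression that reads the matching files with DuckDB."""
    escaped = pattern.replace("'", "''")  # escape single quotes for SQL
    return f"read_parquet('{escaped}', hive_partitioning = true)"


def load_trips(schema: str, year: Optional[int] = None,
               month: Optional[int] = None, root: str = DATASET_ROOT):
    """Return a lazy DuckDB relation over the selected partitions."""
    import duckdb  # imported here so the helpers above work without DuckDB

    pattern = partition_glob(root, schema, year, month)
    return duckdb.sql(f"SELECT * FROM {source_expr(pattern)}")
```

With DuckDB installed and DATASET_ROOT pointing at real files, load_trips("new", 2023, 7).limit(5).show() would preview a few rows of one month. The 'new' and 'old' schemas are read separately because their columns differ.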
Basic EDA in Python:
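The following is a rough sketch of a few summary queries, not the author's original analysis. It assumes the 'new' schema keeps the column names of the current official Citibike files (started_at, ended_at, start_station_name, member_casual) and that year and month are exposed as partition columns. The helper names eda_queries and run_eda are hypothetical. Check the column names against the actual files before running.

```python
def eda_queries(source: str) -> dict:
    """Map a short name to a summary SQL query over `source`.

    `source` is any DuckDB table expression, for example a
    read_parquet('...', hive_partitioning = true) call.
    Column names are assumptions about the 'new' schema.
    """
    return {
        # Trip volume per partition (year and month assumed to be columns).
        "trips_per_month": (
            f"SELECT year, month, count(*) AS trips FROM {source} "
            "GROUP BY year, month ORDER BY year, month"
        ),
        # Split between members and casual riders.
        "trips_by_rider_type": (
            f"SELECT member_casual, count(*) AS trips FROM {source} "
            "GROUP BY member_casual ORDER BY trips DESC"
        ),
        # Ten busiest departure stations.
        "top_start_stations": (
            f"SELECT start_station_name, count(*) AS trips FROM {source} "
            "GROUP BY start_station_name ORDER BY trips DESC LIMIT 10"
        ),
        # Median ride length in minutes.
        "median_duration_min": (
            "SELECT median(date_diff('minute', started_at, ended_at)) "
            f"AS median_minutes FROM {source}"
        ),
    }


def run_eda(source: str) -> dict:
    """Run every query and return the result rows keyed by query name."""
    import duckdb  # imported lazily; only needed when queries are executed

    return {name: duckdb.sql(sql).fetchall()
            for name, sql in eda_queries(source).items()}
```

Keeping the aggregation in SQL lets DuckDB scan only the columns each query needs, which matters when reading many monthly partitions at once.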