this post was submitted on 08 Dec 2024

I am creating a couple of larger database tables, each with at least hundreds of millions of observations and growing. Some tables have minute resolution, some millisecond resolution. Timestamps are not necessarily unique.

Should I create separate year, month, or date and time columns? Or is a single datetime column enough? At what size would you partition the tables?

The raw data is in CSV files.

Currently I am aiming for PostgreSQL and DuckDB. Does TimescaleDB make a significant difference?

[–] jupyter@programming.dev 4 points 2 weeks ago

Thanks! That helps a lot