Lately I’m dealing with some RDBMS’s again. Nothing particularly worth talking about in itself, except that it involves dealing with timestamps and specialised date/time types.
This suddenly brought back some very painful memories from a time where I was dealing with date/time and timestamp types in Postgres, and the hard and fast rules the team I worked on devised to make our life easier.
The Rules
- No business ID’s in primary key fields
- Store any kind of timestamp as a plain long epoch timestamp
- Give explicit (ie, longer) names to primary and foreign key fields
- Express as many constraints in the schema as possible
Primary Keys & Business IDs
Business IDs are tied to the real world. And the real world tends to change on its own accord, not giving a damn if it breaks your nice schema.
Save yourself some future pain by defining an auto-generated LONG
of BIGSERIAL
primary key on your tables and using these to express relations across your entities. This way, the actual business relevant data may change in whatever way they want, your schema has a higher chance of accomodating said changes without too many headaches.1
This rule also has a little sister: if you’re hesitating between using an existing identifier and adding a new primary key, add a primary key.
Timestamps as plain long
Custom date/time types seem pretty nice. Your favourite RDBMS probably offers loads of nice functionalities related to them. But each RDBMS will do things differently, and have its own quirks.
For example, older versions of Postgres used to store things using a time-zone that dependend on the server locale. And this was totally not obvious. Update the server’s time zone, and all hell breaks loose.
This has been fixed (AFAIK), but, often, I’m still left wondering what time-zone is this stored under? And how does this particular type behave?
Having a convention to store all times as epoch milliseconds (or seconds), with comments in the schema making this explicit, makes it easy to reason about time in your schema. And localisation of an epoch timestamp to any other format or convention is an easy task in any language or database system.
Full names for primary and foreign key fields
Given a table book
, on which we want a primary key, we will name that field book_id
.
Similarly, if we have an editor
table, we will name its PK editor_id
. This way, if book
needs a reference to editor
, that foreign key can naturally be named editor_id
and it’s perfectly clear what it points to.
This really reduces frustrations when schemas get bigger, and the only cost is slightly longer names for your foreign keys. When working with tools like jOOq2, it makes joining across many tables, including derived ones, extremely easy.
Expressing constraints
The biggest value of a relational database management system is that it can understand and enforce some constraints your data may have. A unique name for anything that belongs to a customer, only a single active setting per device, etc.
RDBMS’s are built to do so (reasonably) efficiently, and expressing a few constraints on the SQL level can save you a lot of time higher up in the stack when you are writing and reading the data.
In short, a solid DB schema is an excellent complement to a strong type system: it will catch a lot of the errors you’re likely to commit.
Underlying Principles
These rules are enticing to me because they help with:
- future proofing your code & architecture,
- making things explicit, even at the cost of more verbosity,
- preventing certain classes of mistakes, or catching errors early on.
As the saying goes, Your mileage may vary: said rules haven’t aged a bit and still apply nicely.
-
I remember stumbling on this old stackoverflow answer, which sums up my point, back in 2014 when thinking about this question. ↩︎
-
Shameless Proselytism: if you’re working in the JVM world and dealing with SQL databases, you absolutely need to check out jOOq if you haven’t already done so. ↩︎