The SQL standard includes a boolean type which, unsurprisingly, represents a Boolean value. This can be useful, but I’m going to argue that it should almost never be used in a normalised data model.

The Problems with Boolean

Here are few of the problems you’ll inevitably run into if you use boolean in a normalised data model:

It’s not really binary

So you’re chatting to your domain expert and he says something like “books are either hardback or paperback”. Something twigs in your mind: two options… binary… that’s a Boolean! So you duly write:

1CREATE TABLE book (
2  ...
3  is_paperback BOOLEAN,
4  ...
5);

This might work, for now at least. But here’s the thing: things in businesses or nature are very rarely binary. Even if it appears binary now it won’t be long before some third option comes along. Biologists thought all flowering plants fit neatly into two categories (monocots and dicots) for 250 years, but that is no longer the case. In my experience it takes far less time than 250 years for things to turn out to be non-binary in practice!

It’s not really independent

You’re chatting to another domain expert and she says “some of our products we ship ourselves, but all others are shipped by an external supplier directly to the consumer”. So you think, OK, a boolean field is_shipped_internally. Then to another engineer she says “some of our products require sign-off by a qualified professional” and he duly adds another boolean field requires_sign_off. You end up with:

1CREATE TABLE book (
2  ...
3  is_shipped_internally BOOLEAN,
4  required_sign_off BOOLEAN,
5  ...
6);

Later you discover that for legal reasons all products that require sign-off will be shipped externally, so now your data model allows an impossible value, namely something that’s shipped internally and requires sign-off.

It doesn’t capture the real data

This one can even crop up with purely internal details. Imagine a “task” model that captures the state of some internal task. It’s very tempting to use boolean fields to capture things like is_started, is_completed etc. But what if someone then asks when was this task completed? You don’t know. Using a boolean field has actually led to losing the real underlying data.

What to do Instead

Use enums

If there is a categorical property associated with your records, you can use an enum to model it. Postgres includes first-class support for enumerated types; for other databases you might just need to use a varchar and control the values in software. In Postgres for the book example you might have:

1CREATE TYPE book_type AS ENUM ('hardback', 'paperback');
2
3CREATE TABLE book (
4  ...
5  book_type book_type,
6  ...
7);

In Django we would probably use a CharField with the choices option. It would be nice if it supported the Postgres enum but it doesn’t seem to work out of the box.

Use views or properties

Boolean values exist and are incredibly common, but they are best thought of as answers to a question, also known as a predicate. For example, we could ask “is this book paperback?”, which might be written in code as is_paperback(book).

There are a couple of ways to support this. The first is in SQL itself, using views, for example:

1CREATE VIEW book_view AS
2  SELECT book_type = 'paperback' AS is_paperback,
3         book_type = 'hardback' AS is_hardback
4    FROM book;

Postgres will stop you writing a type like book_type = 'paperbac', by the way.

You could also do this in your code, for example, in Django using a property:

1class Book(Model):
2
3    ...
4
5    @property
6    def is_paperback(self) -> bool:
7        return self.book_type == "paperback"

The lets you write code at a higher abstraction level without exposing the implementation of those properties.

This also means you can enforce your invariant properties, like no book can be both internally shipped and require sign-off (strictly speaking the invariant here is is_internal(b) ^ requires_signoff(b) = false):

1class Book(Model):
2
3    @property
4    def is_shipped_internally(self) -> bool:
5        return self.shipping_type == "internal"
6
7    @property
8    def requires_sign_off(self) -> bool:
9        return self.shipping_type == "external_with_sign_off"

It’s easy to see that it’s not possible for a book to be both internally shipped and requiring sign off.

Dig deeper

Once you think about deriving Boolean values from data using predicates, you’ll often find there’s some deeper data you could store instead of storing the Boolean directly. For example, instead of storing is_published as can store published_at as a timestamp. Now you can always derive is_published when you need it, but you’ve also captured some of the underlying data that feeds into this value. I find just by thinking twice before using boolean I’ll often realise there’s some deeper data I can record instead.

YAGNI?

After all of this you might be thinking “You’re Aren’t Going to Need It” (YAGNI). Using a boolean is fine if it works, and if I ever need a third option I’ll just write a migration. But here’s the real problem: migrating from boolean to something else sucks. Furthermore, in the case of needing published_at it isn’t even possible. Even if you could write the migration to, say, an enum type, this is a multi-stage process of walking out database migrations and code changes, or alternatively downtime. It’s both tedious and laborious.

On the other hand, using any of the above solutions instead of boolean costs you hardly anything. Even if there are only two options are there are only ever two options, was it really that difficult to write out those two options one time instead of using the built-in true/false values? No, it really isn’t. Maybe a boolean is slightly faster to query than an enum, but you should only be thinking about such things if and when they become a problem. Premature optimisation is the root of all evil, remember. You can always denormalise your table into a materialised view if needed later.

Conclusion

So in conclusion I would seriously consider just never using boolean in your normalised data models at all. I really struggle to think of times when it’s the most appropriate field type but I’ve encountered countless times when it was inappropriate. It’s still useful for denormalised (materialised) views or BI tables etc., just keep it out of your normalised models.