Clean Domain Models with SQLAlchemy

George Kettleborough · December 8, 2024

One of the most tried and tested techniques in programming is to build components that map closely to the real-world problems you are trying to solve. In domain-driven design (DDD), one of the key tenets is to have a domain model at the centre of your application.

The domain model should be built in close collaboration with domain experts and, crucially, should consist purely of high-level domain logic and be completely free of low-level application logic like databases, GUIs, web frameworks etc.

For example, a publishing system might have an action called “publish” that makes sense to all domain experts. An implementation might ultimately be as simple as a SQL UPDATE on a table setting is_published=true, but the domain model should not be polluted with such details. “Publish” is part of the domain model; UPDATE is not.

Let’s look at how we can use SQLAlchemy to handle database persistence of domain objects without polluting our domain model.

A domain model

The domain model is the heart of our application, so it makes sense to start with that.

 1class TodoList:
 2
 3    def __init__(self, name: str, items: list[str]):
 4        self.name = name
 5        self.items = items
 6
 7    def add_to_bottom(self, item: str) -> None:
 8        self.items.append(item)
 9
10    def add_to_top(self, item: str) -> None:
11        self.items.insert(0, item)
12
13    def get_first_item(self) -> str | None:
14        if self.items:
15            return self.items[0]
16        return None
17
18    @property
19    def hashed_name(self) -> str:
20        m = hashlib.sha256()
21        m.update(self.name.encode())
22        return m.hexdigest()

There are a few things to note about this TodoList:

It has a simple attribute name,
It has a complex attribute items, this is a list of str. Being a list means it is ordered,
It has an invariant property hashed_name which is always the sha256 hash of name.

But, more importantly, note that this has absolutely no logic concerning databases or persistence in any way.

Now let’s use SQLAlchemy to persist these TodoList objects in a database. The goal is to not touch the domain model at all. We are going to build a layer of application logic which does not concern the domain experts and therefore it should not be part of the domain model.

Simple fields

Let’s start with the simple attribute name. We can easily define a database table that has the single field name, along with a database generated id column. Note that this column is not part of our domain model, but is useful to generate anyway. The following describes a database table using SQLAlchemy:

 1from sqlalchemy import Column, Integer, String, Table
 2from sqlalchemy.orm import registry
 3
 4mapper_registry = registry()
 5
 6todo_lists = Table(
 7    "todo_lists",
 8    mapper_registry.metadata,
 9    Column("id", Integer, primary_key=True, autoincrement=True),
10    Column("name", String(), unique=True, nullable=False),
11)

We now tell SQLAlchemy to map this table to our domain class:

1mapper_registry.map_imperatively(TodoList, todo_lists)

By default SQLAlchemy maps columns names to attribute names, but you can tell it to do otherwise (as we will later). What we’ve just set up is an object-relational mapping (ORM). SQLAlchemy will map data to/from relations (in a SQL database) and object (in our application).

Now we can persist an object in a database, but only the name field is saved:

 1engine = create_engine("sqlite:///test.db", echo=True)
 2mapper_registry.metadata.drop_all(engine)
 3mapper_registry.metadata.create_all(engine)
 4
 5# construct domain object
 6my_list = TodoList("my_list", ["one", "two", "three"])
 7
 8# save in database
 9with Session(engine) as s:
10    s.add(my_list)
11    s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15    l = s.get(TodoList, 1)
16
17l.name    # => "my_list"
18l.items   # => AttributeError

Now let’s look at how to save the items field.

Complex fields

The items attribute of TodoList is complex: it is made up of multiple parts. In addition, it encodes important domain semantics, namely it is an ordered list.

Some databases, like Postgres, support many complex field types, including arrays, which would fit our purpose here quite nicely. But if we don’t want to rely on a particular database implementation we have to do something else.

In SQL the way to handle this is to use foreign keys and joins. What we need is another table containing the todo items which has a foreign key to a todo list. That’s easy enough:

1todo_items = Table(
2    "todo_items",
3    mapper_registry.metadata,
4    Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
5    Column("item_name", String(), primary_key=True),
6)

Now, we need to create a class for SQLAlchemy to map to. This class won’t be part of our domain model, it exists purely so that SQLAlchemy can do its thing:

1@dataclass
2class TodoListItem:
3    item_name: str
4    list_id: int | None = None
5
6mapper_registry.map_imperatively(TodoListItem, todo_items)

Note the list_id is optional because we won’t have a value for this when we construct a new TodoList.

At this point, SQLAlchemy knows how to map this new TodoListItem class to the table todo_items, but we don’t want to use this class directly, we want to go through our domain model. We need to tell SQLAlchemy how to handle this relationship by updating the mapping for TodoList:

 1mapper_registry.map_imperatively(
 2    TodoList, todo_lists,
 3    properties={
 4        "_items": relationship(
 5            TodoListItem,
 6            cascade="all, delete-orphan",
 7            lazy="selectin",
 8        ),
 9    },
10)

This doesn’t quite do what we want, this would map an attribute _items on our domain model to those TodoListItem objects, which are not part of our domain model at all. What we want is to extract just the item_name from the related table and map those to our TodoList.items list. We can do that with an association_proxy:

1from sqlalchemy.ext.associationproxy import association_proxy
2
3TodoList.items = association_proxy("_items", "item_name")

Now we can persist the items:

 1engine = create_engine("sqlite:///test.db", echo=True)
 2mapper_registry.metadata.drop_all(engine)
 3mapper_registry.metadata.create_all(engine)
 4
 5# construct domain object
 6my_list = TodoList("my_list", ["one", "two", "three"])
 7
 8# save in database
 9with Session(engine) as s:
10    s.add(my_list)
11    s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15    l = s.get(TodoList, 1)
16
17l.name    # => "my_list"
18l.items   # => ['one', 'three', 'two']

But hold on, what is going on with the order?! This is because we are still missing an important part of the database representation. Not only are we forced to have separate tables and foreign keys, we also must handle the order ourselves. SQL databases are strictly unordered, unless an order is specified.

First let’s amend our todo_items table to add a position:

1todo_items = Table(
2    "todo_items",
3    mapper_registry.metadata,
4    Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
5    Column("item_name", String(), primary_key=True),
6    Column("position", Integer()),
7)

We should also add this to our TodoListItem class:

1@dataclass
2class TodoListItem:
3    item_name: str
4    list_id: int | None = None
5    position: int | None = None

Now we can tell SQLAlchemy to order the relation by specifying order-by="TodoListItem.position" but how do we write those positions in the first place? Fortunately there is special collection type called ordering_list which will handle this for us:

 1mapper_registry.map_imperatively(
 2    TodoList, todo_lists,
 3    properties={
 4        "_items": relationship(
 5            TodoListItem,
 6            order_by="TodoListItem.position",
 7            collection_class=ordering_list("position"),
 8            cascade="all, delete-orphan",
 9            lazy="selectin",
10        ),
11    },
12)

Now when we write an object to the database, ordering_list will automatically fill in the position column for us according to the order of the list in the domain object. When we retrieve an object from the database the list will be ordered according to those positions again.

We’ve now managed to persist a complex field in the database without polluting our domain model with anything at all. As it happens we needed to create a new table, a foreign key and an ordering column, but our domain model is none the wiser! It’s still just a list.

Next let’s see how we can persist that generated hashed_name field in the database.

Generated fields

Our domain model has an invariant: the hashed_name is always the sha256 of the name. If we want to be able to search for this hash efficiently we will need to persist it in the database. What we’d like is to write this field to the database when the object is created or updated.

First, let’s add it to the table definition:

1todo_lists = Table(
2    "todo_lists",
3    mapper_registry.metadata,
4    Column("id", Integer, primary_key=True, autoincrement=True),
5    Column("name", String(), unique=True, nullable=False),
6    Column(
7        "hashed_name", String(), index=True, nullable=False, key="_hashed_name"
8    ),
9)

Note we set key="_hashed_name". This causes SQLAlchemy to map it to/from a hidden field on model _hashed_name, rather than try to set the property, which it can’t.

In order to update this value according to our domain model we can set some triggers:

1@event.listens_for(TodoList, "before_insert")
2@event.listens_for(TodoList, "before_update")
3def populate_hashed_name(mapper, connection, target):
4    target._hashed_name = target.hashed_name

Now we can persist the whole thing and retrieve by hash:

 1engine = create_engine("sqlite:///test.db", echo=True)
 2mapper_registry.metadata.drop_all(engine)
 3mapper_registry.metadata.create_all(engine)
 4
 5# construct domain object
 6my_list = TodoList("my_list", ["one", "two", "three"])
 7
 8# save in database
 9with Session(engine) as s:
10    s.add(my_list)
11    s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15    result = s.execute(select(TodoList).where(
16        TodoList._hashed_name == "495a613093452715b9989b8233829836804bce4c1f95e221f86da526ea93281b"
17    ))
18    for obj in result.scalars():
19        print(obj.name)         # => 'my_list'

Conclusion

So now we are able to persist our domain model fully into the database. SQLAlchemy does its job as an ORM to make this mapping complete. From the point of view of our domain model the TodoList is just a Python class and the business rules can be expressed and tested in regular Python code.

The full code is here:

 1from dataclasses import dataclass
 2import hashlib
 3
 4from sqlalchemy import (
 5    Column,
 6    ForeignKey,
 7    Integer,
 8    String,
 9    Table,
10    create_engine,
11    event,
12)
13from sqlalchemy.ext.associationproxy import association_proxy
14from sqlalchemy.ext.orderinglist import ordering_list
15from sqlalchemy.orm import registry, relationship
16
17# --- Domain ---
18
19class TodoList:
20    """The domain model for a todo list"""
21
22    def __init__(self, name: str, items: list[str]):
23        self.name = name
24        self.items = items
25
26    def __eq__(self, other) -> bool:
27        if isinstance(other, TodoList) and self.name == other.name:
28            return True
29        return False
30
31    def __hash__(self) -> int:
32        return hash(self.name)
33
34    def add_to_bottom(self, item: str) -> None:
35        self.items.append(item)
36
37    def add_to_top(self, item: str) -> None:
38        self.items.insert(0, item)
39
40    def get_first_item(self) -> str | None:
41        if self.items:
42            return self.items[0]
43        return None
44
45    @property
46    def hashed_name(self) -> str:
47        m = hashlib.sha256()
48        m.update(self.name.encode())
49        return m.hexdigest()
50
51
52# --- ORM stuff ---
53
54mapper_registry = registry()
55
56todo_lists = Table(
57    "todo_lists",
58    mapper_registry.metadata,
59    Column("id", Integer, primary_key=True, autoincrement=True),
60    Column("name", String(), unique=True, nullable=False),
61    Column("hashed_name", String(), index=True, nullable=False, key="_hashed_name"),
62)
63
64todo_items = Table(
65    "todo_items",
66    mapper_registry.metadata,
67    Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
68    Column("item_name", String(), primary_key=True),
69    Column("position", Integer()),
70)
71
72@dataclass
73class TodoListItem:
74    item_name: str
75    list_id: int | None = None
76    position: int | None = None
77
78mapper_registry.map_imperatively(TodoListItem, todo_items)
79
80mapper_registry.map_imperatively(
81    TodoList, todo_lists,
82    properties={
83        "_items": relationship(
84            TodoListItem,
85            order_by="TodoListItem.position",
86            collection_class=ordering_list("position"),
87            cascade="all, delete-orphan",
88            lazy="selectin",
89        ),
90    },
91)
92
93TodoList.items = association_proxy("_items", "item_name")
94
95@event.listens_for(TodoList, "before_insert")
96@event.listens_for(TodoList, "before_update")
97def populate_hashed_name(mapper, connection, target):
98    target._hashed_name = target.hashed_name

In an ORM like Django, we would be forced to bend our model to the needs of the database, like having a foreign key relationship for items, and would have to pollute our model with ORM specific stuff like column types etc. When it comes to testing, you end up needing a database the moment you have a relationship (or the complex field in our example).

Alternatively we could consider Django models to be just database tables and manually map them to domain models ourselves. But in that case we’d also have to implement the unit of work pattern ourselves and track the changes to each object so we know which ones to update. But isn’t that what the ORM is for? Django seems to only do half the job. SQLAlchemy does this for us, of course.

One thing you might be wondering is whether it’s true that SQLAlchemy didn’t touch the domain model. What are those _items and _hashed_name attributes? And what about this _sa_instance_state that you’ll see on instances from the db? SQLAlchemy does indeed dynamically modify the instances to keep track of changes and implement some of the magic. You do have to take care when setting up the mapping, but it should always be the mapping that bends to the needs of the domain model, not the other way around.

If done properly the mapping won’t affect the way the domain model operates in any way. You could instantiate an instance either via SQLAlchemy or its __init__ method, or perhaps by a special test repository that doesn’t use a database. It will all be the same. But that doesn’t mean an end-to-end test or two wouldn’t be appropriate.

Finally, I did wonder about using a deque for the items instead of a list. After all, self.items.insert(0, item) is not an efficient operation for a list (nor would a pop_first_item method, for example). Using a deque isn’t quite so easy. SQLAlchemy includes the machinery for list, set and dict, but you would have to provide your own proxy_factory argument to association_proxy to use other collections. This is possible, though, if you need it.

A domain model

Simple fields

Complex fields

Generated fields

Conclusion

Contents