One of the most tried and tested techniques in programming is to build components that map closely to the real-world problems you are trying to solve. In domain-driven design (DDD), one of the key tenets is to have a domain model at the centre of your application.
The domain model should be built in close collaboration with domain experts and, crucially, should consist purely of high-level domain logic and be completely free of low-level application logic like databases, GUIs, web frameworks etc.
For example, a publishing system might have an action called “publish” that makes sense
to all domain experts. An implementation might ultimately be as simple as a SQL
UPDATE
on a table setting is_published=true
, but the domain model should not be
polluted with such details. “Publish” is part of the domain model; UPDATE
is not.
Let’s look at how we can use SQLAlchemy to handle database persistence of domain objects without polluting our domain model.
A domain model
The domain model is the heart of our application, so it makes sense to start with that.
1class TodoList:
2
3 def __init__(self, name: str, items: list[str]):
4 self.name = name
5 self.items = items
6
7 def add_to_bottom(self, item: str) -> None:
8 self.items.append(item)
9
10 def add_to_top(self, item: str) -> None:
11 self.items.insert(0, item)
12
13 def get_first_item(self) -> str | None:
14 if self.items:
15 return self.items[0]
16 return None
17
18 @property
19 def hashed_name(self) -> str:
20 m = hashlib.sha256()
21 m.update(self.name.encode())
22 return m.hexdigest()
There are a few things to note about this TodoList
:
- It has a simple attribute
name
, - It has a complex attribute
items
, this is alist
ofstr
. Being a list means it is ordered, - It has an invariant property
hashed_name
which is always the sha256 hash ofname
.
But, more importantly, note that this has absolutely no logic concerning databases or persistence in any way.
Now let’s use SQLAlchemy to persist these TodoList
objects in a database. The goal is
to not touch the domain model at all. We are going to build a layer of application
logic which does not concern the domain experts and therefore it should not be part of
the domain model.
Simple fields
Let’s start with the simple attribute name
. We can easily define a database table that
has the single field name
, along with a database generated id
column. Note that this
column is not part of our domain model, but is useful to generate anyway. The following
describes a database table using SQLAlchemy:
1from sqlalchemy import Column, Integer, String, Table
2from sqlalchemy.orm import registry
3
4mapper_registry = registry()
5
6todo_lists = Table(
7 "todo_lists",
8 mapper_registry.metadata,
9 Column("id", Integer, primary_key=True, autoincrement=True),
10 Column("name", String(), unique=True, nullable=False),
11)
We now tell SQLAlchemy to map this table to our domain class:
1mapper_registry.map_imperatively(TodoList, todo_lists)
By default SQLAlchemy maps columns names to attribute names, but you can tell it to do otherwise (as we will later). What we’ve just set up is an object-relational mapping (ORM). SQLAlchemy will map data to/from relations (in a SQL database) and object (in our application).
Now we can persist an object in a database, but only the name
field is saved:
1engine = create_engine("sqlite:///test.db", echo=True)
2mapper_registry.metadata.drop_all(engine)
3mapper_registry.metadata.create_all(engine)
4
5# construct domain object
6my_list = TodoList("my_list", ["one", "two", "three"])
7
8# save in database
9with Session(engine) as s:
10 s.add(my_list)
11 s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15 l = s.get(TodoList, 1)
16
17l.name # => "my_list"
18l.items # => AttributeError
Now let’s look at how to save the items
field.
Complex fields
The items
attribute of TodoList
is complex: it is made up of multiple parts. In
addition, it encodes important domain semantics, namely it is an ordered list.
Some databases, like Postgres, support many complex field types, including arrays, which would fit our purpose here quite nicely. But if we don’t want to rely on a particular database implementation we have to do something else.
In SQL the way to handle this is to use foreign keys and joins. What we need is another table containing the todo items which has a foreign key to a todo list. That’s easy enough:
1todo_items = Table(
2 "todo_items",
3 mapper_registry.metadata,
4 Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
5 Column("item_name", String(), primary_key=True),
6)
Now, we need to create a class for SQLAlchemy to map to. This class won’t be part of our domain model, it exists purely so that SQLAlchemy can do its thing:
1@dataclass
2class TodoListItem:
3 item_name: str
4 list_id: int | None = None
5
6mapper_registry.map_imperatively(TodoListItem, todo_items)
Note the list_id
is optional because we won’t have a value for this when we construct
a new TodoList
.
At this point, SQLAlchemy knows how to map this new TodoListItem
class to the table
todo_items
, but we don’t want to use this class directly, we want to go through our
domain model. We need to tell SQLAlchemy how to handle this relationship by updating the
mapping for TodoList
:
1mapper_registry.map_imperatively(
2 TodoList, todo_lists,
3 properties={
4 "_items": relationship(
5 TodoListItem,
6 cascade="all, delete-orphan",
7 lazy="selectin",
8 ),
9 },
10)
This doesn’t quite do what we want, this would map an attribute _items
on our domain
model to those TodoListItem
objects, which are not part of our domain model at
all. What we want is to extract just the item_name
from the related table and map
those to our TodoList.items
list. We can do that with an association_proxy
:
1from sqlalchemy.ext.associationproxy import association_proxy
2
3TodoList.items = association_proxy("_items", "item_name")
Now we can persist the items:
1engine = create_engine("sqlite:///test.db", echo=True)
2mapper_registry.metadata.drop_all(engine)
3mapper_registry.metadata.create_all(engine)
4
5# construct domain object
6my_list = TodoList("my_list", ["one", "two", "three"])
7
8# save in database
9with Session(engine) as s:
10 s.add(my_list)
11 s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15 l = s.get(TodoList, 1)
16
17l.name # => "my_list"
18l.items # => ['one', 'three', 'two']
But hold on, what is going on with the order?! This is because we are still missing an important part of the database representation. Not only are we forced to have separate tables and foreign keys, we also must handle the order ourselves. SQL databases are strictly unordered, unless an order is specified.
First let’s amend our todo_items
table to add a position:
1todo_items = Table(
2 "todo_items",
3 mapper_registry.metadata,
4 Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
5 Column("item_name", String(), primary_key=True),
6 Column("position", Integer()),
7)
We should also add this to our TodoListItem
class:
1@dataclass
2class TodoListItem:
3 item_name: str
4 list_id: int | None = None
5 position: int | None = None
Now we can tell SQLAlchemy to order the relation by specifying
order-by="TodoListItem.position"
but how do we write those positions in the first
place? Fortunately there is special collection type called ordering_list
which will
handle this for us:
1mapper_registry.map_imperatively(
2 TodoList, todo_lists,
3 properties={
4 "_items": relationship(
5 TodoListItem,
6 order_by="TodoListItem.position",
7 collection_class=ordering_list("position"),
8 cascade="all, delete-orphan",
9 lazy="selectin",
10 ),
11 },
12)
Now when we write an object to the database, ordering_list
will automatically fill in
the position
column for us according to the order of the list in the domain
object. When we retrieve an object from the database the list will be ordered according
to those positions
again.
We’ve now managed to persist a complex field in the database without polluting our domain model with anything at all. As it happens we needed to create a new table, a foreign key and an ordering column, but our domain model is none the wiser! It’s still just a list.
Next let’s see how we can persist that generated hashed_name
field in the database.
Generated fields
Our domain model has an invariant: the hashed_name
is always the sha256 of the
name
. If we want to be able to search for this hash efficiently we will need to
persist it in the database. What we’d like is to write this field to the database when
the object is created or updated.
First, let’s add it to the table definition:
1todo_lists = Table(
2 "todo_lists",
3 mapper_registry.metadata,
4 Column("id", Integer, primary_key=True, autoincrement=True),
5 Column("name", String(), unique=True, nullable=False),
6 Column(
7 "hashed_name", String(), index=True, nullable=False, key="_hashed_name"
8 ),
9)
Note we set key="_hashed_name"
. This causes SQLAlchemy to map it to/from a hidden
field on model _hashed_name
, rather than try to set the property, which it can’t.
In order to update this value according to our domain model we can set some triggers:
1@event.listens_for(TodoList, "before_insert")
2@event.listens_for(TodoList, "before_update")
3def populate_hashed_name(mapper, connection, target):
4 target._hashed_name = target.hashed_name
Now we can persist the whole thing and retrieve by hash:
1engine = create_engine("sqlite:///test.db", echo=True)
2mapper_registry.metadata.drop_all(engine)
3mapper_registry.metadata.create_all(engine)
4
5# construct domain object
6my_list = TodoList("my_list", ["one", "two", "three"])
7
8# save in database
9with Session(engine) as s:
10 s.add(my_list)
11 s.commit()
12
13# retrieve from database
14with Session(engine) as s:
15 result = s.execute(select(TodoList).where(
16 TodoList._hashed_name == "495a613093452715b9989b8233829836804bce4c1f95e221f86da526ea93281b"
17 ))
18 for obj in result.scalars():
19 print(obj.name) # => 'my_list'
Conclusion
So now we are able to persist our domain model fully into the database. SQLAlchemy does
its job as an ORM to make this mapping complete. From the point of view of our domain
model the TodoList
is just a Python class and the business rules can be expressed and
tested in regular Python code.
The full code is here:
1from dataclasses import dataclass
2import hashlib
3
4from sqlalchemy import (
5 Column,
6 ForeignKey,
7 Integer,
8 String,
9 Table,
10 create_engine,
11 event,
12)
13from sqlalchemy.ext.associationproxy import association_proxy
14from sqlalchemy.ext.orderinglist import ordering_list
15from sqlalchemy.orm import registry, relationship
16
17# --- Domain ---
18
19class TodoList:
20 """The domain model for a todo list"""
21
22 def __init__(self, name: str, items: list[str]):
23 self.name = name
24 self.items = items
25
26 def __eq__(self, other) -> bool:
27 if isinstance(other, TodoList) and self.name == other.name:
28 return True
29 return False
30
31 def __hash__(self) -> int:
32 return hash(self.name)
33
34 def add_to_bottom(self, item: str) -> None:
35 self.items.append(item)
36
37 def add_to_top(self, item: str) -> None:
38 self.items.insert(0, item)
39
40 def get_first_item(self) -> str | None:
41 if self.items:
42 return self.items[0]
43 return None
44
45 @property
46 def hashed_name(self) -> str:
47 m = hashlib.sha256()
48 m.update(self.name.encode())
49 return m.hexdigest()
50
51
52# --- ORM stuff ---
53
54mapper_registry = registry()
55
56todo_lists = Table(
57 "todo_lists",
58 mapper_registry.metadata,
59 Column("id", Integer, primary_key=True, autoincrement=True),
60 Column("name", String(), unique=True, nullable=False),
61 Column("hashed_name", String(), index=True, nullable=False, key="_hashed_name"),
62)
63
64todo_items = Table(
65 "todo_items",
66 mapper_registry.metadata,
67 Column("list_id", ForeignKey("todo_lists.id"), primary_key=True),
68 Column("item_name", String(), primary_key=True),
69 Column("position", Integer()),
70)
71
72@dataclass
73class TodoListItem:
74 item_name: str
75 list_id: int | None = None
76 position: int | None = None
77
78mapper_registry.map_imperatively(TodoListItem, todo_items)
79
80mapper_registry.map_imperatively(
81 TodoList, todo_lists,
82 properties={
83 "_items": relationship(
84 TodoListItem,
85 order_by="TodoListItem.position",
86 collection_class=ordering_list("position"),
87 cascade="all, delete-orphan",
88 lazy="selectin",
89 ),
90 },
91)
92
93TodoList.items = association_proxy("_items", "item_name")
94
95@event.listens_for(TodoList, "before_insert")
96@event.listens_for(TodoList, "before_update")
97def populate_hashed_name(mapper, connection, target):
98 target._hashed_name = target.hashed_name
In an ORM like Django, we would be forced to bend our model to the needs of the
database, like having a foreign key relationship for items
, and would have to pollute
our model with ORM specific stuff like column types etc. When it comes to testing, you
end up needing a database the moment you have a relationship (or the complex field in
our example).
Alternatively we could consider Django models to be just database tables and manually map them to domain models ourselves. But in that case we’d also have to implement the unit of work pattern ourselves and track the changes to each object so we know which ones to update. But isn’t that what the ORM is for? Django seems to only do half the job. SQLAlchemy does this for us, of course.
One thing you might be wondering is whether it’s true that SQLAlchemy didn’t touch the
domain model. What are those _items
and _hashed_name
attributes? And what about this
_sa_instance_state
that you’ll see on instances from the db? SQLAlchemy does indeed
dynamically modify the instances to keep track of changes and implement some of the
magic. You do have to take care when setting up the mapping, but it should always be the
mapping that bends to the needs of the domain model, not the other way around.
If done properly the mapping won’t affect the way the domain model operates in any
way. You could instantiate an instance either via SQLAlchemy or its __init__
method,
or perhaps by a special test repository that doesn’t use a database. It will all be the
same. But that doesn’t mean an end-to-end test or two wouldn’t be appropriate.
Finally, I did wonder about using a deque
for the items
instead of a list
. After
all, self.items.insert(0, item)
is not an efficient operation for a list (nor would a
pop_first_item
method, for example). Using a deque
isn’t quite so easy. SQLAlchemy
includes the machinery for list
, set
and dict
, but you would have to provide your
own proxy_factory
argument to association_proxy
to use other collections. This is
possible, though, if you need it.