Most internal system (as known as HR management system) will provide a tree view of the organization’s hierarchy. It’s simple to query the whole tree from the root, but what if one day morning, your boss asks you to implement a feature that he/she can select an arbitrary node as root and specify a depth to filter the children? How can you make it happen before this day off? One quick solution is Common Table Expression (CTE).
We can also use CTE in querying a comments thread, for example, a reply thread like Reddit. In this kind of comments thread system, the user can reply to any single comment, and each comment represents the root of a sub tree. CTE can help us write a single, simple SQL to query the comments tree from the database, by its recursion feature.
Let’s build a very basic organization tree map. In this system, the user can:
From the requirements, we can easily design a table schema like this:
class Staff(Base):
__tablename__ = 'staff'
id = sa.Column(sa.Integer, primary_key=True)
name = sa.Column(sa.String(64), nullable=False)
supervisor_id = sa.Column(
sa.Integer,
sa.ForeignKey('staff.id'),
nullable=True,
)
supervisor = relationship(
'Staff',
backref=backref('subordinates', cascade='all, delete-orphan'),
remote_side=[id],
# allow circular
post_update=True,
)
def __str__(self):
supervisor = '(none)'
if self.supervisor:
supervisor = f'({self.supervisor.id}, {self.supervisor.name})'
return f'<Staff id={self.id} name={self.name} supervisor={supervisor}>'
def __repr__(self):
return self.__str__()
The creation part is easy, we just construct a staff with the associated settings and persist it to the database:
alice = Staff(name='alice')
session.add(alice)
bob = Staff(name='bob', supervisor=alice)
session.add(bob)
session.commit()
Common Table Expression (CTE) is a way to specify a temporary named result set
with a select query. This result set can be used in the associated execution scope
(includes SELECT
/ INSERT
/ UPDATE
/ DELETE
, may vary with database system).
In most RDBMS, this clause can also be used for creating view.
Most CTE syntax looks like this:
WITH name_of_the_cte (column_1, column_2, ...)
AS
(
SELECT table.field1 as column_1, table.field2 as column_2
FROM table
WHERE
table.field3 = 'limit the rows'
)
SELECT count(column_1) as c
FROM name_of_the_cte
GROUP BY column_2
The more powerful part of CTE is the recursive usage. Recursive means inside the CTE query, we can refer to the CTE (temporary table) too:
WITH RECURSIVE recursive_cte (depth)
AS
(
SELECT 1 -- initial step
UNION ALL -- combine all these steps
SELECT depth + 1 -- reduce the depth (generate new value base on last step)
FROM recursive_cte -- refer to the temporary table
WHERE depth < 10 -- limit the steps can run
)
By adding (self) recursive usage, we can model tree like structures more easily.
MySQL added support for CTE since 8.0, you can find out more in here.
So let’s implement the query function. The main idea here is to read the direct children in the initial step, then query all direct children of these children in the reduce step:
WITH RECURSIVE staff_cte (id)
AS
(
SELECT id -- initial step
FROM staff
WHERE
supervisor_id = {the_root_of_the_tree}
UNION ALL
SELECT staff.id -- reduce step
FROM staff, staff_cte
WHERE
staff.supervisor_id = staff_cte.id -- select all direct children
-- from last step
)
SQLAlchemy empowered us with built-in CTE support:
def query_tree(session, supervisor_id: int) -> List[Staff]:
_cte_base = session.query(Staff) \
.filter(Staff.supervisor_id == supervisor_id) \
.cte(name='staff_cte', recursive=True)
_cte_base_alias = _cte_base.alias()
_cte = _cte_base.union_all(
session
.query(Staff)
.filter(_cte_base_alias.c.id == Staff.supervisor_id)
)
q = sa.select([
_cte.c.id.label('id'),
_cte.c.name.label('name'),
_cte.c.supervisor_id.label('supervisor_id'),
])
return session.query(Staff).from_statement(q).all()
To set limit to the query, we can add a constraint like this:
WITH RECURSIVE staff_cte (id, depth)
AS
(
SELECT id, 1 as depth -- initial step
FROM staff
WHERE
supervisor_id = {the_root_of_the_tree}
UNION ALL
SELECT staff.id, staff_cte + 1 as depth -- reduce step
FROM staff, staff_cte
WHERE
staff.supervisor_id = staff_cte.id -- select all direct children
-- from last step
AND staff_cte.depth < depth_limit -- limit the depth of the tree
)
def query_tree_with_depth_limit(
session,
supervisor_id: int,
depth_limit: int,
) -> List[Staff]:
_cte_base = sa.select([
Staff,
sa.sql.literal_column('1').label('depth_limit'),
]) \
.where(Staff.supervisor_id == supervisor_id) \
.cte(name='staff_cte', recursive=True)
_cte_base_alias = _cte_base.alias()
_cte = _cte_base.union_all(
sa.select([
Staff,
(_cte_base_alias.c.depth_limit + 1).label('depth_limit'),
])
.where(_cte_base_alias.c.id == Staff.supervisor_id)
.where(_cte_base_alias.c.depth_limit < depth_limit)
)
q = sa.select([
_cte.c.id.label('id'),
_cte.c.name.label('name'),
_cte.c.supervisor_id.label('supervisor_id'),
])
return session.query(Staff).from_statement(q).all()
CTE is a powerful feature that can help us to write more readable sql while retain the low maintainance cost.
A full example of code can be check out from the GitHub repo.
There are other solutions for the tree query, for instance:
Nested set
Hope we can talk more about them next time.