Snippet for Fun

Tree Query with recursive CTE

Posted at — Aug 22, 2019

Background

Most internal system (as known as HR management system) will provide a tree view of the organization’s hierarchy. It’s simple to query the whole tree from the root, but what if one day morning, your boss asks you to implement a feature that he/she can select an arbitrary node as root and specify a depth to filter the children? How can you make it happen before this day off? One quick solution is Common Table Expression (CTE).

We can also use CTE in querying a comments thread, for example, a reply thread like Reddit. In this kind of comments thread system, the user can reply to any single comment, and each comment represents the root of a sub tree. CTE can help us write a single, simple SQL to query the comments tree from the database, by its recursion feature.

Example

Let’s build a very basic organization tree map. In this system, the user can:

Basic Schema

From the requirements, we can easily design a table schema like this:

class Staff(Base):

    __tablename__ = 'staff'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(64), nullable=False)
    supervisor_id = sa.Column(
        sa.Integer,
        sa.ForeignKey('staff.id'),
        nullable=True,
    )
    supervisor = relationship(
        'Staff',
        backref=backref('subordinates', cascade='all, delete-orphan'),
        remote_side=[id],
        # allow circular
        post_update=True,
    )

    def __str__(self):
        supervisor = '(none)'
        if self.supervisor:
            supervisor = f'({self.supervisor.id}, {self.supervisor.name})'
        return f'<Staff id={self.id} name={self.name} supervisor={supervisor}>'

    def __repr__(self):
        return self.__str__()

The creation part is easy, we just construct a staff with the associated settings and persist it to the database:

alice = Staff(name='alice')
session.add(alice)
bob = Staff(name='bob', supervisor=alice)
session.add(bob)

session.commit()

(Recursive) Common Table Expression

Common Table Expression (CTE) is a way to specify a temporary named result set with a select query. This result set can be used in the associated execution scope (includes SELECT / INSERT / UPDATE / DELETE, may vary with database system). In most RDBMS, this clause can also be used for creating view.

Most CTE syntax looks like this:

WITH name_of_the_cte (column_1, column_2, ...)
AS
(
  SELECT table.field1 as column_1, table.field2 as column_2
  FROM table
  WHERE
    table.field3 = 'limit the rows'
)
SELECT count(column_1) as c
FROM name_of_the_cte
GROUP BY column_2

Recursive CTE

The more powerful part of CTE is the recursive usage. Recursive means inside the CTE query, we can refer to the CTE (temporary table) too:

WITH RECURSIVE recursive_cte (depth)
AS
(
  SELECT 1           -- initial step
  UNION ALL          -- combine all these steps
  SELECT depth + 1   -- reduce the depth (generate new value base on last step)
  FROM recursive_cte -- refer to the temporary table
  WHERE depth < 10   -- limit the steps can run
)

By adding (self) recursive usage, we can model tree like structures more easily.

MySQL CTE

MySQL added support for CTE since 8.0, you can find out more in here.

Query a sub-tree

So let’s implement the query function. The main idea here is to read the direct children in the initial step, then query all direct children of these children in the reduce step:

WITH RECURSIVE staff_cte (id)
AS
(
  SELECT id                                  -- initial step
  FROM staff
  WHERE
    supervisor_id = {the_root_of_the_tree}
  UNION ALL
  SELECT staff.id                            -- reduce step
  FROM staff, staff_cte
  WHERE
    staff.supervisor_id = staff_cte.id       -- select all direct children
                                             -- from last step
)

SQLAlchemy empowered us with built-in CTE support:

def query_tree(session, supervisor_id: int) -> List[Staff]:
    _cte_base = session.query(Staff) \
        .filter(Staff.supervisor_id == supervisor_id) \
        .cte(name='staff_cte', recursive=True)
    _cte_base_alias = _cte_base.alias()

    _cte = _cte_base.union_all(
        session
        .query(Staff)
        .filter(_cte_base_alias.c.id == Staff.supervisor_id)
    )

    q = sa.select([
        _cte.c.id.label('id'),
        _cte.c.name.label('name'),
        _cte.c.supervisor_id.label('supervisor_id'),
    ])
    return session.query(Staff).from_statement(q).all()

Query a sub-tree, with depth limit

To set limit to the query, we can add a constraint like this:

WITH RECURSIVE staff_cte (id, depth)
AS
(
  SELECT id, 1 as depth                      -- initial step
  FROM staff
  WHERE
    supervisor_id = {the_root_of_the_tree}
  UNION ALL
  SELECT staff.id, staff_cte + 1 as depth    -- reduce step
  FROM staff, staff_cte
  WHERE
    staff.supervisor_id = staff_cte.id       -- select all direct children
                                             -- from last step
    AND staff_cte.depth < depth_limit        -- limit the depth of the tree
)
def query_tree_with_depth_limit(
    session,
    supervisor_id: int,
    depth_limit: int,
) -> List[Staff]:
    _cte_base = sa.select([
        Staff,
        sa.sql.literal_column('1').label('depth_limit'),
    ]) \
        .where(Staff.supervisor_id == supervisor_id) \
        .cte(name='staff_cte', recursive=True)
    _cte_base_alias = _cte_base.alias()

    _cte = _cte_base.union_all(
        sa.select([
            Staff,
            (_cte_base_alias.c.depth_limit + 1).label('depth_limit'),
        ])
        .where(_cte_base_alias.c.id == Staff.supervisor_id)
        .where(_cte_base_alias.c.depth_limit < depth_limit)
    )

    q = sa.select([
        _cte.c.id.label('id'),
        _cte.c.name.label('name'),
        _cte.c.supervisor_id.label('supervisor_id'),
    ])
    return session.query(Staff).from_statement(q).all()

Conclusion

CTE is a powerful feature that can help us to write more readable sql while retain the low maintainance cost.

A full example of code can be check out from the GitHub repo.

Bonus

There are other solutions for the tree query, for instance:

Hope we can talk more about them next time.