Extending the SQLAlchemy SQL expression language

The SQL expression language from SQLAlchemy is already very flexible and allows you to build almost any standard SQL query, but sometimes you just need to use a SQL extension that isn’t supported by SQLAlchemy.

Suppose I have a very simple music datababse and I want a list of all artists with their latest album name. Oracle has special aggregate functions FIRST/LAST, which together with the KEEP clause can help with queries like this, but unfortunately it isn’t supported by the Oracle dialect in SQLAlchemy. The SQL query would look like this:

SELECT
    artist,
    MIN(name) KEEP (DENSE_RANK LAST ORDER BY release_date)
FROM album
GROUP BY artist

Recently, I’ve learned that it’s possible to extend the SQL compiler and add support for custom clauses that SQLAlchemy doesn’t understand natively. In my specific case of the KEEP clause, I need this code to make it work:

import itertools
from sqlalchemy.util import to_list
from sqlalchemy.sql.expression import ColumnElement, ClauseList
from sqlalchemy.ext.compiler import compiles

class Keep(ColumnElement):

    def __init__(self, func, order_by, first=True):
        super(Keep, self).__init__()
        self.func = func
        self.order_by = ClauseList(*to_list(order_by))
        self.first = first

    @property
    def type(self):
        return self.func.type

def keep_first(func, order_by):
    return Keep(func, order_by)

def keep_last(func, order_by):
    return Keep(func, order_by, first=False)

@compiles(Keep)
def compile_keep(keep, compiler, **kwargs):
    return "%s KEEP (DENSE_RANK %s ORDER BY %s)" % (
        compiler.process(keep.func),
        "FIRST" if keep.first else "LAST",
        compiler.process(keep.order_by)
    )

With this, I could simply use keep_first() and keep_last() as any other SQL expression functions. For example, the above SQL query would have been written like this:

session.query(Album.artist,
              keep_last(sql.func.min(Album.name), Album.release_date)).\
    group_by(Album.artist)

The main advantage of having it all written as a SQL expression is that I don’t need to know the actual column names, which get more and more complicated once you have some joins and subqueries in the SQL expression.