SQLAlchemy query adding pseudo-column of related data as an array of dicts

The following relates to a web-application backend written in python and using SQLAlchemy to interface to a postgres (v9.5) relational database.

Assume we have two tables: items and relations. Each item has an ID and may be related to multiple other items. Such relations are persisted in the relations table, which stores it’s own ID, the IDs of the two related items plus the name of the relationship.

Here is some sample data:

     items:                               relations:

  id      name                id     item1_id  item2_id    name
---------------              ----------------------------------
  1       car                  1        1         2       weight
  2       boat                 2        1         3       colour
  3       bike                 3        1         4        age
  4       jet-pack             4        5         1        age
  5       plane
  6       rocket          

So, for instance car and bike are related by colour.

I’m looking an SQLAlchemy code extract for a database query to return all items plus an additional derived-, or pseudo-column containing relations data for each tuple as an array of structs:

 id       name                                 relations
---------------------------------------------------------------------------------
  1       car            [ { 2, weight }, { 3, colour }, { 4, age }, { 5, age } ]
  2       boat           [ { 1, weight } ]
  3       bike           [ { 1, colour } ]
  4       jet-pack       [ { 1, age } ]
  5       plane          [ { 1, age } ]
  6       rocket         []

A good first step is to produce the SQL query before attempting convertion to SQLAlchemy. Closest to the above I’ve achieved:

SELECT i.*, 
  (SELECT array_agg(row(CASE WHEN r.item1_id=i.id THEN r.item2_id ELSE r.item1_id END, i.name)) 
   AS relations from relations r 
   WHERE r.item1_id=i.id OR r.item2_id=i.id) 
FROM items i ;

But a little nicer is to actually output relations as json:

SELECT i.*,
  (SELECT jsonb_agg(jsonb_build_object('item', CASE WHEN r.item1_id=i.id THEN r.item2_id ELSE r.item1_id END, 'relation', r.name))     
     AS relations
     FROM relations r   
     WHERE r.item1_id=i.id OR r.item2_id=i.id)
  FROM items i;

which yields:

 id       name                                 relations
---------------------------------------------------------------------------------
  1       car            [ { "item": 2, "relation": "weight" }, { "item": 3, "relation": "colour" }, { "item": 4, "relation": "age" }, { "item": 5, "relation": "age" } ]
  2       boat           [ { "item": 1, "relation": "weight" } ]
  3       bike           [ { "item": 1, "relation": "colour" } ]
  4       jet-pack       [ { "item": 1, "relation": "age" } ]
  5       plane          [ { "item": 1, "relation": "age" } ]
  6       rocket         []

How can this subquery be achieved in sqlalchemy?

Answer

After a long journey through SQLalchemy documentation and many web-sites, I found a solution based on a correlated sub-query here:

Can we make correlated queries with SQLAlchemy – Answer #1.

Adapting this as follows yields what I was looking for:

from sqlalchemy import select, func, table, Column, Integer, Text, case
from sqlalchemy.sql.expression import or_


items     = table('items',     Column('id', Integer), 
                               Column('name', Text))

relations = table('relations', Column('id', Integer), 
                               Column('item1_id', Integer), 
                               Column('item2_id', Integer), 
                               Column('name', Text))


subquery = select( [ func.jsonb_agg(
                       func.jsonb_build_object(
                         'item', case( [ ( relations.c.item1_id == items.c.id, relations.c.item2_id ) ],
                                       else_ = relations.c.item1_id),
                         'relation', relations.c.name
                       )
                     )
                   ]
                 ).where(or_(relations.c.item1_id == items.c.id, relations.c.item2_id == items.c.id) 
                 .correlate(items)

query = (
    select([items.*, subquery.label('relations')]).select_from(items)
)

Leave a Reply

Your email address will not be published. Required fields are marked *