Monday, May 20, 2019

Relationships in sqlalchemy

Sqlalchemy is a powerful and flexible framework in python, to interact with relational databases.
This post will cover only a small portion of working with relationships.

What are the advantages of relationships ?

They allow us to query data of related objects along with the main object.
This can be done via a join, or separate queries, either eagerly, or lazily( when the related object is accessed)
Also, they make it easy to insert/delete/update data into related tables, especially in case of OTM/MTMs


Sample Entities

Consider the entities defined below :

----

class User(Base):
     __tablename__ = 'users'

     id = Column(Integer, primary_key=True)
     name = Column(String(50))

     addresses = relationship("Address", back_populates="user")

class Address(Base):
     __tablename__ = 'addresses'

     id = Column(Integer, primary_key=True)
     city = Column(String(50))
     street = Column(String(50))
   
     user_id = Column(Integer, ForeignKey('users.id'))
     user = relationship("User", back_populates="addresses")

----
Here, a user can have many addresses, reflected by the addresses relationship. An address on the other hand, belongs to a single user, reflected by the user relationship.

Sample Data

Consider the following data :

ed_user = User(name='Edward')
ed_user.addresses = [ Address(city='Pune'), Address(city='Mumbai')]
bob_user = User(name='Bob')
bob_user.addresses = [ Address(city='Pune'),Address(city='Delhi')]

session.add(ed_user)
session.add(bob_user)

Creating the tables

Its possible to create the tables needed for the entities using metadata.create_all() :

engine = create_engine('sqlite:///:memory:') # Memory engine
Session = sessionmaker(bind=engine)
session = Session()

User.metadata.create_all(engine) # Create the tables

Logging of sqls can be enabled with :
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

Test Scenarios

All Users, with all addresses

qry_users = session.query(User).all()

print( "All users", [ (qu.name, [add.city for add in qu.addresses]) for qu in qry_users])

This is quite straightforward. We did not explicitly query for addresses. The users will be queried. Since the relationship loading default is lazy, queries for addresses of each user will be fired when the address details are accessed. Since a separate query is fired for each user, the lazy option is not performant if there are many rows of the main object, and we need to access addresses for each.

Users with a Mumbai address,  Mumbai addresses only

Why do we specify the Mumbai condition twice ? The first part is to filter users, and fetch only those with Mumbai addresses. The second part is to filter the addresses for each user, and restrict only to Mumbai addresses. This can be a bit confusing first. To filter the main object, we could do :

qry_users = session.query(User).filter(User.addresses.any(city='Mumbai')).all()

However, this will filter user, not addresses, so we will get non-Mumbai addresses too for each user.

In this case, since both are to be filtered, an inner join will suffice.

qry_users = session.query(User).join(User.addresses).options(contains_eager(User.addresses)).filter(Address.city=='Mumbai').all()

We joined with User.addresses, this way, we do not have to repeat the join condition, it is picked up from the relationship.
What is the need for the contains_eager ? It says that the related addresses have already been loaded from this query, do not fire the relationship queries again. Without it, not only will the related addresses query fire again(poor performance), but all addresses will be fetched, which we do not want.

Lets try to query all users again. What's this ? Edward's addresses show only Mumbai ! This is a result of caching. The Edward user object was last populated only with Mumbai address, and was cached along with its related objects. A session.rollback(), or session.expire_all() or session.expire(obj) can be used to clear the cache and make sqlalchemy fetch the latest data from the db. It would be a good idea to put one of these before each test scenario, to get the expected results.

All Users,  Mumbai addresses only

Note that in this case, we are not filtering user, only addresses. So if a user does not have a Mumbai address,she should still be listed, albeit with an empty addresses collection. i.e an outer join. This is usually true for related objects. This seems to be a straightforward case, and maybe something like filterrelated( obj, condition) should have been available. But its not. We have to again perform a join, an outer one.

----
addresses = User.metadata.tables['addresses'] # reference to a table object
sel = addresses.select().where(Address.city=='Mumbai')
qry_users = session.query(User).add_entity(Address).outerjoin(('addresses',  sel)).options(contains_eager(User.addresses)).all()

----
We have used a slightly different format, with the select, since i wanted to avoid duplicating the join condition with addresses. Sqlalchemy has many such options. Again, note the contains_eager,  to avoid querying for related addresses again.

----
print( "All users, Mumbai addresses only", [ (qu[0].name, [add.city for add in qu[0].addresses]) for qu in qry_users])
----
Note that with multiple entities selected in the join, the output is not a single entity, but multiple, wrapped in a Result object. Also, unlike with a single entity, the results will contain duplicates, as in a sql join. If we choose specific columns instead of the entire entity, the result will wrap the column without any entity. This is undesirable : changing the query changes the way in which results are accessed.

**Actually, with a contains_eager, one would expect to get only the User entity, with the address as a related entity. There are some inconsistencies or difficult to understand usages. Dropping the add_entity above, leads to a single User entity in the output.



qry_users = session.query(User).outerjoin(('addresses',  sel)).options(contains_eager(User.addresses)).all()


All Users with Delhi address,  all addresses

Here, we want to filter user using addresses, but fetch all addresses of the filtered users. This scenario shows how filtering and fetching related objects are separate things.

qry_users = session.query(User).filter(User.addresses.any(city='Delhi')).all()

A common mistake here might be to try User.addresses.city. Try to print type(User.addresses). Its an InstrumentedAttribute, not a list of Address. So it won't have a city member and trying to access it will throw a "AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with User.addresses has an attribute 'city'". However, the results of the query execution, will be entities, so  qu.addresses will be a list of addresses, as we have already seen above. Its important to understand the difference between Entity class and Entity instance.





3 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete