Hierarchical Queries on RDBMS With JPA
Hello! In this post I’ll explore the different alternatives for querying hierarchical data stored on a RDBMS using JPA.
What are we trying to do?
Let’s use a silly example to illustrate what we are trying to achieve. Say we want to model mother-daughter relationships, we can do so with an entity like this:
In OO-talk that’s just one Type with a reference to itself like this:
And once we start creating a bunch of mothers, we’ll get a directed acyclic graph. Although for this example, and to keep things simple, we’ll just model the offspring of just one single person so we get a nice tree like this:
But in DB this is stored in 2 tables. One contains the Woman data and the other one the relationships.
1 2 3 4 5 6 7 8 9 10 |
|
We’ll focus in just one single type of query, namely:
Find a node and return the whole subtree underneath
Or in domain specific terms:
Find a woman and return her along with her offspring
Alternatives
Let’s explore our options. For the following examples I’ll be using Kotlin, Spring Data, JPA, Hibernate and an Oracle Database. Thanks to JPA you’ll notice that most of the time the solutions provided work on whatever you’ve decided to use (I’ll note it when it’s not the case). You can use Java instead of Kotlin, Eclipselink instead of Hibernate and MySQL instead of Oracle, and even abandon Spring completely.
Because this is meant to be a formative post I’ll start with some of the “wrong” approaches first and explain its downsides. If you’re in a rush you can just scroll to the bottom 😃
Eager fetching
The simplest solution is using Eager fetching. We can enable eager fetching specifying the fetch
parameter on the @OneToMany
annotation like this:
Great, it works! But wait, why do I see so many SELECT calls to the database? As you’ve probably figured out the issue with this approach is that it falls in the N + 1 SELECT problem.
Some authors even go as far as considering eager fetching a code smell. In the words of Vlad Mihalcea (emphasis mine):
The EAGER fetching strategy is a code smell. Most often it’s used for simplicity sake without considering the long-term performance penalties. The fetching strategy should never be the entity mapping responsibility. Each business use case has different entity load requirements and therefore the fetching strategy should be delegated to each individual query.
Using join fetch
Another alternative would be to use join fetch
as part of the JPQL query.
With join fetch
we can tell JPA to do a join
to execute the query and also include the data as part of the response object when mapping to the OO world.
The huge downside is that we’re actually querying ALL women and only then selecting the one we were looking for from the resultList
. This means that, unless you’re querying for the root node, you’ll always be doing extra work.
And no, a where
clause can’t help us here. Using where w.id = :id
would leave us with just the root and one level of descendants in the resultList.
Unfortunately I couldn’t find a way of making this a single JPQL query. Vlad Mihalcea has a great article explaining how to use join fetch
to query all the leaves of the tree and then build the associations all the way up the entity hierarchy. Sadly we can’t apply the same approach here because we’re querying a multi-level homogeneous tree. That means that all the nodes are of the same type, and thus mapped to the same table, and furthermore leaves can be N levels deep. So if we try Vlad’s approach we’d end up with the similar where
problem mentioned above.
Entity graph
Another proposed solution is to use Entity Graphs introduced in JPA 2.1. You can use Entity Graphs as a query hint to specify which fields you want eagerly fetched as part of the query.
Entity graphs have attributes that correspond to the fields that will be eagerly fetched during a find or query operation
They come in 2 flavors. Static through annotations:
Or you can construct the graph programmatically to have some more flexibility:
I thought: “This is it IT! If I can manually build the graph for each query I can even specify how many levels deep I want to go”. But once again, it didn’t quite work 😞
I tried creating graphs with subgraphs N levels deep but the query would just ignore them and only fetch up to 2 levels. From the annotations flavor this limitation is clear since:
- You can’t use a subgraph reference on an attribute of the same graph recursively (it’ll fail at runtime)
- And
@NamedSubgraph
can’t have othersubgraphs
defined underneath.
I was hoping this would be possible through the API, but apparently it’s not. Maybe Hibernate is to blame here, I don’t really know… But I couldn’t find any documentation beyond the basic 2 level example.
Using @BatchSize
Going back to the Eager Fetch approach, we hated it because it resulted in N queries where N is the number of Women in the table. With @BatchSize
we can mitigate this.
By setting the batch size to 10 we’re basically saying: “Since you’re going to the DB to execute a query bring up to 10 elements instead of just one”. Effectively reducing the number of roundtrips to the database.
@BatchSize
is a Hibernate specific annotation but on Eclipselink you can use eclipselink.batch.size
to achieve the same result. In fact, remember that as a rule of thumb it’s better to apply this kind of configurations to the query instead of using an annotation on the entity.
Connect by (Best solution!)
If you’re using Oracle then we can solve this using a native query with CONNECT BY
. Turns out this clause was created specifically for selecting rows in hierarchical order.
We use START WITH
to specify the root node of the hierarchy, and then CONNECT BY
to specify the relationship between rows. Our query would look like this:
1 2 3 4 5 |
|
Note how in the CONNECT BY
clause we use PRIOR
to refer to the parent row.
This achieves exactly what we want in just one query!. But this doesn’t come for free. By using native queries we are stepping out of the JPA magical world™. This means that what the query returns is not an object but a list of rows of the form:
1 2 3 4 5 6 7 8 9 10 |
|
So we’ll have to write the code to transform this back to our domain model. To achieve this I used the @SqlResultSetMapping
to model the native query result.
And we can finally write the query like this:
Using the buildGraph
function to traverse the query result and reconstruct the root Woman
object.
If you’re using HibernateCallback
here’s another way of writing the result mapping: https://docs.spring.io/spring/docs/2.0.x/javadoc-api/org/springframework/orm/hibernate/HibernateCallback.html
If you’re not using Oracle (or some other DB that supports CONNECT BY
) you can do recursive queries using Common Table Expression (CTE) to achieve a similar result.
Is RDBMS right for you?
If you find yourself bending over backwards to make this graph-like queries work, or if you need a query language that will let you express something like:
Find all women whose great-grandmother is named Carol and have at least one descendant named Brenda
Then maybe you should consider your options beyond RDBMS. You can take a look at graph databases such as Neo4j and get familiar with it’s query language. Or, if you’re not ready to make the jump but want to explore a graph API on top of your relational data you should take a look at Oracle Spatial and Graph.
Phew! that was a long one… Hope you find it useful!
If you know of a better way of doing this kind of queries I’d love to hear about it, leave me a comment down here👇 or just ping me on Twitter.