Like a pair of jumper cables, ActiveRecord's joins
, includes
, preload
, and eager_load
methods are incredibly useful, but also very dangerous when used incorrectly. Knowing when and where to use each approach - and even when to combine them - can save you considerable trouble as your app grows.
I'll explore the when and where of each method below.
joins
?If you are just filtering results - not accessing records from a relationship - joins
is your goto. The example below fetches all blog posts with a comment authored by Derek. I'm not accessing any of the associated comments, so joins
is a great fit:
Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.title } Post Load (1.2ms) SELECT "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1 => ["One weird trick to better Rails apps", "1,234 weird tricks to faster Rails apps", "You wouldn't believe what happened to this Rails developer after 14 days"]
By itself, no. joins
does not load data from the relationship into memory: accessing columns from the relationship will trigger N+1 queries.
For example, notice all of the additional queries when accessing the Comment
relationship:
Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.comments.size } Post Load (1.2ms) SELECT "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1 (1.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (3.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (0.3ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (1.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (2.1ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (1.4ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 => [3,5,2,4,2,1]
includes
, preload
, and eager_load
?Yes. The join type specified by joins
(default is INNER JOIN
) will override any join applied by includes
or eager_load
. Note that preload
doesn't apply a join.
Yes. includes
will load (1) all records of the parent and (2) all of the records referenced as arguments in the includes
method.
Notice how using includes
in the example below only triggers 1 additional query. Without includes
, there would be an additional query to count the number of comments for every post:
Post.includes(:comments).map { |post| post.comments.size } Post Load (1.2ms) SELECT "posts".* FROM "posts" Comment Load (2.0ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 3, 4, 5, 6) => [3,5,2,4,2,1]
includes
always generate a separate query to fetch the records in the relationship?No. includes
will either use a separate query (like above) or a LEFT OUTER JOIN
. If you have a where
or order
clause that references a relationship, a LEFT OUTER JOIN
is used versus a separate query.
Digging through the ActiveRecord source, I don't believe ActiveRecord makes the decision to use two queries or a single join query based on performance. If you are seeing slow performance with an includes
query, I'd suggest using a tool like Scout DevTrace locally and examining which approach ActiveRecord is using when running includes
.
If two queries are being used, you can experiment with a single LEFT OUTER JOIN
query by adding references
to the ActiveRecord relation:
Post.includes(:comments).references(:comments).map { |post| post.comments.size }
ActiveRecord will return all of the parent records and just the relationship records that match the condition.
For example, the following will return all Post
records with a Comment
by Derek, and just those comments authored by Derek:
Post.includes(:comments).references(:comments).where(comments => {author: 'Derek'}).map { |post| post.comments.size }
No. If you are accessing data in a nested relationship, that data isn't preloaded. For example, an additional query would be required to load the Comment#likes
association for each comment:
<%> <%=> %=>%>
Yes. You can load nested relationships via includes
:
Post.includes(comments => :likes).references(:comments).map { |post| post.comments.size }
No. It's very easy to end up initializing a significant number of records. For example, a popular Comment
may have thousands of Like
records, which would result in a slow query and significant memory allocations.
A tool like Scout DevTrace running locally against production-like data can help you determine the faster approach.
Sometimes, but not by default. I use preload
versus includes
if I know using a LEFT OUTER JOIN
to load a relationship is significantly slower. Otherwise, if I add where
or order
clauses at a later date, those clauses would trigger eager_load
, which would trigger a join.
joins
with preload
?If I need all relationship records - not just just those that match a relationship condition - I'll combine preload
and joins
. For example:
Post
records with a Comment
authored by DerekPost
records and the total count of comments for each postincludes
will only fetch Comment
records authored by Derek, not all comments associated with each post.
Post.joins("LEFT OUTER JOIN comments ON comments.post_id = posts.id").where(:comments => {author: 'Derek'}).preload(:comments).map { |post| post.comments.size }
includes
delegates to eager_load
when a where
or order
clause references a relationship.
Yes. If I've found includes
to be slow using two queries, using eager_load
will force a single query via a LEFT OUTER JOIN
. Its presence in my code indicates I'm future-proofing against using two queries to fetch records.
Yes. In the following example:
Post.joins(:comments).eager_load(:comments).map { |post| post.comments.size }
ActiveRecord will do the following:
Array
of Post
records with comments. Post
.It's includes
with an INNER JOIN
vs. a LEFT OUTER JOIN
.
I'd roughly summarize my approach to these methods like this:
joins
.includes
.includes
is slow using two separate queries, I'll use eager_load
to force a single query and compare performance.There are many edge cases when accessing relationships via ActiveRecord. Hopefully this is enough to prevent some of the more basic performance deadends when using joins
, includes
, preload
, and eager_load
.
Want more Rails insights like this delivered monthly to your inbox? Just put your email into the sidebar form.