How To Avoid N + 1 problem in Rails

Author profile picture

Full stack web developer Ruby | JavaScript | PHP | CSS | RoR

Almost any app that we build has a lot of interactive options with a lot of data, to this we can also have conditions to which data should we show to what type of user, making us request data from different tables based on a unique identifier, if the queries are not handled in a proper manner this can slow down the loading, and that is because of the "N + 1" problem.
What is the "N + 1" problem?
The problem occurs when we have to make a query to get each of the associations, "1" being the first query to get the parent and "N" the number of queries required to get all the associations, let's take a look into an example.
class Author < ApplicationRecord has_many :articles
class Article < ApplicationRecord belongs_to :author
class SalesController < ApplicationController def index @articles_list = Article.where("created_at >= ? AND created_at <= ?", start_date, end_date) end
<% @articles_list.each do |article| %> <h3><%= article.created_at %></h3> <h1><%= %></h1> <br>
<% end %>
If we run the example above we will notice that we are going to get more than one query in the terminal.

We have on query to get all the articles in the range of date

Article Load (0.4ms) SELECT "articles".* FROM "articles" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24')

And we have an extra query to get the authors name of each article
Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ? [["id", 3], ["LIMIT", 1]]
Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ? [["id", 4], ["LIMIT", 1]]
Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ? [["id", 2], ["LIMIT", 1]]
This is the "N + 1" problem, we have one query to get all the article but once we try to access the child (the relation to the author model) it will create a new query for each one.
There are Three ways to avoid this problem Preload, Includes and eager_load, we will not talk about preload since the behavior is pretty much the same as includes, but if we use Preload we can't use a "where" clause in the associated table and with Includes we can!
Includes helps us to fix the problem by making only two queries to get all the data, let's modify our example and take a look to the terminal.
@articles_list = Article.includes(:author).where("published_at >= ? AND published_at <= ?", '2020-03-13', '2020-06-24')
We only have to add "includes(:name_of_association)" next to the name of the model and we will be using includes!
Now let's refresh and take a look at our terminal
Article Load (0.4ms) SELECT "articles".* FROM "articles" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24') Author Load (0.5ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?, ?, ?) [["id", 3], ["id", 4], ["id", 1], ["id", 2]]
Now we only make two queries to get all the data of the main model and the specified associations if we want to get all the associations we have to add the name to the includes like this
Model.includes(:association_one, :association_two)
This will create a single query using a left outer join to get the data of the main table and the association, we can also use a where clause to give conditions to the associated table, to use it is pretty much the same as we do with includes


@articles_list = Article.eager_load(:author).where("published_at >= ? AND published_at <= ?", start_date, end_date)
We have to add the name of the association in the parenthesis next to eager_load as we would do it with includes


SQL (0.4ms) SELECT "articles"."id" AS t0_r0, "articles"."title" AS t0_r1, "articles"."published_at" AS t0_r2, "articles"."author_id" AS t0_r3, "articles"."created_at" AS t0_r4, "articles"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."age" AS t1_r2, "authors"."city" AS t1_r3, "authors"."created_at" AS t1_r4, "authors"."updated_at" AS t1_r5 FROM "articles" LEFT OUTER JOIN "authors" ON "authors"."id" = "articles"."author_id" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24')
As we can see with this we only need a single query to get all the data, the structure of the query will be created using a left outer join.
As I metioned earlier we didn't used Preload since Includes does pretty much the same but we the difference that we can use a where clause, and Includes can also do the same as eager_load, we can force a left outer join using references
@articles_list = Article.includes(:author).references(:author).where("published_at >= ? AND published_at <= ?", '2020-03-13', '2020-06-24')
Not only that but when using includes even without the use of references sometimes it will automatically choose to use a left outer join query, then why don't we just stick with includes? well the reason is that although is designed to use the best possible pattern, sometimes it can go wrong and force a join where is not needed slowing down the process, you can get it right more often than rails.
The Noonification banner

Subscribe to get your daily round-up of top tech stories!