Entry No. 14

Using eager_load with order and Pagy in Active Model might cause the issue.

Origin

One week ago, our FCS QA/QC mentioned that the pagination info we displayed does not tally with the number of items we listed. Originally, I thought it is a simple bug, a bug caused by our presenter. We use a presenter in our listing page that can customize the info we want to display. Our presenter might filter out some data after we do the pagination of our data. But I was wrong about this.

After I traced Pagy’s source code and did lots of tests, I figured out the reason why we can not get the result we want. I will use a simple example to explain this and apply one solution to it below.

Scenario Simulation

Let’s say we have a website that allows different authors to post articles on it. And the readers can upvote the post to support the post they like. We have a page that lists all the authors and order by the upvotes of the author’s post. And the amount of the authors might be very huge, so we need to apply the pagination for it.

Reproduce The Issue I Faced

I create a sample app for this in Github. To focus on the topic I simplify the app a lot.

The next step is to add sample data. Let’s add at least 25 authors, two with two posts, two with one post and leave others with zero posts. Start the server and input the URL http://lvh.me:3000/authors in our browser and see what it will show.

The result will be sth like this.

As you can see, the pagination info shows items 1-20. But the number of authors displayed is 18.

As you can see, the pagination info shows items 1-20. But the number of authors displayed is 18.

Obviously, the pagination info does not tally with the number of authors. This is not what we want. What causes this and how could we fix this?

Investigate and Figure Out Why

Let’s take a look of the form app/forms/author_search_form.rb. This is the key to this issue.

class AuthorSearchForm

  attr_reader :results

  def submit
    @results = Author.eager_load(:posts).all.order('upvotes')
  end

end

Another key is that author to post is a one-to-many relationship.

The picture below shows how rails deal with this.

The first SQL is for Pagy to get the total number of authors. The second SQL is for Pagy to get the first 20 data. The third SQL is to get the info of related authors and posts.

The First SQL is for Pagy to get the total number of authors. The second SQL is for Pagy to get the first 20 data. The third SQL is to get the info of related authors and posts.

The purpose of the first SQL is to get the total number of authors. From the resulting picture above, we know the result is correct. The third SQL just gets the info by the id provided by the second SQL. And you can see that the third SQL get the same data twice with id 1 and 5. It seems that the second SQL has some issues. Let’s examine the second SQL carefully.

The second SQL is SELECT DISTINCT upvotes AS alias_0, "authors"."id" FROM "authors" LEFT OUTER JOIN "posts" ON "posts"."author_id" = "authors"."id" ORDER BY upvotes LIMIT 20 OFFSET 0. And the result of running it is on the picture below.

The results are ordered by upvotes for sure, but they have two duplicated ids.

The results are ordered by upvotes for sure, but they have two duplicated ids.

From this picture, we can know the reason why this happened. After we joined the table, the author with more than two posts will have duplicated data. Even though rails use SELECT DISTINCT in the beginning, [15, 5] is not the same as [80, 5]. (More about SELECT DISTINCT) When we executed the third SQL, we got the same data by only using the id from the result of the second SQL. That’s why 20 records became 18. Now we understand why this happened, then the next question will be that how could we fix this?

Solution

In order to solve this, we need to know what is SQL join. First, we need to decide how we going to order with multiple post authors. Let’s choose the max upvotes of it. The modified code will be sth like below.

class AuthorSearchForm

  attr_reader :results

  def submit
    @results = Author.joins(join_query).all.order('upvotes')
  end

  private

  def join_query
    'LEFT JOIN posts
       ON authors.id = posts.author_id
         And upvotes = (Select Max(upvotes) From posts Where authors.id = posts.author_id)'
  end

end

The pagination info tally with the number of authors displayed now.

The pagination info tally with the number of authors displayed now.

The numbers are the same now! But it seems that the logic is not correct. I want the most upvotes to be on the top and the author without a post to be on the bottom. The final code will be like sth below.

class AuthorSearchForm

  attr_reader :results

  def submit
    @results = Author.joins(join_query).all.order('upvotes DESC NULLS LAST')
  end

  private

  def join_query
    'LEFT JOIN posts
       ON authors.id = posts.author_id
         And upvotes = (Select Max(upvotes) From posts Where authors.id = posts.author_id)'
  end

end

Conclusion

Active Model provides a lot of functions for us to use. We may know when to use it, but we might not know how it works. Active Model also hides a lot of SQL queries from us. We should learn the concept of SQL and what query rails generate if we want to master Rails.

In this case, the key to it is very simple. We want to sort by the attribute of the post, but what about the author with multiple posts? We ignore this, so the issue appears. However, when the search form becomes more and more complex, then this is not easy to find out. Knowing how Rails generates SQL queries and the concept of SQL become more important.