Using eager_load with order and Pagy in Active Model might cause the issue.

Origin

One week ago, our FCS QA/QC mentioned that the pagination info we displayed does not tally with the number of items we listed. Originally, I thought it is a simple bug, a bug caused by our presenter. We use a presenter in our listing page that can customize the info we want to display. Our presenter might filter out some data after we do the pagination of our data. But I was wrong about this.

After I traced Pagy’s source code and did lots of tests, I figured out the reason why we can not get the result we want. I will use a simple example to explain this and apply one solution to it below.

Scenario Simulation

Let’s say we have a website that allows different authors to post articles on it. And the readers can upvote the post to support the post they like. We have a page that lists all the authors and order by the upvotes of the author’s post. And the amount of the authors might be very huge, so we need to apply the pagination for it.

Reproduce The Issue I Faced

I create a sample app for this in Github. To focus on the topic I simplify the app a lot.

The next step is to add sample data. Let’s add at least 25 authors, two with two posts, two with one post and leave others with zero posts. Start the server and input the URL  http://lvh.me:3000/authors  in our browser and see what it will show.

The result will be sth like this.

Obviously, the pagination info does not tally with the number of authors. This is not what we want. What causes this and how could we fix this?

Investigate and Figure Out Why

Let’s take a look of the form  app/forms/author_search_form.rb . This is the key to this issue.


Another key is that author to post is a one-to-many relationship.

The picture below shows how rails deal with this.

The purpose of the first SQL is to get the total number of authors. From the resulting picture above, we know the result is correct. The third SQL just gets the info by the id provided by the second SQL. And you can see that the third SQL get the same data twice with id 1 and 5. It seems that the second SQL has some issues. Let’s examine the second SQL carefully.

The second SQL is  SELECT DISTINCT upvotes AS alias_0, “authors”.”id” FROM “authors” LEFT OUTER JOIN “posts” ON “posts”.”author_id” = “authors”.”id” ORDER BY upvotes LIMIT 20 OFFSET 0 . And the result of running it is on the picture below.

From this picture, we can know the reason why this happened. After we joined the table, the author with more than two posts will have duplicated data. Even though rails use  SELECT DISTINCT  in the beginning, [15, 5] is not the same as [80, 5]. (More about SELECT DISTINCT) When we executed the third SQL, we got the same data by only using the id from the result of the second SQL. That’s why 20 records became 18. Now we understand why this happened, then the next question will be that how could we fix this?

Solution
In order to solve this, we need to know what is SQL join. First, we need to decide how we going to order with multiple post authors. Let’s choose the max upvotes of it. The modified code will be sth like below.


Conclusion

Active Model provides a lot of functions for us to use. We may know when to use it, but we might not know how it works. Active Model also hides a lot of SQL queries from us. We should learn the concept of SQL and what query rails generate if we want to master Rails.

In this case, the key to it is very simple. We want to sort by the attribute of the post, but what about the author with multiple posts? We ignore this, so the issue appears. However, when the search form becomes more and more complex, then this is not easy to find out. Knowing how Rails generates SQL queries and the concept of SQL become more important.