Export big csv with Rails Active Job and Action Cable

Situation Overview

Exporting data was a nightmare to us all at FCS. We used to have it running directly in the controller, which causes system crash when client try to export, let’s say 15000 records. We tried optimizing the query, changing the data structures, etc. But none of it could solve the problem as long as the export code is still standing in the controller.

One week ago, I was paired with Austin(our team lead) to do some refactoring around our system, and I thought this should be fixed ASAP since this is a feature used frequently by our clients. So I did some research and discussed with Austin that what we could/should do is to move the export to a background job, so that it doesn’t block the main thread of the server anymore.

He thought it was a good idea, so I went on and implement it using Rails Active Job with Sidekiq. However, there was a problem with using Active Job, the job is executed in the background so it doesn’t inform the main server when it is completed. So we executed the job, got the exported file, and now what? We cannot give the file back to the server to return it to our client. This is where Rails Action Cable comes in.

Where it all started

So here I have an example controller with an export method to return the data csv

and this is what inside  app/commands/export_command.rb 

And here it is in action

As you can see, this is definitely doesn’t work out. Now let’s move the export to background job so our web page can load properly.

Moving it to the background

rails g job Export

This command will generate  app/jobs/export_job.rb . Now we move the export code to the export_job.rb file, and call the job inside the controller instead.

This is our  ExportJob 

So our controller will now be like this, it will just return  202 Accepted  to the request and move on, the csv generation will now be handle in the background instead of in our main app

Now let’s see how it goes, remember to have sidekiq running or the job will not be performed, open a new terminal tab, go the the project folder and run

bundle exec sidekiq

We can see that no file is returned. The reason is as I said before:

The job is executed in the background so it doesn’t inform the main server when it is completed.

Action Cable to the rescue

In order to return the csv file to the client, we will create a channel using Action Cable and broadcast the file back when it’s ready.

First, we generate a channel called  ExportChannel  by this command:

rails g channel export

This command will generate a bunch of files you need to make the channel work
Now we will config cable to use Redis to broadcast our data, go to  config/cable.ylm  and do

Next, we will create a subscription to our ExportChannel, go to  app/assets/javascripts/channels/export.coffee  and change its content to:

We wrap the subscription inside a function that receive an  export_id  param so that it only trigger when the button is clicked. After receiving the data, it will disconnect from the channel and terminate itself.

Now that we have the subscription, we will edit the channel to match our subscription. Change the content of   app/channels/export_channel.rb  to the following

Then we will add the code to broadcast our file in the job so it will return the csv when the job is performed

And we need to pass the  export_id  from the params to the job so that it’ll be able to broadcast to the correct channel. Finally, we’ll make the button generate an UUID and append to the URL then subscribes to the channel on click, we will also disable it and re-enable it when the data is returned, go to  app/assets/data.coffee  and do:

and we’ll add an extra class to the button in the view so it’s easier to identify, this is our  app/views/data/index.html.erb 

Now let’s run it again

That’s it. Now we can do both the export and other stuff at the same time. The only note I’d add is that this method will create 2 jobs, 1 is when we click the link, the other is when we modify the link and call get method again. This might raise some performance issues.

As you can see in the sidekiq server log, it started 2 jobs for 2 request. The temporary fix for this is to add a guard clause to the job so that it returns when there is no  export_id . Like this:

Then it will decrease the time of the 1st job by a lot

That is it for now, if you have any other solutions. Feel free to share with us. Until next time.

This demo project is available on github

Whatsapp: +84932129990