Entry No. 13

Export big csv with Rails Active Job and Action Cable

Situation Overview

Exporting data was a nightmare to us all at FCS. We used to have it running directly in the controller, which causes system crash when client try to export, let’s say 15000 records. We tried optimizing the query, changing the data structures, etc. But none of it could solve the problem as long as the export code is still standing in the controller.

One week ago, I was paired with Austin(our team lead) to do some refactoring around our system, and I thought this should be fixed ASAP since this is a feature used frequently by our clients. So I did some research and discussed with Austin that what we could/should do is to move the export to a background job, so that it doesn’t block the main thread of the server anymore.

He thought it was a good idea, so I went on and implement it using Rails Active Job with Sidekiq. However, there was a problem with using Active Job, the job is executed in the background so it doesn’t inform the main server when it is completed. So we executed the job, got the exported file, and now what? We cannot give the file back to the server to return it to our client. This is where Rails Action Cable comes in.

Where it all started

So here I have an example controller with an export method to return the data csv


class DataController < ApplicationController

  def index; end

  def export
    respond_to do |format|
      format.csv { send_data ExportCommand.call }
    end
  end

end

and this is what inside app/commands/export_command.rb

  require 'csv'

  class ExportCommand
    def self.call
      sleep 10

      file = CSV.generate do |csv|
        (1..15_000).each do |_i|
          csv << %w[data1 data2 data3 data4 data5 data6 data7]
        end
      end

      file
    end
  end

And here it is in action

As you can see, the server doesn’t respond to any other request while it’s exporting. This is unacceptable.

As you can see, the server doesn't respond to any other request while it's exporting. This is unacceptable.

As you can see, this is definitely doesn’t work out. Now let’s move the export to background job so our web page can load properly.

Moving it to the background

rails g job Export

This command will generate app/jobs/export_job.rb. Now we move the export code to the export_job.rb file, and call the job inside the controller instead.

This is our ExportJob

class ExportJob < ApplicationJob
  queue_as :default

  def perform(data)
    ExportCommand.call
  end
end

So our controller will now be like this, it will just return 202 Accepted to the request and move on, the csv generation will now be handle in the background instead of in our main app

class DataController < ApplicationController

  def index; end

  def export
    ExportJob.perform_later
    head :accepted
  end

end

Now let’s see how it goes, remember to have sidekiq running or the job will not be performed, open a new terminal tab, go the the project folder and run

bundle exec sidekiq
Now it runs smoothly. But where's our file? (゚ー゚)
Now it runs smoothly. But where's our file? (゚ー゚)

We can see that no file is returned. The reason is as I said before:

The job is executed in the background so it doesn’t inform the main server when it is completed.

Action Cable to the rescue

In order to return the csv file to the client, we will create a channel using Action Cable and broadcast the file back when it’s ready.

First, we generate a channel called ExportChannel by this command:

rails g channel export

This command will generate a bunch of files you need to make the channel work export channel

Now we will config cable to use Redis to broadcast our data, go to config/cable.ylm and do

redis: &redis
  adapter: redis
  url: redis://localhost:6379/1

development: *redis
test: *redis

production:
  adapter: redis
  url: <%= ENV["REDIS_URL"] %>
  channel_prefix: your_project_production

Next, we will create a subscription to our ExportChannel, go to app/assets/javascripts/channels/export.coffee and change its content to:

# This is to call this function in a nother coffee file
window.ExportChannel ?= {}

window.ExportChannel.Subscribe = (export_id, callback) ->
  App.export = App.cable.subscriptions.create({
    channel: 'ExportChannel'
    export_id: export_id
  },
    connected: -> callback()
    disconnected: ->
    received: (data) ->
      blob = new Blob([data['csv_file']['content']]);

      # Create a link with the data and trigger click event to download the file
      csv_download_link = document.createElement('a');
      csv_download_link.href = window.URL.createObjectURL(blob);
      csv_download_link.download = data['csv_file']['file_name'];
      csv_download_link.click();

      # Re-enable the export btn
      $(".export-btn").html("Export Data")
      $(".export-btn").removeClass("disabled");

      App.export.unsubscribe()
      App.cable.disconnect()
      delete App.export
      return
  )
  return

We wrap the subscription inside a function that receive an export_id param so that it only trigger when the button is clicked. After receiving the data, it will disconnect from the channel and terminate itself.

Now that we have the subscription, we will edit the channel to match our subscription. Change the content of app/channels/export_channel.rb to the following

class ExportChannel < ApplicationCable::Channel

  def subscribed
    # Export id is sent from button click in web client
    stream_from "export_channel_#{params[:export_id]}"
  end

  def unsubscribed; end

end

Then we will add the code to broadcast our file in the job so it will return the csv when the job is performed

class ExportJob < ApplicationJob
  queue_as :default

  def perform(export_id)
    csv_content = ExportCommand.call

    ActionCable.server.broadcast(
      "export_channel_#{export_id}",
      csv_file: {
        file_name: 'data.csv',
        content: csv_content
      }
    )
  end
end

And we need to pass the export_id from the params to the job so that it’ll be able to broadcast to the correct channel.

Finally, we’ll make the button generate an UUID and append to the URL then subscribes to the channel on click, we will also disable it and re-enable it when the data is returned, go to app/assets/data.coffee and do:

$(document).on 'turbolinks:load', ->
  $('body').on 'click', '.export-btn', (e) ->
    uuid = generateUUID()
    # Disable Export btn
    $('.export-btn').html("Exporting...")
    $('.export-btn').addClass('disabled');

    # Append UUID to the URL
    URL = decodeURI(e.target.href + '?export_id=' + uuid;)

    # Subscribe to the channel
    window.ExportChannel.Subscribe(uuid, ->
      $.get encodeURI(URL)
      return
    )

# https://www.w3resource.com/javascript-exercises/javascript-math-exercise-23.php
generateUUID = ->
  d = new Date().getTime()
  'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace /[xy]/g, (c) ->
    r = (d + Math.random() * 16) % 16 | 0
    d = Math.floor(d / 16)
    (if c is 'x' then r else r & 0x3 | 0x8).toString 16

and we’ll add an extra class to the button in the view so it’s easier to identify, this is our app/views/data/index.html.erb

<div style='display: flex; justify-content: center; margin-top: 120px;'>
  <%= link_to 'Go to home page', '/', class: 'btn btn-primary'%>
  &nbsp;
  <%= link_to 'Export Data', data_export_path, method: get, remote: true, class: 'btn btn-primary export-btn' %>
</div>

Now let’s run it again

Voila. There's our file, and we can also goto other pages while at it.
Voila. There's our file, and we can also goto other pages while at it.

That’s it. Now we can do both the export and other stuff at the same time. The only note I’d add is that this method will create 2 jobs, 1 is when we click the link, the other is when we modify the link and call get method again. This might raise some performance issues.

Sidekiq server log

As you can see in the sidekiq server log, it started 2 jobs for 2 request. The temporary fix for this is to add a guard clause to the job so that it returns when there is no export_id. Like this:

class ExportJob < ApplicationJob
  queue_as :default

  def perform(export_id)
    return if export_id.blank?

    csv_content = ExportCommand.call

    ActionCable.server.broadcast(
      "export_channel_#{export_id}",
      csv_file: {
        file_name: 'data.csv',
        content: csv_content
      }
    )
  end
end

Then it will decrease the time of the 1st job by a lot

Sidekiq server log after optimized

That is it for now, if you have any other solutions. Feel free to share with us. Until next time.

This demo project is available on github