There are a variety of reasons why you would want to use multiple databases in your Ruby on Rails application. In our case, we wanted to store vast amounts of messages reflecting on user events, such as Birthdays, Anniversaries, etc. To improve performance we decided to decouple the messages table from the overloaded legacy database.

With multiple databases, if we want to fetch the associated data from the other database we need to write custom SQL queries until Rails 6.
Rails 7 now supports handling associations with joins across databases.

In this article, we'll look at how the disable_joins: true option can be used to fetch data belonging to different databases using associations.

Let's say we have the following DB structure:

Multi-DB setup for Rails >= 6

Please refer to the Rails Guides document to learn more about how to set up multiple databases in your application.

# config/database.yml
production:
  primary:
    database: demo-app-primary
    username: root
    password: <%= ENV['ROOT_PASSWORD'] %>
    adapter: mysql2

  message_sourced:
    database: demo-app-message-sourced
    username: message_sourced_root
    password: <%= ENV['MESSAGE_SOURCED_ROOT_PASSWORD'] %>
    adapter: mysql2
    migrations_paths: db/message_sourced_migrate

This is how the models look like:

# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: { writing: :primary, reading: :primary }
end

# app/models/message_sourced_record.rb
class MessageSourcedRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: { writing: :message_sourced, reading: :message_sourced }
end

# app/models/user.rb
class User < ApplicationRecord
  has_many :events
end

# app/models/event.rb
class Event < ApplicationRecord
  belongs_to :user
  has_many :messages
  has_one  :latest_message, -> { order('messages.created_at desc').limit(1) }, class_name: 'Message'
end

# app/models/message.rb
class Message < MessageSourcedRecord
  belongs_to :event
end

Implementation

Let's say we have the following scenarios:

  1. Fetch the messages of all the events of a user
  2. Fetch the latest message on the latest event of a user.

Before Rails 7

We would have to add custom methods, as has_many :through/has_one :through associations won't work across databases.

# app/models/user.rb
class User < ApplicationRecord
  has_many :events
  has_one :latest_event, -> { order('events.created_at desc').limit(1) }, class_name: 'Event'

  def messages
    Message.where(event_id: events.pluck(:id))
  end
  
  def latest_event_message
    Message.where(event_id: latest_event&.id).order('messages.created_at desc').first
  end
end

With Rails >= 7

We can query using has_many :through and has_one :through association with the newly added disable_joins: true option.

# app/models/user.rb
class User < ApplicationRecord
  has_many :events
  has_many :messages, through: :events, disable_joins: true

  has_one :latest_event, -> { order('events.created_at desc').limit(1) }, class_name: 'Event'
  has_one :latest_message, through: :latest_event, disable_joins: true
end

What does disable_joins actually do?

Rails typically loads the association lazily and attempts to perform a join whenever the associated data is requested. However, if the associated data is in another database, it is unable to do so and throws an error.
So we have to tell Rails Active Record to not apply the join by specifying disable_joins.

Instead of performing a join, it executes two separate queries to fetch the event_id/s first and then the message/s data using the event_id/s from the first query.

The SQL queries look like this:

irb(main):001:0> user.messages
  Event Pluck (25.5ms)  SELECT "events"."id" FROM "events" WHERE "events"."user_id" = $1  [["user_id", 1]]
  Message Load (2.5ms)  SELECT "messages".* FROM "messages" WHERE "messages"."event_id" IN ($1, $2)  [["event_id", 1], ["event_id", 2]]
irb(main):002:0> user.latest_message
  Event Pluck (0.9ms)  SELECT "events"."id" FROM "events" WHERE "events"."user_id" = $1 ORDER BY events.created_at desc  [["user_id", 1]]                 
  Message Load (1.0ms)  SELECT "messages".* FROM "messages" WHERE "messages"."event_id" = $1 ORDER BY messages.created_at desc LIMIT $2  [["event_id", 1], ["LIMIT", 1]]

Why not allow multi-DB associations to work without the disable_joins option?

As mentioned by the contributor here, Rails loads associations lazily, and by the time it realises they're in different databases, it's already performed a join. As a result, we must inform it ahead of time that it should not perform a join. It's a significant change in the way the Rails association works. So, in the future, we could see associations load data from multiple databases without having to specify disable_joins.

Final thoughts

Nonetheless, this update makes it convenient for developers to retrieve data in a multi-DB system by using associations rather than writing custom SQL queries.

References