There are a variety of reasons why you would want to use multiple databases in your Ruby on Rails application. In our case, we wanted to store vast amounts of messages reflecting on user events, such as Birthdays, Anniversaries, etc. To improve performance we decided to decouple the messages table from the overloaded legacy database.
With multiple databases, if we want to fetch the associated data from the other database we need to write custom SQL queries until Rails 6.
Rails 7 now supports handling associations with joins across databases.
In this article, we'll look at how the disable_joins: true
option can be used to fetch data belonging to different databases using associations.
Let's say we have the following DB structure:
Multi-DB setup for Rails >= 6
Please refer to the Rails Guides document to learn more about how to set up multiple databases in your application.
# config/database.yml
production:
primary:
database: demo-app-primary
username: root
password: <%= ENV['ROOT_PASSWORD'] %>
adapter: mysql2
message_sourced:
database: demo-app-message-sourced
username: message_sourced_root
password: <%= ENV['MESSAGE_SOURCED_ROOT_PASSWORD'] %>
adapter: mysql2
migrations_paths: db/message_sourced_migrate
This is how the models look like:
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :primary, reading: :primary }
end
# app/models/message_sourced_record.rb
class MessageSourcedRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :message_sourced, reading: :message_sourced }
end
# app/models/user.rb
class User < ApplicationRecord
has_many :events
end
# app/models/event.rb
class Event < ApplicationRecord
belongs_to :user
has_many :messages
has_one :latest_message, -> { order('messages.created_at desc').limit(1) }, class_name: 'Message'
end
# app/models/message.rb
class Message < MessageSourcedRecord
belongs_to :event
end
Implementation
Let's say we have the following scenarios:
- Fetch the messages of all the events of a user
- Fetch the latest message on the latest event of a user.
Before Rails 7
We would have to add custom methods, as has_many :through
/has_one :through
associations won't work across databases.
# app/models/user.rb
class User < ApplicationRecord
has_many :events
has_one :latest_event, -> { order('events.created_at desc').limit(1) }, class_name: 'Event'
def messages
Message.where(event_id: events.pluck(:id))
end
def latest_event_message
Message.where(event_id: latest_event&.id).order('messages.created_at desc').first
end
end
With Rails >= 7
We can query using has_many :through
and has_one :through
association with the newly added disable_joins: true
option.
# app/models/user.rb
class User < ApplicationRecord
has_many :events
has_many :messages, through: :events, disable_joins: true
has_one :latest_event, -> { order('events.created_at desc').limit(1) }, class_name: 'Event'
has_one :latest_message, through: :latest_event, disable_joins: true
end
What does disable_joins
actually do?
Rails typically loads the association lazily and attempts to perform a join whenever the associated data is requested. However, if the associated data is in another database, it is unable to do so and throws an error.
So we have to tell Rails Active Record to not apply the join by specifying disable_joins
.
Instead of performing a join, it executes two separate queries to fetch the event_id/s
first and then the message/s
data using the event_id/s
from the first query.
The SQL queries look like this:
irb(main):001:0> user.messages
Event Pluck (25.5ms) SELECT "events"."id" FROM "events" WHERE "events"."user_id" = $1 [["user_id", 1]]
Message Load (2.5ms) SELECT "messages".* FROM "messages" WHERE "messages"."event_id" IN ($1, $2) [["event_id", 1], ["event_id", 2]]
irb(main):002:0> user.latest_message
Event Pluck (0.9ms) SELECT "events"."id" FROM "events" WHERE "events"."user_id" = $1 ORDER BY events.created_at desc [["user_id", 1]]
Message Load (1.0ms) SELECT "messages".* FROM "messages" WHERE "messages"."event_id" = $1 ORDER BY messages.created_at desc LIMIT $2 [["event_id", 1], ["LIMIT", 1]]
Why not allow multi-DB associations to work without the disable_joins
option?
As mentioned by the contributor here, Rails loads associations lazily, and by the time it realises they're in different databases, it's already performed a join. As a result, we must inform it ahead of time that it should not perform a join. It's a significant change in the way the Rails association works. So, in the future, we could see associations load data from multiple databases without having to specify disable_joins
.
Final thoughts
Nonetheless, this update makes it convenient for developers to retrieve data in a multi-DB system by using associations rather than writing custom SQL queries.
References
- PR#41937 Added disable_joins option to has_many relation
- PR#42079 Added disable_joins option to has_one relation
- disable_joins