Database-heavy operations in a Rails application can consume a lot of time. Often a long stream of small, simple DB queries seems to run endlessly, for example bulk inserts or bulk updates on large tables. Splitting the work into background jobs is an ideal solution, but in some scenarios background jobs are not an option, e.g. when the change being applied makes the existing data invalid until it completes.
In this article, we will see how to complete a large set of DB queries under minutes which would otherwise take hours if not optimized.
A better practice is to create a rake task that runs after a DB migration. For simplicity, though, we'll look at some inline code, and you can be the judge of how to put it into practice!
The example in this article uses the following Rails & PG version along with Heroku PaaS:
- Rails >= 5
- PostgreSQL >= 8
- PaaS: Heroku
In this optimization, we will be modifying an existing column of a table: a bulk update operation. As part of serialization/encryption, we'll be replacing the old plain values with encrypted values.
Connection Pool Size
By default, Rails pools only 5 connections. PostgreSQL will allow many more, depending on the server's capacity, and those unused connections are often better utilized via background processes. The next step is to find out whether we have more than 5 connections available. This can be checked with the command heroku pg:info in the Heroku console:
=== DATABASE_URL
Plan: Standard 0
Status: Available
Data Size: 5.23 GB
Tables: 134
PG Version: 9.5.22
Connections: 7/120
Credentials: 1
Fork/Follow: Available
Now we know that the DB plan we are using allows a maximum of 120 connections. Great! Let's reconfigure the ActiveRecord connection pool to use that full potential (at least for this single migration process).
Max out the Rails Connection Pool
CUSTOM_CONN_POOL = 100

def set_conn_pool_from_env!
  config = ActiveRecord::Base.configurations[Rails.env]
  # Prefer the DB_POOL env var when present, otherwise fall back to our constant
  config['pool'] = ENV.fetch('DB_POOL', CUSTOM_CONN_POOL).to_i
  ActiveRecord::Base.establish_connection(config)
end
The above code allows us to increase the connection pool size. To learn more about connection pool management, see Heroku's guide on concurrency and database connections in Ruby.
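Heroku's documentation suggests driving the pool size from an environment variable rather than hard-coding it. A minimal database.yml sketch of that pattern, using the same DB_POOL variable the code above reads (the default of 5 matches Rails' standard setting):

```yaml
# config/database.yml (sketch): let an env var drive the pool size,
# falling back to the default of 5 when DB_POOL is unset.
production:
  pool: <%= ENV.fetch("DB_POOL", 5) %>
```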
With the above, the ActiveRecord connection pool can now hold many more connections. Next, we take advantage of threads, i.e. concurrency.
Add a new Thread per connection
def new_thread_connection
  Thread.new do
    # Check a connection out of the pool, and return it when the block finishes
    ApplicationRecord.connection_pool.with_connection { yield }
  end
end
ActiveRecord on Ruby (MRI) works synchronously, so any query executed on a connection blocks the current thread. If a database query takes 200ms, the next line of code has to wait those 200ms. The purpose of adding threads is to avoid waiting on those query calls. We cannot make all queries truly parallel under MRI's GVL, but DB queries are I/O-bound, so we can use threads to make the most of the idle connections.
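The effect is easy to demonstrate without a database at all. In the sketch below, sleep stands in for a 200ms query (Ruby releases the GVL while sleeping, just as it does while waiting on a Postgres socket):

```ruby
require "benchmark"

# Stand-in for a 200ms database query.
def fake_query
  sleep 0.2
end

# Run 5 "queries" one after another on the main thread: ~1 second.
sequential = Benchmark.realtime { 5.times { fake_query } }

# Run the same 5 "queries" in 5 threads; the waits overlap: ~0.2 seconds.
threaded = Benchmark.realtime do
  workers = 5.times.map { Thread.new { fake_query } }
  workers.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

The same principle applies to real queries: while one thread waits on Postgres, the others keep working.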
How to use Threaded Connection Pool
def encryptor
  # Memoized so it is visible inside the methods below
  # (a bare local variable would not be)
  @encryptor ||= MyEncryptor.new
end

def new_thread_connection
  Thread.new do
    ApplicationRecord.connection_pool.with_connection { yield }
  end
end

def update_batch(batch)
  batch.each do |visitor|
    visitor.update(encrypted_info: encryptor.encrypt(visitor.info))
  end
end
def migrate
  # ...update column from string to text type...
  set_conn_pool_from_env!
  updater_threads = Visitor.in_batches(of: 100).map do |visitors_batch|
    new_thread_connection { update_batch(visitors_batch) }
  end
  # join all threads back to the main thread
  updater_threads.each(&:join)
end
- With a batch size of 100, visitors are divided into batches of 100 records each.
- A total of 10,000 visitors becomes 100 batches, created by the main thread.
- Each batch is then processed in a separate thread.
- Each thread checks a connection out of the pool, then releases it when done.
- At last, joining updater_threads merges the workers and hands control back to the main thread.
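The steps above can be sketched without ActiveRecord at all. Here plain integers stand in for Visitor records and a thread-safe Queue collects results (all names are illustrative only):

```ruby
# Plain-Ruby sketch of the batch-per-thread flow described above.
visitor_ids = (1..10_000).to_a
batch_size  = 100

processed = Queue.new # Queue is thread-safe, unlike a plain Array
threads = visitor_ids.each_slice(batch_size).map do |batch|
  # Each slice of 100 ids gets its own thread, standing in for update_batch
  Thread.new { processed << batch.size }
end

# Join every worker back to the main thread before moving on
threads.each(&:join)

puts threads.size    # => 100 (one thread per batch)
puts processed.size  # => 100 (every batch completed)
```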
There is still more scope for improvement: we can bulk-update each batch in a single statement using SQL CASE statements.
Building PG's Case Statements with Arel
arel_builder = Arel::Nodes::Case.new(Visitor.arel_table[:id])
batch.each do |visitor|
  arel_builder.when(visitor.id).then(encryptor.encrypt(visitor.info))
end
Arel is the query builder that ActiveRecord uses under the hood. Unfortunately, there is no ActiveRecord API/helper for CASE statements, but Arel has a ton of SQL builder helpers in house.
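For intuition, this is roughly the SQL such a CASE node renders. The sketch below hand-builds the same string in plain Ruby; sql_quote is a deliberately simplified stand-in for the adapter's real quoting and should never be used against an actual database:

```ruby
# Hand-rolled illustration of the CASE statement Arel produces.
# NOTE: sql_quote is a toy; real code must use the adapter's quoting.
def sql_quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

def build_case_sql(id_to_value)
  whens = id_to_value.map { |id, val| "WHEN #{id} THEN #{sql_quote(val)}" }
  %(CASE "visitors"."id" #{whens.join(' ')} END)
end

sql = build_case_sql(1 => "enc_a", 2 => "enc_b")
puts sql
# => CASE "visitors"."id" WHEN 1 THEN 'enc_a' WHEN 2 THEN 'enc_b' END
```

Wrapped in UPDATE visitors SET info = <case>, one statement replaces 100 round-trips per batch.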
Summing it up
class UpdateVisitorsInfo < ActiveRecord::Migration[6.0]
  CUSTOM_CONN_POOL = 100
  BATCH_SIZE = 100

  def up
    change_column(:visitors, :info, :text)
    # update all rows from plain to encrypted values
    set_conn_pool_from_env!
    updater_threads = Visitor.in_batches(of: BATCH_SIZE).map do |visitors_batch|
      new_thread_connection do
        visitors_batch.update_all("info = #{build_case_stmt(visitors_batch)}")
      end
    end
    updater_threads.each(&:join)
  end

  private

  def new_thread_connection
    Thread.new do
      ApplicationRecord.connection_pool.with_connection { yield }
    end
  end

  def set_conn_pool_from_env!
    config = ActiveRecord::Base.configurations[Rails.env]
    config['pool'] = ENV.fetch('DB_POOL', CUSTOM_CONN_POOL).to_i
    ActiveRecord::Base.establish_connection(config)
  end

  def encryptor
    @encryptor ||= MyEncryptor.new
  end

  def build_case_stmt(batch)
    arel_builder = Arel::Nodes::Case.new(Visitor.arel_table[:id])
    batch.each do |visitor|
      arel_builder.when(visitor.id).then(encryptor.encrypt(visitor.info))
    end
    arel_builder.to_sql
  end
end
Have threads ever been this useful to you, or have they been a disaster? Do you run into issues with the connection pool? Did you like the idea in this article of how connection pools and threads can be used together?