We often come across use cases in a Rails application where we need to create, update, or delete records in bulk. Most ORMs naively update or insert each record individually by executing N+1 database queries. To address this problem, Rails 6 added the upsert_all method that provides a faster way to update and insert a large number of records.
upsert_all
uses a single SQL INSERT
statement for updating/inserting the records. And it does so without instantiating any models. Nor does it trigger any Active Record callbacks or validations !
How upsert_all
works?
If we check the source code of upsert_all, we understand that it calls execute
method of InsertAll under the hood which later updates or inserts (upserts) multiple records into the database using a single SQL INSERT
statement.
For Postgres and SQlite3 upsert_all
utilizes ON CONFLICT clause, whereas for MySQL it uses ON DUPLICATE KEY clause.
Check implementation of build_insert_sql
method under postgresql_adapter.rb, sqlite3_adapter.rb and abstract_mysql_adapter.rb to get further details.
In this article, we will first see the available options with upsert_all
in Rails 6, and then the newly introduced ones in Rails 7.
Available options for upsert_all
in Rails 6
:returning
(PostgreSQL only) An array of attributes to return for all successfully inserted records, which by default is the primary key. Pass
returning: %w[ id name ]
for both id and name orreturning: false
to omit the underlying returning SQL clause entirely.
You can also pass an SQL string if you need more control on the return values (for example, returning: "id, name as new_name").
:unique_by
(PostgreSQL and SQLite only) By default rows are considered to be unique by every unique index on the table. Any duplicate rows are skipped. To skip rows according to just one unique index pass
:unique_by
.
Consider a Book model where no duplicate ISBNs make sense, but if any row has an existing id, or is not unique by another unique index,
ActiveRecord::RecordNotUnique
is raised.
Unique indexes can be identified by columns or name:
unique_by: :isbn
unique_by: %i[ author_id name ]
unique_by: :index_books_on_isbn
New options added in Rails 7
:update_only
This option is added with the intent to make the conflict resolutions easy during upsert_all
. It accepts a list of column names that requires to be updated in case of conflict. If no column is provided, upsert_all
will update all the columns that can be updated.
# There is a conflict in the first and the third record. As only price is passed
# to "update_only", only price will be updated and the author would remain
# the same i.e 'Icode'
Book.upsert_all(
[
{ name: 'Ruby for beginners', price: 150, author: 'Icode' },
{ name: 'Well-Grounded Rubyist', price: 200, author: 'David A' },
{ name: 'Ruby for beginners', price: 300, author: 'Icode Academy' },
],
unique_by: :name,
update_only: [:price]
)
Book.where(name: 'Ruby for beginners')
=> #<ActiveRecord::Relation [#<Book id: 1, name: "Ruby for beginners", price: 300, author: "Icode">]>
:on_duplicate
Even this option is added with the same intent of making conflict resolutions easy during upsert_all
. It can not be used along with :update_only
option. While using this option, one must provide all the columns that need to be updated.
# There is a conflict in the first and the third record. As only price is passed
# to "on_duplicate" as a raw SQL query, just price will be updated and
# the author would remain the same i.e 'Icode'
Book.upsert_all(
[
{ name: 'Ruby for beginners', price: 150, author: 'Icode' },
{ name: 'Well-Grounded Rubyist', price: 200, author: 'David A' },
{ name: 'Ruby for beginners', price: 300, author: 'Icode Academy' },
],
unique_by: :name,
on_duplicate: Arel.sql("price = LEAST(books.price, EXCLUDED.price)")
)
Book.where(name: 'Ruby for beginners')
=> #<ActiveRecord::Relation [#<Book id: 1, name: "Ruby for beginners", price: 150, author: "Icode">]>
:record_timestamps
By default, the automatic setting of timestamp columns is controlled by the model's record_timestamps
config. To override this and force the automatic setting of timestamp columns, the :record_timestamps
option can be used.
When record_timestamps
is set to false
, value for created_at
and updated_at
are not recorded.
Book.upsert_all(
[
{ id: 1, name: 'Ruby for beginners', price: 1500 },
{ id: 2, name: 'The Well-Grounded Rubyist', price: 2000 },
],
unique_by: :name,
record_timestamps: false
)
When record_timestamps
is set to true
, value for created_at
and updated_at
are recorded. And the value is set based on the value of CURRENT_TIMESTAMP
Book.upsert_all(
[
{ id: 1, name: 'Ruby for beginners', price: 1500 },
{ id: 2, name: 'The Well-Grounded Rubyist', price: 2000 },
],
unique_by: :name,
record_timestamps: true
)