In the previous article of the Serialization in Ruby on Rails
series, we became familiar with various serialization formats provided by the Ruby libraries. In this article, we will see how the YAML and JSON formats are used by Rails to store Ruby objects in a database.
Serialization in Rails for Storage
Rails framework allows complex objects to be stored in a database column via the ActiveRecord::Serialization
module. But understanding when to store serialized data in a column is more crucial than knowing how to do it. Through this article, we will first learn "when" to store serialized data and then the "how" part.
When should you store serialized data?
You may think why would anyone store objects in a column? Doesn't it contradict the normalization principle?
Yes, it does. But there are some cases where storing serialized object in a column is more logical than adding additional columns to store the same data. Let's see some of these use cases:
1) An object contains several arbitrary parameters that do not appear in every record, such as user preferences. Storing them in separate columns would result in NULL values for some columns in most cases.
2) While saving the response data from APIs, you may not know what fields are being sent by the API. Adding a column for each new field sent by the API (on which you have no control) is not practical.
3) You may sometimes want to store the history or snapshot of a record at some point in time. Since this data is not going to be updated or changed, it is safe and practical to store the snapshot or version of a record in the serialized fashion. The best example of this use-case is the Papertrail
gem.
When should you not store serialized data?
Storing serialized data in the database could open up a can of worms if we are not mindful. Following are some of the conditions when you may avoid serializing the data:
1) You need to query the serialized data across the records quite often.
2) The state of the serialized data would change frequently. This would need frequent updates to the serialized column which can be a heavy operation.
3) There is a possibility that the field/property name of an object would change in future. (Note: This is not a pressing issue as we do have ways to fix this as explained in the Custom Serializer section below.)
Once you are convinced that all the above criteria do not apply to your use-case, you may go ahead and implement serialization for storing the data.
How to store serialized data in a column?
Let's say we have a shopping website, and the sellers have the freedom to add custom attributes to each of their products. These attributes being very specific to a single product, it is not very practical to have a column created dynamically for each attribute. Additionally, we can assume that these properties are not going to change a lot for the existing products. Using a document-based search engine like ElasticSearch
will leave us less paranoid or worry-free about any frequent SQL query search on this column.
Considering all these factors, we can safely dump the product's custom attributes in a column called custom_specifications
in the products
table.
- First, we need to ensure that the data type of the column
custom_specifications
is of typetext
orvarchar
with the max limit specified. This is very very important, you may not want to run into situations where the object size exceeds the column size. - After this, we have to tell
ActiveRecord
to serialize the object while storing it in the DB column. As specified by this ruby doc, you can store an object to the database and retrieve the data as the same object just by passing the name of the column to theserialize
method. - Along with the column name, the
serialize
method accepts another parameter calledclass_name_or_coder
that notifiesActiveRecord
about the format in which data would be persisted in the database.
Let us see various ways to use class_name_or_coder
:
1. No class_name_or_coder
passed - YAML
The default serialization format used by ActiveRecord is YAML.
class Product < ApplicationRecord
serialize :custom_specifications
end
Now when we save the record, the object is stored in the Database in the YAML format.
irb(main):007:0* Product.create(name: '3D-Fidget Spinner Toy',
description: '...', price: '3000', custom_specifications: { precision:
'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4', vibration_level: 'V1, V2, V3', hardness: '59-63 HRC' })
(0.4ms) BEGIN
Product Create (0.4ms)
INSERT INTO `products` (`name`, `description`, `price`, `created_at`, `updated_at`, `custom_specifications`)
VALUES ( '3D-Fidget Spinner Toy', '...', 3000,
'2021-01-29 05:11:55', '2021-01-29 05:11:55',
'---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3,Z4\n:vibration_level: V1, V2, V3\n:hardness: 59-63 HRC\n' )
(95.1ms) COMMIT
=> <Product id: 2, name: "3D-Fidget Spinner Toy", description: "..",
price: 3000, upc_code: nil, created_at: "2021-01-29 05:11:55", updated_at: "2021-01-29 05:11:55",
custom_specifications: {:precision=>"ABEC-1(P0)", :noise_level=>"Z2, Z2, Z3, Z4", :vibration_level=>"V1, V2, V3",
:hardness=>"59-63 HRC"}>
If you see the MySQL insert statement above, you will notice the YAML format being used for storage i.e.
---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3, Z4\n:vibration_level: V1, V2, V3\n:hardness: 59-63 HRC\n
But if you observe the product object returned, you will notice that the custom_specifications
is an object of class Hash and not as YAML. We received the Hash object that we had passed to the create method.
If we pass an object of some other class or an Array object, we would get back the object of the same class as given below:
This is how the serialize
method works.
2. class_name_or_coder
as JSON
If we want to store the data in JSON format, we need to specify the same as follows.
class Product < ApplicationRecord
serialize :custom_specifications, JSON
end
If we run the same statements, it will store the data in the database as a JSON string - just observe the insert statement.
irb(main):007:0> Product.create(name: "3-D fidget spinner",
custom_specifications: ProductCustomSpecification.new('fidget spinner', precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'))
(0.2ms) BEGIN
Product Create (0.4ms)
INSERT INTO `products` (`name`, `created_at`, `updated_at`, `custom_specifications`)
VALUES ('3-D fidget spinner', '2021-02-01 04:45:36', '2021-02-01 04:45:36',
'{\"product_name\":\"fidget spinner\",\"attributes\":{\"precision\":\"ABEC-1(P0)\",\"noise_level\":\"Z2, Z2, Z3, Z4\"}}')
(108.6ms) COMMIT
=> <Product id: 5, name: "3-D fidget spinner", description: nil, price: nil, upc_code: nil,
created_at: "2021-02-01 04:45:36", updated_at: "2021-02-01 04:45:36",
custom_specifications: {"product_name"=>"fidget spinner", "attributes"=>{"precision"=>"ABEC-1(P0)", "noise_level"=>"Z2, Z2, Z3, Z4"}}>
irb(main):021:0> Product.find(5).custom_specifications.class
Product Load (0.3ms) SELECT `products`.* FROM `products` WHERE `products`.`id` = 5 LIMIT 1
=> Hash
When we save an object of any class using JSON, the retrieved object is a hash and not the original class unlike the default YAML. The reason for this is that ActiveRecord
calls to_h
on our custom object before passing it to the serializer. This means when deserializing, ActiveRecord
has no information about the type of the original object.
3. class_name_or_coder
as Hash
The above two examples showed how to use a coder
- JSON/YAML
with serialize
. As the name suggests this method also accepts a class name that the object type should equal to.
In the above examples, we could store an array or a hash or an object of any class, without any restriction on the type of data that would be retrieved. So one row would contain YAML serialized hash and another would contain YAML serialized array and it would still work completely fine. We would use this if we are not concerned with the consistency of the datatype of the retrieved data.
If we want to typecast all the rows to contain the same type of data, we have the option to specify the name such as Hash or Array or any other class. If we specify class name as Hash and if we try to store an Array or any other object, we would get a Serialization Type Mismatch error
.
class Product < ApplicationRecord
serialize :custom_specifications, Hash
end
Let us run these statements again.
irb(main):024:0> Product.create(name: '3D-Fidget Spinner Toy', description: '..', price: '3000',
custom_specifications: { precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'})
(0.2ms) BEGIN
Product Create (0.5ms)
INSERT INTO `products` (`name`, `description`, `price`, `created_at`, `updated_at`, `custom_specifications`)
VALUES ('3D-Fidget Spinner Toy', '..', 3000, '2021-02-01 05:05:51', '2021-02-01 05:05:51',
'---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3, Z4\n')
(66.4ms) COMMIT
=> <Product id: 8, name: "3D-Fidget Spinner Toy", description: "..", price: 3000, upc_code: nil,
created_at: "2021-02-01 05:05:51", updated_at: "2021-02-01 05:05:51",
irb(main):025:0> Product.create(name: "3-D fidget spinner", custom_specifications: ['low noise', 'good precision'])
ActiveRecord::SerializationTypeMismatch: can't serialize `custom_specifications`: was supposed to be a Hash,
but was a Array. -- ["low noise", "good precision"]
irb(main):026:0> Product.create(name: "3-D fidget spinner",
custom_specifications: ProductCustomSpecification.new('fidget spinner', precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'))
ActiveRecord::SerializationTypeMismatch: can't serialize `custom_specifications`: was supposed to be a Hash,
but was a ProductCustomSpecification.
So now all the rows will accept only a Hash. If we try to save any other type of data it would raise an error.
Ifclass_name
is specified, the serialized object must be of that class on assignment and retrieval. Otherwise,SerializationTypeMismatch
will be raised.
4. Custom Serializer
We can also create our own serializer by defining a class that responds to a load
and dump
method. Then we can pass the new serializer class as the coder to the serialize
method.
Let us take the following hypothetical scenario to understand how we can use custom serializer to manipulate data.
Let us assume that we are storing Product
objects in a column called product_snapshot
in the product_orders
table.
class ProductOrder < ApplicationRecord
serialize :product_snapshot, Product
end
For some business purposes, suppose we had to change the column name of products
from selling_price
to price
. But the existing snapshots in product_orders
would retain the old selling_price
attribute in the retrieved Product
record (This is a major drawback of storing serialized data).
Instead of updating the snapshot data for existing records, we can use custom serializers as a workaround (though dirty) and override the load
method to replace the old selling_price
column with price
.
class CustomProductSerializer < Product
class << self
def load(value)
return {} if value.blank?
product_snapshot = YAML.load(value)
rebuild(product_snapshot)
end
def dump(value)
YAML.dump(value)
end
private
def rebuild(product_snapshot)
new_attributes = product_snapshot.attributes
new_attributes['price'] = new_attributes.delete('selling_price') unless new_attributes['price'].present?
Product.new(new_attributes)
end
end
end
We will use the above serializer class as the coder
and then try retrieving ProductOrder
records that contain an old snapshot of the Product
.
class ProductOrder < ApplicationRecord
serialize :product_snapshot, CustomProductSerializer
end
Generally, it is very rare that we would need a custom serializer, but it's good to know that this flexibility is available when needed.
Summary
There are few pros and many cons of storing serialized data in relational DB as we have seen so far. It is important to be very cautious while choosing this approach for storing data. We can consider this as an effective option when the data would majorly be used for reading and performance is not a major concern - like reporting or analysis.
I hope this article has been able to explain how to store serialized data in a column in Rails and also the benefits and pitfalls of doing so. We will explore how to serialize data while generating API responses in the next article.
I am sure you would have had bitter and/or sweet experiences 😄 with handling serialized data in the database. Thanks for reading.