In the previous article of the Serialization in Ruby on Rails series, we became familiar with various serialization formats provided by the Ruby libraries. In this article, we will see how the YAML and JSON formats are used by Rails to store Ruby objects in a database.

Serialization in Rails for Storage

Rails framework allows complex objects to be stored in a database column via the ActiveRecord::Serialization module. But understanding when to store serialized data in a column is more crucial than knowing how to do it. Through this article, we will first learn "when" to store serialized data and then the "how" part.

When should you store serialized data?

You may think why would anyone store objects in a column? Doesn't it contradict the normalization principle?

Yes, it does. But there are some cases where storing serialized object in a column is more logical than adding additional columns to store the same data. Let's see some of these use cases:

1)  An object contains several arbitrary parameters that do not appear in every record, such as user preferences. Storing them in separate columns would result in NULL values for some columns in most cases.

user_preferences_1 = { high_contrast_mode: "ON", inverted_colors: "ON", transparency: "Medium"
}


user_preferences_2 = { reader_mode: "ON", contrast_preference: "Light", time_zone: "Kolkata, India"
}
Arbitrary User Preferences

2)  While saving the response data from APIs, you may not know what fields are  being sent by the API. Adding a column for each new field sent by the API (on which you have no control) is not practical.

{
  "id": "ch_xxxxx",
  "amount": 100,
  "amount_captured": 0,
  "amount_refunded": 0,
  "balance_transaction": "txn_xxxxx",
  "billing_details": {
    "address": {
      ...
    }
  },
  "paid": true,
  "payment_method": "card_xxxxx",
  "payment_method_details": {
    "card": {
      "brand": "visa",
      "checks": {
        ...
      },
      ...
      "funding": "credit",
      "network": "visa"
    },
    "type": "card"
  },
  "receipt_number": null,
  "receipt_url": "https://pay.stripe.com/receipts/acct_xxxxx/ch_xxxxx/rcpt_xxxxx",
  "refunded": false,
  "refunds": {
    ...
  },
  "status": "succeeded",
}
Sample Stripe API Response having multiple dynamic attributes

3)  You may sometimes want to store the history or snapshot of a record at some   point in time. Since this data is not going to be updated or changed, it is safe and practical to store the snapshot or version of a record in the serialized fashion. The best example of this use-case is the Papertrail gem.

When should you not store serialized data?

Storing serialized data in the database could open up a can of worms if we are not mindful. Following are some of the conditions when you may avoid serializing the data:

1)  You need to query the serialized data across the records quite often.

irb(main):019:0> Product.where("custom_specifications LIKE ?", '%Z3%')
Product Load (0.9ms) SELECT `products`.* FROM `products` WHERE (custom_specifications LIKE '%Z3%') LIMIT 10

=> #<ActiveRecord::Relation [#<Product id: 1, name: "3D-Fidget Spinner Toy", description: "...",
selling_price: 3000, upc_code: nil, created_at: "2021-03-08 13:19:14", updated_at: "2021-03-08 13:19:14", 
custom_specifications: "{:precision=>\"ABEC-1(P0)\", :noise_level=>\"Z2, Z2, Z3, ...">, 

#<Product id: 2, name: "Anti Anxiety Fidget Spinner", description: "..", selling_price: 2000, upc_code: nil, 
created_at: "2021-03-08 13:32:36", updated_at: "2021-03-08 13:32:36",
custom_specifications: "{:precision=>\"PSI-10\", :noise_level=>\"Z3\", :vibrat...">,

#<Product id: 3, name: "Hover Soccer Ball", description: "...", selling_price: 2500, upc_code: nil, 
created_at: "2021-03-08 13:34:11", updated_at: "2021-03-08 13:34:11",
custom_specifications: "{:precision=>\"PSI-12\", :noise_level=>\"Z3 Z4\", :bat...">]>
Searching for products with noise level Z3 is not very efficient as seen above

2)  The state of the serialized data would change frequently. This would need frequent updates to the serialized column which can be a heavy operation.

irb(main):048:0* specifications = Product.first.custom_specifications
  Product Load (0.8ms)  SELECT  `products`.* FROM `products` ORDER BY `products`.`id` ASC LIMIT 1
=> {:precision=>"ABEC-1(P0)", :noise_level=>"Z2, Z2, Z3, Z4", :vibration_level=>"V1, V2, V3", 
:hardness=>"59-63 HRC"}


irb(main):050:0> specifications[:precision] = 'ABEC-2(P1)'
=> "ABEC-2(P1)"


irb(main):051:0> Product.first.update_attributes(custom_specifications: specifications)
  Product Load (0.8ms)  SELECT  `products`.* FROM `products` ORDER BY `products`.`id` ASC LIMIT 1
 
(0.3ms)  BEGIN 
Product Update (0.3ms) 
	UPDATE 
  		`products` 
	SET 
  		`custom_specifications` = '---\n:precision: ABEC-2(P1)\n:noise_level: Z2, Z2, Z3, Z4\n
:vibration_level: V1, V2, V3\n:hardness: 59-63 HRC\n', `updated_at` = '2021-03-08 13:49:31' 
	WHERE 
 	 	`products`.`id` = 4

   (83.8ms)  COMMIT
=> true
Updating precision field frequently like this is slow and inefficient. Instead having a separate column for precision would be a better option

3)  There is a possibility that the field/property name of an object would change in future. (Note: This is not a pressing issue as we do have ways to fix this as explained in the Custom Serializer section below.)

Once you are convinced that all the above criteria do not apply to your use-case, you may go ahead and implement serialization for storing the data.

How to store serialized data in a column?

Let's say we have a shopping website, and the sellers have the freedom to add custom attributes to each of their products. These attributes being very specific to a single product, it is not very practical to have a column created dynamically for each attribute. Additionally, we can assume that these properties are not going to change a lot for the existing products. Using a document-based search engine like ElasticSearch will leave us less paranoid or worry-free about any frequent SQL query search on this column.

Considering all these factors, we can safely dump the product's custom attributes in a column called custom_specifications in the products table.

  • First, we need to ensure that the data type of the column custom_specifications is of type text or varchar with the max limit specified. This is very very important, you may not want to run into situations where the object size exceeds the column size.
  • After this, we have to tell ActiveRecord to serialize the object while storing it in the DB column.  As specified by this ruby doc, you can store an object to the database and retrieve the data as the same object just by passing the name of the column to the serialize method.
  • Along with the column name, the serialize method accepts another parameter called class_name_or_coder that notifies ActiveRecord  about the format in which data would be persisted in the database.

Let us see various ways to use class_name_or_coder:

1. No class_name_or_coder passed - YAML

The default serialization format used by ActiveRecord is YAML.

class Product < ApplicationRecord
  serialize :custom_specifications
end


Now when we save the record, the object is stored in the Database in the YAML format.

irb(main):007:0* Product.create(name: '3D-Fidget Spinner Toy',
description: '...', price: '3000', custom_specifications: { precision: 
'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4', vibration_level: 'V1, V2, V3', hardness: '59-63 HRC' })
   
   (0.4ms)  BEGIN
Product Create (0.4ms) 
 INSERT INTO `products` (`name`, `description`, `price`, `created_at`, `updated_at`, `custom_specifications`) 
 VALUES ( '3D-Fidget Spinner Toy', '...', 3000, 
    '2021-01-29 05:11:55', '2021-01-29 05:11:55', 
    '---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3,Z4\n:vibration_level: V1, V2, V3\n:hardness: 59-63 HRC\n' )
  
   (95.1ms)  COMMIT
=> <Product id: 2, name: "3D-Fidget Spinner Toy", description: "..",
price: 3000, upc_code: nil, created_at: "2021-01-29 05:11:55", updated_at: "2021-01-29 05:11:55", 
custom_specifications: {:precision=>"ABEC-1(P0)", :noise_level=>"Z2, Z2, Z3, Z4", :vibration_level=>"V1, V2, V3", 
:hardness=>"59-63 HRC"}>

If you see the MySQL insert statement above, you will notice the YAML format being used for storage i.e.

---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3, Z4\n:vibration_level: V1, V2, V3\n:hardness: 59-63 HRC\n

But if you observe the product object returned, you will notice that the custom_specifications is an object of class Hash and not as YAML. We received the Hash object that we had passed to the create method.

If we pass an object of some other class or an Array object, we would get back the object of the same class as given below:

irb(main):012:0> Product.create(name: "3-D fidget spinner",
custom_specifications: ['low noise', 'good precision'])

   (0.2ms)  BEGIN
Product Create (0.2ms)  
  INSERT INTO `products` (`name`, `created_at`, `updated_at`, 
  		`custom_specifications` ) 
  VALUES ('3-D fidget spinner', '2021-02-01 04:49:22', 
    	'2021-02-01 04:49:22', '---\n- low noise\n- good precision\n' )

   (44.6ms)  COMMIT
=> <Product id: 6, name: "3-D fidget spinner", description: nil,
 price: nil, upc_code: nil, created_at: "2021-02-01 04:49:22", updated_at: "2021-02-01 04:49:22", 
custom_specifications: ["low noise", "good precision"]>
We get the original array back on retrieval of the Product record
irb(main):013:0> Product.create(name: "3-D fidget spinner", 
custom_specifications: ProductCustomSpecification.new('fidget spinner', 
precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'))
   
	(0.2ms)  BEGIN
Product Create (0.3ms)
  INSERT INTO `products` (`name`, `created_at`, `updated_at`, `custom_specifications`)
  VALUES ('3-D fidget spinner', '2021-02-01 04:50:57', '2021-02-01 04:50:57', 
'--- !ruby/object:ProductCustomSpecification\nproduct_name: fidget spinner\nattributes:\n  :precision: ABEC-1(P0)\n  :noise_level: Z2, Z2, Z3, Z4\n')

   (115.7ms)  COMMIT
=> <Product id: 7, name: "3-D fidget spinner", description: nil, price: nil, 
upc_code: nil, created_at: "2021-02-01 04:50:57", updated_at: "2021-02-01 04:50:57", 
custom_specifications: #<ProductCustomSpecification:0x0000564669e59c68 @product_name="fidget spinner", @attributes={:precision=>"ABEC-1(P0)", :noise_level=>"Z2, Z2, Z3, Z4"}>>


irb(main):014:0> Product.find(7).custom_specifications.class
  Product Load (0.3ms)  SELECT  `products`.* FROM `products` WHERE `products`.`id` = 7 LIMIT 1
=> ProductCustomSpecification
The retrieved custom_specifications is of class ProductCustomSpecification

This is how the serialize method works.

2. class_name_or_coder as JSON

If we want to store the data in JSON format, we need to specify the same as follows.

class Product < ApplicationRecord
  serialize :custom_specifications, JSON
end


If we run the same statements, it will store the data in the database as a JSON string - just observe the insert statement.

irb(main):009:0> Product.create(name: '3D-Fidget Spinner Toy', description: '..', price: '3000', 
custom_specifications: { precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4', 
vibration_level: 'V1, V2, V3', hardness: '59-63 HRC' })
   
	(0.2ms)  BEGIN
Product Create (0.3ms)  
  INSERT INTO `products` (`name`, `description`, `price`, `created_at`, `updated_at`, `custom_specifications`)
  VALUES ('3D-Fidget Spinner Toy', '..', 3000, '2021-01-29 05:52:54', '2021-01-29 05:52:54',
'{\"precision\":\"ABEC-1(P0)\",\"noise_level\":\"Z2, Z2, Z3, Z4\",\"vibration_level\":\"V1, V2, V3\",\"hardness\":\"59-63 HRC\"}')

   (79.0ms)  COMMIT
=> <Product id: 3, name: "3D-Fidget Spinner Toy", description: "..", price: 3000, upc_code: nil, 
created_at: "2021-01-29 05:52:54", updated_at: "2021-01-29 05:52:54", custom_specifications: 
{"precision"=>"ABEC-1(P0)", "noise_level"=>"Z2, Z2, Z3, Z4", "vibration_level"=>"V1, V2, V3", "hardness"=>"59-63 HRC"}>
The hash object is returned as it is, because a ruby hash is synonymous to a JSON object, and therefore is compatible for serialization.
irb(main):001:0> Product.create(name: "3-D fidget spinner", 
custom_specifications: ['low noise', 'good precision'])
   
   (0.2ms)  BEGIN
Product Create (0.7ms)
  INSERT INTO `products` (`name`, `created_at`, `updated_at`, `custom_specifications`)
  VALUES ('3-D fidget spinner', '2021-02-01 04:37:49', '2021-02-01 04:37:49',
'[\"low noise\",\"good precision\"]')
   
   (60.3ms)  COMMIT
=> <Product id: 4, name: "3-D fidget spinner", description: nil, price: nil, upc_code: nil, 
created_at: "2021-02-01 04:37:49", updated_at: "2021-02-01 04:37:49",
custom_specifications: ["low noise", "good precision"]>
The array object is returned as it is because arrays are valid JSON, and hence can be serialized.
irb(main):007:0> Product.create(name: "3-D fidget spinner",
 custom_specifications: ProductCustomSpecification.new('fidget spinner', precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'))
   
	(0.2ms)  BEGIN
Product Create (0.4ms)  
  INSERT INTO `products` (`name`, `created_at`, `updated_at`, `custom_specifications`)
  VALUES ('3-D fidget spinner', '2021-02-01 04:45:36', '2021-02-01 04:45:36', 
'{\"product_name\":\"fidget spinner\",\"attributes\":{\"precision\":\"ABEC-1(P0)\",\"noise_level\":\"Z2, Z2, Z3, Z4\"}}')

	(108.6ms)  COMMIT
=> <Product id: 5, name: "3-D fidget spinner", description: nil, price: nil, upc_code: nil, 
created_at: "2021-02-01 04:45:36", updated_at: "2021-02-01 04:45:36", 
custom_specifications: {"product_name"=>"fidget spinner", "attributes"=>{"precision"=>"ABEC-1(P0)", "noise_level"=>"Z2, Z2, Z3, Z4"}}>


irb(main):021:0> Product.find(5).custom_specifications.class
  Product Load (0.3ms)  SELECT  `products`.* FROM `products` WHERE `products`.`id` = 5 LIMIT 1
=> Hash

When we save an object of any class using JSON, the retrieved object is a hash and not the original class unlike the default YAML. The reason for this is that ActiveRecord calls to_h on our custom object before passing it to the serializer. This means when deserializing, ActiveRecord has no information about the type of the original object.

3. class_name_or_coder as Hash

The above two examples showed how to use a coder - JSON/YAML  with serialize. As the name suggests this method also accepts a class name that the object type should equal to.

In the above examples, we could store an array or a hash or an object of any class, without any restriction on the type of data that would be retrieved. So one row would contain YAML serialized hash and another would contain YAML serialized array and it would still work completely fine. We would use this if we are not concerned with the consistency of the datatype of the retrieved data.

If we want to typecast all the rows to contain the same type of data, we have the option to specify the name such as Hash or Array or any other class. If we specify class name as Hash and if we try to store an Array or any other object, we would get a Serialization Type Mismatch error.

class Product < ApplicationRecord
  serialize :custom_specifications, Hash
end

Let us run these statements again.

irb(main):024:0> Product.create(name: '3D-Fidget Spinner Toy', description: '..', price: '3000', 
custom_specifications: { precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'})

	(0.2ms)  BEGIN
Product Create (0.5ms)  
  INSERT INTO `products` (`name`, `description`, `price`, `created_at`, `updated_at`, `custom_specifications`) 
  VALUES ('3D-Fidget Spinner Toy', '..', 3000, '2021-02-01 05:05:51', '2021-02-01 05:05:51', 
'---\n:precision: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3, Z4\n')

	(66.4ms)  COMMIT
=> <Product id: 8, name: "3D-Fidget Spinner Toy", description: "..", price: 3000, upc_code: nil,
created_at: "2021-02-01 05:05:51", updated_at: "2021-02-01 05:05:51", 


irb(main):025:0> Product.create(name: "3-D fidget spinner", custom_specifications: ['low noise', 'good precision'])
ActiveRecord::SerializationTypeMismatch: can't serialize `custom_specifications`: was supposed to be a Hash,
but was a Array. -- ["low noise", "good precision"]


irb(main):026:0> Product.create(name: "3-D fidget spinner", 
custom_specifications: ProductCustomSpecification.new('fidget spinner', precision: 'ABEC-1(P0)', noise_level: 'Z2, Z2, Z3, Z4'))
ActiveRecord::SerializationTypeMismatch: can't serialize `custom_specifications`: was supposed to be a Hash,
but was a ProductCustomSpecification. 

So now all the rows will accept only a Hash. If we try to save any other type of data it would raise an error.

If class_name is specified, the serialized object must be of that class on assignment and retrieval. Otherwise, SerializationTypeMismatch will be raised.

4. Custom Serializer

We can also create our own serializer by defining a class that responds to a load and dump method. Then we can pass the new serializer class as the coder to the serialize method.

Let us take the following hypothetical scenario to understand how we can use custom serializer to manipulate data.

Let us assume that we are storing Product objects in a column called product_snapshot in the product_orders table.

class ProductOrder < ApplicationRecord
  serialize :product_snapshot, Product
end

For some business purposes, suppose we had to change the column name of products from selling_price to price. But the existing snapshots in product_orders would retain the old selling_price attribute in the retrieved Product record (This is a major drawback of storing serialized data).

irb(main):035:0> ProductOrder.last.product_snapshot
 ProductOrder Load (0.7ms)  SELECT  `product_orders`.* FROM `product_orders` ORDER BY `product_orders`.`id` 
DESC LIMIT 1

=> <Product id: 9, name: "3D-Fidget Spinner Toy", description: "..", selling_price: 3000, upc_code: nil,
 created_at: "2021-02-01 06:01:49", updated_at: "2021-02-01 06:01:49", 
custom_specifications: {:accuracy=>"ABEC-1(P0)", :noise_level=>"Z2, Z2, Z3, Z4"}>
Existing record contains selling_price saved in the snapshot.

Instead of updating the snapshot data for existing records, we can use custom serializers as a workaround (though dirty) and override the load method to replace the old selling_price column with price.

class CustomProductSerializer < Product
  class << self
    def load(value)
      return {} if value.blank?

      product_snapshot = YAML.load(value)
      rebuild(product_snapshot)
    end

    def dump(value)
      YAML.dump(value)
    end

    private

    def rebuild(product_snapshot)
      new_attributes = product_snapshot.attributes
      new_attributes['price'] = new_attributes.delete('selling_price') unless new_attributes['price'].present?

      Product.new(new_attributes)
    end
  end
end

We will use the above serializer class as the coder and then try retrieving ProductOrder records that contain an old snapshot of the  Product.

class ProductOrder < ApplicationRecord
  serialize :product_snapshot, CustomProductSerializer
end
irb(main):002:0> ProductOrder.last.product_snapshot
  ProductOrder Load (1.0ms)  SELECT  `product_orders`.* FROM `product_orders` ORDER BY `product_orders`.`id`
 DESC LIMIT 1

=> <Product id: 9, name: "3D-Fidget Spinner Toy", description: "..", price: 3000, upc_code: nil, 
created_at: "2021-02-01 06:01:49", updated_at: "2021-02-01 06:01:49", 
custom_specifications: "---\n:accuracy: ABEC-1(P0)\n:noise_level: Z2, Z2, Z3...">
The existing product_order returned the Product snapshot with the new price attribute

Generally, it is very rare that we would need a custom serializer, but it's good to know that this flexibility is available when needed.

Summary

There are few pros and many cons of storing serialized data in relational DB as we have seen so far. It is important to be very cautious while choosing this approach for storing data.  We can consider this as an effective option when the data would majorly be used for reading and performance is not a major concern - like reporting or analysis.

I hope this article has been able to explain how to store serialized data in a column in Rails and also the benefits and pitfalls of doing so. We will explore how to serialize data while generating API responses in the next article.

I am sure you would have had bitter and/or sweet experiences 😄 with handling serialized data in the database. Thanks for reading.

References

  1. When should you store serialized objects in the database?

  2. Storing serialized objects in a database good or bad

  3. Rails Activerecord serialize

  4. How I used Activerecord serialize with a custom datatype?

  5. How to write custom serializer for activerecord number serialize/)