Everything You Need to Know About Serialization in Rails: Part I

It was the day we were moving. I was watching how the "Packers and Movers" professionals packed our furniture. Our king-size bed, for example, had to fit into a space of about 6-7 inches inside a van. While I kept wondering how they'd manage this, they dismantled the bed, and in went the camel through the needle's eye, very neatly.

That's when I realized the computing world is not very different from the real world. They dismantled the bed for transportation and then reassembled it at the destination. Similarly, in the computing world, we deconstruct objects or data structures into a format that is easy to store or transfer, and reconstruct them whenever required. This is nothing but serialization.

In short, serialization is turning a complex "3-D" object into a single long "2-D" string. This can then be stored anywhere or made to travel across the web easily.

This series covers three topics. In this article, the first part, we will learn how serialization works in Ruby.

Serialization in Ruby

You may come across cases where you need to save Ruby objects in a file or send data to another program across the web.
To pull this off, Ruby supports two kinds of serialization formats. They differ in the rules used for dismantling and reassembling the object:

1. Binary
2. Human-Readable
   a. YAML
   b. JSON

1. Binary Format

Ruby supports binary serialization through the built-in Marshal module, which is available without any require.
Marshal transforms a Ruby object, along with the objects it references, into a stream of bytes that we humans can't decipher but Ruby can. The Marshal.dump method converts the object to a byte stream, and Marshal.load (or its alias Marshal.restore) reconstructs the object.
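Before moving to the furniture classes, here is a minimal sketch of the dump/load pair on a plain Hash. The `order` data is a hypothetical sample, not part of the original example.

```ruby
# Round-trip a plain Hash through Marshal (hypothetical sample data).
order = { item: "bed", dimensions: [76, 80], assembled: false }

bytes = Marshal.dump(order)   # a binary String
copy  = Marshal.load(bytes)   # a new Hash with the same contents

puts bytes.encoding == Encoding::BINARY   # true: the result is raw bytes
puts bytes.bytes.first(2).inspect         # [4, 8]: the Marshal format version
puts copy == order                        # true: same contents
puts copy.equal?(order)                   # false: a separate object
```

Because the loaded Hash is a distinct object with equal contents, a dump followed by a load is also a common way to make a deep copy.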

Below is a class representation of the above real-life example in Ruby.

class Furniture
  def initialize(category, primary_material)
    @category = category
    @primary_material = primary_material
  end

  def to_s
    "Category: #{@category} \nMaterial: #{@primary_material}\n"
  end
end

class WoodenBed < Furniture
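  # Note: color is accepted but never stored, so it does not appear
  # in any of the serialized output.
  def initialize(size, color)
    super('Bed', 'Wooden')
    @size = size
  end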

  def to_s
    "#{super} Size: #{@size} \n"
  end
end

class KingSizeWoodenBed < WoodenBed
  def initialize(height, width)
    super('King Size', 'Black')
    @height = height
    @width = width
  end

  def to_s
    "#{super}  Height: #{@height} \n  Width: #{@width} \n"
  end
end

Class Representation of Furniture

We'll create an object of the above class and serialize it using Marshal.

require "benchmark"

time = Benchmark.measure do
  selected_bed = KingSizeWoodenBed.new(76, 80)

  print "Original object \n\n"
  puts selected_bed

  serialized_object = Marshal::dump(selected_bed)

  print "\nSerialized object\n\n"
  puts serialized_object

  selected_bed = Marshal::load(serialized_object)

  print "\n\nOriginal object back\n\n"
  puts selected_bed
end

puts "\n\n Time taken: #{time}"

Serialize and Deserialize using Marshal

Original object 

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 

Serialized object

o:KingSizeWoodenBed
:@categoryIBed:ET:@primary_materialI"
                                        Wooden;T:
@sizeI"King Size;T:
                    @heightiQ:
                              @widthiU


Original object back

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 


 Time taken:   0.000128   0.000066   0.000194 (  0.000189)

Output of the above code.

As you can see, even though the encoded string looks like gibberish, the reconstructed object is the same as the original. This type of serialization is a good fit when we don't need to read the encoded data ourselves.
Note that the whole run took about 0.19 ms.
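Since saving objects to a file was one of the motivations above, here is a small sketch of persisting a marshalled object to disk and reading it back. The `Wardrobe` struct and the file name are hypothetical stand-ins chosen for this illustration.

```ruby
require "tmpdir"

# Hypothetical stand-in for the article's furniture classes.
Wardrobe = Struct.new(:material, :height, :width)

wardrobe = Wardrobe.new("Wooden", 76, 80)
path     = File.join(Dir.tmpdir, "wardrobe.marshal") # hypothetical file name

# Marshal output is binary, so use the binary read/write helpers.
File.binwrite(path, Marshal.dump(wardrobe))
restored = Marshal.load(File.binread(path))

puts restored == wardrobe   # true
File.delete(path)
```

The class must be defined in the process that loads the file, and Marshal.load should only be given data from a source you trust.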

2. Human-Readable Format

a. YAML (YAML Ain't Markup Language)

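YAML is a human-readable serialization standard that uses indentation and dashes to represent object data.
YAML can serialize objects of almost any class in Ruby. The YAML module in Ruby is an alias of the Psych module, which has been the default YAML parser since Ruby 1.9.3. The YAML.dump and YAML.load methods are used for encoding and decoding objects. Note that since Psych 4 (bundled with Ruby 3.1), YAML.load only deserializes basic types by default; custom classes must be allowed through the permitted_classes option, or loaded with YAML.unsafe_load when the data is trusted.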

Let's serialize the same object using YAML and benchmark it.

require "benchmark"
require "yaml"

time = Benchmark.measure do
  selected_bed = KingSizeWoodenBed.new(76, 80)

  print "Original object \n\n"
  puts selected_bed

  serialized_object = YAML::dump(selected_bed)

  print "\nSerialized object\n\n"
  puts serialized_object

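  # Ruby 3.1+ (Psych 4): YAML.load builds only basic types by default, so the
  # custom class must be permitted explicitly. On older Rubies, plain
  # YAML::load(serialized_object) works and this keyword is not accepted.
  selected_bed = YAML::load(serialized_object, permitted_classes: [KingSizeWoodenBed])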

  print "\n\nOriginal object back\n\n"
  puts selected_bed
end

puts "\n\n Time taken: #{time}"

Serialize and Deserialize using YAML

Original object 

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 

Serialized object

--- !ruby/object:KingSizeWoodenBed
category: Bed
primary_material: Wooden
size: King Size
height: 76
width: 80


Original object back

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 


 Time taken:   0.000939   0.001877   0.002816 (  0.056699)

Output for the above code.

Notice how easy the serialized object is to read. But the run took about 57 ms of wall-clock time.
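When the YAML comes from outside your program, a safer pattern is to load it with an explicit allow-list of classes. The sketch below assumes a hypothetical `Sofa` class standing in for the bed classes and uses YAML.safe_load with the permitted_classes option.

```ruby
require "yaml"

# Hypothetical stand-in for the article's furniture classes.
class Sofa
  attr_reader :material, :height, :width

  def initialize(material, height, width)
    @material = material
    @height = height
    @width = width
  end
end

yaml = YAML.dump(Sofa.new("Wooden", 76, 80))

# Without an allow-list, safe_load refuses to build a Sofa.
begin
  YAML.safe_load(yaml)
rescue Psych::DisallowedClass => e
  puts "Rejected: #{e.message}"
end

# Listing the class explicitly lets it be reconstructed.
sofa = YAML.safe_load(yaml, permitted_classes: [Sofa])
puts sofa.material
puts sofa.height
```

Only the listed classes can be instantiated, so unexpected tags in the input raise an error instead of creating arbitrary objects.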

b. JSON (JavaScript Object Notation)

JSON is also a human-readable data interchange format, and it needs no introduction: it has become a popular choice for data exchange on the web.
Ruby's JSON library provides load and dump methods similar to the ones above, along with to_json for generating JSON and parse for reading it.

Let's serialize the same object again using JSON and benchmark it.

require "benchmark"
require "json"

time = Benchmark.measure do
  selected_bed = KingSizeWoodenBed.new(76, 80)

  print "Original object \n\n"
  puts selected_bed

  serialized_object = JSON::dump(selected_bed)

  print "\nSerialized object\n\n"
  puts serialized_object

  selected_bed = JSON::load(serialized_object)

  print "\n\nOriginal object back\n\n"
  puts selected_bed
end

puts "\n\n Time taken: #{time}"

Serialize and Deserialize using JSON

Original object 

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 

Serialized object

"Category: Bed \nMaterial: Wooden\n Size: King Size \n  Height: 76 \n  Width: 80 \n"


Original object back

Category: Bed 
Material: Wooden
 Size: King Size 
  Height: 76 
  Width: 80 


 Time taken:   0.000269   0.000105   0.000374 (  0.000370)

Output of the above code

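Notice that this is faster than YAML: it took only about 0.37 ms. The comparison is not quite like-for-like, though. Because KingSizeWoodenBed does not define to_json, JSON.dump falls back to serializing the result of to_s, and JSON.load returns that String rather than a KingSizeWoodenBed. To round-trip the object itself, the class needs its own to_json and a way to rebuild itself from the parsed data.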
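One possible way to get a real object back from JSON is sketched below. The `Chair` class, its `to_h` method, and the `from_json` class method are hypothetical names chosen for this illustration; they are not part of the JSON library or of the article's classes.

```ruby
require "json"

# Hypothetical stand-in for the article's furniture classes.
class Chair
  attr_reader :material, :height

  def initialize(material, height)
    @material = material
    @height = height
  end

  # Plain-data view of the object.
  def to_h
    { material: @material, height: @height }
  end

  # Called by the JSON generator; forwards its arguments to Hash#to_json.
  def to_json(*args)
    to_h.to_json(*args)
  end

  # Rebuilds a Chair from a JSON string (hypothetical helper).
  def self.from_json(string)
    data = JSON.parse(string, symbolize_names: true)
    new(data[:material], data[:height])
  end
end

json  = JSON.generate(Chair.new("Wooden", 40))
chair = Chair.from_json(json)

puts json          # {"material":"Wooden","height":40}
puts chair.class   # Chair
```

Unlike Marshal and YAML, JSON carries no class information, so the receiving side has to know which class to rebuild.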

To compare the three formats on this example: Marshal took about 0.19 ms, JSON about 0.37 ms, and YAML about 57 ms, while only YAML and JSON produce human-readable output.

Each serialization format has its own perks and uses. The choice of format mainly depends on what works best for your use case. As this blog summarizes, choose Marshal for speed, JSON for speed plus human-readability, and YAML for human-readability and small data sets.
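Single-run timings like the ones above are noisy, so it can help to repeat each round trip many times. The following is a rough sketch under stated assumptions: the `record` hash and the `runs` count are arbitrary sample values, and only basic types are used so that all three formats can round-trip the same data.

```ruby
require "benchmark"
require "json"
require "yaml"

# Hypothetical sample data made of basic types only.
record = { "category" => "Bed", "material" => "Wooden", "height" => 76, "width" => 80 }
runs   = 1_000 # arbitrary repetition count

timings = {
  "Marshal" => Benchmark.realtime { runs.times { Marshal.load(Marshal.dump(record)) } },
  "JSON"    => Benchmark.realtime { runs.times { JSON.parse(JSON.generate(record)) } },
  "YAML"    => Benchmark.realtime { runs.times { YAML.safe_load(YAML.dump(record)) } }
}

timings.each do |name, seconds|
  puts format("%-8s %.2f ms for %d round trips", name, seconds * 1000, runs)
end
```

The absolute numbers depend on your machine and Ruby version, so treat them as a relative comparison only.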

I hope this article shed some light on the Ruby serialization formats. In the coming parts, we will see how Rails leverages these formats for storage and data transfer.

Thank you for reading.
