Speed

  • Method Missing

Method missing is one of the concepts of Ruby metaprogramming. Although metaprogramming is very powerful, it has some shortcomings too, especially speed. A normal method is roughly 1.5x faster than a missing method. One of our users complained about our calendar page being very slow. On inspecting, we found that the request for the month view of the calendar was taking more than 30 seconds and eventually timed out. The user had lots of activities on that calendar (2000+). On further inspection, we found that our DB query was quite fast, but each activity was calling several missing methods via one of its associations.

class Task
  def the_method
  end
end

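class ActivityState
  # Forwards any method that ActivityState does not define to the associated task.
  def method_missing(method, *args, &block)
    if task
      if task.respond_to?(method)
        task.send(method, *args, &block)
      else
        # Neither object handles the method: fall back to NoMethodError.
        super
      end
    end
    # Note: when there is no task, this falls through and returns nil.
  end
end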

We replaced all those missing-method calls with direct calls, using activity.task.the_method instead of activity.the_method. The month view now loads in less than 3 seconds.
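To get a feel for the difference on your own machine, a rough benchmark along these lines can help. This is only a sketch: the Task and Activity classes here are plain Ruby stand-ins for our models, the iteration count is arbitrary, and the ratio you see will depend on your Ruby version and hardware.

```ruby
require "benchmark"

# Hypothetical stand-ins for the models in this post (no Rails needed).
class Task
  def the_method
    :done
  end
end

class Activity
  attr_reader :task

  def initialize(task)
    @task = task
  end

  # Slow path: the normal method lookup fails first, then we pay for
  # respond_to? and send on every call.
  def method_missing(name, *args, &block)
    if task && task.respond_to?(name)
      task.send(name, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    (task && task.respond_to?(name)) || super
  end
end

activity = Activity.new(Task.new)
iterations = 200_000 # arbitrary; large enough to make the gap visible

via_missing = Benchmark.realtime { iterations.times { activity.the_method } }
direct      = Benchmark.realtime { iterations.times { activity.task.the_method } }

puts format("method_missing: %.3fs, direct: %.3fs", via_missing, direct)
```

Both calls return the same value; only the dispatch path differs. If you prefer to keep the shorter activity.the_method call, explicitly defined delegator methods avoid the method_missing path as well.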

  • Caching

We cache values for resource-intensive stats and counts, and we update the cache via background jobs as the data is modified in the system. This strategy allowed us to save network bandwidth and lower the request time. We all have a tendency to neglect query efficiency when the query runs in a background job, but an inefficient query still affects overall DB performance and thereby slows down any server request involving DB queries. So it's essential to make the DB query as efficient as possible before using it in a background job. Caching can also attract lots of bugs if not handled properly, especially when there are multiple data entry or update points and not all of them are routed via callbacks.
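The idea can be sketched in plain Ruby as follows. Everything here is hypothetical: StatsCache is an in-memory stand-in for a real cache store, RefreshContactCountJob stands in for a background job, and the contacts array stands in for a table. It is not our production code, just the shape of the approach, including the stale-cache pitfall mentioned above.

```ruby
# Hypothetical in-memory stand-in for a cache store.
class StatsCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @store[key]
  end
end

# Hypothetical background job: recomputes an expensive stat and caches it.
class RefreshContactCountJob
  def initialize(cache, contacts)
    @cache = cache
    @contacts = contacts
  end

  def perform(user_id)
    # Stands in for the expensive query we do not want in the request path.
    count = @contacts.count { |contact| contact[:owner_id] == user_id }
    @cache.write("contact_count:#{user_id}", count)
  end
end

contacts = [{ owner_id: 1 }, { owner_id: 1 }, { owner_id: 2 }]
cache = StatsCache.new
job = RefreshContactCountJob.new(cache, contacts)

job.perform(1)                      # enqueued whenever the data changes
puts cache.read("contact_count:1")  # the request only reads the cached value

contacts << { owner_id: 1 }         # a write path that forgets to enqueue the job...
puts cache.read("contact_count:1")  # ...leaves a stale value behind
job.perform(1)                      # rerunning the job brings the cache up to date
puts cache.read("contact_count:1")
```

The second read shows why every entry or update point has to trigger the refresh: any path that skips it leaves the cache out of date until the job runs again.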

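  • Avoid Unneeded Order Clauses and Use Limit, do not load all.

Although sorting with a database order clause is generally faster than sorting in application code, you should avoid it when it is not really needed. For example, if you are querying a table to calculate a trend or generate some stats, the order of the data does not matter. Likewise, use limit to fetch only the rows you need instead of loading the whole table.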

  • Database Indexes

Database indexes allow us to fetch data quickly. Without an index, the database has to scan the entire table and then sort the rows to process the query, which is inefficient for a large table. We should add database indexes only where needed, as each index increases the time taken to insert or update a record and also takes up disk space.

  • Select necessary columns and Avoid *

With select * you are fetching all the columns, including ones you may not need. This matters especially when your table has lots of columns; for smaller tables the difference is unnoticeable. Pluck is a similar technique in Rails, but it returns an array, whereas the object returned by select is still an ActiveRecord relation, which you can chain further as per your needs. By specifying exactly what you need, you save some network bandwidth in addition to some query time. With Rails, we usually do not think much about select and are okay with select *, but as your app grows and evolves, manually specifying columns will really help in terms of optimization.

  • Avoid Count Queries and Use exists?

      irb(main):025:0> Contact.visible_by_user(user).any?
         (36.5ms)  SELECT DISTINCT COUNT(DISTINCT "contacts"."id") FROM "contacts" .....
    
      irb(main):026:0> Contact.visible_by_user(user).exists?
        Contact Exists (1.3ms)  SELECT  DISTINCT 1 AS one FROM "contacts" .....
    

Count queries are really slow, so we should avoid them wherever possible. For existence checks like the one in the example above, we need not count the records at all; a count is only necessary when we need to compare it against something.
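The reason for the gap can be illustrated with a plain Ruby analogy (this is not ActiveRecord; the rows array and both helper methods are made up for illustration). A count has to look at every matching row, while an existence check can stop at the first match.

```ruby
# Returns [result, rows_examined] for an existence check done by counting.
def any_by_count(rows)
  examined = 0
  total = rows.count { |row| examined += 1; row[:visible] }
  [total > 0, examined]
end

# Returns [result, rows_examined] for a short-circuiting existence check.
def any_by_exists(rows)
  examined = 0
  found = rows.any? { |row| examined += 1; row[:visible] }
  [found, examined]
end

# Hypothetical data: 10,000 rows, all of them visible.
rows = Array.new(10_000) { |i| { id: i, visible: true } }

p any_by_count(rows)   # examines every row before answering
p any_by_exists(rows)  # stops at the first visible row
```

Both helpers give the same answer, but the second does a tiny fraction of the work, which mirrors the timings in the irb output above.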

When we do need a count, we can cache it and update it when a record is added or removed. This is some extra work but helps in the longer run. It makes creates and destroys slower, but if the approach is well thought out, the extra time is negligible. We generally try to avoid the counter approach, where we increment or decrement a stored value when we add or remove something. Given the complexity of our app, its permission systems, and data coming from multiple sources and multiple users, we instead use optimized queries in background jobs to get the counts and cache them in Redis configs.

  • A few more things that helped us speed up our app were bulk SQL inserts to create bulk data at once, removal of n + 1 queries and, most important, using large data dumps to check our optimizations, which helped us gauge the amount of improvement we were getting from each change. All the above tips may sound basic and well known, yet we somewhere neglect or miss them, and they are very significant for an app that has lots of data and very high traffic.
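For readers who have not run into n + 1 queries before, here is a minimal plain Ruby sketch of the problem and the batched fix. FakeDb is a hypothetical in-memory "database" that only counts how many queries it receives; none of these names come from our app or from ActiveRecord.

```ruby
# Hypothetical in-memory "database" that counts queries.
class FakeDb
  attr_reader :query_count

  def initialize(activities, tasks)
    @activities = activities
    @tasks = tasks # { task_id => task name }
    @query_count = 0
  end

  def all_activities
    @query_count += 1
    @activities
  end

  def find_task(id)
    @query_count += 1
    @tasks[id]
  end

  def find_tasks(ids)
    @query_count += 1
    @tasks.slice(*ids)
  end
end

# N + 1: one query for the activities, then one more per activity.
def task_names_n_plus_one(db)
  db.all_activities.map { |activity| db.find_task(activity[:task_id]) }
end

# Batched: one query for the activities, one for all of their tasks.
def task_names_batched(db)
  activities = db.all_activities
  tasks = db.find_tasks(activities.map { |activity| activity[:task_id] }.uniq)
  activities.map { |activity| tasks[activity[:task_id]] }
end

activities = (1..100).map { |i| { id: i, task_id: i } }
tasks = (1..100).map { |i| [i, "task #{i}"] }.to_h

slow = FakeDb.new(activities, tasks)
task_names_n_plus_one(slow)
puts "n + 1 queries:   #{slow.query_count}"

fast = FakeDb.new(activities, tasks)
task_names_batched(fast)
puts "batched queries: #{fast.query_count}"
```

In a Rails app, eager loading with includes plays the role of the batched version, and the query count no longer grows with the number of records.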

Our approach is to monitor these bottlenecks daily using tools like New Relic, Scout and Librato, make optimizations, and repeat the procedure until we get close to a timing that is usable for a user. We are yet to achieve seamless and consistent speed across all our requests, but we are heading in the right direction. We also made many more large-scale changes, like moving some of our UI to React, changing our Solr strategies and using Elixir as a microservice for some of the resource-intensive tasks, but that's for another post.

Hopefully this post helps. We would also love to hear what you learned while scaling or optimizing your application.