Spoiler Alert: This article is not about Microservices architecture or how microservices works. In this article, I will share what we have learned in scaling one of the large SaaS platform, our ongoing improvements and how Microservices architecture is playing a key role in scaling this SaaS platform.
This SaaS Platform has experienced tremendous growth in the past 18 months. It has now become one of the leading CRMs in North America.
Everyone’s definition of “scale” and “growth” can differ based on specific numbers or metric a company is targeting i.e. revenue, user base, or others. We are focussed in the speed to “customer success and satisfaction” growth. If customers are happy and able to grow their business using this SaaS platform then that kind of growth makes us happy. Thus, our definition of scale & growth has the following ingredients:
- Performance i.e. Platform Speed (sub second response time)
- Stability i.e. Uptime (99.999% availability)
- New features that helps our Customers grow
- Constant Improvements i.e. enhancements and bug fixes
- User Support with quick turnaround time
- Constant upgrades in technology stack & infrastructure
- Shared learning of platform through Facebook Community
Just to give transparency, this SaaS platform has witnessed 97% growth in subscriptions within the past few months.
Technology plays a very important role in this kind of scale and growth and thus we need to always stay ahead (and often catch up enhancing legacy code) for this kind of demand and growth.
Our SaaS web application involves more than a dozen integrations with external services and has an inbuilt robust Workflow platform along with 45+ custom features.
Here is our Tech Stack
- Ruby on Rails (Server-side web framework) — Rails 4.2 and Ruby 2.3.1
- Elixir and Phoenix (MicroService for 4 inter-related projects)
- PostgreSQL (Relational database) and Follower database setup
- React.js and Redux (Javascript library for user interfaces)
- SOLR (Search engine)
- Redis (Caching)
- Sidekiq (Background jobs processing)
- Amazon Web Services & Heroku (Servers)
- Semaphore CI (Continuous Integration)
- NewRelic, Scout, Librato, Pingdom (Performance and Platform monitoring)
Performance i.e. platform speed of the SaaS application is paramount and we noticed the performance of our web application was constantly degrading as our user base grew. We were facing random crashes i.e. downtime of a couple of minutes every few months. Thus, the Performance and Uptime metric plays a very significant role in scaling SaaS web applications.
We addressed many of our scaling challenges through constant improvements in our code refactoring, database optimisations, technologies, and infrastructure upgrades along with separating a few of our core resource intensive processes into Microservices.
Microservices are an architecture pattern that helps break down large complex systems into multiple smaller, more manageable systems. These services are built around business capabilities and independently deployable by automated deployments. Microservices can be written in different programming languages and can use different data storage technologies based on business needs.
Here are the names of a few companies successfully growing and scaling using Microservice architectural pattern.
- Netflix
- Amazon
- Uber
- Ebay
- SoundCloud
- Yelp
- Disney
- The Guardian
- Travis CI
We had a large monolith web application earlier i.e. all components and projects were running under one large server in one huge code base. We then adopted Microservices architectural pattern because it was a constant challenge managing one large monolith application. I won’t debate here whether Microservices are best or better than the Monolith application as both have their own pros
and cons
based on the nature of business requirements and problems/challenges we are solving.
Microservices has its own challenges for managing various moving parts of a large enterprise app but in our case it’s a “good” or “nice to have” challenge. Microservice helped us scale for our growth and to focus independently on core business needs.
One reasonable argument would be that we should not start with a microservices architecture right from the beginning of an application. Instead, it can begin with a monolith and later split it into microservices once the monolith becomes a problem.
Here is a picture depicting difference between monolith and microservices
(Credits to Martin Fowler and James Lewis for this picture)
Currently, in our large enterprise app, we have separated four medium to large functionalities and components into microservice to run as a separate process that coordinates asynchronously with the central master application and database.
We still have a lot of scope to separate a few more of our tightly coupled components into separate Microservices. We still need to improve one of our Microservice as it is separated but still have areas that are coupled with master application.
Here are our key takeaways while developing Microservices and improving overall system:
1. Define metrics to monitor. Following are our key metrics to monitor entire platform
- App Server response time
- End user response on browser
- Apdex Score
- Uptime
- Request Latency
- Request Time (95th and 99th Percentile)
- Postgres Load Average
- Postgres Memory Load
- Top 20 SQL queries taking over 1 second and number of invocations
- Dyno Load Average and Dyno memory
- Number of jobs being processed in Sidekiq daily
- Sidekiq — Time taken to execute important jobs in seconds and minutes
- Sidekiq — queue growth (number of jobs getting queued for longer)
- Average Solr response times
- Redis cache size growth
- CodeClimate Issues and GPA Score
- Code Coverage (Unit Tests)
- Sidekiq failures
- Vulnerable Packages
- Open Sentry Errors and Number of Crashes
2. Setup alarms and notifications for all the key metrics mentioned in point-1 above.
These alarms and notifications will notify (by email, sms and chat integrations) everybody in engineering team including key stakeholders if anything crossed the threshold limit we set or anything dropped below our configured benchmarks that needs urgent attention. We have team members in multiple countries (in various time zones) to respond to urgent issues that arise.
3. Extensively monitor platform metrics mentioned in point-1 above to know the growth in terms of usage, performance, consumption, resources etc.
This will help to precisely understand bottlenecks and challenges in the system. Then respond fast by devising a plan to refactor, improve, scale, upgrade or divide the components into a microservice that needs to run as a separate component in its own workspace.
4. Do not get complacent or satisfied easily with improvements
until these improvements are monitored for some extensive period of time (at-least few weeks to a month), do not get complacent or satisfied. Monitor trends and celebrate when the trend meets expected satisfaction level. Example in our case: 99.999% uptime, less than 230ms app server response time, zero crashes in a period of 60 days etc.
5. Explore, learn, and implement latest technologies
Explore, learn, and implement latest technologies that gives edge for productivity, reliability, stability, and performance. Example in our case: we developed a couple of our microservices using Elixir
and Phoenix
that gives us extremely solid stability, reliability, performance, and scalability.
6. Listen to customers actively and appreciate their feedback
Acknowledge customer feedback. Sometimes you won’t like the feedback but it’s the most important metric available easily to work upon i.e. the metric of “Whether your Customer is happy and able to execute work seamlessly on daily basis 24x7”. No matter how many metrics, alarms, and monitoring we have in place, it’s our Customers who give most valuable and accurate insights with our CRM platform which can help to find root cause of various challenges and bottlenecks in the system. This customer centric aspect has played the most important role in our technology improvements and initiatives. We should have our ears grounded all the time and listen to customers for bottlenecks and challenges.
7. Attitude of Gratitude
Be thankful for all the challenges in growth and scale. This is the biggest source of learning and growth as a company and as an individual.
We would love to hear your experience with Microservices and challenges you faced scaling web applications.