Building a notification system in Ruby on Rails

A story about writing a notification engine responsible for notifying folks when their websites blow up.

Jamhur Mustafayev

Backstory

When I shipped the initial version of Hexadecimal, the only way to get notified when something went wrong with customers’ websites was via email. Naturally, all of the text pertaining to notifications was contained inside mailer views.

I wanted to add more channels (e.g., SMS, Slack), but the challenge was sharing the notification text (which had previously lived inside mailer views) between all channels.

When I say notification text, I mean the contents of the message my customers receive whenever something worthy of their attention happens. For example, when a website goes down, instead of a generic “your website is down, go figure” message, they receive a reason why the website is down, what might have caused it, and how to go about troubleshooting it. I want them to spend as little time as possible on identifying the root cause of the downtime.

Excerpt from the notification one receives when name resolution for their domain fails:
Name resolution (DNS lookup) for tryhexadecimal.com has failed

It usually happens because a hostname
doesn't have any DNS records pointing to it.

To look up the DNS records for tryhexadecimal.com,
run the following command from your terminal:

    nslookup tryhexadecimal.com

If the resulting message is
"server can't find tryhexadecimal.com: NXDOMAIN",
it means your host doesn't have
any DNS records associated with it.
Note that it takes some time for DNS records to propagate.

If you think that you have mistyped the URL,
the best course of action would be to delete this check
and create a new one
(you can't modify the URL after creation, by design).

Architecture

Each notification is a single, standalone Active Record model. These models share some similarities, but also differ in some ways. I looked into polymorphism and Single Table Inheritance (STI), but after a bit of consideration, the tradeoffs seemed to outweigh the alleged benefits. Having a separate model for each notification channel conformed to the “Do The Simplest Thing That Could Possibly Work” principle, which is my preferred way of doing things, at least during the exploratory phase.

Each notification is associated with a single Event object. Notifications can be either global (i.e., they are automatically added to all websites) or local (i.e., you add them manually to each website).

Each notification has a dispatch method that receives data, processes it, and passes on the request to the specific background job that sends the notification. The DispatchNotificationsJob background job is responsible for orchestrating the entire show: from gathering all notification channels (e.g., email, SMS, Slack) for a given website, to dispatching notifications to them.

class DispatchNotificationsJob < ApplicationJob
  queue_as :notifications

  def perform(website, event, **kwargs)
    notification_channels = []

    notification_channels += website.email_alerts.to_a
    notification_channels += website.slack_alerts.to_a
    notification_channels += website.sms_alerts.to_a

    notification_channels.each do |channel|
      # dispatch is declared with **kwargs, so the hash must be splatted
      channel.dispatch(website, event, **kwargs)
    end
  end
end
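
The job only works because every channel responds to dispatch with the same signature. Here is a minimal plain-Ruby sketch of that contract, assuming nothing about the real models: FakeEmailAlert, FakeSlackAlert, and dispatch_all are hypothetical stand-ins made up for illustration.

```ruby
# Hypothetical stand-ins for the Active Record models. Each one only needs
# to respond to dispatch(website, event, **kwargs).
class FakeEmailAlert
  def dispatch(website, event, **kwargs)
    { channel: :email, website: website, event: event, details: kwargs }
  end
end

class FakeSlackAlert
  def dispatch(website, event, **kwargs)
    { channel: :slack, website: website, event: event, details: kwargs }
  end
end

# Mirrors the loop in the job: no type checks, just the shared method name.
def dispatch_all(channels, website, event, **kwargs)
  channels.map { |channel| channel.dispatch(website, event, **kwargs) }
end

results = dispatch_all(
  [FakeEmailAlert.new, FakeSlackAlert.new],
  "tryhexadecimal.com",
  :downtime,
  status: 503
)
```

Adding a channel then comes down to writing one more class with a dispatch method and including it in the list.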

As I mentioned earlier, the roadblock for me was sharing notification text between all those channels without losing my sanity.

First attempt - i18n

My first instinct was to use Rails’ i18n machinery. Briefly, you store your strings in a locale-specific YAML file (for example, en.yml contains strings in English), and retrieve them when necessary.

This approach quickly fell flat on its face because some mailer views contained logic that I didn’t want to remove (for example, iterating over missing keywords and listing them in an unordered list). Removing it would have resulted in more generic messages, which in turn would have forced folks to log in to their dashboards every time a Bad Thing happened. Doubleplusungood.
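
To illustrate the kind of view logic in question, here is a small sketch using ERB from Ruby’s standard library. The template wording and the missing_keywords variable are made up for this example; the real mailer views are not shown in this article.

```ruby
require "erb"

# Hypothetical template. The loop is the sort of logic that a static string
# in a locale file cannot express on its own.
KEYWORDS_TEMPLATE = <<~'TEXT'
  The following keywords are missing from <%= website_url %>:
  <% missing_keywords.each do |keyword| -%>
  - <%= keyword %>
  <% end -%>
TEXT

def render_missing_keywords(website_url, missing_keywords)
  # trim_mode "-" lets the -%> tags swallow the newline after control lines
  ERB.new(KEYWORDS_TEMPLATE, trim_mode: "-").result_with_hash(
    website_url: website_url,
    missing_keywords: missing_keywords
  )
end

text = render_missing_keywords("tryhexadecimal.com", ["pricing", "sign up"])
```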

Second attempt - a service object

The next idea was to create an intermediary service object that would accept the data and return the appropriate message.

Whenever I needed to obtain the message, I would create the object, pass all the data to it, and in return receive the appropriate strings via public methods such as notification_title and notification_body (think: junk drawer).

Surprisingly, it worked. I wish it hadn’t. It was the epitome of duct tape and bubble gum: an unashamedly long case statement with squiggly heredocs all over the place. It was one of those “don’t go there” places of the codebase where monsters roamed.
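
For a sense of the shape, here is a heavily shortened sketch of such an object. It is a reconstruction for illustration only: the class name NotificationText, the event names, and most of the wording are assumptions, and the real thing had far more branches.

```ruby
# Hypothetical service object: one case statement per public method,
# with squiggly heredocs holding the message bodies.
class NotificationText
  def initialize(event_name, website_url:, **details)
    @event_name = event_name
    @website_url = website_url
    @details = details
  end

  def notification_title
    case @event_name
    when :dns_failure then "Name resolution (DNS lookup) for #{@website_url} has failed"
    when :downtime then "#{@website_url} is down"
    else raise ArgumentError, "unknown event: #{@event_name}"
    end
  end

  def notification_body
    case @event_name
    when :dns_failure
      <<~TEXT
        To look up the DNS records for #{@website_url},
        run the following command from your terminal:

            nslookup #{@website_url}
      TEXT
    when :downtime
      <<~TEXT
        #{@website_url} responded with status #{@details.fetch(:status)}.
      TEXT
    else
      raise ArgumentError, "unknown event: #{@event_name}"
    end
  end
end

text = NotificationText.new(:dns_failure, website_url: "tryhexadecimal.com")
```

Every new event means another branch in every method, which is how the case statement grows out of control.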

As much as I abhorred that piece of code, it was working.

I decided to leave it alone for the time being, and come back later when I have more time on my hands.

Third attempt - placeholders

I decided to tackle the technical debt once and for all.

“What if I could shove those strings into a database?” I thought. To make it happen, I would replace the dynamic parts with placeholders and insert the necessary data whenever I needed to. Those strings would be a part of the event object that the notification is associated with. Whenever I had to dispatch a message, I would pull that string from the corresponding event object and pass in the necessary data.

# object access is replaced by a variable
# website.url becomes website_url
website_url = website.url
# event.body contains a named placeholder such as "%{website_url}"
event.body % { website_url: website_url }

Even though this proof of concept sort of worked, it never made it to production because of how messy the underlying code became. Since I couldn’t access object methods within those static strings, I had to pass each of those variables separately. Simple object access turned into multiple variable declarations. Yuck.
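
A plain-Ruby sketch of where the mess comes from, using Kernel#format with named placeholders. The stored string, the Website struct, and its checked_at and region attributes are all invented for this example.

```ruby
# Hypothetical stored string; in the proof of concept it would live on the
# event record. %{name} placeholders are filled in by Kernel#format.
BODY = "Name resolution (DNS lookup) for %{website_url} has failed. " \
       "Last checked at %{checked_at} from %{region}."

# A stand-in for the real website object.
Website = Struct.new(:url, :checked_at, :region)
website = Website.new("tryhexadecimal.com", "10:42 UTC", "eu-west")

# The string cannot call website.url itself, so every attribute has to be
# flattened into its own key at the call site.
message = format(
  BODY,
  website_url: website.url,
  checked_at: website.checked_at,
  region: website.region
)

# Forgetting a single key raises KeyError at dispatch time.
missing_key_error =
  begin
    format(BODY, website_url: website.url)
    nil
  rescue KeyError => e
    e
  end
```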

As much as I wanted to get rid of that despicable service object, I certainly didn’t want to trade one mess for another.

Final attempt - going back home

For reasons unknown, it didn’t occur to me that I could keep the notification text inside mailer views and access it from there, until I stumbled upon two Stack Overflow questions.

So I bit the bullet and started making changes. I initialized the mailer class, passed in the necessary params, and dispatched the request to the appropriate mailer action (notice the public_send method: each event is tied to a single mailer action). Then I would simply query the mailer object and get the necessary strings, such as the title and the body of the notification.

class SlackAlert < ApplicationRecord
  def dispatch(website, event, **kwargs)
    params = { website: website, event: event }.merge(kwargs)
    mailer = NotificationMailer.with(params).public_send(event.name)

    subject = mailer.subject
    body = mailer.body.encoded

    # Prepare the payload and schedule a background job
  end
end

Guess what? It worked like a charm. This approach might not conform to the “Single Responsibility Principle”, but in my book, practicality beats purity.

Delaying notifications

Not all notifications are created equal.

Some events, like downtimes, require immediate attention. Others, like certificate expiration notices, can be delayed for a few hours.

Considering that some of my customers are on-call, I wouldn’t want to wake them up in the middle of the night over less important notifications. There’s no harm in delaying these kinds of notifications until the morning.

To implement this functionality, I calculate the current hour in the customer’s time zone, and if it falls within waking hours (from 08:00 up to 20:00), I send the notification right away. Otherwise, I delay the notification by up to 12 hours (until 8 in the morning).

class ApplicationJob < ActiveJob::Base
  # Automatically retry jobs that encountered a deadlock
  retry_on ActiveRecord::Deadlocked

  # Most jobs are safe to ignore if the underlying records are no longer available
  discard_on ActiveJob::DeserializationError

  def self.delay_hours(time_zone)
    current_hour = Time.zone.now.in_time_zone(time_zone).hour

    if current_hour < 8
      8 - current_hour
    elsif current_hour >= 20
      24 - current_hour + 8
    else
      0
    end
  end
end

To determine if the notification should be delayed, I pass the time zone to the delay_hours class method. If the event happened during waking hours, delay.hours is 0, and the notification goes through right away. Otherwise, it is delivered in the morning.

delay = ApplicationJob.delay_hours(website.account.time_zone)
DispatchNotificationsJob.set(wait: delay.hours).perform_later(website, event, certificate: certificate)
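
To make the arithmetic concrete, here is the same calculation as a plain-Ruby method that takes the local hour directly, followed by a few sample hours. The name delay_for_hour is made up for this sketch, and it leaves out Rails and time zone handling entirely.

```ruby
WAKING_HOURS = (8...20) # 08:00 up to, but not including, 20:00

# Hours to wait before sending a non-urgent notification.
def delay_for_hour(current_hour)
  return 0 if WAKING_HOURS.cover?(current_hour)

  # Before 08:00, wait until 8 today; from 20:00 onwards, wait until 8 tomorrow.
  current_hour < 8 ? 8 - current_hour : 24 - current_hour + 8
end

delays = [3, 8, 12, 19, 20, 23].to_h { |hour| [hour, delay_for_hour(hour)] }
# 03:00 waits 5 hours, 20:00 waits 12 hours, 23:00 waits 9 hours;
# 08:00, 12:00, and 19:00 go out immediately.
```

The longest possible wait is the 12 hours at exactly 20:00, which matches the “up to 12 hours” bound above.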

Now all customers get a good night’s sleep :)
