
Better Way to Log Your Background Jobs

Let’s face it: building any non-trivial application in Rails is impossible without background jobs. There are several use cases for them: periodic tasks, tasks too heavy to return a result synchronously, or tasks that can run outside the main flow and whose failure should not affect the flow currently being processed.

Fortunately for us developers, background processing is so widely used in Rails that we have some awesome tools and gems to help us out! The most popular one is Sidekiq, and that’s the one I'll use in the code examples.

Great, so pretty much everyone is now using background processing, but what’s with this logging thing? Doesn't the Sidekiq process generate logs and save them to a text file by default? Yes, but these logs are ugly, hard to read, hard to aggregate, and almost impossible to make automatic decisions upon (unless you are using something like the ELK stack). This blog post will show you how you can log your async jobs with just a bit of work and how you can use the stashed logs for various purposes.

Step 1 - Create a table for logs

Yes, we're going to save logs into the database. The structure might depend on your needs. What I wanted to have for sure were: job type, job's current state, jid (Job ID), associated user_id (optional) and created_at / updated_at timestamps. Let's also add an index on the jid column.
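A minimal migration sketch might look something like this (the Rails version tag and exact column types are just examples, adjust them to your app):

```ruby
# db/migrate/20190101000000_create_async_job_logs.rb
class CreateAsyncJobLogs < ActiveRecord::Migration[6.0]
  def change
    create_table :async_job_logs do |t|
      t.integer :job_type, null: false # enum backed by an integer column
      t.integer :state,    null: false # enum backed by an integer column
      t.string  :jid,      null: false # Sidekiq Job ID
      t.integer :user_id               # optional association with a user

      t.timestamps
    end

    add_index :async_job_logs, :jid
  end
end
```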

Step 2 - Create model

Now, let's quickly add an AsyncJobLog model with enums. You might want to add more states, like failed or dead; for the sake of simplicity, I will just use started and finished. The job types again depend on what your background jobs are responsible for.
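A sketch of the model, using the two financial job types that appear later in this post as example values:

```ruby
# app/models/async_job_log.rb
class AsyncJobLog < ApplicationRecord
  # More states (e.g. failed, dead) can be added as needed.
  enum state: { started: 0, finished: 1 }

  # Example job types - list whatever your workers are responsible for.
  enum job_type: { financial_data_import: 0, reconciliation: 1 }
end
```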

Step 3 - Add logging to your workers

Let's start with the module that will be responsible for most of the dirty work - meaning creating and updating logs:
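A minimal sketch of such a module might look like this (the helper names create_job_log and mark_job_log_as_finished are illustrative, assuming the AsyncJobLog model from step 2):

```ruby
# app/workers/concerns/async_job_logging.rb
module AsyncJobLogging
  # Creates a log entry right before the worker starts its actual work.
  def create_job_log(jid:, job_type:, user_id: nil)
    AsyncJobLog.create!(jid: jid, job_type: job_type, user_id: user_id, state: :started)
  end

  # Finds the worker's own log by JID and marks it as finished.
  def mark_job_log_as_finished(jid)
    AsyncJobLog.find_by!(jid: jid).finished!
  end
end
```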

Now let's use this module in the actual worker:
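For example, a financial data import worker could use it roughly like this:

```ruby
# app/workers/financial_data_import_worker.rb
class FinancialDataImportWorker
  include Sidekiq::Worker
  include AsyncJobLogging

  def perform(user_id)
    # `jid` is provided by Sidekiq::Worker and identifies this particular job.
    create_job_log(jid: jid, job_type: :financial_data_import, user_id: user_id)

    # ... the actual financial data import goes here ...

    mark_job_log_as_finished(jid)
  end
end
```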

The flow is pretty straightforward - before the worker does its job, it first creates a record in the table providing some information about who it is. At this point we have a record saying that the worker responsible for financial data import (job_type), associated with a certain user (user_id), started processing at a given time (created_at) - that's already a lot of nicely aggregated information. After finishing whatever it is responsible for, the worker simply finds its own log (by JID) and marks it as finished. But what if the worker fails multiple times, up to the point where it reaches the maximum number of retries? Well, we can handle that in many ways: it all depends on what you need and what the convention is. Let's go with the simplest example and update our worker so that it marks the log as finished when the last retry has failed:
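A sketch using Sidekiq's sidekiq_retries_exhausted hook, which runs once Sidekiq gives up retrying the job:

```ruby
class FinancialDataImportWorker
  include Sidekiq::Worker
  include AsyncJobLogging

  # Runs after the final retry has failed.
  sidekiq_retries_exhausted do |msg, _exception|
    AsyncJobLog.find_by!(jid: msg['jid']).finished!
  end

  def perform(user_id)
    create_job_log(jid: jid, job_type: :financial_data_import, user_id: user_id)
    # ... the actual financial data import goes here ...
    mark_job_log_as_finished(jid)
  end
end
```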

If your application can make use of the information about the job being unsuccessful, you might want to update the job log with a new state, such as dead (don't forget to add it to your enum hash). While we're at it - what about jobs that are not allowed to retry at all? Here is another approach to the problem. For the sake of simplicity, let's add a new method to our module, but in the real world you might want to create a separate module with a descriptive name:
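One possible shape for such a method is a wrapper that yields the actual work and marks the log as finished no matter what happens inside the block (the name with_job_log is illustrative):

```ruby
module AsyncJobLogging
  # ... create_job_log and mark_job_log_as_finished as before ...

  # Wraps the actual work and guarantees the log ends up marked as finished,
  # whatever happens inside the block.
  def with_job_log(jid:, job_type:, user_id: nil)
    create_job_log(jid: jid, job_type: job_type, user_id: user_id)
    begin
      yield
    ensure
      mark_job_log_as_finished(jid)
    end
  end
end
```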

Now we can use it in the non-retryable worker just as the other method:
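For example, assuming a non-retryable ReconciliationWorker:

```ruby
# app/workers/reconciliation_worker.rb
class ReconciliationWorker
  include Sidekiq::Worker
  include AsyncJobLogging

  sidekiq_options retry: false

  def perform(user_id)
    with_job_log(jid: jid, job_type: :reconciliation, user_id: user_id) do
      # ... the actual reconciliation work goes here ...
    end
  end
end
```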

We simply make sure that whatever happens in the main block of the worker, we’ll end up with the log marked as finished. It's crucial to have the right state of the job reflected in the log if you want to make decisions based on the gathered information.

Step 4 - Use gathered information

Okay, we've got logs nicely structured and saved in the database. What now? There are tons of things you can use the logs for. Now you know which worker ran, at what time, and for which user. You can add all kinds of fields to those logs - the options are nearly limitless (well, actually limited by the maximum number of columns) - such as calculated processing time, number of retries, references to other associated models, and so on.

You can probably imagine working with those logs manually - using them for debugging, or showing and filtering them in Active Admin. Let's go through one of the possible scenarios where the code itself uses them as part of its logic.

Critical section

We previously defined a couple of job types. Let's assume that it shouldn't be possible for more than one finance-related worker to run simultaneously for the same user. That would mean the reconciliation worker and the financial data import worker cannot run at the same time. First, let's add a scope to the AsyncJobLog model that's going to come in handy:
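A sketch of such a scope, assuming the job types and states from the earlier steps (the scope name is illustrative):

```ruby
class AsyncJobLog < ApplicationRecord
  # ... enums as before ...

  # Financial jobs that have started but not yet finished for the given user.
  scope :financial_in_progress, lambda { |user_id|
    where(
      job_type: [:financial_data_import, :reconciliation],
      state: :started,
      user_id: user_id
    )
  }
end
```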

Next, in our FinancialDataImportWorker, let's check if we can proceed before even creating a job log:
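A sketch of what that check could look like, assuming the scope and module from above:

```ruby
class FinancialDataImportWorker
  include Sidekiq::Worker
  include AsyncJobLogging

  def perform(user_id)
    # Quit early when another financial job is already running for this user.
    # where.not(jid: jid) is the "backdoor" that lets a retrying job (same JID) back in;
    # such a retry may want to find its existing log instead of creating a new one.
    return if AsyncJobLog.financial_in_progress(user_id).where.not(jid: jid).exists?

    create_job_log(jid: jid, job_type: :financial_data_import, user_id: user_id)
    # ... the actual financial data import goes here ...
    mark_job_log_as_finished(jid)
  end
end
```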

The worker will simply quit without starting if one of the two financial workers is already in progress for this user. Of course, quitting is the easiest option - you can report an error to Rollbar or do anything else that makes sense in your application. To make it a real critical section, you can make the worker retry at certain intervals (don't forget to set some kind of TTL!). You might be wondering where where.not(jid: jid) came from. If you have a retryable worker that doesn't mark its log as finished when it fails, you basically leave the critical section closed to any financial job. That's good, but you need some kind of "backdoor" for the failed worker to retry - and this is your backdoor. A Sidekiq job, even when retrying, still uses the same unique JID. That means we can use this JID to let the retrying worker back into the critical section and try to finish the job.

PS. If the critical section is VERY critical, it might be worth creating a unique key in the database to make sure you won't get any dirty reads/writes and, even in the very unfortunate event, won't let two financial workers spawn at the same time.

Conclusion

I hope that gave you an insight into how easily you can set up background job logging and what the advantages of even a simple logging mechanism are. Asynchronous processing definitely isn't easy, nor is debugging problems in this area. Logging workers gives you more information about what is going on behind the scenes and can also be used within the code to adjust the flow - in both synchronous and asynchronous situations. Once you set it up, you will probably come up with other ideas for how to use the logging - after all, each application might need very specific information about its background jobs. Also be aware that this isn't a one-size-fits-all solution. You might need to tweak it if you need better performance, for example by using a UUID created from the JID or by using the Sidekiq locking mechanisms provided by sidekiq-lock or sidekiq-unique-jobs. Be cautious when choosing solutions for high-load systems and make sure they are well tested! Good luck and have fun!


Huge thanks to Maciej Mensfeld for help in making the most of this post!

Photo by Maciej Rusek on Unsplash
