Monitoring your django site: how to and first steps

  1. How to architect a django website for the real world?
  2. Monitoring your django site: how to and first steps
  3. How to add privacy friendly analytics to your Django website

This is the second post in my Promozilla series. Today I will discuss how I added monitoring to Promozilla, a Nintendo Switch promotion tracking website built with django. In the first post, I described the general architecture.

Why?

Monitoring our django site solves a very simple question. If we want our django website to be used by the world, it is nice for it to be running in the first place.

Monitoring enables us to do that, and preferably in a proactive way. Of course we can ssh in every day and read the logs to check for any issues, but that’s far from perfect.

First, we are not alerted if a problem arises, which means that the site can be down for hours or days before we realise it. Lastly, that’s not really clarifying since the logs may contain too much noise or missing some information.

So there has to be a better way! And there is!

It is fundamental to know the status of our applications. Today I’ll discuss about the purple section of the diagram: monitoring.

How?

Django’s ecosystem makes it very easy to monitor our website. With some dependencies installed and a little bit of configuration, we can be monitoring our django website in no time.

I designed promozilla monitoring to solve both of those issues. Bugsnag alerts me via email if any issue arises and grafana/prometheus allows me to have a global picture of the different components and how they are evolving over time (any degradation while I was away?).

Lastly, take into consideration that if you host your monitoring stack in the same place you host the rest of the infrastructure, if both go down you are out of luck.

The components

Error monitoring: Bugsnag for django

Bugsnag in its simplest core is a dashboard for exceptions. One problem in my previous projects was that the only way for me to know if the sites were up was to either a) visit them, b) have someone complain, c) read the logs. But this is no way to sleep nicely at night. Fortunately, bugsnag is good peace of mind creator.

It provides a very simple django middleware that can be integrated with your django project. Every time an unhandled exception occurs, you receive an email with its stack trace and some very nice details. This can be tuned, of course, but the value it brings out of the box with its free plan is tremendous. One strong point, since it is hosted outside your server: if it is is unreachable you’ll still be able to somehow see how it went down.

It has some pretty good documentation on how to start using bugsnag with django. I seriously recommend bugsnag.

Dashboards: Grafana

Grafana is a powerful monitoring tool. It contains tracing, logging and dashboarding functionalities. To keep things simple, I decided to just use grafana’s dashboard with django.

The dashboards grafana provides are ideal to understand not only business metrics (most popular pages on the site/ who is referring us), but also application metrics (number of errors, database connections, average response time, etc).

Below I’m showing the dashboard I built for promozilla. The first screenshot contains business metrics: visitors over time, number of new accounts and referrers. The second screenshot has service-quality metrics: response time, error rates and requests served.

Grafana dashboard with django site popularity metrics: page views, referrers and new accounts
The dashboard can contain details about our django website popularity: popular pages, page views and referrers
Grafana dashboard with django site popularity metrics:  latenct, requests and errors
Grafana dashboards are also very useful for stability and service monitoring of our django site: latency, requests and errors.

Here I am also showing metrics retrieved from traefik, my reverse proxy, which has Prometheus support out of the box.

If you want to have some inspiration, Grafana has a dashboard and widget showcase page and of course, documentation.

The important thing is: Grafana is not a data storage solution. We need to have a service responsible for just querying and storing our system metrics. For that, I added Prometheus into the mix.

Prometheus with django out of the box

Prometheus is the perfect storage solution to integrate with grafana. At it’s core it is a time-series database designed for metrics storing and querying.

The way it works is tremendously simple: services like django expose an endpoint that Prometheus periodically visits to collect the data. This is called a pull model, where it visits the application to collect the metrics, as opposed to a push model, where the applications send the metrics to the database. This makes everything simpler (e.g. prometheus can be offline and the applications are not affected).

Fortunately for us, django has an extension that exposes a boatload of metrics – but if you are curious, I created a tutorial on how to Create a custom django prometheus metric.

Also, the reverse proxy that I used, traefik, has Prometheus support out of the box, which enables some of the plots you see in the previous screenshots. This is the great thing about prometheus. Because it is so ubiquitous, you do not need to reinvent the wheel to add monitoring to your applications.

Lastly, Prometheus has a particular data model and query syntax when compared to more traditional query languages like SQL. It has some particular concepts like metrics (counter, gauge, histogram and summary) and dimensions that are worth getting familiar with before delving right into it.

Caution: in my experience, Prometheus is quite heavy on the RAM side – be careful if you are running your website in the same machine.

  1. How to architect a django website for the real world?
  2. Monitoring your django site: how to and first steps
  3. How to add privacy friendly analytics to your Django website

Django monitoring: conclusion

I think my django monitoring approach is very solid because it has alerting for when something goes wrong, with bugsnag, but also provides with a birds eye view of recent events and application changes with dashboards provided by grafana and prometheus. I hope this was useful and I’m welcome to any feedback!

You might like

Leave a Reply