How to architect a django website for the real world?

I found that most resources online focus on building Django websites locally, or for simple use cases that are not exposed to the open world. So I decided to share the architecture of a recent project of mine, promozilla. I believe it is a good example of flexibility in how systems interact without compromising ease of use.

What is promozilla.xyz?

Promozilla is a promotion tracker built with a framework I love, Django. It tracks promotions on Nintendo Switch games, consoles and accessories in the Portuguese market. Put simply: every night it scrapes the Portuguese stores, stores the game prices, and sends an email to the Switch owners that have a game on their wish list whenever it is on promotion.

I believe that this personal project ended up with an interesting architecture, suitable for a production-ready Django system, and I would love to share it with you. This post is the first in a series of three:

  1. How to architect a django website for the real world?
  2. Monitoring your django site: how to and first steps
  3. How to add privacy friendly analytics to your Django website

Django architecture? What?

First of all, what is a system architecture? For me, the architecture of a solution (in this case, a promotion tracking website built with Django) consists of a description of its components, how they interact, and most importantly, why they were chosen.

Why this architecture for a Django site?

First of all, this is a side project, so I must find the technology interesting and/or a good learning opportunity (that’s the reason I chose Grafana and Prometheus, for example).

Then, its components should be like legos, meaning:

  • Adding or removing components does not break unrelated stuff
  • It is easy to deploy (they play well with docker containers, for example)
  • They play nicely together (django and postgres rather than django and mongodb, for example). This means that I can spend my time adding new features and not configuring stuff
  • They are easy to test and run locally
  • They are fun to work with. This is a side project after all!

How to implement this in practice? Django loves Docker!

How did I do it? Easy: Docker and docker-compose! Docker provides the container technology and docker-compose the orchestration. For those who do not know what that means, I recommend watching this video from Fireship.

I ended up with two docker-compose repositories, so two versions of this architecture: one for local development and another for the production environment. The local development repo builds the images from the development code, while the production one pulls the images from the GitLab image registry during the CI/CD cycle.

I really like docker-compose for several reasons. First, it is very easy to add or remove services. Second, the service configurations are kept separate from their secrets, and it is nice to track everything with git. Lastly, it is super simple to back up production data, since I just need to copy the services’ volumes and save them somewhere safe.
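To make that concrete, here is a minimal sketch of what such a production compose file could look like. All image paths, service names and mount points are illustrative assumptions, not the real promozilla configuration:

```yaml
# docker-compose.yml (sketch): a minimal, hypothetical version of the
# production file. Image paths, names and ports are illustrative;
# secrets live in env files, not here.
version: "3.8"

services:
  django:
    image: registry.gitlab.com/promozilla/django:latest  # pulled during CI/CD
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    env_file: .env.django         # secrets kept out of git
    volumes:
      - static_files:/app/static  # shared with nginx (see below)
    depends_on: [postgres, rabbitmq]

  postgres:
    image: postgres:14
    env_file: .env.postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data  # backing up = copying this volume

  rabbitmq:
    image: rabbitmq:3

volumes:
  static_files:
  postgres_data:
```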

The architecture

This architecture has several components, from the ones the visitors interact with directly, to the system monitoring, web analytics, hosting and DNS, code repository, and CI/CD flows. Don’t worry, they will all be covered.

In this post, I will only describe the components in the red rectangle, since those are the ones the visitors interact with directly. I will describe the other components in separate posts, because this one was getting too long. Let’s get started!

The components

The star component: the Django website

This is the breadwinner of the system and the reason why you clicked this article. It is the main piece of promozilla: it renders the pages the visitor clicks and handles persisting and retrieving user information (registration, login, product wish lists, etc.) and, of course, game prices. It runs with gunicorn to handle more than one visitor using the site at the same time.

You can of course visit it here: https://promozilla.xyz/

It is a fairly simple application technically, and it uses Materialize as the responsive frontend CSS framework (to run away from Bootstrap). To make development easier, it has some extensions installed, like django-allauth, django-filter and django-materializecss-form. Lastly, it uses some other extensions to connect with the rest of the stack, like django-prometheus and celery.
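As an illustration, the third-party section of settings.py would then look roughly like this (check each package’s docs for the exact app labels; the real settings file naturally contains more):

```python
# settings.py (sketch): registering the extensions mentioned above.
INSTALLED_APPS = [
    # ... Django's built-in apps ...
    "allauth",              # django-allauth: registration and login flows
    "allauth.account",
    "django_filters",       # django-filter: filtering the promotion listings
    "materializecssform",   # django-materializecss-form: Materialize-styled forms
    "django_prometheus",    # exposes metrics for the monitoring stack
]
```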

It uses SendGrid to send the registration emails (more on that below), PostgreSQL as the database, and shares a disk volume with nginx so that it can serve the images and CSS.

Django celery worker

This application has some long-running tasks that do not make sense to run in the main application. Why? They would take resources away from the system’s main responsibility: displaying promotions. They are also harder to scale separately, and there are even technical limitations to doing so (in the case of celery-beat, for example). Some examples of those long-running tasks are:

  • Notifying the users if a game on their wish list has a new promotion
  • Storing the scraped prices in the database
  • Triggering the nightly runs for scraping and email notification
  • Retrieving the product thumbnails from the store page.

So those tasks are perfect for a system built for long-running asynchronous tasks that supports concurrency and distribution, or in one word: celery!

This worker is built using the celery integration with Django. I chose it because these tasks share many of the components of the main app. This makes it much easier to share the data models and the shared application settings (like email and database connections, for example), and in general leads to less code duplication, which is always a nice plus.
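To illustrate the benefit, here is a sketch of what such a shared task could look like. The model, field and task names are hypothetical; the point is that the worker imports the same Django models as the website:

```python
# tasks.py (sketch): a hypothetical wish-list notification task.
# Model names and related names are illustrative, not promozilla's real code.
from celery import shared_task
from django.core.mail import send_mail


@shared_task
def notify_wishlist_owners(game_id: int) -> None:
    """Email every user whose wish list contains a game now on promotion."""
    from promotions.models import Game  # hypothetical app and model

    game = Game.objects.get(pk=game_id)
    for user in game.wishlisted_by.all():  # hypothetical related name
        send_mail(
            subject=f"{game.name} is on promotion!",
            message=f"{game.name} is now at {game.current_price}€.",
            from_email="noreply@promozilla.xyz",
            recipient_list=[user.email],
        )
```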

To send and receive work, it communicates with the rest of the workers through RabbitMQ (more on that below). Lastly, some tasks must run every day (like the scraping and notifications), so it uses celery-beat, celery’s message scheduler, as a cron job to trigger the messages.
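A beat schedule for those nightly runs might look like the following sketch (the task paths and the exact hours are assumptions):

```python
# settings.py (sketch): nightly triggers via celery-beat.
# Task paths and hours are illustrative assumptions.
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "nightly-scrape": {
        "task": "scraper.tasks.scrape_all_stores",        # hypothetical task path
        "schedule": crontab(hour=2, minute=0),            # every night at 02:00
    },
    "nightly-notifications": {
        "task": "promotions.tasks.notify_all_wishlists",  # hypothetical task path
        "schedule": crontab(hour=7, minute=0),            # after the prices are stored
    },
}
```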

It also uses SendGrid to send emails (again, more on that below), PostgreSQL as the database, and shares storage with the Django application so that it can store the product thumbnails.

Store scraper celery worker

This is the heart of the operation: it contains the scraping logic to get listings of Nintendo Switch products from online Portuguese stores. This code was initially in the Django worker app, but I decided to split it off because it started having its own needs, for example scraping libraries and proxy logic, that could have side effects on the main app (e.g. dependency versions, bloat, etc.).

The flow is very simple: the scraper receives a RabbitMQ message through celery with a store and product category (e.g. Fnac/Games) to scrape. It visits the corresponding site, does its magic, and finishes by sending back a message with a list of the products, their prices and promotion status. This message is consumed by the Django celery worker, which stores its content in the database.
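In code, the handoff could look like this sketch. The helper and both task names are hypothetical; what matters is that the scraper only knows the Django task’s name, not its code:

```python
# scraper/tasks.py (sketch): the scrape-then-store handoff.
# fetch_products and both task names are hypothetical.
from celery import current_app, shared_task


def fetch_products(store: str, category: str) -> list[dict]:
    """Hypothetical helper wrapping the actual scraping logic."""
    ...


@shared_task
def scrape_listing(store: str, category: str) -> None:
    """Scrape one store/category pair (e.g. Fnac/Games) and hand the results off."""
    products = fetch_products(store, category)
    # Send the results by task *name*: the scraper never imports the Django
    # code base; RabbitMQ routes the message to the Django celery worker.
    current_app.send_task("promotions.tasks.store_prices", args=[products])
```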

Because it is the component with the stealth technology (the proxy described below), the scraper is also responsible for downloading the product thumbnails and storing them so nginx can later serve them.

An email provider: Sendgrid

Like most websites, email is the main method promozilla uses to communicate with its users. Email is used in the registration and profile management flows (e.g. password resets), for example. One could naively use a personal Gmail account or host one’s own email server, but that comes with management headaches and a high risk of the emails being classified as spam. So I decided to use SendGrid, connected over SMTP. SendGrid has a free plan that is good enough for a side project of this size.
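Concretely, the Django side of this is just the standard SMTP email settings. A sketch, following SendGrid’s documented SMTP relay (the sender address is an assumption):

```python
# settings.py (sketch): SendGrid over plain SMTP.
# The API key comes from the environment, never from git.
import os

EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "smtp.sendgrid.net"
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = "apikey"  # literal string, per SendGrid's SMTP docs
EMAIL_HOST_PASSWORD = os.environ["SENDGRID_API_KEY"]
DEFAULT_FROM_EMAIL = "noreply@promozilla.xyz"  # hypothetical sender address
```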

A scraping proxy: OpenVPN proxy

Since we are web-scraping, it is almost certain our requests will eventually be blocked. To circumvent this, I routed the requests through a proxy: a Privoxy-OpenVPN docker image through which the scraper worker routes its requests. This is fairly straightforward to set up; one just needs a proxy to use (there are quite a lot online). Typically, the free proxies are blocked very quickly, so I recommend going the extra mile and paying for a good service.
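On the scraper side, routing through the proxy is a one-dictionary affair. A sketch, assuming the proxy runs as a docker-compose service named privoxy (8118 is Privoxy’s default listening port; the URL is illustrative):

```python
# sketch: pointing the scraper's HTTP traffic at the Privoxy/OpenVPN container.
import requests

PROXIES = {
    "http": "http://privoxy:8118",
    "https": "http://privoxy:8118",
}

response = requests.get(
    "https://www.fnac.pt/",  # an illustrative store URL
    proxies=PROXIES,
    timeout=30,
)
```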

Infrastructure

The reverse proxy: Traefik proxy

Traefik is a reverse proxy built with micro-services in mind, with lots of features designed for a container environment. I think it is the best reverse proxy for Django, for three reasons. First, it is very easy to configure: it is a single line of configuration to expose a service or block it from being public, or to force incoming connections to use HTTPS. Also, it is configurable from the project’s docker-compose file, meaning git tracking and one less file. But the biggest reason I chose it is its automatic SSL certificate generation (via Let’s Encrypt).
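To give a feel for the “single line of configuration” claim, here is a sketch of the labels involved, assuming Traefik’s docker provider and a certificate resolver named letsencrypt configured on the Traefik service itself (the router and resolver names are assumptions):

```yaml
# docker-compose.yml (sketch): exposing the django service through Traefik.
services:
  django:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.django.rule=Host(`promozilla.xyz`)"
      - "traefik.http.routers.django.entrypoints=websecure"  # HTTPS only
      - "traefik.http.routers.django.tls.certresolver=letsencrypt"
```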

However, there are other alternatives for reverse proxies that I would like to present:

  • nginx, a reverse proxy that is also capable of serving static files, but lacks automatic SSL certificate generation out of the box
  • caddy, a reverse proxy that can serve static files AND automatically generate SSL certificates, but is not configurable from the docker-compose file

The biggest drawback of using Traefik with Django is that you need to rely on another static file provider, but I will get to that later.

The database: PostgreSQL

The relational database is the unsung hero of any architecture, because it stores its most precious resource: information. Django’s ORM was built with PostgreSQL in mind, so it was a clear choice for the database. There are other options, like SQLite or MySQL, but I do not think they are worth the hassle. The deciding factor for me, however, is that Django’s text search takes advantage of Postgres’s full-text search features, so I do not need to reinvent the wheel for something as boring as text search.

A piece of advice, however: some Postgres features are not enabled by default (like the pg_trgm extension, which Django exposes through the TrigramExtension migration operation).
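Enabling it is a one-operation migration. A sketch (the app and migration names are hypothetical; TrigramExtension itself ships with django.contrib.postgres):

```python
# migrations/0002_enable_trigram.py (sketch): enabling pg_trgm for trigram search.
from django.contrib.postgres.operations import TrigramExtension
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ("promotions", "0001_initial"),  # hypothetical app/migration name
    ]
    operations = [
        TrigramExtension(),  # runs CREATE EXTENSION pg_trgm on the database
    ]
```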

The static file server: Nginx

Neither Django nor Traefik is suitable for serving static files, so I needed a solution. What are static files in the first place? Static files are files that are not generated per request, like images or CSS. Because of their nature, we can optimize their delivery by caching them or serving them from a server close to the website visitor. For this, I chose nginx, since I had used it before and it is quite easy to set up. I also present three alternatives for serving static files in another post I wrote.
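On the Django side, the contract with nginx boils down to a couple of settings. A sketch, assuming the shared volume is mounted at /app (the paths are illustrative):

```python
# settings.py (sketch): Django's side of the static file contract.
# collectstatic writes everything into STATIC_ROOT, which lives on the
# volume shared with the nginx container; nginx then serves /static/.
STATIC_URL = "/static/"
STATIC_ROOT = "/app/static"  # hypothetical mount point of the shared volume
MEDIA_URL = "/media/"        # product thumbnails stored by the workers
MEDIA_ROOT = "/app/media"
```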

The message broker: RabbitMQ

The workers need to exchange work, so a broker is needed. RabbitMQ provides a common intermediary for them to communicate. It is perfect for this job because it is built with resiliency and fault tolerance in mind: if one of the workers goes down, the job does not disappear and is sent to another. It is also very easy to scale the workers, because new ones can start consuming the workload right away. Lastly, it is very easy to set up with docker, and celery supports it out of the box, meaning I do not need to reinvent the wheel.
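Wiring celery to RabbitMQ is essentially a one-setting affair. A sketch, assuming the broker runs as a docker-compose service named rabbitmq, credentials come from the environment, and celery is configured with the usual namespace="CELERY" Django integration:

```python
# settings.py (sketch): pointing celery at the RabbitMQ container.
import os

CELERY_BROKER_URL = os.environ.get(
    "CELERY_BROKER_URL",
    "amqp://guest:guest@rabbitmq:5672//",  # amqp is RabbitMQ's protocol
)
CELERY_TASK_ACKS_LATE = True  # a crashed worker's job is redelivered, as described above
```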

GitLab repos

Right now I’m still cleaning up the code, but I’ll share the Django, scraper and docker-compose repositories. Stay tuned!

Conclusion

That was it, thank you for your attention! I spent some time discussing the “code-related” components of a promotion tracking website and I hope this was useful.

In the next posts I will present some other very important pieces of my website. While the visitor does not interact with them, they make my life easier and the project more interesting.

Django is a very flexible framework that presents us with many options, so I showed you my way of doing things. Feel free to share your way!
