Stress-testing Django, Django REST framework, FastAPI, Express.js, Go-chi and Axum
Intro
While working with different stack combinations, I sometimes wonder about the true throughput/performance of each solution. It is not straightforward to compare them because every product is unique and possesses different features and design choices. However, some things can still be compared, even though not everything will be a fair comparison. This will be comparing apples with pears, plums and other fruits to get an idea of the capacity.
Updates
2024-03-21 - Based on this comment, I changed the FastAPI code to an async approach and re-ran the tests.
Constraints
I decided to perform two parts of this test. The first one will consist of a JSON response with pagination, foreign keys, and actual database content. The second part will be a more brute-force-like test, without a database but still with significant data to be serialized, and under higher concurrency. For the sake of testing, I disabled any kind of request logging.
In some places, I control the inputs (e.g., page) more strictly, while in others I ignore them. It was not worth switching context for each app for no real benefit. There may be minor differences in response size, mainly in the go-chi solution, because it formats the timestamp a bit differently. The difference is so small that it was not worth sanitizing. In Django, however, whitespace in the JSON output was removed (e.g., "key":"val" instead of "key": "val") to bring it in line with the other apps, because the response size was noticeably different otherwise.
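The whitespace difference comes from the default separators of Python's json module. Below is a minimal sketch of the effect using only the standard library; the payload is a made-up example. Django's JsonResponse accepts a json_dumps_params dict that is forwarded to json.dumps, so the same separators could be passed there, although whether the benchmark did it exactly this way is an assumption.

```python
import json

# Hypothetical payload, only for illustration.
payload = {"key": "val", "items": [1, 2, 3]}

# Default separators are (", ", ": "): a space follows every comma and colon.
default_body = json.dumps(payload)

# Compact separators drop that whitespace, matching what the other apps emit.
compact_body = json.dumps(payload, separators=(",", ":"))

print(default_body)
print(compact_body)
print(len(default_body) - len(compact_body), "bytes saved")
```

On a large paginated response, those few bytes per key and item add up, which is why the sizes differed before the change.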
The final constraint is the test system: Ubuntu 22.04 running on an AMD Ryzen 5 3600 with 32GB of memory. Everything is running locally, no containers, nothing.
Real database, low concurrency constraints
The first part will be performed under low concurrency because high load could negatively affect the database’s performance, skewing the results. This will simulate a real-world scenario where data is typically used dynamically, and even the choice of serializer or database connection library can make a difference.
All apps will use the same database, but some will utilize a mature, beefy ORM, while others will not. This mirrors real-life scenarios where ORMs are not used universally, and researching every possible option solely for this test is impractical. Some of my choices are based on popularity, as many people may use them in a similar manner.
The database engine will be PostgreSQL, and each app will use a connection pool.
Another constraint is running each app in a single fork and a single thread (apps can use more threads internally, as Rust does for spawn_blocking()). The first reason is to avoid possible concurrency issues. The second is that running multiple forks would require managing the connection pool differently so as not to exhaust the available server connections. Finding a good setup for that is out of scope.
Static data, high concurrency
The second part will use static data serialized on every request. This setup will use multiple forks or threads, with as many processes/threads as there are available CPUs (physical and logical).
Part 1: Dynamic content
Generic design choices
The database was migrated/generated from Django because it was the most convenient option. It was afterwards imported into SQLAlchemy and even into Prisma. The response object is the same for every app and follows standard Django REST framework (DRF) response containing the following structure:
{
"count": 5,
"next": null,
"previous": null,
"results": [{}]
}
Here, next and previous are either null or a URL string.
For every API call, two database roundtrips will be performed, each consisting of distinct queries. The first query will execute a full COUNT() on the table, while the second query will be a SELECT with an INNER JOIN, LIMIT, and OFFSET clauses. While experimenting with solutions like COUNT() OVER and CTEs could provide valuable insights into how network latencies might affect the performance of two quick queries versus a single slow query, this exercise is considered out of scope for this test.
The app has parent and child objects, linked by a foreign key relation.
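The two-roundtrip pattern and the response envelope can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database in place of PostgreSQL so that it runs on its own, and the table names (parent, child), the column names, the base URL and the page size of 30 are assumptions rather than the schema used in the test.

```python
import sqlite3

PAGE_SIZE = 30  # assumed page size


def list_page(conn, page, base_url="http://127.0.0.1:8000/items/"):
    """Return a DRF-style envelope for one page of child rows joined to parents."""
    # Roundtrip 1: full COUNT() on the table.
    (count,) = conn.execute("SELECT COUNT(*) FROM child").fetchone()

    # Roundtrip 2: SELECT with INNER JOIN, LIMIT and OFFSET.
    offset = (page - 1) * PAGE_SIZE
    rows = conn.execute(
        "SELECT c.id, c.name, p.id, p.name FROM child c "
        "INNER JOIN parent p ON p.id = c.parent_id "
        "ORDER BY c.id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()

    has_next = offset + PAGE_SIZE < count
    return {
        "count": count,
        "next": f"{base_url}?page={page + 1}" if has_next else None,
        "previous": f"{base_url}?page={page - 1}" if page > 1 else None,
        "results": [
            {"id": cid, "name": cname, "parent": {"id": pid, "name": pname}}
            for cid, cname, pid, pname in rows
        ],
    }


# Tiny in-memory dataset (hypothetical schema) so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);"
    "CREATE TABLE child (id INTEGER PRIMARY KEY, name TEXT,"
    " parent_id INTEGER REFERENCES parent(id));"
)
conn.execute("INSERT INTO parent VALUES (1, 'p1')")
conn.executemany(
    "INSERT INTO child VALUES (?, ?, 1)", [(i, f"c{i}") for i in range(1, 46)]
)

page1 = list_page(conn, 1)
print(page1["count"], page1["next"], page1["previous"], len(page1["results"]))
```

Each framework in the test implements the same idea with its own ORM or query library; only the envelope shape is shared.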
App options
I chose Django + its internal JsonResponse because of my long history with it, and in many cases, it may still be a better choice than DRF or other solutions.
I chose Django + DRF because it’s a no-brainer for REST APIs and a very popular combination. Since I already chose Django, DRF is a natural option.
I chose FastAPI + SQLAlchemy because it’s popular, designed to be async, which brings a fresh difference compared to Django choices, and is favored by many. I have also used it in one product and was happy with it.
I chose go-chi + GORM because I have used it for every Go app I’ve developed so far.
I chose Express.js + Prisma because it is a popular combination in multiple applications, even though I haven’t worked with this setup before.
I chose Axum + SQLx because Axum looked lightweight enough and I’ve heard good things about SQLx. I haven’t worked with this setup before.
Particular configurations
Django
By default, Django enables many different applications and middlewares. I trimmed it down to django.contrib.auth and django.contrib.contenttypes, commenting out everything else and setting MIDDLEWARE to an empty list. Additionally, I disabled debugging with DEBUG = False and the admin interface with ADMIN_ENABLED = False. The only reason I kept auth and contenttypes enabled is that DRF depends on them, which makes the two setups more comparable. I used low-level view functions instead of viewsets or similar constructs, as it didn’t make much sense to use them in this context.
In addition to the defaults, I used dj_db_conn_pool.backends.postgresql instead of the standard Django database backends.
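A settings fragment matching this description might look roughly like the sketch below. The connection details and the pool sizing are placeholders (the text does not state them); POOL_OPTIONS with POOL_SIZE and MAX_OVERFLOW is the option format documented by django-db-connection-pool.

```python
# settings.py (fragment): a sketch of the trimmed-down configuration.
DEBUG = False
ADMIN_ENABLED = False  # project-level flag named in the text above

# Only the two apps kept for parity with the DRF setup.
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
]

# No middleware at all.
MIDDLEWARE = []

DATABASES = {
    "default": {
        "ENGINE": "dj_db_conn_pool.backends.postgresql",
        # Placeholder connection details, not taken from the article.
        "NAME": "benchmark",
        "USER": "benchmark",
        "PASSWORD": "benchmark",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        # Pool sizing is an assumption; the article does not state it.
        "POOL_OPTIONS": {"POOL_SIZE": 10, "MAX_OVERFLOW": 10},
    }
}
```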
The test was performed using gunicorn. While I typically achieve better and more stable performance using uWSGI, gunicorn was chosen as it appears to be the more popular choice. The command used was: poetry run gunicorn djangoapp.wsgi --workers 1 --threads 1
Django + DRF
The configuration is more or less the same as previously, with the addition of the rest_framework app. I modified the REST_FRAMEWORK configuration setting to handle 30 results per page and removed the BrowsableAPIRenderer, which is the default setting. I defined serializers for both models and a ModelViewSet, although it does not make sense for a single list view.
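The DRF-specific settings described above could look roughly like this. It is a sketch, not the exact file used in the test: the choice of PageNumberPagination is an assumption (it produces the count/next/previous/results envelope), while PAGE_SIZE and the renderer list follow the description.

```python
# settings.py (fragment): DRF-specific part of the configuration.
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "rest_framework",
]

REST_FRAMEWORK = {
    # Assumed pagination class; it yields count/next/previous/results.
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 30,
    # Listing only JSONRenderer drops the default BrowsableAPIRenderer.
    "DEFAULT_RENDERER_CLASSES": ["rest_framework.renderers.JSONRenderer"],
}
```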
I used the same gunicorn configuration as before, bound to a different port: poetry run gunicorn djangoapp.wsgi --workers 1 --threads 1 --bind 127.0.0.1:8001
FastAPI + SQLAlchemy
I utilized FastAPI + uvloop, along with SQLAlchemy to dynamically map the database to ORM classes and utilize the provided connection pool.
Run as poetry run uvicorn main:app --host 127.0.0.1 --port 8002 --workers 1 --loop uvloop --no-access-log
go-chi + GORM
Not much to report here. I prepared the modules, mainly go-chi and GORM plus their dependencies, built the app with go build -ldflags="-s -w" main.go and ran it with GOMAXPROCS=1 ./main
express.js + prisma
I generated schema.prisma from the database, added only @@map("table") and renamed the models to match the intended design. Additionally, in the generator client block of the schema, I enabled previewFeatures = ["relationJoins"], as it was still a preview feature. The joining support in Prisma is poor, which results in poor query performance as well. However, I decided to follow their design, as I couldn’t find a better published solution, and dropping down to raw SQL would render Prisma pointless.
Other than that, I haven’t done much more than the standard example app.
Executed with node app.js
Axum + SQLx
I can’t provide much insight here because I’m not experienced with Rust, and this was the first time I set up such a project. Since I wanted to run it in a single thread, I used #[tokio::main(flavor = "current_thread")]. I built it with cargo build --release and executed the output binary. That’s all.
Collecting data
For each app, I ran a single request to warm up the app. Since there is only a single fork/thread, there is no need to do more. Then I ran the following siege command:
siege -c 1 -r 1000 --log=./siegelog/800X.csv ...
Then I simply combined the resulting CSV files.
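Combining the logs can be done with a few lines of standard-library Python. The sketch below assumes that every siege log starts with a header row and that all logs share the same columns; the function name and the extra source column are my own additions for illustration.

```python
import csv
import glob
import os


def combine_siege_logs(pattern, out_path):
    """Merge CSV logs matching `pattern` into one file, tagging rows with their source file."""
    header = None
    merged = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            rows = [[cell.strip() for cell in row] for row in csv.reader(fh) if row]
        if not rows:
            continue
        if header is None:
            # Assumption: the first row of the first file is the header.
            header = rows[0]
        # Drop repeated header rows, keep data rows, and prepend the file name.
        for row in rows:
            if row != header:
                merged.append([os.path.basename(path)] + row)

    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        if header is not None:
            writer.writerow(["source"] + header)
        writer.writerows(merged)
    return len(merged)


# Example call (paths are placeholders):
# combine_siege_logs("./siegelog/800*.csv", "./combined.csv")
```

Keeping the file name in each row makes it easy to map a result back to the app, since each app's log is named after its port.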
Results
The following are the results as gathered on my testing machine.
App | Elap Time (s) | Resp Time (s) | Transaction Rate (transactions/s) | Throughput (MB/s) |
---|---|---|---|---|
Django | 3.13 | 0 | 319.49 | 30.35 |
Django + DRF | 4.6 | 0 | 217.39 | 20.65 |
FastAPI + SQLAlchemy | 5.81 | 0.01 | 172.12 | 15.83 |
FastAPI + SQLAlchemy (async) | 3.59 | 0 | 278.55 | 25.63 |
Go-chi + GORM | 1.22 | 0 | 819.67 | 78.69 |
express.js + prisma | 3.77 | 0 | 265.25 | 25.2 |
Axum + SQLx | 1.17 | 0 | 854.7 | 81.2 |
There is no conclusion for this part. The only true surprise is the low performance of the FastAPI + SQLAlchemy setup.
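The columns in the table are related to each other: with 1000 requests per run (siege -c 1 -r 1000), the transaction rate is simply 1000 divided by the elapsed time, and dividing the throughput by the rate gives a rough size of a single response. A small sketch of that arithmetic, using a few rows copied from the table:

```python
REQUESTS = 1000  # siege -c 1 -r 1000

# Elapsed time (s) and throughput (MB/s), copied from the results table.
results = {
    "Django": (3.13, 30.35),
    "Django + DRF": (4.6, 20.65),
    "Go-chi + GORM": (1.22, 78.69),
    "Axum + SQLx": (1.17, 81.2),
}

for app, (elapsed, throughput) in results.items():
    rate = REQUESTS / elapsed        # transactions per second
    payload_mb = throughput / rate   # approximate size of one response in MB
    print(f"{app}: {rate:.2f} req/s, ~{payload_mb * 1000:.0f} KB per response")
```

The Resp Time column shows 0 for most apps only because siege rounds it to two decimal places; at a few milliseconds per request, the elapsed time is the more useful number.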
Part 2: Static content
Generic design choices + app options
For the second part of the test, I used more or less the same setup with the following changes: the Python apps run with a number of forks equal to the number of system threads; for the Go app, GOMAXPROCS is left unset so that it runs in its default mode; the Axum app uses the multi-thread flavor; and the Express.js app uses an additional configuration supporting forks (app_cluster.js instead of app.js).
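For the Python apps, sizing the number of forks to the number of system threads can be sketched like this. The module paths and flags are the ones used in Part 1; keeping --threads 1 per worker in Part 2 is an assumption on my side.

```python
import os

# Logical CPUs visible to the OS (12 on a Ryzen 5 3600 with SMT enabled).
workers = os.cpu_count() or 1

# gunicorn: one worker process per logical CPU.
gunicorn_cmd = [
    "gunicorn", "djangoapp.wsgi",
    "--workers", str(workers),
    "--threads", "1",  # assumption: single-threaded workers, as in Part 1
]

# uvicorn: the same idea for the FastAPI app.
uvicorn_cmd = [
    "uvicorn", "main:app",
    "--workers", str(workers),
    "--loop", "uvloop",
    "--no-access-log",
]

print(" ".join(gunicorn_cmd))
print(" ".join(uvicorn_cmd))
```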
One additional change: Django + DRF won’t be executed here, because it doesn’t really make sense to return static data from DRF.
Collecting data
Testing the multiprocessing setup will be performed with 1000 requests per concurrent user and with the following concurrency levels:
- 1
- 50
- 100
- 150
- 200
- 250
Each time, the app will be warmed up with a batch of requests.
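The series of runs can be scripted. The following sketch only builds the siege command lines for each concurrency level, reusing the flags from Part 1; the URL and the log file naming are placeholders, not the ones used in the test.

```python
CONCURRENCIES = [1, 50, 100, 150, 200, 250]
REPETITIONS = 1000


def siege_command(concurrency, url, log_path):
    """Build the siege argument list for one run (same flags as in Part 1)."""
    return [
        "siege",
        "-c", str(concurrency),
        "-r", str(REPETITIONS),
        f"--log={log_path}",
        url,
    ]


# URL and log file names are placeholders.
for c in CONCURRENCIES:
    cmd = siege_command(c, "http://127.0.0.1:8000/", f"./siegelog/8000_c{c}.csv")
    print(" ".join(cmd))
    # To actually execute a run: subprocess.run(cmd, check=True)
```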
Limitations of the experiment
It is not possible to compare these setups with full rigor, because even something as fundamental as multi-processing vs. multi-threading behaves differently. In addition, the go-chi and Axum apps are completely self-served. For gunicorn, no respawn threshold will be set. This gives the gunicorn-based setups a bit of an advantage over real-world scenarios, because respawning is a costly operation and, under high pressure, it really makes a difference in both throughput and response time. We can afford this simplification here, but doing the same in production would most likely end in issues.
Under such heavy concurrency, it is also possible that the client itself starts affecting the performance, but we will consider this not relevant.
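For reference, the respawn threshold that is left out here is controlled in gunicorn by the max_requests setting, optionally with max_requests_jitter. A sketch of a gunicorn.conf.py that would enable it; the values are illustrative and were not used for the published numbers.

```python
# gunicorn.conf.py: a sketch; values are illustrative, not the tested setup.
import multiprocessing

workers = multiprocessing.cpu_count()  # one worker per logical CPU
threads = 1

# Restart a worker after it has served this many requests
# (0, the default, disables the restarts).
max_requests = 1000

# Random jitter so that workers do not all restart at the same moment.
max_requests_jitter = 50
```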
Results
Data transferred
This is basically the same for every app (only go-chi differs slightly), so just to summarize, it looked like this:
Concurrency | Data transferred (MB) |
---|---|
1 | 95.4 |
50 | 4770.04 (go: 4769.75) |
100 | 9540.08 (go: 9539.51) |
150 | 14310.12 (go: 14309.26) |
200 | 19080.16 (go: 19079.02) |
250 | 23850.2 (go: 23848.77) |
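The numbers scale linearly because each simulated user issues 1000 requests, so the total volume is the single-user volume multiplied by the concurrency. A quick check of that arithmetic against the table:

```python
REPETITIONS = 1000   # requests per simulated user
BASELINE_MB = 95.4   # data transferred at concurrency 1 (from the table)

# Reported totals in MB for the non-Go apps, copied from the table.
reported = {50: 4770.04, 100: 9540.08, 150: 14310.12, 200: 19080.16, 250: 23850.2}

for concurrency, reported_mb in reported.items():
    total_requests = concurrency * REPETITIONS
    expected_mb = BASELINE_MB * concurrency
    print(f"c={concurrency}: {total_requests} requests, "
          f"expected ~{expected_mb:.1f} MB, reported {reported_mb} MB")
```

The small remaining differences are consistent with the single-user figure being rounded to one decimal place.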
Elapsed Time (s)
App / Concurrency | 1 | 50 | 100 | 150 | 200 | 250 |
---|---|---|---|---|---|---|
Django | 0.62 | 5.55 | 11.13 | 16.58 | 22.39 | 33.65 |
FastAPI | 1.61 | 11.71 | 24.16 | 36.74 | 49.7 | 62.19 |
FastAPI (async) | 1.52 | 11.54 | 23.2 | 35.11 | 47.32 | 59.68 |
Go-chi | 0.52 | 3.88 | 7.94 | 12.02 | 15.96 | 20.37 |
express.js | 1.31 | 7.48 | 14.15 | 21.22 | 28.11 | 34.95 |
Axum | 0.48 | 3.28 | 7.32 | 10.89 | 14.53 | 18.04 |
Response Time (s)
App / Concurrency | 1 | 50 | 100 | 150 | 200 | 250 |
---|---|---|---|---|---|---|
Django | 0 | 0.01 | 0.01 | 0.02 | 0.02 | 0.03 |
FastAPI | 0 | 0.01 | 0.02 | 0.04 | 0.05 | 0.06 |
FastAPI (async) | 0 | 0.01 | 0.02 | 0.04 | 0.05 | 0.06 |
Go-chi | 0 | 0 | 0.01 | 0.01 | 0.02 | 0.02 |
express.js | 0 | 0.01 | 0.01 | 0.02 | 0.03 | 0.03 |
Axum | 0 | 0 | 0.01 | 0.01 | 0.01 | 0.02 |
Transaction rate/s
App / Concurrency | 1 | 50 | 100 | 150 | 200 | 250 |
---|---|---|---|---|---|---|
Django | 1612.9 | 9009.01 | 8984.73 | 9047.04 | 8932.56 | 7429.42 |
FastAPI | 621.12 | 4269.85 | 4139.07 | 4082.74 | 4024.14 | 4019.94 |
FastAPI (async) | 657.89 | 4332.76 | 4310.34 | 4272.29 | 4226.54 | 4189.01 |
Go-chi | 1923.08 | 12886.6 | 12594.46 | 12479.2 | 12531.33 | 12272.95 |
express.js | 763.36 | 6684.49 | 7067.14 | 7068.8 | 7114.91 | 7153.08 |
Axum | 2083.33 | 15243.9 | 13661.2 | 13774.1 | 13764.62 | 13858.09 |
Throughput MB/s
App / Concurrency | 1 | 50 | 100 | 150 | 200 | 250 |
---|---|---|---|---|---|---|
Django | 153.87 | 859.47 | 857.15 | 863.1 | 852.17 | 708.77 |
FastAPI | 59.26 | 407.35 | 394.87 | 389.5 | 383.91 | 383.51 |
FastAPI (async) | 62.76 | 413.35 | 411.21 | 407.58 | 403.22 | 399.63 |
Go-chi | 183.45 | 1229.32 | 1201.45 | 1190.45 | 1195.43 | 1170.78 |
express.js | 72.83 | 637.71 | 674.21 | 674.37 | 678.77 | 682.41 |
Axum | 198.75 | 1454.28 | 1303.29 | 1314.06 | 1313.16 | 1322.07 |
Longest transaction (s)
App / Concurrency | 1 | 50 | 100 | 150 | 200 | 250 |
---|---|---|---|---|---|---|
Django | 0.01 | 0.05 | 0.05 | 0.06 | 0.06 | 0.12 |
FastAPI | 0.01 | 0.05 | 0.08 | 0.1 | 0.12 | 0.13 |
FastAPI (async) | 0.01 | 0.06 | 0.07 | 0.1 | 0.11 | 0.12 |
Go-chi | 0.01 | 0.04 | 0.11 | 0.21 | 0.29 | 0.41 |
express.js | 0.01 | 0.04 | 0.05 | 0.08 | 0.06 | 0.09 |
Axum | 0.01 | 0.02 | 0.05 | 0.04 | 0.08 | 0.14 |
Conclusion
First, let’s not forget that this comparison is not completely fair and does not represent real-world scenarios.
The performance of particular apps is not surprising, but as a long-time Django user who transitioned into DRF at some point, I forgot how much of a performance hit DRF incurs. However, that itself is not a reason to switch back to Django. And we have to bear in mind that the Django setup did not use any middleware. In most production environments, the results would look different.
Another significant factor here is the complete disregard for respawning processes after N requests. I expected that a respawn threshold of 1000 in Gunicorn would make the longest transaction time much higher. However, when I tried it, this metric did not change: the overall throughput and transaction rate dropped significantly, while the longest transaction time stayed the same. This is out of the scope of this article, so I’m not publishing those differences here.
FastAPI is surprisingly slow in this test. Since I haven’t used it for two years, some uncertainty remains; it is possible that I’ve done something wrong.
Express.js was another surprise because, based on many ‘Hello World’ brute-force experiments, I expected more. The result is, however, in line with my observations from my professional life.
The true surprise, however, lies in the longest transaction times of Go-chi and Axum. My assumption is that this is by design: the apps are multithreaded and manage their threads themselves. In the other cases, I was just hammering the app and could keep going until it ran out of memory. To explain this, I would need a different testing approach and more data, e.g., all response times, to see how (un)stable the apps really are.
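If all response times were collected, a percentile summary would show far more than a single longest-transaction value. A minimal sketch of such a summary follows; the sample data is made up purely to make it runnable and is not a measurement.

```python
import statistics


def summarize(response_times):
    """Return median, p95, p99 and max for a list of response times in seconds."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(response_times, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(response_times),
    }


# Made-up sample: mostly fast responses with a few slow outliers.
sample = [0.01] * 95 + [0.05] * 4 + [0.41]
print(summarize(sample))
```

A large gap between p99 and max would point to rare stalls, while a high p95 would point to a generally unstable app.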
This simple exercise raised many more questions (How would uWSGI perform compared to Gunicorn? How would FastAPI behave if the default asyncio loop was used instead of uvloop? How long would it take for the app to eat all available memory without respawning? etc.). I will pick some for future articles.
Source code used for testing is published here