I have a synchronous, single-process web app that is CPU-bound at about 2,200 QPS. When I run it with multiple processes, the QPS barely increases:
- single process: 2200 QPS. top shows CPU at 100%
- 2 workers: 2230 QPS. top shows each process at 60%
- 4 workers: 2280 QPS. top shows each process at 30%
I don’t understand these numbers. Why don’t two processes multiply the QPS by 2? Four processes by 4?
Implementation details
- the web app is just replying to a GET request. I used Falcon, it's a copy-paste of this example from the documentation
- the server is running behind gunicorn. Here are the command lines for:
  - single-process: gunicorn things:app
  - two workers: gunicorn things:app -w 2
  - four workers: gunicorn things:app -w 4
- I used locust for performing the load tests and measuring the resulting QPS. Configuration: one master and five slaves. Waiting time between each GET request: 0 ms.
- everything is running on localhost. The test machine has 16 cores, so that each process of the web app and each process of the load test is using its own core.
Some more results
As dano suggested, I inserted a time.sleep in the GET handler. Here are the resulting QPS:
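For readers who want to reproduce the delay experiment, the handler change can be sketched without Falcon as a plain WSGI callable, which gunicorn can also serve. This is only a minimal sketch, not the code from the question: the DELAY_SECONDS value and the response body are assumptions chosen for illustration, since the delay actually used is not stated.

```python
import time

# Hypothetical per-request delay; the value used in the question is not stated.
DELAY_SECONDS = 0.01


def app(environ, start_response):
    """Minimal WSGI app that blocks for DELAY_SECONDS before replying."""
    # time.sleep blocks this worker without using CPU, so a synchronous
    # worker can serve at most about 1 / DELAY_SECONDS requests per second.
    time.sleep(DELAY_SECONDS)
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Saved as sleepy.py (a hypothetical file name), this would be started the same way as the original app, for example with gunicorn sleepy:app -w 2. Because the sleep caps each synchronous worker at roughly 1 / DELAY_SECONDS requests per second, adding workers should raise the QPS until something else, such as the load generator, becomes the limit.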
Answer
It seems like you’re just at the limits of how quickly locust can send queries to your server (or perhaps of gunicorn’s ability to distribute the requests to workers). With no network latency, and a server doing almost no work (and thus having a very fast response time), it’s possible you can respond almost as quickly as locust can send queries.
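One rough way to check this explanation against the numbers in the question: assume each worker could still handle about 2,200 QPS on its own and that requests are spread evenly across workers. Both are assumptions, not measurements, and the function name below is made up for illustration. Under them, the share of a core each worker needs is the total QPS divided by (workers × per-worker capacity).

```python
def expected_cpu_share(total_qps, workers, per_worker_capacity=2200):
    """Fraction of one core each worker needs, assuming an even split
    of requests and a fixed per-worker capacity (both assumptions)."""
    return total_qps / (workers * per_worker_capacity)


# Measurements from the question: (workers, total QPS, CPU % seen in top).
observed = [(1, 2200, 100), (2, 2230, 60), (4, 2280, 30)]

for workers, qps, top_percent in observed:
    share = expected_cpu_share(qps, workers)
    print(f"{workers} worker(s): predicted {share:.0%} per process, "
          f"top showed {top_percent}%")
```

This predicts roughly 51% per process for two workers and 26% for four, in the same ballpark as the 60% and 30% reported by top. That is consistent with the workers sitting partly idle while they wait for requests, rather than with the server itself being saturated.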
The gunicorn docs strongly recommend running it behind NGINX. Maybe give that a try and see how that affects performance.