Performance

Comparative research of web applications frameworks' performance

Master's Dissertation at Poznan University of Technology.

Jurek Muszyński, October 2022

This article is an abridged version of the paper.


Goal

Web application frameworks differ in performance. Some rankings are available, but they usually use a very simple model, without user sessions.

This research tries to simulate the typical use case of a web application with user sessions. The number of connections and sessions in the test should be sufficient to expose potential inefficiencies in handling both lists by the framework or its server engine.

The tested frameworks are three popular ones (Spring Boot, Node.js and Flask) and Node++.


Background

Operating System

At the operating system level, the basic role is played by TCP. A connection is established in three steps:

[Figure: TCP connection establishment. Source: [7]]

The listening server uses one of the TCP ports. An incoming connection starts with a SYN packet sent by a client (1). If the server is free, it sends a SYN-ACK response, creates a new socket and puts it in the SYN-RCVD queue. After receiving an ACK packet from the client (2), the connection is moved from the SYN-RCVD queue to the accept-ready queue. The application, which has earlier put its socket into the listening state with the listen() system call, uses accept() to actually accept the connection (3).

The size of the queue of waiting connections is set with the somaxconn system variable. If the queue is full, incoming SYN packets are rejected.

The second parameter influencing potential queue saturation is tcp_keepinit, which sets the waiting time for the ACK packet. On the one hand, it can't be shorter than a full round trip; on the other, a value that is too high increases vulnerability to SYN flood attacks.
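The sequence above can be observed from the application side with a minimal Python sketch. It assumes a loopback connection and an arbitrary backlog of 8; neither value comes from the test setup. The client's connect() returns once the kernel has completed the handshake, and only then does the server take the connection from the queue with accept().

```python
import socket

# Listening socket on an ephemeral loopback port (port 0 lets the OS choose).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(8)  # requested backlog (assumed value); the OS may cap it
port = server.getsockname()[1]

# connect() returns after the kernel has finished the three-way handshake,
# even though the server application has not called accept() yet.
client = socket.create_connection(("127.0.0.1", port))

# accept() takes the established connection from the accept-ready queue.
conn, peer = server.accept()
client.sendall(b"ping")
received = conn.recv(4)

for s in (conn, client, server):
    s.close()
```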

Application Layer

From a performance perspective, the top three factors are: connections handling, sessions handling and application code execution.

Connections handling

The most popular models include:

  1. Single-threaded server, each client starts a new process.
  2. Multi-threaded server, each client starts a new thread.
  3. Multi-threaded server with worker threads pool.
  4. Asynchronous single-threaded server.
  5. Asynchronous multi-threaded server.

Asynchronous models require some kind of polling of active connections. There are three system interfaces for this: select(), poll() and epoll (a Linux-specific family of calls). Each of them requires a slightly different approach, and they differ in potential and real performance (depending on the server implementation). epoll is not available on systems other than Linux.
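As an illustration of the asynchronous single-threaded model, here is a minimal echo-server sketch in Python. It relies on the standard selectors module, which picks the best polling mechanism available on the platform (epoll on Linux, otherwise kqueue, poll or select). The echo behaviour, the iteration limit and the timeout are assumptions made for the example and are not taken from any of the tested frameworks.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # best available: epoll, kqueue, poll or select

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, data="listener")

def run_event_loop(max_iterations):
    """Poll all registered sockets; return the number of bytes echoed."""
    echoed = 0
    for _ in range(max_iterations):
        for key, _events in sel.select(timeout=0.1):
            sock = key.fileobj
            if key.data == "listener":
                # A connection is waiting in the accept-ready queue.
                conn, _addr = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
            else:
                chunk = sock.recv(4096)
                if chunk:
                    sock.sendall(chunk)  # acceptable for tiny payloads in a sketch
                    echoed += len(chunk)
                else:  # the peer closed the connection
                    sel.unregister(sock)
                    sock.close()
    return echoed

# One thread plays both roles: the kernel buffers the client's data
# until the event loop gets to it.
client = socket.create_connection(server.getsockname())
client.sendall(b"hello")
echoed = run_event_loop(max_iterations=5)
reply = client.recv(16)

client.close()
for key in list(sel.get_map().values()):
    key.fileobj.close()
sel.close()
```

A single thread serves every connection here; the loop only touches sockets that the polling call reported as ready.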

Sessions handling

In single-instance applications, sessions handling is basically about sorting and searching. There are three main models: unsorted collection (sequential search), sorted array (binary search) and hash table. The optimal model may depend on the expected number of sessions.
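The three lookup models can be compared side by side in a short Python sketch. The session identifiers and user names below are made up for illustration; real frameworks store richer session objects.

```python
import bisect

# Hypothetical session store: (session_id, user_name) pairs in arrival order.
sessions = [("s%04d" % n, "user%d" % n) for n in (7, 3, 9, 1, 5)]

def find_sequential(sessid):
    """Unsorted collection: check entries one by one, O(n)."""
    for sid, value in sessions:
        if sid == sessid:
            return value
    return None

# Sorted array: kept ordered by session ID so binary search applies, O(log n).
sorted_sessions = sorted(sessions)
sorted_ids = [sid for sid, _ in sorted_sessions]

def find_binary(sessid):
    i = bisect.bisect_left(sorted_ids, sessid)
    if i < len(sorted_ids) and sorted_ids[i] == sessid:
        return sorted_sessions[i][1]
    return None

# Hash table: Python's dict, O(1) on average.
session_table = dict(sessions)

def find_hashed(sessid):
    return session_table.get(sessid)
```

For a handful of sessions the difference is negligible; it is with a growing list that the sequential search cost becomes visible, which is why the test keeps many sessions alive at once.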

Application code execution

Three possibilities exist here and they significantly differ in terms of performance:

  1. Application code is part of the server code.
  2. Application code is run as a separate process (e.g. CGI).
  3. Application code is run in a different worker thread or process (RPC).
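The difference between the first two models can be sketched in Python as follows. This is only an illustration: the handler and the one-line script are hypothetical, and real CGI passes request data through environment variables and standard input rather than a command-line argument. The third model (a long-lived worker reached through RPC) is left out for brevity.

```python
import subprocess
import sys

def handle_request(path):
    """Application code: a trivial response body for the example."""
    return "Data accepted" if path == "/upload" else "Not found"

# Model 1: the server calls the application code directly, in its own process.
in_process = handle_request("/upload")

# Model 2: the server starts a new process for the request (CGI-like)
# and reads the response from its standard output.
script = 'import sys; print("Data accepted" if sys.argv[1] == "/upload" else "Not found")'
completed = subprocess.run(
    [sys.executable, "-c", script, "/upload"],
    capture_output=True, text=True, check=True,
)
separate_process = completed.stdout.strip()
```

Both variants produce the same response, but the second one pays for process creation and interpreter start-up on every request.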

Web Application Model

The application model used in the test consists of two HTML pages.

The first one (main) contains several KiB of text, a ~100 KiB picture and a paragraph showing the current session ID. There is also a ~200 KiB CSS file referenced in the header:

[Figure: Web Application Model]

The second page is only an upload confirmation saying "Data accepted".


Applications

All frameworks have been used with their default settings, except for Flask, where Waitress has been used instead of the default development server.

The snippets below show the main request handling part for each framework:

Spring Boot

@GetMapping("/")
public String index(HttpSession session) {
    String response1 = "<!DOCTYPE html>"
        + "<html>"
        + "<head>"
        + "<meta charset=\"UTF-8\">"
        + "<link rel=\"stylesheet\" type=\"text/css\" href=\"/bootstrap.css\">"
        + "<title>Generic Web Application</title>"
        + "</head>"
        + "<body>"
        + "<h1>Generic Web Application</h1>"
        + "<img src=\"/servers.jpg\" width=\"700\">"
        + "<h2>Lorem Ipsum</h2>"
        + lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum;
    String response2 = "<p>" + session.getId() + "</p></body></html>";
    return response1 + response2;
}

@PostMapping("/upload")
public String upload(HttpSession session) {
    return "Data accepted";
}

Node.js

app.get("/", (req, res) => {
    res.setHeader("Content-Type", "text/html");
    res.write("<!DOCTYPE html>"
        + "<html>"
        + "<head>"
        + "<meta charset=\"UTF-8\">"
        + "<link rel=\"stylesheet\" type=\"text/css\" href=\"/bootstrap.css\">"
        + "<title>Generic Web Application</title>"
        + "</head>"
        + "<body>"
        + "<h1>Generic Web Application</h1>"
        + "<img src=\"/servers.jpg\" width=\"700\">"
        + "<h2>Lorem Ipsum</h2>"
        + lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum);
    res.write("<p>" + req.session.id + "</p>");
    res.end("</body></html>");
});

app.post("/upload", (req, res) => {
    res.end("Data accepted");
});

Flask

@app.route("/")
def main():
    session["uid"] = create_identifier()
    response = (
        "<!DOCTYPE html>"
        "<html>"
        "<head>"
        "<meta charset=\"UTF-8\">"
        "<link rel=\"stylesheet\" type=\"text/css\" href=\"/bootstrap.css\">"
        "<title>Generic Web Application</title>"
        "</head>"
        "<body>"
        "<h1>Generic Web Application</h1>"
        "<img src=\"/servers.jpg\" width=\"700\">"
        "<h2>Lorem Ipsum</h2>"
    )
    response += lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum + lorem_ipsum
    response += "<p>" + str(session["uid"]) + "</p>"
    response += "</body></html>"
    return response

@app.route("/upload", methods=["POST"])
def upload():
    return "Data accepted"

Instead of the default built-in HTTP 1.0 server (Werkzeug), the production-grade Waitress server has been used.

Node++

if ( REQ("") )  /* all HTTP methods */
{
    OUT("<!DOCTYPE html>");
    OUT("<html>");
    OUT("<head>");
    OUT("<meta charset=\"UTF-8\">");
    OUT("<link rel=\"stylesheet\" type=\"text/css\" href=\"/bootstrap.css\">");
    OUT("<title>Generic Web Application</title>");
    OUT("</head>");
    OUT("<body>");
    OUT("<h1>Generic Web Application</h1>");
    OUT("<img src=\"/servers.jpg\" width=\"700\">");
    OUT("<h2>Lorem Ipsum</h2>");
    OUT_SNIPPET("lorem_ipsum.html");
    OUT("<p>%s</p>", SESSION.sessid);
    OUT("</body></html>");
    RES_DONT_CACHE;
}
else if ( REQ("upload") && REQ_POST )  /* POST only */
{
    OUT("Data accepted");
}
else
{
    RES_STATUS(404);
}

Client

The client program has been written specifically for this research to allow maximum flexibility.

It's a command-line tool that sends batches of series of HTTP requests. Its priorities were to ensure maximum performance and to emulate browser traffic as closely as possible.

Options:

perf -[bcfgruv] url
  -b = number of batches (default=1)
  -c = connection mode: 0=always disconnect, 1=keep (default), 2=randomly disconnect
       randomly means disconnecting after 40 - 167 requests
  -f = fixed resources' list (bootstrap.css + servers.jpg)
  -g = get images / scripts / CSS
  -r = number of requests (default=1)
  -u = upload test
  -v = verbose mode

Each batch starts a separate process. Each process sends r requests.

Important features of this program include:

Random disconnecting makes it possible to test the server's ability to search the sessions list in a typical situation, that is, when a newly connected client presents a valid session cookie.

Static resources are requested only once per session.

The -f option skips uncompressing and parsing the response and requests a fixed list of static resources instead.

The -u option adds an extra POST request with a 15 KiB payload to all of the above.
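The random-disconnect mode can be pictured with the following Python sketch. Only the 40 - 167 range comes from the options listed above; the function name, the uniform draw and the fixed seed are assumptions for illustration, since the client's source is not shown here.

```python
import random

def connection_segments(total_requests, rng, low=40, high=167):
    """Split total_requests into runs, each sent over one connection.

    After every run the (hypothetical) client would disconnect, reconnect
    and present the same session cookie again.
    """
    segments = []
    remaining = total_requests
    while remaining > 0:
        run = min(rng.randint(low, high), remaining)
        segments.append(run)
        remaining -= run
    return segments

# One batch of 10,000 requests, as in the read test.
segments = connection_segments(10_000, random.Random(1))
```

Every entry in segments except possibly the last lies between 40 and 167, so a single batch reconnects dozens of times while keeping one session.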

The result is presented as a number of requests successfully executed (status 200 was received) during the time passed between the first and the last request:

100,000 request(s)
------------------------------------------------------------
elapsed: 3,725.19 ms => 26,844.24 request(s) per second
------------------------------------------------------------

Test Environment

A local area network of 11 computers connected through 1 Gbps Ethernet has been used.

Hardware and system: Intel i7-8700 @ 3.2 GHz / 16 GiB RAM / Ubuntu 18.04.


Test Procedure

One of the computers hosted the application and the remaining 10 were used to run the clients.

Each application has been tested after a fresh restart. Before each test run, it was verified visually in the browser that the application works as expected.

On each of the client machines there was a script waiting for a flag copied from the master so the test would start simultaneously on all of them.

Read

The following options have been used on the client computers:

perf -f -g -c 2 -b 10 -r 10000 10.10.94.2:8080

10 batches on 10 computers resulted in a maximum of 100 connections and 100 sessions at the same time. Each batch sent 10,000 requests, so the total number of requests per test run was 1 million. Connections have been randomly disconnected to simulate real user traffic, as described in the Client section.

Write

Upload test client options:

perf -g -c 2 -b 10 -r 10000 -u 10.10.94.2:8080

Average

For each test run both extremes have been removed from the results and the average was calculated using the remaining 8.
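The arithmetic behind the reported figures can be reconstructed with a short Python sketch. It assumes that each "elapsed" line in the Results section is one client machine sending 100,000 requests (10 batches of 10,000), and that the "Total average" is the mean of the kept per-client rates multiplied by the 10 client machines. The paper's exact computation is not shown, so this is an inference from the published numbers.

```python
from statistics import mean

REQUESTS_PER_CLIENT = 10 * 10_000  # 10 batches of 10,000 requests each
CLIENT_MACHINES = 10

def rate(elapsed_ms, requests=REQUESTS_PER_CLIENT):
    """Requests per second achieved by one client machine."""
    return requests / (elapsed_ms / 1000.0)

def trimmed_mean(values):
    """Drop the lowest and the highest value, average the rest."""
    ordered = sorted(values)
    return mean(ordered[1:-1])

# Spring Boot read test: the eight per-client rates listed in the Results
# section (the two extremes are already removed there).
kept = [2136.15, 2086.51, 2099.46, 2109.73, 2095.83, 2110.60, 2130.58, 2144.07]
total = mean(kept) * CLIENT_MACHINES
```

Under these assumptions, total comes out at roughly 21,141 requests per second, close to the 21,139 reported for that test; the small difference presumably comes from rounding of the listed values.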


Results

Spring Boot

Read
elapsed: 46,813.18 ms => 2,136.15 request(s) per second
elapsed: 47,926.97 ms => 2,086.51 request(s) per second
elapsed: 47,631.23 ms => 2,099.46 request(s) per second
elapsed: 47,399.38 ms => 2,109.73 request(s) per second
elapsed: 47,713.89 ms => 2,095.83 request(s) per second
elapsed: 47,379.95 ms => 2,110.60 request(s) per second
elapsed: 46,935.67 ms => 2,130.58 request(s) per second
elapsed: 46,640.25 ms => 2,144.07 request(s) per second

Total average = 21,139 rps

Write
elapsed: 72,522.22 ms => 1,378.89 request(s) per second
elapsed: 71,584.11 ms => 1,396.96 request(s) per second
elapsed: 69,536.82 ms => 1,438.09 request(s) per second
elapsed: 70,441.83 ms => 1,419.61 request(s) per second
elapsed: 70,823.79 ms => 1,411.95 request(s) per second
elapsed: 70,598.05 ms => 1,416.47 request(s) per second
elapsed: 69,989.89 ms => 1,428.78 request(s) per second
elapsed: 70,103.33 ms; did not complete successfully
WARNING: finish_client_io timeouted (was waiting for 6005.96 ms)
ERROR: Could not connect
ERROR: Remote call failed
batch 0, status 256

Total average = 14,129 rps

As we can see, one of the client processes shows a connection error. Repeating the test did not help. The test was nevertheless accepted, as this kind of error is relatively easy to handle.

Node.js

Read
elapsed: 92,287.14 ms => 1,083.57 request(s) per second
elapsed: 92,779.14 ms => 1,077.83 request(s) per second
elapsed: 92,977.05 ms => 1,075.53 request(s) per second
elapsed: 93,142.58 ms => 1,073.62 request(s) per second
elapsed: 93,075.82 ms => 1,074.39 request(s) per second
elapsed: 92,866.80 ms => 1,076.81 request(s) per second
elapsed: 92,677.06 ms => 1,079.02 request(s) per second
elapsed: 92,389.05 ms => 1,082.38 request(s) per second

Total average = 10,776 rps

Write
elapsed: 106,601.79 ms => 938.07 request(s) per second
elapsed: 106,265.94 ms => 941.04 request(s) per second
elapsed: 106,545.51 ms => 938.57 request(s) per second
elapsed: 106,352.96 ms => 940.27 request(s) per second
elapsed: 106,665.91 ms => 937.51 request(s) per second
elapsed: 106,690.63 ms => 937.29 request(s) per second
elapsed: 107,269.39 ms => 932.23 request(s) per second
elapsed: 106,280.18 ms => 940.91 request(s) per second

Total average = 9,380 rps

Flask

Read
elapsed: 851,655.15 ms => 117.42 request(s) per second
elapsed: 853,273.23 ms => 117.20 request(s) per second
elapsed: 853,255.02 ms => 117.20 request(s) per second
elapsed: 853,004.98 ms => 117.23 request(s) per second
elapsed: 852,403.81 ms => 117.32 request(s) per second
elapsed: 852,203.22 ms => 117.34 request(s) per second
elapsed: 852,030.18 ms => 117.37 request(s) per second
elapsed: 851,589.37 ms => 117.43 request(s) per second

Total average = 1,170 rps

Write
elapsed: 617,015.17 ms => 162.07 request(s) per second
elapsed: 616,947.21 ms => 162.09 request(s) per second
elapsed: 615,603.21 ms => 162.44 request(s) per second
elapsed: 616,350.51 ms => 162.25 request(s) per second
elapsed: 615,620.10 ms => 162.44 request(s) per second
elapsed: 615,584.22 ms => 162.45 request(s) per second
elapsed: 615,158.37 ms => 162.56 request(s) per second
elapsed: 614,654.35 ms => 162.69 request(s) per second

Total average = 1,620 rps

Node++

Read
elapsed: 19,304.83 ms => 5,180.05 request(s) per second
elapsed: 19,798.81 ms => 5,050.81 request(s) per second
elapsed: 20,247.73 ms => 4,938.83 request(s) per second
elapsed: 20,328.37 ms => 4,919.23 request(s) per second
elapsed: 20,356.40 ms => 4,912.46 request(s) per second
elapsed: 20,259.07 ms => 4,936.06 request(s) per second
elapsed: 19,919.73 ms => 5,020.15 request(s) per second
elapsed: 19,747.00 ms => 5,064.06 request(s) per second

Total average = 50,026 rps

Write
elapsed: 68,365.14 ms => 1,462.73 request(s) per second
elapsed: 68,589.90 ms => 1,457.94 request(s) per second
elapsed: 68,555.25 ms => 1,458.68 request(s) per second
elapsed: 68,447.32 ms => 1,460.98 request(s) per second
elapsed: 67,351.95 ms => 1,484.74 request(s) per second
elapsed: 67,459.53 ms => 1,482.37 request(s) per second
elapsed: 67,028.53 ms => 1,491.90 request(s) per second
elapsed: 66,952.87 ms => 1,493.59 request(s) per second

Total average = 14,740 rps

Results Summary

Thousands of requests per second:

Test    Spring Boot   Node.js   Flask   Node++
Read           21.1      10.8     1.2     50.0
Write          14.1       9.4     1.6     14.7

Summary

As expected, compiled technologies performed better than interpreted ones. This is particularly visible in the case of Flask, where rendering the response in the read test took longer than the upload in the write test, despite more data being transmitted in the latter.


Bibliography

  1. Max Roser, Hannah Ritchie, Esteban Ortiz-Ospina, Internet, Our World in Data, 2015 (https://ourworldindata.org/internet, access 2022-06-25)
  2. Shailesh Kumar Shivakumar, Modern Web Performance Optimization, Apress, 2020
  3. Raj Jain, The Art of Computer Systems Performance Analysis, Wiley & Sons, 1991
  4. Brendan Gregg, Systems Performance: Enterprise and the Cloud (Addison-Wesley Professional Computing Series), Pearson, 2021
  5. TechEmpower, Web Framework Benchmarks (https://www.techempower.com/benchmarks, access 2022-06-25)
  6. Dan Kegel, The C10K problem (http://www.kegel.com/c10k.html, access 2022-06-25)
  7. Gaurav Banga, Peter Druschel, Measuring the Capacity of a Web Server, Department of Computer Science, Rice University, 1997
