Node++ is an asynchronous HTTP(S) engine and framework for low-latency C/C++ web applications

Node++ has been designed to:

Handle as many connections and sessions as possible before considering any kind of scaling.
Keep request latency as low as possible.

The assumption is that you run your application on the cheapest (or even free) single-CPU Linux instance from AWS or Azure and forget about scaling until you are really big, for example tens of thousands of simultaneous sessions.

The sections below describe some of the design choices made particularly with performance in mind.

Engine model

Node++ has a single-threaded, asynchronous, non-blocking engine.

Application code (npp_app_main()) is called directly by the engine thread.
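To illustrate what "called directly by the engine thread" means, here is a minimal sketch of one iteration of a single-threaded, poll()-based loop. It is not Node++'s engine code: engine_run_once() and the app_handler_t callback type are hypothetical names, and demo_handler() merely stands in for application code such as npp_app_main().

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical callback type standing in for application code. */
typedef void (*app_handler_t)(int fd, void *ctx);

/* Example handler: drains the ready descriptor and counts successful reads. */
void demo_handler(int fd, void *ctx)
{
    char buf[256];
    if (read(fd, buf, sizeof buf) > 0)
        ++*(int *)ctx;
}

/* One iteration of a single-threaded loop: wait for readiness on all
   monitored descriptors, then invoke the handler on this same thread for
   each readable one. No queue and no worker thread sit in between.
   Returns the number of handled descriptors, 0 on timeout, -1 on error. */
int engine_run_once(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                    app_handler_t handler, void *ctx)
{
    int ready = poll(fds, nfds, timeout_ms);
    if (ready <= 0)
        return ready;

    int handled = 0;
    for (nfds_t i = 0; i < nfds; ++i) {
        if (fds[i].revents & POLLIN) {
            handler(fds[i].fd, ctx);
            ++handled;
        }
    }
    return handled;
}
```

A real engine would run this in a loop and also deal with writability, errors and new connections. Because the handler runs on the engine thread, a slow handler delays every other connection, which is the trade-off this model accepts in exchange for no thread switching.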

Socket polling

There are three socket polling options:

select
poll
epoll

All of them are supported in Node++. The default is select on Windows and poll on other systems. In theory, epoll has the potential to outpace poll; however, my tests revealed that poll actually performed slightly better. I can't see a design in which adding a socket to the monitored set avoids a lookup in the connection pool. If you want to use epoll in your application anyway, add this to npp_app.h:

#define NPP_FD_MON_LINUX_EPOLL

Note that epoll is not available on Windows.
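For comparison with poll(), which receives and scans the whole pollfd array on every call, the sketch below shows the shape of the epoll API (Linux only): a descriptor is registered once with epoll_ctl(), and epoll_wait() reports only the descriptors that are ready. monitor_create() and monitor_wait() are hypothetical helpers, not Node++ functions, and the sketch makes no claim about which approach is faster.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance and registers fd once for read readiness.
   Returns the epoll descriptor or -1 on error. */
int monitor_create(int fd)
{
    int ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(ep);
        return -1;
    }
    return ep;
}

/* Waits up to timeout_ms; only ready descriptors are reported, and each one
   comes back with whatever was stored in data at registration time.
   max must be > 0. Returns the number of ready descriptors or -1. */
int monitor_wait(int ep, int *ready_fds, int max, int timeout_ms)
{
    struct epoll_event evs[64];
    if (max > 64)
        max = 64;
    int n = epoll_wait(ep, evs, max, timeout_ms);
    for (int i = 0; i < n; ++i)
        ready_fds[i] = evs[i].data.fd;
    return n;
}
```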

Connections and sessions

Connection and session pools are simple static arrays (G_connections and G_sessions).

A set of counters reduces the number of lookups when accepting a new connection (see M_first_free_ci / M_highest_used_ci and M_first_free_si / M_highest_used_si in lib/npp_eng_app.c).
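A minimal sketch of how two such counters can shorten the search for a free slot. The pool size, the conn_slot_t type and the slot_acquire() / slot_release() functions are hypothetical; only the idea of tracking the first free and the highest used index comes from the text above. Here first_free guarantees that every slot below it is taken, so the search can start there, while highest_used lets loops over active connections stop early.

```c
#include <stdbool.h>

#define MAX_CONNECTIONS 8   /* hypothetical pool size */

typedef struct {
    bool in_use;
    int  fd;
} conn_slot_t;

static conn_slot_t pool[MAX_CONNECTIONS];  /* static array, like a connection pool */
static int first_free = 0;      /* every slot below this index is in use */
static int highest_used = -1;   /* highest index currently in use, -1 if none */

/* Claims a slot, starting the search at first_free instead of 0.
   Returns the slot index or -1 when the pool is full. */
int slot_acquire(int fd)
{
    for (int i = first_free; i < MAX_CONNECTIONS; ++i) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].fd = fd;
            first_free = i + 1;
            if (i > highest_used)
                highest_used = i;
            return i;
        }
    }
    return -1;
}

/* Releases a slot and keeps both counters valid. */
void slot_release(int i)
{
    pool[i].in_use = false;
    if (i < first_free)
        first_free = i;
    while (highest_used >= 0 && !pool[highest_used].in_use)
        --highest_used;
}
```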

Connection and session arrays cross-reference each other to avoid lookups.
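The cross-referencing can be pictured as two arrays that store each other's indexes. The types, field names and sizes below are hypothetical, not Node++'s actual structures; the point is that going from a connection to its session, or back, is a single array access instead of a search.

```c
#define MAX_CONNS 4   /* hypothetical sizes */
#define MAX_SESS  4

/* Hypothetical records; an index of -1 means "not linked". */
typedef struct { int fd; int si; } conn_rec_t;            /* si: index into sess_tab */
typedef struct { char sessid[16]; int ci; } sess_rec_t;   /* ci: index into conn_tab */

static conn_rec_t conn_tab[MAX_CONNS];
static sess_rec_t sess_tab[MAX_SESS];

/* Links connection ci with session si in both directions. */
void link_conn_sess(int ci, int si)
{
    conn_tab[ci].si = si;
    sess_tab[si].ci = ci;
}

/* Breaks the link when the connection closes; the session itself survives. */
void unlink_conn(int ci)
{
    int si = conn_tab[ci].si;
    if (si >= 0)
        sess_tab[si].ci = -1;
    conn_tab[ci].si = -1;
}
```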

The sessions array is always kept sorted by sessid, which allows a binary search when a request from a newly connected client presents a session cookie.
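A minimal sketch of a sorted session array searched with the standard bsearch(). The session_t layout, the SESSID_LEN and MAX_SESSIONS values and the session_add() / session_find() helpers are hypothetical; session IDs are assumed to fit in SESSID_LEN characters. Keeping the array sorted costs a memmove() on insertion, and in return each cookie lookup is O(log n).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SESSID_LEN   15  /* hypothetical; IDs are assumed to fit */
#define MAX_SESSIONS 8   /* hypothetical pool size */

typedef struct {
    char sessid[SESSID_LEN + 1];
    int  ci;   /* index of the connection using this session */
} session_t;

static session_t sessions[MAX_SESSIONS];
static int sessions_cnt = 0;

/* bsearch() comparator: key is the sessid string, elem is an array entry. */
static int cmp_sessid(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const session_t *)elem)->sessid);
}

/* Inserts a session at its sorted position, shifting later entries right.
   Returns the new index or -1 if the array is full. */
int session_add(const char *sessid, int ci)
{
    if (sessions_cnt == MAX_SESSIONS)
        return -1;
    int pos = 0;
    while (pos < sessions_cnt && strcmp(sessions[pos].sessid, sessid) < 0)
        ++pos;
    memmove(&sessions[pos + 1], &sessions[pos],
            (size_t)(sessions_cnt - pos) * sizeof sessions[0]);
    snprintf(sessions[pos].sessid, sizeof sessions[pos].sessid, "%s", sessid);
    sessions[pos].ci = ci;
    ++sessions_cnt;
    return pos;
}

/* Binary search for the session ID taken from a request's cookie.
   Returns the index or -1 if not found. */
int session_find(const char *sessid)
{
    const session_t *hit = bsearch(sessid, sessions, (size_t)sessions_cnt,
                                   sizeof sessions[0], cmp_sessid);
    return hit ? (int)(hit - sessions) : -1;
}
```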

Static resources

All static resources smaller than or equal to resCacheTreshold are read into memory at startup. They are stored both in plain form and compressed.
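A minimal sketch of the startup step, assuming one file at a time: a file is loaded only if its size does not exceed the threshold, which plays the role of resCacheTreshold here. The res_entry_t type and res_cache_file() are hypothetical, and the compression step is only marked by an empty slot; a deflate implementation such as zlib would typically fill it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cache entry: plain bytes plus a slot for the compressed copy. */
typedef struct {
    char  *data;
    size_t len;
    char  *data_deflated;   /* compressed copy; produced by a step not shown */
    size_t len_deflated;
} res_entry_t;

/* Reads a file into memory if its size is <= threshold.
   Returns 1 if cached, 0 if the file is too big, -1 on error. */
int res_cache_file(const char *path, long threshold, res_entry_t *e)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size < 0 || size > threshold) {
        fclose(f);
        return size < 0 ? -1 : 0;
    }
    rewind(f);
    e->data = malloc((size_t)size + 1);
    if (!e->data || fread(e->data, 1, (size_t)size, f) != (size_t)size) {
        free(e->data);
        e->data = NULL;
        fclose(f);
        return -1;
    }
    fclose(f);
    e->data[size] = '\0';     /* convenience terminator for text resources */
    e->len = (size_t)size;
    e->data_deflated = NULL;  /* compression would happen here */
    e->len_deflated = 0;
    return 1;
}
```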

If a static resource is put into the resmin directory, it is first minified and then, like all other static resources, compressed and stored in memory.
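How Node++ minifies files is not described here. As a rough illustration of the minify step only, this naive in-place function strips C-style comments and collapses whitespace in CSS-like text. naive_minify() is hypothetical, and because it ignores string literals it is not safe for general CSS or JavaScript.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Naive in-place minifier for CSS-like text: drops comments, removes
   whitespace next to punctuation and collapses other whitespace runs to a
   single space. Ignores string literals. Returns the new length. */
size_t naive_minify(char *s)
{
    const char *punct = "{};:,>";
    size_t r = 0, w = 0;
    int pending_space = 0;

    while (s[r]) {
        if (s[r] == '/' && s[r + 1] == '*') {          /* skip a comment */
            r += 2;
            while (s[r] && !(s[r] == '*' && s[r + 1] == '/'))
                ++r;
            if (s[r])
                r += 2;
            continue;
        }
        if (isspace((unsigned char)s[r])) {            /* defer whitespace */
            pending_space = 1;
            ++r;
            continue;
        }
        if (pending_space && w > 0 &&
            !strchr(punct, s[w - 1]) && !strchr(punct, s[r]))
            s[w++] = ' ';                              /* keep one separator */
        pending_space = 0;
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```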

Request parsing

The request header is parsed in a single pass, except that parsing begins with a strstr() call that searches for the end of the header. This could be improved, but it does not appear to be a bottleneck at the moment.
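A minimal sketch of that approach, assuming the request buffer is NUL-terminated: strstr() first locates the blank line that ends the header, then a single pass over the header lines extracts a field (Content-Length in this example). header_length() and content_length() are hypothetical helpers, not Node++'s parser.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Returns the header length including the terminating blank line,
   or -1 if the header is not complete yet (more bytes are needed). */
long header_length(const char *buf)
{
    const char *end = strstr(buf, "\r\n\r\n");
    return end ? (long)(end - buf) + 4 : -1;
}

/* One pass over the header lines; returns the Content-Length value,
   matched case-insensitively, or -1 if the field is absent. */
long content_length(const char *buf, long hlen)
{
    const char *p = buf;
    const char *stop = buf + hlen;
    long value = -1;

    while (p < stop) {
        const char *eol = memchr(p, '\n', (size_t)(stop - p));
        if (!eol)
            break;
        if (strncasecmp(p, "Content-Length:", 15) == 0)
            value = strtol(p + 15, NULL, 10);   /* strtol skips leading spaces */
        p = eol + 1;
    }
    return value;
}
```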

Sending response

The response is sent in one go. With rendered content, the header is only ready after the content has been generated, so space is reserved in memory immediately before the content. Once the header's size is known, the header is copied into that reserved space at an offset chosen so that it ends exactly where the content begins, and the whole contiguous block is then written to the connected socket.

This operation takes place at the end of gen_response_header():

G_connections[ci].out_start = G_connections[ci].out_data + (NPP_OUT_HEADER_BUFSIZE - G_connections[ci].out_hlen);
memcpy(G_connections[ci].out_start, out_header, G_connections[ci].out_hlen);

This solution improved performance significantly.
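A self-contained sketch of the same technique, for readers who want to try it outside Node++. OUT_HEADER_BUFSIZE stands in for NPP_OUT_HEADER_BUFSIZE, and the out_conn_t type, body_area() and finish_response() are hypothetical; only the offset arithmetic mirrors the two lines quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes; OUT_HEADER_BUFSIZE plays the role of NPP_OUT_HEADER_BUFSIZE. */
#define OUT_HEADER_BUFSIZE 256
#define OUT_BODY_BUFSIZE   1024

typedef struct {
    char   out_data[OUT_HEADER_BUFSIZE + OUT_BODY_BUFSIZE]; /* reserved header area, then body */
    char  *out_start;  /* first byte to send */
    size_t out_hlen;   /* header length */
    size_t out_blen;   /* body length */
} out_conn_t;

/* The body is rendered directly after the reserved header area. */
char *body_area(out_conn_t *c)
{
    return c->out_data + OUT_HEADER_BUFSIZE;
}

/* After rendering: build the header, then copy it so that it ends exactly
   where the body begins. The response is then one contiguous block of
   out_hlen + out_blen bytes starting at out_start. Returns 0 or -1. */
int finish_response(out_conn_t *c)
{
    char header[OUT_HEADER_BUFSIZE];
    int n = snprintf(header, sizeof header,
                     "HTTP/1.1 200 OK\r\nContent-Length: %zu\r\n\r\n", c->out_blen);
    if (n < 0 || (size_t)n >= sizeof header)
        return -1;
    c->out_hlen = (size_t)n;
    c->out_start = c->out_data + (OUT_HEADER_BUFSIZE - c->out_hlen);
    memcpy(c->out_start, header, c->out_hlen);
    return 0;
}
```

The whole response can then go out with a single write(fd, c->out_start, c->out_hlen + c->out_blen), without copying the body a second time.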

Multi-process mode (ASYNC)

The multi-process mode allows local vertical scaling. Requests or calls are passed from the main process (npp_app) to service processes (npp_svc) via the CALL_ASYNC() macro. It uses POSIX message queues as the fastest messaging facility available on every Linux system.

Note that POSIX queues are not available on Windows.
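A minimal sketch of the POSIX message-queue calls involved (mq_open(), mq_send(), mq_receive()). To keep it short, both ends live in one process; in Node++ the sender would be npp_app and the receiver an npp_svc process, and the message format used by CALL_ASYNC() is not shown. mq_roundtrip(), the queue attributes and the queue name are hypothetical.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sends msg through a POSIX message queue and reads it back into out.
   Returns 0 on success, -1 on any failure. The queue is removed afterwards. */
int mq_roundtrip(const char *name, const char *msg, char *out, size_t outsz)
{
    struct mq_attr attr = {0};
    attr.mq_maxmsg = 4;       /* queue depth */
    attr.mq_msgsize = 256;    /* maximum message size in bytes */

    mqd_t q = mq_open(name, O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1)
        return -1;

    int rc = -1;
    if (mq_send(q, msg, strlen(msg) + 1, 0) == 0) {
        char buf[256];        /* must be at least mq_msgsize */
        ssize_t n = mq_receive(q, buf, sizeof buf, NULL);
        if (n > 0 && (size_t)n <= outsz) {
            memcpy(out, buf, (size_t)n);
            rc = 0;
        }
    }
    mq_close(q);
    mq_unlink(name);
    return rc;
}
```

On older glibc versions the program has to be linked with -lrt.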
