Node++ has been designed to:
- Handle as many connections and sessions as possible before any kind of scaling is considered.
- Keep request latency as low as possible.
The assumption is that you run your application on the cheapest (or even free) single-CPU AWS or Azure Linux instance and forget about scaling until you are really big, i.e. tens of thousands of concurrent sessions.
The sections below describe some of the design choices made particularly with performance in mind.
Node++ has a single-threaded, asynchronous, non-blocking engine.
Application code is called directly by the engine thread.
There are three socket-polling options: select, poll and epoll.
All of them are supported in Node++. The default for Windows is select; for other systems it's poll. In theory epoll has the potential to outpace poll; however, my tests revealed that poll actually performed slightly better. I can't see a design in which adding a descriptor to the monitored set can avoid a lookup. If you want to use epoll in your application anyway, add to
Note that epoll is not available on Windows.
Connections and sessions
Connection and session pools are simple static arrays.
A set of counters reduces the number of lookups when accepting a new connection.
The connection and session arrays cross-reference each other to avoid lookups.
The sessions array is kept sorted by sessid to allow binary search when a newly connected request presents a session cookie.
All static resources smaller than or equal to resCacheTreshold are read into memory at startup. They are stored both in plain and compressed form.
If a static resource is placed in the
resmin directory, it is first minified, then, like all other static resources, compressed and stored in memory.
The request header is parsed in a single pass, starting with
strstr() to find the header's end. This could use some improvement; however, it doesn't seem to be a bottleneck for now.
The response is sent in one go. Because with rendered content the header is only ready after the content has been generated, space is reserved in adjacent memory just before the content. Once the header's size is known, the header is copied into that reserved space at the right offset, and the whole contiguous buffer is written to the connected socket in a single call.
This operation takes place at the end of
This solution improved performance significantly.
Multi-process mode (ASYNC)
The multi-process mode allows local vertical scaling. Requests or calls are passed from the main process (
npp_app) to service processes via the
CALL_ASYNC() macro. It uses POSIX message queues, the fastest messaging facility available on every Linux system.
Note that POSIX queues are not available on Windows.
Is something wrong here? Please let us know!