The basic premise of Node.js is that I/O is expensive, and because it is expensive, we can’t afford to block while waiting for it to complete.
Many traditional web servers adopt a one-thread-per-request approach, and any I/O (database, web service, file system call…) during the request blocks that thread of execution. This is inefficient because while the thread is blocked waiting for I/O to complete, it can’t be used to handle other requests.
In short, one OS thread per TCP connection is suboptimal.
The solution is not a mystery: non-blocking I/O. An asynchronous request is issued, and nothing more is done with it until it completes, leaving the main thread of execution free to handle other requests.
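To make this concrete, here is a minimal sketch of the two styles in Node (the file name data.txt is just a placeholder for some I/O source): the blocking call holds up the whole thread, while the non-blocking one hands control back immediately and delivers the result through a callback.

```js
const fs = require('fs');

// Blocking: the whole thread stops here until the file has been read.
const contents = fs.readFileSync('data.txt', 'utf8');
console.log('blocking read finished,', contents.length, 'characters');

// Non-blocking: the read is issued, the thread moves on immediately,
// and the callback runs once the I/O has completed.
fs.readFile('data.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('non-blocking read finished,', data.length, 'characters');
});

console.log('free to handle other work while the file is being read');
```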
Cooking Analogy
If you are a reasonably experienced cook, you’d go about making Pasta Bolognese something like this:
- start boiling water
- start frying ground beef
- put pasta in water
- finish meat sauce
- serve once pasta and meat sauce are both done
but if you’re a rookie, you’d most likely end up with something like this:
- start boiling water
- wait for water to boil
- put pasta in water
- wait for pasta to cook
- start frying ground beef
- finish meat sauce
- serve once meat sauce is done
It’s clear that you’re not using your culinary skills to your full potential with the rookie approach.
I know the analogy is a bit of a stretch, but even so, I think it illustrates the basic problem with the “let’s wait for I/O to complete before we do anything else” approach.
The fact that one OS thread per TCP connection is suboptimal has been known for quite a while; Apache and Tomcat are both examples of servers in this category.
Solutions to the problem are actually well known and include:
- multiplex I/O into each thread
- use event notification mechanisms like epoll, kqueue, event ports
- adopt non-blocking I/O
- don’t share memory (or at least limit sharing)
- spawn however many threads you want
- start event loop on each thread
They all share the same basic principles, illustrated in the short sketch after this list:
- event loop
- non-blocking I/O
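In Node’s own terms the two principles look roughly like this (with timers standing in for real I/O, purely as an assumption for the example): a single thread issues several operations and the event loop invokes each callback as its operation completes.

```js
// Two operations whose "I/O" (here simulated with timers) completes at
// different times. The single thread issues both, stays unblocked, and
// the event loop dispatches each callback as its operation finishes.
setTimeout(() => console.log('slow operation done'), 200);
setTimeout(() => console.log('fast operation done'), 50);

console.log('both operations issued; the thread is already free again');
```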
The Goal of Node.js
Make it easy to write high-performance servers.
Enter JavaScript, a language everybody “knows”. A generation of programmers has grown up learning how to program in terms of ‘mouseover’ events. JavaScript is single threaded and built around non-blocking, event-driven interfaces; it has no preconceived ideas about I/O; and right now there is an arms race among browser vendors over JavaScript performance.
What is Node?
Node is a command line tool that runs JavaScript (on Google’s V8 engine) and takes care of the low-level work, such as handling sockets, files, etc.
- only exposes asynchronous interfaces (non-blocking)
- has only one thread of execution (one call stack)
- has low-level network features
- has strong HTTP support (see the sketch after this list)
- is purely non-blocking, which means decent concurrency basically for free
- uses no mutex locks
- does one thing at a time, so there are no thread-safety issues
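As a rough sketch of what that looks like in practice (port 8000 is an arbitrary choice), here is a complete, non-blocking HTTP server in a handful of lines:

```js
const http = require('http');

// Each incoming request is handled by this callback; nothing in it blocks,
// so a single thread can serve many concurrent connections.
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from Node\n');
});

server.listen(8000, () => console.log('listening on port 8000'));
```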
But hang on: if Node has only a single thread, what about scaling across multiple cores? The answer is simple: you start more Node instances. Given the single event loop and the potential for starving it, Node is most likely not the solution for really CPU-intensive problems; it is a solution for I/O-bound problems.
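One way to start more instances, assuming a reasonably recent Node version that ships the built-in cluster module, is to fork one worker per CPU core and let the workers share the same listening port:

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  // Fork one worker process (one Node instance) per CPU core.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own single-threaded event loop; the workers
  // share port 8000 and incoming connections are spread among them.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8000);
}
```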
When would I use Node?
I work mainly with web applications, and the “modern Web Application” is a prime candidate for Node.
The Modern Web Application
- has a JavaScript “fat” client
- uses asynchronous client-server communication
- uses JSON as the communication format
- uses WebSockets or long polling where appropriate
At all of these things, Node excels.
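For instance, here is a sketch of the kind of endpoint such a client might talk to (the path /api/time and port 8000 are made up for illustration), responding with JSON and never blocking between requests:

```js
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/api/time') {
    // JSON is the natural exchange format for a JavaScript client.
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ now: new Date().toISOString() }));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8000);
```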
Finally, the idea of using the same language on both the client and the server is obviously extremely attractive, and the possibilities are endless.