
Node.js: Thread pool

Let us begin by defining a thread. Wikipedia states that -

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process.

 

In Node.js, there are two types of threads: one Event Loop, also known as the main thread, and a pool of k Workers in a Worker Pool, also known as a thread pool. The libuv library maintains this pool of threads, which Node.js uses in the background to perform long-running operations without blocking the main thread. Node.js hands "expensive" tasks to the Worker Pool: I/O for which the operating system does not provide a non-blocking version, as well as CPU-intensive work. Ultimately, these threads are executed by the processor's cores.


Let's say a machine is quad-core, meaning it has four cores. The maximum number of threads it can run truly in parallel is therefore four. Please keep in mind that we're talking about physical cores here, not logical cores. Hyper-threading, a feature of most modern (e.g. Intel) chips, lets a single physical core run two hardware threads concurrently; these hardware threads are commonly referred to as logical cores. As previously stated, when we run Node.js, libuv manages the worker threads. The UV_THREADPOOL_SIZE environment variable (default: 4) controls the number of threads libuv uses, so with UV_THREADPOOL_SIZE=4, libuv can run four tasks concurrently. However, there is one more component to consider in this scenario: the scheduler. The operating system's scheduler determines which threads get CPU time and executes them, based on a variety of factors, including priority.


To investigate the concept further, let's run a simple piece of code that performs a computational task. Take a look at the following code -



I'm using Apache Bench (ab) to evaluate performance. My current system has 12 logical cores in total. We will send a total of 1,000 requests in batches of 100 concurrent requests. To accomplish this, use the following command -
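The command itself did not survive the export. With ab, 1,000 total requests at a concurrency of 100 would look like this (the URL and port are assumptions):

```shell
ab -n 1000 -c 100 http://localhost:3000/
```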


 Apache benchmark results (4 threads) 



To compare performance, let's change the UV_THREADPOOL_SIZE value to 1, 2, 6, and 50.
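The pool size can be varied per run through the environment, without touching the code; for example (assuming the test server is saved as app.js):

```shell
UV_THREADPOOL_SIZE=1 node app.js
UV_THREADPOOL_SIZE=2 node app.js
UV_THREADPOOL_SIZE=6 node app.js
UV_THREADPOOL_SIZE=50 node app.js
```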


Apache benchmark results (1 thread) 

 

Apache benchmark results (2 threads) 



Apache benchmark results (6 threads) 



Does this imply that if we continue to increase the number of threads, performance will improve? Let's check for 50 threads.

Apache benchmark results (50 threads) 



Surprising, isn't it? Instead of improving, performance deteriorated. Let us summarise the results in a table for easier comparison -



So, in an ideal situation, my system can run up to 12 threads at the same time, and the two major components that manage the threads in this process are -

  • Libuv
  • Scheduler
Assume we've configured UV_THREADPOOL_SIZE to 4 and there are 5 concurrent tasks. libuv will assign the threads as -


Task 1 => Thread 1
Task 2 => Thread 2
Task 3 => Thread 3  
Task 4 => Thread 4


As a result, Task 5 has to wait until one of the other tasks completes. Furthermore, the OS scheduler determines which task is executed next. So, when there are more runnable threads than cores, the scheduler performs a process known as "context switching": it suspends a running thread and swaps in another. This may appear efficient, but it has a cost: during each switch the processor sits briefly idle while thread state is saved and restored. A single switch is cheap, but across many threads and many switches the overhead adds up, which is why the configuration with 50 threads was less efficient than the configuration with 6 threads.

To improve performance, set UV_THREADPOOL_SIZE to the number of logical cores your machine has - 12 in my case. As previously stated, increasing the size beyond the number of logical cores your hardware supports is pointless and may even degrade performance. When it comes to deployment, the last thing you want is to hard-code UV_THREADPOOL_SIZE, because your application may run in multiple environments with varying machine specifications. We therefore need a way to set the thread pool size dynamically when the application starts in a given environment.

Setting the size of the thread pool is simple, but it must be done with caution: it has to happen before anything submits work to the thread pool, because the pool is sized once, when it is first used. To accomplish this, add the following code to the top of your Node.js application's root JavaScript file -




The os module is built into Node.js. Its cpus() function returns an array with one entry per logical core, so os.cpus().length gives the total number of logical cores on your machine. What's more, if your CPU doesn't support hyper-threading, the count simply equals the number of physical cores, which is exactly what we want.

That's all I have got for today. I'll be adding more information about it in the future.

