How NGINX Handles Millions of Requests (And Apache Can’t)
Honestly? I never thought I’d become the guy who gets excited talking about web servers. But here we are. Back when I was just starting out in cloud infrastructure, I kept running into the same problem over and over. A client’s site would go down during traffic spikes.
Another would have these weird performance issues no one could explain. And everyone kept asking me the same thing: “Why does NGINX seem to handle so much more traffic than Apache?” I had no clue. I just knew it worked better. But that wasn’t good enough for me. I needed to actually understand why.
So I spent way too much time digging into how NGINX actually works. Read a bunch of stuff. Broke some things in staging. Fixed them. Asked a lot of questions to people smarter than me. And now? Now I actually get it. And it’s kind of mind-blowing how simple the idea is, even though the execution is really clever.
Why Apache Made Me Want to Pull My Hair Out
Let me take you back to around 2018. I was managing this server running Apache. Nothing fancy. Just a standard setup. One day traffic picks up a bit. Not even that much, really. But suddenly the server starts freaking out. Response times go through the roof. Everything’s slow. The whole thing almost crashes.
I’m looking at the logs trying to figure out what’s happening. And that’s when I learned how Apache actually handles connections. Here’s the deal with the classic Apache setup (the prefork and worker MPMs): every connection that’s being served gets its own dedicated process or thread for as long as it’s open. One connection = one thread. Sounds okay when you say it out loud, right? Not really.
Picture it like a restaurant with a fixed number of kitchen staff. Let’s say you have 16 cooks. That’s your CPU cores. But then you get 5,000 customers showing up. Apache’s like, “Cool, I’ll just make a station for each customer.” Now you’ve got 5,000 stations but only 16 cooks.
So those cooks are constantly running between stations, switching gears, forgetting what they were doing, starting over. It’s a mess. Nothing gets done fast. And your computer’s working overtime just switching between all these threads. Your memory is getting hammered. The whole thing just craps out.
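Just to make that concrete, here’s a tiny C sketch of the thread-per-connection pattern – not Apache’s actual code (Apache pools its workers rather than spawning a fresh thread every time), just the general shape of the model. Every accepted connection ties up a whole thread, even while that thread is doing nothing but waiting.

```c
/* Thread-per-connection sketch: the general model Apache's prefork/worker
 * MPMs follow (illustration only, not Apache code). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *handle_client(void *arg) {
    int fd = (int)(intptr_t)arg;
    char buf[4096];
    /* This read() BLOCKS: the thread sits here doing nothing until data shows up. */
    if (read(fd, buf, sizeof(buf)) > 0) {
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(fd, resp, strlen(resp));
    }
    close(fd);
    return NULL;
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) continue;
        /* One thread per connection: 5,000 clients means 5,000 threads
         * fighting over your 16 cores. */
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, (void *)(intptr_t)client_fd);
        pthread_detach(tid);
    }
}
```

The problem isn’t any single thread. It’s five thousand of them, each with its own stack, all parked on blocking reads while the kernel burns CPU switching between them.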
The frustrating part? This was just how things were. You either lived with it or threw more money at hardware. Neither option is great when you’re running a business.
NGINX Showed Me There’s A Better Way
Then I heard about NGINX. People kept saying it could handle way more load with way fewer resources. I was skeptical because honestly, that sounded impossible. I decided to just try it. Swapped it in on a test server. Did some load testing. Mind. Blown.
Same hardware. Same traffic. But this time it was handling everything smoothly. I ran it up to numbers that would have destroyed Apache. NGINX just… took it. Like no big deal. But I didn’t understand why. And I hate not understanding things. So I kept digging.
The Simple Idea That Changes Everything
Here’s the core idea behind NGINX, and it’s actually pretty straightforward:
Instead of making a new thread for every connection, NGINX just runs a few worker processes. Usually one for each CPU core you have. And each one of those workers? It can handle thousands of connections at the same time.
That’s it. That’s the whole thing. On my 16-core server, I’d have 16 NGINX workers instead of 5,000 Apache threads. Each worker is doing its job without all the switching and chaos. But the real question is: how does one worker handle thousands of connections without getting overwhelmed? That’s where it gets interesting.
Meet The Master And The Workers
NGINX has this hierarchy thing going on. At the top is the master process. This thing doesn’t do any of the actual work. It’s more like the manager. It reads your config file. It starts up the worker processes. If something goes wrong, it might restart a worker.
If you want to reload your config without restarting everything, the master handles that too. It’s pretty hands-off, which is exactly what you want. Then you’ve got the workers. These are the ones actually handling traffic. If you have an 8-core server, you’ll probably have 8 workers. Each one is completely independent.
They don’t share much with each other. They just do their thing. There’s also some other stuff like cache managers running around, but that’s more advanced. The main point is: master manages, workers do the work.
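Here’s a rough C sketch of that split. It’s very much not NGINX’s real source, and worker_main() is just a stand-in name I made up for the worker body. The point is how little the master does: count the cores, fork a worker per core, hang around.

```c
/* Rough sketch of the master/worker split, NOT nginx source.
 * worker_main() stands in for the real worker body (the event loop). */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static void worker_main(int worker_id) {
    /* In real NGINX this is where the worker sets up epoll/kqueue
     * and runs its event loop forever. */
    printf("worker %d (pid %d) running\n", worker_id, (int)getpid());
    for (;;) pause();  /* placeholder for the event loop */
}

int main(void) {
    /* One worker per CPU core, which is what worker_processes auto; gives you. */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);

    for (long i = 0; i < cores; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            worker_main((int)i);   /* child becomes a worker and never returns */
        }
        /* parent: keep forking, then supervise */
    }

    /* The master itself handles no traffic; it just manages the workers. */
    for (;;) pause();
}
```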
What’s Actually Happening With Your Connections
Okay so when a person visits your website, here’s what actually goes down:
Their browser makes a connection to your server. That connection lands in the queue of what’s called a “listen socket” – the kernel’s accept backlog. Think of this like a queue outside a club. You’ve got the rope, and people are waiting to get in.
One of the worker processes checks this queue. Sees someone waiting. Lets them in – that’s the accept() system call. Now they have what’s called a “connection socket.” This is the actual connection between the browser and the server. This is where the conversation happens.
But here’s the thing: the worker doesn’t just sit with this person and wait around. If that person needs something that takes time – like reading a file from disk or talking to a database – the worker makes a note of it and moves on to the next person.
If the first person just needed something quick – like reading an HTML file that’s already in cache – boom, done. Response sent. Connection closed.
But if they need something slow, the worker comes back to them later when that thing is ready. In the meantime, the worker’s helping other people. One worker is juggling hundreds or thousands of these people at the same time. Helping whoever’s ready to be helped. Parking whoever needs to wait. This is the key to the whole thing.
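The trick that makes “park them and move on” possible is that every connection socket is put into non-blocking mode. Here’s a small C fragment showing the idea (names like try_read and set_nonblocking are mine, not NGINX APIs): if the data isn’t there yet, the read returns immediately with EAGAIN instead of freezing the worker.

```c
/* Fragment: non-blocking reads. A read that can't complete yet returns
 * immediately with EAGAIN instead of stalling the worker.
 * (try_read and set_nonblocking are illustrative names, not NGINX APIs.) */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

void try_read(int conn_fd) {
    char buf[4096];
    ssize_t n = read(conn_fd, buf, sizeof(buf));
    if (n > 0) {
        /* Data was ready: parse the request and respond right now. */
    } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* Nothing there yet: make a note of this connection and move on.
         * epoll/kqueue (next section) will say when it's ready. */
    } else {
        close(conn_fd);  /* error or the client hung up */
    }
}
```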
Enter The Event Loop And System Calls
So how does a worker actually know who’s ready and who’s not? How does it track thousands of connections? This is where it gets technical, but stick with me. The worker has something called an event loop.
It’s basically running super fast, constantly asking: “Is anyone ready? Does anyone need me right now?” But it doesn’t just check every single connection manually. That would be slow and dumb.
Instead, it uses something called epoll on Linux, or kqueue on macOS and BSD. These are basically special power tools the operating system gives you. You’re like, “Hey OS, tell me which of these thousands of sockets have something for me to do.” And the OS tells you, “Socket 451 has data. Socket 823 is ready to be written to. Socket 1200 is done with its I/O.”
Then the worker processes exactly those ones. It doesn’t waste time checking sockets that have nothing for it. This is fast. Like, really fast. The event loop can cycle through thousands of sockets per second and handle them.
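Here’s a stripped-down, single-process C sketch of that loop using epoll (Linux only – kqueue works the same way conceptually). It’s nowhere near what a real NGINX worker does, but it shows the shape: one epoll_wait call, the kernel hands back only the sockets that are ready, the worker deals with exactly those and goes back to waiting. It also shows the listen socket and connection sockets from the previous section.

```c
/* Minimal single-process epoll server sketch (Linux).
 * Conceptual only: real NGINX workers do vastly more (timers, buffers,
 * protocol parsing, upstreams), but the loop has this shape. */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 1024

static void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 511);
    set_nonblocking(listen_fd);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* "Hey OS, which of these sockets have something for me?" */
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                /* New clients waiting at the rope: let them all in. */
                int conn;
                while ((conn = accept(listen_fd, NULL, NULL)) >= 0) {
                    set_nonblocking(conn);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                    epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
                }
            } else {
                /* An existing connection is ready to be read. */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r > 0) {
                    const char *resp =
                        "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
                    write(fd, resp, strlen(resp));
                } else if (r == 0 || (r < 0 && errno != EAGAIN)) {
                    epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                }
            }
        }
    }
}
```

Compile it, hit it with a load tester, and you’ll see one process comfortably juggling thousands of sockets – which is the whole point.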
Real Numbers From Actual Work
Let me give you an actual example from something I dealt with recently at HostGet. Client comes to us with a SaaS platform. They’ve got around 2 million users. During peak hours, they’re seeing about 50,000 people connected at the same time.
We put NGINX on a 32-core server. So 32 workers. Each worker is handling around 1,500 concurrent connections. Total memory for all the NGINX processes combined? Under 2GB. CPU usage during peak? Around 40-50%.
We decided to stress test it. Pushed it to 100,000 concurrent connections on the same hardware. It handled it. Average response time was still under 100 milliseconds. For basic requests, it was actually faster.
I ran the same load test on an Apache setup with similar hardware and configuration a while back. Apache started having serious issues around 15,000 to 20,000 concurrent connections. Response times got terrible. The whole thing was struggling. That’s the difference. That’s not marketing talk. That’s what actually happens.
The Difference Between NGINX And Node.js
People sometimes ask me if NGINX and Node.js are basically the same thing because they both use events and non-blocking I/O. They’re not the same thing at all.
Node.js runs your JavaScript code. It’s got one main thread by default. It uses something called libuv to handle async operations. There’s a thread pool somewhere in the background doing I/O operations. If you want to do heavy computation work, you’ve gotta create worker threads yourself. It’s powerful for building applications, but it’s a different tool.
NGINX is just handling requests. It’s not running your code. It’s built specifically to be a web server and reverse proxy. Multiple separate worker processes, each with their own event loop. No JavaScript overhead. No garbage collection pauses. Each worker is isolated.
From a stability standpoint, this isolation is huge. If something breaks in one NGINX worker, it doesn’t necessarily take down the others. With Node.js, every request runs in the same process, so a memory leak or an unhandled exception can hurt all of them at once. You need to manage memory carefully. You need to think about graceful shutdowns.
With NGINX, it’s just rock solid. Reload your config, zero downtime. A worker crashes, master starts a new one. It just keeps running.
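That “worker crashes, master starts a new one” part is plain old Unix process supervision. Here’s a toy version, picking up from the master/worker sketch earlier (again, not NGINX’s real code):

```c
/* Toy supervision loop: if a worker dies, the master forks a replacement.
 * Not NGINX's source – just the Unix pattern its master process uses. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t spawn_worker(void) {
    pid_t pid = fork();
    if (pid == 0) {            /* child: stand-in for a real worker */
        for (;;) pause();      /* the event loop would run here */
    }
    return pid;
}

int main(void) {
    for (int i = 0; i < 4; i++)   /* pretend this is a 4-core box */
        spawn_worker();

    for (;;) {
        int status;
        pid_t dead = waitpid(-1, &status, 0);  /* block until any worker exits */
        if (dead > 0) {
            /* One worker went down; the others never even noticed. Replace it. */
            fprintf(stderr, "worker %d exited, starting a new one\n", (int)dead);
            spawn_worker();
        }
    }
}
```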
Things I’ve Learned That Actually Matter
After spending way too much time with NGINX in production, here are the things that actually make a difference:
The default settings are not good enough. Out of the box, NGINX will work. But it’s not optimized. You need to mess with OS settings. File descriptor limits. Socket backlog sizes. TCP settings. We always increase these because NGINX can handle way more than what the operating system allows by default.
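To be clear about what those knobs actually are: in nginx.conf you’re usually touching worker_rlimit_nofile, worker_connections, and the backlog= parameter on listen, and on the OS side things like ulimit -n and net.core.somaxconn. Here’s a small C sketch of what two of them mean at the system-call level – the per-process file descriptor limit and the listen backlog:

```c
/* What "raise the file descriptor limit" and "bigger socket backlog" mean
 * at the system-call level. In nginx.conf the rough equivalents are
 * worker_rlimit_nofile and the backlog= parameter on listen; the kernel
 * side is ulimit -n and net.core.somaxconn. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Every open connection is a file descriptor. The default soft limit
     * (often 1024) caps concurrency long before NGINX itself would. */
    struct rlimit rl = { .rlim_cur = 65536, .rlim_max = 65536 };
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");   /* needs a high enough hard limit / privileges */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* The backlog is how many not-yet-accepted connections can queue up.
     * The kernel also clamps this to net.core.somaxconn. */
    listen(fd, 4096);

    printf("listening with a 4096-connection accept queue\n");
    close(fd);
    return 0;
}
```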
You need to watch it. Don’t just set it and forget it. You should monitor how many connections each worker has. Look at response times. See how it performs under load. Most people don’t do this and then they’re surprised when things break.
One worker per core usually makes sense. If you’re just doing reverse proxy work or basic web serving, one worker per CPU core is the sweet spot. If you’re doing Lua scripting or heavy computation at the edge, maybe fewer workers. You want your cache to stay hot.
Graceful reloads are actually amazing. This is something that still gets me excited. You can update your NGINX config and reload it without dropping a single connection. Zero downtime. Apache has a graceful restart of its own, but with NGINX this is just the normal, boring way you operate. The architecture actually makes it both possible and easy.
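Under the hood it’s just signals: nginx -s reload sends SIGHUP to the master, the master re-reads the config, forks fresh workers on it, and asks the old workers to finish their current connections and exit (NGINX uses SIGQUIT for that graceful shutdown). Here’s a toy C sketch of the master’s side of the dance – wildly simplified, and names like reload_requested are mine:

```c
/* Toy sketch of a graceful reload: SIGHUP makes the master fork new workers
 * and ask the old ones to drain. Nothing like nginx's real source; names
 * like reload_requested are made up for illustration. */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int sig) { (void)sig; reload_requested = 1; }

static pid_t start_worker(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* worker: would load the (new) config and run its event loop */
        for (;;) pause();
    }
    return pid;
}

int main(void) {
    signal(SIGHUP, on_sighup);

    pid_t old_worker = start_worker();

    for (;;) {
        pause();                       /* wait for a signal */
        if (reload_requested) {
            reload_requested = 0;
            /* 1. re-read the config (skipped here)
             * 2. start new workers that use it */
            pid_t new_worker = start_worker();
            /* 3. ask the old workers to finish their current connections
             *    and exit – NGINX uses SIGQUIT for this graceful shutdown */
            kill(old_worker, SIGQUIT);
            old_worker = new_worker;
            fprintf(stderr, "reloaded: new worker %d active\n", (int)new_worker);
        }
    }
}
```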
Why This Actually Matters
Whether you’re running a little side project or managing infrastructure for millions of people, understanding how NGINX works changes how you think about scaling. You stop thinking, “We need to buy more servers.” You start thinking, “Are we actually using what we have efficiently?”
Most people aren’t. NGINX lets you handle more traffic with less hardware. That means lower cloud bills. Faster response times. Better ability to handle traffic spikes without everything melting.
I’ve seen clients migrate from Apache or other setups to NGINX and their infrastructure costs dropped by 30-40%. While their performance actually got better. That’s not small. That’s real money and real improvement.
My Bottom Line
When I started out, I didn’t get how NGINX could handle so much more traffic than Apache. It seemed like magic or something. Turns out it’s just smart architecture. Use multiple worker processes instead of threads per connection.
Use efficient system calls to monitor thousands of connections. Use an event loop to switch between whoever’s ready. Don’t block on I/O. Keep moving. Simple ideas, really well executed. And now that I get it, I see this pattern everywhere. Other fast systems are built the same way. Once you understand how this works, you can’t unsee it.
If you’re still using Apache or something else that’s not optimized, seriously just try NGINX. Spin up a test server. Run some load tests against it. See what happens. You’ll probably be surprised.
That’s what I’ve learned from years of dealing with this stuff in the real world: building systems that actually need to handle massive traffic, and making sure they don’t fall over. NGINX is just really good at that one thing. And sometimes, being really good at one thing is exactly what you need.
