Ibraheem Ahmed

Async Rust is taking over, quickly becoming the default for libraries that do any sort of I/O. While async is incredibly useful for a large number of applications and enables patterns that simply cannot be expressed in blocking code, it also comes with a certain amount of added complexity. Because of this, there is still a large audience for blocking crates. ureq, for example, a blocking HTTP client, received over 350k downloads last month. However, the blocking ecosystem for HTTP servers is severely lacking. tiny-http, the main blocking server option, is still relatively popular despite its age, lack of maintenance, and lack of HTTP/2 support.

A few months ago I released a crate called astra, a blocking HTTP server built on top of hyper:

use astra::{Body, Response, Server};

fn main() {
    Server::bind("localhost:3000")
        .serve(|_req| Response::new(Body::new("Hello World!")))
        .expect("serve failed");
}

Using astra means you get all the features of hyper, Rust's de facto HTTP implementation, without having to deal with async/await. It is similar to the blocking module provided by reqwest, but unlike reqwest, astra does not depend on tokio at all.

How Does It Work?

hyper is fundamentally built on async I/O and requires it to run correctly, but it is generic over the runtime. To avoid depending on a large crate like tokio, astra runs a small evented I/O loop on a background thread and dispatches connections to its own worker pool. The difference is that instead of tasks yielding to a userspace runtime like tokio, they yield to the operating system. This means that request handlers can use standard I/O primitives without worrying about blocking the runtime:

use astra::{Body, ResponseBuilder, Server};
use std::time::Duration;

fn main() {
    Server::bind("localhost:3000")
        .serve(|_req| {
            // Putting the worker thread to sleep will allow other workers to run.
            std::thread::sleep(Duration::from_secs(1));

            // Regular blocking I/O is fine too!
            let body = std::fs::read_to_string("index.html").unwrap();

            ResponseBuilder::new()
                .header("Content-Type", "text/html")
                .body(Body::new(body))
                .unwrap()
        })
        .expect("serve failed");
}
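The hand-off between the I/O loop and the workers is essentially a channel: one side pushes ready work, and worker threads block on the other end until something arrives. A minimal sketch of that pattern using only the standard library (a simplification for illustration, not astra's actual implementation):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A minimal worker pool: one side pushes ready work onto a channel and
// a fixed set of worker threads block on the other end, picking jobs
// up as they arrive.
fn run_pool(jobs: Vec<i32>, workers: usize) -> i32 {
    let (job_tx, job_rx) = mpsc::channel::<i32>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<i32>();

    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        let done_tx = done_tx.clone();
        thread::spawn(move || loop {
            // The lock guard is a temporary dropped at the end of this
            // statement, so a worker never holds it while processing.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok(n) => done_tx.send(n * 2).unwrap(),
                Err(_) => break, // all senders gone: shut down
            }
        });
    }

    let count = jobs.len();
    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // close the channel so idle workers exit

    (0..count).map(|_| done_rx.recv().unwrap()).sum()
}

fn main() {
    // Two workers split four jobs; each job is doubled and summed.
    assert_eq!(run_pool(vec![1, 2, 3, 4], 2), 20);
    println!("pool processed all jobs");
}
```

In astra the "jobs" are hyper connections rather than integers, and a worker blocks on socket reads instead of a channel, but the shape is the same.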

But Is It Fast?

Many of the references you'll find about thread-per-request performance are very outdated, often referencing bottlenecks from a time when C10k was peak scale. Since then, thread creation has gotten significantly cheaper and context switching overhead has been reduced drastically. Modern OS schedulers are much better than they are given credit for, and it is very feasible to serve upwards of tens of thousands of concurrent connections using blocking I/O.
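To make the model concrete, here is the simplest possible thread-per-connection design, sketched with the standard library (an illustration of the general model, not astra's architecture - astra multiplexes connections onto a bounded pool rather than spawning a thread per socket):

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

// A bare-bones thread-per-connection echo server: every accepted
// socket gets its own OS thread, and a blocking read simply parks
// that thread until data arrives.
fn serve_echo() -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        for stream in listener.incoming() {
            let mut stream = stream.unwrap();
            thread::spawn(move || {
                let mut buf = [0u8; 512];
                // Blocks only this thread; the OS scheduler runs others.
                let n = stream.read(&mut buf).unwrap();
                stream.write_all(&buf[..n]).unwrap();
            });
        }
    });
    addr
}

fn main() {
    let addr = serve_echo();
    let mut client = TcpStream::connect(addr).unwrap();
    client.write_all(b"ping").unwrap();
    let mut buf = [0u8; 4];
    client.read_exact(&mut buf).unwrap();
    assert_eq!(&buf, b"ping");
    println!("echoed: {}", String::from_utf8_lossy(&buf));
}
```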

In naive 'Hello World' style HTTP benchmarks, astra is likely to lag behind tokio. This is partly because astra has to pay the cost of both threading and async I/O to be compatible with hyper. However, as more work is done per request, especially pure blocking I/O, the difference diminishes. As always, you should measure your own use case, but astra's performance may surprise you.

That being said, one of astra's main use cases is running a lightweight server with minimal dependencies and avoiding the complexity that comes with async, so any potential performance tradeoffs may not be relevant.

Resource Limits

astra has a configuration option for the maximum number of threads that can be spawned by the worker pool. The pool is dynamic, growing and shrinking with the load on your server. A common misconception is that the number of threads should be capped close to the number of CPUs to avoid unnecessary context switching. While this is often true for compute-bound applications, it's not at all true for an HTTP server. Threads are not much different from async tasks in this regard - just as async tasks yield cooperatively at .await points, threads context switch when they block for I/O. So when it comes to increasing the concurrency of your server, the level of hardware concurrency is not very relevant.
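A quick way to convince yourself of this: spawn far more threads than you have cores and have each one block, as a request handler waiting on I/O would (simulated here with sleep). They all wait in parallel, so the total wall time is roughly one sleep, not sixty-four:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Spawn `threads` threads that each block for `block_for` and
// measure how long it takes for all of them to finish.
fn elapsed_for(threads: usize, block_for: Duration) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || thread::sleep(block_for)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let wall = elapsed_for(64, Duration::from_millis(100));
    // Run sequentially this would take 6.4 seconds; blocked threads
    // cost the scheduler almost nothing, so they overlap fully.
    assert!(wall < Duration::from_secs(2));
    println!("64 blocked threads finished in {wall:?}");
}
```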

In fact, I/O heavy applications will generally only ever context switch when blocking for I/O, so the operating system will rarely, if ever, have to resort to preemption. This means that much of the cost often associated with a context switch is avoided.

Another common misconception is that threads allocate their entire stack on creation, making them very expensive. This is simply not true. On Linux, for example, the initial amount of memory used by a thread is only around 8 KiB; the rest of the stack is reserved virtual memory that is committed lazily, page by page, as it is actually used. Combined with the fact that context switching is quite fast on modern systems, this means that the maximum number of threads can be set relatively high. astra uses a conservative default of 15 threads per CPU, but the limit can reasonably be set in the thousands, though you may have to increase virtual memory limits through your operating system.
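This is easy to verify with std::thread::Builder, which also lets you shrink the reserved stack well below the platform default (commonly 8 MiB on Linux) when you know your handlers are shallow:

```rust
use std::sync::mpsc;
use std::thread;

// Spawn `n` threads with a small 64 KiB stack reservation and confirm
// every one of them ran. Because stack pages are committed lazily,
// this is cheap even for thousands of threads.
fn spawn_many(n: usize) -> usize {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let tx = tx.clone();
            thread::Builder::new()
                .stack_size(64 * 1024) // reserve 64 KiB instead of the default
                .spawn(move || tx.send(i).unwrap())
                .expect("spawn failed")
        })
        .collect();
    drop(tx); // close the channel once all worker senders are gone
    let delivered = rx.iter().count();
    for handle in handles {
        handle.join().unwrap();
    }
    delivered
}

fn main() {
    assert_eq!(spawn_many(2000), 2000);
    println!("spawned and joined 2000 threads");
}
```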

Security

One of the issues with blocking I/O is that it is susceptible to slow-client attacks such as Slowloris. Because of this, it is important to run your server behind an async reverse proxy such as Nginx. Most people end up doing this anyway for TLS support, but it is something to keep in mind.