The decision to time out requests quickly wasn't made to avoid having long-running requests on our router, nor to allow only fast apps on our platform, but because standard web servers do not handle these kinds of requests particularly well.
Most standard web servers work in a similar way: each new request goes into a queue, and the server processes them one after the other.
This means that if you have 30 requests in your queue, each taking 1 second to process, it will take your server 30 seconds to empty the queue. If one of those requests is, for example, a file upload that takes 5 minutes to process, every other request will be stuck behind it for 5 minutes. That's 5 minutes during which no one else can visit your app.
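The arithmetic above can be sketched in a few lines. This is purely illustrative: `queue_drain_time` is a made-up helper, not part of any server, and it models the worst case of a strictly serial queue.

```ruby
# Hypothetical model of a serial request queue: total time to drain the queue
# is simply the sum of each request's processing time.
def queue_drain_time(request_durations)
  request_durations.sum
end

fast_queue = Array.new(30, 1)          # 30 requests, 1 second each
puts queue_drain_time(fast_queue)      # 30 seconds to empty the queue

slow_queue = [300] + Array.new(29, 1)  # a 5-minute upload is first in line
# Every other request now waits at least 5 minutes before being served.
puts queue_drain_time(slow_queue)      # 329 seconds
```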
With standard servers, you would start multiple processes and set up HAProxy in front of them.
If you know our stack, this architecture will look quite familiar: it is exactly what our router does.
This means that if you need more processes to handle more requests, you just have to start more dynos. Of course, a dyno can run more than one process at a time when using a concurrent web server, such as Unicorn in Ruby, Node.js's cluster module, or Gunicorn in Python.
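As a concrete illustration, here is a minimal Unicorn configuration sketch. The values are assumptions for the example, not recommendations for your app:

```ruby
# unicorn.rb -- a minimal sketch of a concurrent server config (assumed values)
worker_processes 3    # the master forks 3 workers: 3 requests served in parallel
timeout 30            # kill a worker stuck on a single request for more than 30s
preload_app true      # load the app once in the master, then fork, to share memory
```

With 3 workers, a single slow request only blocks one of the three queues instead of the whole app.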
All these solutions are still workarounds for a real problem, though: standard web servers don't scale with long-running requests. This should be among the top things on your mind when architecting a web application. Because of this, you probably don't want to do slow work in your web process, at the risk of degrading performance.
There are two different kinds of long-running actions, each with its own solution.
Even with a web server able to handle long-running requests properly, you probably wouldn't want to run slow actions in your web process: processing a video, generating a PDF or image thumbnails, or any other action that uses a lot of memory or CPU and takes more than a few milliseconds to execute.
You will want to execute those actions asynchronously in a background process. As its name states, this process runs in the background: it won't accept web requests, and will only handle work unseen by the client.
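The pattern can be sketched with a thread standing in for the background process. In production you would use a dedicated job-queue process (Sidekiq, Resque, and similar tools exist for this), but the principle is identical: the web side enqueues a job and answers immediately, while the worker does the slow work on its own time.

```ruby
# A minimal sketch of background processing, using a thread as the "worker".
JOBS    = Queue.new
RESULTS = []

WORKER = Thread.new do
  while (job = JOBS.pop)   # pop blocks until a job arrives; nil stops the loop
    job.call               # the slow action runs here, outside any web request
  end
end

# The "web" side: enqueue the slow action, then return to the client at once.
JOBS << -> { sleep 0.05; RESULTS << "thumbnails generated" }

JOBS << nil  # sentinel so this example terminates
WORKER.join
```

The web process never waits on the slow action; it only pushes a description of the work onto the queue.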
This also means your web app can have files uploaded directly to a third-party service without ever hitting your server, preventing any long-running request and delegating the slow upload to a service dedicated to it.
Once that upload is done, you only have to send the uploaded file's name to your app, which can then do whatever it needs with it: move it to a secure location, generate thumbnails, etc.
Building programs is a lot of fun because you always find edge cases where you will need, and want, a long-running request. What if, for example, you're waiting for a PDF to finish generating before sending it to the client? You could poll every 10 seconds to check the status of your PDF. Or you could keep a long-running connection with your server and be notified as soon as the PDF is generated.
Furthermore, technologies like Node.js and Go are specifically designed for handling concurrent connections, which means they can scale even with a lot of long-running requests. The way to go here is to open a WebSocket.
On our platform, WebSockets follow the specification as closely as possible. You can open a long-running connection, and it will stay open until it is closed by either the server or the client, or after 55 seconds of inactivity. This means we can keep a connection open as long as we want, on the condition that we feed it data regularly.
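Feeding the connection regularly usually means a keep-alive loop. Here is a hedged sketch: `socket` stands in for your WebSocket library's connection object, and `ping` for whatever frame-sending method it exposes; neither is a real API from this article.

```ruby
# Keep a WebSocket alive past the router's inactivity timeout by sending
# something more often than the limit allows silence.
INACTIVITY_LIMIT = 55 # seconds of allowed inactivity on the router

def keep_alive(socket, limit: INACTIVITY_LIMIT)
  interval = limit / 2.0 # ping at half the limit to leave a safety margin
  Thread.new do
    loop do
      sleep interval
      socket.ping        # any frame resets the inactivity countdown
    end
  end
end
```

Many WebSocket libraries offer this as a built-in ping option, so check yours before rolling your own loop.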
Avoiding long-running requests will give your app better performance and make it easier to scale. Unfortunately, there is no generic way for our router to let your application know that it has responded to a request with an H12. This means an app with a very long queue can end up in a state where every request gets an H12, because your app never even reaches it within 30 seconds. This can be mitigated quite easily in Ruby with rack-timeout, or in Node.js with Express's timeout middleware.
These libraries will raise an exception in your app if a request takes longer than a given number of seconds to execute (we recommend not setting a value higher than 10 seconds). This way, requests that have stayed too long in the queue are killed much faster, avoiding that state of perpetual H12 errors.
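For Ruby, wiring up rack-timeout is a one-liner in your Rack config. A minimal sketch, assuming a Rack app called `MyApp` and the 10-second ceiling recommended above:

```ruby
# config.ru -- sketch of a rack-timeout setup (MyApp is a placeholder)
require "rack-timeout"

# Raises Rack::Timeout::RequestTimeoutException in any request
# that runs longer than 10 seconds.
use Rack::Timeout, service_timeout: 10

run MyApp
```

The exception bubbles up like any other error, so your usual error reporting will show you which requests are too slow.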