LAMPe2e
Friday 14 October 2022
Gunicorn: a blast from the past
Nowadays I don't usually configure any swap on my servers. When a web application starts using swap it is already failing - it's not really providing any contingency / safety net. At least that's what I thought up until today.
I was looking at an ELK cluster built by one of my colleagues, where I saw this:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          63922       50151        5180           1        8590       12907
Swap:         16383       10933        5450
Hmmm. Lots of swap used, but lots of free memory. And it was staying like this.
Checking with vmstat, although there was a lot of stuff in swap, nothing was moving in or out of swap.
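For reference, the check is just a case of watching the swap-in/swap-out columns - a minimal sketch:
# Report once a second, five times. The si (pages swapped in) and so
# (pages swapped out) columns staying at 0 means pages are parked in
# swap but nothing is actively paging.
vmstat 1 5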
After checking the value for VmSwap in /proc/*/status, it was clear that the footprint in swap was made up entirely of gunicorn processes. Gunicorn, in case you hadn't heard of it, is a Python application server. The number of instances it runs is fixed and defined when the server is started. I've not seen a server like that in 20 years :).
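For what it's worth, the VmSwap check looked roughly like this (a sketch rather than the exact one-liner):
# Sum VmSwap (in kB) per command name across all processes
for f in /proc/[0-9]*/status; do
  awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2, n}' "$f" 2>/dev/null
done | awk '{sum[$2]+=$1} END {for (c in sum) print sum[c], "kB", c}' | sort -rn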
- On an event-based server such as nginx or lighttpd, a new client connection just requires the server process to allocate RAM to handle the request.
- With the pre-fork servers I am familiar with, the server will adjust the number of processes to cope with the level of demand within a defined range. Some, like Apache httpd and php-fpm, implement hysteresis - they spin up new instances faster than they reap idle ones - to better cope with spikes in demand (see the httpd example after this list).
- Thread-based servers are (in my experience) a halfway house between the event-based and (variable) pre-fork servers.
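As an illustration of that hysteresis, these are the knobs involved for httpd's prefork MPM (a sketch, not a tuned configuration):
# Apache httpd prefork MPM: spawn extra children (at an increasing rate)
# when idle workers drop below MinSpareServers, but reap them only slowly
# once they exceed MaxSpareServers.
StartServers          5
MinSpareServers       5
MaxSpareServers      10
MaxRequestWorkers   150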
While the kernel is doing the job of ensuring that these idle processes are not consuming resources which could be better used elsewhere, it is perhaps a little over-zealous here. It will be more expensive to recover these from swap than it would be to fork a new instance. But changing to a variable number of processes is not really an option here. If I start seeing performance issues when this application comes under load I'll need to look at keeping these processes out of swap - which unfortunately comes at the cost of reducing available memory for the overnight batch processing handled on the cluster.
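If it comes to that, one option would be to forbid swap for just those processes - a sketch, assuming gunicorn runs under its own systemd unit on a cgroup v2 host (the unit name below is a placeholder):
# /etc/systemd/system/gunicorn.service.d/noswap.conf
[Service]
# Keep this unit's pages out of swap entirely (sets memory.swap.max=0)
MemorySwapMax=0
That needs a daemon-reload and a restart of the service to take effect.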
Saturday 13 June 2020
HTTP2 and 421 errors
My "switchboard" sits in front of our services and lets me:
- (re)route services within our network
- Upgrade services to HTTP2
- Provision certificates / manage encryption
- Configure browser-side caching
- Analyse traffic
- ...and more
Running a handful of F5 BIG-IPs is unfortunately not an option, so my switchboard is a stack of Ubuntu + nginx.
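For a flavour of what one of these front-end vhosts looks like, here is a minimal sketch (the hostname, certificate paths and upstream address are placeholders, not the real configuration):
server {
    listen 443 ssl http2;
    server_name foo.example.com;

    ssl_certificate     /etc/ssl/certs/foo.example.com.pem;
    ssl_certificate_key /etc/ssl/private/foo.example.com.key;

    # Browser-side caching for static assets
    location ~* \.(css|js|png|jpg)$ {
        expires 7d;
        proxy_pass https://192.0.2.10;
    }

    # Everything else is routed through to the origin unchanged
    location / {
        proxy_pass       https://192.0.2.10;
        proxy_set_header Host $host;
    }
}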
Up until recently the onboarding exercise had gone really well - but then I started encountering 421 errors. My initial reading kept leading me to old bugs in Chrome and other issues with HTTP2. However, I was able to reproduce the bug in the first request from a browser instance I had just started, and I could see the issue across sites using distinct certificates. I was confused. The issues I kept reading about are neatly summarized by Kevin as follows:
This is caused by the following sequence of events:
- The server and client both support and use HTTP/2.
- The client requests a page at foo.example.com.
- During TLS negotiation, the server presents a certificate which is valid for both foo.example.com and bar.example.com (and the client accepts it). This could be done with a wildcard certificate or a SAN certificate.
- The client reuses the connection to make a request for bar.example.com.
- The server is unable or unwilling to support cross-domain connection reuse (for example because you configured their SSL differently and Apache wants to force a TLS renegotiation), and serves HTTP 421.
- The client does not automatically retry with a new connection (see for example Chrome bug #546991, now fixed). The relevant RFC says that the client MAY retry, not that it SHOULD or MUST. Failing to retry is not particularly user-friendly, but might be desirable for a debugging tool or HTTP library.
Eventually I looked at the logs for the origin server (Apache 2.4.26). I hadn't considered that before as I knew it did not support HTTP2. But lo and behold, there in the logs, a 421 error against my request.
[Fri Jun 12 18:44:47.945706 2020] [ssl:error] [pid 21423:tid 140096556701440] AH02032: Hostname foo.example.net provided via SNI and hostname bar.example.com provided via HTTP have no compatible SSL setup
So disabling SSL session re-use on the connection from the nginx proxy to the origin resolved the issue:
proxy_ssl_session_reuse off;
This does mean slightly more overhead between the proxy and the origin, but since both are on the same LAN, it's not really noticeable.
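In terms of where it lives, the directive just sits alongside the proxying configuration in the affected vhosts (building on the sketch above; the upstream address is still a placeholder):
server {
    # ... listen / server_name / certificates as before ...

    # Don't resume TLS sessions to the origin across requests for different
    # front-end sites; each proxied connection negotiates afresh, which
    # avoids the AH02032 mismatch shown above.
    proxy_ssl_session_reuse off;

    location / {
        proxy_pass       https://192.0.2.10;
        proxy_set_header Host $host;
    }
}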
Although it's something of a gray area, I don't think this is a bug in nginx - I had multiple sites in nginx pointing at the same backend URL. It would be interesting to check if the issue occurs when I have multiple unique DNS names for the origin - one for each nginx front end - if it still occurs there, then that is probably a bug.