This is an interesting project - kudos for executing it. I have to admit that when I was starting out in this field, I too fantasised, "Would this software be faster, smaller and better in assembly?" Of course, assembly programming made some sense in embedded electronics, which can be very resource constrained and even specialised for one particular application. Thinking from that aspect, perhaps you should consider making this a specialised program that runs on something like a Raspberry Pi - running such a web server directly on it, without an OS (or with a very minimal OS), would make for a really cool and interesting project.
I did actually make an attempt at that once for BGGP5 [0]. (That is, making a minimal, horribly insecure 'client' implementing just enough behavior to get a response from a server.) But I got demoralized by how much space the binary blobs for the crypto algorithms took up, in comparison to the actual machine code.
What on earth are you talking about? Assembly makes sense in desktop computing as well. Have you ever, for example, watched a video? What do you think powers the codecs, JSX?
GitLab's language statistics for the x264 repo (https://code.videolan.org/videolan/x264) show that the project is 13.5% assembly; common utilities used in the inner loops of the codec have optimized assembly implementations for several CPU architectures.
A lot of the encoding side of FFmpeg now uses hand-coded assembly optimizations that take advantage of AVX-512 instructions on newer x64 processors, for a reported "100x speed increase"; this has been available in a stable form since February 2025.
[0] https://binary.golf/5/
https://www.techspot.com/news/108715-ffmpeg-gets-100x-faster...
https://news.ycombinator.com/item?id=48080587