HTTP is not simple

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

HTTP/1 may appear simple because of several reasons: it is readable text, the most simple use case is not overly complicated and existing tools like curl and browsers help making HTTP easy to play with.

The HTTP idea and concept can perhaps still be considered simple and even somewhat ingenious, but the actual machinery is not.

[goes on to describe several specific characteristics of HTTP that make it un-simple, under the headings:

  • newlines
  • whitespace
  • end of body
  • parsing numbers
  • folding headers
  • never-implemented
  • so many headers
  • not all methods are alike
  • not all headers are alike
  • spineless browsers
  • size of the specs

]

I discovered this post late, while catching up on posts in the comp.infosystems.gemini newsgroup, but I’m glad I did because it’s excellent. Daniel Stenberg is, of course, the creator of cURL and so probably knows more about the intricacies of HTTP than virtually anybody (including, most likely, some of the earliest contributors to its standards). In this post he does a fantastic job of dissecting the oft-made argument that HTTP/1 is a “simple” protocol, an argument usually based upon the claim that “if a human can speak it over telnet/netcat/etc., it’s simple”.

This argument, of course, glosses over the facts that (a) humans are not simple, and the things that we find “easy”… like reading a string of ASCII representations of digits and converting it into a representation of a number… are not necessarily easy for computers, and (b) the ways in which a human might use HTTP 0.9 through 1.1 are rarely representative of the complexities inherent in more-realistic “real world” use.
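To make point (a) concrete, here is a minimal sketch of what "just read the number" can look like for a Content-Length value. The function name `parse_content_length` and the `max_value` cap are hypothetical choices for illustration, not something taken from Daniel's post or from any particular HTTP library.

```python
import re

# Hypothetical helper for illustration only.
# Accept nothing but one or more ASCII digits.
_ASCII_DIGITS = re.compile(r"[0-9]+")


def parse_content_length(raw: str, max_value: int = 2**63 - 1) -> int:
    """Parse a Content-Length field value strictly, or raise ValueError."""
    # Optional spaces and tabs around a field value are tolerated.
    value = raw.strip(" \t")
    # Reject signs, underscores, hex prefixes, non-ASCII digits, empty strings.
    if not _ASCII_DIGITS.fullmatch(value):
        raise ValueError(f"not a plain decimal number: {raw!r}")
    number = int(value)
    # The cap is an assumption: a real implementation picks its own limit.
    if number > max_value:
        raise ValueError(f"length too large: {raw!r}")
    return number


print(parse_content_length("42"))
```

A bare `int()` call would be far more forgiving: in Python, `int("+42")`, `int("1_000")` and `int(" 42 ")` all succeed, which is exactly the kind of leniency a protocol parser probably shouldn't inherit by accident.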

Obviously Daniel’s written about Gemini, too, and I agree with some of his points there (especially the fact that the specification intermingles the transfer protocol and the recommended markup language; ick!). There’s a reasonable rebuttal here (although it has its faults too, like how it conflates the volume of data involved in the encryption handshake with the processing overhead of repeated handshakes). But now we’re going way down the rabbit hole and you didn’t come here to listen to me dissect arguments and counter-arguments about the complexities of Internet specifications that you might never use, right? (Although maybe you should: you could have been reading this blog post via Gemini, for instance…)

But if you’ve ever telnet’ted into an HTTP server and been surprised at how “simple” it was, or just have an interest in the HTTP specifications, Daniel’s post is worth a read.

You MUST listen to RFC 2119

With thanks to Ruth for sharing this with me:

RFC 2119 establishes language around requirement levels. Terms like “MUST”, “MUST NOT”, “SHOULD”, and “SHOULD NOT” are helpful when coordinating with engineers. I reference it a lot for work, as I create a lot of accessible component specifications.

Because of this familiarity—and because I’m an ass—I fired back in Discord:

I want to hire a voice actor to read 2119 in the most over the top, passive-aggressive way possible
wait, this is an achievable goal oh no

It turns out you can just pay people to do things.

I found a voice actor and hired them with the task of “Reading this very dry technical document in the most over-the-top sarcastic, passive-aggressive, condescending way possible. Like, if you think it’s too much, take that feeling, ignore it, and crank things up one more notch.”

RFC 2119 is one of the few RFCs I can identify by number alone, too. That and RFCs 1945 and 1866, for some reason, and RFC 2822 (and I guess, by proxy, 822) because I’ve had to implement its shitty date format more times than I’d like to count.
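For anyone who hasn't met that date format, here is a small sketch of what it looks like, using Python's standard `email.utils` module rather than a hand-rolled parser. The sample timestamp is only an illustration.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# An illustrative RFC 2822-style timestamp:
# day name, day, month name, year, time, numeric UTC offset.
raw = "Fri, 21 Nov 1997 09:55:06 -0600"

# Parse into a timezone-aware datetime.
parsed = parsedate_to_datetime(raw)
assert parsed.utcoffset() == timedelta(hours=-6)

# The same instant expressed in UTC.
in_utc = parsed.astimezone(timezone.utc)

# Format it back into the RFC 2822 shape.
round_tripped = format_datetime(parsed)

print(parsed.isoformat())
print(in_utc.isoformat())
print(round_tripped)
```

Leaning on the standard library is a deliberate choice here: the format mixes English day and month abbreviations with numeric offsets and a pile of obsolete variants, which is precisely what makes implementing it by hand so tedious.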

But anyway: if you’ve ever wanted to hear a (sarcastic, passive-aggressive) dramatic reading of RFC 2119, Eric – and the actor he found – have got you covered!

RFC-20

The choice of this encoding has made ASCII-compatible standards the language that computers use to communicate to this day.

Even casual internet users have probably encountered a URL with “%20” in it where there logically ought to be a space character. If we look at this RFC we see this:

   Column/Row  Symbol      Name

   2/0         SP          Space (Normally Non-Printing)

Hey would you look at that! Column 2, row 0 (2,0; 20!) is what stands for “space”. When you see that “%20”, it’s because of this RFC, which exists because of some bureaucratic decisions made in the 1950s and 1960s.

Darius Kazemi is reading a single RFC every day throughout 2019 and writing up his understanding as to the content and importance of each. It’s good reading if you’re “into” RFCs and it’s probably pretty interesting if you’re just a casual Internet historian.
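The column/row lookup in the quoted table can be sketched in a few lines. The helper `ascii_code` is a hypothetical name for illustration, and it assumes the RFC 20 layout of 8 columns by 16 rows, where the column supplies the high bits of the character code and the row the low four bits.

```python
from urllib.parse import quote, unquote


def ascii_code(column: int, row: int) -> int:
    """Combine a table column (0-7) and row (0-15) into a 7-bit character code."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    # The column is the high part of the code, the row the low four bits.
    return (column << 4) | row


space = ascii_code(2, 0)
percent_form = f"%{space:02X}"  # two hexadecimal digits after the percent sign

print(repr(chr(space)), percent_form)
print(quote(" "), repr(unquote("%20")))
```

Note that the two digits after the percent sign are hexadecimal, so the "column 2, row 0 reads as 20" shortcut only works at face value while the row is below 10. Column 2, row 15 is the slash character, which percent-encodes as %2F.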