internet

[Bloganuary] Communicate Early, Communicate Often

This post is part of my attempt at Bloganuary 2024. Today’s prompt is:

In what ways do you communicate online?

What a curious question! For me, it’s perhaps best divided into public and private communication, for which I use very different media:

Public

I’ve written before about how this site – my blog – is the centre of my digital “ecosystem”. And while the technical details may have changed since that post was published, the fundamentals have not: everything about my public communication revolves around this, right here.

For example:

When I post a non-trivial toot to my Mastodon account, a copy gets posted as a note here, and vice-versa.
When I undertake a geocaching or geohashing expedition, a log gets posted as a checkin here.
When I vlog, the primary/first version is published here; secondary copies might appear e.g. on my YouTube channel for visibility but the “official” version remains here
I’ve even retroactively imported content back into my blog, e.g. my old del.icio.us bookmarks from circa 2005-2008, many of my old Reddit posts, even old reviews I wrote on Google Maps and Amazon going back as 2003, etc., all pulled-together into a single central space¹
Content gets syndicated elsewhere via a variety of mechanisms, for visibility².

A golden cornfield with setting sun, superimposed with "Reap what you wow. Plant your content into the field of your own website." — This is what I’m talking about.

Private

For private communication online, I perhaps mostly use the following (in approximate order of volume):

Slack: we use Slack at Automattic; we use Slack at Three Rings; we’ve even got a “household” instance running for The Green!³
WhatsApp: the UI‘s annoying (but improving), but its the go-to communications platform of my of my friends and family, so it’s a big part of my online communications strategy.⁴
Email: Good old-fashioned email⁵. I prefer to encrypt, or at least sign, my email: sure, PGP/GPG‘s not perfect⁶, but it’s better than, y’know, not securing your email at all.
Discord: I’m in a couple of Discord servers, but the only one I pay any reasonable amount of attention to is the Geohashing one.
Various videoconferencing tools including Google Meet, Zoom, and Around. Sometimes you’ve just gotta get (slightly more) face-to-face.
Signal: I feel like everybody’s on WhatsApp now, and the Signal app got annoying when it stopped being able to not only send but even receive SMS messages (which aren’t technically Internet messages, usually), but I still send/receive a few Signal messages in a typical month.

That’s a very different set of tech stacks than I use in my “public” communication!

¹ My thinking is, at least in part: I’ve seen platforms come and go, and my blog’s outlived them. I’ve seen platforms change their policies or technology in ways that undermine the content I put on them, but the stuff on my blog remains under my control and I can “fix” it if I wish. Owning your data is awesome, although I perhaps do it to a more-extreme extent than many.

² I’ve used to joke that I syndicate content to e.g. Facebook to support readers who haven’t learned yet to use a feed reader. I used to, and I still do, too.

³ A great thing about having a “personal” Slack installation is that you can hook up your own integrations and bots to e.g. remind you to bring the milk in.

⁴ I’ve been experimenting with Texts to centralise several of my other platforms; I’m not convinced by it yet, but I love the thinking! Long ago, I used to love using Pidgin for simultaneous access to IRC, ICQ, MSN Messenger, Google Talk, Yahoo! Messenger and all that jazz, so I fully approve of the concept.

⁵ Okay, not actually old-fashioned because I’m not suggesting you use UUCP to send mail to protonmail!danq!dan or DECnet to deliver to danq.me::dan or something!

⁶ Most of the metadata including sender, recipient, and in most cases even subject is not encrypted.

Gemini and Spartan without a browser

A particular joy of the Gemini and Spartan protocols – and the Markdown-like syntax of Gemtext – is their simplicity.

Even without a browser, you can usually use everyday command-line tools that you might have installed already to access relatively human-readable content.

Here are a few different command-line options that should show you a copy of this blog post (made available via CapsulePress, of course):

Gemini

Gemini communicates over a TLS-encrypted channel (like HTTPS), so we need a to use a tool that speaks the language. Luckily: unless you’re on Windows you’ve probably got one installed already¹.

Using OpenSSL

This command takes the full gemini:// URL you’re looking for and the domain name it’s at. 1965 refers to the port number on which Gemini typically runs –

printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \
  openssl s_client -ign_eof -connect danq.me:1965

Using GnuTLS

GnuTLS closes the connection when STDIN closes, so we use cat to keep it open. Note inclusion of --no-ca-verification to allow self-signed certificates (optionally add --tofu for trust-on-first-use support, per the spec).

{ printf "gemini://danq.me/posts/gemini-without-a-browser\r\n"; cat -; } | \
  gnutls-cli --no-ca-verification danq.me:1965

Using Ncat

Netcat reimplementation Ncat makes Gemini requests easy:

printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \
  ncat --ssl danq.me 1965

Spartan

Spartan is a little like “Gemini without TLS“, but it sports an even-more-lightweight request format which makes it especially easy to fudge requests².

Using Telnet

Note the use of cat to keep the connection open long enough to get a response, as we did for Gemini over GnuTLS.

{ printf "danq.me /posts/gemini-without-a-browser 0\r\n"; cat -; } | \
  telnet danq.me 300

Using cURL

cURL supports the telnet protocol too, which means that it can be easily coerced into talking Spartan:

printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \
  curl telnet://danq.me:300

Using Ncat/Netcat

Because TLS support isn’t needed, this also works perfectly well with Netcat – just substitute nc/netcat or whatever your platform calls it in place of ncat:

printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \
  ncat danq.me 300

I hope these examples are useful to somebody debugging their capsule, someday.

¹ You can still install one on Windows, of course, it’s just less-likely that your operating system came with such a command-line tool built-in

² Note that the domain and path are separated in a Spartan request and followed by the size of the request payload body: zero in all of my examples

Yesterday’s Internet Today! (Woo DM 2023)

The week before last I had the opportunity to deliver a “flash talk” of up to 4 minutes duration at a work meetup in Vienna, Austria. I opted to present a summary of what I’ve learned while adding support for Finger and Gopher protocols to the WordPress installation that powers DanQ.me (I also hinted at the fact that I already added Gemini and Spring ’83 support, and I’m looking at other protocols). If you’d like to see how it went, you can watch my flash talk here or on YouTube.

If you love the idea of working from wherever-you-are but ocassionally meeting your colleagues in person for fabulous in-person events with (now optional) flash talks like this, you might like to look at Automattic’s recruitment pages…

The presentation is a shortened, Automattic-centric version of a talk I’ll be delivering tomorrow at Oxford Geek Nights #53; so if you’d like to see it in-person and talk protocols with me over a beer, you should come along! There’ll probably be blog posts to follow with a more-detailed look at the how-and-why of using WordPress as a CMS not only for the Web but for a variety of zany, clever, retro, and retro-inspired protocols down the line, so perhaps consider the video above a “teaser”, I guess?

Breakups as HTTP Response Codes

103: Early Hints ("I'm not sure this can last forever.") — 103: Early Hints (“I’m not sure this can last forever.”)

300: Multiple Choices ("There are so many ways I can do better than you.") — 300: Multiple Choices (“There are so many ways I can do better than you.”)

303: See Other ("You should date other people.") — 303: See Other (“You should date other people.”)

304: Not Modified ("With you, I feel like I'm stagnating.") — 304: Not Modified (“With you, I feel like I’m stagnating.”)

402: Payment Required ("I am a prostitute.") — 402: Payment Required (“I am a prostitute.”)

403: Forbidden ("You don't get this any more.") — 403: Forbidden (“You don’t get this any more.”)

406: Not Acceptable ("I could never introduce you to my parents.") — 406: Not Acceptable (“I could never introduce you to my parents.”)

408: Request Timeout ("You keep saying you'll propose but you never do.") — 408: Request Timeout (“You keep saying you’ll propose but you never do.”)

409: Conflict ("We hate each other.") — 409: Conflict (“We hate each other.”)

411: Length Required ("Your penis is too small.") — 411: Length Required (“Your penis is too small.”)

413: Payload Too Large ("Your penis is too big.") — 413: Payload Too Large (“Your penis is too big.”)

416: Range Not Satisfied ("Our sex life is boring and repretitive.") — 416: Range Not Satisfied (“Our sex life is boring and repretitive.”)

425: Too Early ("Your premature ejaculation is a problem.") — 425: Too Early (“Your premature ejaculation is a problem.”)

428: Precondition Failed ("You're still sleeping with your ex-!?") — 428: Precondition Failed (“You’re still sleeping with your ex-!?”)

429: Too Many Requests ("You're so demanding!") — 429: Too Many Requests (“You’re so demanding!”)

451: Unavailable for Legal Reasons ("I'm married to somebody else.") — 451: Unavailable for Legal Reasons (“I’m married to somebody else.”)

502: Bad Gateway ("Your pussy is awful.") — 502: Bad Gateway (“Your pussy is awful.”)

508: Loop Detected ("We just keep fighting.") — 508: Loop Detected (“We just keep fighting.”)

With thanks to Ruth for the conversation that inspired these pictures, and apologies to the rest of the Internet for creating them.

How WordPress and Tumblr are keeping the internet weird

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

…

[Nilay:] It is fashionable to run around saying the web is dead and that apps shape the world, but in my mind, the web’s pretty healthy for at least two things: news and shopping.

[Matt:] I think that’s your bubble, if I’m totally honest. That’s what’s cool about the web: We can live in a bubble and that can seem like the whole thing. One thing I would explicitly try to do in 2022 is make the web weirder.

…

Nilay Patel (The Verge)

The Verge interviewed Matt Mullenweg, and – as both an Automattician and a fan of the Web as a place for fun and weirdness – I really appreciated the direction the interview went in. I maintain that open web standards and platforms (as opposed to closed social media silos) are inspirational and innovative.

Emilie Reed‘s Anything a Maze lives on itch.io, and (outside of selfhosting) that’s clearly the best place for it: you couldn’t tell that story the same way on Medium; even less-so on Facebook or Twitter.

Can I use HTTP Basic Auth in URLs?

Web standards sometimes disappear

Sometimes a web standard disappears quickly at the whim of some company, perhaps to a great deal of complaint (and at least one joke).

But sometimes, they disappear slowly, like this kind of web address:

http://username:password@example.com/somewhere

If you’ve not seen a URL like that before, that’s fine, because the answer to the question “Can I still use HTTP Basic Auth in URLs?” is, I’m afraid: no, you probably can’t.

But by way of a history lesson, let’s go back and look at what these URLs were, why they died out, and how web browsers handle them today. Thanks to Ruth who asked the original question that inspired this post.

Basic authentication

The early Web wasn’t built for authentication. A resource on the Web was theoretically accessible to all of humankind: if you didn’t want it in the public eye, you didn’t put it on the Web! A reliable method wouldn’t become available until the concept of state was provided by Netscape’s invention of HTTP cookies in 1994, and even that wouldn’t see widespread for several years, not least because implementing a CGI (or similar) program to perform authentication was a complex and computationally-expensive option for all but the biggest websites.

1996’s HTTP/1.0 specification tried to simplify things, though, with the introduction of the WWW-Authenticate header. The idea was that when a browser tried to access something that required authentication, the server would send a 401 Unauthorized response along with a WWW-Authenticate header explaining how the browser could authenticate itself. Then, the browser would send a fresh request, this time with an Authorization: header attached providing the required credentials. Initially, only “basic authentication” was available, which basically involved sending a username and password in-the-clear unless SSL (HTTPS) was in use, but later, digest authentication and a host of others would appear.

Webserver software quickly added support for this new feature and as a result web authors who lacked the technical know-how (or permission from the server administrator) to implement more-sophisticated authentication systems could quickly implement HTTP Basic Authentication, often simply by adding a .htaccess file to the relevant directory. .htaccess files would later go on to serve many other purposes, but their original and perhaps best-known purpose – and the one that gives them their name – was access control.

Credentials in the URL

A separate specification, not specific to the Web (but one of Tim Berners-Lee’s most important contributions to it), described the general structure of URLs as follows:

<scheme>://<username>:<password>@<host>:<port>/<url-path>#<fragment>

At the time that specification was written, the Web didn’t have a mechanism for passing usernames and passwords: this general case was intended only to apply to protocols that did have these credentials. An example is given in the specification, and clarified with “An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.”

But once web browsers had WWW-Authenticate, virtually all of them added support for including the username and password in the web address too. This allowed for e.g. hyperlinks with credentials embedded in them, which made for very convenient bookmarks, or partial credentials (e.g. just the username) to be included in a link, with the user being prompted for the password on arrival at the destination. So far, so good.

This is why we can’t have nice things

The technique fell out of favour as soon as it started being used for nefarious purposes. It didn’t take long for scammers to realise that they could create links like this:

https://YourBank.com@HackersSite.com/

Everything we were teaching users about checking for “https://” followed by the domain name of their bank… was undermined by this user interface choice. The poor victim would actually be connecting to e.g. HackersSite.com, but a quick glance at their address bar would leave them convinced that they were talking to YourBank.com!

Theoretically: widespread adoption of EV certificates coupled with sensible user interface choices (that were never made) could have solved this problem, but a far simpler solution was just to not show usernames in the address bar. Web developers were by now far more excited about forms and cookies for authentication anyway, so browsers started curtailing the “credentials in addresses” feature.

(There are other reasons this particular implementation of HTTP Basic Authentication was less-than-ideal, but this reason is the big one that explains why things had to change.)

One by one, browsers made the change. But here’s the interesting bit: the browsers didn’t always make the change in the same way.

How different browsers handle basic authentication in URLs

Let’s examine some popular browsers. To run these tests I threw together a tiny web application that outputs the Authorization: header passed to it, if present, and can optionally send a 401 Unauthorized response along with a WWW-Authenticate: Basic realm="Test Site" header in order to trigger basic authentication. Why both? So that I can test not only how browsers handle URLs containing credentials when an authentication request is received, but how they handle them when one is not. This is relevant because some addresses – often API endpoints – have optional HTTP authentication, and it’s sometimes important for a user agent (albeit typically a library or command-line one) to pass credentials without first being prompted.

In each case, I tried each of the following tests in a fresh browser instance:

Go to http://<username>:<password>@<domain>/optional (authentication is optional).
Go to http://<username>:<password>@<domain>/mandatory (authentication is mandatory).
Experiment 1, then f0llow relative hyperlinks (which should correctly retain the credentials) to /mandatory.
Experiment 2, then follow relative hyperlinks to the /optional.

I’m only testing over the http scheme, because I’ve no reason to believe that any of the browsers under test treat the https scheme differently.

Chromium desktop family

Chrome 93 and Edge 93 both immediately suppressed the username and password from the address bar, along with the “http://” as we’ve come to expect of them. Like the “http://”, though, the plaintext username and password are still there. You can retrieve them by copy-pasting the entire address.

Opera 78 similarly suppressed the username, password, and scheme, but didn’t retain the username and password in a way that could be copy-pasted out.

Authentication was passed only when landing on a “mandatory” page; never when landing on an “optional” page. Refreshing the page or re-entering the address with its credentials did not change this.

Navigating from the “optional” page to the “mandatory” page using only relative links retained the username and password and submitted it to the server when it became mandatory, even Opera which didn’t initially appear to retain the credentials at all.

Navigating from the “mandatory” to the “optional” page using only relative links, or even entering the “optional” page address with credentials after visiting the “mandatory” page, does not result in authentication being passed to the “optional” page. However, it’s interesting to note that once authentication has occurred on a mandatory page, pressing enter at the end of the address bar on the optional page, with credentials in the address bar (whether visible or hidden from the user) does result in the credentials being passed to the optional page! They continue to be passed on each subsequent load of the “optional” page until the browsing session is ended.

Firefox desktop

Firefox 91 does a clever thing very much in-line with its image as a browser that puts decision-making authority into the hands of its user. When going to the “optional” page first it presents a dialog, warning the user that they’re going to a site that does not specifically request a username, but they’re providing one anyway. If the user says that no, navigation ceases (the GET request for the page takes place the same either way; this happens before the dialog appears). Strangely: regardless of whether the user selects yes or no, the credentials are not passed on the “optional” page. The credentials (although not the “http://”) appear in the address bar while the user makes their decision.

Similar to Opera, the credentials do not appear in the address bar thereafter, but they’re clearly still being stored: if the refresh button is pressed the dialog appears again. It does not appear if the user selects the address bar and presses enter.

Similarly, going to the “mandatory” page in Firefox results in an informative dialog warning the user that credentials are being passed. I like this approach: not only does it help protect the user from the use of authentication as a tracking technique (an old technique that I’ve not seen used in well over a decade, mind), it also helps the user be sure that they’re logging in using the account they mean to, when following a link for that purpose. Again, clicking cancel stops navigation, although the initial request (with no credentials) and the 401 response has already occurred.

Visiting any page within the scope of the realm of the authentication after visiting the “mandatory” page results in credentials being sent, whether or not they’re included in the address. This is probably the most-true implementation to the expectations of the standard that I’ve found in a modern graphical browser.

Safari desktop

Safari 14 never displays or uses credentials provided via the web address, whether or not authentication is mandatory. Mandatory authentication is always met by a pop-up dialog, even if credentials were provided in the address bar. Boo!

Once passed, credentials are later provided automatically to other addresses within the same realm (i.e. optional pages).

Older browsers

Let’s try some older browsers.

From version 7 onwards – right up to the final version 11 – Internet Explorer fails to even recognise addresses with authentication credentials in as legitimate web addresses, regardless of whether or not authentication is requested by the server. It’s easy to assume that this is yet another missing feature in the browser we all love to hate, but it’s interesting to note that credentials-in-addresses is permitted for ftp:// URLs…

…and if you go back a little way, Internet Explorer 6 and below supported credentials in the address bar pretty much as you’d expect based on the standard. The error message seen in IE7 and above is a deliberate design decision, albeit a somewhat knee-jerk reaction to the security issues posed by the feature (compare to the more-careful approach of other browsers).

These older versions of IE even (correctly) retain the credentials through relative hyperlinks, allowing them to be passed when they become mandatory. They’re not passed on optional pages unless a mandatory page within the same realm has already been encountered.

Pre-Mozilla Netscape behaved the same way. Truly this was the de facto standard for a long period on the Web, and the varied approaches we see today are the anomaly. That’s a strange observation to make, considering how much the Web of the 1990s was dominated by incompatible implementations of different Web features (I’ve written about the <blink> and <marquee> tags before, which was perhaps the most-visible division between the Microsoft and Netscape camps, but there were many, many more).

Interestingly: by Netscape 7.2 the browser’s behaviour had evolved to be the same as modern Firefox’s, except that it still displayed the credentials in the address bar for all to see.

Now here’s a real gem: pre-Chromium Opera. It would send credentials to “mandatory” pages and remember them for the duration of the browsing session, which is great. But it would also send credentials when passed in a web address to “optional” pages. However, it wouldn’t remember them on optional pages unless they remained in the address bar: this feels to me like an optimum balance of features for power users. Plus, it’s one of very few browsers that permitted you to change credentials mid-session: just by changing them in the address bar! Most other browsers, even to this day, ignore changes to HTTP Authentication credentials, which was sometimes be a source of frustration back in the day.

Finally, classic Opera was the only browser I’ve seen to mask the password in the address bar, turning it into a series of asterisks. This ensures the user knows that a password was used, but does not leak any sensitive information to shoulder-surfers (the length of the “masked” password was always the same length, too, so it didn’t even leak the length of the password). Altogether a spectacular design and a great example of why classic Opera was way ahead of its time.

The Command-Line

Most people using web addresses with credentials embedded within them nowadays are probably working with code, APIs, or the command line, so it’s unsurprising to see that this is where the most “traditional” standards-compliance is found.

I was unsurprised to discover that giving curl a username and password in the URL meant that username and password was sent to the server (using Basic authentication, of course, if no authentication was requested):

$ curl http://alpha:beta@localhost/optional Header: Basic YWxwaGE6YmV0YQ== $ curl http://alpha:beta@localhost/mandatory Header: Basic YWxwaGE6YmV0YQ==

However, wget did catch me out. Hitting the same addresses with wget didn’t result in the credentials being sent except where it was mandatory (i.e. where a HTTP 401 response and a WWW-Authenticate: header was received on the initial attempt). To force wget to send credentials when they haven’t been asked-for requires the use of the --http-user and --http-password switches:

$ wget http://alpha:beta@localhost/optional -qO- Header: $ wget http://alpha:beta@localhost/mandatory -qO- Header: Basic YWxwaGE6YmV0YQ==

lynx does a cute and clever thing. Like most modern browsers, it does not submit credentials unless specifically requested, but if they’re in the address bar when they become mandatory (e.g. because of following relative hyperlinks or hyperlinks containing credentials) it prompts for the username and password, but pre-fills the form with the details from the URL. Nice.

What’s the status of HTTP (Basic) Authentication?

HTTP Basic Authentication and its close cousin Digest Authentication (which overcomes some of the security limitations of running Basic Authentication over an unencrypted connection) is very much alive, but its use in hyperlinks can’t be relied upon: some browsers (e.g. IE, Safari) completely munge such links while others don’t behave as you might expect. Other mechanisms like Bearer see widespread use in APIs, but nowhere else.

The WWW-Authenticate: and Authorization: headers are, in some ways, an example of the best possible way to implement authentication on the Web: as an underlying standard independent of support for forms (and, increasingly, Javascript), cookies, and complex multi-part conversations. It’s easy to imagine an alternative timeline where these standards continued to be collaboratively developed and maintained and their shortfalls – e.g. not being able to easily log out when using most graphical browsers! – were overcome. A timeline in which one might write a login form like this, knowing that your e.g. “authenticate” attributes would instruct the browser to send credentials using an Authorization: header:

<form method="get" action="/" authenticate="Basic"> <label for="username">Username:</label> <input type="text" id="username" authenticate="username"> <label for="password">Password:</label> <input type="text" id="password" authenticate="password"> <input type="submit" value="Log In"> </form>

In such a world, more-complex authentication strategies (e.g. multi-factor authentication) could involve encoding forms as JSON. And single-sign-on systems would simply involve the browser collecting a token from the authentication provider and passing it on to the third-party service, directly through browser headers, with no need for backwards-and-forwards redirects with stacks of information in GET parameters as is the case today. Client-side certificates – long a powerful but neglected authentication mechanism in their own right – could act as first class citizens directly alongside such a system, providing transparent second-factor authentication wherever it was required. You wouldn’t have to accept a tracking cookie from a site in order to log in (or stay logged in), and if your browser-integrated password safe supported it you could log on and off from any site simply by toggling that account’s “switch”, without even visiting the site: all you’d be changing is whether or not your credentials would be sent when the time came.

The Web has long been on a constant push for the next new shiny thing, and that’s sometimes meant that established standards have been neglected prematurely or have failed to evolve for longer than we’d have liked. Consider how long it took us to get the <video> and <audio> elements because the “new shiny” Flash came to dominate, how the Web Payments API is only just beginning to mature despite over 25 years of ecommerce on the Web, or how we still can’t use Link: headers for all the things we can use <link> elements for despite them being semantically-equivalent!

The new model for Web features seems to be that new features first come from a popular JavaScript implementation, and then eventually it evolves into a native browser feature: for example HTML form validations, which for the longest time could only be done client-side using scripting languages. I’d love to see somebody re-think HTTP Authentication in this way, but sadly we’ll never get a 100% solution in JavaScript alone: (distributed SSO is almost certainly off the table, for example, owing to cross-domain limitations).

Or maybe it’s just a problem that’s waiting for somebody cleverer than I to come and solve it. Want to give it a go?

Standing up for developers: youtube-dl is back

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Today we reinstated youtube-dl, a popular project on GitHub, after we received additional information about the project that enabled us to reverse a Digital Millennium Copyright Act (DMCA) takedown.

…

Abby Vollmer (GitHub)

This is a Big Deal. For two reasons:

Firstly, youtube-dl is a spectacularly useful project. I’ve used it for many years to help me archive my own content, to improve my access to content that’s freely available on the platform, and to help centralise (freely available) metadata to keep my subscriptions on video-sharing sites. Others have even more-important uses for the tool. I love youtube-dl, and I’d never considered the possibility that it could be used to circumvent digital restrictions (apparently it’s got some kind of geofence-evading features you can optionally enable, for people who don’t have a multi-endpoint VPN I guess?… I note that it definitely doesn’t break DRM…) until its GitHub repo got taken down the other week.

Which was a bleeding stupid thing to use a DMCA request on, because, y’know: Barbara Streisand Effect. Lampshading that a free, open-source tool could be used for people’s convenience is likely to increase awareness and adoption, not decrease it! Huge thanks to the EFF for stepping up and telling GitHub that they’d got it wrong (this letter is great reading, by the way).

But secondly, GitHub’s response is admirable and – assuming their honour their new stance – effective. They acknowledge their mistake, then go on to set out a new process by which they’ll review takedown requests. That new process includes technical and legal review, erring on the side of the developer rather than the claimant (i.e. “innocent until proven guilty”), multiparty negotiation, and limiting the scope of takedowns by allowing violators to export their non-infringing content after the fact.

I was concerned that the youtube-dl takedown might create a FOSS “chilling effect” on GitHub. It still might: in the light of it, I for one have started backing up my repositories and those of projects I care about to an different Git server! But with this response, I’d still be confident hosting the main copy of an open-source project on GitHub, even if that project was one which was at risk of being mistaken for copyright violation.

Note that the original claim came not from Google/YouTube as you might have expected (if you’ve just tuned in) but from the RIAA, based on the fact that youtube-dl could be used to download copyrighted music videos for enjoyment offline. If you’re reminded of Sony v. Universal City Studios (1984) – the case behind the “Betamax standard” – you’re not alone.

Evolving Computer Words: “Hacker”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

Anticipatory note: based on the traffic I already get to my blog and the keywords people search for, I imagine that some people will end up here looking to learn “how to become a hacker”. If that’s your goal, you’re probably already asking the wrong question, but I direct you to Eric S. Raymond’s Guide/FAQ on the subject. Good luck.

Few words have seen such mutation of meaning over their lifetimes as the word “silly”. The earliest references, found in Old English, Proto-Germanic, and Old Norse and presumably having an original root even earlier, meant “happy”. By the end of the 12th century it meant “pious”; by the end of the 13th, “pitiable” or “weak”; only by the late 16th coming to mean “foolish”; its evolution continues in the present day.

Right, stop that! It's too silly. — The Monty Python crew were certainly the experts on the contemporary use of the word.

But there’s little so silly as the media-driven evolution of the word “hacker” into something that’s at least a little offensive those of us who probably would be described as hackers. Let’s take a look.

Hacker

What people think it means

Computer criminal with access to either knowledge or tools which are (or should be) illegal.

What it originally meant

Expert, creative computer programmer; often politically inclined towards information transparency, egalitarianism, anti-authoritarianism, anarchy, and/or decentralisation of power.

The Past

The earliest recorded uses of the word “hack” had a meaning that is unchanged to this day: to chop or cut, as you might describe hacking down an unruly bramble. There are clear links between this and the contemporary definition, “to plod away at a repetitive task”. However, it’s less certain how the word came to be associated with the meaning it would come to take on in the computer labs of 1960s university campuses (the earliest references seem to come from around April 1955).

There, the word hacker came to describe computer experts who were developing a culture of:

sharing computer resources and code (even to the extent, in extreme cases, breaking into systems to establish more equal opportunity of access),
learning everything possible about humankind’s new digital frontiers (hacking to learn, not learning to hack)
judging others only by their contributions and not by their claims or credentials, and
discovering and advancing the limits of computers: it’s been said that the difference between a non-hacker and a hacker is that a non-hacker asks of a new gadget “what does it do?”, while a hacker asks “what can I make it do?”

It is absolutely possible for hacking, then, to involve no lawbreaking whatsoever. Plenty of hacking involves writing (and sharing) code, reverse-engineering technology and systems you own or to which you have legitimate access, and pushing the boundaries of what’s possible in terms of software, art, and human-computer interaction. Even among hackers with a specific interest in computer security, there’s plenty of scope for the legal pursuit of their interests: penetration testing, security research, defensive security, auditing, vulnerability assessment, developer education… (I didn’t say cyberwarfare because 90% of its application is of questionable legality, but it is of course a big growth area.)

So what changed? Hackers got famous, and not for the best reasons. A big tipping point came in the early 1980s when hacking group The 414s broke into a number of high-profile computer systems, mostly by using the default password which had never been changed. The six teenagers responsible were arrested by the FBI but few were charged, and those that were were charged only with minor offences. This was at least in part because there weren’t yet solid laws under which to prosecute them but also because they were cooperative, apologetic, and for the most part hadn’t caused any real harm. Mostly they’d just been curious about what they could get access to, and were interested in exploring the systems to which they’d logged-in, and seeing how long they could remain there undetected. These remain common motivations for many hackers to this day.

News media though – after being excited by “hacker” ideas introduced by WarGames – rightly realised that a hacker with the same elementary resources as these teens but with malicious intent could cause significant real-world damage. Bruce Schneier argued last year that the danger of this may be higher today than ever before. The press ran news stories strongly associating the word “hacker” specifically with the focus on the illegal activities in which some hackers engage. The release of Neuromancer the following year, coupled with an increasing awareness of and organisation by hacker groups and a number of arrests on both sides of the Atlantic only fuelled things further. By the end of the decade it was essentially impossible for a layperson to see the word “hacker” in anything other than a negative light. Counter-arguments like The Conscience of a Hacker (Hacker’s Manifesto) didn’t reach remotely the same audiences: and even if they had, the points they made remain hard to sympathise with for those outside of hacker communities.

A lack of understanding about what hackers did and what motivated them made them seem mysterious and otherworldly. People came to make the same assumptions about hackers that they do about magicians – that their abilities are the result of being privy to tightly-guarded knowledge rather than years of practice – and this elevated them to a mythical level of threat. By the time that Kevin Mitnick was jailed in the mid-1990s, prosecutors were able to successfully persuade a judge that this “most dangerous hacker in the world” must be kept in solitary confinement and with no access to telephones to ensure that he couldn’t, for example, “start a nuclear war by whistling into a pay phone”. Yes, really.

Four hands on one keyboard, from CSI: Cyber — Whistling into a phone to start a nuclear war? That makes *CSI: Cyber* seem realistic [watch].

The Future

Every decade’s hackers have debated whether or not the next decade’s have correctly interpreted their idea of “hacker ethics”. For me, Steven Levy’s tenets encompass them best:

Access to computers – and anything which might teach you something about the way the world works – should be unlimited and total.

All information should be free.

Mistrust authority – promote decentralization.

Hackers should be judged by their hacking, not bogus criteria such as degrees, age, race, or position.

You can create art and beauty on a computer.

Computers can change your life for the better.

Hackers: Heroes of the Computer Revolution, Steven Levy

Given these concepts as representative of hacker ethics, I’m convinced that hacking remains alive and well today. Hackers continue to be responsible for many of the coolest and most-important innovations in computing, and are likely to continue to do so. Unlike many other sciences, where progress over the ages has gradually pushed innovators away from backrooms and garages and into labs to take advantage of increasingly-precise generations of equipment, the tools of computer science are increasingly available to individuals. More than ever before, bedroom-based hackers are able to get started on their journey with nothing more than a basic laptop or desktop computer and a stack of freely-available open-source software and documentation. That progress may be threatened by the growth in popularity of easy-to-use (but highly locked-down) tablets and smartphones, but the barrier to entry is still low enough that most people can pass it, and the new generation of ultra-lightweight computers like the Raspberry Pi are doing their part to inspire the next generation of hackers, too.

That said, and as much as I personally love and identify with the term “hacker”, the hacker community has never been less in-need of this overarching label. The diverse variety of types of technologist nowadays coupled with the infiltration of pop culture by geek culture has inevitably diluted only to be replaced with a multitude of others each describing a narrow but understandable part of the hacker mindset. You can describe yourself today as a coder, gamer, maker, biohacker, upcycler, cracker, blogger, reverse-engineer, social engineer, unconferencer, or one of dozens of other terms that more-specifically ties you to your community. You’ll be understood and you’ll be elegantly sidestepping the implications of criminality associated with the word “hacker”.

The original meaning of “hacker” has also been soiled from within its community: its biggest and perhaps most-famous advocate‘s insistence upon linguistic prescriptivism came under fire just this year after he pushed for a dogmatic interpretation of the term “sexual assault” in spite of a victim’s experience. This seems to be absolutely representative of his general attitudes towards sex, consent, women, and appropriate professional relationships. Perhaps distancing ourselves from the old definition of the word “hacker” can go hand-in-hand with distancing ourselves from some of the toxicity in the field of computer science?

(I’m aware that I linked at the top of this blog post to the venerable but also-problematic Eric S. Raymond; if anybody can suggest an equivalent resource by another author I’d love to swap out the link.)

Verdict: The word “hacker” has become so broad in scope that we’ll never be able to rein it back in. It’s tainted by its associations with both criminality, on one side, and unpleasant individuals on the other, and it’s time to accept that the popular contemporary meaning has won. Let’s find new words to define ourselves, instead.

Evolving Computer Words: “Broadband”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

Until the 17th century, to “fathom” something was to embrace it. Nowadays, it’s more likely to refer to your understanding of something in depth. The migration came via the similarly-named imperial unit of measurement, which was originally defined as the span of a man’s outstretched arms, so you can understand how we got from one to the other. But you know what I can’t fathom? Broadband.

Broadband Internet access has become almost ubiquitous over the last decade and a half, but ask people to define “broadband” and they have a very specific idea about what it means. It’s not the technical definition, and this re-invention of the word can cause problems.

Broadband

What people think it means

High-speed, always-on Internet access.

What it originally meant

Communications channel capable of multiple different traffic types simultaneously.

The Past

Throughout the 19th century, optical (semaphore) telegraph networks gave way to the new-fangled electrical telegraph, which not only worked regardless of the weather but resulted in significantly faster transmission. “Faster” here means two distinct things: latency – how long it takes a message to reach its destination, and bandwidth – how much information can be transmitted at once. If you’re having difficulty understanding the difference, consider this: a man on a horse might be faster than a telegraph if the size of the message is big enough because a backpack full of scrolls has greater bandwidth than a Morse code pedal, but the latency of an electrical wire beats land transport every time. Or as Andrew S. Tanenbaum famously put it: Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

Telegraph companies were keen to be able to increase their bandwidth – that is, to get more messages on the wire – and this was achieved by multiplexing. The simplest approach, time-division multiplexing, involves messages (or parts of messages) “taking turns”, and doesn’t actually increase bandwidth at all: although it does improve the perception of speed by giving recipients the start of their messages early on. A variety of other multiplexing techniques were (and continue to be) explored, but the one that’s most-interesting to us right now was called acoustic telegraphy: today, we’d call it frequency-division multiplexing.

What if, asked folks-you’ll-have-heard-of like Thomas Edison and Alexander Graham Bell, we were to send telegraph messages down the line at different frequencies. Some beeps and bips would be high tones, and some would be low tones, and a machine at the receiving end could separate them out again (so long as you chose your frequencies carefully, to avoid harmonic distortion). As might be clear from the names I dropped earlier, this approach – sending sound down a telegraph wire – ultimately led to the invention of the telephone. Hurrah, I’m sure they all immediately called one another to say, our efforts to create a higher-bandwidth medium for telegrams has accidentally resulted in a lower-bandwidth (but more-convenient!) way for people to communicate. Job’s a good ‘un.

Electro-acoustic telegraph "tuning fork". — If this part of Edison’s 1878 patent looks like a tuning fork, that’s not a coincidence. These early multiplexers made distinct humming sounds as they operated, owing to the movement of the synchronised forks within.

Most electronic communications systems that have ever existed have been narrowband: they’ve been capable of only a single kind of transmission at a time. Even if you’re multiplexing a dozen different frequencies to carry a dozen different telegraph messages at once, you’re still only transmitting telegraph messages. For the most part, that’s fine: we’re pretty clever and we can find workarounds when we need them. For example, when we started wanting to be able to send data to one another (because computers are cool now) over telephone wires (which are conveniently everywhere), we did so by teaching our computers to make sounds and understand one another’s sounds. If you’re old enough to have heard a fax machine call a landline or, better yet used a dial-up modem, you know what I’m talking about.

As the Internet became more and more critical to business and home life, and the limitations (of bandwidth and convenience) of dial-up access became increasingly questionable, a better solution was needed. Bringing broadband to Internet access was necessary, but the technologies involved weren’t revolutionary: they were just the result of the application of a little imagination.

Dawson can't use the Internet because someone's on the phone. — I’ve felt your pain, Dawson. I’ve felt your pain.

We’d seen this kind of imagination before. Consider teletext, for example (for those of you too young to remember teletext, it was a standard for browsing pages of text and simple graphics using an 70s-90s analogue television), which is – strictly speaking – a broadband technology. Teletext works by embedding pages of digital data, encoded in an analogue stream, in the otherwise-“wasted” space in-between frames of broadcast video. When you told your television to show you a particular page, either by entering its three-digit number or by following one of four colour-coded hyperlinks, your television would wait until the page you were looking for came around again in the broadcast stream, decode it, and show it to you.

Teletext was, fundamentally, broadband. In addition to carrying television pictures and audio, the same radio wave was being used to transmit text: not pictures of text, but encoded characters. Analogue subtitles (which used basically the same technology): also broadband. Broadband doesn’t have to mean “Internet access”, and indeed for much of its history, it hasn’t.

Here in the UK, ISDN (from 1988!) and later ADSL would be the first widespread technologies to provide broadband data connections over the copper wires simultaneously used to carry telephone calls. ADSL does this in basically the same way as Edison and Bell’s acoustic telegraphy: a portion of the available frequencies (usually the first 4MHz) is reserved for telephone calls, followed by a no-mans-land band, followed by two frequency bands of different sizes (hence the asymmetry: the A in ADSL) for up- and downstream data. This, at last, allowed true “broadband Internet”.

But was it fast? Well, relative to dial-up, certainly… but the essential nature of broadband technologies is that they share the bandwidth with other services. A connection that doesn’t have to share will always have more bandwidth, all other things being equal! Leased lines, despite technically being a narrowband technology, necessarily outperform broadband connections having the same total bandwidth because they don’t have to share it with other services. And don’t forget that not all speed is created equal: satellite Internet access is a narrowband technology with excellent bandwidth… but sometimes-problematic latency issues!

ADSL microfilter — Did you have one of these tucked behind your naughties router? This box filtered out the data from the telephone frequencies, helping to ensure that you can neither hear the pops and clicks of your ADSL connection nor interfere with it by shouting.

Equating the word “broadband” with speed is based on a consumer-centric misunderstanding about what broadband is, because it’s necessarily true that if your home “broadband” weren’t configured to be able to support old-fashioned telephone calls, it’d be (a) (slightly) faster, and (b) not-broadband.

The Future

But does the word that people use to refer to their high-speed Internet connection matter. More than you’d think: various countries around the world have begun to make legal definitions of the word “broadband” based not on the technical meaning but on the populist one, and it’s becoming a source of friction. In the USA, the FCC variously defines broadband as having a minimum download speed of 10Mbps or 25Mbps, among other characteristics (they seem to use the former when protecting consumer rights and the latter when reporting on penetration, and you can read into that what you will). In the UK, Ofcom‘s regulations differentiate between “decent” (yes, that’s really the word they use) and “superfast” broadband at 10Mbps and 24Mbps download speeds, respectively, while the Scottish and Welsh governments as well as the EU say it must be 30Mbps to be “superfast broadband”.

I’m all in favour of regulation that protects consumers and makes it easier for them to compare products. It’s a little messy that definitions vary so widely on what different speeds mean, but that’s not the biggest problem. I don’t even mind that these agencies have all given themselves very little breathing room for the future: where do you go after “superfast”? Ultrafast (actually, that’s exactly where we go)? Megafast? Ludicrous speed?

What I mind is the redefining of a useful term to differentiate whether a connection is shared with other services or not to be tied to a completely independent characteristic of that connection. It’d have been simple for the FCC, for example, to have defined e.g. “full-speed broadband” as providing a particular bandwidth.

Verdict: It’s not a big deal; I should just chill out. I’m probably going to have to throw in the towel anyway on this one and join the masses in calling all high-speed Internet connections “broadband” and not using that word for all slower and non-Internet connections, regardless of how they’re set up.

Consume less, create more

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

…

Perhaps three people will read this essay, including my parents. Despite that, I feel an immense sense of accomplishment. I’ve been sitting on buses for years, but I have more to show for my last month of bus rides than the rest of that time combined.

Smartphones, I’ve decided, are not evil. This entire essay was composed on an iPhone. What’s evil is passive consumption, in all its forms.

…

TJCX

This amazing essay really hammers home a major part of why I blog at all. Creating things on the Web is good. Creating things at all is good.

A side-effect of social media culture (repost, reshare, subscribe, like) is that it’s found perhaps the minimum-effort activity that humans can do that still fulfils our need to feel like we’ve participated in our society. With one tap we can pass on a meme or a funny photo or an outrageous news story. Or we can give a virtual thumbs-up or a heart on a friend’s holiday snaps, representing the entirety of our social interactions with them. We’re encouraged to create the smallest, lightest content possible: forty words into a Tweet, a picture on Instagram that we took seconds ago and might never look at again, on Facebook… whatever Facebook’s for these days. The “new ‘netiquette” is complicated.

I, for one, think it’d be a better world if it saw a greater diversity of online content. Instead of many millions of followers of each of a million content creators, wouldn’t it be nice to see mere thousands of each of billions? I don’t propose to erode the fame of those who’ve achieved Internet celebrity; but I’d love to migrate towards a culture in which we can all better support one another’s drive to create original content online. And do so ourselves.

The best time to write on your blog is… well, let’s be honest, it was a decade ago. But the second best time is right now. Or if you’d rather draw, or sing, or dance, or make puzzles or games or films… do that. The barrier to being a content creator has never been lower: publishing is basically free and virtually any digital medium is accessible from even the simplest of devices. Go make something, and share it with the world.

(with thanks to Jeremy for the reshare)

A poem about Silicon Valley, assembled from Quora questions about Silicon Valley

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Why do so many startups fail?
Why are all the hosts on CouchSurfing male?
Are we going to be tweeting for the rest of our lives?
Why do Silicon Valley billionaires choose average-looking wives?

What makes a startup ecosystem thrive?
What do people plan to do once they’re over 35?
Is an income of $160K enough to survive?
What kind of car does Mark Zuckerberg drive?

…

Jason O. Gilbert (Splinter News)

Post-it Note Affirmations and the Amazon Dash

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The amazon dash is a pinnacle of modern web design. It’s one of the most intrusive, complex, and resource-dependent devices we’ve introduced into our homes, yet it appears as a simple oval with a single button for a single use. The use is absurdly narrow: the button will have a picture of Tide detergent, and when you press the button, Tide detergent is sent to your door.

Zach Mandeville

Barely a week goes by between the times that I discover some horrifically over-engineered “solution” on the Internet. Amazon’s Dash buttons are terrible: disposable (plastic) single-purpose computers that could so easily have been made into something “more” – more-versatile, more-open, more-configurable, more-flexible. Indeed: people have been doing exactly that kind of thing! But the vanilla Dash button remains little more than selling you convenience (and not much convenience, if we’re honest) in exchange for more and more of your feeling of digital freedom. Yet another example of what replaced the Web we lost…

By hiding the technical processes, and simplifying the onboarding and engagement of their services, Amazon can continually reinforce your depression for a profit— and you can get name-brand laundry detergent faster.

Zach Mandeville

Also, can I just take a moment to point out how awesome Zach’s website is. Not only is it the perfect example of how fun and weird the Internet can be and having a mixture of fascinating and curious content, it’s also available via dat:// for those of you who’ve got some love for the datbaseiverse.

Finding Lena Forsen, the Patron Saint of JPEGs

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Every morning, Lena Forsen wakes up beneath a brass-trimmed wooden mantel clock dedicated to “The First Lady of the Internet.”

It was presented to her more than two decades ago by the Society for Imaging Science and Technology, in recognition of the pivotal—and altogether unexpected—role she played in shaping the digital world as we know it.

Among some computer engineers, Lena is a mythic figure, a mononym on par with Woz or Zuck. Whether or not you know her face, you’ve used the technology it helped create; practically every photo you’ve ever taken, every website you’ve ever visited, every meme you’ve ever shared owes some small debt to Lena. Yet today, as a 67-year-old retiree living in her native Sweden, she remains a little mystified by her own fame. “I’m just surprised that it never ends,” she told me recently.

…

Anna Huix (Wired)

While I’m not sure that it’s fair to say that Lena “remained a mystery” until now – the article itself identifies several events she’s attended in her capacity of “first lady of the Internet” – but this is still a great article about a picture that you might have seen but never understood the significance of nor the person in front of the lens. Oh, and it’s pronounced “lee-na”; did you know?

RFC-20

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

…

The choice of this encoding has made ASCII-compatible standards the language that computers use to communicate to this day.

Even casual internet users have probably encountered a URL with “%20” in it where there logically ought to be a space character. If we look at this RFC we see this:
   Column/Row  Symbol      Name

   2/0         SP          Space (Normally Non-Printing)
Hey would you look at that! Column 2, row 0 (2,0; 20!) is what stands for “space”. When you see that “%20”, it’s because of this RFC, which exists because of some bureaucratic decisions made in the 1950s and 1960s.

…

Darius Kazemi (365 RFCs)

Darius Kazemi is reading a single RFC every day throughout 2019 and writing up his understanding as to the content and importance of each. It’s good reading if you’re “into” RFCs and it’s probably pretty interesting if you’re just a casual Internet historian.

Prime and Punishment

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

by Josh Dzieza

For sellers, Amazon is a quasi-state. They rely on its infrastructure — its warehouses, shipping network, financial systems, and portal to millions of customers — and pay taxes in the form of fees. They also live in terror of its rules, which often change and are harshly enforced.

…

…the only way back from suspension is to “confess and repent,” she says, even if you don’t think you’ve done anything wrong. “Amazon doesn’t like to see finger-pointing.”

Josh Dzieza (The Verge)

Suppose you have a competitor on Amazon Marketplace. Based on this article, the following strategies are pretty much fair game and are likely to result in immediate suspension of your competitor’s account:

Posting fake reviews favouring your competitor’s products, then reporting your competitor for manipulating reviews.
Making a copyright claim against your competitor’s username, even though you’d never used it before.
Buying your competitor’s product, setting fire to it, photographing it, and claiming that it did that by itself and is thus unsafe for sale.

Amazon don’t like controversy, so they always side against the seller. A great illustration as to why it’s dangerous when we let companies (like Amazon) have the power of judiciaries without the responsibilities of democracies.

Public

Private

Footnotes

Gemini

Using OpenSSL

Using GnuTLS

Using Ncat

Spartan

Using Telnet

Using cURL

Using Ncat/Netcat

Footnotes

Web standards sometimes disappear

Basic authentication

Credentials in the URL

This is why we can’t have nice things

How different browsers handle basic authentication in URLs

Chromium desktop family

Firefox desktop

Safari desktop

Older browsers

The Command-Line

What’s the status of HTTP (Basic) Authentication?

Hacker

What people think it means

What it originally meant

The Past

The Future

Broadband

What people think it means

What it originally meant

The Past

The Future