Our scanning system wasn’t intended to support this style of notation. Why, then, were we being bombarded with so many ASCII tab ChatGPT screenshots? I was mystified for weeks —
until I messed around with ChatGPT myself and got this:
Turns out ChatGPT is telling people to go to Soundslice, create an account and import ASCII tab in order to hear the audio playback. So that explains it!
…
With ChatGPT’s inclination to lie about the features of a piece of technology, it was
only a matter of time before a frustrated developer actually added a feature that ChatGPT had imagined, just to stop users from becoming dissatisfied when they tried to
use nonexistent tools that ChatGPT told them existed.
And this might be it! This could be the very first time that somebody’s added functionality based on an LLM telling people the feature existed already.
The tl;dr is: the court ruled that (a) piracy for the purpose of training an LLM is still piracy, so there’ll be a separate case about the fact that Anthropic did not pay for copies of
all the books their model ingested, but (b) training a model on books and then selling access to that model, which can then produce output based on what it has “learned” from those
books, is considered transformative work and therefore fair use.
Compelling arguments have been made both ways on this topic already, e.g.:
Some folks are very keen to point out that it’s totally permitted for humans to read, and even memorise, entire volumes, and then use what they’ve learned when they
produce new work. They argue that what an LLM “does” is not materially different from an impossibly well-read human.
By way of counterpoint, it’s been observed that such a human would still be personally liable if the “inspired” output they subsequently created was derivative
to the point of violating copyright, but we don’t yet have a strong legal model for assessing AI output in the same way. (BBC News article about Disney & Universal vs. Midjourney is going to be very interesting!)
Furthermore, it might be impossible to conclusively determine that the way GenAI works is fundamentally comparable to human thought. And that’s the thing that got
me thinking about this particular thought experiment.
A moment of philosophy
Here’s a thought experiment:
Support I trained an LLM on all of the books of just one author (plus enough additional language that it was able to meaningfully communicate). Let’s take Stephen King’s 65 novels and
200+ short stories, for example. We’ll sell access to the API we produce.
I suppose it’s possible that Stephen King was already replaced long ago with an AI that was instructed to churn out horror stories about folks in isolated Midwestern locales being
harassed by a pervasive background evil?
The output of this system would be heavily-biased by the limited input it’s been given: anybody familiar with King’s work would quickly spot that the AI’s mannerisms echoed his writing
style. Appropriately prompted – or just by chance – such a system would likely produce whole chapters of output that would certainly be considered to be a substantial infringement of
the original work, right?
If I make KingLLM, I’m going to get sued, rightly enough.
But if we accept that (and assume that the U.S. District Court for the Northern District of California would agree)… then this ruling on Anthropic would carry a curious implication.
That if enough content is ingested, the operation of the LLM in itself is no longer copyright infringement.
Which raises the question: where is the line? What size of corpus must a system be trained upon before its processing must necessarily be considered transformative
of its inputs?
Clearly, trying to answer that question leads to a variant of the sorites paradox. Nobody can ever say that, for example, an input of twenty million words
is enough to make a model transformative but just one fewer and it must be considered to be perpetually ripping off what little knowledge it has!
But as more of these copyright holder vs. AI company cases come to fruition, it’ll be interesting to see where courts fall. What is fair use and what is infringing?
And wherever the answers land, I’m sure there’ll be folks like me coming up with thought experiments that sit uncomfortably in the grey areas that remain.
It’s so emblematic of the moment we’re in, the Who Cares Era, where completely disposable things are shoddily produced for people to mostly ignore.
…
In the Who Cares Era, the most radical thing you can do is care.
In a moment where machines churn out mediocrity, make something yourself. Make it imperfect. Make it rough. Just make it.
At a time where the government’s uncaring boot is pressing down on all of our necks, the best way to fight back is to care. Care loudly. Tell others. Get going.
…
Smart words, well-written by Dan Sinker.
I like the fact that he correctly identifies that the “Who Cares Era” – illustrated by the bulk creation of low-effort, low-quality media, for a disheartened audience that no longer has
a reason to give a damn – isn’t about AI.
I mean… AI’s certainly not helping! AI slop dominates social media (especially in right-wing
spaces, for retrospectively-obvious reasons) and bleeds out into the mainstream. LLM-generated content, lacking even the slightest human input, is becoming painfully ubiquitous.
It’s pretty sad out there.
So while the “Who Cares Era” might be exemplified by the proliferation of AI slop… it’s much bigger than that. It’s a sociological change, tied perhaps to a growing dissatisfaction with
our governments and the increasing feeling of powerlessness to change the unjust social systems we’re locked into?
I don’t know how to fix it. I don’t even know if it’s fixable. But I agree with Dan’s argument that a great starting point is to care.
And I, for one, am going to continue to create things I care about, giving them the time and attention they deserve. And maybe if enough of us can do that, just that, then
maybe that’ll make the difference.
Sure, it’s gaudy, but it’s got a few things going for it, too.
Let’s put aside for the moment that you can already send my website back into “90s mode” and dive into this take on how I could
present myself in a particularly old-school way. There’s a few things I particularly love:
It’s actually quite lightweight: ignore all the animated GIFs (which are small anyway) and you’ll see that, compared to my current homepage, there are very few
images. I’ve been thinking about going in a direction of less images on the homepage anyway, so it’s interesting to see how it comes together in this unusual context.
The page sections are solidly distinct: they’re a mishmash of different widths, some of which exhibit a horrendous lack of responsivity, but it’s pretty clear where
the “recent articles” ends and the “other recent stuff” begins.
The post kinds are very visible: putting the “kind” of a post in its own column makes it really clear whether you’re looking at an article, note, checkin, etc., much
more-so than my current blocks do.
Maybe there’s something we can learn from old-style web design? No, I’m serious. Stop laughing.
90s web design was very-much characterised by:
performance – nobody’s going to wait for your digital photos to download on narrowband connections, so you hide them behind descriptive links or tiny thumbnails, and
pushing the boundaries – the pre-CSS era of the Web had limited tools, but creators worked hard to experiment with the creativity that was possible within those
limits.
Those actually… aren’t bad values to have today. Sure, we’ve probably learned that animated backgrounds, tables for layout, and mystery meat navigation were horrible for
usability and accessibility, but that doesn’t mean that there isn’t still innovation to be done. What comes next for the usable Web, I wonder?
As soon as you run a second or third website through the tool, its mechanisms for action become somewhat clear and sites start to look “samey”, which is the opposite of what
made 90s Geocities great.
The only thing I can fault it on is that it assumes that I’d favour Netscape Navigator: in fact, I was a die-hard Opera-head for most of the
nineties and much of the early naughties, finally switching my daily driver to Firefox in 2005.
I certainly used plenty of Netscape and IE at various points, though, but I wasn’t a fan of the divisions resulting from the browser wars. Back in the day, I always backed
the ideals of the “Viewable With Any Browser” movement.
You’ve probably come across GeoGuessr already: it’s an online game where you (and friends, if you’ve got them) get dropped into Google Street
View and have two minutes to try to work out where in the world you are and drop a pin on it.
Can you tell where we are, yet?
A great strategy is to “walk around” a little, looking for landmarks, phone numbers, advertisements, linguistic clues, cultural indicators, and so on, narrowing down the region of the
world you think you’re looking at before committing to a country or even a city. You’re eventually scored by how close you are to the actual location.
Cheating at GeoGuessr
I decided to see if ChatGPT can do better than me. Using only the free tier of both GeoGuessr and ChatGPT1, I pasted
screenshots of what I was seeing right into ChatGPT:
ChatGPT confidently assessed the geographic clues, translated some text that it found, and eventually made a guess down to a particular street in St Petersburg.
That’s pretty spooky, right?
The response came back plenty fast enough for me to copy-and-paste the suggested address into Google Maps, get the approximate location, and then drop a pin in the right place in
GeoGuessr. It’s probably one of my most-accurate guesses ever.
This isn’t a one-off fluke. I tried again, this time using only a single photo, rather than one pointing in each direction on the street:
Again, the text recognition and translation capabilities of the AI were highly useful, but it was helped by architectural and cultural clues too.
This time, it wasn’t quite right: the actual location of the photo was Chittagong, not Dhaka, about 200km away.
But that’s still reasonably spectacular from only a single vantage from a single point.
Don’t think I’d have done better, though.
Obviously my approach here was crude, but it’d be relatively easy to, for example, implement a browser wrapper that cheated on-your-behalf: while playing GeoGuessr, you’d just click a
“send to AI” button whenever you fancied and it’d start working in the background, improving precision with each subsequent screenshot (so you could still “walk around” and pass extra
details to it).
And similarly, a little prompt manipulation could have the output include estimated GPS coordinates, allowing the code to place the pin for you while outputting its reasoning
on the side2.
The short of what I’m saying is: if you’re playing GeoGuessr against somebody you don’t trust not to cheat… they might well be cheating. And thanks to the explanatory output, they can
even justify their explanations for the places they dropped pins.
What about other photos?
The other thought that comes to my mind is: could this be used to infer approximate geolocations from photos whose EXIF metadata has been stripped. Because if so, that’s the
kind of thing that’s certain to get exploited by all kind of people who want to know where you are/were at a particular time. For example… law enforcement, stalkers, and
other unsavoury types.
Explanation: The photograph shows a large ship docked at a port with a building that displays the text “WELCOME TO PORT OF SPAIN”. This text is in English,
indicating an English-speaking country. The mention of “Port of Spain” directly specifies the port’s location in the capital city of Trinidad and Tobago. The design and style of
the port facilities, as well as the presence of a ship named “Logos Hope”, which is known to visit many ports globally, further suggests a Caribbean port, aligning with Port of
Spain’s known characteristics.
Location: The Rollright Stones, England, United Kingdom
Explanation: The photograph depicts an area with several ancient standing stones arranged in a circle in a rural setting, characteristic of a stone circle.
These are typically found in certain parts of the UK. The terrain is typical of the British countryside, with a grassy field and surrounding trees. The specific arrangement
and size of the stones, along with the style of fencing and walking paths, are reminiscent of the Rollright Stones, one of Britain’s historic sites known for a circle of
Neolithic or Bronze Age standing stones located in Oxfordshire, England.
Coordinates: 51.9743, -1.5731
✅ Again, spot on. If you’d never been to the Rollright Stones, you probably wouldn’t get that one, because while it’s a distinctive-looking stone circle… it’s not that
distinctive-looking, right? Let’s be honest: it’s some rocks in a field.
Explanation: The photograph prominently features the Galata Tower in the background, a well-known landmark in Istanbul, Turkey. Moreover, the presence of a ferry
or similar boat and the architectural style of the surrounding buildings are indicative of Istanbul’s famous skyline. The signage in Turkish, as suggested by the word partially
visible on the left, also supports this location as Turkey is a predominantly Turkish-speaking country. These elements combined make it highly likely that the photograph was taken
in Istanbul.
Coordinates: 41.0082, 28.9784
✅ The coordinates aren’t exactly right – it’s placed us in the city rather than on a boat, which we clearly are – but it’s not done badly. Spotting the Galata Tower3 was a nice move.
Location: Cardiff Bay, Cardiff, Wales, United Kingdom
Explanation: The photograph features a marina with modern residential buildings, which is characteristic of the Cardiff Bay area in Cardiff, Wales. The
architecture, clear blue skies, and marina setting are reminiscent of Cardiff’s waterfront development. Additionally, the presence of English signage on the boat (“FINBAR”)
supports a location in an English-speaking country. Cardiff Bay is known for its redeveloped waterfront area with modern apartments and leisure facilities surrounding the marina.
The style and design of the building are consistent with developments in Cardiff’s revitalized docklands.
Coordinates: 51.4626, -3.1678
❌ Oops! Nope. I can see the thinking of what it’s claiming, there, but this was actually the Ipswich marina. I went for a walk to take a break from the code I’d been writing
and took this somewhere in the vicinity of the blue plaque for Edward Ardizzone that I’d just spotted (I was
recording a video for my kids, who’ve enjoyed several of his Tim… books).
So I don’t think this is necessarily a game-changer for Internet creeps yet. So long as you’re careful not to post photos in which you’re in front of any national monuments and
strip your EXIF metadata as normal, you’re probably not going to give away where you are quite yet.
Footnotes
1 And in a single-player game only: I didn’t actually want to cheat anybody out
of a legitimate victory!
2 I’m not going to implement GeoCheatr, as I’d probably name it. Unless somebody
feels like paying me to do so: I’m open for freelance work right now, so if you want to try to guarantee the win at the GeoGuessr World Championships (which will involve the much-riskier act of cheating in
person, so you’ll want a secret UI – I’m thinking a keyboard shortcut to send data to the AI, and an in-ear headphone so it can “talk” back to you?), look me up? (I’m mostly
kidding, of course: just because something’s technically-possible doesn’t mean it’s something I want to do, even for your money!)
4 3Camp is Three Rings‘ annual volunteer
get-together, hackathon, and meetup. People come together for an intensive week of making-things-better for charities the world over.
Ok, I’m NOT an immediate fan of “vibe coding” and overusing LLMs in programming. I have a healthy amount of skepticism
about the use of these tools, mostly related to the maintainability of the code, security, privacy, and a dozen other more factors.
But some arguments I’ve seen from developers about not using the tools because it means they “will lose their coding skills” its just bonkers. Especially in a professional context.
Imagine you go to a carpenter, and they say “this will take 2x the time because I don’t use power tools, they make me feel like I’m losing my competence in manual skills”. It’s your
job to deliver software using the most efficient and accurate methods possible.
Sure, it is essential that you keep your skills sharp, but being purposfully less effective in your job to keep them sharp is a red flag. And in an industry made of abstractions to
increase productivity (we’re no longer coding in Assembly last time I checked), this makes even less sense.
/rant
I’m in two minds on this (as I’ve hinted before). The carpenter analogy doesn’t really hold, because the underlying skill of carpentry
is agnostic to whether or not you use power tools: it’s about understanding the material properties of woods, the shapes of joins, the ways structures are strong and where they are
weak, the mathematics and geometry that make design possible… none of which are taken over by power tools.
25+ years ago I wrote most of my Perl/PHP code without an Internet connection. When you wanted to deploy you’d “dial up”, FTP some files around, then check it had worked. In that
environment, I memorised a lot more. Take PHP’s date formatting strings, for example: I used to have them down by heart! And even when I didn’t, I knew approximately the right spot to
flip the right book open to that I’d be able to look it up quickly.
“Always-on” broadband Internet gradually stole that skill from me. It’s so easy for me to just go to the right page on php.net and have the answer I need right in front of me! Nowadays, I depend
on that Internet connection (I don’t even have the book any more!).
A power tool targets a carpenter’s production speed,
not their knowledge-recovery speed.
Will I experience the same thing from my LLM usage, someday?
It was a bit… gallows humour… for a friend to share this website with me, but it’s pretty funny.
And also: a robot that “schedules a chat” to eject you from your job and then “delivers the news with the emotional depth of a toaster” might still have been preferable to an
after-hours email to my personal address to let me know that I’d just had my last day! Maybe I’m old-fashioned, but there’s some news that
email isn’t the medium for, right?
There is a lot of smoke in the work-productivity AI space. I believe there is (probably) fire there somewhere. But I haven’t been able to find it.
…
I find AI assistants useful, just less so than other folks online. I’m glad to have them as an option but am still on the lookout for a reason to pay $20/month for a premium plan.
If that all resonants and you have some suggestions, please reach out. I can be convinced!
…
I’m in a similar position to Sean. I enjoy Github Copilot, but not enough that I would pay for it out of my own pocket (like him, I get it for free, in my case because I’m associated
with a few eligible open source projects). I’ve been experimenting with Cursor and getting occasionally good results, but again: I wouldn’t have paid for it myself (but my employer is
willing to do so, even just for me to “see if it’s right for me”, which is nice).
I think this is all part of what I was complaining about yesterday, and what Sean describes as “a lot of smoke”. There’s so much hype around AI
technologies that it takes real effort to see through it all to the actual use-cases that exist in there, somewhere. And that’s the effort required before you
even begin to grapple with questions of cost, energy usage, copyright ethics and more. It’s a really complicated space!
There are things it does well-enough, and much faster than a human, that it’s certainly not useless:
indeed, I’ve used it for a variety of things from the practical to the silly to the sneaky, and many more activities besides 1.
I routinely let an LLM suggest autocompletion, and I’ve experimented with having it “code for me” (with the caveat that I’m going to end up re-reading it all anyway!).
But I’m still not sure whether that, on the balance of things, GenAI represents a net benefit. Time will tell, I suppose.
And like Paul, I’m sick of “the pervasive, all encompassing nature of it”. I never needed AI integration in NOTEPAD.EXE before, and I still don’t need it now! Not
everything needs to be about AI, just because it’s the latest hip thing. Remember when everybody was talking about how everything belonged on the blockchain (it doesn’t): same
energy. Except LLMs are more-accessible to more-people, thanks to things like ChatGPT, so the signal-to-noise ratio in the hype machine is much, much worse. Nowadays, you actually have
to put significant effort in if you want to find the genuinely useful things that AI does, amongst all of the marketing crap that surrounds it.
Footnotes
1 You’ll note that I specifically don’t make use of it for writing any content
for this blog: the hallucinations and factual errors you see here are genuine organic human mistakes!
Eleven years ago, comedy sketch The Expert had software engineers (and other misunderstood specialists) laughing to
tears at the relatability of Anderson’s (Orion Lee) situation: asked to do the literally-impossible by people who don’t understand
why their requests can’t be fulfilled.
Decades ago, a client wanted their Web application to automatically print to the user’s printer, without prompting. I explained that it was impossible because “if a website could print
to your printer without at least asking you first, everybody would be printing ads as you browsed the web”. The client’s response: “I don’t need you to let everybody
print. Just my users.”1
So yeah, I was among those that sympathised with Anderson.
In the sketch, the client requires him to “draw seven red lines, all of them strictly perpendicular; some with green ink and some with transparent”. He (reasonably) states that this is
impossible2.
Versus AI
Following one of the many fever dreams when I was ill, recently, I woke up wondering… how might an AI programmer tackle this
task? I had an inkling of the answer, so I had to try it:
Aside from specifying that I want to use JavaScript and a <canvas> element3, the
question is the same as in the sketch.
When I asked gpt-4o to assist me, it initially completely ignored the perpendicularity requirement.
Drawing all of the lines strictly parallel to one another was… well, the exact opposite of what was asked for, although it was at least possible.
Let’s see if it can do better, with a bit of a nudge:
This is basically how I’d anticipated the AI would respond: eager to please, willing to help, and with an eager willingness that completely ignored the infeasibility of the task.
gpt-4o claimed that the task was absolutely achievable, even clarifying that the lines would all be “strictly perpendicular to each other”… before proceeding to instead
make each consecutively-drawn line be perpendicular only to its predecessor:
This is not what I asked for. But more importantly, it’s not what I wanted. (But it is pretty much what I expected.)
You might argue that this test is unfair, and it is. But there’s a point that I’ll get to.
But first, let me show you how a different model responded. I tried the same question with the newly-released Claude 3.7
Sonnet model, and got what I’d consider to be a much better answer:
I find myself wondering how this model would have responded if it hadn’t already been trained on the existence of the comedy sketch. The answer that (a) it’s impossible but
(b) here’s a fun bit of code that attempts to solve it anyway is pretty-much perfect, but would it have come up with it on a truly novel (but impossible) puzzle?
In my mind: an ideal answer acknowledges the impossibility of the question, or at least addresses the supposed-impossibility of it. Claude 3.7 Sonnet did well here,
although I can’t confirm whether it did so because it had been trained on data that recognised the existence of “The Expert” or not (it’s clearly aware of the sketch, given its
answer).
Suppose I didn’t know that it was impossible to make seven lines perpendicular to one another in anything less than seven-dimensional space. If that were the case, it’d
be tempting to accept an AI-provided answer as correct, and ship it. And while that example is trivial (and at least a little bit silly), it’s the kind of thing that, I have no doubt,
actually happens in other areas.
Chatbots eagerness to provide a helpful answer, even if no answer is possible, is a huge liability. The other week, I experimentally asked Claude 3.5 for assistance with a
PHPUnit mocking challenge and it provided a whole series of answers… that were completely invalid! It later turned out that what I was trying to achieve was
impossible5.
Given that its answers clearly didn’t-work there was no risk I’d have shipped it anyway, but I’m certain that there exist developers who’ve asked a chatbot for help in a domain they
didn’t understood and accepted its answer while still not understanding it, which feels to me like a quick route to introducing into your code a bug that happy-path testing
won’t reveal. (Y’know, something like a security vulnerability, or an accessibility failure, or whatever.)
Code assisting AI remains really interesting and occasionally useful… but it’s also a real minefield and I see a lot of naiveté about its limitations.
Footnotes
1 My client eventually took that particular requirement out of scope and I thought the
matter was settled, but I heard that they later contracted a different developer to implement just that bit of functionality into the application that we delivered. I never
checked, but I think that what they delivered exploited ActiveX/Java applet vulnerabilities to achieve the goal.
2 Nerds gotta nerd, and so there’s been endless debate on the Internet about whether the
task is truly impossible. For example, when I first saw the video I was struck by the observation that perpendicularity within a set of lines is limited linearly by the
number of dimensions you’re working in, so it’s absolutely possible to model seven lines all perpendicular to one another… if you’re working in seven dimensions. But let’s put that
all aside for a moment and assume the task is truly impossible within some framework unspecified-but-implied within the universe of the sketch, ‘k?
3 Two-dimensionality feels like a fair assumed constraint, given that in the sketch
Anderson tries to demonstrate the challenges of the task by using a flip-chart.
4 I also don’t use AI to produce anything creative that I then pass off as my own,
because, y’know, many of these models don’t seem to respect copyright. You won’t find any AI-written content on my blog, for example, except specifically to demonstrate AI’s
capabilities (or lack thereof) when discussing AI, and this is always be clearly labelled. But that’s another question.
5 In fact, I was going about the problem itself in entirely the wrong way: some minor
refactoring later and I had some solid unit tests that fit the bill, and I didn’t need to do the impossible. But the AI never clocked that, and I suspect it never would have.
5. If you use AI, you are the one who is accountable for whatever you produce with it. You have to be certain that whatever you produced was correct. You cannot ask the system
itself to do this. You must either already be expert at the task you are doing so you can recognise good output yourself, or you must check through other, different means the
validity of any output.
…
9. Generative AI produces above average human output, but typically not top human output. If you overuse generative AI you may produce more mediocre output than you are capable of.
…
I was also tempted to include in 9 as a middle sentence “Note that if you are in an elite context, like attending a university, above average for humanity widely could be below
average for your context.”
Point 5 is a reminder that, as I’ve long said, you can’t trust an AI to do anything that you can’t do for yourself. I
sometimes use a GenAI-based programming assistant, and I can tell you this – it’s really good for:
Fancy autocomplete: I start typing a function name, it guesses which variables I’m going to be passing into the function or that I’m going to want to loop through the
output or that I’m going to want to return-early f the result it false. And it’s usually right. This is smart, and it saves me keypresses and reduces the embarrassment of mis-spelling
a variable name1.
Quick reference guide: There was a time when I had all of my PHP DateTimeInterface::format character codes memorised. Now I’d have to look them up. Or I can write a comment (which I should anyway, for the next human) that says something like //
@returns String a date in the form: Mon 7th January 2023 and when I get to my date(...) statement the AI will already have worked out that the format is 'D
jS F Y' for me. I’ll recognise a valid format when I see it, and I’ll be testing it anyway.
Boilerplate: Sometimes I have to work in languages that are… unnecessarily verbose. Rather than writing a stack of setters and getters, or laying out a repetitive
tree of HTML elements, or writing a series of data manipulations that are all subtly-different from one another in ways that are obvious once they’ve been explained to you… I can just
outsource that and then check it2.
Common refactoring practices: “Rewrite this Javascript function so it doesn’t use jQuery any more” is a great example of the kind of request you can throw at an LLM.
It’s already ingested, I guess, everything it could find on StackOverflow and Reddit and wherever else people go to bemoan being stuck with jQuery in their legacy codebase. It’s not
perfect – just like when it’s boilerplating – and will make stupid mistakes3
but when you’re talking about a big function it can provide a great starting point so long as you keep the original code alongside, too, to ensure it’s not removing any
functionality!
Other things… not so much. The other day I experimentally tried to have a GenAI help me to boilerplate some unit tests and it really failed at it. It determined pretty quickly,
as I had, that to test a particular piece of functionality need to mock a function provided by a standard library, but despite nearly a dozen attempts to do so, with copious prompting
assistance, it couldn’t come up with a working solution.
Overall, as a result of that experiment, I was less-effective as a developer while working on that unit test than I would have been had I not tried to get AI assistance: once I
dived deep into the documentation (and eventually the source code) of the underlying library I was able to come up with a mocking solution that worked, and I can see why the AI failed:
it’s quite-possibly never come across anything quite like this particular problem in its training set.
Solving it required a level of creativity and a depth of research that it was simply incapable of, and I’d clearly made a mistake in trying to outsource the problem to it. I was able to
work around it because I can solve that problem.
But I know people who’ve used GenAI to program things that they wouldn’t be able to do for themselves, and that scares me. If you don’t understand the code your tool has
written, how can you know that it does what you intended? Most developers have a blind spot for testing and will happy-path test their code without noticing if they’ve
introduced, say, a security vulnerability owing to their handling of unescaped input or similar… and that’s a problem that gets much, much worse when a “developer” doesn’t even
look at the code they deploy.
Security, accessibility, maintainability and performance – among others, I’ve no doubt – are all hard problems that are not made easier when you use an AI to write code that
you don’t understand.
Footnotes
1 I’ve 100% had an occasion when I’ve called something $theUserID in one
place and then $theUserId in another and not noticed the case difference until I’m debugging and swearing at the computer
2 I’ve described the experience of using an LLM in this way as being a little like having
a very-knowledgeable but very-inexperienced junior developer sat next to me to whom I can pass off the boring tasks, so long as I make sure to check their work because they’re so
eager-to-please that they’ll choose to assume they know more than they do if they think it’ll briefly impress you.
3 e.g. switching a selector from $(...) to
document.querySelector but then failing to switch the trailing .addClass(...) to .classList.add(...)– you know: like an underexperienced but
eager-to-please dev!
I don’t believe AI will replace software developers, but it will exponentially boost their productivity. The more I talk to developers, the more I hear the same thing—they’re now
accomplishing in half the time what used to take them days.
But there’s a risk… Less experienced developers often take shortcuts, relying on AI to fix bugs, write code, and even test it—without fully understanding what’s happening under the
hood. And the less you understand your code, the harder it becomes to debug, operate, and maintain in the long run.
So while AI is a game-changer for developers, junior engineers must ensure they actually develop the foundational skills—otherwise, they’ll struggle when AI can’t do all the heavy
lifting.
Eduardo picks up on something I’ve been concerned about too: that the productivity boost afforded to junior developers by AI does not provide them with the necessary experience to be
able to continue to advance their skills. GenAI for developers can be a dead end, from a personal development perspective.
That’s a phenomenon not unique to AI, mind. The drive to have more developers be more productive on day one has for many years lead to an increase in developers who are hyper-focused on
a very specific, narrow technology to the exclusion even of the fundamentals that underpin them.
When somebody learns how to be a “React developer” without understanding enough about HTTP to explain which bits of data exist on the server-side and which are delivered to the client,
for example, they’re at risk of introducing security problems. We see this kind of thing a lot!
There’s absolutely nothing wrong with not-knowing-everything, of course (in fact, knowing where the gaps around the edges of your knowledge are and being willing to work to fill them
in, over time, is admirable, and everybody should be doing it!). But until they learn, a developer that lacks a comprehension of the fundamentals on which they depend needs to
be supported by a team that “fill the gaps” in their knowledge.
AI muddies the water because it appears to fulfil the role of that supportive team. But in reality it’s just regurgitating code synthesised from the fragments it’s read in the
past without critically thinking about it. That’s fine if it’s suggesting code that the developer understands, because it’s like… “fancy autocomplete”, which you can
accept or reject based on their understanding of the domain. I use AI in exactly this way many times a week. But when people try to use AI to fill the “gaps” at the edge of their
knowledge, they neither learn from it nor do they write good code.
I’ve long argued that as an industry, we lack a pedagogical base: we don’t know how to teach people to do what we do (this is evidenced by the relatively
high drop-out rate on computer science course, the popular opinion that one requires a particular way of thinking to be a programmer, and the fact that sometimes people who fail to
learn programming through paradigm are suddenly able to do so when presented with a different one). I suspect that AI will make this problem worse, not better.
I’ve a notion that during 2025 I might put some effort into tidying up the tagging taxonomy on my blog. There’s a few tags that are duplicates (e.g.
ai and artificial intelligence) or that exhibit significant overlap (e.g. dog and dogs), or that were clearly created when I
speculated I’d write more on the topic than I eventually did (e.g. homa night, escalators1,
or nintendo) or that are just confusing and weird (e.g. not that bacon sandwich picture).
One part of such an effort might be to go back and retroactively add tags where they ought to be. For about the first decade of my blog, i.e. prior to around 2008, I rarely used tags to
categorise posts. And as more tags have been added it’s apparent that many old posts even after that point might be lacking tags that perhaps they ought to have2.
I remain sceptical about many uses of (what we’re today calling) “AI”, but one thing at
which LLMs seem to do moderately well is summarisation3. And isn’t tagging and categorisation only a stone’s throw away from
summarisation? So maybe, I figured, AI could help me to tidy up my tagging. Here’s what I was thinking:
Tell an LLM what tags I use, along with an explanation of some of the quirkier ones.
Train the LLM with examples of recent posts and lists of the tags that were (correctly, one assumes) applied.
Give it the content of blog posts and ask what tags should be applied to it from that list.
Script the extraction of the content from old posts with few tags and run it through the above, presenting to me a report of what tags are recommended (which could then be coupled
with a basic UI that showed me the post and suggested tags, and “approve”/”reject” buttons or similar.
Extracting training data
First, I needed to extract and curate my tag list, for which I used the following SQL4:
SELECTCOUNT(wp_term_relationships.object_id) num, wp_terms.slug FROM wp_term_taxonomy
LEFTJOIN wp_terms ON wp_term_taxonomy.term_id = wp_terms.term_id
LEFTJOIN wp_term_relationships ON wp_term_taxonomy.term_taxonomy_id = wp_term_relationships.term_taxonomy_id
WHERE wp_term_taxonomy.taxonomy ='post_tag'AND wp_terms.slug NOTIN (
-- filter out e.g. 'rss-club', 'published-on-gemini', 'dancast' etc.-- these are tags that have internal meaning only or are already accurately applied'long', 'list', 'of', 'tags', 'the', 'ai', 'should', 'never', 'apply'
)
GROUPBY wp_terms.slug
HAVING num >2-- filter down to tags I actually routinely useORDERBY wp_terms.slug
Many of my tags are used for internal purposes; e.g. I tag posts published on gemini if they’re to appear on gemini://danq.me/ and
dancast if they embed an episode of my podcast. I filtered these out because I never want the AI to suggest applying them.
I took my output and dumped it into a list, and skimmed through to add some clarity to some tags whose purpose might be considered ambiguous, writing my explanation of each in
parentheses afterwards. Here’s a part of the list, for example:
I used that list as the basis for the system message of my initial prompt:
Suggest topical tags from a predefined list that appropriately apply to the content of a given blog post.
# Steps
1. **Read the Blog Post**: Carefully read through the provided content of the blog post to identify its main themes and topics.
2. **Analyse Key Aspects**: Identify key topics, themes, or subjects discussed in the blog post.
3. **Match with Tags**: Compare these identified topics against the list of available tags.
4. **Select Appropriate Tags**: Choose tags that best represent the main topics and themes of the blog post.
# Output Format
Provide a list of suggested tags. Each tag should be presented as a single string. Multiple tags should be separated by commas.
# Allowed Tags
Tags that can be suggested are as follows. Text in parentheses are not part of the tag but are a description of the kinds of content to which the tag ought to be applied:
- aberdyfi
- aberystwyth
- ...
- youtube
- zoos
# Examples
**Input:**
The rapid advancement of AI technology has had a significant impact on my industry, even on the ways in which I write my blog posts. This post, for example, used AI to help with tagging.
**Output:**
ai, technology, blogging, meta, work
...(other examples)...
# Notes
- Ensure that all suggested tags are relevant to the key themes of the blog post.
- Tags should be selected based on their contextual relevance and not just keyword matching.
This system prompt is somewhat truncated, but you get the idea.
That post already has the following tags (but this wasn’t disclosed to the AI in its training set; it had to work from scratch): children, language, languages (a bit of a redundancy there!), spain, and unicode.
Testing it out
Let’s see what the AI suggests:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json"\
-H "Authorization: Bearer $OPENAI_TOKEN"\
-d '{ "model": "gpt-4o-mini", "messages": [ { "role": "system", "content": [ { "type": "text", "text": "[PROMPT AS DESCRIBED ABOVE]" } ] }, { "role": "user", "content": [ { "type": "text", "text": "My 8-year-old asked me \"In Spanish, I need to use an upside-down interrobang at the start of the sentence‽\" I assume the answer is yes A little while later, I thought to check whether Unicode defines a codepoint for an inverted interrobang. Yup: ‽ = U+203D, ⸘ = U+2E18. Nice. And yet we dont have codepoints to differentiate between single-bar and double-bar \"cifrão\" dollar signs..." } ] } ], "response_format": { "type": "text" }, "temperature": 1, "max_completion_tokens": 2048, "top_p": 1, "frequency_penalty": 0, "presence_penalty": 0}'
Running this via command-line curl meant I quickly ran up against some Bash escaping issues, but set +H and a little massaging of the blog post content
seemed to fix it.
GPT-4o-mini
When I ran this query against the gpt-4o-mini model, I got back: unicode, language, education, children, symbols.
That’s… not ideal. I agree with the tags unicode, language, and children, but this isn’t really abouteducation. If I tagged
everything vaguely educational on my blog with education, it’d be an even-more-predominant tag than geocaching is! I reserve that tag for things that relate
specifically to formal education: but that’s possibly something I could correct for with a parenthetical in my approved tags list.
symbols, though, is way out. Sure, the post could be argued to be something to do with symbols… but symbols isn’t on the approved tag list in
the first place! This is a clear hallucination, and that’s pretty suboptimal!
Maybe a beefier model will fare better…
GPT-4o
I switched gpt-4o-mini for gpt-4o in the command above and ran it again. It didn’t take noticeably longer to run, which was pleasing.
The model returned: children, language, unicode, typography. That’s a big improvement. It no longer suggests education,
which was off-base, nor symbols, which was a hallucination. But it did suggest typography, which is a… not-unreasonable suggestion.
Neither model suggested spain, and strictly-speaking they were probably right not to. My post isn’t about Spain so much as it’s about Spanish. I don’t
have a specific tag for the latter, but I’ve subbed in the former to “connect” the post to ones which are about Spain, but that might not be ideal. Either way: if this is how
I’m using the tag then I probably ought to clarify as such in my tag list, or else add a note to the system prompt to explain that I use place names as the tags for posts about
the language of those places. (Or else maybe I need to be more-consistent in my tagging).
I experimented with a handful of other well-tagged posts and was moderately-satisfied with the results. Time for a more-challenging trial.
This time, with feeling…
Next, I decided to run the code against a few blog posts that are in need of tags. At this point, I wasn’t quite ready to implement a UI, so I just adapted my little hacky Bash
script and copy-pasted HTML-stripped post contents directly into it.
If it worked, I decided, I could make a UI. Until then, the command line was plenty sufficient.
In this post, I shared that my grandmother and my coworker had (independently) been taken into hospital. It had no tags whatsoever.
The AI suggested the tags hospital, family, injury, work, weddings, pub, humour. Which at
a glance, is probably a superset of the tags that I’d have considered, but there’s a clear logic to them all.
It clearly picked out weddings based on a throwaway comment I made about a cousin’s wedding, so I disagree with that one: the post isn’t strictly about weddings
just because it mentions one.
pub could go either way. It turns out my coworker’s injury occurred at or after a trip to the pub the previous night, and so its relevance is somewhat unknowable from this
post in isolation. I think that’s a reasonable suggestion, and a great example of why I’d want any such auto-tagging system to be a human assistant (suggesting
candidate tags) and not a fully-automated system. Interesting!
Finally, you might think of humour as being a little bit sarcastic, or maybe overly-laden with schadenfreude. But the blog post explicitly states that my coworker
“carefully avoided saying how he’d managed to hurt himself, which implies that it’s something particularly stupid or embarrassing”, before encouraging my friends to speculate on it.
However, it turns out that humour isn’t one of my existing tags at all! Boo, hallucinating AI!
I ended up applying all of the AI’s suggestions except weddings and humour. I also applied smartdata, because that’s where I worked (the AI couldn’t have been expected to guess that without context, though!).
This post talked about Ash and I’s travels around the UK to see REM and Green Day in concert5 and to the National Science Museum in London where I discovered that Ash was prejudiced towards…
carrot cake.
The AI suggested: concerts, travel, music, preston, london, science museum, blogging.
Those all seemed pretty good at a first glance. Personally, I’d forgotten that we swung by Preston during that particular grand tour until the AI suggested the tag, and then I had to
look back at the post more-carefully to double-check! blogging initially seemed like a stretch given that I was only blogging about not having blogged much, but on
reflection I think I agree with the robot on this one, because I did explicitly link to a 2002 page that fell off the Internet only a few years ago aboutthe pointlessness of blogging. So I think it counts.
I was able to verify that I’d been in Preston with thanks to this contemporaneous photo. I have no further explanation for the content of the photo, though.
science museum is a big fail though. I don’t use that tag, but I do use the tag museum. So close, but not quite there, AI!
I applied all of its suggestions, after switching museum in place of science museum.
I wrote this blog post in celebration of having managed to hack together some stuff to help me remote-control my PC from my phone via Bluetooth, which back then used to be a challenge,
in the hope that this would streamline pausing, playing, etc. at pizza-distribution-time at Troma Night, a weekly film night I hosted back then.
If you were sat on that sofa, fighting your way past other people and a mango-chutney-barrel-cum-table to get to a keyboard was genuinely challenging!
It already had the tag technology, which it inherited from a pre-tagging evolution of my blog which used something akin to categories (of which only one
could be assigned to a post). In addition to suggesting this, the AI also picked out the following options: bluetooth, geeky, mobile, troma
night, dvd, technology, and software.
The big failure here was dvd, which isn’t remotely one of my tags (and probably wouldn’t apply here if it were: this post isn’t about DVDs; it barely even mentions
them). Possibly some prompt engineering is required to help ensure that the AI doesn’t make a habit of this “include one tag not from the approved list, every time” trend.
Apart from that it’s a pretty solid list. Annoyingly the AI suggested mobile, which isn’t an approved tag, instead of mobiles, which is. That’s probably a
tokenisation fault, but it’s still annoying and a reminder of why even a semi-automated “human-checked” system would need a safety-check to ensure that no absent tags are
allowed through to the final stage of approval.
This post!
As a bonus experiment, I tried running my code against a version of this post, but with the information about the AI’s own prompt and the examples removed (to reduce the risk
of confusion). It came up with: ai, wordpress, blogging, tags, technology, automation.
All reasonable-sounding choices, and among those I’d made myself… except for tags and automation which, yet again, aren’t among tags that I use. Unless this
tendency to hallucinate can be reined-in, I’m guessing that this tool’s going to continue to have some challenges when used on longer posts like this one.
Conclusion and next steps
The bottom line is: yes, this is a job that an AI can assist with, but no, it’s not one that it can do without supervision. The laser-focus with which gpt-4o was able to
pick out taggable concepts, faster than I’d have been able to do for the same quantity of text, shows that there’s potential here, but it’s not yet proven itself enough of a time-saver
to justify me writing a fluffy UI for it.
However, I might expand on the command-line tools I’ve been using in order to produce a non-interactive list of tagging suggestions, and use that to help inform my work as I tidy up the
tags throughout my blog.
You still won’t see any “AI-authored” content on this site (except where it’s for the purpose of talking about AI-generated content, and it’ll always be clearly labelled), and
I can’t see that changing any time soon. But I’ll admit that there might be some value in AI-assisted curation and administration, so long as there’s an informed human in the loop at
all times.
Footnotes
1 Based on my tagging, I’ve apparently only written about escalators once, while playing Pub Jenga at Robin‘s 21st birthday party. I can’t imagine why I thought it deserved a tag.
2 There are, of course, various other people trying similar approaches to this and similar
problems. I might have tried one of them, were it not for the fact that I’m not quite as interested in solving the problem as I am in understanding how one might use an AI to
solve the problem. It’s similar to how I don’t enjoy doing puzzles like e.g. sudoku as much as I enjoy writing software that optimises for solving such puzzles. See also, for
example, how I beat my children at Mastermind or what the hardest word in Hangman is
or my variousattempts to avoid doing online jigsaws.
Okay, is every company using AI to fuck up their mailing lists, now?
For the second time this week I’ve received an email from a company whom I’d explicitly demanded cease processing my PII. This time,
in February they outright told me that they’d deleted my data and sent screenshots to “prove” it (which was already after they’d failed to unsubscribe me from their mailing lists in the
first place).
Just received an unsolicited marketing email from a company that I asked to remove me from their systems eleven and a half years ago (!).
I haven’t heard from them since then.
The marketing email is sent using a platform that “uses AI to help you seamlessly connect with your customers”. By which I assume the platform means “we’ll crawl your email history to
find anybody we can possibly spam”.
Just gonna go grab my GDPR/DPA2018 beating-stick. This is gonna be a fun one.