Because I have access to wp-config.php, I added the following to my file:
define( 'WP_AI_SUPPORT', false );
…
A useful tip.
Personally, I’ve got what feels like an even-better approach (for me, at least): I switched to ClassicPress a year and a bit ago, and haven’t
looked back! It’s a stripped-down fork of WordPress with no Gutenberg, lighter JavaScript, and a handful of other features… plus ClassicPress is already AI-free and staying that way.
This isn’t to say that you can’t use AI with ClassicPress; just that you don’t have to install the feature if you’re never going to use it. With WordPress’s good plugin architecture,
it seems strange to me that such divisive features would become part of the core product, but that just seems to be the direction that the project’s been going in for a while now.
No surprises here, but it’s interesting/staggering to see quite how large the disparity between spending and profit is for some of these companies.
I enjoy the fact that there’s a real-time ticker on the site so you can watch Amazon (for example) burn five thousand dollars a second.
When I tell people that generative AI, as it’s currently used, is unsustainable, this is what I’m talking about. Unless there’s a quantum leap in AI efficiency (for whose feasibility I’ve seen no
evidence) or a dramatic increase in the charged cost of LLM services (on the order of a tenfold increase, assuming the increased cost does not drive any customers
away; more if it does), this whole thing looks like a house of cards.
A lot of the AI bubble – and that’s what it is, for all there are useful things inside there – is based on “Invest now, because when it works it’ll be fantastic!” rhetoric that’s
like investing in a mainframe company in the late 60s on the basis that smartphones will take over the world. We’re moving a lot faster than mainframes went to PCs, but it’s
important to invest in the things you can do with the system that work *now*.
There isn’t a good consumer use for AI right now. ChatGPT is a terrible source of information, confidently wrong in a way that sounds human enough to cause delusion and psychosis.
Things that AI/LLM tech is good for right now – pattern matching, repetitive tasks, logic flow – have some great business cases (It’s made some amazing breakthroughs in satellite
and medical imagery, and it’s got a bright future in automated transcription), and I think there’s a good case for it in content moderation (Yeah, it’s not great at it, but given that the
sick shit content mods on Facebook have had to deal with has given them cPTSD, I strongly believe it should be a machine job). Its use for writing, music, translation, or art is still at the very least questionable and at the most
utterly immoral.
…
Well-said, Aquarion!
The current generation of Generative AI isn’t useless. But its uses are quite specific and it certainly does
more-harm-than-good that it’s promoted as an “everything” solution to every problem. I’ve used some form of agentic coding for several years, mostly of the “spicy autocomplete” variety1, and I mostly agree with Aquarion’s observations.
The whole post is an enjoyable tale.
Footnotes
1 My experiments with “vibe coding” have shown me that AI working alone can produce
usually-functional code to specification, but that code is often of low quality and rarely maintainable, even by the AI.
This post contains and links to (clearly-identified) AI-generated content. As remains the case, none of my writing on this blog was generated by AI.
Imagine my excitement to learn that Pagan Wanderer Lu just dropped a new EP, Built In Obsolescence. And then imagine my horror to discover that it’s actually produced by P-AI-gan
Wanderer Lu; an AI that’s been given PWL lyrics and some artistic direction.
Wot.
The album art’s clearly also AI-generated, and that’s… well… you know. At least this robot hand has got the correct number of fingers.
Nothingness is what silicon dreams
My younger child’s been getting into PWL in a big way lately. As a result of this, I ended up making time for a careful re-listen to a lot of the back catalogue. This in turn inspired
a blog post last year in which I mentioned that Checker Charlie’s observations about humans
replacing their work with machine effort feel increasingly prophetic in the age of generative AI. That’s something I didn’t see in it when I first reviewed it 13 years prior.
I’ve played with AI-generated music a couple of times myself, of course,
mostly as an academic exercise. And it’s becoming more and more apparent that it’s hard to avoid bumping into it in the “real world”.
Early efforts at AI music were pretty unconvincing, always sounding a bit auto-tuney, frequently struggling to stress lines in the right places, and tripping over themselves whenever they
tried to do anything even remotely more-interesting than a simple repeating melody atop a predictable chord sequence. But they’re getting… shall we say… “better”, and nowadays there have been
times when I’ve gotten some way through a track before realising that I’m listening to AI.
At least PWL’s being honest about it and declaring at the outset that this is AI-generated art. There’s plenty of folks using AI to generate content online and not
declaring it, which is pretty awful1.
Anyway: in this EP the AI’s moderately well-concealed, and listening casually to most of the tracks, I wouldn’t have noticed it if I hadn’t been told2.
Is there life enough in these chords?
So I listened to the EP. Three times.
The cover of Checker Charlie, I’m sad to admit, works. It’s got the feel of early-nineties pop, full of synths and saccharine, but instead of insipid lyrics about
love it benefits a lot from Andy’s lyrical prowess. It’s a bouncy bop that would be forgettable if it weren’t for the excellent story told by the words is, I suppose, what
I mean to say. And, of course, it’s the song that would have made me think about this. Anyway: I enjoyed it and would absolutely listen to it again, and I don’t know what
that says about me, about the song, or anything else.
Uncanny Valley doesn’t work as well. Musically, it feels like a new artist in 2012 drew inspiration from their dad’s new wave albums but wanted to make it sound more like Carly
Rae Jepsen was collabing with Daft Punk. And the result is kind-of…flat? Could I even say… soulless? It feels like it might have been the B-side of their cover
of Chemicals Like You, which rolls out next in the same vein. Twice was probably enough for these two.
Repetition 4 is among my favourite – let’s say top 15? – Pagan Wanderer Lu songs and the AI’s cover of it starts so strong. It finishes pretty strong too. The
voice it’s chosen shows only a hint of uncanny-valley-autotune and it wails plaintively. The most human-made bits – the lyrical themes of fighting for creativity against your own
struggles as a vulnerable and flawed human “machine” – remain solid. I really expected to love this one! But by the time we were halfway through the song it felt… musically-repetitive.
You know when you get a pop cover of a classic song sometimes3 and you feel like the cover artist… missed the point somehow? That’s what this feels
like to me.
The repetitions of “we are all machines… for dancing” in the original felt meaningful and real; a human’s cathartic resignation to pleasure in the simple things we all enjoy, despite
the challenges of life… but the AI cover adds these doo-woppy backing vocals that subtract from, rather than add to, the meaning. I’m not saying it ruins it –
it’s still a fun and bouncy version of a great song… but it’s one of those covers that leaves you longing for the original.
And then there’s the “unaligned version” of Uncanny Valley. I’m not sure if the introduced distortions in this version are AI-generated or not. They
don’t feel like the kinds of “creative” choices that any AI I’ve played with would make, so I suspect this represents a closer human intervention in the AI’s process:
humans imitating machines imitating humans, perhaps? Anyway: the change doesn’t add anything for me.
Had this been produced entirely by a human, I’d say that the EP consists of one track I’d add to my everyday playlist (the cover of Checker Charlie), maybe one or two
tracks that I “wouldn’t necessarily skip” if they came up on a random shuffle while I was driving… and the rest just feels too much like “bad cover” vibes.
And that’s as much of a review as I’m willing to give, for the reasons touched-upon below.
Building the engines of our own defeat
I continue to have several issues with the widespread use of generative AI, and in particular I have problems with it being used in the production of art. Those are partially
mitigated by it being used by an artist to remix their own work, and partially mitigated by the transparent declaration of the use of AI by the publisher, both of which are
true in this case. But many issues (ethical, environmental, etc.) still remain.
Perhaps the biggest of these, in this case, is my concern that we’re using automation wrong.
As a child, I was optimistic about a future in which machines would take away the boring and repetitive work that humans do, leaving us free to pivot to experimental and experiential
roles: the joy of working hard in the pursuit of discovery and of creativity. But instead, the predominant popular use of generative AI is to replace exactly those
things, leaving humans only with an increasing amount of drudgery, review, and fact-checking. Where did we go wrong?
Don’t get me wrong: I love that Pagan Wanderer Lu has created this EP. Taking art that he’s created, which itself touches on the concept of AI… and feeding it into an
actual AI for reinterpretation is transformative. It’s worthy of discussion as a piece of art in its own right. And the result is… well, some of it’s good, and other
bits are okay.
What I don’t like is what it represents: the wider societal issue of the mainstream use of these technologies that have enormous unsolved problems.
So I guess… I appreciate the cognitive dissonance of enjoying a piece of music and disliking what it means?
Footnotes
1 Whether or not the side-effect of undisclosed AI-generated content “poisoning the well”
for future AI training is a good or bad thing remains an open question, in my mind, but it’s certainly a real phenomenon. You know how we salvage the wrecks of ships sunk before the atomic age because they’re untainted by man-made radioactivity, which makes them useful for special
purposes? It feels like the Internet before the explosion in generative AI may provide a similar cultural resource for future AI training, if you see what I mean.
2 And assuming I wasn’t already familiar with the artist, who doesn’t usually
sound like an auto-tuned female singer.
3 I don’t have a specific example so I hope this is a universal experience!
I potentially saved my client a bunch of money and embarrassment with that 3-line change.
Now, I consider that a productive day.
But had I been measured on my contribution by lines of code, or commits, or features finished, it would have been seen as a very unproductive day by my manager.
…
A great anecdote and some wise words from Jason Gorman on the nature of productivity and code.
This matches my feeling on AI. It’s good at making lots of code. Sometimes it even writes the right code. But something it rarely demonstrates skill at is
comprehending the bigger issue. I’m sure we’re already seeing developers who “game” their employers’ productivity metrics, to the detriment of the end users, by having AI
make “more” code without having to engage their brain and actually understand the problem.
(And, of course, there are employers who, whether intentionally or not, promote this kind of behaviour through their policies and success metrics.)
NHS England has issued new guidance to staff, which has been shared with New Scientist, that demands existing and future software be pulled from public view and kept behind closed
doors. “All source code repositories must be private by default. Repositories must not be public unless there is an explicit and exceptional need, and public access has been
formally approved,” says the new guidance. The deadline for making code private is 11 May.
Last month, an AI created by Anthropic called Mythos was widely reported to be capable of discovering flaws in virtually any software, potentially allowing hackers to break into
systems running it.
NHS England’s guidance specifically points to Mythos as the cause for the new measures.
…
Yet again, “AI” is the reason why we can’t have nice things on an open and transparent Web.
This is bad, of course. But the worst part is the illusion it helps feed that closed-source software is necessarily more-secure than open-source software. Obviously it’s all
much more-complex than that. Indeed, the article goes on to quote Terence Eden thoroughly debunking the entire line of thought:
“Is it possible that Mythos will scan a repository and find a bug? Yes, 100 per cent likely. Is that going to be a bug that causes a security issue in a live NHS service
somewhere? Almost certainly not,” says Eden. “I think it’s someone in NHS England buying into the hype that Mythos is going to cause the end of security as we know it and
getting a bit panicked.”
He’s right. This policy change is unlikely to improve the security of any of the affected pieces of NHS software (for much of which, the code is already out-there and archived, and so
removing it from the Internet now is pretty pointless). If it’s going to be attacked, it’ll be attacked, and the resources that the bad guys have for probing a whole
database’s worth of CVEs or fuzz-testing the extremities make the availability of vulnerability-scanning AI pretty-close to irrelevant.
At least if it were open source then the good guys would have a chance of helping out… and we, the taxpayers who made the software possible, would be able to see where our money was
going!
why bother going to the brick-and-mortar store? amazon is more “convenient”. why bother cooking a nice meal for yourself? doordash and uber eats are more
“convenient”. why go out and socialize with people? facebook is more “convenient”. why use a digital camera, camcorder, or polaroid? your
smartphone is more “convenient”. why bother going to the theater or concerts? netflix and spotify are more “convenient”. why bother making art?
asking an AI to generate it for you is more “convenient”.
well, i say nuts to that. from now on, i’m going to make my life as inconvenient as possible. i’m going to go to the store and buy stuff in person. i’m going to make my own
food with my own hands. i’m going to socialize with people face-to-face. i’m going to use a true camera instead of my phone’s camera. i’m going to buy blu-rays, DVDs, and CDs
instead of streaming. i’m going to take my time when creating, watching, playing, and reading a work of art.
…
I’m seeing a growing movement in indieweb, revivalist, and adjacent circles that expresses RNotté’s sentiment: that the endless (and highly-marketable) quest for increased convenience in
our lives has gained us free time, but we’ve lost something along the way.
What we’ve lost varies from case to case, but includes freedom (from lock-in to subscription services), creative satisfaction (from convenient “artistic” expression), privacy (from
becoming the product, packaged-up by big-data advertising-funded tools), and social interactions (from so much of “social” media).
But reading RNotté share their thoughts on the matter today was the first time that it reminded me of The Matrix.
The connection was probably helped by the fact that I rewatched the film pretty recently.
There’s a bit where Agent Smith says, to his captive, the rebel captain Morpheus:
Did you know that the first Matrix was designed to be a perfect human world? Where none suffered, where everyone would be happy. It was a disaster. No one would accept the program.
Entire crops were lost. Some believed we lacked the programming language to describe your perfect world.
Smith goes on to elucidate that his personal explanation for this fault was that humans depend upon suffering and misery, while acknowledging that there are other explanations. And
perhaps we’ve touched upon one.
Perhaps humans – all humans – have a limit for how much they’re willing to accept convenience as compensation. Connected humans in The Matrix gain a convenient life,
superficially superior to the struggle for survival experienced by humans living in the real world, short on food and hunted by machines. But to get that, they trade away their
individual ability to become aware of the truth and, collectively, the ability of humanity to shape its own destiny. But there’s something about the imbalance of power in the
arrangement that niggles in human minds, and some rebel against the established order… and are joined by others who are shown that an alternative is available.
Clearly – as RNotté and others show – faceless technological forces need not go quite so far as enslaving an entire species before “convenience” stops being a tolerable
mitigation!
I’m not convinced that seeking out inconvenience is in itself a good. But questioning what your conveniences are worth and what you’re paying for them… that’s definitely
worthwhile.
Folks at work have been encouraging me to make more use of generative AI in my workflow1;
going beyond my current “fancy autocomplete” use and giving my agents more autonomy. My experience of such “vibe coding” so far has been… mixed2,
but I promised I’d revisit it.
One thing that these models are usually effective at is summarisation3. This is valuable if you’re faced with a large and unfamiliar
codebase and you’re looking to trace a particular thing but you’re not certain where it is or what it’ll be called. While they’re not always fast, these tools can
at least work in the background, which allows the developer to get on with something else while the agent trawls logs, code, and configuration to find and explain a
fuzzily-defined thing.
Recently, I had a moment which I thought might be such an instance… but it didn’t turn out quite the way I expected. Here’s the story4:
The broken dev env
I’d been drafted into an established and ongoing project to provide more hands, following a coworker’s departure last week. This project touches parts of our (sprawling,
microservices-based) infrastructure that I hadn’t looked at before, so there was a lot I didn’t yet know.
I picked an issue that had belonged to my former colleague and that QA had rejected, and set out to retrace their steps: to replicate the problem that the QA engineers had identified and in
doing so learn more about the underlying process. I spun up my development environment and tried to follow the steps.
The process failed… but much earlier than QA had said it would. Clearly my development environment was at fault, or at least not representative of their setup.
But I couldn’t even get as far as their problem before my frontend barfed out an error message. Sigh! Probably there’s some configuration I’ve missed somewhere in the myriad
microservices, or else the data I’m testing with isn’t a fair reflection of what they’re doing as-standard.
Following some staff changes, I have no teammates on this side of the Atlantic who could help me decipher this: a “quick question on Slack” wouldn’t solve this one until hours
from now. It was time to start debugging!
But… maybe Claude could help? It’s got access to almost all the same code, logs, tools and browser windows I do. I started typing:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.”
Why?
Context is key
It’s quite possible that Claude would have gone away, had a “think”, done some tests, and then come back to me with a believable answer. It might even have been correct, and I’d have
been able to short-cut my way back to productivity (and I’d have time to make a mug of coffee and finish reading my emails while it did so). Then, I’d just have to check that it was
right, make the change, and get on with things.
But I realised that it’d probably work faster (and cheaper, and using less energy) if it had slightly more context from the get-go, so I elaborated. The first thing I’d
want to know if I were debugging this is what was actually happening behind the scenes. I dipped into my browser’s Network debugger and extracted the relevant output, adding it to my
prompt:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why?
The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: pasted 94 lines
That’s more like it: now I could let it get on with its work. But wait…
Rubberducking
There’s a concept in computer programming called “rubberducking”. The name comes from an anecdote in The Pragmatic Programmer about a developer who, when stuck on a problem, would
explain the code line-by-line to a rubber duck. The thinking is that talking-through a problem, even to someone (or something) who doesn’t understand it, can lead the speaker to
insights they were otherwise missing.
I’ve done it myself many, many times: recruiting a convenient colleague or friend and talking them through the technical problem I was faced with, and inviting them to ask me to go
into greater detail if I seemed to be skimming over anything, and I can promise that it can work.
The panel above is part of a series in which a sorceress called Cepper is
coerced by her university into using Avian Intelligence (“AI”) – a robotic parrot5 that her headmaster insists is the future of magic. She experiments with it, finds it
occasionally useful but more-often frustrating, attempts to implement her own local version but finds that troublesome in different ways, and eventually settles on using
an inanimate rubber duck instead. I get it, Cepper!
Let’s put that distraction aside for a moment and get back to the story of my broken development environment.
Clues in the stack trace
The top entry in the stack trace was an unsuccessful call to a different microservice, so I figured I’d pull its logs too, to help steer
the AI in the right direction6:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why?
The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: pasted 94 lines
The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9 lines
I haven’t tried it, but I’m pretty confident that the LLM, after much number-crunching and a little warming-up of some datacentre somewhere, would get to the answer. But again, I found
something niggling inside me: the second-from-top line in the dojo logs suggested that a connection was being made to a further, deeper microservice.
I should pull its logs too, I figured.
The final puzzle piece
As an aide-mémoire – in a way I’ve taken to doing when taking notes or when talking to AI – I first typed what I was going to provide. This is
useful if, for example, somebody distracts me at a key moment: it means I’ve got a jumping-off point predefined by my past self:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why?
The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: pasted 94 lines
The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9 lines. It’s calling osiris, which says:
I dipped into the directory for
osiris, and before I even got to the logs I spotted a problem: that microservice was on an old feature branch. How odd! I switched to the main branch and… everything
started working.
The entire event took only a few minutes. I’d find some information, type it into Claude’s input field, realise that more information could be valuable, and repeat.
By the time I’d finished describing the problem, I’d discovered the solution. That’s the essence of successful rubberducking. I didn’t need the AI at all.
All I needed was the illusion of something that might be able to help if I just talked through what I was thinking.
I don’t know what the moral is, here.
I wonder if I’d have been as effective had I just typed into my text editor. I suppose I would have, but I wonder if I’d have been motivated to do so in the first place? I’ve tried
rubberducking before by talking to an imaginary person, but I’ve never tried typing to one7; maybe I should start?
Footnotes
1 I’m pretty sure every engineering department nowadays has its rabid fanboys, but I’m
pleased that for the most part my colleagues take a more-pragmatic and realistic outlook: balancing the potential benefits of LLM-assisted coding with its many shortfalls,
downsides, and risks.
3 So long as what you’ve got them summarising is something you can later verify!
4 I’ve taken huge liberties with the strict factual accuracy to make this more-readable as
well as to avoid exposing things I probably oughtn’t. So before you swoop in to criticise my prompt-fu (not that I asked you, but I know there’s somebody out there who’s thinking about
doing this right now), please note that none of the text on this page is what I actually wrote to the AI; it’s a figurative example.
6 I’d had an experience just the previous week in which it’d gone off on completely the
wrong track, attempting to change code in order to “fix” what was ultimately a configuration or data problem, and so I thought it might be useful to give it some rails to follow, to
start with.
7 Except insofar as this AI agent is an “imaginary person”, which is possibly already a
step-too-far in implying personhood for my liking!
Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a
working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed
to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent’s fix introduced a new bug, it debugged
that too. When it came time to write the paper, the agent wrote it. Bob’s weekly updates to his supervisor were indistinguishable from Alice’s. The questions were similar. The
progress was similar. The trajectory, from the outside, was identical.
Here’s where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper
each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist,
they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that
can’t be.
…
The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics
professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to
attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like
understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We
have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI
agents, we’ve collectively decided that maybe this time it’s different. That maybe nodding at Claude’s output is a substitute for doing the calculation yourself. It isn’t. We knew
that before LLMs existed. We seem to have forgotten it the moment they became convenient.
Centuries of pedagogy, defeated by a chat window.
…
This piece by Minas Karamanis is excellent throughout, and if you’ve got the time to read it then you should. He’s a physics postdoc, and this post comes from his experience in his own
field, but I feel that the concerns he raises are more-widely valid, too.
In my field – of software engineering – I have similar concerns.
Let’s accept for a moment that an LLM significantly improves the useful output of a senior software engineer (which is very-definitely disputed, especially for the “10x” level of claims we often hear, but let’s just take it as-read for now). I’ve
experimented with LLM-supported development for years, in various capacities, and it certainly sometimes feels like they do (although it sometimes also feels like they have the
opposite effect!). But if it’s true, then yes: an experienced senior software engineer could conceivably increase their work performance by shepherding a flock of agents through a
variety of development tasks, “supervising” them and checking their work, getting them back on-course when they make mistakes, approving or rejecting their output, and stepping in to
manually fix things where the machines fail.
In this role, the engineer acts more like an engineering team lead, bringing their broad domain experience to maximise the output of those they manage. Except who they manage is… AI.
Again, let’s just accept all of the above for the sake of argument. If that’s all true… how do we make new senior developers?
Junior developers can use LLMs too. And those LLMs will make mistakes that the junior developer won’t catch, because the kinds of mistakes LLMs make are often hard to spot and require
significant experience to identify. But if they’re encouraged to use LLMs rather than making mistakes by hand and learning from them – to keep up, for example, or to meet corporate
policies – then these juniors will never gain the essential experience they’ll one day need. They’ll be deprived of the opportunity to grow and learn.
It’s yet to be proven that more-sophisticated models will “solve” this problem, but my understanding is that issues like hallucination are fundamentally unsolvable: you might
get fewer hallucinations in a better model, but that just means that those hallucinations that slip through will be better-concealed and even harder to identify in code review
or happy-path testing.
Maybe – maybe – the trajectory of GPTs is infinite, and they’ll keep getting “smarter” to the point at which this doesn’t matter: programming genuinely will become a natural language
exercise, and nobody will need to write or understand code at all. In this possible reality, the LLMs will eventually develop entire new programming languages to best support their
work, and humans will simply express ideas and provide feedback on the outputs. But I’m very sceptical of that prediction: it’s my belief that the mechanisms by which LLMs work have a
fundamental ceiling – a capped level of sophistication that can be approached but never exceeded. And sure, maybe some other, different approach to AI might not have this
limitation, but if so then we haven’t invented it yet.
Which suggests that we will always need experienced engineers to shepherd our AIs. Which brings us back to the fundamental question: if everybody uses AI to code, how do we
make new senior developers?
I have other concerns about AI too, of course, some of which I’ve written about. But this one’s top-of-mind today, thanks to Minas’ excellent article. Go read it to learn more about how
physics research faces a similar threat… and, perhaps, consider how your own field might need to face this particular challenge.
The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue
to trust reporting in other areas despite recognizing similar potential inaccuracies.
Summarizing, AI sounds like an incredible genius synthesizing the world’s knowledge right up until you ask it about the thing you know about, then it’s an idiot. Even knowing about
this phenomenon and having experienced it countless times, LLMs have an intoxicating quality to them.
…
I remember one time, maybe in the mid-1990s, when I saw a shopping channel (remember those? oh god, they’re still a thing, aren’t they?) where the host was trying to sell a personal
computer. And… clearly, they knew absolutely nothing about it. They kept hitting on the same two or three talking points they’d been given (“mention the quad-speed CD-ROM
drive!”) and fumbling their way through, and it gave me a revelation:
I knew enough about computers that I could see that the presenter was bullshitting their way through the segment. But there are plenty of things that I don’t
know much about, which are also sold on this same show. Duvets, jewellery, glassware… I’m nowhere near as much an expert on these as I was on PC feature sets. Is there something
inherently incomprehensible about computers? No. So it’s reasonable to assume that these salespeople probably know equally-little about everything they sell, it’s just
that I don’t have the knowledge base to be able to see that.
That’s what GenAI often feels like, to me. Having collated all of the publicly-available knowledge it could find into its model doesn’t make it smarter than the smartest humans; it probably
brings it to somewhere slightly above average, depending on the subject. If I ask an LLM about something that I don’t understand well,
it often produces highly-believable answers, but if I ask it about something that I’m an expert in, it can come off as a fool.
I’m very interested in how we teach information literacy in this new world of rapidly-generated highly-believable nonsense.
Anyway: Dave’s post doesn’t go in that direction – instead, he’s got some clever thoughts about how the “convenience” of a “good enough” AI-driven solution to any given problem risks us
seeing humans as the friction point, which ultimately works against those very humans who are looking to benefit from the technology:
…
We need experts to share what they know and improve the quality of our work, generated or otherwise. We even need idiots to make sure we can break ideas down into their simplest form
that everyone, agents or human, understand. People can have bad attitudes, be shitty, and have wrong opinions… but people are not friction. An LLM may be able to autocorrect its way
into a plausible human response, but it’s not people. It doesn’t care if it’s right or wrong.
Many years ago, someone tried to get me into cryptocurrencies. “They’re the future of money!” they said. I replied saying that I’d rather wait until they were more useful, less
volatile, easier to use, and utterly reliable.
“You don’t want to get left behind, do you?” They countered.
That struck me as a bizarre sentiment. What is there to be left behind from? If BitCoin (or whatever) is going to liberate us all from economic drudgery, what’s the point
of “getting in early”? It’ll still be there tomorrow and I can join the journey whenever it is sensible for me.
…
100%. If I “get in early” on something, it’s because that thing interests me, not because I’m betting on its future. With a hundred new ideas a day and only one of them “making it”,
it’s a fool’s game to try to jump on board every bandwagon that comes along.
With cryptocurrencies, though, I’m fortunate enough to have an even better comeback for the cryptobros who try to shill me whatever made-up currency they’re “investing” in
today: I’ve already done better at them than they ever will.
When Bitcoin first appeared, I took a technical interest in it. I genuinely never anticipated it’d take off (I made the same incorrect
guess with MP3s, too!), but I thought it was a fun concept to play about with. The only Bitcoins I ever paid for must’ve been worth an average of 50p each, or so.
I sold my entire wallet of Bitcoins when they hit around £750 each. I know a tulip economy when I see one, I thought. Plus: I was
no longer interested in blockchains now that I was seeing how they were actually being used: my interest had been entirely in the technology and its applications, not in the actual idea of a
currency!
Sure, I kick myself occasionally, given that I later saw the value rise to tens of thousands of pounds each. But hey, I was never in it for the money anyway.
So yeah, I tell cryptobros: I already made roughly a 1500-fold return on cryptocurrency. And no, I’m not buying any cryptocurrencies any more. Whatever they think “getting in early” was, they’re
wrong, because I was there years ahead of them and I wasn’t even doing it to “get in early”; I did it because it was interesting. And honestly, isn’t that a better story to be able to
tell?
…
I feel the same way about the current crop of AI tools. I’ve tried a bunch of them. Some are good. Most are a bit shit. Few are useful to me as they are now.
…
If this tech is as amazing as you say it is, I’ll be able to pick it up and become productive on a timescale of my choosing not yours.
…
Yup, that’s the attitude I’m taking.
I play with new AI technologies, sometimes. I don’t do it because I’m afraid of being left behind because – as you say – if a technology is transformative, we’ll all get to catch up
eventually.
Do you think that people who had smartphones first are benefitting today because they “got in early” on something that later became mainstream?
Of course they’re not. Their experience is eventually exactly the same as everybody else’s, just like it was for everybody who “got in early” on hype trains whose final station came
early, like CompuServe GO-words, WAP, Beenz.com, WebTV, the CueCat, m-Commerce, HD-DVD, the JooJoo, or Google+.
People being unwilling to discuss their wild claims, then later using the lack of discussion as evidence of widespread acceptance.
When people balance the new toilet roll one atop the old one’s tube.3
Come on! It would have been so easy!
Shellfish. Why would you eat that!?
People assuming my interest in computers and technology means I want to talk to them about cryptocurrencies.4
Websites that nag you to install their shitty app. (I know you have an app. I’m choosing to use your website. Stop with the banners!)
People who seem to only be able to drive at one speed.5
The assumption that the fact I’m “sharing” my partner is some kind of compromise on my part; a concession; something that I’d “wish away” if I could.
(It’s very much not.)
Brexit.
Wow, that was strangely cathartic.
Footnotes
1 I have a special pet hate for websites that require JavaScript to render their images.
Like… we’ve had the <img> tag since 1993! Why are you throwing it away and replacing it with something objectively slower, more-brittle, and
less-accessible?
2 Or, worse yet, claiming
that my long, random password is insecure because it contains my surname. I get that composition-based password rules, while terrible (even when they’re correctly
implemented, which they’re often not), are a moderately useful model for people to whom you’d otherwise struggle to
explain password complexity. I get that a password composed entirely of personal information about the owner is a bad idea too. But there’s a correct way to do this, and it’s not “ban
passwords with forbidden words in them”. Here’s what you should do: first, strip any forbidden words from the password (you might need to make multiple passes). Second, validate the
resulting password against your composition rules. If it fails, then yes: the password isn’t good enough. If it passes, then it doesn’t matter that forbidden words
were in it: a properly-stored and used password is never made less-secure by the addition of extra information into it!
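A minimal Python sketch of that strip-then-validate approach follows. The forbidden-word list and the composition rule it checks (at least 12 characters, including at least one letter and one digit) are hypothetical stand-ins for whatever rules a real site enforces, and the case-insensitive matching is my own assumption.

```python
import re


def strip_forbidden(password: str, forbidden: list[str]) -> str:
    """Remove forbidden words (case-insensitively), repeating until none remain.

    Several passes may be needed: removing one word can splice its
    neighbours together into a fresh forbidden word.
    """
    words = [re.escape(word) for word in forbidden if word]
    if not words:
        return password
    pattern = re.compile("|".join(words), re.IGNORECASE)
    while True:
        stripped = pattern.sub("", password)
        if stripped == password:
            return stripped
        password = stripped


def meets_composition_rules(password: str) -> bool:
    """Hypothetical example rules: 12+ characters, at least one letter and one digit."""
    return (
        len(password) >= 12
        and any(ch.isalpha() for ch in password)
        and any(ch.isdigit() for ch in password)
    )


def is_acceptable(password: str, forbidden: list[str]) -> bool:
    """Judge the password only on what is left once forbidden words are stripped."""
    return meets_composition_rules(strip_forbidden(password, forbidden))


print(is_acceptable("correct7horse9Battery-Smith", ["smith"]))  # True
print(is_acceptable("Smith1234", ["smith"]))  # False
```

The loop in strip_forbidden is why multiple passes can be needed: removing “SMITH” from “xSmiSMITHthy” leaves “xSmithy”, which still contains a forbidden word, and only a second pass reduces it to “xy”.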
Last night I was chatting to my friend (and fellow Three Rings volunteer) Ollie about our respective
workplaces and their approach to AI-supported software engineering, and it echoed conversations I’ve had with other friends. Some workplaces, it seems, are leaning so hard into
AI-supported software development that they’re berating developers who seem to be using the tools less than their colleagues!
That’s a problem for a few reasons, principal among them that AI does not
make you significantly faster but does make you learn less.1 I stand by the statement that AI isn’t useless, and I’ve experimented with it for years. But I certainly wouldn’t feel very comfortable
working somewhere that told me I was underperforming if, say, my code contributions were less-likely than the average to be identifiably “written by an AI”.
Even if you’re one of those folks who swears by your AI assistant, you’ve got to admit that they’re not always the best choice.
I ran into something a little like what Ollie described when an AI code reviewer told me off for not describing how my AI agent assisted me with the code change… when no AI had been
involved: I’d written the code myself.2
I spoke to another friend, E, whose employers are going in a similar direction. E joked that at current rates they’d have to start tagging their (human-made!) commits with fake
AI agent logs in order to persuade management that their level of engagement with AI was correct and appropriate.3
Supposing somebody like Ollie or E or anybody else I spoke to did feel the need to “fake” AI agent logs in order to prove that they were using AI “the right way”… that sounds
like an excuse for some automation!
I got to thinking: how hard could it be to add a git hook that added an AI agent’s “logging” to each commit, as if the work had been done by a
robot?4
Turns out: pretty easy…
To try out my idea, I made two changes to a branch. When I committed, imaginary AI agent ‘frantic’ took credit, writing its own change log. Also: asciinema + svg-term remains awesome.
Here’s how it works (with source code!). After you make a commit, the post-commit hook creates a file in
.agent-logs/, named for your current branch. Each commit results in a line being appended to that file to say something like [agent] first line of your commit
message, where agent is the name of the AI agent you’re pretending that you used (you can even configure it with an array of agent names and it’ll pick one at
random each time: my sample code uses the names agent, stardust, and frantic).
There’s one quirk in my code. Git hooks only get the commit message (the first line of which I use as the imaginary agent’s description of what it did) after the commit has
taken place. Were a robot really used to write the code, it’d have updated the file already by this point. So my hook has to do an --amend commit, to
retroactively fix what was already committed. And to do that without triggering itself and getting into an infinite loop, it needs to use a temporary environment variable.
Ignoring that, though, there’s nothing particularly special about this code. It’s certainly more-lightweight, faster-running, and more-accurate than a typical coding LLM.
Sure, my hook doesn’t attempt to write any of the code for you; it just makes it look like an AI did. But in this instance: that’s a feature, not a
bug!
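For the curious, here’s a rough Python re-sketch of the behaviour described above; it isn’t the hook’s actual source. A few details are my own assumptions: the log file is named after the branch with any slashes swapped for hyphens, the whole commit message is read with git log and trimmed to its first line, and FAKE_AGENT_AMENDING is a made-up name for the temporary environment variable that stops the --amend from re-triggering the hook.

```python
#!/usr/bin/env python3
"""Sketch of a post-commit hook that credits an imaginary AI agent.

Assumptions (not from the original hook): written in Python, logs live at
.agent-logs/<branch> with "/" replaced by "-", and FAKE_AGENT_AMENDING is a
hypothetical name for the recursion-guard environment variable.
"""
import os
import random
import subprocess
import sys
from pathlib import Path
from typing import Optional

AGENTS = ["agent", "stardust", "frantic"]  # one is picked at random per commit
GUARD_VAR = "FAKE_AGENT_AMENDING"  # hypothetical variable name


def git(*args: str, env: Optional[dict] = None) -> str:
    """Run a git command and return its stripped standard output."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True, env=env
    )
    return result.stdout.strip()


def log_file_name(branch: str) -> str:
    """Turn a branch name like 'feature/x' into a flat file name."""
    return branch.replace("/", "-")


def format_log_line(agent: str, message: str) -> str:
    """Build a '[agent] first line of the commit message' log entry."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    return f"[{agent}] {first_line}\n"


def main() -> None:
    # The amend below fires this hook again; the guard variable breaks the loop.
    if os.environ.get(GUARD_VAR):
        return
    top = Path(git("rev-parse", "--show-toplevel"))
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    message = git("log", "-1", "--format=%B")
    log_path = top / ".agent-logs" / log_file_name(branch)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(format_log_line(random.choice(AGENTS), message))
    # Retroactively fold the updated log into the commit that was just made.
    git("add", str(log_path))
    git("commit", "--amend", "--no-edit", env={**os.environ, GUARD_VAR: "1"})


# Only act when installed as .git/hooks/post-commit, so running or importing
# this file anywhere else does nothing.
if __name__ == "__main__" and Path(sys.argv[0]).name == "post-commit":
    main()
```

To try it, save it as .git/hooks/post-commit and make it executable (chmod +x .git/hooks/post-commit). The check on the file name at the bottom is a deliberate safety catch: the script only touches your repository when git runs it as a post-commit hook.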
Footnotes
1 That research comes from Anthropic. Y’know, the company who makes Claude, one of the
most-popular AIs used by programmers.
3 Using “proportion of PRs that used AI” as a metric for success seems to me to be just
slightly worse than using “number of lines of code produced”. And, as this blog post demonstrates, the
former can be “gamed” just as effectively as the latter (infamously) could.
4 Obviously – and I can’t believe I have to say this – lying to your employer isn’t a
sensible long-term strategy, and instead educating them on what AI is (if anything) and isn’t good for in your workflow is a better solution in the end. If you read this blog post and
actually think for a moment hey, I should use this technique, then perhaps there’s a bigger problem you ought to be addressing!
Today, an AI review tool used by my workplace reviewed some code that I wrote, and incorrectly claimed that it would introduce a bug because a global variable I created could “be
available to multiple browser tabs” (that’s not how browser JavaScript works: each tab gets its own, separate set of global variables).
Just in case I was mistaken, I explained to the AI why I thought it was wrong, and asked it to explain itself.
To do so, the LLM wrote a PR proposing code that would use our application’s save mechanism to pass the data back to the server and on to any other browser tab, thereby creating
the problem that it claimed existed.
This isn’t even the most-efficient way to create this problem. localStorage, which is shared between tabs on the same origin, would have been better.
So in other words, today I watched an AI:
(a) claim to have discovered a problem (that doesn’t exist),
(b) when challenged, attempt to create the problem (that wasn’t needed), and
(c) do so in a way that was suboptimal.
Humans aren’t perfect. A human could easily make one of these mistakes. Under some circumstances, a human might even have made two of these mistakes. But to make all three? That took an
AI.
What’s the old saying? “To err is human, but to really foul things up you need a computer.”