Disabling AI in WordPress 7.0

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Because I have access to wp-config.php, I added the following to my file:

define( 'WP_AI_SUPPORT', false );

A useful tip.

Personally, I’ve got what feels like an even-better approach (for me, at least): I switched to ClassicPress a year and a bit ago, and haven’t looked back! It’s a stripped-down fork of WordPress with no Gutenberg, lighter JavaScript, and a handful of other features… plus ClassicPress is already AI-free and staying that way.

This isn’t to say that you can’t use AI with ClassicPress. Just that you’re not having to install the feature if you’re never going to use it. With WordPress’s good plugin architecture it seems strange to me that such divisive features would become part of the core product, but that just seems to be the direction that the project’s been going in for a while now.

Is AI Profitable Yet?

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Screenshot of a table and graph that shows all AI companies spending significantly more money than they make... except for NVidia, who're making bank.

No surprises here, but it’s interesting/staggering to see quite how large the disparity between spending and profit is for some of these companies.

I enjoy the fact that there’s a real-time ticker on the site so you can watch Amazon (for example) burn five thousand dollars a second.

When I tell people that generative AI, as it’s currently used, is unsustainable, this is what I’m talking about. Unless there’s a quantum leap in AI efficiency (for which I’ve seen no evidence of the feasibility) or a dramatic increase in the charged cost of LLM services (on the order of a tenfold increase assuming the increased cost does not drive any customers away; more if it does), this whole thing looks like a house of cards.
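To make the "tenfold, or more if customers leave" arithmetic concrete, here is a rough sketch. The figures are hypothetical (they are not taken from the linked site), the function name is made up for illustration, and it deliberately assumes that costs stay fixed even when some customers walk away.

```python
def break_even_multiplier(annual_cost: float, annual_revenue: float,
                          retention: float = 1.0) -> float:
    """Price multiplier needed for revenue to cover cost.

    retention is the fraction of paying customers who stay after the
    price rise. Simplifying assumption: cost does not fall when
    customers leave.
    """
    if annual_revenue <= 0 or not 0 < retention <= 1:
        raise ValueError("revenue must be positive and retention in (0, 1]")
    return annual_cost / (annual_revenue * retention)


# Hypothetical company spending ten times what it earns.
print(break_even_multiplier(10.0, 1.0))                 # nobody leaves
print(break_even_multiplier(10.0, 1.0, retention=0.5))  # half the customers leave
```

With nobody leaving, the multiplier is simply cost divided by revenue; every customer lost to the higher price pushes the required increase up further.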

Bloomscrolling & Agentic Intelligence

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

A lot of the AI bubble – and that’s what it is, for all there are useful things inside there – is based on “Invest now, because when it works it’ll be fantastic!” rhetoric that’s like investing in a mainframe company in the late 60s on the basis that smartphones will take over the world. We’re moving a lot faster than mainframes went to PCs, but it’s important to invest in the things you can do with the system that work *now*.

There isn’t a good consumer use for AI right now. ChatGPT is a terrible source of information, confidently wrong in a way that sounds human enough to cause delusion and psychosis.

Things that AI/LLM tech is good for right now – pattern matching, repetitive tasks, logic flow – have some great business cases (It’s made some amazing breakthroughs in satellite and medical imagery, it’s got a bright future in automated transcription), and I think there’s a good case for it in content moderation (Yeah, it’s not great at it, but given the sick shit content mods on Facebook have had to deal with has given them cPTSD, I strongly believe it should be a machine job). Its use for writing, music, translation, or art is still at the very least questionable and at the most utterly immoral.

Well-said, Aquarion!

The current generation of Generative AI isn’t useless. But its uses are quite specific and it certainly does more-harm-than-good that it’s promoted as an “everything” solution to every problem. I’ve used some form of agentic coding for several years, mostly of the “spicy autocomplete” variety1, and I mostly agree with Aquarion’s observations.

The whole post is an enjoyable tale.

Footnotes

1 My experiments with “vibe coding” have shown me that AI working alone can produce usually-functional code to specification, but that code is often of low quality and rarely maintainable, even by the AI.

Built In Obsolescence

This post contains and links to (clearly-identified) AI-generated content. As remains the case, none of my writing on this blog was generated by AI.

Imagine my excitement to learn that Pagan Wanderer Lu just dropped a new EP, Built In Obsolescence. And then imagine my horror to discover that it’s actually produced by P-AI-gan Wanderer Lu; an AI that’s been given PWL lyrics and some artistic direction.

Wot.

AI-generated EP cover of Built In Obsolescence by PAIgan Wanderer Lu, showing a neon digital outline of Andy.
The album art’s clearly also AI-generated, and that’s… well… you know. At least this robot hand has got the correct number of fingers.

Nothingness is what silicon dreams

My younger child’s been getting into PWL in a big way lately. As a result of this, I ended up making time for a careful re-listen to a lot of the back catalogue. This in turn inspired a blog post last year in which I mentioned that Checker Charlie’s observations about humans replacing their work with machine effort feel increasingly prophetic in the age of generative AI. That’s something I didn’t see in it when I first reviewed it 13 years prior.

I’ve played with AI-generated music a couple of times myself, of course, mostly as an academic exercise. And it’s becoming more and more apparent that it’s hard to avoid bumping into it in the “real world”.

Early efforts at AI music were pretty unconvincing, always sounding a bit auto-tuney, frequently struggling to stress lines in the right places, and tripping over themselves when they tried to do anything even remotely more-interesting than a simple repeating melody atop a predictable chord sequence. But they’re getting… shall we say… “better”, and there have been times nowadays when I’ve gotten some way through a track before realising that I’m listening to AI.

At least PWL’s being honest about it and declaring at the outset that this is AI-generated art. There’s plenty of folks using AI to generate content online and not declaring it, which is pretty awful1. Anyway: in this EP the AI’s moderately well-concealed and listening casually to most of the tracks I wouldn’t have noticed it if I hadn’t been told2.

Is there life enough in these chords?

So I listened to the EP. Three times.

The cover of Checker Charlie, I’m sad to admit, works. It’s got the feel of early-nineties pop, full of synths and saccharine, but instead of insipid lyrics about love it benefits a lot from Andy’s lyrical prowess. It’s a bouncy bop that would be forgettable if it weren’t for the excellent story told by the words is, I suppose, what I mean to say. And, of course, it’s the song that would have made me think about this. Anyway: I enjoyed it and would absolutely listen to it again, and I don’t know what that says about me, about the song, or anything else.

Uncanny Valley doesn’t work as well. Musically, it feels like a new artist in 2012 drew inspiration from their dad’s new wave albums but wanted to make it sound more like Carly Rae Jepsen was collabing with Daft Punk. And the result is kind-of…flat? Could I even say… soulless? It feels like it might have been the B-side of their cover of Chemicals Like You, which rolls out next in the same vein. Twice was probably enough for these two.

Repetition 4 is among my favourite – let’s say top 15? – Pagan Wanderer Lu songs and the AI’s cover of it starts so strong. It finishes pretty strong too. The voice it’s chosen shows only a hint of uncanny-valley-autotune and it wails plaintively. The most human-made bits – the lyrical themes of fighting for creativity against your own struggles as a vulnerable and flawed human “machine” – remain solid. I really expected to love this one! But by the time we were half way through the song it felt… musically-repetitive. You know when you get a pop cover of a classic song sometimes3 and you feel like the cover artist… missed the point somehow? That’s what this feels like to me.

The repetitions of “we are all machines… for dancing” in the original felt meaningful and real; a human’s cathartic resignation to pleasure in the simple things we all enjoy, despite the challenges of life… but the AI cover adds this kind of doo-woppy backing vocals that subtract from, rather than add to, the meaning. I’m not saying it ruins it – it’s still a fun and bouncy version of a great song… but it’s one of those covers that leaves you longing for the original.

And then there’s the “unaligned version” of Uncanny Valley. I’m not sure if the introduced distortions in this version are AI-generated or not. They don’t feel like the kinds of “creative” choices that any AI I’ve played with would make, so I suspect this represents a closer human intervention in the AI’s process: humans imitating machines imitating humans, perhaps? Anyway: the change doesn’t add anything for me.

Had this been produced entirely by a human, I’d say that the EP consists of one track I’d add to my everyday playlist (the cover of Checker Charlie), maybe one or two tracks that I “wouldn’t necessarily skip” if they came up on a random shuffle while I was driving… and the rest just feels too much like “bad cover” vibes.

And that’s as much of a review as I’m willing to give, for the reasons touched-upon below.

Building the engines of our own defeat

I continue to have several issues with the widespread use of generative AI, and in particular I have problems with it being used in the production of art. Those are partially mitigated by it being used by an artist to remix their own work, and partially mitigated by the transparent declaration of the use of AI by the publisher, both of which are true in this case. But many issues (ethical, environmental, etc.) still remain.

Perhaps the biggest of which in this case is my concern that we’re using automation wrong.

As a child, I was optimistic about a future in which machines would take away the boring and repetitive work that humans do, leaving us free to pivot to experimental and experiential roles: the joy of working hard in the quest of discovery and of creativity. But instead, the predominant popular use of generative AI is to replace exactly those things, leaving humans only with an increasing amount of drudgery, review, and fact-checking. Where did we go wrong?

Don’t get me wrong: I love that Pagan Wanderer Lu has created this EP. Taking art that he’s created, whose concept touches on the concepts of AI… and feeding them into an actual AI for reinterpretation is transformative. It’s worthy of discussion as a piece of art in its own right. And the result is… well, some of it’s good, and other bits are okay.

What I don’t like is what it represents: the wider societal issue of the mainstream use of these technologies that have enormous unsolved problems.

So I guess… I appreciate the cognitive dissonance of enjoying a piece of music and disliking what it means?

Footnotes

1 Whether or not the side-effect of undisclosed AI-generated content “poisoning the well” for future AI training is a good or bad thing remains an open question, in my mind, but it’s certainly a real phenomenon. You know how we salvage the wrecks of ships sunk before the atomic age because they’re untainted by man-made radioactivity, which makes them useful for special purposes? It feels like the Internet before the explosion in generative AI may provide a similar cultural resource for future AI training, if you see what I mean.

2 And assuming I wasn’t already familiar with the artist, who doesn’t usually sound like an auto-tuned female singer.

3 I don’t have a specific example so I hope this is a universal experience!

Coding Is When We’re Least Productive

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I potentially saved my client a bunch of money and embarrassment with that 3-line change.

Now, I consider that a productive day.

But had I been measured on my contribution by lines of code, or commits, or features finished, it would have been seen as a very unproductive day by my manager.

A great anecdote and some wise words from Jason Gorman on the nature of productivity and code.

This matches my feeling on AI. It’s good at making lots of code. Sometimes it even writes the right code. But something it rarely demonstrates skill at is comprehending the bigger issue. I’m sure we’re already seeing developers who “game” their employers’ productivity metrics, to the detriment of the end users, by having AI make “more” code without having to engage their brain and actually understand the problem.

(And, of course, there are employers who, whether intentionally or not, promote this kind of behaviour through their policies and success metrics.)

NHS England rushes to hide software over AI hacking fears

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

NHS England has issued new guidance to staff, which has been shared with New Scientist, that demands existing and future software be pulled from public view and kept behind closed doors. “All source code repositories must be private by default. Repositories must not be public unless there is an explicit and exceptional need, and public access has been formally approved,” says the new guidance. The deadline for making code private is 11 May.

Last month, an AI created by Anthropic called Mythos was widely reported to be capable of discovering flaws in virtually any software, potentially allowing hackers to break into systems running it.

NHS England’s guidance specifically points to Mythos as the cause for the new measures.

Yet again, “AI” is the reason why we can’t have nice things on an open and transparent Web.

This is bad, of course. But the worst part is the illusion it helps feed that closed-source software is necessarily more-secure than open-source software. Obviously it’s all much more-complex than that. Indeed, the article goes on to quote Terence Eden thoroughly debunking the entire line of thought:

“Is it possible that Mythos will scan a repository and find a bug? Yes, 100 per cent likely. Is that going to be a bug that causes a security issue in a live NHS service somewhere? Almost certainly not,” says Eden. “I think it’s someone in NHS England buying into the hype that Mythos is going to cause the end of security as we know it and getting a bit panicked.”

He’s right. This policy change is unlikely to improve the security of any of the affected pieces of NHS software (for much of which, the code is already out-there and archived, and so removing it from the Internet now is pretty pointless). If it’s going to be attacked, it’ll be attacked, and the resources that the bad guys have for probing a whole database’s worth of CVEs or fuzz-testing the extremities make the availability of vulnerability-scanning AI pretty-close to irrelevant.

At least if it were open source then the good guys would have a chance of helping out… as well as we, the taxpayers who made the software possible, being able to see where our money was going!

Altogether a bad move by the NHS, here.

rejecting convenience

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

why bother going to the brick-and-mortar store? amazon is more “convenient”. why bother cooking a nice meal for yourself? doordash and uber eats are more “convenient”. why go out and socialize with people? facebook is more “convenient”. why use a digital camera, camcorder, or polaroid? your smartphone is more “convenient”. why bother going to the theater or concerts? netflix and spotify are more “convenient”. why bother making art? asking an AI to generate it for you is more “convenient”.

well, i say nuts to that. from now on, i’m going to make my life as inconvenient as possible. i’m going to go to the store and buy stuff in person. i’m going to make my own food with my own hands. i’m going to socialize with people face-to-face. i’m going to use a true camera instead of my phone’s camera. i’m going to buy blu-rays, DVDs, and CDs instead of streaming. i’m going to take my time when creating, watching, playing, and reading a work of art.

I’m seeing a growing movement in indieweb, revivalist, and adjacent circles that expresses RNotté’s sentiment: that the endless (and highly-marketable) quest for increased convenience in our lives has gained us free time, but we’ve lost something along the way.

What we’ve lost varies from case to case, but includes freedom (from lock-in to subscription services), creative satisfaction (from convenient “artistic” expression), privacy (from becoming the product, packaged-up by big-data advertising-funded tools), and social interactions (from so much of “social” media).

But reading RNotté share their thoughts on the matter today was the first time that it’s reminded me of The Matrix.

Framegrab from The Matrix. In the foreground is the silhouette of Morpheus, who is about to be interrogated by Agent Smith, a man in a suit at the windowed far end of an office.
The connection was probably helped by the fact that I rewatched the film pretty recently.

There’s a bit where Agent Smith says, to his captive the rebel captain Morpheus:

Did you know that the first Matrix was designed to be a perfect human world? Where none suffered, where everyone would be happy. It was a disaster. No one would accept the program. Entire crops were lost. Some believed we lacked the programming language to describe your perfect world.

Smith goes on to elucidate that his personal explanation for this fault was that humans depend upon suffering and misery, while acknowledging that there are other explanations. And perhaps we’ve touched upon one.

Perhaps humans – all humans – have a limit for how much they’re willing to accept convenience as compensation. Connected humans in The Matrix gain a convenient life, superficially superior to the struggle for survival experienced by humans living in the real world, short on food and hunted by machines. But to get that, they trade away their individual ability to become aware of the truth and, collectively, the ability for humanity to shape its own destiny. But there’s something about the imbalance of power in the arrangement that niggles in human minds, and some rebel against the established order… and are joined by others who are shown that an alternative is available.

Clearly – as RNotté and others show – faceless technological forces need not go quite so far as enslaving an entire species before “convenience” ceases to be a tolerable mitigation!

I’m not convinced that seeking out inconvenience is in itself a good. But questioning what your conveniences are worth and what you’re paying for them… that’s definitely worthwhile.


Once You’re Asking the Right Question, You Don’t Need To Ask!

Folks at work have been encouraging me to make more use of generative AI in my workflow1; going beyond my current “fancy autocomplete” use and giving my agents more autonomy. My experience of such “vibe coding” so far has been… mixed2, but I promised I’d revisit it.

One thing that these models are usually effective at is summarisation3. This is valuable if you’re faced with a large and unfamiliar codebase and you’re looking to trace a particular thing but you’re not certain where it is or what it’ll be called. While they’re not always fast, these tools can at least work in the background, which allows the developer to get on with something else while the agent trawls logs, code, and configuration to find and explain a fuzzily-defined thing.

Recently, I had a moment which I thought might be such an instance… but it didn’t turn out quite the way I expected. Here’s the story4:

The broken dev env

I’d been drafted into an established and ongoing project to provide more hands, following a coworker’s departure last week. This project touches parts of our (sprawling, microservices-based) infrastructure that I hadn’t looked at before, so there was a lot I didn’t yet know.

I picked an issue that had belonged to my former colleague that QA had rejected and set out to retrace their steps: to replicate the problem that the QA engineers had identified and in doing so learn more about the underlying process. I spun up my development environment and tried to follow the steps.

A popup error message saying "Oops, something went wrong. Please try again."
The process failed… but much earlier than QA had said it would. Clearly my development environment was at fault, or at least not representative of their setup.

But I couldn’t even get as far as their problem before my frontend barfed out an error message. Sigh! Probably there’s some configuration I’ve missed somewhere in the myriad microservices, or else the data I’m testing with isn’t a fair reflection of what they’re doing as-standard.

Following some staff changes, I have no teammates on this side of the Atlantic who could help me decipher this: a “quick question on Slack” wouldn’t solve this one until hours from now. It was time to start debugging!

But… maybe Claude could help? It’s got access to almost all the same code, logs, tools and browser windows I do. I started typing:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why?

Context is key

It’s quite possible that Claude would have gone away, had a “think”, done some tests, and then come back to me with a believable answer. It might even have been correct, and I’d have been able to short-cut my way back to productivity (and I’d have time to make a mug of coffee and finish reading my emails while it did so). Then, I’d just have to check that it was right, make the change, and get on with things.

But I realised that it’d probably work faster (and cheaper, and using less energy) if it had slightly more context from the get-go, so I elaborated. The first thing I’d want to know if I were debugging this is what was actually happening behind the scenes. I dipped into my browser’s Network debugger and extracted the relevant output, adding it to my prompt:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: [pasted 94 lines]

That’s more like it, now I could let it get on with its work. But wait…

Rubberducking

There’s a concept in computer programming called “rubberducking”. The name comes from an anecdote in The Pragmatic Programmer about a developer who, when stuck on a problem, would explain the code line-by-line to a rubber duck. The thinking is that talking-through a problem, even to someone (or something) who doesn’t understand it, can lead the speaker to insights they were otherwise missing.

I’ve done it myself many, many times: recruiting a convenient colleague or friend and talking them through the technical problem I was faced with, and inviting them to ask me to go into greater detail if I seemed to be skimming over anything, and I can promise that it can work.

A witch is happy and proud of her invention - a rubber duck. She explains to her friend: I just figured that formulating my questions out loud helps me to solve them, and finally that's all I needed.
I discovered Mini Fantasy Theater recently and loved this episode from its backlog.

The panel above is part of a series in which a sorceress called Cepper is coerced by her university into using Avian Intelligence (“AI”) – a robotic parrot5 that her headmaster insists is the future of magic. She experiments with it, finds it occasionally useful but more-often frustrating, attempts to implement her own local version but finds that troublesome in different ways, and eventually settles on using an inanimate rubber duck instead. I get it, Cepper!

Let’s put that distraction aside for a moment and get back to the story of my broken development environment.

Clues in the stack trace

The top entry in the stack trace was an unsuccessful call to a different microservice, so I figured I’d pull its logs too, in order to further help point the AI in the right direction6:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: [pasted 94 lines] The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: [pasted 9 lines]

I haven’t tried it, but I’m pretty confident that the LLM, after much number-crunching and a little warming-up of some datacentre somewhere, would get to the answer. But again, I found something niggling inside me: the second-from-top line in the dojo logs suggested that a connection was being made to a further, deeper microservice.

I should pull its logs too, I figured.

The final puzzle piece

As an aide mémoire – in a way I’ve taken to doing when taking notes or when talking to AI – I first typed what I was going to provide. This is useful if, for example, somebody distracts me at a key moment: it means I’ve got a jumping-off point predefined by my past self:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: [pasted 94 lines] The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: [pasted 9 lines]. It’s calling osiris, which says:

I dipped into the directory for osiris, and before I even got to the logs I spotted a problem: that microservice was on an old feature branch. How odd! I switched to the main branch and… everything started working.
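A "which branch is each service on?" check like this one is easy to script. What follows is only a sketch under assumptions: it supposes each microservice is its own git checkout, the function names are invented for illustration, and `main` as the expected branch is a guess at the convention rather than anything from the real project.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable


def current_branch(repo: Path) -> str:
    """Ask git which branch is checked out in `repo`.

    Returns the literal string "HEAD" if the checkout is detached.
    """
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def services_off_branch(
    repos: Iterable[Path],
    expected: str = "main",
    branch_of: Callable[[Path], str] = current_branch,
) -> Dict[str, str]:
    """Map directory name -> branch for every checkout not on `expected`."""
    found: Dict[str, str] = {}
    for repo in repos:
        repo = Path(repo)
        branch = branch_of(repo)
        if branch != expected:
            found[repo.name] = branch
    return found
```

Pointed at the directories holding the dojo and osiris checkouts, it would have flagged osiris and its stale feature branch straight away. The `branch_of` parameter exists only so that the logic can be exercised without touching real repositories.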

The entire event took only a few minutes. I’d find some information, type it into Claude’s input field, realise that more information could be valuable, and repeat.

By the time I’d finished describing the problem, I’d discovered the solution. That’s the essence of successful rubberducking. I didn’t need the AI at all. All I needed was the illusion of something that might be able to help if I just talked through what I was thinking.

I don’t know what the moral is, here.

I wonder if I’d have been as effective had I just typed into my text editor. I suppose I would have, but I wonder if I’d have been motivated to do so in the first place? I’ve tried rubberducking before by talking to an imaginary person, but I’ve never tried typing to one7; maybe I should start?

Footnotes

1 I’m pretty sure every engineering department nowadays has its rabid fanboys, but I’m pleased that for the most part my colleagues take a more-pragmatic and realistic outlook: balancing the potential benefits of LLM-assisted coding with its many shortfalls, downsides, and risks.

2 My experience of vibe-coding in a nutshell: LLMs are great at knocking out the easy 80% of any engineering problem, but often in a way that makes the remaining 20% – already the hard part – harder than it would have been if a human had done the first 80% (especially if it’s the same human and they can bring their learnings with them)… and I’m definitely not the only one who’s found that. I also suspect that the unsatisfying and unimproving task of shepherding a flock of agents to write code and then casually reviewing it is not significantly more-productive (which research backs up) and results in a significantly increased regression rate… but I’m ready to be proven wrong when more studies come out. In short: I continue to think that GenAI isn’t useless, but neither is it necessarily always worthwhile.

3 So long as what you’ve got them summarising is something you can later verify!

4 I’ve taken huge liberties with the strict factual accuracy to make this more-readable as well as to not-expose things I probably oughtn’t. So before you swoop in to criticise my prompt-fu (not that I asked you, but I know there’s somebody out there who’s thinking about doing this right now), please note that none of the text in this page is what I actually wrote to the AI; it’s a figurative example.

5 A literal stochastic parrot, one might say!

6 I’d had an experience just the previous week in which it’d gone off on completely the wrong track, attempting to change code in order to “fix” what was ultimately a configuration or data problem, and so I thought it might be useful to give it some rails to follow, to start with.

7 Except insofar as this AI agent is an “imaginary person”, which is possibly already a step-too-far in implying personhood for my liking!


The machines are fine. I’m worried about us.

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent’s fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob’s weekly updates to his supervisor were indistinguishable from Alice’s. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

Here’s where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can’t be.

The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we’ve collectively decided that maybe this time it’s different. That maybe nodding at Claude’s output is a substitute for doing the calculation yourself. It isn’t. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient.

Centuries of pedagogy, defeated by a chat window.

This piece by Minas Karamanis is excellent throughout, and if you’ve got the time to read it then you should. He’s a physics postdoc, and this post comes from his experience in his own field, but I feel that the concerns he raises are more-widely valid, too.

In my field – of software engineering – I have similar concerns.

Let’s accept for a moment that an LLM significantly improves the useful output of a senior software engineer (which is very-definitely disputed, especially for the “10x” level of claims we often hear, but let’s just take it as-read for now). I’ve experimented with LLM-supported development for years, in various capacities, and it certainly sometimes feels like they do (although it sometimes also feels like they have the opposite effect!). But if it’s true, then yes: an experienced senior software engineer could conceivably increase their work performance by shepherding a flock of agents through a variety of development tasks, “supervising” them and checking their work, getting them back on-course when they make mistakes, approving or rejecting their output, and stepping in to manually fix things where the machines fail.

In this role, the engineer acts more like an engineering team lead, bringing their broad domain experience to maximise the output of those they manage. Except who they manage is… AI.

Again, let’s just accept all of the above for the sake of argument. If that’s all true… how do we make new senior developers?

Junior developers can use LLMs too. And those LLMs will make mistakes that the junior developer won’t catch, because the kinds of mistakes LLMs make are often hard to spot and require significant experience to identify. But if they’re encouraged to use LLMs rather than making mistakes by hand and learning from them – to keep up, for example, or to meet corporate policies – then these juniors will never gain the essential experience they’ll one day need. They’ll be disenfranchised of the opportunity to grow and learn.

It’s yet to be proven that more-sophisticated models will “solve” this problem, but my understanding is that issues like hallucination are fundamentally unsolvable: you might get fewer hallucinations in a better model, but that just means that those hallucinations that slip through will be better-concealed and even harder to identify in code review or happy-path testing.

Maybe – maybe – the trajectory of GPTs is infinite, and they’ll keep getting “smarter” to the point at which this doesn’t matter: programming genuinely will become a natural language exercise, and nobody will need to write or understand code at all. In this possible reality, the LLMs will eventually develop entire new programming languages to best support their work, and humans will simply express ideas and provide feedback on the outputs. But I’m very sceptical of that prediction: it’s my belief that the mechanisms by which LLMs work have a fundamental ceiling – a capped level of sophistication that can be approached but never exceeded. And sure, maybe some other, different approach to AI might not have this limitation, but if so then we haven’t invented it yet.

Which suggests that we will always need experienced engineers to shepherd our AIs. Which brings us back to the fundamental question: if everybody uses AI to code, how do we make new senior developers?

I have other concerns about AI too, of course, some of which I’ve written about. But this one’s top-of-mind today, thanks to Minas’ excellent article. Go read it to learn more about how physics research faces a similar threat… and, perhaps, consider how your own field might need to face this particular challenge.

People are not friction

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The Gell-Mann Amnesia Effect of AI is a pretty well documented phenomenon:

The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue to trust reporting in other areas despite recognizing similar potential inaccuracies.

Summarizing, AI sounds like an incredible genius synthesizing the world’s knowledge right up until you ask it about the thing you know about, then it’s an idiot. Even knowing about this phenomenon and having experienced it countless times, LLMs have an intoxicating quality to them.

I remember one time, maybe in the mid-1990s, when I saw a shopping channel (remember those? oh god, they’re still a thing, aren’t they?) where the host was trying to sell a personal computer. And… clearly, they knew absolutely nothing about it. They kept hitting on the same two or three talking points they’d been given (“mention the quad-speed CD-ROM drive!”) and fumbling their way through, and it gave me a revelation:

I knew enough about computers that I could see that the presenter was bullshitting their way through the segment. But there are plenty of things that I don’t know much about, which are also sold on this same show. Duvets, jewellery, glassware… I’m nowhere near as much an expert on these as I was on PC featuresets. Is there something inherently incomprehensible about computers? No. So it’s reasonable to assume that these salespeople probably know equally-little about everything they sell, it’s just that I don’t have the knowledge base to be able to see that.

That’s what GenAI often feels like, to me. Having collated all of the publicly-available knowledge it could find into its model doesn’t make it smarter than the smartest humans: it brings it towards probably something slightly-above-the-average in any given subject, depending on the topic. If I ask an LLM about something that I don’t understand well, it often produces highly-believable answers, but if I ask it about something that I’m an expert in, it can come off as a fool.

I’m very interested in how we teach information literacy in this new world of rapidly-generated highly-believable nonsense.

Anyway: Dave’s post doesn’t go in that direction – instead, he’s got some clever thoughts about how the “convenience” of a “good enough” AI-driven solution to any given problem risks us seeing humans as the friction point, which ultimately works against those very humans who are looking to benefit from the technology:

We need experts to share what they know and improve the quality of our work, generated or otherwise. We even need idiots to make sure we can break ideas down into their simplest form that everyone, agents or human, understand. People can have bad attitudes, be shitty, and have wrong opinions… but people are not friction. An LLM may be able to autocorrect its way into a plausible human response, but it’s not people. It doesn’t care if it’s right or wrong.

It’s an easy and worthwhile read.

Reply to: I’m OK being left behind, thanks!

This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.

Terence Eden said:

Many years ago, someone tried to get me into cryptocurrencies. “They’re the future of money!” they said. I replied saying that I’d rather wait until they were more useful, less volatile, easier to use, and utterly reliable.

“You don’t want to get left behind, do you?” They countered.

That struck me as a bizarre sentiment. What is there to be left behind from? If BitCoin (or whatever) is going to liberate us all from economic drudgery, what’s the point of “getting in early”? It’ll still be there tomorrow and I can join the journey whenever it is sensible for me.

100%. If I “get in early” on something, it’s because that thing interests me, not because I’m betting on its future. With a hundred new ideas a day and only one of them “making it”, it’s a fools’ game to try to jump on board every bandwagon that comes along.

With cryptocurrencies, though, I’m fortunate enough to have an even better comeback at the cryptobros that try to shill me whatever made-up currency they’re “investing” in today: I’ve already done better than they ever will, at them.

When Bitcoin first appeared, I took a technical interest in it. I genuinely never anticipated it’d take off (I made the same incorrect guess with MP3s, too!), but I thought it was a fun concept to play about with. The only Bitcoins I ever paid for must’ve been worth an average of 50p each, or so.

I sold my entire wallet of Bitcoins when they hit around £750 each. I know a tulip economy when I see one, I thought. Plus: I was no longer interested in blockchains now I was seeing how they were actually being used: my interest had been entirely in the technology and its applications, not in the actual idea of a currency!

Sure, I kick myself occasionally, given that I later saw the value rise to tens of thousands of pounds each. But hey, I was never in it for the money anyway.

So yeah, I tell cryptobros: I already made something like a 150,000% ROI on cryptocurrency (50p in, £750 out, per coin). And no, I’m not buying any cryptocurrencies any more. Whatever they think “getting in early” was, they’re wrong, because I was there years ahead of them and I wasn’t even doing it to “get in early”; I did it because it was interesting. And honestly, isn’t that a better story to be able to tell?

I feel the same way about the current crop of AI tools. I’ve tried a bunch of them. Some are good. Most are a bit shit. Few are useful to me as they are now.

If this tech is as amazing as you say it is, I’ll be able to pick it up and become productive on a timescale of my choosing not yours.

Yup, that’s the attitude I’m taking.

I play with new AI technologies, sometimes. I don’t do it because I’m afraid of being left behind because – as you say – if a technology is transformative, we’ll all get to catch up eventually.

Do you think that people who had smartphones first are benefitting today because they “got in early” on something that later became mainstream?

Of course they’re not. Their experience is eventually exactly the same as everybody else’s, just like it was for everybody who “got in early” on hype trains whose final station came early, like Compuserve GO-words, WAP, Beenz.com, WebTV, the CueCat, m-Commerce, HD-DVD, the JooJoo, or Google+.

A Random List of Silly Things I Hate

So apparently now this is a thing, so here I go:

  1. Websites that are just blank pages if the JavaScript doesn’t load from the CDN.1
  2. The misunderstanding that LLMs can somehow be a route to AGI.
  3. Computer systems that say my name is too short or my password is too long.2
  4. People who are unwilling to discuss their wild claims, and who later use the lack of discussion as evidence of widespread acceptance.
  5. When people balance the new toilet roll one atop the old one’s tube.3
A nearly-full roll of toilet paper perched atop an empty toilet roll tube on an open-ended spindle.
Come on! It would have been so easy!
  6. Shellfish. Why would you eat that!?
  7. People assuming my interest in computers and technology means I want to talk to them about cryptocurrencies.4
  8. Websites that nag you to install their shitty app. (I know you have an app. I’m choosing to use your website. Stop with the banners!)
  9. People who seem to only be able to drive at one speed.5
  10. The assumption that the fact I’m “sharing” my partner is some kind of compromise on my part; a concession; something that I’d “wish away” if I could. (It’s very much not.)
  11. Brexit.

Wow, that was strangely cathartic.

Footnotes

1 I have a special pet hate for websites that require JavaScript to render their images. Like… we’ve had the <img> tag since 1993! Why are you throwing it away and replacing it with something objectively slower, more-brittle, and less-accessible?

2 Or, worse yet, claiming that my long, random password is insecure because it contains my surname. I get that composition-based password rules, while terrible (even when they’re correctly implemented, which they’re often not), are a moderately useful model for people to whom you’d otherwise struggle to explain password complexity. I get that a password composed entirely of personal information about the owner is a bad idea too. But there’s a correct way to do this, and it’s not “ban passwords with forbidden words in them”. Here’s what you should do: first, strip any forbidden words from the password: you might need to make multiple passes. Second, validate the resulting password against your composition rules. If it fails, then yes: the password isn’t good enough. If it passes, then it doesn’t matter that forbidden words were in it: a properly-stored and used password is never made less-secure by the addition of extra information into it!
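Those two steps are easy to sketch. What follows is a minimal illustration in Python rather than anybody’s production code: the composition rules in meets_composition_rules (12+ characters with a lowercase letter, an uppercase letter and a digit) are purely hypothetical stand-ins for whatever rules a site already enforces, and all of the function names are made up for this example.

```python
import re


def meets_composition_rules(password: str) -> bool:
    """Hypothetical rules: 12+ characters with a lowercase, an uppercase and a digit."""
    return (
        len(password) >= 12
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )


def strip_forbidden(password: str, forbidden: list[str]) -> str:
    """Remove every forbidden word, case-insensitively, until none remain.

    Multiple passes are needed because removing one occurrence can reveal
    another: "smsmithith" turns into "smith" after the first pass.
    """
    words = [word for word in forbidden if word]  # ignore empty strings
    changed = True
    while changed:
        changed = False
        for word in words:
            stripped = re.sub(re.escape(word), "", password, flags=re.IGNORECASE)
            if stripped != password:
                password = stripped
                changed = True
    return password


def is_acceptable(password: str, forbidden: list[str]) -> bool:
    """Step 1: strip the forbidden words. Step 2: validate whatever is left."""
    return meets_composition_rules(strip_forbidden(password, forbidden))
```

With a forbidden list of ["smith"], a password like Smith1 fails (only 1 is left after stripping), while a long random password that merely happens to contain Smith still passes, because what remains is strong enough on its own.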

3 This is the worst of the toilet paper crimes, but there’s a lesser but more-common offence.

4 Also: I’m uninterested in whatever multiplayer shooter game you’re playing, and no I won’t fix your printer.

5 “You were doing 35mph in the 60mph limit, then you were doing 35mph in the 40mph limit, now you’re doing 35mph in the 20mph limit. Argh!”


Things I do when I’m writing code that don’t look like writing code

Non-exhaustive list of things I’m doing when I’m writing code, that don’t look like “writing code”:

  • thinking
  • researching
  • contextualising
  • testing
  • measuring
  • documenting
  • communicating
  • planning
  • future-proofing
  • educating
  • learning
  • expressing
  • anticipating
  • discovering
  • inventing
  • experimenting
  • debugging
  • analysing
  • monitoring

For all its faults, an AI agent might “write code” faster than me.

But that’s only a part of the process.

My typing speed is not the bottleneck.

Subverting AI Agent Logging with a Git Post-Commit Hook

Last night I was chatting to my friend (and fellow Three Rings volunteer) Ollie about our respective workplaces and their approach to AI-supported software engineering, and it echoed conversations I’ve had with other friends. Some workplaces, it seems, are leaning so-hard into AI-supported software development that they’re berating developers who seem to be using the tools less than their colleagues!

That’s a problem for a few reasons, principal among them that AI does not make you significantly faster but does make you learn less.1 I stand by the statement that AI isn’t useless, and I’ve experimented with it for years. But I certainly wouldn’t feel very comfortable working somewhere that told me I was underperforming if, say, my code contributions were less-likely than the average to be identifiably “written by an AI”.

Even if you’re one of those folks who swears by your AI assistant, you’ve got to admit that they’re not always the best choice.

Copilot review of some code on GitHub, in which it's telling me that I should have included an .agent-logs/... file in which my AI agent describes how it helped, but I'm responding to say that 'shockingly' I wrote it without the help of AI, and telling Copilot to shut up.
I ran into something a little like what Ollie described when an AI code reviewer told me off for not describing how my AI agent assisted me with the code change… when no AI had been involved: I’d written the code myself.2

I spoke to another friend, E, whose employers are going in a similar direction. E joked that at current rates they’d have to start tagging their (human-made!) commits with fake AI agent logs in order to persuade management that their level of engagement with AI was correct and appropriate.3

Supposing somebody like Ollie or E or anybody else I spoke to did feel the need to “fake” AI agent logs in order to prove that they were using AI “the right way”… that sounds like an excuse for some automation!

I got to thinking: how hard could it be to add a git hook that added an AI agent’s “logging” to each commit, as if the work had been done by a robot?4

Turns out: pretty easy…

Animation showing a terminal. The developer switches to a branch, adds two modifications, and commits them. Afterwards, the log and filesystem show that a log file has been created crediting (fictional) AI bot 'frantic' with the change.
To try out my idea, I made two changes to a branch. When I committed, imaginary AI agent ‘frantic’ took credit, writing its own change log. Also: asciinema + svg-term remains awesome.

Here’s how it works (with source code!). After you make a commit, the post-commit hook creates a file in .agent-logs/, named for your current branch. Each commit results in a line being appended to that file to say something like [agent] first line of your commit message, where agent is the name of the AI agent you’re pretending that you used (you can even configure it with an array of agent names and it’ll pick one at random each time: my sample code uses the names agent, stardust, and frantic).

There’s one quirk in my code. Git hooks only get the commit message (the first line of which I use as the imaginary agent’s description of what it did) after the commit has taken place. Were a robot really used to write the code, it’d have updated the file already by this point. So my hook has to do an --amend commit, to retroactively fix what was already committed. And to do that without triggering itself and getting into an infinite loop, it needs to use a temporary environment variable. Ignoring that, though, there’s nothing particularly special about this code. It’s certainly more-lightweight, faster-running, and more-accurate than a typical coding LLM.
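For anybody who’d rather see the shape of the thing than read a description, here’s a rough sketch of the same idea in Python. It is not the linked source code, just an approximation of the behaviour described above. The environment variable name FAKE_AGENT_LOG_RUNNING, the function names, and the choice to flatten slashes in branch names are all assumptions made for illustration.

```python
#!/usr/bin/env python3
import os
import random
import subprocess
from pathlib import Path

AGENTS = ["agent", "stardust", "frantic"]  # the sample agent names from the post
GUARD = "FAKE_AGENT_LOG_RUNNING"           # hypothetical guard variable name
LOG_DIR = ".agent-logs"


def git(*args, env=None):
    """Run a git command and return its trimmed standard output."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True, env=env
    )
    return result.stdout.strip()


def log_path(branch):
    """Log file named for the branch (assumption: slashes become dashes)."""
    return Path(LOG_DIR) / branch.replace("/", "-")


def log_line(agent, commit_message):
    """Build '[agent] first line of the commit message'."""
    lines = commit_message.splitlines()
    first_line = lines[0] if lines else ""
    return f"[{agent}] {first_line}\n"


def run_hook():
    # The --amend below fires post-commit again; the guard stops the loop.
    if os.environ.get(GUARD):
        return
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    message = git("log", "-1", "--pretty=%B")
    path = log_path(branch)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as log:
        log.write(log_line(random.choice(AGENTS), message))
    # Fold the log file into the commit that has just been made.
    git("add", "--", str(path))
    git("commit", "--amend", "--no-edit", "--no-verify",
        env={**os.environ, GUARD: "1"})
```

To try it, you’d save this as an executable .git/hooks/post-commit with a call to run_hook() added as the final line. The guard variable is only set for the child git process that performs the amend, so your next ordinary commit triggers the hook as normal.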

Sure, my hook doesn’t attempt to write any of the code for you; it just makes it look like an AI did. But in this instance: that’s a feature, not a bug!

Footnotes

1 That research comes from Anthropic. Y’know, the company who makes Claude, one of the most-popular AIs used by programmers.

2 Do I write that much like an AI? Relevant XKCD.

3 Using “proportion of PRs that used AI” as a metric for success seems to me to be just slightly worse than using “number of lines of code produced”. And, as this blog post demonstrates, the former can be “gamed” just as effectively as the latter (infamously) could.

4 Obviously – and I can’t believe I have to say this – lying to your employer isn’t a sensible long-term strategy, and instead educating them on what AI is (if anything) and isn’t good for in your workflow is a better solution in the end. If you read this blog post and actually think for a moment hey, I should use this technique, then perhaps there’s a bigger problem you ought to be addressing!


To really foul things up you need an AI

Today, an AI review tool used by my workplace reviewed some code that I wrote, and incorrectly claimed that it would introduce a bug because a global variable I created could “be available to multiple browser tabs” (that’s not how browser JavaScript works).

Just in case I was mistaken, I explained to the AI why I thought it was wrong, and asked it to explain itself.

To do so, the LLM wrote a PR to propose adding some code to use our application’s save mechanism to pass the data back, via the server, and to any other browser tab, thereby creating the problem that it claimed existed.

This isn’t even the most-efficient way to create this problem. localStorage would have been better.

So in other words, today I watched an AI:
(a) claim to have discovered a problem (that doesn’t exist),
(b) when challenged, attempt to create the problem (that wasn’t needed), and
(c) do so in a way that was suboptimal.

Humans aren’t perfect. A human could easily make one of these mistakes. Under some circumstances, a human might even have made two of these mistakes. But to make all three? That took an AI.

What’s the old saying? “To err is human, but to really foul things up you need a computer.”