search – Dan Q

Ad Infinitum

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

…

For 25 years, Google Search was built on a contract. The web provided the content – billions of pages, freely linked, freely crawled. In return, Google sent people back. The link was the unit of exchange. It’s what made the Web thrive as an information system: you publish, Google indexes, users click through, and value flows back to the source. Win-win.

That contract is now broken. Generative UI doesn’t link to your article, necessarily. It absorbs your article, synthesizes it into a widget, and presents it as Google’s own answer. Information agents don’t send users to websites. They deliver “synthesized updates” with maybe a link or two buried at the bottom. The web was the scaffolding Google needed to build its index, to train its models, to accumulate the world’s information, and put ads next to it to get filthy rich. Now that the content is inside the system, the scaffolding is no longer needed. Google is creating its own context.

Google thinks it no longer needs the Web to deliver answers. And it no longer needs ad slots to deliver ads. What it needs is you. Your emails, your files, your calendar, your purchase history, your travel plans – all flowing into Spark, all building the richest possible picture of who you are and what you’re likely to click on. That’s exactly the kind of personal context those auction models need to work. The prediction module in the prominence allocation framework doesn’t run on keywords. It runs on knowing you.

…

Matthias Ott, in Ad Infinitum

An excellent piece by Matthias Ott, discussing revelations from this year’s Google I/O. In particular, the imminent pivot of Google Search from its lifelong “query in, list of links out” model to a wholesale “query in, LLM output out” one.

This isn’t just about putting AI output at the top of the search results, as I gather they do today, but about getting rid of search “results” entirely, and running everything through the model.

To which Matthias wisely asks: well, how will ads work then? Google’s business model is based on mining your personal data and shoving ads in your face. Where do they go in a search interface that isn’t really a search but a “helpful” AI?

It turns out there’s a few approaches that Google seem to be considering, but what they’ve all got in common is the idea that marketers will be able to “influence” the LLM’s token generation, perhaps by using an LLM of their own to decide whether you (based on everything Google knows about you) are worth marketing to, and how much they’ll pay to do so, and then this input being “weighted” against competing advertisers and actual ingested data in order to feature advertiser-influenced content woven directly into the output of the LLM.

Superficially, this sounds a little like product placement, like you sometimes see in American-made TV shows and movies. You know, where one character says, “I’m going to go get a drink refill. You know you can get unlimited refills on any drink you want… and it’s free?”, and the next says “It’s a wonderful restaurant.”, while they’re sitting in Burger King.

Except this isn’t about saying “hey, people who watch this show are probably high and want a snack, let’s push our fast food their way”. It’s individualised.

It’s more like if the characters, knowing that your Gmail account had a recent email about some test results, and your Google Calendar had an appointment tomorrow at the doctor, started talking about a particular brand of medication to, y’know, put the idea into your head.

We’re not at the point of completely-customised TV shows – nor the injection of commercials into dreams – yet. But Google’s plans, which blur the already-grey boundaries between organic and advertising content, are pretty insidious.

Assuming you’re in their ecosystem already, and possibly even if you’re not… Google may already be looking at your search terms, your calendar, your emails, your location and schedule, who you communicate with and how often, which web pages you visit, which apps you use, where you spend money, etc. (Seriously: if you somehow haven’t begun de-googling already, what are you waiting for?)… there’s a huge potential for misuse there.

But the arms race between people blocking or learning-to-ignore ads and advertisers trying to foist them upon us continues, and Google thinks this is an acceptable next step in escalating that. Using an insane amount of energy to recycle other people’s work without crediting them, in order to mash up the result with information they know about you in order to deliver you an unverifiable soup of words which might answer your question but with no clue how much or little commercial interest went into producing it, or by whom.

That’s some proper Darkest Timeline shit, right there.

You don’t need to take my or Matthias’s word for it (although you should read his full post because it’s excellent): just look at the concept videos in Google’s blog post on the subject. You’ll also notice that almost nowhere in their demos do Google even hint at the possibility of linking out to anybody else’s website: there’s like one “visit site” button that appears at the very end of one of the flows, after the agent has done its thing. Google is building a walled garden where they hope you’ll live, served by their AI butler on behalf of the companies who pay Google to tell you about their products.

Ugh.

So Unbelievable it Sounds Like you Googled It

“To Google”

When it first appeared, Google Search was a breath of fresh air. Simple, powerful search that Just Worked. It’s little wonder that the phrase “to Google” something became synonymous with “to search for” something.

Somewhere, Google lost its way.¹ Perhaps the latest example of that is the injection of AI into every search²:

I’ve been to the cinema a few times lately so I’ve seen the Google AI ad that inspired me to make this parody… a lot.
Music by Dead Tubes Foundation.

Apparently the kids these days don’t “Google it”. At least, not in their colloquialisms: they’re still probably using the search engine.

They say that they’ll “search it up”.

And this presents us with an opportunity:

Let’s reclaim the phrase “to Google”

I was inspired by a blog post by Mr Scribs (itself inspired by a Fediverse conversation), discovered via Bubbles:

We should turn the verb use of googling into an insult.

Example: “That’s so unbelievable it sounds like you googled it.”

Mr Scribs, in a good domain

I love this, and I’m absolutely going to start using it. “To Google” can absolutely transform from meaning “to search for, using a Web search engine” to meaning:

to seek knowledge in a lazy and convenient way, without regard for its accuracy
(“I Googled from a guy at the pub that 5G caused Covid”)
to acquire information that can’t accurately be sourced or verified
(“don’t quote me on that, though: I Googled it”)
to prefer an answer to a question that’s mildly more-convenient for the asker, even if getting it was ethically problematic
(“pass me the jump leads, I’m going to Google one of the hostages”)

DeGoogling is so… 2010s. Let’s make the 2020s the decade where we redefine Google as a verb, in a way that better represents what it means to continue to buy in to the ever-increasingly toxic Google Search ecosystem.

Footnotes

¹ Maybe it was when the Search-Chrome-Analytics trifecta positioned the company as both the assistant to, and the adversary of, the users. Maybe it was when they dropped “don’t be evil”. Maybe it was when they stopped listening to users, or when they stopped listening to their own developers. Maybe it was when they helped sterilise the Web. Maybe it was AMP and the way they abused their monopoly to force it down everybody’s throats. Maybe it was when they killed (insert your favourite service here). Maybe it was when they started enshittifying Android. Make your own mind up.

² Yes, I’m aware that some other search engines include AI summaries in results, too. But they all seem easier to turn off… and I’m yet to see a cinema advertisement about the fact that they do it for anything other than Google Search.

We Need to Talk About Botsplaining

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

…

“Botsplaining,” as I use the term, describes a troubling new trend on social media, whereby one person feeds comments made by another person into a large language model (like ChatGPT), asks it to provide a contrarian (often condescending) explanation for why that person is “wrong,” and then pastes the resulting response into a reply. They may occasionally add in “I asked ChatGPT to read your post, and here’s what he said,”² but most just let the LLM speak freely on their behalf without acknowledging that they’ve used it. ChatGPT’s writing style is incredibly obvious, of course, so it doesn’t really matter if they disclose their use of it or not. When you ask them to stop speaking to you through an LLM, they often simply continue feeding your responses into ChatGPT until you stop engaging with them or you block them.

This has happened to me multiple times across various social media platforms this year, and I’m over it.

…

Stephanie Vee, in We Need to Talk About Botsplaining

Stephanie hits it right on the nose in this wonderful blog post from last month.

I just don’t get why somebody would ask an AI to reply to me on their behalf, but I see it all the time. In threads around the ‘net, I see people say “I put your question into ChatGPT, and here’s what it said…” I’ve even seen coworkers at my current and former employers do it.

What do they think I am? Stupid? It’s not like I don’t know that LLMs exist, what they’re good at, what they’re bad at (I’ve been blogging about it for years now!), and more-importantly, what people think they’re good at but are wrong about.

If I wanted an answer from an AI (which, just sometimes, I do)… I’d have asked an AI in the first place.

If I ask a question and it’s not to an AI, then it’s safe for you to assume that it’s because what I’m looking for isn’t an answer from an AI. Because if that’s what I wanted, that’s what I would have gotten in the first place and you wouldn’t even have known. No: I asked a human a question because I wanted an answer from a human.

When you take my request, ignore this obvious truth, and ask an LLM to answer it for you… it is, as Stephanie says, disrespectful to me.

But more than that, it’s disrespectful to you. You’re telling me that your only value is to take what I say, copy-paste it to a chatbot, then copy-paste the answer back again! Your purpose in life is to do for people what they’re perfectly capable of doing for themselves, but slower.

How low an opinion must you have of yourself to volunteer, unsolicited, to be the middle-man between me and a mediocre search engine?

If you don’t know the answer, say nothing. Or say you don’t know. Or tell me you’re guessing, and speculate. Or ask a clarifying question. Or talk about a related problem and see if we can find some common ground. Bring your humanity.

But don’t, don’t, don’t belittle both of us by making yourself into a pointless go-between in the middle of me and an LLM. Just… don’t.

Reply to Blogging for traffic not design

This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.

Andy Hawthorne said:

When you’re writing online, being unique doesn’t matter nearly as much as being found.

I’m not sure I could disagree more. But I’ve jumped in halfway through his post. Let’s backtrack a bit.

Andy begins:

A blogger showed me his website the other day.

…

But no one was reading it.

Firstly: let’s just observe that you were shown a website… and now you’re talking about it… but you haven’t linked to it? You’re complaining about its lack of discoverability, while simultaneously being part of the problem.

Hyperlinks remain, as they have been since the mid-to-late 1990s, a primary mechanism in helping search engines’ spiders to discover new sites, and nowadays they’re doubly-important because they help establish legitimacy.

When you search for, say, “history of web search” and this Wikipedia article is at the top, a significant reason for that is that people link to that page when talking about the history of web search! A secondary reason is that lots of people link to Wikipedia in general.
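To make the point about links concrete, here is a toy sketch (not how any real search engine works) of a spider that can only find pages by following hyperlinks from pages it already knows. The page names are hypothetical, and counting inbound links is a deliberately crude stand-in for the far richer legitimacy signals that real engines use.

```python
from collections import deque


def discover(links, seeds):
    """Breadth-first crawl: return every page reachable from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def inbound_counts(links):
    """Count how many distinct other pages link to each target."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical link graph: each page maps to the pages it links to.
web = {
    "popular-blog.example": ["encyclopedia.example/history-of-web-search"],
    "forum.example": ["encyclopedia.example/history-of-web-search"],
    "new-blog.example": ["popular-blog.example"],
}

crawled = discover(web, seeds=["popular-blog.example", "forum.example"])
print("new-blog.example" in crawled)  # nobody links to it yet

# Once somebody who talks about the new blog also links to it...
web["popular-blog.example"].append("new-blog.example")
crawled = discover(web, seeds=["popular-blog.example", "forum.example"])
print("new-blog.example" in crawled)
print(inbound_counts(web))
```

In this model the new blog stays invisible, no matter how good it is, until a page the spider already knows about links to it.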

Berating somebody for an unindexed site… but not linking to that site… feels awfully-close to victim-blaming!

(Especially recently, as still-dominant search engine Google continues to make it harder and harder for “new” sites to get onto the ladder.)

When I asked him why he didn’t just use WordPress or Bear Blog, he looked offended.

“Those are so basic. Everyone uses those. I wanted something unique.”

I’m not sure I understand the logic of the person whose argument against e.g. WordPress is that it’s not “unique”. There are lots of great reasons that you might use WordPress. There are lots of great reasons that you might not. The right choice of CMS should be based on a variety of factors.

It’s possible that the person being referred to meant “customisable”. They’d still be wrong (in the case of WordPress, at least: Bear Blog offers significantly fewer customisation options, which is fine if the other features are what you’re looking for), but anyway: the short of it is that I briefly agreed, here, until:

WordPress powers about 43% of all websites. That means search engines know exactly how to read WordPress sites.

They know where to look for the content, the metadata, the tags.

Let’s correct the points here:

Search engines know exactly how to read HTML. WordPress outputs HTML. (If you’re outputting HTML, your site can be indexed. Hell, even that isn’t a firm requirement: my plaintext-only blog shows up in search engines!)
Web standards dictate how content, metadata, and tags should be laid out. A search engine’s spider doesn’t look at your site and go “hey, it’s WordPress, so I need to look for this”. Instead, it’ll generally look for content and metadata based on established standards. Titles, headings, <meta> tags, semantic elements: these are the things a search engine looks for.
Sure, WordPress gets those things right. But they’re not hard to get right. You shouldn’t use WordPress (or Bear, or anything else) based just on the fact that it exposes metadata correctly. Any site can do this. And because what’s eventually exposed to the search engine – and to the user – is HTML code… which is independent of the CMS that generated it… it doesn’t have to matter what the underlying CMS is.
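As a rough illustration of the points above, here is a minimal sketch of how a spider-like program might pull the title, <meta> tags, and headings out of a page using only Python’s standard-library HTML parser. The class and function names (`PageSummary`, `summarise`) are made up for this example, and a real search engine does vastly more; the thing to notice is that nothing in the code knows or cares which CMS produced the HTML.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, named <meta> tags, and headings from an HTML page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.headings = []
        self._current = None  # the title/heading tag whose text we're collecting

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if tag == "meta" and name and content is not None:
            self.meta[name.lower()] = content
        elif tag == "title":
            self._current = tag
        elif tag in self.HEADINGS:
            self._current = tag
            self.headings.append((tag, ""))

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADINGS:
            tag, text = self.headings[-1]
            self.headings[-1] = (tag, text + data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


def summarise(html):
    """Parse an HTML string and return the populated PageSummary."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


# A hand-written page: no CMS involved, but the same standard signals.
page = summarise(
    "<!DOCTYPE html><html><head><title>My Blog</title>"
    '<meta name="description" content="Notes on things"></head>'
    "<body><h1>Hello <em>world</em></h1><p>Some text.</p></body></html>"
)
print(page.title, page.meta, page.headings)
```

Feed it the output of WordPress, Bear Blog, or a hand-rolled static site and it behaves the same way, because all it ever sees is the HTML.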

Then there’s some more confusion:

Here’s what matters: WordPress and other major platforms have spent years optimising for search engines and social sharing.

They’ve spent millions making sure posts load fast.

This sounds like it’s conflating WordPress (the open-source CMS) with one or more of several WordPress hosting providers (probably WordPress.com). That’s a common mistake, but it is a mistake.

WordPress can do terrible SEO. WordPress can be really slow. Trust me: in a previous life I’ve made a part of my living out of fixing and improving people’s WordPress-powered websites! A large part of this comes from WordPress’s flexibility: the theme you choose, for example, can completely change the functionality of your site. Inspired by my plain text blog, Terence Eden made a WordPress theme that does the same thing! That WordPress theme completely upends the way that most people would use WordPress, but it’s still fundamentally WordPress, even though it exposes to search engines no HTML code, no metadata, and no tags.

WordPress can also do great SEO, and it can be really fast. A properly-configured WordPress site can be a well-oiled machine. But if you conflate WordPress itself with its output, you’re arguing against a straw man.

Don’t get me wrong: I love WordPress! But I dislike people making the false claim that if you’re not using it (or another popular blogging tool), you’re destined to fail at SEO. There’s nothing “magical” about WordPress. It just takes content and renders HTML, in the end!

But all of this is moot, perhaps, when we get back to that first point:

When you’re writing online, being unique doesn’t matter nearly as much as being found.

This entire statement presupposes the purpose of “writing online”.

It’s 100% okay to write for yourself, first and foremost. It’s also okay to write for a small target audience, like for your friends or family. It’s okay to write content that isn’t exposed to search engines (consider all of the wonderful content that my fellow RSS Club members put out, sometimes!). It’s okay to write just for the joy of making things.

A website doesn’t have to be “professional”, as Andy’s post goes on to imply. A website doesn’t have to be anything in particular. A website can just… be. And that’s enough.

Step #1

I have A Plan for today. Step #2 involves a deep-dive into Algolia search indexing, ranking, and priority, to understand how one might optimise for a diverse and complex dataset.

So obviously step #1 involves a big ol’ coffee and a sugary breakfast. Here we go…