spam – Dan Q

Roll Your Own Antispam

Three Rings operates a Web contact form to help people get in touch with us: the idea is that it provides a quick and easy way to reach out if you’re a charity who might be able to make use of the system, a user who’s having difficulty with the features of the software, or maybe a potential new volunteer willing to give your time to the project.

But then the volume of spam it received increased dramatically. We don’t want our support team volunteers to spend all their time categorising spam: even if it doesn’t take long, it’s demoralising. So what could we do?

Clearly-spammy message shown in a ticket management system. — It’s clearly spam, but if it takes you 2 seconds to categorise it and there are 30 in your Inbox, that’s still a drag.

Our conventional antispam tools are configured pretty liberally: we don’t want to reject a contact from a legitimate user just because their message hits lots of scammy keywords (e.g. if a user’s having difficulty logging in and has copy-pasted all of the error messages they received, that can look a lot like a password reset spoofing scam to a spam filter). And we don’t want to add a CAPTCHA, because not only do those create a barrier to humans – while not necessarily reducing spam very much, nowadays – they’re often terrible for accessibility, privacy, or both.

But it didn’t take much analysis to spot some patterns unique to our contact form and the questions it asks that might provide an opportunity. For example, we discovered that spam messages would more-often-than-average:

Fill in both the “name” and (optional) “Three Rings username” field with the same value. While it’s cetainly possible for Three Rings users to have a login username that’s identical to their name, it’s very rare. But automated form-fillers seem to disproportionately pair-up these two fields.
Fill the phone number field with a known-fake phone number or a non-internationalised phone number from a country in which we currently support no charities. Legitimate non-UK contacts tend to put international-format phone numbers into this optional field, if they fill it at all. Spammers often put NANP (North American Numbering Plan) numbers.
Include many links in the body of the message. A few links, especially if they’re to our services (e.g. when people are asking for help) is not-uncommon in legitimate messages. Many links, few of which point to our servers, almost certainly means spam.
Choose the first option for the choose -one question “how can we help you?” Of course real humans sometimes pick this option too, but spammers almost always choose it.

None of these characteristics alone, or any of the half dozen or so others we analysed (including invisible checks like honeypots and IP-based geofencing), are reason to suspect a message of being spam. But taken together, they’re almost a sure thing.

To begin with, we assigned scores to each characteristic and automated the tagging of messages in our ticketing system with these scores. At this point, we didn’t do anything to block such messages: we were just collecting data. Over time, this allowed us to find a safe “threshold” score above which a message was certainly spam.

Once we’d found our threshold we were able to engage a soft-block of submissions that exceeded it, and immediately the volume of spam making it to the ticketing system dropped considerably. Under 70 lines of PHP code (which sadly I can’t share with you) and we reduced our spam rate by over 80% while having, as far as we can see, no impact on the false-positive rate.

Where conventional antispam solutions weren’t quite cutting it, implementing a few rules specific to our particular use-case made all the difference. Sometimes you’ve just got to roll your sleeves up and look at the actual data you do/don’t want, and adapt your filters accordingly.

Somewhat-Effective Spam Filters

I’ve tried a variety of unusual strategies to combat email spam over the years.

Here are some of them (each rated in terms the geekiness of its implementation and its efficacy), in case you’d like to try any yourself. They’re all still in use in some form or another:

Spam filters

Geekiness: 1/10
Efficacy: 5/10

Your email provider or your email software probably provides some spam filters, and they’re probably pretty good. I use Proton‘s and, when I’m at my desk, Thunderbird‘s. Double-bagging your spam filter only slightly reduces the amount of spam that gets through, but increases your false-positive rate and some non-spam gets mis-filed.

A particular problem is people who email me for help after changing their name on FreeDeedPoll.org.uk, probably because they’re not only “new” unsolicited contacts to me but because by definition many of them have strange and unusual names (which is why they’re emailing me for help in the first place).

Frankly, spam filters are probably enough for many people. Spam filtering is in general much better today than it was a decade or two ago. But skim the other suggestions in case they’re of interest to you.

Unique email addresses

Geekiness: 3/10
Efficacy: 8/10

If you give a different email address to every service you deal with, then if one of them misuses it (starts spamming you, sells your data, gets hacked, whatever), you can just block that one address. All the addresses come to the same inbox, for your convenience. Using a catch-all means that you can come up with addresses on-the-fly: you can even fill a paper form with a unique email address associated with the company whose form it is.

On many email providers, including the ever-popular GMail, you can do this using plus-sign notation. But if you want to take your unique addresses to the next level and you have your own domain name (which you should), then you can simply redirect all email addresses on that domain to the same inbox. If Bob’s Building Supplies wants your email address, give them bobs@yourname.com, which works even if Bob’s website erroneously doesn’t accept email addresses with plus signs in them.

This method actually works for catching people misusing your details. On one occasion, I helped a band identify that their mailing list had been hacked. On another, I caught a dodgy entrepreneur who used the email address I gave to one of his businesses without my consent to send marketing information of a different one of his businesses. As a bonus, you can set up your filtering/tagging/whatever based on the incoming address, rather than the sender, for the most accurate finding, prioritisation, and blocking.

Also, it makes it easy to have multiple accounts with any of those services that try to use the uniqueness of email addresses to prevent you from doing so. That’s great if, like me, you want to be in each of three different Facebook groups but don’t want to give Facebook any information (not even that you exist at the intersection of those groups).

Signed unique email addresses

Geekiness: 10/10
Efficacy: 2/10

Unique email addresses introduce two new issues: (1) if an attacker discovers that your Dreamwidth account has the email address dreamwidth@yourname.com, they can probably guess your LinkedIn email, and (2) attackers will shotgun “likely” addresses at your domain anyway, e.g. admin@yourname.com, management@yourname.com, etc., which can mean that when something gets through you get a dozen copies of it before your spam filter sits up and takes notice.

What if you could assign unique email addresses to companies but append a signature to each that verified that it was legitimate? I came up with a way to do this and implemented it as a spam filter, and made a mobile-friendly webapp to help generate the necessary signatures. Here’s what it looked like:

The domain directs all emails at that domain to the same inbox.
If the email address is on a pre-established list of valid addresses, that’s fine.
Otherwise, the email address must match the form of:
- A string (the company name), followed by
- A hyphen, followed by
- A hash generated using the mechanism described below, then
- The @-sign and domain name as usual

The hashing algorithm is as follows: concatenate a secret password that only you know with a colon then the “company name” string, run it through SHA1, and truncate to the first eight characters. So if my password were swordfish1 and I were generating a password for Facebook, I’d go:

SHA1 ( swordfish1 : facebook) [ 0 ... 8 ] = 977046ce
Therefore, the email address is facebook-977046ce@myname.com
If any character of that email address is modified, it becomes invalid, preventing an attacker from deriving your other email addresses from a single point (and making it hard to derive them given multiple points)

I implemented the code, but it soon became apparent that this was overkill and I was targeting the wrong behaviours. It was a fun exercise, but ultimately pointless. This is the one method on this page that I don’t still use.

Honeypots

Geekiness: 8/10
Efficacy: ?/10

A honeypot is a “trap” email address. Anybody who emails it get aggressively marked as a spammer to help ensure that any other messages they send – even to valid email addresses – also get marked as spam.

I litter honeypots all over the place (you might find hidden email addresses on my web pages, along with text telling humans not to use them), but my biggest source of honeypots is formerly-valid unique addresses, or “guessed” catch-all addresses, which already attract spam or are otherwise compromised!

I couldn’t tell you how effective it is without looking at my spam filter’s logs, and since the most-effective of my filters is now outsourced to Proton, I don’t have easy access to that. But it certainly feels very satisfying on the occasions that I get to add a new address to the honeypot list.

Instant throwaways

Geekiness: 5/10
Efficacy: 6/10

OpenTrashmail is an excellent throwaway email server that you can deploy in seconds with Docker, point some MX records at, and be all set! A throwaway email server gives you an infinite number of unique email addresses, like other solutions described above, but with the benefit that you never have to see what gets sent to them.

If you offer me a coupon in exchange for my email address, it’s a throwaway email address I’ll give you. I’ll make one up on the spot with one of my (several) trashmail domains at the end of it, like justgivemethedamncoupon@danstrashmailserver.com. I can just type that email address into OpenTrashmail to see what you sent me, but then I’ll never check it again so you can spam it to your heart’s content.

As a bonus, OpenTrashmail provides RSS feeds of inboxes, so I can subscribe to any email-based service using my feed reader, and then unsubscribe just as easily (without even having to tell the owner).

Summary

With the exception of whatever filters your provider or software comes with, most of these options aren’t suitable for regular folks. But you’re only a domain name (assuming you don’t have one already) away from being able to give unique email addresses to everybody you deal with, and that’s genuinely a game-changer all by itself and well worth considering, in my opinion.

GMail Tip: Use A Plus Sign To Avoid Spam

This technique’s about a decade old, but a lot of people still aren’t using it, and I can’t help but suspect that can only be because they didn’t know about it yet, so let’s revisit:

You have a GMail account, right? Or else Google for Domains? Suppose your email address is dan@gmail.com… did you know that also means that you own:

dan+smith@gmail.com
dan+something@gmail.com
dan+anything-really@gmail.com
d.an@gmail.com
d..a..n@gmail.com
…

You have a practically infinite number of GMail addresses. Just put a plus sign (+) after your name but before the @-sign and then type anything you like there, and the email will still reach you. You can also insert as many full stops (.) as you like, anywhere in the first half of your email address, and they’ll still reach you, too. And that’s really, really useful.

When you’re asked to give your email address to a company, don’t give them your email address. Instead, give them a mutated form of your email address that will still work, but that identifies exactly who you gave it to. So for example you might give the email address dan+amazon@gmail.com to Amazon, the email address dan+twitter@gmail.com to Twitter, and the email address dan+pornhub@gmail.com to… that other website you have an account on.

Why is this a clever idea? Well, there are a few reasons:

If the company sells your email address to spammers, or hackers steal their database, you’ll know who to blame by the email address they’re sending to. I’ve actually caught out an organisation in this way who were illegally reselling their mailing lists to third parties.
If you start getting unwanted mail from somebody (whether because spammers got the email or because you don’t like what the company is sending to you), you can easily block them. Even if you can’t unsubscribe or just because they make it hard to do so, you can just set up a filter to automatically discard anything that comes to that email address in future.
If you feel like organising your life better, you can set up filters for that, too: it doesn’t matter what address a company sends from, so long as you know what address they’re sending to, so you can easily have filters that e.g. automatically forward copies of the mortgage statement that come to dan+yourbank@gmail.com to your spouse, or automatically label anything coming to
dan+someshop@gmail.com with the label “Shopping”.
If you’re signing up just to get a freebie and you don’t trust them not to spam you afterwards, you don’t need to use a throwaway: just receive the goodies from them and them block them at the source.

I know that some people get some of these benefits by maintaining a ‘throwaway’ email address. But it’s far more-convenient to use the email address you already have (you’re already logged-in to it and you use it every day)! And if you ever do want a true ‘throwaway’, you’re generally better using Mailinator: when you’re asked for your email address, just mash the keyboard and then put @mailinator.com on the end, to get e.g. dsif9tsnev4y8594es87n65y4@mailinator.com. Copy the first half of the email address to the clipboard, and then when you’re done signing up to whatever spammy service it is, just go to mailinator.com and paste into the box to see what they emailed you.

A handful of badly-configured websites won’t accept email addresses with plus signs in them, claiming that they’re invalid (they’re not). Personally, when I come across these I generally just inform the owner of the site of the bug and then take my business elsewhere; that’s how important it is to me to be able to filter my email properly! But another option is to exploit the fact that you can put as many dots in (the first part of) your GMail address as you like. So you could put d…an@gmail.com in and the email will still reach you, and you can later filter-out emails to that address. I’ll leave it as an exercise for the reader to decide how to encode information about the service you’re signing up to into the pattern and number of dots that you use.

Go forth and avoid spam.

Pay To Post

I see that Facebook is experimenting with allowing you to pay a nominal fee to make sure that your posts end up “highlighted” over those of your friends’ other friends. That’s a whole new level of crazy… or is it?

I’m not on Facebook, but I think that this is a really interesting piece of news. The biggest thing that makes Facebook unusable (and which also affects Twitter) is that people will post every little banal thing that comes to their mind. I don’t care what you’re eating for your lunch. I don’t want to read the lyrics of some song that must have been written for you. I really can’t stand your chain messages (for a while there, after I hadn’t received any by email for a few years, I hoped that they’d died out… but it turns out that they just moved to Facebook instead). If you’re among my friends, I know that you have some pretty smart and interesting things to say… but unless I’m willing to spend hours sifting through the detritus it’s buried in, I’ll never find it.

Social Media Citation. The littering fine tickets of the digital generation.

But this might work. If the price sweet spot can be found, and it’s marketed right, then this kind of feature might make services like Facebook more tolerable. When you’re writing about a cute picture of the cat you’ve seen, that’s fine. And when you write something I might care about, you can tick the “this is actually relevant” box. You’ll have to pay a few pence, but at least you know I’ll see it. And if I want to churn through reams of “X likes Chocolate” (who doesn’t?) and “Y is… in a queue for the bus” then I can turn off the “only relevant things” mode and waste some time.

The problem is that the sweet spot will vary from person to person, and there’s no way to work around that. Big Bucks Bob can probably afford to pay a couple of pounds every time he wants to push some meme photo to the top of your feed, but Poor Penniless Penny can’t even justify ten pence to make sure that all of her friends hear about her birthday party.

It’s a pity that it won’t work, because a part of me is drawn to the idea that economic theory can help to improve the signal-to-noise ratio in our information-saturated lives. Turning my attention to email: of all the cost-based anti-spam systems, I was always quite impressed with Hashcash (which Microsoft seem to be reinventing with their Penny Black project). The idea is that your computer does some hard-to-do (but easy-to-verify) computational work for each and every email that it sends. But in its own way, Hashcash has a similar problem to Facebook’s new system: the ability to pay of a sender is not directly proportional to their relevance to the recipient. If my mother wants to send me an email from her aging smartphone, should she have to wait for several minutes while it processes and generates an “e-stamp”, just because – if it were made any faster – spammers with zombie networks of computers could do so too easily?

Yes, I just equated your social network status, about what you ate for your lunch, with spam. If you don’t like it, don’t share this blog post with your friends.

hashcash token: 1:20:120511:https://danq.me/2012/05/11/pay-to-post/::UVHo081pj6bSDWkI:00000000000001sxI

An Idea – How To Get Treeware Junk Mail Banned

Here’s a thought: a way to try to get unsolicited (treeware) junk mail banned –

Every time you receive a bit of junk mail, just go and put it back in the post box: it’s almost all franked mail, and so the post office will re-sort it and deliver it back to you. Put a tally on the reverse side, and add one to it each time you forward it to yourself. If enough people did it, I wonder how many recursions you’d need to put through the post office before the postal workers union petitioned the government to disallow the sending of unsolicited treeware junk mail.

Not sure if it’d work, but I think I’ll do it anyway, just out of curiosity about how high a tally I can get before the post office start refusing to re-deliver them. Heh.

Got my Dad’s web site done. Just waiting for the domain name registration to go through so I can deploy it.

I’m So Fucking Clever

[this post has been partially damaged during a server failure on Sunday 11th July 2004, and it has been possible to recover only a part of it]

[this post was partially recovered on 24 November 2017]

I’ve had a great idea that I might try to implement sometime (or put of the stack of things I might try to implement sometime (or put on the stack of things I might think about moving onto the stack of things I might try to implement sometime)). Allow me to illustrate…

As Kit has noticed (Andy too), SpamBots prowl around LiveJournal‘s servers (I’ve described a possible strategy they could be using as a comment to Kit’s entry). Basically, these are semi-intelligent robots which, starting with people with ‘relevant’ interests, advertise their product on the blogs of them and their Friends… and their Friends’ Friends… etc.

Now, think back a little further to an entry in Alec’s blog, mid-January: some ‘random’ came along and began to flirt with him through the medium of comments in his blog. ‘She’ kept up conversation for some time before disappearing. …