You may remember the long-running story of my letters to the Office for National Statistics, and the more-concentrated effort by another blogger, in regard to the automatic “correction” of supposedly-“erroneous” data in the 2011 census, like somebody having multiple partners or identifying as neither gender. You don’t? Well, here’s a reminder: part one, part two, part three, part four.
Well: we’ve finally had some success. A response has been received from the ONS, including – at last – segments of business logic from their “correction” code.
It’s hard to tell for certain what the result of the correction will be, but one thing’s for sure – the census data for Ruth, JTA and me won’t have passed their validation! Their relationship validations BP2, BP2a, and BP2b state that it is logically impossible for a person to have a spouse and a partner living with them in the same household.
I should invite them around for dinner sometime, and they can see for themselves that this isn’t true.
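For the curious, here’s a minimal sketch of what a rule in the spirit of BP2 might look like. To be clear, this is my guess and not the ONS’s code: the function name, the relationship codes, and the placeholder people A, B and C are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical relationship codes; the real census codes may well differ.
SPOUSE = "spouse"
PARTNER = "partner"


def find_bp2_style_failures(relationships):
    """Return people recorded with both a spouse and a partner.

    `relationships` is a list of (person, other_person, relationship)
    tuples describing a single household.
    """
    kinds = defaultdict(set)
    for person, other, relationship in relationships:
        if relationship in (SPOUSE, PARTNER):
            # These relationships are mutual, so record them for both people.
            kinds[person].add(relationship)
            kinds[other].add(relationship)
    # A person "fails" if they have at least one of each kind.
    return sorted(p for p, k in kinds.items() if {SPOUSE, PARTNER} <= k)


# Placeholder household: A is married to B and is also C's partner.
household = [
    ("A", "B", SPOUSE),
    ("A", "C", PARTNER),
]
print(find_bp2_style_failures(household))  # ['A']
```

Note that a check like this only flags the record. What happens to a flagged record afterwards is exactly the part the ONS hasn’t made clear.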
I also note that they consider it invalid for anybody to tick both or neither of the (two) gender option boxes, although again, it’s not clear from the data they’ve provided how the automatic correction occurs. Increasingly, I’m coming to suspect that this might actually be a manual process, in which case I’m wondering what guidelines there are for their operators?
One good piece of news from this FoI request, though: the ONS has confirmed that the original census data – the filled-in paper forms, which unlike the online version don’t enforce validation upon you – is not adjusted. So in a hundred years’ time, people will be able to look back at the actual forms filled in by poly, trans, and other non-standard households around the UK, and generate actual statistics on the frequency with which these occur. It’s not much, but it’s something.
Following up on my earlier blog posts about how data on polyamorous households is recorded in the census (see parts one, two, and three), as well as subsequent queries by Zoe O’Connell on this and related topics (how the census records data on other relationships, such as marriage between same-gender partners and civil partnerships between opposite-gender partners), there’s finally been some progress!
No; that’s a lie, I’m afraid. We’re still left wading around in the same muddy puddle. Zoe’s Freedom of Information Act request, which basically said “Okay, so you treat this kind of data as erroneous. How often does this happen?” got a response. And that response basically said, “We can’t tell you that, because we don’t have the information and it’d cost too much to work it out.” Back to square one.
Still: it looks like she’s not keen to be beaten, as she’s sent a fresh FoI request to instead ask “So what’s the algorithm you’re using to detect this erroneous data?” I was pleased to see that she went on to add, effectively, “I don’t need an explanation: send me the code if you need to,” which makes it harder for them to fall back on the “It’s too expensive!” excuse yet again.
Anyway: it’s one to watch. And needless to say, I’ll keep you all posted when anything changes…
Unimpressed with the slow response time that I and others were getting to my query to the Office for National Statistics (to which I still never received a response) the month before last, Zoe O’Connell decided to send a Freedom of Information Act request demanding a response to a couple of similar questions. After some hassling (I suppose they’ve been busy, with the census and all), they finally responded. The original request and the full response are online now, as is Zoe’s blog post about the response. But here’s the short version of the response:
Polygamous marriages are not legally recognised in the UK and therefore any data received from a questionnaire that appeared to show polygamous relationship in the manner that you suggest would be read as an error. It is recognised that the majority of respondents recording themselves as being in a polygamous relationship in a UK census do so erroneously, for example, ticking the wrong box for one household member on the relationships question.
Therefore, the data to be used for statistical purposes would be adjusted by changing one or more of these relationships, so that each respondent is in a relationship with no more than one person. This is consistent with all previous UK censuses, and others around the world.
A copy of the original questionnaire would be retained as part of the historical record which would show such relationships as they were recorded. We do not attempt to amend the original record.
Any mismatches between the indicated sex and marital status of respondents will be resolved using a probabilistic statistical system which will not necessarily deal with each case in the same way. The system will look at other responses for each person, including those for the Household relationships, and will alter one or more variables to make the response consistent. In the example that you propose, it would either change the sex of one individual, or change the marital status to “Same-sex civil partnership”, depending on which is considered statistically more likely to be correct.
Honestly, I’m not particularly impressed. They’ve committed to maintaining a historical record of the original, “uncorrected” data, so that future statisticians can get a true picture of the answers given, but this is about the only positive point in this response. Treating unusual data as erroneous is akin to pretending that a societal change doesn’t exist, and the claim that this approach is “consistent with previous censuses” neglects to entertain the possibility that this data has value that it might not have had previously.
Yes, there will be erroneous data: people who accidentally said that they had two husbands when they only have one, for example. And yes, this can probably (although they don’t state how they know to recognise this) be assumed to be more common than genuine cases where somebody meant to put that on their census (although there will also be an error rate amongst these people, too). But taking the broad-brush approach of assuming that every case can be treated as an error reeks of the same narrow-mindedness as the (alleged; almost-certainly an urban legend) statement by Queen Victoria that lesbianism “didn’t exist.”
“Fixing” the data using probabilities just results in a regression towards the mean: “Hmm; this couple of men say they’re married: they could be civil partners, or it could be a mistake… but they’re in a county with statistically few gay people, so we’ll assume the latter.” Really: what?
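To illustrate what I mean, here’s a toy sketch. It is entirely my own invention and not the ONS’s system: the probabilities are made up, and I’m assuming the simplest possible rule, which is to always pick whichever “fix” is considered more likely. Their real system is described as probabilistic and may not behave this way.

```python
def most_likely_correction(candidates):
    """Pick the candidate 'fix' with the highest assumed probability."""
    return max(candidates, key=candidates.get)


# Made-up probabilities for a record in which two men say they're married.
candidates = {
    "one person's sex was mis-ticked": 0.9,
    "same-sex civil partnership": 0.1,
}

# Apply the same rule to 1,000 such records.
corrected = [most_likely_correction(candidates) for _ in range(1000)]

# Even if one in ten of these couples were genuine, none survive correction.
print(corrected.count("same-sex civil partnership"))  # 0
```

Under that assumption, the less likely explanation is never chosen, so the minority case disappears from the corrected statistics altogether, however many genuine cases there were.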
I’m not impressed, ONS.
Update: a second FoI request now aims to determine how many “corrections” have been made on censuses, historically. One to watch.