Big Data

boomzilla

NB: This thread is meant to focus on the technical aspects of the article: big data, confirmation bias, uncertainty, etc. Discussion of the politics of the situation should go to the garage (threads are free!).

https://nplusonemag.com/online-only/online-only/confirmation-bias/

The core of Clinton campaign strategy was their analytics system, developed by dozens of researchers...The oracle of the system was “Ada,” a big-data simulator that issued up-to-the-minute probabilities on Clinton’s chances by state and county.
...
Ada ran “400,000 simulations a day of what the race against Trump might look like.”
...
Yet what must have seemed like a foolproof, detailed prescription for victory based on data and computation was mostly a confirmation of preexisting biases—particularly the campaign’s faith in the firewall [of Great Lakes states].
...
Once the initial analysis showed that Clinton was favored to win in certain states, Ada helped prevent the campaign from questioning her conclusions.
...
The campaign validated Ada’s model nightly, but the question is, what was being validated? Certainly not Michigan voter tendencies, because the campaign wasn’t collecting enough data there. Where the campaign was collecting data, such as Pennsylvania, they allocated more resources, because the data confirmed that the state was a trouble spot. What was validated, ultimately, was the internal consistency of the campaign’s initial assumptions. Those assumptions, and Ada’s apparent statistical support for them, caused so much inertia that the Clinton campaign starved Michigan of resources and ignored Wisconsin’s low-enthusiasm Clinton supporters, many of whom ended up not voting.
...
Throughout the year, Nate Silver’s 538 assigned Clinton a lower probability of winning than most outlets, but that was not a consequence of 538 having better data or “seeing” things that others did not. It was a result of their choosing a distribution that mandated less certainty.
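To make that last point concrete, here's a rough sketch (mine, not from the article) of how the same polling lead turns into different win probabilities depending only on the error distribution you pick. The lead, scale, and degrees-of-freedom numbers are made up, and the function names are hypothetical; nothing here is claimed to be what 538 or Ada actually did.

```python
import math
import random
from statistics import NormalDist


def win_prob_normal(lead, sd):
    """P(true margin > 0) when the margin is modeled as Normal(lead, sd)."""
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)


def win_prob_student_t(lead, scale, df, draws=100_000, seed=0):
    """Monte Carlo estimate of P(lead + scale * T > 0), with T ~ Student-t(df)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        # A chi-square variate with df degrees of freedom is Gamma(df/2, scale=2).
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        t = z / math.sqrt(chi2 / df)
        if lead + scale * t > 0:
            wins += 1
    return wins / draws


# Assumed inputs, for illustration only: a 3-point lead in the polls.
lead = 3.0
print("normal, sd=2:      ", round(win_prob_normal(lead, 2.0), 3))
print("normal, sd=4:      ", round(win_prob_normal(lead, 4.0), 3))
print("student-t, df=3:   ", round(win_prob_student_t(lead, 2.0, 3), 3))
```

Same data in all three cases; the wider or fatter-tailed error model just refuses to be as sure about it.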
...
The most dangerous kind of code—as I learned too many times in my years as a software engineer at Google and Microsoft—is the kind that breaks but appears to keep working
...
What Ada needed to do was to generate recommendations for collecting new data most likely to falsify her recommendations—like ground-level voter verification throughout Michigan, or interrogating turnout in the “safe” Clinton districts of Pennsylvania. Only an aggressive attempt to falsify would have broken the hermetic seal on Ada’s model.
...
One can imagine an anti-Ada, which instead of spitting out probabilities, generates points where knowledge is thin—like presumed “safe states” like Wisconsin. Instead of creating certainty, this anti-Ada would foster doubts—but the right doubts. It could steer organizers to places they would not otherwise go, in order to collect more data, and would challenge complacency on the part of the organizers.
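A toy version of that anti-Ada idea, as I read it: instead of reporting the point estimate, rank states by how wide the uncertainty around it is and send people to the widest ones first. This is only a minimal sketch under my own assumptions. The poll counts are invented, `rank_by_doubt` is a hypothetical name, and a Wilson interval only captures sampling error, not a sample that is biased in ways nobody modeled.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n == 0:
        return (0.0, 1.0)  # no data at all: maximal doubt
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


def rank_by_doubt(polls):
    """Sort states from widest to narrowest interval.

    polls maps state -> (respondents favoring the candidate, total respondents).
    Each row is (state, interval width, whether the interval straddles 50%).
    """
    rows = []
    for state, (favor, total) in polls.items():
        low, high = wilson_interval(favor, total)
        rows.append((state, high - low, low < 0.5 < high))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical numbers, invented purely for illustration.
polls = {
    "Michigan": (110, 200),
    "Wisconsin": (165, 300),
    "Pennsylvania": (2040, 4000),
}

for state, width, toss_up in rank_by_doubt(polls):
    print(f"{state}: interval width {width:.3f}, could go either way: {toss_up}")
```

The thinly polled states float to the top even when their point estimate looks comfortable, which is roughly the "right doubts" behavior the article is asking for.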

djls45

@boomzilla said in Big Data:

One can imagine an anti-Ada, which instead of spitting out probabilities, generates points where knowledge is thin—like presumed “safe states” like Wisconsin. Instead of creating certainty, this anti-Ada would foster doubts—but the right doubts. It could steer organizers to places they would not otherwise go, in order to collect more data, and would challenge complacency on the part of the organizers.

That sounds like an easy thing to imagine, but I think it would actually be a really difficult problem to solve. It would require an algorithm to figure out what you don't know. And if you don't know how much you know or don't know, or even what the difference is between them, you can't really expect a program to help with that.

Maciejasjmj

TRWTF is that what made or broke the election according to the article wasn't the actual merit of the candidates, but whether they showed up to say hi.

Kian

@Maciejasjmj elections are popularity contests. The majority of the population doesn't vote based on a rational analysis of what their interests are and what each candidate offers, but on identity, feelings and emotion.

Picking a party is like picking a sports team. You'll back your team even if they are objectively awful because being a part of the team is bound to your identity. Same with your political affiliation.

djls45

@Kian said in Big Data:

@Maciejasjmj elections are popularity contests.

Yes, too much so. They ought to be more like a business's hiring process, because that's literally what we're doing: we're choosing the CEO for the country for the next four years.

The majority of the population doesn't vote based on a rational analysis of what their interests are and what each candidate offers, but on identity, feelings and emotion.

I'm not sure I agree with this. Yes, identity, feelings, and emotion do have an influence, but not so much that people flat out ignore what the candidates offer.

Picking a party is like picking a sports team. You'll back your team even if they are objectively awful because being a part of the team is bound to your identity. Same with your political affiliation.

I think my only issue here is your use of the word "objectively". Each party has its own set of goals and values and a platform of suggested solutions for the issues that it deems important, and the opposing party sees those solutions as fraught with problems or contrary to its own goals and values.

A lot of voters look at the platforms, compare them to their own beliefs, and vote for the party/candidate that matches those beliefs. The activists (including the politicians and candidates {and too often the media}) on both sides try to vilify the other side by trying to define it by the problems they see with its positions instead of the goals that it is trying to achieve.

One example that demonstrates this difference is the almost completely inverted hierarchy of goals that each side has relative to its opposition. In the USA, for example, the Right (generally Republicans) tends to place individual control over private property (and "family", etc.) near the top, with public welfare dependent on and subject to it, while the Left (generally Democrats) tends to place public welfare near the top, with individual control over private property subject to the public good (the "village", etc.).

To bring it back to the topic, it becomes extremely difficult and subjective to try to encode those differences in an algorithm in such a way as to be able to predict what people will do.

HardwareGeek

@djls45 said in Big Data:

Yes, identity, feelings, and emotion do have an influence, but not so much that people flat out ignore what the candidates offer.

Leaving aside the issue of modern identity politics, identity (in the form of party affiliation) certainly can take, and has taken, the place of any consideration of what the candidates offer. I don't know if it's still true, but in some places, the voting machines used to have buttons or levers or whatever to vote for a party. One was encouraged to think to oneself, I'm a ${Party}an, so I'll vote for $Party. Pull a single lever; mark the ballot for every $Party candidate on the ballot, without regard to the merits or lack of any individual candidate; no thought required.

As for the current software under discussion, the famous question asked of Babbage comes to mind, "If you put the wrong figures in, ..." The more complicated the analysis, the harder it is to know whether you're analyzing the right inputs. Until we invent Deep Thought (or Skynet), I don't see the software itself helping much with that.

The article said they tested the validity of their model (or something like that; it's not easy to refer back to previous posts while writing a reply on mobile), but at least the quoted bit doesn't say how they did that. That, obviously, is the part that went wrong.

boomzilla

@Maciejasjmj said in Big Data:

TRWTF is that what made or broke the election according to the article wasn't the actual merit of the candidates, but whether they showed up to say hi.

If you look back at the previous two elections, this sort of thing was something that the Obama campaigns had perfected. It was yet another reason why the Republicans would never win again.

But any campaign has to deal with how to spend scarce resources.

Maciejasjmj

@boomzilla said in Big Data:

It was yet another reason why the Republicans would never win again.

...what?

Maciejasjmj

@HardwareGeek said in Big Data:

One was encouraged to think to oneself, I'm a ${Party}an, so I'll vote for $Party. Pull a single lever; mark the ballot for every $Party candidate on the ballot, without regard to the merits or lack of any individual candidate; no thought required.

Yeah, but... that's not even it either. At least when you affiliate with the party, you kind of, sort of have an idea of what the party stands for. But here we have people going or not going to the elections depending on whether the candidates showed up and told them that yep, it's kind of important.

And from what I know of election rallies, it's not even a question of whether Clinton or Trump says on record that issues of $STATE are important to them, it's just making the same fuss as everywhere else, since people seem to not go out of their way to actually do research.

Yamikuronue

Great find, @boomzilla ! I was just reading yesterday about the Trump campaign's version of this: http://www.newyorker.com/magazine/2017/03/27/the-reclusive-hedge-fund-tycoon-behind-the-trump-presidency

It seems like one of the things they did better was testing their predictions using Breitbart as a staging ground:

Patterson told me that Mercer seems to have applied “a very Renaissance Technologies way of thinking” to politics: “He probably estimated the probability of Trump winning, and when it wasn’t very high he said to himself, ‘O.K., what has to happen in order for this twenty-per-cent thing to occur?’ It’s like playing a card game when you haven’t got a very good hand.”

On top of this nonprofit spending, Mercer invested in private businesses. He put ten million dollars into Breitbart News, which was conceived as a conservative counterweight to the Huffington Post. The Web site freely mixes right-wing political commentary with juvenile rants and racist innuendo; under Bannon’s direction, the editors introduced a rubric called Black Crime. The site played a key role in undermining Hillary Clinton; by tracking which negative stories about her got the most clicks and “likes,” the editors helped identify which story lines and phrases were the most potent weapons against her.

boomzilla

@Yamikuronue Unfortunately not much technical meat there (not surprising for a New Yorker article).

My father-in-law worked at ORC back when they were Nixon's pollsters in 1968. They did something that was new back then, which was to ask people who listened to Nixon speak how close Nixon's views were to their own. Then the campaign could figure out how to fine-tune the message.

Yamikuronue

@boomzilla yeah, it doesn't sound like Mercer is talking much about his algorithms, but I suspect there's a great story there -- the guy who got rich doing financial algorithms turns his mind to the campaign trail, how could there not be?

boomzilla

@Yamikuronue They mentioned some companies that are doing stuff like that, just no details on what they're actually doing:

Mercer also invested some five million dollars in Cambridge Analytica, a firm that mines online data to reach and influence potential voters.

There are some comments from people saying that these firms don't do anything useful, but there's no way to evaluate anything from the article.

CrazyEyes

@Maciejasjmj He's referring to the popular opinion that Republicans were dead in the water before Trump became the Republican candidate for president.

Maciejasjmj

@CrazyEyes said in Big Data:

@Maciejasjmj He's referring to the popular opinion that Republicans were dead in the water before Trump became the Republican candidate for president.

...what? Didn't they win the Congress by a landslide two years before?

djls45

@Maciejasjmj Any policy requires a 2/3 majority vote in both houses in order to override a presidential veto, so an opposition president can inhibit a lot of what the legislature might try to do.

And 1/3 of the Senate is up for re-election every two years, as well as all of the House of Representatives, so a win 2 years ago means little if the election this time is different.

The Senate is currently split 52R/46D/2I, while the House is split 237R/193D with 5 vacant. Neither of those is enough to override a veto without bipartisan support, which precludes a partisan policy bill.

boomzilla

@Maciejasjmj said in Big Data:

...what? Didn't they win the Congress by a landslide two years before?

They've been winning lots of elections while Obama was President.

Dec 27, 2016 / 05:44

Democrats lost over 1,000 seats under Obama

President Obama claims he could have won a third term if he had been allowed to run – but even if he's right, his coattails haven’t done much for the rest of his party.

Still, that was explained as being in certain states and with the aid of gerrymandering, both of which were true to one extent or another. But the conventional wisdom said that demographic shifts were creating a permanent national Democratic voting bloc. For instance, a fairly balanced look:

http://www.citylab.com/politics/2016/02/demography-favors-the-democrats/470937/

But lots of people took that sort of thinking and imagined that the Republicans were doomed in Presidential elections for decades after Obama's successes. These ideas are being revised:

Mar 1, 2017 / Politics

The Democratic Party Is Facing A Demographic Crisis

Trends in traditionally Democratic voting blocs betray a potentially bleak future for the party.