Data do not lie, but statistics does even when not intended
-
In Tales from Coronavee-rooss Italy, mamma mia! I mentioned:
I've read a column saying that somebody took a data set and handed it out to many teams with the task of testing two specific hypotheses on it, and even under those conditions some teams confirmed the hypotheses while others found the opposite.
So, here is the citation:
Schweinsberg M. et al.: Organizational Behavior and Human Decision Processes, 2021, DOI: 10.1016/j.obhdp.2021.02.003
The paper itself:
I couldn't quickly find any ‘news summary’ for it, so I'm transcribing the one I have:
They gave the same data set, almost 8000 comments from the science discussion site edge.org, to psychologists from different institutions. All participants in the experiment were asked to test two hypotheses: that debaters with higher professional status talk more than their less esteemed colleagues, and that women are more likely to join a discussion the more women are already participating.
All participants shared their results and the methods they used on dataexplained.net. After poorly documented analyses were excluded, 29 results remained, with huge differences between them. 29% found that higher-status debaters were more verbose, but 21% of the analyses found the opposite and the rest found no difference. 64% of the analyses confirmed the hypothesis that the number of women in a discussion increases other women's willingness to join it, but 21% showed the opposite correlation and 15% didn't detect any difference.
The differences stem from different definitions (some determined professional status from position, others from number of citations or h-index) and different methods (some counted words, some characters, others the number of comments regardless of their length); the statistical methods used also played a role. This shows how much leeway the statistical processing of any observation leaves to any researcher and their cognitive biases, conscious or unconscious. It is (another) mechanism behind why so many (often quite important) studies fail to replicate (as was also discussed on this forum some time ago).
-
@Applied-Mediocrity said in Data do not lie, but statistics does even when not intended:
It's fresh… maybe Scott Adams recently ran across the same article.
-
@dkf said in Data do not lie, but statistics does even when not intended:
We all know one can lie with statistics if they want to. The point of the article is more along the lines that even if people are not (actively) trying to lie with statistics, their choices and cognitive biases still cause them to arrive at different conclusions.
-
@Bulb said in Data do not lie, but statistics does even when not intended:
their choices and cognitive biases still cause them to arrive at different conclusions
And beyond that, people will design their experiments so that their desired outcome is more likely, without being aware of it.
Science is dangerous.
-
@Bulb said in Data do not lie, but statistics does even when not intended:
The point of the article is more along the lines that even if people are not (actively) trying to lie with statistics, their choices and cognitive biases still cause them to arrive at different conclusions.
It does go to show that the questions posed were difficult to map onto the data actually available. Whether that means that the available variables were just partial proxies for the hypotheses of interest, or that the initial questions were just ill-founded, well…
-
@dkf I'd say that the questions were just vague enough that there were many ways to interpret them and map them onto the data. Which is the case quite often in research in general.
-
@Bulb One of the main things you do in research is try to understand your research question. Not the answer, the question itself. If you don't understand the question, you won't understand the answers you get.
-
@dkf said in Data do not lie, but statistics does even when not intended:
Not the answer, the question itself.
“What do you get if you multiply six by nine?”
If you don't understand the question, you won't understand the answers you get.
That’s when you invest in a planetary supercomputer.
-
@kazitor in base 13?
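Worked out, the base-13 reading is what makes the answer come out as 42:

$$6 \times 9 = 54_{10} = 4 \cdot 13 + 2 = 42_{13}$$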
-
@topspin said in Data do not lie, but statistics does even when not intended:
in base 13?
I see you're trying to have it make sense. That's your fundamental error.
-
@Bulb said in Data do not lie, but statistics does even when not intended:
@dkf I'd say that the questions were just vague enough that there were many ways to interpret them and map them onto the data. Which is the case quite often in research in general.
Hypothesis 1: "A woman's tendency to participate actively in the conversation correlates positively with the number of females in the discussion."
There were nineteen different interpretations of the two things being correlated, including four ideas about whether to count comments, conversations, or something else as the unit of "participation". Two analyses agreed on all three of those choices, but then used different statistical measures of "tendency".
Hypothesis 2: "Higher status participants are more verbose than lower status participants."
Twenty-three different interpretations.
-
@Bulb said in Data do not lie, but statistics does even when not intended:
@Applied-Mediocrity said in Data do not lie, but statistics does even when not intended:
It's fresh… maybe Scott Adams recently ran across the same ~~article~~ forum thread.
-
@kazitor said in Data do not lie, but statistics does even when not intended:
@dkf said in Data do not lie, but statistics does even when not intended:
Not the answer, the question itself.
“What do you get if you multiply six by nine?”
I saw a cool C code snippet once.
#include <stdio.h>
#define SIX 1+5
#define NINE 8+1
int main(void) {
    printf("Six times nine is %d\n", SIX * NINE); /* unparenthesized macros: expands to 1+5*8+1, i.e. 42 */
}