p-hacking

p-hacking – a family of methodological errors committed by researchers who, driven by an excessive desire to obtain statistically significant results, violate the assumptions of the statistical inference framework they have adopted, particularly in statistical hypothesis testing, at the expense of the actual scientific value of their work^[1]^[2]^[3]^[4]. It involves violating the assumptions of the statistical models used (including the requirement of independent random samples) and committing logical errors.
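In the simplest idealized case, the cost of chasing significance can be quantified. Assume a researcher runs k independent tests of null hypotheses that are all true, each at level α, and reports whichever one comes out significant. The probability of at least one false positive (the family-wise error rate, FWER) is then:

```latex
\mathrm{FWER} = 1 - (1 - \alpha)^{k}, \qquad
\alpha = 0.05:\quad k = 5 \Rightarrow 1 - 0.95^{5} \approx 0.23, \qquad
k = 20 \Rightarrow 1 - 0.95^{20} \approx 0.64 .
```

Independence is an idealizing assumption. Real analyses of the same data are usually correlated, which reduces the inflation, but reporting only the significant one of several attempted analyses generally still yields a false-positive rate above the nominal α.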

Examples of p-hacking include:

Methodological abuse	Correct approach
Data fishing, HARKing – performing unplanned comparisons in search of statistically significant differences and presenting them as planned hypothesis tests	Exploratory research should be presented as exploratory, not as pre-planned hypothesis testing^[5].
Cherry picking – selectively reporting only the subset of comparisons that reached statistical significance	For publications to actually reflect the nominal Type I and Type II error rates, the results of all tests performed should be reported^[6]^[7].
Multiple comparisons problem – performing many tests within the same family of hypotheses on the same data without applying any correction	When several hypothesis tests are performed on the same data, the multiple comparisons problem must be taken into account, e.g. by applying the Holm-Bonferroni correction^[8].
Peeking at results – inspecting the results during data collection and stopping the study prematurely as soon as statistical significance is reached	If a study is expensive, sequential analysis can be considered: it allows the data to be tested in stages during collection and the study to be stopped as soon as significance is reached, in a way that controls the Type I error rate^[9]^[10]^[11].
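The Holm-Bonferroni step-down procedure mentioned in the table can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation, and the function name `holm_bonferroni` is hypothetical. The p-values are sorted in ascending order, the k-th smallest of m p-values is compared with α/(m − k + 1), and testing stops at the first hypothesis that is not rejected.

```python
from typing import List


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis (in input order), whether the Holm-Bonferroni
    step-down procedure rejects it at family-wise level alpha."""
    m = len(p_values)
    # Indices of the hypotheses sorted from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            # Step-down rule: once one hypothesis is retained, all larger p-values are retained too.
            break
    return rejected


if __name__ == "__main__":
    p = [0.010, 0.040, 0.030, 0.005]
    print(holm_bonferroni(p))  # [True, False, False, True]
```

Without any correction all four of these p-values would count as "significant" at 0.05; Holm's procedure rejects only the two smallest. Like the plain Bonferroni correction (threshold α/m for every test) it controls the family-wise error rate at α, but it always rejects at least as many hypotheses.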

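The inflation caused by peeking, and the way a sequential design avoids it, can be checked with a small Monte Carlo simulation. The sketch below assumes normally distributed data with known variance, a two-sided one-sample z-test and five equally spaced looks at the data; the function and constant names are hypothetical. As the sequential design it uses Pocock's classical constant boundary, which is only one of several possible choices. The value 2.413 is taken from standard tables for five looks at two-sided α = 0.05 and should be checked against such tables before being reused for other designs.

```python
import math
import random

LOOKS = 5            # number of interim analyses (equally spaced)
N_PER_LOOK = 20      # observations added between consecutive looks
Z_NOMINAL = 1.96     # two-sided 5% critical value of a single z-test
Z_POCOCK_5 = 2.413   # Pocock constant for 5 equally spaced looks, alpha = 0.05 (assumed, from tables)


def simulate_stopping_rules(n_sims: int = 4000, seed: int = 1) -> dict:
    """Estimate the Type I error rate of three stopping rules when H0 is true.

    Observations are N(0, 1) with known variance, so after n observations the
    statistic z = sum / sqrt(n) is standard normal under H0.
    """
    rng = random.Random(seed)
    rejections = {"fixed_n": 0, "naive_peeking": 0, "pocock": 0}
    for _ in range(n_sims):
        total, n, z = 0.0, 0, 0.0
        naive_hit = pocock_hit = False
        for _ in range(LOOKS):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(N_PER_LOOK))
            n += N_PER_LOOK
            z = total / math.sqrt(n)
            # A researcher who stops at the first "significant" look has already
            # rejected H0; simulating the remaining looks does not change that.
            naive_hit = naive_hit or abs(z) > Z_NOMINAL
            pocock_hit = pocock_hit or abs(z) > Z_POCOCK_5
        # Fixed design: a single test after all LOOKS * N_PER_LOOK observations.
        rejections["fixed_n"] += abs(z) > Z_NOMINAL
        rejections["naive_peeking"] += naive_hit
        rejections["pocock"] += pocock_hit
    return {rule: count / n_sims for rule, count in rejections.items()}


if __name__ == "__main__":
    for rule, rate in simulate_stopping_rules().items():
        print(f"{rule}: {rate:.3f}")
```

With a fixed sample size and with the Pocock boundary the estimated rate should stay close to 0.05, whereas stopping at the first look with p < 0.05 should give a rate of around 0.14, the kind of inflation analyzed by Armitage et al.^[10].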
One way to counter the phenomenon is to preregister research plans, which makes it possible to verify that the planned structure of the analyses was followed, and to replicate studies more often^[4]^[12]. Wicherts et al. proposed a checklist of 34 researcher degrees of freedom that should be controlled in order to avoid p-hacking^[13].

Meta-analytic tools that help detect p-hacking include funnel plots^[14]^[15] and p-curve analysis^[16]. They have helped uncover a number of problematic findings, for example in social psychology^[17]^[18]. Literature reviews indicate that p-hacking is common, although it may not seriously distort, for example, meta-analyses^[19]. In one survey of about 2,000 researchers, 55% admitted to looking at the results in an unplanned way before deciding whether to continue collecting data^[20]. Reviews have also documented problems of this kind in commercial A/B testing^[21], in behavioral ecology^[22] and in quasi-experimental studies in economics^[23].
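Funnel-plot asymmetry can be quantified with the regression proposed by Egger et al.^[14]: each study's standardized effect (effect divided by its standard error) is regressed on its precision (the inverse of the standard error), and an intercept far from zero signals asymmetry, e.g. small studies reporting larger effects. The sketch below fits only that ordinary least-squares line; the formal test of the intercept is omitted, and the function name `egger_regression` is hypothetical.

```python
from typing import List, Tuple


def egger_regression(effects: List[float], std_errors: List[float]) -> Tuple[float, float]:
    """OLS fit of z_i = a + b * x_i with z_i = effect_i / se_i and x_i = 1 / se_i.

    Returns (a, b): the intercept a measures funnel-plot asymmetry (about 0 when
    small and large studies scatter symmetrically); the slope b estimates the effect.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least 3 studies, each with a standard error")
    z = [e / s for e, s in zip(effects, std_errors)]   # standardized effects
    x = [1.0 / s for s in std_errors]                  # precisions
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    ses = [0.05, 0.10, 0.20, 0.30, 0.40]
    symmetric = [0.3] * len(ses)                 # the same effect in every study
    small_study = [0.3 + 2.0 * s for s in ses]   # effects grow with the standard error
    print(egger_regression(symmetric, ses))      # intercept close to 0, slope close to 0.3
    print(egger_regression(small_study, ses))    # intercept close to 2.0, slope close to 0.3
```

The two synthetic data sets are constructed so that the expected answers are known exactly; real meta-analytic data would also need the standard error of the intercept to judge whether the asymmetry is more than noise.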

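The published p-curve method^[16] relies on more refined tests; the sketch below shows only the simplest idea behind it. If there is no effect, significant p-values are uniformly distributed on (0, .05), so each one falls below .025 with probability 0.5. A binomial tail probability then shows whether the significant results are concentrated near zero (right skew, as expected for a real effect) or just below .05 (left skew, consistent with p-hacking). The function name `pcurve_binomial` is hypothetical.

```python
from math import comb
from typing import List, Tuple


def pcurve_binomial(p_values: List[float], alpha: float = 0.05) -> Tuple[int, int, float, float]:
    """Simplified p-curve check based on the share of low significant p-values.

    Returns (n, k, p_right, p_left): n significant p-values, k of them below
    alpha / 2, p_right = P(X >= k) and p_left = P(X <= k) for X ~ Binomial(n, 0.5).
    """
    significant = [p for p in p_values if p < alpha]
    n = len(significant)
    k = sum(p < alpha / 2 for p in significant)
    total = 2 ** n
    p_right = sum(comb(n, j) for j in range(k, n + 1)) / total  # small value: right skew
    p_left = sum(comb(n, j) for j in range(0, k + 1)) / total   # small value: left skew
    return n, k, p_right, p_left


if __name__ == "__main__":
    print(pcurve_binomial([0.001, 0.003, 0.010, 0.002, 0.020, 0.040, 0.300]))  # (6, 5, 0.109375, 0.984375)
    print(pcurve_binomial([0.041, 0.045, 0.048, 0.032, 0.049, 0.012]))         # (6, 1, 0.984375, 0.109375)
```

With only six significant p-values neither tail probability is small, which illustrates that such a test needs a reasonably large set of studies before it can say anything.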
References

[1] Joseph Simmons, Leif D. Nelson, Uri Simonsohn, Life After P-Hacking, "NA - Advances in Consumer Research Volume 41", 2013 [accessed 2017-01-15].
[2] S. Stanley Young, Alan Karr, Deming, data and observational studies, "Significance", 8 (3), 2011, pp. 116–120, DOI: 10.1111/j.1740-9713.2011.00506.x, ISSN 1740-9713 [accessed 2017-01-15].
[3] George Davey Smith, Shah Ebrahim, Data dredging, bias, or confounding, "British Medical Journal", 325 (7378), 2002, pp. 1437–1438, DOI: 10.1136/bmj.325.7378.1437, ISSN 0959-8138, PMID: 12493654 [accessed 2017-01-15].
[4] Wolfgang Forstmeier, Eric-Jan Wagenmakers, Timothy H. Parker, Detecting and avoiding likely false-positive findings – a practical guide, "Biological Reviews", 92 (4), 2017, pp. 1941–1968, DOI: 10.1111/brv.12315, ISSN 1469-185X [accessed 2019-03-31].
[5] Norbert L. Kerr, HARKing: Hypothesizing After the Results are Known, "Personality and Social Psychology Review", 2 (3), 1998, pp. 196–217, DOI: 10.1207/s15327957pspr0203_4 [accessed 2017-01-31].
[6] Regina Nuzzo, How scientists fool themselves – and how they can stop, "Nature", 526 (7572), 2015, pp. 182–185, DOI: 10.1038/526182a [accessed 2017-01-31].
[7] Andrew Gelman, Eric Loken, The Statistical Crisis in Science, "American Scientist", 102 (6), 2014, DOI: 10.1511/2014.111.460 [accessed 2017-01-31].
[8] Olive Jean Dunn, Multiple Comparisons among Means, "Journal of the American Statistical Association", 56 (293), 1961, pp. 52–64, DOI: 10.1080/01621459.1961.10482090, ISSN 0162-1459 [accessed 2017-01-31].
[9] Daniël Lakens, Ellen R.K. Evers, Sailing From the Seas of Chaos Into the Corridor of Stability, "Perspectives on Psychological Science", 9 (3), 2014, pp. 278–292, DOI: 10.1177/1745691614528520 [accessed 2017-01-31].
[10] P. Armitage, C.K. McPherson, B.C. Rowe, Repeated Significance Tests on Accumulating Data, "Journal of the Royal Statistical Society. Series A (General)", 132 (2), 1969, pp. 235–244, DOI: 10.2307/2343787, JSTOR: 2343787 [accessed 2017-01-31].
[11] Daniël Lakens, Performing high-powered studies efficiently with sequential analyses, "European Journal of Social Psychology", 44 (7), 2014, pp. 701–710, DOI: 10.1002/ejsp.2023, ISSN 1099-0992 [accessed 2017-01-31].
[12] Joseph P. Simmons, Leif D. Nelson, Uri Simonsohn, False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant, Rochester, NY: Social Science Research Network, 23 May 2011 [accessed 2017-01-15].
[13] Jelte M. Wicherts et al., Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking, "Frontiers in Psychology", 7, 2016, DOI: 10.3389/fpsyg.2016.01832, ISSN 1664-1078 [accessed 2019-03-31].
[14] Matthias Egger et al., Bias in meta-analysis detected by a simple, graphical test, "British Medical Journal", 315 (7109), 1997, pp. 629–634, DOI: 10.1136/bmj.315.7109.629, ISSN 0959-8138, PMID: 9310563 [accessed 2017-01-15].
[15] Jonathan A.C. Sterne, Matthias Egger, Funnel plots for detecting bias in meta-analysis, "Journal of Clinical Epidemiology", 54 (10), 2001, pp. 1046–1055, DOI: 10.1016/s0895-4356(01)00377-8.
[16] Uri Simonsohn, Joseph P. Simmons, Leif D. Nelson, Better P-Curves: Making P-Curve Analysis More Robust to Errors, Fraud, and Ambitious P-Hacking, A Reply to Ulrich and Miller, Rochester, NY: Social Science Research Network, 10 July 2015 [accessed 2017-01-15].
[17] Uri Simonsohn, Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone, Rochester, NY: Social Science Research Network, 29 January 2013 [accessed 2017-01-15].
[18] Joseph P. Simmons, Uri Simonsohn, Power Posing: P-Curving the Evidence, Rochester, NY: Social Science Research Network, 26 September 2016 [accessed 2017-01-15].
[19] Megan L. Head et al., The Extent and Consequences of P-Hacking in Science, "PLoS Biology", 13 (3), 2015, DOI: 10.1371/journal.pbio.1002106, ISSN 1544-9173, PMID: 25768323, PMCID: PMC4359000 [accessed 2017-01-15].
[20] Leslie K. John, George Loewenstein, Drazen Prelec, Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling, "Psychological Science", 23 (5), 2012, pp. 524–532, DOI: 10.1177/0956797611430953 [accessed 2017-01-31].
[21] Christophe van den Bulte et al., p-Hacking and False Discovery in A/B Testing, Rochester, NY, 11 December 2018 [accessed 2019-03-31].
[22] Michael D. Jennions et al., Evidence that nonsignificant results are sometimes preferred: Reverse P-hacking or selective reporting?, "PLOS Biology", 17 (1), 2019, e3000127, DOI: 10.1371/journal.pbio.3000127, ISSN 1545-7885, PMID: 30682013, PMCID: PMC6364929 [accessed 2019-03-31].
[23] Anthony G. Heyes, Nikolai Cook, Abel Brodeur, Methods Matter: P-Hacking and Causal Inference in Economics, "IZA Discussion Paper", Rochester, NY, 17 September 2018 [accessed 2019-03-31].
