Null hypothesis significance testing and the dichotomization of the p value: Errare humanum est

Edward Mezones-Holguin; Ali Al-kassab-Córdova; Percy Soto-Becerra; Sonia Hernández-Díaz; Jay S. Kaufman

doi:10.17843/rpmesp.2024.414.14285

Authors

Edward Mezones-Holguin, Centro de Excelencia en Investigaciones Económicas y Sociales en Salud, Universidad San Ignacio de Loyola, Lima, Peru. https://orcid.org/0000-0001-7168-8613
Ali Al-kassab-Córdova, Centro de Excelencia en Investigaciones Económicas y Sociales en Salud, Universidad San Ignacio de Loyola, Lima, Peru. https://orcid.org/0000-0003-3718-5857
Percy Soto-Becerra, Vicerrectorado de Investigación, Universidad Continental, Huancayo, Peru. https://orcid.org/0000-0001-5332-9254
Sonia Hernández-Díaz, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA. https://orcid.org/0000-0003-1458-7642
Jay S. Kaufman, Department of Epidemiology, Biostatistics, & Occupational Health, McGill University, Montreal, Canada. https://orcid.org/0000-0003-1606-401X

Keywords:

Statistical Analysis, Hypothesis Testing, Biostatistics, Epidemiology and Biostatistics, statistics & numerical data

Abstract

Health decision-making is complex and must be informed by the best scientific evidence. In this process, the information generated by statistical analysis of data is crucial, and that analysis can be conducted from a frequentist or a Bayesian perspective. In the frequentist arena, null hypothesis significance testing (NHST), together with the p value, is among the most widely used techniques across disciplines. Nevertheless, NHST has been criticized within academia on several fronts, to the point of being regarded as one of the causes of the so-called replicability crisis in science. In this review article, we give a brief historical account of its development, summarize the underlying methods, describe some controversies and limitations, address its misuse and misinterpretation, and finally offer some insights and reflections in the context of biomedical research.
