The replication crisis represents one of the most thoroughly documented methodological challenges in contemporary psychology. Large-scale projects have produced clear, quantifiable evidence of low reproducibility across both high-profile and lower-profile studies. This document synthesizes the key statistical findings, distinguishes between high- and low-profile research, and examines what these numbers reveal about the field’s overall reliability.
The most cited large-scale effort remains the 2015 Open Science Collaboration project, which attempted to replicate 100 studies published in 2008 in three high-impact psychology journals (Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition). On average, the replication effect sizes were about half as large as the original ones.
Subsequent projects have produced comparable ranges. A 2016 project replicating 18 experimental economics studies achieved a 61% replication rate, and a 2018 project replicating 21 social science experiments published in Nature and Science achieved 62%. Social psychology-focused efforts have often fallen in the 25–50% range, depending on methodology and on how strictly replication success is defined.
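Replication rates computed from a few dozen studies carry wide statistical uncertainty, which matters when comparing figures like those above. The sketch below assumes a simple binomial model in which each replication attempt independently succeeds or fails, and computes a Wilson score interval for the resulting rate. The function name `wilson_interval` and the example counts are hypothetical illustrations, not data taken from the projects cited.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, attempts: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as a replication rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must lie between 0 and attempts")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / attempts
    denom = 1 + z**2 / attempts
    centre = (p + z**2 / (2 * attempts)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2))
    return centre - half_width, centre + half_width


# Hypothetical counts: a similar observed rate from a small and a larger replication set.
for hits, n in [(13, 21), (62, 100)]:
    low, high = wilson_interval(hits, n)
    print(f"{hits}/{n} = {hits / n:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Under this model, 13 successes out of 21 yields an interval of roughly 41% to 79%, while 62 out of 100 narrows it to roughly 52% to 71%, so rates reported from small replication sets are best compared with their uncertainty in view.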
High-Profile Studies (often published in top-tier journals and widely cited): These tend to show the lowest replication rates. The Open Science Collaboration drew its 100 studies from prominent journals, and only about one-third replicated robustly. Many headline findings in social priming, ego depletion, and implicit bias research have replicated poorly, and several classic social psychology effects have failed in multiple independent attempts. High-profile work attracts greater visibility and scrutiny, which has exposed fragile effect sizes and strong sensitivity to context.
Low-Profile Studies (published in specialized or lower-impact journals): Replication rates are generally higher but still concerning. A 2016–2020 meta-project examining a broader sample of psychology studies estimated replication success around 50–60% for less-cited work. However, even these studies frequently show substantial shrinkage in effect size upon replication. The pattern suggests that publication bias and questionable research practices (p-hacking, selective reporting) inflate apparent success across the field, with high-profile findings suffering most from over-optimism in initial reporting.
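The inflation and shrinkage described in the preceding paragraph can be illustrated with a small simulation. The sketch below is a toy model under explicit assumptions, not an analysis of any real literature: every study estimates the same small true effect, only statistically significant originals are "published", and each published finding receives one exact replication reported regardless of its outcome. The function name and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def simulate_significance_filter(true_d=0.2, n_per_group=30, n_studies=20_000, seed=1):
    """Toy model of publication bias: only significant originals are 'published'.

    Illustrative assumptions: every study estimates the same true standardized
    effect true_d; the standard error of the estimate is approximated as
    sqrt(2 / n_per_group); a result counts as significant when estimate / se > 1.96.
    """
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)
    published, replications = [], []
    for _ in range(n_studies):
        original = rng.gauss(true_d, se)
        if original / se > 1.96:  # significance filter applied to originals only
            published.append(original)
            # Exact replication with the same design, reported whatever its outcome.
            replications.append(rng.gauss(true_d, se))
    if not published:
        raise ValueError("no study passed the filter; increase n_studies or n_per_group")
    replication_rate = sum(r / se > 1.96 for r in replications) / len(replications)
    return mean(published), mean(replications), replication_rate


published_d, replicated_d, rate = simulate_significance_filter()
print(f"mean published d: {published_d:.2f}")
print(f"mean replication d: {replicated_d:.2f}")
print(f"replication success rate: {rate:.0%}")
```

With these settings the published estimates average roughly three times the true effect of 0.2, the replications average close to 0.2, and only about one replication in eight reaches significance, which is roughly the statistical power of the design. The exact figures depend on the assumed settings; the gap between published and replicated effects is the point of the illustration.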
The verifiable statistics paint a consistent picture: psychology produces many findings that do not hold up under independent scrutiny. High-profile claims, those most likely to influence theory, clinical practice, and public understanding, replicate least reliably. This is not random noise but a structural feature of studying complex, context-dependent human phenomena with methods that often fail to account for observer effects, cultural variation, and environmental context.
The degradation of falsifiability is evident in the gap between initial claims and replication outcomes. When roughly 60–75% of published significant results fail or substantially weaken upon re-testing, the field’s ability to build cumulative knowledge is severely compromised.
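One way to see how a failure rate of this size can arise is a simple base-rate calculation. The sketch below assumes a stylized model in which a fixed share of tested hypotheses are true, originals and replications have stated statistical power, and false positives replicate only by chance in the same direction. The function name `expected_replication_rate` and every input value are hypothetical choices for illustration, not estimates for psychology.

```python
def expected_replication_rate(prior_true, power_original, power_replication, alpha=0.05):
    """Expected share of significant original findings that replicate.

    Illustrative model, not an estimate for any real field:
    - prior_true: fraction of tested hypotheses whose effect is real
    - power_original, power_replication: chance that a real effect yields p < alpha
    - a null effect 'replicates' only by chance in the same direction: alpha / 2
    """
    true_hits = prior_true * power_original  # real effects detected originally
    false_hits = (1 - prior_true) * alpha  # false positives among originals
    ppv = true_hits / (true_hits + false_hits)  # share of significant results that are real
    return ppv * power_replication + (1 - ppv) * alpha / 2


# Hypothetical scenarios: well-powered tests of plausible hypotheses versus
# underpowered tests of long-shot hypotheses (replication power set equal to original).
for prior, power in [(0.5, 0.8), (0.2, 0.35)]:
    rate = expected_replication_rate(prior, power, power)
    print(f"prior={prior}, power={power}: {rate:.0%} replicate, {1 - rate:.0%} fail")
```

Under these assumptions the first scenario yields about 75% successful replications, while low power combined with long-shot hypotheses drops the rate to about 23%, a failure rate comparable to those reported. The model leaves out p-hacking, which effectively raises the false-positive rate above alpha and would lower the expected replication rate further.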
These numbers do not invalidate all psychological research. They do, however, demonstrate that the current standard of evidence in much of the field falls short of the rigorous, externally anchored standards seen in physics, chemistry, or molecular biology. Greater transparency, pre-registration, larger samples, and adversarial collaboration are necessary corrections if psychology seeks to strengthen its scientific standing.
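The call for larger samples can be made concrete with a standard normal-approximation power calculation, which shows why shrinking effect sizes matter for study design. The sketch assumes a two-group comparison of means with a standardized effect size d, a two-sided test, and the conventional alpha of .05 and power of .80; the function name `n_per_group` and the example effect sizes are hypothetical, and power-analysis tools that use the exact t distribution will return slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate participants per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    which runs slightly below exact t-test calculations at small sample sizes.
    """
    if d <= 0:
        raise ValueError("d must be a positive standardized effect size")
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


# Planning for a shrunken effect: halving d roughly quadruples the sample.
for d in (0.5, 0.25):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the required sample scales with 1/d², the sketch gives about 63 participants per group for d = 0.5 but about 252 for d = 0.25; a replication planned around an original estimate inflated by publication bias can therefore be badly underpowered.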