Illustration shows cupid on horseback, shooting two handguns into the air outside the North Portico of the White House, on the occasion of the wedding of Alice Lee Roosevelt, President Theodore Roosevelt's daughter, and Nicholas Longworth.

“Cupid at the White House / C.H.” (1906) | Carl Hassmann / Library of Congress / Public Domain


C-SPAN’s Presidential Historians Survey has become one of the most widely cited efforts to evaluate presidential performance, shaping how figures such as Donald Trump, Barack Obama, and George W. Bush are understood alongside their predecessors. Its rankings circulate in classrooms, media coverage, and public debate, offering what appears to be a structured, expert-driven answer to a perennial question: Who were America’s best and worst presidents? Yet the authority of these rankings rests on a methodological choice that is less neutral than it seems.

The 2021 survey illustrates how reputations of contemporary presidents take shape. Barack Obama ranks in the upper tier, buoyed by strong evaluations in crisis leadership, public persuasion, and vision setting. George W. Bush’s standing has improved over time as historians reassess his response to September 11 and reconsider aspects of his longer-term policy legacy. Trump, by contrast, ranks near the bottom, with particularly low scores in moral authority and administrative skills. Yet these placements are not fixed. They remain sensitive to how the survey aggregates its measures, underscoring that even widely accepted rankings of recent presidencies depend on methodological choices rather than settled historical consensus.

Since 2000, C-SPAN has asked historians to rate presidents across nine dimensions, including crisis leadership, economic management, moral authority, administrative skills, and relations with Congress. Respondents also provide a tenth rating—“Performance in Context”—intended to capture their overall judgment. The published rankings simply average all 10 scores, presenting the result as a balanced synthesis of detailed evaluation and holistic assessment.

At first glance, this approach might seem reasonable. More inputs suggest a more comprehensive evaluation, and averaging them appears neutral. The problem is that the overall rating is not independent of the nine components. In practice, it largely summarizes them. Averaging the 10 scores therefore counts the same underlying judgments twice—once directly through the component ratings, and again through the overall score that reflects them.

The data make this clear. Statistical analysis of the 2017 survey shows that the nine specific categories account for roughly 98.5 percent of the variation in “Performance in Context.” In plain terms, once a president’s nine category scores are known, the overall score is almost entirely predictable and adds very little new information.

Many surveys are designed so that detailed measures inform broader judgments. The issue arises when both are combined into a single index. Doing so introduces a hidden weighting scheme. Because the overall score is more strongly associated with some dimensions than others, those dimensions receive disproportionate influence in the final average—without that weighting ever being made explicit.
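The hidden weighting can be made concrete with a short derivation. This is a sketch under one assumption that goes beyond the survey itself: that the overall rating is approximately a linear function of the nine components. The symbols x_j (a president's score on component j), y (the overall rating), and the coefficients beta_0 and beta_j are introduced here for illustration only.

```latex
% Assumption: the overall rating is roughly linear in the nine components.
y \approx \beta_0 + \sum_{j=1}^{9} \beta_j x_j

% The published index averages all ten scores:
C = \frac{1}{10}\Bigl(\sum_{j=1}^{9} x_j + y\Bigr)
  \approx \frac{1}{10}\Bigl(\beta_0 + \sum_{j=1}^{9} (1 + \beta_j)\, x_j\Bigr)
```

Under that assumption, each component enters the index with an effective weight proportional to 1 + beta_j rather than with equal weight, so the dimensions most closely tied to the overall rating count for more, even though no one chose those weights.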

The effects are not merely technical. When rankings are recalculated using only the “Performance in Context” measure, they differ from C-SPAN’s published results for a majority of presidents. In the 2021 survey, about two-thirds shift positions: 52 percent by one or two ranks and 16 percent by three to five ranks. Even small changes in aggregation produce meaningfully different hierarchies.
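The comparison behind these figures is simple to reproduce in principle. The following Python sketch shows the mechanics on invented numbers: the five presidents and every score are hypothetical and are not C-SPAN data, and the names `SCORES`, `rank`, and `share_moved` are illustrative choices rather than anything taken from the survey.

```python
# Hypothetical scores for five fictional presidents (NOT C-SPAN data).
# Each entry holds nine component ratings followed by one overall rating.
SCORES = {
    "President A": ([80, 82, 78, 85, 79, 81, 77, 83, 80], 84),
    "President B": ([78, 80, 82, 76, 81, 79, 83, 77, 80], 86),
    "President C": ([60, 62, 58, 65, 61, 59, 63, 57, 60], 61),
    "President D": ([50, 48, 52, 47, 51, 49, 53, 46, 50], 44),
    "President E": ([45, 47, 44, 48, 46, 43, 49, 45, 47], 48),
}


def rank(values):
    """Map each name to its rank (1 = best) by descending score."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Composite index: the plain average of all ten scores.
composite = {
    name: (sum(parts) + overall) / 10
    for name, (parts, overall) in SCORES.items()
}
# Holistic index: the overall rating on its own.
holistic = {name: overall for name, (parts, overall) in SCORES.items()}

composite_rank = rank(composite)
holistic_rank = rank(holistic)

# Positive shift = the president falls under the holistic ranking.
shifts = {name: holistic_rank[name] - composite_rank[name] for name in SCORES}
share_moved = sum(1 for s in shifts.values() if s != 0) / len(shifts)

for name in SCORES:
    print(name, composite_rank[name], holistic_rank[name], shifts[name])
print("Share of presidents whose rank changes:", share_moved)
```

In this toy example, four of the five fictional presidents change places when the ranking rests on the overall rating alone. It illustrates the mechanism only and says nothing about the size of the effect in the real survey.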

These differences reveal how the composite index implicitly prioritizes certain dimensions of presidential performance. Further statistical analysis shows that crisis leadership, vision setting, economic management, and public persuasion exert greater influence on the combined index, while others—such as equal justice for all or administrative skills—carry less weight. This prioritization is not explicitly chosen or explained; it emerges from the structure of the aggregation itself.

This critique does not suggest that the survey is flawed. On the contrary, it is thoughtfully designed and unusually transparent. The issue is one of interpretation. By combining component scores with the overall judgment they inform, the survey blurs the line between measurement and evaluation, conveying a sense of precision without fully revealing how that precision is constructed.

A modest adjustment would clarify that logic. C-SPAN could base its headline rankings solely on the “Performance in Context” score, presenting it explicitly as a holistic expert judgment. The nine component ratings would remain essential, showing how historians arrive at that judgment and where each presidency’s strengths and weaknesses lie. In this respect, the revised methodology would resemble the approach used by Arthur Schlesinger Sr. and Jr. in their presidential rankings from 1948 to 1996: historians were asked to judge presidents solely on their performance in the White House, while defining for themselves the components that should count in that assessment (Political Science Quarterly, November 2, 1997). The choice, therefore, is clear: either a holistic ranking like the Schlesingers’ or the component scores, but not both collapsed into a single, redundant index.

Returning to the three contemporary presidents, applying the holistic ranking to the 2021 survey would worsen President Trump’s position by one place, while leaving Presidents Obama and George W. Bush unchanged. By contrast, Presidents Nixon and Carter would each worsen by four places under the 2021 holistic rankings.

The broader issue extends beyond a single survey. Any effort to quantify historical judgment involves choices about what counts, how it is weighted, and how it is presented. When those choices are embedded in widely circulated metrics, they shape not only scholarly debate but also civic understanding. Rankings influence how presidents are taught, discussed, and remembered—elevating some qualities while quietly diminishing others. In translating professional judgment into a single composite score, the survey not only reflects reputations—it helps produce them, embedding a hidden logic in how presidential greatness is measured.