Fortune telling: Crowds surpass pundits

Experts and pundits tend to be terrible fortune tellers.

Often wrong but rarely in doubt, they become invested in their own theories, rejecting new information that challenges their beliefs.

The evidence is overwhelming, from Albert Einstein's prediction that "there is not the slightest indication that nuclear energy will ever be obtainable," to George Will's forecast that Mitt Romney would win the 2012 presidential election in a landslide.

Just as solidly proven, but far less known, is that in most cases, a group of average citizens venturing good guesses is more likely to make accurate forecasts than a typical authority on a subject, especially a smugly confident one.

This counterintuitive truth has fascinated social scientists, psychologists, and statisticians for more than a century. But it was not until four years ago that the nation's intelligence community decided to focus its attention - and largesse - on figuring out how to take advantage of what has come to be known as "the wisdom of the crowd."

Hoping to improve its accuracy in forecasting critical world events, the federal Intelligence Advanced Research Projects Activity (IARPA) organized a tournament.

The agency, which is the research and development branch of the Office of the Director of National Intelligence, invited five academic and industry groups to enter, including one based at the University of Pennsylvania.

Each team was asked to predict the outcome of hundreds of international political, military, and economic scenarios. The objective was to beat the accuracy of a control group.

The contest was supposed to last four years. But at the end of the second year, the Penn-led team, dubbed the Good Judgment Project, was so far ahead that all the other teams were fired, said Barbara Mellers.

She and her husband, Philip Tetlock, both psychology professors at Penn, and their colleague Don Moore at the University of California, Berkeley, are the force behind that winning team.

In last month's issue of the Journal of Experimental Psychology: Applied, they described how they have been breaking ground in the quest to make better guesses about the future.

"Our study is the first to keep score and track categories of variables that predict performance in the politically sensitive domain of intelligence analysis," they wrote.

The Good Judgment Project recruited thousands of volunteers through professional and scholarly listservs and word of mouth.

To qualify, forecasters had to be at least 18 years old. Nearly all had finished college and many held advanced degrees.

They had to complete a battery of psychological and political tests that assessed what kind of forecasters they were and how open-minded they were toward the evidence, Mellers said. The participants tended to be "news junkies," she said - people who "enjoy reading about details of stories, researching them, and finding out what's going on behind the scenes."

To optimize the team's success, Tetlock, Mellers, and their colleagues are using complex algorithms, subtle adjustments, and sophisticated evaluations.

But the main principle behind the experiment is startlingly simple. Its origins can be traced to a Victorian country fair.

In 1906, at the West of England Fat Stock and Poultry Exhibition, 800 people bought tickets for the chance to guess the weight of a butchered ox. The scientist Francis Galton, interested in the "trustworthiness and peculiarities of popular judgments," collected the answers, tossed out a dozen that were illegible or defective, and calculated the median from the rest.

The result - 1,207 pounds - turned out to be a few dinner plates' difference from the ox's true weight of 1,198 pounds. The average of the guesses was even more dead on: 1,197 pounds.
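The arithmetic behind Galton's result can be sketched in a few lines of Python. This is only an illustration: the first part uses the figures reported above, while the list `sample_guesses` is invented for the example and is not Galton's data. The helper name `percent_error` is likewise hypothetical.

```python
from statistics import mean, median

TRUE_WEIGHT_LB = 1198  # the ox's weight as reported above


def percent_error(estimate, truth):
    """Absolute error of an estimate, as a percentage of the true value."""
    return abs(estimate - truth) / truth * 100


# Errors of the two crowd estimates reported above.
median_error = percent_error(1207, TRUE_WEIGHT_LB)  # under 1 percent
mean_error = percent_error(1197, TRUE_WEIGHT_LB)    # under 0.1 percent

# Hypothetical guesses, invented for illustration only (not Galton's data).
sample_guesses = [1050, 1120, 1180, 1200, 1230, 1290, 1340]
crowd_median = median(sample_guesses)  # middle value of the sorted guesses
crowd_mean = mean(sample_guesses)      # arithmetic average of the guesses

print(f"median estimate off by {median_error:.2f} percent")
print(f"mean estimate off by {mean_error:.2f} percent")
print(f"sample crowd median: {crowd_median}, mean: {crowd_mean:.1f}")
```

In the invented sample, individual guesses miss by as much as 148 pounds, yet the median and the mean both land within a few pounds of the true weight, because errors in opposite directions largely cancel.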

The accuracy of collective thinking has been proven in study after study since Galton's day.

In 2004, the journalist James Surowiecki came out with the book The Wisdom of Crowds, reviewing many of those studies and exploring the phenomenon in sports, stock markets, science, and the game show Who Wants to Be a Millionaire.

"Under the right circumstances," Surowiecki wrote, "groups are remarkably intelligent, and are often smarter than the smartest people in them."

From the outset, the tournament's goal was to define those circumstances and see if there was a way to beat the crowd's reliably pretty good odds, said Steven Rieber, the program manager at IARPA.

This does not mean that the CIA is planning to rely on random groups of citizens to predict the effect of new sanctions on Iran.

"Forecasting is just one part of intelligence analysis," Rieber said. Analysts will still need to cultivate sources, study history, and understand context.

But based on what the experiment has already found, it is clear that better forecasts may depend less on an individual's deep knowledge of a particular subject than on the ability to think nimbly.

"Ignorance is not a virtue," Rieber said. "But subject matter expertise doesn't matter as much as you might think."

The tournament will end this summer, he said, when the Good Judgment Project's predictions are compared with those of a control group selected by the intelligence community.

Thus far, Rieber said, the Good Judgment forecasts have been impressive, with nearly perfect predictions on questions such as "Will a significant foreign or multinational military force invade or enter Iran between 17 December 2012 and 31 March 2013?" (the answer was no) and "Will a Zimbabwean referendum vote approve a new constitution before 1 April 2013?" (the answer was yes).

Each of the original five teams employed brilliant thinkers in a broad range of disciplines from some of the nation's most distinguished universities.

Still, Rieber said, it was not terribly surprising when the Good Judgment Project pulled ahead.

Tetlock, the author of Expert Political Judgment: How Good Is It? How Can We Know?, has devoted most of his academic career to the study of expert opinion. Mellers is an authority on decision-making and Moore's research has focused on overconfidence.

One of the features that set their team apart from the others was the training model they developed to teach forecasters how to choose relevant information and avoid bias. They also discovered that individuals make better guesses when they periodically receive feedback about how successful they have been, and are allowed some interaction with others who are working on the same questions.

Two years ago, the project began skimming the top 2 percent of forecasters - the 250 volunteers who most consistently made accurate predictions - and organizing them into a subcategory of "superforecasters."

"I'm just a guinea pig in this study," said Karen Ruth Adams, an associate professor of political science at the University of Montana, who is one of only 20 women who have earned superforecaster status.

Adams was motivated to sign up, she said, by a philosophical belief that knowledge should be applied and theories tested beyond the walls of academia.

During the four years she has participated in the Good Judgment Project, Adams has worked both alone and as part of a group.

She was surprised to learn of the gender imbalance in the pool of volunteers, she said. About 30 percent of political scientists, security scholars, and policy analysts in U.S. think tanks are women, Adams said, but women make up only 17 percent of all forecasters in the Good Judgment Project.

At this point, there does not seem to be a good answer for why that happened, she said.

"The main problem may have been recruiting," she said. The project was not set up to cull through applicants to create any particular distribution in age, gender, expertise, or field of interest.

In her current challenge, Adams and her colleagues on the superforecaster prediction market have been given a bankroll of fake money called "inkles" to bet on the probability of events such as "Will Kim Jong Un meet with a major head of state by June?"

In September, Adams believed there was a 50 percent chance that he would, so she wagered 1,500 inkles at 29 inkles per share.
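The logic of such a wager can be sketched roughly in Python. The article does not describe how the inkle market pays out, so the sketch assumes, purely for illustration, that each share pays 100 inkles if the event happens and nothing otherwise, which would make a price of 29 equivalent to a market probability of 29 percent. The function `expected_profit` and the constant `PAYOUT_PER_SHARE` are hypothetical names.

```python
PAYOUT_PER_SHARE = 100  # assumption: a share pays 100 inkles if the event occurs


def expected_profit(stake, price, believed_probability, payout=PAYOUT_PER_SHARE):
    """Expected gain from spending `stake` inkles on shares at `price` each,
    given the bettor's own probability that the event happens."""
    shares = stake / price
    expected_value = shares * payout * believed_probability
    return expected_value - stake


# The wager described above: 1,500 inkles at 29 per share, believing 50 percent.
profit = expected_profit(stake=1500, price=29, believed_probability=0.50)
print(f"expected profit: {profit:.0f} inkles")
```

Under that assumption, a forecaster who puts the chance at 50 percent while the market prices it at 29 expects to come out ahead, which is why the bet is worth making even though the outcome is uncertain.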

One of the intellectual qualities that distinguishes a superforecaster is the willingness to keep an open mind, to seek and integrate new information and revise predictions based on changing events.

Accordingly, Adams closely follows news about North Korea and periodically adjusts her wager. When Vladimir Putin invited Kim to Russia's World War II commemoration, she raised her odds to 75 percent and bought additional shares. In late January, when the inkle market price topped 80, she cashed out on a hunch that circumstances could get in the way of Kim's trip.

"We're just at the tip of the iceberg learning what makes for a good analyst and forecaster," she said. "You really have to be willing to say, I was wrong."