The Revolution Will Be Quantified

How a database is changing the way we see the world.

Historical demographers subsist on data—lots and lots of data. Data about how people work, procreate, worship and arrange themselves into households. Data about race, migration, literacy—everything down to whether they own a radio or happened to be sick on a certain day.

To get the data they need for their research, historical demographers go to extremes. Trent Alexander, a researcher at the Minnesota Population Center at the University of Minnesota, went digging through frozen boxes in a Kansas cave managed by the National Archives. Colleague Bob McCaa logged more than 200,000 air miles last year trying to convince foreign governments from Sudan to Ireland to allow the MPC to catalog and study their census data.

Historical demographers study how human populations evolve over time. Their goal is to get to the truth about what happened in our past—to affirm or destroy long-held academic narratives such as “changing family structures helped spark the Industrial Revolution” (they didn’t) and “social mobility has improved over time” (it hasn’t).

For those who study U.S. history from 1850 onward, the tool of choice is a database collection called the Integrated Public Use Microdata Series-USA, or IPUMS-USA. Launched by MPC director Steve Ruggles and his team in 1995, IPUMS-USA includes data on more than 150 million Americans who were recorded by the U.S. census at some point in the past 160 years. It’s the most complete demographic record of humanity ever assembled, anywhere. The older records provide storehouses of information: names, ages, occupations, military service. Records from less than 72 years ago are similar except that all identifying characteristics of the people have been removed.

Since its launch in 1989, IPUMS has attracted more than 30,000 registered researchers. In 2000, IPUMS-USA was joined by an international version, IPUMS-I. Another, IPUMS-CPS, includes data from the annual Current Population Survey.

“There’s really nothing like it in the world,” says Ruggles, who was dubbed “King of Quant” by Wired magazine in 1996 for his assertion that many of humanity’s mysteries could be solved if only one took the time to gather and analyze the numbers.

The introduction of IPUMS revolutionized historical demography. Even as late as the 1990s, researchers had to create and maintain their own data sets using birth records, census data and anything else they could get their hands on. Simply building an accurate data set was so laborious that researchers might produce only a handful of studies in their lifetimes, and were limited to studying tiny areas, such as small neighborhoods.

“My advisor back in college had a room full of these punch cards that he was using for his data set,” says Alexander. “It was pretty insane.”

Academics were loath to share their data because it was so time-consuming to build their own study tools, so there was no way for researchers to replicate each other’s findings. “If I didn’t believe your results, well, that was just too bad for me,” says Alexander. “There was a lot of mistrust in the field.” In the 1990s, a few for-profit companies started offering ready-made data sets: great for researchers who could afford them, not so great for anyone without a steady funding stream.

But IPUMS—massive, searchable, free and online—changed all that. Demographic historians were suddenly free to spend their time analyzing and writing instead of messing around with punch cards and reel-to-reel tapes. In a single swoop, the playing field was leveled between low-paid graduate students and well-funded researchers. Most remarkable of all, demographic historians were no longer limited to neighborhoods or locales.

Of late, Ruggles is focused on adding depth and complexity to his creation. MPC recently received a $2 million federal grant to bring the 1850s sample from 1 percent to 100 percent. Ruggles is working on adding a slave-owning census from 1850, as well as the mortality censuses from the 19th century. Another grant has a team of people retrieving data from original optical-scan census forms from 1960. Ruggles’ team had to dig out the forms from a frozen cave in Lenexa, Kansas, after a Census Bureau employee noticed that the computer record was missing 20 percent of Chicago—the second-largest city at that time. Shivering in National Archives–logoed parkas, Ruggles’ researchers gathered the forms from a storeroom called “The Icebox,” carefully thawed them and scanned them on specialized equipment.

“My goal is nothing less than collecting the world’s data,” Ruggles says. And when he says it, you believe him.

