Big Data’s dark history

by Geoff Olson

Photo: IBM subsidiary poster from 1934:
“See everything with Hollerith punchcards.”

Big data is a buzz-term that has softened into a tabloid cliché in a remarkably short time. It refers to the massive quantities of publicly and privately archived information that can be digitally analyzed to identify subtle patterns and trends. Interested parties range from marketers to policy makers to students to hackers.

“Big data is at the foundation of all the megatrends that are happening today, from social to mobile to cloud to gaming,” says Chris Lynch of Vertica Systems. All thanks to faster computer processing, sophisticated software, cheap mobility and vast, billowy clouds of personal information. No aspect of our lives in the 21st century will remain untouched or unseen by big data, experts insist.

In a recent issue of Wired, the magazine’s founding executive editor, Kevin Kelly, reminisced about a question he put to Google cofounder Larry Page around 2002. Why, with so many web search companies out there, were he and colleague Sergey Brin getting into the game by offering search for free?

“Oh, we’re really making an AI,” Page responded. Artificial intelligence, that is.

“Rather than use AI to make its search better, Google is using search to make its AI better,” Kelly would later realize. In other words, each query instructs the company’s networked machine intelligence to sharpen its inventory of concepts. For example, image searches for “dog” teach the AI to refine its recognition of the word, independent of the breed, angle of view or lighting.

Bear in mind this is happening now, even without the much-ballyhooed prediction of machine consciousness. Kelly predicts cloud computing will have a network effect, feeding on itself as more people train the AIs of Google and other companies.

“Our AI future is likely to be ruled by an oligarchy of two or three large, general-purpose cloud-based commercial intelligences,” he concludes. Brands with brains, in effect. But don’t worry, autonomous machine intelligence only makes a huge swath of Earth’s population redundant in sci-fi dystopias, not in Kelly’s sunny speculations – even though current digital technology has already made hundreds of thousands of North American jobs redundant, affecting photographers, journalists, legal secretaries, copywriters, factory workers, clerical workers and cashiers.

If we look into the unsettling early history of computation and big data, we have plenty of reasons to doubt Kelly’s uncritical, bring-it-on optimism.

“Mankind barely noticed when the concept of massively organized information quietly emerged to become a means of social control, a weapon of war and a roadmap for group destruction,” writes investigative author Edwin Black in his landmark 2001 study, IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America’s Most Powerful Corporation.

Führer Adolf Hitler was determined to eliminate a large ethnic fraction of Germans, but intermarriage and secularization made it difficult to track racial bloodlines. “This was the Nazi data lust. Not just to count the Jews, but to identify them,” notes the author. IBM Germany, known in 1930s Europe as Deutsche Hollerith Maschinen-Gesellschaft, or Dehomag, had just the thing necessary for the job: index cards with punched holes.

Dehomag leased and maintained the machines that profiled all German Jews through punchcard processing. The machines were the electromechanical precursors to today’s digital computers and the punchcards were both proto-programs and tickets to the death camps.

IBM’s roots lie in the census tabulating company founded in 1896 by Herman Hollerith, the American-born son of German immigrants, and by the time of Hitler’s rise, Dehomag was a subsidiary of International Business Machines, which had its head office in New York. “IBM NY always understood – from the outset in 1933 – that it was courting and doing business with the upper echelon of the Nazi Party,” writes Black. The personal representatives of IBM chairman Thomas J. Watson – whose last name now adorns the company’s Jeopardy-playing supercomputer – kept their boss apprised of the German subsidiary’s work, the author claims.

The Holocaust would have proceeded without big data, but the company’s manpower and machines greatly expedited the process.

“Solipsistic and dazzled by its own swirling universe of technical possibilities, IBM was self-gripped by a special amoral corporate mantra: ‘If it can be done, it should be done.’ To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM’s technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world,” Black writes.

The phrase “turning people into numbers” traces back to the identification tattooed onto Jewish flesh as part of the automated machinery of mass death. The Holocaust was, in effect, the premiere of big data in the 20th century.

In 1945, the year of World War II’s end, big data reached a second benchmark with the first problem assigned to the first general-purpose electronic digital computer. Hungarian-born mathematician John von Neumann used ENIAC (Electronic Numerical Integrator And Computer) to perform millions of calculations required for making the hydrogen bomb. The input/output involved one million punchcards and the calculations took six weeks on a footprint of 1,800 square feet. Today, the processing would take minutes or less on a notepad computer.

In other words, big data, surveillance and militarism have been intertwined since the first half of the 20th century. It’s no accident that data-driven wars of aggression abroad come bundled with data-driven surveillance at home. From drone strikes to wireless wiretapping, we still live under Edwin Black’s dictum of the technocrats: “If it can be done, it should be done.”

“As we invent more species of AI, we will be forced to surrender more of what is supposedly unique about humans,” Kevin Kelly brightly insists. “The greatest benefit of the arrival of artificial intelligence is that AIs will help define humanity. We need AIs to tell us who we are.”

That is tragically wrong. We don’t need the entire contents of Wikipedia on an implant, a fully sentient Siri or AIs with corporate selfhood to tell us who we are. What we need is untrammelled wilderness and the companionship of other human beings, despite their funky smells, weird ideas and boundless capacity to disappoint.

Big data and AI can be used to either distort or cultivate our humanity. I suspect we will continue to use them to do both. The future is unwritten and we cannot presume that networked machine intelligence offers us a binary option as humanity’s saviour or destroyer. But we have every reason to be cautious.

I have a coffee table book called The Human Face of Big Data filled with nifty charts, up-tempo quotes and smiling faces. The caption on the back cover reads, “Every animate and inanimate object on Earth will soon be generating data, including our homes, our cars, and yes, even our bodies.” It is accompanied by a photograph of a baby surrounded by a clutch of gadget-clutching visitors, all half-shrouded in darkness. She lies on her back in a brightly lit crib, seen and scanned, but untouched by human hands.

It’s an image unintentionally Faustian in character, of a child isolated by the very tools intended to liberate her, through lifelong quantification. But as sociologist William Bruce Cameron once observed, “Not everything that can be counted counts and not everything that counts can be counted.”
