SHHH!
Google's scan of U-M library progresses ... quietly
Eric Morath / The Detroit News
ANN ARBOR -- As if searching billions of Web pages wasn't enough, Google is pushing full-steam ahead to scan all of the University of Michigan's -- and some day the world's -- books.
Lost in the hoopla surrounding Google's fast-growing Ann Arbor advertising office is the search engine giant's first foray into Ann Arbor: an effort to digitize U-M's library. The project has hit its stride -- scanning 30,000 volumes in a recent week -- and is beginning to make a serious dent in the school's total of 7 million.
At the current pace, the project should wrap up in the next five years, said associate university librarian John Price Wilkin.
That's amazing to Wilkin, who also leads the university's own digitization project that began before the school partnered with Google. The in-house project scans about 5,000 volumes a year. At that pace, scanning the entire library would take 1,400 years.
Having U-M's collection, and those of 12 other leading libraries and numerous publishers, digitized and searchable worldwide has the power to change how the world accesses information, Wilkin said.
"It will take down walls," he said. "This is one of the great research libraries in the world, and increasing amounts of it are available to those around the world."
In Google's view, even the wide expanse of the Internet can't compare to the amount of knowledge stored in books. So searching and retrieving results from written works is a natural outgrowth of Google's root technology.
The digitized books can be found at books.google.com, but also are being integrated into standard Google searches. Searches already routinely find books credited to U-M's collection.
"We are still in the early stages but even today a lot of people are discovering books searching Google.com," said Adam Smith, head of Google Book Search. "We think it's extremely powerful because for years books were where ideas were primarily transmitted."
Google's altruistic motive for the project is to make the books available to those who may not have easy access to them -- whether that's people in rural Michigan or in a remote outpost in Africa.
But the corporation stands to benefit financially from the project, as well. While Google is the leader in search, it faces evolving competition. The power to search the world's books as well as the Web could help Google maintain its leadership -- and therefore its attractiveness to advertisers.
Project follows secrecy policy
In typical Google fashion, mystery surrounds the Book Search project.
The company won't say how many of U-M's books it has digitized, how many people work on the project, or even where the books are scanned -- although university officials say it's off site.
The concept dates to the Google founders' days as Stanford graduate students, although tangible work didn't begin until 2002 and U-M, the first partner, didn't have books scanned until 2005.
Scanning also is in progress at the other partner libraries, including Harvard, Oxford and the New York Public Library.
What is clear is how it works for Google searchers. When a user punches in "Napoleon at Waterloo," for example, Google will return results of books in the partner libraries that reference those words.
If the work is protected under copyright laws, a snippet of the text will appear along with information about how to buy or borrow the book. If the work is out of copyright, it will be displayed in its entirety. Google says it limits the amount of copyrighted work it displays so it stays within "fair use" expectations of copyright laws.
But a group of publishers and authors disagrees that the search engine is operating within the copyright law, and filed suit against Google in 2005. The case is pending. The publishers and authors claim Google is copying entire works without permission, and then using that information for the company's gain.
"There's no doubt whatsoever that it's to Google's financial benefit to do this," said Allan Adler, the Association of American Publishers' vice president for legal and government affairs. "It's wrong not because of what they are displaying, but because of what they have to do to make the displays."
Google argues that the limited amount of information it displays ultimately benefits holders of the copyright because it encourages searchers to seek out the book. The company also points out that roughly 70 percent of all books are in copyright, but no longer in print, meaning no one is reaping copyright benefits.
Ironically, many of the publishers suing Google also work with the firm to provide excerpts of newer books for Book Search. The difference in those cases is that Google and the publisher reached a licensing agreement about how much of a work can be displayed.
Deal lets U-M focus efforts
The legal dispute has led some partner libraries to limit the books Google can scan to those in the public domain, but U-M hasn't placed any limitations on the search engine.
In fact, creating a digital copy of every book in its library was a university goal before it joined with Google.
President Mary Sue Coleman has spoken about the importance of digitization, especially in light of the damage Hurricane Katrina inflicted on libraries.
With Google doing the heavy lifting, U-M can narrow its preservation efforts, Wilkin said. The university scans several thousand books a year.
"Google's efforts prove to be extremely valuable," he said, "because we can focus on volumes Google can't handle because of their condition."
You can reach Eric Morath at (313) 222-2504 or emorath@detnews.com.





