
Snippetizing

By Glenn Fleishman September 14, 2009

Google is a mighty big and frightening company. It wants to index all earthly knowledge. It mints money. And many of us have transferred much of our daily activities to its cloud-based hive-mind of servers.

Now it's coming for the books.

In Vernor Vinge's cyberpunk novel Rainbow's End, a firm called Huertas is shredding the library at UCSD one book at a time, using high-resolution imaging to capture and reassemble the shreds as they fly through the air, rather than using gear that painstakingly turns and flattens each page. The physical pages are irretrievable, just as if the tomes were thrown into a wood chipper.

There's something quite marvelous about destroying books to save them, turning atoms into bits almost literally. In the book, a group made up mostly of fogies bands together to save the books, although far more ulterior (and iffy) motives are at work, too.

Google's not proposing to shred books. Google and, separately, Amazon have developed hardware that enables careful high-speed, high-resolution scanning of books in libraries. Google has scanned 10 million; Amazon 3 million.

No, Google is proposing to shred rights, not books.



The Google Books project started several years ago. Google scanned millions of books at participating academic libraries, turned the page images into searchable text, and made portions available online in search results. (For newer, readily available books that you find at books.google.com, Google signed deals with publishers and authors to show snippets, excerpts, and so forth.)

Four years ago, the Authors Guild and the group that represents U.S. publishers sued over the project, alleging that Google had no right either to scan copyrighted books without permission or to display snippets of them. A proposed settlement is now a year old and close to being scuttled. Interest is intense enough that the House of Representatives held a hearing on the topic on Sept. 10.

In brief, Google wants a pass on violating copyright law. Anything we write (leaving aside other, more complicated creative acts) automatically carries a copyright. Copyright now extends to 70 years after we, as authors, die, or to 95 years from publication for a work made for hire (one created for a corporation, which holds the copyright).

Public domain works are those whose copyright has expired (everything published before 1923, and a subset of books published between 1923 and 1950), or whose creator has explicitly given all rights away.

Google has its eyes on a very large set of books that are out of print (no longer commercially available) yet remain under copyright, and whose copyright holders can't be found. These may number in the millions. They're called orphaned works, and they're the subject of much interest because of the knowledge and inventive writing locked within them.

The settlement between authors, publishers, and Google proposes a non-profit clearinghouse to handle rights payments. Authors and publishers can opt out of the settlement, but by default Google would have the right to digitize, snippetize, and monetize orphaned works, with fees going to the clearinghouse to be paid out at some future point.

This represents Google's all-encompassing love and compassion for these orphans. But it looks like the settlement won't hold water. And that might be a good thing.

(You're asking, "Why should I care?" Because the more human knowledge Google has exclusive access to and control over, the fewer sources of information we have, the less creativity flourishes, and the more expensive things become, even if Google offers most stuff "for free" for now. And since so many people have become creators (of photographs, blog entries, tweets, and email), far more than just books written in 1955 is protected by copyright.)

Three big reasons seem likely to scuttle the potential settlement.

First, Google would most likely get a pass on future lawsuits. Sure, authors and publishers who don't like the settlement could still sue, but the fact that a huge hunk of the industry settled and a court sanctioned it makes other suits far less likely to succeed.

Second, Google gets the equivalent of a compulsory license for orphaned works. Ostensibly, no other firm could obtain the same license without going through the same lawsuit process, and even then, why would the authors and publishers settle again? They already have a great deal. (Compulsory licensing is what allows other musicians to record any publicly released song without the songwriter's permission, although there are rules about notifying the song's copyright owner and paying a set royalty to release covers.)

Third, authors outside the U.S. whose books have been licensed and sold here could be swept in without their permission. That would violate international copyright conventions.

At the congressional hearing last week, the Register of Copyrights said pretty clearly that while orphaned works need our succor, anointing an exclusive party through a court process is the wrong path. She argued that whatever path is taken should be handled by Congress, be non-exclusive, and favor the rightsholders, even if they are not reachable (dead, missing, uninterested). Rightsholders must opt in, or, if they can't be found, a reasonable effort to notify them has to be made for every single orphaned work that might be used in some fashion.

At the conclusion of Rainbow's End, the contingent of aged booklovers has managed to thwart a dangerous plot unrelated to book shredding, and has also thrown enough of a spanner in the works that Huertas is fired from the job. Instead, the Chinese government is bringing in devices that lovingly caress each book as its pages are delicately turned.

It's easy to see Google in the same position. Being the biggest gorilla in the cage doesn't necessarily let you eat all the unclaimed bananas.