Marion No More

November 13, 2005

Why Google Print is not better than sliced bread

So much hype continues to float around Google Print, even though its flaws seem rather obvious to a critical observer. A few bold voices are pointing out its weaknesses, but it is rather disappointing that most newspapers and magazines just regurgitate Google's press releases. It makes one realize how little most people understand about storing and providing access to large amounts of textual data. Here, in a few easy talking points, are reasons why Google Print is not going to revolutionize the world, but is instead the latest overhyped "new thing."

  • Abbyy's FineReader software is arguably the best OCR software out there. It's also the OCR engine behind the software that drives the Kirtas book scanner, which Google may or may not be using. Note that the list of languages FineReader can recognize does not include, oh, Arabic, among others. It's not Abbyy's fault, of course; these languages are still waiting for viable OCR engines.
  • Google is not using metadata for the titles in Google Print, but is apparently relying on the power of its search software to index titles (again, they're not revealing details). Anyone who has ever searched across huge text databases using keywords can readily grasp why this is a bad idea. A very bad idea. Why aren't they using metadata? Well, for one thing, it's expensive to catalog books.
  • The lack of metadata means that multivolume works are not easy to use in Google Print. There's no table of contents, no ability to pull up all of the volumes together, etc.
  • The scan quality is wretched for older works.
This is, of course, a partial list. If anyone disagrees with anything written here, I'd love to hear other views or be corrected.
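
To make the metadata point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the three-book collection, the field names, and both search functions are hypothetical and say nothing about how Google Print actually works. It only shows the general difference between matching keywords anywhere in the OCR text and querying cataloged fields.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str    # cataloged metadata (hypothetical field)
    volume: int   # cataloged metadata: volume number within a set
    text: str     # OCR full text, including the words of the title page


# Invented sample collection, purely for illustration.
BOOKS = [
    Book("A History of Rome", 2, "a history of rome volume two the later republic"),
    Book("A History of Rome", 1, "a history of rome volume one the early republic"),
    Book("Travel Letters", 1, "we read a history of rome on the train to naples"),
]


def keyword_search(books, query):
    """Full-text only: a book matches if every query word occurs anywhere in its text."""
    words = query.lower().split()
    return [b for b in books if all(w in b.text.lower().split() for w in words)]


def title_search(books, title):
    """Metadata-based: match the cataloged title exactly and order the set by volume."""
    hits = [b for b in books if b.title.lower() == title.lower()]
    return sorted(hits, key=lambda b: b.volume)


keyword_hits = keyword_search(BOOKS, "history of rome")
title_hits = title_search(BOOKS, "A History of Rome")
```

In this toy collection the keyword search also returns the travel letters, because the phrase happens to appear in their text, and it returns the two volumes in whatever order they were stored. The title search returns only the two-volume set, in volume order. With millions of books instead of three, the first kind of noise grows quickly.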

2 Comments:

  • There is a lot of criticism (unfortunately in German) on my wiki page at
    http://wiki.netbib.de/coma/GooglePrint

    By Blogger kg, at 6:44 PM  

  • No problem. "Ik snack platt" (Low German: "I speak Low German"), so to speak.

    By Blogger ludewig, at 3:40 PM  
