Marion No More

November 23, 2005

Faculty status for librarians

My still silent co-poster Divine sent me a copy of a 1977 article by H. William Axford on faculty status for librarians. Sorry, can't link to it since it's not in an open access journal, but here's the full citation:

Axford, H. William. "The Three Faces of Eve: or The Identity of Academic Librarianship, A Symposium." Journal of Academic Librarianship 2.6 (1977): 276-278.

In my professional career, I've had positions with and without faculty status, and have come to the conclusion that while the work doesn't really change because of status, I prefer having faculty status. The Axford piece captures quite well the nebulous position of librarians in the academy, but he posits a homogeneous faculty club, where terminal degree, publications, and research determine entrance. That's a bit too two-dimensional for my tastes. There are plenty of differences already extant in the faculty club. A humanities "researcher" is, in the eyes of someone in, say, biochemistry, a total hack, with their seven articles and book before tenure or whatever it takes. In the sciences, they crank out seven articles a quarter or so, all while running constant experiments where they manage a large staff of grad students, postdocs, and staff research assistants. Then there are all those fields where faculty don't publish much at all: art, music, theater performance, dance, etc., but are judged on their teaching and creative abilities. Compared to a voice professor, librarians are easy to assess for tenure. The arts crowd also includes plenty of people without PhDs, so not even that is a uniform hurdle for entrance.

Sure, no one on the faculty here views a librarian as a true peer in terms of prestige or whatever. What do I care? I'll take faculty status for what it offers me: job security, access to grant money, a place in university governance (where my departmental affiliation matters little, to be honest), etc.

November 18, 2005

Google wimps out

Anyone not living under a rock has probably already heard that Google Print is now known as Google Book Search. So, in one deft sleight-of-hand maneuver, Google backs away from its grandiose claims that this would create a massive digital library and now says it will help people find books so that they can "buy or borrow them." In some ways, that is all they ever claimed, but in the meantime a lot of people, including some Google spokespersons, have assumed or extrapolated that they were building a digital library.

Fine. OCLC's database whoops ass all over anything Google will create by scanning all of these books. Why? Because contributors to OCLC use what we now call metadata, while Google believes in the almighty power of full text indexing paired with sophisticated search software. Anyone who's tossed keywords at any large and diverse full text corpus--say JSTOR, and it's puny compared to even a fraction of what Google plans to do--knows how undesirable full text searching can be. Plus, there's no correlation between works, e.g., multivolume sets, those linked by subject, and so forth. Sure, there are a lot of cool social community things you can do with Google's approach, such as tagging, but before long we'll have hybrid OPACs (i.e., data from two sources: the library's database and external sources) that allow that anyway, so big deal. Oh, and Google's OCR will suck, because getting low error rates is slow and very, very expensive and requires human monitoring, so their full text searching may work swell, but across garbage output. Let alone the languages they can't even run OCR against, but I've ragged on that enough.

I'll make a deal with anyone still beholden to Google's vision (yes, I admit it has value, just not on the hyped level they espouse): let's meet in five years and look for some books. I get OCLC, you get Google. I'm putting $50 in 2005 dollars on OCLC.

What would be slick, of course, would be a union of OCLC's database with Google Book Search, with a whole bevy of proximity and context search options for the full text piece. I'd pay real money for that ...

A Gorman hangover

OK, time for me to confess that I voted for Michael Gorman. Serves me right for failing to engage my brain before filling out my ALA ballot in, oh, five seconds. I saw his name, thought "gee, I heard him give a decent lecture eight or so years ago," and dutifully checked his name. Mea culpa, mea culpa, mea maxima culpa.

Now that I've gotten that off of my chest, I'll point out that many librarians are posting responses to Gorman's inane comments, the latest of which appeared in the WSJ on 11/1. One of the funnier I've seen is at Free Range Librarian, not least for its lovely title. Any librarian with a blog should put up at least one anti-Gorman post in the hope that the combined weight of all of these posts will offset some of the damage wrought by his high-profile foot-shooting.

November 17, 2005

Is the job market good or bad?

In recent years, there has been a relatively steady stream of articles and reports in the library literature bemoaning the lack of jobs for new LIS graduates. That makes for some interesting reading for those of us who have seen many searches close in recent years for lack of qualified applicants.

What creates the gap between the new graduates and the jobs that require subject or technical expertise that can't be found? The main problems seem to lie not in a lack of MLS-holders, but a dearth of applicants with additional graduate degrees. The BA or BS simply doesn't supply enough subject expertise, so all of those BA or BS/MLS holders are excluded from any jobs requiring significant subject expertise.

Beyond that, however, there are so many unrealistic expectations in job announcements that it makes the head spin. Recently, Harvard posted a position that had absolutely no supervisory component, yet required eight years of professional experience. Such high hurdles are not uncommon. There appears to be a general hesitancy on the part of library management to accept that some members of the younger generation of librarians should be moved into management based on merit and potential, rather than on years in the profession. This is a common practice in many other fields--granted, most of these are far from academia--but I find the reliance on the usual indicators of time in the profession, successively broader supervisory experience, and so forth a little too conservative. What about creativity, ambition, vision?

November 13, 2005

Why Google Print is not better than sliced bread

So much hype continues to float around Google Print, despite the fact that the flaws seem rather obvious to the critical observer. A few bold voices are pointing out its weaknesses, but the fact that most newspapers and magazines just regurgitate Google's press releases is rather disappointing. Makes one realize how little most people understand the concepts behind storing and providing access to large amounts of textual data. Here, in a few easy talking points, are reasons why GP is not going to revolutionize the world, but is instead the latest overhyped "new thing."

  • Abbyy's FineReader software is arguably the best OCR software out there. It's also the OCR engine behind the software that drives the Kirtas book scanner, which Google may or may not be using. Note on the list of languages that FineReader can recognize that one does not see, oh, Arabic, among other missing languages. It's not Abbyy's fault, of course; these languages are still waiting for viable OCR engines.
  • Google is not using metadata for the titles in Google Print, but is apparently relying on the power of their search software to index titles (again, they're not revealing details). Anyone who has ever searched across huge text databases using keywords can readily grasp why this is a bad idea. A very bad idea. Why aren't they using metadata? Well, for one it's expensive to catalog books.
  • The lack of metadata means that multivolume works are not easy to use in Google Print. There's no table of contents, no ability to pull up all of the volumes together, etc.
  • The scan quality is wretched for older works.
This is, of course, a partial list. If anyone disagrees with anything written here, I'd love to hear other views or be corrected.

October 05, 2005

Google Print and OCR

Although one cannot know what is truly going on since all partners are mum on the technology, standards, and methods being used, it is generally assumed that the scanning machine in use for Google Print is the Kirtas book scanner. Having witnessed a full Kirtas demo in May, I can attest to two things. First, it is indeed an amazing machine with a lot of potential, although it has some very clear drawbacks: can't handle certain paper weights, largely useless for unbound materials, can't handle books cracked along the spine, etc. So those books and materials--of which we collectively have millions--require special handling.

Second, and more significantly, Kirtas is not currently experimenting with ABBYY's FineReader XIX, which is the only viable Fraktur OCR engine at this point. The software behind the Kirtas scanner uses FineReader (and only FineReader, might I add). When I asked the rep about XIX I got a blank stare. One could say that he's just a sales rep, but Kirtas is a tiny firm, i.e., they all tend to know a lot about the various parts of the business, plus he was otherwise intimately familiar with the workings of the software and OCR engine. Also, Kirtas freely admits that many languages have no viable OCR options at all, first and foremost Arabic. Thankfully, no important texts were ever penned in German before 1945 nor in Arabic, so we don't have to worry about that!

Another thing to consider is OCR quality. Google Print's is garbage, but I'm not sure they care. Why? Because newer books will probably be added from native digital sources, i.e., no OCR will be necessary, or, if they are scanned, OCRing texts from recent decades is child's play in OCR terms. It's the newer books that will drive the business model by creating sales and spinoff revenue. Consider that Gale, when they create OCR for products such as ECCO (pdf), uses five different OCR engines, compares results, and then runs the best match against a painstakingly developed proprietary dictionary that copes with a host of quirks such as ligatures, f/s issues, variant forms of caps, etc. And after all of that, they're still so displeased with the results (which are actually quite good) that they developed fuzzy search technology to allow even mismatched terms to be found. I'm not saying that Google couldn't do all of this, too, but frankly, I'd wager they won't. It's too expensive and they'd have to have a market to whom they could sell the results for a high price (have you seen the price tag on ECCO lately?).
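For anyone wondering what that kind of pipeline looks like in practice, here is a toy sketch in Python. To be clear, this is my own illustration and not Gale's method, which is proprietary: the function names (vote, normalize, fuzzy_find), the tiny lexicon, the sample engine outputs, and the similarity cutoffs are all made up for the example, and it uses only Python's standard library. It assumes the engines' outputs have already been aligned word by word, which is itself a hard problem in real life.

```python
from collections import Counter
from difflib import get_close_matches


def vote(readings):
    """Return the reading most engines agree on; ties go to the earliest engine."""
    counts = Counter(readings)
    best = max(counts.values())
    for reading in readings:
        if counts[reading] == best:
            return reading


def normalize(token, lexicon):
    """Lowercase, replace the long s, then snap to a close lexicon entry if one exists."""
    token = token.lower().replace("ſ", "s")
    if token in lexicon:
        return token
    close = get_close_matches(token, lexicon, n=1, cutoff=0.8)
    return close[0] if close else token


def fuzzy_find(query, tokens, cutoff=0.75):
    """Return indexed tokens that are close to the query, even if not identical."""
    return get_close_matches(query, tokens, n=5, cutoff=cutoff)


# Hypothetical output of three OCR engines for the same four-word line.
engine_outputs = [
    ["The", "paſſage", "of", "tirne"],
    ["The", "passage", "of", "time"],
    ["Tbe", "paſſage", "of", "time"],
]
lexicon = ["the", "passage", "of", "time"]  # stand-in for a period dictionary

# Vote position by position, then clean each winner against the lexicon.
voted = [vote(list(column)) for column in zip(*engine_outputs)]
cleaned = [normalize(token, lexicon) for token in voted]
print(cleaned)  # ['the', 'passage', 'of', 'time']

# A misspelled or mis-OCRed query still finds the intended word.
print(fuzzy_find("pasfage", cleaned))  # ['passage']
```

Even this toy version hints at the cost: every extra engine, every dictionary entry, and every alignment step is work someone has to pay for, and a real period dictionary has to be built by people who know the material.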

Incidentally, for the first time today I searched for a title in Google Print and was asked to log in with my Google Account info to see the text. I'm beginning to grow weary of Google wanting to track the usage of everything I touch. I've stopped using Google Desktop and won't even touch a gmail account, no matter how hip they are at the moment. The fact that they're willing to singlehandedly decide what copyright means in 2005 makes me wonder how seriously they take privacy. If publishers can be beaten back by Google, what chance do I as an individual consumer have if they choose to misuse my personal data? No thanks. As is being written these days, we have a new Microsoft on our hands.

No, the irony that this blog is hosted by a Google firm is not lost on me. I dislike Microsoft's tactics, but haven't penned a document in anything but Word for about 15 years.

October 04, 2005

Sorry your job market sucks

Speaking of Mr. Hilton, all of those wayward PhDs (generally in humanities fields) who couldn't find a job through the grind of the AHA, MLA, etc. jobseeking mill come to mind. I feel for them, I really do, not least because I'm married to someone who actually survived that and landed a decent tenure-track job. But my sympathy has its limits.

If you can't find a job as an academic, don't come whining, hat in hand, to my profession, claiming--as you are told by all those "retool your PhD into other skills" consultants--that your experience as a researcher in a library or archive qualifies you to work as a professional librarian in a research library. We'd be happy to have your PhD knowledge, but only if you come to us with an MLS in tow. Say what you will about the MLS (no, it's not the most difficult nor gratifying degree), but it is a career essential for librarians, not only for the knowledge one gains--cataloging, reference, library software systems, acquisitions, etc. are learned skills, not self-evident--but as a means to create the culture and values of the profession. Just look at the disastrous state of any collection that is built by giving faculty broad discretion to make purchases if you want evidence of the effectiveness of having professional librarians do the job.

I recently vacated a job for which one of the requirements is a second graduate degree in a humanities field in addition to the MLS. It's galling to me to see the number of PhD-holding individuals who either can't read or willfully overlook the letters MLS and apply anyway. Fortunately, the good people on the hiring committee know the perfect use for such applications--recycling bin filler--so the position will remain open until one of these knuckleheads' peers decides that, hey, if I had an MLS a whole world of job prospects would open up.

Chew on this, PhD-holding non-librarians: when I look back at my humanities grad school colleagues, I outearn them, have generally always chosen where I work and in what capacity, have the ability to earn academic tenure, and, perhaps most importantly, I don't have incessant anxious thoughts about my future and/or whether or not my employer will spit me out when it suits them. I also have a nicer office, more travel funding than most of them, and don't have to beg my chair or dean for little crumbs like decent computers and printers. Do I regret not finishing my PhD and getting an MLS instead? Ha!

October 03, 2005

More on Michigan's Hilton

It's worth pointing out the background of James Hilton. Although he is currently "associate provost and interim librarian," he is, in fact, not a librarian but one of those faculty members named to high-ranking positions who then presume that, since faculty use libraries, they know all there is to know about libraries and their users. I'm sure Dr. Hilton is an excellent scholar--in his field--but this does not qualify him to pronounce texts in non-digital formats as outdated and doomed for oblivion. I'm glad that in his field, evidently, the literature has gone online, but as a librarian I have a broader perspective and know well that for many disciplines and sub-disciplines, printed and archival materials will remain relevant for the foreseeable future.

Michigan and megalomania

There's always been more than a hint of megalomaniacal aspirations present in the libraries and librarians that have decided to go to bed with Google. Much is claimed about how this project will transform the world; it remains to be seen if that is true, or if it will just drive Google's bottom line (and that of firms such as Amazon), but the project will certainly transform the CVs and careers of those librarians involved in it.

The hyperbole surrounding this undertaking knows few boundaries, and anyone who points out the flaws in such an undertaking (such as the annoying little facts that German Fraktur and Arabic, to name but two examples, cannot be OCRed by Kirtas technology at present--but who cares about those languages, right, since nothing significant was ever penned in them?) is dismissed as a Luddite, because, in time, Google and its partners will solve all of these problems. Again, that remains to be seen, so I choose to dismiss the dismissers by saying that I'm pointing out real and existing problems, while they are espousing possible outcomes.

For a stellar example of megalomania and hyperbole, check out this little tidbit from James Hilton of Michigan. He claims, among other things, that "[i]n the future ... [m]aterial that does not exist in digital form will effectively disappear." Really? Just like all medieval manuscripts were burned when the printing press came about?