Tuesday, December 20, 2016

The uselessness of ProQuest's LION database



I do not know how much institutions around the world pay for access to this resource, but it is time ProQuest cut their fees, for the database has not worked properly for years now. Here's an e-mail from 2014, sent in reply to a complaint I had made:


"Our developers are working on a fix." Yes, I bet they are. It's now two years later. Are they still working on it? The problem arose when the LION database was redesigned. Since the redesign caused the problem, they should simply have reverted to the earlier, fully functional version.

This is the problem: if you search using the 'NEAR' Boolean operator, the database simply delivers a selection of texts in which the two search terms occur separately. For example, here is a search of Victorian-era prose for 'gentleman NEAR horse' (why the first return should be a work by Turgenev escapes me; it's just one of those LION database things). The database does not find the terms in association. Nor do I necessarily believe that only 72 prose fictions of the Victorian period contain the words 'gentleman' and 'horse' somewhere. The returns, such as they are, are not given in any order that I can discern.


In some cases, usually the two bottom items on a page of returns, the prose fiction is listed by title only, with no indication of the number of 'hits'. Just what is the database doing in those cases?




For the search 'lady NEAR jewellery', the database, unable to perform the Boolean function, offers seven returns (that is, it ostensibly finds only seven Victorian novels in which 'lady' and 'jewellery' both occur). Trollope's Can You Forgive Her? provides a spectacular 1,027 hits: the characters include 'Lady Macleod' and 'Lady Glencora', of course. 'Jewellery' does indeed occur, once, but this is far from any notion of proximity searching.




I've been marking essays on Milton, and occasionally logged on to LION to locate relevant passages the students had overlooked. Searching for Milton on the database mysteriously offers both John Milton and John Cage. When the database is being really recalcitrant, it will offer you a text by John Cage when you want results for Milton.

If you set out to follow a particular word through Paradise Lost, say 'first', you can't: clicking to see the hits in Book IV shows you Book II; clicking for the hits in Book V shows you only Book III.

Why can't they put these things right? I suspect that their techies know too little about literature: they see some returns and conclude that the database is working. As far as ProQuest are concerned, they are getting their money (the librarians at my college do not seem to know of any organisation of university libraries that could confront ProQuest and demand improvements or lower fees).





