I'm having some real trouble getting to grips with XPATH, as applied to XML (specifically MARCXML) data. I'm really impressed with its efficiency at helping me track down the code I'm searching for, but I'm having trouble getting to the next step of extracting & working with that code once I've found it.
I have a large set of bibliographic records, pulled from a library catalog in MARCXML. I've put together a long query - a really long query, since it includes duplicate search terms to deal with the case-sensitivity of XPATH - searching certain subfields of each entry for keywords. MARCXML is pretty tidy, so with code that looks like this:
<collection>
  <record>
    <leader>01345cad a25003622c 4564</leader>
    <controlfield tag="001">9984333660001531</controlfield>
    <controlfield tag="005">20130789942407.2</controlfield>
    <controlfield tag="008">850333c19861663nybee db 201 0 eng </controlfield>
    <datafield tag="010" ind1=" " ind2=" "><subfield code="a">86448866</subfield></datafield>
    <datafield tag="500" ind1=" " ind2=" "><subfield code="a">"Published in Great Britain under the title The Grumble and Bolt Encyclopedia [i.e. Encyclopaedia] of 20th-century Archeology"--T.p. verso.</subfield></datafield>
    <datafield tag="504" ind1=" " ind2=" "><subfield code="a">Includes bibliographies and index.</subfield></datafield>
    [...]
I'm running an XPATH query that looks like this:
/collection/record/datafield[@tag='500' or @tag='504']//text()[contains(.,'great') or contains(.,'Great') or contains(.,'contents')]
This searches through records like the one above and returns the matching text from them, since the sample matches my request for records containing "Great" in the 500 field.
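The duplicated search terms are only there to work around case sensitivity. One way to avoid them, outside of XPATH itself, is to lower-case the text before comparing it. Below is a minimal sketch of that idea using Python's standard-library ElementTree. It assumes the records carry no XML namespace, as in the sample above (real MARCXML exports often declare one, in which case the tag names would need to include it). The names SAMPLE, KEYWORDS, TAGS and matching_notes, and the shortened note text, are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down record (assumption: no XML namespace).
SAMPLE = """<collection>
  <record>
    <controlfield tag="001">9984333660001531</controlfield>
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">Published in Great Britain under another title.</subfield>
    </datafield>
    <datafield tag="504" ind1=" " ind2=" ">
      <subfield code="a">Includes bibliographies and index.</subfield>
    </datafield>
  </record>
</collection>"""

KEYWORDS = ("great", "contents")  # each term listed once, in lower case
TAGS = ("500", "504")             # datafields to search


def matching_notes(root, tags=TAGS, keywords=KEYWORDS):
    """Return (tag, text) pairs for subfields whose text contains a keyword."""
    hits = []
    for field in root.iter("datafield"):
        if field.get("tag") not in tags:
            continue
        for sub in field.iter("subfield"):
            text = sub.text or ""
            lowered = text.lower()  # compare in lower case, so "Great" matches "great"
            if any(word in lowered for word in keywords):
                hits.append((field.get("tag"), text))
    return hits


root = ET.fromstring(SAMPLE)
notes = matching_notes(root)
for tag, text in notes:
    print(tag, text)
```

This only reproduces the matching step. Like the Oxygen results, it returns the note text, not the control fields of the record that contains it.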
I'm using Oxygen XML Developer to run my queries, which sorts through the records and gives me a list of what matched. The problem I'm having is with the results displayed in Oxygen: [screenshot of Oxygen's results window]. I'd like to use XPATH to select data from my MARCXML file, but to extract it rather than transform it. If I save the data from the results window, I only get Oxygen's "XPATH location", "Resource ID" and "Location" data instead of the useful control fields contained in my original XML.
Sorry for such a long post - I wanted to ask if anyone had some advice. My first thought is to use XQUERY, reformatting my XPATH query so that it returns or prints the data I want (so 'if' certain fields match certain keywords, 'return' controlfield tag="001", then ", ", then controlfield tag="005", and so on). Alternatively, I could use another XML viewer that can find a match and export it intact. Any advice much appreciated! And if I'm short on info please let me know and I can supply more.
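To make the "if it matches, return the control fields" idea concrete, here is a rough sketch in Python rather than XQUERY, again using only the standard library. It is an illustration under assumptions, not a tested solution for real catalog data: it assumes un-namespaced records shaped like the sample above, the second record in SAMPLE is invented so there is something that does not match, and the function names (record_matches, control_value, extract_rows) are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data: first record trimmed from the sample, second one invented.
SAMPLE = """<collection>
  <record>
    <controlfield tag="001">9984333660001531</controlfield>
    <controlfield tag="005">20130789942407.2</controlfield>
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">Published in Great Britain under another title.</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">0000000000000002</controlfield>
    <controlfield tag="005">20130101000000.0</controlfield>
    <datafield tag="504" ind1=" " ind2=" ">
      <subfield code="a">Includes bibliographies and index.</subfield>
    </datafield>
  </record>
</collection>"""


def record_matches(record, tags, keywords):
    """True if any datafield with one of the tags contains a keyword (case-insensitive)."""
    for field in record.findall("datafield"):
        if field.get("tag") in tags:
            text = "".join(field.itertext()).lower()
            if any(word in text for word in keywords):
                return True
    return False


def control_value(record, tag):
    """Text of the controlfield with the given tag, or '' if it is absent."""
    node = record.find(f"controlfield[@tag='{tag}']")
    return node.text if node is not None and node.text else ""


def extract_rows(root, tags=("500", "504"), keywords=("great", "contents"),
                 wanted=("001", "005")):
    """One row of control field values per matching record."""
    return [
        [control_value(record, tag) for tag in wanted]
        for record in root.findall("record")
        if record_matches(record, tags, keywords)
    ]


root = ET.fromstring(SAMPLE)
rows = extract_rows(root)

# Write the rows as comma-separated lines, one per matching record.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue(), end="")
```

The point of the design is that the match is tested per record instead of per text node, so the fields being exported do not have to be the fields being searched.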