∞ /notes/why-lod-needs-applications-and-libraries-need-apis | 2015-12-07 | programming swib lobid web libraries nwbib
Cross-posted to: https://fsteeg.wordpress.com/2015/12/07/why-lod-needs-applications-and-libraries-need-apis/
This post is based on a talk I gave at SWIB15 on November 24 titled "LOD for applications: using the Lobid API".
I recently stumbled upon a poster by the American Library Association from 1925, which advertises library work as "the profession on which all other professions and occupations depend":
Source: http://imagesearchnew.library.illinois.edu/cdm/singleitem/collection/alaposters/id/23/rec/2
I'm sure that anyone who does library work kind of likes that sentiment; I certainly do. At the same time, it makes you wonder how true that statement actually was, even back then. For instance, I don't think my grandfather, who was a baker, actually depended on library work. But even more, it makes you wonder how true it is today. I don't have an answer to that, but to the extent that it is true today, there's something else involved, and that's software. Because today software is a thing on which every profession and occupation depends.
Well, just as the statement about library work wasn't quite true even in 1925, the statement about software isn't quite true today. Not all professions and occupations actually depend on software. But there is one profession that certainly does, and that's library work. Because "libraries are software":
"Libraries are Software" by @codyh — Good short essay for dev and non-dev librarians http://t.co/Y9YLmTaZj0 pic.twitter.com/iVGKFaBql8
— Tim Spalding (@librarythingtim) September 17, 2015
The services that libraries provide are provided directly or indirectly through software.
This post is based on a talk titled LOD (Linked Open Data) for applications: using the Lobid API (Application Programming Interface). So why would you want to use an API? Well, because LOD for applications means building software. And APIs make software development manageable. They allow us to build modular software, with stable applications that don't have to change when other parts of the software change. At the same time, they allow us to have flexible data sources that can change without requiring changes in the applications. Let's take a look at what that means in the case of the Lobid API.
The Lobid API provides access to authority data from different sources in different formats, to geodata from Wikidata, and to bibliographic title data from the hbz union catalogue. Applications access this data through the API. That way, the API decouples the applications from the concrete data sources, formats, and systems, which can change without requiring the applications to change at the same time.
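To make the decoupling idea a bit more concrete, here is a minimal Python sketch. It is not how Lobid or its client applications are implemented; the names `TitleSource`, `InMemorySource`, and `render_title_list` are hypothetical and only illustrate an application that depends on a small interface rather than on a concrete data source.

```python
from typing import Protocol


class TitleSource(Protocol):
    """Hypothetical interface: all the application needs to know."""

    def titles_for_subject(self, subject_id: str) -> list[str]: ...


class InMemorySource:
    """Stand-in backend; a real one might call a web API instead."""

    def __init__(self, records: dict[str, list[str]]) -> None:
        self._records = records

    def titles_for_subject(self, subject_id: str) -> list[str]:
        # Unknown subjects simply yield no titles.
        return self._records.get(subject_id, [])


def render_title_list(source: TitleSource, subject_id: str) -> str:
    """Application code: depends only on the TitleSource interface."""
    titles = source.titles_for_subject(subject_id)
    return "\n".join(f"- {title}" for title in sorted(titles))
```

Swapping `InMemorySource` for a backend that talks to a web API would not require any change to `render_title_list`, which is the kind of stability described above.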
curl -H "Accept: application/ld+json" -H "Accept-Encoding: gzip"
Here we use curl to call the URL. We pass headers specifying the requested format (JSON-LD in this case) and asking for a gzip-compressed response to save network bandwidth and disk space.
We can then pass arbitrary queries, in the form of URLs, e.g. to get all holdings of a specific library:
"http://lobid.org/resource?owner=DE-6&scroll=true"
Or all new titles for a specific subject:
"http://lobid.org/resource?subject=4055382-6&scroll=20151023"
We finally direct the response into a file:
> "resources-4055382-20151023.gz"
Note: this should all be on a single line, using one URL only. To reproduce this, call e.g.:
curl -H "Accept: application/ld+json" -H "Accept-Encoding: gzip" "http://lobid.org/resource?subject=4055382-6&scroll=20151023" > "resources-4055382-20151023.gz"
So what we get from this is local data, ready for offline usage, but still retrieved from an API. There is no contradiction between local data dumps and APIs: an API is just a way to deliver data.
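The same bulk download could also be scripted. The following is a rough Python sketch using only the standard library, assuming the endpoint and parameters behave as in the curl call above; the helper names `build_request` and `download` are made up for this example.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://lobid.org/resource"  # endpoint taken from the curl call


def build_request(params: dict[str, str]) -> urllib.request.Request:
    """Build the same request as the curl call: JSON-LD, gzip-compressed."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    headers = {"Accept": "application/ld+json", "Accept-Encoding": "gzip"}
    return urllib.request.Request(url, headers=headers)


def download(params: dict[str, str], target: str) -> None:
    """Write the response body to a local file in chunks.

    urllib does not decompress the body itself, so if the server honours
    the Accept-Encoding header, the file contains gzip data, as with curl.
    """
    request = build_request(params)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        while chunk := response.read(64 * 1024):
            out.write(chunk)
```

Calling `download({"subject": "4055382-6", "scroll": "20151023"}, "resources-4055382-20151023.gz")` would then mirror the single-line curl call.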
.@fsteeg "There are 10.200 [libraries] in total". It's more like twice as many if you count the parish libraries.
— Adrian Pohl (@acka47) May 29, 2015
Or is it more like 20.000, as an initial query to our API suggested:
There are 19.347 libraries in Germany http://t.co/qPcY3Kh7o7 . Data source: dbs and ZDB-Sigeldatei /via @lobidOrg API
— dr0ide (@dr0ide) June 10, 2015
So let's take a look at how we can answer that question by using the Lobid API. First, let's take a look at a single organisation ("DE-206H") in our API. This organisation contains a "type" field which specifies that this is a library. So this is the first field we'll be using, since we want to query for all libraries in Germany. The organisation also contains a nested "addressCountry" field:
http://beta.lobid.org/organisations/DE-206H
The "addressCountry" field is nested in an "address" field, which itself is nested in a "location" field. We can express the path to this nested field as "location.address.addressCountry". With these two fields, we can now create a query that expresses that we're looking for organisations with "type:Library AND location.address.addressCountry:Germany" (the + characters in the address bar are encoded spaces). In the response, we have a "hits.total" field with the answer to our question:
http://beta.lobid.org/organisations/search?q=type:Library+AND+location.address.addressCountry:Germany
So that's the answer right there: there are 13.285 libraries in Germany, according to our data.
Now the initial result mentioned above said there are 19.347 libraries in Germany, while this result says 13.285. What about that? By using LOD for applications, by using the Lobid API, we were able to improve our query: while our data only contains German organisations, these are not necessarily in Germany, e.g. Goethe institutes. The initial query didn't consider that.
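The query construction and the dotted field paths described here can be sketched in a few lines of Python. This is only an illustration: `search_url` and `get_path` are hypothetical helpers, and the sample record in the usage note is a simplified assumption about the JSON shape, not actual API output.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://beta.lobid.org/organisations/search"  # from the post


def search_url(query: str) -> str:
    """Encode a query; spaces become '+' as seen in the address bar."""
    # safe=":" keeps the field:value colons readable in the URL.
    return SEARCH_URL + "?" + urlencode({"q": query}, safe=":")


def get_path(record: dict, path: str):
    """Follow a dotted path such as 'location.address.addressCountry'."""
    value = record
    for key in path.split("."):
        value = value[key]  # raises KeyError if a level is missing
    return value
```

For a record shaped like `{"type": "Library", "location": {"address": {"addressCountry": "Germany"}}}`, `get_path(record, "location.address.addressCountry")` walks the three nested levels. Assuming the response nests "total" inside "hits", the same helper would read the count with `get_path(response, "hits.total")`.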
@lobidOrg @InspektorHicks To be precise: this also includes Goethe institutes abroad etc. See http://t.co/wbpiGZl8OC
— dr0ide (@dr0ide) June 10, 2015
We were also able to improve our data. Because of course, when your result is almost double the traditional answer, you sanity check your results, and so we realized that we included organisations that were marked as inactive in our source data (see this issue for details). So we really saw here how usage leads to improvement.
Do usable before reusable.
— Einar W. Høst (@einarwh) August 21, 2015
And doing usable is great, because "having other folks use your stuff makes your stuff better!"
@edsu Guess what - it turns out having other folks use your stuff makes your stuff better! Software and collections too!
— Andy Jackson (@anjacks0n) July 30, 2015
So this is how we make progress: by building, using, and improving stuff. A short note on how we built our particular API: We used Metafacture (a Java toolkit for stream-based library metadata processing), Elasticsearch (a Lucene-based search server, something like SOLR, we use it with Java), and Playframework (a Web application framework, something like Rails or Django, we use it with Java). In a nutshell, that's Java programming with open source tools. But this is just our choice. An API like this could be implemented with all kinds of technologies.
Now you might be wondering: but what about Linked Open Data (which is in this post's title), or even the Semantic Web (which is the topic of the conference where I presented this as a talk)? For one thing, the structured data in all the examples above is JSON-LD: it is RDF compatible and we use standard vocabularies where possible, so this is linked data. But linked data and the semantic web are no goals in themselves. They are "a technological solution, one of many that might fit the real goals".
I'm so over linked data/semantic web as a goal. Its a technological solution, one of many that might fit the real goals/outcomes
— Euan Cochrane (@euanc) June 2, 2015
And no matter what our roles or titles are, whether we're cataloguers, developers, or managers, that real goal is the same: "to make the product better for our users".
"No matter what our job titles, our jobs are all the same — to make the product better for our users." http://t.co/Rz8CnWvlbm
— Fabian Steeg (@fsteeg) February 25, 2015
And that product is software. So what's the thing to take away? It's really that libraries, that we, that you, should build APIs. APIs provide infrastructure for software in libraries. They make the great work of cataloguers available for all kinds of use cases. That's why "libraries need APIs".
.@fsteeg Yep. Libraries need APIs, not web portals.
— Ralf Claussnitzer (@claussni) July 29, 2015
They empower you and others: to use your data (as we saw in the simple question answering example), to build new applications (as we saw in the simple and complex map-based applications), and to improve existing applications (as we saw in the OpenRefine example). It doesn't happen by itself. It's not like you build an API and all these applications magically appear. What it takes is "an API and some large-but-finite amount of labor".
Realizing that I say "it wouldn't be difficult to X" when I mean "there's an API and some large-but-finite amount of labor could X."
— Ted Underwood (@Ted_Underwood) September 2, 2015
But APIs are the foundation. They provide infrastructure for software in libraries. They empower you and others.