doc/sparql.xml

<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
]>
<refentry id="sparql">
 <refentryinfo>
  <productname>Metaproxy SPARQL module</productname>
  <info><orgname>Index Data</orgname></info>
 </refentryinfo>

 <refmeta>
  <refentrytitle>sparql</refentrytitle>
  <manvolnum>3mp</manvolnum>
  <refmiscinfo class="manual">Metaproxy Module</refmiscinfo>
 </refmeta>

 <refnamediv>
  <refname>sparql</refname>
  <refpurpose>
   Metaproxy Module for accessing a triplestore
  </refpurpose>
 </refnamediv>

 <refsect1><title>DESCRIPTION</title>
  <para>
   This module translates the Z39.50 operations init, search, and present
   into HTTP requests that access a remote triplestore.
  </para>
  <para>
   Configuration consists of one or more db elements. Each db element
   describes how to access a specific database. The db element takes
   two attributes: the name of the Z39.50 database
   (<literal>path</literal>) and the HTTP access point of the triplestore
   (<literal>uri</literal>). Optionally, the schema for the database may
   be given with the attribute <literal>schema</literal>.
   Each db element takes these elements:
   <variablelist>
    <varlistentry><term>&lt;prefix/&gt;</term>
     <listitem>
      <para>
       Section that maps prefixes and namespaces for RDF vocabularies.
       The format is the prefix followed by a colon, followed by the value.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>&lt;form/&gt;</term>
     <listitem>
      <para>
       Selects the SPARQL query form. It should start with one of the
       query forms SELECT or CONSTRUCT.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>&lt;criteria/&gt;</term>
     <listitem>
      <para>
       Section that allows mapping static graph patterns for binding
       variables, narrowing types, etc., or any other WHERE clause criteria
       that are static to the Z39.50/SRU database. The final query
       conversion logic should be able to deduce which optional criteria
       should be included in the generated SPARQL by analyzing the variables
       required in the query matching and display fields.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>&lt;index type="attribute"/&gt;</term>
     <listitem>
      <para>
       Section used to declare RPN use attribute strings (indexes) and map
       them to BIBFRAME graph patterns.
       Items in this section are expanded during RPN query processing, and
       placeholders (%s, %d) are substituted with query terms.
       To map a given CQL index (e.g. the default keyword index) to
       multiple entity properties, SPARQL constructs like
       <literal>OPTIONAL</literal> or <literal>UNION</literal> can be used.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>&lt;modifier/&gt;</term>
     <listitem>
      <para>
       Optional section that allows you to add solution sequence
       modifiers, such as GROUP BY.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>
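  <para>
   As an illustration of mapping one index to several entity properties
   with <literal>UNION</literal>, the following sketch lets a keyword
   index match either a work title or a subject label. It is untested
   and rests on assumptions: the database name "keyword" is hypothetical,
   the access point and vocabulary are borrowed from the EXAMPLE sections
   of this page, and <literal>%v</literal> is used in the same way as in
   those examples. Whether a given back end accepts this exact pattern
   has not been verified.
   <screen><![CDATA[
    <db path="keyword"
        uri="http://bibframe.indexdata.com/sparql/"
        schema="sparql-results">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT DISTINCT ?work</form>
      <criteria>?work a bf:Work</criteria>
      <index type="1016"> {
            { ?work bf:workTitle ?wt .
              ?wt bf:titleValue %v FILTER(contains(%v, %s)) }
            UNION
            { ?work bf:subject ?subject .
              ?subject bf:label %v FILTER(contains(%v, %s)) }
          }
      </index>
      <modifier>ORDER BY ?work</modifier>
    </db>
]]>
   </screen>
   The DISTINCT keeps a work from being listed twice when both branches
   match, and the modifier element shows where an ORDER BY clause would go.
  </para>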
 </refsect1>

 <refsect1><title>SCHEMA</title>
   <literallayout><xi:include
                     xi:href="filter_sparql.rnc"
                     xi:parse="text"
                     xmlns:xi="http://www.w3.org/2001/XInclude" />
   </literallayout>
 </refsect1>

 <refsect1><title>EXAMPLE</title>
  <para>
   Configuration for database "Default" that allows searching works. Only
   the field (use attribute) "bf.wtitle" is supported.
   <screen><![CDATA[
  <filter type="sparql">
    <db path="Default"
        uri="http://bibframe.indexdata.com/sparql/"
        schema="sparql-results">
      <prefix>rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#</prefix>
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT ?work ?wtitle</form>
      <criteria>?work a bf:Work</criteria>
      <criteria>?work bf:workTitle ?wt</criteria>
      <criteria>?wt bf:titleValue ?wtitle</criteria>
      <index type="bf.wtitle">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
    </db>
  </filter>
]]>
   </screen>
   Matching is a simple case-sensitive substring match. There is
   no deduplication, so a work with two titles produces two rows.
  </para>
 </refsect1>

 <refsect1><title>EXAMPLE</title>
  <para>
   A more complex configuration for database "work". This could be included in
   the same filter section as the "Default" db above.
   <screen><![CDATA[
    <db path="work" schema="sparql-results">
      <prefix>rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#</prefix>
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT
              ?work
              (sql:GROUP_DIGEST (?wtitle, ' ; ', 1000, 1)) AS ?title
              (sql:GROUP_DIGEST (?creatorlabel, ' ; ', 1000, 1)) AS ?creator
              (sql:GROUP_DIGEST (?subjectlabel, ' ; ', 1000, 1)) AS ?subject
      </form>
      <criteria>?work a bf:Work</criteria>

      <criteria> OPTIONAL {
          ?work bf:workTitle ?wt .
          ?wt bf:titleValue ?wtitle }
      </criteria>
      <criteria> OPTIONAL {
          ?work bf:creator ?creator .
          ?creator bf:label ?creatorlabel }
      </criteria>
      <criteria>OPTIONAL {
          ?work bf:subject ?subject .
          ?subject bf:label ?subjectlabel }
      </criteria>
      <index type="4">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
      <index type="1003">?creator bf:label %v FILTER(contains(%v, %s))</index>
      <index type="21">?subject bf:label %v FILTER(contains(%v, %s))</index>
      <index type="1016"> {
            ?work ?op1 ?child .
            ?child ?op2 %v FILTER(contains(STR(%v), %s))
          }
      </index>
      <modifier>GROUP BY $work</modifier>
    </db>
]]>
   </screen>
   </para>
   <para>
    This returns one row for each work. Titles, authors, and subjects
    are all optional. If they repeat, the repeated values are concatenated into
    a single field, separated by semicolons. This is done by the GROUP_DIGEST
    function, which is specific to the Virtuoso back end.
   </para>
   <para>
    This example supports use attributes 4 (title), 1003 (author), 21 (subject),
    and 1016 (keyword). The keyword index matches any literal in a triple that
    refers to the work, so it works for the titleValue in the workTitle, the
    label in the subject, and whatever else there may be. As in the preceding
    example, matching is a simple case-sensitive substring match. More
    realistic term matching could be done with regular expressions, at the
    cost of some readability, portability, and performance.
   </para>
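   <para>
    As a sketch of the regular-expression approach, the title index could
    use the standard SPARQL <literal>regex</literal> function with the
    <literal>"i"</literal> flag for case-insensitive matching. This is an
    untested variant of the title index above and assumes that
    <literal>%s</literal> expands to a quoted string, as its use in
    <literal>contains</literal> suggests. Note that the query term would
    then be interpreted as a regular expression, so characters such as
    "." or "*" in the term take on a special meaning.
    <screen><![CDATA[
      <index type="4">?wt bf:titleValue %v FILTER(regex(%v, %s, "i"))</index>
]]>
    </screen>
   </para>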
 </refsect1>

 <refsect1><title>EXAMPLE</title>
   <para>
    Configuration for database "works". This uses CONSTRUCT to produce RDF.
   <screen><![CDATA[
    <db path="works" schema="rdf">
      <prefix>rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#</prefix>
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>CONSTRUCT {
          ?work bf:title ?wtitle .
          ?work bf:instanceTitle ?title .
          ?work bf:author ?creator .
          ?work bf:subject ?subjectlabel }
      </form>
      <criteria>?work a bf:Work</criteria>

      <criteria>?work bf:workTitle ?wt</criteria>
      <criteria>?wt bf:titleValue ?wtitle</criteria>
      <index type="4">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
      <criteria>?work bf:creator ?creator</criteria>
      <criteria>?creator bf:label ?creatorlabel</criteria>
      <index type="1003">?creator bf:label %v FILTER(contains(%v, %s))</index>
      <criteria>?work bf:subject ?subject</criteria>
      <criteria>?subject bf:label ?subjectlabel</criteria>
      <index type="21">?subject bf:label %v FILTER(contains(%v, %s))</index>
    </db>
 ]]>
   </screen>
  </para>
 </refsect1>

 <refsect1><title>EXAMPLE</title>
   <para>
    Configuration for database "instance". Like "work" above, this uses SELECT
    to return row-based data, this time from the instances. The result is not
    deduplicated: if an instance has two titles, we get two rows, and if it
    also has two formats, we get four rows. The DISTINCT in the SELECT only
    removes rows that are identical in every selected variable.
   <screen><![CDATA[
    <db path="instance" schema="sparql-results">
      <prefix>rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#</prefix>
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT DISTINCT ?instance ?title ?format</form>
      <criteria>?instance a bf:Instance</criteria>
      <criteria>?instance bf:title ?title</criteria>
      <index type="4">?instance bf:title %v FILTER(contains(%v, %s))</index>
      <criteria>?instance bf:format ?format</criteria>
      <index type="1013">?instance bf:format %s</index>
    </db>
 ]]>
   </screen>
  </para>
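  <para>
   If one row per instance is preferred, the grouping technique from the
   "work" example could be applied here as well. The following is an
   untested sketch that assumes a Virtuoso back end (for GROUP_DIGEST);
   the database name "instance-grouped" and the output variables
   ?titles and ?formats are hypothetical.
   <screen><![CDATA[
    <db path="instance-grouped" schema="sparql-results">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT
              ?instance
              (sql:GROUP_DIGEST (?title, ' ; ', 1000, 1)) AS ?titles
              (sql:GROUP_DIGEST (?format, ' ; ', 1000, 1)) AS ?formats
      </form>
      <criteria>?instance a bf:Instance</criteria>
      <criteria>?instance bf:title ?title</criteria>
      <index type="4">?instance bf:title %v FILTER(contains(%v, %s))</index>
      <criteria>?instance bf:format ?format</criteria>
      <index type="1013">?instance bf:format %s</index>
      <modifier>GROUP BY ?instance</modifier>
    </db>
]]>
   </screen>
  </para>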
 </refsect1>

 <refsect1><title>SEE ALSO</title>
  <para>
   <citerefentry>
    <refentrytitle>metaproxy</refentrytitle>
    <manvolnum>1</manvolnum>
   </citerefentry>
  </para>
 </refsect1>

</refentry>

<!-- Keep this comment at the end of the file
Local variables:
mode: nxml
nxml-child-indent: 1
End:
-->