Added by James Leigh, last edited by James Leigh on 2009-01-23

Sesame Federation

The Federation SAIL allows multiple datasets to be virtually combined into a single dataset. It combines multiple Sesame repositories (members), which may reside on any HTTP(S)-accessible Sesame server or be embedded in the same host. The Federation uses query optimizations to distribute sections of a query to different members based on the data each member contains. The results are then joined within the federation to produce the same result as if all the data were co-located in a single repository. As of Sesame 3.0, the Federation does not support transactions and should only be used as a read-only repository or with only one writable member.

The Federation, like other Sesame repositories, uses an RDF configuration file for the repository parameters when accessed through the Sesame console or the Sesame server. Below is an example of a configuration file with three members. This file can be placed in the "configurations" directory within the data directory of the console or server and will be picked up as a repository configuration with the ID taken from the filename. This file can also be placed in the "templates" directory within the data directory of the console or server and will be picked up as a repository configuration template that can be used with the "create" command in the console. This will allow some of the parameters to be changed when the federation repository is created, such as the repository id.

#
# Sesame configuration template for a simple BSBM federation
#
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix hr: <http://www.openrdf.org/config/repository/http#>.
@prefix fed: <http://www.openrdf.org/config/sail/federation#>.

[] a rep:Repository ;
   rep:repositoryTitle "BSBM Federation" ;
   rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository" ;
      sr:sailImpl [
         sail:sailType "openrdf:Federation" ;

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://producers:8080" ;
            hr:repositoryID "producers"
         ];

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://vendors:8080" ;
            hr:repositoryID "vendors"
         ];

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://ratings:8080" ;
            hr:repositoryID "ratings"
         ]

      ]
   ].

This next configuration includes some additional configuration options to improve the runtime performance of the federation.

Perhaps the most significant option is fed:distinct. When this option is absent, the federation passes the results from multiple members through a distinct filter. Setting it to true skips this step, which can greatly improve performance but may produce duplicate results if members contain duplicate quads (the same subject, predicate, object, and context combination appears in more than one member).
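
The trade-off can be illustrated with a small model in plain Python. This is a sketch of the behaviour described above, not Sesame code: the merge_results function, the tuple representation of quads, and the sample data are all assumptions made for illustration.

```python
from itertools import chain


def merge_results(member_results, distinct=False):
    """Combine statement lists from several members.

    distinct=False models the default: results pass through a
    duplicate-removing filter. distinct=True models "fed:distinct true":
    the filter is skipped, so a quad held by several members is
    returned once per member.
    """
    combined = chain.from_iterable(member_results)
    if distinct:
        return list(combined)
    seen = set()
    unique = []
    for quad in combined:
        if quad not in seen:
            seen.add(quad)
            unique.append(quad)
    return unique


# Hypothetical data: the same quad is stored in two members.
producers = [("ex:p1", "rdfs:label", "Widget", None)]
vendors = [("ex:p1", "rdfs:label", "Widget", None),
           ("ex:v1", "rdfs:label", "Shop", None)]

filtered = merge_results([producers, vendors])
unfiltered = merge_results([producers, vendors], distinct=True)
```

With the filter in place the shared quad appears once; without it the quad appears once per member that holds it.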

The fed:localPropertySpace option lists the namespaces of predicates whose statements about a given subject are all stored in the same member. For the BSBM datasets this covers all predicates, because each subject is described in only one member. The effect is that if multiple basic graph patterns share a subject variable and use predicates that start with one of these values, they are joined only within a single member. If the data is partitioned by predicate (every predicate appears in at most one member), this option has no effect.
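
One way to picture the grouping is the following Python sketch. It is a simplified model, not the Federation's actual optimizer: the group_local_patterns function, the triple-pattern tuples, and the example.org predicate are hypothetical.

```python
from collections import defaultdict


def group_local_patterns(patterns, local_property_spaces):
    """Group patterns that could be joined inside a single member.

    Patterns whose predicate starts with a local property space and that
    share a subject variable end up in one group; every other pattern
    forms a group of its own.
    """
    by_subject = defaultdict(list)
    standalone = []
    for pattern in patterns:
        subject, predicate, _ = pattern
        if any(predicate.startswith(ns) for ns in local_property_spaces):
            by_subject[subject].append(pattern)
        else:
            standalone.append([pattern])
    return list(by_subject.values()) + standalone


FOAF = "http://xmlns.com/foaf/0.1/"
patterns = [
    ("?p", FOAF + "name", "?name"),
    ("?p", FOAF + "mbox", "?mbox"),
    ("?r", "http://example.org/other#rating", "?value"),  # hypothetical
]
groups = group_local_patterns(patterns, [FOAF])
```

Here the two foaf: patterns on ?p form one group that can be sent to a member as a unit, while the remaining pattern is handled separately.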

The hr:subjectSpace option is an HTTP repository optimization with which the client indicates which subject URI prefixes are located on the server. It reduces bandwidth by letting the client determine in advance whether a statement might exist on the server, without connecting to it.
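
The check amounts to a prefix test on the subject URI, which the following Python sketch models. The might_contain function and the sample offer URI are hypothetical, and treating a member without any subject space as "always ask" is an assumption rather than documented behaviour.

```python
def might_contain(subject_uri, subject_spaces):
    """Return True if a member could hold statements about subject_uri.

    Assumption: a member with no declared subject spaces must always
    be asked, because nothing is known about its contents.
    """
    if not subject_spaces:
        return True
    return any(subject_uri.startswith(prefix) for prefix in subject_spaces)


BSBM_INST = "http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/"
members = {
    "vendors": [BSBM_INST + "dataFromVendor"],
    "ratings": [BSBM_INST + "dataFromRatingSite"],
}

# Hypothetical subject URI used only for this example.
subject = BSBM_INST + "dataFromVendor1/Offer1"
candidates = [name for name, spaces in members.items()
              if might_contain(subject, spaces)]
```

Only the members left in candidates need to be contacted for statements about that subject.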

#
# Sesame configuration template for an efficient BSBM federation
#
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix hr: <http://www.openrdf.org/config/repository/http#>.
@prefix fed: <http://www.openrdf.org/config/sail/federation#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rev: <http://purl.org/stuff/rev#> .
@prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/> .
@prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/> .

[] a rep:Repository ;
   rep:repositoryTitle "BSBM Federation" ;
   rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository" ;
      sr:sailImpl [
         sail:sailType "openrdf:Federation" ;
         fed:distinct true ;
         fed:localPropertySpace rdf: ;
         fed:localPropertySpace rdfs: ;
         fed:localPropertySpace foaf: ;
         fed:localPropertySpace dc: ;
         fed:localPropertySpace rev: ;
         fed:localPropertySpace bsbm: ;

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://producers:8080" ;
            hr:repositoryID "producers" ;
            hr:subjectSpace bsbm: ;
            hr:subjectSpace bsbm-inst:Product ;
            hr:subjectSpace bsbm-inst:dataFromProducer
         ];

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://vendors:8080" ;
            hr:repositoryID "vendors" ;
            hr:subjectSpace bsbm-inst:dataFromVendor
         ];

         fed:member [
            rep:repositoryType "openrdf:HTTPRepository" ;
            hr:serverURL "http://ratings:8080" ;
            hr:repositoryID "ratings" ;
            hr:subjectSpace bsbm-inst:dataFromRatingSite
         ]

      ]
   ].

The Federation can also be embedded without a server or console by using the org.openrdf.sail.federation.Federation class and wrapping it in a SailRepository class.

Communication

The Federation communicates with its members using the Repository API with its own query language. If a member does not support the Federation's query language, the Federation communicates with that member through basic graph patterns. This usually causes a significant increase in bandwidth and a decrease in performance.

Any member that uses a SailRepository with the Federation module on the classpath is enabled with the Federation's query language.

Although the Sesame protocol is a superset of the SPARQL protocol, the Federation does not use SPARQL to communicate with its members. However, the Federation can act as a SPARQL end-point by hosting it in a Sesame server.

Optimizations

The Federation uses the size of each basic graph pattern (the number of statements that match the pattern) to order the joins within the query. Outside of a transaction this information is cached by the HTTP repository. The first time a query is evaluated, however, evaluation is suspended until all members have reported the size of each of the basic graph patterns. Using the "-c" option on the remote server reduces the amount of cache validation and improves the responsiveness of the repository.

The ordering of the joins may not be ideal for a distributed query. For example, a basic graph pattern that has many results may yield only a few results after it is filtered. Filtering is not considered when ordering the joins, because the size of the filtered result cannot be determined in advance.
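
The limitation can be seen in a small Python model. It assumes, for illustration only, that patterns are evaluated smallest first; the order_joins function and the sizes are hypothetical and do not come from the Federation's code.

```python
def order_joins(pattern_sizes):
    """Order basic graph patterns by reported size, smallest first.

    Filters are deliberately ignored, mirroring the limitation
    described above.
    """
    return sorted(pattern_sizes, key=pattern_sizes.get)


# Hypothetical sizes reported by the members.
sizes = {
    "?ind1 :name ?name": 1_000_000,
    "?ind1 :relationship ?ind2": 50_000,
    "?ind2 :name ?other": 1_000_000,
}
order = order_joins(sizes)
```

Even if a filter on ?name would leave only a handful of rows, the :name pattern is still ordered after the :relationship pattern, because only the unfiltered sizes are compared.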

When the federation may not be choosing the best join order (such as when a highly restrictive filter is used), the ordering can be influenced by moving secondary joins into an OPTIONAL block. The same query results can be obtained by then filtering for the results that did join.

For example, the following two SPARQL queries evaluate to the same result, but the second has a more predictable join order and ensures that the ?ind1 :name ?name pattern is evaluated first. When the regex filter is highly restrictive and the :name and :relationship statements are in different members, this will improve performance in many situations.

SELECT *
WHERE {
  ?ind1 :name ?name
  FILTER regex(str(?name), 'James', 'i')
  ?ind1 :relationship ?ind2 . 
  ?ind2 :name ?other
}

SELECT *
WHERE {
  ?ind1 :name ?name
  FILTER regex(str(?name), 'James', 'i')
  OPTIONAL {
    ?ind1 :relationship ?ind2 . 
    ?ind2 :name ?other
  }
  FILTER bound(?ind2)
}
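
The equivalence of the two forms can be checked on a toy dataset. The Python sketch below models the first query as an inner join and the second as a left join followed by a bound test. The data and the one-name-per-individual simplification are assumptions made for the example.

```python
# Hypothetical data: one name per individual, plus relationship pairs.
names = {"a": "James", "b": "Anna", "c": "james smith"}
relationships = [("a", "b"), ("b", "c")]


def matches(name):
    """Model of regex(str(?name), 'James', 'i')."""
    return "james" in name.lower()


# First query: plain (inner) join with the filter applied.
q1 = [(s, names[s], o, names[o])
      for s, o in relationships
      if s in names and o in names and matches(names[s])]

# Second query: evaluate the filtered :name pattern first, left-join the
# OPTIONAL block, then keep only rows where ?ind2 is bound.
left_joined = []
for s, name in names.items():
    if not matches(name):
        continue
    partners = [o for s2, o in relationships if s2 == s and o in names]
    if partners:
        left_joined.extend((s, name, o, names[o]) for o in partners)
    else:
        left_joined.append((s, name, None, None))
q2 = [row for row in left_joined if row[2] is not None]
```

Both q1 and q2 contain the same rows; only the order in which the patterns are evaluated differs.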

Partitioning

Distributed query evaluations across multiple members will usually be slower than a query evaluated within a single repository. For this reason the federated dataset should ideally be partitioned among the members such that at least 80% of the queries can be evaluated with the dataset of a single member. When properly configured and partitioned, the federation is able to send whole queries to single members for evaluation where appropriate. This allows the Federation to act as a query router for its members, minimizing bandwidth and likely improving query performance. When more than 20% of the queries need to be evaluated with multiple members, consider repartitioning the data to avoid excess bandwidth and improve performance.
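
The 80% guideline can be checked against a query log with a few lines of Python. This is a rough sketch: the single_member_fraction function and the log format (the set of members each query had to touch) are hypothetical, since the Federation does not produce such a log in this form.

```python
def single_member_fraction(query_members):
    """Fraction of queries that were answered by exactly one member."""
    if not query_members:
        return 0.0
    single = sum(1 for members in query_members if len(members) == 1)
    return single / len(query_members)


# Hypothetical query log: the set of members each query had to touch.
log = (
    [{"producers"}] * 4
    + [{"vendors"}] * 2
    + [{"ratings"}]
    + [{"producers", "ratings"}] * 2
    + [{"vendors", "ratings"}]
)
fraction = single_member_fraction(log)
needs_repartitioning = fraction < 0.8
```

In this example 7 of 10 queries stay within one member, which falls short of the 80% target and suggests the partitioning should be reconsidered.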

In many situations the Federation offers a cost-effective way to query very large distributed datasets that might otherwise be too expensive to query. With a simple configuration file and easy setup, the Federation is an inexpensive way to make large datasets available through a unified access point.
