Facet Models

Added by Christiaan Fluit, last edited by Christiaan Fluit on 2008-04-22

Changes

April 4 2008 Schema fix: :selectsClass should be an rdf:Property
April 6 2008 Schema change: subclassed QuerySelector with SeRQLQuerySelector and SPARQLQuerySelector
April 6 2008 Schema change: changed the namespace URI, removed the version part
April 7 2008 Schema fix: removed an obsolete class name from an rdfs:comment
April 8 2008 Schema fix: NumericRangeSelector is a subclass rather than an instance of CategorySelector
April 8 2008 Schema fix: various typos in the example facet model
April 16 2008 Fixed Fresnel link

Introduction

AutoFocus Server 4 has two mechanisms for determining the contents of a profile (our term for an exploration space featuring entities that you can browse through using facets):

  • Provide a custom Loader implementation: a Java class implementing the Loader API and producing a set of entities and a set of Classifiers, each producing the (entity,value) pairs for the facet they represent.
  • Use the GenericLoader, configured with a facet-config.xml file. This file defines a SeRQL SELECT query for determining the set of entities as well as a SeRQL SELECT query for each facet.

Creating a custom Loader gives you full control over how to create and populate your facets, but requires programming and the use of some not-so-trivial APIs, resulting in a learning curve and more work getting it deployed (you have to repackage AutoFocus Server with the new code).

The GenericLoader is much simpler to use (put a data file in the profile's data dir) but has a number of limitations. First of all, it does not support categorical facets (e.g. labeled integer and date intervals) or hierarchical facets (values organized in a tree). The approach only works when the entire selection process for a facet can be expressed within a single query. The output of the classifiers is also restricted to what queries can return; no post-processing through e.g. Java code can take place. This makes the GenericLoader feasible only for simple cases, where the RDF graph has a structure that can be mapped to a facet data model in a trivial way.

AutoFocus Server ships with a custom loader (the DefaultLoader) that expresses queries in the Aperture namespace. Most of its facets cannot be expressed using the GenericLoader mechanism:

  • The Keyword Search facet is a dynamic facet, i.e. its values are not known beforehand but are instead entered by the end user at runtime, resulting in an "instant" facet value and corresponding entity subset. The GenericLoader does not support such facets.
  • The Size and Date facets divide the range of possible sizes and dates into named intervals. This requires a classification process that cannot be expressed using a single SELECT query.
  • The Type facet is a hierarchical facet that organizes the MIME types found in a data set in a pre-defined hierarchy. This classification of values is domain knowledge that typically resides outside the RDF sources.
  • The Location facet is a hierarchical facet, where the entire tree is represented in the data set.
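
To illustrate why the Size and Date facets need a classification step on top of a query result, here is a minimal Python sketch of interval classification. The interval labels, the thresholds and the function name are assumptions chosen for illustration; they are not taken from the DefaultLoader.

```python
# Hypothetical interval table: (label, inclusive minimum, inclusive maximum).
# None means that the interval is unbounded on that side.
SIZE_INTERVALS = [
    ("Small", None, 10240),
    ("Large", 10241, None),
]


def classify(number, intervals):
    """Return the labels of all intervals that contain the given number."""
    labels = []
    for label, low, high in intervals:
        above_min = low is None or number >= low
        below_max = high is None or number <= high
        if above_min and below_max:
            labels.append(label)
    return labels
```

A query can deliver the raw (entity, size) pairs, but mapping each size to a named interval is a separate step like the one above.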

What we want for AutoFocus Server 5 is a profile configuration method that combines the best of both approaches.

Task

Our goal is to create a configuration mechanism that allows for the declarative specification of a facet model. Such a model includes declarations for:

  • Which RDF resources should serve as entities.
  • Which facets exist, including properties like name, order, etc.
  • Which values an entity has for a given facet.
  • How these values are best organized, e.g. ordered in a hierarchy or a list, sorted on a certain criterion, etc.

The mechanism should be usable by people familiar with RDF and faceted navigation basics, enabling users of the downloadable software, including our partners, to configure it with relatively little effort. It should not require detailed knowledge of AutoFocus Server internals.

It should improve on the flexibility of the GenericLoader. As a minimum requirement, the DefaultLoader's implicit model should be fully expressible in it.

It should allow the use of both SPARQL and SeRQL queries, SPARQL being the commonly used standard and SeRQL being more expressive in certain areas.

The declarative nature means that it should leave room for implementations to decide how to populate the facet model with (entity,value) pairs. The importance of this becomes clear through the following use cases:

  • AutoFocus Server needs to provide an instant faceted navigation experience: after (de)selection of a facet value, a new page with filtered entities, facet values and updated facet value counts should be displayed with subsecond response time. We cannot assume that live querying of the RDF sources will allow this level of responsiveness, meaning that some form of batch processing of the RDF data will be required.
  • On the other hand, we want to leave the door open for engines capable of handling incremental changes in the data set, for example when the data source has frequent and small changes or when a complete batch process is very expensive. This means it should be able to handle addition, deletion and updates of entities.
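
The incremental use case could be served by an index along the following lines. This is only a sketch under assumptions: the class FacetIndex and its methods are hypothetical names, not part of AutoFocus Server, and the (facet, value) pairs are assumed to have been computed elsewhere.

```python
from collections import defaultdict


class FacetIndex:
    """Hypothetical in-memory index that supports incremental changes."""

    def __init__(self):
        # entity -> set of (facet, value) pairs
        self._pairs_by_entity = {}
        # (facet, value) -> set of entities
        self._entities_by_pair = defaultdict(set)

    def add(self, entity, pairs):
        """Add an entity, or update it if it is already indexed."""
        self.remove(entity)
        pairs = set(pairs)
        self._pairs_by_entity[entity] = pairs
        for pair in pairs:
            self._entities_by_pair[pair].add(entity)

    def remove(self, entity):
        """Delete an entity and drop facet values that become empty."""
        for pair in self._pairs_by_entity.pop(entity, set()):
            members = self._entities_by_pair[pair]
            members.discard(entity)
            if not members:
                del self._entities_by_pair[pair]

    def count(self, facet, value):
        """Number of entities that have the given value for the given facet."""
        return len(self._entities_by_pair.get((facet, value), ()))
```

A batch engine could fill the same structure in one pass, which is one reason to keep the facet model itself independent of the population strategy.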

A declarative model ideally makes these two approaches primarily an implementation detail with no effect on the contents of the model and gives the possibility for query optimizations in both approaches.

A specific case of the incremental approach is the evaluation of dynamic facets, where new facet values are entered by the end user (e.g. through a keyword search query). This can obviously not be done in batch mode. It should therefore be possible to combine batch mode population of the facet model with a dynamic extension of the populated facet model.

An incremental faceted navigation engine is also useful for AutoFocus (the desktop client), as it could be integrated seamlessly in its crawl process. Such an engine on top of its internal RDF store means that it instantly knows the values for each facet (now they are dynamically determined) as well as the global counts. Furthermore, it provides all the information necessary to drive the Cluster Map visualization, meaning much faster query evaluation. Only document metadata still needs to be fetched but that can be done on demand.

Other Faceted Navigation Systems

I have investigated the technical aspects of a number of other faceted navigation applications, in particular:

For other systems such as those from Endeca, Siderean and Dieselpoint, no download or technical information on configuration options was available.

Some of these systems have a background in Semantic Web research and operate directly on top of an RDF source, whereas others primarily use an API and/or facet-oriented data format for population of the facet model.

The Semantic Web-oriented systems are typically capable of showing arbitrary RDF graphs by making few or no assumptions on the schema(s) used in the data set. Only RDF Schema semantics is typically exploited, e.g. by separating instances from different classes into different facet spaces or by presenting a class hierarchy as a hierarchical facet. This out-of-the-box behavior contrasts them with the non-semantic web systems, which have to be instructed what to show and how.

The nature of the default facets of the Semantic Web systems is often limited to simple properties or class hierarchies though. Only /facet mentions that it is able to detect facet values that are organized in a hierarchy and presents them as such.

Longwell has a configuration method that is in the direction of what we are considering: its facet model is expressed using an RDF vocabulary. However, this vocabulary can only express which facets should be shown or hidden (being able to suppress facets is necessary because Longwell presents every predicate as a facet). This makes it insufficient to cover our use cases.

AutoFocus Server currently falls in between these two categories of facet applications. It can operate on arbitrary RDF sources but requires you to explicitly declare each facet and define how it is populated. This makes it particularly suited for creating domain-specific and task-oriented interfaces but does not have the "semantic webby" nature of pointing it to an RDF source and exploring what's out there. Instead, an admin has to look at the data and decide what should be shown and how.

In the future, we can look at ways of extending our admin interface with discovery features inspired by these Semantic Web systems, to help the profile administrator define a facet model on top of a new, unknown RDF source. Sort of an IDE for defining facet models.

When we come up with a mechanism that is capable of suggesting reasonable facets (in terms of coverage of the data set and number of entities per facet; something like document title typically does not meet this criterion), we can even show this to end users, e.g. as an "advanced search" mode.

A Facet Model Ontology: Proposal 1

Below is a proposal for an RDFS ontology for defining facet models.

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :      <http://aduna-software.com/schema/facet#> .

:Entity a rdfs:Class ;
	rdfs:comment "Entities are the resources classified using the Facets in the FacetModel."@en .

:Facet a rdfs:Class ;
	rdfs:comment "Facets reflect dimensions in which a set of Entities can be navigated. Each Facet is orthogonal to all other Facets."@en .

:StaticFacet rdfs:subClassOf :Facet ;
	rdfs:comment "StaticFacets are facets whose values are extracted from a data set. Each StaticFacet has one or more Selectors that provide it with Entity-value mappings."@en .
	
:DynamicFacet rdfs:subClassOf :Facet ;
	rdfs:comment "DynamicFacets are facets whose values are entered by the user, e.g. a keyword, number or date."@en .

:hasSelector a rdf:Property ;
	rdfs:comment "hasSelector connects StaticFacets to Selectors that can instantiate them."@en ;
	rdfs:domain :StaticFacet ;
	rdfs:range :Selector .	

:FacetModel a rdfs:Class ;
	rdfs:comment "A FacetModel defines all aspects of the contents of a faceted navigation application. It combines a List of Facets with a collection of Entity Selectors. Organizing the Facets in a List ensures that a default order exists in the set."@en .

:hasFacetList a rdf:Property ;
	rdfs:comment "hasFacetList defines the set of Facets contained in a FacetModel."@en ;
	rdfs:domain :FacetModel ;
	rdfs:range rdf:List .

:hasEntitySelector a rdf:Property ;
	rdfs:comment "hasEntitySelector defines the set of Selectors that gather the Entities for a FacetModel."@en ;
	rdfs:domain :FacetModel ;
	rdfs:range :Selector .

:Selector a rdfs:Class ;
	rdfs:comment "A Selector selects parts of an RDF graph. Selectors can be used to define entities and facets. Their use as either one of these determines how they are applied."@en .

:includeInferred a rdf:Property ;
	rdfs:comment "The includeInferred property indicates whether inferred triples should be taken into account when evaluating the Selector. Defaults to 'true'."@en ;
	rdfs:domain :Selector ;
	rdfs:range rdfs:Literal .

:InstanceSelector rdfs:subClassOf :Selector ;
	rdfs:comment "An InstanceSelector selects all instances of a given Class. When used as an Entity Selector, all resources that are an instance of that Class are presented as Entities. An InstanceSelector cannot be used for populating a Facet."@en .

:selectsClass a rdf:Property;
	rdfs:comment "The selectsClass property determines the class of which an InstanceSelector selects the instances."@en ;
	rdfs:domain :InstanceSelector ;
	rdfs:range rdfs:Class .

:SubjectSelector rdfs:subClassOf :Selector ;
	rdfs:comment "A SubjectSelector selects the subjects of all statements with a given predicate. When used as an Entity Selector, all subjects found are presented as Entities. When used as a Facet Selector, all subjects found are considered to be facet values for the corresponding objects."@en .
	
:ObjectSelector rdfs:subClassOf :Selector ;
	rdfs:comment "An ObjectSelector selects the objects of all statements with a given predicate. When used as an Entity Selector, all objects found are presented as Entities. When used as a Facet Selector, all objects found are considered to be facet values for the corresponding subjects."@en .

:hasPredicate a rdf:Property ;
	rdfs:comment "hasPredicate connects Selectors that need a single predicate to work on, such as SubjectSelector and ObjectSelector"@en ;
	rdfs:domain :Selector ;
	rdfs:range rdfs:Resource .

:FslSelector rdfs:subClassOf :Selector ;
	rdfs:comment "An FslSelector selects entities and possibly values using a Fresnel Selector Language expression."@en .

:hasFslExpression a rdf:Property ;
	rdfs:comment "hasFslExpression defines the FSL expression of the FslSelector."@en ;
	rdfs:domain :FslSelector ;
	rdfs:range rdfs:Literal .

:QuerySelector rdfs:subClassOf :Selector ;
	rdfs:comment "A QuerySelector selects values or key-value pairs using an RDF query. When used as Entity selectors, they should return a single column of Resources. When used as value selectors (Entity-value or value-parent, in case of HierarchicalSelectors), they should return values for variables called \"entity\" and \"value\"."@en .

:SparqlQuerySelector rdfs:subClassOf :QuerySelector ;
	rdfs:comment "SparqlQuerySelectors use SPARQL queries as query strings."@en .

:SerqlQuerySelector rdfs:subClassOf :QuerySelector ;
	rdfs:comment "SerqlQuerySelectors use SeRQL queries as query strings."@en .

:queryString a rdf:Property ;
	rdfs:comment "A queryString reflects the query posed by the QuerySelector"@en ;
	rdfs:domain :QuerySelector ;
	rdfs:range rdfs:Literal .
 
:HierarchicalSelector rdfs:subClassOf :Selector ;
	rdfs:comment "HierarchicalSelectors are \"compound\" Selectors that use at least two different Selectors: one for finding mappings from Entities to values and one for finding the parents of those values (recursively)."@en .
  
:hasMembershipSelector a rdf:Property ;
	rdfs:comment "A membership Selector selects the values of a set of Entities from a hierarchy of values."@en ;
	rdfs:domain :HierarchicalSelector ;
	rdfs:range :Selector .

:hasParentSelector a rdf:Property ;
	rdfs:comment "A parent Selector selects the direct parent values for given sets of values."@en ;
	rdfs:domain :HierarchicalSelector ;
	rdfs:range :Selector .
	
:isSubsuming a rdf:Property ;
	rdfs:comment "isSubsuming indicates whether set memberships are propagated upwards in the value hierarchy of a HierarchicalSelector. For example, an instance of an rdfs:Class is also an instance of all its superclasses, whereas membership of a folder does not necessarily imply membership of its parent folders (see e.g. file system and mail folders)."@en ;
	rdfs:domain :HierarchicalSelector ;
	rdfs:range rdfs:Literal .

:CategorySelector rdfs:subClassOf :Selector ;
	rdfs:comment "CategorySelectors impose a classification based on some value. Each CategorySelector represents a single category."@en .

:NumericRangeSelector rdfs:subClassOf :CategorySelector ;
	rdfs:comment "A NumericRangeSelector selects the subjects of statements with a given predicate whose numeric object falls between the given minimum and maximum thresholds."@en .

:hasMinNumber a rdf:Property ;
	rdfs:comment "Specifies the minimum number (inclusive) of a NumericRangeSelector."@en ;
	rdfs:domain :NumericRangeSelector ;
	rdfs:range rdfs:Literal .

:hasMaxNumber a rdf:Property ;
	rdfs:comment "Specifies the maximum number (inclusive) of a NumericRangeSelector."@en ;
	rdfs:domain :NumericRangeSelector ;
	rdfs:range rdfs:Literal .

:GroovySelector rdfs:subClassOf :Selector ;
	rdfs:comment "A GroovySelector produces (entity,value) pairs by executing a Groovy script that can process the RDF data."@en .

:hasGroovyScript a rdf:Property ;
	rdfs:comment "Specifies the script of a GroovySelector."@en ;
	rdfs:domain :GroovySelector ;
	rdfs:range rdfs:Literal .
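
The interplay between a HierarchicalSelector's membership selector, its parent selector and the isSubsuming flag can be sketched in Python as follows. This is an illustration under assumptions, not the engine's implementation: the function names are hypothetical, and each value is assumed to have at most one direct parent, whereas a parent selector could in principle return several.

```python
def ancestors(value, parent_of):
    """List all ancestors of a value by following parent links upwards."""
    result = []
    seen = {value}  # guards against cycles in the parent data
    current = parent_of.get(value)
    while current is not None and current not in seen:
        result.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return result


def hierarchical_pairs(membership, parent_of, subsuming):
    """Combine membership pairs with the value hierarchy.

    membership: set of (entity, value) pairs from the membership selector.
    parent_of:  dict mapping a value to its direct parent (parent selector).
    subsuming:  if True, membership is propagated to all ancestor values.
    """
    pairs = set(membership)
    if subsuming:
        for entity, value in membership:
            for ancestor in ancestors(value, parent_of):
                pairs.add((entity, ancestor))
    return pairs
```

With subsuming set to False, an entity is only counted for the values it is directly attached to, which matches the folder example in the isSubsuming comment.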

Suppose we have the following data set:

@prefix  foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix  ont:   <http://ontologies-r-us.com/document/> .
@prefix  rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  :      <http://docs-r-us.com/docs/> .

:doc1 rdf:type ont:Document ;
	ont:hasFolder :subFolder1 ;
	ont:language "nl" ;
	ont:size 1234 ;
	ont:taggedBy :Chris .

:doc2 rdf:type ont:Document ;
	ont:hasFolder :subFolder1 ;
	ont:size 123456 ;
	ont:language "en" .

:doc3 rdf:type ont:Document ;
	ont:hasFolder :subFolder2 ;
	ont:size 345678 ;
	ont:language "en" ;
	ont:taggedBy :Herko .

:subFolder1 a ont:Folder ;
	ont:hasParent :parentFolder .
	
:subFolder2 a ont:Folder ;
	ont:hasParent :parentFolder .

:parentFolder a ont:Folder .

:Chris a foaf:Person ;
	ont:hasWritten :doc1 ;
	ont:hasWritten :doc3 .
	
:Herko a foaf:Person ;
	ont:hasWritten :doc2 ;
	ont:hasWritten :doc3 .

This data set can be disclosed using the following facet model:

@prefix  facet:  <http://aduna-software.com/schema/facet#> .
@prefix  ont:    <http://ontologies-r-us.com/document/> .
@prefix  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix  :       <http://aduna-software.com/schema/edm-facets#> .

:facetModel a facet:FacetModel ;
	facet:hasEntitySelector :entitySelector ;
	facet:hasFacetList
		(
			:writtenByFacet
			:languageFacet
			:taggedByFacet
			:taggedBy2Facet
			:taggedBy3Facet
			:keywordSearchFacet
			:folderFacet
			:sizeFacet
		) .

:entitySelector a facet:InstanceSelector ;
	facet:selectsClass ont:Document .

:writtenByFacet a facet:StaticFacet ;
	rdfs:label "Written By" ;
	facet:hasSelector :writtenBySelector .

:writtenBySelector a facet:SubjectSelector ;
	facet:hasPredicate ont:hasWritten .

:languageFacet a facet:StaticFacet ;
	rdfs:label "Language" ;
	facet:hasSelector :languageSelector .

:languageSelector a facet:ObjectSelector ;
	facet:hasPredicate ont:language .

:taggedByFacet a facet:StaticFacet ;
	rdfs:label "Tagged By" ;
	facet:hasSelector :taggedBySelector .

:taggedBySelector a facet:FslSelector ;
	facet:hasFslExpression "ont:taggedBy[foaf:Person]" .

:taggedBy2Facet a facet:StaticFacet ;
	rdfs:label "Tagged By" ;
	facet:hasSelector :taggedBy2Selector .

:taggedBy2Selector a facet:SparqlQuerySelector ;
	facet:queryString """
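			PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
			PREFIX ont:   <http://ontologies-r-us.com/document/>
			PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
			SELECT ?entity ?value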
			WHERE {
				?entity ont:taggedBy ?value .
				?value rdf:type foaf:Person .
			}
		""" .

:taggedBy3Facet a facet:StaticFacet ;
	rdfs:label "Tagged By" ;
	facet:hasSelector :taggedBy3Selector .

:taggedBy3Selector a facet:SerqlQuerySelector ;
	facet:queryString """
			SELECT entity, value
			FROM {entity} ont:taggedBy {value} rdf:type {foaf:Person}
			USING NAMESPACE foaf = <http://xmlns.com/foaf/0.1/>,
			                ont = <http://ontologies-r-us.com/document/>
		""" .

:keywordSearchFacet a facet:DynamicFacet ;
	rdfs:label "Search" .

:folderFacet a facet:StaticFacet ;
	rdfs:label "Folder" ;
	facet:hasSelector :folderSelector .

:folderSelector a facet:HierarchicalSelector ;
	facet:hasMembershipSelector :membershipSelector ;
	facet:hasParentSelector :parentSelector .
	
:membershipSelector a facet:ObjectSelector ;
	facet:hasPredicate ont:hasFolder .

:parentSelector a facet:ObjectSelector ;
	facet:hasPredicate ont:hasParent .

:sizeFacet a facet:StaticFacet ;
	rdfs:label "Size" ;
	facet:hasSelector :smallSizeSelector ;
	facet:hasSelector :largeSizeSelector .
	
:smallSizeSelector a facet:NumericRangeSelector ;
	rdfs:label "Small documents" ;
	facet:hasPredicate ont:size ;
	facet:hasMaxNumber 10240 .
	
:largeSizeSelector a facet:NumericRangeSelector ;
	rdfs:label "Large documents" ;
	facet:hasPredicate ont:size ;
	facet:hasMinNumber 10241 .

The basic idea of this proposal is to have a set of building blocks for selecting entities and (entity,value) pairs. Most of these are of an atomic nature; others (the HierarchicalSelector, perhaps also the CategorySelector) are of a compound nature and can combine other types of selectors.
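
As an illustration of the atomic building blocks, the following Python sketch mimics how a SubjectSelector and an ObjectSelector could turn statements into (entity,value) pairs when used as facet selectors. The function names and the tuple representation of triples are assumptions made for this example; the triples are a subset of the example data set.

```python
# A few statements from the example data set as (subject, predicate, object).
TRIPLES = [
    ("doc1", "ont:language", "nl"),
    ("doc2", "ont:language", "en"),
    ("doc3", "ont:language", "en"),
    ("Chris", "ont:hasWritten", "doc1"),
    ("Chris", "ont:hasWritten", "doc3"),
    ("Herko", "ont:hasWritten", "doc2"),
    ("Herko", "ont:hasWritten", "doc3"),
]


def object_selector(triples, predicate):
    """Objects become the facet values of the corresponding subjects."""
    return {(s, o) for s, p, o in triples if p == predicate}


def subject_selector(triples, predicate):
    """Subjects become the facet values of the corresponding objects."""
    return {(o, s) for s, p, o in triples if p == predicate}
```

Here object_selector(TRIPLES, "ont:language") corresponds to the Language facet, and subject_selector(TRIPLES, "ont:hasWritten") to the Written By facet, where the person is the value and the document is the entity.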

The vocabulary has various means to handle common and simple cases, e.g. where you want to map a single property to a facet. For more complicated cases FslSelectors and QuerySelectors can be used.

FslSelectors use an FSL expression, which can be thought of as an XPath-like language for traversing RDF graphs. FSL expressions are quite expressive and easy to write and interpret.

Sparql- and SerqlQuerySelectors allow for SPARQL and SeRQL queries to be used. When they are used to find entities, they should return instantiations for a single variable. When used to populate facets, they have to instantiate two variables named "entity" and "value". Selectors such as these are perhaps not that declarative, but there still is room to populate a facet on a per-entity or per-value basis by using value bindings when evaluating the query.

This ontology defines subclasses of QuerySelector to indicate the query language, rather than custom datatypes for the query string or a separate query language property. This gives us the possibility to further constrain these two cases independently. For example, SPARQL allows for datasets to be specified in the query, a feature not supported by SeRQL. Additional properties can be imagined that customize query evaluation in ways that cannot be expressed in the query itself (authentication and authorization come to mind). These properties can then have SparqlQuerySelector as domain or range.

The GroovySelector is a way to add programmatic behavior to the model, for those cases where FSL expressions and queries fall short, e.g. because they require a lot of string manipulation, loops, etc. This should be much more user-friendly than providing a Selector Java API that people can program against, which would require setting up a development environment, packaging the Selector implementation, bundling it with the AutoFocus software and deploying it. That is a lot of hassle: it may require taking a running server offline and involves many permissions.

A GroovySelector is configured with a Groovy script that is executed by an interpreter embedded in the facet population engine. We still have to look into how such scripts can be defined so that they can be used as entity and facet value selectors, how to prevent such scripts from being able to access information they should not be able to access, etc.

This ontology has been strongly inspired by Fresnel, in particular Fresnel Lenses and Selectors. There are some slight differences that still make me wonder whether it makes sense to reuse the Fresnel vocabulary and extend it where necessary, or whether this would suggest similarities that do not actually exist. An argument against reusing the Fresnel vocabulary is that its selectors are either URIs (with the context of the selector implying whether it is an instance, class or property) or datatyped literals, with the datatype indicating whether it is an FSL expression or a SPARQL query. This makes it impossible to model further aspects such as our includeInferred property (unless they are made part of the expression or query) or to reuse selectors, e.g. in compound selectors.

To do's

  • Create a nice OWL ontology, similar to how the Fresnel Facets ontology is presented. Include an owl:versionInfo in it, as an alternative to the "/v1" part in the ontology URI, so that we don't have to change the URIs of existing properties and classes when we publish a new version of the ontology.
  • Should we model more about DynamicFacets or is the declaration of their existence, name and place in the list sufficient and is the rest a matter of being able to "inject" a runtime evaluation engine in the system that can populate this facet?
  • Should the atomic selectors (delivering (entity,value) pairs and nothing else) be typed as such? This would for example make it possible to better describe the HierarchicalSelector, as it needs such selectors for its membership and parent selectors. It does not make sense to use another HierarchicalSelector for those tasks.
  • See how multi-type faceted browsing such as done in /facet can be modeled. Multi-type faceted browsing is basically a form of subqueries translated to faceted navigation concepts. Does it have something to do with selectors at all or is it rather a link between a facet and a facet model?
  • Specify how sorting should be indicated, e.g. alphabetical, numerical, ascending vs. descending. A back door that allows specifying a Groovy script as sorter (comparable to a Java Comparator) could be handy for those facet types that cannot be sorted in any of the usual ways.
  • Related to this is how we determine the order of category selectors. Do we specify the order of the categories in the facet model or do we want to specify a sorting strategy instead? The latter makes sense for a multilingual dataset with the end user UI showing only the labels with a certain language code. Then you cannot define the order beforehand, unless you do it for every language separately. A declarative sorting strategy seems a much better idea.
  • How do we include domain knowledge such as the statically ordered MIME types? One way is to see this as an implementation detail: the evaluation engine should transparently merge the disclosed repository's statements with the domain knowledge and the selectors should not have to worry about it at all. An alternative would be to make those different datasets explicit in the facet model as well. This makes me think of SPARQL datasets.

Population of a Facet Model

Some thoughts on how such facet models can be populated.

Generally, there are two ways of evaluating this facet model: per-entity or per-facet value. In both cases you start with determining those resources that will be displayed as entities in the faceted navigation UI by evaluating the entity selectors. Then, you can either:

  • Evaluate the facet queries for each entity separately. In case of QuerySelectors this would for example be done by binding the entity variable to the entity URI.
  • Evaluate the query on the entire data set and filter out those (entity,value) pairs for which the found entity resource does not occur in the set of entities determined earlier.
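
The two strategies can be compared with a small Python sketch. The query functions are stand-ins for real query evaluation and are passed in as parameters; their names and signatures are assumptions made for this example.

```python
def evaluate_per_entity(entities, query_for_entity):
    """Run the facet query once per entity, with the entity variable bound.

    query_for_entity(entity) is assumed to return the values for one entity.
    """
    pairs = set()
    for entity in entities:
        for value in query_for_entity(entity):
            pairs.add((entity, value))
    return pairs


def evaluate_on_data_set(entities, query_all):
    """Run the facet query once on the whole data set, then filter.

    query_all() is assumed to return all (entity, value) pairs; pairs whose
    first element is not a known entity are dropped.
    """
    return {(e, v) for e, v in query_all() if e in entities}
```

Both functions should yield the same pairs for the same data; they differ in the number of queries issued and in the amount of filtering done afterwards.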

For batch processing, per-facet value makes most sense, assuming that:

  • the number of entities is much larger than the number of facet values.
  • properties used for determining facet values primarily match with resources that are also classified as entities.

Of course, you can extend these queries with the definition of what entities are, preventing non-entity resources from being returned, but that may become impractical and expensive when the model contains many and/or complicated entity selectors.

Evaluation on a per-entity basis is necessary for incremental systems that process new and changed entities. Such systems also pose another requirement on the facet model, one that we still have to formally define properly. In any case, the arrival of new metadata must not cause existing entities to be classified differently. Ideally, each entity is classified based on data that is not used to classify other entities, or the "shared" data should at least be static, or the entity selectors should be monotonic (compare with knowledge-based systems research), etc.

Note that such data and selector characteristics also play an important role when you want to be able to parallelize the facet model population in one way or another, e.g. in a multi-threaded (read: multi-core) or clustered environment. This fits well with some of our future plans. To be continued...

A Facet Model Ontology: Proposal 2

There is an alternative approach that I haven't worked out in detail but that can be explained in simple terms, given the outline of the previous proposal. This proposal replaces the Selector part of the previous proposal with SELECT and CONSTRUCT queries.

SELECT queries can be used as entity selectors, provided that they return instantiations for a single variable, i.e. a result table containing a single column.

CONSTRUCT queries can be used to populate facets. Here, a vocabulary is introduced that can express the contents of a facet, i.e. which values occur, how they are organized and which entities they relate to. The queries look for information relevant for a particular facet and output the relevant (entity,value) statements. Other CONSTRUCT queries can arrange the facet values in a hierarchy or other structure.

Selectors such as the GroovySelector can also be translated to this proposal: the Groovy scripts simply have to generate statements rather than a two-column table.

The expressivity of both proposals seems comparable at first sight. One aspect may be whether all FSL expressions can be translated to SPARQL queries.

A benefit of this approach is that you can chain CONSTRUCT queries. Queries do not necessarily have to directly populate a facet, you can also have queries that calculate intermediate results that are input for other queries.
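
The chaining idea can be mimicked in Python by treating each CONSTRUCT query as a function from a set of statements to a set of new statements. This is a sketch only: the predicates tmp:sizeClass and view:sizeFacetValue and the 10240 threshold are hypothetical and chosen for illustration, since the vocabulary for this proposal has not been worked out.

```python
def construct_size_class(triples):
    """First step: derive an intermediate size class per ont:size statement."""
    return {
        (s, "tmp:sizeClass", "small" if o <= 10240 else "large")
        for s, p, o in triples
        if p == "ont:size"
    }


def construct_facet_values(triples):
    """Second step: turn intermediate statements into facet statements."""
    return {
        (s, "view:sizeFacetValue", o)
        for s, p, o in triples
        if p == "tmp:sizeClass"
    }


def chain(triples, *steps):
    """Apply the steps in order; each step sees the output of earlier steps."""
    graph = set(triples)
    for step in steps:
        graph |= step(graph)
    return graph
```

The second step never looks at ont:size directly, so the intermediate result could be reused by other steps as well.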

Note that this way a populated facet model is nothing more than a view on a repository. The batch-mode facet engine described above can simply read out this model and translate it into an internal data structure optimized for faceted navigation.

As more and more view-related RDF tools and specifications are developed, this may be a beneficial approach as it decouples facet population from facet navigation-related calculations, allowing one to choose the ones that are best-of-breed or most suitable for a particular dataset and facet model.

Also, problems that the previous proposal has with types of non-monotonic reasoning (adding statements supposedly describing new entities that also alter facet values of existing entities) are among the types of problems solved by efficient view implementations.

A disadvantage may be the learning curve: the previous proposal has some shortcuts for simple cases, so that writing queries is only necessary for handling more complicated RDF graphs. Perhaps these shortcuts can still be offered in a modified way in this proposal as well.

The AutoFocus Connection

As said earlier, a facet model evaluation engine capable of incremental updates could be used in the desktop version of AutoFocus as well. Incremental means that it can be invoked during the scan process and that its effort will be proportional to the number of changes in the data set. This has various performance benefits as facet information will be readily available rather than determined on demand.

There are some issues to solve that have to do with the connection between AutoFocus and AutoFocus Server. AutoFocus accesses AutoFocus Server as a regular Sesame server holding remote repositories. Next to that it can have a number of local repositories. During querying it searches across all these repositories transparently. When a facet population engine is applied on the local repositories, how should remote repositories then be handled?

It is out of the question to let the client run its facet engine on these repositories as well, as it has to be done by every user individually: duplication of effort, expensive, consuming lots of bandwidth.

A materialized view with the populated facet model on the server may seem like a step in the right direction, but this does not solve everything. The client still needs to have the full materialized view mapped to the internal data structure of the facet engine in order to be able to quickly calculate intersections and facet value counts.

The only alternative seems to be to use a dedicated protocol between these two apps for delivering facet information. How this could work:

  • AutoFocus sends a query to AutoFocus Server, in terms of a set of selected facet values.
  • The repository on the server (or rather its facet model counterpart) responds with a set of entity URIs, filtered facet values and counts for this query.
  • These counts can be accumulated with the counts that AutoFocus has determined locally on its own information, assuming that entities from the server do not occur locally as well. I.e. each repository returns a partial result set that can be merged over all repositories.
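
The merge step of such a protocol could look like the following Python sketch. The shape of a partial result (a dict with "entities" and "counts" keys) and the function name are assumptions made for this example; no wire format has been defined. As stated above, the sketch relies on entities not occurring in more than one repository.

```python
from collections import Counter


def merge_results(partial_results):
    """Merge per-repository results into one entity set and one count table.

    Each partial result is assumed to be a dict with:
      "entities": a set of entity URIs matching the query
      "counts":   a dict mapping (facet, value) to the number of entities
    Counts are only correct if no entity occurs in more than one repository.
    """
    entities = set()
    counts = Counter()
    for result in partial_results:
        entities |= result["entities"]
        counts.update(result["counts"])
    return entities, counts
```

The client would call this with its locally computed result plus one result per server repository.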

This is really a way of doing federated faceted navigation!

Another idea that we had is to let AutoFocus operate on arbitrary SPARQL endpoints. We are almost there since the upgrade to Sesame 2, only existing SeRQL queries still have to be translated to SPARQL. This conflicts with the idea described above though: SPARQL endpoints are not able to deliver this facet information.

Perhaps we should therefore make a distinction between AutoFocus-AutoFocus Server communication and AutoFocus-SPARQL endpoint communication. The former is in terms of faceted navigation constructs and is optimized for highly responsive faceted navigation. The latter is slower as it sends metadata queries whose results still have to be processed by the facet model engine, but allows for integration with a much broader base of information repositories.
