Release API versions
Store providers could not keep up with the (semi-)monthly releases of Sesame 2.x. Sesame 3.x should release the repository API (and its dependencies) in its own package with a slow release cycle. This will give store providers a longer release cycle and a clear indication of which versions of Sesame they are compatible with.
Simplify The Configuration Schemas
With the number of sails and repositories increasing, the number of namespaces used in a configuration is almost as large as the number of statements. The openrdf.org sails and repositories should use the same namespace. Furthermore, the configurations should try to reduce the number of properties and blank nodes to make it easier for new users to understand the significance of each property.
Add URIFactory#createRelative(URI, String)
Many formats include relative URIs and Sesame should provide a consistent way to create model.URI out of them.
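As a rough illustration of the behaviour such a method could have, the sketch below resolves relative references with java.net.URI, which stands in for model.URI here. The class name and the String-based signature are assumptions made for the example, not part of any existing Sesame API.

```java
import java.net.URI;

// Hypothetical helper: the name createRelative mirrors the proposal above,
// but this is not an existing Sesame method. java.net.URI stands in for model.URI.
final class RelativeUriSketch {

    /** Resolves a relative reference against a base URI by delegating to java.net.URI#resolve. */
    static String createRelative(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/data/doc.rdf";
        System.out.println(createRelative(base, "#me"));      // http://example.org/data/doc.rdf#me
        System.out.println(createRelative(base, "item"));     // http://example.org/data/item
        System.out.println(createRelative(base, "../other")); // http://example.org/other
    }
}
```

Having one shared entry point like this would let every parser resolve fragment-only, sibling, and parent-relative references the same way.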
Add External Charset Support to All Parsers/Writers That Support Charset
The writer/parser factories (boolean and tuple results) should either allow Charset to be passed as a parameter with the stream or allow using a Reader/Writer.
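The two options could look roughly like the following sketch, in which a stream-based method takes the Charset alongside the stream and simply delegates to a Reader-based method. The class and method names are hypothetical, and the "parser" only returns the decoded text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical parser facade; a real result parser would build bindings
// instead of returning the decoded text.
final class CharsetParserSketch {

    /** Reader-based entry point: the caller has already chosen the decoding. */
    static String parse(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        return text.toString();
    }

    /** Stream-based entry point: the charset travels with the stream. */
    static String parse(InputStream in, Charset charset) throws IOException {
        return parse(new InputStreamReader(in, charset));
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        String decoded = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(decoded.length()); // 4
    }
}
```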
Remove BNode Interface
With BNode's equals/hashCode implementation specified in Sesame 3.0 and the stringValue() method available, there is no need to have a BNode interface. Instead, a blank node can be defined as a Resource that is not a URI. Furthermore, this would allow a wrapping sail/repository to return a URI from the createBNode method, giving users who used BNodes like URIs in Sesame 2.x an upgrade path.
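A minimal sketch of the "Resource that is not a URI" rule follows. The three interfaces are cut-down stand-ins declared only for this example (they are not the real org.openrdf.model types), and isBlankNode is a hypothetical helper.

```java
// Cut-down stand-ins for the model interfaces, declared here only for illustration.
final class ResourceSketch {

    interface Value {
        String stringValue();
    }

    interface Resource extends Value {
    }

    interface URI extends Resource {
    }

    /** Without a BNode interface, a blank node is any Resource that is not a URI. */
    static boolean isBlankNode(Resource resource) {
        return !(resource instanceof URI);
    }

    public static void main(String[] args) {
        Resource blank = () -> "node1";
        URI named = () -> "http://example.org/a";
        System.out.println(isBlankNode(blank)); // true
        System.out.println(isBlankNode(named)); // false
    }
}
```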
Add Repository#shutdownNow()
The Repository interface and RepositoryManager should have a shutdown method that can be used within a JVM shutdown hook: a way to shut down the repository(ies) quickly, without timing out while waiting for connections to close.
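The intended usage might look like the sketch below. The Repository interface here is a one-method stand-in for the proposed API, ToyRepository is a placeholder that only records the call, and Runtime#addShutdownHook is the standard JVM mechanism.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownHookSketch {

    /** Hypothetical slice of the proposed API; shutdownNow() is the method being requested. */
    interface Repository {
        void shutdownNow();
    }

    /** Toy repository that only records that it was shut down. */
    static final class ToyRepository implements Repository {
        final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void shutdownNow() {
            // A real store would abort open connections here instead of waiting for them.
            closed.set(true);
        }
    }

    /** Builds the hook thread; kept separate so it can be exercised without exiting the JVM. */
    static Thread shutdownHookFor(Repository repository) {
        return new Thread(repository::shutdownNow, "repository-shutdown");
    }

    public static void main(String[] args) {
        ToyRepository repository = new ToyRepository();
        Runtime.getRuntime().addShutdownHook(shutdownHookFor(repository));
        System.out.println("hook registered; the repository is closed when the JVM exits");
    }
}
```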
Propagate Etags Between Client and Server
The HTTPRepository/Client should expose the Etag sent from the remote server, so a second server can forward the correct Etag. The server should not assume it is an authoritative source for tracking the repository version, if the repository is remote. The server should relay If-Modified-Since and If-None-Match conditions to the remote server if the repository is remote.
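The relaying behaviour can be sketched with plain objects instead of real HTTP. Everything below (Response, RemoteServer, relay) is a hypothetical model written for this example. The point is only that the intermediate server forwards the client's If-None-Match value and passes the remote status and Etag back unchanged rather than computing its own.

```java
final class EtagRelaySketch {

    /** Minimal response model: a status code plus the Etag header value. */
    static final class Response {
        final int status;
        final String etag;

        Response(int status, String etag) {
            this.status = status;
            this.etag = etag;
        }
    }

    /** Hypothetical stand-in for the remote server that actually owns the repository. */
    static final class RemoteServer {
        private String currentEtag = "\"v1\"";

        Response get(String ifNoneMatch) {
            boolean unchanged = currentEtag.equals(ifNoneMatch);
            return new Response(unchanged ? 304 : 200, currentEtag);
        }

        void update(String newEtag) {
            currentEtag = newEtag;
        }
    }

    /** Intermediate server: forwards the condition and relays the remote answer as-is. */
    static Response relay(RemoteServer remote, String clientIfNoneMatch) {
        return remote.get(clientIfNoneMatch);
    }

    public static void main(String[] args) {
        RemoteServer remote = new RemoteServer();
        Response first = relay(remote, null);
        System.out.println(first.status);                     // 200
        System.out.println(relay(remote, first.etag).status); // 304
        remote.update("\"v2\"");
        System.out.println(relay(remote, first.etag).status); // 200
    }
}
```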
Assume Query Parameter Variables Exclusively Use '$' Prefix in SPARQL
If the variables that will have parameters were known when the query is prepared, the query might only need to be optimized once. By assuming that all parameter variables exclusively use the '$' prefix and not the '?' prefix, we could provide a more effective optimization for prepared queries, while still providing a reduced optimized plan for queries that use '$' without a parameter value.
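To show how the parameter variables could be identified at prepare time under this convention, here is a deliberately naive sketch that scans the query text for '$'-prefixed names. The class name is hypothetical, and the regular expression does not skip string literals or IRIs the way a real SPARQL parser would.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive illustration only: a real implementation would inspect the parsed
// query rather than the raw text.
final class ParameterVariableSketch {

    private static final Pattern DOLLAR_VARIABLE =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Returns the names of all '$'-prefixed variables, in order of first appearance. */
    static Set<String> parameterNames(String query) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = DOLLAR_VARIABLE.matcher(query);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String query = "SELECT ?name WHERE { ?person foaf:name ?name ; foaf:mbox $mbox }";
        System.out.println(parameterNames(query)); // [mbox]
    }
}
```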
Multiple Query Parameter Values
Most RDF facet browsing interfaces allow multiple value selection. By allowing multiple values for the same parameter name in a query, the query could be reused and could take advantage of server side optimizations, such as prepared queries.
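One way to picture multi-valued parameters is a binding map from parameter name to a list of accepted values, as in the sketch below. The rows stand in for query solutions; the class and method names are assumptions made for this example and do not correspond to an existing Sesame API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiValueParameterSketch {

    /** Keeps the rows whose value for every parameter is one of the supplied values. */
    static List<Map<String, String>> select(List<Map<String, String>> rows,
                                            Map<String, List<String>> parameters) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = true;
            for (Map.Entry<String, List<String>> parameter : parameters.entrySet()) {
                if (!parameter.getValue().contains(row.get(parameter.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                Collections.singletonMap("color", "red"),
                Collections.singletonMap("color", "green"),
                Collections.singletonMap("color", "blue"));
        Map<String, List<String>> parameters = new HashMap<>();
        parameters.put("color", Arrays.asList("red", "blue")); // two values, one parameter
        System.out.println(select(rows, parameters)); // [{color=red}, {color=blue}]
    }
}
```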
Use RDFXMLPrettyWriter as default RDFXML writer.
Sesame's default RDFXML writer should support xml:base and nested BNodes, as the pretty-writer does now.
Drop blank node context support
While this wasn't quite clear when context support was added to Sesame, there now seems to be a common consensus regarding contexts, or "named graphs" as they're commonly called now: these are identified by URIs but not by blank nodes. Sesame 3.0 seems to be a good moment to make this switch in Sesame. Supporting blank nodes as context identifiers will be impossible in a lot of the RDF store implementations that can/could be used in the Sesame framework.
Should we use the term "graph" instead of "context"?
The SAIL should be oriented around the transaction (not the connection)
This would allow the Sail to conditionally wrap the SailTransaction based on the isolation level and read-only properties, and would reduce the complexity of the SailTransaction because of potentially fewer state changes. It also maximises the amount of behaviour that can be shared between sail implementations in the SailRepository module, reducing code duplication.
BNode mapping would exist as long as the SailValueFactory is open. The Sail could also use templated Resource, URI, Value, Resource[] for additional type checking.
The Sail interface should be:
interface SailValueFactory extends ValueFactory:
interface SailReadOnlyTransaction:
interface SailTransaction extends SailReadOnlyTransaction:
The SailRepository could have methods like:
RepositoryConnection prepare() and isRollbackOnly()
Not every implementation would need to do anything for prepare(), but for any implementation to support a two-phase commit (in Sesame) this method would need to be in the API.
Prepare() would allow a supporting implementation to prepare the current transaction for two-phase commit. After this command, the transaction would be over; its state would be fully persisted, and there is a very high probability that it could be committed successfully.
Once prepared, a transaction can later be committed or rolled back.
From the point of view of the connection, after executing prepare(), there is no active current transaction, and the effects of the prepared transaction are no longer visible. (The effects will become visible again if the transaction is committed.)
If the prepare() command fails for any reason, it becomes a rollback(): the current transaction is cancelled.
The prepare() method gives the SAIL a chance to check for conflicts and ensure consistency in the data and that the connection was properly isolated.
The isRollbackOnly() method allows the SAIL to give the apps an early indication that there is a conflict with another connection and allows the apps to reduce extra processing.
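The life cycle described above can be sketched as a small state machine. ToyTransaction is a hypothetical class written for this example; its conflict flag stands in for whatever consistency and isolation checks a real Sail would perform, and requiring prepare() before commit() is a simplification.

```java
final class TwoPhaseSketch {

    enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    /** Toy transaction illustrating prepare(), commit(), rollback() and isRollbackOnly(). */
    static final class ToyTransaction {
        private State state = State.ACTIVE;
        private boolean conflict;

        /** Simulates another connection causing a conflict. */
        void markConflict() {
            conflict = true;
        }

        /** Early hint to the application that this transaction cannot succeed. */
        boolean isRollbackOnly() {
            return conflict;
        }

        /** First phase: check for conflicts; a failed prepare becomes a rollback. */
        void prepare() {
            requireState(State.ACTIVE);
            if (conflict) {
                state = State.ROLLED_BACK;
                throw new IllegalStateException("conflict detected; transaction rolled back");
            }
            state = State.PREPARED;
        }

        /** Second phase: only a prepared transaction is committed in this sketch. */
        void commit() {
            requireState(State.PREPARED);
            state = State.COMMITTED;
        }

        void rollback() {
            if (state == State.COMMITTED) {
                throw new IllegalStateException("already committed");
            }
            state = State.ROLLED_BACK;
        }

        State state() {
            return state;
        }

        private void requireState(State expected) {
            if (state != expected) {
                throw new IllegalStateException("expected " + expected + " but was " + state);
            }
        }
    }

    public static void main(String[] args) {
        ToyTransaction tx = new ToyTransaction();
        tx.prepare();
        tx.commit();
        System.out.println(tx.state()); // COMMITTED
    }
}
```

An application that sees isRollbackOnly() return true can stop doing further work on that transaction and roll back straight away.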
Add includeinferred to the QueryModel
QueryModel should include all the information needed to evaluate the query. The SailRepository could also use BindingAssigner so that SailConnection#evaluate(QueryModel) only needs one argument.
StatementPattern evaluation and cardinality is too complicated.
EvaluationStrategyImpl#evaluate(StatementPattern) should be refactored to move much of the logic into helper classes. (like org.openrdf.sail.optimistic.helpers.ExternalModel.)
It should be easier to override CardinalityCalculator#getCardinality(StatementPattern). This method should be moved to EvaluationStatistics, or the two classes should be merged.
NativeStore should be renamed to DiskStore
To be more descriptive.
RepositoryConnection createContext(URI)
Some RDF stores treat explicitly created contexts (or graphs or models) differently than implicitly created ones. For example, querying an explicitly created context returns an empty set, while querying an implicit context attempts to resolve the context to an external source. Currently this is simulated in Sesame in the repository-dataset module, which treats all empty contexts the way other RDF stores treat implied contexts.
Sesame should add createContext(URI) to distinguish between empty contexts and external datasets. This would allow the user to control the datasets. This is a feature found in many RDF stores and should be supported in the Sesame API if it is to gain the support of other backing stores.
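The distinction could behave roughly as in this sketch, where ToyStore is a hypothetical in-memory stand-in and the external resolver is just a function supplied by the caller. An explicitly created context answers from local data even when empty, while an unknown context is handed to the resolver.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class CreateContextSketch {

    /** Hypothetical store; statements are plain strings to keep the example small. */
    static final class ToyStore {
        private final Map<String, Set<String>> contexts = new HashMap<>();
        private final Function<String, Set<String>> externalResolver;

        ToyStore(Function<String, Set<String>> externalResolver) {
            this.externalResolver = externalResolver;
        }

        /** Explicitly registers a context, which may stay empty. */
        void createContext(String contextUri) {
            contexts.putIfAbsent(contextUri, new LinkedHashSet<>());
        }

        /** Known contexts are answered locally; unknown ones are resolved externally. */
        Set<String> statements(String contextUri) {
            Set<String> local = contexts.get(contextUri);
            return local != null ? local : externalResolver.apply(contextUri);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(uri -> Collections.singleton("fetched from " + uri));
        store.createContext("http://example.org/empty");
        System.out.println(store.statements("http://example.org/empty"));   // []
        System.out.println(store.statements("http://example.org/unknown")); // [fetched from http://example.org/unknown]
    }
}
```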
Prepared queries
The Repository API currently supports "prepared queries" in that one can call prepareQuery() and get a Query object back. However, in a lot of cases all of the non-evaluation-related actions are repeated on each call to Query.evaluate(). The SailRepository at least prevents parsing the query string more than once, but the underlying Sail repeats the optimization and query planning with each call.
To improve this situation, we can add support for prepared queries to the Sail API. There, we can replace the current evaluate() method with a prepareQuery() method, which returns a "query plan" object that can be evaluated multiple times, with varying externally supplied bindings.
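A sketch of the "plan once, evaluate many times" idea follows. QueryPlan and ToySail are hypothetical types invented for this example; the planningRuns counter stands in for the optimization and planning work that should not be repeated, and the "query" is reduced to a single predicate with a subject binding named "s".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class PreparedQuerySketch {

    /** Hypothetical "query plan" object that can be evaluated repeatedly. */
    interface QueryPlan {
        List<String> evaluate(Map<String, String> bindings);
    }

    /** Toy sail over {subject, predicate, object} triples. */
    static final class ToySail {
        final AtomicInteger planningRuns = new AtomicInteger();
        private final List<String[]> triples;

        ToySail(List<String[]> triples) {
            this.triples = triples;
        }

        /** Does the (simulated) expensive planning once and returns a reusable plan. */
        QueryPlan prepareQuery(String predicate) {
            planningRuns.incrementAndGet();
            return bindings -> {
                List<String> objects = new ArrayList<>();
                for (String[] triple : triples) {
                    if (triple[1].equals(predicate) && triple[0].equals(bindings.get("s"))) {
                        objects.add(triple[2]);
                    }
                }
                return objects;
            };
        }
    }

    public static void main(String[] args) {
        ToySail sail = new ToySail(Arrays.asList(
                new String[] {"alice", "knows", "bob"},
                new String[] {"bob", "knows", "carol"}));
        QueryPlan plan = sail.prepareQuery("knows");
        System.out.println(plan.evaluate(Collections.singletonMap("s", "alice"))); // [bob]
        System.out.println(plan.evaluate(Collections.singletonMap("s", "bob")));   // [carol]
        System.out.println(sail.planningRuns.get());                                // 1
    }
}
```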
Adding some of the advanced features of iTQL to SeRQL.
Traversing a Graph (walk Function) and Transitive Closure (trans Function)
For instance, if you want to consider rdfs:subclassOf transitively, in
TQL you'd say:
trans($x <rdfs:subclassOf> $y)
Paul is suggesting we could access this in SPARQL with a magical predicate
like the following:
$x rdfs:subclassOf $y . rdfs:subclassOf rdf:type
mulgara:TransitivePredicate
Note that owl:TransitivePredicate is not being used, as this is
descriptive, and doesn't necessarily imply intrinsic data.
Functions like this can be exposed with
special predicates, and no changes to the query language or API are required.
My own usage tends to be of the form:
trans($s $p $o) AND $p <rdf:type> <owl:TransitivePredicate>
It's also possible to use a fixed predicate, such as rdfs:subclassOf,
but the code I write tends to look at semantics a little differently.
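For readers unfamiliar with what a trans-style function has to compute, the sketch below derives the transitive closure of one predicate from a start node with a plain graph traversal. It is an illustration written for this page, not Mulgara's implementation; the class name, the edge map, and the example class names are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TransitiveClosureSketch {

    /** Returns every node reachable from start by following one or more edges. */
    static Set<String> trans(Map<String, List<String>> edges, String start) {
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                // add() returns false for nodes already seen, which also guards against cycles
                if (reached.add(next)) {
                    pending.push(next);
                }
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        // Each entry reads "key is a direct subclass of each value".
        Map<String, List<String>> subClassOf = new HashMap<>();
        subClassOf.put("Cat", Arrays.asList("Mammal"));
        subClassOf.put("Mammal", Arrays.asList("Animal"));
        System.out.println(trans(subClassOf, "Cat")); // [Mammal, Animal]
    }
}
```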
count Function and having Clause
These functions are being talked about a lot for SPARQL 2. It might be
a good idea to talk to some of the people who will be involved in
SPARQL 2 (principally thinking of Andy Seaborne here) so that
whatever is chosen to go with is likely to conform to the future
standard. At the least, you want it to be similar enough to only
require minor tweaks when the next standard is (eventually) released.
Mulgara's COUNT queries are certainly flexible, but are frustratingly
complex in the simple case. They operate on a kind of subquery, and the
contents of the COUNT have to be a complete SELECT query. It would be
nice to simply count on a variable sometimes. For this reason, the
other aggregate types haven't really come up in Mulgara.
IBM did some work in SQL, merging HAVINGs back into WHERE clauses.
Especially with Mulgara's query transformations, I'm wondering if some
of the HAVING operators could be brought back in to the WHERE as well
(note the earlier conversation on magical predicates). That has the
additional advantage of avoiding language extensions. OTOH, maybe it's
not possible to do for the general case. I think it's worth looking at
though.
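To make the "count on a variable" case and a HAVING-style filter concrete, here is a small sketch over an in-memory list of bindings for one variable. It only illustrates the semantics being discussed; it is not Mulgara or SPARQL syntax, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class CountHavingSketch {

    /** Counts occurrences per value and keeps only groups with at least minCount members. */
    static Map<String, Long> countHaving(List<String> values, long minCount) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(value -> value, TreeMap::new, Collectors.counting()));
        // The HAVING step: drop groups whose count is below the threshold.
        counts.values().removeIf(count -> count < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(countHaving(authors, 2)); // {a=3, b=2}
    }
}
```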
subquery Function
Mulgara also supports subqueries in the SELECT clause; for example, the
following would return all resources, along with a list of labels for
each resource.
Looks useful, but also a bit non-standard and problematic to serialize
in e.g. the SPARQL results format. I would expect SPARQL to be extended
with subqueries in the WHERE-clause, like in SQL. This is also what
SeRQL supports now. I do see the use case for this functionality though,
but we'll need to investigate how this fits into the relational algebra
that we use.
RDBMS SAIL Improvements
- Port to Oracle's relational database.
- Port to DB2.
- Always put rdf:type in its own table.