
RDBMS to RDF Mapping Workshop and Benchmarks

Blogger : Virtual Database Technology Blog
Category : Other ERP
Blogged date : 2007 Nov 21


I was recently in Boston for the Mapping Relational Data to RDF workshop of the W3C.

The common feeling was that mapping everything to RDF and querying it in terms of a generic domain ontology, mapped on demand into whatever line of business systems there are, would be very good if only it could be done. However, since this is not so easily done, the next best thing is to extract the data and then warehouse it as RDF.

The obstacles perceived were of the following types:

- Lack of quality in the data. The different line of business systems do not in and of themselves hold enough semantics. If the meaning of data columns in relational tables were really known and explicit, these columns could be meaningfully used for joining across systems. But this is more complex than just mapping the metal lead to the chemical symbol Pb and back.

- Lack of performance in RDF storage. Data sets even in the tens of millions of triples do not run very well in some stores. Well, we had the Banff life sciences demo with 450M triples in a small server box, so this is not universal, plus of course we are coming up with a whole different order of magnitude, as often discussed on this blog.

- Lack of functionality in mapping and possibly lack of pushing through enough of the query processing to the underlying data stores.

Personally, I am quite aware of what to do with regard to performance of mapping or storage and see these as eminently solvable issues. After all, we have the great investment of talent in databases in general and it can be well deployed towards RDF, as we have been doing these past couple of years. So we talk about the promise of a 360 degree view of information with RDF being the top layer. Everybody agrees that this is a nice concept. But this is a nice concept especially when it can do the things that are the most common baseline expectation of any regular DBMS, i.e., aggregation, grouping, subqueries, views. Now, I would not go sell a DBMS that has no count operator to a data warehousing shop.

The fact that OpenLink and Oracle allow RDF inside SQL and OpenLink even adds native aggregates and grouping to SPARQL fixes the problem with regard to specific products but leaves the standardization issue open. Of course, any vendor will solve these questions one way or another because a database with no aggregation is a non-starter.
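To make the baseline expectation concrete, here is a minimal sketch of counting and grouping over triples once they sit in a SQL table. It uses Python's built-in sqlite3 module purely as a stand-in. The three-column table layout, the function name and the sample data are assumptions made for illustration, and this is not how OpenLink or Oracle expose RDF inside SQL.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples.
SAMPLE = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
]


def count_by_predicate(triples):
    """Count the stored triples per predicate with a plain SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed layout: one row per triple, all values kept as text.
        conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
        return conn.execute(
            "SELECT p, COUNT(*) FROM triples GROUP BY p ORDER BY p"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for predicate, n in count_by_predicate(SAMPLE):
        print(predicate, n)
```

The count here is over the triples the store actually holds, which is all a data warehousing user would ask of it.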

I talked to Lee Feigenbaum, chair of the W3C DAWG, about the question of aggregates and general BI capabilities in SPARQL. He told me that, prior to his time with the DAWG, these were left out because they conflicted with the open world assumption around RDF: You cannot count a set because you by definition do not know that you have all the members, the world being open and all that.

Say what? Talk about the road to hell being paved with good intentions. Now, this is in no way Lee's or the present day DAWG's fault. As a member myself, I can attest to the good work and would under no circumstances wish any delays or revisions to SPARQL at this point. I am just pointing out a matter that all implementations should address, as a sort of precondition of entry into the real world IS space. If this can be done interoperably, so much the better.

Now, out of the deliberations at the Boston workshop arose at least two ideas for follow-up activity:

The first was an incubator group for RDF store and mapping benchmarking. This is very appropriate in order to dispel the bad name RDF storage and querying performance has been saddled with. As a first step in this direction, I will outline a social web oriented benchmark on this blog.

The second activity was an incubator group for preparing standardization of mapping methodologies from relational schemas to RDF. We will be active on this as well.

The two offshoots appear logically separate but are not necessarily so in practice. A benchmark is after all something that is supposed to promote a technology to a user base. The user base seems to wish to put all online systems and data warehouses under a common top level RDF model and then query away, introducing no further replication of data or performance cost or ETL latencies.

Updating would also be nice but even query only would be very good. Personally, I'd say the RDF strength is all on the query side. Transactions are taken care of well enough by what there already is; RDF stands out in integration and the ad hoc and discovery side of the matter. Given this, we expect the value to be consumed in a heterogeneous, multi-database, federated environment. Thus a benchmark should measure this aspect of the use case. With the right mapping and queries, we could probably demonstrate the added cost of RDF to be very low, as long as we could push all queries that can be answered by a single source to the responsible DBMS. For distributed joins, we are back at the question of optimizing distributed queries but this is a familiar one and RDF is not the principal cost factor.
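The push-down rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the table names, the source names and the plan_query function are hypothetical, and a real mapping layer would decide this from the query and the mapping definitions rather than from a simple lookup table.

```python
# Hypothetical mapping from relational tables to the DBMS that owns them.
MAPPING = {
    "orders": "oltp_db",
    "customers": "oltp_db",
    "sales_facts": "warehouse_db",
}


def plan_query(tables_used, table_to_source):
    """Decide whether a query can be handed whole to a single DBMS.

    Returns ("pushdown", source) when every table lives in one source,
    otherwise ("federated", sorted list of the sources involved).
    """
    if not tables_used:
        raise ValueError("query references no mapped tables")
    sources = {table_to_source[t] for t in tables_used}
    if len(sources) == 1:
        # Single source: the responsible DBMS can answer the whole query.
        return ("pushdown", next(iter(sources)))
    # Several sources: a distributed join has to be planned.
    return ("federated", sorted(sources))


if __name__ == "__main__":
    print(plan_query(["orders", "customers"], MAPPING))
    print(plan_query(["orders", "sales_facts"], MAPPING))
```

Only the second case pays the cost of a distributed join, and that cost comes from the distribution rather than from RDF.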

The subject does become quite complex at this point. We would have to take supposedly representative synthetic OLTP and BI data sets, like the TPC D, E, H ones, and invent queries across them that would both make sense and be implementable in SPARQL extended with aggregates and subqueries. Reliance on SPARQL extensions is simply unavoidable. Setting up the test systems would be non-trivial, even though there is a lot of industry experience in these matters on the database side.

So, while this is probably the benchmark most relevant to the target audience, we may have to start with a simpler one. I will next outline something to that effect.

