caBIG Semantic Liaison

Semantic infrastructure enables the construction of semantically rich information models that describe distributed data sources. It is critical that such infrastructure provide a means to access and navigate information models (including standard XML metadata, APIs, and centralized repositories), and that it enable sophisticated abstraction and reasoning over these models, and over the data that conforms to them, for the purpose of data integration. SemanticBits has extensive expertise in using Semantic Web technologies, such as RDF, RDFS, OWL, and SPARQL, in conjunction with existing semantic infrastructure to support data integration. Our general approach is to define service interfaces, reference implementations, frameworks, and plugins that minimize the effort of providing these services while building on existing investments and maximizing flexibility and interoperability.

SemanticBits has provided analysis, design, and prototyping services around the use of W3C Semantic Web technologies in conjunction with the existing semantic infrastructure to support data integration. A typical domain-oriented use case that we deal with is the following:

A researcher has a set of genes of interest and would like to retrieve information about the pathways each gene is involved in, the SNPs associated with each gene, the splice variants of the genes’ transcribed RNA, and the sequences and sequence variants of each of the proteins that the genes encode. She would like to consider as many data sources as possible and to include new data sources quickly as they become available on the grid.

SemanticBits has the breadth and depth of experience to tackle use cases described in end-user terms (such as the one above), as well as those that are more technical in nature. For example, the following is a typical technical use case that we have encountered:

The researcher's query must combine data from multiple, heterogeneous information models. Multiple queries must be designed to retrieve data from each data source, and the retrieved data must be transformed and linked to provide a unified view. Traditional ETL and data warehousing approaches that are based on relational databases suffer from latency, lack of flexibility, and a loss of the source information model semantics. Integration of each new information model requires the design of new queries and ETL processes.
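
As a rough illustration of the alternative to per-source queries and ETL, the sketch below merges statements from two sources into a single graph of subject-predicate-object triples and answers the researcher's question with one query function. It is a minimal, standard-library-only sketch, not a description of any caBIG or SemanticBits component: the source lists, the identifiers (gene:G1, snp:S1, and so on), and the ex:participatesIn and ex:locatedIn predicates are all hypothetical, and a real system would more likely use an RDF store queried with SPARQL.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples, as two independent data sources might expose them.
PATHWAY_SOURCE: List[Triple] = [
    ("gene:G1", "ex:participatesIn", "pathway:P1"),
    ("gene:G2", "ex:participatesIn", "pathway:P2"),
]
SNP_SOURCE: List[Triple] = [
    ("snp:S1", "ex:locatedIn", "gene:G1"),
    ("snp:S2", "ex:locatedIn", "gene:G1"),
    ("snp:S3", "ex:locatedIn", "gene:G2"),
]


def merge(*sources: Iterable[Triple]) -> Set[Triple]:
    """Union the triples of every source into one graph (a set of triples)."""
    graph: Set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in graph:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def gene_report(graph: Set[Triple], gene: str) -> Dict[str, List[str]]:
    """Collect the pathways and SNPs linked to one gene in the merged graph."""
    return {
        "pathways": sorted(o for _, _, o in match(graph, s=gene, p="ex:participatesIn")),
        "snps": sorted(s for s, _, _ in match(graph, p="ex:locatedIn", o=gene)),
    }


graph = merge(PATHWAY_SOURCE, SNP_SOURCE)
report = gene_report(graph, "gene:G1")
print(report)
```

Because every source contributes to the same graph, adding a new source in this sketch amounts to passing one more list to merge(); the query function changes only if the new source introduces predicates the report should cover.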

Service components that facilitate the use of Semantic Web technologies can be used in conjunction with the existing semantic infrastructure to ease integration of data that conforms to multiple, heterogeneous caBIG information models. To facilitate the use cases above, SemanticBits has the necessary expertise to enable Semantic Web (RDF/OWL) data to co-exist with W3C XML Schema-based message formats and provide an incremental path toward using Semantic Web technology for data integration.
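
One way to picture RDF-style data co-existing with XML messages is a small adapter that leaves the XML message untouched and derives triples from it on demand. The sketch below is only an assumption-laden illustration using Python's standard library: the message, the http://example.org/ namespaces, the id and ref attributes, and the lift() function are all invented here and do not describe an actual caBIG message format or service.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical message; the namespace and element names are invented.
MESSAGE = """\
<g:Gene xmlns:g="http://example.org/gene" id="G1">
  <g:symbol>GENE1</g:symbol>
  <g:pathway ref="P1"/>
</g:Gene>
"""

NS = "http://example.org/gene"         # assumed schema namespace
BASE = "http://example.org/resource/"  # assumed base for resource identifiers


def local_name(tag: str) -> str:
    """ElementTree spells qualified names as '{namespace}local'; keep 'local'."""
    return tag.split("}", 1)[-1]


def lift(xml_text: str) -> List[Triple]:
    """Derive triples from one message without modifying the message itself."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.attrib["id"]
    triples: List[Triple] = [(subject, "rdf:type", NS + "#" + local_name(root.tag))]
    for child in root:
        predicate = NS + "#" + local_name(child.tag)
        if "ref" in child.attrib:
            # A reference to another resource becomes an identifier.
            triples.append((subject, predicate, BASE + child.attrib["ref"]))
        else:
            # Plain element text becomes a literal value.
            triples.append((subject, predicate, (child.text or "").strip()))
    return triples


triples = lift(MESSAGE)
for triple in triples:
    print(triple)
```

The sketch represents both identifiers and literal values as plain strings, which a real RDF model would keep distinct. The point is only that existing XML producers and consumers keep working while the derived triples become available for integration, which is what makes the path incremental.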