caGRID
Cancer Biomedical Informatics Grid
Organizations in the health care and research communities have a distinct need to share data when collaborating, but there are few standards that support interoperability between software systems in this domain. caGrid is designed to solve the problem of sharing data and analytical resources in an environment where resources are hosted by multiple organizations and located in multiple administrative and security domains. caGrid provides tools and APIs for software developers to build secure, interoperable services and applications. Furthermore, caGrid provides a fully integrated suite of software components to manage and verify user identities, data access privileges, and computer trust relationships.

The Problem

Whether the data concerns patient treatments and outcomes or genome-wide association studies, organizations in the health care and research communities have a distinct need to share data when collaborating. However, there are few standards that support interoperability between software systems in this domain. For example, a research project may require integrative analysis of microarray, imaging, and clinical data. These datasets may be collected by different entities (such as shared resources and medical information warehouses) and may not be stored in a centralized system.

The Solution

caGrid is designed to solve the problem of sharing data and analytical resources in an environment where resources are hosted by multiple organizations and located in multiple administrative and security domains. In addition, caGrid works just as well within a single institution, providing the tools needed to share data seamlessly across departments. Authentication and authorization controls can be used to limit access to the datasets. A key benefit of caGrid is that it makes it easy to evolve from sharing data within an institution to sharing data with external collaborators. In most cases, no new software needs to be deployed: resources can be shared both within an institution and with external collaborators simply by changing the security access restrictions.

Furthermore, caGrid is designed to support the sharing of well-defined data. This is achieved in part through controlled vocabularies that label data items, detailed information models that describe how data items relate to each other, and concepts: definitions that indicate the precise meaning of items in the models. These formal descriptions of the data greatly increase the ability to reuse it across studies and for various purposes.

Benefits

The caGrid infrastructure is designed to facilitate interoperability and federation of information and analytical resources, potentially developed by independent groups, in a multi-institutional environment. caGrid provides tools and APIs for software developers to build secure, interoperable services and applications. Sharing data and analytical routines with collaborators provides researchers with the capability to benefit from the combined expertise, knowledge, and resources of multiple organizations. caGrid combines Service Oriented Architecture (SOA), Grid computing, and the Model Driven Architecture (MDA) (http://www.omg.org/mda/) in an integrated framework.

A major aspect of caGrid is the security infrastructure. caGrid provides a fully integrated suite of software components to manage and verify user identities, data access privileges, and computer trust relationships (mutual proof and acceptance of identity between computers). caGrid allows data owners to implement any needed access control policies for data such as proprietary, patient, or experimental data. All of this is done in a way that is shared across all caGrid-compatible tools. Each collaborating institution is able to define and enforce its own security policies, sharing as much or as little data as it decides with exactly whom it chooses.

Technology

caGrid is based upon the Globus Toolkit, which is based on the Axis web services framework. Other technologies leveraged include Tomcat, JBoss, Hibernate, Ant, Ivy, Postgres, Oracle, and MySQL.

Select Adopters

Approximately 50 NCI-affiliated cancer centers are participating in the caBIG program, making them primary adopters of caGrid technologies.

SemanticBits has been instrumental in designing and developing major components of the caGrid infrastructure, including the security components and the caGrid Portal. SemanticBits personnel have also contributed to the architecture and development of other key components of caGrid, including data services, the caGrid Query Language (CQL), federated querying with the Distributed caGrid Query Language (DCQL), and caGrid service metadata. In addition, SemanticBits has leveraged caGrid technologies in many of our projects, including caAERS, C3PR, and PSC. We have the necessary depth of knowledge and experience to implement and deploy a wide variety of caGrid-based solutions.

Key Milestones

caGrid 1.1 was released in early 2007, caGrid 1.2 was released in 2008, and caGrid 1.3 was released in Spring 2009.
