
Case study - A commons cloud credits business model to support and facilitate sharing and reuse of digital objects

The Procurer

We are a biomedical research agency based in the United States. Our mission is to seek fundamental knowledge about the nature and behaviour of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability.

The goals of the agency are:

» to foster fundamental creative discoveries, innovative research strategies, and their applications as a basis for ultimately protecting and improving health;

» to develop, maintain, and renew scientific human and physical resources that will ensure the Nation’s capability to prevent disease;

» to expand the knowledge base in medical and associated sciences in order to enhance the Nation’s economic wellbeing and ensure a continued high return on the public investment in research;

» to exemplify and promote the highest level of scientific integrity, public accountability, and social responsibility in the conduct of science.

 

Why the cloud?

To speed up scientific discovery, we envision a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. find, manage, share, use and reuse data, software, metadata and workflows. We call such a system a “Commons”. A Commons is a digital ecosystem that supports open science and leverages currently available computing platforms in a flexible and scalable manner, allowing researchers to transparently find and use the computing services and tools they need, access large public data sets, and connect with other resources associated with scholarly research (e.g. GitHub, Zenodo, ORCID, Figshare, journal publishers). Such a system must be adaptable to the different and evolving needs of research communities as well as to evolving technology.

Components of the Commons ecosystem include:

» A computing environment, such as the cloud or HPC (High Performance Computing) resources, which support access, utilization and storage of digital objects.

» Public data sets that adhere to Commons Digital Object Compliance principles.

» Software services and tools.

» Scalable provisioning of compute resources.

» Interoperability between digital objects within the Commons.

» Indexing and thus discoverability of digital objects.

» Sharing of digital objects between individuals or groups.

» Access to and deployment of scientific analysis tools and pipeline workflows.

» Connectivity with other repositories, registries and resources that support scholarly research.

» A set of Digital Object Compliance principles that describe the properties of digital objects that enable them to be findable, accessible, interoperable and reusable (FAIR).

Clouds are increasingly being used as a computing platform by biomedical researchers because they afford a high degree of scalability and flexibility in both the cost and the configuration of compute services. Making public data, especially large, commonly used data sets, easily accessible in the cloud will reduce the burden and cost of individual investigators independently moving these data sets to the cloud, make it possible to compute directly against the data sets, and permit novel uses across data sets. Adherence to a digital object compliance model will be essential to make these data sets indexable and easily discoverable. Being able to easily find, deploy, link and use computing services and analytical tools and workflows will promote rapid and flexible scientific discovery in the Commons and will make it easier for those with more limited computational skills to use the environment.

 

How we procured cloud services

At its foundation the Commons framework requires a computing platform that, in its initial iteration, will be implemented using a federation of public and private computing clouds and other capable compute platforms, e.g. university and national laboratory high performance computing (HPC) resources. As only a limited number of investigators today have access to such resources, it will be necessary to facilitate access to them in order to fully evaluate their use. That’s why we started testing a Cloud Credits Pilot, which is a business model to support the use of cloud computing for the Commons.

The idea behind the Commons cloud credits business model is to provide unified access to a choice of “Commons-conformant” compute resources. This cloud credits model will offer individual investigators a choice of cloud providers so that the investigators themselves can select the best value for their individual research needs. The cloud credits business model is shown in the figure below. In this model, the participating researchers obtain ‘Commons credits’, dollar-denominated vouchers that can be used with the cloud provider of the investigator’s choice. The involvement of multiple cloud providers will empower investigators by creating a competitive marketplace where researchers are incentivized to use their credits efficiently and cloud providers are incentivized to provide better services at the lowest possible price. In order to participate in the Commons, a cloud provider must make its computing environment ‘conformant’, ensuring that it meets a set of standards for capacity (storage, compute, and network) and capabilities that enable scientists to work in such an environment.
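The flow of credits described above can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions, not the agency's actual system: the class names, the `price_per_unit` field and the `best_value` helper are all hypothetical, and real provider pricing is far more complex than a single unit price.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A cloud provider; only conformant providers may accept credits."""
    name: str
    conformant: bool
    price_per_unit: float  # hypothetical: dollars per unit of compute


@dataclass
class CreditAccount:
    """Dollar-denominated Commons credits held by one investigator."""
    investigator: str
    balance: float
    spend_by_provider: dict = field(default_factory=dict)

    def redeem(self, provider: Provider, units: float) -> float:
        """Spend credits with the provider of the investigator's choice."""
        if not provider.conformant:
            raise ValueError(f"{provider.name} is not Commons-conformant")
        cost = units * provider.price_per_unit
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        # Track spend per provider so usage can be reported later.
        self.spend_by_provider[provider.name] = (
            self.spend_by_provider.get(provider.name, 0.0) + cost
        )
        return cost


def best_value(providers):
    """Return the cheapest conformant provider (a stand-in for 'best value')."""
    conformant = [p for p in providers if p.conformant]
    return min(conformant, key=lambda p: p.price_per_unit)


providers = [
    Provider("cloud-a", conformant=True, price_per_unit=0.5),
    Provider("cloud-b", conformant=True, price_per_unit=0.25),
    Provider("cloud-c", conformant=False, price_per_unit=0.125),
]
account = CreditAccount("investigator-1", balance=100.0)
choice = best_value(providers)
cost = account.redeem(choice, units=40)
print(choice.name, cost, account.balance)
```

In this sketch the non-conformant provider is excluded even though it is the cheapest, which mirrors the requirement that credits can only be spent in conformant environments.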

Next steps:

The research agency is currently three months into a three-year pilot to test the efficacy of this business model in enhancing data sharing and reducing costs. In this pilot the research agency will not directly distribute credits; rather, it will contract with a third party to manage the requests for and distribution of credits (shown as the ‘Reseller’ in Figure 4).

 

What we learned

Advantages of this model:

» Supports simplified data sharing by driving science into publicly accessible computing environments that still provide for investigator-level access control

» It is scalable for the needs of the scientific community for the next 5 years

» Democratizes access to data and computational tools

» It is cost effective: it creates a competitive marketplace for biomedical computing services, reduces redundancy, and uses resources efficiently

Potential disadvantages:

» Novelty: This model has never been tried, so we don’t have data about the likelihood of success

» Cost models: Predicated on stable or declining prices among providers. This has been true for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry.

» Service providers: Predicated on service providers being willing to make the investment to become conformant. The market research suggests 3-5 providers within 2-3 months of program launch.

» Persistence: The most significant disadvantage of this model is that it is pay-as-you-go; that is, digital objects may no longer remain in the Commons if the research agency does not continue to pay for their maintenance. In addition, investigators have an unprecedented level of control over what lives (or dies) in the Commons.

Piloting

» The use of a relatively small number of providers, coupled with a single reseller distributing credits, gives the research agency an opportunity to assess the usage of the digital objects that are being supported and maintained in the Commons.

Having general service provider conformance requirements is fundamental to streamlining the management of the full model:

» In this model six areas for minimum requirements have been identified: Business relationships (coordinating centre, investigators); Interfaces (upload, download, manage, compute); Capacity (storage, compute); Networking and Connectivity; Information Assurance; and Authentication and Authorization.
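As a rough illustration of how the six requirement areas might be checked, the following Python sketch treats conformance as a simple checklist. This is an assumption made for illustration only: the area identifiers, the `missing_areas` and `is_conformant` helpers, and the idea of a boolean self-declaration per area are hypothetical, and the actual requirements within each area are more detailed than a yes/no flag.

```python
# The six minimum-requirement areas named above, as hypothetical identifiers.
REQUIRED_AREAS = (
    "business_relationships",
    "interfaces",
    "capacity",
    "networking_and_connectivity",
    "information_assurance",
    "authentication_and_authorization",
)


def missing_areas(declared: dict) -> list:
    """Return the required areas a provider has not declared as met."""
    return [area for area in REQUIRED_AREAS if not declared.get(area, False)]


def is_conformant(declared: dict) -> bool:
    """A provider is treated as conformant only if every area is met."""
    return not missing_areas(declared)


# A provider that meets every area except information assurance.
example_provider = {area: True for area in REQUIRED_AREAS}
example_provider["information_assurance"] = False

print(is_conformant(example_provider))
print(missing_areas(example_provider))
```

Reporting the missing areas, rather than only a pass/fail result, would let a coordinating centre tell a provider exactly what remains to be addressed.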

Some examples of requirements are listed below:

1. A conformant cloud need not be solely a provider of Infrastructure as a Service (IaaS), although all providers must provide IaaS.

2. Resellers: A reseller of services can act as a conformant provider so long as the provider upon which they operate their service layer is able to meet the conformance requirements.

3. Credit Distribution Model: The provider must accept the financial mechanism by which the Government intends to deliver payment and must provide monthly, pre-defined and mutually agreeable reporting of Commons user metrics for those utilizing its services.

4. General access considerations: In order to be part of the Commons, Providers must make their services available to the broad research community. Thus, a cloud that is inaccessible outside the organization that operates it will not be considered conformant, since it does not make the digital objects contained within that cloud available to the broad research community.

5. Business relationships and liability: Digital Object Stewards and other investigators that interact with the Commons will do so under a business relationship with the Provider(s); the government will not be a party to these agreements. Similarly, the government and Providers will not participate in a direct relationship for the purposes of the distribution of resources; rather resources will be distributed and managed by a third party (the coordinating centre) with whom the government will have a contractual relationship. The government therefore accepts no liability for the actions of investigators in the Commons.