Data Storage Centralization
Data storage centralization makes it easier to store and share data: it provides safety and gives users the ability to stream and download files provided by other network users. Centralized data management opens up capabilities that are essential for any company in today's competitive and demanding market. What does this mean for the storage industry and for the evolution of storage software? Primarily, this expanded range of possibilities is raising the standard from the ordinary file server to a complex, multifunctional device.
Reliability is another commonly overlooked source of cost savings. Both modern data centers and individual centralized storage solutions are more reliable than dispersed individual machines. A single centralized investment will ultimately prove more cost-effective and dependable than seemingly cheaper solutions based on multiple distributed machines.
Hard drives are the most heavily worked parts of any computer or server, and no hard drive lasts forever. Centralized storage solutions help users survive a damaged disk by writing data redundantly across several HDDs in a RAID array while consolidating the data on a single server.
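How much usable capacity that redundancy leaves depends on the RAID level. The following is a minimal Python sketch of the standard arithmetic; the function name and the example figures are ours for illustration, not drawn from any specific product.

def raid_usable_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for common RAID levels (identical disks assumed)."""
    if level == 0:
        return disks * disk_tb        # striping only: no redundancy
    if level == 1:
        return disk_tb                # mirroring: capacity of a single disk
    if level == 5 and disks >= 3:
        return (disks - 1) * disk_tb  # one disk's worth of parity
    if level == 6 and disks >= 4:
        return (disks - 2) * disk_tb  # two disks' worth of parity
    raise ValueError("unsupported level/disk-count combination")

# Example: four 4 TB drives in RAID 5 give 12 TB usable and survive one drive failure.
print(raid_usable_tb(5, 4, 4.0))  # 12.0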
While centralizing data in one location may seem like a security risk, data scattered across several machines increases the number of entry points through which intruders can compromise your company's information. Protecting a central server is therefore not only easier but also a more cost-effective way of doing business.
These advantages lead to one conclusion: storage centralization is a must. The only question remaining is how to find the best possible solution for your data management requirements.
Despite their differences, SAN and NAS are not mutually exclusive and may be combined in a SAN-NAS hybrid, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system. Such a hybrid covers a wide range of uses, from an ordinary file-sharing platform to a full environment for both working data and archiving.
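The practical difference between the two protocol families is what the client sees. A minimal Python sketch, assuming a hypothetical NAS mount point and a hypothetical iSCSI/FC LUN:

import os

# File-level (NAS) access: the storage system exposes a file system and the
# client works with paths. "/mnt/nas/projects/report.txt" is a hypothetical mount.
with open("/mnt/nas/projects/report.txt", "rb") as f:
    header = f.read(512)

# Block-level (SAN) access: the client sees a raw block device ("/dev/sdb" is a
# hypothetical LUN, and reading it requires root). Any file system on it is the
# client's responsibility, not the array's.
fd = os.open("/dev/sdb", os.O_RDONLY)
first_sector = os.pread(fd, 512, 0)  # read 512 bytes starting at offset 0
os.close(fd)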
Fibre Channel (FC) is a network technology used for storage networking in Storage Area Networks. FC is aimed at large enterprises and advanced projects, offering high speed and the ability to implement complex solutions.
The biggest advantages of FC are its high performance and its support for long-distance connections. Fibre Channel is also paired with the highest-performing HDDs. The difference between FC (2 Gb/s, or even 8 Gb/s) and iSCSI (1 Gb/s, up to 10 Gb/s in the most expensive option) is minimal for point-to-point connections where some other component, such as a server or mass storage device, is the bottleneck. The main difference is the ability to build a fast network with a large number of switches.
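Raw line rate translates into transfer time in a straightforward way. A back-of-the-envelope Python sketch; the 500 GB payload and the 90% protocol-efficiency factor are assumptions for illustration, not measured values:

def transfer_minutes(payload_gb: float, link_gbps: float, efficiency: float = 0.9) -> float:
    """Rough transfer time: payload in gigabytes over a link in gigabits per second."""
    return payload_gb * 8 / (link_gbps * efficiency) / 60

# Moving a 500 GB backup over a single point-to-point link:
print(f"FC at 8 Gb/s:    {transfer_minutes(500, 8):.0f} min")   # ~9 min
print(f"iSCSI at 1 Gb/s: {transfer_minutes(500, 1):.0f} min")   # ~74 min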
Fibre Channel-based solutions are quite expensive and complex compared to iSCSI. FC also requires installing a second HBA controller, along with additional drivers and management software. These expenses put Fibre Channel out of reach for smaller companies. Technical issues matter as well: the capacity of the existing Ethernet network may determine whether adding iSCSI traffic is a realistic option.
Storage centralization requires custom solutions, and Open-E DSS V7 is flexible enough to meet your centralization needs with minimal complexity.
The CR also provides the opportunity to pool data across several studies to increase the power of statistical
analyses. In addition, most NIDDK-funded studies generate genetic material for testing and some carry out
high-throughput genotyping, making it possible for other scientists to use repository resources to perform
informative genetic analyses using well-curated phenotypic data.
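Why pooling raises power: with more participants per arm, a smaller effect becomes detectable at the same significance level. A quick illustration in Python using statsmodels; the sample sizes are hypothetical:

from statsmodels.stats.power import TTestIndPower

# Minimum detectable effect size (Cohen's d) at alpha = 0.05 and 80% power,
# for a single study versus four pooled studies of the same size.
analysis = TTestIndPower()
for n_per_arm in (100, 400):
    d = analysis.solve_power(nobs1=n_per_arm, alpha=0.05, power=0.8)
    print(f"n={n_per_arm} per arm: minimum detectable d = {d:.2f}")
# n=100 per arm: minimum detectable d = 0.40
# n=400 per arm: minimum detectable d = 0.20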
Study data from the Data Coordinating Centers (DCC) were submitted in SAS and retained in SAS format when archived. Requestors seeking alternative formats were accommodated using the dbCOPY tool [3]. All study documentation except some electronic data capture forms was stored as PDF. Some older data capture forms were delivered as image files. In all cases these were readable via Adobe Acrobat.
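The repository used the dbCOPY tool for these conversions; purely as an illustration of the same kind of format conversion with open-source tools, pandas can read SAS data sets directly (the file names here are hypothetical):

import pandas as pd

# Read an archived SAS data set (.sas7bdat).
df = pd.read_sas("study_visits.sas7bdat")

# Deliver the same table in an alternative format, e.g. CSV.
df.to_csv("study_visits.csv", index=False)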
Study data were not shared until a request was authorized. At that time the entire contents of the requested study's archive were sent by a secure FTP process to the requestor's site. This meant that the data could be uploaded to the requestor's system regardless of the target operating system.
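A minimal sketch of such a transfer over SFTP using the paramiko library; the host name, account, and paths are hypothetical, since the text does not describe the repository's actual process in detail:

import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()  # trust hosts already known to this system
client.connect("transfer.example.org", username="repository")

# Push the complete study archive to the requestor's site.
sftp = client.open_sftp()
sftp.put("study_123_archive.zip", "/incoming/study_123_archive.zip")
sftp.close()
client.close()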
Over time, we enhanced the public system to provide some of the functionality of the original design that was
lost during initial implementation. To help users explore the vast amount of data and samples stored in the
repository, we developed a set of Public Query Tools (PQT) that allowed public users to explore data elements
in both structured and unstructured ways [Figure 1]. The structured searches used parameters to identify studies
with resources that could support a new research hypothesis (e.g., types of stored samples, intervention method,
and primary outcomes). PQT opened a window to the data for users and was an important enhancement of
public data sharing for the repository. Researchers and the lay public were able to learn specific results about
the research funded by NIDDK, and in this way PQT served as a valuable public education tool. However, this
value came at a high labor cost, since study data were stored only in archived data sets [4]. To fuel PQT, select data elements were curated by repository staff and uploaded to a database that supported the PQT functionality. This level of curation required clinical expertise available only through senior repository staff. Maintaining PQT was therefore a costly effort whose investment ultimately had to be weighed against the benefit. There were also cost advantages: with researchers able to personally explore the availability of
stored samples and link them to specific data elements, the amount of expert labor required for sample request
processing was reduced.
Figure 1
Public Query Tools allowed public users to explore data elements in various ways.
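The flavor of PQT's structured, parameter-driven searches can be sketched in a few lines of Python; the record type, study names, and field values below are hypothetical, not the repository's actual schema or holdings:

from dataclasses import dataclass

@dataclass
class StudyRecord:
    study: str
    sample_types: set
    intervention: str
    primary_outcome: str

def structured_search(records, sample_type=None, intervention=None):
    """Parameterized search over curated study metadata."""
    hits = records
    if sample_type is not None:
        hits = [r for r in hits if sample_type in r.sample_types]
    if intervention is not None:
        hits = [r for r in hits if r.intervention == intervention]
    return hits

catalog = [
    StudyRecord("Study A", {"serum", "DNA"}, "lifestyle", "disease incidence"),
    StudyRecord("Study B", {"plasma"}, "drug", "disease progression"),
]
print([r.study for r in structured_search(catalog, sample_type="DNA")])  # ['Study A']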
Offering mechanisms that make data more available for public inquiry is surely an important function of a data
repository. In fact, with the increase in genomic research, sharing of actual study results with participants is
increasingly critical [5]. The question becomes one of cost: what technological innovation, and at what development expense, would allow data sharing to take place without a high level of content review by CR curation staff? Use of data standards and common data elements during collection should allow for a more automated
presentation of results. Such consistency must begin at the design stage and requires that the data repository be
a partner right from the start to streamline processes and reduce the cost of post study data sharing.
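One way common data elements enable that automation is by making validation mechanical rather than a matter of expert review. A minimal sketch, assuming a hypothetical CDE dictionary (these element names and rules are invented for illustration, not NIDDK's actual standards):

# Hypothetical common data elements (CDEs) and their validity rules.
CDE_RULES = {
    "hba1c_pct":  lambda v: 3.0 <= v <= 20.0,   # plausible clinical range
    "sex":        lambda v: v in {"M", "F"},
    "visit_year": lambda v: 1990 <= v <= 2013,
}

def violations(record: dict) -> list:
    """Names of elements in the record that violate their CDE rule."""
    return [name for name, rule in CDE_RULES.items()
            if name in record and not rule(record[name])]

print(violations({"hba1c_pct": 7.2, "sex": "F", "visit_year": 1989}))  # ['visit_year']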
Highlights
Effective operation and maintenance of the NIDDK Central Data Repository required the system to be flexible
and dynamic while at the same time compliant with established data standards.
We describe some difficulties of managing a large repository, an operation that is by definition dependent on
many outside parties whose degree of expertise and efficiency have a direct impact on repository functioning.
The bio-banking industry will likely continue to become more globally centralized for studying specific genetic
diseases and monitoring the health of our environment.
A dynamic relationship between emerging technologies and existing infrastructure will be needed to support future research, which requires supporting organizations to remain flexible even while following established standards.
Acknowledgments
This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), Bethesda, MD 20892, USA, under contract number HHSN267200800016C.
References
1. National Institutes of Health, Trans-NIH BioMedical Informatics Coordinating Committee (BMIC). NIH Data Sharing Repositories. http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html. Accessed September 18, 2013.
2. Horn L, Bledsoe M, Sexton K. Marketing Your Biobank Collection Effectively. October 23, 2012. http://www.youtube.com/watch?v=5Et4IIUqBlY.
3. Heinemann Florian, Sven Kolber GbR. dbCOPY Database Tool. http://www.dbcopy.com. Accessed November 13, 2013.
4. Pan H, Ardini MA, et al. "What's in the NIDDK CDR?" -- public query tools for the NIDDK central data repository. Database (Oxford). 2013;2013:bas058. doi:10.1093/database/bas058. http://www.ncbi.nlm.nih.gov/pubmed/23396299. Accessed September 18, 2013.