A Blockchain and IPFS Based Framework For Secure Research Record Keeping
Abstract: Research record keeping in academia is important for ensuring proper planning, management and execution of research work. With the rapid development of technology and the growing volume of records, there is a high risk of information leakage and record tampering, which is a serious threat to the privacy and authenticity of research records. Storing this information on a central server may also lead to problems in efficiency. There is therefore a need for a distributed system that is both efficient and secure. Blockchain is an emerging technology that attempts to solve these issues by creating a tamper-proof record of events in a distributed environment. IPFS is a protocol designed to store hypermedia in peer-to-peer distributed file storage with content addressability. The framework proposed in this paper combines these two technologies with traditional encryption methods to create a secure, tamper-proof model of academic research record keeping with access control. Furthermore, the system uses Ethereum smart contracts to store the provenance metadata retrieved from the IPFS file system on the blockchain network, creating tamper-proof records for later auditing.
Keywords: Academic research record keeping, blockchain, IPFS, Ethereum, smart contract, provenance metadata, tamper-proof.
I. INTRODUCTION
Academic research record keeping is important for research planning and management, replication of results, documentation of collaborations, publishing and peer review, and for complying with governmental and institutional rules and regulations. Good research records consist of much more than just research data. They include protocol descriptions, data manipulation and analysis procedures, personal and group interpretations of the results, and important communications and group decisions among collaborators. The data must therefore be kept confidential, secure and tamper proof in order to avoid discrepancies. In academic research, the Principal Investigator (PI) is the main actor who ensures proper planning, management and execution of the ongoing research. Documents such as funding agency proposals, project reports, memoranda of understanding and minutes of meetings are critical and must be preserved in a secure, tamper-proof environment without leakage of information. A slight variation or modification in these documents may lead to serious consequences.
Traditional databases can be used to store this data. However, because traditional database management systems rely on a central authority that controls a large amount of data, one cannot fully trust the confidentiality, integrity and authenticity of the data. Centralized systems are also exposed to issues such as Denial of Service (DoS) attacks and a single point of failure. Hence there is a need for a distributed technology that ensures the authenticity, confidentiality and integrity of data.
Blockchain technology can create a tamper-proof, secure record of events in a distributed, peer-to-peer network of computer nodes. Cryptocurrency transaction systems such as Bitcoin are based on this technology. Blockchain provides a degree of anonymity (more precisely, pseudonymity) and security for the users involved in the transactions. A blockchain consists of a growing list of blocks which are linked by cryptographic algorithms. It is based on Distributed Ledger Technology (DLT), a system for recording digital transactions in distributed storage with no centralized data store.
Distributed ledger technology can be used to write smart contracts (also called digital contracts or blockchain contracts). These are self-executing contracts that are expressed as computer code with the help of certain platforms and are replicated, shared and supervised by the network of computers that run the blockchain. Smart contracts avoid the need for a middleman by automatically defining and enforcing the rules and obligations agreed by the parties in the ledger. While blockchain is suited to storing small amounts of data such as transaction metadata and hash values, IPFS can be used as a peer-to-peer distributed system to store hypermedia in large quantities.
The InterPlanetary File System (IPFS) [3] is a peer-to-peer hypermedia protocol and distributed file system that is intended as a successor to today's web. It has a block storage model in which content-addressed hyperlinks form a Merkle Directed Acyclic Graph (DAG). Since IPFS is distributed, it has no single point of failure.
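As an illustration of the block storage model with content-addressed links mentioned above, the following Python sketch stores fixed-size chunks under the SHA-256 digest of their bytes and links them from a root node, which is the basic idea behind a Merkle DAG. The names BlockStore, put_file and get_file, the JSON node layout and the 4-byte chunk size are assumptions made only for this example; real IPFS uses multihash-encoded identifiers and its own chunking and node formats.

```python
import hashlib
import json


def address(data: bytes) -> str:
    """Content address of a block: the hex SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


class BlockStore:
    """In-memory stand-in for a content-addressed block store (hypothetical)."""

    def __init__(self) -> None:
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = address(data)
        self.blocks[key] = data  # identical content always lands on the same key
        return key

    def get(self, key: str) -> bytes:
        data = self.blocks[key]
        if address(data) != key:  # any modification of a block is detectable
            raise ValueError("block does not match its address")
        return data


def put_file(store: BlockStore, content: bytes, chunk_size: int = 4) -> str:
    """Split content into chunks, store each one, and store a root node linking them."""
    links = [store.put(content[i:i + chunk_size])
             for i in range(0, len(content), chunk_size)]
    root_node = json.dumps({"links": links}).encode("utf-8")
    return store.put(root_node)


def get_file(store: BlockStore, root: str) -> bytes:
    """Follow the links of the root node and reassemble the content."""
    node = json.loads(store.get(root))
    return b"".join(store.get(link) for link in node["links"])


store = BlockStore()
root_a = put_file(store, b"minutes of meeting")
root_b = put_file(store, b"minutes of meeting")  # same content, same root address
```

Because the root address depends only on the bytes stored, the two uploads above resolve to the same address, and changing any chunk changes the address of the root node that links to it.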
International Journal of Pure and Applied Mathematics Special Issue
There are many disadvantages of HTTP such as inefficiency, lack of historic versioning, and centralization. IPFS is designed to overcome these disadvantages.
This paper presents a framework in which scientific research record keeping can be done in a secure, tamper-proof environment using blockchain technology, IPFS and smart contracts. IPFS is used to store documents such as project reports, memoranda of understanding, funded project documents, attendance records and minutes of meetings, along with certain access control methods, since not all participating nodes in the network should necessarily be able to access all the important information. Methods such as secret sharing [20] and asymmetric key cryptosystems can be implemented as additional functionality to limit access to certain users of the system. The provenance metadata of the documents stored in IPFS is then uploaded to the blockchain in order to ensure the integrity of the information. Using this audit information on the blockchain, the Principal Investigator (PI) can verify that the documents are accessed and modified only by the intended, authorized users.
The structure of the paper is as follows. The next section provides the necessary background information and the work related to the framework proposed in this paper. Section 3 gives an overview of the proposed framework. Section 4 provides the analysis of the framework, and Section 5 provides the conclusion and future work.
II. BACKGROUND AND RELATED WORK
Traditional centralized databases are mostly based on the client-server architecture, where the client stores entries in a central server and obtains an updated copy of the information each time it accesses the server. In contrast, a blockchain is a growing list of blocks which are linked and secured using cryptographic algorithms. This technology was invented by Satoshi Nakamoto in 2008 for use in the cryptocurrency Bitcoin [1]. Each block in the blockchain contains a list of transactions, the hash of the previous block and the hash of the current block. The first block in the blockchain is called the genesis block. Blockchain is a distributed ledger technology maintained by a peer-to-peer network of nodes. To update the distributed ledger, the participating nodes in the network must arrive at a common consensus. The consensus protocol is the core of a blockchain and decides how it works. Sankar LS et al. [2] provide an analysis of various consensus protocols in blockchain and of the feasibility and efficiency they offer on various platforms. Blockchain can be viewed as a trusted record keeping system based on archival science, an ancient science aimed at the preservation of records [21].
IPFS [3] is a distributed and versioned file system which can connect many computing nodes with the same system of files and manage them by tracking their versions over time. IPFS has a special property of content addressing at the HTTP layer for the identification of files. IPFS identifies a file by the hash of its content rather than by the server on which it is stored. The hash of a file in IPFS typically begins with "Qm" (the base58 encoding of a SHA-256 multihash). The name of a file is not part of the IPFS object, so two files with different names and the same content have the same hash value. Ethereum blockchain's Merkle Patricia tree structure [5] can also be emulated as IPFS objects. Storing larger pieces of data on the Ethereum blockchain requires a larger fee, so only the hashes of files are stored on the Ethereum blockchain rather than the whole files. This hash can then be used to locate and retrieve the file on IPFS [4]. A novel zig-zag based storage model based on IPFS and blockchain is provided in [9] to address the issue of high throughput for individual users in IPFS.
Smart contracts provide an easy way to access the Ethereum blockchain. Ethereum smart contracts are written in a high-level language called Solidity [13], which is influenced by languages such as C++, JavaScript and Python. To develop Ethereum smart contracts, Remix IDE [7], a browser-based IDE, can be used. Another option is the Truffle framework [6], which supports built-in smart contract compilation, linking, deployment and binary management. It supports both public and private network deployment environments. The Truffle framework has a one-click blockchain support mechanism called Ganache, which is an internal JavaScript implementation of the Ethereum blockchain. It also supports front-end libraries through Drizzle. To run Ethereum decentralized apps in the browser itself, without running a full node, MetaMask [8] can be used. The above tools can be combined for effective Ethereum decentralized application development.
Data provenance refers to the tracking and recording of the origins of data. It covers the history of data such as creation, attribution and versioning. Provenance metadata is very important for forensics and auditing. Blockchain can be used as a platform for trustworthy provenance data management. With the Open Provenance Model (OPM) [11] and Ethereum smart contracts, immutable trails of data can be recorded [10]. ProvChain [12] is a distributed, cloud-based data provenance architecture which creates a tamper-proof record of events by embedding the provenance records into the blockchain as transactions. The system uses the Bitcoin blockchain, and the Tierion API [14] is used to embed data records into the blockchain. The Tierion API uses the Chainpoint standard [15], an open standard for creating a timestamp proof of any data record.
III. PROPOSED FRAMEWORK
The users involved in the system are the Principal Investigator (PI) and the Junior Research Fellow (JRF). The documents to be considered are project reports,
memoranda of understanding, funded project details, funding agency details, attendance records of the JRFs, and minutes of meetings. Some of these documents can be accessed and modified by both the PI and the JRF, while others may be modified only by the PI. Hence, the access control policies must be defined according to the users of the system.
The proposed framework can be divided into three main phases: user registration and authentication; storage of documents and access control; and provenance metadata storage and retrieval for auditing purposes. The overall flow of the proposed framework is depicted in Fig. 1.
A. User Registration and Authentication
There are two kinds of users in the system: Principal Investigator (PI) and Junior Research Fellow (JRF). There can be multiple PIs and JRFs using the system, so each registration should be unique to avoid issues such as impersonation. A PI is registered with details such as (PI id number, PI biometric details, PI name, password, secret question), while a JRF is registered with details such as (JRF id number, JRF biometric details, id number of PI assigned, JRF name, password, secret question). These details can be stored in a secure data store, which may be either centralized or distributed.
Once users are registered, they can log in to the system using their details. Authentication has three steps. In the first step, users provide their id number and a valid password. If this succeeds, they must pass a second layer of biometric authentication, in which they provide their biometric details. If that also succeeds, the third layer is the secret question. Users can log in to the system only if all three steps succeed. If the user fails any one of the steps, authentication is unsuccessful. The three-step authentication for secure login is depicted in Fig. 2.
Fig. 2. Three step authentication process
B. Storage of documents and access control
Documents such as project reports, project funding details, memoranda of understanding, attendance records, and minutes of meetings are encrypted and stored in IPFS. IPFS is a distributed file system that creates a unique hash of each document, and other nodes on the network can access and view a file only if its unique hash is known to them. To restrict access to particular nodes on the network, certain access control methods can be applied.
Two ways of access control can be applied. One way of restricting access is to implement an asymmetric encryption scheme through GnuPG [19]. In this scheme, only users who hold the key can decrypt the document; others cannot.
Even if the link to the document is provided to users, they can decrypt the document only when they have the key. In the proposed system, this secret key is called the master key. The master key is provided to authorized users when they register with the system. Using this master key, both the PI and the JRF can access the files in the IPFS file system. So, all the files are accessible to the users who are registered in the system. A user can log in to the system, access a file, download it, decrypt it and then view it. However, an important restriction applies: only the PI can upload and create documents on the IPFS network. The JRF is not permitted to upload or create new documents in the IPFS network.
Consider the first scenario, where the JRF wants to access a particular document from IPFS. In this case there is no problem, since the JRF can obtain the link to the document, download it, decrypt it using the master key, and then view it. The JRF can do anything with the downloaded document, such as modifying or deleting it, but these operations affect only the JRF's local copy. The original document in IPFS remains unchanged.
Consider the second scenario, where the JRF wants to make changes to the original document in IPFS. In this case, the JRF authenticates to the system, obtains the link to the document, downloads it, decrypts it using the master key, views it and makes changes to it. These changes now have to be reflected in the IPFS file system, but only the PI can upload, create and modify documents in IPFS. So the JRF makes the changes, signs the document with a personal digital signature, encrypts it again with the master key and sends it to the Principal Investigator. The Principal Investigator decrypts the document, verifies the signature and validates the changes. The document is then encrypted again with the master key, and the PI authenticates to the system using a separate secret key, called the derived key, and uploads the document to IPFS. Smart contracts are written to authenticate the PI using the derived key before contents are uploaded. The derived key is random in nature, so each time the PI wants to upload a document to IPFS, a different secret key must be supplied.
The derived key is usually derived from a password by using a password-based key derivation function. A key derivation function (KDF) [18] is a function which derives one or more secret keys from another secret value, such as a password or a master key, using a Pseudo Random Function (PRF). A KDF is used for key strengthening and key stretching. There are many modern key derivation functions. One is PBKDF2 [16], which was considered secure against brute force attacks as of 2017, and another is HKDF, a simple key derivation function [17]. The derived key can be obtained as follows:
Derived Key DK = KDF(PRF, Password, Salt Value, Number of iterations, Desired length of the derived key).
Here, KDF is the key derivation function (such as PBKDF2, or the simpler HKDF), PRF is the Pseudo Random Function (such as a keyed HMAC), Password is the secret password of the PI, and Salt Value is a sequence of random bits.
All the changes made to the document are stored as different versions by the PI. If required, the PI can merge these versions into a single version to obtain the final document. Since all changes are stored as separate versions, the original document remains safe and unaltered.
C. Provenance metadata storage and retrieval for auditing purposes
The next step is the retrieval and storage of provenance metadata. Metadata here refers to the provenance data collected from the IPFS file system, which can include: name of the file, hash value of the file, file creation time, file access time, IDs of the JRFs who accessed the file, and IDs of the PIs who accessed the file. IPFS file logs can be collected for this purpose.
Consider the scenario where a JRF with id number, say, JRF_1 wants to access a document A with hash value, say, HashA. The JRF obtains the document from IPFS, downloads it and decrypts it. The time at which the JRF accessed the file is recorded. The JRF then modifies the local copy of the document and requests the PI to upload it to IPFS. The PI validates this request and uploads the document to IPFS. This upload is also recorded, and the hash of the file changes to HashB. The structure of the recorded provenance data is depicted in TABLE I.
This provenance data is necessary for later auditing. The logs of the IPFS file system can be used to collect the provenance data of the stored documents. The provenance data can then be embedded in the Ethereum blockchain as transactions using smart contracts built with the Solidity programming language [13], Truffle [6] and Remix IDE [7]. Algorithm 1 gives a glimpse of the smart contract used to retrieve a file from IPFS whose hash has been stored in the blockchain using smart contracts.
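The derived-key computation DK = KDF(PRF, Password, Salt Value, Number of iterations, Desired length) from Section III-B can be sketched in Python with the standard library's PBKDF2 implementation. This is a minimal sketch under stated assumptions, not the authors' implementation: the choice of HMAC-SHA256 as the PRF, the iteration count, the 16-byte salt, the example password and the helper names derive_key and verify_key are all hypothetical.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes,
               iterations: int = 100_000, length: int = 32) -> bytes:
    """DK = KDF(PRF, Password, Salt, iterations, length).

    Here KDF is PBKDF2 and PRF is HMAC-SHA256 (an assumption for this sketch).
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations, dklen=length)


def verify_key(candidate: bytes, password: str, salt: bytes,
               iterations: int = 100_000) -> bool:
    """Recompute the key from the stored salt and compare in constant time."""
    expected = derive_key(password, salt, iterations, len(candidate))
    return hmac.compare_digest(candidate, expected)


# A fresh random salt per upload gives a different derived key each time,
# even though the PI's password stays the same.
salt = os.urandom(16)
dk = derive_key("example-PI-password", salt)
```

Whoever checks the key needs the same salt and iteration count to recompute it, so in a design like this the salt would have to be stored or transmitted alongside the upload request.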
TABLE I
THE STRUCTURE OF A PROVENANCE RECORD
Record Id | File Creation Date | File Creation Time | Original Hash Value | File Access Date | File Access Time | Accessed by PI (ID number) | Accessed by JRF (ID number) | Hash of the Modified File
R_1       | 12/03/2018         | 12.00 PM           | HashA               | 13/03/2018       | 4.25 PM          | --                         | JRF_1                       | HashA
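To make the record structure of TABLE I concrete, the following Python sketch models a provenance record and appends records to a hash-chained, append-only log, which stands in for embedding the records in blockchain transactions. It is only an illustration: the ProvenanceRecord and Ledger names, the field names and the second record R_2 (including its dates and PI_1) are hypothetical, and a real deployment would write to Ethereum through a smart contract rather than to a Python list.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    record_id: str
    created: str                    # file creation date and time
    original_hash: str
    accessed: str                   # file access date and time
    accessed_by_pi: Optional[str]   # None where the table shows "--"
    accessed_by_jrf: Optional[str]
    modified_hash: str


class Ledger:
    """Append-only log; each entry commits to the digest of the previous entry."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries = []  # tuples of (record, previous digest, digest)

    @staticmethod
    def _digest(record: ProvenanceRecord, prev: str) -> str:
        payload = json.dumps(asdict(record), sort_keys=True) + prev
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: ProvenanceRecord) -> str:
        prev = self.entries[-1][2] if self.entries else self.GENESIS
        digest = self._digest(record, prev)
        self.entries.append((record, prev, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every digest; any edited record breaks the chain."""
        prev = self.GENESIS
        for record, stored_prev, digest in self.entries:
            if stored_prev != prev or self._digest(record, prev) != digest:
                return False
            prev = digest
        return True


# R_1 is the row shown in TABLE I; R_2 is a made-up follow-up upload by the PI.
r1 = ProvenanceRecord("R_1", "12/03/2018 12.00 PM", "HashA",
                      "13/03/2018 4.25 PM", None, "JRF_1", "HashA")
r2 = ProvenanceRecord("R_2", "12/03/2018 12.00 PM", "HashA",
                      "14/03/2018 10.00 AM", "PI_1", None, "HashB")

ledger = Ledger()
ledger.append(r1)
ledger.append(r2)
intact = ledger.verify()

# Simulate tampering with a stored record and check that it is detected.
record, prev, digest = ledger.entries[0]
ledger.entries[0] = (replace(record, accessed_by_jrf="JRF_9"), prev, digest)
tampering_detected = not ledger.verify()
```

An auditor who trusts only the latest digest can detect a change to any earlier record, which is the property the framework relies on when it places provenance data on the blockchain.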