Capstone Report - Final Submission
Jonathan Delgado
Tayler Mauk
Bodey Provansal
16 June 2020
EXECUTIVE SUMMARY
Our team set out to develop a prototype product that will be able to encrypt and upload a
user-defined set of files to a secondary storage location. This encryption will be done using a
secret key provided by the user. After files are encrypted, they will either be transferred to a
network location or uploaded to a cloud service. Uploading will be done using either network or
cloud credentials provided by the user. The goal of this product was to provide a simple-to-use
application that allows users to further protect their important data. This application should be
simple and reliable enough to be used at many different technical skill levels, from individual
users protecting their personal data to power users such as Information Technology professionals
who work for small-to-medium sized organizations. We expect, by the end of this project, to have
produced an application that can be installed and operated by our test group and whose features
function as described in this report.
Table of Contents
Introduction 4
Project Goals and Objectives 5
Stakeholders and Community 7
Feasibility Discussion 9
Functional Design 12
Approach and Methodology 14
Legal and Ethical Considerations 16
Timeline and Budget 19
Usability Testing and Feedback 20
Final Implementation 23
Conclusion 31
References 33
Appendices 34
INTRODUCTION
The goal of this project was to develop a locally run application that would allow users to
encrypt files in a directory on their workstation or server before uploading that data to an online
cloud storage or network location. The intent was for this application to be quick,
straightforward, and simple to use, so that its usefulness does not depend on the target audience's
technical skill level. Our group planned to focus-test this application with both novice and
“power” users.
OBSERVED PROBLEM
With the proliferation of inexpensive storage devices and cloud storage solutions like Amazon S3, it has never been easier to find places to
store data. Unfortunately, for many small companies or individuals, the data stored in these
locations is often not protected in ways that reflect its importance. Critical data stored in a shared
Google Drive or Dropbox is only as safe as each user who has access to it. A laptop left behind
on the bus with an easy-to-guess login is all it takes for unprotected data to be stolen,
intentionally encrypted and held for ransom, or lost for good. However, this lack of protection is typically not the result of negligence, as backup appliances and services that properly encrypt this data often come at a significant cost.
PROPOSED SOLUTION
Our solution to this problem was to provide a simple, “one-click” method of encrypting a user’s
directory and uploading that secured data to a secondary back-up location. Our team proposed a
project to develop an application that could be run on a user’s computer that will allow them to
encrypt files of a predetermined size. Afterwards, these files can then be uploaded to online
cloud storage locations like Google Drive, Microsoft OneDrive, or Amazon Web Services.
Back-ups are useless without a reliable way to restore data, so our application would also allow
users to decrypt the files after downloading them from an online storage location.
PROJECT GOALS AND OBJECTIVES
Our team planned to accomplish as much as possible in the time frame allotted, while producing a product that
could be iterated on after the initial project was completed. For the technology used, we decided on
leveraging the widely used and versatile NodeJS, since it offered many core features that we
could quickly utilize. Initially, we decided on first only implementing a command line interface
for the program, but quickly found options to support a GUI through the Electron and React
JavaScript libraries. We decided that adding a simple, Node-friendly GUI would greatly increase
the potential pool of users that would be able to test and eventually use the software.
This was a potentially very open-ended project with many possible objectives, so our
team decided to first focus on a few core goals, listed in Figure 1, with the understanding that
they were subject to the project's time and scope restrictions.
Figure 1: Project Goals

Goal: Create an abstracted, encrypted backup system
Details:
● This goal refers to the core of the project itself

Goal: Create profiles to store settings for various tasks
Details:
● Users define jobs containing data locations, destinations, and other settings as needed by the software
● Configurations are saved in profiles, which are evaluated by the program either on a schedule or on demand
The objectives surrounding this project were largely focused on security without compromising
ease of use. The table in Figure 2 lists a few core objectives needed in order to accomplish this.
Figure 2: Project Objectives

Objective: Users can set a file (chunk) size to meet needs or requirements
Details:
● Access the local filesystem
● Store file data into chunks such that the data is less than the defined maximum chunk size and does not include partially processed files (subject to change)

Objective: Users can interact with the program using a graphical user interface
Details:
● Evaluate frequently accessed and related settings
● Create the graphical user interface using HTML, JavaScript, and CSS on the Electron platform
The goals and objectives listed in this section were not meant to be exhaustive, but instead to convey
the general agreement that these features were viewed as essential, or at least more so than others.
STAKEHOLDERS AND COMMUNITY
The stakeholders of this project are largely the developers, as no other parties were directly
invested in its development. The surrounding community, however, is a much broader
pool of individuals who may benefit from the success of such a project. This community consists
of Information Technology professionals, enthusiasts, and hobbyists alike. Classifying such an
expansive set of people was justified by the assumption that no individual would prefer to lose
data over having it abstracted and backed up.
Advantages for the stakeholders and community resulting from the development of this
project include, but are not limited to: an accessible, inexpensive backup solution; secured data
through means of both abstraction and encryption; and on-premises functionality. Large
businesses may not have had much to gain from a project of this scale, as they likely have either
in-house or currently implemented backup solutions. Conversely, small and medium businesses
without the financial or technical means will have an opportunity to securely back up data. The
same assumption held true for personal use in regards to financial accessibility and ease of
implementation.
The only notable predicted loss from investing into this product would have been the loss
of absolute control over encryption and organized storage within the backup. The reason behind
this was that under our initial design, the software would attempt to organize files in such a way
that data chunk sizes are as close to one another as possible and only certain encryption standards
may be supported. Only enthusiasts and high-security organizations would feasibly fall into
this category.
The difference made by this project would be the innovation of an all-in-one solution for
securely backing up data to servers that may be considered “insecure”, such as Google Drive,
Dropbox, et cetera. (It is known that these services operate under certain security standards, but
the data stored is inherently not secure in the sense that what is stored is unencrypted.) As a
result of this success, a secure backup solution can be made available to the general population.
In the last decade, there has been a growing concern with small and large businesses alike
regarding the increasing risk involved with a company’s data. With every new security measure
and backup solution, there are new threats to overcome. One of the most prevalent threats to gain
traction in the last five years has been ransomware. As noted in a recent article in the Journal of
Cybersecurity, “Unfortunately, there is less evidence of individuals and organizations taking the
necessary measures (particularly, regular backups) to mitigate and possibly deter the damage
from attack. This means that ransomware is likely to remain a serious threat for many years to
come” (Cartwright). While the main source of profit for ransomware criminals is with large
organizations, small businesses and people who are self-employed can still be targeted by these
attacks. The best line of defense against these attacks is a reliable back-up strategy.
This requires multiple back-up locations and, in case one of those locations is compromised,
a way to encrypt that data so it cannot be used by anyone who is not authorized to access it.
FEASIBILITY DISCUSSION
Before beginning development, our team sought to find any software that performs a
similar function to our proposal with a similar amount of overhead and support cost. We
predictably found a number of cloud storage solutions, so the search was limited to what are known
as Client Side Encryption (CSE) storage solutions, or solutions that first encrypt a user’s data
before uploading it to an offsite, third-party storage location. Most importantly, because our project
follows this same client-side model, these were the products most directly comparable to ours.
Non-CSE storage environments have seen a massive rise in popularity in the last decade.
These are solutions like DropBox, Google Drive, OneDrive, and iCloud. However, even the most
popular cloud storage solutions offer no “guarantees regarding the confidentiality and integrity of
the data stored” using their servers (Henziger). It is safe to say that the most common services
that offer cloud storage will be wide open for most of their users, with data protected only
by each user’s password. But these services are successful for a reason: most are easy to use,
offer a suite of functions, and guarantee an extremely high rate of availability (Microsoft boasts
>99.6% uptime in the last 3 years), so users should still be able to use these services without issue.
There are a handful of smaller products that do offer CSE, the most popular products we
found were SpiderOak, Tresorit, and MegaSync. There have been a few issues found with these
products so far. The most impactful of these is synchronization: compared to the non-CSE cloud
giants, CSE products have a difficult time syncing updated files between multiple
users effectively, which is a major selling point for products like Microsoft’s OneDrive. In a study done
by Linköping University in Sweden, researchers found that the process needed to synchronize files on
cloud storage servers, delta encoding, usually hurts the performance of CSE products compared
to the non-CSE counterparts. CSE services “typically have significantly higher resource usage on
the client” and SpiderOak, in particular, “comes with a higher storage footprint on the client and
on the servers, has higher bandwidth overhead for both uploaders and downloaders, and
implements less effective delta encoding than Dropbox and iCloud” (Henziger).
Furthermore, once this data is encrypted and stored, it is a black box. It is impossible to
extract, update, or search for any part of the data without first: restoring the entire encrypted
block of data, finding the desired file(s), making updates, and performing the encryption and
upload process again. Also, since these services do not have any record of a user’s unencrypted
data, if a secret key is lost, effectively, so is the data (Zhang). This likely seems like too much of
a risk for many businesses. Combined, these factors may explain why unprotected, non-CSE storage remains the more common choice.
In terms of feasibility, our solution addresses one of the issues described above for CSEs.
Our product does not attempt to synchronize or implement any delta encoding into a user’s data.
This eliminates the extra performance cost associated with CSE. Since we are uploading new
data that will not be synchronized locally, the remote server does not need to spend resources
keeping track of local changes. This means our solution would be able to scale with a larger user
base, much like a non-CSE service, since the two processes of encryption and storage are kept separate.
Our solution still does, however, pose the same potential issue regarding a lost password.
While our program does not delete or alter any original user data, there is still a possibility for
users to lock themselves out of their protected data by losing the password for a specific backup
job. The scenario would be a user that uses our software to encrypt their local data, uploads that
data, deletes the original data, and then loses the password of that backup job before restoring the
data. Additional development time after this project could provide a solution to this issue, but it was considered out of scope for this initial release.
FUNCTIONAL DESIGN
Figure 3 describes how data flows through our program; the backup process is shown on the left.
The design process started with a few key decisions which allowed the rest of the design
to follow logically afterwards based on what we could implement within the timeframe. A key
concern was obviously security. And the most pressing security issue would be the secret key
used to encrypt and decrypt files. Since we did not want to store this as a string at all, encrypted
or otherwise, our team decided on using a manifest or record system. If a user provides a key that
is able to decrypt the manifest file, then they have the correct key to decrypt the rest of the data.
Without a record system, we would also be limited to ‘all-or-nothing’ backups and restores.
Aligning the chunks and file information allows us to perform partial restores from a backup job.
After security, our next concern was that we did not want to write any intermediate data to the user’s local storage, so all chunking and
encryption had to be done in memory using buffers. This makes each chunk stored in the cloud the only place the processed data is ever written out.
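The following is a minimal sketch of the key check described above: if the provided password decrypts the manifest, it is treated as the correct key for the rest of the backup. The algorithm, key derivation, and data layout shown here (AES-256-GCM with a scrypt-derived key and a salt/IV/tag prefix) are illustrative assumptions rather than our finalized settings.

import { scryptSync, createDecipheriv } from "crypto";

function tryDecryptManifest(encrypted: Buffer, password: string): Buffer | null {
  // Assumed layout for this sketch: 16-byte salt | 16-byte IV | 16-byte auth tag | ciphertext
  const salt = encrypted.subarray(0, 16);
  const iv = encrypted.subarray(16, 32);
  const tag = encrypted.subarray(32, 48);
  const ciphertext = encrypted.subarray(48);
  const key = scryptSync(password, salt, 32); // derive a 256-bit key from the password
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    // If this succeeds, the password is accepted as the key for the whole backup job.
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  } catch {
    return null; // wrong password: authentication fails and nothing else is decrypted
  }
}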
FINAL DELIVERABLES
The final deliverables for this product include a functioning backup program. The
program was intended to, first and foremost, act on user input. The actions performed
by our tool are correct and benign in nature; the user should not unintentionally cause any data
loss either locally or remotely. Any other requirements needed to perform the requested actions
are automatically evaluated and executed by the software. The product is able to scan the file
system for the requested data, then abstract and encrypt it to ensure that the intended security
measures are taken. The product is able to connect to a cloud service and upload the encrypted
data to it, as well as retrieve data from the cloud service and allow the user to view, decrypt, and restore it.
Beyond core elements, the product itself has a graphical user interface. This interface will
allow the user to configure various options within the program, start tasks, and view any resulting output.
APPROACH AND METHODOLOGY
In order to complete this project efficiently and on time, we decided to work most
closely to the Agile methodology. Since this wasn’t an existing project, speed of initial
development was the largest priority; thus, we decided to proceed with Agile over something like the Waterfall model.
PROCESS APPROACH
In order for our team to accomplish this project on time, we initially specified some
process guidelines to ensure communication, code contributions and features were streamlined.
● We will have a weekly team meeting in which we will discuss outstanding issues
and reevaluate if the existing priorities are still accurate. We will use Slack as
well as pull requests and GitHub issue comments to communicate with each other.
● We will leverage GitHub Issues in order to manage tasks and bugs. GitHub Issues
allows us to have a Kanban board similar to Pivotal Tracker, while being able to
strongly integrate individual commits and pull requests with these tasks.
● Immediately prior to starting this project, we will break out all related work into
individual issues and move work that is ready to start to the repo’s GitHub Kanban
board. For each ticket, we will identify and outline the expected approach.
● Prior to starting the work, when breaking out tasks, we will plan out an MVP
(Minimum Viable Product), where we will identify which features are absolutely
required versus those which are “nice to haves”. When getting closer to the
completion of this project, if we have additional time, we may opt to add some of
these “nice to have” features, however, we expect to complete all of the essential
features.
TECHNOLOGY APPROACH
In terms of the technology side of this project, we’ve considered a few different
programming languages and technology stacks. The technology that allowed us to most rapidly
implement this product was NodeJS, which already offered a lot of features we needed, such as
file system streaming, encryption and hashing. Node also allows access to the hugely popular
NPM (Node Package Manager) ecosystem in case there were any additional features we were
lacking. Additionally, we were able to leverage ElectronJS in order to create a native desktop
application that is well styled and can directly interface with the NodeJS runtime. Other
languages had alternatives to this approach; however, it would have taken considerably more
time to get an equally well styled product using something like Visual Basic or a Java GUI than with HTML and CSS.
JavaScript isn’t the most performant language to handle processing like encryption or hashing,
however, leveraging native C++ modules as well as multiple threads can make up for that gap if
efficiency becomes an issue. With very little effort, we were able to generate an installer (which
will be unsigned for the purpose of this class) with which to distribute this application. Due to these
technology choices, we had the ability to develop a polished and feature-full application with a
fraction of the development time. For the primary cloud provider, we chose AWS’s S3 offering.
However, moving forward, we will now be able to abstract the project in such a way that we can add support for additional cloud providers.
PROGRAMMING APPROACH
In terms of the approach for programming, we tried to follow best practices as much as
possible to create a sustainable product, ripe for future development. We considered the following (a provider abstraction sketch follows this list):
● Utilize unit testing and documentation, as much as possible within the deadline.
● Leverage Electron’s IPC channels to separate the core logic on the NodeJS side from the rendering logic on the front end.
● Abstract each provider into a separate class, allowing us to easily support multiple cloud providers.
● Create error handlers so that this application can still run on a machine even when individual operations fail.
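As a rough illustration of the provider abstraction mentioned above, the sketch below defines one common interface that any storage destination could implement; the interface name and the network-folder provider are hypothetical examples, not our actual classes.

import { promises as fs } from "fs";
import * as path from "path";

// One interface that every storage destination implements (hypothetical name).
interface StorageProvider {
  write(key: string, data: Buffer): Promise<void>;
  read(key: string): Promise<Buffer>;
}

// Hypothetical provider that stores chunks on a mounted network location;
// an S3-backed provider would implement the same two methods.
class NetworkFolderProvider implements StorageProvider {
  constructor(private root: string) {}
  async write(key: string, data: Buffer): Promise<void> {
    await fs.writeFile(path.join(this.root, key), data);
  }
  async read(key: string): Promise<Buffer> {
    return fs.readFile(path.join(this.root, key));
  }
}

Swapping one provider for another then only requires constructing a different class; the core backup logic is unchanged.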
LEGAL CONSIDERATIONS
The main liability that we may be open to when distributing this backup program is data
loss. Since this project directly interacts with sensitive files on the host’s file system, we need to
take extra precautions legally to prevent litigation. Data loss with our system can happen through
either a bug in which we traverse the local file system and delete files (perhaps if we were
attempting to delete files on the remote server instead) or if a file that is being uploaded becomes
corrupt but still passes validation. It’s possible that we could be liable for either one of those
issues, dependent upon our guarantees to the end-consumer. Upon distribution, we would need to
include a software license agreement with this project that explicitly states we do not guarantee
data consistency as well as an agreement that we hold no liability for issues, essentially the
software is provided as-is. Software user agreements are very common and help protect software
developers from litigation when there are bugs that may lead to unforeseen consequences.
Since this is a program that directly interacts with users’ personal files, we may want to
consider distributing the code under some kind of an open source license upon release. An open
source license would allow others to freely analyze and contribute to our project’s codebase,
allowing us to both gain credibility and trust as well as gain additional code contributions from
outside the core programming group. A good license for this type of distribution would be the
MIT license, which is very popular among open source software. However, if we would like to
create a company out of this, we would most likely not release the code at all, or release it
without a license, so that the implicit “all rights reserved” set by copyright laws would be in
effect.
ETHICAL CONSIDERATIONS
The ethical considerations of this project largely revolve around the assumptions made
regarding the design, deployment, and user behavior after the product is released. Namely, the
assumptions our team has made about how the product will be used and by whom the product will be used.
This product will act on a set of data decided by a user and sent to a secondary location
that is also decided by the user and protected with the user’s private credentials. That data will be
encrypted by a chosen password and can only be decrypted with that exact password. In this
example, the user has placed their trust in the following: that the credentials
to the online storage location, the chosen encryption password, and the encrypted data are not
being saved, shared, or otherwise compromised. And, most importantly, that the data will be
restored exactly as it was before encryption. Even in the best case, where there is no intentional or
accidental misuse of data or credentials on the development end, our product is
handling a user’s private credentials to encrypt private data and “locking” it unless that same
password is given.
In the situation above, we are also assuming the “user” is synonymous with the “owner”
of the data or storage location. This is not true in all cases. Other situations to consider can
include when the data is being handled by a trusted second party, like an I.T. Managed Service
Provider. Furthermore, there are two potential ends for misuse by untrusted sources. If the
primary data is stolen, altered, or has malware unknowingly included in the user’s directory, the
tool will not know the difference. This can potentially infect the secondary location or give bad
actors access to the files stored there. Secondly, if saved settings or user profiles are
implemented in this program and those are unknowingly changed by a malicious user, good data
can be uploaded to an unknown secondary location that only the malicious user has access to.
The data is now effectively stolen and the intended user will not know unless they check their
intended secondary location for new files. We will need to consider the misuse of our application
and determine ways to mitigate malicious behavior, putting users’ privacy and security first.
Assuming the product is working as intended and is not being misused after deployment,
there remains the consideration of accessibility. In design and development, our team will make
certain assumptions regarding how users will be able to access the executable, install and
configure the program, and ultimately, if they will be able to operate it effectively. Our team will
assume that intended users will meet a certain minimum specification of operating system,
hardware age, processing power, data transfer speed, and networking capabilities. Essentially, we
are assuming that a user will have a “newer” computer that is internet-capable, and that the user
has a readily available and reliable internet connection. This is certainly not the case in many
personal and small business environments. Similarly, since we utilized the Electron Node.js
framework to implement a GUI, we must consider the visual accessibility of our program. Luckily,
there are also tools included in and compatible with Electron that help build more accessible
applications. These tools also include the ability to allow the user’s native OS accessibility
options and assistive technologies interact with Electron apps. Not all users or user environments
will be able to interact with our product in the same way; our team will need to make the accommodations necessary to support them.
TIMELINE
Our team is pleased to report that we were able to meet all milestones laid out in our
proposal by the deadline of the project. Our production timeline roughly matches our proposal
timeline, as well. There were items in the latter half of the project that took longer than expected,
however, that time was made up for by items in the first half of production taking less time than
expected. Most features of our program were reliant on other features working properly, meaning
we had to develop the program in series. The UI and CLI were partially developed in parallel
with other components. Figure 4 shows the approximate timeline of our development.
BUDGET
As stated in our proposal, we were not expecting much in terms of resources needed to complete
the project. Besides the S3 bucket to test with, all resources used were free or under open-license
use. So, there was never a concern with exceeding any budget.
USABILITY TESTING AND FEEDBACK
Since our project was not sponsored by a specific client, our usability testing was done
with a focus group of coworkers. Each member of our team chose 2 coworkers to test the
application, while assisting them if they had any questions or comments. Our process was to
introduce the application, describe the steps they would take during the test, and explain what the
end result of the test steps would be. We then had a checklist of steps that we walked each user
through with the goal of efficiently showing as much functionality during a single test run as
possible. Rather than relying on data or cloud credentials that users would already have, we decided
to have our users test in a pre-configured environment. This meant allowing them access to a
workstation with clearly labeled test directories and files, while we provided the cloud storage
credentials needed for the test.
For each test run, the users would be directed to perform the following tasks in order:
a. Backup one full directory, using their own password and backup name. Afterwards, view the contents of the completed backup.
b. Backup multiple directories while excluding files. Use a different backup name
and password.
c. Attempt to restore a backup without providing a path, get a list of all possible files to restore, and then restore a selection of those files.
Users were encouraged to ask questions or suggest improvements they would like to see in future
updates as they were testing. Once the test was finished, users were asked questions from the
System Usability Scale (shown in Appendix A) to get a rough estimate of how easy they felt the
program was to use. We changed the responses slightly to just have users answer each question verbally.
The user feedback we received was consistently positive, with most concerns and suggestions
centering on convenience. At the time of testing, we did not have our UI finalized, so most users
were testing using the command line interface. Some
found the CLI cumbersome since they needed to copy and paste our given credentials or long list
of files to restore. As stated before, most users were also new to S3 and using our testing bucket.
So, with the S3 account password, S3 bucket secret key, and personal backup password they
were creating, there were a total of 3 passwords to use for each test step, all of which were brand new to the user.
There are many improvements our team can make to future tests. A huge improvement
that is already implemented is the GUI. Testing with the GUI will make much of the user testing
self-explanatory. At every step of the application’s workflow, all potential options will be laid
out for users. The most user-friendly advantage of the GUI is the addition of a visual File
Explorer. This allows directories and files to be chosen and visually represented instead of
simply showing the user a list of filenames. We suspect that our usability scores would already improve with these changes alone.
Secondly, future testing would be improved by running the application in users'
personal environments. There are a few tasks to be completed before making this change.
Currently, there are a number of dependencies that need to be installed and configured to run the
application. This would be far too cumbersome to ask users to install on their own machines.
Instead, our application would need to be packaged (for example, with yarn) into an executable file that could
be quickly downloaded, run, and uninstalled by users after testing is finished. Once that is
completed, users that are already familiar with S3 would be able to easily test with their own S3
bucket.
The biggest improvement to general user testing would be to implement a more popular
consumer cloud storage. All of our users in this round of testing frequently used either Google
Drive or Microsoft OneDrive and already had readily available credentials to test with. Adding
functionality to either of these services would better illustrate what our application is
accomplishing. In the scenario where a user has backed up a file to their OneDrive, they could
then verify in their preferred OneDrive interface whether or not a backup was created. They
could also see the characteristics of the chunk files our application uploaded, proving that their
data has been abstracted to something that cannot be easily viewed by others.
Once those features are implemented, it would be much simpler to have more open-ended
user testing. Having our application in the hands of users would provide real-world data and use
cases. We would gain a better idea of how this program will actually be used and what problems
will need to be addressed before releasing to a wider audience. If users are able to reliably test
the limits of the software beyond just learning to use it, we can then focus our testing on addressing the issues that surface.
FINAL IMPLEMENTATION
The core logic of our application is written in TypeScript using Node. The different
functionalities are split into several modules that individually handle the encryption/decryption,
file system scanning, compression and chunking, and data transfer to and from cloud storage.
Those back-end tools are used by the different front-end interfaces. There are currently two ways
to interface with our application: using command line prompts from the system terminal, or through a graphical user interface.
Our back-end tools and core application are compiled from TypeScript and run as a
local Node server, which communicates with a front-end supported by Electron and ReactJS. Data
from our application is fed to Electron, which then gets the rendering information, visual logic,
and events it needs from pages built using ReactJS. Electron creates a native application window in which these pages are rendered.
Our team did not need to make many adjustments to our proposed implementation. Little
changed in how the back-end modules function and work together. Most changes that needed to
be made involved the rendering of our UI. Initially, the plan was to rely only on Electron’s
HTML/CSS rendering and inject JavaScript code to handle events. However, this approach
would mean our application was virtually “stateless”. This meant no information given by the
user could be stored in the current session of the application and used later. Every time that the
application required, for example, a password to a cloud provider, it would need to re-prompt the
user. We decided adding the ability to save information given by the user in a single run of the
application would greatly improve the application’s usability not only in this release, but for
potential future updates. Our solution to this was to find an Electron framework that included
ReactJS. Using React, we could have variables that carried values across different pages, as
opposed to using solely Electron where values would be erased with each new page that was
rendered.
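The sketch below illustrates, with hypothetical component and field names rather than our actual pages, how React state can carry a value such as saved credentials across pages within a single session instead of re-prompting the user.

import React, { createContext, useContext, useState } from "react";

// Session-wide store for values the user has already provided (hypothetical shape).
const SessionContext = createContext({
  credentials: "",
  setCredentials: (_value: string) => {},
});

export function SessionProvider({ children }: { children: React.ReactNode }) {
  const [credentials, setCredentials] = useState("");
  return (
    <SessionContext.Provider value={{ credentials, setCredentials }}>
      {children}
    </SessionContext.Provider>
  );
}

// Any page rendered inside SessionProvider can reuse the saved value
// instead of re-prompting the user for it.
export function BackupPage() {
  const { credentials } = useContext(SessionContext);
  return <p>{credentials ? "Using saved credentials" : "Please enter credentials"}</p>;
}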
Starting with the Terminal option, the user can run the CLI program, which takes command
line arguments. Using the “commander” npm package, we can process those command line
arguments to run different functions of the application. These functions are found in our
application controller. The controller then reaches out to the node utilities in order to fulfil the requested action.
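The following is a simplified sketch of how the “commander” package can map command line arguments onto controller functions; the command names, options, and the controller call are illustrative assumptions rather than our exact CLI definition.

import { Command } from "commander";

const program = new Command();

program
  .command("backup")
  .description("Encrypt and upload one or more directories")
  .requiredOption("-d, --dirs <paths...>", "directories to back up")
  .option("-x, --exclude <patterns...>", "paths or extensions to exclude")
  .requiredOption("-p, --password <password>", "backup job password")
  .action((opts) => {
    // Hand the parsed options to the application controller (hypothetical call).
    console.log("Would call runBackup with", opts.dirs, opts.exclude);
  });

program.parse(process.argv);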
For the GUI-based application, Electron will first open a native window that is awaiting
styling to render. Electron looks for a starting route from ReactJS, which will provide the Home
page. From here, there will be different options for the user to choose, each of which will trigger
JavaScript events. Depending on the events triggered, Electron will run different functions
provided by the same back-end application controller and supported by the same node utilities.
The data given by the user is also transferred and processed by the back-end components. Once
processed, it is fed back to either the CLI or Electron/React front-end. Any information that can
be saved for convenience is kept until the end of the session. Sensitive information, like a backup password, is scrambled and discarded as soon as it is no longer needed.
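The sketch below shows, with assumed channel and function names, how Electron’s IPC channels could carry a request from the React front-end to the back-end controller and return its result, matching the event-driven flow described above.

import { ipcMain, ipcRenderer } from "electron";

// In the Electron main process: register a handler that forwards the request
// to the back-end application controller (hypothetical channel name and call).
ipcMain.handle("run-backup", async (_event, options) => {
  // return runBackup(options); // hand off to the application controller
  return { ok: true };
});

// In the React renderer process: ask the main process to perform the backup and await feedback.
async function onBackupClicked(options: unknown) {
  const result = await ipcRenderer.invoke("run-backup", options);
  console.log("Backup finished:", result);
}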
Moving on to the back-end modules, this is where the bulk of our application’s work is
done. Our main application controller is the center of the application; it is detailed, along with
the other modules, in Appendix B. The following is a description of the utility and provider
modules that assist the application controller with our backup and restoration process.
The File System module consists of two classes, FileInfo and FileScanner. FileInfo
contains the metadata for individual files that the FileScanner has processed. The FileScanner
class holds the relevant data needed to process directories and exclusions given by the user, as
well as the functionality to sort through those directories and keep track of the files that are
waiting to be processed. Most of the work in the File System module is done within the
enumerateFiles function. This function takes in a root directory path and will sort files by size
and insert the FileInfo for each file into the sortedFiles property.
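A simplified sketch of this enumeration step is shown below; the FileInfo fields and function body are illustrative rather than our exact implementation.

import { promises as fs } from "fs";
import * as path from "path";

interface FileInfo {
  fullPath: string;
  size: number;
}

async function enumerateFiles(root: string, sortedFiles: FileInfo[] = []): Promise<FileInfo[]> {
  for (const entry of await fs.readdir(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      await enumerateFiles(fullPath, sortedFiles); // recurse into subdirectories
    } else if (entry.isFile()) {
      const { size } = await fs.stat(fullPath);
      sortedFiles.push({ fullPath, size });
    }
  }
  sortedFiles.sort((a, b) => a.size - b.size); // keep files ordered by size
  return sortedFiles;
}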
The Chunk module contains the Chunk and ChunkRW classes. Each Chunk will
eventually be used to contain a predetermined number of bytes of data. Chunks will be processed
as Buffers in memory until they are eventually written to files in cloud storage. Depending
on the chunk size and individual file size, Chunks can contain multiple files until the chosen
chunk size is reached. Each file in a chunk is assigned a file id, and each chunk is also assigned
its own id. When data is passed into a Chunk, it is compressed by the AdmZip package.
ChunkRW is the chunk handler. It holds the predetermined chunk size, generates the array of
chunks when given a collection of file metadata, and reads or opens a chunk when given a
Buffer.
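The following sketch illustrates the chunk-planning idea described above, grouping files into chunks that stay under a chosen maximum size and assigning file and chunk ids; the names and structure are assumptions for illustration, not the actual ChunkRW implementation.

interface FileInfo {
  fullPath: string;
  size: number;
}

interface ChunkPlan {
  chunkId: number;
  files: { fileId: number; info: FileInfo }[];
}

function planChunks(files: FileInfo[], maxChunkSize: number): ChunkPlan[] {
  const chunks: ChunkPlan[] = [];
  let current: ChunkPlan = { chunkId: 0, files: [] };
  let currentSize = 0;
  let nextFileId = 0;

  for (const info of files) {
    // Start a new chunk when adding this file would exceed the maximum size.
    if (current.files.length > 0 && currentSize + info.size > maxChunkSize) {
      chunks.push(current);
      current = { chunkId: current.chunkId + 1, files: [] };
      currentSize = 0;
    }
    current.files.push({ fileId: nextFileId++, info });
    currentSize += info.size;
  }
  if (current.files.length > 0) chunks.push(current);
  return chunks;
}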
The cryptography for our application is handled by the Crypto module. It contains the
CryptoED and SensitiveString classes. The SensitiveString class is how our application handles
passwords. Instead of comparing strings, we are comparing bit arrays as a way of further
abstracting sensitive data. Passwords are obtained using getValue and destroyed using scramble.
The encryption and decryption are handled by CryptoED. For either process, it reads in a Buffer, along with the password and encryption level, and returns a new Buffer containing the transformed data.
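As an illustration of encrypting an in-memory Buffer with a password, the sketch below assumes AES-256-GCM with a scrypt-derived key and a salt/IV/tag prefix; these specific choices are assumptions rather than our confirmed CryptoED settings.

import { randomBytes, scryptSync, createCipheriv } from "crypto";

function encryptBuffer(plaintext: Buffer, password: string): Buffer {
  const salt = randomBytes(16);
  const iv = randomBytes(16);
  const key = scryptSync(password, salt, 32); // derive a 256-bit key from the password
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Prepend salt, IV, and auth tag so the same password can decrypt the data later.
  return Buffer.concat([salt, iv, cipher.getAuthTag(), ciphertext]);
}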
Manifests are read and written by the ManifestRW class, both of which are contained in
the Manifest module. Records stored within manifest data are represented by the ManifestInfo
interface. Like the other modules, ManifestRW handles all manifest data in memory until it is
stored in the cloud. Importing manifest data as a Buffer creates a list of records that can then be
used for restoring files, while exporting turns the current list of file records into a Buffer that can be encrypted and written to cloud storage.
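A minimal sketch of exporting and importing manifest records as a Buffer is shown below; the record fields and the JSON serialization are illustrative assumptions, not the actual ManifestInfo interface.

interface ManifestInfo {
  fileId: number;
  chunkId: number;
  fileName: string;
}

function exportManifest(records: ManifestInfo[]): Buffer {
  // Serialize the records so the result can be encrypted and uploaded as one object.
  return Buffer.from(JSON.stringify(records), "utf8");
}

function importManifest(data: Buffer): ManifestInfo[] {
  // Parse a downloaded (and already decrypted) manifest back into records.
  return JSON.parse(data.toString("utf8")) as ManifestInfo[];
}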
Finally, communication with our cloud provider is handled by the S3 module. This
module is an example of how our application would interact with just one cloud provider. In this
case, we chose Amazon’s S3 cloud storage. This module uses AWS’s API to write to or
read from a desired S3 bucket. Buckets contain only Objects; buckets and Objects are referenced by name and
key, respectively. The write function takes a string and a Buffer in order to write the Buffer to
an Object, with the string as the associated key. The read function only requires a key for a
desired Object, which it will return. We can obtain the Buffer data through the Body property of
that object.
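The sketch below, assuming the AWS SDK for JavaScript (v2), illustrates the write and read behavior just described: a Buffer is written under a key, and the data is read back through the returned object’s Body property. The bucket name and credentials are placeholders.

import { S3 } from "aws-sdk";

const s3 = new S3({ accessKeyId: "ACCESS_KEY", secretAccessKey: "SECRET_KEY" });
const Bucket = "example-backup-bucket";

async function writeObject(key: string, data: Buffer): Promise<void> {
  await s3.putObject({ Bucket, Key: key, Body: data }).promise();
}

async function readObject(key: string): Promise<Buffer> {
  const result = await s3.getObject({ Bucket, Key: key }).promise();
  return result.Body as Buffer; // Body holds the object's data
}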
These modules all work in tandem to backup or restore a local directory given by the user
by processes defined by the application controller. The controller imports most of the other node
modules since it will need to use almost every Object we define in our back-end tools. From
outside of our project, it imports the ‘bytes’ package during the backup process to parse our
chunk size. The controller's exports are the primary functions that it makes available to the front-end interfaces.
When the user requests to backup one or more directories, runBackup is called in the
application controller. For this function we take in data we need to obtain from the user. This
includes in order of the UML in Appendix B: the relative paths of the directories we should
backup, a path or extension for files we should exclude from the backup, the desired size of each
chunk file, our backup job password, the level of encryption that should be used, and the cloud
provider credentials. The controller then encapsulates this information within the other modules and begins the backup procedure described below.
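A sketch of what the runBackup entry point’s inputs could look like, following the parameter list above, is shown below; the exact field names and types are assumptions for illustration.

interface BackupRequest {
  directories: string[];   // relative paths of the directories to back up
  exclusions: string[];    // paths or extensions to exclude from the backup
  chunkSize: string;       // desired chunk size, e.g. "64MB", parsed by the 'bytes' package
  password: string;        // backup job password
  encryptionLevel: string; // level of encryption to use
  credentials: {           // cloud provider credentials
    accessKeyId: string;
    secretAccessKey: string;
    bucket: string;
  };
}

async function runBackup(request: BackupRequest): Promise<void> {
  // Validate inputs, scan the file system, chunk, compress, encrypt, and upload.
  // (Each step is handled by the modules described in this section.)
}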
Similarly, when the user requests to restore files from their cloud storage provider, the
application controller will attempt to run the restore procedure. The following information is
required from the user to perform a restore: a directory to restore files to, files to restore, the
YABP password that was used to backup the original files, the encryption level used in the
original backup, and the cloud storage credentials. Both the backup and the restore procedure
will not require any input from the user once the proper information is given. Once these
functions are attempted and the information is validated, the backup or restore will occur, writing
the new files locally or in the cloud. Feedback is given by the program whether the process was
successful.
Finally, the controller can fulfil a request to view the current contents of a storage
location. All that is required for this function is the local and cloud credentials associated with a specific backup job.
Returning to the backup and restoration process, the controller is where those processes
are defined. The backup and restoration processes were designed to inversely mirror each other.
The process to backup should essentially be the same steps for restoration, but in reverse. Also, if
any step of either process fails, the process will not continue, to ensure that no files are partially
affected or modified. Failures can occur if incorrect credentials are provided, invalid file paths
are provided, or the incorrect encryption information is provided (this includes the YABP backup password and the encryption level).
The controller begins the backup procedure by attempting to create a handler for each
module. This step allows us to verify some of the information given by the user including
making a connection to the cloud provider using the provided credentials and verifying that the
file paths provided lead to valid files and directories located on the local machine. A connection
to S3 is made using the provided credentials, and that connection is stored in an S3 object that can be used
throughout this process. The given file paths are used by the File Scanner module to obtain
metadata regarding the files scanned and records which files we are excluding from these
directories. From here, we can use this file information to stream these files into distinct Buffer
Streams or chunks. This is done by our ChunkRW objects from the Chunk module. The size of
each individual abstracted chunk file was obtained by the user earlier and parsed by the ‘bytes’
package. During this chunking process, the data is also compressed with the ‘AdmZip’ package.
Next, we will build a manifest file using information obtained from the File Scanner and
Chunk module. This manifest will contain the information needed to correctly restore the files
we are backing up. The manifest is then encrypted using the Crypto module. Once encryption is
finished, it is the first file written to S3. The manifest is encrypted and written to S3 separately,
since we would not be able to find it within the abstracted files. At this point, we have verified
we have all the information and data we need to properly backup our files. The controller then
iterates through our in-memory chunks. Each chunk is then encrypted using the encryption level
and encryption password provided. That encrypted data is saved to a new Buffer, which is then
immediately written to a file in cloud storage. Once the controller has iterated through all
chunks, the backup process has been completed and the user is notified.
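The backup procedure just described can be summarized in the following structural sketch, where the declared helper functions stand in for the FileScanner, ChunkRW, ManifestRW, Crypto, and S3 modules; they are placeholders, not our actual exports.

interface FileInfo { fullPath: string; size: number; }

// Placeholder declarations standing in for the modules described in this section.
declare function scanFiles(dirs: string[], exclude: string[]): Promise<FileInfo[]>;
declare function buildChunks(files: FileInfo[], maxSize: number): Promise<Buffer[]>;
declare function buildManifest(files: FileInfo[]): Buffer;
declare function encryptBuffer(data: Buffer, password: string): Buffer;
declare function writeObject(key: string, data: Buffer): Promise<void>;

async function backupFlow(dirs: string[], exclude: string[], maxSize: number, password: string) {
  const files = await scanFiles(dirs, exclude);                          // scan and filter the file system
  const chunks = await buildChunks(files, maxSize);                      // compress files into in-memory chunks
  const manifest = buildManifest(files);                                 // record which file lives in which chunk
  await writeObject("manifest", encryptBuffer(manifest, password));      // the manifest is written first
  for (const [index, chunk] of chunks.entries()) {
    await writeObject(`chunk-${index}`, encryptBuffer(chunk, password)); // encrypt and upload each chunk
  }
}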
As stated above, the intention was for the restoration process to be an inverse of the
backup process. Looking at the runRestore function in our controller, we begin by again making
a connection to our cloud provider. If that is successful, we download the manifest file located in
cloud storage. The file was statically named in the backup procedure, so we can find it by
filename. Once downloaded, we parse the manifest data as a Buffer and decrypt it using the
provided backup password and encryption level. If either the password or encryption level is
incorrect, the data simply will not be decrypted, and an error will be thrown at this step. This
manifest data is then imported into a new ManifestRW, which is then able to parse the data into
separate records. These records allow us to determine which file is in each of the chunks still
stored with our cloud provider. By iterating through the records in our manifest and matching the
filenames with our user’s desired restoration files, we can determine exactly which chunks to
restore by both file id and chunk id. The controller iterates through each chunk. If that chunk is
successfully read, it is then decrypted. The chunk handler then opens the chunk in order to iterate
through each individual file it contains. Each file is then extracted from that chunk. In the
extraction process, the file is uncompressed and written to the restore directory provided.
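The restore procedure can be summarized with a matching sketch, again using placeholder declarations for the module functions involved.

interface ManifestRecord { fileId: number; chunkId: number; fileName: string; }

// Placeholder declarations for the module functions used during a restore.
declare function readObject(key: string): Promise<Buffer>;
declare function decryptBuffer(data: Buffer, password: string): Buffer;
declare function importManifest(data: Buffer): ManifestRecord[];
declare function extractFiles(chunk: Buffer, destination: string): Promise<void>;

async function restoreFlow(wanted: string[], destination: string, password: string) {
  const records = importManifest(decryptBuffer(await readObject("manifest"), password)); // manifest first
  const chunkIds = new Set(
    records.filter((r) => wanted.includes(r.fileName)).map((r) => r.chunkId)             // match files to chunks
  );
  for (const id of chunkIds) {
    const chunk = decryptBuffer(await readObject(`chunk-${id}`), password);              // download and decrypt
    await extractFiles(chunk, destination);                                              // decompress and write locally
  }
}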
The restore procedure is identical to the procedure needed to view the contents of an S3
bucket, up to a certain step. To view the contents of a bucket, the controller needs to make an S3
connection and download the manifest file associated with the given information. If the manifest
file is correctly decrypted, the controller simply needs to iterate through those records to provide
the contents stored in that S3 bucket that are associated with a specific backup job.
CONCLUSION
At the start of this project, our team saw a problem regarding large amounts of
unencrypted, personal data stored on remote servers protected only by a username and password
combination. We then attempted to address this issue by delivering a simple-to-use, locally run
application that would allow users the option to easily add on an additional layer of security to a
cloud storage solution that was already in use. We hope that what we have completed in this
project’s span provides a proof of concept that this tool can be applied to additional storage
solutions.
In this report, our team has described the details of our proposed solution, including what
our project would need to be considered a success, both as a replacement for current workarounds
to the observed problem and as a foundation for a tool that could be improved upon in the future.
future. We discussed the need for an easy-to-use product like our proposed solution had a viable
audience as the access to cloud storage has grown without a comparable investment in protecting
the data that is stored there. This was followed with the ethical, legal, and logistical concerns
Next, we included an update once our project was completed. We included the timeline
of completion, along with the feedback from users and the final implementation of the
application as it is functioning at the time of this report. We are pleased to report we succeeded
in our proposed goals and have a fully functioning application that meets all our team’s
requirements. Our team was able to implement a design that uses back-end JavaScript libraries to
compress, abstract, encrypt and upload a desired back-up job to a cloud storage provider. The
design can also perform the inverse of those steps to provide a reliable restoration of the
backed-up files.
Ultimately, we hope to have shown that, given this approach and collaboration style, a
team with a larger size and budget could expand this prototype into a tool suitable for use by a
widespread audience.
REFERENCES
Cartwright, E., Hernandez Castro, J., & Cartwright, A. (2019). To pay or not: game theoretic models of ransomware. Journal of Cybersecurity.
Henziger, E., & Carlsson, N. (2019). Delta Encoding Overhead Analysis of Cloud Storage Systems using Client-side Encryption.
Microsoft. Service health and continuity. https://docs.microsoft.com/en-us/office365/servicedescriptions/office-365-platform-service-description/service-health-and-continuity
Zhang, X., Tang, Y., Wang, H., Xu, C., Miao, Y., & Cheng, H. (2019). Lattice-based proxy-oriented identity-based encryption with keyword search for cloud storage. Information Sciences.
APPENDIX A
The System Usability Scale is a short, standardized survey used to quickly and efficiently have test
users gauge the usability of a product. It is typically presented as shown here (from https://measuringu.com/sus/):
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
APPENDIX B
UML diagrams of the application controller and the supporting back-end modules described in the Final Implementation section.