
California State University, Monterey Bay

Capstone Project Report

Project Yet Another Backup Application

Jonathan Delgado

Tayler Mauk

Bodey Provansal

CST 499 Computer Science Capstone

Professor Brian Robertson

Dr. Eric Tao

16 June 2020

EXECUTIVE SUMMARY

Our team set out to develop a prototype product that encrypts a user-defined set of files and uploads them to a secondary storage location. Encryption is performed with a secret key provided by the user. After files are encrypted, they are either transferred to a network location or uploaded to a cloud service, using the network or cloud credentials supplied by the user. The goal of this product was to provide a simple-to-use application that allows users to further protect their important data. The application should be simple and reliable enough to be used across a wide range of technical skill levels, from individual users protecting their personal data to power users such as Information Technology professionals at small-to-medium sized organizations. By the end of this project, we expected to have produced an application that can be installed and operated by our test group, with all of the features we set out to implement functioning as designed.



Table of Contents

Introduction
Project Goals and Objectives
Stakeholders and Community
Feasibility Discussion
Functional Design
Approach and Methodology
Legal and Ethical Considerations
Timeline and Budget
Usability Testing and Feedback
Final Implementation
Conclusion
References
Appendices

INTRODUCTION

The goal of this project was to develop a locally run application that would allow users to encrypt files in a directory on their workstation or server before uploading that data to an online cloud storage or network location. The intent was for this application to be quick, straightforward, and simple to use, so that the target audience would not be limited by technical skill level. Our group planned to focus-test this application with both novice and "power" users encrypting and backing up redundant copies of business-scale amounts of data.

OBSERVED PROBLEM

With the widespread availability of hard drives, Network Attached Storage (NAS) devices, and cloud storage solutions like Amazon S3, it has never been easier to find places to store data. Unfortunately, for many small companies or individuals, the data stored in these locations is often not protected in ways that reflect its importance. Critical data stored in a shared Google Drive or Dropbox is only as safe as each user who has access to it. A laptop left behind on the bus with an easy-to-guess login is all it can take for unprotected data to be stolen, intentionally encrypted and held for ransom, or lost for good. This lack of protection is typically not due to negligence, however, as backup appliances and services that properly encrypt this data often come with a hefty price tag or require a dedicated IT professional to manage.

PROPOSED SOLUTION

Our solution to this problem was to provide a simple, "one-click" method of encrypting a user's directory and uploading that secured data to a secondary backup location. Our team proposed a project to develop an application, run on a user's computer, that allows them to encrypt files into chunks of a predetermined size. Afterwards, these files can be uploaded to online cloud storage locations like Google Drive, Microsoft OneDrive, or Amazon Web Services. Backups are useless without a reliable way to restore data, so our application would also allow users to decrypt the files after downloading them from an online storage location.

Our proposed approach involved adopting an Agile development strategy, as we wanted to accomplish as much as possible in the time frame allotted while producing a product that could be iterated on after the initial project was completed. For technology, we decided to leverage the widely used and versatile NodeJS, since it offered many core features that we could quickly utilize. Initially, we planned to implement only a command line interface for the program, but we quickly found options to support a GUI through the Electron and React JavaScript libraries. We decided that adding a simple and Node-friendly GUI would greatly increase the potential pool of users that would be able to test and eventually use the software.

PROJECT GOALS & OBJECTIVES

This was a potentially very open-ended project with many possible objectives, so our team decided to first focus on the few core goals listed in Figure 1. Note that the inclusion or exact implementation of nonessential goals would be evaluated based on time restrictions.

Goal: Create an abstracted, encrypted backup system
● This goal refers to the core of the project itself

Goal: Create profiles to store settings for various tasks
● Users define jobs containing data locations, destinations, and other settings as needed by the software
● Configurations are saved in profiles which are evaluated by the program either on a schedule or on demand

Goal: Create functionality to suggest more secure settings
● Auto-generated keys or passphrases
● Suggested chunk sizes that better abstract the size and/or count of the files within

Fig. 1: Goals for the project

The objectives surrounding this project were largely focused on security without compromising

ease of use. The table in Figure 2 lists a few core objectives needed in order to accomplish this.

Objective: Users can encrypt/decrypt files
● Access to local filesystem
● Acquire user-defined passphrase or certificate used for encryption and decryption
● Implement at least one current, secure algorithm such as RSA or AES encryption

Objective: Users can upload to and download from remote servers
● Access to local filesystem
● Access to remote filesystem
● Data to be backed up is divided into chunks and encrypted
● Generate checksums or another method to ensure successful transfer of data
● Obtain link between the local system to pull data and the remote system to push data
● Remote servers owned by the end user will need an agent installed to allow remote management

Objective: Users can set a file (chunk) size to meet needs or requirements
● Access to local filesystem
● Store file data into chunks such that the data is less than the defined maximum chunk size and does not include partially processed files (subject to change)

Objective: Users can interact with the program using a graphical user interface
● Evaluate frequently accessed and related settings
● Create graphical user interface using HTML, JavaScript, and CSS on the Electron platform

Fig. 2: Objectives and the steps to achieve them

The goals and objectives listed in this section were not meant to be exhaustive, but instead to relay the general agreement that these features are essential, or at least more so than others.

STAKEHOLDERS & COMMUNITY

The stakeholders of this project are largely the developers, as no other parties were involved in development, financially or otherwise. However, the community is a much larger pool of individuals who may benefit from the success of such a project. This community consists of Information Technology professionals, enthusiasts, and hobbyists alike. The grounds for classifying such an expansive set of people rest on the assumption that no individual would prefer to lose data over having it abstracted and backed up.

Advantages for the stakeholders and community resulting from the development of this project include, but are not limited to: an accessible, inexpensive backup solution; data secured through both abstraction and encryption; and on-premises functionality. Large businesses may not have had much to gain from a project of this scale, as they likely have either in-house or currently implemented backup solutions. Conversely, small and medium businesses without the financial or technical means will have an opportunity to securely back up data. The same assumption held true for personal use with regard to financial accessibility and ease of implementation.

The only notable predicted loss from investing in this product would have been the loss of absolute control over encryption and organized storage within the backup. The reason is that, under our initial design, the software would attempt to organize files so that data chunk sizes are as close to one another as possible, and only certain encryption standards may be supported. Only enthusiasts and high-security organizations would feasibly fall into this category.

The difference made by this project would be the innovation of an all-in-one solution for

securely backing up data to servers that may be considered “insecure”, such as Google Drive,

Dropbox, et cetera. (It is known that these services operate under certain security standards, but

the data stored is inherently not secure in the sense that what is stored is unencrypted.) As a

result of this success, a secure backup solution can be made available to the general population,

or at least those that are concerned about data security.

EVIDENCE THAT PROJECT IS NEEDED

In the last decade, there has been growing concern among small and large businesses alike regarding the increasing risks to company data. With every new security measure and backup solution, there are new threats to overcome. One of the most prevalent threats to gain traction in the last five years has been ransomware. As a recent article in the Journal of Cybersecurity notes, "Unfortunately, there is less evidence of individuals and organizations taking the necessary measures (particularly, regular backups) to mitigate and possibly deter the damage from attack. This means that ransomware is likely to remain a serious threat for many years to come" (Cartwright). While large organizations are the main source of profit for ransomware criminals, small businesses and self-employed individuals can still be targeted by these attacks. The best defense against these attacks is a reliable backup strategy. This requires multiple backup locations and, in case one of those locations is compromised, a way to encrypt that data so it cannot be used by anyone who is not authorized to access it.

FEASIBILITY DISCUSSION

Before beginning development, our team sought to find any software that performs a similar function to our proposal with a similar amount of overhead and support cost. We predictably found a number of cloud storage solutions, so the search was narrowed to what are known as Client-Side Encryption (CSE) storage solutions: solutions that first encrypt a user's data before uploading it to an offsite, third-party storage location. This distinction mattered most because our project would be provided as an extension to non-CSE cloud solutions.

Non-CSE storage environments have seen a massive rise in popularity in the last decade. These are solutions like Dropbox, Google Drive, OneDrive, and iCloud. However, even the most popular cloud storage solutions offer no "guarantees regarding the confidentiality and integrity of the data stored" on their servers (Henziger). It is safe to say that the most common services that offer cloud storage will be wide open for most of their users, with data protected only by each user's password. But these services are successful for a reason: most are easy to use, offer a suite of functions, and guarantee an extremely high rate of availability (Microsoft boasts >99.6% uptime in the last three years), so users should still be able to use these services without sacrificing client-side security.

There are a handful of smaller products that do offer CSE; the most popular products we found were SpiderOak, Tresorit, and MegaSync. A few issues have been found with these products so far, the most impactful of which involves synchronization. Compared to the non-CSE cloud giants, CSE products have a difficult time syncing updated files between multiple users effectively, which is a major selling point for products like Microsoft's OneDrive. A study from Linkoping University in Sweden found that delta encoding, the process needed to synchronize files on cloud storage servers, usually hurts the performance of CSE products compared to their non-CSE counterparts. CSE services "typically have significantly higher resource usage on the client" and SpiderOak, in particular, "comes with a higher storage footprint on the client and on the servers, has higher bandwidth overhead for both uploaders and downloaders, and implements less effective delta encoding than Dropbox and iCloud" (Henziger).

Furthermore, once this data is encrypted and stored, it is a black box. It is impossible to extract, update, or search for any part of the data without first restoring the entire encrypted block of data, finding the desired file(s), making updates, and performing the encryption and upload process again. Also, since these services do not keep any record of a user's unencrypted data, if a secret key is lost, effectively, so is the data (Zhang). This likely seems like too much of a risk for many businesses. Taken together, these factors may explain why unprotected cloud storage has become more popular in recent years.

In terms of feasibility, our solution addresses one of the issues described above for CSEs. Our product does not attempt to synchronize or apply any delta encoding to a user's data. This eliminates the extra performance cost associated with CSE. Since we are uploading new data that will not be synchronized locally, the remote server does not need to spend resources keeping track of local changes. This means our solution would be able to scale alongside a larger non-CSE provider, since the two processes of encryption and storage are kept separate.

Our solution still does, however, pose the same potential issue regarding a lost password. While our program does not delete or alter any original user data, there is still a possibility for users to lock themselves out of their protected data by losing the password for a specific backup job. The scenario would be a user who uses our software to encrypt their local data, uploads that data, deletes the originals, and then loses the password for that backup job before restoring the data. Additional development time after this project could provide a solution to this issue, but it would likely require a reworking of our initial design.



FUNCTIONAL DESIGN
Figure 3 describes how data flows through our program. On the left is the backup procedure and on the right is the restore procedure.

Fig. 3: Backup and Restoration Design

SELECTION OF DESIGN CRITERION

The design process started with a few key decisions, which allowed the rest of the design to follow logically based on what we could implement within the timeframe. A key concern was obviously security, and the most pressing security issue was the secret key used to encrypt and decrypt files. Since we did not want to store this key as a string at all, encrypted or otherwise, our team decided on using a manifest or record system. If a user provides a key that is able to decrypt the manifest file, then they have the correct key to decrypt the rest of the data. Without a record system, we would also be limited to "all-or-nothing" backups and restores; aligning the chunks and file information allows us to perform partial restores from a backup job. After security, our next decision was that we did not want to write any intermediate data to the user's local storage, so all chunking and encryption had to be done in memory using buffers. This makes each chunk stored in the cloud nondescript, as each chunk could contain a partial file or dozens of files.
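As a rough illustration of this decision, the sketch below shows how decrypting the manifest with an authenticated cipher such as AES-256-GCM can double as key verification. The blob layout (salt, IV, and auth tag prepended to the ciphertext) and the function name are assumptions made for the sketch, not the exact CryptoED implementation.

```typescript
import { createDecipheriv, scryptSync } from "crypto";

// Assumed blob layout: [16-byte salt][12-byte IV][16-byte auth tag][ciphertext]
function tryDecrypt(blob: Buffer, passphrase: string): Buffer | null {
  const salt = blob.subarray(0, 16);
  const iv = blob.subarray(16, 28);
  const tag = blob.subarray(28, 44);
  const ciphertext = blob.subarray(44);
  const key = scryptSync(passphrase, salt, 32); // derive a 256-bit key from the passphrase
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    // GCM authentication fails (throws) when the passphrase is wrong, so a
    // successful decrypt of the manifest doubles as key verification.
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  } catch {
    return null; // wrong key: refuse to touch the rest of the backup
  }
}
```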

FINAL DELIVERABLES

The final deliverables for this product include a functioning backup program. The program was intended, first and foremost, to act on user input. The actions performed by our tool are correct and benign in nature; the user should not be able to unintentionally cause any data loss, either locally or remotely. Any other requirements needed to perform the requested actions are automatically evaluated and executed by the software. The product is able to scan the file system for the requested data, then abstract and encrypt it to ensure that the intended security measures are taken. The product is able to connect to a cloud service and upload the encrypted data to it, as well as retrieve data from the cloud service and allow the user to view it, provided that the user has supplied the proper credentials.



Beyond core elements, the product itself has a graphical user interface. This interface will

allow the user to configure various options within the program, start tasks and view any

operations that are currently ongoing.

There are no other components outside of the main program.

APPROACH AND METHODOLOGY

In order to complete this project efficiently and on time, we decided to work most closely to the Agile methodology. Since this was not an existing project, speed of initial development was the largest priority, so we decided to proceed with Agile over something like Scrum, which has more structure but also more time requirements.

PROCESS APPROACH

In order for our team to accomplish this project on time, we initially specified some

process guidelines to ensure communication, code contributions and features were streamlined.

● We will have a weekly team meeting in which we will discuss outstanding issues

and reevaluate if the existing priorities are still accurate. We will use Slack as

well as pull requests and GitHub issue comments to communicate with each other

in between these meetings.

● We will leverage GitHub Issues in order to manage tasks and bugs. GitHub Issues allows us to have a Kanban board similar to Pivotal Tracker, while being able to strongly integrate individual commits and pull requests with these tasks.

● Immediately prior to starting this project, we will break out all related work into individual issues and move work that is ready to start to the repo's GitHub Kanban board. For each ticket, we will identify and outline the expected approach.

● Prior to starting the work, when breaking out tasks, we will plan out an MVP (Minimum Viable Product), where we will identify which features are absolutely required versus those which are "nice to have". When getting closer to the completion of this project, if we have additional time, we may opt to add some of these "nice to have" features; however, we expect to complete all of the essential features.

TECHNOLOGY APPROACH

In terms of the technology side of this project, we considered a few different programming languages and technology stacks. The technology that allowed us to most rapidly implement this product was NodeJS, which already offered many of the features we needed, such as file system streaming, encryption, and hashing. Node also provides access to the hugely popular NPM (Node Package Manager) ecosystem in case there were any additional features we were lacking. Additionally, we were able to leverage ElectronJS in order to create a well-styled native desktop application that can directly interface with the NodeJS runtime. Other languages had alternatives to this approach; however, it would have taken considerably more time to get a product as well styled as one built with HTML and CSS using something like Visual Basic or a Java GUI. JavaScript is not the most performant language for processing like encryption or hashing; however, leveraging native C++ modules as well as multiple threads can make up for that gap if efficiency becomes an issue. With very little effort, we were also able to generate an installer (unsigned for the purposes of this class) with which to distribute this application. Due to these technology choices, we had the ability to develop a polished and feature-rich application in a fraction of the development time. For the primary cloud provider, we chose AWS's S3 offering. However, moving forward, we will be able to abstract the project in such a way that we should be capable of supporting most providers with future code additions.

PROGRAMMING APPROACH

In terms of the approach for programming, we tried to follow best practices as much as possible to create a sustainable product, ripe for future development. We considered the following:

● Utilize unit testing and documentation, as much as possible within the deadline.

● Leverage Electron's IPC channels to separate the core logic on the NodeJS side from the view logic on the Electron side (a minimal sketch of this split follows this list).

● Abstract each provider into a separate class, allowing us to easily support multiple cloud providers in the future.

● Create handlers in which this application can still most likely run on a machine

with limited RAM, CPU, or disk space.

● Use existing third party interface libraries such as Bootstrap.
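The IPC point above can be made concrete with a small, hypothetical sketch. The channel name ("backup:run"), the controller import, and the window options are assumptions rather than our actual code; the sketch only illustrates how the view logic stays on the renderer side while the NodeJS core stays in the main process.

```typescript
// main.ts — Electron main process: the NodeJS core logic lives on this side.
import { app, BrowserWindow, ipcMain } from "electron";
import { runBackup } from "./controller"; // hypothetical application controller export

ipcMain.handle("backup:run", async (_event, options) => {
  // The renderer never touches the filesystem or cloud APIs directly;
  // it sends a request over this channel and awaits the result.
  return runBackup(options);
});

app.whenReady().then(() => {
  const window = new BrowserWindow({ webPreferences: { nodeIntegration: true } });
  window.loadFile("index.html"); // the ReactJS bundle renders the views
});

// Renderer side (React view logic), e.g. inside a button handler:
//   const result = await ipcRenderer.invoke("backup:run", { directories: ["./documents"] });
```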

LEGAL CONSIDERATIONS

The main liability that we may be open to when distributing this backup program is data loss. Since this project directly interacts with sensitive files on the host's file system, we need to take extra precautions legally to prevent litigation. Data loss with our system could happen through either a bug in which we traverse the local file system and delete files (for example, if we were attempting to delete files on the remote server instead) or a file that becomes corrupt during upload but still passes validation. It is possible that we could be liable for either one of those issues, depending on our guarantees to the end consumer. Upon distribution, we would need to include a software license agreement with this project that explicitly states we do not guarantee data consistency, as well as an agreement that we hold no liability for issues; essentially, the software is provided as-is. Software user agreements are very common and help protect software developers from litigation when there are bugs that may lead to unforeseen consequences.

Since this is a program that directly interacts with users' personal files, we may want to consider distributing the code under some kind of open source license upon release. An open source license would allow others to freely analyze and contribute to our project's codebase, allowing us to gain credibility and trust as well as additional code contributions from outside the core programming group. A good license for this type of distribution would be the MIT license, which is very popular among open source software. However, if we would like to create a company out of this, we would most likely not release the code at all, or release it without a license, so that the implicit "all rights reserved" set by copyright law would be in effect.

ETHICAL CONSIDERATIONS

The ethical considerations of this project largely revolve around the assumptions made regarding design, deployment, and user behavior after the product is released. Namely, the assumptions our team has made about how the product will be used, by whom it will be used, and how it will be accessed.

This product will act on a set of data chosen by a user and sent to a secondary location that is also chosen by the user and protected with the user's private credentials. That data will be encrypted with a chosen password and can only be decrypted with that exact password. In this scenario, the user is trusting that the credentials to the online storage location, the chosen encryption password, and the encrypted data are not being saved, shared, or otherwise compromised, and, most importantly, that the data will be restored exactly as it was before encryption. Even in the best case, where there is no intentional or accidental misuse of data or credentials on the development end, our product is handling a user's private credentials to encrypt private data and "locking" it unless that same password is given.

In the situation above, we are also assuming the "user" is synonymous with the "owner" of the data or storage location. This is not true in all cases. Other situations to consider include when the data is being handled by a trusted second party, like an I.T. Managed Service Provider. Furthermore, there are two potential avenues for misuse by untrusted sources. First, if the primary data is stolen, altered, or has malware unknowingly included in the user's directory, the tool will not know the difference. This can potentially infect the secondary location or give bad actors access to the files stored there. Second, if saved settings or user profiles are implemented in this program and those are unknowingly changed by a malicious user, good data can be uploaded to an unknown secondary location that only the malicious user has access to. The data is then effectively stolen, and the intended user will not know unless they check their intended secondary location for new files. We will need to consider misuse of our application and determine ways to mitigate malicious behavior that put users' privacy and security first.

Assuming the product is working as intended and is not being misused after deployment, there remains the consideration of accessibility. In design and development, our team will make certain assumptions regarding how users will be able to access the executable, install and configure the program, and ultimately, whether they will be able to operate it effectively. Our team will assume that intended users meet a certain minimum specification of operating system, hardware age, processing power, data transfer speed, and networking capabilities. Essentially, we are assuming that a user will have a "newer," internet-capable computer and a readily available, reliable internet connection. This is certainly not the case in many personal and small business environments. Similarly, since we utilized the Electron Node.js library to implement a GUI, we must consider the visual accessibility of our program. Luckily, there are tools included in and compatible with Electron that help build more accessible applications. These tools also allow the user's native OS accessibility options and assistive technologies to interact with Electron apps. Not all users or user environments will be able to interact with our product in the same way, so our team will need to make what accommodations we can for an equitable user experience.

TIMELINE

Our team is pleased to report that we were able to meet all milestones laid out in our proposal by the deadline of the project. Our production timeline roughly matches our proposal timeline as well. There were items in the latter half of the project that took longer than expected; however, that time was made up by items in the first half of production taking less time than expected. Most features of our program were reliant on other features working properly, meaning we had to develop the program in series. The UI and CLI were partially developed in parallel with other components. Figure 4 shows the approximate timeline of our development.

Week 1: Electron and ReactJS framework chosen. Development environment set up. Parallel UI set-up started.
Week 2: File and directory scanning module. S3 API upload/restore module.
Week 3: Encryption and compression modules chosen and implementation started. Began main project loop and command line interface.
Week 4: Completion of encryption, compression, and chunking functionality. Other modules included in main application.
Week 5: Core application completed. Command line interface completed.
Week 6: Electron and ReactJS integration and user testing done in parallel.
Week 7: Updates to CLI and Electron UI.

Fig. 4: Approximate development timeline

As stated in our proposal, we were not expecting much in terms of resources needed to complete

the project. Besides the S3 bucket to test with, all resources used were free or under open-license

use. So, there was never a concern with exceeding any budget.

USABILITY TESTING & FEEDBACK

Since our project was not sponsored by a specific client, our usability testing was done with a focus group of coworkers. Each member of our team chose two coworkers to test the application, assisting them if they had any questions or comments. Our process was to introduce the application, describe the steps they would take during the test, and explain what the end result of those steps would be. We then had a checklist of steps that we walked each user through, with the goal of efficiently showing as much functionality during a single test run as possible. Since our application is an add-on to a workflow or environment an imagined user would already have, we decided to have our users test in a pre-configured environment. This meant allowing them access to a workstation with clearly labeled test directories and files, while also giving them the credentials to our test S3 bucket.

For each test run, the users were directed to perform the following tasks in order:

a. Back up one full directory, using their own password and backup name. Afterwards, they could test excluding files.

b. Back up multiple directories while excluding files, using a different backup name and password.

c. Attempt to restore a backup without providing a path, obtaining a list of all possible restore files for both backup jobs.

d. Perform a full or partial restore for both backup jobs.

Users were encouraged to ask questions or suggest improvements they would like to see in future

updates as they were testing. Once the test was finished, users were asked questions from the

System Usability Scale (shown in Appendix A) to get a rough estimate of how easy they felt the

program was to use. We changed the responses slightly to just have users answer each question

on a scale of 1 to 5 so they would not have to write anything down.

The user feedback we received was consistently positive, with most concerns and questions relating to quality-of-life or convenience updates. At this stage of development, we did not have our UI finalized, so most users were testing using the command line interface. Some found the CLI cumbersome, since they needed to copy and paste our given credentials or long lists of files to restore. As stated before, most users were also new to S3 and to using our testing bucket. So, between the S3 account password, the S3 bucket secret key, and the personal backup password they were creating, there were a total of three passwords to use for each test step, all of which were brand new to the user.

There are many improvements our team can make to future tests. A huge improvement

that is already implemented is the GUI. Testing with the GUI will make much of the user testing

self-explanatory. At every step of the application’s workflow, all potential options will be laid

out for users. The most user-friendly advantage of the GUI is the addition of a visual File

Explorer. This allows directories and files to be chosen and visually represented instead of

simply showing the user a list of filenames. We suspect that our usability scores would already show noticeable improvement if we were to perform another round of testing.

Secondly, future testing would be improved by running the application in users' personal environments. There are a few tasks to be completed before making this change. Currently, there are a number of dependencies that need to be installed and configured to run the application, which would be far too cumbersome to ask users to install on their own machines. Instead, our application would need to be packaged, by yarn or into an executable file, so that it could be quickly downloaded, run, and uninstalled by users after testing is finished. Once that is completed, users who are already familiar with S3 would be able to easily test with their own S3 bucket.

The biggest improvement to general user testing would be to support a more popular consumer cloud storage service. All of our users in this round of testing frequently used either Google Drive or Microsoft OneDrive and already had readily available credentials to test with. Adding support for either of these services would better illustrate what our application is accomplishing. In the scenario where a user has backed up a file to their OneDrive, they could then verify in their preferred OneDrive interface whether or not a backup was created. They could also see the characteristics of the chunk files our application uploaded, proving that their data has been abstracted into something that cannot be easily viewed by others.

Once those features are implemented, it would be much simpler to have more open-ended

user testing. Having our application in the hands of users would provide real-world data and use

cases. We would gain a better idea of how this program will actually be used and what problems

will need to be addressed before releasing to a wider audience. If users are able to reliably test

the limits of the software beyond just learning to use it, we can then focus testing on optimization and the addition of new, expanded features.

FINAL IMPLEMENTATION

The core logic of our application is written in TypeScript using Node. The different functionalities are split into several modules that individually handle encryption/decryption, file system scanning, compression and chunking, and data transfer to and from cloud storage. These back-end tools are used by the different front-end interfaces. There are currently two ways to interface with our application: using command line prompts from the system terminal, or a GUI-based executable supported by Electron and ReactJS.

Our back-end tools and core application are compiled from TypeScript and run as a local Node server which communicates with a front end supported by Electron and ReactJS. Data from our application is fed to Electron, which then gets the rendering information, visual logic, and events it needs from pages built using ReactJS. Electron creates a native application window and then renders the HTML/CSS stylings it receives from ReactJS.

Our team did not need to make many adjustments to our proposed implementation. Little changed in how the back-end modules function and work together. Most changes that needed to be made involved the rendering of our UI. Initially, the plan was to rely only on Electron's HTML/CSS rendering and inject JavaScript code to handle events. However, this approach would mean our application was virtually "stateless", meaning no information given by the user could be stored in the current session of the application and used later. Every time that the application required, for example, a password to a cloud provider, it would need to re-prompt the user. We decided that adding the ability to save information given by the user in a single run of the application would greatly improve the application's usability, not only in this release but for potential future updates. Our solution was to find an Electron framework that included ReactJS. Using React, we could have variables that carried values across different pages, as opposed to using Electron alone, where values would be erased with each new page that was rendered.

Figure 5, shown below, describes the different components of our project.



Fig. 5: Overview of “Yet Another Backup Program”

Starting with the terminal option, the user can run the CLI program, which also takes command line arguments. Using the "commander" npm package, we can process those command line arguments to run different functions of the application. These functions are found in our application controller. The controller then reaches out to the node utilities in order to fulfil requests from the command line interface.
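A minimal sketch of this wiring, under assumed option names and an assumed runBackup signature, might look like the following; only the general pattern of parsing arguments and handing them to the controller is taken from our actual design.

```typescript
import { Command } from "commander";
import { runBackup } from "./controller"; // hypothetical export; signature is assumed

const program = new Command();

program
  .command("backup <directories...>") // one or more directories to back up
  .option("-x, --exclude <pattern>", "path or extension to exclude from the backup")
  .option("-s, --chunk-size <size>", "target chunk size, e.g. 50MB", "50MB")
  .option("-p, --password <password>", "backup job password")
  .option("-b, --bucket <name>", "S3 bucket to upload to")
  .action(async (directories: string[], options) => {
    // Hand the parsed values to the application controller, which drives the node utilities.
    await runBackup(
      directories,
      options.exclude ? [options.exclude] : [],
      options.chunkSize,
      options.password,
      options.bucket
    );
  });

program.parse(process.argv);
```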

For the GUI-based application, Electron will first open a native window that awaits styling to render. Electron looks for a starting route from ReactJS, which provides the Home page. From here, there are different options for the user to choose, each of which triggers JavaScript events. Depending on the events triggered, Electron runs different functions provided by the same back-end application controller and supported by the same node utilities. The data given by the user is also transferred to and processed by the back-end components. Once processed, it is fed back to either the CLI or the Electron/React front end. Any information that can be saved for convenience is kept until the end of the session. Sensitive information, like a personal password, is intentionally concealed or "scrambled" as soon as it is no longer needed.

Moving on to the back-end modules, this is where the bulk of our application's work is done. Our main application controller is the center of the application; it is detailed, along with the other modules, in Appendix B. The following is a description of the utility and provider modules that assist the application controller with our backup and restoration process.

The File System module consists of two classes, FileInfo and FileScanner. FileInfo

contains the metadata for individual files that the FileScanner has processed. The FileScanner

class holds the relevant data needed to process directories and exclusions given by the user, as

well as the functionality to sort through those directories and keep track of the files that are

waiting to be processed. Most of the work in the File System module is done within the

enumerateFiles function. This function takes in a root directory path and will sort files by size

and insert the FileInfo for each file into the sortedFiles property.
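An illustrative sketch of the FileScanner idea follows. The class and property names (FileInfo, enumerateFiles, sortedFiles) match the description above, but the exact fields, exclusion matching, and sort order are assumptions.

```typescript
import { promises as fs } from "fs";
import * as path from "path";

interface FileInfo {
  fullPath: string;
  sizeInBytes: number;
}

class FileScanner {
  sortedFiles: FileInfo[] = [];

  constructor(private exclusions: string[] = []) {}

  // Walk a root directory, skip excluded paths, and keep files ordered by size
  // so they can later be packed into chunks.
  async enumerateFiles(rootDir: string): Promise<void> {
    const entries = await fs.readdir(rootDir, { withFileTypes: true });
    for (const entry of entries) {
      const fullPath = path.join(rootDir, entry.name);
      if (this.exclusions.some((pattern) => fullPath.includes(pattern))) continue;
      if (entry.isDirectory()) {
        await this.enumerateFiles(fullPath); // recurse into subdirectories
      } else {
        const { size } = await fs.stat(fullPath);
        this.sortedFiles.push({ fullPath, sizeInBytes: size });
      }
    }
    this.sortedFiles.sort((a, b) => a.sizeInBytes - b.sizeInBytes);
  }
}
```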

The Chunk module contains the Chunk and ChunkRW classes. Each Chunk will eventually be used to contain a predetermined number of bytes of data. Chunks are processed as Buffers in memory until they are eventually written to files in cloud storage. Depending on the chunk size and individual file sizes, Chunks can contain multiple files until the chosen chunk size is reached. Each file in a chunk is assigned a file id, and each chunk is also assigned its own id. When data is passed into a Chunk, it is compressed by the AdmZip package. ChunkRW is the chunk handler. It holds the predetermined chunk size, generates the array of chunks when given a collection of file metadata, and reads or opens a chunk when given a Buffer.
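The sketch below shows roughly how several files could be packed into a single compressed, in-memory chunk with the adm-zip package and read back out. The real Chunk and ChunkRW classes also track file and chunk ids and a maximum chunk size, which are omitted here.

```typescript
import AdmZip from "adm-zip";
import { readFileSync } from "fs";

// Pack several files into one compressed, in-memory chunk (a Buffer).
function packChunk(filePaths: string[]): Buffer {
  const zip = new AdmZip();
  for (const filePath of filePaths) {
    zip.addFile(filePath, readFileSync(filePath)); // file path doubles as the entry name here
  }
  return zip.toBuffer(); // the chunk never needs to be written to local disk
}

// Read the files back out of a chunk Buffer, keyed by entry name.
function unpackChunk(chunk: Buffer): Map<string, Buffer> {
  const zip = new AdmZip(chunk);
  const files = new Map<string, Buffer>();
  for (const entry of zip.getEntries()) {
    files.set(entry.entryName, entry.getData());
  }
  return files;
}
```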

The cryptography for our application is handled by the Crypto module. It contains the

CryptoED and SensitiveString classes. The SensitiveString class is how our application handles

passwords. Instead of comparing strings, we are comparing bit arrays as a way of further

abstracting sensitive data. Passwords are obtained using getValue and destroyed using scramble.

The encryption and decryption are handled by CryptoED. For either process, it reads in a Buffer,

a YABP password, and an encryption level to use.
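A hedged sketch of the SensitiveString and encryption ideas is shown below, using Node's built-in crypto module with AES-256-GCM. The real CryptoED also accepts an encryption level, which is not modeled here; the output layout deliberately matches the decryption sketch from the design section.

```typescript
import { createCipheriv, randomBytes, scryptSync } from "crypto";

// Sketch of SensitiveString: the password lives as bytes and can be wiped.
class SensitiveString {
  private bytes: Buffer;

  constructor(value: string) {
    this.bytes = Buffer.from(value, "utf8");
  }

  getValue(): string {
    return this.bytes.toString("utf8");
  }

  scramble(): void {
    this.bytes.fill(0); // overwrite the password bytes once they are no longer needed
  }
}

// Sketch of the encryption side; the layout matches the tryDecrypt sketch above:
// [16-byte salt][12-byte IV][16-byte auth tag][ciphertext]
function encryptBuffer(data: Buffer, password: SensitiveString): Buffer {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const key = scryptSync(password.getValue(), salt, 32);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(data), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), ciphertext]);
}
```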

Manifests are read and written by the ManifestRW class, which is contained in the Manifest module. Records stored within manifest data are represented by the ManifestInfo

interface. Like the other modules, ManifestRW handles all manifest data in memory until it is

stored in the cloud. Importing manifest data as a Buffer creates a list of records that can then be

used for restoring files, while exporting turns the current list of file records into a Buffer that can

be encrypted and uploaded to a file in the cloud.
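A minimal sketch of how ManifestRW might round-trip records through a Buffer follows; the record fields and method names are assumptions based on the description above.

```typescript
// Assumed shape of a manifest record; the actual fields are illustrative.
interface ManifestInfo {
  fileId: number;
  chunkId: number;
  relativePath: string;
  sizeInBytes: number;
}

class ManifestRW {
  records: ManifestInfo[] = [];

  // Turn the current records into a Buffer that can be encrypted and uploaded.
  exportToBuffer(): Buffer {
    return Buffer.from(JSON.stringify(this.records), "utf8");
  }

  // Rebuild the record list from a Buffer downloaded (and decrypted) from the cloud.
  importFromBuffer(data: Buffer): void {
    this.records = JSON.parse(data.toString("utf8"));
  }
}
```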

Finally, communication with our cloud provider is handled by the S3 module. This module is an example of how our application would interact with just one cloud provider. In this case, we chose Amazon's S3 cloud storage. This module uses AWS's API to write to or read from a desired S3 bucket. Buckets contain only Objects; buckets are referenced by name and Objects by key. The write function takes a string and a Buffer in order to write the Buffer to an Object, with the string as the associated key. The read function only requires the key for a desired Object, which it will return. We can obtain the Buffer data through the Body property of that object.
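The sketch below illustrates the read and write idea with the AWS SDK for JavaScript (v2). The credential fields, region, and key names are placeholders rather than our actual S3 module.

```typescript
import S3 from "aws-sdk/clients/s3";

// Credentials and region are placeholders; the real module receives them from the user.
const s3 = new S3({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: "us-west-1",
});

// Write one chunk or manifest Buffer to an Object under the given key.
async function writeObject(bucket: string, key: string, body: Buffer): Promise<void> {
  await s3.putObject({ Bucket: bucket, Key: key, Body: body }).promise();
}

// Read an Object back; the data is returned through its Body property.
async function readObject(bucket: string, key: string): Promise<Buffer> {
  const result = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  return result.Body as Buffer;
}
```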

These modules all work in tandem to back up or restore a local directory given by the user, following processes defined by the application controller. The controller imports most of the other node modules, since it needs to use almost every Object we define in our back-end tools. From outside of our project, it imports the 'bytes' package during the backup process to parse our chunk size. The list of exports comprises the primary functions that our controller makes available to the front-end components.

When the user requests to back up one or more directories, runBackup is called in the application controller. For this function, we take in the data we need from the user. This includes, in the order of the UML in Appendix B: the relative paths of the directories we should back up, a path or extension for files we should exclude from the backup, the desired size of each chunk file, our backup job password, the level of encryption that should be used, and the cloud provider credentials. The controller then encapsulates this information within the other modules, starting the backup procedure.

Similarly, when the user requests to restore files from their cloud storage provider, the application controller will attempt to run the restore procedure. The following information is required from the user to perform a restore: a directory to restore files to, the files to restore, the YABP password that was used to back up the original files, the encryption level used in the original backup, and the cloud storage credentials. Neither the backup nor the restore procedure requires any further input from the user once the proper information is given. Once these functions are attempted and the information is validated, the backup or restore will occur, writing the new files locally or in the cloud. The program reports whether the process was successful.

Finally, the controller can fulfil a request to view the current contents of a storage

location. All that is required for this function is the local and cloud credentials associated with a

backup and/or, in the case of AWS S3, a bucket.

Returning to the backup and restoration process, the controller is where those processes are defined. The backup and restoration processes were designed to inversely mirror each other; the backup process should essentially follow the same steps as restoration, but in reverse. Also, if any step of either process fails, the process will not run, ensuring that no files are partially affected or modified. Failures can occur if incorrect credentials are provided, invalid file paths are provided, or incorrect encryption information is provided (this includes the YABP password and encryption strength).

The controller begins the backup procedure by attempting to create a handler for each module. This step allows us to verify some of the information given by the user, including making a connection to the cloud provider using the provided credentials and verifying that the file paths provided lead to valid files and directories on the local machine. A connection to S3 is made using the provided credentials and is stored in an S3 object that can be used throughout this process. The given file paths are used by the File Scanner module to obtain metadata about the scanned files and to record which files we are excluding from these directories. From here, we can use this file information to stream these files into distinct Buffer streams, or chunks. This is done by our ChunkRW objects from the Chunk module. The size of each individual abstracted chunk file was obtained from the user earlier and parsed by the 'bytes' package. During this chunking process, the data is also compressed with the 'AdmZip' package used in the Chunk module.

Next, we build a manifest file using information obtained from the File Scanner and Chunk modules. This manifest contains the information needed to correctly restore the files we are backing up. The manifest is then encrypted using the Crypto module. Once encryption is finished, it is the first file written to S3. The manifest is encrypted and written to S3 separately, since we would not otherwise be able to find it within the abstracted files. At this point, we have verified we have all the information and data we need to properly back up our files. The controller then iterates through our in-memory chunks. Each chunk is encrypted using the encryption level and encryption password provided. The encrypted data is saved to a new Buffer, which is then immediately written to a file in cloud storage. Once the controller has iterated through all chunks, the backup process is complete and the user is notified.
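A condensed, hypothetical view of how runBackup could wire these modules together is sketched below, reusing the helpers sketched earlier in this section (FileScanner, ManifestRW, encryptBuffer, writeObject, SensitiveString). The buildChunks helper is declared only as a placeholder, and the parameter list is a simplification of the one documented in Appendix B.

```typescript
// Placeholder for the chunk-packing step performed by ChunkRW.
declare function buildChunks(files: FileInfo[], chunkSize: string, manifest: ManifestRW): Buffer[];

async function runBackup(
  directories: string[],
  exclusions: string[],
  chunkSize: string, // e.g. "50MB"; the real controller parses this with the 'bytes' package
  password: string,
  bucket: string
): Promise<void> {
  const secret = new SensitiveString(password);

  // Scan the requested directories, skipping exclusions.
  const scanner = new FileScanner(exclusions);
  for (const dir of directories) {
    await scanner.enumerateFiles(dir);
  }

  // Pack files into in-memory chunks and record which file went into which chunk.
  const manifest = new ManifestRW();
  const chunks = buildChunks(scanner.sortedFiles, chunkSize, manifest);

  // The manifest is encrypted and uploaded first, under a fixed name, since the
  // chunk files themselves are deliberately nondescript.
  await writeObject(bucket, "manifest", encryptBuffer(manifest.exportToBuffer(), secret));

  // Each chunk is encrypted in memory and written straight to cloud storage.
  for (const [chunkId, chunk] of chunks.entries()) {
    await writeObject(bucket, `chunk-${chunkId}`, encryptBuffer(chunk, secret));
  }

  secret.scramble(); // wipe the password bytes once they are no longer needed
}
```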

As stated above, the intention was for the restoration process to be an inverse of the backup process. Looking at the runRestore function in our controller, we begin by again making a connection to our cloud provider. If that is successful, we download the manifest file located in cloud storage. The file was statically named in the backup procedure, so we can find it by filename. Once downloaded, we parse the manifest data as a Buffer and decrypt it using the provided backup password and encryption level. If either the password or the encryption level is incorrect, the data simply will not be decrypted, and an error is thrown at this step. The manifest data is then imported into a new ManifestRW, which is able to parse the data into separate records. These records allow us to determine which file is in each of the chunks still stored with our cloud provider. By iterating through the records in our manifest and matching the filenames with our user's desired restoration files, we can determine exactly which chunks to restore by both file id and chunk id. The controller iterates through each chunk. If a chunk is successfully read, it is then decrypted. The chunk handler then opens the chunk in order to iterate through each individual file it contains. Each file is then extracted from that chunk. In the extraction process, the file is uncompressed and written to the restore directory provided.
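The inverse flow can be sketched in the same spirit, again reusing earlier sketches (readObject, tryDecrypt, ManifestRW, unpackChunk). Record matching and error handling are simplified, and all names remain illustrative.

```typescript
import { promises as fs } from "fs";
import * as path from "path";

async function runRestore(
  restoreDir: string,
  filesToRestore: string[],
  password: string,
  bucket: string
): Promise<void> {
  // Download and decrypt the manifest first; failure here means a wrong
  // password (or encryption level) and nothing else is touched.
  const manifestBlob = tryDecrypt(await readObject(bucket, "manifest"), password);
  if (manifestBlob === null) throw new Error("Incorrect password or encryption level");

  const manifest = new ManifestRW();
  manifest.importFromBuffer(manifestBlob);

  // Work out which chunks actually hold the requested files.
  const wanted = manifest.records.filter((record) => filesToRestore.includes(record.relativePath));
  const chunkIds = Array.from(new Set(wanted.map((record) => record.chunkId)));

  for (const chunkId of chunkIds) {
    const chunk = tryDecrypt(await readObject(bucket, `chunk-${chunkId}`), password);
    if (chunk === null) throw new Error(`Chunk ${chunkId} failed to decrypt`);

    // Extract only the requested files; extraction uncompresses each entry.
    for (const [name, data] of unpackChunk(chunk)) {
      if (filesToRestore.includes(name)) {
        await fs.writeFile(path.join(restoreDir, path.basename(name)), data);
      }
    }
  }
}
```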

The restore procedure is identical to the procedure needed to view the contents of an S3

bucket, up to a certain step. To view the contents of a bucket, the controller needs to make an S3

connection and download the manifest file associated with the given information. If the manifest

file is correctly decrypted, the controller simply needs to iterate through those records to provide

the contents stored in that S3 bucket that are associated with a specific backup job.

CONCLUSION

At the start of this project, our team saw a problem regarding large amounts of unencrypted personal data stored on remote servers protected only by a username and password combination. We then attempted to address this issue by delivering a simple-to-use, locally run application that gives users the option to easily add an additional layer of security to a cloud storage solution that is already in use. We hope that what we have completed in this project's span provides a proof of concept that this tool can be applied to additional storage solutions.

In this report, our team has described the details of our proposed solution, including what our project would need to be considered a success, both as a replacement for current workarounds to the observed problem and as a foundation for a tool that could be improved upon in the future. We discussed how an easy-to-use product like our proposed solution has a viable audience, as access to cloud storage has grown without a comparable investment in protecting the data stored there. This was followed by the ethical, legal, and logistical concerns involved in taking this project on.

Next, we included an update once our project was completed. We included the timeline of completion, along with the feedback from users and the final implementation of the application as it functions at the time of this report. We are pleased to report that we succeeded in our proposed goals and have a fully functioning application that meets all of our team's requirements. Our team was able to implement a design that uses back-end JavaScript libraries to compress, abstract, encrypt, and upload a desired backup job to a cloud storage provider. The design can also perform the inverse of those steps to provide a reliable restoration of the backed-up files.

Ultimately, we hope to have shown that, given this approach and collaboration style, a team with a larger size and budget could expand this prototype into a tool suitable for use by a widespread audience.

REFERENCES

Cartwright, E., Hernandez Castro, J., & Cartwright, A. (2019). To pay or not: game theoretic models of ransomware. Journal of Cybersecurity, 5(1), tyz009.

Henziger, E., & Carlsson, N. (2019). Delta encoding overhead analysis of cloud storage systems using client-side encryption. Proc. IEEE CloudCom.

Microsoft. (2020). Service health and continuity. https://docs.microsoft.com/en-us/office365/servicedescriptions/office-365-platform-service-description/service-health-and-continuity

Zhang, X., Tang, Y., Wang, H., Xu, C., Miao, Y., & Cheng, H. (2019). Lattice-based proxy-oriented identity-based encryption with keyword search for cloud storage. Information Sciences, 494, 193-207.



APPENDIX A

The System Usability Scale is a short, standardized survey used to quickly and efficiently have test users gauge the usability of a product. It is typically presented as shown here (from https://measuringu.com/sus/):

The System Usability Scale

The SUS is a 10 item questionnaire with 5 response options.

1. I think that I would like to use this system frequently.

2. I found the system unnecessarily complex.

3. I thought the system was easy to use.

4. I think that I would need the support of a technical person to be able to use this system.

5. I found the various functions in this system were well integrated.

6. I thought there was too much inconsistency in this system.

7. I would imagine that most people would learn to use this system very quickly.

8. I found the system very cumbersome to use.

9. I felt very confident using the system.

10. I needed to learn a lot of things before I could get going with this system.

APPENDIX B
[UML diagrams of the application controller and supporting modules]
