
Time-Travelling File System Assignment

Uploaded by Asmit Karmakar
Copyright © All Rights Reserved

COL106: Data Structures and Algorithms Assignment: Time-Travelling File System

Long Assignment 1:
Time-Travelling File System

1 Introduction
In this assignment, you will implement a simplified, in-memory version control system inspired by Git.
Your system will manage versioned files with support for branching and historical inspection. The
primary goal is to apply your understanding of Trees, HashMaps, and Heaps to a complex, practical
application.

2 System Architecture
Your file system will manage a collection of files, each with its own version history represented as a tree.
The system must utilize the following core data structures:
• Tree: To maintain the version history of each file. A node in the tree represents a specific version.
• HashMap: To provide fast, O(1) average-time lookups of versions by their unique ID.
• Heaps: To efficiently track system-wide file metrics, such as the most recently or frequently edited
files.
Note: You must implement the above data structures (along with the operations needed for this
project) yourself, from scratch; i.e., you are not allowed to use C++ libraries that already implement
these data structures.

File and Version Data Models


Each file object in your system will contain the following members:

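// File Structure
struct TreeNode;  // defined in the Version structure below

struct File {
    TreeNode* root;                   // Your implementation of the tree
    TreeNode* active_version;         // version that the file commands act on
    map<int, TreeNode*> version_map;  // Your implementation of the HashMap (version_id -> node)
    int total_versions;               // equals the next version_id to assign
};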

Each version (a node in the tree) must store the following information:

// Version (TreeNode) Structure
struct TreeNode {
    int version_id;
    string content;
    string message;               // Empty if not a snapshot
    time_t created_timestamp;
    time_t snapshot_timestamp;    // time_t cannot be null; use a sentinel (e.g. 0) if not a snapshot
    TreeNode* parent;
    vector<TreeNode*> children;
};


3 Command Reference
Your program must read and execute a series of commands from stdin.

3.1 Core File Operations


CREATE <filename> Creates a file with a root version (ID 0), empty content, and an initial snapshot
message.

READ <filename> Displays the content of the file’s currently active version.

INSERT <filename> <content> Appends content to the file. This creates a new version if the active
version is already a snapshot; otherwise, it modifies the active version in place.

UPDATE <filename> <content> Replaces the file’s content. Follows the same versioning logic as
INSERT.

SNAPSHOT <filename> <message> Marks the active version as a snapshot, making its content immutable.
It stores the provided message and the current time.

ROLLBACK <filename> [versionID] Sets the active version pointer to the specified versionID. If
no ID is provided, it rolls back to the parent of the current active version.

HISTORY <filename> Lists all snapshotted versions of the file chronologically, showing their ID,
timestamp, and message.
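The versioning rules for CREATE, INSERT, UPDATE, SNAPSHOT and ROLLBACK can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not a reference implementation: new_version, create_file, edit, snapshot, rollback and is_snapshot are hypothetical names; a snapshot_timestamp of 0 is assumed to mean "not a snapshot"; the initial snapshot message is a placeholder; and the versions vector (indexed by version_id) only stands in for the custom HashMap the assignment requires. Nodes are never freed, to keep the sketch short.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Version node, following the TreeNode structure in the spec.
struct TreeNode {
    int version_id;
    std::string content;
    std::string message;             // empty if not a snapshot
    std::time_t created_timestamp;
    std::time_t snapshot_timestamp;  // assumption: 0 means "not a snapshot"
    TreeNode* parent;
    std::vector<TreeNode*> children;
};

// Hypothetical File; versions[id] stands in for the custom HashMap.
struct File {
    TreeNode* root = nullptr;
    TreeNode* active_version = nullptr;
    std::vector<TreeNode*> versions;
    int total_versions = 0;
};

bool is_snapshot(const TreeNode* v) { return v->snapshot_timestamp != 0; }

// Hypothetical helper: creates the next version as a child of parent (nullptr for the root).
TreeNode* new_version(File& f, TreeNode* parent, const std::string& content) {
    TreeNode* v = new TreeNode{f.total_versions, content, "", std::time(nullptr), 0, parent, {}};
    if (parent != nullptr) parent->children.push_back(v);
    f.versions.push_back(v);  // IDs are sequential, so versions[id] == v
    ++f.total_versions;
    return v;
}

// CREATE: root version 0 with empty content, already a snapshot.
File create_file() {
    File f;
    f.root = new_version(f, nullptr, "");
    f.root->message = "Initial snapshot";  // placeholder message text
    f.root->snapshot_timestamp = std::time(nullptr);
    f.active_version = f.root;
    return f;
}

// INSERT (append == true) and UPDATE (append == false) share this logic.
void edit(File& f, const std::string& text, bool append) {
    TreeNode* v = f.active_version;
    if (is_snapshot(v)) {
        // Snapshots are immutable: branch a new child that starts from their content.
        v = new_version(f, v, v->content);
        f.active_version = v;
    }
    v->content = append ? v->content + text : text;
}

// SNAPSHOT: freeze the active version (re-snapshotting is ignored in this sketch).
void snapshot(File& f, const std::string& message) {
    TreeNode* v = f.active_version;
    if (is_snapshot(v)) return;
    v->message = message;
    v->snapshot_timestamp = std::time(nullptr);
}

// ROLLBACK: id < 0 stands for "no ID given" and moves to the parent.
// Returns false (and changes nothing) if the target does not exist.
bool rollback(File& f, int id = -1) {
    if (id < 0) {
        if (f.active_version->parent == nullptr) return false;
        f.active_version = f.active_version->parent;
        return true;
    }
    if (id >= f.total_versions) return false;
    f.active_version = f.versions[id];
    return true;
}
```

For example, CREATE followed by INSERT produces version 1 as a child of the root snapshot, and a further INSERT before the next SNAPSHOT edits version 1 in place. Because INSERT and UPDATE share one code path and differ only in the append flag, the copy-on-write rule for snapshots lives in one place.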

3.2 System-Wide Analytics


RECENT FILES Lists files in descending order of their last modification time.

BIGGEST TREES Lists files in descending order of their total version count.
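For RECENT FILES and BIGGEST TREES, one option is a binary max-heap, written from scratch, over (metric, filename) pairs. In this sketch, Entry, MaxHeap and top_k are hypothetical names; the metric would be the last modification time for RECENT FILES and total_versions for BIGGEST TREES, and k is an illustrative parameter (pass the number of files to list them all). Ties are broken by filename, an arbitrary choice made here so the output is deterministic.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ranked item: the metric (last modification time or
// total version count) and the file it belongs to.
struct Entry {
    long long key;
    std::string filename;
};

// Array-based binary max-heap written from scratch.
class MaxHeap {
public:
    bool empty() const { return data_.empty(); }

    void push(const Entry& e) {
        data_.push_back(e);
        std::size_t i = data_.size() - 1;
        while (i > 0) {  // sift up while the parent ranks lower
            std::size_t p = (i - 1) / 2;
            if (!ranks_lower(data_[p], data_[i])) break;
            std::swap(data_[p], data_[i]);
            i = p;
        }
    }

    // Removes and returns the highest-ranked entry; requires !empty().
    Entry pop() {
        Entry top = data_[0];
        data_[0] = data_.back();
        data_.pop_back();
        std::size_t i = 0;
        const std::size_t n = data_.size();
        while (true) {  // sift down toward the higher-ranked child
            std::size_t l = 2 * i + 1, r = l + 1, best = i;
            if (l < n && ranks_lower(data_[best], data_[l])) best = l;
            if (r < n && ranks_lower(data_[best], data_[r])) best = r;
            if (best == i) break;
            std::swap(data_[i], data_[best]);
            i = best;
        }
        return top;
    }

private:
    // Higher key ranks higher; on equal keys the alphabetically smaller name wins.
    static bool ranks_lower(const Entry& a, const Entry& b) {
        if (a.key != b.key) return a.key < b.key;
        return a.filename > b.filename;
    }
    std::vector<Entry> data_;
};

// Hypothetical query helper: up to k filenames in descending order of key.
std::vector<std::string> top_k(const std::vector<Entry>& files, std::size_t k) {
    MaxHeap heap;
    for (const Entry& e : files) heap.push(e);
    std::vector<std::string> out;
    while (!heap.empty() && out.size() < k) out.push_back(heap.pop().filename);
    return out;
}
```

Rebuilding the heap from the current metrics at query time costs O(n log n) in the worst case and never sees stale keys; keeping one long-lived heap instead would require updating a file's key in place whenever it is edited.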

4 Key Semantics
• Immutability: Only snapshotted versions are immutable. Non-snapshotted versions can be edited in
place.
• Versioning: Version IDs are unique per file and assigned sequentially, starting from 0.
You must handle incorrect or inconsistent input as you deem appropriate, and the README must describe
how you handle it.

5 Submission
This is an individual assignment. You must submit a compressed file (.zip/.rar) containing your project
code (.cpp, .hpp files, if any) along with a working shell script to compile your code. You must also add
a README containing instructions on how to run your code and use the different commands. Note that
the user must be able to enter commands at runtime, i.e., from stdin, and the commands must follow
the given syntax.
The deadline for the submission is September 11, 23:59 IST (Thursday). The submission will be on
moodlenew.


6 Evaluation
The evaluation for the project will be based on a Viva (dates to be announced later), which will include,
but is not limited to, questions about your code, checking the output for specific sequences of commands,
and the quality of the code.

Common questions


System-wide analytics commands such as RECENT FILES and BIGGEST TREES give users an overview of activity across all files. RECENT FILES lists files by their last modification time, so the files that changed most recently, and may need review, appear first. BIGGEST TREES lists files by their total version count, which indicates how much editing and branching each file has undergone. Both commands can use Heaps to retrieve the highest-ranked files efficiently, giving visibility into how the system is being used and helping users decide which files to manage.

The Tree data structure is central to maintaining a file's version history because it organizes versions hierarchically. Each node represents a version and holds pointers to its parent and children, so the parent-child links record how each state of the file was derived from an earlier one. Rolling back moves the active pointer to another node, and editing a snapshotted version then adds a new child to it; this is how branching arises, since one version can have several children. The structure supports efficient traversal of the version timeline, branching, and retrieval of earlier states. A tree cannot represent merges, in which a version has two parents, so this system supports branching but not merging.
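A pre-order traversal makes this branching structure visible. The sketch below uses a cut-down, hypothetical Node type with only an ID and the parent/children links; add_child and print_tree are illustrative helpers, not part of the spec. Each version is printed before its children and indented by depth, so a branch created after a rollback appears as a sibling.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical cut-down version node: just the ID and the tree links.
struct Node {
    int version_id;
    Node* parent;
    std::vector<Node*> children;
};

// Hypothetical helper: attaches a new version under parent (nullptr creates a root).
Node* add_child(Node* parent, int id) {
    Node* n = new Node{id, parent, {}};
    if (parent != nullptr) parent->children.push_back(n);
    return n;
}

// Pre-order traversal: a version is printed before its children,
// indented two spaces per level of depth.
void print_tree(const Node* n, int depth, std::ostream& out) {
    out << std::string(2 * depth, ' ') << n->version_id << '\n';
    for (const Node* child : n->children) print_tree(child, depth + 1, out);
}
```

For a root 0 with child 1 (which has child 2) and a later branch 3 created after rolling back to 0, the output lists 3 at the same indentation as 1.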

INSERT and UPDATE follow the same versioning logic and differ only in how they change the content: INSERT appends to the existing content, whereas UPDATE replaces it entirely. For both commands, if the active version is a snapshot, a new version is created as a child of it and the change is applied there; otherwise, the non-snapshotted active version is modified in place. Both commands therefore respect the immutability of snapshotted versions.

Version IDs are unique per file and assigned sequentially, starting from 0. Because each new version receives the next ID, ID order matches creation order, which makes versions easy to reference and to compare by age, and keeps the temporal order of changes explicit. ID order does not, however, reflect ancestry: after a rollback and a new edit, a higher-numbered version may be a sibling of a lower-numbered one rather than its descendant, so lineage must be determined by following parent pointers.
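Sequential IDs record creation order, but not ancestry, and the difference can be checked by walking parent pointers. In this sketch, Node, is_ancestor and path_from_root are hypothetical names; each walk costs time proportional to the depth of the version. In the usage below, version 2 has a higher ID than version 1 but is not its descendant.

```cpp
#include <vector>

// Hypothetical cut-down version node with only the parent link.
struct Node {
    int version_id;
    Node* parent;
};

// True if a lies on the path from v to the root (a version counts as its own ancestor).
bool is_ancestor(const Node* a, const Node* v) {
    for (const Node* p = v; p != nullptr; p = p->parent)
        if (p == a) return true;
    return false;
}

// Version IDs on the path from the root down to v.
std::vector<int> path_from_root(const Node* v) {
    std::vector<int> ids;
    for (const Node* p = v; p != nullptr; p = p->parent) ids.push_back(p->version_id);
    return std::vector<int>(ids.rbegin(), ids.rend());
}
```

With v0 as the root and both v1 and v2 created as children of v0, is_ancestor(&v1, &v2) is false even though 1 < 2, and path_from_root(&v2) is {0, 2}.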

The SNAPSHOT command marks the file's active version as immutable, recording a stable state whose content can no longer be altered and thus ensuring data integrity over time. The snapshot stores a message and a timestamp, which give the saved state context and serve as a historical marker for auditing and rollback. In effect, SNAPSHOT preserves chosen versions as milestones in the file's version history.
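Snapshot messages and timestamps are what a HISTORY listing reads. The sketch below takes one reading of the command reference, listing every snapshotted version of a file in order of snapshot time. Version and history are hypothetical names, a snapshot_timestamp of 0 is assumed to mean "not a snapshot", and the input vector stands in for iterating over the file's version map. Sorting uses std::sort for brevity; check whether your course permits it.

```cpp
#include <algorithm>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record with just the fields HISTORY needs.
struct Version {
    int version_id;
    std::string message;
    std::time_t snapshot_timestamp;  // assumption: 0 means "not a snapshot"
};

// Every snapshotted version, oldest snapshot first.
std::vector<Version> history(const std::vector<Version>& all) {
    std::vector<Version> snaps;
    for (const Version& v : all)
        if (v.snapshot_timestamp != 0) snaps.push_back(v);
    // std::time typically has one-second resolution, so equal timestamps
    // are ordered by version_id.
    std::sort(snaps.begin(), snaps.end(), [](const Version& a, const Version& b) {
        if (a.snapshot_timestamp != b.snapshot_timestamp)
            return a.snapshot_timestamp < b.snapshot_timestamp;
        return a.version_id < b.version_id;
    });
    return snaps;
}
```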

Implementing the core data structures from scratch deepens understanding of their internal workings, their complexities, and possible optimizations. It also shows how the structures interact within a larger system, allows them to be tailored to this specific application, and builds problem-solving skills through the implementation challenges involved. This reflects the assignment's pedagogical goal: students should be well versed in the algorithmic foundations and able to go beyond standard library usage.

Handling incorrect or inconsistent input is crucial for the robustness of the system. Invalid commands must not disrupt the file system's state, which could otherwise lead to data loss or corruption. The system should validate each command before executing it, reject malformed input without changing any state, and print an informative error message that guides correct usage; typical cases include unknown commands, missing arguments, nonexistent files, and invalid version IDs. Such events may optionally also be logged for later analysis. Proper handling substantially improves both user experience and system stability.
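A validation step can run before any command touches file state. The sketch below shows one possible policy, not the required one: Command and parse_command are hypothetical names, the accepted command names and argument counts are taken from the command reference, SNAPSHOT is assumed to require a non-empty message, and ROLLBACK's optional argument must parse as a single non-negative integer. Checks that need the file system itself, such as whether the file or version ID exists, are left to the command handlers.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed command.
struct Command {
    std::string name;      // e.g. "INSERT" or "RECENT FILES"
    std::string filename;  // empty for the analytics commands
    std::string rest;      // content, message, or version ID (may be empty)
};

// Validates one input line without touching any file state.
// On failure, returns false and stores a human-readable reason in err.
bool parse_command(const std::string& line, Command& cmd, std::string& err) {
    std::istringstream in(line);
    cmd = Command{};
    if (!(in >> cmd.name)) { err = "empty command"; return false; }

    // Two-word analytics commands, spelled as in the command reference.
    if (cmd.name == "RECENT" || cmd.name == "BIGGEST") {
        std::string second;
        in >> second;
        const std::string expected = (cmd.name == "RECENT") ? "FILES" : "TREES";
        if (second != expected) { err = "did you mean " + cmd.name + " " + expected + "?"; return false; }
        cmd.name += " " + second;
        return true;
    }

    static const std::vector<std::string> known = {
        "CREATE", "READ", "INSERT", "UPDATE", "SNAPSHOT", "ROLLBACK", "HISTORY"};
    bool is_known = false;
    for (const std::string& k : known) is_known = is_known || k == cmd.name;
    if (!is_known) { err = "unknown command: " + cmd.name; return false; }

    if (!(in >> cmd.filename)) { err = cmd.name + " needs a filename"; return false; }
    std::getline(in >> std::ws, cmd.rest);  // everything after the filename

    const bool needs_text = cmd.name == "INSERT" || cmd.name == "UPDATE" || cmd.name == "SNAPSHOT";
    const bool takes_none = cmd.name == "CREATE" || cmd.name == "READ" || cmd.name == "HISTORY";
    if (needs_text && cmd.rest.empty()) { err = cmd.name + " needs more arguments"; return false; }
    if (takes_none && !cmd.rest.empty()) { err = cmd.name + " takes only a filename"; return false; }

    if (cmd.name == "ROLLBACK" && !cmd.rest.empty()) {
        std::istringstream id_in(cmd.rest);
        long long id = -1;
        std::string extra;
        if (!(id_in >> id) || id < 0 || (id_in >> extra)) {
            err = "version ID must be a single non-negative integer";
            return false;
        }
    }
    return true;
}
```

In a main loop, each line read with std::getline(std::cin, line) would go through parse_command first, and err would be printed instead of executing anything when it returns false.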

The system ensures immutability by marking versions as snapshots. A version becomes immutable once it is snapshotted, which is recorded by storing the snapshot message and setting snapshot_timestamp to the current time. From then on, the version's content can no longer be modified; an INSERT or UPDATE on it creates a new child version instead, preserving the snapshot's integrity. Non-snapshotted versions remain mutable and can be edited in place.

The primary data structures used in the time-travelling file system are Trees, HashMaps, and Heaps. Trees maintain the version history of each file, with each node representing a specific version, which enables efficient traversal and management of file versions. HashMaps provide O(1) average-time lookups of versions by their unique ID, which is crucial for quick navigation and retrieval, for example when rolling back to a given ID. Heaps track system-wide file metrics, such as the most recently or frequently edited files, and give efficient access to the top-ranked entries. Together, these structures let the system handle versioning, history retrieval, and system analytics efficiently.

O(1) average-time lookups in the HashMap depend on how collisions are resolved and on how full the table is allowed to become. With open addressing, collisions are resolved within the bucket array itself by probing for another free slot, which avoids per-entry link pointers. With separate chaining, each bucket holds a linked list of the entries that hash to it. In either case, the hash function should spread keys evenly across buckets, and the table should be resized (rehashed into a larger array) whenever the load factor exceeds a chosen threshold. Growing the table by a constant factor keeps the amortized cost of insertion O(1), so operations stay efficient as the number of versions grows.
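As a sketch of the chaining strategy, here is a small hash map from int keys (version IDs) to values, written without library hash containers. IntHashMap, put and get are hypothetical names; the initial 8 buckets, the 0.75 load-factor threshold, and doubling on resize are illustrative choices, and the hash is plain modulo, which spreads sequential IDs evenly. In the file system, V would be TreeNode*.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical separate-chaining hash map from int keys (e.g. version IDs) to values of type V.
template <typename V>
class IntHashMap {
public:
    IntHashMap() : buckets_(8, nullptr) {}
    ~IntHashMap() {
        for (Entry* head : buckets_) {
            while (head != nullptr) {
                Entry* next = head->next;
                delete head;
                head = next;
            }
        }
    }
    IntHashMap(const IntHashMap&) = delete;             // copying is omitted in this sketch
    IntHashMap& operator=(const IntHashMap&) = delete;

    // Inserts or overwrites; grows the table when the load factor exceeds 0.75.
    void put(int key, const V& value) {
        std::size_t i = bucket_of(key, buckets_.size());
        for (Entry* e = buckets_[i]; e != nullptr; e = e->next) {
            if (e->key == key) { e->value = value; return; }
        }
        buckets_[i] = new Entry{key, value, buckets_[i]};  // push onto the chain's head
        ++size_;
        if (size_ * 4 > buckets_.size() * 3) rehash(buckets_.size() * 2);
    }

    // Pointer to the stored value, or nullptr if the key is absent.
    V* get(int key) {
        for (Entry* e = buckets_[bucket_of(key, buckets_.size())]; e != nullptr; e = e->next) {
            if (e->key == key) return &e->value;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }

private:
    struct Entry {
        int key;
        V value;
        Entry* next;
    };

    // Plain modulo; the unsigned cast keeps the index non-negative.
    static std::size_t bucket_of(int key, std::size_t bucket_count) {
        return static_cast<std::size_t>(static_cast<unsigned int>(key)) % bucket_count;
    }

    // Moves every entry into a larger bucket array.
    void rehash(std::size_t new_count) {
        std::vector<Entry*> fresh(new_count, nullptr);
        for (Entry* head : buckets_) {
            while (head != nullptr) {
                Entry* next = head->next;
                std::size_t i = bucket_of(head->key, new_count);
                head->next = fresh[i];
                fresh[i] = head;
                head = next;
            }
        }
        buckets_.swap(fresh);
    }

    std::vector<Entry*> buckets_;
    std::size_t size_ = 0;
};
```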
