Computer Science Fundamentals
What are the three main strategies to translate source code to machine language?
● Ahead-of-Time (AOT) Compilation: Source code is translated to machine code before run-time.
○ Usually has better performance, since more optimizations can be performed and the code is already translated by the time it runs.
● Interpreted: Code is translated on the fly, line-by-line during execution.
○ Advantages: easier to implement (writing compilers is hard), no compilation stage, and only the code that is actually executed gets translated.
● Just-In-Time (JIT) Compilation: Compilation happens during runtime, rather than before execution. Combines the speedup of AOT compilation with the flexibility of interpretation: code is initially interpreted, but commonly executed paths get compiled. This trade-off is continually, dynamically analyzed by the JIT compiler. Unlike AOT, it has access to dynamic runtime info.
Ideally, the translation strategy is abstracted from the language itself:
● Java supports both AOT and JIT, depending on the compiler used
● Python can be either interpreted or compiled (into C, via Cython)
What are the 3 ways that a programming language can be typed?
● Statically typed: types are checked at compile time, and variables have declared (or inferred) types.
● Dynamically typed: types are checked at runtime; a variable can be rebound to values of different types.
● Duck typed: an object's suitability is determined by the presence of the methods/attributes it needs ("if it walks like a duck..."), rather than by its declared type.
Describe the following programming paradigms: declarative, functional, imperative/procedural, and object oriented.
● Declarative: Expresses the logic of a computation without describing its control flow. Order doesn't matter. Examples are SQL and HTML.
● Functional: A style of programming which treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. This style is good for parallelism and recursion. Gets its origin from Church numerals, the lambda calculus, and mathematical logic. An example is Haskell.
● Imperative/Procedural: Uses statements that change a program's state, describing how the program's flow operates. Examples are C, C++, and Java. Gets its origin from Turing machines.
● Object Oriented: Based on object classes (factories) and object instances.
State the (standard) compilation, typing, and paradigm of the following languages: C, Java, Python, Haskell.
C:
● AOT compiled. Imperative/procedural, with static typing.
Java:
● JIT compiled, but can also support AOT. Imperative, with static typing.
Python:
● Interpreted. Supports functional, OO, and imperative paradigms; however, most Python is written in an imperative style. Dynamic and duck typing, with no advance type declaration.
● These properties (dynamic typing, interpreted) make Python useful for prototyping, but more difficult to use in big projects.
○ Compiling & type checking find a lot of bugs that linting alone does not.
● On "Functional Python": you can write functions which are "pure", i.e. which don't change state (don't mutate the input or data that exists outside the function's scope). You can also use immutable data structures (like tuples), and higher order functions which take a function as an argument. However, Python is not the best for purely functional programming, since side effects can occur if one is not careful.
● On "OO Python": you have things like classes, polymorphism, etc. in Python. But Python doesn't enforce encapsulation (there are no truly private members), which many believe is a primary requirement for OO.
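To illustrate the functional style above, a minimal sketch (the function and variable names are my own):

from functools import reduce

# A "pure" function: no mutation of its input, no side effects.
def scaled(nums: tuple, factor: float) -> tuple:
    # Returns a new immutable tuple instead of mutating nums in place.
    return tuple(n * factor for n in nums)

# Higher-order functions: map/filter/reduce take functions as arguments.
data = (1, 2, 3, 4)
evens_doubled = tuple(map(lambda n: n * 2, filter(lambda n: n % 2 == 0, data)))
total = reduce(lambda acc, n: acc + n, data, 0)

print(scaled(data, 0.5))   # (0.5, 1.0, 1.5, 2.0)
print(evens_doubled)       # (4, 8)
print(total)               # 10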
Haskell:
● AOT compiled (via GHC), though it can also be interpreted (GHCi). Purely functional, with static typing and type inference.
What are the four basic principles of OOP?
● Encapsulation: Private & public methods; an object's internal state is hidden behind its interface.
● Abstraction: You don't need to know the inner details of a class to use it.
● Inheritance: Lets subclasses reuse code from parent classes.
● Polymorphism: Ability of one function/interface to behave in different ways depending on the object.
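A minimal sketch of all four principles in Python (the Shape example is my own):

from abc import ABC, abstractmethod

class Shape(ABC):  # Abstraction: callers use area() without knowing internals
    @abstractmethod
    def area(self) -> float: ...

class Rectangle(Shape):  # Inheritance: reuses Shape's interface
    def __init__(self, w: float, h: float):
        self._w, self._h = w, h  # Encapsulation (by convention, "_" = private)

    def area(self) -> float:
        return self._w * self._h

class Circle(Shape):
    def __init__(self, r: float):
        self._r = r

    def area(self) -> float:
        return 3.14159 * self._r ** 2

# Polymorphism: the same call behaves differently per object
for shape in [Rectangle(2, 3), Circle(1)]:
    print(shape.area())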
What are the SOLID principles in OOD?
● Single-responsibility principle: Every class should have only one responsibility.
● Open–closed principle: Software entities should be open for extension, but closed for modification.
● Liskov substitution principle: "Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it" (design by contract).
● Interface segregation principle: Many client-specific interfaces are better than one general-purpose interface.
● Dependency inversion principle: Depend upon abstractions, not concretions.
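As one illustration, a sketch of the dependency inversion principle (class names are my own): the high-level service depends on an abstraction rather than a concrete storage backend.

from abc import ABC, abstractmethod

class Storage(ABC):  # the abstraction both layers depend on
    @abstractmethod
    def save(self, data: str) -> None: ...

class DiskStorage(Storage):  # concrete detail, swappable for a DB, mock, etc.
    def save(self, data: str) -> None:
        print(f"writing {data!r} to disk")

class ReportService:
    # Depends on the Storage abstraction, not on DiskStorage directly.
    def __init__(self, storage: Storage):
        self.storage = storage

    def run(self) -> None:
        self.storage.save("report")

ReportService(DiskStorage()).run()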
What are some issues with OOP?
● The banana-gorilla-jungle problem
○ "The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." (Joe Armstrong)
○ Too much overhead, deep class hierarchies, hard to navigate/reason about
● The real world doesn't always break down into neat categories with well-defined properties
● Composition is sometimes better than inheritance
● In recent years, functional programming has had more "hype" than OOP
8
Data Structures
&
Algorithms
9
Define the complexity of lists. Are they dynamic or static arrays (and what's the difference between them)?
Static arrays require you to state the size up front, and cannot change size. Dynamic arrays (the case in Python) have an "underlying capacity" which automatically grows as you add items. Both use contiguous memory, which reduces overhead and speeds up access. However, in general, contiguous memory can cause memory to be wasted due to over-allocated "gaps".
Notes:
● The largest costs come from growing beyond the current allocation, and from inserting/deleting near the beginning (since everything after must move, O(n))
○ E.g., append or pop at the end is O(1) amortized (but O(n) if a resize is needed)
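To see the dynamic-array growth in CPython, a quick sketch (exact byte counts vary by version/platform):

import sys

L = []
last = sys.getsizeof(L)
for i in range(20):
    L.append(i)
    size = sys.getsizeof(L)
    if size != last:  # the size jumps only when the underlying capacity grows
        print(f"len={len(L)}: {last} -> {size} bytes")
        last = size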
Define the behavior and complexity of stacks, and how to implement one in Python using lists and deques.
A stack is last-in, first-out (LIFO) with space complexity O(n). It can be implemented using a list S, where S.append(e) pushes and S.pop() pops, both O(1) amortized (a push is O(n) when the list must resize). Alternatively, one can use append() and pop() on a collections.deque, which is based on a linked list and avoids the occasional O(n) resize.
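A minimal sketch of both options:

from collections import deque

# List-backed stack
S = []
S.append(1)  # push
S.append(2)
print(S.pop())  # 2 (LIFO)

# Deque-backed stack
D = deque()
D.append(1)
D.append(2)
print(D.pop())  # 2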
Define the behavior and complexity of queues, and how to implement one in Python using a list and a deque.
A queue is first-in, first-out (FIFO) with space complexity O(n). Enqueue and dequeue have time complexity O(1). There are several ways to implement one in Python:
1. Using a list L, where you L.append(e) to enqueue and L.pop(0) to dequeue.
a. This is not very efficient, since pop(0) shifts every remaining element, which takes O(n).
2. Using collections.deque, which supports O(1) operations at both ends:
len(D): number of elements, O(1)
D.appendleft(e): add to beginning, O(1)
D.append(e): add to end, O(1)
D.popleft(): remove from beginning, O(1)
D.pop(): remove from end, O(1)
D[i]: arbitrary access, O(n) in the middle (O(1) at either end)
D.remove(e): find and remove element e, O(n)
D.insert(i, e): insert element e at index i, O(n)
del D[i]: remove element at index i, O(n)
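For example:

from collections import deque

Q = deque()
Q.append("a")       # enqueue at the right
Q.append("b")
print(Q.popleft())  # "a" -- dequeue from the left (FIFO)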
Define the behavior and complexity of linked lists, and how to implement one in Python. Compare singly, circularly, and doubly linked lists.
● A singly linked list is a collection of nodes, each of which stores an element and a reference to the next node. The nodes are not contiguous in memory, so unlike lists, inserting at the beginning takes O(1). However, you cannot efficiently delete a node that is not the head.
● A circularly linked list can be useful if there is no notion of beginning or end.
● A doubly linked list has more symmetry, and you can efficiently delete any node in O(1), if given a reference to it. Usually, sentinel header/trailer nodes are used to make the implementation simpler.
● However, unlike arrays, you can't access the contents of an index in O(1); you need O(n).
Linked lists can be used via the collections.deque class, which implements a double-ended queue based on a (block-based, doubly) linked list.
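A minimal hand-rolled singly linked list sketch:

class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

# Build 1 -> 2 -> 3 by inserting at the head in reverse order, O(1) each
head = None
for v in [3, 2, 1]:
    head = Node(v, head)

# Traversal is O(n)
node = head
while node:
    print(node.val)
    node = node.next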
Define tree, binary tree, and binary search tree. What's the difference between complete, full, and perfect binary trees?
● A tree is a data structure which has a root node and, recursively, each node has >= 0 child nodes.
● A binary tree is a tree in which each node has <= 2 children.
● A binary search tree is a binary tree in which, for every node n, all left descendants <= n <= all right descendants.
● A complete binary tree has every level fully filled, except possibly the last, which is filled from the left (only rightmost elements may be missing).
● A full binary tree is a binary tree where each node has either 0 or 2 children.
● A perfect binary tree has every level fully filled.
What is the difference between a graph and a tree?
A tree is a connected graph without cycles; a general graph may have cycles, be disconnected, and have no root/hierarchy.
For graph traversals in general, complexity is O(|V| + |E|). However, since the max number of edges in a tree is |V|-1, for trees it's O(|V|).
IMPORTANT NOTE: for a BST, an inorder traversal yields the numbers sorted in ascending order.
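A sketch of that inorder property (the TreeNode class is my own minimal version):

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder(node):
    # left subtree, then the node itself, then the right subtree
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

#     2
#    / \
#   1   3
root = TreeNode(2, TreeNode(1), TreeNode(3))
print(inorder(root))  # [1, 2, 3] -- sorted ascending for a BST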
What are the ways to traverse a graph? Contrast them. How does it differ from traversing a tree?
DFS and BFS. DFS is a bit simpler, and is used if you want to visit every node. However, to find short paths between nodes, BFS is generally better, because you won't get stuck going very deep; you focus on the immediate neighbors first. In both cases, complexity is O(|V| + |E|).
The main difference between tree and graph traversal is that for the latter, you need to mark nodes as visited, or there might be infinite loops.
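Minimal sketches of both traversals over an adjacency-list graph:

from collections import deque

graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}

def dfs(start):
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # marking visited prevents infinite loops on cycles
        visited.add(node)
        order.append(node)
        stack.extend(graph[node])
    return order

def bfs(start):
    visited, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order

print(dfs(0), bfs(0))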
Define the behavior and complexity of heaps, and how to implement one in Python.
A min/max heap is a complete binary tree (so, every level filled except possibly the last, from the left), where each node is smaller/larger than its children. So, the root is the smallest/largest element. The two key operations are:
● Insert, O(log n)
○ Add the element at the bottom, preserving completeness. Then "bubble up", repeatedly swapping the added element with its parent while it is smaller/larger than the parent.
● Extract min/max, O(log n)
○ The min/max is always at the top; remove it and replace it with the last element in the heap (the bottommost, rightmost element). Then "bubble down" that element, swapping it with its smaller/larger child as necessary.
Heaps can be implemented with the heapq library, using heappush and heappop on a list. Note that it defaults to a min heap; for a max heap, make the values negative.
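For example:

import heapq

heap = []
for x in [5, 1, 4]:
    heapq.heappush(heap, x)        # O(log n) insert
print(heapq.heappop(heap))         # 1 -- a min heap pops the smallest

# Max heap trick: negate on the way in and out
max_heap = [-x for x in [5, 1, 4]]
heapq.heapify(max_heap)            # O(n) build
print(-heapq.heappop(max_heap))    # 5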
How do you keep track of the top k smallest elements? How about the top k largest?
● Top k largest: keep a min heap of size k; whenever it exceeds k elements, pop the minimum, so only the k largest remain.
● Top k smallest: keep a max heap of size k; whenever it exceeds k elements, pop the maximum, so only the k smallest remain.
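A sketch for the top k largest (heapq also ships nlargest/nsmallest helpers):

import heapq

def top_k_largest(nums, k):
    heap = []  # min heap holding the k largest seen so far
    for n in nums:
        heapq.heappush(heap, n)
        if len(heap) > k:
            heapq.heappop(heap)  # evict the smallest of the candidates
    return heap

print(top_k_largest([5, 1, 9, 3, 7], 2))  # [7, 9] in heap order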
What is a priority queue? State the complexities of implementing one with an unsorted linked list, a sorted linked list, and a heap.
A priority queue stores {key, value} pairs. The main operations are adding elements, and removing the element with the minimum key. Three ways to implement one are as follows:
● Unsorted linked list. Here, adding elements takes O(1), but finding/removing the min takes O(n).
● Sorted linked list. Here, adding elements takes O(n), but finding/removing the min is a trivial O(1).
● Min heap. Here, adding and finding/removing the min each take O(log n) (an insert can occasionally cost O(n) when the underlying array resizes).
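In Python, heapq over (key, value) tuples gives a simple priority queue, since tuples compare by their first element:

import heapq

pq = []
heapq.heappush(pq, (2, "write code"))  # (priority key, value)
heapq.heappush(pq, (1, "eat"))
heapq.heappush(pq, (3, "sleep"))
print(heapq.heappop(pq))  # (1, 'eat') -- minimum key comes out first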
What is a Python set? How is it different from a frozenset?
Sets are unordered, contain unique elements, and are mutable (but the elements within them must be immutable/hashable). Some common methods are:
● Creation
○ X = set([1,2,3])
○ X = {1,2,3}
○ X = set()
● Modifying
○ X.add(4)
○ X.remove(4)
● Union of sets X1, X2: X1 | X2
● Intersection of sets X1, X2: X1 & X2
● Difference of sets X1, X2: X1 - X2
● XOR of sets X1, X2: X1 ^ X2
● X1 is a subset of X2: X1 <= X2
○ Similar with superset
○ To enforce strict subset, drop the =
A frozenset is similar to a set, but is immutable. So, a set can’t contain sets, but can contain frozensets.
● To create, use x = frozenset([1,2,3]). The argument must be an iterable, so frozenset(1) or frozenset(1,2) won’t work.
● All the set operations still work, eg frozenset([1,2]) | frozenset([2,3]) = frozenset([1,2,3]). However, because they’re immutable, you
can’t add or remove.
What are 3 ways to describe a graph?
● Directly in the node class (usually, this is most efficient)
● As an adjacency list
● As an adjacency matrix, where a true value at matrix[i][j] represents an edge from node i to node j (i.e., row node -> column node)
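The same 3-node graph (edges 0->1, 0->2, 1->2) in each representation, as a sketch:

# 1. Directly in the node class
class GraphNode:
    def __init__(self, val):
        self.val = val
        self.neighbors = []

a, b, c = GraphNode(0), GraphNode(1), GraphNode(2)
a.neighbors = [b, c]
b.neighbors = [c]

# 2. Adjacency list
adj_list = {0: [1, 2], 1: [2], 2: []}

# 3. Adjacency matrix: matrix[i][j] == 1 means edge i -> j
adj_matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]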
Define the following terms: optimal substructure, dynamic programming, bottom-up/top-down recursion, tabulation, and memoization. How do you calculate the runtime of recursive algorithms?
● A problem is said to have optimal substructure if an optimal solution can be constructed from optimal solutions of its subproblems.
● Dynamic programming: A way to optimize problems of a recursive nature, by caching the results of subproblems that might be encountered later on.
● Bottom-up recursion: Here, you start with the simple base case, then build on that. For example, solving for one element, then two, then three, etc. You "build up to the final solution from scratch".
○ For bottom-up, DP can often be done using tabulation. Here, a table's values are populated/cached sequentially.
○ When figuring out the time complexity of recursion, multiply the number of recursive calls by the time each call takes.
● Top-down recursion: Here, you start with the big solution, dividing the problem into subproblems.
○ For top-down, DP can often be done using memoization. The results might not be stored/cached sequentially, but as a byproduct of recursive calls, to be reused by later recursive calls.
Implement fib(n) using 2 flavors of dynamic programming: memoization and tabulation.
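The original implementations aren't preserved in these notes; standard sketches of both flavors:

from functools import lru_cache

# Top-down: memoization caches each fib(k) the first time it's computed
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up: tabulation fills a table from the base cases upward
def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_memo(10), fib_tab(10))  # 55 55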
Describe & state the complexity of: bubble sort, insertion sort, selection sort, quick sort, merge sort, heap sort, bucket sort, radix sort.
● Quick sort: Centers around a partition subroutine, such that elements to the left of a chosen pivot are less than it, and elements to its right are greater (i.e., the pivot ends up in its final position). This can be implemented efficiently using swapping, with a "low" pointer and a "high" pointer. The partition step is recursively applied to the sublists to the left and right of the pivot. Choosing a bad pivot leads to the O(n^2) worst case.
● Merge sort: Repeatedly divides the array in half, sorts those halves, and then merges them back together. Eventually you just merge two single-element arrays. The merge step does all the heavy lifting, and can be implemented using one pointer per sorted half.
● Heap sort: Place all elements in a min heap, and then pop everything. Recall that min heaps are complete binary trees where parents are always smaller than their children; elements are added by bubbling up and removed by bubbling down.
● Bubble sort: Start at the beginning of the array and traverse each consecutive pair, swapping them if the first is greater than the second. The array needs to be traversed n times.
● Insertion sort: Here, we keep the head of the list sorted, and repeatedly take the next element of the tail, finding the appropriate place to insert it in the head. At each iteration, the tail shrinks and the head grows, until the list is sorted.
● Selection sort: Repeatedly find the smallest element in the "unsorted tail" using a linear scan, and swap it into the "sorted head".
● Bucket sort: Separate the array into k smaller buckets, sort them individually (using a sorting algorithm or a recursive call to itself), and then combine the result. Unlike merge sort, the combine step is trivial because the buckets are already sorted.
● Radix sort: Mostly works for discretizable data, like integers. At each iteration, we focus on the ith digit position and place each number into one of 10 buckets based on that digit; then we combine the buckets in order, so the numbers are sorted by their i least-significant digits. Each iteration takes only a single pass; in total, the sort takes O(k*n), assuming the largest number has k digits.
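As an example, a sketch of merge sort, since the merge step carries the logic:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge: advance one pointer per sorted half
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]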
Algorithm        Best         Average      Worst        Space (worst)
Quicksort        Ω(n log(n))  Θ(n log(n))  O(n^2)       O(log(n))
Mergesort        Ω(n log(n))  Θ(n log(n))  O(n log(n))  O(n)
Heapsort         Ω(n log(n))  Θ(n log(n))  O(n log(n))  O(1)
Bubble Sort      Ω(n)         Θ(n^2)       O(n^2)       O(1)
Insertion Sort   Ω(n)         Θ(n^2)       O(n^2)       O(1)
Selection Sort   Ω(n^2)       Θ(n^2)       O(n^2)       O(1)
Bucket Sort      Ω(n+k)       Θ(n+k)       O(n^2)       O(n)
Radix Sort       Ω(nk)        Θ(nk)        O(nk)        O(n+k)
Implement binary search, and binary search for a pivot.
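The implementations aren't preserved here; standard sketches. For the pivot, this assumes the classic interpretation of finding the rotation point (index of the minimum) of a sorted, rotated array:

# Classic binary search over a sorted list; returns the index or -1.
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Assumed interpretation: find the index of the minimum element
# (the "pivot") of a sorted array that was rotated, e.g. [4,5,1,2,3] -> 2.
def find_pivot(arr):
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] > arr[hi]:  # the minimum must be to the right of mid
            lo = mid + 1
        else:                   # the minimum is at mid or to its left
            hi = mid
    return lo

print(binary_search([1, 2, 3, 4, 5], 4))  # 3
print(find_pivot([4, 5, 1, 2, 3]))        # 2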
Describe what Union-Find is and how to implement it. What are some applications?
● Union-Find is a data structure that stores a collection of disjoint sets. It represents each of the sets as a directed tree, with the edges pointing towards the root node.
○ For example, {{0,4,5}, {1,2,3}} can be represented as [5,1,1,1,0,5], where each index (node) holds the value of its parent.
● The two operations are:
○ find(A), which returns the top parent (root) of A. An optimization (path compression) is to reset all nodes traversed to point directly to the root.
○ union(A,B), which finds the roots of A and B, then sets B's root's parent to be A's root (or vice versa).
● In practice, find and union are effectively O(1); technically, they are amortized O(α(n)), where α is the extremely slow-growing inverse Ackermann function.
● Applications:
○ Keeping track of the number of connected components of a graph & their contents (initialize a singleton set for each vertex, then repeatedly merge sets along the edges with union)
○ Determining if two vertices belong to the same connected component
○ Checking for cycles
■ If an edge's merge did not result in any change, then there's a redundant path, which is a loop (assuming there aren't any repeated edges or self loops)
Implementation (union follows the description above):

class UnionFind:
    def __init__(self, n):
        self.nodes = [i for i in range(n)]

    def find(self, a):
        to_remap = []
        while self.nodes[a] != a:
            to_remap.append(a)
            a = self.nodes[a]
        # optimization to remap, making things faster next time
        for n in to_remap:
            self.nodes[n] = a
        return a

    def union(self, a, b):
        # attach b's root under a's root; returns False if already in the same set
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.nodes[root_b] = root_a
        return True
How do you find the shortest path between two nodes in an unweighted graph?
This can be done using BFS.
Imagine pouring water on the source vertex, and imagine all the edges are tubes that can spread the water to
neighboring vertices in one unit of time. BFS comes down to simulating this process and asking at what time the
water reaches each vertex: that time is the distance of the vertex from the source. (Dijkstra's algorithm is asking
the same question when the tubes don't necessarily take unit time to spread the water.)
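A sketch of BFS distances from a source over an adjacency-list graph:

from collections import deque

def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:             # first visit = shortest distance
                dist[nb] = dist[node] + 1  # one more "unit of time"
                queue.append(nb)
    return dist

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_distances(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}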
What is Dijkstra's algorithm and the Bellman-Ford algorithm? How is Dijkstra's implemented, and what's its time/space complexity?
● Dijkstra's algorithm: Finds the shortest path from a start node to every node in a weighted directed graph, which may have cycles. All edge weights must be positive. Time is O(|E| log|V|), space is O(|V|+|E|). Note that if the graph is unweighted, BFS = Dijkstra's.
● Bellman-Ford algorithm: Finds the shortest paths from a single node in a weighted directed graph with positive and negative edges (and can detect negative cycles).
import heapq
from collections import defaultdict
from typing import List

# assumes nodes are represented by 0-indexed integers; n is number of nodes
# edges is a list of [source node num, target node num, weight]
def dijkstras_algorithm(edges: List[List[int]], n: int, start_node: int) -> List[int]:
    # setup node distances, previous links, and visited/finalized set
    dist = [float("inf")] * n
    dist[start_node] = 0
    prev = [None] * n
    visited = set()
    # min_heap holds [distance from start, node]
    min_heap = [[0, start_node]]
    # convert edges into an adjacency list of form {source node: [[target node 1, weight 1], ...]}
    adj_list = defaultdict(list)
    for a, b, weight in edges:
        adj_list[a].append([b, weight])
    # note: if you want to just search for a specific node, the loop can be broken once that's found
    while len(min_heap) > 0:
        curr_min_dist, curr_node = heapq.heappop(min_heap)
        if curr_node in visited:
            continue  # stale heap entry; this node was already finalized
        visited.add(curr_node)  # this node will no longer be touched; true min distance
        for neighbor, edge_weight in adj_list[curr_node]:
            if neighbor not in visited and curr_min_dist + edge_weight < dist[neighbor]:
                dist[neighbor] = curr_min_dist + edge_weight
                prev[neighbor] = curr_node
                heapq.heappush(min_heap, [curr_min_dist + edge_weight, neighbor])
    return dist
Bit Manipulations
Calculate the following, describing any possible tricks/shortcuts.
(The worked examples were figures on the original slides and are not preserved here.)
Explain two's complement integers, and represent -7 to 7 in bits. How do you convert between pos and neg? What is the difference between logical and arithmetic shifts?
In two's complement, a non-negative number is stored as its ordinary binary representation, and a negative number -x (in N bits) is stored as the binary representation of 2^N - x. To convert positive to negative: flip the bits, then add 1. To convert negative to positive: subtract 1, then flip the bits. In 4 bits: 7 = 0111, 6 = 0110, ..., 1 = 0001, 0 = 0000, -1 = 1111, -2 = 1110, ..., -6 = 1010, -7 = 1001.
In a logical right shift (>>>), you shift the bits and put 0 in the most significant bits. In an arithmetic right shift (>>), you shift to the right but fill the new bits with the value of the sign bit. Note that logical (<<<) and arithmetic (<<) left shifts are the same.
Implement getBit(num, i), setBit(num, i), clearBit(num,i)
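The implementations aren't preserved here; standard sketches:

def get_bit(num: int, i: int) -> int:
    # shift the ith bit down to position 0 and mask it
    return (num >> i) & 1

def set_bit(num: int, i: int) -> int:
    # OR with a mask that has only the ith bit on
    return num | (1 << i)

def clear_bit(num: int, i: int) -> int:
    # AND with a mask that has every bit on except the ith
    return num & ~(1 << i)

print(get_bit(0b1010, 1))         # 1
print(bin(set_bit(0b1010, 0)))    # 0b1011
print(bin(clear_bit(0b1010, 3)))  # 0b10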
Time/Space Complexity
What is Big O/Theta/Omega? How is it useful and what are its shortcomings?
This is a mathematical notation that describes the limiting behavior of a function.
Formally, f(x) = O(g(x)) if and only if there exist a constant M > 0 and a value x_0 such that |f(x)| <= M*g(x) for all x >= x_0. (Big Omega is the analogous lower bound; Big Theta means both bounds hold.)
Some notes:
● In industry, people usually mean "big theta" when they say "big O".
● Big O allows us to express how runtime scales.
● However, it is not sufficient to know the true runtime of a function, since the constant factor M or value of x_0 can be nontrivial. There are also factors, such as cache/compiler optimizations, which are not captured by Big O.
What is the big O for the following series?
● 1 + 2 + 3 + … + n = n(n+1)/2 = O(n^2)
● 1 + 2 + 4 + 8 + … + n = 2n - 1 = O(n) (here the series stops at a final, largest term n)
● 2^0 + 2^1 + 2^2 + … + 2^n = 2^(n+1) - 1 = O(2^n)
What is the difference between combinations and permutations, and how do you compute them? What's the intuition behind the equation?
● Permutations count ordered selections: P(n,k) = n!/(n-k)! (pick any of n first, then any of n-1, and so on for k picks)
● Combinations count unordered selections: C(n,k) = n!/(k!(n-k)!)
○ Same as permutations, but divide out the duplicates with the same elements in a different ordering (k! orderings per group)
○ If n=k, C(n,k) becomes 1
What is the time complexity of nCk?
● In general, O(n^min{k,n-k})
● In most cases, k is very small, so O(n^k)
How does one calculate the amortized time of adding to an array?
Amortized time describes the average time an operation takes; it's useful when an operation occasionally takes much longer than usual.
For a dynamic array, after inserting N elements, 1 + 2 + 4 + 8 + … + N ≈ 2N copies will have been needed in total. Thus, N insertions can be performed in O(N) time; a single insertion is amortized O(1).
What’s the space and time complexity of this code?
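The code itself wasn't preserved in these notes; a function consistent with the analysis below (2^(n+1)-1 calls, call-stack depth n) would be:

def f(n):
    if n <= 1:
        return 1
    # two recursive calls per level gives a full binary call tree of depth n
    return f(n - 1) + f(n - 1)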
There are 1+2^1+2^2+...+2^n = 2^(n+1)-1 = O(2^n) recursive calls, each taking O(1) to complete. Thus, the
time complexity is O(2^n).
At each point in time, the stack will only contain at most n function call frames; each frame only takes O(1)
space. Thus, the space complexity is O(n).
How many ways can you partition a string of length N?
● 2^(N-1): between each adjacent pair of characters (there are N-1 of them), whether to cut there is a binary decision.
Rank the rate of increase for common big O times.
O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(2^n) < O(n!)
What is the time complexity of hashing a string?
O(n)
How do you determine the runtime of recursive+memoization problems?
Since using the memo means you never calculate any state twice, the runtime is (the number of different combinations the "state variables" can take) times (the time it takes to calculate each combination individually).
Misc
What are processes and threads?
● Both processes and threads are independent sequences of execution. Each process starts with at least one thread, but can also later create more.
● Processes/threads are implemented at the OS level, and use the underlying CPU's cores/processors.
○ On a multicore system, multiple threads can be executed in parallel, while on a uniprocessor system, scheduling/context switching is used to give an illusion of concurrency.
● In Python, the Global Interpreter Lock (GIL) allows only one thread to hold control of the Python interpreter.
○ So, only one thread can be executing Python bytecode at any point in time.
Processes | Threads
Each has a separate memory space | Threads of a process run in a shared memory space
Harder to share objects between processes | Easier to share objects in the same memory, but need to be careful to avoid race conditions
Code is usually straightforward | Code usually harder to understand and get right
Larger memory footprint | Lightweight, low memory footprint
In Python, takes advantage of multiple CPUs/cores | In Python, threads cannot run in parallel on multiple CPUs/cores, due to the GIL
Use case in Python: for when you want to really do more than one thing at a given time on the CPU | Use case in Python: enables applications to stay responsive when execution is I/O bound (e.g. internet, database retrieval). Things aren't done in parallel on the CPU, but concurrently via context switching
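A small sketch of the I/O-bound threading use case; the sleep stands in for a network/database wait:

import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(1)  # simulated I/O wait; the GIL is released while sleeping
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, range(5)))
print(results, f"{time.time() - start:.1f}s")  # ~1s total, not ~5s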
Explain what a race condition is, and how locks/semaphores can help. What's a deadlock? Give examples.
A race condition occurs when a system's final behavior depends on the sequence/timing of other uncontrollable events, which can lead to bugs and undesirable, nondeterministic results.
For example, suppose two threads each increment the value of a global integer by 1. Ideally, each read-increment-write would complete before the other starts, but the operations can interleave: both threads read the old value, both add 1, and one of the two updates is lost.
A lock (i.e., mutex) can help with synchronization by enforcing limits on access to a resource when there are many threads/processes of execution. It's like a single key that must first be acquired, used, and, when done, passed on to the next thread/process.
A deadlock can occur if a thread is waiting for an object lock that another thread holds, and that second thread is waiting for an object lock that the first thread holds. For example, with the bank-transfer code sketched below, calling transfer(a,b) and transfer(b,a) concurrently can cause a deadlock where neither thread can acquire its second lock.
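The original transfer code isn't preserved; a minimal sketch of the pattern (the Account class is my own stand-in):

import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    with src.lock:    # thread 1 grabs a.lock while thread 2 grabs b.lock...
        with dst.lock:  # ...then each can block forever on the other's lock
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(100)
t1 = threading.Thread(target=transfer, args=(a, b, 10))
t2 = threading.Thread(target=transfer, args=(b, a, 20))
# Starting t1 and t2 together can deadlock. A standard fix is to always
# acquire locks in a globally consistent order (e.g., sorted by id()).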
A semaphore is a generalization that allows up to x threads to enter; for example, this can be used to limit the number of CPU-, IO-, or RAM-intensive tasks running at the same time.
Define the following scalability concepts: horizontal/vertical scaling, load balancing, database sharding/partitioning, caching.
● Vertical scaling improves the hardware on a single machine/node. This is easier to do, but is limited.
● Horizontal scaling adds additional machines/nodes. This is more complicated and can require more overhead, but is more scalable.
● Load balancing is often done for the frontend. It ensures that no one server is overworked, and if a server goes down, the remaining servers can compensate.
○ There are different load balancing algorithms, such as round robin, random, least connections, or IP-based hashing.
● Database sharding/partitioning: Split the data across multiple machines. There are several techniques, which can be combined:
○ Vertical partitioning: Partition by feature (e.g., one data table for profiles, one for messages, one for videos). However, repartitioning may be needed in the future.
○ Directory-based partitioning: Maintain a lookup table from keys to servers. This makes it easy to add additional servers, but the lookup table can become a point of failure and adds overhead.
● Caching: When an application requests data, it first tries the cache; only if the data is not there does it look it up in the underlying store (and typically it then populates the cache).
What's the difference between bandwidth, throughput, and latency? Compare them in a conveyor belt context; how do they change as the belt is faster/slower, longer/shorter?
● Bandwidth: Maximum amount of data that can be transferred per unit of time, e.g. Mb/s.
● Throughput: Actual amount of data transferred per unit of time. While bandwidth is the upper bound, throughput is the achieved rate.
● Latency: How long it takes one piece of data to go from one end to the other, e.g. measured in seconds.
In the conveyor belt analogy: a faster belt lowers latency and raises bandwidth (and thus the possible throughput); a longer belt raises latency but leaves bandwidth/throughput unchanged, since items still arrive at the same rate.
Write an efficient way to compute if a number is prime. Then, implement a method to find all the primes
up to n.
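The implementations aren't preserved here; standard sketches (trial division up to √n, then a Sieve of Eratosthenes):

import math

def is_prime(num: int) -> bool:
    if num < 2:
        return False
    # any factor pair (a, b) has min(a, b) <= sqrt(num)
    for d in range(2, math.isqrt(num) + 1):
        if num % d == 0:
            return False
    return True

def primes_up_to(n: int) -> list:
    # Sieve of Eratosthenes: cross off the multiples of each prime
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return [i for i, prime in enumerate(sieve) if prime]

print(is_prime(97))      # True
print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]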
At a high level, what is MapReduce?
MapReduce is a programming model/framework for processing and generating big data sets with a parallel, distributed algorithm on a cluster. Its main advantage is that it is very simple, requiring only 2 functions as input to leverage many machines at scale:
● map(key, value), which transforms each input record into a list of intermediate (key, value) pairs
● reduce(key, list of values), which combines all intermediate values sharing a key into the final output
Note that while powerful, forcing this structured type of computation does limit some data processing tasks. There are improvements/generalizations of MapReduce that solve this.
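The canonical word-count example, sketched as plain Python (a real framework would shard the inputs and shuffle the intermediate pairs across machines):

from collections import defaultdict

def map_fn(doc_id, text):
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    return (word, sum(counts))

docs = {1: "the cat sat", 2: "the cat ran"}
# "shuffle" step: group intermediate pairs by key
grouped = defaultdict(list)
for doc_id, text in docs.items():
    for word, count in map_fn(doc_id, text):
        grouped[word].append(count)
print([reduce_fn(w, c) for w, c in grouped.items()])
# [('the', 2), ('cat', 2), ('sat', 1), ('ran', 1)]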
What is graph coloring? What property do bipartite graphs have wrt coloring?
Graph coloring is a way of coloring the nodes in a graph such that no two adjacent vertices have the same
color.
A bipartite graph is a graph whose nodes can be divided into 2 sets such that every edge stretches across the two sets; that is, there is never an edge between two nodes in the same set. A graph is bipartite if and only if it can be colored with 2 colors, which is also how you check: 2-color it greedily with BFS/DFS and look for conflicts.
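A BFS 2-coloring sketch:

from collections import deque

def is_bipartite(graph):
    # graph: dict of node -> list of neighbors (undirected)
    color = {}
    for start in graph:  # handle disconnected components
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in color:
                    color[nb] = 1 - color[node]  # opposite color
                    queue.append(nb)
                elif color[nb] == color[node]:
                    return False  # same-set edge: not bipartite
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))        # True (a path)
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # False (a triangle)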
What's the power of 2 table?
2^7 = 128
2^8 = 256
2^10 = 1,024 (~1 thousand; 1 KB)
2^16 = 65,536 (~64 KB)
2^20 = 1,048,576 (~1 million; 1 MB)
2^30 ≈ 1 billion (1 GB)
2^32 ≈ 4 billion
2^40 ≈ 1 trillion (1 TB)
Define P, NP, NP-Complete, and NP-Hard. At a high level, what is the P vs NP problem?
● P: decision problems solvable in polynomial time.
● NP: decision problems whose "yes" answers can be verified in polynomial time, given a certificate/solution.
● NP-Complete: problems that are in NP and to which every problem in NP can be reduced in polynomial time (e.g., SAT); the hardest problems in NP.
● NP-Hard: problems at least as hard as every problem in NP (every NP problem reduces to them), but not necessarily in NP themselves.
● The P vs NP problem asks whether P = NP, i.e. whether every problem whose solution can be verified quickly can also be solved quickly. It is widely believed (but unproven) that P ≠ NP.