Software Engineer Concepts - 4030afdb-00a4-4f83-A520 - 241007 - 202416
● Object
○ An entity that bundles state (attributes) with behavior (methods/procedures). An
instance of a class.
● Class
○ Defines the data format and attributes of its objects and provides the implementation
of their methods.
● Composition
○ Objects containing other objects in their instance variables. Used to represent a
has-a relationship.
● Inheritance
○ An is-a relationship. A subclass inherits attributes and code from its parent class
(superclass).
○ Abstract classes cannot be instantiated into objects.
○ Multiple inheritance means a class inherits from more than one parent class; it is
allowed in some languages. It suffers from the diamond problem: when two parents share a
common ancestor, it is ambiguous which inherited implementation applies. Python resolves
this with its Method Resolution Order (MRO), which is based on C3 linearization.
● Polymorphism
○ Providing a single interface/symbol to represent multiple different types
○ Types
■ Parametric Polymorphism (Templates in C++)
● Using parametric polymorphism, a function or a data type can be
written generically so that it can handle values identically without
depending on their type. Allows static type safety.
■ Subtyping
● Subtyping is a form of type polymorphism in which a subtype is a
data type that is related to another datatype (the supertype) by
some notion of substitutability, meaning that program elements,
typically subroutines or functions, written to operate on elements
of the supertype can also operate on elements of the subtype.
■ Row Polymorphism (structural; duck typing in Python is a loose, dynamically checked analogue)
○ Static Polymorphism
○ Dynamic Polymorphism
● Abstract Class
● Class Interfaces
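The diamond problem and Python's MRO mentioned under Inheritance can be illustrated with a minimal sketch. The class names (Base, Left, Right, Bottom) are hypothetical and chosen only for this example.

```python
class Base:
    def who(self) -> str:
        return "Base"


class Left(Base):
    def who(self) -> str:
        # super() follows the MRO of the instance's class, not the static parent.
        return "Left -> " + super().who()


class Right(Base):
    def who(self) -> str:
        return "Right -> " + super().who()


class Bottom(Left, Right):
    # Diamond: Bottom inherits from Left and Right, which both inherit from Base.
    def who(self) -> str:
        return "Bottom -> " + super().who()


# C3 linearization gives one unambiguous lookup order for the diamond.
mro_names = [cls.__name__ for cls in Bottom.__mro__]
chain = Bottom().who()
print(mro_names)
print(chain)
```

Each class appears exactly once in the linearized order, so Base.who runs only once even though it is reachable through two paths.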
Data Structures
Important data structures to cover for interviews.
● Arrays
● Linked List
● Binary Search Tree
● Balanced Binary Search Tree (AVL or red black)
● Heap (Min / Max heap)
● Hash table
● Abstract data types: Tree, Graph, Stacks, Queues, Priority Queues, List, Set, Tuple, Map
Algorithms
Some popular algorithmic techniques are listed below. The list is for academic and reference
purposes only. Memorizing the algorithms is not important; the focus should be on understanding
each technique and the time complexity of the algorithms that use it.
Graph Algorithms
● Breadth First Search (BFS)
● Depth First Search (DFS)
● Prim's algorithm (Also Greedy Algorithm)
● Kruskal's algorithm (Also Greedy Algorithm)
● Dijkstra Algorithm (Also Greedy and Dynamic)
Greedy Approach
● Interval Scheduling
● Fractional Knapsack
● Event Selection
● Dijkstra Algorithm
Dynamic Programming
● Cut rod
● Edit Distance
● Longest Increasing Subsequence
● Dijkstra Algorithm
Recursive Backtracking
● N-queen Problem
● Maze-Solving Algorithm
Transactional logs
● Article on transaction logs by percona. Link
● Transaction logs are used to guarantee atomicity and durability using write-ahead
logging.
○ Write-ahead logging (WAL) is a family of techniques for providing atomicity and
durability (two of the ACID properties) in database systems. The changes are
first recorded in the log, which must be written (flushed) to stable storage, before
the changes are written to the database.
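A minimal sketch of the write-ahead idea described above, assuming a toy key-value store whose "database" is just an in-memory dict. The class name TinyWalStore and the JSON-lines log format are assumptions made for illustration, not how any particular database implements WAL.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Hypothetical key-value store that logs every change before applying it."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        # Recovery: rebuild state by re-applying every logged change in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and flush it to stable storage.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the change to the "database" (here, a dict).
        self.data[key] = value


log_file = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyWalStore(log_file)
store.put("balance", "100")
store.put("balance", "75")

# Simulate a crash: the in-memory state is lost, but the log survives.
recovered = TinyWalStore(log_file)
print(recovered.data)
```

Because the log is flushed before the data is changed, a restart can always replay it to reach the last durable state.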
MVCC
Multiversion concurrency control (MCC or MVCC) is a non-locking concurrency control method
commonly used by database management systems to provide concurrent access to the
database and in programming languages to implement transactional memory. When an MVCC
database needs to update a piece of data, it does not overwrite the original data item with new
data; instead it creates a newer version of the data item. Multiple versions are therefore
stored, and each reader sees the version that was current when its snapshot was taken.
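The versioning idea can be sketched as follows. This is a simplified illustration, assuming a single logical clock and no garbage collection of old versions; the names VersionedStore, snapshot, and read are hypothetical.

```python
class VersionedStore:
    """Toy MVCC store: writes append new versions instead of overwriting."""

    def __init__(self) -> None:
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit timestamp

    def write(self, key, value) -> int:
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        # A reader remembers the timestamp at which it started.
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts <= snapshot_ts:
                visible = value
        return visible


store = VersionedStore()
store.write("x", "v1")
reader_snapshot = store.snapshot()   # a reader starts here
store.write("x", "v2")               # a concurrent writer adds a newer version

old_view = store.read("x", reader_snapshot)
new_view = store.read("x", store.snapshot())
print(old_view, new_view)
```

The earlier reader keeps seeing its own consistent version without blocking the writer.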
Shard (Database) and Table Partitioning
● Shard
○ Horizontal partitions of the table residing on a separate database instance/node.
● Partition
○ Partition of a table residing in the same database instance. This can speed up
lookups. Partitions can easily be truncated.
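One common way to decide which shard holds a row is to hash its key. The sketch below assumes hash-based sharding over three hypothetical node names; real systems may instead use range-based or directory-based schemes.

```python
import hashlib

# Hypothetical database nodes, one per shard.
SHARDS = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key: str) -> str:
    """Map a row key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


placement = {key: shard_for(key) for key in ["user:1", "user:2", "user:42"]}
print(placement)
```

A stable hash (rather than Python's built-in hash, which is randomized per process for strings) keeps the mapping identical across restarts.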
Indexes
● Full index vs Partial index
● Hash indexes
● B Tree index
● Clustering
Data models
● Relational Model
○ ERD, RDBMS
● Object Model
○ ORM, ORDBMS
● Document Model
○ MongoDB
● Graph Model
○ Graph Databases like Neo4j, ArangoDB
● Multivalue Model
● Network Model
NoSQL
● Graph Databases
● Key-value
● Document stores
● Column-oriented DBMS
● Time Series
● Vector Databases
SQL databases
TCP/IP stack
Load Balancing
In computing, load balancing refers to the process of distributing a set of tasks over a set of
resources (computing units), with the aim of making their overall processing more efficient. Load
balancing can optimize the response time and avoid unevenly overloading some compute
nodes while other compute nodes are left idle.
https://cloud.google.com/load-balancing/docs
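A minimal sketch of one load-balancing strategy, round-robin, which hands each incoming task to the next backend in turn. The backend names are hypothetical, and production balancers usually also account for health checks and current load.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Distribute requests evenly by cycling through the backends."""

    def __init__(self, backends) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])

# Route nine requests and count how many each backend received.
assignments = Counter(balancer.pick() for _ in range(9))
print(assignments)
```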
Cache
● HTTP Caching
○ Browser Cache
○ Shared Proxy Cache
● Query Caching
● Full-Page Cache
● CDN caching
Task queues
Message Broker
● A message broker is software that enables applications, systems, and services to
communicate with each other and exchange information. The message broker does this
by translating messages between formal messaging protocols. This allows
interdependent services to “talk” with one another directly, even if they were written in
different languages or implemented on different platforms.
● Message brokers offer two basic message distribution patterns or messaging styles
○ Point-to-point messaging
■ one-to-one relationship between the message’s sender and receiver.
Each message in the queue is sent to only one recipient and is consumed
only once
○ Publish/subscribe messaging
■ The producer of each message publishes it to a topic, and multiple
message consumers subscribe to topics from which they want to receive
messages. All messages published to a topic are distributed to all the
applications subscribed to it. Kafka has a Pub-Sub model
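The two distribution patterns above can be contrasted with a small in-process sketch. TinyBroker and its method names are hypothetical; a real broker adds networking, persistence, and acknowledgements.

```python
from collections import defaultdict, deque


class TinyBroker:
    """Toy in-memory broker showing both messaging styles."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)      # point-to-point queues
        self._subscribers = defaultdict(list)  # topic -> subscriber callbacks

    # Point-to-point: each message is consumed by exactly one receiver.
    def send(self, queue: str, message) -> None:
        self._queues[queue].append(message)

    def receive(self, queue: str):
        pending = self._queues[queue]
        return pending.popleft() if pending else None

    # Publish/subscribe: every subscriber of the topic gets every message.
    def subscribe(self, topic: str, callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message) -> None:
        for callback in self._subscribers[topic]:
            callback(message)


broker = TinyBroker()

broker.send("orders", "order-1")
first = broker.receive("orders")    # the message is handed out once
second = broker.receive("orders")   # nothing left for a second receiver

billing_inbox, email_inbox = [], []
broker.subscribe("signups", billing_inbox.append)
broker.subscribe("signups", email_inbox.append)
broker.publish("signups", "user-7")  # delivered to both subscribers
print(first, second, billing_inbox, email_inbox)
```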
Asynchronous requests
● Asynchrony, in computer programming, refers to the occurrence of events independent
of the main program flow and ways to deal with such events. These may be "outside"
events such as the arrival of signals, or actions instigated by a program that take place
concurrently with program execution, without the program blocking to wait for results.
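A short sketch of non-blocking requests using Python's asyncio. The fetch coroutine is a hypothetical stand-in for a network call, with asyncio.sleep simulating the wait.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # Simulated I/O: while this task waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Start all three requests at once and wait for every result.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.2),
        fetch("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time is close to one delay rather than the sum of all three.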
Web sockets
● The WebSocket API is an advanced technology that makes it possible to open a
two-way interactive communication session between the user's browser and a server.
With this API, you can send messages to a server and receive event-driven responses
without having to poll the server for a reply.
WebRTC
● With WebRTC, you can add real-time communication capabilities to your application that
works on top of an open standard. It supports video, voice, and generic data to be sent
between peers, allowing developers to build powerful voice- and video-communication
solutions. The technology is available on all modern browsers as well as on native
clients for all major platforms. The technologies behind WebRTC are implemented as an
open web standard and available as regular JavaScript APIs in all major browsers.
Distributed Systems
CAP theorem
The CAP theorem says that a distributed system can guarantee only two of three desired
characteristics at the same time: consistency, availability and partition tolerance (the ‘C,’ ‘A’
and ‘P’ in CAP). Because network partitions cannot be ruled out in practice, the real choice
during a partition is between consistency and availability.
https://www.ibm.com/topics/cap-theorem
Shared-nothing architecture
Clock synchronization
Leader Election
● Paxos class of Algorithms
High Availability
● High availability is the ability of an IT system to be accessible and reliable nearly 100%
of the time, eliminating or minimizing downtime. It combines two concepts to determine if
an IT system is meeting its operational performance level: that a given service or server
is accessible–or available–almost 100% of the time without downtime, and that the
service or server performs to reasonable expectations for an established time period.
Fault tolerance
● Fault tolerance is the ability of a system to continue operating despite failures or
malfunctions in its hardware or software.
Data Warehouse
● A data warehouse is a type of data management system that is designed to enable and
support business intelligence (BI) activities, especially analytics. Data warehouses are
solely intended to perform queries and analysis and often contain large amounts of
historical data. The data within a data warehouse is usually derived from a wide range of
sources such as application log files and transaction applications.
● A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Over time, it builds a historical record that can be invaluable to data scientists
and business analysts. Because of these capabilities, a data warehouse can be
considered an organization’s “single source of truth.”
● Metadata repository
ETL/ELT
● Extract, transform, and load (ETL) is a data pipeline used to collect data from various
sources. It then transforms the data according to business rules, and it loads the data
into a destination data store. The transformation work in ETL takes place in a specialized
engine, and it often involves using staging tables to temporarily hold data as it is being
transformed and ultimately loaded to its destination.
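The three ETL stages can be sketched with the standard library alone. The CSV content, the column names, and the business rule (drop rows without an amount, convert cents to currency units, upper-case the country) are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; order 2 has a missing amount.
RAW_CSV = "order_id,amount_cents,country\n1,1250,us\n2,,de\n3,990,US\n"


def extract(text: str):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply the (assumed) business rules to each row."""
    cleaned = []
    for row in rows:
        if not row["amount_cents"]:
            continue  # skip rows with no amount
        cleaned.append(
            (int(row["order_id"]), int(row["amount_cents"]) / 100, row["country"].upper())
        )
    return cleaned


def load(rows, conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In an ELT variant, the raw rows would be loaded first and the transformation would run inside the destination store.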
Data streaming
● Streaming data is data that is generated continuously by thousands of data sources.
Streaming data includes a wide variety of data such as log files generated by customers
using web applications, ecommerce purchases, in-game player activity, financial trading,
or geospatial services
● Streaming data processing requires two layers: a storage layer and a processing
layer.
● The storage layer needs to support record ordering and strong consistency to enable
fast, inexpensive, and replayable reads and writes of large streams of data.
● The processing layer is responsible for consuming data from the storage layer, running
computations on that data, and then notifying the storage layer to delete data that is no
longer needed. Data streaming systems need to incorporate solutions to challenges like
scalability, data durability, and fault tolerance in both the storage and processing layers.
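The two layers can be sketched in miniature: an ordered, replayable log as the storage layer and a consumer that tracks its offset as the processing layer. StreamLog, total_purchases, and the event shape are hypothetical, and retention or deletion of consumed data is left out.

```python
class StreamLog:
    """Storage layer sketch: an append-only log that preserves record order."""

    def __init__(self) -> None:
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int):
        # Reads never remove data, so any consumer can replay from any offset.
        return [(i, self._records[i]) for i in range(offset, len(self._records))]


def total_purchases(log: StreamLog, start_offset: int = 0):
    """Processing layer sketch: consume records and return (total, next offset)."""
    total = 0
    next_offset = start_offset
    for offset, event in log.read_from(start_offset):
        if event["type"] == "purchase":
            total += event["amount"]
        next_offset = offset + 1
    return total, next_offset


log = StreamLog()
for event in [
    {"type": "click", "amount": 0},
    {"type": "purchase", "amount": 30},
    {"type": "purchase", "amount": 12},
]:
    log.append(event)

first_total, checkpoint = total_purchases(log)
replayed_total, _ = total_purchases(log)  # replay from the start gives the same result

log.append({"type": "purchase", "amount": 8})
incremental_total, new_checkpoint = total_purchases(log, checkpoint)
print(first_total, replayed_total, incremental_total, new_checkpoint)
```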
Stream processing
Column stores
Apache Parquet
Apache Arrow
Workflow Orchestration
Lambda Architecture
Kappa Architecture
Data Engineering Tools and Technologies
● Apache Airflow
● Apache Nifi
● AWS Glue
● Google Dataflow (comparison: AWS Glue vs Google Dataflow)
● Talend (comparison: Airflow vs Talend)
● Stitch
Streaming Technologies
● Debezium
● Apache Kafka
● Apache Spark (real-time data analytics platform)
○ PySpark (Python library for Apache Spark)
Transformation tools
● dbt
Test Automation
Unit Testing
● Stubs
● Mock
● Monkey Patching
● Parametrization
Integration Testing
API Testing
Performance Testing
● JMeter
Interface Testing
Thread Synchronization
● Race Condition
A race condition occurs when multiple threads access a shared resource at the same
time and at least one of them modifies it. It is a bug in which the outcome depends
on the precise order in which the concurrent threads execute relative to one
another. Thread (or process) synchronization deals with techniques for avoiding
race conditions.
● Critical Section
A critical section is a part of a program that reads or writes a variable or other
resource shared with other threads or processes, where the result depends on the
order in which the accesses occur. For example, if process A reads a variable x
while process B writes to the same variable x at the same time, process A might
get either the old or the new value of x.
● Mutual Exclusion
Mutual exclusion is a property of concurrency control, which is instituted for the
purpose of preventing race conditions. It is the requirement that one thread of
execution never enters a critical section while a concurrent thread of execution is
already accessing the critical section.
○ Locks
○ Mutex
○ Semaphores
○ Monitors
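A minimal sketch of mutual exclusion using a lock from Python's threading module. The SafeCounter class and the worker counts are assumptions for illustration; the increment is the critical section.

```python
import threading


class SafeCounter:
    """Shared counter whose increment is protected by a mutex."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Critical section: read-modify-write on shared state.
        # Only one thread at a time may hold the lock.
        with self._lock:
            self.value += 1


def run_workers(counter: SafeCounter, workers: int = 8, per_worker: int = 10_000) -> int:
    def work() -> None:
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


total = run_workers(SafeCounter())
print(total)
```

Without the lock, two threads could read the same old value and one update would be lost, which is the race condition described above.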
Deadlocks
● Deadlock avoidance
Memory Management
● Virtual Memory, Paging, Translation Lookaside Buffer (TLB)
Secondary Storage
● Inodes
DevOps
Continuous integration (CI) is the practice of automating the integration of code changes from
multiple contributors into a single software project. It’s a primary DevOps best practice, allowing
developers to frequently merge code changes into a central repository where builds and tests
then run. Automated tools are used to assert the new code’s correctness before integration.
A source code version control system is the crux of the CI process. The version control system
is also supplemented with other checks like automated code quality tests, syntax style review
tools, and more.
Application Architecture
Bastion Host
Waterfall Methodology
Agile Development
● Sprint
● Scrum and Daily Standups
● Kanban
Spiral Model
Design Patterns and Principles
Fluent interface
Creational
● Singleton
● Factory
● Object Pool
● Lazy Initialization
Behavioral
● Iterator
● Observer (Publish/Subscribe)
● State
● Template method
Structural
● Composite
● Decorator
● Module
● Proxy
Concurrency
● Active Object
● Message Design Pattern
● Monitor
● Thread Pool
Design Principles
● SOLID
○ Single Responsibility Principle
○ Open-closed Principle
○ Liskov Substitution
○ Interface Segregation Principle
○ Dependency Inversion Principle (related techniques: inversion of control, dependency
injection)
● GRASP
○ Information Expert
○ Creator
○ Indirection
○ Low Coupling
○ High Cohesion
○ Polymorphism
○ Protected Variations
○ Pure fabrication
● KISS
○ Keep It Simple Stupid
● DRY
○ Don’t Repeat Yourself
Architectural Patterns
● Layered
● Client-server
● Peer-to-Peer
● Master-Slave
● Microservice
● MVC
Fluent Interface
● In software engineering, a fluent interface is an object-oriented API whose design relies
extensively on method chaining. Its goal is to increase code legibility by creating a
domain-specific language (DSL). The term was coined in 2005 by Eric Evans and Martin
Fowler.
● A fluent interface is normally implemented by using method chaining to implement
method cascading (in languages that do not natively support cascading), concretely by
having each method return the object to which it is attached, often referred to as this or
self.
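A small sketch of a fluent interface built with method chaining. QueryBuilder and its methods are hypothetical and only assemble a string; they are not a real SQL library and do no escaping.

```python
class QueryBuilder:
    """Toy fluent builder: each configuration method returns self."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._conditions = []
        self._limit = None

    def where(self, condition: str) -> "QueryBuilder":
        self._conditions.append(condition)
        return self  # returning self is what enables chaining

    def limit(self, count: int) -> "QueryBuilder":
        self._limit = count
        return self

    def build(self) -> str:
        sql = f"SELECT * FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql


query = (
    QueryBuilder("users")
    .where("age >= 18")
    .where("country = 'DE'")
    .limit(10)
    .build()
)
print(query)
```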
Cloud Computing
Google
● Compute Engine (IaaS)
● Cloud SQL (managed relational database service)
● BigQuery (data warehouse)
● Cloud Storage (IaaS)
● Cloud functions
● App Engine
● Bigtable
● IAM
● Cloud Shell
AWS
● EC2 (Elastic Compute)
● RDS
● AWS Load balancer
● Cloudfront
● Cloudwatch
● Redshift (data warehouse)
● AWS Lambda
● AWS S3 (IaaS)
● DynamoDB
● Security Groups
● IAM
Microsoft
Vendor Lock-in
Sticky Sessions
Links
https://aosabook.org/en/
Continuous Integration
https://aosabook.org/en/v1/integration.html
Architecture of nginx
https://aosabook.org/en/v2/nginx.html
4 Vs of Big Data
Random notes
- idempotent
- monoid
- decoupled
- dependency injection
- unit
- functional programming
- asynchronous vs parallel programming
- thread locking
- eventual consistency
- exactly-once semantics
- lambda vs kappa architecture
- push vs pull architectures
- write-audit-publish pattern
Other Concepts
Serialization/Marshaling, Encoding, Encryption, Hashing
Links
CAP theorem
PACELC theorem
https://docs.python.org/3/library/secrets.html
https://shopify.engineering/read-consistency-database-replicas
https://towardsdatascience.com/how-to-make-your-pandas-operation-100x-faster-81ebcd09265c
https://cloud.google.com/blog/products/databases/why-you-should-pick-strong-consistency-whenever-possible
https://www.percona.com/blog/2007/12/19/mvcc-transaction-ids-log-sequence-numbers-and-snapshots/
First of all, what is polymorphism? In the context of type systems, polymorphism allows a single
term to have several types. The problem here is that the word type itself is heavily overloaded in
the computer science and programming language community. So to minimize the confusion,
let's just reintroduce it here, to be on the same page. A type of a term usually denotes some
approximation of the term's semantics, where semantics could be as simple as a set of values
equipped with a set of operations or something more complex, like effects, annotations, and
arbitrary theories. In general, semantics denotes the set of all possible behaviors of a term. A type
system denotes a set of rules that allows some language constructs and disallows others
based on their types, i.e., it verifies that compositions of terms behave correctly. For example, if
there is a function application construct in a language, the type system will allow an application
only to those arguments whose types match the types of the parameters. And that's
where polymorphism comes into play. In monomorphic type systems, this match can only be
one to one, i.e., literal. Polymorphic type systems provide mechanisms to specify some regular
expression that will match a family of types. So, different kinds of polymorphism are simply
different kinds of regular expressions that you may use to denote the family of types.
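The idea of a type "pattern" matching a family of types can be sketched with Python type hints. In this illustration, the type variable T matches any element type (parametric polymorphism), while the Quacker protocol matches any class that has a quack method (structural typing, the statically checkable counterpart of duck typing). All names here are hypothetical, and the hints are checked by external tools such as a type checker, not at runtime.

```python
from typing import Protocol, Sequence, TypeVar

T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    """Parametric: works identically for a sequence of any element type T."""
    return items[0]


class Quacker(Protocol):
    """Structural: any type with a matching quack() method fits."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    # Unrelated to Duck by inheritance, but has the same method shape.
    def quack(self) -> str:
        return "Beep-quack"


def make_noise(thing: Quacker) -> str:
    return thing.quack()


first_int = first([3, 1, 2])
first_str = first(["a", "b"])
noises = [make_noise(Duck()), make_noise(Robot())]
print(first_int, first_str, noises)
```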
ChatGPT answer
In Apache Kafka, the decision on which broker (node) a message should be sent to is primarily
made by the Kafka client. Here’s how it works:
1. Topic Partitioning: When a producer sends a message, the client first determines the
partition within the topic where the message should be placed. This decision is made
based on the partitioning strategy:
○ Default Partitioning: If a key is provided, Kafka uses a hash of the key to
determine the partition. If no key is provided, records are spread across
partitions (round-robin in older clients, sticky per-batch assignment in newer ones).
○ Custom Partitioning: Custom partitioners can be implemented if you need more
control.
2. Broker Assignment: Once the partition is determined, the client looks up the partition’s
leader in the cluster metadata it fetches from the brokers. The leader for each partition is
determined by the Kafka cluster’s controller and stored in ZooKeeper (or, in newer
KRaft-based versions, in Kafka's own metadata log). The leader broker is responsible for
receiving all the writes for that partition.
3. Client Communication: The Kafka client then directly sends the message to the leader
broker for the determined partition.
So, while the Kafka cluster manages metadata and broker leadership, the Kafka client is
responsible for choosing the partition (and hence the broker) when sending data.
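The partition and broker selection steps above can be sketched as follows. This is an illustration only, not the Kafka client's code: SketchPartitioner, broker_for, and the leader table are hypothetical, and zlib.crc32 stands in for the murmur2 hash that the Java client's default partitioner uses.

```python
import itertools
import zlib


class SketchPartitioner:
    """Toy partitioner: hash keyed records, round-robin keyless ones."""

    def __init__(self, num_partitions: int) -> None:
        self._num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key) -> int:
        if key is None:
            # No key: spread records across partitions in turn.
            return next(self._round_robin)
        # Keyed: the same key always lands on the same partition.
        # crc32 is a stand-in hash chosen for this sketch.
        return zlib.crc32(key) % self._num_partitions


# Hypothetical cluster metadata: partition -> leader broker.
PARTITION_LEADERS = {0: "broker-1", 1: "broker-2", 2: "broker-3"}


def broker_for(partitioner: SketchPartitioner, key):
    """Pick the partition, then look up its leader broker."""
    partition = partitioner.partition(key)
    return partition, PARTITION_LEADERS[partition]


partitioner = SketchPartitioner(num_partitions=3)
keyed_a = broker_for(partitioner, b"user-42")
keyed_b = broker_for(partitioner, b"user-42")
keyless = [partitioner.partition(None) for _ in range(4)]
print(keyed_a, keyed_b, keyless)
```

Hashing the key is what preserves per-key ordering: all records for one key go to one partition and therefore to one leader.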
Coding
- Leetcode
- Cracking the coding interview book
- Neetcode
Behavioral interview
- Tech Interview Handbook (Github repo)
- A Life Engineered (YT)
- STAR method (general method)
OOD Interview
- Interviewready
- OOD by educative
- Head First Design Patterns Book
Mock interviews
- interviewing.io
- Pramp
- Meetapro