CISSP 25 Software Development
Developers are first concerned with creating working applications, and unfortunately, security is mostly an afterthought or something that is patched on top of an insecure but working app. This matters especially when apps handle sensitive data and interact with the public, because weak applications become attack vectors that create security risks for companies and their data.
Otherwise, vulnerabilities will creep into our code, and we – or worse, others looking for them – will eventually find them. They happen due to bad coding practices, such as the creation of backdoors, default accounts or unsanitized inputs. These can create buffer overflow vulnerabilities, or allow a user to bypass privileges or even authentication altogether.
So we need an approach to the development process that covers not only the functional requirements – we don't just want a working application – but also makes sure the app complies with security standards.
Security should be integrated into every stage of development. And funny enough, even if writing insecure code just to have something that works, something to show to your boss or your client, sounds easier – it's MUCH easier to write that code securely in the first place instead of patching security afterwards onto a piece of code that was not designed with security in mind.
In the world of software development, programming languages are the tools that developers use
to build application code, which in turn, when executed, results in what we all access, download
and enjoy - applications.
There are a lot of languages out there – the choice comes down to many things: how easy it is to write code, how comfortable each developer is with each language, how well a language is tailored to a specific use case, how religiously attached a developer is to his or her preferred programming language, and so on. And also, depending on how they are designed and how much is taken care of in the background versus how much is left for the developer to decide, they each have their own security considerations.
At the most fundamental level, computers operate on binary code, a sequence of 1s and 0s
which translate into simple instructions that can be executed directly by the CPU – this is known
as machine language. The CPU can do simple stuff – basic math, moving bytes around,
comparing numbers, jumping to specific locations to execute the next instruction, and so on. But
it can do it really really fast. Most basic instructions in the CPU take only one CPU cycle – these
cycles are what we are measuring when talking about CPU frequencies (or speed): 5 GHz
means 5 billion cycles per second – 5 billion instructions. And that’s just for one CPU core.
Actually, CPU architectures have evolved to the point where a simple instruction can, on average, even be executed in less than one cycle, because CPUs now have prediction features and instruction pipelines, so if the adequate circuits are available, the CPU can actually perform more than one instruction per cycle. There is a certain margin of error, because the CPU has to predict what the next instruction might be in order to prepare it, but it is right more often than wrong. This also has a downside: this predictability was exploited by the Spectre and Meltdown attacks. But we digress here.
Now, this machine language is specific to each CPU chipset and is generally human-unfriendly -
we like to see the big picture, computers like to see details. So a first step above machine
language is assembly language, which uses short text instructions like ADD, JMP and so on to
represent a CPU's basic instructions. I remember from my college days how one of the assignments was to perform a large-number multiplication in assembler – it took me around 30 to 50 lines of code – for a multiplication operation, yes.
Now, if we always had to write 30 lines of code for every large-number multiplication, we probably would not have anything resembling Netflix, internet banking or Fortnite.
So developers typically opt for high-level languages like Python, C++, Ruby, Java, .NET, Golang, Node.js, and so on. These languages resemble human language a bit more – we can use function names, we can tell the code to repeat sections, check for conditions, create connections, expose APIs and so on. Of course, it is closer to human language, but it's not trivial. Oh, and by the way, higher-level programming languages offer some degree of portability across different operating systems and CPU architectures. Remember that assembler instructions had to be written specifically for each CPU.
At the end of the day, these high-level programming languages are also translated into low-level CPU instructions, simply because that's the only language the CPU knows how to execute. But in order to transform, let's say, Python or Java code into machine code, we have two main options: compilation or interpretation.
Compiled languages, such as C, Java and C#, require the use of a compiler to transform the source code into an executable file. These executables are then distributed to users, who run them on their machines. An executable file is nothing more than an operating system-specific wrapper around machine code. OS-specific because the OS must decide what to do with it and how to execute it, and machine code because, again, that is the only truly executable code out there.
Sometimes reverse engineering is possible, that is decompiling these executables back into
source code or an intermediate assembly language, which is useful for analyzing malware,
finding bugs during software testing, finding vulnerabilities... aaand cracking software and
games.
Some programming languages depend on runtime environments for code execution across different systems. The Java Virtual Machine (JVM) is a prime example, allowing the execution of compiled Java code on any system with the JVM installed. .NET has the CLR – the Common Language Runtime – as well. Runtimes are like small virtual environments, wrappers around the running application that provide additional benefits like running the code in an isolated environment, like a sandbox (so that if it is compromised it is not going to destroy the entire system), better security, better memory management and exception handling.
And we also have interpreted languages like Python, JavaScript, Bash and PowerShell. Some of these are called scripting languages, and most scripting languages are interpreted.
The code looks pretty much the same as the code of compiled languages, but instead of distributing the compiled executable, you distribute the source code directly. When you run these programs, an interpreter on your system executes the high-level language instructions and translates them almost one by one into machine code that is immediately executed. This way you can see and understand the original code, and you can make changes to it on the fly.
Which one is better? Well, performance-wise, compiled code is usually faster than interpreted code. From a security standpoint, it's tough to say. You might think that compiled code is safer because it is more difficult to modify and to add malware to it, but viruses do that all the time. Even worse, if malware code is added to an executable, without a proper antivirus and a known signature it is very hard to figure out that there is malware in there. On the other hand, interpreted code is visible, so it's harder to hide malicious code in it, but anyone with access to the code can change it.
Developers also often use libraries, which can be open source or proprietary, and which offer pre-written code for common functions. In other words, libraries help developers avoid reinventing the wheel every time they need it. You need code to connect to object storage in AWS? There's a library for that. You need code to implement digital certificates in your app? There's a library for that. Security-wise, developers have to be careful which libraries they are using, because they are basically adding someone else's code to their own app. And depending on where you found that code, it might add vulnerabilities to your app.
Many of these languages are object-oriented, meaning code is organized into objects. From a security standpoint, each object should be similar to a black box – it should be a self-contained environment, and we should only be concerned with its inputs and outputs, or in other words, its interface: the way it is supposed to interact with the outside world. An action that an object can perform (or can be told to perform) is called a method. And objects belong to classes. For example, multiple objects representing users would all belong to the same user class, since they are all built the same way, they have the same methods and they interact with the rest of the app in a similar manner. So the class is like a template, a blueprint for the actual instantiated objects.
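To make the class/object/method idea concrete, here's a minimal Python sketch; the User class, its fields and its greet method are made up purely for illustration.

```python
# A minimal sketch of the class-as-blueprint idea (names are hypothetical).
class User:
    """Template (class) from which individual user objects are created."""

    def __init__(self, username: str, email: str):
        # Internal state: callers should only interact through the methods below.
        self._username = username
        self._email = email

    def greet(self) -> str:
        """A method: an action the object can be told to perform."""
        return f"Hello, {self._username}!"

# Two objects instantiated from the same class (the same blueprint).
alice = User("alice", "alice@example.com")
bob = User("bob", "bob@example.com")

print(alice.greet())  # Hello, alice!
print(bob.greet())    # Hello, bob!
```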
Software Flaws
There is no software out there that is completely bug-free. That’s a given. You could also say
that there is no piece of code that is vulnerability-free, only code whose vulnerabilities haven’t
been discovered yet...
Most of computer security actually revolves around this concept – code will fail, code has bugs,
vulnerabilities exist, so let’s see how we can detect, mitigate, contain them. Basically that’s what
all security controls do. They just dance around errors and vulnerabilities.
First of all, I'll start with what I like to call the root of all evil – input validation, or rather the lack of proper input validation. I've said this in other trainings as well, but this IS actually the root of all evil: if we were able to find a way to perform flawless, perfect input validation, the market for security professionals and security solutions would crash and burn. Why is that? Well, pretty much EVERY SINGLE TYPE of vulnerability has an exploitation method that relies on some sort of input. Denial of service, buffer overflows, malware, rootkits, SQL injection, XSS – every single attack out there, apart from social engineering and physical attacks, relies on some type of user or attacker interaction, which in turn means malicious input sent to an app; malicious input that is not detected as malicious, but processed, and thus allowed to exploit a vulnerability.
So, in a theoretical world, proper input validation at all levels (user interface, network, APIs, everywhere) could avoid any kind of attack, even unknown ones. And by the way, input validation means analyzing and accepting only input that we know the application was designed for, nothing else. If you expect a person's name in a field, make sure you only accept the proper character set and a fixed length, with no code tags and no binary data, and ideally you perform these validations on both the client and the server, because the client can be overridden by an attacker. And funny enough, some attacks target not the application, but the input validation functions themselves. So be very careful how you make those checks as well, because in order to figure out whether you want to accept an input or not, you still have to look at it and load at least parts of it in memory. Sometimes, that's all it takes to dig your own grave.
Also, you might have heard that input validation should always be performed on the server side and not on the client side. My advice? Do it in both places, because although client-side validation won't protect you against real hackers, it will help keep your server load low when all you need to reject are simple input errors generated by legitimate users. It might have happened to you as well – you copy/paste a name or an address into a web form and you accidentally paste a 10,000-character conversation you copied a while ago and forgot was still in your clipboard. That's just human error; have your local JavaScript code tell you that your input is too big or doesn't match the required format – there is no need to overload the server with that junk.
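Here's a minimal Python sketch of what server-side validation for a name field might look like; the allow-list pattern and the 100-character limit are illustrative assumptions, not hard rules.

```python
import re

# Allow-list: letters, spaces, apostrophes and hyphens, capped at 100 characters.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z '\-]{0,99}$")

def validate_name(raw: str) -> str:
    # Reject missing input and anything absurdly long before inspecting it further.
    if not raw or len(raw) > 100:
        raise ValueError("name missing or too long")
    if not NAME_PATTERN.fullmatch(raw):
        raise ValueError("name contains disallowed characters")
    return raw.strip()

print(validate_name("Grace O'Malley"))            # accepted
# validate_name("<script>alert(1)</script>")      # rejected: disallowed characters
```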
Authentication issues come next. First of all, design the app so that you authenticate according to the sensitivity of the accessed information. An age verification can be a simple yes/no prompt, but access to confidential data should require strong passwords and MFA. If you can, use your company's authentication system – don't reinvent the wheel and don't start designing your own authentication system from scratch. Active Directory users or managed identities in cloud environments are much more secure, validated, tried and tested solutions than what you might come up with.
And once authenticated, be careful with session management. Session IDs act as unique secret identifiers following a successful authentication, so a compromised session ID can be similar to a compromised account. Store and transmit session data securely, and make sure sessions expire sooner rather than later.
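As a rough illustration of those two points – unpredictable session IDs and early expiry – here's a toy Python sketch; the in-memory dictionary and the 15-minute TTL are placeholder choices, and a real app would use a hardened server-side session store.

```python
import secrets
import time

SESSION_TTL = 15 * 60   # seconds; pick a value that matches the data's sensitivity
sessions = {}           # illustrative only: real apps use a secure server-side store

def create_session(user_id: str) -> str:
    # Long, unpredictable session ID generated from a cryptographically secure RNG.
    sid = secrets.token_urlsafe(32)
    sessions[sid] = {"user": user_id, "expires": time.time() + SESSION_TTL}
    return sid

def get_session(sid: str):
    entry = sessions.get(sid)
    if entry is None or entry["expires"] < time.time():
        sessions.pop(sid, None)   # expired or unknown: treat it as gone
        return None
    return entry

sid = create_session("alice")
print(get_session(sid))           # valid until the TTL runs out
```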
Errors, or at least those that you are aware of, are another important topic. Software should be designed to produce controlled errors based on exceptions in your code. You need to tell the user when they're doing something wrong or when the application or the database has a problem. Detailed errors are very useful while developing or testing your code because they can tell you exactly where the problem lies. But in a production application, you should never provide more details than a regular user needs. Because if that user is an attacker, perhaps one who intentionally caused that error, and you're displaying a stack trace with all the function calls that led to the error, you've just given the attacker details about the inner workings of your app. And sometimes errors can expose sensitive data as well.
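Here's a hedged Python sketch of that split between internal detail and user-facing messages; process_order and the response format are hypothetical stand-ins for real application logic.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shop")

def process_order(request: dict) -> dict:
    # Hypothetical business logic that can blow up on bad data.
    return {"status": 200, "total": request["items"] * request["price"]}

def handle_request(request: dict) -> dict:
    try:
        return process_order(request)
    except Exception:
        # Full detail (stack trace included) goes to our own logs only...
        logger.exception("order processing failed for request id %s", request.get("id"))
        # ...while the user gets a generic, non-revealing message.
        return {"status": 500, "message": "Something went wrong. Please try again later."}

# Missing "price" -> detailed traceback in the log, generic error to the user.
print(handle_request({"id": 42, "items": 2}))
```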
Logs, especially those related to security events, are also important. Logs are messages that
are generated and then stored or sent to some remote storage location when a certain event
happens in an app. You could perhaps log all webpage accesses to your app but this would
generate a lot of mostly useless log data. Much better is to log events that might indicate
problems, like failed authentication attempts, input validation that could not be performed or
failed (those are usually scary), attempting to use expired session data, same user accessing
the app from 2 different IP addresses, regular users performing admin tasks, and so on. And
also investigate those. Simply logging that information and then forgetting about it until the day
when you run out of log disk space is similar to not logging anything at all.
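A small illustrative sketch of logging only the interesting security events, using Python's standard logging module; the event names, the five-attempt threshold and the escalation rule are arbitrary examples, not prescriptions.

```python
import logging

logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
security_log = logging.getLogger("security")
security_log.setLevel(logging.INFO)

def record_failed_login(username: str, source_ip: str, attempt: int) -> None:
    # Log the events worth investigating, not every single page view.
    security_log.warning("failed login for %s from %s (attempt %d)",
                         username, source_ip, attempt)
    if attempt >= 5:
        # Escalate repeated failures so somebody actually looks at them.
        security_log.error("possible brute force against %s from %s", username, source_ip)

record_failed_login("admin", "203.0.113.7", 5)
```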
How your application fails is also a matter of concern. But wait, you say, if you know that the app will fail, how about making it NOT fail? You are right, my young padawan, but what if your app fails for some unforeseen reason, some vulnerability that you did not expect? Maybe it fails because of your own crappy code, no attackers involved! So you need to plan for situations where specific parts, modules, of your app will simply fail, for whatever reason. Since we don't know what will fail or why, you can't really predict much about how the rest of the app will behave with a missing limb. BUT in general, and for security purposes, we have two ways in which we can try to protect the rest of the app. That is, when a component stops working, the app can fail "open" or fail "secure". Fail-open means letting the app bypass security controls when they are not available any more. Fail-secure means locking down the app when security controls are unavailable.
Here is a fail-secure example: the BSOD – it means that something failed miserably, without the OS being able to mitigate it. What does the OS do? Stop everything right now, because the system might become too unpredictable to proceed. Or if a firewall fails, should we allow all connections from the outside? Hell, no.
And here is a fail open example: power failure due to a fire in an office building. Automatic
security doors need power – do you want them to be locked down or do you want them to
become open when they lose power so that people can evacuate? Does that mean that an
attacker could intentionally cut power to those doors to bypass them? Yes. Welcome to security.
That’s why it’s a profession and not just a checklist.
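For code-level security controls, failing secure usually means "deny by default when the check itself breaks". Here's a toy Python sketch of that idea; the AuthorizationService is a made-up stand-in for whatever policy engine the app actually relies on.

```python
class AuthorizationService:
    """Stand-in for a real policy engine; hypothetical, for illustration only."""
    def check(self, user: str, action: str) -> bool:
        raise ConnectionError("policy engine unreachable")

authz = AuthorizationService()

def is_allowed(user: str, action: str) -> bool:
    try:
        return authz.check(user, action)
    except Exception:
        # Fail secure: when the security control itself breaks, deny by default.
        return False

print(is_allowed("alice", "read_report"))  # False - the check failed, so access is denied
```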
When building software, project management usually adheres to a lifecycle model so that everyone knows what they should be working on and which phase they are currently in. And it not only helps with good code practices, but also with security because, as we said before, security should be embedded in software from the beginning, which also means that it should be accounted for in each of the SDLC phases.
Speaking of, here are the phases of a generic SDLC model. There are many models, some
might differ slightly, but in general, phases align to these descriptions:
We first define the app conceptually – what exactly do we want, and we make sure that
everyone wants the same thing. The most we can do from a security perspective here is to list a
very high-level overview of what security means for our app and we can also classify the type of
data that the app will process. Sensitive, personal, medical, financial, and so on.
Next, we have the functional requirements – how the app should work, what modules it should
have and how they should interact with each other. In general functional requirements focus on
each component and its inputs, outputs and what type of processing should happen between
the input and the output.
Security control specifications come next, because at this point we should have a pretty good understanding of what our app should look like and what kind of data it should process. So here we focus on authentication, confidentiality and integrity of data, and we also design components to provide audit trails of who did what.
The app as a whole is next reviewed on paper – because we don’t have any code yet and
everyone pitches in to determine if the components, the way they were designed, play well
together, match customer expectations and best practices.
Next, we code the app, of course, keeping in mind that security must be built into the app from
the beginning. Code reviews happen often here, and they should, as additional pairs of eyes can detect additional issues before they end up in the shipped application. Different levels of testing
need to be performed here as well. Starting from unit tests, which cover small modules and
pieces of code, and then going to scenario based testing, stress testing, performance testing
and so on. Each newly added feature should be accompanied by a battery of regression tests
so that we can be sure that new code does not break old code. Once developers have had
enough of this testing business, the app can be exposed to a limited set of users to test it in a
close-to-real-life scenario. This is UAT – user acceptance testing.
Finally, after all testing is done, or the deadline comes, or the money runs out, whichever comes first, the app is delivered to the end users. This is when the next phase begins: the maintenance phase. Bug fixing, monitoring the app in production, solving trouble tickets, fixing security flaws and integrating updates all happen in this phase, which, as you can expect, can be a very long phase. Until the software reaches end of support, or until the development company goes bankrupt, or, if it was an in-house developed app, until the last developer leaves for a better job. Whichever comes first.
Let's talk about the "Just Code and Hope for the Best" approach, or the "chaos" methodology. Obviously, we're gonna make fun of it.
It’s pretty much like winging it. You dive into coding without much of a plan, fix bugs when you
stumble upon them, and the specs? Well, they’re kind of made up along the way, if they exist at
all. Documentation is more of an afterthought, and testing is hit or miss.
Now, this might be okay if you’re working on something small and just need to get it out the
door. But for anything else, it’s a recipe for headaches. You’ll end up with code that’s about as
organized as a teenager’s bedroom. Sure, it’s quick and dirty, but it’s not the way to go if you
want something solid and reliable in the long run.
At the other end of the spectrum there's the waterfall model. The upside? It's easy to get – you just follow the steps. But here's the catch: it's gotta be planned to perfection. If not, well, it's like forgetting to make the sauce for spaghetti night. You can't just wing it. So, while it's great when you nail the plan, it's pretty rigid and can trip you up if anything unexpected pops up.
Normally this model doesn't allow you to go back, but some implementations accept a one-step-back iteration.
Then there's Agile. Back in 2001, a bunch of smart folks came up with the Agile Manifesto, which is basically a list of values and principles that guide this whole Agile thing. Long story short, it's about people and how the way they work together matters more than just following a bunch of rules. Another idea in the manifesto is that it's better to have something that works and gets the job done than to have tons of paperwork, so we're prioritizing development – documentation, not so much. This is where we find the term MVP – no, not that MVP, but minimum viable product.
And finally, working with customers is key, and being able to switch gears when needed is super important. And super frustrating for developers, of course, but that one is not in the manifesto.
It can get a bit messy with complex projects, and sometimes it’s hard to see the finish line. Plus,
there’s a lot riding on each team member, so everyone’s got to be on their A-game.
Agile isn’t just one way of doing things; it’s more like a mindset. There are a bunch of different
methods like Scrum, Kanban, and Extreme Programming that put Agile into action. Scrum’s
probably the most popular one, with daily team huddles to figure out what everyone’s doing and
how to get past any roadblocks.
Of course, there’s a lot more to cover about this, but this is neither the time, nor the certification
exam for it.
Then there's the spiral model, which iterates through repeated planning, risk analysis, engineering and evaluation passes. But here's the deal: it can be a handful to keep everything straight with all the moving parts. And with all the phases you've got to juggle, it's not the simplest path to take. So, while it's great for adapting to twists and turns, you've got to be on top of your game to manage the Spiral.
So, SAMM – the Software Assurance Maturity Model – is like your guide to making sure your software's not just good, but good and secure. It's about being proactive, not just reactive, and that's a big deal in the software world.
Now, as you can probably guess, these models aren't here for you to memorize every single detail about them. The exam is probably going to ask you which of these are valid models or not; I would highly doubt that any question would ask you about the low-level details of each step in these models – but don't hold me responsible for this. So don't come back and say you want your coffee money back if you encounter such a question in the exam.
Because at some point, you will have to make changes to your software, probably for one of two reasons. First, because maybe you detect some issues and flaws – you need to patch some security holes and provide some updates. And secondly, because perhaps the expectations of the client change and they start asking for additional stuff: we want to integrate it with this and that, maybe we want this additional type of report, maybe we want people to be able to access it on their mobile phones, and so on.
So while you had a very nice, up-to-date process for developing the application, you shouldn't break that nice order when you start making changes to it. Changes to an app should also follow a strict, rigorous, thoroughly tested process that keeps you safe from future flaws and also allows you to roll back if and when something goes bad.
And change management is also crucial from a security standpoint, because security means that you need to run a tight ship. You need to be able to check all the inputs, you need to validate all permissions and all the actions, and when you start making changes to an application you can break all that nice flow that you designed in the design phase of that application. You can break an application through its updates. And I'm not talking about the example with Windows updates, which yes, sometimes break stuff, but about updates that are performed without a strong security mindset and which can actually introduce new vulnerabilities in your code.
So change management can be split into three different processes. The first one is called request control. This is basically the process from which every change is born. This is where the users, the developers, the owners of the application request some sort of change, so it's basically like opening a ticket to ask for something to be implemented. Obviously, a filtering process should also happen here, as in somebody – probably a combination of developers and managers – should look at a cost-benefit analysis. Is it worth doing? Is it actually going to bring us more money, happiness, satisfaction, whatever? And also, is it really a priority right now, given that we might have a lot of other unsolved issues in the backlog? Should we focus on this new request right now and just drop everything that we are doing?
Then we have change control. This is where we've decided that we actually need to process the change, and we first start by analyzing the situation that led to it. I mean, what's the case, what is the actual request? What is the user actually trying to obtain by requesting this change? We're not going to just implement the change and dump it into production. We're going to test it in a development or test environment, we'll thoroughly document it, and ideally we'll also perform some regression testing – that is, check the previous functionality of the app with a batch of predefined tests and make sure that we did not break anything that was working before we implemented the change. If everything is in order, then we can proceed and publish it into production.
Which brings us to the last phase, which is release control. In release control we're double-checking that everything is in order and the code is ready to be deployed in production. Make sure you did not forget any debug messages in there, any backdoors, any admin accounts. And again, in an ideal world this is where acceptance testing would happen – UAT, user acceptance testing. That is, make the change available, maybe in a blue-green deployment scenario, to just a couple of people, maybe just a small department or team, just to see how it behaves and how well their expectations are met.
And even though it's not officially mentioned here, I would add that it's very important to have some sort of version control, because as you release new versions of an app, you need to keep track of those changes in a rigorously defined versioning system, so that users know exactly which version they are running, which version is compatible with which, and from which versions they can upgrade if you have any backwards-compatibility issues. So all those release notes that are usually published with a new software release – that's something you need to take care of before pushing it into production.
Then there's DevOps. The whole idea is to get things done faster and smoother. Instead of dropping a big update once in a blue moon, teams can roll out updates as often as you post on social media. And for the real pros, it's about getting to a point where updates are flying out the door almost every minute, which means a lot of smart automation and slick coordination.
When you’re moving this fast, you can’t forget about keeping things secure. That’s where
DevSecOps comes in—it’s DevOps with a security badge. It’s all about making sure that as
you’re cranking out code, you’re also keeping it locked down tight.
As you can probably guess, DevOps is closely related to and often goes hand in hand with agile programming. In essence, since it's about collaboration, it also requires an upgrade to the skill set of the engineers.
So developers now need to be aware of the infrastructure that their app is being tested on and ultimately is going to be running on – this is where knowledge about containers, Kubernetes and cloud environments comes in, so developers need to be comfortable with that. On the other side, the operations people will need to learn some of the skills that were initially just in the realm of developers: they're going to need to learn automation, how to write a YAML file, how to interact with an API to deploy a cloud virtualized environment.
Now, if you're shopping online, you might think that you're interacting with a single app, when in fact that app in turn calls a number of different other services, like stock management, maybe user profiling, maybe credit card processing, maybe a newsletter engine, and so on.
These APIs are the behind-the-scenes magic that lets different web services chat with each other.
APIs are basically special interfaces through which you can send commands to an application and control how it behaves. Since an API is a programmatic interface, you'll probably need a specialized tool to craft those requests. APIs are also a great method for making applications talk to each other in a machine-to-machine language.
If you're thinking about cloud infrastructure, for example – let's take AWS, Amazon Web Services – you could deploy infrastructure in AWS by connecting to the cloud console in your browser at console.aws.amazon.com and clicking around in there. Similarly, you could deploy the same infrastructure using the AWS CLI, which is a set of command-line tools specially designed to interact with infrastructure inside AWS. You could also use a specialized tool like Terraform, which processes a piece of code and then deploys infrastructure according to that code. You could do something similar with a configuration management tool, which may have the ability to deploy infrastructure as well. You could do it with the cURL utility on Linux and macOS systems, or you could do it with Postman. You could do it in a number of ways, because all these tools actually do one single thing: they all talk to the API of AWS, and they all know how to craft and package their messages so that the AWS API understands them and is able to process them correctly.
Now, since APIs allow access to an application, obviously that access needs to be authenticated and authorized – not everybody should be able to use it. Just like the regular interface of your application, where you ask for a username, a password and maybe a multifactor authentication method, the same security principles should apply to APIs as well.
And normally, since interaction with an API is supposed to be automated, authentication happens using temporary tokens or API keys, which basically act as passwords. Many APIs rely on a single authentication factor, which makes them a bit more exposed and risky when it comes to credential compromise.
APIs also need to be very thoroughly documented, because the messages that an application will accept obviously need to be strictly formed in order to be accepted. Most APIs rely on standard syntax for their requests and replies, and those payloads are usually JSON or XML.
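To make that concrete, here's a rough Python sketch of a machine-to-machine API call authenticated with an API key; the endpoint URL, resource path and key are placeholders, and real services document their own endpoints and authentication headers.

```python
import requests

API_KEY = "REPLACE_WITH_A_REAL_KEY"       # acts like a password - keep it secret
BASE_URL = "https://api.example.com/v1"   # placeholder endpoint for illustration

response = requests.get(
    f"{BASE_URL}/orders/12345",
    headers={"Authorization": f"Bearer {API_KEY}"},  # single factor: the key is all it takes
    timeout=10,
)
response.raise_for_status()
order = response.json()   # most APIs speak JSON (or XML) with a strictly defined structure
print(order)
```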
Testing Software
I probably don't need to tell you why software testing is important, right? Well, just in case I do need to tell you: it's because through testing you detect bugs, flaws and unexpected behaviors that not only will frustrate the app's clients, but might also open it up to potential attacks.
There is a lot to talk about in software testing. Starting from how testing is conducted – manually or automatically, by using scripts and automated scenarios – going further to what exactly is being tested: the interface, the outputs, the performance, the response time, the correctness of the results and so on, and finally looking at what kind of data constitutes a test.
In general, after you validate that the app behaves as it should with valid input, you should move on to invalid input and see how the app behaves. Invalid input can be provided by mistake – and we don't want a simple user mistake to break the app – but it can also be injected by an attacker attempting to find or exploit a weakness in your code.
Designing proper test cases and their respective inputs is almost an art form, because it requires a lot of creativity. It's easy to estimate what a valid input should look like, because that's the input the app was designed for, but it's so much harder to think of all the possible invalid inputs that might be provided to the app. Invalid input can be the wrong data type – a number instead of a name; it might be the wrong length – a 5000-character text instead of a name; it might be made up of binary/unprintable characters that should not be found within a name field. It might be executable code which, again, should not be accepted and, God forbid, executed on the server. It might be no input at all. It might be 5000 requests in one second. It might be an SQL statement. It might be a negative value where a positive number of items in a cart is expected. It might be the maximum allowable integer for the quantity field of an item in the cart, followed by the addition of an extra item, which might create an integer overflow and result in a cart with -4 billion pizzas to be ordered. These are all called malformed inputs, and your application should be designed to handle them all.
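Here's an illustrative pytest sketch that throws a sampling of malformed inputs at a toy quantity validator; set_quantity and its rules are hypothetical, and the point is the habit of parameterizing tests over invalid data.

```python
import pytest

def set_quantity(quantity) -> int:
    """Toy validator standing in for real cart logic (hypothetical)."""
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        raise ValueError("quantity must be an integer")
    if not 1 <= quantity <= 100:
        raise ValueError("quantity out of range")
    return quantity

# A sampling of malformed inputs the app should reject gracefully.
MALFORMED = [
    None,                        # no input at all
    "",                          # empty string
    "a" * 5000,                  # absurd length
    "<script>1</script>",        # code where data is expected
    "1; DROP TABLE carts;--",    # SQL statement
    -3,                          # negative quantity
    2**31,                       # integer-overflow bait
]

@pytest.mark.parametrize("bad_value", MALFORMED)
def test_rejects_malformed_quantity(bad_value):
    with pytest.raises(ValueError):
        set_quantity(bad_value)
```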
As for the techniques for testing, in theory we have three of them: white-box, grey-box and black-box testing.
And before talking about them, I should warn you that in general, testing should not be performed by (or only by) the person who wrote the code. Not just because developers consider their code their precious flawless baby (even though that's true as well), but because a fresh pair of eyes, with perhaps no knowledge about the underlying code, might have a different approach.
And speaking of how much the tester knows about the tested code, here are the 3 testing
approaches.
White-box testing means that the tester has access to the underlying code, knows exactly how the app should behave and is able to pinpoint exactly where in the code and why it's doing what it's doing.
Black-box testing is the complete opposite and relies on the fact that the tester resembles a regular user – with regular expectations about what the app should be doing, but without any knowledge about how the app is designed and coded.
Grey-box testing is somewhere in between: just like in black-box testing we're looking at inputs and outputs, but we might use the source code to design proper test cases – not to analyze the program's execution.
Code Repositories
When writing large scale projects, apps, websites, games, whatever you’re developing, in most
cases there will be a team of developers in charge of each aspect of the app.
Those developers will be specialized in different aspects of the app under development – some on the
frontend, some on design, others on database optimizations, backend communication,
middleware, graphics, user interactions, 3rd party integrations, you name it. And still, the app
itself will be a collection of code files that need to stick together or be compiled together to get
something functional.
So the problem here is – where and how do we store the source code for an app in
development so that each developer can safely work on their own part of the code without
clashing with other developers’ work, then how can we integrate all these separate development
efforts in one coherent and functional app?
GitHub, Bitbucket and SourceForge can act as code repositories, but their role extends beyond being just a remote storage location for a bunch of code files. They provide collaboration, feedback, facilitate code reviews, provide code versioning and release management, and ultimately a way to track and address found bugs.
They are also a good method for sharing code – a lot of open-source content can be found publicly available on GitHub, which is free for you to use, download, even clone and continue developing on your own, and maybe merge your efforts back into the big project if your contribution is really valuable.
Code repos have become smarter over time, too. Now you can create something called a
“hook,” which is like a little alarm system for your GitHub repository. It’s a way to set up
automatic notifications or actions whenever certain things happen in your repo. For example,
you could have a hook that sends you a message every time someone pushes new code or
creates a pull request. It’s a handy way to automate tasks and stay in the loop with what’s going
on in your project. You could even do something cooler like automatically deploy a piece of
infrastructure or configure a set of virtual servers when a new configuration is pushed into the
repo. Or even get your program automatically compiled and published as a new release when new production code is submitted. How cool and agile-ish is that?
From a security standpoint, there is a huge risk involved with public repositories, especially when they are used during the development of a project. During development, developers might use temporary credentials hardcoded in the app itself for testing. If those credentials end up in a public code versioning tool, they are free for all. And don't rely on security through obscurity here, don't think... nah, who's gonna find my obscure little project in there? Well, robots, crawlers and bots will. Many of them are designed especially for that – to search for and grab anything that looks like a password, an API key or any sensitive information forgotten in public repositories.
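A tiny Python sketch of the habit that avoids the problem: keep the secret out of the source tree entirely. The environment variable name here is just an example.

```python
import os

# Don't do this - a hardcoded credential committed to a public repo is
# effectively published to the whole internet:
# AWS_SECRET = "hardcoded-secret-key"

# Safer sketch: read the secret from the environment (or a secrets manager)
# so it never lives in the source tree at all.
aws_secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
if aws_secret is None:
    raise RuntimeError("AWS_SECRET_ACCESS_KEY not set - refusing to start")
```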
I just found a study the other day saying that in 2023 alone, over 12.8 million new secrets were leaked publicly on GitHub. I mean, wow, you don't even need to be a hacker anymore, sensitive information is given away for free!
If you want to have a little reading fun, search on Google for "The State of Secrets Sprawl 2024" – or any other year, depending on when you're listening to this.
Databases
Apps need to work with data, otherwise they have nothing to do. And data needs to be stored in
an organized manner so that it is structured and easily accessible when needed in the right
format. So here come databases to the rescue!
And since for most applications all their data is stored in some sort of database, that data becomes the juicy stuff that attackers will try to get to when attempting to exploit an app. So database security becomes a vital concern as well.
For most use cases we have 2 types of databases: relational and distributed.
A relational database can be represented as a set of interconnected tables of data points, and this is what most people think about when they hear the acronym SQL – Structured Query Language.
Distributed databases are, well... distributed. Think of blockchain or DNS, where multiple data points are stored in multiple locations but interconnected.
Now there's also a bit of a stranger animal in here, called a NoSQL database. These are NOT relational databases, and they are not relational because they are targeted at other use cases: very simple data stores, like just a key-value store, perhaps not even very big, and the need for speed. Without the overhead of relational databases, normalization and such, NoSQL DBs can be really fast.
So back to relational databases: in a relational database, information is neatly organized into grids, much like the cells of a spreadsheet – which is also why some people call Excel files databases, especially if they're huge. By the way, it's not the size that matters (at least not here), and size is not what makes or unmakes a database. Instead, it's the relational part: the ability to query specific related fields and how they are organized. This is the essence of relational databases, where each grid, or table, holds data that's interconnected. It's kinda like a digital filing system where each drawer (table) contains folders (rows) that are labeled with various tags (let's call them columns). So if you need a specific document, you ask for a specific drawer, a specific folder and a specific tag. This is how SELECT statements work in SQL as well – basically, you're asking for a specific field, in a specific table, that belongs to a specific database.
In a typical business setting, you might encounter a table for customer details, another for the
sales orders, another for past transactions. Each of these tables is a collection of data points
related to one another. For instance, the customer table would not just list names, but also
contact details, addresses, and perhaps preferences and the ids of those customers will be
found in the sales orders table and in the past transactions table. This is what makes relational
databases truly powerful: their ability to link tables together.
Now, let’s dive a bit deeper. Each piece of information in a table is an attribute, and these
attributes form the columns. So, if we’re looking at our customer table, attributes would include
the customer’s name, their city, or their phone number. Each customer’s complete set of
information forms a row, which in database lingo, is often called a record or tuple.
The beauty of a relational database lies in its flexibility. The number of records can grow as new
customers are added, which is known as the table’s cardinality.
However, the number of attributes (columns) typically remains constant, and is known as the degree of the table.
In a database, every record needs to be uniquely identified – a user table needs a unique user ID for each record, a purchases table needs a unique transaction ID. These unique fields are called keys. A record can have more than one key, but it should definitely have at least one.
We need to have these keys not just to ensure the uniqueness of the information in the table,
but also because these keys are the fields on which the tables are joined whenever you need to
cross-reference information between separate tables.
For example, you might have a customers table – there's probably going to be a customer ID in there, an email, an address, a phone number, a country, whatever information is strictly tied to that customer.
In another table, let's say a purchases table, you would have a unique purchase ID, a list of items purchased, a total price, a payment method, a delivery address and perhaps whether the order was successfully fulfilled or not. How do you tie this to the customer who made each purchase? By adding a customer ID in the purchases table as well – the same customer ID that uniquely identifies a customer in the customers table. That way, when you join those two tables on the customer ID field, you can explicitly tie each unique purchase to a specific customer, along with all the details from both tables.
So that's why we need keys. And what we just described are primary keys. There should be only one primary key per table, and it should definitely be unique – you cannot have two records with the same primary key in the same table. The primary key is selected from a set of candidate keys, which are also potential unique identifiers. For example, you might have an internally generated customer ID, which is your primary key. But you could just as well use the telephone number of the customer, or their address. Those have the potential of being unique, but are not guaranteed to be. Customers working in the same building might use the same address, and phone numbers might switch owners. So while they might be unique as well, they're not really good... candidates for primary keys. Even though we actually call them candidate keys – so yeah, let them think they still have a chance, right?
Apart from candidate and primary keys, we also have alternate keys. This one is simple – if
among the candidates only one is selected as primary, then the remaining ones are alternate
keys. Told you it’s simple.
And we also have foreign keys, which are fields that are primary keys in other tables. So if the customers table has customer ID as a primary key, then the purchases table also has a customer ID, which from the purchases table's perspective is a foreign key. So a foreign key is basically some other table's primary key.
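Here's a small runnable sketch using Python's built-in sqlite3 module that shows the customers/purchases example: customer_id is the primary key of one table and a foreign key in the other, and the join ties them together. The table and column names are of course made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,     -- primary key: unique per customer
    name        TEXT,
    email       TEXT
);
CREATE TABLE purchases (
    purchase_id INTEGER PRIMARY KEY,     -- this table's own primary key
    customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
    total       REAL
);
INSERT INTO customers VALUES (1, 'Alice', 'alice@example.com');
INSERT INTO purchases VALUES (100, 1, 59.90), (101, 1, 12.50);
""")

# Joining the two tables on the shared customer_id ties each purchase
# to the customer who made it.
for row in conn.execute("""
    SELECT c.name, p.purchase_id, p.total
    FROM customers c JOIN purchases p ON p.customer_id = c.customer_id
"""):
    print(row)   # ('Alice', 100, 59.9) then ('Alice', 101, 12.5)
```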
Database Consistency
On the topic of consistency: in databases, normalization means arranging the data into tables in such a way that each one deals with one topic or concept. This helps to avoid having the same piece of information in multiple places, which can lead to confusion, mistakes and conflicting information. By doing this, you make sure that when you update one piece of information, you don't have to remember to update it in several places – you only have one place to go.
So, long story short, you can remember that normalization means avoiding redundancy. There are several so-called normal forms in which databases can be arranged or converted, but this is beyond the purpose of this training. I still remember these from my college days and trust me, it's quite dull, you're not missing out on any spectacular content. If you want to know more, just google "database normal forms".
Also on the topic of consistency, there is transaction-oriented processing. This means that if a specific change in a database requires interaction with or changes to multiple tables, they should all be processed in one single transaction that cannot be executed partially. So it has to be an all-or-nothing operation.
Take a classic example: withdrawing money from an account means first checking that the balance is sufficient, then actually subtracting the money. Now, if these two operations, the check and the actual withdrawal of money, were not executed as an atomic operation, then there would be the potential for a race condition, where another check could be performed right after the first one, followed by another withdrawal.
So if the account held 100 USD, and two users tried to withdraw 100 USD at the same time, and the order of operations happened to be: check for user 1, check for user 2, then withdraw for user 1 and withdraw for user 2, you could end up with negative 100 USD in the account. Both checks passed, but they were unaware of each other and they were not part of atomic operations.
So actually there are more features of relational databases that help with these unwanted
situations:
- we have atomic operations, which are just what we described, where an entire transaction needs to be executed from beginning to end, not partially and not interrupted
- there is also consistency, where a database should never end up in an inconsistent state, like
not having a primary key or using the wrong key, not even temporarily
- there is isolation, closely related to atomic operations, which says that different transactions
should not step on each other's toes, especially when they work with the same data, or the
same tables. Once you start a transaction, finish it, don't allow another one to start. This way we
avoid situations where another transaction catches the database in a bad state because it's in the middle of executing a different transaction.
- and there's also durability: don't lose data, make sure it persists. This also involves the underlying infrastructure, of course, but from the database's perspective, durability means the ability to recover transactions or roll back to a previous stable state in case a transaction fails in the middle, there is a power loss or the database engine crashes. Errors might happen, but a durable database needs to make sure they won't destroy the entire data set.
And by the way, to more easily remember what we just said, remember that database engines must follow the ACID model – A, C, I, D: atomic, consistent, isolated, durable.
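Here's an illustrative sqlite3 sketch of the withdrawal example as one atomic, isolated transaction: the balance check and the update happen inside the same locked transaction, so a second withdrawal cannot sneak in between them. The account numbers and amounts are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

def withdraw(amount: int) -> bool:
    try:
        conn.execute("BEGIN IMMEDIATE")   # lock: the check and the update happen together
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = 1").fetchone()
        if balance < amount:
            conn.execute("ROLLBACK")      # all-or-nothing: nothing happened
            return False
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise

print(withdraw(100))   # True  - balance drops to 0
print(withdraw(100))   # False - the second withdrawal sees the updated balance
```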
And even if up until this point you might have thought about data classification along the lines of – this Word document belongs to this department and needs a secret clearance, while this Excel spreadsheet is publicly hosted and requires no clearance – well, now we're talking about databases, and it's pretty obvious that we need to apply the same principles to data inside a database.
It’s a bit more difficult to properly classify and restrict access to data inside of a database
because all data is in one place and a smart query might reveal it right away to the wrong
person. With files, it’s a bit easier as you generally have different storage locations and folders
with different permissions and you can easily separate the secret stuff from the non-secret files.
But with databases, we need the database engine to be aware of these security requirements.
We call these engines “multilevel security databases”
simply because, well, not everything in a database is treated equally, so it needs to be
compartmentalized into multiple security levels – pretty self-explanatory. And when we cannot
do that, what we end up with is called “database contamination”, that is when data with different
security levels is incorrectly mixed and you basically lose control over who can access which
security level data.
And even more bad news: some database systems don't support multilevel security, so all you can do is try to bolt some sort of front end or query broker on top of them to filter requests from users trying to reach data that doesn't match their security level.
Another method to ensure this multilevel security, regardless of whether the database management system supports it or not, is to use views.
A view is just like a special SELECT query that can span multiple tables, and its purpose is to return the type of data that a specific user should have access to. For example, you could have a table that stores a lot of personally identifiable information – someone in your company who needs to perform some queries on that table for statistical purposes will need access to the table, but with a view, you could give them access to a restricted, sanitized version of that table where the PII is stripped. So the view acts like another table that is dynamically generated when needed. Views are slower, but they can solve this multilevel security issue quite elegantly.
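A quick sqlite3 sketch of the idea: the view exposes only the non-PII columns, so the statistics user queries the view instead of the raw table. The schema is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    country     TEXT,
    email       TEXT,    -- PII
    phone       TEXT     -- PII
);
INSERT INTO customers VALUES (1, 'RO', 'alice@example.com', '+40-700-000-000');

-- The view exposes only the non-sensitive columns; the analyst queries
-- this instead of the underlying table.
CREATE VIEW customers_stats AS
    SELECT customer_id, country FROM customers;
""")

print(conn.execute("SELECT * FROM customers_stats").fetchall())  # [(1, 'RO')] - no PII
```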
Concurrency is another issue, not just with multilevel security databases, but with anything that stores data and can be accessed by multiple people or apps at the same time. Concurrency control is basically an integrity protection mechanism. We either allow multiple queries at the same time and make sure they don't break the database, or we just don't allow them. Either way, consistency is always more important than any performance-oriented bells and whistles.
Without consistency, you might have one update overwriting what another update just did,
basically cancelling the other update, or if a query doesn’t fully execute, it might leave the
database in a partially updated state, which is a corrupted database.
How to ensure consistency? Well, the hard way is to evaluate every request pair, let's say, see what each is about to affect, and then decide if they can be safely executed at the same time. This is difficult, costly, and sometimes impossible due to the complexity of some queries.
The easy way – locking mechanisms. If I start making a change to a database, I first put a lock on it, saying that from now on, I'm the only one working with this database. If you want to work with it you have two options: you wait for me to release the lock, or you give up. It's up to you.
Aggregation is another aspect of security. It's a very generic term, but in most cases, aggregating data means selecting data from multiple sources and mashing it up into a single result. A simple DB join is a type of aggregation. A count function or an average function are also examples. From a security perspective, this can sometimes be problematic: perhaps you designed your security model so that specific users don't have access to specific table entries or specific fields in them. At the same time, if you don't consider aggregation functions in your security policy, you might find out that a smart user is able to deduce a lot of things from a combination of sum, average and count functions, without ever seeing the actual items being added, averaged or counted. That's using low-security data to deduce high-security information. Pretty smart. And what we just described is also called inference. So inference becomes a risk when using aggregation functions.
So what's the defense against these types of attacks? Well, be careful with DB permissions. If a user is able to infer something, they can do it because the security policy allows them to. So it's not even a policy violation when it happens.
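Here's a toy sqlite3 example of inference through aggregation: the user only ever runs COUNT and AVG, yet a group with a single member reveals an exact "high-security" value. The salary data is obviously made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries (department TEXT, employee TEXT, salary INTEGER);
INSERT INTO salaries VALUES
    ('Sales', 'Ana', 4000), ('Sales', 'Bob', 5000), ('Security', 'Eve', 9000);
""")

# The user is only allowed aggregate queries, never individual salaries...
# but a department with a single member gives the game away.
rows = conn.execute("""
    SELECT department, COUNT(*), AVG(salary)
    FROM salaries GROUP BY department
""").fetchall()
print(rows)  # [('Sales', 2, 4500.0), ('Security', 1, 9000.0)] <- Eve's exact salary inferred
```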
Knowledge-Based Systems
What are knowledge-based systems? Well, a manifestation, a result of KBS is what we today call AI and ML.
While most computer-oriented advances have focused on speed – doing things faster than a human being would be able to – KBS somehow go in the opposite direction, by trying to simulate human thinking, human reasoning, even human creativity.
At the simplest level, if we can even call it that, we have expert systems.
They are made up of a body of knowledge – basically, raw data plus a lot of if-then statements targeted at covering all possible decision trees – and an inference engine which navigates the if-then maze using a bit of fuzzy logic, starting from whatever input is being provided.
While all the decisions in an expert system are static – they are set in stone – we might also need systems capable of adapting, of learning from provided data or actual examples. And that's machine learning, based on models that are, or can be, constantly updated.
How do those updates happen? Well, we could do it through a supervised method, where the analyst provides inputs and also the desired solutions, so the system learns how it should behave in the context of those inputs.
Unsupervised learning, on the other hand, is training on data sets which do not contain the right answers; instead that data can be used for classification, spotting trends, and so on, not necessarily problem-solving.
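As a rough illustration of the difference, here's a minimal sketch using scikit-learn (assuming it's installed); the numbers and labels are toy data, not a real training set.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: we provide the inputs AND the desired answers (labels).
X = [[0.1], [0.2], [0.9], [1.0]]
y = [0, 0, 1, 1]                       # the "right answers" supplied by the analyst
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85]]))           # learns to answer like the labels, e.g. [1]

# Unsupervised: same kind of data, no answers - the model just groups it.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)                        # e.g. [0 0 1 1] - grouping/trends, not answers
```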
Finally, neural networks are the closest thing we can design that resembles the human brain as far as decision making goes – so it only takes the good-ish stuff of the human brain... As opposed to a KBS, where a chain of if-else rules is parsed to reach an answer, in a neural network the decision results are fed into each other and at the end produce a result that might not be the most correct answer in the world, but it is probably pretty close. And it might be an acceptable answer even to problems never seen before, on which the neural network has never been trained. I know, I know, being confident in an answer you know nothing about sounds a bit like the Dunning-Kruger effect, but the good part about a neural network is that you can fine-tune it. With people, not so much. I think it was Einstein who said that it's easier to split an atom than a prejudice. But we digress, when instead we should be wrapping things up.