LESSON 7-SQL Injection
LESSON 7-SQL Injection
SQL Injection
SQL injections are one of the most frequent and dangerous attacks against databases and
web applications. According to OWASP, in 2017 injection attacks where the number 1
threat to web security:
SQL injection attacks can cause severe damage to database systems including denial of
service states and data leaks. They can also be used for privilege escalation, for example
exploiting user authentication code vulnerabilities.
This guide will explain what SQL injections are and how you can be completely safe from
them.
SQL is the standard language for relational database statements, also called queries. This
is a simple query for retrieving the rows from a table named “users”:
Web applications use databases and SQL queries extensively. User authentication, for
example, is usually implemented by executing one or more queries to check if the provided
username and password are correct. Let’s use this as an example to understand how SQL
injection attacks work.
Suppose that you want to search a table called users for the username and password sent
by the remote client. If a row with both values exists then the remote client is authenticated.
(Please note that this is not how you should authenticate users, it’s just an example to show
injection attacks at work).
$query = "SELECT * FROM users WHERE username = '" . $username . "' AND password
= '" . $password . "'";
Now suppose that a malicious remote client sends the string admin’ — as username (and
an empty string as password). The query in the previous example would become like this:
SELECT * FROM users WHERE username = 'admin' -- ' AND password = ''
The — string is used to start a comment inside a query. That means that the database will
ignore everything after it. Therefore, the query will return the row relative to the
“admin” user no matter what the password is.
With this trick, the remote client has successfully authenticated itself as the admin user even
without knowing the password. This is an example of a privilege escalation attack.
In addition, depending on the information displayed by the web site, this attack could also
cause a data leak and let the attacker obtain some admin’s private information.
But a malicious client could do much more damage than that. This is the query that would
be executed if the client sends the string ‘; DROP table users; — as username:
SELECT * FROM users WHERE username = ''; DROP TABLE users; -- ' AND password =
''
This query could potentially destroy the whole users table (the SQL command DROP
TABLE is used for that purpose). Not fun, right? Such an attack would cause severe
damage and would probably lead to a denial of service state, because the web application
would likely stop working.
You may think that an attacker is unlikely to know the database structure or the PHP source
code, and that it would be difficult to “guess” the right syntax to use for a successful attack.
Truth is, the secrecy of a system’s structure doesn’t increase its security very much. This
concept is called “security through obscurity” and it’s known for being a very poor security
paradigm.
[easy-tweet tweet=”SQL injection attacks: the complete guide to security and prevention.”
hashtags=”PHP”]
Let’s go back to the previous authentication query. Now suppose that the attacker wants to
retrieve the name of the current database, assuming the application uses MySQL. If the web
page doesn’t display the database name anywhere, how can the attacker retrieve it?
The idea is to look at the system behaviour in response to blind injection attacks and deduce
the database name (or any other information the attacker wants to retrieve) from that.
Let’s see how this can be done.
Suppose that the attacker sends the string admin’ AND 0 > 1 — as username. The resulting
query would be:
SELECT * FROM users WHERE username = 'admin' AND 0 > 1 -- AND password = ''
The comparison “1 > 0” is obviously false so, even if a row with username admin exists, this
query won’t return it because the second comparison cannot be satisfied.
Here’s the trick: the attacker can inject any SQL comparison and look at the result. If
the admin row is returned (and the authentication is successful) it means that the
comparison is TRUE. If the authentication fails, then the comparison must be FALSE.
In this specific case the attacker can create a comparison using two MySQL
functions: DATABASE() and SUBSTRING().
DATABASE() returns the current database name, while SUBSTRING() returns a part of a
string.
If the username is set as admin’ AND SUBSTRING(DATABASE(), 1, 1) = ‘a’ — the query
becomes:
This query will return the admin‘s row only if the first letter of the current database is “a”. By
looking at the result, and by checking all the possible letters and incrementing
the SUBSTRING() index, the attacker can easily retrieve the full database name.
The same kind of attack can be used to retrieve a lot of different information about the
database… right from the database itself.
2. The real attack takes place when the data is later retrieved from the database.
In the following example, the attacker first registers a new account on the target website
using a crafted username (again, admin’ —). The application handles the username
insertion properly:
<?php
In another part of the application, the remote client is authenticated with a login cookie and
the username is retrieved from the database. The same username is then used in another
query, but this time the application fails to handle it properly.
The error behind this vulnerability is that the developer assumes that a value taken directly
from the database can be trusted and doesn’t need to be validated.
As you can see in the last comment of the following example, the result is that the application
shows the admin‘s information to the attacker:
<?php
/* Login cookie */
$cookie = $_COOKIE['login'];
/* The following query retrieves some user's data from another table using the username as
key.
The $row data is trusted and considered safe, but it is not...
*/
$data_query = "SELECT * FROM user_data WHERE username = '" . $row['username'] . "'";
Second order injection attacks are caused by a poor trust assessment analysis (we’ll talk
more about that in a minute).
Since the local database is a trusted source, the developer wrongfully trusts the
database data as well, without applying the proper validation and sanitization steps.
The result is an SQL injection attack executed “from the inside”, just like a trojan horse.
Database related code must pass through three steps to be safe from injection attacks:
1. Trust assessment
2. Validation
3. Sanitization
Each step is important, and only by properly applying all of them your code can be truly
secure.
Trust assessment
SQL queries are usually created dynamically using one or more variables.
These variables can contain row values for insertion, search and update statements, but
also column, table and database names.
Each of these variables has a different level of trust. The more you know about the
variable’s content, the more you can trust it.
Variables created programmatically by the application itself have the highest level of trust,
because you know exactly its possible values. An example is a boolean variable initially set
to ‘0’ that can be set to ‘1’ inside an if control structure: you can be sure that that variable is
either 0 or 1.
On the other side, variables that contain data from the request string (like
the $_REQUEST array in PHP) have the lowest level of trust, because they can contain any
value.
It’s a common mistake to trust a request parameter (a value from the request string) just
because it comes from a known page.
Suppose that your web application creates an HTML page with a form and a select input
with a list of possible options. The form sends the input data to a PHP backend.
This backend may mistakenly trust the select value, expecting it to be one of the options set
in the HTML page. However, an attacker could easily change those values simply by
changing the HTML code client-side.
In this case, the trust assessment step should mark the variables containing
the select data as completely untrusted.
Another common trust assessment error is related to information retrieved from a local
source like a database or a filesystem, just like we saw before in the second order attack
example. The database itself (or the filesystem, if we are reading a local file) can be a trusted
service, but that doesn’t mean that the data it contains can be trusted too. Sessions’
data should not be considered completely safe either for the same reason.
You should also be very careful with data retrieved from external sources, especially remote
web services or APIs. This is true for “machine-to-machine” services too (which usually
make use of JSON or XML protocols).
The purpose of trust assessment is to calibrate the validation step: the less you can
trust a source, the more stringent the validation must be.
Validation
The purpose of the validation step is to make sure that a variable is what the application is
expecting it to be, and that its properties and its content are valid.
The validation step can be further divided into two substeps: filtering and verification.
In the filtering substep, the data to be validated is checked against some filters. Each filter
can either modify the data to make it compliant (for example removing invalid characters) or
marking the data as invalid altogether.
These are some of the most common filters:
• Type check
Checks whether the value is a number, a positive integer, a string, a specific object
etc.
• Length
Checks the value’s length (for strings) or numeric value (for numbers) against a
minimum and a maximum value.
• Elements blacklist
Checks each value’s elements (for example a string’s characters) against a
blacklist; elements from the blacklist are removed, modified or their presence marks
the whole value as invalid.
• Values blacklist
Checks the whole value (string, number etc.) against a blacklist; if there is a match,
the value is marked as invalid.
• Elements/values whitelist
Whitelists are similar to blacklists, but instead of filtering out elements or values
they only accept certain elements or values; they are probably the most effective
kind of filter, but they cannot be used in every context because of the need to
create a list of all possible values.
• Size check
Checks the value’s size, depending on its type (bytes if the variable is a binary file,
the number of elements if it’s an array etc.); this is especially important for binary
data to prevent overflow errors.
If the filtering substep is about the value’s properties, the verification substep is more about
the value itself. This substep answers the question: can we accept this specific value (even
if it passed all the filters)?
An example is a PHP script backend that saves users’ settings on the database: it reads
the “user id” and the “settings” parameters from the request string and then it updates the
settings of the user identified by the user id.
Even if the user id value is formally correct (meaning that it passed all the filtering checks),
the script should also verify that a valid user with that id actually exists on the database,
otherwise the database integrity could be compromised.
Duplicate check is another common verification step. For example, before adding a new
user on the database, you should check if another user with the same username already
exists, even if the requested username passed the filtering step.
[easy-tweet tweet=”SQL injections: the three essential steps for security and prevention.”
hashtags=”PHP”]
Sanitization
Modern extensions, like PDO for PHP, provide a better sanitization system called prepared
statements.
Unlike escaping, prepared statements can perform specific sanitization operations
depending on the value type and, at least in theory, on the query type. When possible, it’s
always better to use them.
The following video explains the difference between escaping and prepared statements
after briefly introducing SQL injection attacks:
In PHP applications, the main SQL security weak point are the dynamic variables used in
SQL statements.
(There are other risk factors like user authentication, denial of service etc., but they are not
strictly related to SQL injection).
The general rule is that every variable used inside an SQL statement must pass through all
the three security steps.
Let’s see some examples.
Trust assessment
A variable’s level of trust depends on the source of its content. The main question to find
out how much you can trust a variable is: where did its content come from?
Values set programmatically by your own application are the only ones you can trust
completely, like in this example:
<?php
if ($get_address)
{
$columns .= ', address';
}
Note that Sessions variables too need to pass all the security steps, including trust
assessment. How much you can trust a Sessions variable depends on its content source,
just like “normal” variables.
Local or remote files and other resources (FTP, SCP, POP etc.) cannot be completely
trusted either. Who has access to that file? Could it have been modified? Is the remote FTP
server secure? There is too uncertainty to trust that data.
Validation
The vanilla PHP toolset includes all the necessary operators and functions for variable
validation. To filter a variable (remember the different types of filter we saw before?) you
can use three different powerful tools:
1. Type casting
2. Information functions
3. Manipulation functions
Type casting is the semi-automatic conversion process from one type to another. You can
use type casting as a very effective type filter. For example, by casting a string into
an integer you can be sure to have a valid integer number, without the need to perform
further checks. As far as validation is concerned, the most common used type casting is
from a string into a number (either an integer or a float).
Information functions are a large class of functions that return some specific information
about a variable.
Some examples are string functions like mb_strlen() for checking a string
length, mb_strpos() for searching for substrings, or ctype_digit() and is_numeric() for
type checking.
Comparison operators (like “<” or “>”) can also be included in this list.
In the following example, mb_strpos() is use to check a string against a characters whitelist.
If any of the string’s characters is different from the ones in the whitelist, then the string is
not accepted:
<?php
/* Check... */
for ($i = 0; $i < mb_strlen($val); $i++)
{
if (mb_strpos($accept, mb_substr($val, $i, 1)) === FALSE)
{
echo 'String not accepted.';
die();
}
}
It’s up to you to decide which functions to use. In some cases it’s better not to accept invalid
values, for example when saving personal data (like an e-mail address or a phone
number). Information functions are usually enough for that purpose.
On the other hand, if you prefer to adjust an invalid value then you may need to use some
of the manipulation functions.
Whatever your validation, keep in mind that you can never consider any client-side
check (the ones enforced by HTML, by JavaScript etc.) as part of the validation step,
because they can be easily circumvented by the client.
The second validation substep, verification, doesn’t have a specific PHP implementation.
It really depends on the context. But I think it’s still useful to see an example. Let’s
complete the add_account() function from User Authentication lesson by adding a proper
verification step:
/* Password hash */
$hash = password_hash($password, PASSWORD_DEFAULT);
if (is_array($check_res)
{
echo 'Username already exists!';
return FALSE;
}
try
{
/* Add the new account on the database (it's a good idea to check first if the username
already exists) */
$sql = 'INSERT INTO accounts (account_name, account_password, account_enabled,
account_expiry) VALUES (?, ?, ?, ?)';
$st = $db->prepare($sql);
$st->execute(array($username, $hash, '1', '1999-01-01'));
}
catch (PDOException $e)
{
/* Exception (SQL error) */
echo $e->getMessage();
return FALSE;
}
It’s very clear from the previous example that filtering must be done first, because by the
time we verify a value we may need to already use it inside an SQL query.
Sanitization
Variables sanitization is very easy with PHP. Many PHP database extensions provide
some sanitization mechanisms, which is usually implemented
through escaping or prepared statements.
PHP also provides some database abstraction layers. They are extensions that can be
used to access different database types using the same syntax and functionalities.
The most used of these extension is PDO. Unlike mysqli or pgsql, PDO is just a container
and you need to select the specific database driver you want to use. PDO supports MySQL,
PostgreSQL, Oracle, MS SQL and other databases.
I suggest you to use PDO instead of specific extensions as it will make your code much
more editable and scalable. You can look at the examples in my User Authentication tutorial
to see PDO at work.
PDO prepared statements can be used for column values, but not for other elements like
column names or SQL commands (like SELECT, WHERE etc.).
When you need to handle these elements dynamically (with variables) it’s very important to
pay attention to the trust assessment and verification steps, because you won’t be able
to perform a proper sanitization. (You can use the PDO::quote() escaping method, but it
isn’t as secure as using prepared statements). In theses cases you should use a fully trusted
variable or at least perform a very strict verification, possibly using whitelists.
Whatever extension and sanitization mechanism you are using, be sure to avoid
these common errors:
[easy-tweet tweet=”How to secure your PHP code from SQL injection attacks.”
hashtags=”PHP”]
You can shield your home with the best door lock, but no lock is unpickable. This is why it’s
a good idea to install an alarm system too, in case a thief manages to enter your home. it’s
also wise to keep your valuables in a bank vault instead of inside your home wall safe, so
that in the worst case scenario you won’t lose them.
If you follow this guide’s guideline you will be as safe from SQL injection as you can be, but
what if an attacker manages to break your defenses?
In this last chapter you will find three simple steps you can take to dramatically mitigate
the damage you could sustain from an SQL injection attack.
Many web applications use just one database account. The most common work flow is:
1. create a “db.php” file with the connection code;
2. include this file at the top of every application page;
3. use the same database connection for the entire application.
I agree that it’s a good idea to keep the connection code inside a single PHP script (nobody
wants code duplication, right?)
The database account, however, should be dynamically set depending on which
database operation each script needs to perform.
<?php
In this case the database account is always the same, regardless of the actual permission
requirements of the script where this code is included.
A better idea would be to define the database username in the calling script (for example in
a variable like $db_user):
<?php
/* It's a good idea to store all the accounts' passwords here instead of the calling scripts
(or, even better, in a separate file) */
$db_accounts = array(
'account1' => 'passwd1',
'account2' => 'passwd2');
It’s simple. If you always use the same account, you have no choice but to give that account
all the database permissions that your application may require, regardless of where that
account is being used.
By using multiple accounts instead, you can set specific permissions for each one of them.
This way, a PHP page which needs to perform delete operations on the database will use a
database account with such permissions, but a PHP page which only needs to fetch data
can use a less dangerous account with read-only grants.
Keep in mind that SQL injection attacks are limited by the privileges of the database
account the target page is using.
If each page uses the necessary database privileges only, then the damage from injection
attacks will be dramatically reduced.
If you already know the maximum rows a query will return, it’s a good idea to set
a limit inside the query itself.
The most In many SQL databases (including MySQL), this can be done using the LIMIT
directive.
Limiting a query result can keep you safe from certain attack side effects, for example:
• in data leaks, the output will be limited;
• the PHP application will not have query results with an unexpected number of rows;
• if you can limit DELETE queries (MySQL can) you will limit the effects of destructive
attacks;
• limiting update statements can limit the number of rows affected by attacks.
Keep in mind that limiting a query does not guarantee that attacks will be limited too. In fact,
many attacks can exclude LIMIT (and other) directives from the query execution.
The good news is that most (if not all) PHP database extensions either don’t support multiple
statements or have specific functions for them.
For example, to use multiple statements with the MySQLi extension you need to use specific
methods like mysqli::multi_query().
While you may find convenient to use multiple statements in some circumstances, you
should avoid doing that unless strictly necessary.
A multiple-statements query is potentially much more dangerous than a single one. When
possible, use separate, single queries instead.