AE-Net Novel Autoencoder-Based Deep Features for SQL Injection Attack Detection


Received 30 October 2023, accepted 27 November 2023, date of publication 28 November 2023, date of current version 6 December 2023.


Digital Object Identifier 10.1109/ACCESS.2023.3337645

AE-Net: Novel Autoencoder-Based Deep Features for SQL Injection Attack Detection

NISREAN THALJI 1, ALI RAZA 2, MOHAMMAD SHARIFUL ISLAM 3, NAGWAN ABDEL SAMEE 4, AND MONA M. JAMJOOM 5


1 Department of Robotics and Artificial Intelligence, Jadara University, Irbid 21110, Jordan
2 Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan
3 Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Chattogram 3814, Bangladesh
4 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
5 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

Corresponding author: Mona M. Jamjoom ([email protected])


This work was supported by Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, through Researchers Supporting under
Project PNURSP2023R104.

ABSTRACT Structured Query Language (SQL) injection attacks represent a critical threat to
database-driven applications and systems, exploiting vulnerabilities in input fields to inject malicious SQL
code into database queries. This unauthorized access enables attackers to manipulate, retrieve, or even delete
sensitive data. The unauthorized access through SQL injection attacks underscores the critical importance
of robust Artificial Intelligence (AI) based security measures to safeguard against SQL injection attacks.
This study’s primary objective is the automated and timely detection of SQL injection attacks through
AI without human intervention. Utilizing a preprocessed database of 46,392 SQL queries, we introduce
a novel optimized approach, the Autoencoder network (AE-Net), for automatic feature engineering. The
proposed AE-Net extracts new high-level deep features from SQL textual data, subsequently input into
machine learning models for performance evaluations. Extensive experimental evaluation reveals that the
extreme gradient boosting classifier outperforms existing studies with an impressive k-fold accuracy score of
0.99 for SQL injection detection. Each applied learning approach’s performance is further enhanced through
hyperparameter tuning and validated via k-fold cross-validation. Additionally, statistical t-test analysis is
applied to assess performance variations. Our innovative research has the potential to revolutionize the timely
detection of SQL injection attacks, benefiting security specialists and organizations.

INDEX TERMS Autoencoder optimization, deep learning, feature engineering, machine learning, SQL
injection.

The associate editor coordinating the review of this manuscript and approving it for publication was Wei Liu.
© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023
N. Thalji et al.: AE-Net: Novel Autoencoder-Based Deep Features for SQL Injection Attack Detection

I. INTRODUCTION
SQL injection attacks are a prevalent and serious security threat in web applications and databases [1]. The attacker exploits vulnerabilities in software systems that interact with databases through SQL injection. This is achieved by injecting malicious SQL code into user-input fields, which is then executed by the application's database [2]. The consequences of a successful SQL injection attack can be severe, ranging from unauthorized access to sensitive information, alteration or deletion of critical data to even complete system compromise [3], [4]. These attacks can have far-reaching implications for individuals and organizations, potentially leading to financial losses, reputational damage, and legal consequences [5]. Therefore, understanding the mechanics of SQL injection attacks and implementing robust AI-based security measures is imperative in ensuring the integrity and confidentiality of databases and web applications.

SQL injection is a prominent vulnerability in web applications that arises from improper handling of user-provided input in SQL queries [6]. This attack method is insidious as it can bypass conventional security measures and directly target the underlying database. Therefore, robust AI-based input validation mechanisms and parameterized queries are crucial countermeasures in thwarting SQL injection attacks [7].

SQL queries exhibit distinct properties that influence their performance and functionality. One crucial property is the complexity of the query, which pertains to the number of operations and joins involved in retrieving or modifying data. Subqueries or nested queries introduce a hierarchical structure, influencing the execution plan. An example SQL query for a database operation is:

SELECT id FROM logs WHERE nic = '1234';

AI-based machine learning methods have emerged as powerful tools in cybersecurity, particularly for detecting and preventing SQL injection attacks [8]. By leveraging advanced algorithms and deep learning techniques, AI models can autonomously learn patterns indicative of SQL injection attempts [9]. This empowers organizations to identify and mitigate potential threats proactively, bolstering their security posture and safeguarding sensitive information from unauthorized access or manipulation [10]. This research proposes a novel AE-Net approach for automatic feature engineering. The proposed AE-Net extracts high-level deep features from SQL textual query data, subsequently input into machine learning models for performance evaluations. The newly created features by the proposed AE-Net operate on the principle of encoding and decoding, allowing it to learn to represent data in a more compact and meaningful form.

Our primary research contributions for SQL injection attack detection are as follows:
• A novel optimized neural network, AE-Net, is proposed for automatic feature engineering, extracting new high-level deep features from SQL textual query data. The newly created feature set is then utilized for evaluating applied methods.
• Four advanced machine learning and deep learning-based approaches are employed for performance evaluations. Performance is further enhanced through hyperparameter tuning and validated via k-fold cross-validation. Statistical t-tests and computational complexity analysis are also applied to assess performance variations.

The remaining manuscript is structured as follows: Section II provides a comprehensive analysis of past studies applied to SQL injection detection. Section III outlines the workflow of the applied methods. Section IV presents the outcomes of the machine learning and deep learning approaches employed. Finally, we summarize our research findings in Section V.

II. LITERATURE ANALYSIS
This literature analysis section for detecting SQL injection queries with machine learning is a foundational component of this study. We comprehensively understand this domain's prevailing methodologies, techniques, and advancements by synthesizing and critiquing existing literature. This critical analysis provides context for our research approach and allows us to identify gaps, challenges, and opportunities that may inform the development of more effective detection strategies.

The study [11] focuses on SQL Injection Attacks, which pose a significant threat to the security and integrity of online applications utilizing databases. The analysis employs a dataset of SQL commands for conducting research experiments, utilizing a combination of supervised machine learning algorithms and a Convolutional Neural Network (CNN) model [12]. The reported accuracy of 0.97 was impressive, indicating strong detection capabilities using the proposed model. Nevertheless, subjecting the method to further scrutiny and validation in real-world situations is imperative to assess its robustness.

The study [13] addresses the critical issue of safeguarding web applications against SQL injection attacks. The researchers utilized a dataset of flow data from various SQL injection attacks on standard database engines and applied a range of machine learning algorithms to tackle the problem. While the model based on logistic regression achieved an impressive detection rate of over 97% with a false alarm rate of less than 0.07%, concerns arise regarding its overall effectiveness due to the absence of comprehensive performance metrics such as precision and recall. To determine the applicability of this approach beyond the mentioned database engines, it was essential to evaluate its practical usefulness. Further assessment and increased transparency in methodology are necessary to ascertain this approach's true viability and generalization in real-world settings.

In this [14] research, SQL injection detection was proposed. This study addresses the challenge of mitigating the complexity and timeliness issues associated with SQL injection attacks, which arise from variations in deployment environments and backend database engine syntax. The article discusses the application of machine learning algorithms. The proposed feature ratio method for extracting and detecting SQL injection attack payloads was also introduced. It is worth noting that the reported average f1 Score of 96.29% appears to be lower, indicating a potential decrease in accuracy and precision when compared to previous methods.

In [15], the authors address the ongoing issue of SQL Injection attacks in web applications, which can significantly compromise user data and manipulate database management systems, posing a significant cybersecurity threat. The research acknowledges the limitations of traditional rule-based methods and the constantly evolving nature of SQL attacks. The SQL query commands dataset was utilized for building models. The research proposed a hybrid CNN-BiLSTM approach to predict SQL attacks [16]. While the study reports a promising accuracy rate of 98%, it lacks critical information on precision, recall, and F1 score, making it challenging to assess its overall performance comprehensively. Additionally, the proposed model's generalizability across various attack scenarios or datasets was not thoroughly evaluated, which limits our confidence in its practicality.


TABLE 1. The analyzed literature summary analysis.

This study [17] focuses on the critical issue of SQL Injection, which poses a significant threat to database-based applications across various platforms and devices. The research points out a common weakness in previous SQL injection models, which struggle to identify new attack patterns and rely heavily on past experiences or training data. In contrast, the proposed model demonstrates the ability to detect SQL injection by analyzing input patterns, a promising innovation. The model autonomously detects various injection techniques, with feature extraction and selection handled by the model, making it more user-friendly. The model's scalability across a range of applications is emphasized. The MPL model achieved an impressive cross-validated accuracy of 98%, with precision and recall scores of 98% and 97%, respectively, indicating strong detection capabilities.

The study [18] proposes a new method for detecting SQL injection attacks using a recurrent neural network. This research highlights the importance of addressing SQL injection's complex and evolving attack patterns. A publicly available Kaggle SQL Injection dataset was used in this study. The reported accuracy and f1-score of 94% and 92%, respectively, are considered poor compared to state-of-the-art studies.

This study [19] focuses on the persistent and damaging issue of SQL injection attacks on web applications, a critical problem listed in the Open Web Application Security Project Top 10. The complexity of detecting these attacks is acknowledged due to the variety of methods, patterns, and attack loads. To improve the accuracy and efficiency of detection, the study proposes synBERT, a semantic learning-based detection model. This model captures semantic information from SQL statements into embedding vectors. The study includes a diverse set of SQL datasets for evaluation and reports promising results with consistently high accuracy, even on untrained models, reaching 90%. However, the performance scores are lower in this study.

The focus of this study [20] is to create a trustworthy Intrusion Detection System (IDS) that utilizes convolutional neural networks (CNNs) to enhance network security and user safety online. However, the study acknowledges a significant drawback: susceptibility to adversarial attacks, which can compromise the model's accuracy. The research presents the development of two IDS models based on CNNs to improve accuracy and evaluates their reliability through seven adversarial attack scenarios. Although the reported accuracy improvements are 97.51% and 95.43%, the study lacks comprehensive performance metrics such as precision and recall, which limits the evaluation of the IDS's overall effectiveness. The observation that adversarial attacks cause drops in accuracy is a concern, but the applied defense method shows some recovery. Nonetheless, the accuracy scores after adversarial attacks, ranging from 78.12% to 89.40%, indicate that the defense method may be helpful but may not fully restore the model's reliability.

A. RESEARCH GAP
Through a comprehensive analysis of the existing literature, we have identified the following research gaps and limitations that we have addressed in our proposed research:
• Classical feature extraction approaches were employed to represent SQL query textual data.
• Also, classical machine learning-based techniques were used for SQL injection attack detection.
• Existing methods have high error rates when detecting SQL injection attacks.

III. PROPOSED METHODOLOGY
This section provides a comprehensive discussion of the materials and methods employed in our research experiments. We analyze the SQL queries dataset and the proposed feature engineering. Additionally, we utilize machine learning and deep learning techniques to evaluate the performance results.

Figure 1 illustrates our novel proposed methodology workflow for SQL injection attack detection. Initially, we acquired and prepared a benchmark dataset based on SQL query-based textual data for experimental evaluations. To capture high-level feature information from the dataset, we proposed a novel feature engineering approach. This proposed approach extracts high-level deep features and creates a new feature set. The newly created feature set is then divided into training and testing portions. The training set comprises 80% of the data, while the testing set comprises 20%. Next, we built advanced machine learning and deep learning models using the training datasets. The performance of each applied hyperparameterized approach is evaluated using unseen testing data. The AI approach that outperforms others with high-performance scores is then utilized for the detection of SQL injection attacks.

FIGURE 1. The workflow architectural analysis of our proposed approach for detecting SQL injection attacks.

A. SQL QUERY TEXTUAL DATA
We utilized a publicly available SQL query dataset [21] for our research experiments. The dataset comprises three files: "sqli.csv," "sqliv2.csv," and "SQLiV3.csv," which we combined. For initial preprocessing, we encoded the target labels 'Attack' (1) and 'Benign' (0). The query textual data was converted into string format. We removed queries containing only two words. Additionally, null rows were dropped from the dataset. Finally, the preprocessed database of 46,392 SQL queries was used for further experiments. The target class distribution analysis of the dataset is depicted in Figure 2. The analysis reveals a nearly equal distribution of data, with 23,555 queries for the 'Attack' label and 22,837 queries for the 'Benign' label.

FIGURE 2. The target label distributions analysis.

B. NOVEL PROPOSED FEATURE ENGINEERING
This section analyzes our novel proposed optimized neural network AE-Net, as illustrated in Figure 3. The analysis demonstrates that, by employing the unique autoencoder mechanism, our proposed approach extracts new high-level automatic deep features from SQL textual query data. The newly created feature set is subsequently employed to evaluate the applied methods. Additionally, the novel AE-Net layered architecture, which we propose, is analyzed and presented in Table 2.

The basic mathematical equation for an autoencoder method can be represented as follows:

$$h = f(Wx + b) \tag{1}$$
$$\hat{x} = g(Wh + c) \tag{2}$$

where:
$x$ is the input vector
$h$ is the hidden representation (feature vector)
$\hat{x}$ is the reconstructed input
$f$ is the activation function for the encoder
$g$ is the activation function for the decoder
$W$ is the weight matrix
$b$ is the bias vector for the encoder
$c$ is the bias vector for the decoder
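The encoder and decoder mappings of Equations (1) and (2) can be illustrated with a minimal sketch in plain Python. The layer sizes, the sigmoid activation used for both f and g, the random weights, the toy input vector, and the separate decoder matrix `W_dec` are all assumptions made for illustration; the text above writes a single weight matrix W and does not state these details.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, assumed here for both f and g."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vec, bias):
    """Return weights @ vec + bias for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def encode(W_enc, b, x):
    """Eq. (1): h = f(W x + b)."""
    return [sigmoid(z) for z in affine(W_enc, x, b)]


def decode(W_dec, c, h):
    """Eq. (2): x_hat = g(W h + c)."""
    return [sigmoid(z) for z in affine(W_dec, h, c)]


def mse(x, x_hat):
    """Mean squared reconstruction error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


random.seed(0)
n_in, n_hidden = 6, 3  # hypothetical layer sizes
W_enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b = [0.0] * n_hidden
c = [0.0] * n_in

x = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]  # toy stand-in for a vectorised SQL query
h = encode(W_enc, b, x)             # compact feature vector
x_hat = decode(W_dec, c, h)         # reconstruction of the input
loss = mse(x, x_hat)
```

In this sketch the vector `h` plays the role of the hidden representation; training would adjust the weights to reduce `loss`, which is omitted here for brevity.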


FIGURE 3. The architectural workflow analysis of novel proposed deep feature extraction approach.

The mathematical working mechanism of the proposed AE-Net approach for feature extraction in detecting SQL injection is represented as follows:

$$\text{Minimize: } L(X, g(f(X)))$$
$$\text{Subject to: } g(f(X)) = X$$

where:
$X$ is the input data,
$f(\cdot)$ represents the encoder function,
$g(\cdot)$ represents the decoder function,
$L(\cdot)$ is the loss function.

TABLE 2. The novel proposed AE-Net layered architecture analysis.

Algorithm 1 shows the step-by-step workflow of the proposed automatic feature engineering approach.

Algorithm 1 AE-Net Algorithm
Input: SQL query textual data.
Output: New rich level deep features.
initiate;
1- F_en ← En_features(S_d) // S_d ∈ SQL queries; here S_d is the original SQL data and F_en is the deep encoded feature set.
2- F_de ← De_features(F_en) // here F_en is the encoded features and F_de is the deep decoded feature set.
3- F_features ← F_de // here F_features is the final deep feature set used for SQL injection detection.
end;

C. APPLIED ARTIFICIAL INTELLIGENCE APPROACHES
AI approaches, rooted explicitly in machine learning and deep learning techniques [22], have emerged as formidable tools in the realm of SQL injection attack detection. Machine learning models are trained to recognize patterns in data [23], making them adept at identifying anomalous SQL queries indicative of an attack. Techniques such as Support Vector Machines (SVM), Random Forests (RF), and k-Nearest Neighbors (KNN) have shown promise in this domain. Additionally, deep learning methods, notably Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTM), excel in capturing intricate relationships within sequences of SQL queries, enabling them to discern between legitimate and malicious requests effectively.

Deep learning-based feature engineering plays a pivotal role in extracting relevant information from raw SQL data to facilitate learning [24]. In this research, we employ autoencoders as unsupervised learning tools for extracting meaningful features, thereby enhancing the overall accuracy of SQL injection detection systems. The amalgamation of AI-driven techniques with robust feature engineering holds tremendous potential in fortifying defenses against SQL injection attacks, offering a proactive stance in safeguarding critical databases and applications from such security threats. The detailed working mechanisms of the applied methods are demonstrated in Table 3.

D. HYPERPARAMETER TUNING
Through the recursive process of training and evaluations, we have conducted hyperparameter tuning and determined the best-fit hyperparameters for each applied method. The final selected best-fit hyperparameters are demonstrated in Table 4. The best-fit hyperparameters help us prevent overfitting issues and enhance SQL injection attack detection.
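The tuning loop described above (train, evaluate, keep the best setting) can be sketched as a plain grid search. Everything concrete in this sketch is an assumption for illustration: the toy 2-D feature vectors, the deliberately mislabeled training point, the k-nearest-neighbour classifier, the candidate values of k, and the single hold-out split. The search space actually used in the paper is the one its Table 4 reports.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, candidates):
    """Score every candidate k and return the first best one with all scores."""
    scores = {k: accuracy(train, valid, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical deep features: label 0 = benign, 1 = attack.
train = [
    ((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.30), 0),
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.70), 1),
    ((0.25, 0.20), 1),  # noisy point sitting inside the benign cluster
]
valid = [((0.20, 0.20), 0), ((0.80, 0.80), 1), ((0.30, 0.25), 0), ((0.70, 0.90), 1)]

best_k, scores = grid_search(train, valid, candidates=[1, 3, 5])
```

With the noisy point present, k = 1 follows the noise and scores lower on the hold-out set than the larger values, so the search prefers a smoother setting. This is the sense in which a tuned hyperparameter can reduce overfitting.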


TABLE 3. The workflow mechanism analysis of applied machine learning and deep learning models for SQL injection detection.

TABLE 4. The hyperparameter tuning analysis of applied methods.

IV. RESULTS AND DISCUSSIONS
The research study's results and discussions section focuses on detecting SQL injection queries across multiple websites using machine learning techniques. We have provided a comprehensive overview of the outcomes achieved through our research experiments. Additionally, we present a detailed analysis of the performance of various machine learning methods employed to distinguish between signs of an SQL attack and benign queries.

A. EXPERIMENTAL SETUP
Our novel proposed research experiments were conducted on a machine equipped with the following specifications: an Intel(R) Core(TM) i5-10300H CPU, 16.0 GB RAM, 2.50 GHz CPU, 8MB cache size, and a Core i5-10300H model name for the CPU. The Python 3.0 programming language was employed to implement the applied machine learning and deep learning methods. The Python library modules used during method implementations were Sklearn, Keras, and TensorFlow. The performance measures are the runtime computation, p-value, accuracy, f1, recall, and precision.

1) EVALUATION METRICS
This study employs several well-known performance evaluation metrics to evaluate the performance of machine learning models. In particular, accuracy, precision, recall, and F1 score are used with the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

where $TP$, $FP$, $TN$, and $FN$ represent true positive, false positive, true negative, and false negative, respectively.

B. PERFORMANCE ANALYSIS WITH BOW FEATURES
The performance of applied machine learning approaches with Bag of Words (BoW) features is evaluated in Table 5. BoW is an effective natural language processing (NLP) technique that simplifies text into word-frequency vectors. To detect SQL attack queries, we have implemented KNC, LR, RF, and XGB approaches using BoW features. The analysis reveals that the KNC model achieved the highest accuracy score of 0.94 in comparisons. The RF method achieved the lowest performance accuracy score of 0.89, followed by XGB and LR. The KNC excelled in accurately identifying the "Attack" class, with a high precision of 0.90 and perfect recall. These results demonstrate that both KNC and LR methods achieved acceptable scores; however, they are not the highest. There is still a need for advanced feature engineering to enhance the performance of SQL injection attack detection.

TABLE 5. Performance results of machine learning methods with BoW features.

C. PERFORMANCE ANALYSIS WITH TFIDF FEATURES
After evaluating the results with the BoW feature, we implemented a more advanced approach, Term Frequency-Inverse Document Frequency (TFIDF). TFIDF features assess the significance of words in textual data by comparing their frequency in a specific document to their frequency in the entire dataset. The performance scores of the applied methods are outlined in Table 6. The analysis revealed that, once again, the KNC approach demonstrated strong classification ability with an accuracy rate of 0.96. The classification-wise metric scores are also notably high. The other applied methods, LR, RF, and XGB, also displayed improved performance scores compared to those with BoW features. This analysis demonstrates that by employing a more advanced feature engineering approach, we are able to enhance performance scores for the attack section. Nevertheless, a 4% error remains, indicating a necessity for advanced neural network-based feature engineering.

TABLE 6. Performance results of machine learning methods with TFIDF features.

D. PERFORMANCE ANALYSIS WITH NOVEL PROPOSED FEATURES
The results obtained from the applied BoW and TFIDF feature engineering approaches are not up to the mark. In order to address this issue, we have introduced a novel feature engineering approach named AE-Net. With this proposed approach, we have extracted high-level deep features from the SQL query data and re-evaluated the results in this section.

The results of the applied machine learning and deep learning models, along with the proposed feature engineering approach, are described in Table 7. The applied KNC and RF achieved excellent performance scores of 0.96 and 0.99, respectively. The machine learning-based XGB approach outperformed others with a remarkable accuracy score of 0.99. The deep learning-based LSTM method achieved an acceptable score of 0.87; however, it is not the highest compared to the other methods. This analysis concludes that the novel features effectively capture relevant information and enable accurate detection of SQL injection attacks. The proposed novel features appear to offer higher accuracy and more balanced precision and recall rates compared to BoW and TFIDF representations.

TABLE 7. Performance results of machine learning and deep learning methods with novel deep features.

The time series-based results analysis of the deep learning model LSTM during training is illustrated in Figure 4. The analysis demonstrates that during the first five epochs of training, the loss scores are high and performance accuracy scores are below 0.80. After the fifth epoch, the LSTM neural network sets optimal weights and gradually improves performance scores. The analysis shows that the LSTM approach achieved accuracy scores above 0.85 for SQL injection detection.

FIGURE 4. The time series-based results analysis of deep learning model LSTM during training.

The performance variations of applied methods based on the confusion matrix are demonstrated in Figure 5. This analysis reveals that LSTM achieved a high wrong prediction rate, which is the reason for the low performance scores. The machine learning-based KNC also achieved high error rates during classification, followed by RF. This analysis concludes that the proposed XGB approach achieved a minimum error rate of 102 wrong classifications for SQL injection, ensuring high performance scores.

FIGURE 5. The confusion matrix analysis of applied neural network approaches.

E. KFOLD VALIDATIONS ANALYSIS
The K-fold cross-validation technique is a popular method for evaluating the performance of machine learning models. Table 8 contains the cross-validation results of the applied method. During cross-validation, the data splitting process is repeated 10 times, and the average results are analyzed to determine the model's effectiveness. Assessing the Standard Deviation (SD) is essential in determining the reliability and consistency of the outcomes. The analysis shows that only LSTM achieved low K-fold scores, followed by the KNC method. This analysis reveals that the proposed XGB approach achieved high k-fold accuracy performance scores of 0.99 with a minimum SD score of 0.0010 for SQL injection attack detection.

TABLE 8. Performance results validation of machine learning and deep learning methods with novel deep features.

F. COMPUTATIONAL COMPLEXITY ANALYSIS
We have determined the computational complexity of each applied model, and the results are reported in Table 9. This section contains the runtime computations for each model during the building process on the proposed feature dataset. Upon analysis, it is discovered that the deep learning-based LSTM method is associated with high complexity rates of 133.7 seconds. The proposed XGB also achieved high run-time complexity; however, it also achieved high performance scores for SQL injection detection. As a result, we prioritize high performance scores in order to counter critical SQL attacks. Furthermore, we will work on reducing the complexity by optimizing the proposed approach's architecture.

TABLE 9. Computational complexity analysis of machine learning and deep learning methods with novel deep features.

G. COMPARISON WITH STATE OF THE ART APPROACHES
We have compared our novel proposed approach's performance with state-of-the-art studies published for SQL injection detection. Table 10 contains the comparison results. We have used recent studies from the year 2023 for comparison. The analysis reveals that our proposed approach outperformed the state-of-the-art approach with high accuracy performance of 0.99 for SQL injection attack detection.

TABLE 10. The results comparison with state of the art studies used for SQL injection attack detection.

H. STATISTICAL T-TEST ANALYSIS
In this section, we provide the outcomes of a statistical t-test conducted on the results of our proposed model, utilizing all the incorporated features [36]. This test assesses the significance of the compared approach. It establishes two hypotheses: the null hypothesis posits that the compared approach lacks statistical significance in comparison to others. Should the t-test refute the null hypothesis, it would imply acceptance of the alternative hypothesis, signifying the statistical significance of our proposed approach.

The t-test produces a Statistic Score and a corresponding p-value. When the p-value surpasses the Statistic value, it leads to the rejection of the null hypothesis. Table 11 presents the outcomes across various scenarios. We conducted a comparative analysis of the results obtained from the machine learning model using the suggested methodology against other features. In every instance of comparison, the t-test dismisses the null hypothesis, providing evidence of the statistical significance of the proposed approach.
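How a t-test statistic is obtained from two sets of scores can be sketched with the standard library alone. The section does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the two score lists are made-up placeholders rather than results from the paper. Converting the statistic to a p-value requires the Student-t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) returns both values.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical per-fold accuracies for two feature sets (illustration only).
scores_new = [0.99, 0.98, 0.99, 0.99, 0.98]
scores_old = [0.95, 0.96, 0.94, 0.96, 0.95]

t_stat, dof = welch_t(scores_new, scores_old)
```

A large absolute `t_stat` relative to the critical value for `dof` degrees of freedom is what leads to rejecting the null hypothesis of equal means.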


TABLE 11. The statistical analysis using the t-test mechanism.

I. DISCUSSIONS
The automated and timely detection of SQL injection attacks through AI methods is performed in experiments. We have used a preprocessed database of 46,392 SQL queries. We introduce a novel approach, the AE-Net, for automatic feature engineering. The proposed AE-Net extracts high-level deep features from SQL textual data, which are subsequently input into machine learning models for performance evaluation. During result evaluation, we have applied four advanced machine learning and deep learning-based approaches for performance assessment. Extensive experimental evaluation reveals that the extreme gradient boosting classifier outperforms existing studies with an impressive k-fold accuracy score of 0.99 for SQL injection detection. The performance of each applied learning approach is further enhanced through hyperparameter tuning and validated via k-fold cross-validation. Additionally, statistical t-test analysis is applied to assess performance variations.

V. CONCLUSION AND FUTURE WORK
This study proposed an automated and timely method for the detection of SQL injection attacks. We introduce a novel approach called AE-Net for automatic feature engineering. AE-Net extracts high-level deep features from SQL textual data, which are subsequently input into machine learning models for performance evaluation. Four advanced machine learning and deep learning-based approaches are employed for these evaluations. Extensive experimental evaluation reveals that the extreme gradient boosting classifier outperforms existing studies with an impressive k-fold accuracy score of 0.99 for SQL injection detection. The performance of each applied learning approach is further enhanced through hyperparameter tuning and validated via k-fold cross-validation. Additionally, statistical t-test analysis is applied to assess performance variations.

A. FUTURE WORK
In the future, we plan to revise the architecture of our proposed approach to reduce computational complexity. Additionally, we intend to develop a user-friendly graphical interface tailored for networking organizations, enabling real-time monitoring of SQL injection attacks.

FUNDING
This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

ACKNOWLEDGMENT
The authors would like to express their gratitude to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.


NISREAN THALJI received the bachelor’s and master’s degrees (Hons.) in computer science from Yarmouk University, Irbid, Jordan, and the Ph.D. degree (Hons.) in computer science, specializing in artificial intelligence and data science, from the prestigious University Malaysia Perlis (UNIMAP), Perlis, Malaysia. With a strong educational background, she is currently an Assistant Professor with Jadara University. She is an accomplished individual in the field of computer science. Her diverse expertise spans across various domains, including artificial intelligence, machine learning, deep learning, algorithm engineering, swarm intelligence, and natural language processing. Her research interests continue to thrive in these cutting-edge areas, further solidifying her reputation as a dedicated scholar and researcher in the field.

ALI RAZA received the Bachelor of Science and M.S. degrees in computer science from the Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology (KFUEIT), Rahim Yar Khan, Pakistan, in 2021 and 2023, respectively. He has published several articles in reputed journals. His current research interests include data science, artificial intelligence, data mining, natural language processing, machine learning, deep learning, and image processing.

MOHAMMAD SHARIFUL ISLAM received the B.Sc. degree in computer science and telecommunication engineering from the Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh, in 2023. His current research interests include data science, machine learning, natural language processing, and image processing.

NAGWAN ABDEL SAMEE received the B.S. degree in computer engineering from Ein Shams University, Egypt, in 2000, and the M.S. degree in computer engineering and the Ph.D. degree in systems and biomedical engineering from Cairo University, Egypt, in 2008 and 2012, respectively. Since 2013, she has been an Assistant Professor with the Information Technology Department, CCIS, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Her research interests include data science, machine learning, bioinformatics, and parallel computing. Her awards and honors include the Takafull Prize (Innovation Project Track), Princess Nourah Award in Innovation, Mastery Award in Predictive Analytics (IBM), the Mastery Award in Big Data (IBM), and the Mastery Award in Cloud Computing (IBM).

MONA M. JAMJOOM received the Ph.D. degree in computer science from King Saud University. She is currently an Associate Professor with the Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Her research interests include artificial intelligence, machine learning, deep learning, medical imaging, and data science. She has published several research articles in her field.
