RL_QOptimizer: A Reinforcement Learning Based Query Optimizer
ABSTRACT With the current availability of massive datasets and scalability requirements, different systems are required to provide their users with the best performance possible in terms of speed. At the physical level, performance translates into query execution time in database management systems. Queries have to execute efficiently (i.e., in minimum time) to meet users' needs, which puts an excessive burden on the database management system (DBMS). In this paper, we mainly focus on enhancing the query optimizer, which is one of the main components of a DBMS; it is responsible for choosing the optimal query execution plan and consequently determines the query execution time. Inspired by recent research on reinforcement learning in different domains, this paper proposes a Deep Reinforcement Learning Based Query Optimizer (RL_QOptimizer), a new approach to find the best policy for join order in the query plan which depends solely on the reward system of reinforcement learning. The experimental results show a notable advantage of the proposed approach over the existing query optimization model of the PostgreSQL DBMS.
INDEX TERMS Join ordering problem, query execution plan and query optimization.
I. INTRODUCTION
In a DBMS, a single query can be executed through different execution plans. The query optimizer attempts to choose the most efficient way to execute a given query from the space of execution plans. Most DBMSs use the cost-based model for query optimization, where the optimizer estimates the cost of the execution plan and then selects the optimal plan that minimizes the cost among the set of candidate plans [1]. As the number of intermediate rows (results) is unknown at run time, the optimizer uses pre-calculated statistics, such as information about the distribution of data values and cardinality estimation, to estimate the cost of a plan rather than calculating the real cost of querying the data during the query plan. One of the main challenges in query optimization and query plan generation is the selection of the order in which to perform the join operations between tables (i.e., relations). Even if the final results of the query could be the same regardless of the join order, the order in which the tables of a query are joined can have a dramatic effect on the query execution time. In addition, the number of possible join orders increases exponentially with the number of tables [2]. Hence, the query optimizer cannot compute the different costs for all combinations to select the best join order during query execution. Consequently, most optimizers use heuristics, such as considering the shape of the query tree [3], to prune the search space. In this paper, we propose two versions of a Reinforcement Learning Based Query Optimizer (RL_QOptimizer) to identify the best execution plan based on the reward system of reinforcement learning. The first model uses Reinforcement Learning and the second one uses Deep Reinforcement Learning [4], [5].
The main contributions of this work are:
1) Proposing a new query optimizer model (RL_QOptimizer) for optimizing tables' join orders that is based on the Deep Reinforcement Learning technique. Deep Reinforcement Learning is used to find the optimal query execution plan.
Applying SELECT operations before the JOIN operations, or applying the most restrictive SELECT operations first before other SELECT operations, are examples of heuristic rules that mostly guarantee less execution time when applied to execution plans. In the cost-based optimization step, the optimizer estimates and compares the costs of query execution based on statistics and cardinality estimations using different execution strategies and algorithms. Then, it chooses the strategy with the lowest cost estimate [1]. The lowest cost estimate is usually found by performing the operations that initially reduce the size of intermediate results.
Example 1: Consider the following query on the Customer-Ordering database presented in Figure 1. The Customer-Ordering database has five entities, which are ‘‘ORDER’’, ‘‘CUSTOMER’’, ‘‘PRODUCT’’, ‘‘CATEGORY’’, and ‘‘ADDRESS’’, with the cardinalities 1000000, 100000, 10000, 1000, and 1000 rows for each table respectively.

SELECT C.NAME, C.ADDRESS, P.PRICE, P.NAME
FROM CUSTOMER AS C
CROSS JOIN ORDER AS O
CROSS JOIN PRODUCT AS P
WHERE C.ID = O.CUSTOMER_ID
AND P.ID = O.PRODUCT_ID
AND C.PHONE_NUMBER = '0111'

The SELECT operation partitions the relation horizontally, where ‘c’ is the selection condition, which is a Boolean condition. The PROJECT operation (π_A(R)) is used to select certain attributes while discarding others. The PROJECT operation is also known as vertical partitioning because it partitions the relation or table vertically, discarding other columns or attributes, where ‘A’ is the attribute list, which is the desired set of attributes from the attributes of relation (R). Finally, the JOIN operation (R1 ⋈ R2) is used to join two tables R1 and R2 based on the join condition. The outcome of joining two or more relations is a set of all possible tuple combinations with the same common attribute.

B. JOIN ORDERING PROBLEM
A ‘‘join’’ operation is a relational operation that combines rows from two tables based on a related column. While a join works with only two tables at a time, a query that joins N tables is executed through N-1 joins. The optimizer needs to take a critical decision regarding the selection of the optimal join order, which greatly influences the execution time of a query. The process of choosing an efficient join order is difficult, as the number of possible join combinations that the optimizer needs to explore and analyze increases exponentially with the number of tables [2]. In addition, the number of intermediate rows (results) is unknown at run time, which forces the optimizer to use pre-calculated statistics and
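To give a sense of how quickly this search space grows, the short Python sketch below (an illustrative aside that is not part of the original text) counts the possible join orderings of N tables, both for left-deep plans and when bushy join trees are allowed:

from math import factorial

def left_deep_orders(n_tables: int) -> int:
    """Number of left-deep join orderings of n tables, i.e. n!."""
    return factorial(n_tables)

def bushy_orders(n_tables: int) -> int:
    """Number of ordered binary join trees over n tables: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n_tables - 1)) // factorial(n_tables - 1)

for n in (3, 5, 10):
    print(n, left_deep_orders(n), bushy_orders(n))
# 3 tables  ->         6 left-deep /             12 bushy orderings
# 5 tables  ->       120 left-deep /          1,680 bushy orderings
# 10 tables -> 3,628,800 left-deep / 17,643,225,600 bushy orderings

Even for a modest number of tables, enumerating and costing every ordering is therefore impractical, which is exactly why optimizers resort to heuristics or, as proposed here, to learned policies.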
D. DEEP REINFORCEMENT LEARNING
Neural networks are function approximators that can be used in reinforcement learning when the state space or action space is very large [4]. Deep reinforcement learning is the result of applying reinforcement learning using deep neural networks. Deep neural networks are used as the agents that learn to map state-action pairs to rewards. Depending on the result, the neural network is encouraged or discouraged to take that action on this input in the future.
Deep reinforcement learning has been used widely in different domains. More specifically, it is used in those domains where a reward/penalty can be given for any action of the agent. The Google DeepMind team developed many artificial intelligence models using deep reinforcement learning in different games, like a model that plays Atari games and improves itself [4], which used a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. They also developed AlphaGo, which plays Go, a Chinese game that challenged artificial intelligence researchers for many years [7], by combining deep neural networks with reinforcement learning [9].
Deep reinforcement learning is used for other complex problems like autonomous driving [10], [11]. As mentioned in Sallab et al. [10], it is difficult to deal with autonomous driving as a supervised learning problem due to strong interactions with the environment. Finally, the End-to-End Framework for Fast Learning Asynchronous Agents research [12] proposed a training framework that combines the benefits of imitation learning (IL) and deep RL for fast learning asynchronous agents through extending the Asynchronous Advantage Actor-Critic (A3C) algorithm.

III. RELATED WORK
Related work is categorized into two main directions: traditional query optimization and learning-based query optimization. Previous work in each of those directions is discussed in the following subsections.

A. TRADITIONAL QUERY OPTIMIZATION TECHNIQUES
Most query optimizers rely upon the dynamic programming approach of System R [13], [14], which was the first implementation of SQL and pioneered several optimization techniques, including the utilization of dynamic programming for bottom-up join tree construction. They use the traditional cost model to determine the best plan for a given query by generating different strategies using cardinality estimations [1], [15]. These estimates rely upon statistics on the database and assumptions that may or may not be true. Invalid assumptions or inaccurate calculations for the cardinality estimation lead to poor execution plans [1], [15], [16].
Another type of query optimizer tries to use parametric query optimization [17]–[21], where the traditional optimizers make assumptions about many parameters [20] whose values are unknown at compile-time, the time before the actual execution. The parametric techniques attempt to identify several execution plans where each plan is optimal for a subset of possible values of the run-time parameters [6]. The goal is to identify candidate plans at compile-time, each optimal for some region of the parameter space, and the optimal plan is selected once the actual parameter values become known at run time. This type has many drawbacks, like the overhead of pruning all plans for the entire relational selectivity space for one query, which is not a cost-effective approach [21]. This approach also depends on assumptions that may not be true all the time, like assuming that a plan is optimal for all values in a specific region [22].
Most DBMSs use histogram-based techniques as part of their cost model to summarize the data of tables to perform efficient selectivity estimations [23], [24]. A large number of algorithms have been proposed for constructing histograms over a single attribute and multiple attributes. A new algorithm to build a histogram is introduced in [25] to construct it by minimizing the aggregated error. As the algorithm needs huge construction time, generating an efficient execution plan for a given query from the space of possible execution plans is very expensive. In addition, estimating the join operation selectivity may lead to poor execution plans.

B. LEARNING-BASED QUERY OPTIMIZATION
Query optimization using learning models has become one of the hot topics in the database research directions [26]–[35]. Some researchers have investigated the feasibility of applying machine learning techniques in query optimization to improve the query optimization process.
Some of the prior work used supervised learning to learn from old execution plans that were generated by the query optimizer for past queries to help in generating execution plans for new queries [1]. The authors in [26] proposed an execution plan recommendation system based on similarity identification between SQL queries. They used machine learning techniques to improve query similarity detection and hence were able to identify and associate similar queries having similar execution plans. This algorithm assumes that similar textual queries have similar execution plans; however, this is not always true in the real world, where similar textual queries can have different optimal execution plans. In addition, the paper did not use the query optimizer's feedback to enhance the query execution plan.
Other proposed machine learning-based query optimizer models focus on adjusting incorrect statistics and cardinality estimates of a query execution plan automatically by learning from the query optimizer's past mistakes. One of the first approaches that focused on adjusting this information is [27], which compares the optimizer's estimates with the actual cardinalities during run time and computes the errors. Then, the model is adjusted to perform better in future runs. Also, Adaptive Cardinality Estimation [32] proposes a cardinality estimation approach that is integrated with the use of machine learning techniques. The main contribution of this approach
is using query execution statistics of the previously executed queries to improve cardinality estimations. The proposed approaches have many issues, e.g., they are designed for static queries. In addition, they focus on cardinality estimation, so they still require the use of the traditional cost model and heuristic rules that may lead to poor plans. Similar to the traditional optimizers, the proposed model's planning time increases as the number of join conditions increases. Another approach was proposed by [28], which uses machine learning algorithms for cardinality estimation to learn selectivity, taking a bounded range on each column as input. This method focused on the selectivity estimation of several range clauses but did not consider queries with joins. In addition, it focused on cardinality estimation rather than the actual execution time, which may affect the execution time dramatically.
The approaches presented in [29], [30] use a deep reinforcement learning technique in determining the execution plan. The ReJOIN model [29] focuses on the join order selection problem by applying deep reinforcement learning techniques. In this model, the agent learns to maximize the reward through continuous feedback with the help of an artificial neural network. ReJOIN used the traditional cost model based on cardinality estimation during the learning phase rather than the actual execution time, which may lead to non-optimal plans. Learning State Representations for Query Optimization, discussed in [30], used deep neural networks to learn state representations of queries in order to learn the optimal plans. More specifically, the paper introduced two approaches: the first approach transforms a query into a feature vector and trains a deep neural network to take such vectors as input and output the estimated cardinality. The second approach is a recursive approach in which they train the model to predict the cardinality of a query consisting of a single new operation applied to a subquery, to incrementally generate a representation of each subquery's intermediate results [30]. This paper explored the idea of training a deep reinforcement learning model to predict query cardinalities instead of relying entirely on basic statistics to estimate costs.
Neo (Neural Optimizer), presented in [31], uses a supervised learning model to guide a search algorithm through a large and complex space. Neo assumes the existence of a sample workload which consists of a set of queries that is considered representative of the total workload. In addition, the PostgreSQL optimizer is considered as the expert that is responsible for generating the best query plans. Given the sample workload and their best query plans generated by the expert, the learnt model tries to generalize a model that can infer the plan with the least execution time for a query. In later stages, Neo retrains the supervised learning model based on the feedback received while running the model on its environment. Towards a Hands-Free Query Optimizer through Deep Learning, presented in [36], is another attempt that tries to identify potential complications for future research that uses deep reinforcement learning in query optimization problems. Also, the authors referred to the possibility of using latency as a reward function in future research directions.
The SkinnerDB system presented in [37] uses reinforcement learning for query optimization. The proposed model learns the optimal join order while running the query. The possible join orders are divided into slices, where a possible join order is tested on each slice of the data until the best join order is obtained and considered for the remaining slices of data. Query performance is evaluated using regret bounds as a reward system that considers the difference between the actual execution time and the time for an optimal join order. The Fully Observed Optimizer (FOOP) presented in [38] uses a reinforcement learning model where the reward function is defined as the cost model of the traditional DBMS optimizer. Another model that utilizes reinforcement learning is presented in [33], which is the closest model to the proposed model in this paper. The model suggests a learning-based technique for join order based on the plans generated by the DBMS optimizer to bootstrap the reinforcement learning model before fine-tuning it using real-time execution time. Bao (the Bandit optimizer), presented in [39], is a learned component that sits on top of an existing query optimizer in order to enhance query optimization rather than discarding the traditional query optimizer. The Bao component learns to map the query to the best execution strategy for the query. Then, upon receiving a query, the query optimizer generates multiple plans according to different strategies, where the learned model is expected to choose the best query plan given the possible strategies.
Another research direction explores the use of deep reinforcement learning to administer a DBMS. The case for automatic database administration investigated in [40] proposes a new model of index selection to decide which attributes to create indexes on for a given workload based on deep reinforcement learning. UDO (the Universal Database Optimizer) [41] considers a variety of tuning choices, starting from picking transaction code variants, over index selections, up to database system parameter tuning. UDO uses reinforcement learning to converge to near-optimal configurations.
All of the earlier models have utilized the DBMS optimizer and its generated plans to train or at least bootstrap their learned models. Consequently, the purpose of this research is to develop reinforcement learning-based models that learn directly from the real query performance of different join orders, where the models are rewarded or penalized based on the actual execution time of different query plans. Furthermore, the proposed models explore the whole space of different query plans to learn the best join order for any given query.

IV. PROPOSED MODELS
In this paper, two versions of a Reinforcement Learning Based Query Optimizer (RL_QOptimizer) are proposed to solve the join ordering problem during query optimization. Both approaches employ the Q-learning model [5], which is one of the most popular reinforcement learning algorithms. The first approach is a simple RL Q-learning model which uses a simple lookup table (Q-table) to calculate the maximum expected future reward for each action at each state.
The second approach is a 'Deep' Q-learning-based model, which is more suitable for large state and action spaces as it uses a neural network to approximate the Q-value function. Both models operate by applying a set of general steps as shown in Figure 4.
The system has two main phases that are applied for both models: the first phase is the generation phase and the second is the selection phase.
In the generation phase, the model either generates all possible join ordering queries that may happen in the database schema, or generates the join ordering queries from a given database workload. Generating all possible join ordering queries allows the model to be trained from scratch on every possible scenario. For example, if the database has joins between A, B, and C, the possible queries will be A ⋈ B, A ⋈ C, B ⋈ C, and A ⋈ B ⋈ C. On the other hand, if a database workload is available, the join ordering queries in the workload will be used in the training process by the model. Following that, the system selects any one of the join ordering queries in the selection phase. All possible execution plans of the selected query will be generated to train the model. For each possible execution plan, the agent interacts with the DBMS to get the actual execution time for this plan, which represents the reward in our models, multiplied by -1 to minimize the execution time. Both models are discussed in detail in the following subsections.

A. JOIN ORDERING USING REINFORCEMENT LEARNING
The first model uses a Q-table to store the expected reward for each action-state. The main function of the Q-table is to take a state and an action and return the expected future reward of applying that action in that state:

Q(s, a) = r(s, a) + γ max_a Q(s′, a)    (2)

FIGURE 5. Q-learning.

The first part r(s, a) is the immediate reward for the taken action (a) given the state (s). The second part is the discount factor (γ) multiplied by the estimate of the optimal future value, max_a Q(s′, a), which is known as the discounted estimate of the optimal future value.
The model consists of four components, which are: the input of the model, the states, the set of possible actions, and the reward function. The preceding equation shows how we compute the Q-value for an action (a) starting from a state (s). It is the sum of the immediate reward and the discounted value of the greedy action from the next state (s′) (i.e., the action that has the maximum Q-value over the other actions).
The input is typically represented as encoded query join conditions. The characteristics of the join conditions are encoded in the form of a vector of size n, where n is the number of all possible join conditions in the database schema. Each cell of the vector can be 0 or 1, where 1 means that this condition is included in this query. For example, Input = [1, 1, 0, 0] means that this query includes the first and second join conditions.
The main goal in the join ordering problem is to find the best possible join order for a given query. In the proposed models, the states are represented by a 0/1 vector where 1 refers to join conditions that will be applied to the query. In each state, the agent has a set of possible actions from which it selects one to move from one state to another. These actions are all join conditions of the query represented by ones in the state vector. After selecting an action, the query builder adds this condition to its order and sets its value to zero in the state vector.
As no rules exist to correctly choose the reward function, the choice of the reward function is one of the most challenging tasks in any reinforcement model. In the proposed models, the goal is to optimize the total execution time of queries; hence, the actual query execution time multiplied by -1 is used as a reward. During experiments, the PostgreSQL [43] DBMS is used to get the actual execution time of the query. Obviously, the lower the execution time, the higher the reward.
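As a minimal sketch of how such a reward signal could be collected (the paper only states that PostgreSQL [43] is used to measure actual execution times; the connection settings, the function name, and the use of EXPLAIN (ANALYZE, FORMAT JSON) below are illustrative assumptions rather than the paper's actual implementation):

import psycopg2

# Hypothetical connection; the database name mirrors Example 1.
conn = psycopg2.connect(dbname="customer_ordering", user="postgres")

def reward(join_order_sql: str) -> float:
    """Execute the candidate plan and return the negated execution time (ms)."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) " + join_order_sql)
        plan = cur.fetchone()[0]            # psycopg2 parses the JSON plan
    # Multiplying by -1 means that faster plans receive higher rewards.
    return -plan[0]["Execution Time"]

Any equivalent timing mechanism would serve the same purpose; the essential design choice is that the reward is the negated execution time, so minimizing execution time and maximizing reward coincide.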
In the learning stage, the system takes all possible join conditions for the given database and generates different possible queries to train on. The model then builds a vectorized representation of the query that is later used as an input to the model. The agent selects one of the join ordering conditions in this vector which is represented by a one, then the environment gives a reward by interacting with the PostgreSQL DBMS. This process is repeated until the terminal state. Finally, the function Q(s, a) in the Q-table is updated using Equation (3) [5], [42]:

Q(s, a) = Q(s, a) + α [r(s, a) + γ max_a Q(s′, a) − Q(s, a)]    (3)

The first part Q(s, a) is the current value in the state (s) if an action (a) is taken, and the second part is the learning rate (α) multiplied by the TD error, which is the difference between the TD target and the current Q(s, a). The update proceeds with the following three essential steps:
1) The agent begins in a state (s), takes an action (a), and observes the next state (s′) and the reward r.
2) The agent chooses an action by referring to the Q-table entry with the greatest value for the next state (s′).
3) The Q-values are updated.
Example 3: Consider the customer-ordering database shown in Figure 1 that has 4 join conditions [CUSTOMER ⋈ ADDRESS, CUSTOMER ⋈ ORDER, ORDER ⋈ PRODUCT, PRODUCT ⋈ CATEGORY], which is vectorized as [1, 1, 1, 1]. In the training phase, the agent tries to explore all possible execution plans. First, the agent explores each join condition individually: it trains on the vector [1, 0, 0, 0] and builds the first query with [CUSTOMER ⋈ ADDRESS]. The environment interacts with the DBMS to get the actual execution time of this query, in addition to the [ADDRESS ⋈ CUSTOMER] query execution time, to update the Q-table. The model performs the same process for each join condition individually. Then, the model trains on all possible pairs of join conditions. For example, it will train on the vector [1, 1, 0, 0] for [CUSTOMER ⋈ ADDRESS] and [CUSTOMER ⋈ ORDER]. In this case, the model trains on the best join order out of the following two join orders: the first join order consists of [CUSTOMER ⋈ ADDRESS] followed by the better order of [CUSTOMER ⋈ ORDER] and [ORDER ⋈ CUSTOMER], while the second join order is [ADDRESS ⋈ CUSTOMER] followed by the better order of [CUSTOMER ⋈ ORDER] and [ORDER ⋈ CUSTOMER], where the best of [CUSTOMER ⋈ ORDER] and [ORDER ⋈ CUSTOMER] is already discovered during the previous cycle of training. This process is repeated until the training process is terminated by training the whole [1, 1, 1, 1] vector.
During the actual running, when the model is required to generate the best execution plan for a query that is vectorized as [1, 1, 1, 0], the agent moves to the corresponding row in the Q-table to select the condition with the maximum reward. If the agent selects the third join condition, which is coded as a one in the vector, it will be replaced by a zero and the new state will be [1, 1, 0, 0]. Then, the agent selects the best join condition given the chosen third join condition; this is also a one in the vector and will be replaced by a zero. This process is repeated recursively until the vector reaches the terminal state, which is [0, 0, 0, 0], to retrieve the best join condition order.

B. JOIN ORDERING USING DEEP REINFORCEMENT LEARNING
The join ordering using reinforcement learning model has many limitations that need to be solved before considering it as a practical solution. The main problem is related to the size of the database schema and the number of tables. A large database schema with many join conditions leads to a gigantic state space that may reach up to millions of states. Consequently, the Q-table needs a large amount of memory to store. In addition, the exploration of the Q-table won't be efficient. Another limitation is related to generalization, as the Q-table model can't infer a Q-value of a new state from the already trained states. Thus, the join ordering using deep reinforcement learning model was introduced to address those limitations.
The join ordering using deep reinforcement learning model introduces a deep neural network to approximate the Q-value function. The state is given as input and the Q-value of all possible actions is generated as output, as shown in Figure 6. Similar to any deep neural network, it uses coefficients to approximate the function that maps an input to the output. Accordingly, the algorithm learns the right coefficients by adjusting their values iteratively in the learning stage. In the proposed model, the weights of the deep neural network are updated during training instead of updating the Q-value directly in the Q-table.
The proposed model uses the Deep Q Network (DQN), which uses a neural network to approximate the Q-value function to tell the agent what action to take. This model was proposed in DeepMind's paper [4] to learn policies from high-dimensional sensory input using reinforcement learning. As stated in [42], RL is known to be unstable or even to diverge when neural networks are used to represent the action-values. There are various factors that lead to this instability: the presence of correlations in the sequence of observations and the correlations between the action-values (Q) and the target values. In the proposed model, we followed the following improvements on DeepMind's model presented in [4], [42] to tackle these issues:
1) Experience Replay: a replay buffer was used to store the latest N experience tuples observed by the agent, including state, action, reward (‘‘response time’’), and next state, which allows the network to reuse this data later by sampling from it randomly. During the training
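A minimal sketch of a DQN of this kind with an experience replay buffer is given below, written with TensorFlow/Keras, which the paper cites [47], [48]; the layer sizes, buffer capacity, discount factor, exploration rate, and the assumption of four join conditions are illustrative choices, not the configuration reported in the paper:

import random
from collections import deque

import numpy as np
from tensorflow import keras

N_CONDITIONS = 4      # assumed number of join conditions (as in Example 3)
GAMMA = 0.9           # discount factor, illustrative value
BUFFER_SIZE = 10000   # capacity of the experience replay buffer

# Q-network: maps a 0/1 state vector to one Q-value per join condition.
q_net = keras.Sequential([
    keras.Input(shape=(N_CONDITIONS,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(N_CONDITIONS, activation="linear"),
])
q_net.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

replay_buffer = deque(maxlen=BUFFER_SIZE)   # stores (s, a, r, s', done) tuples

def choose_action(state: np.ndarray, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice among the join conditions still coded as 1."""
    valid = np.flatnonzero(state)
    if random.random() < epsilon:
        return int(random.choice(valid))
    q_values = q_net.predict(state[None, :], verbose=0)[0]
    return int(valid[np.argmax(q_values[valid])])

def train_step(batch_size: int = 32) -> None:
    """Sample past experiences at random and fit the network on TD targets."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(list(replay_buffer), batch_size)
    states = np.array([b[0] for b in batch], dtype=np.float32)
    next_states = np.array([b[3] for b in batch], dtype=np.float32)
    targets = q_net.predict(states, verbose=0)
    next_q = q_net.predict(next_states, verbose=0)
    for i, (_, action, reward, _, done) in enumerate(batch):
        # reward is the negated execution time reported by the DBMS
        targets[i, action] = reward if done else reward + GAMMA * np.max(next_q[i])
    q_net.fit(states, targets, epochs=1, verbose=0)

Sampling stored (state, action, reward, next state) tuples at random, rather than learning from consecutive steps only, is what breaks the correlations in the observation sequence mentioned above.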
FIGURE 12. Comparison between the results of reinforcement learning models with PostgreSQL on TPCH database using IQR.
FIGURE 13. Comparison between the results of the DQN model with PostgreSQL on join order benchmark 113 queries using IQR without showing extreme outliers.
FIGURE 14. Comparison between the results of the DQN model with PostgreSQL on join order benchmark 113 queries using IQR with showing extreme outliers.
FIGURE 17. Comparison between the results of the DQN and the Q-learning models on IMDB database for new queries.
FIGURE 18. Training curve showing average penalty per episode during the training process on IMDB database.
[33] S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica, ‘‘Learning to optimize join queries with deep reinforcement learning,’’ 2018, arXiv:1808.03196.
[34] K. Tzoumas, T. Sellis, and C. S. Jensen, ‘‘A reinforcement learning approach for adaptive query processing,’’ Inst. Datalogi, Aalborg Universitet, Aalborg, Denmark, DB Tech. Rep. 22, 2008. [Online]. Available: https://vbn.aau.dk/en/publications/a-reinforcement-learning-approach-for-adaptive-query-processing
[35] R. B. Guo and K. Daudjee, ‘‘Research challenges in deep reinforcement learning-based join query optimization,’’ in Proc. 3rd Int. Workshop Exploiting Artif. Intell. Techn. Data Manage., Jun. 2020, pp. 1–6.
[36] R. Marcus and O. Papaemmanouil, ‘‘Towards a hands-free query optimizer through deep learning,’’ in Proc. 9th Biennial Conf. Innov. Data Syst. Res. (CIDR), 2019, pp. 1–8.
[37] I. Trummer, J. Wang, D. Maram, S. Moseley, S. Jo, and J. Antonakakis, ‘‘SkinnerDB: Regret-bounded query evaluation via reinforcement learning,’’ in Proc. Int. Conf. Manage. Data, Jun. 2019, pp. 1153–1170.
[38] J. Heitz and K. Stockinger, ‘‘Join query optimization with deep reinforcement learning algorithms,’’ 2019, arXiv:1911.11689.
[39] R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska, ‘‘Bao: Making learned query optimization practical,’’ in Proc. Int. Conf. Manage. Data, Jun. 2021, pp. 1275–1288.
[40] A. Sharma, F. M. Schuhknecht, and J. Dittrich, ‘‘The case for automatic database administration using deep reinforcement learning,’’ 2018, arXiv:1801.05643.
[41] J. Wang, I. Trummer, and D. Basu, ‘‘UDO: Universal database optimization using reinforcement learning,’’ Proc. VLDB Endowment, vol. 14, no. 13, pp. 3402–3414, Sep. 2021.
[42] V. Mnih et al., ‘‘Human-level control through deep reinforcement learning,’’ Nature, vol. 518, pp. 529–533, Feb. 2015.
[43] (1996). PostgreSQL. [Online]. Available: https://www.postgresql.org/
[44] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, 2015, pp. 1–15.
[45] V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann, ‘‘How good are query optimizers, really?’’ Proc. VLDB Endowment, vol. 9, no. 3, pp. 204–215, Nov. 2015.
[46] (1993). TPC-H. [Online]. Available: http://www.tpc.org/tpch/
[47] (2015). TensorFlow. [Online]. Available: https://www.tensorflow.org
[48] (2015). Keras. [Online]. Available: https://keras.io/
[49] (1990). IMDb. [Online]. Available: https://www.imdb.com/
[50] M. Dreseler, M. Boissier, T. Rabl, and M. Uflacker, ‘‘Quantifying TPC-H choke points and their optimizations,’’ Proc. VLDB Endowment, vol. 13, no. 8, pp. 1206–1220, Apr. 2020.

MOHAMED RAMADAN received the B.Sc. degree from the Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University, in 2016. He is currently a Teaching Assistant and a Researcher at the Faculty of Computers and Artificial Intelligence, Cairo University. He has six years of experience in the area of software development.

AYMAN EL-KILANY received the M.Sc. and Ph.D. degrees from the Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University, in 2012 and 2018, respectively. He is currently an Assistant Professor and a Researcher at the Faculty of Computers and Artificial Intelligence, Cairo University.

HODA M. O. MOKHTAR received the B.Sc. (Hons.) and M.Sc. degrees from the Department of Computer Engineering, Faculty of Engineering, Cairo University, in 1997 and 2000, respectively, and the Ph.D. degree in computer science from the University of California at Santa Barbara, in 2005. She is currently the Dean of the Faculty of Computing and Information Sciences, Egypt University of Informatics. Before being the Dean, she was the Chair of the Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University. In 2000, she was awarded a scholarship and the Dean's Fellowship from the Computer Science Department, UCSB. She taught multiple courses both for the undergraduate and graduate levels at the Faculty of Computers and Artificial Intelligence, Cairo University, where she has supervised a number of master's and Ph.D. theses at the Faculty of Computers and Artificial Intelligence. She has participated in several national committees, and was awarded multiple awards and certificates for her academic achievements. Her research interests include big data analytics, data warehousing, data mining, database systems, social network analysis, bioinformatics, and web services.

IBRAHIM SOBH received the B.Sc. and M.Sc. degrees in computer engineering from the Faculty of Engineering, Cairo University, and the Ph.D. degree in deep reinforcement learning for fast learning agents acting in 3D environments. Currently, he is a Senior Expert of AI at Valeo. He has more than 20 years of experience in the area of machine learning and software development. His M.Sc. thesis is in the field of machine learning applied on automatic documents summarization. He has participated in several related national and international mega projects, conferences and summits. He delivers training and lectures for academic and industrial entities. His publications, including international journals and conference papers, are mainly in the machine and deep learning fields. His research interests include computer vision, natural language processing, and speech processing.