Introduction

The usage of computer-based technology is growing in numerous directions owing to its easy availability and effectiveness. Technological advancements such as smartphones, laptops, and other intelligent devices allow us to use online learning facilities, termed e-learning. E-learning platforms have become a significant tool for knowledge sharing and understanding for almost every student, especially after the pandemic. E-learning has numerous advantages: it is eco-friendly because it saves paper, and it reduces the cost and time of traveling. Students can attend classes from their own places even when they are not feeling well. However, using such devices for long periods at close range to the screen can cause health problems such as eye strain, weakening eyesight, and headaches. During a pandemic situation like COVID-19, parents and students worried about the future of their children under lockdown, but e-learning solved this problem, and every student was able to keep learning through mobile phones and laptops.

On the other hand, the physical education system has its own benefits, such as interaction between teacher and students, assessment of students' understanding, and hands-on sessions that make topics easier to grasp. To bring these qualities into e-learning, the primary objective of the proposed approach is engagement detection during e-learning: a system capable of estimating students' engagement during their classes and how they interact with teachers and lectures. In traditional classroom teaching, teachers evaluate their students' learning effect and level of understanding and comprehension mainly by observing students' behavior. These behavioral aspects include body language, eye gaze, facial expressions, and emotions exhibited through vocal feedback. Multiple researchers have proposed the use of natural language processing, hand gesture recognition, eye gaze estimation, facial emotion recognition, and body language detection to estimate learners' learning effects and provide a measure that enables a more effective learning experience [3].

Zakka et al. conducted a comparative study of traditional classrooms and e-learning, aiming to improve the e-learning environment so that it matches the traditional one [3]. A framework to detect the motivation level of learners is incorporated, which senses emotion, sends feedback to the teacher, and further develops a response mechanism to distinguish expressions and attention.

The authors of [4] created a deep network (EAC-Net) that recognizes facial action units (AUs) more effectively. Unlike other methods, it does not need perfectly aligned faces and captures facial features better, showing large accuracy improvements on a benchmark dataset. The system is useful because it works well even when faces are at different angles or partly occluded, making it more versatile for real-world situations.

In [5], the authors emphasized multi-modal emotion recognition over single-modal approaches, using AffectNet as their FER model. The use of predicted emotions extends beyond understanding student behavior to visual summarization of classroom films and classification of group-level emotions in videos.

Based on learners' behavior and biological data, several attempts have been made to ascertain e-learners' level of focus through a neuro-fuzzy inference system that tracks the position of the eye's iris; an SVM is also used to determine the concentration level [6]. This model has a timing overhead for data preprocessing, which could in the future be eliminated by fully automating the process so that it can run in real time.
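To make the pipeline of [6] concrete, the sketch below shows only its SVM stage, assuming iris-position features have already been extracted per frame window; the feature layout (iris offsets plus fixation time) and the two-level label set are hypothetical, not taken from the paper.

```python
# Sketch of an SVM concentration-level classifier over iris-tracking
# features, as in [6]. Feature layout and labels are hypothetical.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [iris_offset_x, iris_offset_y, fixation_time_s] per window.
X = np.array([
    [0.02, 0.01, 2.5],   # gaze near screen centre, long fixation
    [0.30, 0.25, 0.4],   # gaze far off-centre, short fixation
    [0.05, 0.03, 1.8],
    [0.28, 0.20, 0.3],
])
y = ["focused", "distracted", "focused", "distracted"]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

print(clf.predict([[0.04, 0.02, 2.0]]))  # -> ['focused']
```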
Deep Learning

The nonverbal behavior of forty-four undergraduate students was observed using a USB front-facing camera while participants completed an on-screen multiple-choice question-and-answer test [1]. An ANN is fed the facial information along with question-answer scores and behavioral patterns to classify comprehension states almost instantly. The authors plan to increase the classifier's accuracy in future work; the investigation might also be extended to examine the relationship between behavior and the type of question asked, and the technique might be used to analyze other kinds of behaviors and mental states.

In [2], a concept of similarity is introduced to preserve the actual data with its features and to group them, creating a degree of similarity among pairs that is achieved through fuzzification. The authors propose to develop theoretical methods for generating rules and selecting features based on the similarity relation in the future.

Pise et al. employed a two-phase method to categorize 3D facial expression images extracted from video, quantifying the optical-flow intensity with the help of a Hidden Markov model and Naïve Bayes [7]. Using video rather than just static images also allows even slight changes to be measured.
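The optical-flow intensity quantified in [7] can be illustrated with OpenCV's dense Farneback flow. This is a sketch of the motion-intensity measurement only, assuming a placeholder video file; the paper's HMM and Naïve Bayes stages are not reproduced here.

```python
# Sketch: per-frame optical-flow magnitude from a video, the kind of
# motion-intensity signal used in [7]. The video path is a placeholder.
import cv2
import numpy as np

cap = cv2.VideoCapture("lecture_clip.mp4")  # placeholder path
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

intensities = []  # mean flow magnitude per consecutive frame pair
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense Farneback optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    intensities.append(float(np.mean(mag)))
    prev_gray = gray

print(intensities[:10])
```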
In [8], gestures are extracted using temporal appearance and facial landmark points and then integrated to best recognize the expression. A CNN is used for object detection and feature extraction in this model. In the future, the framework could be extended to run on GPU-supported machines for better training times.

Researchers utilize multimodal learning analytics in online education to comprehend the feelings and engagement of students [9]. They use information from posture, gestures, and emotions to estimate students' level of engagement in the classroom. Computer vision techniques examine lecture footage and detect feelings such as joy or indifference, and some systems use head and eye motions to categorize different levels of engagement. By taking students' feelings and participation into account, these methods seek to improve online instruction.

Thiruthuvanathan et al. discussed the challenges in detecting e-learners' engagement level through facial expression recognition, and they proposed to extend their model to group-level engagement detection in the future [10].

In [11], the authors address the challenges and weaknesses of the widely used blended learning model, and the concept of gamification is additionally applied. The FER and gamification system were developed with an object-oriented approach using the Unified Modeling Language (UML). The methods used are ANN, CNN, and a JavaScript library with the open-source TensorFlowJS code. Testing proceeds in two stages: the facial expression recognition system and the gamification application.

In [16], a framework using computer vision to detect student engagement is proposed. Facial expressions, body language, and other cues are used to determine whether a student is paying attention and engaged in the material. The proposed model uses SVM (Support Vector Machines), Random Forest, Neural Networks, CNN (Convolutional Neural Networks), LSTM (Long Short-Term Memory), InceptionV3, and VGG16 for object recognition in video scenes to analyze students' facial expressions, body postures, and hand gestures. The dataset considered for analyzing the model is very small, consisting of only 45 students. The number of features considered to detect engagement is also very small; it could be enlarged in the future for more accurate detection, with a richer label set (e.g., neutral, low engaged, highly engaged) instead of only engaged versus not engaged.

In [12], the authors used a video dataset called Children's Spontaneous Facial Expressions (LIRIS-CSE) and proposed a system that uses Convolutional Neural Network (CNN)-based models, such as VGG19, VGG16, and ResNet50, for feature extraction, with a Support Vector Machine (SVM) and a Decision Tree (DT) for classification. The system automatically recognizes children's expressions. Several experimental configurations, such as an 80–20% split, K-Fold Cross-Validation (K-Fold CV), and leave-one-out cross-validation (LOOCV), are used to assess the system for both image-based and video-based categorization.
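A condensed sketch of the design in [12]: a pretrained CNN backbone (VGG16 here) produces embeddings that feed an SVM. The random frames and toy labels merely stand in for LIRIS-CSE, which is not loaded.

```python
# Sketch of [12]'s design: pretrained CNN (VGG16) as a feature extractor,
# SVM as the classifier. Random data stands in for LIRIS-CSE frames.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import SVC

backbone = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(images):
    """Map (N, 224, 224, 3) uint8 frames to (N, 512) VGG16 embeddings."""
    return backbone.predict(preprocess_input(images.astype("float32")))

frames = np.random.randint(0, 255, size=(8, 224, 224, 3), dtype=np.uint8)
labels = ["happy", "sad"] * 4  # toy expression labels

clf = SVC(kernel="linear").fit(embed(frames), labels)
print(clf.predict(embed(frames[:2])))
```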
Keerthana et al. proposed a hybrid model for identifying student facial emotions by monitoring eye gaze and head movements to analyze student engagement levels [13]. A Haar cascade model and binary patterns are used to detect head movements, and CNN models are used for FER. Using the OpenCV object detection framework, the authors determined the status of the student, i.e., whether he or she is distracted or engaged.

In [14], the proposed framework calculates a concentration index for students. A CNN model built with Keras is used with the FER2013 dataset for emotion recognition. Additionally, the Mamdani method in MATLAB is used to create fuzzy rule sets and implement membership functions following the principles of fuzzy logic. The framework mainly involves three major steps: face detection, feature extraction, and feature classification.
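A minimal sketch of the two stages shared by [13] and [14]: Haar-cascade face detection with OpenCV, followed by a small Keras CNN of the kind trained on FER2013's 48x48 grayscale crops. The network layout is a generic example, not the architecture of either paper.

```python
# Sketch of the detection + FER stages in [13]/[14]: Haar-cascade face
# detection (OpenCV), then a small Keras CNN for FER2013-style inputs.
import cv2
from tensorflow.keras import layers, models

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(gray_frame):
    """Return (x, y, w, h) boxes for faces in a grayscale frame."""
    return face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1,
                                         minNeighbors=5)

# Generic FER2013-style CNN: 48x48 grayscale in, 7 emotion classes out.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```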
Dewan et al. explore various methods for learners' engagement detection and classify them into three main categories: automatic, semi-automatic, and manual [15]. Each category is further divided into audio, video, and text, depending on the type of data used for detection. The authors reviewed the automatic methods in more detail, finding them more effective than the other methods for online learning platforms. They further discussed the challenges of all of these methods and future directions for using them in more advanced ways.

Ozdamli et al. explore various algorithms and models for facial recognition in education [17]. For face detection and recognition, the paper mentions software like MATLAB and Python utilizing techniques like PCA and 3WPCA-MD. Classification is done with diverse algorithms such as SVM, Bayesian classifiers, or neural networks. For recognizing emotions from facial expressions, the paper discusses static models, Action Unit (AU)-based models, and the Facial Action Coding System (FACS); deep learning frameworks such as CNNs are also increasingly used. To detect cheating in online exams, models like Multi-Class Markov Chain Latent Dirichlet Allocation and supervised dynamic Bayesian models are employed. Various datasets, such as GI4E for gaze tracking and FEI for general face recognition tasks, are used in this model.

The authors of [18] investigate methods for evaluating teaching quality and student engagement in classrooms. Traditional approaches, including tests and observations, are criticized for their limitations in providing comprehensive, real-time data. Attention shifts towards assessing student engagement, defined across behavioral, cognitive, emotional, and social-interaction dimensions. Technological advancements, particularly in computer vision, offer non-invasive means to detect engagement through facial expression analysis and gaze tracking, and studies demonstrate high accuracy in predicting engagement levels using deep learning models. Overall, the review highlights a growing interest in leveraging technology to enhance classroom evaluation and improve teaching practices.

The models proposed in [19] define structures for representing learning behaviors in classrooms, enabling effective feature extraction. Evaluation metrics demonstrate the efficacy of machine learning models in accurately detecting and categorizing student behaviors. More data sources could be added to the model in the future, and advances in computer vision, specifically in human pose estimation and Graph Neural Networks, could also be incorporated to increase efficacy.

In [20], the authors used Multimodal Emotion Recognition in Multiparty Conversations (MERMC), which focuses more on audio and text while ignoring visual information. It incorporates a two-stage framework: facial-expression-aware multimodal multi-task learning, and a multimodal facial-expression-aware emotion recognition model that helps extract the face and improve emotion recognition. They plan to leverage multimodal fusion mechanisms to improve performance on this task in the future.

Conventional ML-based

Communication requires the use of facial expressions, which differ between individuals and cultures. Effective teaching requires knowledge of students' emotions, especially with the growth of online learning brought on by COVID-19. Understanding facial expressions allows educators to modify their approaches and add interest to their lessons. While negative emotions may cause disengagement, positive emotions support academic progress.

The authors of [21] conducted a thorough review of emotion classification in facial emotion recognition. It presents an elaborate analysis of the emotion classifiers and datasets used in FER and discusses the different approaches researchers have considered for preprocessing and feature extraction, highlighting the strengths and limitations of each. Their study revealed that deep learning is the most commonly used approach for FER in the academic arena, while the most used dataset and emotion classifier are DAiSEE and SVM, respectively.

Zhang et al. developed an algorithm that can accurately detect students' engagement in online learning environments [22]. Supervised learning is used to recognize students' emotional gestures, through which the system distinguishes emotions like frustration, happiness, boredom, and confusion. Sensors are used in this research, and measures of the degree of engagement are divided into three categories: single-sensor, multiple-sensor, and sensor-free methods.

The primary objective of the research in [23] is to explore reliable facial information models that can describe how people interact in a learning environment. Different approaches for automated recognition of student engagement levels are studied. The authors state that engagement recognition will be more effective and more variable in long-term learning situations than in short-term studies of current scenarios.

In [24], the authors highlight the growing interest in facial expression recognition and eye tracking for assessing and enhancing student engagement in digital learning environments. Various methods, including deep learning models and facial action coding systems, have been explored to measure concentration levels and emotional states. These approaches aim to provide real-time feedback to instructors, allowing personalized adjustments to content delivery. Despite challenges such as dropout rates in virtual classrooms, ongoing research underscores the importance of understanding and addressing student engagement for effective digital education.

Dukic et al. discover connections between emotions, activities, and gender in online learning [25]. The authors analyzed two different perspectives: (1) classroom-experiment-related and (2) FER-data-related. To gather feedback on active teaching strategies, students' emotions are tracked as they complete programming assignments. The methodology in this research focused primarily on the activity portion; however, variance due to age differences is not taken into consideration in this study. The authors plan to experiment with more data, better camera positioning, and the effect of sticker conditions on participants' behavior in the future.

The authors have designed a framework to recognize emotions that is organized into three parts: first, the face tracker; second, the facial-motion-tracking optical flow algorithm; and third, the recognition engine.

The method of [26] proposes a channel attention network with depthwise separable convolution to enhance the linear bottleneck structure. When tested on the FER2013 dataset, it significantly outperforms other methods, mainly because it pays more attention to feature extraction, resulting in better accuracy in recognizing emotions.

In [27], the authors conducted a systematic review of existing frameworks for Facial Emotion Recognition (FER) and how they are used to classify academic emotions, mainly in the context of online learning. They observed that low illumination, lack of frontal pose, and small datasets are some of the major hindrances to FER in e-learning. They suggested that long-term monitoring of facial emotions through wearable sensors, continuous video recording, and the exclusion of potential human biases could produce better accuracy for FER in online learning.

Alkabany et al. proposed a methodology that assesses students' degree of engagement in both traditional classroom settings and online learning environments [28]. The suggested framework records the user's video and follows their face as it moves through the frames. It can be used for tracking the development of e-learners with different levels of learning impairments and for analyzing the impact of nerve palsy on social interactions and facial expressions. Features such as facial fiducial points, head pose, eye gaze, and learning features are extracted from the video of the user's face to apply the Facial Action Coding System (FACS), which decomposes facial expressions into the fundamental actions of individual muscles or groups of muscles (i.e., action units). The student's behavioral engagement (i.e., willingness to participate in the learning process) and emotional engagement (i.e., attitude toward learning) are then measured using these decoded action units (AUs).
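How decoded action units might be turned into an engagement measure, in the spirit of [28], can be sketched as a weighted score; the AU selection and weights below are invented for illustration, whereas the paper derives its mapping from data.

```python
# Sketch: mapping decoded FACS action units to a rough engagement score,
# in the spirit of [28]. AU choices and weights are illustrative only.
AU_WEIGHTS = {
    "AU01": 0.2,   # inner brow raiser (interest)
    "AU12": 0.4,   # lip corner puller (smile)
    "AU43": -0.6,  # eyes closed (disengagement)
}

def engagement_score(active_aus):
    """Sum weights of detected AUs around a 0.5 baseline, clipped to [0, 1]."""
    raw = 0.5 + sum(AU_WEIGHTS.get(au, 0.0) for au in active_aus)
    return max(0.0, min(1.0, raw))

print(engagement_score({"AU01", "AU12"}))  # -> 1.0 (capped)
print(engagement_score({"AU43"}))          # -> 0.0
```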
The emotional changes of 67 students during a lecture on information technology are studied in [29]. The software is developed using the Microsoft Emotion Recognition API and the C# programming language to categorize students' feelings into disgust, sadness, happiness, fear, contempt, anger, and surprise. The significance of the correlation of students' emotions with their department, gender, lecture hours, the location of the computer in the classroom, lecture type, and session information is studied. Finally, the association between students' emotional changes and their achievements is analyzed to examine how emotion recognition could contribute to increasing the overall quality of education.
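The kind of association analysis performed in [29] can be sketched with a chi-square test of independence between a student attribute and observed emotion counts; the contingency table below is fabricated for illustration only.

```python
# Sketch of the association analysis in [29]: a chi-square test of
# independence between a student attribute and emotion counts.
# The contingency table is made-up illustration, not the paper's data.
from scipy.stats import chi2_contingency

# Rows: gender (female, male); columns: happiness, sadness, surprise.
table = [[30, 10, 12],
         [22, 18, 9]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # small p -> emotions depend on gender
```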
In the research work done by Gong et al. [30], high-definition video of classroom teaching is recorded using a camera at the front of the classroom. The face of every student in the classroom is located and cropped from the sampled frame images using the AdaBoost algorithm, and the images are preprocessed to produce an expression area of 64 by 64 pixels. After PCA dimensionality reduction, Gabor and ULBPHS feature fusion is integrated with the KNN classification algorithm for expression classification. Finally, the assessment and results of the students' emotional learning are obtained.
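The reduce-then-classify stage of [30] can be sketched with scikit-learn as PCA followed by KNN; random 64x64 pixel vectors stand in for the paper's fused Gabor and ULBPHS features.

```python
# Sketch of the reduce-then-classify stage in Gong et al. [30]:
# PCA dimensionality reduction followed by KNN. Random 64x64 vectors
# stand in for the paper's fused Gabor + ULBPHS features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((40, 64 * 64))      # 40 flattened 64x64 expression crops
y = rng.integers(0, 3, size=40)    # 3 toy expression classes

clf = make_pipeline(PCA(n_components=20),
                    KNeighborsClassifier(n_neighbors=3))
clf.fit(X, y)
print(clf.predict(X[:5]))
```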