Advance Assessment Evaluation: A Deep-Learning Framework With Sophisticated Text Extraction For Unparalleled Precision
ISSN No:-2456-2165
Abstract:- AI-based assessment scrutiny is the most convenient and precise method to eliminate the repetitive task of answer grading, consisting of text-extraction methodologies and a Deep Learning architecture that evaluates responses with reference to the provided question and correct answer. In the landscape of educational assessment, traditional methods of answer evaluation face challenges in adapting to the dynamic and evolving nature of learning. This paper proposes a complete end-to-end answer-grading architecture that can be deployed to provide an interface for a fully automated, deep-learning answer-grading mechanism.

This research introduces a groundbreaking approach to address these challenges, presenting a solution that seamlessly integrates advanced text extraction and deep-learning architectures. Our objective is to achieve unparalleled precision in answer evaluation, setting a new standard in the field. Our method involves the capture of audio files, precise text extraction from the audio, and a Deep Neural Network (DNN)-based model for answer evaluation that draws on a database from which the correct answer and relevant data are fetched. We propose a reliable, accurate, easy-to-deploy, best-in-class technology to eradicate manual repetitive tasks.

We provide a user-friendly interface to the student and a dynamic backend to monitor results with a high level of precision. These AI-based evaluation methods can be used in numerous places in the evolving education industry, providing students with a convenient interface and automation. The objective is to elevate the precision and adaptability of answer-assessment methodologies in the dynamic landscape of education. As the educational landscape continues to evolve, our research not only addresses current challenges but also lays the groundwork for future advancements in the field of educational assessment, promising a new era of precision and adaptability.

This paper covers text extraction based on architectures such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and transformers such as the encoder-decoder transformer Whisper.

Keywords:- Audio Evaluation, Text Extraction, Deep Learning, Grading Answer, Whisper, PALM2, Flask.

I. INTRODUCTION

In today's scenario, a significant number of competitive exams adopt a multiple-choice format, posing a challenge for students to provide detailed answers. When dealing with a large student population, the manual evaluation of responses becomes practically unfeasible. With the surging demand for AI and software-related jobs, students aspire to excel in these subjects. Considering these factors, we have developed an application that allows students to verbally respond to given questions, with the system providing automated evaluations. This recording process enhances students' confidence in the subject matter and improves soft skills such as verbal communication. Students can take immediate action, as the score is displayed almost instantly. Maximizing the automation of evaluation not only reduces costs by minimizing manual correction effort but also saves time, as responses are recorded rather than written.
Fig. 1: CRISP-ML (Q) Methodological Framework, Outlining its Key Components and Steps Visually
Source: Mind Map - 360DigiTMG
The application employs the open-source Cross Industry Standard Practice for Machine Learning (CRISP-ML) methodology by 360DigiTMG. CRISP-ML(Q) [Fig.1] [1] is designed to guide the project lifecycle of a machine-learning solution. Deep-learning techniques are extensively utilized for text extraction from audio and subsequent evaluation, incorporating diverse architectures such as Convolutional Neural Networks (CNN) [15] and Recurrent Neural Networks (RNN) [14], among others. The project initiation involved thorough research into various techniques. We recorded and gathered diverse audio samples, questions, and answers. Data visualization was performed, and a model was developed, with comparisons made to other models. The process involved the use of a NoSQL database and subsequent deployment. Monitoring confirmed the system's high accuracy.
Table 1: System Requirements

  Component    Requirement
  RAM          16 GB
  GPU          16 GB

The above table [Table.1] lists the system requirements to build and execute this project.
Fig. 2: Architecture Diagram: explanation of the workflow of the [AI evaluation] project
Source: https://360digitmg.com/ml-workflow
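The workflow in the architecture diagram can be summarized as record, transcribe, and grade. The sketch below is a minimal, hedged illustration of that pipeline in Python: the function names (`transcribe_audio`, `grade_answer`, `evaluate`) and the stub bodies are our own illustrative assumptions, not the project's actual code. In the real system the first stage would invoke a speech-to-text model and the second an LLM.

```python
# Minimal sketch of the evaluation workflow: record -> transcribe -> grade.
# Stub bodies stand in for the real stages (a speech-to-text model such as
# Whisper, and an LLM-based grader) so the control flow stays visible.

def transcribe_audio(audio_path: str) -> str:
    """Speech-to-text stage (stubbed with a canned transcript)."""
    return "a binary search halves the search interval at every step"

def grade_answer(question: str, reference: str, student_answer: str) -> int:
    """Grading stage (stubbed with trivial keyword overlap, scored out of 10)."""
    ref_words = set(reference.lower().split())
    ans_words = set(student_answer.lower().split())
    overlap = len(ref_words & ans_words) / max(len(ref_words), 1)
    return round(overlap * 10)

def evaluate(question: str, reference: str, audio_path: str) -> int:
    """End-to-end pipeline: transcribe the recording, then score it out of 10."""
    transcript = transcribe_audio(audio_path)
    return grade_answer(question, reference, transcript)

score = evaluate(
    question="What is binary search?",
    reference="binary search halves the search interval at every step",
    audio_path="answer.wav",  # hypothetical path; never opened by the stub
)
print(score)  # prints 10, since the canned transcript covers every reference word
```

The point of the sketch is the separation of concerns: the transcription and grading stages can be swapped independently, which is how the comparisons between models described above were carried out.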
Comparison: RNN/transformer-based models (e.g., Whisper) vs. CTC-based models

- Architecture: a recurrent neural network (RNN) or transformer-based model, versus connectionist temporal classification (CTC).
- Handling background noise: excels at handling background noise, including ambient room noise, outside noise, or music playing; CTC-based models may face challenges with background noise, potentially impacting transcription accuracy.
- Music performance: performs well even when the speaker is performing music (singing, rapping, spoken-word poetry); CTC-based models may struggle with accurate transcription during musical performances.
- Error reduction: reports 20% fewer additions of missing words and 45% fewer corrections per transcription; CTC-based models may have higher rates of additions and corrections in transcriptions.
- Accented English and rapid speech: demonstrates high accuracy with English speakers having accents and with rapid speech; CTC-based models may experience challenges with accented English and rapid speech, potentially leading to lower accuracy.
- Auto-translation: offers auto-translation to English text as an additional feature; CTC-based models do not have a similar feature.
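As a concrete illustration of the seq2seq side of this comparison, transcription with the open-source `whisper` package can be wrapped in a small helper. This is a hedged sketch: the helper name and defaults are our own, and it assumes the `openai-whisper` package is installed; `whisper.load_model` and `model.transcribe` are its documented entry points.

```python
def transcribe_answer(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a recorded answer to English text with OpenAI's Whisper.

    The import is deferred so this module can be loaded without the heavy
    dependency; loading the model downloads its weights on first use.
    """
    import whisper  # provided by the `openai-whisper` package

    model = whisper.load_model(model_name)
    # fp16=False keeps the sketch CPU-friendly; language="en" pins decoding
    # to English (Whisper can also auto-detect and translate other languages).
    result = model.transcribe(audio_path, language="en", fp16=False)
    return result["text"].strip()
```

Larger checkpoints ("small", "medium", "large") trade speed for the noise and accent robustness described above.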
Now comes the second part: evaluating the audio based on the question and assigning it a score. To do this we employed a range of LLMs comprising Llama-2, Mistral-7B, Zephyr-7B, and PaLM 2.

Llama-2

Llama 2 [10] has undergone a fine-tuning process tailored for chat-related applications, with training on a substantial dataset comprising over 1 million human annotations, enhancing its adaptability to various chat scenarios. Notably, Llama 2 exhibits the flexibility to undergo further fine-tuning using newer data inputs. When users input a text prompt or provide text to Llama 2 through alternative means, the model endeavors to predict the most plausible subsequent text. This predictive capability is achieved through a neural network characterized by a cascading algorithm housing billions of variables, commonly referred to as parameters. This intricate neural network architecture is designed to emulate aspects of the human brain, enabling Llama 2 to generate contextually relevant and coherent text outputs in response to user inputs.

Zephyr-7B

Zephyr-7B [11] is built on a stack of libraries including Transformers 4.35.0.dev0, PyTorch 2.0.1+cu118, Datasets 2.12.0, and Tokenizers 0.14.0, and has a parameter count of 7 billion. Trained extensively on diverse languages, it excels in tasks such as translation, summarization, analysis, and answering questions. The training process involved a blend of public and synthetic datasets, employing direct preference optimization. Fine-tuning further enhances its capabilities, ensuring accurate information retrieval tailored to specific queries. The model's training corpus encompasses a wide range of sources, including websites, articles, books, and more.
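Whichever of these LLMs is used, the grading step reduces to building a prompt from the question, the reference answer, and the transcript, and then parsing a numeric score out of the model's free-text reply. The sketch below keeps it model-agnostic by assuming a generic `llm` callable (text in, text out); the prompt wording, the `Score: N/10` format, and the regex are illustrative assumptions, not the paper's exact prompt.

```python
import re

# Illustrative grading prompt; the real system's prompt may differ.
PROMPT_TEMPLATE = (
    "You are an examiner. Question: {question}\n"
    "Reference answer: {reference}\n"
    "Student answer: {answer}\n"
    "Reply with only a grade out of 10, formatted as 'Score: N/10'."
)

def grade_with_llm(llm, question: str, reference: str, answer: str) -> int:
    """Prompt an LLM and parse a 0-10 score from its free-text reply."""
    reply = llm(PROMPT_TEMPLATE.format(
        question=question, reference=reference, answer=answer))
    # Accept integers or decimals written as "N/10" anywhere in the reply.
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", reply)
    if match is None:
        raise ValueError(f"could not find a score in: {reply!r}")
    # Clamp to the valid range in case the model over- or under-shoots.
    return max(0, min(10, round(float(match.group(1)))))

# Usage with a fake model that always answers the same way:
fake_llm = lambda prompt: "Score: 8/10, the answer covers the key idea."
print(grade_with_llm(fake_llm, "What is DNS?", "DNS maps names to IPs",
                     "it translates domain names into addresses"))  # prints 8
```

Parsing and clamping matter in practice: chat-tuned models often wrap the number in commentary, and a malformed reply should fail loudly rather than silently award a wrong grade.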
First, we land on the welcome page [Fig.5] of the application, which provides three boxes, each containing a description of a subject for the exam; apart from the description, each box has a link that redirects to a separate evaluation page [Fig.6]. On this page, a block shows the question, and below it is the "start recording" button, which upon pressing starts recording the answer that the student speaks in English. On pressing the "stop recording" button, the recording is completed, the audio is evaluated in the backend, and the answer's score out of 10 is returned and displayed.

IV. CONCLUSION

Our research shows a major step toward revolutionizing answer-evaluation methodologies in the quickly changing field of educational technology. Our work tackles the inherent challenges of traditional assessment approaches by integrating sophisticated deep-learning architectures with advanced text-extraction techniques. Our project aims to improve the accuracy and flexibility of answer evaluation and to make answer grading more convenient and scalable. It is also expected to be a significant development in the field of education. We present a paradigm shift in automating the evaluation process by navigating the complexities of student responses with the seamless integration of AI technologies, particularly deep-learning models.

REFERENCES

[1]. Stefan Studer, Thanh Binh Bui, Christian Drescher, Alexander Hanuschkin, Ludwig Winkler, Steven Peters and Klaus-Robert Muller, Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology, 2021, Volume 3, Issue 2. DOI: https://doi.org/10.3390/make3020020
[2]. Rafael Dantas Lero, Chris Exton, Andrew Le Gear, Communications Using a Speech-to-Text-to-Speech Pipeline, International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2019. DOI: https://doi.org/10.1109/WiMOB.2019.8923157
[3]. Pooja Panapana, Eswara Rao Pothala, Sai Sri Lakshman Nagireddy, Hemendra Praneeth Mattaparthi & Niranjani Meesala, Towards Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research, 2023, Volume 716. https://link.springer.com/chapter/10.1007/978-3-031-35501-1_30
[4]. Lishan Zhang, Yuwei Huang, Xi Yang, Shengquan Yu & Fuzhen Zhuang, An Automatic Short-Answer Grading Model for Semi-Open-Ended Questions, 2019. DOI: https://doi.org/10.1080/10494820.2019.1648300
[5]. Shuyu Li, Yunsick Sung, Transformer-Based Seq2Seq Model for Chord Progression Generation, 2023. DOI: https://doi.org/10.3390/math11051111
[6]. Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino, fairseq S2T: Fast Speech-to-Text Modeling with fairseq, 2020. DOI: https://doi.org/10.48550/arXiv.2010.05171
[7]. Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur, Librispeech: An ASR Corpus Based on Public Domain Audio Books, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. DOI: https://doi.org/10.1109/ICASSP.2015.7178964
[8]. Xuedong Huang, A. Acero, F. Alleva, Mei-Yuh Hwang, Li Jiang & M. Mahajan, Whisper: Microsoft Windows Highly Intelligent Speech Recognizer, 1995. DOI: https://doi.org/10.1109/ICASSP.1995.479281
[9]. Awni Hannun, Carl Case, Jared Casper & Bryan Catanzaro, Deep Speech: Scaling up end-to-