Unit_2_NMU
Skills Takeaway From This Project:
Noise Reduction Techniques, Feature Extraction, Machine Learning and Deep
Learning Skills, Data Preprocessing and Analysis Skills, Signal Processing
Problem Statement:
Speech recognition systems are critical for applications like virtual assistants,
transcription services, and voice-controlled devices. However, raw audio signals
often contain background noise, making accurate speech recognition
challenging. Additionally, extracting meaningful features from audio signals and
building robust acoustic models require advanced signal processing and
machine learning techniques.
Data Analysis
Visualization
Advanced Analytics
● Train a Hidden Markov Model (HMM) for acoustic modeling using the
extracted features.
● Implement a simple deep learning model (e.g., CNN or RNN) for
comparison.
● Evaluate the performance of both models using metrics like Word
Error Rate (WER) and accuracy.
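Since both models are to be compared on Word Error Rate, it may help to see how the metric is computed. The following is a minimal sketch, assuming whitespace tokenization and no text normalization (lowercasing, punctuation stripping); the function name `word_error_rate` is illustrative and not part of the brief.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it should not be read as a percentage of wrong words.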
Power BI Integration
Visualization
● Waveform Plots: Raw vs. noise-reduced audio signals.
● Spectrograms: Time-frequency representation of audio.
● Feature Plots: MFCCs, pitch, and energy distributions.
● Accuracy Metrics: Bar charts comparing HMM and deep learning model
performance.
● Power BI Dashboard: Interactive visualizations for business stakeholders.
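Of the features listed for plotting, energy is the simplest to compute directly. The sketch below frames a signal and returns the mean squared amplitude per frame. It assumes a 16 kHz sample rate with 25 ms frames and a 10 ms hop, which are common conventions rather than values taken from this brief; the function names are illustrative.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop_len)]


def short_time_energy(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Return the mean squared amplitude of each frame."""
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    hop_len = sample_rate * hop_ms // 1000      # 160 samples at 16 kHz
    frames = frame_signal(samples, frame_len, hop_len)
    return [sum(x * x for x in frame) / frame_len for frame in frames]


if __name__ == "__main__":
    # Synthetic one-second 440 Hz tone, used only as stand-in input.
    rate = 16000
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    energies = short_time_energy(tone, sample_rate=rate)
    print(len(energies), "frames")
```

The resulting list has one value per frame, so it can be plotted against frame index (or time, using the hop length) to produce the energy distribution plot.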
Results
Project Evaluation
Data Set:
Data Set Link: Data
Data Set Explanation:
● A large-scale corpus of read English speech derived from audiobooks.
● Audio is sampled at 16 kHz, a standard rate for wideband speech recognition.
● It is split into clean and noisy subsets for varied conditions.
● Subsets include 100-hour, 360-hour, and 500-hour splits for scalability.
● Transcriptions are manually curated and aligned with audio clips.
● Metadata includes speaker IDs and chapter information for additional
tasks.
● Preprocessed train-test splits facilitate easy benchmarking of ASR
models.
● Supports research in speaker verification, language modeling, and
synthesis.
● Usage: Ideal for training and evaluating acoustic models.
Project Deliverables:
● Source Code
● A trained speech-to-text transcription model.
● A fully functional speech recognition pipeline that runs end to end
without manual steps.
● A report covering EDA findings, methodology, model performance,
evaluation metrics, results, and recommendations.
● A Power BI dashboard showcasing performance metrics for business
stakeholders.
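One way the pipeline could hand results to the Power BI dashboard is through a flat CSV file, which Power BI can import directly. The sketch below is an assumption about how that hand-off might look: the long `model, metric, value` layout, the function name, and the file name are all hypothetical, and the numbers in the usage example are placeholders, not results.

```python
import csv
import io


def metrics_to_csv_text(results):
    """Flatten {model: {metric: value}} into CSV text, one row per model/metric pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["model", "metric", "value"])
    for model, metrics in results.items():
        for metric, value in metrics.items():
            writer.writerow([model, metric, value])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with measured metrics.
    placeholder_results = {
        "HMM": {"WER": 0.0, "accuracy": 0.0},
        "CNN": {"WER": 0.0, "accuracy": 0.0},
    }
    # "model_metrics.csv" is a hypothetical output path.
    with open("model_metrics.csv", "w", encoding="utf-8", newline="") as f:
        f.write(metrics_to_csv_text(placeholder_results))
```

A long layout like this keeps the file stable when new models or metrics are added, since each addition is a new row rather than a new column.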
Timeline:
The project must be completed and submitted within 10 days from the assigned
date.