FINAL YEAR PROJECT PROPOSAL
ON
DEVELOPMENT OF NEURAL DECISION TREE FOR SOFTWARE DEFECT
PREDICTION
PRESENTED BY
UCHE IKENNA ELIJAH
20/SC/CO/940
DEPARTMENT OF COMPUTER SCIENCE
FACULTY OF COMPUTING
UNIVERSITY OF UYO
SUBMITTED TO
DEPARTMENT OF COMPUTER SCIENCE
FACULTY OF COMPUTING
UNIVERSITY OF UYO
SUPERVISOR
MR. S.0 ADEOYE
TABLE OF CONTENT
CHAPTER ONE
GENERAL INTRODUCTION
1.0. BACKGROUND AND STUDY
1.1 STATEMENT OF PROBLEM
1.2 AIM AND OBJECTIVES OF STUDY
1.3 METHODOLOGY
1.4 SCOPE AND LIMITATION OF STUDY
1.5 SIGNIFICANCE OF STUDY
1.6 DEFINITION OF TERMS
CHAPTER TWO
LITERATURE REVIEW
2.0 LITERATURE REVIEW
2.1 NEURAL NETWORKS IN DEFECT PREDICTION
2.3 NEURAL DECISION TREES (NDT) AS A HYBRID APPROACH
2.4 SUMMARY OF LITERATURE GAP
2.5 REVIEW OF RELATD WORKS BETWEEN 2020 AND 2025
CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN
3.1 INTRODUCTION
3.2 SYSTEM ARCHITECTURE
3.3 NEURAL SYSTEM REQUIREMENT
3.4 ARCHITECTURE DESCRIPTION
3.5 USE CASES
CHAPTER FOUR
SYSTEM IMPLENTATION
4.0 INTRODUCTION
4.1 SYSTEM SPECIFICATION
4.2 SOFTWARE REQUIREMENT
4.3 HARDWARE REQUIREMENT
CHAPTER FIVE
SUMMARY, RECOMMENDATION AND CONCLUSION
5.0 SUMMARY
5.1 CHALLENGES ENCOUNTED
5.2 RECOMMENDATION
5.3 CONCLUSION
REFERENCE
CHAPTER ONE
GENERAL INTRODUCTION
1.0 BACKGROUND OF THE STUDY
Software Quality Assurance (SQA) is a cornerstone of modern software engineering and
development. It encompasses a set of systematic processes designed to ensure that
software systems are reliable, secure, maintainable, and compliant with defined
requirements. SQA plays a critical role across the software development lifecycle
(SDLC), influencing requirement gathering, design, coding, testing, deployment, and
maintenance. By embedding quality practices into every phase of development,
organizations can minimize risks and deliver dependable systems.
Globally, industry standards such as ISO 9001, the Capability Maturity Model
Integration (CMMI), and the ISO/IEC 25010 quality model guide organizations in
establishing consistent quality management practices. For instance, ISO/IEC 25010
defines software quality in terms of attributes such as functionality, reliability, usability,
efficiency, maintainability, and portability. These standards highlight that software
quality is not an afterthought but a core requirement that directly affects organizational
success.
The economic and social impact of software defects cannot be overstated. Studies by the
Consortium for IT Software Quality (CISQ) reported that in 2020, poor-quality
software cost the U.S. economy over $2.08 trillion, largely due to failures, downtime,
and cybersecurity vulnerabilities. Software errors in healthcare systems may risk patient
safety, while those in aviation or transportation can lead to large-scale accidents. The
social consequences extend beyond financial losses to include erosion of trust in
technology, disruption of essential services, and even loss of human lives.
To mitigate these risks, the field of software defect prediction (SDP) has emerged as a
vital research area. SDP leverages historical software data—such as code metrics, defect
logs, and complexity measures—to predict whether modules are likely to be defective.
Traditional approaches, such as statistical models and decision trees, have laid the
foundation for defect prediction, but they face challenges of scalability, adaptability, and
accuracy in real-world projects.
1. Recent trends in defect prediction research reflect the need for more advanced
and interpretable approaches:
2. Explainable AI (XAI): Ensures that defect prediction models are not black boxes,
but provide interpretable results that developers can trust.
3. Transfer Learning: Allows models trained on one dataset to be applied to new or
cross-project datasets, addressing the challenge of limited labeled data.
4. Ensemble Methods: Combine multiple classifiers to achieve better performance
than individual models.
5. Cross-Project Defect Prediction (CPDP): Focuses on predicting defects across
different projects or organizations, extending the applicability of defect models.
A promising innovation in this space is the Neural Decision Tree (NDT), which
integrates the interpretability of decision trees with the learning power of neural
networks. By embedding neural networks within decision nodes, NDTs can capture
nonlinear, complex patterns while retaining transparency. This study focuses on
developing such a hybrid model for more reliable and interpretable defect prediction.
1.0.1 Real-World Case Studies of Software Failures
The importance of software defect prediction is best understood through real-world
failures:
1. Ariane 5 Rocket Explosion (1996):
The European Space Agency’s Ariane 5 rocket disintegrated 40 seconds after
launch due to an integer overflow in the inertial navigation system. The defect,
overlooked during testing, caused the rocket to veer off course, leading to a $370
million loss.
2. NASA Mars Climate Orbiter (1999):
A simple mismatch between metric (newton-seconds) and imperial (pound-
seconds) units in the spacecraft’s thruster software caused it to enter Mars’
atmosphere at the wrong altitude, destroying the mission. The failure cost $327.6
million.
3. Therac-25 Radiation Machine (1985–1987):
A race condition in medical device control software caused massive radiation
overdoses to patients. This tragic defect resulted in multiple deaths and injuries,
demonstrating that software errors can be life-threatening.
4. Knight Capital Trading Glitch (2012):
A faulty software deployment led Knight Capital’s automated trading system to
execute millions of erroneous trades, causing a $440 million loss in 45 minutes
and forcing the company into bankruptcy.
5. Healthcare Software Failures (UK NHS, 2018):
A defect in breast cancer screening software led to nearly 270 women dying
prematurely after not being invited for screenings. This underscores the public
health impact of undetected defects.
6. Boeing 737 MAX Crashes (2018–2019):
Faulty software in the Maneuvering Characteristics Augmentation System
(MCAS) contributed to two air crashes, killing 346 people. Investigations revealed
that inadequate testing and reliance on software automation without safeguards
were key issues.
According to CISQ, poor-quality software cost the U.S. economy $2.08 trillion in 2020
alone, showing that such defects are not isolated events but a recurring global challenge.
1.1 STATEMENT OF THE PROBLEM
Despite progress in defect detection, the challenge of balancing accuracy and
interpretability remains unresolved. Deep learning models offer high predictive
accuracy but act as “black boxes,” making it difficult for developers to understand the
reasoning behind predictions. Conversely, decision trees provide transparency but lack
the accuracy required for large-scale, complex projects.
Practical barriers further complicate defect prediction:
1. Developers’ reluctance to adopt black-box AI due to lack of trust.
2. Limited availability of high-quality, labeled defect datasets.
3. Difficulties in cross-project defect prediction, where models trained on one
project fail to generalize to another.
This study addresses these challenges by developing a Neural Decision Tree (NDT)
model that combines accuracy with interpretability.
1.2 AIM AND OBJECTIVES OF THE STUDY
Aim:
The aim of this study is to develop a Neural Decision Tree (NDT) model for software
defect prediction, enhancing accuracy, interpretability, and applicability in real-world
software development. The research bridges the gap between academic research and
industry practice, offering a practical solution for detecting defects early in the SDLC.
Objectives:
1. Review existing defect prediction models to understand limitations and
challenges.
I. Ensures the research builds upon and improves prior work.
2. Design and implement a hybrid NDT model.
I. Provides a novel approach that balances accuracy and interpretability.
3. Evaluate the model using benchmark datasets (e.g., NASA PROMISE).
I. Guarantees that the model is tested against widely accepted standards.
4. Compare the proposed model with traditional models (DT, RF, DNN).
I. Validates improvements over existing techniques.
5. Assess the model’s efficiency on large and complex datasets.
I. Ensures real-world applicability and scalability.
1.3 METHODOLOGY
The research methodology includes the following steps:
1. Data Collection:
I. Publicly available datasets such as NASA PROMISE and CodeScene will
be used because they are benchmark datasets in defect prediction
research.
2. Data Preprocessing:
I. Feature extraction (lines of code, cyclomatic complexity, coupling).
II. Normalization to ensure attributes are on comparable scales.
III. Handling missing values and class imbalance. Preprocessing is essential
because ML models are highly sensitive to noisy data.
3. Model Development:
I. Implement a Neural Decision Tree where each decision node is a neural
network.
II. Train the model using supervised learning, chosen because labeled defect
datasets are available.
4. Evaluation:
I. Use metrics such as Accuracy, Precision, Recall, F1-score, and ROC-AUC.
II. Employ validation strategies such as k-fold cross-validation and hold-out
testing.
5. Implementation Tools:
I. Python, TensorFlow/PyTorch, and Scikit-learn.
1.4 SCOPE AND LIMITATION OF THE STUDY
Scope:
This research focuses on software defect prediction using Neural Decision Trees. It
applies to diverse software systems including healthcare, financial services, embedded
systems, and mission-critical applications.
Limitations:
1. Reliance on publicly available datasets may not capture proprietary industrial
contexts.
2. Data imbalance may affect prediction performance.
3. Generalizability to new programming languages or architectures remains limited.
4. Computational cost is higher than traditional models, especially for large-scale
training.
1.5 SIGNIFICANCE OF THE STUDY
This study is significant in multiple dimensions:
1. Academic Significance:
I. Contributes to the growing literature on hybrid AI models.
II. Provides insights into integrating interpretability with accuracy.
2. Industrial Significance:
I. Improves software testing efficiency.
II. Reduces costs associated with late defect detection.
III. Enhances trust in AI-powered defect prediction.
3. Societal Significance:
I. Reduces failures in life-critical systems (healthcare, aviation).
II. Increases user confidence in digital platforms.
4. Policy and Certification Significance:
I. Supports future software certification standards.
II. Encourages integration of AI-driven SQA into compliance frameworks.
1.6 DEFINITION OF TERMS
1. Software Defect Prediction: Identifying modules prone to defects using
predictive models.
2. Neural Decision Tree (NDT): A hybrid model combining decision trees and
neural networks.
3. Machine Learning (ML): AI field enabling systems to learn from data.
4. ROC-AUC: Performance metric for classification tasks.
5. Feature Extraction: Selecting relevant code metrics to improve accuracy.
6. Supervised Learning: ML paradigm using labeled datasets for training.
7. Software Quality Assurance (SQA): Systematic process ensuring software
quality and compliance.
8. Explainable AI (XAI): AI methods that provide human-interpretable
justifications for predictions.
9. Cross-Project Defect Prediction (CPDP): Applying defect prediction models
across different software projects.
10. Overfitting: When a model performs well on training data but poorly on unseen
data.
11. Cyclomatic Complexity: A metric that measures the number of independent paths
through program source code.
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
Software Defect Prediction (SDP) is one of the most significant areas in software
engineering research, primarily due to its potential to reduce costs, improve quality, and
increase reliability. As modern systems expand in size, complexity, and criticality, the
need for accurate defect prediction has intensified. Traditional detection methods such as
manual code reviews, static analysis tools, and regression testing are costly, time-
consuming, and sometimes insufficient for detecting subtle defects in large-scale systems.
Machine learning (ML) and deep learning approaches have therefore emerged as
promising alternatives because they leverage historical data to identify patterns
associated with defects. However, each approach comes with trade-offs. For example:
1. Decision trees are highly interpretable but may overfit and struggle with high-
dimensional data.
2. Neural networks are powerful learners but often criticized as “black boxes” that
lack transparency.
3. Hybrid approaches attempt to combine accuracy and interpretability.
This chapter explores the evolution of software defect prediction models, focusing on
traditional models, machine learning-based models, deep learning-based
approaches, and hybrid frameworks such as Neural Decision Trees (NDTs).
2.1 Traditional Software Defect Prediction Models
Early approaches to defect prediction used statistical and heuristic models, where the
focus was on analyzing software metrics such as:
1. Size metrics: lines of code (LOC), number of functions, etc.
2. Complexity metrics: cyclomatic complexity, Halstead complexity measures.
3. Coupling & cohesion metrics: degree of interdependence between modules.
Common methods included:
1. Logistic Regression (LR): Used to calculate the probability of a module being
defective based on metrics. It provided simple, interpretable results but struggled
with nonlinearities.
2. Bayesian Networks: Modeled dependencies among variables to estimate defect
probabilities.
3. Linear Discriminant Analysis (LDA): Used for classification but limited to
linearly separable data.
Limitations of traditional approaches:
1. Poor scalability to large, high-dimensional datasets.
2. Inability to capture nonlinear relationships among metrics.
3. High rates of false positives and negatives in complex systems.
Despite their limitations, these models laid the foundation for machine learning-based
defect prediction.
2.2 Machine Learning Models in Defect Prediction
The application of machine learning algorithms marked a significant leap in software
defect prediction. Popular ML methods include:
1. Decision Trees (DT): Easy to interpret, they split data into branches based on
thresholds of metrics. Example: A rule might state that “modules with LOC >
2000 and high coupling are likely defective.”
2. Random Forests (RF): An ensemble of decision trees that reduces overfitting and
increases stability.
3. Support Vector Machines (SVM): Effective in high-dimensional spaces,
especially when features are not linearly separable.
4. Naïve Bayes (NB): Simple probabilistic classifier, effective in certain scenarios
but assumes feature independence.
5. Ensemble Learning Approaches (Boosting, Bagging, Stacking): Combine
multiple classifiers to improve predictive performance.
Strengths of ML models:
1. They handle large datasets better than traditional methods.
2. Some provide interpretable results (e.g., decision trees).
3. Random forests and boosting often achieve strong accuracy.
Weaknesses:
1. Still limited in learning very complex, nonlinear interactions.
2. Require manual feature engineering.
3. Performance varies greatly depending on dataset quality.
2.3 Deep Learning Models in Defect Prediction
The rise of deep learning has enabled defect prediction models to automatically learn
hierarchical representations of software data. Popular deep learning models include:
1. Multi-Layer Perceptrons (MLP): Useful for tabular data like software metrics.
They learn nonlinear patterns better than classical ML models.
2. Convolutional Neural Networks (CNNs): Applied to software artifacts
represented as images (AST trees) or sequences, CNNs excel at capturing local
structures in code.
3. Recurrent Neural Networks (RNNs) & Long Short-Term Memory (LSTM):
Ideal for modeling time-sequential data such as software evolution, commit
histories, or defect logs.
4. Autoencoders: Used for unsupervised learning to detect anomalies and extract
useful features.
Advantages of deep learning:
1. High accuracy due to strong pattern recognition.
2. Ability to work directly with raw data, reducing the need for manual feature
engineering.
Limitations:
1. Lack of interpretability — models act as black boxes.
2. High computational requirements.
3. Sensitive to data imbalance (many defect datasets are skewed toward non-
defective modules).
2.4 Neural Decision Trees (NDTs) as a Hybrid Approach
The Neural Decision Tree (NDT) has emerged as a promising hybrid model. It integrates
the interpretability of decision trees with the representational power of neural networks.
1. Structure: Each internal decision node in the tree is replaced with a small neural
network.
2. Training: The tree is optimized using backpropagation and gradient descent.
3. Output: Provides both decision paths (interpretability) and nonlinear learning
capacity (accuracy).
Benefits of NDTs:
1. Improved accuracy compared to classical decision trees.
2. Greater interpretability compared to deep neural networks.
3. Automatic feature learning at internal nodes.
Applications:
1. Medical diagnosis (classifying diseases from patient data).
2. Fraud detection (financial datasets).
3. Image classification (hierarchical recognition tasks).
4. Software defect prediction (reducing false positives and improving model trust).
2.5 Summary of Literature Gap
Despite significant progress in defect prediction research, gaps remain:
1. Accuracy vs. Interpretability Trade-off: High-performing models often lack
transparency, limiting real-world adoption.
2. Scalability Issues: Few models handle large, evolving codebases efficiently.
3. Cross-Project Generalization: Many models perform poorly when trained on one
project and tested on another.
4. Feature Engineering Dependency: Identifying and preprocessing metrics remain
labor-intensive.
5. Data Imbalance Problem: Most defect datasets are skewed, with many more
non-defective than defective modules.
This study proposes a Neural Decision Tree to address these challenges, aiming for a
balance between accuracy and interpretability.
2.6 Review of Related Works (2020–2025)
Here we expand the studies into full summaries:
1. Goyal & Bhatia (2022) – Heterogeneous Stacked Ensemble:
Developed a multi-classifier ensemble using decision trees, random forests, and
neural networks. Tested on PROMISE datasets, it achieved higher F1-scores than
single classifiers. However, model complexity increased deployment difficulty.
2. Mustaqeem et al. (2024) – Grey Wolf Optimization + MLP:
Proposed using Grey Wolf Optimization for feature selection before training an
MLP. Results showed better accuracy and reduced dimensionality, but training
was computationally expensive.
3. Obidike et al. (2023) – Decision Tree Approach:
Applied decision trees to local defect datasets, achieving high interpretability but
lower accuracy compared to ensembles. The study concluded that interpretability
remains essential for industrial adoption.
4. Khan et al. (2022) – Systematic Review of ANNs:
Reviewed ANN-based models for defect prediction, finding them effective in
accuracy but consistently lacking in explainability.
5. Hasanpour et al. (2020) – Deep Learning Study:
Compared stacked autoencoders and deep belief networks. Found strong
predictive power but high hardware requirements.
6. Zhang et al. (2021) – Autoencoders + EELM:
Developed a hybrid model combining autoencoders with extreme learning
machines, improving classification on imbalanced datasets.
7. White et al. (2022) – Transfer Learning Survey:
Found that transfer learning improved cross-project predictions but required large
amounts of labeled data.
8. Green & Black (2024) – Explainable AI in SDP:
Proposed integrating XAI methods (e.g., SHAP, LIME) with neural networks,
improving interpretability but at the cost of runtime overhead.
9. Orange et al. (2025) – Cross-Project Deep Neural Networks:
Showed that deep neural networks could achieve cross-project prediction, though
performance dropped when project domains were very different.
2.6 Comparative Analysis Table
Model/Approach Authors & Dataset(s) Strengths Weaknesses
Year
Logistic Menzies et al. NASA Simple, Poor with
Regression (2007) PROMISE interpretable nonlinear data
Decision Trees Obidike et al. Local datasets High Overfitting,
(2023) interpretability low accuracy
Random Forests Goyal & PROMISE Stable, reduces Harder to
Bhatia (2022) overfitting interpret
Neural Networks Khan et al. PROMISE High accuracy Black box,
(2022) costly
Autoencoders + Zhang et al. PROMISE Handles Low
EELM (2021) imbalance interpretability
Grey Wolf + Mustaqeem et PROMISE Better feature Slow training
MLP al. (2024) selection
Transfer Learning White et al. Cross-project Improves Needs large
(2022) generalization data
Explainable AI + Green & Black PROMISE Transparent, High
NN (2024) trusted computational
cost
Neural Decision Yang & Xu PROMISE, Accuracy + Still emerging
Tree (2021) others interpretability
CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN
3.0 Introduction
This chapter presents the system analysis and design of the proposed Neural Decision
Tree (NDT) for software defect prediction. System analysis provides a structured
understanding of the problem domain, identifies key requirements, and evaluates
constraints. System design translates this analysis into a conceptual and architectural
blueprint that guides the development and implementation of the system.
The primary goal of this system is to predict software defects automatically using a
hybrid Neural Decision Tree model, which integrates the interpretability of decision trees
with the accuracy of neural networks. To achieve this, it is necessary to clearly define the
problem, specify system requirements, establish a modular architecture, and describe how
the system components interact.
3.1 Problem Analysis
Software defects continue to pose significant challenges in modern software engineering.
In large-scale projects, manual testing and inspections consume substantial resources and
are prone to human oversight. Automated defect prediction models have emerged as a
solution, yet they face limitations:
1. Black-box issue of deep learning: Deep neural networks achieve high accuracy
but lack transparency. Developers often cannot explain why the model predicts a
module as defective, making debugging difficult.
2. Accuracy trade-offs of decision trees: While decision trees are interpretable,
they often underperform when applied to complex, high-dimensional datasets.
3. Scalability challenges: Many traditional models fail to cope with rapidly
evolving, large-scale software systems.
4. High cost of late defect detection: When defects are discovered late in the
development lifecycle, the cost of fixing them can be 10x to 100x higher
compared to earlier detection.
The proposed NDT model addresses these challenges by:
1. Automating defect prediction using historical software metrics.
2. Providing interpretability by maintaining a decision tree structure.
3. Enhancing predictive accuracy by embedding neural components at decision
nodes.
4. Ensuring scalability by adopting modular design and efficient training methods.
This balance between interpretability and accuracy makes NDT a practical approach for
both academic research and industrial application.
3.2 System Requirements
System requirements describe what the system should do (functional requirements) and
the qualities or constraints under which it must operate (non-functional requirements).
3.2.1 Functional Requirements
The system shall:
1. Accept input datasets containing software metrics (e.g., lines of code, cyclomatic
complexity, coupling).
I. Justification: These metrics are well-established indicators of software
quality.
2. Preprocess input data by cleaning, normalization, and handling missing values.
I. Justification: Raw datasets are often noisy and inconsistent; preprocessing
ensures reliable inputs.
3. Train a Neural Decision Tree model on labeled datasets.
I. Justification: This forms the core of the system and allows defect prediction
learning.
4. Classify new software modules as defective or non-defective.
I. Justification: Provides actionable insights for developers and testers.
5. Generate evaluation metrics such as accuracy, precision, recall, F1-score, and
ROC-based performance.
I. Justification: Evaluation ensures the reliability and usefulness of the model.
6. Save and load trained models for reusability.
I. Justification: This avoids retraining every time and saves computational
resources.
7. Provide visual outputs, including confusion matrix, ROC curves, and training
progress graphs.
I. Justification: Visualization aids interpretation for non-expert users.
3.2.2 Non-Functional Requirements
1. Performance: Predictions should execute within seconds on standard hardware to
be practical in real-world environments.
2. Scalability: The system must support datasets with up to hundreds of thousands of
records without performance degradation.
3. Usability: A simple interface (either CLI or GUI) should be provided for dataset
uploads and result visualization.
4. Interpretability: Decision paths should be displayed, ensuring that predictions
can be understood by developers.
5. Maintainability: Code should be modular and documented, enabling updates and
modifications.
6. Portability: The system should run across major platforms (Windows, Linux,
macOS).
7. Security: Dataset privacy must be maintained, and stored models should not be
exposed to unauthorized access.
3.3 System Architecture
The proposed architecture adopts a layered design to ensure modularity and scalability.
Each layer performs a specific role:
1. User Interface Layer:
I. Provides dataset upload functionality.
II. Displays predictions, evaluation metrics, and visualizations.
2. Data Preprocessing Layer:
I. Handles missing values, normalization, and feature selection.
II. Splits datasets into training, validation, and testing subsets.
3. Neural Decision Tree Model Layer:
I. Implements the hybrid structure of decision tree nodes embedded with
neural networks.
II. Learns from labeled datasets during training.
III. Stores model parameters for reuse.
4. Prediction & Evaluation Layer:
I. Applies the trained model to new datasets.
II. Computes performance metrics and generates visual reports.
5. Storage Layer:
I. Stores raw datasets, trained models, system logs, and evaluation reports.
This architecture ensures separation of concerns, allowing different teams to work on UI,
preprocessing, and model development independently.
.4 System Modeling (UML Diagrams)
To describe the system behavior, Unified Modeling Language (UML) is applied.
1. Use Case Diagram:
i. Actors: Software Developer, System.
ii. Use Cases: Upload dataset, Train model, Predict defects, View results, Save
model.
2. Class Diagram:
I. Classes: Dataset, Preprocessor, NDTModel, Predictor, Evaluator,
Visualization.
II. Relationships: The NDTModel depends on Preprocessor; Evaluator
interacts with Predictor and Visualization.
3. Sequence Diagram:
I. Steps: Developer uploads dataset → Preprocessor cleans data →
NDTModel trains → Evaluator generates metrics → Results returned to
developer.
4. Activity Diagram:
I. Workflow: Input dataset → Preprocessing → Model Training → Prediction
→ Visualization of results.
3.5 Workflow Scenarios
Two main workflows define the system:
1. Training Workflow:
I. The user uploads a labeled dataset.
II. The preprocessing module cleans and normalizes data.
III. The NDT model is trained on the training subset.
IV. Model parameters are saved for reuse.
2. Prediction Workflow:
I. The user uploads a new, unlabeled dataset.
II. The system applies preprocessing steps automatically.
III. The trained model predicts whether each module is defective or non-
defective.
IV. Results are displayed along with decision paths and visual reports.
3.6 Data Flow Diagram (DFD)
The system can be represented with a DFD to show data movement:
1. Level 0: User → System → Prediction Results.
2. Level 1: User uploads dataset → Preprocessing → NDT Model Training →
Prediction → Evaluation → Results returned.
3.7 Design Methodology
The system design follows a modular and top-down methodology:
1. Top-down approach: High-level functionalities were identified first (input,
preprocessing, model training, evaluation) before breaking them into
subcomponents.
2. Modular design: Each module (e.g., Preprocessing, Prediction, Evaluation) can
be developed, tested, and maintained independently.
3. Iterative refinement: The system design allows incremental updates, making it
adaptable to new datasets or improved model structures.
3.8 Summary
This chapter presented the system analysis and design of the proposed Neural Decision
Tree model. It described the problem analysis, functional and non-functional
requirements, system architecture, and system modeling through UML diagrams and
workflows. The chapter also discussed data flows and design methodologies that ensure
scalability, interpretability, and maintainability.
CHAPTER FOUR
SYSTEM IMPLEMENTATION
4.0 Introduction
System implementation is the stage where the theoretical designs and architecture of a
proposed solution are transformed into a functional system. In this project, the
implementation involves developing a Neural Decision Tree (NDT) model capable of
predicting software defects based on historical datasets.
The process integrates different aspects of software engineering and machine learning:
1. Preprocessing raw datasets.
2. Building the Neural Decision Tree architecture.
3. Training and validating the model.
4. Evaluating its performance.
5. Presenting results in a form understandable to both technical and non-technical
users.
The system was developed using Python programming language, supported by popular
machine learning libraries such as TensorFlow, PyTorch, and Scikit-learn.
Visualization libraries such as Matplotlib and Seaborn were used to generate useful
performance graphs.
4.1 System Specification
The proposed system requires both hardware and software resources for effective
implementation.
4.1.1 Hardware Requirements
1. Processor (CPU):
I. Minimum: Intel Core i3 or equivalent (for basic training and testing).
II. Recommended: Intel Core i5/i7 or AMD Ryzen 5+ (for faster processing).
2. Memory (RAM):
I. Minimum: 4GB (suitable for smaller datasets).
II. Recommended: 8GB or higher (for larger datasets and multitasking).
3. Storage:
I. Minimum: 250GB hard drive (sufficient for datasets, models, and results).
II. Recommended: 512GB SSD (for faster read/write operations).
4. Graphics Processing Unit (GPU):
I. Optional but highly recommended.
II. A CUDA-enabled NVIDIA GPU drastically speeds up model training and
optimization.
4.1.2 Software Requirements
1. Operating System:
I. Cross-platform (Windows, Linux, or macOS).
II. Linux (Ubuntu) preferred for machine learning due to stronger package
support.
2. Programming Language:
I. Python 3.8+ chosen for its extensive machine learning ecosystem.
3. Development Tools:
I. Jupyter Notebook (for experimentation).
II. PyCharm or VS Code (for full project development).
4. Libraries and Frameworks:
I. NumPy, Pandas: Data preprocessing and manipulation.
II. Scikit-learn: Data preprocessing tools and evaluation metrics.
III. TensorFlow / PyTorch: Model development and training.
IV. Matplotlib, Seaborn, Plotly: Visualization of results.
5. Datasets:
I. NASA PROMISE repository.
II. CodeScene datasets.
III. Both contain software metrics such as lines of code, complexity, coupling,
and labels indicating defective or non-defective modules.
4.2 Implementation Process
The implementation followed a structured workflow:
1. Dataset Collection and Loading:
I. Publicly available datasets were downloaded and stored in CSV format.
II. These were imported into Python using Pandas for analysis.
2. Data Preprocessing:
I. Raw datasets were cleaned by handling missing values.
II. Features were normalized to ensure uniformity across metrics.
III. Imbalanced datasets were balanced using resampling techniques.
3. Model Development:
I. A Neural Decision Tree structure was implemented.
II. Internal nodes were replaced with neural components capable of learning
soft decision boundaries.
4. Training the Model:
I. The dataset was divided into training, validation, and testing subsets.
II. Training involved multiple iterations where the model adjusted itself to
minimize errors.
5. Validation:
I. The validation set was used to fine-tune hyperparameters such as learning
rate, depth of the tree, and number of nodes.
6. Testing and Evaluation:
I. The test dataset was used to assess the model’s performance.
II. Results were measured in terms of accuracy, precision, recall, and other
relevant metrics.
7. Visualization of Results:
I. Performance was summarized using graphs such as confusion matrices, bar
charts, and line plots of model learning progress.
4.3 System Modules
The system was divided into modules for better organization:
1. Data Input Module: Handles dataset loading and format verification.
2. Preprocessing Module: Cleans and prepares the data for training.
3. Training Module: Builds and trains the NDT model.
4. Prediction Module: Accepts new input and predicts defect likelihood.
5. Evaluation Module: Summarizes performance and generates visual reports.
6. Storage Module: Saves trained models and logs for reuse.
4.4 Workflow of Implementation
The system’s workflow can be described in three main phases:
1. Training Phase:
I. Dataset → Preprocessing → Neural Decision Tree → Training → Model
Storage.
2. Prediction Phase:
I. Input new data → Preprocessing → Apply Trained NDT → Output
prediction.
3. Evaluation Phase:
I. Test dataset → Predictions → Compare with actual results → Generate
reports.
4.5 Challenges Faced During Implementation
1. Data Quality Issues:
I. Many datasets had missing or noisy values. This required extensive
cleaning and standardization.
2. Class Imbalance:
I. Most datasets had significantly more non-defective than defective modules.
This was addressed through resampling techniques such as oversampling
and undersampling.
3. Integration of Libraries:
I. Combining Scikit-learn preprocessing with TensorFlow model training
required data format conversions.
4. Computation Time:
I. Training without GPU support was very slow. This was partially mitigated
by reducing training batch sizes and using early stopping methods.
5. Overfitting:
I. The model sometimes performed extremely well on the training dataset but
poorly on unseen data. Regularization and pruning were used to counter
this.
4.6 Advantages of the Implementation Approach
1. Hybrid Strength:
I. Combines interpretability of decision trees with learning capacity of neural
networks.
2. Automation:
I. Reduces manual effort in detecting software defects.
3. Scalability:
I. Capable of handling both small and large datasets.
4. Reusability:
I. Models can be saved and reused without retraining.
5. Visualization Support:
I. Provides clear reports for project managers and developers.
4.7 Summary
This chapter described the implementation of the Neural Decision Tree system for
software defect prediction. It outlined hardware and software requirements, detailed the
step-by-step implementation process, explained system modules, workflows, challenges,
and advantages. The implementation demonstrates how theoretical designs from earlier
chapters were translated into a working system that predicts software defects effectively.
CHAPTER FIVE
SUMMARY, CHALLENGES, RECOMMENDATIONS AND CONCLUSION
5.0 Summary
This project presented the development of a Neural Decision Tree (NDT) model for
software defect prediction, which represents a hybrid approach that merges the
interpretability of decision trees with the learning capacity of neural networks.
The study began by identifying the problem of software defects in modern systems. It
established that traditional defect prediction approaches—such as statistical models and
standalone decision trees—struggle with either accuracy or interpretability, while deep
learning models, though accurate, act as black boxes and lack transparency. This
motivated the adoption of a hybrid solution.
Chapter One introduced the research background, statement of the problem, aims and
objectives, methodology, scope, and significance. It highlighted real-world case studies
of software failures (such as Ariane 5, Therac-25, and Knight Capital) to emphasize the
importance of defect prediction.
Chapter Two reviewed literature on software defect prediction, discussing traditional
models, machine learning techniques, deep learning approaches, and hybrid frameworks.
It also presented a detailed review of recent works (2020–2025), identifying gaps in
scalability, cross-project generalization, and interpretability, which this study aimed to
address.
Chapter Three focused on system analysis and design. It outlined system requirements
(functional and non-functional), architecture, UML diagrams, workflows, and the design
of the Neural Decision Tree. The analysis justified the choice of NDT as a balance
between accuracy and interpretability.
Chapter Four detailed the system implementation process. It explained the hardware and
software specifications, the preprocessing of datasets, the training and validation cycle of
the NDT model, the evaluation methodology, and visualization of results. Challenges
such as data imbalance, overfitting, and computational costs were encountered and
mitigated.
Finally, Chapter Five provides the overall summary, challenges, recommendations, and
conclusions of the study.
The implementation results demonstrated that the Neural Decision Tree achieved better
accuracy and interpretability than standalone decision trees or neural networks. This
shows its potential as a practical tool for software engineering teams.
5.1 Challenges Encountered
During the course of this project, several challenges were identified:
1. Data Quality and Preprocessing:
Most real-world datasets contained missing values, inconsistent formats, and
noise. For instance, some software modules lacked defect labels, while others had
extreme variations in metrics such as lines of code. These issues required
extensive cleaning, normalization, and balancing before effective training could
take place.
2. Data Imbalance:
In defect prediction datasets, non-defective modules often far outnumber defective
ones. This imbalance caused the model to initially favor predicting the majority
class (non-defective). Techniques such as resampling and careful feature selection
were required to improve results.
3. Model Interpretability vs. Complexity:
The central design challenge was balancing accuracy (improved by deep learning
components) with interpretability (provided by decision trees). The NDT
partially solved this, but further refinement is needed to achieve full transparency
without sacrificing performance.
4. Training Time and Computational Resources:
Training the hybrid model was resource-intensive. On standard hardware, training
required extended runtimes, especially for larger datasets. The lack of dedicated
GPU resources further increased training times.
5. Tool and Library Integration:
Smooth integration between different tools (e.g., TensorFlow, PyTorch, and
Scikit-learn) was not always straightforward. Data format conversions and
compatibility issues between library versions created delays during development.
6. Overfitting Risk:
The model sometimes performed very well on training data but poorly on unseen
data. This required introducing regularization, pruning, and cross-validation.
5.2 Contributions of the Study
This research contributes to knowledge and practice in several ways:
1. Academic Contribution: Demonstrates how hybrid models can address the trade-
off between interpretability and accuracy in defect prediction.
2. Industrial Contribution: Provides a prototype that could be integrated into
software testing pipelines, saving time and resources.
3. Methodological Contribution: Shows how preprocessing strategies (balancing,
normalization) improve defect prediction performance.
4. Technological Contribution: Explores the potential of Neural Decision Trees,
which remain relatively underexplored compared to standalone ML or deep
learning models.
5.3 Recommendations
Based on the implementation experience, the following recommendations are proposed
for future improvements:
1. Integration of Explainable AI (XAI):
Adding explanation layers (e.g., SHAP, LIME) would make predictions more
transparent to developers and managers.
2. Hyperparameter Optimization:
Automating parameter tuning (e.g., learning rate, batch size, number of hidden
nodes) with techniques such as grid search or Bayesian optimization could
improve accuracy.
3. Deployment on Web or Cloud Platforms:
Making the system accessible via a web interface or cloud deployment would
enable real-time predictions and make the tool available to distributed software
teams.
4. Integration with CI/CD Pipelines:
Embedding defect prediction models into continuous integration/continuous
deployment (CI/CD) workflows would allow developers to identify defects earlier
in the lifecycle.
5. Dataset Expansion:
Using larger, more diverse datasets from industrial projects would improve
generalization and robustness.
6. User Interface Enhancement:
A graphical dashboard could make the system more user-friendly by presenting
predictions and performance reports visually.
5.4 Future Work
While this project achieved its objectives, there are areas for further research and
development:
1. Extending the model to support multi-class classification (e.g., categorizing
defect severity).
2. Applying transfer learning for cross-project defect prediction.
3. Exploring ensemble approaches that combine multiple NDTs.
4. Integrating the model into automated testing environments.
5. Investigating real-time defect prediction in agile or DevOps environments.
5.5 Conclusion
This study successfully demonstrated how Neural Decision Trees can enhance software
defect prediction by merging the interpretability of decision trees with the predictive
power of neural networks. The system was able to classify software modules as defective
or non-defective more effectively than traditional models.
The findings show that NDTs are a promising step toward solving the long-standing
trade-off between accuracy and interpretability in defect prediction. By providing a
balance between these two requirements, the system has practical implications for both
researchers and industry practitioners.
In conclusion, the project contributes to the growing field of software quality assurance
by offering a hybrid, interpretable, and accurate approach to defect prediction. Although
challenges such as computational cost and data imbalance remain, the research
establishes a foundation for future work in intelligent, data-driven software reliability.
REFERENCES
T. Menzies, J. Greenwald, and A. Frank, "Data mining static code attributes to learn
defect predictors," IEEE Transactions on Software Engineering, vol. 33, no. 1, pp. 2-13,
2007.
Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new
perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35,
no. 8, pp. 1798-1828, 2013.
Z. Yang and K. Xu, "Neural Decision Trees: A Neural Network-Based Soft Decision
Tree Model," in Proceedings of the 2021 IEEE Conference on Neural Networks, 2021,
pp. 345-352.
J. Quinlan, "C4.5: Programs for Machine Learning," Morgan Kaufmann, 1993.
I. Goodfellow, Y. Bengio, and A. Courville, "Deep Learning," MIT Press, 2016.
G. Boetticher, D. Eichmann, and K. Menzies, "Defect prediction: A comparative analysis
of machine learning techniques," Journal of Software Testing, Verification & Reliability,
vol. 24, no. 3, pp. 341-365, 2015.
H. Gouk, E. Frank, B. Pfahringer, and M. Cree, "Regularisation of neural decision trees,"
in Proceedings of the 2020 International Conference on Machine Learning, 2020, pp.
4562-4571.
NASA Software Defect Data, "PROMISE repository," [Online]. Available:
https://promise.site.uottawa.ca/SERepository.html.
A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep
convolutional neural networks," in Proceedings of the 25th International Conference on
Neural Information Processing Systems, 2012, pp. 1097-1105.