Dataset Source Kaggle-1
1. Importing Dependencies: Import the necessary libraries and modules for data
analysis and model implementation.
2. Load Dataset: Load the dataset from a CSV file containing information about
various diseases and their associated symptoms.
3. Checking for null data: Although the dataset's Kaggle score indicates no null or
dirty data, we double-checked it for missing values.
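The loading and null-check steps above can be sketched as follows. This is a minimal illustration, assuming the data is read with pandas; the inline CSV text and its column names are hypothetical stand-ins for the actual Kaggle file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV: a disease label column plus
# binary symptom columns. The real file and column names may differ.
csv_text = """disease,vomiting,headache,nausea
gastritis,1,0,1
ileus,1,0,1
migraine,0,1,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column; all zeros means no nulls were found.
null_counts = df.isnull().sum()
print(null_counts)

# Single boolean summary for the whole frame.
has_nulls = bool(df.isnull().values.any())
print("Any nulls:", has_nulls)
```

With the real dataset, `pd.read_csv` would be pointed at the CSV path instead of the in-memory string.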
Here, we found that the distribution of symptoms is larger for some diseases than
for others. This appears to be intentional: the under-represented diseases have a
comparatively negligible probability of occurrence in the real world, and the dataset
is tuned to mimic real-world occurrence and distribution.
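One way to inspect such an imbalance is to count records per disease. The sketch below assumes a pandas DataFrame with a label column named `disease`; both the column name and the toy labels are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Toy labels for illustration only; the real label column may be named differently.
df = pd.DataFrame(
    {"disease": ["gastritis", "gastritis", "gastritis", "ileus", "ileus", "hypovolemia"]}
)

# Absolute number of records per disease, most frequent first.
counts = df["disease"].value_counts()

# Same counts expressed as a fraction of all records.
shares = df["disease"].value_counts(normalize=True)

print(counts)
print(shares.round(3))
```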
7. Custom Naïve Bayes code and fitting the model:
model2 = CustomGaussianNB()
model2.fit(XX_train.to_numpy(), YY_train.to_numpy())
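The body of CustomGaussianNB is not shown here. As a rough illustration of what a class with this `fit` interface might do, the following is a minimal Gaussian Naïve Bayes sketch in NumPy. The class name `SketchGaussianNB`, the `var_smoothing` parameter, and the toy data are assumptions made for this example and are not the implementation used in the report.

```python
import numpy as np


class SketchGaussianNB:
    """Minimal Gaussian Naive Bayes; an illustrative stand-in only."""

    def __init__(self, var_smoothing=1e-9):
        # Small variance floor (assumed) to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Floor scaled by the largest feature variance; assumes at least
        # one feature is not constant across the training set.
        eps = self.var_smoothing * X.var(axis=0).max()
        # Per-class feature means and variances, shape (n_classes, n_features).
        self.theta_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        # Log prior = log of each class's share of the training rows.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.theta_[None, :, :]
        # Log of the Gaussian density, summed over (assumed independent) features.
        log_pdf = -0.5 * (np.log(2.0 * np.pi * self.var_)[None, :, :]
                          + diff ** 2 / self.var_[None, :, :])
        return self.log_prior_[None, :] + log_pdf.sum(axis=2)

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        # Subtract the row maximum before exponentiating for numerical stability.
        jll = jll - jll.max(axis=1, keepdims=True)
        probs = np.exp(jll)
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Toy binary symptom matrix (rows = records, columns = symptoms).
X = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0],
              [0, 0, 1], [0, 1, 1], [0, 0, 1]])
y = np.array(["ileus"] * 3 + ["migraine"] * 3)

model = SketchGaussianNB().fit(X, y)
query = np.array([[1, 1, 0]])
pred = model.predict(query)
proba = model.predict_proba(query)
print(pred, proba)
```

The per-class mean and variance are all the model stores, which is why fitting only needs one pass over the training rows of each class.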
9. Accuracy:
For the user-provided symptoms (feeling ill, vomiting, headache, nausea, and diarrhea), the custom
Gaussian Naïve Bayes model mapped the input to the most probable diseases. It ranked ileus as the
most likely condition with a probability of 65.98%, followed by hypovolemia at 34.01% and gastritis
at 0.01%, producing a ranked list of candidate conditions for differential diagnosis.
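A ranking of this kind can be produced by sorting the per-class posterior probabilities. The sketch below reuses the three probabilities reported above as a fixed input; the array layout and the variable names are assumptions, and in practice the vector would come from the model's probability output.

```python
import numpy as np

# Class labels and the posterior probabilities reported in the text.
classes = np.array(["gastritis", "hypovolemia", "ileus"])
proba = np.array([0.0001, 0.3401, 0.6598])

# Indices of the top-k classes, highest probability first.
top_k = 3
order = np.argsort(proba)[::-1][:top_k]

ranking = [(str(classes[i]), float(proba[i])) for i in order]
for name, p in ranking:
    print(f"{name}: {p:.2%}")
```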
The classifier achieved an accuracy of 86.65%, which supports the effectiveness of the custom
implementation in predicting diseases from symptom patterns. This matters for healthcare
applications, where quick and reliable predictions are crucial for patient care. The result suggests
that the model can handle noisy symptom data while offering useful input for clinical
decision-making. Further optimization and real-world validation could improve the model's precision
and adaptability across diverse datasets.
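Accuracy here is taken to mean the fraction of test records whose predicted disease matches the true label. A minimal sketch of that calculation, with a hypothetical helper name and toy labels that do not come from the report:

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Toy labels: three of the four predictions are correct.
y_true = ["ileus", "gastritis", "hypovolemia", "ileus"]
y_pred = ["ileus", "gastritis", "ileus", "ileus"]

acc = accuracy(y_true, y_pred)
print(f"Accuracy: {acc:.2%}")
```

Because the class distribution is imbalanced, per-class metrics such as recall may be worth reporting alongside this single number.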