BUSINESS INTELLIGENCE docs
Live Demo
YouTube
TARGET AUDIENCE
1 Scientific Research
2 Health & Medical Laboratories
3 Data Scientists
4 Educational Institutions
5 Statisticians
6 Natural Mathematics Researchers
7 Information System Analysts
8 Software Developers
9 Finance & Accounting Professionals
10 Machine Learning Engineers
11 Market Analysts
12 Economists
13 Business Intelligence Analysts
14 Operations Researchers
15 Environmental Scientists
16 Policy Analysts
17 Social Science Researchers
18 Clinical Data Managers
19 Actuaries
20 Product Managers
INTRODUCTION
My name is Sameer, a data scientist and software developer. I have designed this application as a
foundation for others to solve data-related problems across fields such as scientific research,
finance, and health. Through this application, you can learn how data is collected, cleaned,
analyzed, and interpreted to derive meaningful insights.
PROBLEM STATEMENT
Many organizations and researchers struggle to analyze vast amounts of data efficiently.
Traditional methods can be time-consuming and often require specialized knowledge in both
statistics and programming. The challenge is to provide an accessible tool that leverages data
science and statistical techniques to automate data analysis tasks for various applications,
including predictive modeling, trend analysis, and hypothesis testing.
MAIN OBJECTIVE
The main objective of this project is to provide knowledge about statistical and machine learning
models, demonstrating how scientific computing and programming can be used to automate
complex analysis tasks. By making these techniques accessible, users can enhance their decision-
making processes and generate insights more effectively.
METHODOLOGY
1. Data Collection: Data is gathered from relevant sources depending on the field of
application (e.g., health records, financial data, survey data).
2. Data Cleaning: The data undergoes preprocessing to handle missing values, correct
inaccuracies, and transform data types for accurate analysis.
3. Data Analysis: Using descriptive and inferential statistical methods, key patterns and
trends are identified within the dataset.
4. Model Development: Machine learning models, including regression and classification
models, are developed to predict outcomes and identify patterns.
5. Visualization: Interactive visualizations such as histograms, ogives, and scatter plots
help in the intuitive understanding of results.
6. Interpretation: Insights are derived from the results, helping users make data-driven
decisions relevant to their field of interest.
PROJECT FEATURES:
1. Data Loading
2. Age Interval Creation
   Purpose: Creates age intervals (e.g., 0-10, 11-20) and labels for categorizing age data into discrete groups.
3. Frequency Table
   Purpose: Generates a table counting occurrences within each age interval, facilitating grouped data analysis.
4. Grouped Data Statistics (a short sketch follows this list)
   Purpose: Calculates essential statistics for grouped data, aiding in understanding data distribution:
   1. Mean: Computes the frequency-weighted average of the interval midpoints.
   2. Mode: Identifies the most frequent age interval.
   3. Median: Determines the interval containing the median of the cumulative frequency.
   4. Variance and Standard Deviation: Measure the spread of data points around the mean.
   5. Skewness and Kurtosis: Assess the symmetry and peakedness of the data distribution.
   6. Interquartile Range (IQR): Calculates the spread between the first and third quartiles.
   7. Standard Error: Measures the precision of the sample mean.
5. Statistics Dashboard
   Purpose: Displays key grouped data statistics (mean, median, mode, etc.) to the user in an interactive dashboard.
6. Skewness Visualization
7. Cumulative Frequency Table
   Purpose: Presents a frequency table with cumulative frequencies, providing insights into data distribution across age intervals.
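The following is a minimal sketch of the grouped-data features above (interval creation, frequency table, grouped statistics). The column name "Age", the sample values, and the bin edges are illustrative assumptions, not the application's actual data.

    import numpy as np
    import pandas as pd

    ages = pd.Series([23, 35, 41, 18, 52, 37, 29, 64, 45, 33], name="Age")

    # 1. Create age intervals and labels (e.g., 0-10, 11-20, ...).
    edges = np.arange(0, 81, 10)
    labels = [f"{lo + 1}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    labels[0] = "0-10"
    intervals = pd.cut(ages, bins=edges, labels=labels, include_lowest=True)

    # 2. Frequency table with cumulative frequencies.
    freq = intervals.value_counts().reindex(labels, fill_value=0)
    table = pd.DataFrame({"Frequency": freq, "Cumulative": freq.cumsum()})

    # 3. Grouped statistics based on interval midpoints.
    midpoints = (edges[:-1] + edges[1:]) / 2
    weights = freq.to_numpy()
    grouped_mean = np.average(midpoints, weights=weights)
    grouped_var = np.average((midpoints - grouped_mean) ** 2, weights=weights)
    modal_class = freq.idxmax()                      # most frequent interval
    median_class = table.index[table["Cumulative"] >= weights.sum() / 2][0]

    print(table)
    print(f"Mean≈{grouped_mean:.1f}, Var≈{grouped_var:.1f}, "
          f"Modal class={modal_class}, Median class={median_class}")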
3. Data Exploration
   1. Correlation Analysis, Visualizing Relationships, and Variable Distributions: explored how the variables relate to one another and how each is distributed.
   2. Checked for missing values and displayed the count of NaN entries in each column.
   3. Provided descriptive statistics (mean, standard deviation, etc.) for each variable.
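A short sketch of this exploration step. The file name "data.csv" is a placeholder for whatever dataset the page loads.

    import pandas as pd

    df = pd.read_csv("data.csv")

    print(df.isna().sum())                 # count of NaN entries per column
    print(df.describe())                   # mean, std, quartiles, etc.
    print(df.corr(numeric_only=True))      # pairwise correlations between numeric variables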
4. Data Preprocessing
1. Split the data into training and testing sets using train_test_split.
Standardization:
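Continuing from the exploration sketch above, this is one way the preprocessing step can look with train_test_split and standardization. The target column name "Projects" is an assumption for illustration.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X = df.drop(columns=["Projects"])      # assumed target column name
    y = df["Projects"]

    # Hold out 20% of the rows for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Standardize features: fit the scaler on the training set only, apply to both splits.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)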
5. Modeling
Prediction:
6. Model Evaluation
Performance Metrics:
1. Calculated and displayed Mean Squared Error (MSE), Mean Absolute Error (MAE),
and Root Mean Squared Error (RMSE).
1. Calculated and displayed the R² and Adjusted R² values for model performance.
Residuals Analysis:
1. Computed residuals and visualized them using a normal distribution curve to check the
error distribution.
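A sketch of the modeling and evaluation steps, continuing from the preprocessing sketch: fit a linear model, predict, and report MSE, MAE, RMSE, R², Adjusted R², and the residuals used for the error-distribution plot.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    model = LinearRegression().fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)

    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test, y_pred)

    # Adjusted R² penalizes for the number of predictors p.
    n, p = X_test_scaled.shape
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

    residuals = y_test - y_pred            # inspected against a normal curve in the app
    print(f"MSE={mse:.3f} MAE={mae:.3f} RMSE={rmse:.3f} R²={r2:.3f} AdjR²={adj_r2:.3f}")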
7. Statistical Analysis
1. Used OLS (Ordinary Least Squares) regression from statsmodels to obtain detailed
model insights, including coefficients and p-values.
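A minimal sketch of the statsmodels OLS step, reusing the unscaled training data from the preprocessing sketch and assuming all predictor columns are numeric.

    import statsmodels.api as sm

    X_sm = sm.add_constant(X_train)        # adds the intercept column
    ols_results = sm.OLS(y_train, X_sm).fit()

    print(ols_results.summary())           # coefficients, std errors, p-values, R²
    print(ols_results.params)              # estimated coefficients
    print(ols_results.pvalues)             # per-coefficient p-values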
PAGE 5: COVARIANCE
1. Data Loading: Loads data from an Excel file, allowing for further statistical operations
and visualizations.
2. Feature Selection: Provides a feature selection for X variable, enabling dynamic analysis
of various numerical features against the target variable.
3. Statistical Model Fitting: Fits an Ordinary Least Squares (OLS) regression model to
examine the relationship between the selected X feature and the target variable (Projects).
4. Key Statistical Metrics Calculation:
   1. Intercept: Displays the intercept term of the model, representing the baseline effect on Projects.
   2. R-Squared: Shows the R-squared value, providing insight into the model's explanatory power.
   3. Adjusted R-Squared: Adjusts for the number of predictors to gauge model fit accuracy.
   4. Standard Error: Provides the standard error, indicating the precision of the intercept estimate.
5. Predictions and Residuals Calculation: Calculates model predictions and residuals for further analysis.
6. Data Visualization:
   1. Line of Best Fit Plot: Generates a scatter plot with a line of best fit to visualize the relationship between the selected X feature and Projects, assessing the model fit visually.
   2. Grid and Border Customization: Customizes plot appearance for better interpretability.
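A sketch of this page's flow: load an Excel file, fit a one-feature OLS model against Projects, report the intercept statistics, and plot the line of best fit. The file name "data.xlsx" is a placeholder, and "Dependant" stands in for whichever X feature the user selects.

    import matplotlib.pyplot as plt
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_excel("data.xlsx")
    x_col = "Dependant"                               # stands in for the user's selection

    X = sm.add_constant(df[[x_col]])
    model = sm.OLS(df["Projects"], X).fit()

    print("Intercept:", model.params["const"])
    print("R-squared:", model.rsquared)
    print("Adjusted R-squared:", model.rsquared_adj)
    print("Std. error of intercept:", model.bse["const"])

    pred = model.predict(X)

    # Scatter plot with the line of best fit, with a visible grid.
    fig, ax = plt.subplots()
    ax.scatter(df[x_col], df["Projects"], label="Observed")
    ax.plot(df[x_col], pred, color="red", label="Best fit")
    ax.set_xlabel(x_col)
    ax.set_ylabel("Projects")
    ax.grid(True)
    ax.legend()
    plt.show()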
1. Data Loading
   1. Load the dataset from a CSV file for analysis.
2. Quartile and IQR Calculation
   1. Calculate the 1st Quartile (Q1), 3rd Quartile (Q3), and Interquartile Range (IQR) to understand the spread of the dataset.
3. Basic Statistics Computation
   1. Determine minimum, maximum, and median values to summarize the dataset's range and central tendency.
4. Ogives Plotting
   1. Generate Less Than and Greater Than Ogives to visualize the cumulative frequency distribution.
   2. Add a vertical line and annotation for the median value to highlight central tendency in the plot.
5. Display Statistics in Streamlit Dashboard
   1. Display quartiles, IQR, min, max, and median values in an interactive layout for user insights.
   2. Apply styling to metrics for improved readability and visual appeal.
6. Interactive Visualization
   1. Present the ogives plot in Streamlit to allow for intuitive data exploration.
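A minimal sketch of the quartile, IQR, and ogive calculations. The file name "data.csv" and the numeric column "value" are placeholders.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    data = pd.read_csv("data.csv")["value"].dropna().sort_values()

    q1, q3 = data.quantile(0.25), data.quantile(0.75)
    iqr = q3 - q1
    stats = {"min": data.min(), "max": data.max(), "median": data.median()}

    # Less-than ogive: cumulative count up to each value.
    # Greater-than ogive: cumulative count above each value.
    less_than = np.arange(1, len(data) + 1)
    greater_than = less_than[::-1]

    fig, ax = plt.subplots()
    ax.plot(data, less_than, label="Less Than Ogive")
    ax.plot(data, greater_than, label="Greater Than Ogive")
    ax.axvline(stats["median"], linestyle="--", color="gray")
    ax.annotate("Median", xy=(stats["median"], len(data) / 2))
    ax.set_xlabel("Value")
    ax.set_ylabel("Cumulative frequency")
    ax.legend()
    plt.show()

    print(f"Q1={q1}, Q3={q3}, IQR={iqr}, {stats}")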
1. Model Training: A Linear Regression model is trained on the dataset using Dependant and Wives to predict Projects (the dependent variable).
2. Prediction: Predictions are made using the trained model and stored for further analysis.
3. Regression Coefficients:
   1. The Intercept (B0) and Coefficients (B1, B2) for the independent variables are calculated and displayed. These represent the linear relationship between the predictors and the dependent variable.
4. Model Metrics:
   1. R-squared (R²): Measures the proportion of variance in the dependent variable explained by the independent variables.
   2. Adjusted R-squared: Adjusts R² for the number of predictors in the model, preventing overfitting.
   3. Sum of Squared Errors (SSE): Calculates the total error between the predicted and actual values.
   4. Sum of Squared Regression (SSR): Measures the variation explained by the model.
5. Prediction Table:
   1. Displays a table with the actual and predicted Projects (Y) values, along with the SSE and SSR values for each data point.
6. Residual Analysis:
   1. Residuals: The difference between the actual and predicted values of Projects is calculated.
   2. A scatter plot of the residuals versus the predicted values is displayed to visualize model fit.
   3. A Kernel Density Estimation (KDE) plot of the residuals is shown to analyze their distribution.
7. Download Option:
   1. The user can download the dataset with the actual values, predicted values, SSE, and SSR as a CSV file.
8. Visualizations:
   1. Regression Line and Scatter Plot: Visualizes the relationship between actual and predicted values, including the best fit line.
   2. Residual Plot: Shows the distribution of residuals using a KDE plot.
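A sketch of this multiple-regression page using the column names given above (Dependant, Wives, Projects); the file names "data.csv" and "predictions.csv" are placeholders.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("data.csv")
    X = df[["Dependant", "Wives"]]
    y = df["Projects"]

    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)

    b0 = model.intercept_                             # B0
    b1, b2 = model.coef_                              # B1, B2
    print(f"B0={b0:.3f}, B1={b1:.3f}, B2={b2:.3f}")

    # Goodness-of-fit metrics.
    sse = np.sum((y - y_pred) ** 2)                   # Sum of Squared Errors
    ssr = np.sum((y_pred - y.mean()) ** 2)            # Sum of Squares due to Regression
    r2 = ssr / (ssr + sse)
    n, p = X.shape
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f"SSE={sse:.3f}, SSR={ssr:.3f}, R²={r2:.3f}, AdjR²={adj_r2:.3f}")

    # Per-row prediction table, saved as a downloadable CSV.
    results = df.assign(Predicted=y_pred, Residual=y - y_pred)
    results.to_csv("predictions.csv", index=False)

    # Residual diagnostics: residuals vs. predictions, and a KDE of the residuals.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(y_pred, y - y_pred)
    ax1.axhline(0, color="gray", linestyle="--")
    ax1.set_xlabel("Predicted Projects")
    ax1.set_ylabel("Residual")
    sns.kdeplot(y - y_pred, ax=ax2)
    ax2.set_xlabel("Residual")
    plt.show()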
1. Data Loading and Exploration
   1. Dataset Loading: A CSV file (sales.csv) is read into a pandas DataFrame for analysis.
   2. Date Filtering: Users can filter the dataset by a date range (start and end dates). The data is filtered on the OrderDate column to display relevant sales data.
   3. Data Exploration: A DataFrame explorer is used to interactively view and filter the dataset, making it easier for users to explore the data.
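A short sketch of the loading and date-filtering step for sales.csv, assuming the OrderDate column parses as dates; the start and end dates are stand-ins for the user's selection.

    import pandas as pd

    sales = pd.read_csv("sales.csv", parse_dates=["OrderDate"])

    start, end = "2024-01-01", "2024-03-31"           # stand-ins for the selected range
    filtered = sales[sales["OrderDate"].between(start, end)]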
2. Descriptive Analytics
   1. Metrics Calculation:
      1. Total Products in Inventory: Count of Product entries to display the number of inventory items.
      2. Total Price Sum: The sum of all TotalPrice values is displayed to give an overall view of sales revenue.
   2. Price Range Analysis:
      1. The maximum and minimum product prices are calculated and displayed.
      2. The price range (difference between the maximum and minimum prices) is calculated.
   3. Together, these metrics provide key insights into inventory and sales data.
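A sketch of these metrics on the date-filtered frame from the previous sketch; using TotalPrice for the price range is an assumption, since the page could equally use UnitPrice.

    total_products = filtered["Product"].count()      # items in inventory
    total_revenue = filtered["TotalPrice"].sum()      # overall sales revenue
    max_price = filtered["TotalPrice"].max()
    min_price = filtered["TotalPrice"].min()
    price_range = max_price - min_price

    print(total_products, total_revenue, max_price, min_price, price_range)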
3. Data Visualization
1. Dot Plot: A scatter plot is used to visualize the relationship between Product and
TotalPrice. Each point represents a product with its corresponding total price, and
products are color-coded by their category.
2. Bar Graph: A bar chart is used to display the relationship between Product and
UnitPrice. The chart aggregates UnitPrice over months to show trends in pricing.
3. Scatter Plot: A scatter plot is created based on user-selected features. It visualizes
relationships between categorical (qualitative) data (feature_x) and numerical
(quantitative) data (feature_y).
4. Bar Chart of Quantities: A bar chart visualizes the total quantity sold for each product,
helping to analyze product demand.
4. Interactivity
   1. Date Range Selection: Users can select a date range from the sidebar, allowing them to filter sales data dynamically.
   2. Feature Selection: Users can select features for the x and y axes to explore relationships in the data through scatter plots.
   3. Data Table: The filtered dataset is displayed interactively for further analysis.
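A minimal sketch of these controls in Streamlit (the framework named elsewhere in this document); the use of Plotly for the scatter plot is an assumption, and both sidebar widgets stand in for the app's actual layout.

    import pandas as pd
    import plotly.express as px
    import streamlit as st

    sales = pd.read_csv("sales.csv", parse_dates=["OrderDate"])

    # Sidebar: date range and feature selection.
    start, end = st.sidebar.date_input(
        "Date range", value=(sales["OrderDate"].min(), sales["OrderDate"].max())
    )
    feature_x = st.sidebar.selectbox("X feature", sales.columns)
    feature_y = st.sidebar.selectbox("Y feature", sales.select_dtypes("number").columns)

    filtered = sales[sales["OrderDate"].between(pd.Timestamp(start), pd.Timestamp(end))]

    # Scatter plot of the selected features and the filtered data table.
    st.plotly_chart(px.scatter(filtered, x=feature_x, y=feature_y))
    st.dataframe(filtered)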
5. Key Insights
   1. Price Range Insights: The calculated metrics (maximum, minimum, range) help users identify high-value and low-value products, which is critical for pricing strategies.
   2. Sales Trend Analysis: The dot plot and bar charts help identify trends in product sales, such as which products sell more and which are more expensive.
   3. Business Metrics: The overall revenue and inventory metrics provide insight into the health of the business and support decision-making.
CONCLUSION
This application focuses on descriptive analytics and core statistical methods: loading and cleaning data, computing summary statistics, fitting regression models, and presenting interactive visualizations that support data-driven decisions.
Contact Information
1. Phone: +255675839840
2. Phone: +255656848274
YouTube: YouTube Channel
Telegram: +255656848274, +255738144353
PlayStore
GitHub: GitHub Profile