
BUSINESS INTELLIGENCE: KPI, TRENDS & PREDICTION

AUTOMATED INFERENTIAL & DESCRIPTIVE STATISTICS

DATA-DRIVEN WEB APPLICATION WITH PYTHON AND STREAMLIT

Live Demo

YouTube: Watch the video

TARGET AUDIENCE

1 Scientific Research
2 Health & Medical Laboratories
3 Data Scientists
4 Educational Institutions
5 Statisticians
6 Natural Mathematics Researchers
7 Information System Analysts
8 Software Developers
9 Finance & Accounting Professionals
10 Machine Learning Engineers
11 Market Analysts
12 Economists
13 Business Intelligence Analysts
14 Operations Researchers
15 Environmental Scientists
16 Policy Analysts
17 Social Science Researchers
18 Clinical Data Managers
19 Actuaries
20 Product Managers

INTRODUCTION

My name is Sameer, a data scientist and software developer. I have designed this application as a
foundation for others to solve data-related problems across fields such as scientific research,
finance, and health. Through this application, you can learn how data is collected, cleaned,
analyzed, and interpreted to derive meaningful insights.

PROBLEM STATEMENT

Many organizations and researchers struggle to analyze vast amounts of data efficiently.
Traditional methods can be time-consuming and often require specialized knowledge in both
statistics and programming. The challenge is to provide an accessible tool that leverages data
science and statistical techniques to automate data analysis tasks for various applications,
including predictive modeling, trend analysis, and hypothesis testing.

MAIN OBJECTIVE

The main objective of this project is to provide knowledge about statistical and machine learning
models, demonstrating how scientific computing and programming can be used to automate
complex analysis tasks. By making these techniques accessible, users can enhance their decision-
making processes and generate insights more effectively.

METHODOLOGY

1. Data Collection: Data is gathered from relevant sources depending on the field of
application (e.g., health records, financial data, survey data).
2. Data Cleaning: The data undergoes preprocessing to handle missing values, correct
inaccuracies, and transform data types for accurate analysis.
3. Data Analysis: Using descriptive and inferential statistical methods, key patterns and
trends are identified within the dataset.
4. Model Development: Machine learning models, including regression and classification
models, are developed to predict outcomes and identify patterns.
5. Visualization: Interactive visualizations such as histograms, ogives, and scatter plots
help in the intuitive understanding of results.
6. Interpretation: Insights are derived from the results, helping users make data-driven
decisions relevant to their field of interest.
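To make these steps concrete, here is a minimal pandas sketch of the collect, clean, and analyze stages. The file name data.csv and the column age are placeholders for illustration, not the project's actual dataset.

```python
# A minimal sketch of the collect -> clean -> analyze flow described above.
# "data.csv" and the "age" column are placeholders, not the project's files.
import pandas as pd

# 1. Data Collection: read raw records from a CSV file
df = pd.read_csv("data.csv")

# 2. Data Cleaning: drop duplicates, coerce types, fill missing values
df = df.drop_duplicates()
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].median())

# 3. Data Analysis: descriptive statistics as a first pass
print(df.describe())
print("skewness:", df["age"].skew(), "kurtosis:", df["age"].kurt())
```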

PROJECT FEATURES:

1. CO-VARIANCE: Measure of the joint variability of two variables.
2. ADVANCED MULTIVARIATE REGRESSION: Regression techniques involving multiple predictors and response variables.
3. TRENDS BY GEO-REFERENCING: Analyze data trends based on geographic information.
4. DESCRIPTIVE STATISTICS ANALYTICS: Summary and analysis of data with central tendency, dispersion, etc.
5. MULTIPLE REGRESSION ANALYSIS: Model the relationship between one dependent variable and multiple independent variables.
6. SALES TRENDS BY DATE RANGE: Analyze sales patterns over a specified time period.
7. BUSINESS TARGET BY PROGRESS: Evaluate business performance relative to targets.
8. INTERACTIVE VISUALIZATION GRAPHS: Dynamic and user-interactive data visualizations.
9. STATISTICS FOR GROUPED DATA: Statistical analysis where data is organized into groups or intervals.
10. STATISTICS FOR UNGROUPED DATA: Statistical analysis of raw, ungrouped data values.
11. ADVANCED PYTHON QUERY: Techniques for complex data querying using Python.
12. OUTLIER DETECTION TECHNIQUES: Methods for identifying abnormal data points in datasets.
13. HYPOTHESIS TESTING: Statistical method to test assumptions or claims about a population.
14. FREQUENCY DISTRIBUTION: Representation of data showing the number of observations within intervals.
15. NORMAL DISTRIBUTIONS: Bell-shaped distribution that is symmetrical about the mean.
16. PROBABILITY DISTRIBUTIONS: Function that shows the likelihood of different outcomes in an experiment.
17. LOGISTIC REGRESSION: Model to estimate probabilities and model binary outcomes.
18. ESTIMATION OF POPULATION: Inference of population parameters based on sample data.
19. PROBABILITY DENSITY: Function describing the likelihood of a continuous random variable's outcome.

EXPLANATION OF PROJECT PYTHON PAGES

PAGE 1: DESCRIPTIVE STATISTICS FOR GROUPED DATA

1. Data Loading

1. Data Source: Loads dataset from a CSV file for analysis.

2. Age Interval Calculation

1. Purpose: Creates age intervals (e.g., 0-10, 11-20) and labels for categorizing age data
into discrete groups.

3. Frequency Table Creation

1. Purpose: Generates a table counting occurrences within each age interval, facilitating
grouped data analysis.

4. Grouped Statistical Calculations

1. Purpose: Calculates essential statistics for grouped data, aiding in understanding data
distribution:
2. Mean: Computes the weighted average midpoint of age intervals.
3. Mode: Identifies the most frequent age interval.
4. Median: Determines the midpoint interval in cumulative frequency.
5. Variance and Standard Deviation: Measures the spread of data points around the mean.
6. Skewness and Kurtosis: Assesses the symmetry and peakedness of the data distribution.
7. Interquartile Range (IQR): Calculates the spread between the first and third quartiles.
8. Standard Error: Measures the precision of the sample mean.

5. Metric Display in Streamlit

1. Purpose: Displays key grouped data statistics (mean, median, mode, etc.) to the user in
an interactive dashboard.

6. Skewness Visualization

1. Purpose: Plots a normal distribution curve to visualize data symmetry, with an annotation for skewness, allowing for a visual assessment of data distribution shape.

7. Frequency Table Display

1. Purpose: Presents a frequency table with cumulative frequencies, providing insights into
data distribution across age intervals.
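A hedged sketch of the grouped-statistics logic described above, assuming a CSV with an Age column; the actual file name, intervals, and column names in the app may differ.

```python
# Grouped frequency table and grouped mean/variance from interval midpoints.
# "grouped_data.csv" and the "Age" column are assumptions for illustration.
import numpy as np
import pandas as pd

df = pd.read_csv("grouped_data.csv")

# Age intervals and labels for categorizing ages into discrete groups
bins = [0, 10, 20, 30, 40, 50, 60]
labels = ["0-10", "11-20", "21-30", "31-40", "41-50", "51-60"]
df["interval"] = pd.cut(df["Age"], bins=bins, labels=labels, include_lowest=True)

# Frequency table with cumulative frequencies
freq = df["interval"].value_counts().sort_index().rename("frequency").to_frame()
freq["cumulative"] = freq["frequency"].cumsum()

# Grouped mean: frequency-weighted average of the interval midpoints
midpoints = (np.array(bins[:-1]) + np.array(bins[1:])) / 2
n = freq["frequency"].sum()
grouped_mean = (freq["frequency"].to_numpy() * midpoints).sum() / n

# Grouped variance and standard deviation around the grouped mean
grouped_var = (freq["frequency"].to_numpy() * (midpoints - grouped_mean) ** 2).sum() / n
grouped_std = np.sqrt(grouped_var)

print(freq)
print(f"mean={grouped_mean:.2f}, variance={grouped_var:.2f}, std={grouped_std:.2f}")
```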

PAGE 2: DESCRIPTIVE STATISTICS & DATA VISUALIZATION

1. Data Loading and Selection

1. Loads data from an Excel file (data.xlsx) and uses it for analytical processing.
2. Allows users to filter data by Region, Location, and Construction fields for customized analysis.

2. Descriptive Analytics

1. Computes key summary statistics such as Sum, Mode, Mean, and Median for the Investment column.
2. Displays these metrics in the Streamlit interface for easy visualization.

3. Data Visualization

1. Histograms: Visualizes the frequency distribution of variables in the dataset.
2. Bar Chart: Shows investments by BusinessType, providing a breakdown of investments by type.
3. Line Chart: Visualizes investments by State, showing trends across different states.
4. Pie Chart: Represents Ratings by Region, showing the proportion of ratings for each region.

4. Target Tracking and Progress Bar

1. Defines a target for investment and calculates the current percentage toward this target.
2. Provides a progress bar to visually represent how close the current investment is to the target.

5. Quartile Analysis

1. Uses a box plot to analyze the distribution of Investment by BusinessType, displaying quartiles and helping identify outliers.

6. User Interface with Interactive Elements

1. Includes an interactive sidebar with options to navigate between different views (Home, Progress).
2. Enables selection of quantitative features for exploring distributions and trends.
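A minimal Streamlit sketch of the loading, filtering, and metric display described above. It assumes data.xlsx with the Region, Location, Investment, and BusinessType columns named in the text; widget layout and chart styling in the real app will differ.

```python
# Sidebar filters plus descriptive metrics, assuming data.xlsx and the
# Region/Location/Investment/BusinessType columns mentioned in the text.
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.read_excel("data.xlsx")

# Sidebar filters
region = st.sidebar.multiselect("Region", options=df["Region"].unique())
location = st.sidebar.multiselect("Location", options=df["Location"].unique())
if region:
    df = df[df["Region"].isin(region)]
if location:
    df = df[df["Location"].isin(location)]

# Descriptive analytics for the Investment column
c1, c2, c3, c4 = st.columns(4)
c1.metric("Sum", f"{df['Investment'].sum():,.0f}")
c2.metric("Mean", f"{df['Investment'].mean():,.0f}")
c3.metric("Median", f"{df['Investment'].median():,.0f}")
c4.metric("Mode", f"{df['Investment'].mode().iloc[0]:,.0f}")

# Bar chart of Investment by BusinessType
st.plotly_chart(px.bar(df, x="BusinessType", y="Investment"))
```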

PAGE 3: HYPOTHESIS TESTING

1. Data Loading and Cleaning

1. Reads data from an Excel file (hypothesis.xlsx).
2. Drops unnecessary columns to focus on relevant fields for hypothesis testing.

2. Hypothesis Formulation

1. Defines null and alternative hypotheses for comparing the mean revenues of Group A and Group B.

3. Confidence Level Setup

1. Sets a confidence level of 95% for statistical significance in hypothesis testing.

4. T-Test for Independent Samples

1. Conducts a t-test to compare the means of two independent groups (Group A and Group B).
2. Calculates the t-statistic and p-value for hypothesis evaluation.

5. Sample Statistics Calculation

1. Computes and displays the sample mean and standard deviation for both groups.
2. Confirms the sample size and uses the t-distribution only for samples smaller than 30.

6. Critical Value Determination

1. Calculates the critical value based on the confidence level and sample size.

7. T-Distribution Curve Generation

1. Generates a probability density curve for visualizing the t-distribution.

8. Decision-Making

1. Compares the computed t-statistic with the critical value to decide whether to reject the null hypothesis.

9. Visualization of Results

1. Displays the t-distribution curve with annotated critical value, t-statistic, and rejection region.
2. Uses visual aids (vertical lines, filled regions) to highlight decision boundaries and critical regions.

10. Summary Metrics Display

1. Shows computed values and critical values in a dashboard format.
2. Presents the sample size and statistical metrics in a well-organized layout using Streamlit components.
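A sketch of the independent-samples t-test on this page using scipy, assuming hypothesis.xlsx holds revenue columns named Group A and Group B; the real column names and degrees-of-freedom handling may differ.

```python
# Two-sample t-test with a 95% confidence level and a critical-value decision.
# Column names "Group A" / "Group B" are assumptions based on the text.
import pandas as pd
from scipy import stats

df = pd.read_excel("hypothesis.xlsx")
a = df["Group A"].dropna()
b = df["Group B"].dropna()

# H0: the mean revenues of Group A and Group B are equal
# H1: the mean revenues differ (two-tailed test)
alpha = 0.05
t_stat, p_value = stats.ttest_ind(a, b)          # pooled two-sample t-test

# Critical value from the t-distribution with n1 + n2 - 2 degrees of freedom
dof = len(a) + len(b) - 2
t_crit = stats.t.ppf(1 - alpha / 2, dof)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, critical value = ±{t_crit:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```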

PAGE 4: ADVANCED LINEAR REGRESSION

1. Data Loading and Selection

1. Data Source: Loaded from CSV file (advanced_regression.csv).


2. Feature Columns: interest_rate, unemployment_rate, index_price.
3. Filtering: Data is filtered based on user-selected year and month.
2. Exploratory Data Analysis (EDA)

Correlation Analysis:

1. Used sns.regplot to visually explore relationships between features.


2. Calculated and displayed correlation matrix for the variables.

Visualizing Relationships:

1. Regression plots show the relationships between interest_rate and unemployment_rate, and between interest_rate and index_price.
2. Box plots detect outliers in the dataset.

Variable Distributions:

1. Displayed histograms for variable frequency distributions.


2. Used sns.pairplot to examine pairwise relationships.

3. Handling Missing Data

1. Checked for missing values and displayed the count of NaN entries in each column.
2. Provided descriptive statistics (mean, standard deviation, etc.) for each variable.

4. Data Preprocessing

Splitting the Data:

1. Split the data into training and testing sets using train_test_split.

Standardization:

1. Applied standardization using StandardScaler to scale features.

5. Modeling

Multiple Linear Regression Model:

1. Built a linear regression model using LinearRegression from sklearn.


2. Used cross-validation to evaluate model performance.

Prediction:

1. Predicted the target variable (index_price) on the test dataset.

6. Model Evaluation
Performance Metrics:

1. Calculated and displayed Mean Squared Error (MSE), Mean Absolute Error (MAE),
and Root Mean Squared Error (RMSE).

R² and Adjusted R²:

1. Calculated and displayed the R² and Adjusted R² values for model performance.

Residuals Analysis:

1. Computed residuals and visualized them using a normal distribution curve to check the
error distribution.
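The preprocessing, modeling, and evaluation steps above can be condensed into a short scikit-learn sketch. Column names follow the text; the split ratio, random seed, and five-fold cross-validation are illustrative choices, not the app's exact settings.

```python
# Split, standardize, fit, cross-validate, and evaluate a multiple linear
# regression model on advanced_regression.csv (columns as named in the text).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("advanced_regression.csv")
X = df[["interest_rate", "unemployment_rate"]]
y = df["index_price"]

# Split, then standardize features (fit the scaler on the training set only)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Fit and cross-validate the model
model = LinearRegression().fit(X_train_s, y_train)
cv_r2 = cross_val_score(model, X_train_s, y_train, cv=5, scoring="r2")

# Evaluate on the held-out test set
pred = model.predict(X_test_s)
mse = mean_squared_error(y_test, pred)
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, pred)
n, k = X_test_s.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"CV R²: {cv_r2.mean():.3f}  MSE: {mse:.2f}  MAE: {mae:.2f}  RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}  Adjusted R²: {adj_r2:.3f}")
```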

7. Statistical Analysis

1. Used OLS (Ordinary Least Squares) regression from statsmodels to obtain detailed
model insights, including coefficients and p-values.
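For the statsmodels step, a minimal OLS sketch (same assumed columns as above) that prints coefficients, standard errors, and p-values:

```python
# Ordinary Least Squares summary with statsmodels; columns as assumed above.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("advanced_regression.csv")
X = sm.add_constant(df[["interest_rate", "unemployment_rate"]])
ols = sm.OLS(df["index_price"], X).fit()
print(ols.summary())   # coefficients, standard errors, p-values, R²
```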

PAGE 5: CO-VARIANCE

1. Data Loading: Loads data from an Excel file, allowing for further statistical operations and visualizations.
2. Feature Selection: Provides feature selection for the X variable, enabling dynamic analysis of various numerical features against the target variable.
3. Statistical Model Fitting: Fits an Ordinary Least Squares (OLS) regression model to examine the relationship between the selected X feature and the target variable (Projects).
4. Key Statistical Metrics Calculation:

1. Intercept: Displays the intercept term of the model, representing the baseline effect on Projects.
2. R-Squared: Shows the R-squared value, providing insight into the model's explanatory power.
3. Adjusted R-Squared: Adjusts for the number of predictors to gauge model fit accuracy.
4. Standard Error: Provides the standard error, indicating the precision of the intercept estimate.

5. Predictions and Residuals Calculation: Calculates model predictions and residuals for further analysis.
6. Data Visualization:

1. Line of Best Fit Plot: Generates a scatter plot with a line of best fit to visualize the relationship between the selected X feature and Projects, assessing the model fit visually.
2. Grid and Border Customization: Customizes plot appearance for better interpretability.
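A hedged sketch of this page's flow: the Excel file name covariance.xlsx and the example feature Dependant are assumptions, while Projects is the target named above.

```python
# Covariance plus a simple OLS fit and a line-of-best-fit plot.
# "covariance.xlsx" and the feature "Dependant" are illustrative assumptions.
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("covariance.xlsx")
x_col = "Dependant"              # would come from a selectbox in the app

# Covariance between the selected feature and the target
print("Covariance:", df[x_col].cov(df["Projects"]))

# Simple OLS fit: Projects ~ selected feature
X = sm.add_constant(df[x_col])
model = sm.OLS(df["Projects"], X).fit()
print("Intercept:", model.params["const"])
print("R²:", model.rsquared, "Adjusted R²:", model.rsquared_adj)

# Scatter plot with the line of best fit
plt.scatter(df[x_col], df["Projects"], label="data")
plt.plot(df[x_col], model.predict(X), color="red", label="best fit")
plt.legend()
plt.grid(True)
plt.show()
```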

PAGE 6: DESCRIPTIVE STATISTICS FOR UNGROUPED DATA

1. Data Loading

1. Load the dataset from a CSV file for analysis.

2. Quartile and IQR Calculation

1. Calculate the 1st Quartile (Q1), 3rd Quartile (Q3), and Interquartile Range (IQR) for understanding the spread of the dataset.

3. Basic Statistics Computation

1. Determine minimum, maximum, and median values to summarize the dataset's range and central tendency.

4. Ogives Plotting

1. Generate Less Than and Greater Than Ogives to visualize the cumulative frequency distribution.
2. Add a vertical line and annotation for the median value to highlight central tendency in the plot.

5. Display Statistics in Streamlit Dashboard

1. Display quartiles, IQR, min, max, and median values in an interactive layout for user insights.
2. Apply styling to metrics for improved readability and visual appeal.

6. Interactive Visualization

1. Present the ogives plot in Streamlit to allow for intuitive data exploration.
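A sketch of the quartile, IQR, and ogive computations described above, assuming a CSV with a single numeric column named Value; the real file and column names differ per dataset.

```python
# Quartiles, IQR, and less-than / greater-than ogives for ungrouped data.
# "ungrouped_data.csv" and the "Value" column are assumptions for illustration.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

values = pd.read_csv("ungrouped_data.csv")["Value"].dropna().sort_values()

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
median = values.median()
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, min={values.min()}, max={values.max()}, median={median}")

# Less-than ogive: cumulative count below each value
cum_less = np.arange(1, len(values) + 1)
# Greater-than ogive: cumulative count at or above each value
cum_greater = cum_less[::-1]

plt.plot(values, cum_less, label="Less than ogive")
plt.plot(values, cum_greater, label="Greater than ogive")
plt.axvline(median, linestyle="--", color="gray")
plt.annotate("median", (median, len(values) / 2))
plt.legend()
plt.show()
```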

PAGE 7: DATA VISUALIZATION

1. Data Loading and Selection

1. Loads data from an Excel file (data.xlsx) and uses it for analytical processing.
2. Allows users to filter data by Region, Location, and Construction fields for customized analysis.

2. Descriptive Analytics

1. Computes key summary statistics such as Sum, Mode, Mean, and Median for the Investment column.
2. Displays these metrics in the Streamlit interface for easy visualization.

3. Data Visualization

1. Histograms: Visualizes the frequency distribution of variables in the dataset.
2. Bar Chart: Shows investments by BusinessType, providing a breakdown of investments by type.
3. Line Chart: Visualizes investments by State, showing trends across different states.
4. Pie Chart: Represents Ratings by Region, showing the proportion of ratings for each region.

4. Target Tracking and Progress Bar

1. Defines a target for investment and calculates the current percentage toward this target.
2. Provides a progress bar to visually represent how close the current investment is to the target.

5. Quartile Analysis

1. Uses a box plot to analyze the distribution of Investment by BusinessType, displaying quartiles and helping identify outliers.

6. User Interface with Interactive Elements

1. Includes an interactive sidebar with options to navigate between different views (Home, Progress).
2. Enables selection of quantitative features for exploring distributions and trends.
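Since this page mirrors Page 2, here is a sketch of just the target progress bar and the quartile box plot, assuming the same data.xlsx columns; the target figure is a placeholder.

```python
# Target progress bar and Investment-by-BusinessType box plot.
# The 3,000,000 target is a placeholder, not the app's real target.
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.read_excel("data.xlsx")

# Progress toward an investment target
target = 3_000_000
current = float(df["Investment"].sum())
pct = min(current / target, 1.0)
st.metric("Investment vs target", f"{current:,.0f} / {target:,.0f}")
st.progress(pct)

# Box plot of Investment by BusinessType to show quartiles and outliers
st.plotly_chart(px.box(df, x="BusinessType", y="Investment"))
```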
PAGE 8: LINEAR REGRESSION

1. Data Loading and Preprocessing:

1. The dashboard loads an Excel dataset (regression.xlsx) containing information on Dependant, Wives, and Projects.
2. Extracts the independent variables (Dependant and Wives) and the dependent variable (Projects) for use in regression analysis.

2. Model Fitting and Prediction:

1. A Linear Regression model is trained on the dataset using Dependant and Wives to
predict the Projects (dependent variable).
2. Predictions are made using the trained model and stored for further analysis.

3. Regression Coefficients:

1. The Intercept (B0) and Coefficients (B1, B2) for the independent variables are calculated and displayed. These represent the linear relationship between the predictors and the dependent variable.

4. Model Evaluation Metrics:

1. R-squared (R²): Measures the proportion of variance in the dependent variable explained
by the independent variables.
2. Adjusted R-squared: Adjusts R² for the number of predictors in the model, preventing
overfitting.
3. Sum of Squared Errors (SSE): Calculates the total error between the predicted and
actual values.
4. Sum of Squared Regression (SSR): Measures the variation explained by the model.

5. Prediction Table:

1. Displays a table with the actual and predicted Projects (Y) values, along with the SSE and
SSR values for each data point.

6. Residual Analysis:

1. Residuals: The difference between the actual and predicted values of Projects is
calculated.
2. A scatter plot of the residuals versus the predicted values is displayed to visualize model
fit.
3. A Kernel Density Estimation (KDE) plot of the residuals is shown to analyze their
distribution.

7. User Input and Prediction:


1. Users can input new values for Dependant and Wives in a sidebar form.
2. Upon submission, the model predicts the number of Projects for the provided inputs and
displays the result.

8. Download Option:

1. The user can download the dataset with the actual values, predicted values, SSE, and SSR
as a CSV file.

9. Visualizations:

1. Regression Line and Scatter Plot: Visualizes the relationship between actual and
predicted values, including the best fit line.
2. Residual Plot: Shows the distribution of residuals using a KDE plot.
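A compact sketch of this page's two-predictor regression and the sidebar prediction form, using the column names given above (Dependant, Wives, Projects); metric formatting and the download step are omitted.

```python
# Two-predictor linear regression with coefficients, SSE, SSR, and a
# sidebar form for new predictions, assuming regression.xlsx as in the text.
import pandas as pd
import streamlit as st
from sklearn.linear_model import LinearRegression

df = pd.read_excel("regression.xlsx")
X = df[["Dependant", "Wives"]]
y = df["Projects"]

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Coefficients and evaluation metrics
b0, (b1, b2) = model.intercept_, model.coef_
sse = float(((y - pred) ** 2).sum())          # unexplained variation
ssr = float(((pred - y.mean()) ** 2).sum())   # variation explained by the model
r2 = model.score(X, y)
st.write(f"B0={b0:.3f}, B1={b1:.3f}, B2={b2:.3f}, R²={r2:.3f}, SSE={sse:.2f}, SSR={ssr:.2f}")

# Sidebar form for a new prediction
with st.sidebar.form("predict"):
    dep = st.number_input("Dependant", min_value=0.0)
    wives = st.number_input("Wives", min_value=0.0)
    if st.form_submit_button("Predict"):
        new = pd.DataFrame({"Dependant": [dep], "Wives": [wives]})
        st.write("Predicted Projects:", float(model.predict(new)[0]))
```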

PAGE 9: SALES ANALYTICS (CASE STUDY)

1. Data Import and Processing

1. Dataset Loading: A CSV file (sales.csv) is read into a pandas DataFrame for analysis.
2. Date Filtering: Users can filter the dataset by a date range (start and end dates). The data
is filtered based on the OrderDate column to display relevant sales data.
3. Data Exploration: A DataFrame explorer is used to interactively view and filter the
dataset, making it easier for users to explore the data.

2. Descriptive Analytics

1. Metrics Calculation:

1. Total Products in Inventory: Count of Product entries to display the number of inventory items.
2. Total Price Sum: The sum of all TotalPrice values is displayed to give an overall view of sales revenue.

2. Price Range Analysis:

1. The maximum and minimum product prices are calculated and displayed.
2. The price range (difference between the maximum and minimum prices) is calculated.
3. These metrics provide key insights into inventory and sales data.

3. Data Visualization

1. Dot Plot: A scatter plot is used to visualize the relationship between Product and
TotalPrice. Each point represents a product with its corresponding total price, and
products are color-coded by their category.
2. Bar Graph: A bar chart is used to display the relationship between Product and
UnitPrice. The chart aggregates UnitPrice over months to show trends in pricing.
3. Scatter Plot: A scatter plot is created based on user-selected features. It visualizes
relationships between categorical (qualitative) data (feature_x) and numerical
(quantitative) data (feature_y).
4. Bar Chart of Quantities: A bar chart visualizes the total quantity sold for each product,
helping to analyze product demand.

4. Interactive User Interface

1. Date Range Selection: Users can select a date range from the sidebar, allowing them to
filter sales data dynamically.
2. Feature Selection: Users can select features for the x and y axes to explore relationships
in the data through scatter plots.
3. Data Table: The filtered dataset is displayed interactively for further analysis.

5. Statistical and Business Insights

1. Price Range Insights: The metrics calculated (maximum, minimum, range) help users
identify high-value and low-value products, which is critical for pricing strategies.
2. Sales Trend Analysis: The dot plot and bar charts help identify trends in product sales,
such as which products have higher sales and which products are more expensive.
3. Business Metrics: The overall revenue and inventory metrics provide insights into the
health of the business and help with decision-making.
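A sketch of the date filtering, metrics, and charts on this page, assuming sales.csv with OrderDate, Product, Category, UnitPrice, Quantity, and TotalPrice columns; the Category color field in particular is an assumption.

```python
# Date-range filtering, inventory/revenue metrics, and sales charts,
# assuming sales.csv with the columns named in the text.
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.read_csv("sales.csv", parse_dates=["OrderDate"])

# Date range filter from the sidebar
start = st.sidebar.date_input("Start date", df["OrderDate"].min())
end = st.sidebar.date_input("End date", df["OrderDate"].max())
mask = (df["OrderDate"] >= pd.Timestamp(start)) & (df["OrderDate"] <= pd.Timestamp(end))
filtered = df[mask]

# Key metrics: inventory count, revenue, price range
c1, c2, c3 = st.columns(3)
c1.metric("Products", int(filtered["Product"].count()))
c2.metric("Total revenue", f"{filtered['TotalPrice'].sum():,.2f}")
c3.metric("Price range", f"{filtered['UnitPrice'].max() - filtered['UnitPrice'].min():,.2f}")

# Dot plot of TotalPrice per Product and a bar chart of quantities sold
st.plotly_chart(px.scatter(filtered, x="Product", y="TotalPrice", color="Category"))
st.plotly_chart(px.bar(filtered.groupby("Product", as_index=False)["Quantity"].sum(),
                       x="Product", y="Quantity"))
```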

CONCLUSION

This page focuses on descriptive analytics and basic statistics. The main tasks involve:

1. Data cleaning and filtering.


2. Displaying key business metrics related to product pricing and sales volume.
3. Visualizing the relationship between various features such as product prices and
quantities.
4. Providing interactive tools for users to explore the dataset and extract insights.

Contact Information

WhatsApp

1. +255675839840
2. +255656848274

YouTube

YouTube Channel
Telegram

1. +255656848274
2. +255738144353

PlayStore

PlayStore Developer Page

GitHub

GitHub Profile
