
Objective of a Business Report

Uploaded by amansinhmar2303
Copyright © All Rights Reserved

Objective of a Business Report

The objective of the provided scenario is to evaluate the effectiveness of a
new landing page implemented by E-news Express, an online news portal, in
gathering new subscribers. The company aims to expand its business by
acquiring more subscribers, and to achieve this goal, it has developed a
new landing page with a revised outline and more relevant content
compared to the old page.

To assess the impact of the new landing page, the Data Science team
conducted an experiment using A/B testing methodology. They randomly
divided 100 users into two groups: the control group, which was shown the
existing landing page, and the treatment group, which was presented with
the new landing page. Data regarding user interactions with both versions of
the landing page was collected, including the time spent on the page,
whether the user converted to a subscriber, and the preferred language of
the user.

The objective of the data scientist at E-news Express is to analyze the
collected data and perform statistical analysis to determine the effectiveness
of the new landing page in attracting new subscribers to the news portal.
Specifically, the analysis aims to answer the following questions:

1. Do users spend more time on the new landing page compared to the existing
landing page?
2. Does the conversion rate (subscriber acquisition) depend on the preferred
language of the user?
3. Is the mean time spent on the new landing page the same for users with
different preferred languages?

By addressing these questions, the company seeks insights into the
performance of the new landing page and its impact on user engagement
and subscription conversion rates. This information will guide decision-making
regarding the optimization of the landing page design and content to
drive better engagement and increase subscriber acquisition for the news
portal.

1.1 What are the probabilities of a fire, a mechanical failure, and a
human error respectively?

Solution

Let:

• P(F) be the probability of a fire,
• P(M) be the probability of a mechanical failure,
• P(H) be the probability of a human error,
• R denote the event of a radiation leak.

Given:

• P(R∣F) = 0.20 (probability of radiation leak given fire)
• P(R∣M) = 0.50 (probability of radiation leak given mechanical failure)
• P(R∣H) = 0.10 (probability of radiation leak given human error)
• P(R∩F) = 0.001 (probability of radiation leak and fire)
• P(R∩M) = 0.0015 (probability of radiation leak and mechanical failure)
• P(R∩H) = 0.0012 (probability of radiation leak and human error)

We can use the formula for conditional probability and rearrange it for the probability of the conditioning event:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad\Longrightarrow\quad P(B) = \frac{P(A \cap B)}{P(A \mid B)}$$

1. P(F) = P(R∩F) / P(R∣F) = 0.001 / 0.20 = 0.005
2. P(M) = P(R∩M) / P(R∣M) = 0.0015 / 0.50 = 0.003
3. P(H) = P(R∩H) / P(R∣H) = 0.0012 / 0.10 = 0.012

So, the probabilities of a fire, a mechanical failure, and a human error are
0.005, 0.003, and 0.012 respectively.

1.2 What is the probability of a radiation leak?

Solution

To find the probability of a radiation leak, P(R), we can use the law of
total probability, which states that the probability of an event can be found by
summing over all the mutually exclusive ways in which the event can occur.
Since two or more types of accidents cannot occur simultaneously, a radiation leak
can occur in three mutually exclusive ways: together with a fire (F), a mechanical
failure (M), or a human error (H).

So, we have:

$$P(R) = P(R \mid F)\,P(F) + P(R \mid M)\,P(M) + P(R \mid H)\,P(H)$$

Given:

• P(R∣F) = 0.20
• P(R∣M) = 0.50
• P(R∣H) = 0.10
• P(F) = 0.005
• P(M) = 0.003
• P(H) = 0.012

Plugging in the values:

P(R) = (0.20 × 0.005) + (0.50 × 0.003) + (0.10 × 0.012)

P(R) = 0.001 + 0.0015 + 0.0012

P(R) = 0.0037

Each product is simply the corresponding joint probability P(R∩F), P(R∩M), or P(R∩H)
given in the problem. So, the probability of a radiation leak is 0.0037, or 0.37%.

1.3 Suppose there has been a radiation leak in the reactor for which the
definite cause is not known. What is the probability that it has been
caused by a) a fire b) a mechanical failure c) a human error?

Solution

To determine the probability that the radiation leak was caused by a specific type of
accident (fire, mechanical failure, or human error) given that a radiation leak has
occurred, we can use Bayes' theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

Where:

• P(B∣A) is the probability of event B given event A (the posterior probability),
• P(A∣B) is the probability of event A given event B,
• P(B) is the prior probability of event B,
• P(A) is the probability of event A.

Here, A is the radiation leak R, and B is one of the causes F, M, or H. We want to find:

• P(F∣R): probability of fire given a radiation leak
• P(M∣R): probability of mechanical failure given a radiation leak
• P(H∣R): probability of human error given a radiation leak

We already know:

• P(R) = 0.0037 (probability of a radiation leak)
• P(F) = 0.005, P(M) = 0.003, P(H) = 0.012
• P(R∣F) = 0.20, P(R∣M) = 0.50, P(R∣H) = 0.10

a) Fire:

$$P(F \mid R) = \frac{P(R \mid F)\,P(F)}{P(R)} = \frac{0.20 \times 0.005}{0.0037} = \frac{0.001}{0.0037} \approx 0.2703$$

b) Mechanical failure:

$$P(M \mid R) = \frac{P(R \mid M)\,P(M)}{P(R)} = \frac{0.50 \times 0.003}{0.0037} = \frac{0.0015}{0.0037} \approx 0.4054$$

c) Human error:

$$P(H \mid R) = \frac{P(R \mid H)\,P(H)}{P(R)} = \frac{0.10 \times 0.012}{0.0037} = \frac{0.0012}{0.0037} \approx 0.3243$$

So, the probabilities that the radiation leak was caused by a) a fire, b) a mechanical
failure, c) a human error are approximately: a) 27.03% b) 40.54% c) 32.43%. As a check,
the three posterior probabilities sum to 1.
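The three steps of 1.1 to 1.3 can be checked with a few lines of Python. This is a minimal sketch that uses only the probabilities given in the problem; the dictionary keys ("fire", "mechanical", "human") are illustrative names, not part of the original analysis.

```python
# Given in the problem: P(R ∩ cause) and P(R | cause) for each accident type
joint = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}
leak_given = {"fire": 0.20, "mechanical": 0.50, "human": 0.10}

# 1.1: P(cause) = P(R ∩ cause) / P(R | cause)
prior = {cause: joint[cause] / leak_given[cause] for cause in joint}

# 1.2: law of total probability over the mutually exclusive causes
p_leak = sum(leak_given[cause] * prior[cause] for cause in prior)

# 1.3: Bayes' theorem, P(cause | R) = P(R | cause) * P(cause) / P(R)
posterior = {cause: leak_given[cause] * prior[cause] / p_leak for cause in prior}

print(f"P(R) = {p_leak:.4f}")
for cause in prior:
    print(f"{cause:<10}  P(cause) = {prior[cause]:.4f}  P(cause | R) = {posterior[cause]:.4f}")
```

Because P(R | cause) × P(cause) recovers the given joint probability, P(R) is just the sum of the three joint probabilities, and each posterior is one joint probability divided by that sum.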

2.1 What is the probability that a randomly chosen student gets a grade
below 85 on this exam?

Solution

Given that the grades are normally distributed with a mean (μ) of 77 and a
standard deviation (σ) of 8.5, we can use the Z-score formula to standardize the
score and then look up the corresponding cumulative probability in the standard
normal distribution table.

The Z-score formula is:

Z = (X − μ) / σ

Where:
• X is the score we want to find the probability for (85 in this case),
• μ is the mean (77),
• σ is the standard deviation (8.5).

So, for X = 85, μ = 77, and σ = 8.5, we have:

Z = (85 − 77) / 8.5 = 8 / 8.5 ≈ 0.9412

Now, we need to find the cumulative probability corresponding to Z = 0.9412 in the
standard normal distribution table.

From the table or using a calculator or software, we find that the cumulative
probability for Z = 0.9412 is approximately 0.8267.

Therefore, the probability that a randomly chosen student gets a grade below 85 on
this exam is approximately 0.8267, or 82.67%.
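The table lookup can also be done in code. A minimal sketch in Python using only the standard library: the standard normal CDF is expressed through the error function, Φ(z) = ½[1 + erf(z/√2)]. The helper name `standard_normal_cdf` is our own, introduced for illustration.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 77.0, 8.5   # mean and standard deviation of the exam grades
x = 85.0                # score of interest

z = (x - mu) / sigma    # standardize: Z = (X - mu) / sigma
p_below = standard_normal_cdf(z)
print(f"Z = {z:.4f}, P(X < {x:g}) = {p_below:.4f}")
```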

2.2 What is the probability that a randomly selected student scores
between 65 and 87?

Solution

We use the Z-score formula to standardize the scores and then find the cumulative
probabilities for each score.

For X1 = 65:

Z1 = (65 − 77) / 8.5 = −12 / 8.5 ≈ −1.4118

For X2 = 87:

Z2 = (87 − 77) / 8.5 = 10 / 8.5 ≈ 1.1765

Now, we find the cumulative probabilities corresponding to Z1 and Z2 in the
standard normal distribution table.
From the table or using a calculator or software:

• For Z1 = −1.4118, the cumulative probability is approximately 0.0790.
• For Z2 = 1.1765, the cumulative probability is approximately 0.8803.

Now, we find the probability of scoring between 65 and 87 by subtracting the
cumulative probability at Z1 from the cumulative probability at Z2:

P(65 < X < 87) = P(X < 87) − P(X < 65) = 0.8803 − 0.0790 ≈ 0.8013

Therefore, the probability that a randomly selected student scores between 65 and
87 on the exam is approximately 0.8013, or 80.13%.
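A minimal sketch of the same calculation with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). It works with the unrounded Z-scores, so its result can differ from table-based values in the fourth decimal place.

```python
from statistics import NormalDist

grades = NormalDist(mu=77, sigma=8.5)  # grade distribution used in 2.1 to 2.3

# P(65 < X < 87) = P(X < 87) - P(X < 65)
p_between = grades.cdf(87) - grades.cdf(65)
print(f"P(65 < X < 87) = {p_between:.4f}")
```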

2.3 What should be the passing cut-off so that 75% of the students clear
the exam?

Solution

Since the grades are normally distributed with a mean (μ) of 77 and a standard
deviation (σ) of 8.5, we need to find the score X such that 75% of the scores lie
above it. Equivalently, 25% of the scores fall below it, so the cut-off is the
25th percentile of the distribution.

We'll find the Z-score corresponding to the 25th percentile and then use the Z-score
formula to find the corresponding score X.

From the standard normal distribution table, the Z-score corresponding to the 25th
percentile is approximately Z = −0.6745 (by symmetry, the negative of the
75th-percentile value 0.6745).

Now, we use the Z-score formula to find the score X corresponding to this Z-score:

Z = (X − μ) / σ

Plugging in the values: −0.6745 = (X − 77) / 8.5

Solving for X:

X − 77 = −0.6745 × 8.5

X − 77 = −5.73325

X ≈ 71.26675

Therefore, the passing cut-off score so that 75% of the students clear the exam
should be approximately 71.27.
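The inverse lookup can be sketched with `NormalDist.inv_cdf` from Python's standard library (Python 3.8 or later), which returns the score below which a given fraction of the distribution lies.

```python
from statistics import NormalDist

grades = NormalDist(mu=77, sigma=8.5)

# For 75% of students to pass, 25% must score below the cut-off,
# so the cut-off is the 25th percentile of the grade distribution.
cutoff = grades.inv_cdf(0.25)
share_passing = 1 - grades.cdf(cutoff)  # fraction scoring at or above the cut-off

print(f"Passing cut-off: {cutoff:.2f}")
print(f"Share of students clearing the exam: {share_passing:.2f}")
```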

3.1 Define the problem and perform an Exploratory Data Analysis
- Problem definition, questions to be answered - Data background and
contents - Univariate analysis - Bivariate analysis

Solution

Problem Definition:
The problem at hand is to determine the effectiveness of the new landing page in
gathering new subscribers for the E-news Express portal. This is done through an
A/B test where users are randomly assigned to either the control group (exposed to
the old landing page) or the treatment group (exposed to the new landing page).
The key question to answer is whether the new landing page leads to a higher rate
of subscription compared to the old landing page.

Questions to be Answered:
1. What is the subscription rate for each group?
2. Is there a statistically significant difference in subscription rates between the control
and treatment groups?
3. What insights can be gained from the exploratory analysis of user interactions with
the landing pages?

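Data Background and Contents:

The dataset contains one row per user (100 users in total, 50 in each group) and
six columns:

• user_id: unique identifier of the user
• group: whether the user belongs to the control or the treatment group
• landing_page: whether the user was served the old or the new landing page
• time_spent_on_the_page: time the user spent on the landing page
• converted: whether the user converted to a subscriber (yes/no)
• language_preferred: the user's preferred language (English, French, or Spanish)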

Exploratory Data Analysis (EDA):

1. Univariate Analysis:
• Descriptive statistics of key metrics (e.g., mean, median, range, standard
deviation) for both control and treatment groups.
• Distribution plots (histograms, box plots) to visualize the spread of key
metrics.
2. Bivariate Analysis:
• Comparison of subscription rates between control and treatment groups
using hypothesis testing (e.g., chi-square test for independence).
• Visualization of subscription rates for each group using bar plots or stacked
bar plots.
• Exploration of relationships between different user interaction metrics and
subscription rates.

Explore the dataset and extract insights using Exploratory Data Analysis

Data Overview
-Displaying the first few rows of the dataset

   user_id      group landing_page  time_spent_on_the_page converted language_preferred
0   546592    control          old                    3.48        no            Spanish
1   546468  treatment          new                    7.13       yes            English
2   546462  treatment          new                    4.40        no            Spanish
3   546567    control          old                    3.02        no             French
4   546459  treatment          new                    4.75       yes            Spanish

-Displaying the last few rows of the dataset:

    user_id      group landing_page  time_spent_on_the_page converted language_preferred
95   546446  treatment          new                    5.15        no            Spanish
96   546544    control          old                    6.52       yes            English
97   546472  treatment          new                    7.07       yes            Spanish
98   546481  treatment          new                    6.20       yes            Spanish
99   546483  treatment          new                    5.86       yes            English

-Checking the shape of data set:

(100, 6)

-Checking the data types of the columns for the dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 user_id 100 non-null int64
1 group 100 non-null object
2 landing_page 100 non-null object
3 time_spent_on_the_page 100 non-null float64
4 converted 100 non-null object
5 language_preferred 100 non-null object
dtypes: float64(1), int64(1), object(4)
memory usage: 4.8+ KB

-Getting the statistical summary for the numerical variables:


Summary statistics for time spent on the new landing page:
count 50.000000
mean 6.223200
std 1.817031
min 1.650000
25% 5.175000
50% 6.105000
75% 7.160000
max 10.710000
Name: time_spent_on_the_page, dtype: float64

Summary statistics for time spent on the old landing page:


count 50.000000
mean 4.532400
std 2.581975
min 0.190000
25% 2.720000
50% 4.380000
75% 6.442500
max 10.300000
Name: time_spent_on_the_page, dtype: float64

Getting the statistical summary for the categorical variables:

Categorical summary statistics for the 'group' column:


control 50
treatment 50
Name: group, dtype: int64

Categorical summary statistics for the 'landing_page' column:


old 50
new 50
Name: landing_page, dtype: int64

Categorical summary statistics for the 'converted' column:


yes 54
no 46
Name: converted, dtype: int64

Categorical summary statistics for the 'language_preferred' column:


Spanish 34
French 34
English 32
Name: language_preferred, dtype: int64

# Check for missing values in each column:

Number of missing values in each column:


user_id 0
group 0
landing_page 0
time_spent_on_the_page 0
converted 0
language_preferred 0
dtype: int64

# Check for duplicates


Number of duplicate rows: 0

# Univariate Analysis

# Time spent on the page

Summary Statistics for Time Spent on the Page:


count 100.000000
mean 5.377800
std 2.378166
min 0.190000
25% 3.880000
50% 5.415000
75% 7.022500
max 10.710000
Name: time_spent_on_the_page, dtype: float64
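As a consistency check, the overall summary can be recovered from the two per-page summaries shown earlier: the overall mean is the size-weighted average of the page means, and the total sum of squares splits into a within-page part and a between-page part. A minimal Python sketch using only the reported (rounded) statistics:

```python
import math

# Per-page summaries of time_spent_on_the_page, copied from the data overview
n_new, mean_new, sd_new = 50, 6.2232, 1.817031
n_old, mean_old, sd_old = 50, 4.5324, 2.581975

n = n_new + n_old
overall_mean = (n_new * mean_new + n_old * mean_old) / n

# Total sum of squares = within-page part + between-page part
ss_within = (n_new - 1) * sd_new ** 2 + (n_old - 1) * sd_old ** 2
ss_between = n_new * (mean_new - overall_mean) ** 2 + n_old * (mean_old - overall_mean) ** 2
overall_sd = math.sqrt((ss_within + ss_between) / (n - 1))

print(f"overall mean = {overall_mean:.4f}, overall sd = {overall_sd:.6f}")
```

Both values agree with the overall count, mean, and standard deviation listed above, which confirms that the three summaries describe the same 100 rows.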

[Figures: Group | Landing page | Converted | Language preferred]

Bivariate Analysis

[Figures: Landing page vs Time spent on the page | Conversion status vs Time spent on the page | Language preferred vs Time spent on the page]
3.2 Illustrate the insights based on EDA

Based on the categorical summary statistics and other information
provided, as well as exploratory data analysis (EDA), we can derive
several key insights:

1. Group and Landing Page Distribution:
• The data is evenly split between the control and treatment
groups, with 50 entries each.
• Similarly, there are 50 entries for both the old and new landing
pages.
• Equal group sizes make the comparison between the two pages
more efficient, while the random assignment of users is what
protects the experiment against bias.
2. Conversion Rate:
• The overall conversion rate is 54%, with 54 entries for
"yes" (converted) and 46 entries for "no" (not converted).
• Understanding the factors that affect conversion is crucial for
optimizing the landing page design and content to improve
subscriber acquisition.
3. Language Preference:
• Spanish and French are equally preferred, with 34 entries each.
• English is slightly less preferred, with 32 entries.
• This suggests a diverse user base with different language
preferences, highlighting the importance of providing
multilingual support on the website.
4. Time Spent on the Landing Page:
• Time spent on the landing page is used as a measure of user
engagement.
• Users of the new landing page spent 6.22 on average (standard
deviation 1.82), compared with 4.53 (standard deviation 2.58) on
the old page, so the new page shows both a higher mean and less
spread. Whether this difference is statistically significant is
tested in 3.3.
5. Relationship Between Variables:
• Analyzing the relationship between variables such as group,
landing page version, conversion status, language preference,
and time spent on the page can uncover patterns and
correlations.
• For example, we can explore whether certain language
preferences are more likely to result in conversion or whether
engagement with the new landing page differs across language
groups.
6. Potential Confounding Variables:
• It is essential to consider potential confounding variables that
may influence the relationship between the variables of interest.
• Factors like user demographics, geographical location, or
device type could affect user behavior. They are not recorded in
this dataset; random assignment to the two groups should
balance them on average, but collecting them would allow them to
be accounted for explicitly in future tests.
7. Visualization:
• Visualization techniques such as histograms, bar plots, and
scatter plots can help visualize the distribution of variables and
identify trends or outliers.
• Heatmaps or correlation matrices can reveal relationships
between variables through color-coded representations.

Overall, conducting a comprehensive EDA allows us to gain valuable
insights into user behavior, preferences, and the effectiveness of the
new landing page in driving conversions. These insights can inform
strategic decisions to optimize user experience and achieve
business objectives.
3.3 Do the users spend more time on the new landing
page than the old landing page?
- State the null and alternate hypotheses - Conduct the hypothesis test and
compute the p-value - Write down conclusions from the test results

# Perform Visual Analysis

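Step 1: Define the null and alternate hypotheses

Let μ_new and μ_old be the mean time spent on the new and the old landing page, respectively.
Null Hypothesis (H0): μ_new = μ_old (users spend the same mean time on both pages).
Alternate Hypothesis (Ha): μ_new > μ_old (users spend more time on the new page).

# Step 2: Select Appropriate test

This is a one-tailed test comparing the means of two independent samples (users served
the new page and users served the old page). Because the population standard deviations
are unknown and the sample standard deviations differ (see Step 4), a two-sample
independent t-test with unequal variances (Welch's t-test) is appropriate.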

Step 3: Decide the significance level


As given in the problem statement, we select α=0.05

# Step 4: Collect and prepare data

The sample standard deviation of the time spent on the new page is: 1.82
The sample standard deviation of the time spent on the old page is: 2.58

# Step 5: Calculate the p-value


The p-value is 0.0001392381225166549
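The p-value can be approximately reproduced from the summary statistics alone. The sketch below assumes, as in Step 2, a one-tailed Welch t-test: it computes the t statistic and the Welch-Satterthwaite degrees of freedom, then integrates the Student t density numerically (Simpson's rule) to obtain the upper-tail probability. The helper names are hypothetical, and because the inputs are the rounded summary statistics, the result matches the reported value only approximately.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_upper_tail(t, df, steps=10_000):
    """P(T > t) for a Student t variable with df degrees of freedom, t >= 0.

    Integrates the density from 0 to t with Simpson's rule (steps must be
    even) and subtracts the area from 0.5, using the symmetry of T.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 0.5 - total * h / 3


# Rounded summary statistics of time_spent_on_the_page from the data overview
t_stat, dof = welch_t(6.2232, 1.817031, 50,   # new landing page
                      4.5324, 2.581975, 50)   # old landing page
p_value = t_upper_tail(t_stat, dof)           # one-tailed, Ha: mu_new > mu_old
print(f"t = {t_stat:.3f}, df = {dof:.1f}, one-tailed p = {p_value:.6f}")
```

With the raw samples available, `scipy.stats.ttest_ind(new, old, equal_var=False, alternative="greater")` performs the same test directly (here `new` and `old` are hypothetical arrays of time spent; the `alternative` argument needs SciPy 1.6 or later).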

# Step 6: Compare the p-value with α

As the p-value 0.0001392381225166549 is less than the level of significance (0.05), we
reject the null hypothesis.

# Step 7: Draw inference

Based on the results of the hypothesis test, which yielded a p-value of approximately 0.000139,
we reject the null hypothesis.
Therefore, we have sufficient evidence to conclude that the mean time spent on the new landing
page is significantly greater than the mean time spent on the old landing page.
This inference suggests that the changes made to the landing page design or content have led
to an increase in user engagement, as reflected by the increased time spent on the page.

Conclusions from the test results

Since the p-value (≈ 0.000139) is less than 0.05, we reject the null hypothesis and conclude
that users spend more time on the new landing page than on the old landing page. Had the
p-value been greater than or equal to 0.05, we would have failed to reject the null hypothesis,
meaning the data would not provide enough evidence that users spend more time on the new
landing page.

3.4 Does the converted status depend on the preferred language?


- State the null and alternate hypotheses - Conduct the hypothesis test and
compute the p-value - Write down conclusions from the test results
Solution:

# Perform Visual Analysis

Step 1: Define the null and alternate hypotheses

Null Hypothesis (H0): The converted status does not depend on the preferred language.
Alternate Hypothesis (Ha): The converted status depends on the preferred language.

# Step 2: Select Appropriate test

Both variables (converted and language_preferred) are categorical, so a chi-square test of
independence is appropriate.

Chi-square statistic: 3.0930306905370837
P-value: 0.21298887487543447

# Step 3: Decide the significance level
As given in the problem statement, we select α = 0.05.

# Step 4: Collect and prepare data

language_preferred  English  French  Spanish
converted
no                       11      19       16
yes                      21      15       18

# Step 5: Calculate the p-value


The p-value is 0.21298887487543447
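The chi-square statistic and p-value can be reproduced from the contingency table in Step 4 with a short pure-Python sketch. It computes the expected counts under independence and the Pearson statistic. Because the table is 2 × 3, the test has (2 − 1)(3 − 1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the chi-square upper tail has the closed form e^(−x/2), so no statistics library is needed.

```python
import math

# Contingency table from Step 4: rows = converted (no, yes),
# columns = language_preferred (English, French, Spanish)
observed = [
    [11, 19, 16],  # converted = no
    [21, 15, 18],  # converted = yes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson statistic: sum over cells of (O - E)^2 / E,
# where E = row total * column total / grand total under independence
chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2_stat += (obs - expected) ** 2 / expected

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
if dof != 2:
    raise ValueError("the closed-form p-value below holds only for 2 degrees of freedom")

# Upper tail of the chi-square distribution with 2 degrees of freedom: exp(-x / 2)
p_value = math.exp(-chi2_stat / 2)
print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
```

With SciPy, `scipy.stats.chi2_contingency(observed)` returns the same statistic and p-value; its continuity correction is applied only when there is exactly one degree of freedom, so it does not affect this table.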

# Step 6: Compare the p-value with α


As the p-value 0.21298887487543447 is greater than the level of significance, we
fail to reject the null hypothesis.

Draw inference
Based on the results of the chi-square test of independence with a p-value of approximately
0.213, we fail to reject the null hypothesis. This suggests that there is not enough evidence to
conclude that the converted status depends on the preferred language.
Therefore, we do not find a significant association between the converted status and the
preferred language.

Conclusions from the chi-square test of independence:


The chi-square statistic is 3.093 and the p-value is approximately 0.213. With a significance
level of 0.05, we fail to reject the null hypothesis. Therefore, we do not have enough evidence to
conclude that the converted status depends on the preferred language. There is no significant
association between the converted status and the preferred language.

3.5 Is the mean time spent on the new page same for the
different language users?
- State the null and alternate hypotheses - Check the assumptions of the
hypothesis test. - Conduct the hypothesis test and compute the p-value -
Write down conclusions from the test results

# Perform Visual Analysis


Step 1: Define the null and alternative hypotheses:
Null Hypothesis (H0): The mean time spent on the new page is the same for English, French, and
Spanish language users.
Alternative Hypothesis (Ha): The mean time spent on the new page differs for at least one
language group.
Step 2: Select Appropriate test:
This problem involves comparing the means of a continuous variable (time spent on the new
page) across more than two groups (language preferred), so one-way analysis of variance
(ANOVA) is appropriate. One-way ANOVA assumes that the observations are independent, that
the time spent is approximately normally distributed within each language group, and that
the population variances are equal across the groups; these assumptions can be checked, for
example, with the Shapiro-Wilk test for normality and Levene's test for equality of variances.
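The F-test behind one-way ANOVA can be sketched in pure Python. With exactly three language groups, the between-group degrees of freedom are 2, and the F(2, d2) upper tail has the closed form (1 + 2F/d2)^(−d2/2). The per-language times are not listed in this report, so the example runs on toy numbers chosen only to illustrate the computation; the function name is hypothetical.

```python
def one_way_anova_three_groups(groups):
    """One-way ANOVA F-test for exactly three groups.

    With k = 3 groups the numerator has 2 degrees of freedom, and the
    F(2, d2) upper tail has the closed form P(F > f) = (1 + 2f / d2) ** (-d2 / 2).
    """
    if len(groups) != 3:
        raise ValueError("the closed-form p-value assumes exactly 3 groups")
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


# Toy numbers for illustration only (NOT the E-news Express data)
f_stat, p_value = one_way_anova_three_groups([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

On the actual data, the 50 new-page users split into three language groups give d2 = 50 − 3 = 47, and `scipy.stats.f_oneway(english, french, spanish)` would be the usual library call (the three arrays are hypothetical names for the per-language times).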
# Step 3: Decide the significance level
# As given in the problem statement, we select α = 0.05.

# Step 4: Collect and prepare data

The numbers of users served the new and old pages are 50 and 50 respectively. Only the 50
users served the new page enter this test, grouped by their preferred language.

# Step 5: Calculate the p-value


The p-value is 0.43204138694325955

# Step 6: Compare the p-value with α


As the p-value 0.43204138694325955 is greater than the level of significance, we
fail to reject the null hypothesis.

Draw Inference
Based on the p-value obtained from the one-way ANOVA test, which is approximately 0.432,
and considering a significance level of α = 0.05:
Since the p-value is greater than α, we fail to reject the null hypothesis. There is insufficient
evidence to conclude that there is a significant difference in the mean time spent on the new
page among English, French, and Spanish language users. Therefore, we cannot conclude that
the mean time spent on the new page varies significantly across different language groups.

Conclusions from the one-way ANOVA test comparing the mean time spent on the new page
among English, French, and Spanish language users:
The p-value obtained from the test is approximately 0.432. With a significance level of α = 0.05,
we fail to reject the null hypothesis. There is insufficient evidence to conclude that there is a
significant difference in the mean time spent on the new page among English, French, and
Spanish language users. Therefore, we cannot conclude that the mean time spent on the new
page varies significantly across different language groups.

3.6 Actionable Insights


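Implement the new landing page: The analysis indicates that the new landing page design leads
to higher user engagement and conversion rates compared to the old page (the difference in time
spent is confirmed by the test in 3.3; the difference in conversion rates is not formally tested in
this report). Therefore, it is recommended to fully implement the new landing page across the platform.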
Localized content optimization: While language preference does not significantly affect user
behavior in terms of conversion and time spent on the page, there may still be opportunities to
optimize content based on language preferences. Consider tailoring content and
recommendations to better resonate with users based on their preferred language.
Continuous monitoring and optimization: User behavior and preferences may evolve over time,
so it's important to continuously monitor engagement metrics and conduct A/B tests to identify
opportunities for further optimization. Regularly analyze user feedback and iterate on the
landing page design and content to ensure it remains relevant and compelling to users.
User segmentation: Explore additional segmentation strategies beyond language preference,
such as demographics or user interests, to further personalize the user experience. By
understanding the unique needs and preferences of different user segments, E-news Express
can deliver more targeted content and improve overall user satisfaction.
Cross-channel promotion: Leverage other channels, such as email marketing or social media, to
promote the new landing page and drive traffic to the platform. Implement targeted campaigns
to reach specific user segments and encourage them to engage with the new page.

Conclusion and Business Recommendations


Time spent on the new page: Users spend more time on the new landing page compared to the
old landing page. This is supported by a significant difference in mean time spent between the
two pages.
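Conversion rate: The conversion rate for the new page is higher than for the old page, suggesting
that the new page design is more effective in attracting subscribers. This difference is not
formally tested in this report; a test of two proportions would confirm whether it is significant.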
Impact of preferred language on conversion: There is no significant dependency of the
converted status on the preferred language. Conversion rates appear to be consistent across
different language preferences.
Effect of language on time spent: The analysis suggests that the mean time spent on the new
page does not significantly vary across different language preferences.
Based on these findings, the following recommendations can be made:
Implement the new landing page: Given the higher time spent and conversion rate observed for
the new page, it is recommended to fully implement the new landing page design.
Localized content: While language preference does not significantly impact conversion or time
spent on the page, consider further personalizing content based on language to enhance user
engagement.
Continuous monitoring: Regularly track user engagement metrics to identify any changes in
user behavior and make iterative improvements to the landing page design and content.

Actionable Insights:

Language Preferences and Conversion:
Despite language preferences not significantly influencing conversion, monitor user behavior
over time to identify potential changes or trends.
User Engagement on New Page:
As language preferences don't impact time spent on the new page, focus on broader strategies
to enhance user engagement, such as optimizing content or improving page layout.
Further Analysis:
Explore additional factors that might contribute to user behavior, such as content relevance or
page aesthetics, to gain a more comprehensive understanding.

Business Recommendations:

Optimize Content:
Continuously refine and optimize content on the new landing page to ensure it aligns with user
interests and encourages engagement.
User Feedback:
Collect user feedback to understand their preferences and expectations, aiding in further
improvements to the online experience.
Multivariate Testing:
Consider conducting multivariate testing to assess the impact of various elements
simultaneously, allowing for a more nuanced understanding of user behavior.
Localization Efforts:
While language preferences may not impact engagement significantly, ensure that localization
efforts are effective and consider regional content variations.
Long-Term Monitoring:
