
Assignment-3

1) What is meant by data processing? Explain clearly.

Ans: Data processing refers to the collection, manipulation, and organization of data to
produce meaningful information. It involves a series of steps that transform raw data into a
more usable format.

The key points of data processing are:

1. Data Collection: This is the first step where data is gathered from various sources. This
could include surveys, sensors, transactions, or online activities. The data collected can be in
different forms, such as numbers, text, images, or videos.

2. Data Input: Once the data is collected, it needs to be entered into a system for processing.
This can be done manually or through automated means, such as data entry software or direct
data feeds from devices.

3. Data Processing: This is the core of data processing. It involves using algorithms and
software to manipulate the data. This can include sorting, filtering, aggregating, and analyzing
the data. The goal is to convert raw data into a more understandable format, such as tables,
graphs, or reports.

4. Data Storage: After processing, the data needs to be stored for future use. This can be done
in databases, cloud storage, or other forms of data repositories. Proper storage ensures that
the data is secure and can be easily accessed when needed.

5. Data Output: The final step is to present the processed data in a way that is useful for
decision-making. This could involve generating reports, visualizations, or dashboards that
summarize the findings and insights derived from the data.

6. Data Analysis: Often, the processed data is further analyzed to derive insights or make
predictions. This can involve statistical analysis, machine learning, or other analytical
techniques.

In summary, data processing is a systematic approach to handling data to extract valuable
insights and support decision-making. It plays a crucial role in various fields, including
business, science, healthcare, and technology.
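
To make these steps concrete, here is a minimal Python sketch, assuming the pandas library and a hypothetical sales.csv file with region and amount columns (the file name and columns are illustrative assumptions, not part of the answer above):

```python
# Minimal sketch of the data processing steps, assuming a hypothetical
# "sales.csv" file with "region" and "amount" columns and the pandas library.
import pandas as pd

# Data input: read the collected records into a DataFrame
raw = pd.read_csv("sales.csv")

# Data processing: filter out incomplete rows, then aggregate and sort
clean = raw.dropna(subset=["amount"])
summary = (
    clean.groupby("region")["amount"]
    .agg(total="sum", average="mean")
    .sort_values("total", ascending=False)
)

# Data storage: persist the processed result for later use
summary.to_csv("sales_summary.csv")

# Data output: present the findings for decision-making
print(summary)
```

Each comment maps onto one of the steps described above; a real pipeline would add validation, error handling, and further analysis.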

2) Explain the normality test and the stationarity test

Ans: Normality Test

A normality test is a statistical procedure used to determine whether a dataset follows a
normal distribution (bell-shaped curve). This is important because many statistical methods
assume that the data is normally distributed. Here are some common normality tests:

1. Shapiro-Wilk Test: This test evaluates the null hypothesis that the data is normally
distributed. A low p-value (typically below 0.05) indicates that the null hypothesis can be
rejected, suggesting the data is not normally distributed.

2. Kolmogorov-Smirnov Test: This test compares the sample distribution with a normal
distribution. Similar to the Shapiro-Wilk test, a low p-value suggests that the data does not
follow a normal distribution.

3. Anderson-Darling Test: This is another test that checks if the sample comes from a
specified distribution, including normal. It gives more weight to the tails of the distribution,
which can be useful in certain analyses.
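
As a hedged illustration, the snippet below runs all three tests in Python with SciPy on a simulated sample; the data here is generated only for demonstration and would normally be replaced by the dataset under study:

```python
# Illustrative sketch of common normality tests using SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # simulated sample for demonstration

# Shapiro-Wilk: H0 = the data is normally distributed
sw_stat, sw_p = stats.shapiro(data)

# Kolmogorov-Smirnov against a normal distribution fitted to the sample
ks_stat, ks_p = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

# Anderson-Darling: reports a statistic and critical values rather than a p-value
ad = stats.anderson(data, dist="norm")

print(f"Shapiro-Wilk p = {sw_p:.3f}")        # p < 0.05 would suggest non-normality
print(f"Kolmogorov-Smirnov p = {ks_p:.3f}")
print(f"Anderson-Darling stat = {ad.statistic:.3f}, critical values = {ad.critical_values}")
```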

Stationarity Test

A stationarity test is used in time series analysis to determine whether a time series has a
constant mean and variance over time. Stationarity is important for many statistical
modeling techniques, such as ARIMA models. Here are some common tests for stationarity:

1. Augmented Dickey-Fuller (ADF) Test: This test checks for the presence of a unit root in the
time series. The null hypothesis is that the time series is non-stationary. A low p-value
indicates that you can reject the null hypothesis, suggesting that the series is stationary.

2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: This test has a different null hypothesis,
which is that the time series is stationary. A high p-value suggests that the series is stationary,
while a low p-value indicates non-stationarity.

3. Phillips-Perron Test: Similar to the ADF test, this test checks for a unit root in the time
series data but adjusts for serial correlation and heteroskedasticity in the errors.
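
A hedged sketch of the ADF and KPSS tests with the statsmodels library is shown below, using a simulated random walk as the series (the Phillips-Perron test is omitted here):

```python
# Illustrative sketch of stationarity tests using statsmodels on a simulated series.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=300))  # a random walk, non-stationary by construction

# ADF: H0 = the series has a unit root (is non-stationary)
adf_stat, adf_p, *_ = adfuller(series)

# KPSS: H0 = the series is stationary (note the reversed null hypothesis)
kpss_stat, kpss_p, *_ = kpss(series, regression="c", nlags="auto")

print(f"ADF p-value  = {adf_p:.3f}")   # small p -> reject H0 -> evidence of stationarity
print(f"KPSS p-value = {kpss_p:.3f}")  # small p -> reject H0 -> evidence of non-stationarity
```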

Summary

Normality Test: Checks if a dataset follows a normal distribution, using tests like Shapiro-Wilk,
Kolmogorov-Smirnov, and Anderson-Darling.

Stationarity Test: Assesses whether a time series has constant mean and variance over time,
using tests like ADF, KPSS, and Phillips-Perron.

Both tests are crucial for ensuring that the assumptions required for various statistical
analyses are met.

3) Explain the following: null hypothesis, alternative hypothesis, Type 1 error, Type 2 error

Ans: Let’s go through each of these concepts one by one.

Null Hypothesis

The null hypothesis, often denoted as H0, is a statement that indicates no effect or no
difference in a particular situation. It serves as the default assumption that there is no
relationship between two measured phenomena. For example, if you're testing a new drug, the
null hypothesis might state that the drug has no effect on patients compared to a placebo.

Alternative Hypothesis

The alternative hypothesis, denoted as H1 or Ha, is the statement that contradicts the null
hypothesis. It suggests that there is an effect, a difference, or a relationship present. In the
drug example, the alternative hypothesis would state that the drug does have an effect on
patients compared to a placebo.

Type 1 Error

A Type 1 error occurs when the null hypothesis is incorrectly rejected when it is actually true.
This means that you conclude there is an effect or a difference when, in reality, there isn't one.
The probability of making a Type 1 error is denoted by the alpha level (α), commonly set at
0.05. This means there is a 5% chance of incorrectly rejecting the null hypothesis.

Type 2 Error

A Type 2 error happens when the null hypothesis is not rejected when it is actually false. In
this case, you fail to detect an effect or difference that is present. The probability of making a
Type 2 error is denoted by beta (β). The power of a test, which is 1 - β, indicates the probability
of correctly rejecting the null hypothesis when it is false.
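
These two error rates can be made tangible with a small simulation. The sketch below is an illustrative setup only (effect size, sample size, and number of trials are arbitrary assumptions); it repeatedly runs a two-sample t-test with SciPy and counts how often each error occurs:

```python
# Illustrative simulation of Type 1 error rate and power for a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, trials, n = 0.05, 5000, 30
type1 = type2 = 0

for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b_null = rng.normal(0.0, 1.0, n)   # H0 true: same distribution as a
    b_alt = rng.normal(0.5, 1.0, n)    # H0 false: true mean difference of 0.5

    if stats.ttest_ind(a, b_null).pvalue < alpha:
        type1 += 1                     # rejected a true H0 (false positive)
    if stats.ttest_ind(a, b_alt).pvalue >= alpha:
        type2 += 1                     # failed to reject a false H0 (false negative)

print(f"Estimated Type 1 error rate: {type1 / trials:.3f}  (close to alpha = {alpha})")
print(f"Estimated Type 2 error rate: {type2 / trials:.3f}  (power = {1 - type2 / trials:.3f})")
```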

Summary

Null Hypothesis (H0): Assumes no effect or difference.

Alternative Hypothesis (H1): Assumes there is an effect or difference.

Type 1 Error: Incorrectly rejecting H0 when it is true (false positive).

Type 2 Error: Failing to reject H0 when it is false (false negative).

These concepts are fundamental in hypothesis testing and help researchers make informed
decisions based on their data.

4) Distinguish between parametric and non-parametric tests

Ans: Let's discuss parametric and non-parametric tests.

Parametric Tests

Parametric tests are statistical tests that make certain assumptions about the parameters of
the population distribution from which the samples are drawn. These tests typically assume
that the data follows a normal distribution and that the variances are equal. Because of these
assumptions, parametric tests are generally more powerful and can detect differences more
effectively when the assumptions are met. Common examples of parametric tests include:

t-test: Used to compare the means of two groups.

ANOVA (Analysis of Variance): Used to compare the means of three or more groups.

Pearson correlation: Measures the strength of the linear relationship between two continuous
variables.
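
For illustration, the three parametric tests above can be run with SciPy as in the sketch below; the groups are simulated normal data chosen only to show the calls:

```python
# Illustrative sketch of common parametric tests using SciPy on simulated normal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(10, 2, 40)
g2 = rng.normal(11, 2, 40)
g3 = rng.normal(12, 2, 40)
y = 0.5 * g1 + rng.normal(0, 1, 40)            # a variable linearly related to g1

t_stat, t_p = stats.ttest_ind(g1, g2)          # t-test: compare two group means
f_stat, f_p = stats.f_oneway(g1, g2, g3)       # one-way ANOVA: three or more groups
r, r_p = stats.pearsonr(g1, y)                 # Pearson correlation: linear relationship

print(f"t-test p = {t_p:.3f}, ANOVA p = {f_p:.3f}, Pearson r = {r:.2f} (p = {r_p:.3f})")
```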

Non-Parametric Tests

Non-parametric tests do not assume a specific distribution for the data. They are used when
the assumptions for parametric tests are not met, such as when the data is not normally
distributed or when the sample sizes are small. Non-parametric tests are often used with
ordinal data or when the data is ranked. While they may be less powerful than parametric tests,
they are more flexible and can be applied to a wider range of data types. Common examples
of non-parametric tests include:

Mann-Whitney U test: Used to compare differences between two independent groups.

Wilcoxon signed-rank test: Used to compare two related samples.

Kruskal-Wallis test: Used to compare three or more independent groups.
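
A matching sketch for the non-parametric tests, again with SciPy, is shown below; the skewed exponential data is simulated purely to illustrate a case where normality is doubtful:

```python
# Illustrative sketch of common non-parametric tests using SciPy on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.exponential(1.0, 40)                  # skewed samples where normality is doubtful
g2 = rng.exponential(1.3, 40)
g3 = rng.exponential(1.6, 40)
before, after = g1, g1 + rng.normal(0.2, 0.5, 40)   # paired measurements on the same units

u_stat, u_p = stats.mannwhitneyu(g1, g2)       # two independent groups
w_stat, w_p = stats.wilcoxon(before, after)    # two related (paired) samples
h_stat, h_p = stats.kruskal(g1, g2, g3)        # three or more independent groups

print(f"Mann-Whitney p = {u_p:.3f}, Wilcoxon p = {w_p:.3f}, Kruskal-Wallis p = {h_p:.3f}")
```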


Summary

Parametric Tests: Assume normal distribution and equal variances; more powerful when
assumptions are met (e.g., t-test, ANOVA).

Non-Parametric Tests: Do not assume a specific distribution; more flexible for various data
types (e.g., Mann-Whitney U test, Kruskal-Wallis test).

Choosing between parametric and non-parametric tests depends on the data characteristics
and the specific research question.
