BA Notes
• dplyr: A powerful package for data manipulation in R. It provides verbs like filter,
select, mutate, and arrange to efficiently clean, transform, and summarize data
frames.
• pacman: A package manager for R, making it easy to install, update, and manage other R
packages. It streamlines the process compared to the base install.packages function.
• stringr: Offers a collection of functions for string manipulation in R. It provides tools for
cleaning, extracting, replacing, and formatting text data.
• tm: A text mining package for manipulating and analyzing text data in R. It provides
tools for cleaning, tokenization (splitting text into words), stemming (reducing words
to their stems, using the SnowballC package), document-term matrix creation, and more.
Lemmatization is not built in and requires a separate package.
• ggplot2: A popular package for creating elegant and customizable statistical graphics in
R. It offers a grammar-based approach to build complex plots.
• NLP (Natural Language Processing): A broad term for the packages and techniques used
to work with text data, including tm, SnowballC, and stringr. NLP is also the name of
an R package that tm depends on.
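As a rough illustration of how these packages fit together, the sketch below loads them and runs one small task with each. It assumes dplyr, stringr, tm, and SnowballC are installed (for example with pacman::p_load(dplyr, stringr, tm, SnowballC)); the mtcars data and the three sample sentences are made up for the example and are not part of these notes.

```r
# Assumes these packages are already installed
library(dplyr)
library(stringr)
library(tm)        # stemDocument() additionally needs SnowballC installed

# --- dplyr: filter, select, mutate, arrange on a data frame ---
cars <- mtcars
cars$model <- rownames(mtcars)

efficient <- cars %>%
  filter(mpg > 25) %>%                      # keep rows with high mileage
  select(model, mpg, wt) %>%                # keep only three columns
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # wt is recorded in 1000 lbs
  arrange(desc(mpg))                        # sort from highest mpg down

# --- stringr: clean and extract text (hypothetical sample strings) ---
raw <- c("  Great product!! ", "terrible SERVICE...", "Order #123 arrived late")

clean <- raw %>%
  str_to_lower() %>%
  str_replace_all("[^a-z0-9 ]", " ") %>%    # replace punctuation with spaces
  str_squish()                              # trim and collapse whitespace

order_ids <- str_extract(raw, "\\d+")       # first run of digits, NA if none

# --- tm: corpus -> cleaning -> stemming -> document-term matrix ---
corpus <- VCorpus(VectorSource(raw))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus)           # rows = documents, columns = terms
inspect(dtm)
```

The order of the tm_map steps matters: lowercasing comes before stop-word removal because the stop-word list is lowercase.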
Base R Functions:
Statistical Functions:
• corrplot (corrplot package): Plots a correlation matrix, visualizing pairwise
correlations between variables.
• corrgram (corrgram package): Similar to corrplot; draws a correlogram of a
correlation matrix.
• psych: A package offering various psychometric functions, including correlation
analysis.
• lattice: A package for creating trellis graphics, including correlation plots.
• cor.ci (psych package): Computes bootstrapped confidence intervals for correlation
coefficients.
• splom (lattice package): Creates a scatter plot matrix, visualizing relationships
between all pairs of variables.
• cor.test: Performs a correlation test to assess the statistical significance of a correlation
coefficient.
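The correlation tools above can be sketched with base R alone. This is a minimal illustration on the built-in mtcars data set, which is an assumption and not part of these notes; the corrplot call runs only if that package happens to be installed.

```r
# Sketch using the built-in mtcars data (an assumed example data set)
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Pairwise Pearson correlations between all selected variables
r_mat <- cor(vars)
print(round(r_mat, 2))

# Test whether the correlation between mpg and wt differs from zero
res <- cor.test(mtcars$mpg, mtcars$wt)
print(res$estimate)   # sample correlation coefficient
print(res$conf.int)   # 95% confidence interval
print(res$p.value)    # p-value for H0: true correlation is 0

# Base-R scatter plot matrix (lattice::splom draws a similar figure)
pairs(vars)

# Optional correlation matrix plot, only if corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(r_mat, method = "circle")
}
```

cor.test uses Pearson's correlation by default; passing method = "spearman" or method = "kendall" gives rank-based alternatives.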
Other Functions:
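• abline: Adds one or more straight lines to an existing plot: horizontal (h =), vertical
(v =), or defined by an intercept and slope (a =, b =), such as a fitted regression line.
Often used to mark specific values (e.g., mean, median).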
• wilcox.test: Performs non-parametric Wilcoxon tests. By default it runs the rank-sum
(Mann-Whitney U) test for two independent groups; with paired = TRUE it runs the
signed-rank test for paired data.
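• kruskal.test: Performs a non-parametric Kruskal-Wallis rank sum test for comparing
three or more independent groups. It is the rank-based alternative to one-way ANOVA and
is commonly interpreted as a comparison of medians.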
Comparing Two Independent Groups:
• Parametric: Use a t-test (t.test). It compares the means of two independent groups
assuming normally distributed data.
• Non-parametric: Use the Mann-Whitney U test (wilcox.test). This test compares the
medians of two independent groups without assuming normality.
Comparing Two Related (Paired) Groups:
• Parametric: Use a paired t-test (t.test with paired = TRUE). It compares the means of
two related groups assuming normally distributed differences.
• Non-parametric: Use the Wilcoxon signed-rank test (wilcox.test with paired = TRUE).
This test compares the medians of two related groups without assuming normality.
These tests provide essential tools for understanding relationships and making informed
conclusions from your data, considering its underlying characteristics.
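The tests and the abline function described above can be sketched in base R as follows. The built-in mtcars and sleep data sets are assumptions chosen for illustration; they are not part of these notes.

```r
# --- Two independent groups: mpg for automatic (am = 0) vs manual (am = 1) ---
t_indep <- t.test(mpg ~ am, data = mtcars)        # Welch t-test by default
w_indep <- wilcox.test(mpg ~ am, data = mtcars,
                       exact = FALSE)             # Mann-Whitney U; data has ties

# --- Two related groups: the same 10 subjects under two drugs (sleep data) ---
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

t_pair <- t.test(drug1, drug2, paired = TRUE)     # paired t-test
w_pair <- wilcox.test(drug1, drug2, paired = TRUE,
                      exact = FALSE)              # Wilcoxon signed-rank test

# --- Three or more groups: mpg across 4, 6, and 8 cylinders ---
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

# Collect the p-values in one place
p_values <- c(t_independent = t_indep$p.value,
              mann_whitney  = w_indep$p.value,
              t_paired      = t_pair$p.value,
              signed_rank   = w_pair$p.value,
              kruskal       = kw$p.value)
print(signif(p_values, 3))

# --- abline: mark the overall median on a boxplot ---
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(h = median(mtcars$mpg), lty = 2)           # dashed horizontal line
```

exact = FALSE is set because both data sets contain tied values, for which R cannot compute an exact p-value and would otherwise fall back to the normal approximation with a warning.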