Lecture02 Slides
Data Basics Variable Types Data Types Functions Inspecting Data Conclusion
Today’s Question
A. a character variable
B. a binary variable
C. a continuous variable
D. an observation
- A dataset is a structured collection of data.
- Datasets are typically organized as dataframes, where rows are observations and columns are variables.
- Dataframe vs. matrix: the latter is restricted to containing data all of the same type* (i.e., numeric, integer, logical, or character).
(Diagram: variables 1, 2, ... run across the columns; observations 1, 2, ... run down the rows.)
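To see the dataframe vs. matrix difference in R, the sketch below builds the same two hypothetical columns (names and ages are made up) once with data.frame() and once with cbind(). Because a matrix can hold only one type, cbind() coerces the numbers into character strings.

```r
# Hypothetical example data: two observations, two variables
name <- c("Ann", "Ben")
age  <- c(20, 31)

# A dataframe keeps a separate type for each column (variable)
df <- data.frame(name = name, age = age)
str(df)        # name is character (R 4.0 or later), age is numeric

# A matrix holds a single type, so cbind() turns age into text
m <- cbind(name = name, age = age)
m[, "age"]     # "20" "31": strings, not numbers
```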
Example of a Dataframe
What is an observation?
What is a variable?
Notation
When defining a new variable, we represent the variable and its contents in the
following format:
X = {10, 5, 8}
- On the left-hand side of the equal sign is the variable name.
  - What is the name of the variable here?
- On the right-hand side of the equal sign, inside curly brackets, are the contents of the variable: multiple observations, separated by commas.
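In R, the variable X = {10, 5, 8} could be created with the assignment operator <- and the combine function c(). The sketch below is one way to write it and shows that each comma-separated value is one observation.

```r
# Create the variable X with three observations
X <- c(10, 5, 8)

X          # prints the three values
length(X)  # number of observations: 3
X[2]       # the second observation: 5
```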
Binary variable:
- It can take only two values: 1 and 0.
- It represents the presence or absence of a trait:
  - 1 if individual i has the trait
  - 0 if individual i does NOT have the trait
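A sketch of how yes/no answers could be turned into a 1/0 binary variable in R, using hypothetical responses. A useful side effect: the mean of a 0/1 variable is the share of individuals who have the trait.

```r
# Hypothetical yes/no responses for four individuals
answer <- c("yes", "no", "yes", "yes")

# 1 if individual i has the trait ("yes"), 0 otherwise
has_trait <- as.integer(answer == "yes")
has_trait        # 1 0 1 1

# The mean of a binary variable is the proportion coded 1
mean(has_trait)  # 0.75
```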
Non-binary variable:
- Categorical variable: a variable with a finite set of categories.
  - Nominal (categorical): a variable with unordered categories.
    - Example: race, hair color, types of food, method of travel to work
  - Ordinal (ranked): a variable with ordered categories.
    - Example: educational level, level of satisfaction, steak doneness
- Count variable: values are counts (0, 1, 2, 3, and so on).
  - Example: the number of students enrolled in a school, the number of sunny days per year, the number of cigarettes a person smokes per day.
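In R, one common way to store nominal and ordinal variables is with factor(); setting ordered = TRUE marks the categories as ranked. The sketch below uses hypothetical values for travel mode, satisfaction, and cigarettes per day.

```r
# Nominal: unordered categories (hypothetical travel modes)
travel <- factor(c("bus", "car", "bus", "walk"))
levels(travel)       # "bus" "car" "walk" (alphabetical, no ranking)
is.ordered(travel)   # FALSE

# Ordinal: categories with a meaningful order
satisfaction <- factor(c("low", "high", "medium", "low"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)
satisfaction[1] < satisfaction[2]   # TRUE: "low" ranks below "high"

# Count: non-negative whole numbers (hypothetical cigarettes per day)
cigarettes <- c(0L, 5L, 12L)
all(cigarettes >= 0)                # TRUE
```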
Data Types
- Conventional
  1. Cross-sectional data
  2. Time series data
  3. Pooled cross-sectional data
  4. Panel (or longitudinal) data
- Unconventional
  1. Textual data
  2. Network data
  3. Spatial data
Conventional Data
Conventional 1: Cross-sectional Data
- It consists of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time.
- Minor differences in the timing of data collection within the same year are ignored.
- It can be obtained by random sampling from the underlying population.
  - Random sampling: a method of randomly selecting a sample of observations from the target population.
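Random sampling can be sketched in R with sample(), which by default draws without replacement. The population below is hypothetical (ID numbers 1 to 1,000), and set.seed() only makes the draw reproducible.

```r
# Hypothetical population: ID numbers for 1,000 units
population <- 1:1000

set.seed(1100)                        # makes the random draw reproducible
s <- sample(population, size = 10)    # simple random sample, no replacement

s           # ten randomly selected IDs
length(s)   # 10
```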
Conventional Data
Conventional 2: Time Series Data
- It consists of observations on one or more variables over time.
  - Example: stock prices, money supply, consumer price index, GDP, etc.
- Time is an important dimension:
  - Past events can influence future events.
  - Lags in behavior are prevalent in the social sciences.
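A lag simply pairs each period with the value from the period before. The sketch below uses hypothetical annual GDP values and builds a one-period lag by shifting the vector; it is one simple approach, not the only one.

```r
# Hypothetical annual GDP values, ordered by time
gdp <- c(100, 103, 107, 110)

# One-period lag: each year's value from the previous year (NA for the first)
gdp_lag <- c(NA, head(gdp, -1))
gdp_lag          # NA 100 103 107

# Change from the previous year
gdp - gdp_lag    # NA 3 4 3
```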
Conventional Data
Unconventional Data
- Digitized textual data from email, websites, social media messages, news reports, government documents, health records, digitized published articles, books, etc.
- Example: the disputed authorship of The Federalist Papers
  - The Federalist consists of 85 essays, written between 1787 and 1788 and attributed to Alexander Hamilton, John Jay, and James Madison.
  - Because both Hamilton and Madison helped draft the Constitution, scholars regard The Federalist as a primary document reflecting the intentions of the authors of the Constitution.
  - Of the 85 essays, the authorship of 73 is uncontested; the authorship of the remaining 12 is disputed.
Unconventional Data
- The text of the 85 essays was scraped from the Library of Congress website and stored as fpXX.txt, where XX is the essay number, ranging from 01 to 85.
  - Scraping: an automated method of collecting data from websites using a computer program.
The Federalist Papers data
- Method: distinguish the authors on the basis of their writing style.
  - Filler words: the authors use words such as upon, by, and to at different rates.
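The filler-word idea can be sketched as counting how often each word appears per 1,000 words. The passage below is made up; with the real data one might read an essay with, e.g., readLines("fp01.txt") first. The word-splitting rule (split on any non-letter) is an assumption chosen for simplicity.

```r
# Hypothetical passage (not from The Federalist)
text <- "Upon reflection, the power granted to the states by the people is upon them to use."

# Lower-case the text and split it into words on non-letters
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[words != ""]

# Count each filler word and convert to a rate per 1,000 words
fillers <- c("upon", "by", "to")
counts <- sapply(fillers, function(w) sum(words == w))
rates <- counts / length(words) * 1000

counts   # upon 2, by 1, to 2 (out of 16 words)
rates
```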
Unconventional Data
- An adjacency matrix:
  - The entries represent the existence of relationships between two units (one represented by the row and the other represented by the column).
  - 1 indicates a relationship; 0 indicates no relationship.
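A small adjacency matrix can be written directly with matrix(). The sketch assumes three hypothetical units A, B, and C, where A is tied to B and B is tied to C; row sums then count each unit's relationships.

```r
units <- c("A", "B", "C")

# Rows and columns are the same units; 1 = relationship, 0 = none
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(units, units))

adj["A", "B"]   # 1: A and B are connected
adj["A", "C"]   # 0: no relationship between A and C
rowSums(adj)    # number of ties per unit: A 1, B 2, C 1
```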
Unconventional Data
- Spatial data contain information about patterns over space and can be visualized through maps.
- Main types of spatial data:
  - Spatial point data represent the locations of events as points on a map.
  - Spatial polygon data represent geographical areas by connecting points on a map.
  - Spatial-temporal data: a set of spatial point or polygon data recorded over time, revealing changes in spatial patterns over time.
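As a rough sketch of the first two types, point data can be stored as one row per event with longitude and latitude, and a polygon as an ordered list of vertices whose last point repeats the first. All coordinates below are hypothetical; dedicated spatial packages handle this more fully.

```r
# Spatial point data: one row per event (hypothetical coordinates)
events <- data.frame(
  event = c("e1", "e2", "e3"),
  lon   = c(114.17, 114.20, 114.26),
  lat   = c(22.30, 22.28, 22.34)
)

# Spatial polygon data: an area drawn by connecting points in order;
# repeating the first point at the end closes the shape
area <- data.frame(
  lon = c(114.10, 114.30, 114.30, 114.10, 114.10),
  lat = c(22.25, 22.25, 22.40, 22.40, 22.25)
)

nrow(events)   # 3 events
# The ring is closed: first and last vertices coincide
area$lon[1] == area$lon[nrow(area)] && area$lat[1] == area$lat[nrow(area)]
```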
Unconventional Data
- The geographical distribution is uneven, but the pattern has changed over time.
- A decreasing effect of revolutionary bases and an increasing effect of education.
Functions
Use functions in R
Functions
A function
Function Formats
- When multiple arguments are specified inside the parentheses, they are separated by commas: function_name(argument1, argument2)
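Two built-in R functions that take several comma-separated arguments, as a quick illustration of the format:

```r
# Two arguments, separated by a comma: repeat "yes" three times
rep("yes", 3)      # "yes" "yes" "yes"

# Three arguments: the largest of several numbers
max(3, 8, 5)       # 8
```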
Function Formats
- Any optional arguments we want come next, each with its name:
  - function_name(required_argument, optional_argument_name = optional_argument)
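A real example of this format is mean(), whose required argument is the data and whose optional, named argument na.rm controls whether missing values are dropped. The data below are hypothetical.

```r
x <- c(1, NA, 3)    # hypothetical data with a missing value

mean(x)                 # NA: required argument only, default na.rm = FALSE
mean(x, na.rm = TRUE)   # 2: named optional argument drops the NA
```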
Function Formats
USING R FUNCTIONS:
function_name(required_argument)
or
function_name(required_argument,
optional_argument_name = optional_argument)
Examples
Fictitious example:
- Suppose R were capable of baking and had a function named bake() that, by default, bakes the specified ingredient for 60 minutes at 400°F.
  - Required argument: the ingredient
    - Example: cake_mix
  - Optional arguments: degrees and minutes, which change the default temperature and duration of the bake, respectively.
    - degrees = 350 changes the temperature to 350°F
    - minutes = 30 changes the duration of the bake to 30 minutes
- The following code would ask R to bake a cake mix for 30 minutes at 350°F, so that we get cake as the output:
bake(cake_mix, degrees = 350, minutes = 30)
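Since bake() does not exist in R, the sketch below defines a hypothetical stand-in so the fictitious call can actually run. Its defaults (400°F, 60 minutes) mirror the slide; it returns a sentence instead of cake.

```r
# HYPOTHETICAL: bake() is not a real R function; it is defined here only
# to mirror the fictitious example. Defaults: 400 degrees F for 60 minutes.
bake <- function(ingredient, degrees = 400, minutes = 60) {
  paste("Baked", ingredient, "for", minutes, "minutes at", degrees, "F")
}

cake_mix <- "cake mix"

bake(cake_mix)                               # uses both defaults
bake(cake_mix, degrees = 350, minutes = 30)  # overrides both defaults
```

Because the optional arguments are named, bake(cake_mix, minutes = 30, degrees = 350) gives the same result.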
Examples
sqrt(25)
## [1] 5
- sqrt is the name of the function, which, like all function names, is followed by parentheses ().
- 25 is the required argument.
- 5 is the output.
Examples
Example: round() rounds a value to the nearest whole number by default; the digits argument sets how many decimal places to keep.
- When calling a function with several arguments, it is not easy to remember which argument comes first ⇒ make use of argument names.
round(x = 3.14165, digits = 2)
## [1] 3.14
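A short sketch of why argument names help: without names, R matches arguments by position; with names, the order no longer matters.

```r
round(3.14165)                   # 3: default digits = 0, nearest whole number
round(3.14165, 2)                # 3.14: arguments matched by position
round(digits = 2, x = 3.14165)   # 3.14: named, so the order does not matter
```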
Some Notes
- Examples:
Some Notes
- Folder and files:
  - Create a folder titled SOSC1100 on your Desktop.
  - Download voting.csv and HKBarometerL2.csv from Canvas and save them in the folder SOSC1100.
- To follow the R demonstration in this lecture, you can either:
  - create a new R script in RStudio; or
  - download Lecture02_Exercise.R from Canvas; save it in
Variable   Description
birth      Year of birth of registered voter
message    Whether registered voter received message: "yes", "no"
voted      Whether registered voter voted: 1 = voted, 0 = didn't vote
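The codebook maps onto column types in a dataframe. The sketch below builds three hypothetical rows with the same variables (not the real voting.csv data) and checks the class of each column.

```r
# Hypothetical rows with the same three variables as the codebook
voting_example <- data.frame(
  birth   = c(1980, 1975, 1992),
  message = c("yes", "no", "yes"),
  voted   = c(1, 0, 0),
  stringsAsFactors = FALSE
)

sapply(voting_example, class)   # birth numeric, message character, voted numeric
```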
Overview
setwd("~/Desktop/SOSC1100")               # if Mac
setwd("C:/Users/user/Desktop/SOSC1100")   # if Windows
- Note: In the Windows code, replace user with your own username.
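getwd() reports which folder R is currently using, which is a quick way to confirm that setwd() worked. The sketch below uses a temporary folder from tempdir() as a stand-in for SOSC1100, so it runs on any machine, and then switches back.

```r
old_wd <- getwd()          # remember where we started

setwd(tempdir())           # stand-in for setwd("~/Desktop/SOSC1100")
new_wd <- getwd()          # now the temporary folder

setwd(old_wd)              # switch back to the original folder
```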
- Example: dim(data)
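A runnable sketch of loading and inspecting a CSV, assuming the dataset is stored in an object named data via read.csv(). A small temporary file with hypothetical rows stands in for voting.csv so the code runs anywhere.

```r
# Stand-in for voting.csv: a temporary CSV with hypothetical rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("birth,message,voted",
             "1980,yes,1",
             "1975,no,0",
             "1992,yes,0"), tmp)

data <- read.csv(tmp)   # with the course file: read.csv("voting.csv")
head(data)              # first rows (up to 6)
dim(data)               # 3 3: rows (observations), then columns (variables)
```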
Quit R
- If you quit RStudio, R will ask whether you want to save the workspace image, which contains all the objects you have created during the R session.
  - I recommend that you do NOT save it.
  - You can always re-create the objects by re-running the code in your R script.
Today’s Lecture
- Data/dataset/dataframe
- Observations and variables
- Variable types: character vs. numeric; continuous vs. discrete
- Data types: conventional and unconventional
- Using functions: (), sqrt(), round(), #
- Loading and viewing data: setwd(), read.csv(), View(), head(), dim()