Dao2702 Cheat Sheet

Uploaded by

wanzhenxp
Lecture 2: Intro to Python Programming: Types of Objects
*The type of an object is very important because it defines the possible values for objects (e.g., int type objects can only have integer values), and it determines the operations that the object supports.
print(type(something)): determine the type
5 is int, indicating signed integers.
2.3 is float, indicating floating point (fractional) numbers. The float type objects are numbers with a decimal point.
'Hello!' is str, representing a string of characters. The str type objects are created by characters surrounded by either single or double quotation marks.
Addition "+": Numerical addition: print(2.3 + 5); Combining strings: print('Hello ' + 'World')
Multiplication "*": Numerical multiplication: print(2.3 * 5); Repeat the string: print('Hello ' * 5)
Variables and Assignment Statement: A variable can be created by an assignment statement, which associates the variable name with an object. An assignment statement can be broken into the following components: Assignment operator: '='; Variable(s): name(s) on the left; Object(s): expression(s) on the right.

Lecture 3: Control Flow of Python Programs
Syntax Rule of 'If' statement: The statement is started by the keyword if; the if keyword is followed by a boolean type (True or False) condition and a colon; Code block 1 is executed if the condition is True, and it is skipped if the condition is False.
'Else': The False branch is started by the keyword else and a colon; Code block 2 is executed if the boolean type condition is False.
'If + Elif + else': Code block 2 is executed if Condition 2 is True and all previous conditions are False; Code block 2 is indicated by the same indentation as Code block 1 and Code block 3.
The if-else ternary expressions; Nested if-statements

Iterate Characters of String; Go directly to the next iteration with continue
range(start, stop, step): stops before 'stop'; non-specified arguments take their default values. range(2, 5): start at 2 & stop before 5; range(5): start at 0 & stop before 5.

List Methods (continued):
2. Deleting items: remove() only removes the first occurrence of the input argument; pop() removes items according to their position indexes, and removes the last item if no index is specified.
3. Searching for an item: index() returns the position index of a given item in the list; it can only find the index of the first appearance of the given value.
4. List as iterables: list comprehension enables you to replace the loop statements with a single-line expression in the format of [expression for item in iterable].
Qn: Given a list of words, create a new list that includes all words starting with a vowel letter (A, E, I, O, or U).
Tuples and Dictionaries: Tuples are sequences of data items, and can be created as data items separated by commas, with or without parentheses. An empty tuple is created by empty parentheses. In creating tuples with only one item, the comma is necessary in the statement.
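For the list-comprehension question above (words starting with a vowel), one possible answer is sketched below. The sample list `words` is made up for illustration; the filter assumes that only the first letter matters and that case should be ignored.

```python
# Hypothetical sample data; any list of strings would do.
words = ['apple', 'Banana', 'orange', 'Umbrella', 'kiwi', 'Egg']
vowels = ('A', 'E', 'I', 'O', 'U')

# [expression for item in iterable if condition]
# word[:1] is the first letter, or '' for an empty string (which is skipped).
vowel_words = [word for word in words if word[:1].upper() in vowels]

print(vowel_words)
```

The same result could be built with a for loop and append(); the comprehension just folds the loop and the if test into one line.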
Notes: syntax for variable names: only one word; consists of letters, numbers, and underscores; cannot begin with a number; avoid conflicts with Python keywords or other variable/function names.
Strings and Lists: ways to create strings: 1. The output of the input() function is a str type object; 2. Convert objects of other types to strings by the str() function; 3. Expressions of strings involving + (for concatenating strings) or * (for duplicating strings). len(): returns the number of characters in a string or the number of items in a list.
Basic features of tuples: 1. Iterable; 2. Same len() function, indexing and slicing expressions, and operations with + and * as strings and lists; 3. Immutable; 4. No modifying methods are defined for tuple objects (only count() and index()).
Tuple Unpacking: 1. In tuple unpacking, the left-hand side is a tuple of variables, and the right-hand side is an arbitrary iterable data sequence.
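A short sketch of the tuple notes above: creation with and without parentheses, the single-item comma, immutability, and unpacking. All variable names and values are illustrative.

```python
# Tuples can be written with or without parentheses.
point = 3, 4
empty = ()
single = (5,)        # the trailing comma makes this a tuple
not_a_tuple = (5)    # without the comma this is just the int 5

print(type(single), type(not_a_tuple))

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 10
except TypeError as error:
    print('Cannot modify a tuple:', error)

# Tuple unpacking: variables on the left, any iterable on the right.
x, y = point
first, second, third = 'abc'
print(x, y, first, second, third)
```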
Tuple Unpacking (continued): 2. Can be applied for iterating over multiple iterable data structures via for loops. 3. The only requirement is that the number of variables matches the number of items in the data sequence.
zip(): iterates items from two lists in parallel. It can be applied to other iterable types, such as lists or tuples, and it can be used to iterate over more than two data sequences. The given data sequences are allowed to have different lengths; the iteration stops when the shortest sequence runs out of items. For n sequences, zip yields tuples with n elements.

Accessing string: a = string[9], print(a), result: l; Slicing: [start:stop:step]
[:]: take all characters of the string; [2:]: start from 2, stop & step default; [:9]: stop at 8 (9-1), start and step default; [2:9]: start from 2, stop at 8, step default; [::3]: start & stop default, step = 3; [::-1]: reversed, e.g. greetings = "Hello World", print(greetings[::-1]), result: dlroW olleH

Qn: A programming course is taken by both full-time and part-time students. Their grades for this course are calculated separately, as shown in the table below. Write a program that asks students two questions: "Are you a full-time student? Yes/No" and "What is your score?", and then prints out the corresponding grade of the student.
Data Type Conversion: int() converts an object into an int type object, and float() converts an object into a float type object. The same multiplication operation will give different results when the variables are of different data types.
input() generates a prompt that allows users to give an input. The output of the input() function is always a str type object, regardless of the content of the user's input, so in this example, the input temperature is a str and cannot be directly used for numerical calculation. As a result, the str type input must be converted into a numerical value first, so that the mathematical equation for temperature conversion can be applied.

Loops and Iterations: Repeated execution of a set of code is called iteration. One repetition of a code block = one iteration.
Methods of Strings
Qn: Write a program to count the number of letter "a"s (either upper case or lower case) in a given string.

Dictionaries are coded in this format: {'AMZN': 190.90} and consist of a series of comma-separated key:value pairs. Dictionary keys must be hashable (not changeable). They are often strings, but can actually be any of Python's immutable types: boolean, integer, float, tuple, and others.
Basic features of dictionaries: Iterable; the same len() function; indexing via keys; mutable.
Data Arrangement of Dict: e.g. stocks = {'AMZN': 170.40}; Access via keys: stocks[key]; Formatted printing: print('{}: {}'.format(key, value)); Changing values: stocks['AAPL'] += 0.50 #Increase the value of 'AAPL' by 0.50; Adding a new pair of key and value: stocks['ZM'] = 76.30 #Add a new key ZM and new value 76.30
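The zip() and dictionary notes above can be combined in one small sketch. The ticker list and prices are made-up values (only the 'AAPL' update and the 'ZM' pair mirror the sheet); the lists are deliberately of different lengths to show where zip() stops.

```python
tickers = ['AMZN', 'AAPL', 'TSLA']   # hypothetical tickers
prices = [170.40, 180.00]            # one price fewer than tickers

# zip() pairs items in parallel and stops at the shortest sequence,
# so 'TSLA' never gets a price.
stocks = {}
for ticker, price in zip(tickers, prices):
    stocks[ticker] = price

stocks['AAPL'] += 0.50   # change an existing value
stocks['ZM'] = 76.30     # add a new key:value pair

# Unpack each (key, value) pair while iterating.
for key, value in stocks.items():
    print('{}: {}'.format(key, value))
```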
Syntax Rule of 'While' Loop: change status to False / while True:
Qn: The variable words is a list containing the words of a song. Create a dictionary where the keys are all words appearing in the song, and the values are the numbers of appearances of these words.
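One way to approach the word-count question above, assuming a short made-up list in place of the actual song:

```python
# Hypothetical stand-in for the song's words.
words = ['let', 'it', 'be', 'let', 'it', 'be', 'speaking', 'words']

counts = {}
for word in words:
    if word in counts:
        counts[word] += 1   # seen before: increase its count
    else:
        counts[word] = 1    # first appearance: add a new key

print(counts)
```

The if/else could also be replaced with counts[word] = counts.get(word, 0) + 1, which does the same thing in one line.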
count(): count = string.lower().count('a'), print(count)
format(): exam = 85, grade = 'A+', text = 'Your exam marks: {}, your grade: {}'.format(exam, grade), print(text)
replace(): string_us = 'Coffee enhances my modeling skills.', string_uk = string_us.replace('modeling', 'modelling'), Result: Coffee enhances my modelling skills.
Lists: a list is a collection of data (with the same or mixed types), created by placing all the items (elements) inside square brackets []; an empty list is created by empty brackets.
Similarities between list and string: Items in a list can be accessed via the same indexing and slicing system as strings. The + and * operators can be used to concatenate or duplicate items in a list.
Differences between list and string: Lists are mutable → part of the list elements can be modified. Strings are immutable.

Iterating over keys: for item in variable.keys(): (indented) print(item), or for item in variable: (indented) print(item). Access values of dictionary data items: for item in variable: (indented) print('{}: {}'.format(item, variable[item])). Iterating over values: for item in variable.values(): print(item). Iterating over keys and values in parallel: for item in variable.items(): print(item). Unpacking: for name, price in variable.items(): print('{}: {}'.format(name, price))
Qn: Write a program to group these works into two dictionaries: concertos that contains all concertos, and symphonies that contains all symphonies.
Summary of Built-in Data Structures

Syntax Rule for 'For' Loops

Comparison Operators: used to compare two values, and a Boolean value (True/False) is returned.
Membership Operators: used to test if a sequence is present in an object.
Logical operators: used to combine conditional statements.
Note: True and False: False; True or False: True; not True and not False: False; not True or not False: True.
Qn: Given x as a number, create a boolean expression that is True if it contains the digit 3 or 5, otherwise the boolean expression is False.
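For the question about a number containing the digit 3 or 5, one possible boolean expression combines str() with the membership operator in and the logical operator or. The helper name `has_3_or_5` is hypothetical, and the sketch assumes x prints in plain decimal notation (not scientific notation such as 1e-05).

```python
def has_3_or_5(x):
    digits = str(x)                       # e.g. 1532.8 -> '1532.8'
    return '3' in digits or '5' in digits

print(has_3_or_5(1532.8))
print(has_3_or_5(2479))
```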
→ b = 2, d = 2., b//a, result type is int
List Methods:
1. Adding items: append() appends a new item; extend() adds the items of another list to the current list; insert() takes the index of the position where the new item is inserted as its first argument and the data item to be inserted as its second argument.
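The list methods on this sheet (append, extend, insert, remove, pop, index) can be tried together in a short sketch; the list contents are arbitrary.

```python
items = ['a', 'b']
items.append('c')            # ['a', 'b', 'c']
items.extend(['b', 'd'])     # ['a', 'b', 'c', 'b', 'd']
items.insert(1, 'x')         # ['a', 'x', 'b', 'c', 'b', 'd']

items.remove('b')            # removes only the first 'b'
last = items.pop()           # no index given: removes and returns the last item
second = items.pop(1)        # removes and returns the item at index 1

position = items.index('b')  # index of the first remaining 'b'
print(items, last, second, position)
```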
Downloaded by wanzhen li ([email protected])
plt.title('Time Trend of Stock Returns', fontsize=14); plt.legend(loc='upper left', fontsize=12); plt.ylim([-10, 35]) → Specify the y-limits of the axes; plt.xticks(fontsize=10), plt.yticks(fontsize=10) → Specify the font size of axes ticks; plt.grid() → display gridlines

Qn: Find the value of x such that the probability of having a return worse than x is no larger than a given probability alpha = 0.01. [x = norm.ppf(alpha, mean, std)] This is the VaR with $1-\alpha = 0.99$ confidence.
scipy.stats module: from scipy.stats import <object>; <object>.<method>(value, shape_param1, shape_param2, ...); <method> specifies the distribution function; value: value of the random variable. Poisson: P(X=4) = poisson.pmf(4, mu); P(4 <= X <= 10) = poisson.cdf(10, mu) - poisson.cdf(3, mu); plot: x = np.arange(21), pmfs = poisson.pmf(x, mu)
Expected Value and Variances
Variance Properties: $\mathrm{Var}(c) = 0$; $\mathrm{Var}(aX + c) = a^2\,\mathrm{Var}(X)$; $\mathrm{Var}(aX + bY + c) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$.
Random Sampling
Random Number Generator: np.random.normal(5, 2.5, size=1000) // np.random.poisson(20, size=1000)

inplace=True directly changes the given data frame (or series) object; set_index(): converts columns into the row index while the default indexes are discarded; reset_index(): row indexes can be converted into a column of the data frame, recovering the default integer row indexes after performing boolean filtering or dropping missing values; reset_index(drop=True): drops the extra column (original index after dropna()) after resetting the index; Element-wise arithmetic operations (addition, subtraction, multiplication, etc.) on series without using loops.
Vectorised string operations: all string records are processed simultaneously with a single function/method call: .str.lower() / .str.replace(' ', '-') / .str.count('A'); Slicing: .str[:2].astype(int)

Lovely Pandas: Datasets
Dataset: rectangular array, with variables in columns and observations in rows. A variable (or field) is a characteristic of each individual in the dataset.
Types of variables: Numerical data: wages, years of education and potential experience; Categorical data: gender and marriage status [Boolean: True = 1, and False = 0]
Pandas Series Data Structure: pd.Series → represents 1D indexed data; can be created from another 1D array-like object, such as a list or a tuple.
Attributes of series: Each attribute can be accessed via the . operator, without parentheses, example: variable.values, variable.index (access start, stop, step)
Specify index: index = ['Mary', 'Ann', 'John', 'David', 'Frank', 'Ben'], exper = pd.Series([2.0, 22.0, 2.0, 44.0, 7.0, 9.0], index=index) #Specify series indexes
Data type of all data elements is given by the dtype attribute.
Data type conversion: educ_int = educ.astype(int) (or str/other types)

Sweet Numpy: import numpy as np
Array-like objects: range, list, tuple, and dict; Scalars: int, float, bool; each object has only a single value.
1D arrays: shape of the array & number of data items in the array = length
2D arrays: shape of the array = row number and column number; number of data items = row number x column number.
Attributes of Numpy Data Array: ndim: the number of dimensions of the array; shape: the shape of the array, given as a tuple; size: the total number of data items; dtype: the data type of all data items. float64 = float, indicates floating point numbers.
Creating Numpy Array

Functions, Modules and Packages
Functions that return output: float(2) #float returns 2.0 as the output; 'abc'.upper() #String method upper returns 'ABC' as the output; sum([1, 2, 3]) #sum returns 6 as the output; x.pop(2) #pop returns the removed item as the output
Functions that don't return output: print(), append(), insert(), extend()
None & NoneType: If we place a function that returns no output on the right-hand side of an assignment statement, we will receive a special value None as the result. The type of the None object is NoneType.
Create a function
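A minimal sketch of creating a function and of the None / NoneType note on this sheet. The function names `greet` and `square` are made up for illustration.

```python
def greet(name):
    # Prints a message but has no return statement.
    print('Hello, ' + name)

def square(number):
    # Returns an output that can be assigned or reused.
    return number ** 2

result = greet('Mary')          # the call prints, then result is None
print(result, type(result))

numbers = [1, 2, 3]
appended = numbers.append(4)    # append changes the list in place and returns None
print(appended, numbers, square(3))
```

A common slip this illustrates: writing numbers = numbers.append(4) replaces the list with None.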
Functions that take multiple arguments and return more than one output
1. Positional: each argument is passed to the function according to its position in the definition of the function (positional arguments must follow the given order).
2. Keyword: allows passing arguments according to the keywords of the arguments. **In Python programming, all positional arguments must be specified before keyword arguments, otherwise there will be an error message.

Sample variance (or sample standard deviation) in numpy: 1. q6.var(ddof=1); 2. Convert the NumPy array into a series: pd.Series(q6).var()
Simulation for Decision Making:
Sampling Distribution of Sample Mean
Confidence interval & Hypothesis Testing: C.I = Estimate +/- Margin of Error

Indexing and slicing of pandas.Series & data frames: iloc[]: integer-position based; loc[]: label based indexes. **For loc[], the item indexed by stop is included in the selection. For iloc[], the stop index is excluded from the selection.
Data Frame → 2D data structure, where the columns represent different variables (fields), and the observations (records) are represented by rows.
Create Data Frame → data_frame = pd.DataFrame(data_dict)
Attributes of Data Frames: data_frame.columns, data_frame.index, data_frame.dtypes
**iloc[:, 2] → all rows selected; iloc[2:5, :] → all columns selected; Select single column: data_frame['educ']
Changing Dataframe: Change the value in data frame: data_frame.loc[2:3, 'educ'] = 9.0; Create new columns: data_frame.loc[:, 'remarks_2'] = 'none'; Boolean series and Boolean indexing: "and" (&), "or" (|), "not" (~); used to access or modify a subset of rows that satisfy some given conditions, e.g. boolean indexing to make changes on some records of the remarks field.
Read data from files: data = pd.read_csv('wage.csv'); .head(row) to show the column labels and typical values of each variable.
Basics of Descriptive Analytics: Centers, variations and extreme points: data.mean(), data.median(), data.var(), data.std(), data.max()
Counts of each unique value of a given series: data['gender'].value_counts(); Proportion of unique values: value_counts(normalize=True)

np.ones(length or shape) / np.zeros(length or shape): create arrays with all items being ones or zeros.
arange(): a sequence of numbers is generated according to the values of the start, stop, and step arguments. Differences compared to range: 1) the output is a numpy.ndarray type object; 2) numbers in the array can be fractional, but range only allows integers.
Indexing and slicing of arrays is similar to list indexing and slicing. Example: print(array_2d[[0, 2, 1], 1:]) *rows zero, two, one; Example: print(array_2d[-2:, ::-1]) *[::-1], take from the back to the front
Array Vectorised Operation: print(array_2d + 3) #Add one scalar to each element; print(array_2d * 2) #Each element is multiplied by a scalar; print(array_2d + array_2d), print(array_2d * array_2d) #Element-wise operations between two arrays
Broadcasting: the reshape() method is used to create a new array with the same element values but a different shape. Summary of broadcasting rules: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side; If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape; If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
Functions and Array Method
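The three broadcasting rules on this sheet can be checked on small arrays. This is only a sketch; the array names and sizes are arbitrary, and it assumes NumPy is installed.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                    # shape (4,)

# Rule 1: row has fewer dimensions, so its shape is padded on the left: (4,) -> (1, 4).
# Rule 2: (3, 1) and (1, 4) are each stretched along their size-1 axis to (3, 4).
total = column + row
print(total.shape)
print(total)

# Rule 3: sizes 3 and 4 disagree and neither is 1, so NumPy raises a ValueError.
try:
    np.arange(3) + np.arange(4)
except ValueError as error:
    print('Broadcast error:', error)
```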
np.log(3) #Natural logarithm of 3; np.exp(np.arange(1, 3, 0.5)) #Natural exponentials of 1, 1.5, 2, 2.5; np.square(np.arange(3)) #Squares of 0, 1, 2; np.power(2, np.arange(3)) #2 to the power of 0, 1, 2
Aggregation methods: sum()/max()/min()/mean()/var()/std(); var() and std() are the population versions by default
Review of Probability Theory
Rule of Complements: If $A$ is any event, then $P(A) = 1 - P(A^c)$, where $A^c$ is the complement of $A$, i.e. the event that $A$ does not occur.
General Addition Rule: For events $A_1$ and $A_2$, $P(A_1 \text{ or } A_2) = P(A_1) + P(A_2) - P(A_1 \text{ and } A_2)$; mutually exclusive (ME) events: $P(A_1 \text{ and } A_2) = 0$, so $P(A_1 \text{ or } A_2) = P(A_1) + P(A_2)$.
Conditional Probability: $P(A|B) = P(A \text{ and } B) / P(B)$, or $P(A \text{ and } B) = P(A|B)\,P(B)$.
Multiplication Rule for Independent Events: $P(A \text{ and } B) = P(A)\,P(B)$.
PMF: calculates the single probability of each scenario. x: number of trials succeeded; n: total number of trials; p: probability of a successful trial.
CDF: cumulative distribution (probability that X is at most x)

Known population s.d.: z_alpha2 = norm.ppf(1-alpha/2), moe = z_alpha2 * sigma/n**0.5; lower = estimate - moe, upper = estimate + moe
Unknown population s.d.: t_alpha2 = t.ppf(1-alpha/2, n-1), moe = t_alpha2 * s/n**0.5
Sample proportion = p-hat = m/n; pmf = binom.pmf(m, n, p)
Hypotheses: always refer to population parameters, such as $\mu$. The three types of tests also apply to population proportions. 1. Two-tailed test ($\mu$ could vary from $\mu_0$ in two directions) → $H_a: \mu \neq \mu_0$ (constant); 2. Left-tailed test → $H_a: \mu < \mu_0$; 3. Right-tailed test → $H_a: \mu > \mu_0$
Sampling Distribution: Null hypothesis assumes the population mean $\mu = \mu_0$.
1. Population standard deviation = $\sigma$: $z$-test model (standard normal distribution): $z = (\bar{x} - \mu_0) / (\sigma/\sqrt{n})$
2. Unknown population standard deviation, then the sample standard deviation $s$ is used: $t$-test model: $t = (\bar{x} - \mu_0) / (s/\sqrt{n})$ with $n-1$ degrees of freedom

data.corr(): correlation; data.cov(): covariance; data.describe(): obtain the key descriptive measures
Matplotlib: Line Plot (plt.plot), Bar Chart (plt.bar), Scatter Plot (plt.scatter)
Boxplot: "minimum" value: Q1 - 1.5*IQR; first quartile: Q1; second quartile/median: Q2; third quartile: Q3; "maximum" value: Q3 + 1.5*IQR. Code: plt.boxplot(data['wage'], vert=False) #Create a horizontal boxplot; plt.xlabel('Hourly wages ($)', fontsize=14) #Label of the x axis; plt.show()
**Observations or samples outside of the 1.5×IQR ranges are recognized as "outliers", as they are too distant from the other observations.
Histogram: plt.hist(data['wage'], bins=20, color='b', alpha=0.3) #Histogram of wages with 20 bins, colour is blue, opacity is 0.3; plt.xlabel('Hourly wages ($)', fontsize=14), plt.ylabel('Frequency', fontsize=14), plt.show()
Visualize the relations between variables using scatter plots: plt.scatter(data['educ'], data['wage'], color='b', alpha=0.4), plt.xlabel('Education (Years)', fontsize=14), plt.ylabel('Hourly Salary (Dollars)', fontsize=14), plt.show()
Use the intersection of boolean series to make a selection: subset = data.loc[data['married'] & (data['gender'] == 'M')], plt.scatter(subset['educ'], subset['wage'], alpha=0.5, color='b')
Storytelling with Data: Pre-processing with Pandas
Handling missing data: .isnull() returns True if the item in the data frame or series is missing; .notnull() returns True if the item is not missing

3. Scope and namespaces: Namespace: collection of names. It maps names to corresponding objects. The same name can be used if they refer to objects in different scopes. For example, the same name can be used to define variables in different functions. 3 layers of scopes: local, global, and built-in. Local variable: a variable declared inside the function's body. Global variable: a variable declared outside of the function. A global variable can be accessed inside or outside of the function.
Importing Modules: by the keyword import, followed by the module name & as <shortform>. Objects from modules can be called via the syntax module_name.object_name [example: statistics.mean(numbers)]. Better control of namespaces: from statistics import mean as me
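The notes on scopes, imports, and positional versus keyword arguments can be seen working together in a short sketch. The function `add_interest` and the variable names are hypothetical.

```python
from statistics import mean as me   # import a single object under a short name

rate = 0.1   # global variable

def add_interest(amount, years=1):
    # 'total' here is local: it exists only while the function runs.
    total = amount * (1 + rate) ** years   # the global 'rate' is readable here
    return total

total = 50   # a separate, global 'total'; calling the function does not change it

print(add_interest(100))            # positional argument only
print(add_interest(100, years=2))   # positional first, then keyword
print(total, me([1, 2, 3]))
```

Writing add_interest(years=2, 100) instead would be rejected as a syntax error, since the positional argument must come first.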
3. Null hypothesis assumes the population proportion $p = p_0$: we choose the $z$-test model (standard normal distribution): $z = (\hat{p} - p_0) / \sqrt{p_0(1-p_0)/n}$
$P$-value: the probability of getting sample data at least as inconsistent with the null hypothesis (and supportive of the alternative hypothesis) as the sample data actually obtained
*change plt.bar() to plt.plot() for line chart
prob = 1 - binom.cdf(x, n, p); norm.cdf(x, mean, std); for a count starting at 0: CDF(3) = PMF(0) + PMF(1) + PMF(2) + PMF(3), P(X >= 4) = 1 - CDF(3)
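The link between PMF and CDF can be verified by hand for a binomial variable. The two helper functions below are hand-written stand-ins for scipy's binom.pmf and binom.cdf (not the scipy implementation), and n = 10, p = 0.3 are made-up parameters.

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for n independent trials with success probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    # P(X <= k): the sum of the PMF from 0 up to and including k.
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.3   # hypothetical parameters
cdf_3 = binom_cdf(3, n, p)
at_least_4 = 1 - cdf_3   # P(X >= 4), the same as P(X > 3)
print(cdf_3, at_least_4)
```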
Configuration of plot and figures: plt.legend(fontsize= ) → plot legend; plt.plot/bar/scatter(label='xxx') → legend name
dropna() & dropna(inplace=True): drops all rows with any missing values; fillna() & fillna(0, inplace=True): replaces NaN by other values; inplace (for dropna, fillna, set_index and reset_index): inplace=False (default) returns a new data frame (or series) object with the intended changes; inplace=True returns no output (None), but directly changes the given data frame (or series) object.
from scipy.stats import norm // Inverse CDF = PPF = quantile function.
**Given a significance level $\alpha$, we reject the null hypothesis $H_0$ in favour of the alternative hypothesis if the $P$-value is lower than the selected significance level $\alpha$; otherwise, we do not reject the null hypothesis.
Population Proportion: E(p-hat) = np/n = p; Var(p-hat) = p(1-p)/n
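The confidence-interval and hypothesis-testing recipes on this sheet can be traced end to end for the known-sigma case. This sketch swaps in the standard library's statistics.NormalDist for scipy.stats.norm (inv_cdf plays the role of norm.ppf); all input numbers are hypothetical.

```python
from statistics import NormalDist

# Hypothetical inputs: sample mean, known population s.d., sample size.
estimate, sigma, n = 52.0, 10.0, 100
alpha = 0.05
mu0 = 50.0   # population mean assumed by the null hypothesis

std_normal = NormalDist()   # mean 0, s.d. 1

# Confidence interval: estimate +/- margin of error.
z_alpha2 = std_normal.inv_cdf(1 - alpha / 2)   # same role as norm.ppf(1 - alpha/2)
moe = z_alpha2 * sigma / n**0.5
lower, upper = estimate - moe, estimate + moe

# Two-tailed z-test of H0: mu = mu0 against Ha: mu != mu0.
z_stat = (estimate - mu0) / (sigma / n**0.5)
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
reject = p_value < alpha   # reject H0 when the P-value is below alpha

print(lower, upper, z_stat, p_value, reject)
```

With these inputs the two views agree: mu0 lies outside the 95% interval and the P-value falls below alpha, so H0 is rejected.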