CFA Level2 Notes 1Quantitative 2Economics
Quantitative Methods
R1: Multiple Regressions
1. Quantitative Methods
Normal Distribution
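Skewness refers to which side has the longer tail: a left (negative) skew means a long, low tail on the left and the peak on the right.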
Skewness is the third central moment in a return distribution that is often used to measure the
distribution’s departure from symmetry
A distribution can have right (or positive), left (or negative), or zero skewness. A right-skewed distribution is
longer on the right side of its peak, and a left-skewed distribution is longer on the left side of its peak.
In the figure above, from left to right: positive, zero, and negative skewness.
In a negatively skewed distribution: mode > median > mean.
https://www.caisgroup.com/articles/trend-following-and-the-diversification-potential-of-positive-skew
Basic concepts
RSS (regression sum of squares) + SSE (sum of squared errors) = SST (total sum of squares)
$SSE = \sum_i (y_i - \hat{y}_i)^2$. This measures the total deviation of the predictions from the actual values.
SSE is also known as the sum of squared residuals (SSR) or the residual sum of squares.
$RSS = \sum_i (\hat{y}_i - \bar{Y})^2$ measures the variation in the dependent variable that is explained by the independent
variable x.
$SST = \sum_i (y_i - \bar{Y})^2$ measures the total variation in the dependent variable y.
Mean squared regression: $MSR = RSS/k$, with k = 1 for simple linear regression.
Mean squared error: $MSE = SSE/(n - k - 1)$.
Standard Error of Estimate/Regression
$SEE = \sqrt{MSE} = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-k-1}}$
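A minimal sketch of the sums of squares, MSR, MSE, and SEE for a simple linear regression (k = 1), in plain Python. The toy data and the function name `anova_simple` are illustrative assumptions, not part of the notes.

```python
import math

def anova_simple(x, y):
    """Fit y = b0 + b1*x by least squares and return the ANOVA quantities."""
    n, k = len(x), 1  # k = number of independent variables
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form OLS slope and intercept for one regressor
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    mse = sse / (n - k - 1)
    return {"SSE": sse, "RSS": rss, "SST": sst,
            "MSR": rss / k, "MSE": mse, "SEE": math.sqrt(mse)}

# Hypothetical sample
result = anova_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

On this toy data RSS + SSE equals SST (about 3.6 + 2.4 = 6.0), up to floating-point rounding.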
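$SE(\hat{\beta}_j)$, the standard error of the estimated coefficient, can be used to derive the coefficient's confidence interval at different confidence levels.
The t-statistic for a slope coefficient $\beta_j$: $t = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)}$
This is the t-statistic for a hypothesis test whose null hypothesis is that the coefficient equals 0.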
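As a small illustration of the t-statistic and the confidence interval built from the coefficient's standard error, here is a sketch in Python. The function names and the numbers are hypothetical, and the critical t-value is assumed to be looked up from a t-table by the reader.

```python
def t_stat(beta_hat, se, hypothesized=0.0):
    """t-statistic for H0: beta_j = hypothesized (0 by default)."""
    return (beta_hat - hypothesized) / se

def confidence_interval(beta_hat, se, t_critical):
    """Interval beta_hat +/- t_critical * SE for a chosen confidence level."""
    return (beta_hat - t_critical * se, beta_hat + t_critical * se)

# Hypothetical estimate: slope 0.6 with standard error 0.2
t = t_stat(0.6, 0.2)                          # about 3.0
low, high = confidence_interval(0.6, 0.2, 2.0)  # about (0.2, 1.0)
reject_null = abs(t) > 2.0                    # compare with the critical t-value
```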
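If the absolute value of a coefficient's t-statistic is greater than 1.0, including that variable increases adjusted $R^2$; if it is less than 1.0, adjusted $R^2$ decreases.
The p-value for a slope coefficient: p-value = $P(t > t_{\hat{\beta}_j})$. Check whether the test is one-tailed or two-tailed: the one-tailed p-value is half the two-tailed p-value.
The lower the p-value, the stronger the evidence against the null hypothesis; if the p-value < significance level $\alpha$, the null hypothesis can be rejected.
While the adjusted $R^2$ penalizes overfitting, it does not indicate the quality of model fit (as AIC/BIC do), nor does it
indicate statistical significance of the slope coefficients. We can formally evaluate the overall model fit using an F-
test (discussed later).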
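R2 Coefficient of Determination
$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{RSS}{SST} = \frac{SST - SSE}{SST} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}}$
$R^2$ measures how well all the independent variables together explain the dependent variable. The larger the value, the better the model fits.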
Both AIC and BIC evaluate the quality of model fit among competing models for the same dependent variable.
Akaike’s information criterion (AIC) is used if the goal is to have a better forecast, while Schwarz’s
Bayesian information criterion (BIC) is used if the goal is a better goodness of fit.
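$AIC = n \ln\left(\frac{SSE}{n}\right) + 2(k + 1)$
BIC applies a heavier penalty on the number of variables than AIC.
For both AIC and BIC, a lower value indicates a better model.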
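A short sketch comparing two competing models with AIC and BIC. The AIC formula is the one given in these notes; the BIC formula, $n \ln(SSE/n) + \ln(n)(k + 1)$, is the commonly used form and is an assumption here because the notes do not spell it out. The SSE values are made up.

```python
import math

def aic(sse, n, k):
    """AIC = n*ln(SSE/n) + 2(k+1)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(sse, n, k):
    """Assumed form: BIC = n*ln(SSE/n) + ln(n)(k+1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical models fitted to the same n = 100 observations
n = 100
model_a = {"sse": 50.0, "k": 3}
model_b = {"sse": 49.0, "k": 5}  # slightly lower SSE, two more variables

aic_a, aic_b = aic(model_a["sse"], n, model_a["k"]), aic(model_b["sse"], n, model_b["k"])
bic_a, bic_b = bic(model_a["sse"], n, model_a["k"]), bic(model_b["sse"], n, model_b["k"])
# Lower is better: here model A wins under both criteria
```

Because ln(n) exceeds 2 once n is 8 or more, the BIC penalty per parameter is heavier than the AIC penalty for typical sample sizes.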
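F-test: tests whether at least one independent variable has a coefficient significantly different from zero.
Joint (linear) F-test: the null hypothesis is typically that several coefficients equal zero (e.g., b2 = b3 = 0). With q restrictions, $F = \frac{(SSE_R - SSE_U)/q}{SSE_U/(n-k-1)}$, where $SSE_R$ and $SSE_U$ are the SSE of the restricted and unrestricted models. Reject the null hypothesis if F exceeds the critical F-value.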
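A sketch of the joint F-test statistic, assuming the standard form $F = \frac{(SSE_R - SSE_U)/q}{SSE_U/(n-k-1)}$, where q is the number of restrictions and k is the number of independent variables in the unrestricted model. The function name and inputs are hypothetical; the critical F-value would come from an F-table.

```python
def joint_f_stat(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for H0: q slope coefficients are jointly zero."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical: dropping 2 of 4 variables raises SSE from 100 to 120, n = 53
f_value = joint_f_stat(120.0, 100.0, q=2, n=53, k=4)  # about 4.8
```

If the restricted model fits nearly as well as the unrestricted one, the numerator is small and the null hypothesis is not rejected.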
Model Specification
Regression model specification is the selection of the explanatory (independent) variables to be included in a
model, and the transformations (if any) of those explanatory variables.
There are three primary assumption violations that you will encounter:
(1) heteroskedasticity,
(2) serial correlation (i.e., autocorrelation),
(3) multicollinearity.
For each assumption, answer the following questions:
a. What is it?
Heteroskedasticity: the residuals do not have constant variance (homoskedasticity is violated); that is, the variance of the residuals is not the same across all
observations in the sample.
The standard errors are usually unreliable estimates. (For financial data, these standard errors are usually
underestimated, resulting in Type I errors.)
The F-test for the overall model is also unreliable.
$R^2_{resid}$ = $R^2$ from a second regression (of the squared residuals from the first regression) on the
independent variables.
Reject the null hypothesis (no conditional heteroskedasticity) only if the BP test statistic is greater than the corresponding critical value; the conclusion is then that the data exhibit conditional heteroskedasticity.
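A sketch of the Breusch-Pagan decision rule. It assumes the usual form of the statistic, BP = n × $R^2_{resid}$, compared with a chi-square critical value with k degrees of freedom; that formula is not spelled out in these notes, so treat it as an assumption. The inputs are hypothetical.

```python
def bp_statistic(n, r2_resid):
    """Assumed form: BP = n * R^2 of the squared-residual regression."""
    return n * r2_resid

def has_conditional_heteroskedasticity(bp, chi2_critical):
    """Reject H0 (no conditional heteroskedasticity) when BP exceeds the critical value."""
    return bp > chi2_critical

# Hypothetical: n = 60 observations, R^2_resid = 0.10, k = 2 regressors.
# The 5% chi-square critical value with 2 degrees of freedom is 5.991.
bp = bp_statistic(60, 0.10)
reject = has_conditional_heteroskedasticity(bp, 5.991)
```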
Autocorrelation
Serial correlation, also known as autocorrelation, refers to a situation in which regression residual terms are
correlated with one another; that is, not independent. Serial correlation can pose a serious problem with
regressions using time series data.
Residual serial correlation at a single lag can be detected using the Durbin–Watson (DW) statistic. A more
general test (which can accommodate serial correlation at multiple lags) is the Breusch–Godfrey (BG) test.
The DW statistic is designed to detect positive serial correlation of the errors of a regression equation.
If the DW statistic < the critical value, serial correlation is present.
The Breusch–Godfrey (BG) test is for serial correlation.
If the BG test statistic > the critical value, serial correlation is present.
In the presence of serial correlation, if the independent variable is a lagged value of the dependent
variable, then regression coefficient estimates are invalid and coefficients’ standard errors are deflated, so
t-statistics are inflated.
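For instance, when lagged dependent variables are included among the explanatory variables, it is
inappropriate to use this test (the Durbin-Watson test).
Calculation: given a regression model and n sequential observations, (1) compute the sum of squared residuals, SSE; (2) for each observation, subtract the previous observation's residual from its own residual (y' - y_true) and sum the squares of these n-1 differences. Dividing (2) by (1) gives the Durbin-Watson statistic.
A DW statistic of 2.0 indicates zero autocorrelation; a value above 2.0 indicates negative autocorrelation; a value below 2.0
means there is positive autocorrelation.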
https://www.investopedia.com/terms/d/durbin-watson-statistic.asp#:~:text=The Durbin Watson statistic
is,above 2.0 indicates negative autocorrelation.
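The Durbin-Watson calculation (sum of squared differences between consecutive residuals, divided by the sum of squared residuals) can be sketched as follows. The residual series are made up for illustration.

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    diff_sq = sum((residuals[t] - residuals[t - 1]) ** 2
                  for t in range(1, len(residuals)))
    sse = sum(e ** 2 for e in residuals)
    return diff_sq / sse

# Residuals that flip sign every period: negative autocorrelation, DW above 2
dw_alternating = durbin_watson([1, -1, 1, -1])   # 3.0
# Residuals that keep their sign for a while: positive autocorrelation, DW below 2
dw_persistent = durbin_watson([1, 1, -1, -1])    # 1.0
```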
Multicollinearity
Multicollinearity refers to high correlation (collinearity) among two or more independent variables.
Detecting Multicollinearity
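VIF (variance inflation factor): regress independent variable j on the remaining independent variables; $R_j^2$ is the $R^2$ of that regression.
$VIF_j = 1/(1 - R_j^2)$. The larger $R_j^2$ (the more severe the collinearity), the larger $VIF_j$.
$VIF_j > 10$ indicates serious multicollinearity issues requiring correction; $VIF_j > 5$ warrants further investigation.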
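A sketch of the VIF calculation and the two thresholds mentioned above (5 and 10). The function names, labels, and sample $R_j^2$ values are hypothetical.

```python
def vif(r2_j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other regressors."""
    return 1.0 / (1.0 - r2_j)

def vif_flag(vif_j):
    """Classify a VIF using the rule-of-thumb thresholds of 5 and 10."""
    if vif_j > 10:
        return "serious"
    if vif_j > 5:
        return "investigate"
    return "ok"

# Hypothetical R_j^2 values for three regressors
flags = [vif_flag(vif(r2)) for r2 in (0.95, 0.85, 0.50)]
# VIFs are roughly 20, 6.7 and 2
```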
Leverage is a measure of the distance between the j-th observation of independent variable i and its
sample mean.
The sum of the individual leverages for all observations is k + 1. If an observation’s leverage is higher than
three times the average, [3(k + 1) / n], it is considered potentially influential.
Any observation with a studentized residual whose absolute value exceeds the critical t-value is a
potentially influential observation.
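Cook’s distance ($D_i$) is a composite metric (i.e., it takes into account both the leverage and outliers) for
evaluating if a specific observation is influential.
$D_i = \frac{e_i^2}{k \cdot MSE} \left[ \frac{h_{ii}}{(1 - h_{ii})^2} \right]$, where $e_i$ is the residual and $h_{ii}$ the leverage of observation i. Calculating it is generally not required; the value is given directly.
$D_i > 2\sqrt{k/n}$ indicates that the i-th observation is highly likely to be an influential data point. Note that the coefficient 2 multiplies the square root.
k = number of independent variables
n = number of observations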
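A sketch of Cook's distance and the influence threshold, assuming the form $D_i = \frac{e_i^2}{k \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$ and the cutoff $2\sqrt{k/n}$. The function names and inputs are hypothetical.

```python
import math

def cooks_distance(e_i, h_ii, k, mse):
    """D_i from the residual e_i, leverage h_ii, k regressors and the model MSE."""
    return (e_i ** 2 / (k * mse)) * (h_ii / (1 - h_ii) ** 2)

def is_influential(d_i, k, n):
    """Flag the observation when D_i exceeds 2 * sqrt(k / n)."""
    return d_i > 2 * math.sqrt(k / n)

# Hypothetical observation: residual 2, leverage 0.5, k = 2, MSE = 1, n = 50
d = cooks_distance(e_i=2.0, h_ii=0.5, k=2, mse=1.0)  # 4.0
flag = is_influential(d, k=2, n=50)                   # threshold is 0.4
```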
where, t=1,2,3...
Choose: When a variable grows at a constant rate, a log-linear model is most appropriate. When the
variable increases over time by a constant amount, a linear trend model is most appropriate.
Limitations:
One of the assumptions underlying linear regression is that the residuals are uncorrelated with each other.
When the residuals are autocorrelated, this model cannot be used.
AutoRegressive(AR) models
When the dependent variable is regressed against one or more lagged values of itself, the resultant model is called
an autoregressive (AR) model.
A prerequisite for using an AR model is that the time series being modeled is covariance stationary.
$COV(x_1, x_{1+k}) = COV(x_2, x_{2+k})$. The covariance between any two terms of the sequence depends only
on the relative position of the two terms and not on their absolute position. Constant and finite
covariance between values at any given lag.
Note: covariance measures the linear co-movement of two random variables by comparing their deviations from their means (expected values).
The covariance of random variables X and Y: $COV(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
p indicates the number of lagged values that the autoregressive model will include as independent
variables.
Forecasting with an AR model: compute forecasts recursively, one period at a time, namely the chain rule of forecasting.
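The chain rule of forecasting can be sketched for an AR(1) model $x_t = b_0 + b_1 x_{t-1}$: each forecast is fed back in as the input for the next period. The coefficients and starting value are hypothetical.

```python
def ar1_forecast(b0, b1, x_last, steps):
    """Multi-period AR(1) forecasts, each built on the previous forecast."""
    forecasts = []
    x = x_last
    for _ in range(steps):
        x = b0 + b1 * x  # next-period forecast uses the latest value or forecast
        forecasts.append(x)
    return forecasts

# Hypothetical AR(1): b0 = 1, b1 = 0.5, last observed value 4
path = ar1_forecast(1.0, 0.5, 4.0, 3)  # [3.0, 2.5, 2.25]
```

With |b1| < 1 the forecasts drift toward b0 / (1 - b1), which is 2 in this example.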
Autocorrelation & Model Fit
When an AR model is correctly specified, the residual terms will not exhibit serial correlation.
To test whether an AR model is correctly specified: take the residuals for each observation, compute their autocorrelations at lags 1, 2, 3, and test whether the autocorrelations are significantly different from zero using a t-test (a plot of the autocorrelations also helps).
The t-statistic is the estimated autocorrelation divided by the standard error. The standard error is
$1/\sqrt{n}$, where n is the number of observations. The standard error is the same for every lag, so the number of observations can be backed out from the standard error.
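A sketch of the residual autocorrelation test: estimate the autocorrelation at a given lag, divide by the standard error $1/\sqrt{n}$, and compare the result with a critical t-value. The function names and residuals are hypothetical, and the autocorrelation estimator shown is one common sample version.

```python
import math

def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den

def autocorr_t_stat(rho, n):
    """t = rho / (1 / sqrt(n))."""
    return rho / (1 / math.sqrt(n))

def n_from_standard_error(se):
    """Back out the number of observations from SE = 1 / sqrt(n)."""
    return round(1 / se ** 2)

rho1 = autocorrelation([1, -1, 1, -1], lag=1)  # -0.75
t_value = autocorr_t_stat(0.2, 100)            # about 2.0
n_obs = n_from_standard_error(0.1)             # 100
```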
Financial and economic time series inherently exhibit some form of instability or nonstationarity. The financial
environment changes over time, so model coefficients differ across periods.
If a time series follows a random walk process (with a drift), the predicted value is equal to the value in the
previous period plus a random error term.
Random Walk: $x_t = b_0 + b_1 x_{t-1} + \epsilon_t$
$b_1 = 1$: the time series is said to have a unit root and will follow a random walk process.
A random walk (with or without a drift) is not covariance stationary, because with $b_1 = 1$ the mean-reverting level $\frac{b_0}{1 - b_1}$ is not a finite constant (division by zero is undefined).
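A small sketch of the mean-reverting level $b_0 / (1 - b_1)$, which makes the unit-root problem explicit: when $b_1 = 1$ the level is undefined. The function name and the error handling are illustrative choices.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of x_t = b0 + b1 * x_{t-1} + e_t."""
    if b1 == 1:
        # Unit root: the series is a random walk and has no finite mean-reverting level
        raise ValueError("b1 = 1: unit root, mean-reverting level is undefined")
    return b0 / (1 - b1)

level = mean_reverting_level(1.0, 0.5)  # 2.0
```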
Cointegration means that two time series are economically linked (related to the same macro variables) or
follow the same trend and that relationship is not expected to change.
Penalized regressions: regularization to prevent overfitting
Least absolute shrinkage and selection operator (LASSO).
SVM, KNN,CART...
Ensemble and Random Forest
Other Models
Neural Networks
Deep Learning Networks (DLNs)
Reinforcement Learning (RL)
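TF-IDF (term frequency–inverse document frequency)
Term Frequency $TF(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$ is the relative frequency of term t within document d. The numerator is the number of times the term appears in the document, not in the whole dataset; the denominator is the total count of all terms in document d.
Note: the TF calculation is at the document (sentence) level, not at the collection level (all documents).
Note: tokens with very high TF are often stop/common words, and tokens with very low TF are often sparse terms (proper nouns, place names, etc.). Tokens with mid-range TF are important to the meaning of the text.
Inverse Document Frequency $IDF(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$. That is, take the number of documents containing term t divided by the total number of documents N, invert it, and take the log to dampen the scale.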
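The TF and IDF definitions can be sketched on a toy corpus of tokenized documents. The corpus and function names are made up, and the natural log is used here as an assumption about the log base.

```python
import math

def term_frequency(term, doc_tokens):
    """Count of the term in the document divided by the document's total token count."""
    return doc_tokens.count(term) / len(doc_tokens)

def inverse_document_frequency(term, docs):
    """log(N / number of documents containing the term)."""
    n_containing = sum(1 for d in docs if term in d)
    if n_containing == 0:
        raise ValueError("term does not appear in any document")
    return math.log(len(docs) / n_containing)

def tf_idf(term, doc_tokens, docs):
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, docs)

# Hypothetical corpus of three tokenized sentences
docs = [
    ["the", "market", "rallied"],
    ["the", "bond", "market", "fell"],
    ["the", "fed", "cut", "rates"],
]
idf_the = inverse_document_frequency("the", docs)  # 0.0, appears in every document
score_fed = tf_idf("fed", docs[2], docs)           # (1/4) * ln(3)
```

A word that appears in every document gets an IDF of zero, so common words drop out of the TF-IDF score regardless of their TF.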
Comments
The data (text) curation step involves gathering relevant external text data via web services or programs that
extract raw content from a source.
the text preparation and wrangling step
involves cleansing, preprocessing, and converting the data into a structured format usable for model
training.
remove numbers, perform Stemming and lemmatization.
2. Economics
Cross Rate
Bid-Ask Spread: If the quote in the interbank USD/EUR spot market is 1.3649/1.3651
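Rule 2: inverting a currency pair swaps the bid and the ask: $\left(\frac{B}{C}\right)_{bid} = \frac{1}{\left(\frac{C}{B}\right)_{ask}}$, $\left(\frac{B}{C}\right)_{ask} = \frac{1}{\left(\frac{C}{B}\right)_{bid}}$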
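A sketch of the bid/ask inversion rule applied to the USD/EUR quote of 1.3649/1.3651 given above. The function name is hypothetical.

```python
def invert_quote(bid, ask):
    """Invert a two-sided quote: the new bid is 1/ask and the new ask is 1/bid."""
    return 1 / ask, 1 / bid

# USD/EUR 1.3649/1.3651 inverted to a EUR/USD quote
eur_usd_bid, eur_usd_ask = invert_quote(1.3649, 1.3651)
```

The swap keeps the inverted bid below the inverted ask, so the dealer still buys low and sells high.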
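$V_t = \frac{FP_t(T) - FP_0(T)}{1 + R\left(\frac{days}{360}\right)}$ (long position, per unit of base currency)
$V_t$: first compute the value of the forward contract at its expiration T, then discount it back to the current time t.
$FP_0(T)$: the forward price for delivery at T (the price of buying the base currency) agreed when the contract was initiated at time 0.
days: the number of days from the current time t to the contract's expiration T.
R: the annualized interest rate of the price currency.
Notes:
The contract's value is expressed in the price currency; the base currency is the "underlying good".
$FP_0(T)$ is the long side's purchase price in the original contract, and $FP_t(T)$ is the price at which the position can be sold (offset) at time t, so the numerator for a long position is $FP_t(T) - FP_0(T)$; for a short position it is $FP_0(T) - FP_t(T)$.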
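A sketch of the mark-to-market value of a forward contract, assuming $V_t = \frac{FP_t(T) - FP_0(T)}{1 + R \cdot days/360}$ for the long side, scaled by a notional amount in base currency. The `notional` parameter and all numbers are hypothetical.

```python
def forward_value_long(fp_t, fp_0, r, days, notional=1.0):
    """Value (in price currency) of a long forward position at time t."""
    payoff_at_expiry = (fp_t - fp_0) * notional
    return payoff_at_expiry / (1 + r * days / 360)  # discount back to time t

def forward_value_short(fp_t, fp_0, r, days, notional=1.0):
    """The short side's value is the negative of the long side's."""
    return -forward_value_long(fp_t, fp_0, r, days, notional)

# Hypothetical: agreed at 1.3000, current forward 1.3200, 90 days left, R = 4%
v_long = forward_value_long(1.32, 1.30, 0.04, 90, notional=1_000_000)
v_short = forward_value_short(1.32, 1.30, 0.04, 90, notional=1_000_000)
```

Here the long side gains about 19,802 units of the price currency (20,000 discounted at 1% for the 90 days).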
Covered interest rate parity: $F_{x/y} = S_{x/y} \times \frac{1 + r_x}{1 + r_y}$
The word ‘covered’ in the context of covered interest parity means bound by arbitrage; ‘uncovered’ means not bound by arbitrage.
The formula uses nominal interest rates.
Uncovered Interest Rate Parity
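$\frac{E(S_t)}{S_0} = \frac{1 + r_x}{1 + r_y}$
The currency with the higher interest rate is expected to depreciate by a corresponding amount.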
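A sketch of the uncovered interest rate parity relation $E(S_t)/S_0 = (1 + r_x)/(1 + r_y)$, with the spot rate quoted as units of currency x per unit of currency y. The function name and numbers are hypothetical.

```python
def expected_spot_uip(s0, r_x, r_y):
    """Expected future spot rate (x per y) implied by uncovered interest rate parity."""
    return s0 * (1 + r_x) / (1 + r_y)

# Hypothetical: spot 1.25 x per y, r_x = 5%, r_y = 2%
expected_spot = expected_spot_uip(1.25, 0.05, 0.02)  # about 1.2868
```

The expected rate is above the current spot rate: more units of x are needed per unit of y, so the higher-yielding currency x is expected to depreciate.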
If uncovered interest rate parity (and covered interest parity) holds, the forward rate is an unbiased
predictor of the future spot rate (i.e., forward rate parity holds).
One of the assumptions of uncovered interest rate parity is that investors are risk neutral.
Forward rate parity means $F = E(S_t)$, i.e., the forward premium or discount $\frac{F - S_0}{S_0} = E(\%\Delta S)$.
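Absolute PPP: the exchange rate between two countries equals the ratio of their price levels (CPIs).
The law of one price states that identical goods should have the same price in all locations.
$S_{A/B} = CPI_A / CPI_B$, where CPI is the weighted price of a basket of goods.
Relative PPP: the percentage change in the exchange rate equals the difference between the two countries' inflation rates.
$\%\Delta S_{A/B} \approx Inflation_A - Inflation_B$
If country A's inflation rate is relatively higher, its currency depreciates against country B's currency. If both countries have the same inflation rate this year, the exchange rate is unchanged.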
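A sketch of relative PPP for a rate quoted as units of currency A per unit of currency B. The first function uses the approximation from these notes; the second uses the exact compounding form $S_1 = S_0 (1 + \pi_A)/(1 + \pi_B)$, which is added here for comparison. Names and numbers are hypothetical.

```python
def relative_ppp_approx(s0, inflation_a, inflation_b):
    """Approximation: %change in S(A/B) = inflation_A - inflation_B."""
    return s0 * (1 + inflation_a - inflation_b)

def relative_ppp_exact(s0, inflation_a, inflation_b):
    """Exact one-period form: S1 = S0 * (1 + inflation_A) / (1 + inflation_B)."""
    return s0 * (1 + inflation_a) / (1 + inflation_b)

# Hypothetical: spot 2.00 A per B, inflation 5% in A and 2% in B
s_approx = relative_ppp_approx(2.0, 0.05, 0.02)  # about 2.06
s_exact = relative_ppp_exact(2.0, 0.05, 0.02)    # about 2.0588
```

Both versions show currency A depreciating (more A per B), and they coincide when the two inflation rates are equal.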
Relative PPP states that changes in exchange rates should exactly offset the price effects of any
inflation differential between two countries.
$R_{real,A} \approx R_{real,B}$: under real interest rate parity, real interest rates are assumed to converge across
different markets.
When Uncovered Interest Rate Parity and Purchasing Power Parity hold together, they illuminate a relationship
named real interest rate parity, which suggests that expected real interest rates represent expected adjustments
in the real exchange rate. This relationship generally holds strongly over longer terms and among emerging
market countries.
Real interest rate parity: exchange rates move with real interest rates.
Balance-of-payments (BOP) accounting is a method used to keep track of transactions between a country
and its international trading partners. It includes {government, consumer, and business} transactions. The
BOP accounts reflect all payments and liabilities to foreigners as well as all payments and obligations
received from foreigners.
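The balance of payments (BOP) should sum to zero, consistent with double-entry bookkeeping: the current account on one side and the capital and financial accounts on the other should offset each other.
current account: approximately exports (X) - imports (M)
A surplus means net exports are positive; a deficit means net imports are positive.
financial (capital) account: looks at assets between countries.
financial account = inflows - outflows, i.e., the net investment flows into the country.
outflows: the home country pays to buy and hold foreign assets. Domestic purchases of foreign assets ==> Government-
owned assets abroad
inflows: foreigners pay to buy and hold domestic assets. Foreign purchases of local assets ==> Foreign-owned
assets in the country
The money a country earns from net exports must be invested in foreign assets (corresponding to outflows); it is simply measured in domestic currency.
Value > 0 (inflows > outflows): foreigners are net holders of domestic assets, and the country uses foreign funds to meet domestic needs. Thus, the
economy is using world savings to meet its local investment and consumption demands. It is a net
debtor to the rest of the world.
Value < 0 (inflows < outflows): the country is a net holder of foreign assets and provides funds abroad. That indicates the economy of this
country is a net creditor, providing funds to the world.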
financial account (also known as the capital account) measures the flow of funds for debt and equity
investment into and out of the country.
current account + financial account = 0
Current Account Influences: Current account deficits lead to a depreciation of domestic currency via a
variety of mechanisms:
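Flow supply/demand mechanism: a current account deficit causes the domestic currency to depreciate; the depreciation makes imports more expensive and exports relatively cheaper, which pushes the current account from deficit back toward balance.
Portfolio balance mechanism: a country's current account surpluses usually correspond to capital account deficits
(typically reflecting investment in several other countries). If the investor countries decide to rebalance their
investment portfolios, it can have a significant negative impact on the value of those investee country
currencies.
Debt sustainability mechanism: a country's current account deficit corresponds to a capital account surplus (by
borrowing from abroad, i.e., foreign investment in the country that it owes as external debt). If the deficit or external debt becomes too large relative to GDP, foreign investors question the sustainability of its debt servicing, which leads to depreciation of the country's currency. The depreciation reduces imports, increases exports, and narrows both the current account deficit and the capital account surplus until debt returns to a normal level.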
Capital Account Influences:
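In 1999, MIT professor Paul Krugman, building on the Mundell-Fleming model and an empirical analysis of the Asian financial crisis, proposed the "impossible triangle" (impossible trinity theory).
independent monetary policy: whether the domestic interest rate must move in line with world interest rates; if it does not, arbitrage opportunities arise (triggering capital flows or exchange rate instability).
China: keeps monetary policy independence and exchange rate stability, with capital controls (no free capital flows).
Hong Kong: keeps free capital flows and exchange rate stability, without an independent monetary policy.
European countries: stable exchange rate and free capital flows, without an independent monetary policy.
Monetary models consider only the effect of monetary policy on exchange rates, not the effect of fiscal policy.
Pure monetary model: PPP (purchasing power parity) holds at any point in time and output is held constant.
If the money supply increases by x%, prices rise by x% and the currency depreciates by x%.
Does not take into account expectations about future monetary expansion or contraction; that is, a future increase in the money supply does not affect the current exchange rate.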
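Dornbusch overshooting model
This model assumes that prices are sticky (inflexible) in the short term and, hence, do not immediately
reflect changes in monetary policy (in other words, PPP does not hold in the short term).
According to the model, under an expansionary monetary policy the domestic currency depreciates in the short term by more than the x% implied by an x% rise in prices, because capital outflows add to the depreciation.
Portfolio balance approach takes a long-term view and evaluates the effects of a sustained fiscal deficit or
surplus on currency values (it does not consider the effect of monetary policy on exchange rates).
A sustained expansionary fiscal policy raises government borrowing and interest rates in the short term, attracting foreign investment and increasing demand for the domestic currency, so the currency appreciates. In the long term, however, the government must repay its debt, either by tightening fiscal policy (using tax revenue) or by printing money (expansionary monetary policy); both lead to depreciation.
A government that wants to intervene in the exchange rate generally needs sufficient foreign exchange reserves.
Warning signs of a currency crisis: falling exports, a fixed exchange rate, declining foreign exchange reserves, rising inflation, rapid money supply growth, and a banking crisis.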
Therefore, higher potential GDP growth implies higher real interest rates and higher real asset returns in
general.
$\Delta Y/Y = \Delta A/A + \alpha \, \Delta K/K + (1 - \alpha) \, \Delta L/L$
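A sketch of the growth accounting relation $\Delta Y/Y = \Delta A/A + \alpha \, \Delta K/K + (1 - \alpha) \, \Delta L/L$, where A is taken to be total factor productivity and $\alpha$ the share of capital in output. The function name and input growth rates are hypothetical.

```python
def output_growth(tfp_growth, alpha, capital_growth, labor_growth):
    """Growth accounting: TFP growth plus share-weighted capital and labor growth."""
    return tfp_growth + alpha * capital_growth + (1 - alpha) * labor_growth

# Hypothetical: TFP +1%, capital +4%, labor +1%, capital share 0.3
growth = output_growth(0.01, 0.3, 0.04, 0.01)  # about 0.029, i.e. 2.9%
```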
Definitions:
Quantity of labor = the size of the labor force * average hours worked.
Labor force = the number of working age (ages 16–64) people available to work, both employed and
unemployed.
Demographics (age distribution, etc.)
Labor force participation = Labor force / working age population
Immigration
Average hours worked
Investing in human capital, physical capital, technologies, and public infrastructure can increase economic
growth.
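1. Assumption: when per capita income exceeds the subsistence level, people have more children and the population grows.
2. Growth in per capita GDP therefore cannot be permanent: once per capita GDP rises above the subsistence level, a population explosion follows.
3. Classical growth theory is not supported by empirical evidence.
Economic growth depends on the current level of technology and the capital-to-labor ratio.
The theory assumes that capital investment will expand as technology
improves.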
2. Based on the Cobb-Douglas function discussed earlier, neoclassical growth theory states that:
$g^* = \frac{\theta}{1 - \alpha}$
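A one-line sketch of $g^* = \theta / (1 - \alpha)$, reading $\theta$ as the growth rate of total factor productivity and $\alpha$ as capital's share of output (an interpretation assumed here, since the notes do not define the symbols). The inputs are hypothetical.

```python
def steady_state_growth(theta, alpha):
    """g* = theta / (1 - alpha)."""
    return theta / (1 - alpha)

# Hypothetical: TFP growth 1.4%, capital share 0.3
g_star = steady_state_growth(0.014, 0.3)  # about 0.02, i.e. 2%
```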
Regulating commerce: the government issues laws and regulations governing business and trade.
Regulating financial markets: mainly supervises securities markets and financial institutions.
Regulation of Financial Institutions
Prudential supervision refers to the monitoring and regulation of financial institutions to reduce
system-wide risks and to protect investors.
Antitrust Regulation
Regulators
government agencies
independent regulators: are given recognition by government agencies and have power to make rules and
enforce them. eg. self-regulating organizations (SROs) are also independently funded and, as such, are
politically independent.
Regulatory Interdependencies
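Regulatory capture: the regulator adopts views of the entities it regulates, or some regulations end up protecting the interests of the regulated entities (e.g., restricting new entrants or protecting incumbents' profits).
Regulatory capture is more likely to be a concern with self-regulating organizations (SROs) than with
government agencies. For example, regulatory capture is often cited as a concern with the
commercialization of financial exchanges.
Regulatory competition: regulators in different jurisdictions compete to provide the most business-friendly
regulatory environment, in order to attract foreign companies to their jurisdiction.
Regulatory arbitrage: exploiting differences in regulation across jurisdictions (e.g., pollution rules) or differences between the interpretation of a regulation's wording and the economic substance of an activity within one jurisdiction.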
Price mechanisms
Restricting or requiring certain activities.
Provision of public goods or financing of private projects
Comments
labor productivity = output per worker
capital-to-labor ratio = capital deepening
Covered interest parity is forced by arbitrage, which is not the case for uncovered interest rate parity. If the
forward rate is equal to the expected future spot rate, we say that the forward rate is an unbiased predictor of
the future spot rate: F = E(S1). In this special case, given that covered interest parity holds, uncovered interest
parity would also hold (and vice versa). In other words, if uncovered interest rate parity (and covered interest
parity) holds, the forward rate is an unbiased predictor of the future spot rate (i.e., forward rate parity holds).
building a group of auto and textile factories in the southern states
Low per capita GDP suggests that India may lack sufficient industrial and financial infrastructure to support
some types of industries.
Blackout periods are established by companies in response to concerns about insider trading.
Sengupta’s endorsement of an exchange that trades “pollution rights” is consistent with the Coase
theorem. The Coase theorem states that if an externality can be traded and there are no transaction costs, then
the allocation of property rights will be efficient and the resource allocation will not depend on the initial
assignment of property rights.