时间序列分析理论及R代码实现

第一章: 绪论
第二章: 基本概念
第三章：趋势
第四章: 平稳时间序列模型
第五章: 非平稳时间序列模型
第六章: 模型识别
第七章: 参数估计
第八章：模型诊断
第九章: 预测

第一章: 绪论

什么是时间序列？
和统计里的样本有什么区别？
研究目的？研究方法？
统计指标？分析软件？
时间序列分析步骤？
常用的ARIMA模型？

例1. 加利福尼亚州洛杉矶地区100多年来的年降水量时间序列图。

data(larain)
summary(larain)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.080   9.675  14.140  14.888  18.400  40.290

plot(larain, ylab='Inches', xlab='Year', type='o', main='图1-1 洛杉矶年降水量时序图')

plot(y=larain,x=zlag(larain),ylab='Inches',xlab='Previous Year Inches', main='图1-2 洛杉矶当年降水量与去年降水量散点图')

例2. 某化工过程中颜色属性的时间序列图。

data(color)
summary(color)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   63.00   70.00   76.00   74.89   79.00   87.00

plot(color, ylab='Color Property', xlab='Batch', type='l', main='图1-3. 某化工过程中颜色属性的时间序列图')
points(color, col='blue')

plot(y=color, x=zlag(color), ylab='Color Property', xlab='Prevous Batch Color Property', main='图1-4.当前颜色值与前期颜色值的散点图')

例3. 加拿大野兔丰度的时间序列图。

data(hare)
summary(hare)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   20.00   27.00   39.55   65.00   95.00

plot(hare, ylab='Abundance', xlab='Year', type='l', main='图1-5. 加拿大野兔丰度的时间序列图')
points(hare, col='blue')

plot(y=hare, x=zlag(hare), ylab='Abundance', xlab='Previous Year Abundance', main='图1-6.当年与上一年野兔丰度的散点图')

例4. 艾奥瓦州迪比克市月平均气温的时间序列图。

data(tempdub)
summary(tempdub)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40   27.32   48.60   46.27   63.12   74.00

plot(tempdub, ylab='Temperature', type='l', main='图1-7. 艾奥瓦州迪比克市月平均气温的时间序列图')
points(tempdub, col='blue')

plot(y=tempdub, x=zlag(tempdub,d=12), ylab='Temperature', xlab='Previous Year Temperature', main='图1-7’.当年与上一年月平均气温的散点图')

例5. 滤油器月销售量的时间序列图。

data(oilfilters)
summary(oilfilters)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1572    2314    3262    3388    4488    5862

plot(oilfilters, type='l', ylab='Sales', main='图1-8. 滤油器月销售量的时间序列图')
points(oilfilters, col='blue')

plot(y=oilfilters, x=zlag(oilfilters,d=12), ylab='Sales', xlab='Previous Year Sales', main='图1-8’.当年与上一年月销售量的散点图')

plot(oilfilters,type='l',ylab='Sales', main='图1-9. 以特殊符号绘制的滤油器月销售量')
points(y=oilfilters,x=time(oilfilters), pch=as.vector(season(oilfilters)))

习题1.（on P.7）

Question 3~5.

###  rnorm(n), rchisq(n,df=2), rt(n,df=5)
n=48;  
set.seed(12345); tsnorm1 = ts(rnorm(n),freq=1,start=1)
set.seed(1234); tsnorm2 = ts(rnorm(n),freq=1,start=1)
set.seed(123); tsnorm3 = ts(rnorm(n),freq=1,start=1)
set.seed(12); tsnorm4 = ts(rnorm(n),freq=1,start=1)
opar=par(mfrow=c(2,2)) 
plot(tsnorm1,type='o'); abline(h=0,col='red')
plot(tsnorm2,type='o'); abline(h=0,col='red')
plot(tsnorm3,type='o'); abline(h=0,col='red')
plot(tsnorm4,type='o'); abline(h=0,col='red')

par(opar)

Question 6.

data(tempdub)
#op <- par(bg = "light blue")
plot(tempdub,ylab='Temperature',type='l',col = "blue")
Month=c("J","A","S","O","N","D","J","F","M","A","M","J")
points(tempdub,pch=Month,col='red')

#points(tempdub,pch=as.vector(season(tempdub)),col='red')
#par(op)

第二章: 基本概念

均值函数，方差函数 \[ \mu_t=E(X_t), \sigma_t^2=Var(X_t). \]
自协方差函数

\[ \gamma_{t,s} = Cov(X_t,X_s) = E(X_t,X_s)-\mu_t\mu_s. \]

自相关函数

\[ \rho_{t,s} = Corr(X_t,X_s) =\frac {\gamma_{t,s}} {\sqrt{\gamma_{t,t} \gamma_{s,s}}}. \]

严(强)平稳性

任意有限维分布都与初始时间点无关，只与时间间隔相关。如果期望和方差都存在的话，那么我们就有 \[ X_t \overset{d} {=} X_s. \qquad \mu_t=\mu_s\equiv\mu, \qquad \sigma_t^2=\sigma_s^2\equiv\sigma^2\] \[ \left( X_{t_1},X_{t_2}\right) \overset{d} {=} \left(X_{t_1+k},X_{t_2+k}\right). \qquad \gamma_{t,s}=\gamma_{0,\left|t-s\right|}\] \[ \left(X_{t_1},X_{t_2},\cdots,X_{t_n}\right) \overset{d} {=} \left(X_{t_1+k},X_{t_2+k},\cdots,X_{t_n+k}\right).\]

Let \(\quad \gamma_k=Cov(X_t,X_{t+k}), \quad \text{then} \quad \rho_k=Corr(X_t,X_{t+k})=\frac {\gamma_{t,t+k}}{\sqrt{\gamma_{t,t}\gamma_{t+k,t+k}}}=\frac{\gamma_k}{\gamma_0}.\)

宽(弱)平稳性

均值函数和方差函数都是常数：\(\qquad \mu_t\equiv\mu, \qquad \sigma_t^2\equiv\sigma^2\)

协方差仅依赖于时间的滞后：\(\qquad \gamma_k=\gamma_{t,t+k}=\gamma_{0,k}.\)

Let \(\quad \gamma_k=Cov(X_t,X_{t+k}), \quad \text{then} \quad \rho_k=Corr(X_t,X_{t+k})=\frac {\gamma_{t,t+k}}{\sqrt{\gamma_{t,t}\gamma_{t+k,t+k}}}=\frac{\gamma_k}{\gamma_0}.\)

例1. 白噪声序列 {\(e_t, i.i.d.\)}, 平稳的 \[\begin{align} \gamma_k=\begin{cases} \sigma_e^2, &\text{if } k=0\\ 0, &\text{if } k>0 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0, &\text{if } k>0\end{cases} \end{align}\]

n=100;
opar=par(mfrow=c(2,2)) 
plot(rnorm(n),xlab='time',type='o');abline(h=0,col='red')
plot(rchisq(n,2),xlab='time',type='o');abline(h=2,col='red')
plot(rt(n,5),xlab='time',type='o');abline(h=0,col='red')
plot(rexp(n),xlab='time',type='o');abline(h=1,col='red')

par(opar)

例2. 滑动平均序列 {\(X_t=\frac{e_t+e_{t-1}}{2}\)}, 平稳的 \[\begin{align} \gamma_k=\begin{cases} 0.5\cdot\sigma_e^2, &\text{if } k=0\\ 0.25\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0.5, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]

n=100;
opar=par(mfrow=c(2,2)) 
tsnorm=rnorm(n)
tsMA1=ts((tsnorm+zlag(tsnorm))/2.0,freq=1,start=1)
plot(tsMA1,type='o',ylab='MA of N(0,1)'); 
abline(h=0,col='red')

tschisq=rchisq(n,2)
tsMA2=ts((tschisq+zlag(tschisq))/2.0,freq=1,start=1)
plot(tsMA2,type='o',ylab='MA of chisq(2)'); 
abline(h=2,col='red')

tst=rt(n,5)
tsMA3=ts((tst+zlag(tst))/2.0,freq=1,start=1)
plot(tsMA3,type='o',ylab='MA of t(5)'); 
abline(h=0,col='red')

tsexp=rexp(n)
tsMA4=ts((tsexp+zlag(tsexp))/2.0,freq=1,start=1)
plot(tsMA4,type='o',ylab='MA of exp(1)'); 
abline(h=1,col='red')

par(opar)

例3. 随机游动序列 {\(Y_t=e_1+e_2+\cdots+e_t\)}, 非平稳的 \[\mu_t=t\cdot\mu, \qquad\qquad \sigma_t^2=t\cdot\sigma_e^2.\]

\[\begin{align} \gamma_{t,s}&=Cov(e_1+e_2+\cdots+e_t,e_1+e_2+\cdots+e_s)\\ &=Var(e_1+e_2+\cdots+e_{min(t,s)})\\ &=min(t,s)\cdot \sigma_e^2 \end{align}\]

\[\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{min(t,s)\cdot \sigma_e^2}{\sqrt{t \sigma_e^2 \cdot s\sigma_e^2}}=\frac{\sqrt{min(t,s)}}{\sqrt{max(t,s)}}\]

data(rwalk)
n<-length(rwalk)
plot(rwalk,type='o',ylab='Random Walk', main = '随机游动的时间序列图1')

###生成随机游动过程方法1
Rwalk <- vector()
Rwalk[1] <- rnorm(1)
for (i in 2:n){
  set.seed(123+i*456)
  Rwalk[i] <- Rwalk[i-1] + rnorm(1)
}
ts_Rwalk <- ts(Rwalk,freq=1,start=1)

plot(ts_Rwalk,type='l',ylab='Random Walk',xlab='Time',main='随机游动的时间序列图2')
points(ts_Rwalk,col='blue')

###生成随机游动过程方法2：cumsum(rnorm(n))
opar=par(mfrow=c(2,2))  
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rchisq(n,2)),type='o',xlab='Time');
plot(cumsum(rt(n,5)),type='o',xlab='Time');
plot(cumsum(rexp(n,1)),type='o',xlab='Time');

par(opar)

第三章：趋势

3.1 随机趋势

受随机扰动的影响，不随时间衰减；
在不同的模拟中，可能展现完全不同的趋势。

n=60;
opar=par(mfrow=c(2,2))  
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');

par(opar)

3.2 确定趋势

例4. 艾奥瓦州迪比克市月平均气温的时间序列图。

data(tempdub)
plot(tempdub, ylab='Temperature', type='l', main='图1-7. 艾奥瓦州迪比克市月平均气温的时间序列图')
points(tempdub, col='blue')

均值模型： \(Y_t=\mu_t+X_t\)，其中\(E(X_t)=0.\)

随机扰动\(X_t\) 是零均值的平稳序列；
均值过程\(\mu_t\)随着时间\(t\)的变化，表现出特定的变化模式：

\[ \mu_t=\begin{cases} \mu_0, & \text{常函数} \\ \beta_0+\beta_1 t, & \text{线性函数} \\ \beta_0+\beta_1 t+\beta_2 t^2, & \text{二次函数} \\ \cdots， & \text{多项式函数} \\ \mu_{t-T}, & \text{周期函数} \\ \end{cases}\]

3.3 均值函数为常数

\[\quad\hat{\mu}=\bar{Y}, \qquad E(\hat{\mu})=\mu_0, \qquad \text{无偏估计量}\]

\({X_t}\)是平稳的时间序列，自协方差函数为\(\gamma_k\), 自相关函数为\(\rho_k\).

\[Var(\hat{\mu})=\frac{\gamma_0}{n} [1+2\sum_{k=1}^{n-1}(1-\frac{k}{n})\rho_k].\]

例1’. 白噪声序列 {\(X_t=e_t, i.i.d.\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}.\)

\[\begin{gathered} \gamma_k=\begin{cases} \sigma_e^2, &\text{if } k=0\\ 0, &\text{if } k>0 \end{cases} \qquad\qquad\qquad \rho_k=\begin{cases} 1, &\text{if } k=0\\ 0, &\text{if } k>0\end{cases} \end{gathered}\]

例2’. 滑动平均序列{\(X_t=\frac{1}{2}e_t+\frac{1}{2}e_{t-1}\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}[1+\frac{n-1}{n}]=\frac{\gamma_0}{n}(\frac{2n-1}{n})\simeq 2\frac{\gamma_0}{n}.\)

\[\begin{align} \gamma_k=\begin{cases} 0.5\cdot\sigma_e^2, &\text{if } k=0\\ 0.25\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0.5, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]

\(\qquad\) 滑动平均序列 {\(X_t=e_t-\frac{1}{2}e_{t-1}\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}[1-0.8(\frac{n-1}{n})]=\frac{\gamma_0}{n}(\frac{0.2n+0.8}{n})\simeq 0.2\frac{\gamma_0}{n}.\)

\[\begin{align} \gamma_k=\begin{cases} 1.25\cdot\sigma_e^2, &\text{if } k=0\\ -0.5\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ -0.4, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]

例5. 一般的平稳序列，随着滞后的增加，自相关函数迅速衰减，\(\sum_{k=0}^{+\infty}|\rho_k| < \infty.\) 当\(n\)充分大的时候， \(Var(\hat{\mu}) \simeq \frac{\gamma_0}{n}[\sum_{k=0}^{+\infty}\rho_k].\)

\(\qquad\) 例如，对于所有的整数\(k\), \(\rho_k=\phi^{|k|}\), \(\phi\in(-1, +1).\) 那么当\(n\)充分大的时候，\(Var(\hat{\mu}) \simeq \frac{\gamma_0}{n}[1+\frac{2\phi}{1-\phi}]=\frac{(1+\phi)\gamma_0}{(1-\phi)n}.\)

\[ \qquad\]

\({X_t}\)是非平稳的时间序列，自协方差函数为\(\gamma_{t,s}\), 自相关函数为\(\rho_t,s\).

\[Var(\hat\mu)=\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n Cov(X_i,X_j)=\frac{1}{n^2}[\sum_{i=1}^n Var(X_i)+2\sum_{i=2}^n\sum_{j=1}^{i-1} \gamma_{i,j}]\]

例3’. 随机游动序列 {\(X_t=e_1+e_2+\cdots+e_t\)}, 非平稳的, \(\mu_t=t\cdot\mu, \quad \sigma_t^2=t\cdot\sigma_e^2.\)

\[\begin{align} \gamma_{t,s}&=Cov(e_1+e_2+\cdots+e_t,e_1+e_2+\cdots+e_s)\\ &=Var(e_1+e_2+\cdots+e_{min(t,s)})\\ &=min(t,s)\cdot \sigma_e^2 \end{align}\]

\[\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{min(t,s)\cdot \sigma_e^2}{\sqrt{t \sigma_e^2 \cdot s\sigma_e^2}}=\frac{\sqrt{min(t,s)}}{\sqrt{max(t,s)}}\]

\[\begin{align} Var(\hat{\mu})&=\frac{1}{n^2}\left [\sum_{i=1}^n i\sigma_e^2+2\sum_{i=2}^n\sum_{j=1}^{i-1} j\sigma_e^2\right ]\\ &=\frac{\sigma_e^2}{n^2}\left [\sum_{i=1}^n i+ 2\sum_{j=1}^{n-1}\sum_{i=j+1}^n j\right ]\\ &=\frac{\sigma_e^2}{n^2}\left [\sum_{i=1}^n i+ 2\sum_{j=1}^{n-1} (n-j) j\right ] \\ &=\frac{\sigma_e^2}{n^2}\left [(1+2n)\sum_{i=1}^n i- 2\sum_{j=1}^{n} j^2\right ] \\ &=\frac{\sigma_e^2}{n^2}\left [(1+2n)\frac{n(n+1)}{2}- 2 \frac{n(n+1)(2n+1)}{6}\right ] \\ &=\sigma_e^2 (2n+1) \frac{(n+1)}{6n} \end{align}\]

3.4 回归方法

\(\qquad\) {\(X_t\)}满足Gauss-Markov假设(零均值，等方差，不相关).

时间的线性趋势 \(\mu_t=\beta_0+\beta_1 t\)，

\(\qquad\) 最小二乘法估计：最小化 \(Q(\beta_0,\beta_1)=\sum_{t=1}^n [Y_t-(\beta_0+\beta_1 t)]^2\), 令偏导数为0，求解。

\[\begin{align} \hat{\beta_1} &= \frac{\sum_{t=1}^n t(Y_t-\bar{Y})}{\sum_{t=1}^n (t-\bar{t})^2}=\frac{\sum_{t=1}^n (t-\bar{t})(Y_t-\bar{Y})}{\sum_{t=1}^n (t-\bar{t})^2}\\ \hat{\beta_0} &= \bar{Y}-\hat{\beta_1}\bar{t} \end{align}\]

时间的二次函数 \(\mu_t=\beta_0+\beta_1 t +\beta_2 t^2\)

data(wages)
plot(wages,type='l',ylab='Wages',xlab='Time',main='Monthly wages')
points(wages,col='blue')

\(\qquad\) 时间的一次函数拟合

lm_wages=lm(wages~time(wages))
summary(lm_wages)

## 
## Call:
## lm(formula = wages ~ time(wages))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.23828 -0.04981  0.01942  0.05845  0.13136 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.490e+02  1.115e+01  -49.24   <2e-16 ***
## time(wages)  2.811e-01  5.618e-03   50.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08257 on 70 degrees of freedom
## Multiple R-squared:  0.9728, Adjusted R-squared:  0.9724 
## F-statistic:  2503 on 1 and 70 DF,  p-value: < 2.2e-16

plot(wages,type='l',ylab='Wages',xlab='Time',main='一次时间函数趋势')
points(wages,col='blue')
abline(lm_wages,col='red')

最小二乘估计的斜率\(\hat{\beta}_1=0.1341\) 和截距 \(\hat{\beta}_0=-1.008\). 图表3-2展示了随机游动并叠加了最小二乘回归趋势线。

\(\qquad\) 二次时间函数拟合

lm2_wages=lm(wages~time(wages)+I(time(wages)^2))
summary(lm2_wages)

## 
## Call:
## lm(formula = wages ~ time(wages) + I(time(wages)^2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.148318 -0.041440  0.001563  0.050089  0.139839 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -8.495e+04  1.019e+04  -8.336 4.87e-12 ***
## time(wages)       8.534e+01  1.027e+01   8.309 5.44e-12 ***
## I(time(wages)^2) -2.143e-02  2.588e-03  -8.282 6.10e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05889 on 69 degrees of freedom
## Multiple R-squared:  0.9864, Adjusted R-squared:  0.986 
## F-statistic:  2494 on 2 and 69 DF,  p-value: < 2.2e-16

xfit <- time(wages)
yfit <- fitted(lm2_wages)
plot(wages,type='l',ylab='Wages',xlab='Time',main='二次时间函数趋势')
points(wages,col='blue')
lines(as.vector(xfit),as.vector(yfit),col='red')

周期性或季节性趋势

data(tempdub)
month.=season(tempdub) 
model2=lm(tempdub~month.-1) # -1 removes the intercept term 
summary(model2)

## 
## Call:
## lm(formula = tempdub ~ month. - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2750 -2.2479  0.1125  1.8896  9.8250 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## month.January     16.608      0.987   16.83   <2e-16 ***
## month.February    20.650      0.987   20.92   <2e-16 ***
## month.March       32.475      0.987   32.90   <2e-16 ***
## month.April       46.525      0.987   47.14   <2e-16 ***
## month.May         58.092      0.987   58.86   <2e-16 ***
## month.June        67.500      0.987   68.39   <2e-16 ***
## month.July        71.717      0.987   72.66   <2e-16 ***
## month.August      69.333      0.987   70.25   <2e-16 ***
## month.September   61.025      0.987   61.83   <2e-16 ***
## month.October     50.975      0.987   51.65   <2e-16 ***
## month.November    36.650      0.987   37.13   <2e-16 ***
## month.December    23.642      0.987   23.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.419 on 132 degrees of freedom
## Multiple R-squared:  0.9957, Adjusted R-squared:  0.9953 
## F-statistic:  2569 on 12 and 132 DF,  p-value: < 2.2e-16

model3=lm(tempdub~month.) # intercept is automatically included so one month (Jan) is dropped
summary(model3)

## 
## Call:
## lm(formula = tempdub ~ month.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2750 -2.2479  0.1125  1.8896  9.8250 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       16.608      0.987  16.828  < 2e-16 ***
## month.February     4.042      1.396   2.896  0.00443 ** 
## month.March       15.867      1.396  11.368  < 2e-16 ***
## month.April       29.917      1.396  21.434  < 2e-16 ***
## month.May         41.483      1.396  29.721  < 2e-16 ***
## month.June        50.892      1.396  36.461  < 2e-16 ***
## month.July        55.108      1.396  39.482  < 2e-16 ***
## month.August      52.725      1.396  37.775  < 2e-16 ***
## month.September   44.417      1.396  31.822  < 2e-16 ***
## month.October     34.367      1.396  24.622  < 2e-16 ***
## month.November    20.042      1.396  14.359  < 2e-16 ***
## month.December     7.033      1.396   5.039 1.51e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.419 on 132 degrees of freedom
## Multiple R-squared:  0.9712, Adjusted R-squared:  0.9688 
## F-statistic: 405.1 on 11 and 132 DF,  p-value: < 2.2e-16

# first creates the first pair of harmonic functions and then fit the model
har.=harmonic(tempdub,1)##a matrix consisting of \cos(2k ?? t), \sin(2k ?? t), k=1,2,...,m, excluding any zero functions.
model4=lm(tempdub~har.)
summary(model4)

## 
## Call:
## lm(formula = tempdub ~ har.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1580  -2.2756  -0.1457   2.3754  11.2671 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      46.2660     0.3088 149.816  < 2e-16 ***
## har.cos(2*pi*t) -26.7079     0.4367 -61.154  < 2e-16 ***
## har.sin(2*pi*t)  -2.1697     0.4367  -4.968 1.93e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.706 on 141 degrees of freedom
## Multiple R-squared:  0.9639, Adjusted R-squared:  0.9634 
## F-statistic:  1882 on 2 and 141 DF,  p-value: < 2.2e-16

tempFit4 <- ts(fitted(model4),freq=12,start=c(1964,1))
plot(tempFit4,ylab='Temperature',type='l',
ylim=range(c(fitted(model4),tempdub))) # the ylim option ensures that the 
# y axis has a range that fits the raw data and the fitted values
points(tempdub, col='blue')
points(tempFit4,col='red')

3.5 回归估计的性质及残差分析

参考“多元线性回归分析”http://www.homepage.zjut.edu.cn/yjq/

随机项的估计和预测（残差） \[\hat X_t=Y_t-\hat Y_t\]

检验残差\(\hat X_t\)是否为白噪声序列：零均值，等方差，不相关（正态独立）。

# Exhibit 3.8
plot(y=rstudent(model3), x=as.vector(time(tempdub)), xlab='Time', ylab='Standardized Residuals', type='o')
abline(h=0, col='red')

# Exhibit 3.9
plot(y=rstudent(model3), x=as.vector(time(tempdub)), xlab='Time', ylab='Standardized Residuals', type='l')
points(y=rstudent(model3), x=as.vector(time(tempdub)), pch=as.vector(season(tempdub)),  col='blue')
abline(h=0, col='red')

# Exhibit 3.10
plot(y=rstudent(model3), x=as.vector(fitted(model3)), xlab='Fitted Trend Values',
 ylab='Standardized Residuals',type="n")
points(y=rstudent(model3), x=as.vector(fitted(model3)), pch=as.vector(season(tempdub)))

# Exhibit 3.11
hist(rstudent(model3), xlab='Standardized Residuals', main='')

# Exhibit 3.12
qqnorm(rstudent(model3), main='QQ plot')
qqline(rstudent(model3), col='red')

#Shapiro-Wilk检验: H_0 正态
shapiro.test(rstudent(model3))

## 
##  Shapiro-Wilk normality test
## 
## data:  rstudent(model3)
## W = 0.9929, p-value = 0.6954

#t检验: H_0 均值为0
t.test(residuals(model3))

## 
##  One Sample t-test
## 
## data:  residuals(model3)
## t = -1.1355e-15, df = 143, p-value = 1
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.5410872  0.5410872
## sample estimates:
##     mean of x 
## -3.108378e-16

假定样本平稳，序列具有不变的均值和方差，定义滞后\(k\)的样本自相关函数为 \[ r_k=\frac{\sum_{t=k+1}^n (Y_t-\overline Y)(Y_{t-k}-\overline Y)}{\sum_{t=1}^n (Y_t-\overline Y)^2}, \qquad k=1,2,\cdots\] \(r_k\)相对滞后\(k\)的图形被称作自相关函数图。

# Exhibit 3.13
acf(rstudent(model3), main='ACF')

# 游程检验：H_0 独立
runs(rstudent(model3))

## $pvalue
## [1] 0.216
## 
## $observed.runs
## [1] 65
## 
## $expected.runs
## [1] 72.875
## 
## $n1
## [1] 69
## 
## $n2
## [1] 75
## 
## $k
## [1] 0

例3”. 随机游动过程，错误地当成线性时间趋势处理：

data(rwalk)
plot(rwalk,type='o',ylab='rWalk',main='图表3-2* 随机游动')

model1=lm(rwalk~time(rwalk))
summary(model1)

## 
## Call:
## lm(formula = rwalk ~ time(rwalk))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.70045 -0.79782  0.06391  0.63064  2.22128 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.007888   0.297245  -3.391  0.00126 ** 
## time(rwalk)  0.134087   0.008475  15.822  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.137 on 58 degrees of freedom
## Multiple R-squared:  0.8119, Adjusted R-squared:  0.8086 
## F-statistic: 250.3 on 1 and 58 DF,  p-value: < 2.2e-16

最小二乘估计的斜率\(\hat{\beta}_1=0.1341\) 和截距 \(\hat{\beta}_0=-1.008\). 图表3-2展示了随机游动并叠加了最小二乘回归趋势线。

plot(rwalk,type='o',ylab='rWalk',main='图表3-2 随机游动与线性时间趋势')
abline(model1, col='red') # add the fitted least squares line

# Exhibit 3.14
plot(y=rstudent(model1), x=as.vector(time(rwalk)), ylab='Standardized Residuals',   xlab='Time', type='o')
abline(h=0, col='red')

# Exhibit 3.15
plot(y=rstudent(model1),x=fitted(model1),ylab='Standardized Residuals',
xlab='Fitted Trend Values',type='p')

# Exhibit 3.16
acf(rstudent(model1),main='ACF')

# 游程检验：H_0 独立
runs(rstudent(model1))

## $pvalue
## [1] 0.0266
## 
## $observed.runs
## [1] 22
## 
## $expected.runs
## [1] 30.96667
## 
## $n1
## [1] 29
## 
## $n2
## [1] 31
## 
## $k
## [1] 0

习题3.7 （on P.38)

data(winnebago); winnebago

##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1966                                                     61   48
## 1967   53   78   75   58  146  193  124  120  134   99  130  166
## 1968  168  267  314  432  355  384  232  235  293  242  248  236
## 1969  209  358  352  406  562  389  416  493  409  328  222  195
## 1970  156  439  671  874  558  621  628  652  495  344  405  586
## 1971  403  700  837 1224 1117 1214  762  846 1228  937 1396 1174
## 1972  628 1753

n <- length(winnebago);  n

## [1] 64

Time_winn <- time(winnebago); Time_winn

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1966                                                                        
## 1967 1967.000 1967.083 1967.167 1967.250 1967.333 1967.417 1967.500 1967.583
## 1968 1968.000 1968.083 1968.167 1968.250 1968.333 1968.417 1968.500 1968.583
## 1969 1969.000 1969.083 1969.167 1969.250 1969.333 1969.417 1969.500 1969.583
## 1970 1970.000 1970.083 1970.167 1970.250 1970.333 1970.417 1970.500 1970.583
## 1971 1971.000 1971.083 1971.167 1971.250 1971.333 1971.417 1971.500 1971.583
## 1972 1972.000 1972.083                                                      
##           Sep      Oct      Nov      Dec
## 1966                   1966.833 1966.917
## 1967 1967.667 1967.750 1967.833 1967.917
## 1968 1968.667 1968.750 1968.833 1968.917
## 1969 1969.667 1969.750 1969.833 1969.917
## 1970 1970.667 1970.750 1970.833 1970.917
## 1971 1971.667 1971.750 1971.833 1971.917
## 1972

#(a).时序图
plot(winnebago, type='l', ylab='Monthly Sale', xlab='Time', main='图1. 时序图(a)')
points(winnebago, col='blue', pch=2); abline(h=0, col='red')

#(b).时间的一次函数拟合
model_1=lm(winnebago~Time_winn)
summary(model_1)

## 
## Call:
## lm(formula = winnebago ~ Time_winn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -419.58  -93.13  -12.78   94.96  759.21 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -394885.68   33539.77  -11.77   <2e-16 ***
## Time_winn       200.74      17.03   11.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 209.7 on 62 degrees of freedom
## Multiple R-squared:  0.6915, Adjusted R-squared:  0.6865 
## F-statistic: 138.9 on 1 and 62 DF,  p-value: < 2.2e-16

abline(model_1, col='red')

### y_hat=-394885.68+200.74 * Time

Res_winn <- rstudent(model_1) ##标准化残差
plot(Res_winn, type='l', ylab='Standarized Residual in (b)', xlab='Time', main='图2. 残差图(b))')
points(Res_winn, col='blue', pch=3); abline(h=0, col='red')

#(c).对数变换之后，时间的一次函数拟合
winn_log <- log(winnebago)  #数据对数化
plot(winn_log, type='l', ylab='Log(Monthly Sale)', xlab='Time', main='图3. 对数化之后时序图(c)')
points(winn_log, col='blue', pch=6); abline(h=0, col='red')

#(d).对数变换+一次函数拟合
model_2=lm(winn_log~Time_winn)
summary(model_2)

## 
## Call:
## lm(formula = winn_log ~ Time_winn)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03669 -0.20823  0.04995  0.25662  0.86223 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -984.93878   62.99472  -15.63   <2e-16 ***
## Time_winn      0.50306    0.03199   15.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3939 on 62 degrees of freedom
## Multiple R-squared:  0.7996, Adjusted R-squared:  0.7964 
## F-statistic: 247.4 on 1 and 62 DF,  p-value: < 2.2e-16

abline(model_2, col='red')

### Log(y_hat)=-984.93878+ 0.50306 * Time

Res_winn_log <- rstudent(model_2) ##标准化残差
plot(Res_winn_log, type='l', ylab='Residual in log (d)', xlab='Time', main='图4. 对数变换之后的残差图(d))')
points(Res_winn_log, col='blue', pch=7); abline(h=0, col='red')

#(e)季节性+对数变换+一次函数拟合
month_winn <- season(winnebago); month_winn

##  [1] November  December  January   February  March     April     May      
##  [8] June      July      August    September October   November  December 
## [15] January   February  March     April     May       June      July     
## [22] August    September October   November  December  January   February 
## [29] March     April     May       June      July      August    September
## [36] October   November  December  January   February  March     April    
## [43] May       June      July      August    September October   November 
## [50] December  January   February  March     April     May       June     
## [57] July      August    September October   November  December  January  
## [64] February 
## 12 Levels: January February March April May June July August ... December

plot(winn_log, type='l', ylab='Log(Monthly Sale)', xlab='Time', main='图5. 季节性+对数化之后时序图(e)')
points(winn_log, col='blue', pch=as.vector(month_winn )); abline(h=0, col='red')

model_3=lm(winn_log~month_winn+Time_winn)
summary(model_3)

## 
## Call:
## lm(formula = winn_log ~ month_winn + Time_winn)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92501 -0.16328  0.03344  0.20757  0.57388 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -997.33061   50.63995 -19.695  < 2e-16 ***
## month_winnFebruary     0.62445    0.18182   3.434 0.001188 ** 
## month_winnMarch        0.68220    0.19088   3.574 0.000779 ***
## month_winnApril        0.80959    0.19079   4.243 9.30e-05 ***
## month_winnMay          0.86953    0.19073   4.559 3.25e-05 ***
## month_winnJune         0.86309    0.19070   4.526 3.63e-05 ***
## month_winnJuly         0.55392    0.19069   2.905 0.005420 ** 
## month_winnAugust       0.56989    0.19070   2.988 0.004305 ** 
## month_winnSeptember    0.57572    0.19073   3.018 0.003960 ** 
## month_winnOctober      0.26349    0.19079   1.381 0.173300    
## month_winnNovember     0.28682    0.18186   1.577 0.120946    
## month_winnDecember     0.24802    0.18182   1.364 0.178532    
## Time_winn              0.50909    0.02571  19.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3149 on 51 degrees of freedom
## Multiple R-squared:  0.8946, Adjusted R-squared:  0.8699 
## F-statistic: 36.09 on 12 and 51 DF,  p-value: < 2.2e-16

lines(y=as.vector(fitted(model_3)), x=as.vector(Time_winn), col='red')
points(y=as.vector(fitted(model_3)), x=as.vector(Time_winn), col='Purple', pch=as.vector(month_winn ))

Res1_winn_log <- rstudent(model_3) ##标准化残差
plot(Res1_winn_log, type='l', ylab='Residual in log (f)', xlab='Time', main='图6. 季节性对数变换之后的残差图(f))')
points(Res1_winn_log, col='blue', pch=as.vector(month_winn ))
abline(h=0, col='red')

#(b).游程检验：独立性
runs(Res1_winn_log)

## $pvalue
## [1] 0.000243
## 
## $observed.runs
## [1] 18
## 
## $expected.runs
## [1] 32.71875
## 
## $n1
## [1] 29
## 
## $n2
## [1] 35
## 
## $k
## [1] 0

##pvalue  0.000243  拒绝“残差是独立的”

#(c).自相关性
acf(Res1_winn_log,main='图7.季节性+对数变换+一次函数拟合之后的残差自相关图' )  #自相关图

#(d).正态性
#QQ图
qqnorm(Res1_winn_log, main='图8.季节性+对数变换+一次函数拟合之后的残差QQ图' )  #QQ图
qqline(Res1_winn_log, col='blue'); abline(h=0, col='red')

#直方图
hist(Res1_winn_log , xlab='Standardized Residuals_log_2', main='图9. 季节性+对数变换+一次函数拟合之后的残差直方图')

#Shapiro-Wilk检验: 正态性
shapiro.test(Res1_winn_log )

## 
##  Shapiro-Wilk normality test
## 
## data:  Res1_winn_log
## W = 0.97035, p-value = 0.1262

##p-value = 0.1262 不能拒绝来自正态总体的假设

第四章: 平稳时间序列模型

4.1 滑动平均过程

\[ Y_t=e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2}-\cdots-\theta_q e_{t-q}, \qquad\text{ 其中 } e_t\sim WN(0,\sigma_e^2). \]

当\(q=1\)时，MA(1)过程： \(Y_t=e_t-\theta e_{t-1}.\)

\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} \begin{align} Cov(Y_t,Y_{t-1})&=Cov(e_t-\theta e_{t-1},e_{t-1}-\theta e_{t-2})=-\theta \sigma_e^2. \\ Cov(Y_t,Y_{t-2})&=Cov(e_t-\theta e_{t-1},e_{t-2}-\theta e_{t-3})=0.\\ Cov(Y_t,Y_{t-k})&=Cov(e_t-\theta e_{t-1},e_{t-k}-\theta e_{t-k-1})=0. \text{ if } k>2.\\ \end{align} \qquad\qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ -\frac{\theta}{1+\theta^2}, &\text{ if }k=1\\ 0, &\text{ if }k\geq 2\\ \end{cases} \end{gathered}\]

\(\qquad\) 结论：MA(1)是平稳的，自相关函数在滞后 \(1\) 阶之后“截尾”。

theta <- seq(-1,1,by=0.1)
rho_1 <- (-theta)/(1+theta^2)
data.frame(theta=theta, rho_1=rho_1)

##    theta      rho_1
## 1   -1.0  0.5000000
## 2   -0.9  0.4972376
## 3   -0.8  0.4878049
## 4   -0.7  0.4697987
## 5   -0.6  0.4411765
## 6   -0.5  0.4000000
## 7   -0.4  0.3448276
## 8   -0.3  0.2752294
## 9   -0.2  0.1923077
## 10  -0.1  0.0990099
## 11   0.0  0.0000000
## 12   0.1 -0.0990099
## 13   0.2 -0.1923077
## 14   0.3 -0.2752294
## 15   0.4 -0.3448276
## 16   0.5 -0.4000000
## 17   0.6 -0.4411765
## 18   0.7 -0.4697987
## 19   0.8 -0.4878049
## 20   0.9 -0.4972376
## 21   1.0 -0.5000000

plot(theta, rho_1, col='blue', type='l', ylab=expression(rho[1]), xlab=expression(theta))
abline(h=0.5, lty=4); abline(h=-0.5, lty=4)

例如： MA(1)序列\(\theta=-0.9, \rho_1=0.4972\), 和\(\theta=0.9, \rho_1=-0.4972\)。

# Exhibit 4.2
data(ma1.2.s)  #theta=-0.9
plot(ma1.2.s, ylab=expression(Y[t]), type='l', main=expression(theta==-0.9))
points(ma1.2.s, col='blue'); abline(h=0, col='red')

# Exhibit 4.3
plot(y=ma1.2.s, x=zlag(ma1.2.s), col='blue', ylab=expression(Y[t]), xlab=expression(Y[t-1]), type='p', main=expression(theta==-0.9))

# Exhibit 4.4
plot(y=ma1.2.s, x=zlag(ma1.2.s,2), col='blue', ylab=expression(Y[t]), xlab=expression(Y[t-2]), type='p', main=expression(theta==-0.9))

# Exhibit 4.5
data(ma1.1.s)
plot(ma1.1.s,ylab=expression(Y[t]),type='o'); abline(h=0, col='red')

# Exhibit 4.6
plot(y=ma1.1.s, x=zlag(ma1.1.s), ylab=expression(Y[t]), xlab=expression(Y[t-1]),type='p')

# Exhibit 4.7
plot(y=ma1.1.s,x=zlag(ma1.1.s,2),ylab=expression(Y[t]),xlab=expression(Y[t-2]),type='p')

# An MA(1) series of length n=100 with MA coefficient equal to -0.9 and 0.9 respectively.
set.seed(12345) 
y1=arima.sim(model=list(ma=-c(-0.9)),n=100)
y2=arima.sim(model=list(MA=-c(0.9)), n=100)
# Note that R uses the plus convention in the MA model formula so the additional minus sign.

当\(q=2\)时，MA(2)过程： \(Y_t=e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2}.\)

\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta_1^2+\theta_2^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} \begin{align} Cov(Y_t,Y_{t-1})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-1}-\theta_1 e_{t-2}-\theta_2 e_{t-3})=(-\theta_1+\theta_1\theta_2) \sigma_e^2. \\ Cov(Y_t,Y_{t-2})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-2}-\theta_1 e_{t-3}-\theta_2 e_{t-4})=-\theta_2 \sigma_e^2.\\ Cov(Y_t,Y_{t-k})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-k}-\theta_1 e_{t-k-1}-\theta_2 e_{t-k-2})=0. \text{ if } k\geq 3.\\ \end{align} \qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ \frac{-\theta_1+\theta_1\theta_2}{1+\theta_1^2+\theta_2^2}, &\text{ if }k=1\\ \frac{-\theta_2}{1+\theta_1^2+\theta_2^2}, &\text{ if }k=2\\ 0, &\text{ if }k\geq 3\\ \end{cases} \end{gathered}\]

\(\qquad\) 结论：MA(2)是平稳的，自相关函数在滞后 \(2\) 阶之后“截尾”。

例如： MA(2)序列\(\theta_1=1, \theta_2=-0.6\)。\(\rho_1=-0.678,\rho_2=0.254.\)

data(ma2.s)

opar=par(mfrow=c(2,2))
# Exhibit 4.8
plot(ma2.s, ylab=expression(Y[t]),type='o'); abline(h=0, col='red')
# Exhibit 4.9
plot(y=ma2.s, x=zlag(ma2.s), ylab=expression(Y[t]), xlab=expression(Y[t-1]), type='p')
# Exhibit 4.10
plot(y=ma2.s, x=zlag(ma2.s,2), ylab=expression(Y[t]), xlab=expression(Y[t-2]), type='p')
# Exhibit 4.11
plot(y=ma2.s,x=zlag(ma2.s,3),ylab=expression(Y[t]),xlab=expression(Y[t-3]),type='p')

par(opar)

# An MA(2) series with MA coefficients equal to 1 and -0.6 and of length n=100.
y=arima.sim(model=list(ma=-c(1, -0.6)), n=100)
# Note that R uses the plus convention in the MA model formula so the additional minus sign. 
plot(y, ylab=expression(Y[t]), type='o'); abline(h=0, col='red')

MA(q)过程： \(Y_t=e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2}-\cdots-\theta_q e_{t-q}\)

\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} Cov(Y_t,Y_{t-k})=\begin{cases} \sigma_e^2 (-\theta_k+\sum_{i=1}^{q-k}\theta_i\theta_{i+k}), &\text{ if } 1\leq k\leq q\\ 0. \text{ if } k> q.\\ \end{cases} \qquad\qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ \frac{-\theta_k+\sum_{i=1}^{q-k}\theta_i\theta_{i+k}}{1+\sum_{i=1}^q\theta_i^2}, &\text{ if }1\leq k\leq q\\ 0, &\text{ if }k > q\\ \end{cases} \end{gathered}\]

\(\qquad\) 结论：MA(q)是平稳的，自相关函数在滞后 \(q\) 阶之后“截尾”。

4.2 一般线性过程

\(\qquad \{Y_t\}\) 表示观测到的时间序列，\(\{e_t\}\) 表示未观测到的白噪声。 \[Y_t=\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots=\sum_{i=0}^{\infty}\psi_i e_{t-i}.\]

期望为零 \(E(Y_t)=0.\) 一般性假设\(\psi_0=1.\)
方差有限 \(Var(Y_t)=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i^2)<\infty.\)
一阶协方差

\[\begin{align} Cov(Y_t,Y_{t-1}) &= Cov(\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots, \psi_0e_{t-1}+\psi_1 e_{t-2}+\psi_2 e_{t-3}+\cdots ) \\ &= (\psi_0\psi_1+\psi_1\psi_2+\psi_2\psi_3+\cdots)\sigma_e^2=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+1}) \end{align}\]

二阶协方差

\[\begin{align} Cov(Y_t,Y_{t-2}) &= Cov(\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots, \psi_0e_{t-2}+\psi_1 e_{t-3}+\psi_2 e_{t-4}+\cdots ) \\ &= (\psi_0\psi_2+\psi_1\psi_3+\psi_2\psi_4+\cdots)\sigma_e^2=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+2}) \end{align}\]

\(k\)阶协方差

\[Cov(Y_t,Y_{t-k}) =\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+k}),\qquad k\geq0.\]

\(k\)阶自相关函数

\[Corr(Y_t,Y_{t-k})=\frac{\sum_{i=0}^{\infty}\psi_i\psi_{i+k}}{\sum_{i=0}^{\infty}\psi_i^2}, \quad k\geq 0.\]

例如，一般线性过程\(\psi_j=\phi^j, |\phi|<1.\)

\[Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\cdots, \qquad |\phi|<1.\]

\(\qquad\)- 期望为零 \(E(Y_t)=0\), 方差有限 \(Var(Y_t)=\sigma_e^2(\sum_{i=0}^{\infty}\phi^{2i})=\frac{1}{1-\phi^2}\sigma_e^2.\)

\(\qquad\)- \(k\)阶协方差 \(Cov(Y_t,Y_{t-k})=\sigma_e^2(\sum_{i=0}^{\infty}\phi^{i}\phi^{i+k})=\frac{\phi^k}{1-\phi^2}\sigma_e^2.\)

\(\qquad\)- \(k\)阶自相关函数 \(Corr(Y_t,Y_{t-k})=\phi^k, \quad k\geq 0.\)

4.3 自回归过程

\[ Y_t=\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+\cdots+\phi_p Y_{t-p}+e_t, \qquad\text{ 其中 } e_t\sim WN(0,\sigma_e^2). \]

\(e_t\) 与 \(\{Y_{t-1},Y_{t-2},\cdots,Y_1\}\) 是不相关的。
当\(p=1\)时，AR(1)过程： \(Y_t=\phi Y_{t-1}+e_t.\)

\(\qquad\) 如果\({Y_t}\)序列平稳，那么

\(\qquad\)- 均值 \(\mu=\frac{0}{1-\phi}=0\), 因为\(E(Y_t)=\phi E(Y_{t-1})+E(e_t)\);

\(\qquad\)- 方差 \(\gamma_0=\frac{\sigma_e^2}{1-\phi^2}\geq 0\), 因为\(Var(Y_t)=\phi^2 Var(Y_{t-1})+Var(e_t)\);

\(\qquad\quad\) ①. 平稳性的必要条件是 \(|\phi|<1.\)

\(\qquad\)- \(k\)阶协方差 \[\begin{align} Cov(Y_t,Y_{t-k})&=Cov(\phi Y_{t-1}+e_t, Y_{t-k}) \\ \gamma_k &= \phi \gamma_{k-1} \\ &= \phi\cdot\phi \gamma_{k-2}=\cdots\\ &=\phi^k \gamma_0 \\ \end{align} \]

\(\qquad\)- 自相关函数 \(\qquad \rho_k=\phi^k, k=0,1,2,\cdots\)

\(\qquad\)- AR(1)的一般线性表示

\[\begin{align} &&Y_t&=\phi Y_{t-1}+e_t\\ \phi \cdot &&Y_{t-1}&=\phi Y_{t-2}+e_{t-1} &&\cdot\phi\\ \phi^2 \cdot &&Y_{t-2}&=\phi Y_{t-3}+e_{t-2} &&\cdot\phi^2\\ &&&\cdots\\ \phi^{k-1} \cdot &&Y_{t-(k-1)}&=\phi Y_{t-k}+e_{t-(k-1)} &&\cdot\phi^{k-1}\\ \end{align}\]

\(\qquad\qquad\)求和可得 \(Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\cdots+\phi^{k-1} e_{t-(k-1)}+\phi^k Y_{t-k}.\)

\(\qquad\qquad\)如果\(|\phi|<1\), 并且\(k\rightarrow\infty\), 那么 \[Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\phi^3 e_{t-3}+\cdots\]

\(\qquad\quad\) ②. 平稳性的充分条件是 \(|\phi|<1.\)

\(\qquad\) 结论：AR(1)是平稳的充分必要条件是\(|\phi|<1\)，自相关函数\(\rho_k=\phi^k\)随时滞\(k\)指数衰减。

x<-seq(1:12);
y1<-0.9^x;
y2<-0.4^x;
y3<-(-0.8)^x;
y4<-(-0.5)^x;
opar=par(mfrow=c(2,2))
plot(x,y1,type='h', ylim=c(-0.1,0.9), xlab='Lag', ylab=expression(rho[k]), main=expression(phi==0.9));
points(x,y1,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y2,type='h', ylim=c(-0.1,0.9),  xlab='Lag', ylab=expression(rho[k]), main=expression(phi==0.4));
points(x,y2,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y3,type='h', ylim=c(-0.9,0.9),  xlab='Lag', ylab=expression(rho[k]), main=expression(phi==-0.8));
points(x,y3,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y4,type='h', ylim=c(-0.9,0.9),  xlab='Lag', ylab=expression(rho[k]), main=expression(phi==-0.5));
points(x,y4,col='blue',cex=0.5); abline(h=0,col='red');

par(opar)

例如：AR(1)序列 \(\phi=0.9, \rho_1=0.9, \rho_2=0.81, \rho_3=0.729\cdots\)

data(ar1.s)

opar=par(mfrow=c(2,2))
# Exhibit 4.13
plot(ar1.s,ylab=expression(Y[t]),type='o')
# Exhibit 4.14
plot(y=ar1.s,x=zlag(ar1.s),ylab=expression(Y[t]),xlab=expression(Y[t-1]),type='p')
# Exhibit 4.15
plot(y=ar1.s,x=zlag(ar1.s,2),ylab=expression(Y[t]),xlab=expression(Y[t-2]),type='p')
# Exhibit 4.16
plot(y=ar1.s,x=zlag(ar1.s,3),ylab=expression(Y[t]),xlab=expression(Y[t-3]),type='p')

# An AR(1) series with AR coefficient equal to 0.98 and of length n=100.
set.seed(12345); y1=arima.sim(model=list(ar=c(0.98)),n=100)
set.seed(1234); y2=arima.sim(model=list(ar=c(-0.6)),n=100)
set.seed(54321); y3=arima.sim(model=list(ar=c(0.3)),n=100)
set.seed(4321); y4=arima.sim(model=list(ar=c(-0.1)),n=100)
# Note that the R convention for the AR model formula is same as the book.

#opar=par(mfrow=c(2,2))
plot(y1,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==0.98));
plot(y2,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==-0.6));
plot(y3,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==0.3));
plot(y4,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==-0.1));

acf(y1, xlab='Lag', ylab='ACF', main=expression(phi==0.98));
acf(y2, xlab='Lag', ylab='ACF', main=expression(phi==-0.6));
acf(y3, xlab='Lag', ylab='ACF', main=expression(phi==0.3));
acf(y4, xlab='Lag', ylab='ACF', main=expression(phi==-0.1));

par(opar)

当\(p=2\)时，AR(2)过程： \(Y_t=\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+e_t.\)

\(\qquad\) 如果\({Y_t}\)序列平稳，那么

\(\qquad\)- 均值 \(\mu=\frac{0}{1-\phi_1-\phi_2}=0\), 因为\(E(Y_t)=\phi_1 E(Y_{t-1})+\phi_2 E(Y_{t-2})+E(e_t)\);

\(\qquad\)- 方差 \(Var(Y_t)=\phi_1^2 Var(Y_{t-1})+\phi_2^2 Var(Y_{t-2})+Var(e_t)+2\phi_1\phi_2 Cov(Y_{t-1},Y_{t-2})\), 则

\[(1-\phi_1^2-\phi_2^2)\gamma_0=2\phi_1\phi_2\gamma_1+\sigma_e^2.\]

\(\qquad\)- 一阶协方差 \((1-\phi_2)\gamma_1 = \phi_1 \gamma_0\), 因为 \[\begin{align} Cov(Y_t,Y_{t-1})&=Cov(\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+e_t, Y_{t-1}) \\ &=\phi_1 Var(Y_{t-1})+\phi_2 Cov(Y_{t-1},Y_{t-2}) \\ \end{align} \]

\(\qquad\qquad\)解方程组可得 \[\begin{cases} \gamma_0&=&\sigma_e^2\frac{1-\phi_2}{(1+\phi_2)[(1-\phi_2)^2-\phi_1^2]}\\ \gamma_1&=&\sigma_e^2\frac{\phi_1}{(1+\phi_2)[(1-\phi_2)^2-\phi_1^2]} \end{cases}\]

\(\qquad\)- 一阶自相关函数 \(\rho_1=\frac{\phi_1}{1-\phi_2}\)

\(\qquad\)- \(k\)阶协方差/自相关函数 \[\begin{align} &&Cov(Y_t,Y_{t-k})&=Cov(\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+e_t, Y_{t-k}) \\ \bf{Yule-Walker方程}&&\gamma_k&=\phi_1 \gamma_{k-1}+\phi_2 \gamma_{k-2} \\ &&\rho_k&=\phi_1 \rho_{k-1}+\phi_2 \rho_{k-2} \\ \end{align} \]

时间序列分析理论及R代码实现

原俊青，浙江工业大学

http://www.homepage.zjut.edu.cn/yjq/

第一章: 绪论

第二章: 基本概念

第三章：趋势

3.1 随机趋势

3.2 确定趋势

3.3 均值函数为常数

3.4 回归方法

3.5 回归估计的性质及残差分析

第四章: 平稳时间序列模型

4.1 滑动平均过程

4.2 一般线性过程

4.3 自回归过程

4.4 ARMA(p,q)模型

第五章: 非平稳时间序列模型

第六章: 模型识别

第七章: 参数估计

第八章：模型诊断

第九章: 预测