什么是时间序列?
和统计里的样本有什么区别?
研究目的?研究方法?
统计指标?分析软件?
时间序列分析步骤?
常用的ARIMA模型?
例1. 加利福尼亚州洛杉矶地区100多年来的年降水量时间序列图。
data(larain)
summary(larain)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.080 9.675 14.140 14.888 18.400 40.290
plot(larain, ylab='Inches', xlab='Year', type='o', main='图1-1 洛杉矶年降水量时序图')
plot(y=larain,x=zlag(larain),ylab='Inches',xlab='Previous Year Inches', main='图1-2 洛杉矶当年降水量与去年降水量散点图')
例2. 某化工过程中颜色属性的时间序列图。
data(color)
summary(color)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 63.00 70.00 76.00 74.89 79.00 87.00
plot(color, ylab='Color Property', xlab='Batch', type='l', main='图1-3. 某化工过程中颜色属性的时间序列图')
points(color, col='blue')
plot(y=color, x=zlag(color), ylab='Color Property', xlab='Prevous Batch Color Property', main='图1-4.当前颜色值与前期颜色值的散点图')
例3. 加拿大野兔丰度的时间序列图。
data(hare)
summary(hare)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 20.00 27.00 39.55 65.00 95.00
plot(hare, ylab='Abundance', xlab='Year', type='l', main='图1-5. 加拿大野兔丰度的时间序列图')
points(hare, col='blue')
plot(y=hare, x=zlag(hare), ylab='Abundance', xlab='Previous Year Abundance', main='图1-6.当年与上一年野兔丰度的散点图')
例4. 艾奥瓦州迪比克市月平均气温的时间序列图。
data(tempdub)
summary(tempdub)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 27.32 48.60 46.27 63.12 74.00
plot(tempdub, ylab='Temperature', type='l', main='图1-7. 艾奥瓦州迪比克市月平均气温的时间序列图')
points(tempdub, col='blue')
plot(y=tempdub, x=zlag(tempdub,d=12), ylab='Temperature', xlab='Previous Year Temperature', main='图1-7’.当年与上一年月平均气温的散点图')
例5. 滤油器月销售量的时间序列图。
data(oilfilters)
summary(oilfilters)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1572 2314 3262 3388 4488 5862
plot(oilfilters, type='l', ylab='Sales', main='图1-8. 滤油器月销售量的时间序列图')
points(oilfilters, col='blue')
plot(y=oilfilters, x=zlag(oilfilters,d=12), ylab='Sales', xlab='Previous Year Sales', main='图1-8’.当年与上一年月销售量的散点图')
plot(oilfilters,type='l',ylab='Sales', main='图1-9. 以特殊符号绘制的滤油器月销售量')
points(y=oilfilters,x=time(oilfilters), pch=as.vector(season(oilfilters)))
习题1.(on P.7)
Question 3~5.
### rnorm(n), rchisq(n,df=2), rt(n,df=5)
n=48;
set.seed(12345); tsnorm1 = ts(rnorm(n),freq=1,start=1)
set.seed(1234); tsnorm2 = ts(rnorm(n),freq=1,start=1)
set.seed(123); tsnorm3 = ts(rnorm(n),freq=1,start=1)
set.seed(12); tsnorm4 = ts(rnorm(n),freq=1,start=1)
opar=par(mfrow=c(2,2))
plot(tsnorm1,type='o'); abline(h=0,col='red')
plot(tsnorm2,type='o'); abline(h=0,col='red')
plot(tsnorm3,type='o'); abline(h=0,col='red')
plot(tsnorm4,type='o'); abline(h=0,col='red')
par(opar)
Question 6.
data(tempdub)
#op <- par(bg = "light blue")
plot(tempdub,ylab='Temperature',type='l',col = "blue")
Month=c("J","A","S","O","N","D","J","F","M","A","M","J")
points(tempdub,pch=Month,col='red')
#points(tempdub,pch=as.vector(season(tempdub)),col='red')
#par(op)
\[ \gamma_{t,s} = Cov(X_t,X_s) = E(X_t,X_s)-\mu_t\mu_s. \]
\[ \rho_{t,s} = Corr(X_t,X_s) =\frac {\gamma_{t,s}} {\sqrt{\gamma_{t,t} \gamma_{s,s}}}. \]
任意有限维分布都与初始时间点无关,只与时间间隔相关。如果期望和方差都存在的话,那么我们就有 \[ X_t \overset{d} {=} X_s. \qquad \mu_t=\mu_s\equiv\mu, \qquad \sigma_t^2=\sigma_s^2\equiv\sigma^2\] \[ \left( X_{t_1},X_{t_2}\right) \overset{d} {=} \left(X_{t_1+k},X_{t_2+k}\right). \qquad \gamma_{t,s}=\gamma_{0,\left|t-s\right|}\] \[ \left(X_{t_1},X_{t_2},\cdots,X_{t_n}\right) \overset{d} {=} \left(X_{t_1+k},X_{t_2+k},\cdots,X_{t_n+k}\right).\]
Let \(\quad \gamma_k=Cov(X_t,X_{t+k}), \quad \text{then} \quad \rho_k=Corr(X_t,X_{t+k})=\frac {\gamma_{t,t+k}}{\sqrt{\gamma_{t,t}\gamma_{t+k,t+k}}}=\frac{\gamma_k}{\gamma_0}.\)
宽(弱)平稳性
均值函数和方差函数都是常数:\(\qquad \mu_t\equiv\mu, \qquad \sigma_t^2\equiv\sigma^2\)
协方差仅依赖于时间的滞后:\(\qquad \gamma_k=\gamma_{t,t+k}=\gamma_{0,k}.\)
Let \(\quad \gamma_k=Cov(X_t,X_{t+k}), \quad \text{then} \quad \rho_k=Corr(X_t,X_{t+k})=\frac {\gamma_{t,t+k}}{\sqrt{\gamma_{t,t}\gamma_{t+k,t+k}}}=\frac{\gamma_k}{\gamma_0}.\)
例1. 白噪声序列 {\(e_t, i.i.d.\)}, 平稳的 \[\begin{align} \gamma_k=\begin{cases} \sigma_e^2, &\text{if } k=0\\ 0, &\text{if } k>0 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0, &\text{if } k>0\end{cases} \end{align}\]
n=100;
opar=par(mfrow=c(2,2))
plot(rnorm(n),xlab='time',type='o');abline(h=0,col='red')
plot(rchisq(n,2),xlab='time',type='o');abline(h=2,col='red')
plot(rt(n,5),xlab='time',type='o');abline(h=0,col='red')
plot(rexp(n),xlab='time',type='o');abline(h=1,col='red')
par(opar)
例2. 滑动平均序列 {\(X_t=\frac{e_t+e_{t-1}}{2}\)}, 平稳的 \[\begin{align} \gamma_k=\begin{cases} 0.5\cdot\sigma_e^2, &\text{if } k=0\\ 0.25\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0.5, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]
n=100;
opar=par(mfrow=c(2,2))
tsnorm=rnorm(n)
tsMA1=ts((tsnorm+zlag(tsnorm))/2.0,freq=1,start=1)
plot(tsMA1,type='o',ylab='MA of N(0,1)');
abline(h=0,col='red')
tschisq=rchisq(n,2)
tsMA2=ts((tschisq+zlag(tschisq))/2.0,freq=1,start=1)
plot(tsMA2,type='o',ylab='MA of chisq(2)');
abline(h=2,col='red')
tst=rt(n,5)
tsMA3=ts((tst+zlag(tst))/2.0,freq=1,start=1)
plot(tsMA3,type='o',ylab='MA of t(5)');
abline(h=0,col='red')
tsexp=rexp(n)
tsMA4=ts((tsexp+zlag(tsexp))/2.0,freq=1,start=1)
plot(tsMA4,type='o',ylab='MA of exp(1)');
abline(h=1,col='red')
par(opar)
例3. 随机游动序列 {\(Y_t=e_1+e_2+\cdots+e_t\)}, 非平稳的 \[\mu_t=t\cdot\mu, \qquad\qquad \sigma_t^2=t\cdot\sigma_e^2.\]
\[\begin{align} \gamma_{t,s}&=Cov(e_1+e_2+\cdots+e_t,e_1+e_2+\cdots+e_s)\\ &=Var(e_1+e_2+\cdots+e_{min(t,s)})\\ &=min(t,s)\cdot \sigma_e^2 \end{align}\]
\[\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{min(t,s)\cdot \sigma_e^2}{\sqrt{t \sigma_e^2 \cdot s\sigma_e^2}}=\frac{\sqrt{min(t,s)}}{\sqrt{max(t,s)}}\]
data(rwalk)
n<-length(rwalk)
plot(rwalk,type='o',ylab='Random Walk', main = '随机游动的时间序列图1')
###生成随机游动过程方法1
Rwalk <- vector()
Rwalk[1] <- rnorm(1)
for (i in 2:n){
set.seed(123+i*456)
Rwalk[i] <- Rwalk[i-1] + rnorm(1)
}
ts_Rwalk <- ts(Rwalk,freq=1,start=1)
plot(ts_Rwalk,type='l',ylab='Random Walk',xlab='Time',main='随机游动的时间序列图2')
points(ts_Rwalk,col='blue')
###生成随机游动过程方法2:cumsum(rnorm(n))
opar=par(mfrow=c(2,2))
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rchisq(n,2)),type='o',xlab='Time');
plot(cumsum(rt(n,5)),type='o',xlab='Time');
plot(cumsum(rexp(n,1)),type='o',xlab='Time');
par(opar)
受随机扰动的影响,不随时间衰减;
在不同的模拟中,可能展现完全不同的趋势。
n=60;
opar=par(mfrow=c(2,2))
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');
plot(cumsum(rnorm(n)),type='o',xlab='Time');
par(opar)
例4. 艾奥瓦州迪比克市月平均气温的时间序列图。
data(tempdub)
plot(tempdub, ylab='Temperature', type='l', main='图1-7. 艾奥瓦州迪比克市月平均气温的时间序列图')
points(tempdub, col='blue')
均值模型: \(Y_t=\mu_t+X_t\), 其中\(E(X_t)=0.\)
随机扰动\(X_t\) 是零均值的平稳序列;
均值过程\(\mu_t\)随着时间\(t\)的变化,表现出特定的变化模式:
\[ \mu_t=\begin{cases} \mu_0, & \text{常函数} \\ \beta_0+\beta_1 t, & \text{线性函数} \\ \beta_0+\beta_1 t+\beta_2 t^2, & \text{二次函数} \\ \cdots, & \text{多项式函数} \\ \mu_{t-T}, & \text{周期函数} \\ \end{cases}\]
\[\quad\hat{\mu}=\bar{Y}, \qquad E(\hat{\mu})=\mu_0, \qquad \text{无偏估计量}\]
\[Var(\hat{\mu})=\frac{\gamma_0}{n} [1+2\sum_{k=1}^{n-1}(1-\frac{k}{n})\rho_k].\]
例1’. 白噪声序列 {\(X_t=e_t, i.i.d.\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}.\)
\[\begin{gathered} \gamma_k=\begin{cases} \sigma_e^2, &\text{if } k=0\\ 0, &\text{if } k>0 \end{cases} \qquad\qquad\qquad \rho_k=\begin{cases} 1, &\text{if } k=0\\ 0, &\text{if } k>0\end{cases} \end{gathered}\]
例2’. 滑动平均序列{\(X_t=\frac{1}{2}e_t+\frac{1}{2}e_{t-1}\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}[1+\frac{n-1}{n}]=\frac{\gamma_0}{n}(\frac{2n-1}{n})\simeq 2\frac{\gamma_0}{n}.\)
\[\begin{align} \gamma_k=\begin{cases} 0.5\cdot\sigma_e^2, &\text{if } k=0\\ 0.25\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ 0.5, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]
\(\qquad\) 滑动平均序列 {\(X_t=e_t-\frac{1}{2}e_{t-1}\)}, 平稳的, \(Var(\hat{\mu})=\frac{\gamma_0}{n}[1-0.8(\frac{n-1}{n})]=\frac{\gamma_0}{n}(\frac{0.2n+0.8}{n})\simeq 0.2\frac{\gamma_0}{n}.\)
\[\begin{align} \gamma_k=\begin{cases} 1.25\cdot\sigma_e^2, &\text{if } k=0\\ -0.5\cdot\sigma_e^2, &\text{if } k=1\\ 0, &\text{if } k>1 \end{cases} \qquad\qquad\qquad &\rho_k=\begin{cases} 1, &\text{if } k=0\\ -0.4, &\text{if } k=1\\ 0, &\text{if } k>1\end{cases} \end{align}\]
例5. 一般的平稳序列,随着滞后的增加,自相关函数迅速衰减,\(\sum_{k=0}^{+\infty}|\rho_k| < \infty.\) 当\(n\)充分大的时候, \(Var(\hat{\mu}) \simeq \frac{\gamma_0}{n}[\sum_{k=0}^{+\infty}\rho_k].\)
\(\qquad\) 例如,对于所有的整数\(k\), \(\rho_k=\phi^{|k|}\), \(\phi\in(-1, +1).\) 那么当\(n\)充分大的时候,\(Var(\hat{\mu}) \simeq \frac{\gamma_0}{n}[1+\frac{2\phi}{1-\phi}]=\frac{(1+\phi)\gamma_0}{(1-\phi)n}.\)
\[ \qquad\]
\[Var(\hat\mu)=\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n Cov(X_i,X_j)=\frac{1}{n^2}[\sum_{i=1}^n Var(X_i)+2\sum_{i=2}^n\sum_{j=1}^{i-1} \gamma_{i,j}]\]
例3’. 随机游动序列 {\(X_t=e_1+e_2+\cdots+e_t\)}, 非平稳的, \(\mu_t=t\cdot\mu, \quad \sigma_t^2=t\cdot\sigma_e^2.\)
\[\begin{align} \gamma_{t,s}&=Cov(e_1+e_2+\cdots+e_t,e_1+e_2+\cdots+e_s)\\ &=Var(e_1+e_2+\cdots+e_{min(t,s)})\\ &=min(t,s)\cdot \sigma_e^2 \end{align}\]
\[\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{min(t,s)\cdot \sigma_e^2}{\sqrt{t \sigma_e^2 \cdot s\sigma_e^2}}=\frac{\sqrt{min(t,s)}}{\sqrt{max(t,s)}}\]
\[\begin{align} Var(\hat{\mu})&=\frac{1}{n^2}\left [\sum_{i=1}^n i\sigma_e^2+2\sum_{i=2}^n\sum_{j=1}^{i-1} j\sigma_e^2\right ]\\ &=\frac{\sigma_e^2}{n^2}\left [\sum_{i=1}^n i+ 2\sum_{j=1}^{n-1}\sum_{i=j+1}^n j\right ]\\ &=\frac{\sigma_e^2}{n^2}\left [\sum_{i=1}^n i+ 2\sum_{j=1}^{n-1} (n-j) j\right ] \\ &=\frac{\sigma_e^2}{n^2}\left [(1+2n)\sum_{i=1}^n i- 2\sum_{j=1}^{n} j^2\right ] \\ &=\frac{\sigma_e^2}{n^2}\left [(1+2n)\frac{n(n+1)}{2}- 2 \frac{n(n+1)(2n+1)}{6}\right ] \\ &=\sigma_e^2 (2n+1) \frac{(n+1)}{6n} \end{align}\]
\(\qquad\) {\(X_t\)}满足Gauss-Markov假设(零均值,等方差,不相关).
\(\qquad\) 最小二乘法估计:最小化 \(Q(\beta_0,\beta_1)=\sum_{t=1}^n [Y_t-(\beta_0+\beta_1 t)]^2\), 令偏导数为0,求解。
\[\begin{align} \hat{\beta_1} &= \frac{\sum_{t=1}^n t(Y_t-\bar{Y})}{\sum_{t=1}^n (t-\bar{t})^2}=\frac{\sum_{t=1}^n (t-\bar{t})(Y_t-\bar{Y})}{\sum_{t=1}^n (t-\bar{t})^2}\\ \hat{\beta_0} &= \bar{Y}-\hat{\beta_1}\bar{t} \end{align}\]
data(wages)
plot(wages,type='l',ylab='Wages',xlab='Time',main='Monthly wages')
points(wages,col='blue')
\(\qquad\) 时间的一次函数拟合
lm_wages=lm(wages~time(wages))
summary(lm_wages)
##
## Call:
## lm(formula = wages ~ time(wages))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.23828 -0.04981 0.01942 0.05845 0.13136
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.490e+02 1.115e+01 -49.24 <2e-16 ***
## time(wages) 2.811e-01 5.618e-03 50.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08257 on 70 degrees of freedom
## Multiple R-squared: 0.9728, Adjusted R-squared: 0.9724
## F-statistic: 2503 on 1 and 70 DF, p-value: < 2.2e-16
plot(wages,type='l',ylab='Wages',xlab='Time',main='一次时间函数趋势')
points(wages,col='blue')
abline(lm_wages,col='red')
最小二乘估计的斜率\(\hat{\beta}_1=0.1341\) 和截距 \(\hat{\beta}_0=-1.008\). 图表3-2展示了随机游动并叠加了最小二乘回归趋势线。
\(\qquad\) 二次时间函数拟合
lm2_wages=lm(wages~time(wages)+I(time(wages)^2))
summary(lm2_wages)
##
## Call:
## lm(formula = wages ~ time(wages) + I(time(wages)^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.148318 -0.041440 0.001563 0.050089 0.139839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.495e+04 1.019e+04 -8.336 4.87e-12 ***
## time(wages) 8.534e+01 1.027e+01 8.309 5.44e-12 ***
## I(time(wages)^2) -2.143e-02 2.588e-03 -8.282 6.10e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05889 on 69 degrees of freedom
## Multiple R-squared: 0.9864, Adjusted R-squared: 0.986
## F-statistic: 2494 on 2 and 69 DF, p-value: < 2.2e-16
xfit <- time(wages)
yfit <- fitted(lm2_wages)
plot(wages,type='l',ylab='Wages',xlab='Time',main='二次时间函数趋势')
points(wages,col='blue')
lines(as.vector(xfit),as.vector(yfit),col='red')
data(tempdub)
month.=season(tempdub)
model2=lm(tempdub~month.-1) # -1 removes the intercept term
summary(model2)
##
## Call:
## lm(formula = tempdub ~ month. - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2750 -2.2479 0.1125 1.8896 9.8250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## month.January 16.608 0.987 16.83 <2e-16 ***
## month.February 20.650 0.987 20.92 <2e-16 ***
## month.March 32.475 0.987 32.90 <2e-16 ***
## month.April 46.525 0.987 47.14 <2e-16 ***
## month.May 58.092 0.987 58.86 <2e-16 ***
## month.June 67.500 0.987 68.39 <2e-16 ***
## month.July 71.717 0.987 72.66 <2e-16 ***
## month.August 69.333 0.987 70.25 <2e-16 ***
## month.September 61.025 0.987 61.83 <2e-16 ***
## month.October 50.975 0.987 51.65 <2e-16 ***
## month.November 36.650 0.987 37.13 <2e-16 ***
## month.December 23.642 0.987 23.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.419 on 132 degrees of freedom
## Multiple R-squared: 0.9957, Adjusted R-squared: 0.9953
## F-statistic: 2569 on 12 and 132 DF, p-value: < 2.2e-16
model3=lm(tempdub~month.) # intercept is automatically included so one month (Jan) is dropped
summary(model3)
##
## Call:
## lm(formula = tempdub ~ month.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2750 -2.2479 0.1125 1.8896 9.8250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.608 0.987 16.828 < 2e-16 ***
## month.February 4.042 1.396 2.896 0.00443 **
## month.March 15.867 1.396 11.368 < 2e-16 ***
## month.April 29.917 1.396 21.434 < 2e-16 ***
## month.May 41.483 1.396 29.721 < 2e-16 ***
## month.June 50.892 1.396 36.461 < 2e-16 ***
## month.July 55.108 1.396 39.482 < 2e-16 ***
## month.August 52.725 1.396 37.775 < 2e-16 ***
## month.September 44.417 1.396 31.822 < 2e-16 ***
## month.October 34.367 1.396 24.622 < 2e-16 ***
## month.November 20.042 1.396 14.359 < 2e-16 ***
## month.December 7.033 1.396 5.039 1.51e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.419 on 132 degrees of freedom
## Multiple R-squared: 0.9712, Adjusted R-squared: 0.9688
## F-statistic: 405.1 on 11 and 132 DF, p-value: < 2.2e-16
# first creates the first pair of harmonic functions and then fit the model
har.=harmonic(tempdub,1)##a matrix consisting of \cos(2k ?? t), \sin(2k ?? t), k=1,2,...,m, excluding any zero functions.
model4=lm(tempdub~har.)
summary(model4)
##
## Call:
## lm(formula = tempdub ~ har.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1580 -2.2756 -0.1457 2.3754 11.2671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.2660 0.3088 149.816 < 2e-16 ***
## har.cos(2*pi*t) -26.7079 0.4367 -61.154 < 2e-16 ***
## har.sin(2*pi*t) -2.1697 0.4367 -4.968 1.93e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.706 on 141 degrees of freedom
## Multiple R-squared: 0.9639, Adjusted R-squared: 0.9634
## F-statistic: 1882 on 2 and 141 DF, p-value: < 2.2e-16
tempFit4 <- ts(fitted(model4),freq=12,start=c(1964,1))
plot(tempFit4,ylab='Temperature',type='l',
ylim=range(c(fitted(model4),tempdub))) # the ylim option ensures that the
# y axis has a range that fits the raw data and the fitted values
points(tempdub, col='blue')
points(tempFit4,col='red')
参考“多元线性回归分析”http://www.homepage.zjut.edu.cn/yjq/
随机项的估计和预测(残差) \[\hat X_t=Y_t-\hat Y_t\]
检验残差\(\hat X_t\)是否为白噪声序列: 零均值,等方差,不相关(正态独立)。
# Exhibit 3.8
plot(y=rstudent(model3), x=as.vector(time(tempdub)), xlab='Time', ylab='Standardized Residuals', type='o')
abline(h=0, col='red')
# Exhibit 3.9
plot(y=rstudent(model3), x=as.vector(time(tempdub)), xlab='Time', ylab='Standardized Residuals', type='l')
points(y=rstudent(model3), x=as.vector(time(tempdub)), pch=as.vector(season(tempdub)), col='blue')
abline(h=0, col='red')
# Exhibit 3.10
plot(y=rstudent(model3), x=as.vector(fitted(model3)), xlab='Fitted Trend Values',
ylab='Standardized Residuals',type="n")
points(y=rstudent(model3), x=as.vector(fitted(model3)), pch=as.vector(season(tempdub)))
# Exhibit 3.11
hist(rstudent(model3), xlab='Standardized Residuals', main='')
# Exhibit 3.12
qqnorm(rstudent(model3), main='QQ plot')
qqline(rstudent(model3), col='red')
#Shapiro-Wilk检验: H_0 正态
shapiro.test(rstudent(model3))
##
## Shapiro-Wilk normality test
##
## data: rstudent(model3)
## W = 0.9929, p-value = 0.6954
#t检验: H_0 均值为0
t.test(residuals(model3))
##
## One Sample t-test
##
## data: residuals(model3)
## t = -1.1355e-15, df = 143, p-value = 1
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.5410872 0.5410872
## sample estimates:
## mean of x
## -3.108378e-16
假定样本平稳,序列具有不变的均值和方差,定义滞后\(k\)的样本自相关函数为 \[ r_k=\frac{\sum_{t=k+1}^n (Y_t-\overline Y)(Y_{t-k}-\overline Y)}{\sum_{t=1}^n (Y_t-\overline Y)^2}, \qquad k=1,2,\cdots\] \(r_k\)相对滞后\(k\)的图形被称作自相关函数图。
# Exhibit 3.13
acf(rstudent(model3), main='ACF')
# 游程检验:H_0 独立
runs(rstudent(model3))
## $pvalue
## [1] 0.216
##
## $observed.runs
## [1] 65
##
## $expected.runs
## [1] 72.875
##
## $n1
## [1] 69
##
## $n2
## [1] 75
##
## $k
## [1] 0
例3”. 随机游动过程,错误地当成线性时间趋势处理:
data(rwalk)
plot(rwalk,type='o',ylab='rWalk',main='图表3-2* 随机游动')
model1=lm(rwalk~time(rwalk))
summary(model1)
##
## Call:
## lm(formula = rwalk ~ time(rwalk))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.70045 -0.79782 0.06391 0.63064 2.22128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.007888 0.297245 -3.391 0.00126 **
## time(rwalk) 0.134087 0.008475 15.822 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.137 on 58 degrees of freedom
## Multiple R-squared: 0.8119, Adjusted R-squared: 0.8086
## F-statistic: 250.3 on 1 and 58 DF, p-value: < 2.2e-16
最小二乘估计的斜率\(\hat{\beta}_1=0.1341\) 和截距 \(\hat{\beta}_0=-1.008\). 图表3-2展示了随机游动并叠加了最小二乘回归趋势线。
plot(rwalk,type='o',ylab='rWalk',main='图表3-2 随机游动与线性时间趋势')
abline(model1, col='red') # add the fitted least squares line
# Exhibit 3.14
plot(y=rstudent(model1), x=as.vector(time(rwalk)), ylab='Standardized Residuals', xlab='Time', type='o')
abline(h=0, col='red')
# Exhibit 3.15
plot(y=rstudent(model1),x=fitted(model1),ylab='Standardized Residuals',
xlab='Fitted Trend Values',type='p')
# Exhibit 3.16
acf(rstudent(model1),main='ACF')
# 游程检验:H_0 独立
runs(rstudent(model1))
## $pvalue
## [1] 0.0266
##
## $observed.runs
## [1] 22
##
## $expected.runs
## [1] 30.96667
##
## $n1
## [1] 29
##
## $n2
## [1] 31
##
## $k
## [1] 0
习题3.7 (on P.38)
data(winnebago); winnebago
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1966 61 48
## 1967 53 78 75 58 146 193 124 120 134 99 130 166
## 1968 168 267 314 432 355 384 232 235 293 242 248 236
## 1969 209 358 352 406 562 389 416 493 409 328 222 195
## 1970 156 439 671 874 558 621 628 652 495 344 405 586
## 1971 403 700 837 1224 1117 1214 762 846 1228 937 1396 1174
## 1972 628 1753
n <- length(winnebago); n
## [1] 64
Time_winn <- time(winnebago); Time_winn
## Jan Feb Mar Apr May Jun Jul Aug
## 1966
## 1967 1967.000 1967.083 1967.167 1967.250 1967.333 1967.417 1967.500 1967.583
## 1968 1968.000 1968.083 1968.167 1968.250 1968.333 1968.417 1968.500 1968.583
## 1969 1969.000 1969.083 1969.167 1969.250 1969.333 1969.417 1969.500 1969.583
## 1970 1970.000 1970.083 1970.167 1970.250 1970.333 1970.417 1970.500 1970.583
## 1971 1971.000 1971.083 1971.167 1971.250 1971.333 1971.417 1971.500 1971.583
## 1972 1972.000 1972.083
## Sep Oct Nov Dec
## 1966 1966.833 1966.917
## 1967 1967.667 1967.750 1967.833 1967.917
## 1968 1968.667 1968.750 1968.833 1968.917
## 1969 1969.667 1969.750 1969.833 1969.917
## 1970 1970.667 1970.750 1970.833 1970.917
## 1971 1971.667 1971.750 1971.833 1971.917
## 1972
#(a).时序图
plot(winnebago, type='l', ylab='Monthly Sale', xlab='Time', main='图1. 时序图(a)')
points(winnebago, col='blue', pch=2); abline(h=0, col='red')
#(b).时间的一次函数拟合
model_1=lm(winnebago~Time_winn)
summary(model_1)
##
## Call:
## lm(formula = winnebago ~ Time_winn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -419.58 -93.13 -12.78 94.96 759.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -394885.68 33539.77 -11.77 <2e-16 ***
## Time_winn 200.74 17.03 11.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.7 on 62 degrees of freedom
## Multiple R-squared: 0.6915, Adjusted R-squared: 0.6865
## F-statistic: 138.9 on 1 and 62 DF, p-value: < 2.2e-16
abline(model_1, col='red')
### y_hat=-394885.68+200.74 * Time
Res_winn <- rstudent(model_1) ##标准化残差
plot(Res_winn, type='l', ylab='Standarized Residual in (b)', xlab='Time', main='图2. 残差图(b))')
points(Res_winn, col='blue', pch=3); abline(h=0, col='red')
#(c).对数变换之后,时间的一次函数拟合
winn_log <- log(winnebago) #数据对数化
plot(winn_log, type='l', ylab='Log(Monthly Sale)', xlab='Time', main='图3. 对数化之后时序图(c)')
points(winn_log, col='blue', pch=6); abline(h=0, col='red')
#(d).对数变换+一次函数拟合
model_2=lm(winn_log~Time_winn)
summary(model_2)
##
## Call:
## lm(formula = winn_log ~ Time_winn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03669 -0.20823 0.04995 0.25662 0.86223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -984.93878 62.99472 -15.63 <2e-16 ***
## Time_winn 0.50306 0.03199 15.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3939 on 62 degrees of freedom
## Multiple R-squared: 0.7996, Adjusted R-squared: 0.7964
## F-statistic: 247.4 on 1 and 62 DF, p-value: < 2.2e-16
abline(model_2, col='red')
### Log(y_hat)=-984.93878+ 0.50306 * Time
Res_winn_log <- rstudent(model_2) ##标准化残差
plot(Res_winn_log, type='l', ylab='Residual in log (d)', xlab='Time', main='图4. 对数变换之后的残差图(d))')
points(Res_winn_log, col='blue', pch=7); abline(h=0, col='red')
#(e)季节性+对数变换+一次函数拟合
month_winn <- season(winnebago); month_winn
## [1] November December January February March April May
## [8] June July August September October November December
## [15] January February March April May June July
## [22] August September October November December January February
## [29] March April May June July August September
## [36] October November December January February March April
## [43] May June July August September October November
## [50] December January February March April May June
## [57] July August September October November December January
## [64] February
## 12 Levels: January February March April May June July August ... December
plot(winn_log, type='l', ylab='Log(Monthly Sale)', xlab='Time', main='图5. 季节性+对数化之后时序图(e)')
points(winn_log, col='blue', pch=as.vector(month_winn )); abline(h=0, col='red')
model_3=lm(winn_log~month_winn+Time_winn)
summary(model_3)
##
## Call:
## lm(formula = winn_log ~ month_winn + Time_winn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.92501 -0.16328 0.03344 0.20757 0.57388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -997.33061 50.63995 -19.695 < 2e-16 ***
## month_winnFebruary 0.62445 0.18182 3.434 0.001188 **
## month_winnMarch 0.68220 0.19088 3.574 0.000779 ***
## month_winnApril 0.80959 0.19079 4.243 9.30e-05 ***
## month_winnMay 0.86953 0.19073 4.559 3.25e-05 ***
## month_winnJune 0.86309 0.19070 4.526 3.63e-05 ***
## month_winnJuly 0.55392 0.19069 2.905 0.005420 **
## month_winnAugust 0.56989 0.19070 2.988 0.004305 **
## month_winnSeptember 0.57572 0.19073 3.018 0.003960 **
## month_winnOctober 0.26349 0.19079 1.381 0.173300
## month_winnNovember 0.28682 0.18186 1.577 0.120946
## month_winnDecember 0.24802 0.18182 1.364 0.178532
## Time_winn 0.50909 0.02571 19.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3149 on 51 degrees of freedom
## Multiple R-squared: 0.8946, Adjusted R-squared: 0.8699
## F-statistic: 36.09 on 12 and 51 DF, p-value: < 2.2e-16
lines(y=as.vector(fitted(model_3)), x=as.vector(Time_winn), col='red')
points(y=as.vector(fitted(model_3)), x=as.vector(Time_winn), col='Purple', pch=as.vector(month_winn ))
Res1_winn_log <- rstudent(model_3) ##标准化残差
plot(Res1_winn_log, type='l', ylab='Residual in log (f)', xlab='Time', main='图6. 季节性对数变换之后的残差图(f))')
points(Res1_winn_log, col='blue', pch=as.vector(month_winn ))
abline(h=0, col='red')
#(b).游程检验:独立性
runs(Res1_winn_log)
## $pvalue
## [1] 0.000243
##
## $observed.runs
## [1] 18
##
## $expected.runs
## [1] 32.71875
##
## $n1
## [1] 29
##
## $n2
## [1] 35
##
## $k
## [1] 0
##pvalue 0.000243 拒绝“残差是独立的”
#(c).自相关性
acf(Res1_winn_log,main='图7.季节性+对数变换+一次函数拟合之后的残差自相关图' ) #自相关图
#(d).正态性
#QQ图
qqnorm(Res1_winn_log, main='图8.季节性+对数变换+一次函数拟合之后的残差QQ图' ) #QQ图
qqline(Res1_winn_log, col='blue'); abline(h=0, col='red')
#直方图
hist(Res1_winn_log , xlab='Standardized Residuals_log_2', main='图9. 季节性+对数变换+一次函数拟合之后的残差直方图')
#Shapiro-Wilk检验: 正态性
shapiro.test(Res1_winn_log )
##
## Shapiro-Wilk normality test
##
## data: Res1_winn_log
## W = 0.97035, p-value = 0.1262
##p-value = 0.1262 不能拒绝来自正态总体的假设
\[ Y_t=e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2}-\cdots-\theta_q e_{t-q}, \qquad\text{ 其中 } e_t\sim WN(0,\sigma_e^2). \]
\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} \begin{align} Cov(Y_t,Y_{t-1})&=Cov(e_t-\theta e_{t-1},e_{t-1}-\theta e_{t-2})=-\theta \sigma_e^2. \\ Cov(Y_t,Y_{t-2})&=Cov(e_t-\theta e_{t-1},e_{t-2}-\theta e_{t-3})=0.\\ Cov(Y_t,Y_{t-k})&=Cov(e_t-\theta e_{t-1},e_{t-k}-\theta e_{t-k-1})=0. \text{ if } k>2.\\ \end{align} \qquad\qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ -\frac{\theta}{1+\theta^2}, &\text{ if }k=1\\ 0, &\text{ if }k\geq 2\\ \end{cases} \end{gathered}\]
\(\qquad\) 结论:MA(1)是平稳的,自相关函数在滞后 \(1\) 阶之后“截尾”。
theta <- seq(-1,1,by=0.1)
rho_1 <- (-theta)/(1+theta^2)
data.frame(theta=theta, rho_1=rho_1)
## theta rho_1
## 1 -1.0 0.5000000
## 2 -0.9 0.4972376
## 3 -0.8 0.4878049
## 4 -0.7 0.4697987
## 5 -0.6 0.4411765
## 6 -0.5 0.4000000
## 7 -0.4 0.3448276
## 8 -0.3 0.2752294
## 9 -0.2 0.1923077
## 10 -0.1 0.0990099
## 11 0.0 0.0000000
## 12 0.1 -0.0990099
## 13 0.2 -0.1923077
## 14 0.3 -0.2752294
## 15 0.4 -0.3448276
## 16 0.5 -0.4000000
## 17 0.6 -0.4411765
## 18 0.7 -0.4697987
## 19 0.8 -0.4878049
## 20 0.9 -0.4972376
## 21 1.0 -0.5000000
plot(theta, rho_1, col='blue', type='l', ylab=expression(rho[1]), xlab=expression(theta))
abline(h=0.5, lty=4); abline(h=-0.5, lty=4)
例如: MA(1)序列\(\theta=-0.9, \rho_1=0.4972\), 和\(\theta=0.9, \rho_1=-0.4972\)。
# Exhibit 4.2
data(ma1.2.s) #theta=-0.9
plot(ma1.2.s, ylab=expression(Y[t]), type='l', main=expression(theta==-0.9))
points(ma1.2.s, col='blue'); abline(h=0, col='red')
# Exhibit 4.3
plot(y=ma1.2.s, x=zlag(ma1.2.s), col='blue', ylab=expression(Y[t]), xlab=expression(Y[t-1]), type='p', main=expression(theta==-0.9))
# Exhibit 4.4
plot(y=ma1.2.s, x=zlag(ma1.2.s,2), col='blue', ylab=expression(Y[t]), xlab=expression(Y[t-2]), type='p', main=expression(theta==-0.9))
# Exhibit 4.5
data(ma1.1.s)
plot(ma1.1.s,ylab=expression(Y[t]),type='o'); abline(h=0, col='red')
# Exhibit 4.6
plot(y=ma1.1.s, x=zlag(ma1.1.s), ylab=expression(Y[t]), xlab=expression(Y[t-1]),type='p')
# Exhibit 4.7
plot(y=ma1.1.s,x=zlag(ma1.1.s,2),ylab=expression(Y[t]),xlab=expression(Y[t-2]),type='p')
# An MA(1) series of length n=100 with MA coefficient equal to -0.9 and 0.9 respectively.
set.seed(12345)
y1=arima.sim(model=list(ma=-c(-0.9)),n=100)
y2=arima.sim(model=list(MA=-c(0.9)), n=100)
# Note that R uses the plus convention in the MA model formula so the additional minus sign.
\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta_1^2+\theta_2^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} \begin{align} Cov(Y_t,Y_{t-1})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-1}-\theta_1 e_{t-2}-\theta_2 e_{t-3})=(-\theta_1+\theta_1\theta_2) \sigma_e^2. \\ Cov(Y_t,Y_{t-2})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-2}-\theta_1 e_{t-3}-\theta_2 e_{t-4})=-\theta_2 \sigma_e^2.\\ Cov(Y_t,Y_{t-k})&=Cov(e_t-\theta_1 e_{t-1}-\theta_2 e_{t-2},e_{t-k}-\theta_1 e_{t-k-1}-\theta_2 e_{t-k-2})=0. \text{ if } k\geq 3.\\ \end{align} \qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ \frac{-\theta_1+\theta_1\theta_2}{1+\theta_1^2+\theta_2^2}, &\text{ if }k=1\\ \frac{-\theta_2}{1+\theta_1^2+\theta_2^2}, &\text{ if }k=2\\ 0, &\text{ if }k\geq 3\\ \end{cases} \end{gathered}\]
\(\qquad\) 结论:MA(2)是平稳的,自相关函数在滞后 \(2\) 阶之后“截尾”。
例如: MA(2)序列\(\theta_1=1, \theta_2=-0.6\)。\(\rho_1=-0.678,\rho_2=0.254.\)
data(ma2.s)
opar=par(mfrow=c(2,2))
# Exhibit 4.8
plot(ma2.s, ylab=expression(Y[t]),type='o'); abline(h=0, col='red')
# Exhibit 4.9
plot(y=ma2.s, x=zlag(ma2.s), ylab=expression(Y[t]), xlab=expression(Y[t-1]), type='p')
# Exhibit 4.10
plot(y=ma2.s, x=zlag(ma2.s,2), ylab=expression(Y[t]), xlab=expression(Y[t-2]), type='p')
# Exhibit 4.11
plot(y=ma2.s,x=zlag(ma2.s,3),ylab=expression(Y[t]),xlab=expression(Y[t-3]),type='p')
par(opar)
# An MA(2) series with MA coefficients equal to 1 and -0.6 and of length n=100.
y=arima.sim(model=list(ma=-c(1, -0.6)), n=100)
# Note that R uses the plus convention in the MA model formula so the additional minus sign.
plot(y, ylab=expression(Y[t]), type='o'); abline(h=0, col='red')
\(\qquad\)均值 \(E(Y_t)=0\), 方差 \(Var(Y_t)=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma_e^2\), 协方差及自相关函数 \[\begin{gathered} Cov(Y_t,Y_{t-k})=\begin{cases} \sigma_e^2 (-\theta_k+\sum_{i=1}^{q-k}\theta_i\theta_{i+k}), &\text{ if } 1\leq k\leq q\\ 0. \text{ if } k> q.\\ \end{cases} \qquad\qquad \rho_k= \begin{cases} 1, &\text{ if }k=0\\ \frac{-\theta_k+\sum_{i=1}^{q-k}\theta_i\theta_{i+k}}{1+\sum_{i=1}^q\theta_i^2}, &\text{ if }1\leq k\leq q\\ 0, &\text{ if }k > q\\ \end{cases} \end{gathered}\]
\(\qquad\) 结论:MA(q)是平稳的,自相关函数在滞后 \(q\) 阶之后“截尾”。
\(\qquad \{Y_t\}\) 表示观测到的时间序列,\(\{e_t\}\) 表示未观测到的白噪声。 \[Y_t=\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots=\sum_{i=0}^{\infty}\psi_i e_{t-i}.\]
期望为零 \(E(Y_t)=0.\) 一般性假设\(\psi_0=1.\)
方差有限 \(Var(Y_t)=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i^2)<\infty.\)
一阶协方差
\[\begin{align} Cov(Y_t,Y_{t-1}) &= Cov(\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots, \psi_0e_{t-1}+\psi_1 e_{t-2}+\psi_2 e_{t-3}+\cdots ) \\ &= (\psi_0\psi_1+\psi_1\psi_2+\psi_2\psi_3+\cdots)\sigma_e^2=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+1}) \end{align}\]
\[\begin{align} Cov(Y_t,Y_{t-2}) &= Cov(\psi_0e_t+\psi_1 e_{t-1}+\psi_2 e_{t-2}+\cdots, \psi_0e_{t-2}+\psi_1 e_{t-3}+\psi_2 e_{t-4}+\cdots ) \\ &= (\psi_0\psi_2+\psi_1\psi_3+\psi_2\psi_4+\cdots)\sigma_e^2=\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+2}) \end{align}\]
\[Cov(Y_t,Y_{t-k}) =\sigma_e^2(\sum_{i=0}^{\infty}\psi_i\psi_{i+k}),\qquad k\geq0.\]
\[Corr(Y_t,Y_{t-k})=\frac{\sum_{i=0}^{\infty}\psi_i\psi_{i+k}}{\sum_{i=0}^{\infty}\psi_i^2}, \quad k\geq 0.\]
例如,一般线性过程\(\psi_j=\phi^j, |\phi|<1.\)
\[Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\cdots, \qquad |\phi|<1.\]
\(\qquad\)- 期望为零 \(E(Y_t)=0\), 方差有限 \(Var(Y_t)=\sigma_e^2(\sum_{i=0}^{\infty}\phi^{2i})=\frac{1}{1-\phi^2}\sigma_e^2.\)
\(\qquad\)- \(k\)阶协方差 \(Cov(Y_t,Y_{t-k})=\sigma_e^2(\sum_{i=0}^{\infty}\phi^{i}\phi^{i+k})=\frac{\phi^k}{1-\phi^2}\sigma_e^2.\)
\(\qquad\)- \(k\)阶自相关函数 \(Corr(Y_t,Y_{t-k})=\phi^k, \quad k\geq 0.\)
\[ Y_t=\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+\cdots+\phi_p Y_{t-p}+e_t, \qquad\text{ 其中 } e_t\sim WN(0,\sigma_e^2). \]
\(e_t\) 与 \(\{Y_{t-1},Y_{t-2},\cdots,Y_1\}\) 是不相关的。
当\(p=1\)时,AR(1)过程: \(Y_t=\phi Y_{t-1}+e_t.\)
\(\qquad\) 如果\({Y_t}\)序列平稳,那么
\(\qquad\)- 均值 \(\mu=\frac{0}{1-\phi}=0\), 因为\(E(Y_t)=\phi E(Y_{t-1})+E(e_t)\);
\(\qquad\)- 方差 \(\gamma_0=\frac{\sigma_e^2}{1-\phi^2}\geq 0\), 因为\(Var(Y_t)=\phi^2 Var(Y_{t-1})+Var(e_t)\);
\(\qquad\quad\) ①. 平稳性的必要条件是 \(|\phi|<1.\)
\(\qquad\)- \(k\)阶协方差 \[\begin{align} Cov(Y_t,Y_{t-k})&=Cov(\phi Y_{t-1}+e_t, Y_{t-k}) \\ \gamma_k &= \phi \gamma_{k-1} \\ &= \phi\cdot\phi \gamma_{k-2}=\cdots\\ &=\phi^k \gamma_0 \\ \end{align} \]
\(\qquad\)- 自相关函数 \(\qquad \rho_k=\phi^k, k=0,1,2,\cdots\)
\(\qquad\)- AR(1)的一般线性表示
\[\begin{align} &&Y_t&=\phi Y_{t-1}+e_t\\ \phi \cdot &&Y_{t-1}&=\phi Y_{t-2}+e_{t-1} &&\cdot\phi\\ \phi^2 \cdot &&Y_{t-2}&=\phi Y_{t-3}+e_{t-2} &&\cdot\phi^2\\ &&&\cdots\\ \phi^{k-1} \cdot &&Y_{t-(k-1)}&=\phi Y_{t-k}+e_{t-(k-1)} &&\cdot\phi^{k-1}\\ \end{align}\]
\(\qquad\qquad\)求和可得 \(Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\cdots+\phi^{k-1} e_{t-(k-1)}+\phi^k Y_{t-k}.\)
\(\qquad\qquad\)如果\(|\phi|<1\), 并且\(k\rightarrow\infty\), 那么 \[Y_t=e_t+\phi e_{t-1}+\phi^2 e_{t-2}+\phi^3 e_{t-3}+\cdots\]
\(\qquad\quad\) ②. 平稳性的充分条件是 \(|\phi|<1.\)
\(\qquad\) 结论:AR(1)是平稳的充分必要条件是\(|\phi|<1\),自相关函数\(\rho_k=\phi^k\)随时滞\(k\)指数衰减。
x<-seq(1:12);
y1<-0.9^x;
y2<-0.4^x;
y3<-(-0.8)^x;
y4<-(-0.5)^x;
opar=par(mfrow=c(2,2))
plot(x,y1,type='h', ylim=c(-0.1,0.9), xlab='Lag', ylab=expression(rho[k]), main=expression(phi==0.9));
points(x,y1,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y2,type='h', ylim=c(-0.1,0.9), xlab='Lag', ylab=expression(rho[k]), main=expression(phi==0.4));
points(x,y2,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y3,type='h', ylim=c(-0.9,0.9), xlab='Lag', ylab=expression(rho[k]), main=expression(phi==-0.8));
points(x,y3,col='blue',cex=0.5); abline(h=0,col='red');
plot(x,y4,type='h', ylim=c(-0.9,0.9), xlab='Lag', ylab=expression(rho[k]), main=expression(phi==-0.5));
points(x,y4,col='blue',cex=0.5); abline(h=0,col='red');
par(opar)
例如:AR(1)序列 \(\phi=0.9, \rho_1=0.9, \rho_2=0.81, \rho_3=0.729\cdots\)
data(ar1.s)
opar=par(mfrow=c(2,2))
# Exhibit 4.13
plot(ar1.s,ylab=expression(Y[t]),type='o')
# Exhibit 4.14
plot(y=ar1.s,x=zlag(ar1.s),ylab=expression(Y[t]),xlab=expression(Y[t-1]),type='p')
# Exhibit 4.15
plot(y=ar1.s,x=zlag(ar1.s,2),ylab=expression(Y[t]),xlab=expression(Y[t-2]),type='p')
# Exhibit 4.16
plot(y=ar1.s,x=zlag(ar1.s,3),ylab=expression(Y[t]),xlab=expression(Y[t-3]),type='p')
# An AR(1) series with AR coefficient equal to 0.98 and of length n=100.
set.seed(12345); y1=arima.sim(model=list(ar=c(0.98)),n=100)
set.seed(1234); y2=arima.sim(model=list(ar=c(-0.6)),n=100)
set.seed(54321); y3=arima.sim(model=list(ar=c(0.3)),n=100)
set.seed(4321); y4=arima.sim(model=list(ar=c(-0.1)),n=100)
# Note that the R convention for the AR model formula is same as the book.
#opar=par(mfrow=c(2,2))
plot(y1,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==0.98));
plot(y2,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==-0.6));
plot(y3,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==0.3));
plot(y4,type='o', xlab='Time', ylab='AR(1)', main=expression(phi==-0.1));
acf(y1, xlab='Lag', ylab='ACF', main=expression(phi==0.98));
acf(y2, xlab='Lag', ylab='ACF', main=expression(phi==-0.6));
acf(y3, xlab='Lag', ylab='ACF', main=expression(phi==0.3));
acf(y4, xlab='Lag', ylab='ACF', main=expression(phi==-0.1));
par(opar)
\(\qquad\) 如果\({Y_t}\)序列平稳,那么
\(\qquad\)- 均值 \(\mu=\frac{0}{1-\phi_1-\phi_2}=0\), 因为\(E(Y_t)=\phi_1 E(Y_{t-1})+\phi_2 E(Y_{t-2})+E(e_t)\);
\(\qquad\)- 方差 \(Var(Y_t)=\phi_1^2 Var(Y_{t-1})+\phi_2^2 Var(Y_{t-2})+Var(e_t)+2\phi_1\phi_2 Cov(Y_{t-1},Y_{t-2})\), 则
\[(1-\phi_1^2-\phi_2^2)\gamma_0=2\phi_1\phi_2\gamma_1+\sigma_e^2.\]
\(\qquad\)- 一阶协方差 \((1-\phi_2)\gamma_1 = \phi_1 \gamma_0\), 因为 \[\begin{align} Cov(Y_t,Y_{t-1})&=Cov(\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+e_t, Y_{t-1}) \\ &=\phi_1 Var(Y_{t-1})+\phi_2 Cov(Y_{t-1},Y_{t-2}) \\ \end{align} \]
\(\qquad\qquad\)解方程组可得 \[\begin{cases} \gamma_0&=&\sigma_e^2\frac{1-\phi_2}{(1+\phi_2)[(1-\phi_2)^2-\phi_1^2]}\\ \gamma_1&=&\sigma_e^2\frac{\phi_1}{(1+\phi_2)[(1-\phi_2)^2-\phi_1^2]} \end{cases}\]
\(\qquad\)- 一阶自相关函数 \(\rho_1=\frac{\phi_1}{1-\phi_2}\)
\(\qquad\)- \(k\)阶协方差/自相关函数 \[\begin{align} &&Cov(Y_t,Y_{t-k})&=Cov(\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+e_t, Y_{t-k}) \\ \bf{Yule-Walker方程}&&\gamma_k&=\phi_1 \gamma_{k-1}+\phi_2 \gamma_{k-2} \\ &&\rho_k&=\phi_1 \rho_{k-1}+\phi_2 \rho_{k-2} \\ \end{align} \]