%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% HEADER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentclass[12pt]{article} \usepackage[T1]{fontenc} \usepackage[latin1]{inputenc} \usepackage{amsmath,amsfonts,amssymb,rotating,longtable,url} \usepackage{fancyhdr, fancyvrb} \usepackage[pdftex,urlcolor=black,linkcolor=black,citecolor=black,colorlinks=true,bookmarks=true,plainpages=false,pdfpagelabels]{hyperref} \setlength{\topmargin}{-1.5cm} \setlength{\textwidth}{16.5cm} \setlength{\textheight}{23.5cm} \setlength{\oddsidemargin}{0cm} \setlength{\evensidemargin}{0cm} \setlength{\parindent}{0cm} \setlength{\parskip}{0cm} \newcommand{\ep}{\hfill\hfill$\Box$} \newcommand{\p}{\overline{p}} \newcommand{\s}{\sigma^{2}} \newcommand{\I}{\text{I}} \renewcommand{\arraystretch}{1} \renewcommand{\baselinestretch}{1} \usepackage{//oenbnt/daten/Benutzer/FELDKIRC/Daten/Programme/Temp/Sweave} %\usepackage{Sweave} \VerbatimFootnotes <>= rm(list=ls()) # deletes the workspace setwd("//oenbnt/daten/Benutzer/FELDKIRC/Daten/PC7825/Projects/panel blog/") options(width=70) # sets the length of chunk output library(xtable) @ \begin{document} \title{BMS and the Fixed Effects Estimator - A Tutorial} \author{\textbf{Martin Feldkircher}\thanks{martin.feldkircher@gzpace.net.}}\smallskip %EndAName \date{This version: June 2011} \maketitle \begin{abstract} This tutorial should illustrate how to use Bayesian Model Averaging (BMA) in \verb+R+ with panel data. \end{abstract} \section{Introduction} Methods for estimating econometric models with panel data have been frequently discussed in the literature (see eg Mundlak, 1978). Two estimators seem to resurface most often: the fixed effects estimator (FE) and the random effects estimator (RE). Both have their separate virtues and underlying assumptions (for an exposition see Bartels, 2008). Since the FE estimator can be easily cast into the linear regression framework that is used for \verb+BMS+ it will be our focus in this tutorial. For an application of Bayesian model averaging employing the RE estimator please refer to Moral-Benito (2011). Furthermore, a great deal of the literature seems to pivot around the question of how to calculate standard errors (Bartels, 2008). At the current stage, we abstract from the calculation of so-called \textit{clustered} standard errors since we stay in a pure Bayesian framework in \verb+BMS+ (and standard errors are a classical concept). For the purpose of illustration we will use the data put forward in Moral-Benito (2011) and made publicly available at \url{Moralbenito.com}. The data contains $K=35$ variables (including the dependent variable, the growth rate of per capita GDP) for $N=73$ countries and for the period 1960-2000. The appendix lists the variables together with a short description. The dependent variable, GDP growth, is calculated for five year averages resulting into eight observations per country. Moral-Benito (2011) argues in favor of calculating averages of flow variables, while stock variables have been measured at the first year of each five-year period. The data can be downloaded here \url{http://www.moralbenito.com/research.htm} ('download data for the paper Determinants of Economic Growth: A Bayesian Panel Data Approach). <>= library(BMS) @ After having started \verb+R+ and loaded the \verb+BMS+ we can read in the data file: <>= data.raw=read.table("Dataset.csv",header=TRUE,sep=";") panelDat=data.raw[,-c(1:4)] rownames(panelDat)=paste(data.raw[,"code"],data.raw[,"year"],sep="_") panelDat=panelDat[,-charmatch(c("GRW_WB","GDP_WB"),colnames(panelDat))] @ For that purpose I have simply saved the 'Dataset.xls' file as a 'Dataset.csv' file and have replaced all missing values by 'NA' in excel. You can also download the data here 'panelDat.rda'. <>= panelDat=as.matrix(panelDat) head(panelDat) @ The rownames of the data are a combination of the country code and the year of the observation. The data is provided in a \verb+data frame+ consisting of stacked observations per column. That is, the first column containing the dependent variable consists \begin{equation*} Y_1=\left(y_{1,1},y_{1,2},\ldots, y_{1,T=8},\ldots,y_{N,1},y_{N,2},\ldots,y_{N,T}\right) \end{equation*} This is also the format we can use later on when calling the \verb+bms+ function. We will use two approaches to estimate a fixed effects (FE) panel (country / time fixed effects). The first approach makes use of the Frisch-Waugh-Lovell theorem (see eg Lovell, 2008) boiling down to demeaning the data accordingly. That is, in the case of country fixed effects, subtract from each observation (dependent and independent variable) the within country mean. For the case of time fixed effects, subtract from each observation the mean across countries per time period. We will start with the country fixed effects first. For that purpose we will have to re-shape the data frame and put it into the form of a three dimensional array ($T\times K \times N$). That can be achieved with the function \verb+panel_unstack+. Since \verb+bms+ uses data in its stacked form, we have to make use of \verb+panel_stack+ as well. Both functions are not part of the \verb+BMS+ library and thus have to be copy and pasted into your \verb+R+ console by yourself: <>= panel_unstack= function(stackeddata, tstep=NULL) { # stackeddata is a stacked data frame/matrix ordered in the following way # Variable1 Variable2 # ctry1_time1 # ctry1_time2 # ... # ctry2_time1 # ctry2_time2 # ... # tstep is the number of time points # panel_unstack produces a 3-dimensional array out of that bigT=nrow(stackeddata);K=ncol(stackeddata); if (is.null(tstep)) tstep=bigT X1=aperm(array(as.vector(t(as.matrix(stackeddata))),dim=c(K,tstep,bigT/tstep)), perm=c(2,1,3)) try(dimnames(X1)[[1]] <- unique(sapply(strsplit(rownames(stackeddata),"_"),function(x) x[[2]])), silent=TRUE) try(dimnames(X1)[[2]] <- colnames(stackeddata), silent=TRUE) try(dimnames(X1)[[3]] <- unique(sapply(strsplit(rownames(stackeddata),"_"),function(x) x[[1]])), silent=TRUE) return(X1) } panel_stack = function(array3d) { x1= apply(array3d,2,rbind) try(rownames(x1) <- as.vector(sapply(dimnames(array3d)[[3]], FUN=function(x) paste(x, dimnames(array3d)[[1]], sep="_"))), silent=TRUE) return(as.data.frame(x1)) } @ We can now easily transform the data from its stacked form into the three-dimensional array via: <>= dat.array=panel_unstack(panelDat, tstep=8) @ where we have set \verb+tstep=8+ since we have eight time periods per country. The advantages of the three-dimensional array are that we can easily access each dimension of the data: <>= dat.array[,,"ZWE"] dat.array["1965",,] dat.array[,"GSH",] @ \section{Fixed Effects Estimation by Demeaning the Data} The function \verb+demean+ (again not part of the \verb+BMS+ library, so copy and paste the following lines into your \verb+R+ console) demeans the data to estimate individual (eg country), time and individual and time fixed effects. It takes as argument the three dimensional data array we have created above (\verb+dat.array+) and via \verb+margin+ we can specify over which dimension we want to demean the data (country / time). <>= demean = function(x, margin) { #x is an array #margin is the dimension along which should be demeaned if (!is.array(x)) stop("x must be an array/matrix") otherdims=(1:length(dim(x)))[-margin] sweep(x,otherdims,apply(x,otherdims,mean)) } @ Demeaning is now easily accomplished by: <>= timeDat=panel_stack(demean(dat.array,3)) countryDat=panel_stack(demean(dat.array,1)) @ where we have used \verb+panel_stack+ to re-transform the demeaned data into its stacked form that can be passed to the \verb+bms+ function. Since in the data frame only the first 12 explanatory variable show variation over time, we will restrict estimation to these variables only. <>= # only the first 12 variables show time variation, so estimate panel with those regressors only modelCd=bms(countryDat[,1:13],user.int=F) modelTd=bms(timeDat[,1:13],user.int=F) @ We will briefly discuss the results in the next section (see Table 1 and Table 2). Note that demeaning the data yields the same posterior estimates for the coefficients as with incorporating the FE directly, the approach we opt for in the next section. However, the posterior variance for the coefficient estimates is not identical (though very similar). Also note that demeaning does not save you from the degrees of freedom problem when incorporating the large set of fixed effects by a set of dummy variables. For an application using BMA with country FEs see for example Crespo Cuaresma et al. (2009). \section{Fixed Effects Estimation with Dummy Variables} We will now turn to the second possibility of estimating FEs, which is the dummy variable approach. The advantage of the dummy variable approach is also that it yields estimates for the FEs which can be important for some applications. For the dummy approach we will make use of the new \verb+BMS+ feature of holding variables constant (not sampling) them by the \verb+bms+ argument \verb+fixed.reg+. Please make sure that you have installed \verb+BMS+ $\geq$ version 0.3. We start now with creating the country dummies: <>= # country dummies bigT=nrow(panelDat);tstep=8; countryDummies=kronecker(diag(bigT/tstep),rep(1,tstep));colnames(countryDummies)=dimnames(dat.array)[[3]] # we have to leave one out (first column in this example, but of course is not relevant which column we leave out) countryDummies=countryDummies[,-1] @ In a same fashion we can easily create a set of time dummies: <