%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% HEADER
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{amsmath,amsfonts,amssymb,rotating,longtable,url}
\usepackage{fancyhdr, fancyvrb}
\usepackage[pdftex,urlcolor=black,linkcolor=black,citecolor=black,colorlinks=true,bookmarks=true,plainpages=false,pdfpagelabels]{hyperref}
\setlength{\topmargin}{-1.5cm} \setlength{\textwidth}{16.5cm}
\setlength{\textheight}{23.5cm} \setlength{\oddsidemargin}{0cm}
\setlength{\evensidemargin}{0cm} \setlength{\parindent}{0cm}
\setlength{\parskip}{0cm}
\newcommand{\ep}{\hfill\hfill$\Box$}
\newcommand{\p}{\overline{p}}
\newcommand{\s}{\sigma^{2}}
\newcommand{\I}{\text{I}}
\renewcommand{\arraystretch}{1}
\renewcommand{\baselinestretch}{1}
\usepackage{//oenbnt/daten/Benutzer/FELDKIRC/Daten/Programme/Temp/Sweave}
%\usepackage{Sweave}

\VerbatimFootnotes



<<label=null,echo=F>>=
rm(list=ls())                           # deletes the workspace
setwd("//oenbnt/daten/Benutzer/FELDKIRC/Daten/PC7825/Projects/panel blog/")
options(width=70)                       # sets the length of chunk output
library(xtable)
@

\begin{document}
\title{BMS and the Fixed Effects Estimator - A Tutorial} 
\author{\textbf{Martin Feldkircher}\thanks{martin.feldkircher@gzpace.net.}}\smallskip 
%EndAName

\date{This version: June 2011}

\maketitle


\begin{abstract}

This tutorial should illustrate how to use Bayesian Model Averaging (BMA) in \verb+R+ with panel data.
\end{abstract}

\section{Introduction}
 Methods for estimating econometric models with panel data have been frequently discussed in the literature (see eg Mundlak, 1978).
Two estimators seem to resurface most often: the fixed effects estimator (FE) and the random effects estimator (RE). Both have their separate virtues and underlying assumptions (for an exposition see
Bartels, 2008). Since the FE estimator can be easily cast into the linear regression framework that is used for \verb+BMS+ it will be our focus in this tutorial. For an application of Bayesian model
averaging employing the RE estimator please refer to Moral-Benito (2011). Furthermore, a great deal of the literature seems to pivot around the question of how to calculate standard errors
(Bartels, 2008). At the current stage, we abstract from the calculation of so-called \textit{clustered} standard errors since we stay in a pure Bayesian framework in \verb+BMS+ (and standard errors
are a classical concept).
 
For the purpose of illustration we will use the data put forward in Moral-Benito (2011) and made publicly available at
\url{Moralbenito.com}. The data contains $K=35$ variables (including the dependent variable, the growth rate of per capita GDP) for $N=73$ countries and for the period 1960-2000. The appendix lists the variables
together with a short description. The dependent variable, GDP growth, is calculated for five year averages resulting into eight observations per country. Moral-Benito (2011) argues in favor of calculating averages of
flow variables, while stock variables have been measured at the first year of each five-year period. The data can be downloaded here \url{http://www.moralbenito.com/research.htm} ('download data for the paper
Determinants of Economic Growth: A Bayesian Panel Data Approach).

<<label=one,echo=TRUE>>=
library(BMS)
@


After having started \verb+R+ and loaded the \verb+BMS+ we can read in the data file:

<<label=readIn,echo=FALSE>>=
data.raw=read.table("Dataset.csv",header=TRUE,sep=";")
panelDat=data.raw[,-c(1:4)]
rownames(panelDat)=paste(data.raw[,"code"],data.raw[,"year"],sep="_")
panelDat=panelDat[,-charmatch(c("GRW_WB","GDP_WB"),colnames(panelDat))]
@
For that purpose I have simply saved the 'Dataset.xls' file as a 'Dataset.csv' file and have replaced all missing values by 'NA' in excel. You can also download the data here 'panelDat.rda'.


<<label=null,echo=FALSE>>=
panelDat=as.matrix(panelDat)
head(panelDat)
@
The rownames of the data are a combination of the country code and the year of the observation. The data is provided in a \verb+data frame+ consisting of stacked observations per column.
That is, the first column containing the dependent variable consists
\begin{equation*}
Y_1=\left(y_{1,1},y_{1,2},\ldots, y_{1,T=8},\ldots,y_{N,1},y_{N,2},\ldots,y_{N,T}\right)
\end{equation*}
 This is also the format we can use
later on when calling the \verb+bms+ function.

We will use two approaches to estimate a fixed effects (FE) panel (country / time fixed effects). The first approach makes use of the Frisch-Waugh-Lovell  theorem (see eg Lovell, 2008) boiling
down to demeaning the data accordingly.  That is, in the case of country fixed effects, subtract from each observation (dependent and independent variable) the within country mean.
For the case of time fixed effects, subtract from each observation the mean across countries per time period. We will start with the country
fixed effects first.

For that purpose we will have to re-shape the data frame and put it into the form of a three dimensional array ($T\times K \times N$). That can be achieved  with the function \verb+panel_unstack+. Since 
\verb+bms+ uses data in its stacked form, we have to make use of \verb+panel_stack+ as well. Both functions are not part of the \verb+BMS+ library and thus have to be copy and pasted into your
\verb+R+ console by yourself:

<<label=re-shape,echo=TRUE>>=
 panel_unstack= function(stackeddata, tstep=NULL) {
  # stackeddata is a stacked data frame/matrix ordered in the following way
  #               Variable1  Variable2
  # ctry1_time1
  # ctry1_time2
  # ...
  # ctry2_time1
  # ctry2_time2
  # ...
  # tstep is the number of time points
  # panel_unstack produces a 3-dimensional array out of that
  
  bigT=nrow(stackeddata);K=ncol(stackeddata); 
  if (is.null(tstep)) tstep=bigT
  X1=aperm(array(as.vector(t(as.matrix(stackeddata))),dim=c(K,tstep,bigT/tstep)), perm=c(2,1,3))
  try(dimnames(X1)[[1]] <-  unique(sapply(strsplit(rownames(stackeddata),"_"),function(x) x[[2]])), silent=TRUE) 
  try(dimnames(X1)[[2]] <-  colnames(stackeddata), silent=TRUE)
  try(dimnames(X1)[[3]] <-  unique(sapply(strsplit(rownames(stackeddata),"_"),function(x) x[[1]])), silent=TRUE)
  return(X1)
}


panel_stack = function(array3d) {
  x1= apply(array3d,2,rbind)
  try(rownames(x1) <-  as.vector(sapply(dimnames(array3d)[[3]], FUN=function(x) paste(x, dimnames(array3d)[[1]], sep="_"))), silent=TRUE)
  return(as.data.frame(x1))
}
@

We can now easily transform the data from its stacked form into the three-dimensional array via:

<<label=re-shape2,echo=TRUE>>=
dat.array=panel_unstack(panelDat, tstep=8)
@
where we have set \verb+tstep=8+ since we have eight time periods per country.  The advantages of the three-dimensional array are that we can easily access each dimension of the data:

<<label=array,echo=TRUE,results=hide>>=
dat.array[,,"ZWE"]
dat.array["1965",,]
dat.array[,"GSH",]
@




\section{Fixed Effects Estimation by Demeaning the Data}

The function \verb+demean+ (again not part of the \verb+BMS+ library, so copy and paste the following lines into your \verb+R+ console) demeans the data to estimate individual (eg country), time and
individual and time fixed effects. It takes as argument the three dimensional data array we have created above (\verb+dat.array+) and via \verb+margin+ we can specify over which dimension we want to
demean the data (country / time).


<<label=demeaning,echo=TRUE>>=
demean = function(x, margin) {
 #x is an array
 #margin is the dimension along which should be demeaned
 if (!is.array(x)) stop("x must be an array/matrix")
 otherdims=(1:length(dim(x)))[-margin]
  sweep(x,otherdims,apply(x,otherdims,mean))
}
@


Demeaning is now easily accomplished by:

<<label=demeaningEx,echo=TRUE>>=
timeDat=panel_stack(demean(dat.array,3))
countryDat=panel_stack(demean(dat.array,1))
@
where we have used \verb+panel_stack+ to re-transform the demeaned data into its stacked form that can be passed to the \verb+bms+ function.

Since in the data frame only the first 12 explanatory variable show variation over time, we will restrict estimation to these variables only.

<<label=demeanBMS,echo=TRUE>>=
# only the first 12 variables show time variation, so estimate panel with those regressors only
modelCd=bms(countryDat[,1:13],user.int=F)
modelTd=bms(timeDat[,1:13],user.int=F)
@

We will briefly discuss the results in the next section (see Table 1 and Table 2).
Note that demeaning the data yields the same posterior estimates for the coefficients as with incorporating the FE directly, the approach we opt for in the next section. However,
the posterior variance for the coefficient estimates is not identical (though very similar). Also note that demeaning does not save you from the degrees of freedom problem when incorporating 
the large set of fixed effects by a set of dummy variables. For an application using BMA with country FEs see for example Crespo Cuaresma et al. (2009).

\section{Fixed Effects Estimation with Dummy Variables}
We will now turn to the second possibility of estimating FEs, which is the dummy variable approach. The advantage of the dummy variable approach is also that it yields estimates for
the FEs which can be important for some applications. For the dummy approach we will make use of the new \verb+BMS+ feature of holding variables constant (not sampling) them by the
\verb+bms+ argument \verb+fixed.reg+. Please make sure that you have installed \verb+BMS+ $\geq$ version 0.3. We start now with creating the country dummies:

<<label=countryDumm,echo=TRUE>>=
# country dummies
bigT=nrow(panelDat);tstep=8;
countryDummies=kronecker(diag(bigT/tstep),rep(1,tstep));colnames(countryDummies)=dimnames(dat.array)[[3]]
# we have to leave one out (first column in this example, but of course is not relevant which column we leave out)
countryDummies=countryDummies[,-1]
@

In a same fashion we can easily create a set of time dummies:
<<label=timeDumm,echo=TRUE>>=
# time series dummies
timeDummies=matrix(diag(tstep),bigT,tstep,byrow=T);colnames(timeDummies)=dimnames(dat.array)[[1]]
# we have to leave one out (first column in this example, but of course is not relevant which column we leave out)
timeDummies=timeDummies[,-1]
modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)
@

Running the two regressions (for the first 13 elements of the demeaned data frame only):

<<label=estDumm,echo=TRUE,results=hide>>=
modelCdummy=bms(cbind(panelDat[,1:13],countryDummies),fixed.reg=colnames(countryDummies),user.int=F)
modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)
@

should yield the same results as with demeaning. Type \verb+coef(modelCdummy)+ /  \verb+coef(modelTdummy)+ to get the results in \verb+R+.
These are summarized in the Table below:

<<label=results,results=tex,echo=FALSE>>=
countryTable=cbind(estimates.bma(modelCd)[colnames(panelDat)[2:13],1:3],estimates.bma(modelCdummy)[colnames(panelDat)[2:13],1:3])
timeTable=cbind(estimates.bma(modelTd)[colnames(panelDat)[2:13],1:3],estimates.bma(modelTdummy)[colnames(panelDat)[2:13],1:3])

xtable(round(countryTable,3),align=c("l","|",rep("r",3),"|",rep("r",3)),caption="Estimation of country fixed effects: Left panel based on demeaning the data, 
right panel on the dummy variable estimation approach.")
xtable(round(timeTable,3),align=c("l","|",rep("r",3),"|",rep("r",3)),caption="Estimation of time fixed effects: Left panel based on demeaning the data, right
 panel on the dummy variable estimation approach.")
@

As one can see the results are very similar to each other. Since we have used 'full enumeration' no stochastic variability should be expected for the two approaches.
However, when using large data sets and thus the MCMC sampler in turn, please bear in mind that there might be some stochastic variation of results when running differen \verb+bms+ chains.
 Posterior coefficients for the model employing country fixed effects are to be interpreted with respect to the within variation: A positive
coefficient on the variable measuring the country's openness (\verb+OPEM+) means that if openness increases \textit{within} a country GDP growth is incraesing. On the other hand, time fixed effects  in 
the particular example look at the between variation of the data. That is, if openness \textit{across countries} (at once) increases, does this affect GDP growth? From Table 2 we see that this effect
is smaller compared to that for the within transformed data. 



\newpage


                                                                        
\begin{thebibliography}{99}


\bibitem{bartels} Bartels, Brandon (2008). {Beyond 'Fixed versus Random Effects': A Framework for Improving Substantive and Statsitcal Analysis of Panel, Time-Series Cross-Sectional, and Multilevel Data.}
\emph{Mimeo, Stony Brook University, New York.}

\bibitem{us} Crespo Cuaresma, J.  and Doppelhofer, G.  and Feldkircher, M. 2009: {The Determinants of Economic Growth in European Regions.}
\emph{CESifo Working Paper Series, No. 2519.}

\bibitem{lovell2008} Lovell, M., 2008. {A Simple Proof of the FWL (Frisch,Waugh,Lovell) Theorem.}
\emph{Journal of Economic Education.}

\bibitem{moralB11} Moral Benito, Enrique (2011). {Determinants of Economic Growth: A Bayesian Panel Data Approach.}
\emph{The Review of Economics and Statistics, forthcoming.}

\bibitem{mundlak78} Mundlak, Yair, 1978. {On the Pooling of Time Series and Cross Section Data.}
\emph{Econometrica, Vol. 46, p. 69-85.}



\end{thebibliography}

\section{Appendix}

\begin{table}
{\scriptsize
\begin{tabular}{l|l}
\hline
\hline
 GRW\_PWT & Dependent variable (source: PWT 6.2) \\
&Growth of per capita GDP over 5-year periods (2000 US dollars at PPP)\\ 
\hline 
GDP\_PWT & log Initial GDP (PWT 6.2)(IN)\\
	&Logarithm of initial real GDP per capita (2000 US dollars at PPP) \\ 
 POP &Population (source PWT 6.2)(IN)\\
  &  Population in thousands of people \\
 PGRW &  Population Growth (source PWT 6.2)(AV)\\
  &   Average annual growth rate of population \\
 IPR &  Investment Price (source PWT 6.2)(AV)\\
   & Average investment price level \\
 OPEM &Opennes measure (source PWT 6.2)(AV)\\
   &  Export plus imports as a share of GDP   \\
 CSH &  Consumption Share (source PWT 6.2)(AV)\\
  &  Consumption as a share of GDP \\
 GSH &   Government Share (source PWT 6.2)(AV)\\
  &  Government consumption as a share of GDP \\              
 ISH & Investment Share (source PWT 6.2)(AV)\\
  &  Investment as a share of GDP \\
 LBF & Labor Force (source PWT 6.2)(IN)\\
  &  Ratio of workers to population \\   
 LEX & Life Expectancy (source WDI 2005)(IN)\\
  &  Life expectancy at birth \\
 PDE & Population Density (source WDI 2005)(IN)\\
  &  Population divided by land area \\
 URBP &   Urban Population (source WDI 2005)(IN)\\
  &   Fraction of population living in urban areas   \\
 NWTR & Navigable Water (source Gallup et. al)\\
   &  Fraction of land area near navigable water \\
 LND & Landlocked Country (source Gallup et. al)\\
   & Dummy for landlocked countries \\     
 ADIS &  Air Distance (source Gallup et. al)\\
   &  Logarithm of minimal distance in km from New York, Rotterdam, or Tokio \\
 TAR & Tropical Area (source Gallup et. al)\\
   & Fraction of land area in geographical tropics    \\
 TPOP & Tropical Population (source Gallup et. al)\\
  &   Fraction of population in geographical tropics    \\
 LAR & Land Area (source Gallup et. al)\\
  &  Area in $km^2$ \\
 OPEI & Openness Index (source: Sachs and Warner)\\
  &   Index of trade openness from 1 (highest) to 0  \\
 INDP &  Independence (source Gallup et. al)\\
  &   Timing of national independence measure: 0 if before 1914;\\
  & 1 if between 1914 and 1945; 2 if between 1946 and 1989 and 3 if after 1989 \\
 SOC &  Socialist (source Gallup et. al)\\
  &  Dummy for countries under socialist rule for considerable time during 1950 to 1995     \\
 WAR & War Dummy (source: Barro and Lee)\\
  &  Dummy for countries that participated in external war between 1960 and 1990\\
 CLI &  CLimate (source Gallup et. al)\\
  &  Fraction of land area with tropical climate \\
 EU &  Europe\\
 &  Dummy for EU countries \\
 SAFR&Sub-Saharan Africa\\
  &   Dummy for Sub-Sahara African countries\\
 LATM&Latin America\\
  &   Dummy for Latin American countries\\
 EAS &East Asia\\
  &  Dummy for East Asian countries\\
 MAL &Malaria (source Gallup et. al) (IN)\\
  &  Fraction of population in areas with malaria \\
 P15 &Population under 15 (source: Barro and Lee)(IN)\\
  &  Fraction of population younger than 15 years \\
 P65 &Population over 65 (source: Barro and Lee)(IN)\\
 &   Fraction of population older than 65 years \\                  
 PED &Primary Education (source: Barro and Lee)(IN)\\
  &  Stock of years of primary education\\
 SED &Secondary Education (source: Barro and Lee)(IN)\\
  &  Stock of years of secondary education\\
 PR &Political Rights (source: Freedom House)(IN)\\
  & Index of political rights from 1 (highest) to 7\\
 CL & Civil Liberties (source: Freedom House)(IN)\\
  & Index of civil liberties from 1 (highest) to 7 \\  
   \hline
   \hline
           
\end{tabular} 
}
\caption{\footnotesize{Source: \url{Moralbenito.com}. Notes:
1.-(IN) refers to initial value for the 5-year period.
2.-(AV) refers to 5-year average. 
3.-Variables without neither (IN) nor (AV) are the same for all the years.}}\label{tbl:data}
\end{table}


\end{document}