I am not an econometrician

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world.

Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.

Theory-driven or data-driven

Econometrics is often “theory driven” while statistics tends to be “data driven”. I discovered this in the interview for my current job when someone criticized my research for being “data driven” and asked me to respond. I was confused because I thought statistical research should be driven by data analytic issues, not by some pre-conceived theory, but that was not the perspective of the people interviewing me. (Fortunately, I was hired anyway.) Typically, econometricians test theory using data, but often do little if any exploratory data analysis. On the other hand, I tend to build models after looking at data sets. I think this distinction also extends to many other areas where statistics is applied.

As a result of this distinction, econometricians do a lot of hypothesis testing but produce few graphics. Many research seminars in our department involve someone describing a model, applying it to some data, and showing the estimated parameters, standard errors, results of various hypothesis tests, etc. They do all that without ever plotting the data to start with! This seems bizarre to me, and I still get annoyed about it even though I’ve seen it at least a hundred times. I teach my students to first spend time getting to know their data through plots and other data visualization methods before even thinking about fitting a model or doing a hypothesis test.

Probably because of the emphasis that econometricians place on their theoretical models, they tend to fall in love with them and even seem to believe they are true. This is evident in the phrase “data generating process” (or its acronym DGP) that econometricians commonly use to describe a statistical model. I never think of my models as data generating processes. The data come from some real world, complicated, messy, nonlinear, nonstationary, non-Gaussian process. At best, my model is a crude approximation. I often cite Box’s maxim that “All models are wrong, but some are useful”, and while my colleagues would agree in principle, they still behave as if their models are the true data generating processes.

Expertise and ignorance

When I first joined an econometrics department, I was struck by how much everyone knew about time series and regression, and how little they knew about a lot of other topics. There are vast areas of statistics that econometricians typically know little about, including survey sampling, discriminant analysis, clustering, and the design of experiments. My training was much broader but in some ways shallower. There were standard undergraduate topics in econometrics that I knew nothing about: cointegration, endogeneity, ARCH/GARCH models, seemingly unrelated regression, the generalized method of moments, and so on.

Because of the nature of economic data, econometricians have developed some specific techniques for handling time series and regression problems. In particular, econometricians have thought very carefully about causality, because it is usually not possible to conduct experiments within economics and finance, and so they have developed several methods to help identify potentially causal relationships. These developments do not always filter back to the general statistical community, although they can be very useful. For example, the method of instrumental variables (which allows consistent estimation when the explanatory variables are correlated with the error term of a regression model) can be used to help identify potentially causal relationships. Tests for “Granger causality” are another useful econometric development.
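To make the instrumental variables idea concrete, here is a minimal simulation sketch in Python using two-stage least squares. Everything in it is an illustrative assumption rather than anything from a real data set: the simulated variables, the coefficient values, and the helper names `ols` and `two_stage_least_squares` are all made up for the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(x, y, z):
    """IV estimate of intercept and slope of y on x, using z as the instrument."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the endogenous regressor onto the instrument.
    x_hat = Z @ ols(Z, x)
    # Stage 2: regress y on the fitted values from stage 1.
    X_hat = np.column_stack([np.ones(n), x_hat])
    return ols(X_hat, y)

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 20_000
true_slope = 2.0
z = rng.normal(size=n)            # instrument: drives x, unrelated to the error
u = rng.normal(size=n)            # unobserved confounder
x = z + u + rng.normal(size=n)    # regressor is correlated with u
y = 1.0 + true_slope * x + 2.0 * u + rng.normal(size=n)  # error term contains u

ols_slope = ols(np.column_stack([np.ones(n), x]), y)[1]
iv_slope = two_stage_least_squares(x, y, z)[1]
print(f"OLS slope: {ols_slope:.2f}, IV slope: {iv_slope:.2f}, truth: {true_slope}")
```

Because the regressor shares the confounder `u` with the error term, ordinary least squares should overstate the slope here, while the instrumented estimate should land close to the true value. The sketch only covers the point estimate; valid standard errors for two-stage least squares need a separate calculation.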

For some reason, econometricians have never really taken on the benefits of the generalized linear modelling framework. So you are more likely to see an econometrician use a probit model than a logistic regression, for example. Probit models tended to go out of fashion in statistics after the GLM revolution prompted by Nelder and Wedderburn (1972).

Confusing terminology

The two communities have developed their own sets of terminology that can be confusing. Sometimes they have different terms for the same concept; for example, “longitudinal data” in statistics is “panel data” in econometrics; “survival analysis” in statistics is “duration modelling” in microeconometrics.

In other areas, they use the same term for different concepts. For example, a “robust” estimator in statistics is one that is insensitive to outliers, whereas a “robust” estimator in econometrics is insensitive to heteroskedasticity and autocorrelation. A “fixed effect” in statistics is a non-random regression term, while a “fixed effect” in econometrics means that the coefficients in a regression model are time-invariant. This obviously has the potential for great confusion, which is evident in the Wikipedia articles on fixed effects and robust regression.
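The two meanings of “robust” can be put side by side in a short Python sketch. The simulated data are assumptions chosen for illustration, and the econometric half shows only heteroskedasticity (White's HC0 sandwich estimator), not autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Statistics sense: an estimator that resists outliers.
sample = rng.normal(loc=10.0, scale=1.0, size=99)
contaminated = np.append(sample, 1_000.0)   # add one gross outlier
mean_shift = abs(contaminated.mean() - sample.mean())
median_shift = abs(np.median(contaminated) - np.median(sample))
print(f"shift in mean: {mean_shift:.2f}, shift in median: {median_shift:.3f}")

# Econometrics sense: standard errors that stay valid under heteroskedasticity.
n = 20_000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * x**2   # error s.d. equals x squared
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS covariance assumes a constant error variance.
classical_cov = XtX_inv * (resid @ resid) / (n - X.shape[1])
# White (HC0) sandwich covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
robust_cov = XtX_inv @ meat @ XtX_inv

classical_se = np.sqrt(np.diag(classical_cov))
robust_se = np.sqrt(np.diag(robust_cov))
print(f"slope s.e. classical: {classical_se[1]:.3f}, robust: {robust_se[1]:.3f}")
```

In the first half the estimator itself changes (median instead of mean). In the second half the coefficient estimates are ordinary least squares throughout and only the standard errors change, which is a large part of why the shared word causes confusion.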

Avoid silos

I’ve stayed in a (mostly) econometrics department for so long because it is a great place to work, full of very nice people, and is much better funded than most statistics departments. I’ve also learned a lot, and I think the department has benefited from having a broader statistical influence than if they had only employed econometricians.

I would encourage econometricians to read outside the econometrics literature so they are aware of what is going on in the broader statistical community. These days, most research econometricians do pay some attention to JASA and JRSSB, so the gap between the research communities is shrinking. However, I would suggest that econometricians add Statistical Science and JCGS to their reading list, to get a wider perspective.

I would encourage statisticians to keep abreast of methodological developments in econometrics. A good place to start is Hayashi’s graduate textbook Econometrics which we use at Monash for our PhD students.

The gap is closing

One thing I have noticed in the last seventeen years is that the two communities are not so far apart as they once were. Nonparametric methods were once hardly mentioned in econometrics (too “data-driven”), and now the main econometrics journals are full of nonparametric asymptotics. There are special issues of statistical journals dedicated to econometrics (e.g., CSDA has regular special issues dedicated to computational econometrics).

Just as US television has made the Australian culture rather less distinctive than it once was, statistical ideas are infiltrating econometrics, and vice-versa. But until I hear a research seminar on Visualization of Macroeconomic Data, I don’t think I will ever feel entirely at home.

Some of these thoughts were prompted by this discussion on crossvalidated.com.

The article has been updated to reflect some of the comments made below. Thanks for the feedback.

  • Letian

    A naive question. If all models are wrong, why do we, especially econometricians, pay so much attention to asymptotic properties, bias, etc.? They are wrong as well, right?

    • Far from naive, that’s a very good question! Sometimes asymptotic properties provide a useful first order approximation to small sample properties, and sometimes bias provides a handle on how useful a model is under given conditions. But often such research seems to me to be a little pointless.

  • Miguel A. Arranz

    I partly disagree on several issues. I cannot see much difference nowadays between a modern econometrician and an applied statistician. I must admit that I might be biased, since I am an econometrician teaching Econometrics to BSc Statistics students. It is true that there are different names (we all have funny stories) and that we might use different subsets of techniques (what you call Survival Analysis is essential for what we call Duration Models in Microeconometrics, for example, and GLMs are its core), given the nature of the phenomena we are analyzing. That means that I concentrate on teaching the techniques that most statisticians overlook.

    However, Econometrics is becoming increasingly data-driven, even incorporating many concepts from data mining, paying special attention to visualization of data. It is no longer acceptable to get a (complex) functional form derived from Economic Theory and test it. Furthermore, JASA, JRSSB, … appear on the reading lists of many graduate econometrics courses, and they are also included in the recommended lists of publishing journals in many departments of Econometrics.

    I think there is still a gap, but it is closing much faster than we think.

    • I hope you’re right. However, only three days ago I sat through a seminar involving a relatively complex model and never once saw a plot of the data. I think there’s a long way to go before econometricians get serious about data visualization.

      I’ve updated the article to take account of some of your other comments. Thanks.

      • econstudent

        There appears to be a generalization error here. 🙂

        I agree that data visualization is not widely taught or appreciated as it should be in econometrics.

        But I don’t think this is a fair comparison. Econometrics may be better thought of as a particular branch of statistics, in which people develop and use statistical methods to deal with economic data and test economic hypotheses. In my mind, this is what defines econometrics. Also, econometricians do look at data. Just think about the concepts of cointegration and ARCH. These econometric concepts would not have been possible if people hadn’t looked at the data carefully. But I do get annoyed with many papers when they don’t plot their data.

        In addition, statistics is a broad topic. For example, does every biostatistician know a lot about survey analysis, or more than what a microeconometrician knows about survey analysis?

        Last but not least, the concepts of hypothesis testing, bias, DGP etc. were not the brainchild of econometricians, right?

        I am extremely surprised that you have the impression that econometricians think of their model as being “true.” I can understand that in some cases you need to assume certain things to make tractable inference. But it is still an assumption. The readers can feel free to disagree of course, but it does not mean the author believes it is true.

        • I agree on cointegration and ARCH — brilliant ideas proposed by some excellent econometricians. I knew Clive Granger and he certainly looked very carefully at any data he analysed.

          However, I never said that econometricians needed to know all areas of statistics. Read what I wrote more carefully. I was simply reflecting on the differences I have observed, and how statisticians are trained more broadly. There are good reasons for that as you have suggested.

          I also never said that hypothesis testing or bias calculations were bad. My point was that they should not come before a careful look at the data. (This is anathema in some areas of social science.)

          I’ve no idea who came up with the phrase DGP. I never heard it before I moved to an econometrics department, and I imagined it was peculiar to econometrics. Perhaps it is used in statistics as well, but not by the statisticians I know.

          • Capistran

            Clive Granger was also a statistician who spent a lot of time in an econometrics department. Looking at the data is of course fundamental, and econometricians have to do it much more. Although I have to say that as an econometrician, I have also had the opposite experience: looking at too many charts and too little theory!

  • Chris

    I worked in a Health Economics department in a large “Global” Pharma company. Before that I had worked strictly on phase I, II, III clinical trials and a drug approval or two. I spent a lot of time explaining economists/econometrics to the statisticians and explaining statistics/experimental design to the economists. I still don’t think they understand the hypothesis of non-inferiority. My running joke was “…dummy variables are for dummies…” and theirs “…we don’t need no stinking confidence intervals…”. We did get along extremely well, mainly because I actually did know how to analyze a double blind randomized clinical trial – until a corporate restructuring and downsizing. We remain good friends, and I even got a couple of sole author publications out of the work.

  • Martha Smith

    An interesting compare-and-contrast. Thanks.
    I have noticed that the “theory, then collect data, then test hypothesis — but no reason to plot the data” paradigm also seems to be prevalent in psychology.
    But I am used to using “robust” in more or less the way you say econometricians do: “robust” meaning insensitive to violations of model assumptions, and “resistant” meaning insensitivity to outliers.
    (I am a mathematician, but got into teaching graduate statistics courses a number of years ago, with greatest interest in applications in biology and engineering. I am definitely not an econometrician!)

  • Vincent Granville

    Can’t you be both? I thought econometricians were statisticians working with time series models, ARMA processes etc.

    • Vincent, that sort of describes me… Due to weird subject choices at university, I ended up taking econometrics type subjects (time series etc) with a stats department at the expense of the traditional stats topics like experimental design and GLM… But I didn’t quite do enough economics, so I feel a little like a half-qualified statistician and a half-qualified econometrician… So call me an empirical econometrician 😛 Though I’m not sure that such a creature is the most useful thing.

  • Robert Taylor

    Well now. I was trained as a statistician but work as an econometrician. The latter don’t like to look at data because they fear that doing so will invalidate all their asymptotic theory (pre-test bias etc). I was taught always to look at the data and see what it’s telling you. To me that seems sensible. But the econometrics profession is grinding it out of me 🙂

  • Eric

    I’m not sure how you can do good econometrics without a thorough background in statistics. I’m also dismayed to learn that people run models without thoroughly studying their data. On the other hand, it’s been 20 years since I gave up my academic appointment for Wall St. so maybe things have changed. On the applied side, I see a ton of studies that use reduced form models, even if the authors don’t know that this is what they are doing, so as an empirical matter I don’t buy the idea that econometrics is more concerned with structure (whether economic structure or data structure). Perhaps the distinction is that good, rigorous econometrics requires statistics and often structure, but bad econometric studies can do without.

  • lu-nonymous

    Hi, I’m a PhD student doing research in empirical macroeconomics. Your last sentence got me to respond. I’m actually very much interested in data visualization and most of the data I stare at is macro data. I think we can benefit very strongly from better data visualization.

    One problem with macro data is that it’s “small”. You wouldn’t want to visualize US postwar annual GDP, consumption and investment time series for your seminar audience, because they’ve all seen it countless times.

    People might be more interested in it, if it’s historical macro data (e.g. Piketty). Or if you’ve found a way to get data on a higher frequency for more countries.

  • 10is Maestro

    It is a dated discussion, but, if you want an update, I just completed an undergraduate degree in economics, with several econometrics courses, including a graduate course in econometrics.

    If you want an example of data driven choice, consider fitting a moving average model, an autoregressive model or a combination of both to a given time series. A quick ad hoc check you can do is to look at correlograms (both partial and normal), knowing that the impulse response functions of the three processes differ greatly (e.g., MA processes have a shorter memory, whereas shocks reverberate for a very long time in AR processes). This obviously has implications for modeling from a statistical perspective, but also as far as economic theory is concerned: if you can approximate the behavior of a time series well with a moving average model, a theory that gets you long lasting effects and slow adjustment is likely wrong. So, correlograms are one way we often see what is going on with macro and financial data.
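The contrast between short-memory MA processes and long-memory AR processes can be seen in a small simulated correlogram. This is a rough sketch in Python; the coefficient 0.8, the sample size, and the helper `sample_acf` are assumptions picked for illustration, and only the ordinary (not partial) autocorrelations are computed.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    centred = series - series.mean()
    denom = centred @ centred
    return np.array(
        [centred[k:] @ centred[:-k] / denom for k in range(1, max_lag + 1)]
    )

rng = np.random.default_rng(2)
n = 5_000
shocks = rng.normal(size=n + 1)

# MA(1): y_t = e_t + 0.8 e_{t-1}; a shock affects only two periods.
ma1 = shocks[1:] + 0.8 * shocks[:-1]

# AR(1): y_t = 0.8 y_{t-1} + e_t; a shock decays geometrically.
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

ma_acf = sample_acf(ma1, 5)
ar_acf = sample_acf(ar1, 5)
for lag, (m, a) in enumerate(zip(ma_acf, ar_acf), start=1):
    print(f"lag {lag}: MA(1) {m:+.2f}   AR(1) {a:+.2f}")
```

The MA(1) autocorrelations should be clearly non-zero at lag 1 and close to zero afterwards, while the AR(1) autocorrelations should die away slowly. That cut-off versus gradual decay is the pattern one looks for in the plotted correlogram.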

    Another usual plot we perform is a simple line plot of the time series — does it look stationary? If it is too messy, we should bother to use unit root tests and make sure we work with stationary series. This is actually something we always do — and if I pick a growth rate or a first difference, I also plot it to see if I get what I expect. Those plots can also reveal heteroskedasticity — perhaps episodes of high and low variance, which may be well represented by ARCH processes.

    We also usually pick ranges of lags with correlograms and formalize that argument with information criteria — which also are data driven.
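Formalizing a lag choice with an information criterion might look roughly like the following Python sketch. The simulated AR(2) series, the maximum order of 6, and the helper `fit_ar_bic` are hypothetical choices for illustration; BIC is used here, but AIC or another criterion could be swapped in by changing the penalty term.

```python
import numpy as np

def fit_ar_bic(series, order, max_order):
    """Fit an AR(order) by least squares and return its BIC.

    The first max_order observations are held back so that every
    candidate order is compared on the same sample.
    """
    y = series[max_order:]
    lags = [series[max_order - k : len(series) - k] for k in range(1, order + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n
    # Gaussian BIC up to a constant: fit term plus a penalty per parameter.
    return n * np.log(sigma2) + (order + 1) * np.log(n)

# Simulated AR(2) series (assumed for illustration only).
rng = np.random.default_rng(3)
n = 2_000
shocks = rng.normal(size=n)
series = np.zeros(n)
for t in range(2, n):
    series[t] = 0.5 * series[t - 1] + 0.3 * series[t - 2] + shocks[t]

max_order = 6
bic = {p: fit_ar_bic(series, p, max_order) for p in range(1, max_order + 1)}
best_order = min(bic, key=bic.get)
print("BIC by order:", {p: round(float(v), 1) for p, v in bic.items()})
print("selected order:", best_order)
```

The correlogram suggests a plausible range of lags (here 1 to 6), and the criterion then picks one order within that range by trading off fit against the number of parameters.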

    Remember that just because you do not see visual representations in seminars does not mean that no one used them. The presenters might simply believe the plots are not sufficiently forceful, or that they are superfluous.