Saturday, January 24, 2015

Extreme Value Modelling in Stata

David Roodman wrote to me today, saying:
"I don’t know if you use Stata, but I’ve just released a Stata package for extreme value theory. It is strongly influenced by Coles’s book on EVT and the associated ismev package for R. Using maximum likelihood, it fits the generalized Pareto distribution and the generalized extreme value distribution, the latter including the extension to multiple order statistics. It also offers various diagnostic plots. There are already many sophisticated R packages for EVT. I suppose mine offers accessibility…and small-sample bias corrections. It can do the Cox-Snell correction for all the models, including with covariates (citing you for GPD, and promising a write-up for the rest). It also offers bias correction based on a parametric bootstrap. I’ve confirmed the efficacy of both bias corrections through simulations, for the GPD and GEV. I’m still tweaking the simulations, and they take time, but I hope to soon post some graphs based on them. The GPD results closely match yours.

Comments welcome. Please circulate to others who might be interested. To install the package in Stata, type “ssc install extreme”.  The help file contains clickable examples that reproduce most results in the Coles book. The web page is https://ideas.repec.org/c/boc/bocode/s457953.html."
The work on the Cox-Snell bias correction for the Generalized Pareto Distribution that David is referring to is Giles et al. (2015). You can find an earlier post about this work here, and you can download the paper here.
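For readers who haven't met it before, here's a sketch of the general Cox-Snell result (in the matrix form due to Cordeiro and Klein) on which bias corrections of this type are based - the GPD-specific expressions in Giles et al. (2015) essentially come from working out the pieces below for that particular likelihood. For a $p$-dimensional parameter $\theta$ with log-likelihood $\ell(\theta)$ based on a sample of size $n$, define the cumulants $k_{ij} = E[\partial^2 \ell / \partial\theta_i \partial\theta_j]$, $k_{ijl} = E[\partial^3 \ell / \partial\theta_i \partial\theta_j \partial\theta_l]$, and $k_{ij}^{(l)} = \partial k_{ij} / \partial\theta_l$. With $K = \{-k_{ij}\}$ the information matrix, $a_{ij}^{(l)} = k_{ij}^{(l)} - \tfrac{1}{2} k_{ijl}$, $A^{(l)} = \{a_{ij}^{(l)}\}$, and $A = [A^{(1)} | A^{(2)} | \cdots | A^{(p)}]$, the $O(n^{-1})$ bias of the MLE, $\hat{\theta}$, is

$\text{Bias}(\hat{\theta}) = K^{-1} A \, \text{vec}(K^{-1}) + O(n^{-2}),$

so the bias-corrected estimator is $\tilde{\theta} = \hat{\theta} - \hat{K}^{-1} \hat{A} \, \text{vec}(\hat{K}^{-1})$, with all of the quantities evaluated at $\hat{\theta}$.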

Update, 28/1/2015: See more at David's blog.

Reference

Giles, D. E., H. Feng, and R. T. Godwin, 2015. Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution. Communications in Statistics - Theory and Methods, in press.


© 2014, David E. Giles

Sunday, January 11, 2015

Econometrics vs. Ad Hoc Empiricism

In a post in 2013, titled "Let's Put the "ECON" Back Into Microeconometrics", I complained about some of the nonsense that is passed off as "applied econometrics". Specifically, I was upset about the disconnect between the economic model (if there is one) and the empirical relationships that are actually estimated, in many "applied" papers.

I urge you to look back at the post before reading further.

Here's a passage from that post:
"In particular, how often have you been presented with an empirical application that's based on just a reduced-form model that essentially ignores the nuances of the theoretical model?
I'm not picking on applied microeconomic papers - really, I'm not! The same thing happens with some applied macroeconomics papers too. It's just that in the micro. case, there's often a much more detailed and rich theoretical model that just lends itself to some nice structural modelling. And then all we see is a regression of the logarithm of some variable on a couple of interesting covariates, and a bunch of controls - the details of which are frequently not even reported."
Well, things certainly haven't improved since I wrote that. In fact, it seems that I'm encountering more and more of this nonsense. This isn't "econometrics", and the purveyors of this rubbish aren't "econometricians". 

My real concern is that students who are exposed to these papers and seminars may not recognize it for what it is - just ad hoc empiricism. 


© 2014, David E. Giles

Friday, January 9, 2015

ARDL Modelling in EViews 9

My previous posts relating to ARDL models (here and here) have drawn a lot of hits. So, it's great to see that EViews 9 (now in Beta release - see the details here) incorporates an ARDL modelling option, together with the associated "bounds testing".

This is a great feature, and I just know that it's going to be a "winner" for EViews.

It certainly deserves a post, so here goes!

First, it's important to note that although there was previously an EViews "add-in" for ARDL models (see here and here), this was quite limited in its capabilities. What's now available is a full-blown ARDL estimation option, together with bounds testing and an analysis of the long-run relationship between the variables being modelled.
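To fix notation before turning to the example, here's the generic two-variable set-up (just a sketch, using generic notation rather than EViews' own). An ARDL(p, q) model for a dependent variable $y_t$ and a regressor $x_t$ is

$y_t = \alpha_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=0}^{q} \beta_j x_{t-j} + \varepsilon_t ,$

and the "bounds test" for a long-run levels relationship is based on an unrestricted error-correction counterpart,

$\Delta y_t = a_0 + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i} + \sum_{j=0}^{q} \delta_j \Delta x_{t-j} + \theta_1 y_{t-1} + \theta_2 x_{t-1} + v_t ,$

with the lag lengths chosen by an information criterion. We test $H_0: \theta_1 = \theta_2 = 0$ with an F-statistic, compared against the Pesaran-Shin-Smith lower (all variables I(0)) and upper (all variables I(1)) bounds. If the null is rejected, the implied long-run coefficient on $x$ is $(-\hat{\theta}_2 / \hat{\theta}_1)$.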

Here, I'll take you through another example of ARDL modelling - this one involves the relationship between the retail price of gasoline and the price of crude oil. More specifically, the crude oil price is for Canadian Par at Edmonton, and the gasoline price is that for the Canadian city of Vancouver. Although crude oil prices are recorded daily, the gasoline prices are available only weekly. So, the price data that we'll use are weekly (end-of-week), for the period 4 January 2000 to 16 July 2013, inclusive.

The oil prices are measured in Canadian dollars per cubic metre. The gasoline prices are in Canadian cents per litre, and they exclude taxes. Here's a plot of the raw data:

Thursday, January 1, 2015

New Year Reading List

Happy New Year - and happy reading!
  • Arlot, S. and A. Celisse, 2010. A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79. (HT Rob)
  • Marsilli, C., 2014. Variable selection in predictive MIDAS models. Working Paper No. 520, Bank of France.
  • Kulaksizoglu, T., 2014. Lag order and critical values of the augmented Dickey-Fuller test: A replication. Journal of Applied Econometrics, forthcoming.
  • Mooij, J. M., J. Peters, D. Janzing, J. Zscheischler, and B. Scholkopf, 2014. Distinguishing cause from effect using observational data: Methods and benchmarks. Working Paper. (HT Roger)
  • Polak, J., M. L. King, and X. Zhang, 2014. A model validation procedure. Working Paper 21/14, Department of Econometrics and Business Statistics, Monash University.
  • Reed, W. R., 2014. Unit root tests, size distortions and cointegrated data. Working Paper 28/2014, Department of Economics and Finance, University of Canterbury. (HT Bob)


© 2014, David E. Giles

Wednesday, December 31, 2014

Econometricians' Debt to Alan Turing

The other day, Carol and I went with friends to see the movie, The Imitation Game. I definitely recommend it.

I was previously aware of many of Alan Turing's contributions, especially in relation to the Turing Machine, cryptography, computing, and artificial intelligence. However, I hadn't realized the extent of Turing's use of, and contributions to, a range of important statistical tools. Some of these tools have a direct bearing on Econometrics.

For example:
  • (HT to Lief Bluck for this one.) In 1935, at the tender age of 22, Turing was appointed a Fellow at King's College, Cambridge, on the basis of his 1934 (undergraduate) thesis in which he proved the Central Limit Theorem. More specifically, he derived a proof of what we now call the Lindeberg-Lévy Central Limit Theorem.  He was not aware of Lindeberg's earlier work (1920-1922) on this problem. Lindeberg, in turn, was unaware of Lyapunov's earlier results. (Hint: there was no internet back then!). How many times has your econometrics instructor waved her/his arms and muttered ".......as a result of the central limit theorem....."?
  • In 1939, Turing developed what Wald and his collaborators would later call "sequential analysis". Yes, that's Abraham Wald who's associated with the Wald tests that you use all of the time. Turing's wartime work on this subject remained classified until the 1980's. Wald's work became well-established in the literature by the late 1940's, and was included in the statistics courses that I took as a student in the 1960's. Did I mention that Wald's wartime associates included some familiar names from economics? Namely, Trygve Haavelmo, Harold Hotelling, Jacob Marschak, Milton Friedman, W. Allen Wallis, and Kenneth Arrow.
  • The mathematician/statistician I. J. ("Jack") Good was a member of Turing's team at Bletchley Park that cracked the Enigma code. Good was hugely influential in the development of modern Bayesian methods, many of which have found their way into econometrics. He described the use of Bayesian inference in the Enigma project in his "conversation" with Banks (1996). (This work also gave us the Good-Turing estimator - e.g., see Good, 1953.)
  • Turing (1948) devised the LU ("Lower and Upper") Decomposition that is widely used for matrix inversion and for solving systems of linear equations. Just think how many times you invert matrices when you're doing your econometrics, and how important it is that the calculations are both fast and accurate! (There's a small R sketch of this immediately below.)
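To make the last point concrete, here's a minimal R sketch of a (Doolittle-style, no-pivoting) LU decomposition and its use for solving a linear system. It's purely illustrative - in practice you'd simply call solve(), which hands the work to optimized LAPACK routines - and it assumes the matrix needs no row interchanges.

    # Doolittle LU decomposition: A = L %*% U, with L unit lower-triangular and U upper-triangular.
    # No pivoting, so it assumes the leading principal minors of A are non-zero.
    lu_decomp <- function(A) {
      n <- nrow(A)
      L <- diag(n)
      U <- matrix(0, n, n)
      for (i in 1:n) {
        for (j in i:n) {          # fill row i of U
          U[i, j] <- A[i, j] - sum(L[i, seq_len(i - 1)] * U[seq_len(i - 1), j])
        }
        if (i < n) {
          for (j in (i + 1):n) {  # fill column i of L
            L[j, i] <- (A[j, i] - sum(L[j, seq_len(i - 1)] * U[seq_len(i - 1), i])) / U[i, i]
          }
        }
      }
      list(L = L, U = U)
    }

    # Solve A b = c via forward substitution (L y = c), then back substitution (U b = y).
    A <- matrix(c(4, 3, 6, 3), nrow = 2)   # a small, well-conditioned example
    c_vec <- c(10, 12)
    f <- lu_decomp(A)
    y <- forwardsolve(f$L, c_vec)
    b <- backsolve(f$U, y)
    all.equal(b, solve(A, c_vec))          # TRUE - matches R's built-in solver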

References

Banks, D. L., 1996. A conversation with I. J. Good. Statistical Science, 11, 1-19.

Good, I. J., 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237-264.

Turing, A. M., 1948. Rounding-off errors in matrix processes. Quarterly Journal of Mechanics and Applied Mathematics, 1, 287-308.


© 2014, David E. Giles

Monday, December 29, 2014

Multivariate Medians

I'll bet that in the very first "descriptive statistics" course you ever took, you learned about measures of "central tendency" for samples or populations, and these measures included the median. You no doubt learned that one useful feature of the median is that, unlike the (arithmetic, geometric, harmonic) mean, it is relatively "robust" to outliers in the data.

(You probably weren't told that J. M. Keynes provided the first modern treatment of the relationship between the median and the minimization of the sum of absolute deviations. See Keynes (1911) - this paper was based on his thesis work of 1907 and 1908. See this earlier post for more details.)

At some later stage you would have encountered the arithmetic mean again, in the context of multivariate data. Think of the mean vector, for instance.

However, unless you took a stats. course in Multivariate Analysis, you probably didn't get to meet the median in a multivariate setting. Did you ever wonder why not?

One reason may have been that while the concept of the mean generalizes very simply from the scalar case to the multivariate case, the same is not true for the humble median. Indeed, there isn't even a single, universally accepted definition of the median for a set of multivariate data!
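To see where the ambiguity comes from, note that for scalar data $x_1, ..., x_n$ both the mean and the median can be written as minimizers (this is just the standard result, restated here for convenience):

$\bar{x} = \arg\min_{m} \sum_{i=1}^{n} (x_i - m)^2 ; \qquad \text{median}(x) = \arg\min_{m} \sum_{i=1}^{n} |x_i - m| .$

The first problem carries over to vector data $x_i \in \mathbb{R}^k$ with no fuss at all - replace the squared deviation with a squared Euclidean norm and the solution is just the mean vector. The second does not: replacing $|x_i - m|$ with the Euclidean norm $\| x_i - m \|$ yields the so-called spatial (or geometric, or $L_1$) median, while taking the median coordinate-by-coordinate yields a different, equally defensible, answer - and there are several other competing definitions besides.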

Let's take a closer look at this.

Sunday, December 28, 2014

Econometrics in the Post-Cowles Era

My thanks to Olav Bjerkholt for alerting me to a special edition of the open access journal, Œconomia, devoted to the History of Econometrics. Olav recently guest-edited this issue, and here's part of what he has to say in the Editor's Foreword:

"Up to World War II there were competing ideas, approaches, and multiple techniques in econometrics but no ruling paradigm. Probability considerations played a very minor role in econometric work due to an unfounded but widely accepted view that economic time series were not amenable to such analysis. The formalization of econometrics undertaken at Cowles Commission in Chicago in the late 1940s inspired by the Probability Approach of Trygve Haavelmo, and often referred to as the CC-Haavelmo paradigm, placed the whole problem of determining economic relationships firmly within a probabilistic framework and made most traditional techniques redundant. A key assumption in this paradigm as it was conceived is that models to be estimated have been fixed with certainty by a priori formulated theories alone, it can thus be labeled as “theory-oriented”. It exerted a strong influence in the ensuing years, not least as consolidated standard econometrics propagated by textbooks. The history of econometrics, as written in the 1980s and 1990s, covered mainly the period up to and including the Cowles Commission econometric achievements.
Haavelmo made a remark at the beginning of his influential monograph that econometric research aimed at connecting economic theory and actual measurements, using appropriate tools as a bridge pier, “[b]ut the bridge itself was never completely built.” From around 1960 there arose increasingly discontents of different kinds with the CC-Haavelmo paradigm, not least because of the key assumption mentioned above. The bridge needed mending but the ideas of how to do it went in different directions and led eventually to developments of new paradigms and new directions of econometric analysis. This issue comprises four articles illuminating developments in econometrics in the post-Cowles era."
These four articles are:

  • Marc Nerlove,  “Individual Heterogeneity and State Dependence: From George Biddell Airy to James Joseph Heckman”.
  • Duo Qin, “Inextricability of Confluence and Autonomy in Econometrics”.
  • Aris Spanos, “Reflections on the LSE Tradition in Econometrics: a Student’s Perspective”.
  • Nalan Basturk, Cem Cakmakli, S. Pinar Ceyhan, and Herman van Dijk, “Historical Developments in Bayesian Econometrics after Cowles Foundation Monographs 10, 14”.

Coincidentally, the last of these papers was the topic of another post of mine last month, before I was aware of this special journal issue. I'm looking forward to reading the other three contributions. If they're even half as good as the one by Basturk et al., I'm in for a treat!


© 2014, David E. Giles

Saturday, December 27, 2014

The Demise of a "Great Ratio"

Once upon a time there was a rule of thumb that there were 20 sheep in New Zealand for every person living there. Yep, I kid you not. The old adage used to be "3 million people; 60 million sheep".

I liked to think of this as another important "Great Ratio". You know - in the spirit of the famous "Great Ratios" suggested by Klein and Kosobud (1961) in the context of economic growth, and subsequently analysed and augmented by a variety of authors. The latter include Simon (1990), Harvey et al. (2003), Attfield and Temple (2010), and others.

After all, it's said that (at least in the post-WWII era) the economies of both Australia and New Zealand "rode on the sheep's back". If that's the case, then the New Zealand Sheep Ratio (NZSR) may hold important clues for economic growth in that country.

My interest in this matter right now comes from reading an alarming press release from Statistics New Zealand, a few days ago. The latest release of the Agricultural Production Statistics for N.Z. revealed that the (provisional) figure for the number of sheep was (only!) 26.9 million at the end of June 2014 - down 4% from 2013.

I was shocked, to say the least! Worse was to come. The 2014 figure puts the number of sheep in N.Z. at the lowest level since 1943! 

I'm sure you can understand my concern. We'd better take a closer look at this, and what it all means for the NZSR:
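As a rough first pass (taking New Zealand's resident population in 2014 to be roughly 4.5 million), the NZSR now sits at about 26.9 / 4.5 ≈ 6 sheep per person. That's a far cry from the proverbial 20, and from the old "3 million people; 60 million sheep" adage. The reported 4% annual decline also implies a 2013 flock of roughly 26.9 / 0.96 ≈ 28 million.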

Wednesday, December 17, 2014

End-of-Semester Econometrics Examination

My introductory graduate econometrics class has just finished up. The students sat the final examination yesterday. They did really well!

If you'd like to try your hand, you can find the exam. here.


© 2014, David E. Giles

Sunday, December 14, 2014

The Rotterdam Model

Ken Clements (U. Western Australia) has sent me a copy of "The Rotterdam Demand Model Half a Century On", a paper that he and Grace Gao completed this month.

How appropriate it is to see this important landmark in econometrics honoured in this way. And how fitting that this paper is written by two Australian econometricians, given the enormous contributions to empirical demand analysis that have come from that group of researchers - including Ken and his many students - over the years. (But more on this another time.)

Any student who wants to see applied econometrics at its best can do no better than look at the rich empirical literature on consumer demand. That literature will take you beyond the "toy" models that you meet in your micro. courses, to really serious ones: the Linear Expenditure System, the Rotterdam Model, the Almost Ideal Demand System, and others. Where better to see the marriage of sound economic modelling, interesting data, and innovative statistical methods? In short - "econometrics".
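For anyone who hasn't seen it written down, here's the absolute-price version of the Rotterdam model in its usual textbook form (a sketch only - the notation in Ken and Grace's paper may differ). For goods $i = 1, ..., n$:

$\bar{w}_{it} \, Dq_{it} = \theta_i \, DQ_t + \sum_{j=1}^{n} \pi_{ij} \, Dp_{jt} + \varepsilon_{it} ,$

where $Dx_t = \log x_t - \log x_{t-1}$ denotes a log-change, $\bar{w}_{it}$ is the average of the budget shares of good $i$ in periods $t$ and $t-1$, and $DQ_t = \sum_i \bar{w}_{it} \, Dq_{it}$ is the Divisia volume index. The coefficients carry the economics: $\theta_i$ is the marginal share of good $i$, and the $\pi_{ij}$ are Slutsky (compensated) price coefficients, with the theory entering as testable restrictions - adding-up ($\sum_i \theta_i = 1$, $\sum_i \pi_{ij} = 0$), homogeneity ($\sum_j \pi_{ij} = 0$), and symmetry ($\pi_{ij} = \pi_{ji}$).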

Back to Ken and Grace's paper, though. Here's the abstract: