http://philsci-archive.pitt.edu/archive/00003864/01/J._M._Keynes_and_L._von_Mises_on_Probability_(pdf).pdf
This paper discusses, on pages 6-7, Rothbard's comments on statistical inference. He says that the theory of statistics rests on the assumption that all samples will be distributed on a normal curve.
How is it possible to take this as anything other than a mistaken understanding by Rothbard of the Central Limit Theorem? That theorem does not say that all samples are distributed normally, but rather that for any non-normal distribution, the distribution of the averages of various groups can be made to approach as closely to normal as desired.
Furthermore, the CLT is not an assumption, as Rothbard said, but a theorem, logically derived from the axioms of probability theory.
I don't want Rothbard's statement to be absurd. Can any mathematicians think of an interpretation of Rothbard's words that makes sense?
To see the preconditions for the CLT, let's look at it first in the other direction. Suppose I have a population with mean mu and standard deviation sigma. Repeat: mu is the mean of the population - the "real" mean, and sigma is the "real" standard deviation. I can only draw a normal curve with these features if mu and sigma are both finite, so that's a precondition for the CLT. Now, the trivial case is that the population is distributed normally. Then for any sample size, we expect the sample distribution (the distribution of the things we selected out of the population in some random sample) to approximate a normal curve with mean mu and standard deviation sigma. So we won't bother with this case.
Now, if I draw a simple random sample (another precondition for CLT) and measure some trait of the things selected, and the trait is itself a random variable (another precondition) I can then find the average of my sample. So, I can have a population of 10M people, select an SRS of 1000 people, measure their heights, and find the average height, call it h. What does this average height tell me about the population? I can make a guess that mu is about h, and there are statistical tests for determining my confidence in that and finding the range covered by "about" but I'm also interested in the distribution of heights. To get information about this, I begin by finding the distribution of average heights. That is, if I drew this sample many times, each time randomly selecting 1000 people, how would the various h values be distributed?
This is where CLT comes into play. Suppose that instead of 1000 people, I drew 10 people. Now I repeat this many times, each time finding the average, then determine how often each number comes up as the average (give ranges instead of numbers, though.) This distribution would be expected to have an average around mu, but not to be normal. Now, as I increase the sample size from 10 to 100 to 1000, the distribution is guaranteed to approach normality. Notice that at the limit, I have a sample size of 10M. Thus, each sample drawn is exactly the same, and we'd expect that for every sample, we'll get the same average. This isn't true, though - we can make mistakes measuring, or have measurements that call for judgement. Hence, at the limit we'll have a normal distribution with a very small standard deviation. When the sample size is large enough (subject to definition) we can even describe what the normal distribution looks like - the mean is approximately mu, and the standard deviation is approximately sigma/sqrt(sample size)
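The last claim (mean of the averages about mu, standard deviation about sigma/sqrt(sample size)) can be checked with a small simulation. This is only a sketch under assumptions: the population is a made-up, strongly skewed one (exponential draws, 100,000 members rather than 10M), and the names `population`, `sample_means` and `results` are illustrative.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` simple random samples of size n; return each sample's mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(0)

# Hypothetical non-normal population: right-skewed, mean close to 1.
population = [rng.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

results = {}
for n in (10, 100, 1000):
    means = sample_means(population, n, 1000, rng)
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    predicted = sigma / n ** 0.5
    print(f"n={n}: mean of averages={results[n][0]:.3f} (mu={mu:.3f}), "
          f"sd of averages={results[n][1]:.4f} (sigma/sqrt(n)={predicted:.4f})")
```

The spread of the averages should shrink roughly by a factor of sqrt(10) each time the sample size grows tenfold, even though the population itself stays skewed.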
So yes, there are situations where CLT doesn't apply. In essence:
1. Infinite population
2. Infinite mean or standard deviation
3. Zero standard deviation (hence zero variance - i.e. a constant population)
4. Trait not describable as a random variable (such as political party preference, in Rothbard's example - hence statistics makes no claim at all about normality of this trait)
5. No way to draw simple random samples, or drawing simple random samples is misleading (such as determining the average number of testicles in a population of both males and females - similar to the man with his head in the oven and his feet in the freezer who is quite comfortable on average)
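Item 2 is the easiest of these to illustrate. A minimal sketch, assuming a standard Cauchy distribution as the example of a trait with no finite mean or standard deviation, and an exponential one as the well-behaved comparison (both choices are mine, not from the paper). The spread of the sample averages is measured by the interquartile range, since the standard deviation is not meaningful for the Cauchy case.

```python
import math
import random
import statistics

def iqr(values):
    """Interquartile range: distance between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_means(draw, n, repeats=1000):
    """IQR of the averages of `repeats` samples, each of size n."""
    return iqr([statistics.fmean([draw() for _ in range(n)]) for _ in range(repeats)])

rng = random.Random(1)

def cauchy():
    # Standard Cauchy via the inverse CDF; it has no finite mean or variance.
    return math.tan(math.pi * (rng.random() - 0.5))

def exponential():
    # Finite mean and variance, so the CLT conditions are met.
    return rng.expovariate(1.0)

spreads = {}
for n in (10, 400):
    spreads[n] = (spread_of_means(cauchy, n), spread_of_means(exponential, n))
    print(f"n={n}: IQR of Cauchy averages={spreads[n][0]:.3f}, "
          f"IQR of exponential averages={spreads[n][1]:.3f}")
```

For the exponential trait the averages bunch together as the sample size grows; for the Cauchy trait they should stay about as spread out at n=400 as at n=10, so no normal curve emerges.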
But this has nothing to do with Rothbard's claim. Rothbard claimed that statistics claims that all traits whatsoever are distributed normally, and that this is an assumption of statistics. This seems wrong because:
1. There are traits about which statistics makes no normality claims
2. Regarding traits which statistics does make normality claims about, the CLT doesn't say that the distribution of the trait is normal, only that the sampling distribution of the sample average is approximately normal for a large enough sample size
3. The CLT is not an assumption of statistics, it is a theorem.
if it helps or not, here is the longer article from which Rothbard's statements were pulled
http://mises.org/econsense/ch6.asp
perhaps your disagreement with Rothbard is just that he was unaware of other statistical techniques aside from what you have outlined, and mistook one dominant technique for the whole of the field? but then again it leaves the question of how useful statistics are to economists unaddressed.
I don't agree with Taleb's politics, but I love his skeptical mathematical mind:
The Fourth Quadrant
Inverse Problems. It is the greatest epistemological difficulty I know. In real life we do not observe probability distributions (not even in Soviet Russia, not even the French government). We just observe events. So we do not know the statistical properties—until, of course, after the fact. Given a set of observations, plenty of statistical distributions can correspond to the exact same realizations—each would extrapolate differently outside the set of events on which it was derived. The inverse problem is more acute when more theories, more distributions can fit a set of data.
This inverse problem is compounded by the small sample properties of rare events as these will be naturally rare in a past sample. It is also acute in the presence of nonlinearities as the families of possible models/parametrization explode in numbers.
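A rough sketch of the extrapolation point, under assumptions of my own: 500 made-up observations, and two candidate models (a normal curve and a Laplace, or double exponential, curve) each fitted to the same data. Nothing here is from Taleb's paper, and whether a given sample could tell the two models apart is a separate question; the sketch only shows how far apart they are once you ask about an event well outside the observed range.

```python
import math
import random
import statistics

rng = random.Random(4)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical observations

# Candidate 1: normal curve, fitted by sample mean and standard deviation.
normal = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))

# Candidate 2: Laplace curve, fitted by median and mean absolute deviation.
loc = statistics.median(data)
scale = statistics.fmean([abs(x - loc) for x in data])

def laplace_tail(x):
    """P(X > x) under the fitted Laplace model, valid for x >= loc."""
    return 0.5 * math.exp(-(x - loc) / scale)

threshold = 6.0  # far beyond anything in the sample
p_normal = 1.0 - normal.cdf(threshold)
p_laplace = laplace_tail(threshold)

print(f"largest observation: {max(data):.2f}")
print(f"P(X > {threshold}) under normal fit:  {p_normal:.3e}")
print(f"P(X > {threshold}) under Laplace fit: {p_laplace:.3e}")
```

Both models are built from the same 500 numbers, yet their answers about the rare event differ by several orders of magnitude.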
JAlanKatz:Furthermore, the CLT is not an assumption, as Rothbard said, but a theorem, logically derived from the axioms of probability theory.
The assumption that Rothbard has a problem with is the following:
“In the science of statistics, the way we move from our known samples to the unknown population is to make one crucial assumption: that the samples will, in any and all cases, whether we are dealing with height or unemployment or who is going to vote for this or that candidate, be distributed around the population figure according to the so-called “normal curve.””
And according to the Wikipedia article, which cites John Rice in this definition:
the central limit theorem (CLT) states conditions under which the sum of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed
First of all, this definition of the theorem seems very different from yours, because it doesn't say anything about making distributions "approach as closely to normal as desired." Secondly, it seems to me, Rothbard doesn't have a problem with the CLT itself as far as it goes, but rather a problem with the statistician's habit of assuming it applies to the real world, given that the derived CLT itself only purports to state conditions under which the sum of certain numbers are normally distributed, and not that those conditions always exist in reality, or that the statistician can even know when and where those conditions are present in reality.
The critic of Rothbard quoted in the text himself admitted the limited nature of the CLT's application:
Alternatively, normal distribution can be strictly derived by the Central Limit Theorem, which shows that where some variable is influenced by a large number of unrelated random variables, that variable will be normally distributed. This result holds subject to certain conditions, which are very widely, but not universally, encountered. Statisticians are open to the possibility of non-normal distributions where these conditions don't apply.
My question is, how do statisticians know these conditions occur "widely"?
Daniel J. Sanchez:First of all, this definition of the theorem seems very different from yours, because it doesn't say anything about making distributions "approach as closely to normal as desired." Secondly, it seems to me, Rothbard doesn't have a problem with the CLT itself as far as it goes, but rather a problem with the statistician's habit of assuming it applies to the real world, given that the derived CLT itself only purports to state conditions under which the sum of certain numbers are normally distributed, and not that those conditions always exist in reality, or that the statistician can even know when and where those conditions are present in reality.
I have never come across the definition in the Wikipedia article. I've always learned, and taught, the way I detailed in a later post: that as the sample size increases, the sampling distribution (not the observed distribution in the sample, but rather the distribution of the average X value on the samples) approaches normality. Perhaps one is a corollary of the other; I'm not a statistician.
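For what it's worth, the "sum" wording and the "average" wording can be lined up with a little algebra. The sketch below assumes the classical setting (independent, identically distributed draws X_1, ..., X_n with finite mean mu and finite, nonzero standard deviation sigma); the symbols S_n and \bar{X}_n are introduced here and are not in either quoted definition.

```latex
% Sum and average of the n draws
S_n = X_1 + X_2 + \cdots + X_n, \qquad \bar{X}_n = \frac{S_n}{n}

% Standardizing the sum and standardizing the average give the same quantity
\frac{S_n - n\mu}{\sigma\sqrt{n}}
  = \frac{n\bar{X}_n - n\mu}{\sigma\sqrt{n}}
  = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}

% CLT: this common quantity converges in distribution to a standard normal
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \;\xrightarrow{\,d\,}\; N(0, 1)
  \quad \text{as } n \to \infty
```

So saying the sum is approximately normal (with mean n mu and standard deviation sigma sqrt(n)) and saying the average is approximately normal (with mean mu and standard deviation sigma/sqrt(n)) are two readings of the same statement.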
Daniel J. Sanchez:My question is, how do statisticians know these conditions occur "widely"?
It's an empirical question, but going by my statement of the conditions, it doesn't seem open to much debate. The conditions just aren't that hard to meet. Finite mean and sd just means that all values are finite, and the population is finite. Having variables that are amenable to simple random samples, and independent, is harder (height doesn't quite work because it can be correlated to sex, and if we allow for sex variations, it's not an SRS) but there still seem to be plenty of examples.
trying to understand
JAlanKatz:This distribution would be expected to have an average around mu, but not to be normal. Now, as I increase the sample size from 10 to 100 to 1000, the distribution is guaranteed to approach normality.
so the CLT is only applicable for data sets where 'if you could get a large enough sample' you'd see that the distribution is normal?
nirgrahamUK:so the CLT is only applicable for data sets where 'if you could get a large enough sample' you'd see that the distribution is normal?
The key is to differentiate between the distribution of the sample (which we assume in some way approximates the distribution of the population, so if the population is not normal, neither should this distribution be) and the sampling distribution, which is the distribution of the averages of many samples. If I select a sample of 100, I can draw a graph of value vs. frequency, with the frequencies summing to 100, since I have 100 items in my sample. The shape of that should approximate the shape of a graph drawn for a sample of 10M, which is the entire population. I haven't added any normality by doing that.
On the other hand, I can draw a sample of 100, and find the average value. Then I can repeat this process 100 times, and graph the value of the average against how often that average was obtained. This one will be more normal than the first graph (if the requirements of CLT are met.) If I increase not the number of samples, but the sample size (draw 200 instead of 100), then it will be more normal.
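The two graphs can be compared numerically using skewness (0 for a symmetric shape like the normal curve). This is a sketch only: the trait is a made-up right-skewed one (exponential draws), the samples are drawn with replacement from that distribution, and I repeat the sampling 2000 times rather than 100 so the shape of the averages is less noisy.

```python
import random
import statistics

def skewness(values):
    """Skewness of a list of numbers: 0 for a symmetric shape such as the normal curve."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(2)

def draw(n):
    # Hypothetical skewed trait measured on n randomly selected items.
    return [rng.expovariate(1.0) for _ in range(n)]

# Graph 1: the values inside a single sample of 100.
one_sample = draw(100)

# Graph 2: the averages of many samples, for sample sizes 100 and 200.
means_100 = [statistics.fmean(draw(100)) for _ in range(2000)]
means_200 = [statistics.fmean(draw(200)) for _ in range(2000)]

skew_sample = skewness(one_sample)
skew_means_100 = skewness(means_100)
skew_means_200 = skewness(means_200)

print(f"skewness within one sample of 100:   {skew_sample:.2f}")
print(f"skewness of averages, sample size 100: {skew_means_100:.2f}")
print(f"skewness of averages, sample size 200: {skew_means_200:.2f}")
```

The single sample keeps the lopsided shape of the population, while the averages are much closer to symmetric.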
When is this true? When we have a trait which can be fairly observed by drawing a simple random sample, when the trait is random and independent (can't be predicted by some other trait), and when the mean and sd of the population are finite.
JAlanKatz:When we have a trait which can be fairly observed by drawing a simple random sample, when the trait is random and independent (can't be predicted by some other trait), and when the mean and sd of the population are finite.
so i imagine Rothbard has a problem with all the statisticians assuming these things when they set about their economic analysis. if they do assume these things. if they don't assume these things, are they therefore using methods other than CLT?
nirgrahamUK:so i imagine Rothbard has a problem with all the statisticians assuming these things when they set about their economic analysis. if they do assume these things. if they don't assume these things, are they therefore using methods
What statisticians do is they first verify these conditions - the first 3 conditions can be reasoned out and have to do with experiment design, the last 2 can be calculated to check that they are finite.
Correct me if I'm wrong, but could you not just criticize the potential abuse of statistics by saying that the majority of economic variables which are worth drawing inference from are not of an independent nature? If for example we want to analyze various prices, it would seem that they are of a fundamentally dependent nature, that is, they are interrelated through a complex web of relations. A price in one area is intricately related to a price in another area and therefore their height is not independent in nature. I suppose if they're random to a sufficient degree it can be overlooked, but the most interesting aspect of complex systems is the relations of its elements. These relations determine the values of the elements. I guess if you could reduce the relations to random underlying causes, you could go on? I don't know. I suppose I'm probably way off. I've taken many econometrics courses but I never pay attention (it's really boring).
edward_1313: Correct me if I'm wrong, but could you not just criticize the potential abuse of statistics by saying that the majority of economic variables which are worth drawing inference from are not of an independent nature? If for example we want to analyze various prices, it would seem that they are of a fundamentally dependent nature, that is, they are interrelated through a complex web of relations. A price in one area is intricately related to a price in another area and therefore their height is not independent in nature. I suppose if they're random to a sufficient degree it can be overlooked, but the most interesting aspect of complex systems is the relations of its elements. These relations determine the values of the elements. I guess if you could reduce the relations to random underlying causes, you could go on? I don't know. I suppose I'm probably way off. I've taken many econometrics courses but I never pay attention (it's really boring).
No, I tend to think that this is true. There are many good criticisms of the use of statistics: ordinal preference, subjectivity, time, the fact that so much of mainstream economics simply misuses them anyway (there's a good audio from the Keynes conference at Mises.org about this in relation to the Phillips curve), human action, ... I think what you've pointed out here is true also. If there's enough randomness, you can factor out the dependence, but only by using a stratified random sample. Yes, you could also reduce the relations, but in economics, there is no random underlying factor (except maybe human action).
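To see what goes wrong when the independence condition fails, here is a sketch with a made-up dependent series: each value is 0.8 times the previous value plus fresh noise (an AR(1) process). That is purely an assumption for illustration, not a model of actual prices. The code compares the usual sd/sqrt(n) figure, computed as if the observations were independent, with how much the averages really vary from one series to the next.

```python
import random
import statistics

def ar1_series(n, phi, rng, burn_in=200):
    """n values where each depends on the one before: x_t = phi * x_{t-1} + noise."""
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the start-up values
            out.append(x)
    return out

rng = random.Random(3)
n, phi, repeats = 200, 0.8, 1000

means = []
naive_ses = []
for _ in range(repeats):
    series = ar1_series(n, phi, rng)
    means.append(statistics.fmean(series))
    # Standard error computed as if the n observations were independent.
    naive_ses.append(statistics.stdev(series) / n ** 0.5)

actual_se = statistics.pstdev(means)   # how much the averages really vary
naive_se = statistics.fmean(naive_ses)  # what the independence formula reports

print(f"actual sd of the averages:      {actual_se:.3f}")
print(f"typical sd/sqrt(n) calculation: {naive_se:.3f}")
```

With this much dependence the independence formula understates the real variability of the average severalfold, so any confidence statement built on it would be far too optimistic.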
JAlanKatz: in economics, there is no random underlying factor (except maybe human action).
human action is by definition not random, it's purposeful, and this is why i'm skeptical that for the purpose of studying economics:
JAlanKatz:the first 3 conditions can be reasoned out and have to do with experiment design.
nirgrahamUK:human action is by definition not random, it's purposeful, and this is why i'm skeptical that for the purpose of studying economics:
Human action is purposeful from the perspective of the actor. From the perspective of an observer, a particular action seems random, though, since we don't have access to their plans and purposes. I don't mean seems random in the sense of "oh, that seems like a random thing to do" but rather that we cannot predict ahead of time what a person will do. If I walk into an auction house, I understand what's happening there, but without getting into everyone's brains, I can't predict who will buy what, or for how much. I can make some guesses "John really likes those kinds of things, I bet he'll bid on it" but it doesn't take me very far. I agree entirely, though, that this whole thing is not amenable to the theory of statistics and probability.
The big problem that I've encountered with economic statistical analysis is not the size or distribution of the data or sampling errors, it's understanding what you're really measuring.
For instance, say I have a database full of price information. If you work from the understanding that all prices are relative, then they are in fact relative only to those prices at the particular instant when they were measured. You're only ever measuring an instant in time. So for price to be a measure of opportunity cost, as you go forward in time you need to know how all the opportunity costs have changed along with the current price of whatever you're measuring. Otherwise, you have to make an assumption that opportunity costs are not changing over time. Mises seems to touch on this point in Epistemological Problems of Economics: you have to define the end that the action is attempting to attain. Unless you're aware of what the actor perceives as the next best alternative at all points in time, you inherently have an incomplete data set.
It goes even further. Say you had a complete data set for an instant in time to where you could create a supply and demand curve. That supply and demand curve would only be valid for that particular instant in time and only for that particular actor. This is why I have a problem with aggregate supply and demand curves and price time series. They're never really measuring what price is supposed to represent -- the opportunity cost of the next best action.
Do any of you know of a short, concise piece arguing against statistics in econ? The big pieces by the likes of Hoppe and Mises (and Hollis) are good, but I want something I can slide into my paper (i.e. summarise and evaluate) without breaking the word limits or spending excessive amounts of time on it. I've printed out Taleb's paper. Is it worth incorporating in an essay on the methodology of economics or are his concerns restricted to finance? Rothbard has some good concise pieces on the amenability of human action to mathematical treatment in footnotes in MES, but I am already aware of and will be using those. North's (critical) paper on Rothbard's use of graphs is good but too extensive since I'm already using other longer materials. Maybe something by Hayek in his The Counter-Revolution of Science? I have the book, and it seems to be a collection of many articles.