Mon 3 Oct 2005
In an earlier post I observed that “Seixon does not understand sampling”. Seixon removed any doubt about this with his comments on that post and two more posts. Despite superhuman efforts to explain sampling to him by several qualified people in comments, Seixon has continued to claim that the sample was biased and therefore “that the study is so fatally flawed that there’s no reason to believe it.”
I’m going to show, without numbers, just pictures, that the sampling was not biased and what the effect of the clustering of the governorates was.
Let’s look at a simplified example. Suppose we have three samples to allocate between two governorates. Governorate A has twice as many people as Governorate B, so if they are not paired up, A gets two samples and B gets one sample. (This is called stratified sampling.) If they are paired up using the Lancet’s scheme, then B has a one in three chance of getting all three samples, otherwise A gets them. (This is called clustered sampling.) Seixon claims that this method introduces a bias and that what they should have done was allocate the three samples independently, with B having a one third chance of getting each sample. (So that, for example, B has a (1/3)x(1/3)x(1/3) chance of getting all three. This is called simple random sampling.)
We can see the difference each of these three procedures makes by running some simulations. I used a random number between 1 and 13 as the result of taking a sample in governorate A and one between 1 and 6 for governorate B and ran the simulation a thousand times. The first graph shows the results for stratified sampling. The horizontal lines show the distribution of the results. 95% of the values lie between the top and bottom lines, while the middle one shows the average.
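For readers who want to rerun the experiment, here is a minimal Python sketch of the three allocation schemes. It is not the code behind the graphs. It assumes each trial is summarised by the mean of its draws, and the function names, the seed, and the way the 95% band is read off the sorted results are illustrative choices.

```python
import random

def draw(gov, rng):
    # A sample from A is a random number 1..13, from B a random number 1..6.
    return rng.randint(1, 13) if gov == "A" else rng.randint(1, 6)

def stratified(rng):
    # A always gets two samples and B always gets one.
    return ["A", "A", "B"]

def clustered(rng):
    # Paired scheme: B takes all three with probability 1/3, otherwise A does.
    return ["B"] * 3 if rng.random() < 1 / 3 else ["A"] * 3

def simple_random(rng, n=3):
    # Each sample lands in B independently with probability 1/3.
    return ["B" if rng.random() < 1 / 3 else "A" for _ in range(n)]

def simulate(allocate, trials=1000, seed=0):
    """Return (average, lower line, upper line) over many simulated surveys."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        values = [draw(gov, rng) for gov in allocate(rng)]
        results.append(sum(values) / len(values))  # assumed summary: the mean
    ordered = sorted(results)
    lower = ordered[int(0.025 * trials)]
    upper = ordered[int(0.975 * trials) - 1]
    return sum(results) / trials, lower, upper

if __name__ == "__main__":
    schemes = {
        "stratified": stratified,
        "clustered": clustered,
        "simple random (3)": simple_random,
        "simple random (2)": lambda rng: simple_random(rng, n=2),
    }
    for name, scheme in schemes.items():
        avg, lower, upper = simulate(scheme)
        print(f"{name:18s} average={avg:.2f}  95% band=({lower:.2f}, {upper:.2f})")
```

The averages should land close together for every scheme, while the distance between the lower and upper lines differs from scheme to scheme.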

The second one shows the result of clustered sampling. Notice that the average is the same as for the first one. This shows that by definition, the sample is not biased. However, the top and bottom lines are further apart—the effect of using cluster sampling instead of stratified sampling is to increase the variation of the samples.
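The equal averages can also be checked exactly rather than by simulation. The sketch below works out the expected value of the per-survey mean under each scheme with exact fractions. The variable names are illustrative, and it assumes, as in the sketch of the simulation, that each survey is summarised by the mean of its three draws.

```python
from fractions import Fraction
from math import comb

MEAN_A = Fraction(1 + 13, 2)  # average of a random number 1..13
MEAN_B = Fraction(1 + 6, 2)   # average of a random number 1..6
P_B = Fraction(1, 3)          # B's share of the population
N = 3                         # samples per survey

# Stratified: two draws from A and one from B, every time.
stratified = (2 * MEAN_A + 1 * MEAN_B) / N

# Clustered: all three draws from B with probability 1/3, else all from A.
clustered = (1 - P_B) * MEAN_A + P_B * MEAN_B

# Simple random: k of the three draws fall in B with binomial probability.
simple_random = sum(
    comb(N, k) * P_B**k * (1 - P_B) ** (N - k) * (k * MEAN_B + (N - k) * MEAN_A) / N
    for k in range(N + 1)
)

print(stratified, clustered, simple_random)  # 35/6 35/6 35/6
```

All three expectations are 35/6, about 5.83, which is what "not biased" means here: the scheme changes the spread of the results, not their centre.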

The third one shows the result of simple random sampling. The average is the same as the previous two. There is less variation than for cluster sampling.

The last graph shows simple random sampling but with two samples instead of three. The average is the same as for the others, and the amount of variation is about the same as for cluster sampling. In other words, the result of cluster sampling is just like simple random sampling with a smaller sample size. The ratio of the sample sizes for which cluster sampling and simple random sampling give the same variation is called the design effect. In this case it is roughly 3/2 = 1.5. In our example governorate A was quite different from governorate B (samples from A were on average twice as big). If A and B were more alike then the design effect would be smaller. That is why they paired governorates that they believed were similarly violent. If the governorates that they paired were not similar, it does not bias the results as Seixon believes, but it does reduce the precision of the results, increasing the width of the confidence interval.
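The design effect for this toy example can be worked out exactly as a ratio of variances, which for simple random sampling is the same thing as a ratio of sample sizes because the variance shrinks in proportion to 1/n. The following is a sketch under the same assumptions as the simulation (random numbers 1..13 for A, 1..6 for B, each survey summarised by its mean); the names are illustrative.

```python
from fractions import Fraction

def uniform_var(n):
    # Variance of a random number drawn uniformly from 1..n.
    return Fraction(n * n - 1, 12)

VAR_A, VAR_B = uniform_var(13), uniform_var(6)
MEAN_A, MEAN_B = Fraction(7), Fraction(7, 2)
P_B = Fraction(1, 3)

# Variation within a governorate, and variation between the two governorates.
within = (1 - P_B) * VAR_A + P_B * VAR_B
between = P_B * (1 - P_B) * (MEAN_A - MEAN_B) ** 2

var_strat = (2 * VAR_A + VAR_B) / 9   # two draws from A, one from B
var_srs3 = (within + between) / 3     # three independent draws
var_srs2 = (within + between) / 2     # two independent draws
var_cluster = within / 3 + between    # all three draws share one governorate

design_effect = var_cluster / var_srs3
print(design_effect, float(design_effect))  # 95/67, about 1.42
```

The between-governorate term is divided by the sample size under simple random sampling but not under clustering, which is why the design effect shrinks as A and B become more alike. The exact value of about 1.42 agrees with the rough 1.5 read off the graphs.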

Seixon offers one more argument against clustering—if clustering is valid, why not put everything into just one cluster? The answer is that although that would not bias the result, it would increase the design effect so much that the confidence intervals would be so big that the results would be meaningless.
This article by Checchi and Roberts goes into much more detail about the mechanics of conducting surveys of mortality. (Thanks to Tom Doyle for the link.)
October 3rd, 2005 at 2:58 pm
Devil’s advocate: how was the researchers’ estimate of the success of pairing violent and non-violent provinces reflected in the Lancet study’s final confidence intervals? Realistically, how could they have factored in whether a subjective assessment of provincial violence levels was accurate or not?
Recall that one of the study’s most surprising conclusions was the upshot of the lower confidence interval, pegging a lower limit of net deaths at 8,000, suggesting there was almost zero possibility U.S. actions to that point could have saved more Iraqis than they killed. Many commenters (Dsquared, etc.) made a big deal of this at the time, and rightly so. But a minor adjustment of the variance at that fringe might have had a significant effect on the study’s reception.
Not arguing with you (I think the “pairing bias” is just another red herring) but I do think you’re begging the question a little, there.
(And, truly, the researchers could have avoided this line of criticism entirely by doing more to document their province-pairing rationale than just attributing their choices to their own “belief:” an unfortunate choice of word, that.)
October 3rd, 2005 at 10:34 pm
BruceR,
From what little I’ve learned from Wikipedia and an old Navy manual, any “adjustment” to the study that would have included zero (or less) in the CI would have made the results wildly inconclusive. The only thing that makes the study meaningful is that the bottom limit of the CI is significantly above zero (aka, our CI was huge, but the bottom limit shows an effect exists). In my ill-informed layman’s perspective, this is why the study isn’t really useful, as there are a few factors (ambiguous child mortality data, unmeasured “faith” in pairing) that, taken into account, would move the bottom limit below zero.
Someone please correct me if I’m wrong.
October 3rd, 2005 at 10:52 pm
I’d like to strike my “unmeasured “faith” in pairing” example in my previous post. In the previous Lancet study thread, Kevin Donoghue says that this has been accounted for.
October 3rd, 2005 at 11:13 pm
BruceR, they used the results they found to boot-strap a probability distribution, so the more different the pairs are, the more variation in the distribution and the larger the confidence interval you get.
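To illustrate the bootstrap point only: the sketch below is a generic percentile bootstrap, not the study's actual procedure, and both lists of cluster-level values are invented for the example. The two lists have the same mean, but one is far more spread out.

```python
import random

def bootstrap_interval(values, reps=5000, seed=0):
    """95% percentile interval for the mean, resampling values with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(reps)
    )
    return means[int(0.025 * reps)], means[int(0.975 * reps) - 1]

# Hypothetical cluster-level results, both with mean 10.
similar = [9, 10, 11, 10, 9, 11, 10, 10]
dissimilar = [1, 19, 2, 18, 0, 20, 3, 17]

for label, data in (("similar", similar), ("dissimilar", dissimilar)):
    low, high = bootstrap_interval(data)
    print(f"{label:10s} interval=({low:.2f}, {high:.2f}) width={high - low:.2f}")
```

The more the resampled units differ from one another, the wider the resulting interval, which is the mechanism described above.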
jet, you are wrong. Both about what it would mean if the CI just dipped under 0, and about your suggestion that those factors would do so.
October 4th, 2005 at 12:07 am
I guess Lambert has redefined stratified sampling and cluster sampling.
Stratified sampling is where you pick out randomly from different strata. Say, you pick a total of 600 people from one strata and 300 in a strata that has 50% the population, according to PPS. You pick these people across the whole strata randomly.
Cluster sampling is the same thing, except you would bunch 30 and 30 people together, making 30 clusters of 30 people. Then you would distribute these 30 clusters across the two strata (areas) randomly. When you get into choosing the persons you will sample, you distribute the clusters that each area gets via SRS according to city, commune, whatever according to PPS. So say a cluster lands in a city, you interview the 30 people in that cluster in that same city.
The difference between stratified and clustering is that with stratified, you would choose each person randomly across the area. With cluster sampling, you bunch up 30 people and they are all from the same vicinity just as a single person would be in stratified.
Now, Lambert here has apparently redefined cluster sampling to something that is completely incongruent with every example of cluster sampling out there.
In cluster sampling, you do not cluster clusters as the Lancet did. No Lambert, that is not called multistage clustering either. Stop lying.
So now you have succeeded once again in putting up a smokescreen strawman, and then demolishing it. Good job. Not only that, but you are being intellectually dishonest here. What you just demonstrated - that’s not cluster sampling. You also gave a very misleading definition of stratified sampling. Governorate A would not get “two samples” while B “one sample”. With stratified sampling, each governorate gets one sample, but they are of different sizes according to PPS. The elements of these samples are chosen via SRS across the entire stratum.
Then you proceeded to invent a new definition of cluster sampling. I asked you to show me a single example of a “clustering of clusters” study, literature, anything. Instead of doing so, you just further your dishonesty and whip up a few charts that will make the less inclined believe you. “Lookie, pictures!”
I know perfectly well what sampling is, and I know what stratified and cluster sampling are as well. What you have demonstrated here is nothing short of ludicrous. All you have to do is read the excerpts I have printed in my most recent post on this subject to see that Lambert is not being candid.
October 4th, 2005 at 12:36 am
Also missing from this “rebuttal”:
Explaining how it was legitimate to non-randomly pick provinces to pair up.
Explaining how it was legitimate to pair up provinces without any rationale other than a “belief”.
Explaining how clustering of clusters is even supported at all in statistics.
Explaining how a sample is still random when 2 consecutive non-random decisions affect the entire sample.
Explaining how a sample is still random when the provinces in the pairings ended up with a far higher chance of being excluded from the sample than the rest of the provinces.
This is just ridiculous Lambert, I am laughing my ass off.
October 4th, 2005 at 1:04 am
This is what I have learned from this and other threads:
The man has more energy than you, or I, or a battalion of arguers-from-Enlightenment-principles.
Best,
D
October 4th, 2005 at 1:04 am
Congratulations, Seixon. Just when I thought you couldn’t possibly be less persuasive…
October 4th, 2005 at 1:32 am
Seixon,
Every aspect of a survey design does not have to be chosen randomly. Do you think that the sample size should be chosen randomly? The size of each cluster? The sampling unit?
This was explained in my post. Pairing dissimilar provinces increases the width of the confidence interval.
Ummm, it’s textbook stuff. You really did do a course in statistics? Look at, for example, the notes from a University of Melbourne course on sampling. Read the chapter on cluster sampling. Though you should probably read the one on stratified sampling as well.
This one has been explained to you multiple times.
I drew you a nice picture in my post. Did you notice that the average was the same no matter which method was used?
OK, I answered five of your questions. Now answer two of mine. Where did you do this stats course where you claim to have gotten an A? And what was the name of your instructor?
October 4th, 2005 at 3:39 am
Bruce R: “how was the researchers’ estimate of the success of pairing violent and non-violent provinces reflected in the Lancet study’s final confidence intervals?”
Tim L. “they used the results they found to boot-strap a probability distribution, so the more different the pairs are, the more variation in the distribution and the larger the confidence interval you get.”
Bruce, the paper states they used the bootstrap method to determine that all clusters were interchangeable (using the null hypothesis that they were interchangeable, which is really uninformative). So, they determined that they did not have to account for the variance between clusters in the final analysis. But, we can see from the results that the clusters varied enough and the usual ‘shrug, good enough for such a population’ is only reasonable in a study that is replicated and gives the same result.
October 4th, 2005 at 4:39 am
Tim,
No, but every decision to do with actually choosing the sample has to be random. You know this, so why are you trying to run away from the truth?
Would you like to show me any literature or ANYTHING that shows clumping of clusters in a 2nd stage of cluster sampling??? What you said would be true if they had just started out with 11 clusters instead. Then the CI would be expanded as you say, but the sample wouldn’t be biased. The way the JHU team did it, they biased it by treating different provinces unequally and ensuring that some had more of a chance of being in the sample than others, even taking PPS into consideration.
Want to point out where in that PDF file it talks about clumping of clusters??? It even says in that file that you do SRS of clusters. So again, when are you actually going to show something other than the Lancet study that uses clumping of clusters??
Yes, Kevin tried to explain it, but he was using expected values, which is not what we were after. We were after the probabilities, which were altered by the pairings. For example, Missan would have had a 61% chance of being sampled in the initial round, while in the 2nd paired up round, it suddenly only had 34% of being sampled. In other words, in the first round, Missan had a 39% chance of not being in the sample. Yet after being paired, it suddenly had a 66% chance of not being in the sample. This was of course due to a completely non-random process.
That’s not biasing the sample??? Give me a break Tim.
What you said is “cluster sampling” is not cluster sampling. In cluster sampling, you don’t distribute clusters with a winner-takes-all approach. You do it via SRS, just like the PDF you linked to says. Do I really need to quote your own source against you?
Come on Tim, level with us.
October 4th, 2005 at 5:26 am
But, we can see from the results that the clusters varied enough and the usual ‘shrug, good enough for such a population’ is only reasonable in a study that is replicated and gives the same result.
If I can tease this out a bit:
I used to replicate studies in lab with test tubes, petri dishes, 2″ pots - controlled environments. When I did field studies of veg crops, I did not ‘replicate’ studies exactly, as I could not duplicate weather, soil moisture, insolation. A team going back to Eye-rack and performing a survey will not be replicating the ululation-inducing Lancet study.
Best,
D
October 4th, 2005 at 6:32 am
Dano,
I think the point was that Lambert’s little strawman doesn’t even begin to draw parallels with the Lancet study.
What he is essentially doing is showing a graph of E(X) by doing 1,000 trials. The problem with this is that he has chosen some arbitrary numbers, and we don’t get to see how often governorate B is really chosen for the sample. Not to mention that his “cluster sampling” isn’t really cluster sampling. I’d be tempted to make a much more illustrative example, one with real cluster sampling, and then one with the Lancet method, and see how those compare.
In fact, him saying that stratified sampling means that A gets two samples, while B gets one is just wrong. In stratified sampling, there is only one sample per strata, although the size of it will vary with the population in each strata. Lambert’s simplistic example doesn’t give us any sense of this. Also, the mortality rates in Iraq are not random, such as his numbers in this example are. Just one of many things wrong with this smokescreen…
October 4th, 2005 at 7:23 am
It seems that, with one exception, we are all agreed that the method used produces an unbiased sample. The interesting questions relate to the possibility that the sample may be a freak. To my mind the best answer to this is to look at Figure 1 on page 3 of the study. What jumps out at you is that Sulaymaniyah is the only place where mortality fell and Kerbala is the only place where it held constant. Everywhere else things got worse and in several cases they got a great deal worse. In the light of this it is very hard to believe that mortality could actually have fallen.
The CI tells us the same thing in a more erudite way. An interesting thought-experiment is to ask: how much do you have to increase the standard deviation of the sample in order to get the result Bruce R and Jet are interested in, with enough of the distribution in negative territory to leave intact the hypothesis that the invasion actually reduced mortality. Bear in mind that a “one-tailed” test is appropriate since the alternative hypothesis is that mortality rose, whether by a little or by a lot. So we have to go from having the 2.5% mark at 8,000 excess deaths to having the 5% mark at zero deaths. By my rough calculation, using a normal distribution, we would need to increase the standard deviation of the sample by about one-third.
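The rough calculation can be reproduced in a few lines. This sketch assumes a normal distribution centred on the study's point estimate of 98,000 excess deaths, with the lower 95% limit at 8,000; the variable names are illustrative.

```python
from statistics import NormalDist

estimate = 98_000   # point estimate of excess deaths
lower_95 = 8_000    # reported lower limit (the 2.5% mark)

z = NormalDist()    # standard normal

# Standard deviation implied by the 2.5% mark sitting at 8,000.
sd_now = (estimate - lower_95) / z.inv_cdf(0.975)

# Standard deviation needed for the one-tailed 5% mark to sit at zero.
sd_needed = estimate / z.inv_cdf(0.95)

increase = sd_needed / sd_now - 1
print(f"sd now: {sd_now:,.0f}  sd needed: {sd_needed:,.0f}  increase: {increase:.0%}")
```

Under these assumptions the required increase comes out at roughly 30%, in line with the "about one-third" figure.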
October 4th, 2005 at 7:36 am
Eudoxis, Dano:
Regrettably, nothing you’re saying would make one assess the Lancet study’s methodology as equal or superior to, say, the UNDP study’s methodology. The big advantage, of course, is the Lancet study could be performed faster. Concur the results are non-replicable.
I’m not suggesting the probabilities are not adequately accounted for in the confidence interval, nor does it appear any problem with province-pairing could possibly have influenced the mean so significantly as opponents have suggested. But a study that at first glance appeared to say Iraqi fatalities were 98,000, plus or minus 90,000, is hardly an example of sterling precision: dare I suggest it may even not be worth Tim and others nailing themselves to the prow for it?
The research team evidently made some decisions about personal safety, cost, and time to publication that are still fair game for second-guessing, even if the statistics themselves are beyond question (sorry, Seixon). But given a choice between this and the UNDP paper for a cite on the cost of the war, would anyone now pick this one? Which still makes me wonder why the team took the road they chose, opting for publication speed over the greater measure of accuracy (and narrower interval) one would think they might have been able to obtain through a broader sampling of Iraqi households, for instance.
October 4th, 2005 at 7:55 am
Kevin,
I don’t think anyone is suggesting that the mortality in post-invasion Iraq went down. You are thrashing yet another strawman.
BruceR, you still haven’t admitted that your own calculations cemented my findings that the pairings of the provinces were completely fraudulent using coalition mortality as an indicator, since the JHU team used NO indicator, well, aside from “belief”.
I guess you all see it fine for Lambert to warp the definition of cluster sampling, where his own source proves him wrong, and toying around with the definition of stratified sampling.
His picture examples here don’t even demonstrate what the Lancet study was all about, or anything that has to do with my main points.
The sample was biased towards central Iraq, and biased towards the more populous regions in Iraq. The probabilities for exclusion from the sample were greatly amplified by the pairing process, i.e. violating the definition of a random sample.
You all still don’t seem to give a damn that Lambert keeps claiming the pairing process is consistent with cluster sampling, although he has not given a single example or any literature citing this to be the case. In fact, the literature he just cited proves him wrong, as it says cluster sampling is done by distributing the clusters via SRS.
Lambert’s example demonstrates, I guess, a single cluster sample. Not enough with that, the random numbers his samples generate don’t even have anything to do with the Lancet study, as the samples there did not produce random numbers.
The denial is astounding here. None of these issues are relevant, it seems, because they are too bothersome to try and invent excuses for.
That Lambert’s own sources prove him wrong seems to alert nobody. It seems I have come to an echo chamber.
The UNDP study is superior because it didn’t cut corners and bias their sample for the sake of convenience.
If the JHU team wanted to cut down on travel, why didn’t they just use 15 clusters instead of 33, and triple the sample sizes of those 15 clusters?
Oh, right, because 15 clusters would be frowned upon, even though that would have been an unbiased sample. Instead, they toyed around with their sample to get it the way they wanted it, so they could still claim to have 33 clusters (important since 15 would give a considerably higher DE).
October 4th, 2005 at 7:55 am
BR:
I merely chose the opportunity to clarify replication, as some of The Posse™ innocently or purposefully don’t understand replication, and they have spread the confusion.
Certainly the Lancet paper can be improved upon when bombs aren’t raining down from the heavens. I’m not arguing that one paper is better than another, and never have. I merely argue that the Lancet study is robust - and a first - and thus can be improved upon.
Hint: loud, long ululation is not a valid rebuttal.
Best,
ÐanØ
October 4th, 2005 at 8:03 am
I don’t think anyone is suggesting that the mortality in post-invasion Iraq went down.
From the bottom of my heart, thank you. That is the only important conclusion. If you believe it, then you believe in the Lancet study.
I actually believe that Seixon got an “A” in a statistics course. Everyone who did stats at university goes through this stage of believing that any deviation from the Platonic Form of the statistical study is a horrible sin which cannot be redeemed. The attitude usually survives about ten minutes into the first practical assignment.
October 4th, 2005 at 8:08 am
Or to put it another way, Seixon, let’s cut to the chase. The only way in which the grouping of the clusters would have affected the randomness of the sample, is if it was informative. In other words, the sample was random unless the clusters were grouped by someone who knew that he was doing so in order to group low-violence clusters with high-violence ones in order to eliminate the low-violence ones. In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey.
Do you really have the balls to accuse the survey authors of this? Are you prepared to do so under your real name and accept the potential legal consequences of doing so?
October 4th, 2005 at 8:46 am
Bruce R,
The main reasons given for doing a small-scale study were: a limited budget, the risks faced by the survey teams and the hope that the occupation authorities could be pressured into doing a larger study by having it demonstrated that the thing can be done. The theory that the work was rushed doesn’t hold up. A rushed job wouldn’t stand up to hostile inspection the way this paper has.
As to choosing between the Lancet and the UNDP, we don’t really have to. The sensible thing to do is look at all the respectable work which is available. In one of the papers which Tim Lambert links to (and BTW, thanks to him and Tom Doyle for that) Checchi and Roberts list seven estimates of violence in Iraq. If Roberts is happy to cite other evidence there’s no reason why the rest of us should hesitate. But attempts to discredit serious work using arguments like Seixon’s fully deserve to be shown up for what they are.
Roberts also raises an important point about confidence intervals. In the sort of humanitarian emergency he is mostly concerned with, in places like Darfur, should relief agencies really insist on 95% confidence before they announce a finding that a serious increase in mortality has taken place? That approach was originally adopted for laboratory use, for testing whether certain drugs enhance the sex-drive of hamsters and suchlike questions. It doesn’t make much sense when applied to famines, wars and epidemics. Perhaps the sensible thing would be to publish a table or figure giving a whole range of confidence levels. If nothing else, it would discourage the Kaplans of this world from waffling about dartboards.
I actually believe that Seixon got an “A” in a statistics course.
dsquared, I think you might usefully look at the previous thread, where Tim Lambert and John Quiggin, amongst others, formed a different impression. Still, I know you don’t shy away from expounding surprising theories. I look forward to a post on CT or your own blog about the great potential of Seixonian Probability Theory.
October 4th, 2005 at 9:25 am
I think you might usefully look at the previous thread, where Tim Lambert and John Quiggin, amongst others, formed a different impression.
I have the greatest respect for Tim and John both, but working in universities, I think they both have overoptimistic estimates of the correlation between grades and understanding of the underlying subject. For what it’s worth I got an alpha minus from Oxford University in International Economics but I still have to have the Ricardian theory of comparative advantage explained to me every couple of months.
October 4th, 2005 at 9:41 am
Tim Lambert:
This article by Checchi and Roberts goes into much more details of the mechanics of conducting surveys of mortality. (Thanks to Tom Doyle for the link.)
You’re quite welcome, and thank you for the gracious acknowledgment.
Unfortunately, Relief Web changed the URL for the Checchi/Roberts article, so your link (and the one I provided in an earlier thread) doesn’t work. The current correct URL is:
www.reliefweb.int/rw/lib….
All the best,
[Fixed. Thanks again. Tim]
October 4th, 2005 at 11:16 am
Bruce R.
“Regrettably, nothing you’re saying would make one assess the Lancet study’s methodology as equal or superior to, say, the UNDP study’s methodology.”
My comment was merely a specific answer to your specific question about estimating error associated with the difference between violence in the regions. Shorter answer: they didn’t estimate that error.
Dano, I think everybody is very aware that this study can’t be replicated on the ground. That’s why the less robust method of bootstrapping can be used to estimate errors for a cluster data set with heterogeneity between clusters. You might want to read up on bootstrap replications.
October 4th, 2005 at 11:32 am
Dano, I think everybody is very aware that this study can’t be replicated on the ground. That’s why the less robust method of bootstrapping can be used to estimate errors for a cluster data set with heterogeneity between clusters. You might want to read up on bootstrap replications.
I was unclear. My comment was for Stevie Mac’s Posse™.
Apologies.
D
October 4th, 2005 at 11:38 am
I apologize in advance for hijacking this thread, which truly is about something else, but I do think the more interesting question here is the one Kevin D. has raised more eloquently than I was able to: specifically, given the nature of the topic, the degree to which the first serious study (and I do agree it is that) on any highly contentious and politically charged sociopolitical issue such as this, should sacrifice exactitude for timeliness/impact. Kevin is of the perfectly defensible humanist view that given the choice, scientists might want to bend toward what could be seen as the greater good (ie, getting the word out in time to save lives). I find I have qualms with that. But to be fair, my training is as a historian and perhaps I reflexively am taking the longer view.
The statistics lessons have been both useful and a fun college refresher. But I’m still gravitating back to what I see as the larger issue.
Take it as a hypothetical, instead. If the Lancet had a choice between this article in October, before the American election, and another survey with a significantly tighter distribution that wouldn’t be ready until January, should this paper have been the one they published, assuming they could only publish the one? What about a study with tremendous precision that wouldn’t be publishable until 2010? I just wonder how far one should go in keeping one’s statistical powder dry in such situations.
October 4th, 2005 at 4:04 pm
Good questions, BR, and this gets back to the philosophy of reductionist science.
Many who wish to maintain objectivity use the Platonic model and Cartesian methods - the subject-object relationship. What we’ve found, however, is that this relationship allows the object to be devalued and thus exploited.
My experience is with plants, and Russian botanists don’t do random sampling, they do relevés and do other things to eliminate bias. This method also allows them to become intimately involved with a place, which narrows the gap between subject and object.
Narrowing the gap between subject and object makes it harder to exploit the object.
Now, Bruce, in your example I’d say the scientists who published when they did (your presumed early) had a narrower gap between subject and object. Is the object they studied less subject to devaluing?
Well, we have a long way to go, but we can see where the start is from here.
Best,
D
October 4th, 2005 at 5:04 pm
dsquared, working in a university I think I have a pretty good idea that students can get a good mark in a course with only a superficial understanding of the subject matter. Their knowledge is often extremely fragile — change things around a little bit and they are lost, though most realize this, unlike Seixon.
Seixon, I would like to contact the person who taught the statistics course you did. Please tell me their name and institution.
October 4th, 2005 at 5:32 pm
dsquared,
“From the bottom of my heart, thank you. That is the only important conclusion. If you believe it, then you believe in the Lancet study.”
OK, let me see if I follow you. Since I am not mentally retarded and understand that mortality went up in Iraq post-invasion, you know, due to the war and all… Then I agree with anything the Lancet study says? Then I believe that the Lancet study is bullet-proof? What kind of logical disconnect is that?
I know for a fact that mortality went up post-invasion, that is only logical. There was an invasion, the US military shot down thousands of Iraqi soldiers, and thousands of civilians always get killed in any invasion, especially considering that Saddam stored weapons in hospitals and within civilian infrastructure like the coward he is.
What does that have to do with the Lancet study? This study takes almost every shortcut possible, and completely undermines their own study by using methodology that it seems they have invented entirely on their own in order to either get the results they wanted, or ensure that their study looked more robust than it was.
No one has commented on how Lambert reinvented what cluster sampling is, and was very misleading about what stratified sampling is. Why is that?
No one seems to want to elaborate on why Salah ah Dinh, for example, was paired up while other provinces were not.
These were not random decisions, but arbitrary decisions made by the team. As Lambert’s own link shows, when you do cluster sampling, you distribute the clusters via SRS.
That is what the Lancet study did in their initial phase. If they had left it at that, there would be virtually nothing to complain about with this study. Alas, they didn’t.
I have challenged Lambert again and again to show me any shred of evidence that cluster clumping is an accepted methodology, only to have him create a bunch of irrelevant strawman pictures, redefine cluster sampling, and call it a day.
dsquared: “In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey.”
Not necessarily. They could have genuinely believed that the provinces were similar, even though there was no rationale for explaining that, and they still were, by any indicator, wrong.
That is why I say that they introduced an unknown bias to the study, or at least, that they biased the study towards central Iraq and towards the more populous provinces of Iraq. The result of this bias is impossible to know, since we don’t know what the real mortality is like in the unsampled provinces.
To use an example, they paired up Texas and Arizona by assuming they were similar without showing any reason to believe so, and then oversampled Texas by excluding Arizona. Now, doesn’t this bias the sample towards Texas? Yes, it does. The only way we could establish that this wasn’t a bias would be to show, from a previous study, that the two states were similar along the lines of what we were sampling for. The Lancet team didn’t do that.
So, it doesn’t matter if they had a nefarious intent or not. The result is still the same.
dsquared: “Do you really have the balls to accuse the survey authors of this? Are you prepared to do so under your real name and accept the potential legal consequences of doing so?”
Well, given the other evidence, that they cut out about 25% of Iraq in the name of “safety” and then ensured that Fallujah was in their sample… that the Lancet website lied about the conclusion of the study… that this led to it being used in headlines across the media right before the presidential election… that they made up rationales to pair up provinces… that they used unsupported methodology… that it’s no secret that Les Roberts is against the war…
I don’t know, seems to me that the evidence points in the direction of some intent to guide the results of this study in a certain direction.
What legal consequences would there be for me to come out and say that they manipulated the study for political reasons? Exactly, none. Not trying to stifle dissent now are you?
I cannot prove that this was the case unless I was given access to all of their work and files, but the evidence is definitely there that something doesn’t smell right.
Dano,
“Certainly the Lancet paper can be improved upon when bombs aren’t raining down from the heavens. I’m not arguing that one paper is better than another, and never have. I merely argue that the Lancet study is robust - and a first - and thus can be improved upon.”
Robust? Oh man. A study that has a CI of 8,000-194,000 is “robust”? A study that uses an unsupported method of clumping clusters is “robust”? A study that pairs provinces according to nothing other than a “belief” is “robust”?
Wow. I’m guessing if you were a Bush-supporter, you’d say that the Iraq war was the most brilliant thing ever undertaken, if your standards are that low.
Oh, apropos bombs raining down: the UNDP carried out their study in April-May 2004. This was when the battle in Fallujah was taking place, and was the most violent time of the entire post-invasion period.
Magically, they managed to interview 21,668 households, in all 18 provinces of Iraq. This also shows that the JHU team could have carried out a better and more thorough study if they had really wanted to do so.
Which brings me to BruceR’s point about the hypothetical of them waiting longer to do a better study. What about it folks?
Why didn’t the JHU team go with their original cluster sampling, you know, the initial one they had before they decided to slice and dice it as they saw fit?
Hell, they could have cut it down to 20-25 clusters and just increased the number of households in each and this would have cut down the amount of travel since not all provinces would have been sampled via SRS distribution of the clusters.
This would have made a more precise study, but would perhaps have taken a few weeks more to conduct, possibly a month. What was the profound need to release the study before the presidential election? If that was such a need, why did they not carry out the study earlier and spend more time on it?
The UNDP was there in April-May 2004 (and August 2004)… why couldn’t the JHU team do that?
See, I’m asking a lot of questions, because I know that most of the answers will be very uncomfortable for most of you to answer.
Lambert has invested his entire credibility into this study, so I don’t foresee him ever conceding anything about it. Everyone should take notice that he has gone into redefining cluster sampling in order to seem like he is correct. I am still waiting for any shred of evidence that supports cluster clumping as statistical methodology…. Tick, tock.
October 4th, 2005 at 7:23 pm
Seixon: name of institution and your instructor please. People are going to start wondering if you ever did a stats course if you fail to answer.
October 4th, 2005 at 11:13 pm
Lambert,
Look, I know you didn’t like getting caught with your pants down redefining cluster sampling at all, but this arguing from authority thing is really starting to piss me off.
I took stats in high school, and I took stats in college in 2004 and got an A. What college, and what teacher I had, is none of your damn business. Especially when you can’t even answer my questions and continue to create strawmen instead of actually debating my points.
Your own source proved you wrong, you gave misleading or false definitions of cluster and stratified sampling, you won’t even comment on the obvious bias of arbitrarily selecting some provinces for this and that, and you won’t comment on how the probabilities for being included in the sample were fundamentally altered by the unsupported clumping process. You still have not shown a single example of that clumping process in either the literature or a study.
That’s a whole lot of loose ends Lambert, and knowing the school and teacher I had for stats won’t really help you at all in tying them up.
October 5th, 2005 at 1:09 am
Here’s a report from July 12 2005 by an Iraqi group claiming that 128,000 Iraqis have been killed in the war thus far–
washingtontimes.com/upi/2…
On the Lancet survey, I suspect the authors would have done a larger one if they’d had the resources and time. They probably would have liked to have done more specifically on Fallujah (just to see if their one neighborhood was a fluke). I don’t see anything wrong with trying to publish before the elections, though it was naive to think many Americans would change their vote as a result (if that was the idea).
And Seixon, the point of Lambert’s simulation is to illustrate the effects of clumping clusters–it doesn’t change the expected value, but increases the spread, which is what everyone has been saying. The more unlike the provinces are, the more the spread. I’ve found this discussion to be educational, but the point has been made pretty clearly now, over and over again. As for whether the Lancet authors engaged in deliberate fraud, it’s all speculation. The number they got wasn’t that out of line with the UN survey.
October 5th, 2005 at 1:23 am
Seixon, I’m sorry to say this but I think you’re out of your depth. Look, I don’t like Lambert’s style either and I think he has few qualms about being misleading if it helps him nail an ideological enemy to a cross… but on this issue you’re not fighting Lambert, you’re fighting the English language. And losing. I think this sums up the argument about bias:
dsquared: “In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey.”
Seixon: “Not necessarily. They could have genuinely believed that the provinces were similar, even though there was no rationale for explaining that, and they still were, by any indicator, wrong.”
The bias you mention above is just about getting things wrong. But the point is if they were being honest, a priori, the survey team could equally have been wrong in either direction. So no bias.
That doesn’t mean the conclusion of the survey is correct. Kevin Donoghue raises the question of whether the result was a freak result… which is perfectly possible in an unbiased survey and a reasonable question to raise. But that is unrelated to issues of bias.
The bias you hint at elsewhere is intentional fraudulent sampling (as dsquared said). Even if this is true (and I have no comment on that), this is not sampling bias. You should admit this semantic point to show you are sincere.
btw, given our disgraceful defamation laws you could get sued for calling the survey team liars. Indeed — Tim could sue you and you could sue Tim for comments on this board! Defamation laws should be changed, but that’s a different debate.
October 5th, 2005 at 1:34 am
Lambert’s simulation is a fraud. The only reason it works out that way is because he let the sample result be random, which it wouldn’t be if you were measuring mortality. Also, his simulation is the same as conducting a cluster sample with one cluster, just like the Lancet method is. That is not cluster sampling.
Let’s say you want to interview 90 households in Basrah and Missan.
With stratified sampling, you would sample 60 in Basrah and 30 in Missan, randomly chosen within each.
With SRS, you would sample 90 across the entire area of both combined.
With cluster sampling, assuming 3 clusters with 30 households each, you would randomly distribute the 3 clusters with SRS using PPS. This would most likely end up with Basrah getting 2, and Missan getting one. Within each province, the clusters would be placed randomly, and then 30 households from each of those locations would be sampled. Obviously the precision rises with the number of clusters: if you have 90 clusters, it is the same thing as stratified sampling.
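The "most likely 2 and 1" claim can be checked with a short sketch. It assumes the 3 clusters are drawn independently with probability proportional to population, and that Basrah holds 66% of the pair's combined population (the thread gives Missan's share as 34%). The function name is made up for illustration.

```python
from math import comb

# Assumption: Basrah's share of the Basrah+Missan population is 66%,
# the complement of the 34% quoted for Missan in the thread.
P_BASRAH = 0.66
N_CLUSTERS = 3

def prob_basrah_gets(k, n=N_CLUSTERS, p=P_BASRAH):
    """Binomial probability that Basrah receives exactly k of the n clusters."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Print the chance of every possible split of the 3 clusters.
for k in range(N_CLUSTERS + 1):
    print(f"Basrah {k}, Missan {N_CLUSTERS - k}: {prob_basrah_gets(k):.3f}")
```

Under these assumptions the 2-and-1 split is the single most likely outcome, but it happens less than half the time; all three clusters land in Basrah more than a quarter of the time even without any pairing.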
Now, Lancet didn’t do any of this. As Lambert so dishonestly portrayed as cluster sampling, they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it. That leaves 90 households sampled in one of the provinces, and none in the other. That would be fine, except that the cluster would be larger than all the other clusters used otherwise in the study, which biases it towards Basrah as the one who most likely would win it.
Lambert, you should redo your example a little more honestly. Meaning, the simulation must be more realistic and the mortality rates of each province will not be random as you made them.
If you do this, you will get a different result.
Of course, this also doesn’t take into account that the provinces who went through this were not randomly chosen to forego this process, as Lambert’s simulation takes as a given.
In other words, if you used what Lambert describes as “cluster sampling” above, then Baghdad would end up with all 33 clusters. That isn’t cluster sampling.
Doing a cluster sample with one cluster, with all the households in that single cluster, would be a more honest way of accomplishing the same thing. Of course, this would, I’m guessing, provide a horrendous DE.
October 5th, 2005 at 1:44 am
John,
“The bias you mention above is just about getting things wrong. But the point is if they were being honest, a priori, the survey team could equally have been wrong in either direction. So no bias.”
A bias is when some part of the population is more or less likely to be chosen than the rest. The way they conducted this study, the population within the provinces not selected for the grouping process was given a higher chance of being chosen than the others. That is exactly what bias is.
The sample was biased towards central Iraq and the more populous regions of the pairings.
Whether this produces a higher or lower mortality rate doesn’t matter, because we cannot know this due to not knowing the mortality rates of the 6 provinces that were excluded.
October 5th, 2005 at 1:59 am
Let me throw this one out to Bruce: I suspect that Roberts knew
That there was no excess mortality survey being done
That he did not have much funding (he knew that for sure)
Any survey large enough to come to the attention of the US or UK would be closed down.
No further surveys would be allowed once the initial results were published.
So he had one chance to do a limited survey.
October 5th, 2005 at 2:09 am
A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.
October 5th, 2005 at 2:17 am
my mistake, this is just nonsense and I no longer have any idea whether Seixon is being a blackboard nitpicker or just blowing smoke.
I know for a fact that mortality went up post-invasion, that is only logical. There was an invasion, the US military shot down thousands of Iraqi soldiers, and thousands of civilians always get killed in any invasion
Indeed, which is why one would obviously expect the excess deaths number to be positive if the USA invaded, say, Sussex. However, the country that they actually did invade was Saddam-era Iraq, in which one would have thought that the pre-invasion death rate was higher than it needed to be because Saddam was murdering people. For the excess deaths to rise, it would have to be the case that, over a period of eighteen months, we killed more of them (or more exactly, more of them died as a consequence of our actions) than were dying before. In, as I say, Saddam’s Iraq.
So, there was decent reason to believe that the estimated excess deaths figure would have been negative, or that if it was positive it would be a low enough number that zero would be well within the 95% confidence interval. So if you believe that zero is not in the 95% confidence interval, then you believe this either on the evidence of the Lancet study (which is the only study to have given an estimate of total excess deaths) or on no evidence at all. Since you claimed to have done a statistics course, I assumed that you had concluded that the death rate went up based on the evidence rather than based on strange and incorrect a priori ideas of your own.
You are also guilty of two fairly fundamental misrepresentations of the study’s results. Since soldiers in the Iraqi Army would not have been part of households during the immediate pre-war period, their casualties are unlikely to have been a material contributor to the excess deaths estimate. And you are also wrong in your implied claim that the excess violent deaths are concentrated in time around the months of March and April 2003; there is a chart which demonstrates that they are not.
Meanwhile, I am glad to see that you have retreated from your claim that the grouping process was “like grouping Texas with California” to “like grouping Texas with Arizona”. You are still a fair ways off in your analogy though, since Texas is not geographically contiguous with Arizona. Perhaps the next stage of your analogy ought to be “like grouping New Mexico with Arizona” or “like grouping Wyoming with Montana”, which would rather make it clear how weak an argument you have here.
by the way, this assertion:
they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it
is wrong, isn’t it?
and this assertion:
Hell, they could have cut it down to 20-25 clusters and just increased the number of households in each and this would have cut down the amount of travel since not all provinces would have been sampled via SRS distribution of the clusters. This would have made a more precise study
is also wrong for most sensible assumptions about within-cluster and between-cluster variance, isn’t it?
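One way to check this is the standard design-effect formula for equal-sized clusters, deff = 1 + (m - 1) * rho, where m is the cluster size and rho the intracluster correlation. The sketch below is only illustrative: the value of rho is assumed, and 22 clusters of 45 households is just one hypothetical reading of "20-25 clusters" that keeps the total at 990 households.

```python
def design_effect(cluster_size, rho):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho

def effective_sample_size(n_clusters, cluster_size, rho):
    """Number of independent households the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, rho)

RHO = 0.05  # assumed intracluster correlation, for illustration only

# Same total of 990 households, split two ways.
for n_clusters, cluster_size in [(33, 30), (22, 45)]:
    n_eff = effective_sample_size(n_clusters, cluster_size, RHO)
    print(f"{n_clusters} clusters x {cluster_size} households: effective n = {n_eff:.0f}")
```

For any positive rho, the layout with fewer and larger clusters is worth fewer independent households, so for the same total it is less precise, not more.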
Be clear here; you are accusing a team of eminent scientists of either incompetence or dishonesty here. If proved, this would be enough to wreck a career, which is why accusations of this sort ought not to be made anonymously.
October 5th, 2005 at 2:29 am
IANAL, but from what I know of libel law an allegation against a respected scientist made in, say, an unsigned pamphlet printed by a bunch of school kids is unlikely to result in damages being awarded. Apart from the fact that the kids probably have no money, it would be difficult to argue that the scientist’s reputation has really been harmed.
Les Roberts is about as likely to sue The Onion as to sue Seixon.
October 5th, 2005 at 3:20 am
Eli,
Thus again showing that there was no imminent need to do this survey… other than trying to influence the US presidential election… and the precision? Didn’t matter, apparently. Why not? Guess.
Kevin,
“A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.”
If Missan had not been paired up, it would have had a total chance of 61% to be in the sample (using a binomial distribution with p=0.028 and n=33). The result of the pairing changed this to 34%.
Yup, 61%, 34%, same thing.
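The 61% figure is the chance of at least one success in 33 independent draws with p = 0.028. A minimal sketch of that arithmetic follows; the function name is made up, and the independence of the draws is an assumption. Note that this is the chance that the governorate receives any cluster at all, which is a different quantity from the chance that a given household is sampled.

```python
def prob_at_least_one(p, n):
    """Chance that a governorate with PPS share p gets one or more of n clusters."""
    return 1 - (1 - p) ** n

# Missan's share (2.8%) and the 33 clusters are the figures quoted in the thread.
missan_unpaired = prob_at_least_one(0.028, 33)
print(f"{missan_unpaired:.2f}")
```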
dsquared,
Even taking Saddam Hussein’s knack for killing people into consideration, sending his entire army out to get their brains blown out, using civilian infrastructure, etc. would undoubtedly have given a higher mortality rate during a war than even at “peace” time with Saddam Hussein. Let’s not forget the insurgency now, eh?
“So if you believe that zero is not in the 95% confidence interval, then you believe this either on the evidence of the Lancet study (which is the only study to have given an estimate of total excess deaths) or on no evidence at all.”
I don’t believe 0 is in the confidence interval because it is ludicrous to believe that it would be. A war between 150,000 American forces and tens of thousands of Iraqi forces, and an insurgency on top of that, how in the world would that be the same as Saddam’s “quiet” 1.5 year period from before the war?
No evidence? Well, just relying on a little thing called logic, is all. Oh, and plus the UNDP study has been conducted, which you have seemingly ignored here.
“You are also guilty of two fairly fundamental misrepresentations of the study’s results. Since soldiers in the Iraqi Army would not have been part of households during the immediate pre-war period, their casualties are unlikely to have been a material contributor to the excess deaths estimate. And you are also wrong in your implied claim that the excess violent deaths are concentrated in time around the months of March and April 2003; there is a chart which demonstrates that they are not.”
Huh? The soldiers would not have been part of the households in the preceding 18 months? Uh, yes they would have. I don’t think even the Lancet study says anything like that.
I never claimed that the deaths were concentrated around the months of March and April 2003. Not sure where you pulled that out from…
“Meanwhile, I am glad to see that you have retreated from your claim that the grouping process was “like grouping Texas with California” to “like grouping Texas with Arizona”. You are still a fair ways off in your analogy though, since Texas is not geographically contiguous with Arizona. Perhaps the next stage of your analogy ought to be “like grouping New Mexico with Arizona” or “like grouping Wyoming with Montana”, which would rather make it clear how weak an argument you have here.”
The reason I was using Texas and California is because they comprise almost 26% of the US population, the same as was excluded from the Lancet sample.
Yes, it might be better to use, in the case of Missan and Basrah, Montana vs. Wyoming. However, that is misleading because we know a lot more about those two states than we do about Basrah and Missan. Not only that, but the violence between these two could have been very different, which the Montana vs. Wyoming comparison doesn’t stand up to. Also, there’s a difference between violence during regular times, and violence during war. As we saw with Fallujah, one small area can get massacred, while other comparable areas didn’t.
“by the way, this assertion:
they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it
is wrong, isn’t it?”
No. Why? Isn’t distributing 3 clusters with 30 households with a single trial the same as distributing 1 cluster with 90 households? It is.
“Bla bla…
is also wrong for most sensible assumptions about within-cluster and between-cluster variance, isn’t it?”
It would have been more precise than what the Lancet study came up with. Lancet oversampled 6 provinces with their pairing process. If you just had fewer clusters to distribute with larger sample sizes, you would not be oversampling any of the provinces, and your result would have been more precise than what the Lancet study did.
“Be clear here; you are accusing a team of eminent scientists of either incompetence or dishonesty here. If proved, this would be enough to wreck a career, which is why accusations of this sort ought not to be made anonymously.”
They did what they did to reduce the places they needed to travel to. Given the circumstances, it can be defended to have compromised the precision of the study in order to do this, even though again, this gets murky when they quite purposely went to Fallujah.
Dishonesty? Well, the Lancet journal lied on their website about the results of the study. Can you answer me why they did that?
October 5th, 2005 at 3:54 am
Seixon, I asked about the name of your instructor because I think he or she would have been interested in your use of statistics. But why don’t you contact him or her yourself and find out what he or she thinks of your use of statistics. OR perhaps you realize that there is something wrong with the stuff you have written.
One cluster of size 90 is not the same as three clusters of size 30. If you believe that, then presumably 90 clusters of size 1 is also same, right?
And if Missan wasn’t paired up its chance of getting a cluster would have been 100%.
October 5th, 2005 at 3:57 am
Let’s try again. A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.
October 5th, 2005 at 4:09 am
And if Missan wasn’t paired up its chance of getting a cluster would have been 100%.
Tim, not for the first time, you have underestimated Seixon’s propensity to muddle. He is trying to calculate Missan’s probability at the outset of getting one or more clusters with and without pairing. Needless to say, he gets part of the answer wrong.
October 5th, 2005 at 4:42 am
Lambert,
I seriously doubt my teacher would give a damn about what I was doing with statistics, and quite frankly, wasting his time because you won’t be candid seems a bit frivolous. Thus, I will not waste his time with this, as I am quite sure he has enough to do as it is.
“One cluster of size 90 is not the same as three clusters of size 30. If you believe that, then presumably 90 clusters of size 1 is also same, right?”
I guess I wasn’t clear…
When you clump 3 clusters of 30 households and distribute them together, you get almost the same result as you would by having 1 cluster with 90 households. The similarity, as I explained, is that one province gets all of the households. Of course, within the province that wins the 3 clusters, those 3 clusters will be distributed differently than the 1 cluster. Yet with such a small geographical area, this will not have such a profound difference.
The similarity was with how the clusters and the households in them were distributed. With both 1×90 and 3×30-clumped, all of the households would be in one of the two provinces. This would not be the case with genuine cluster sampling.
“And if Missan wasn’t paired up its chance of getting a cluster would have been 100%.”
During the initial cluster sampling, Missan had, as I calculated, a 61% chance of being sampled. This after distributing 33 clusters via cluster sampling, using PPS SRS. Do you want to explain how its chance would have been 100%?
Kevin,
“Let’s try again. A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.”
Ah, so the households’ chances have no relation to the province’s chances? How do you figure that?
If Missan doesn’t get sampled, neither does its households. I don’t see how you are sidestepping that fact.
“Tim, not for the first time, you have underestimated Seixon’s propensity to muddle. He is trying to calculate Missan’s probability at the outset of getting one or more clusters with and without pairing. Needless to say, he gets part of the answer wrong.”
Yes, at the outset. What else are we supposed to be calculating? I am not muddling at all, you guys are muddling what I am saying and sidestepping things I am saying. How did I get part of the answer wrong? Geez. This is like talking to a wall that says, “you are always wrong,” on it.
October 5th, 2005 at 5:07 am
Seixon: How did I get part of the answer wrong?
I left that as an exercise. You will never get the hang of it unless you do your homework. (Hint: Missan’s population is 34% of the combined population of Basrah and Missan.)
Ah, so the households’ chances have no relation to the province’s chances?
There is a relation but it is more subtle than you think. For a clue, see my comment (number 158) in the earlier thread:
timlambert.org/2005/09/la…
October 5th, 2005 at 5:19 am
dsquared,
Strike my comment about the Lancet study not saying something about military perhaps not being accounted for. I see it says that the deceased had to have been living in the household at the time of death, and must have been living there for 2 months up until that point. Now I don’t know anything about how the military worked in Iraq, so I can’t know whether or not military deaths were therefore accounted for.
October 5th, 2005 at 5:27 am
Kevin,
“There is a relation but it is more subtle than you think. For a clue, see my comment (number 158) in the earlier thread:”
Yes, here you are pretending that the probability of Missan not winning the clusters at all is irrelevant. As I said, if you operate with a GIVEN that Missan has won the clusters, then and only then do the households have an equal chance of being selected in either case.
Then the prickly fact remains that with the Lancet method, Missan had a 66% chance of ending up with zero, and thus no households sampled, and thus the households would have 0% chance of being sampled. Without the pairing, this figure would have been 39% (or according to Lambert, 0%).
There would be no reason to conduct cluster sampling between Basrah and Missan, as the result would be virtually the same as in the original cluster sampling.
You’re just sticking to your already backfired guns Kevin. Give it up.
October 5th, 2005 at 5:50 am
Seixon, do you understand that when people say that pairing done as the Lancet study did doesn’t change the expectation value but does increase the variance, that in plain English it means that there’s a chance the bloodier of the two provinces will get all the clusters and sends the calculated death rate upwards, or that the more peaceful province gets all the clusters and sends the calculated death rate downwards? That’s what Tim’s simulation showed and furthermore, I did the same thing in a simple example where I assumed that the survey actually counts all the deaths in the chosen province and then extrapolates that number to include both. If the provinces are different, they’ll get a result that is too high or too low, but the expectation value is equal to the true value. If a million different groups did the Lancet survey exactly as they did on the same days, the average value of those million groups would probably be right on top of the true value of the death toll, but the individual surveys are going to get results more scattered than if they hadn’t engaged in pairing. So the study isn’t biased–it’s just going to be less accurate than a survey without pairing. Is there something wrong with what I just said?
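The example described above can be sketched as follows. All figures are hypothetical: two provinces with fixed (not random) death rates, one of which is picked with probability proportional to population, counted exactly, and extrapolated to both.

```python
import random
import statistics

# Hypothetical populations and fixed death rates, chosen only for illustration.
POP = {"A": 2_000_000, "B": 1_000_000}
DEATH_RATE = {"A": 0.010, "B": 0.004}

TOTAL_POP = sum(POP.values())
TRUE_DEATHS = sum(POP[g] * DEATH_RATE[g] for g in POP)

def paired_estimate(rng):
    """Pick one province with probability proportional to population,
    then extrapolate its death rate to the combined population."""
    chosen = rng.choices(list(POP), weights=list(POP.values()))[0]
    return DEATH_RATE[chosen] * TOTAL_POP

def expected_paired_estimate():
    """Exact expectation: each possible estimate weighted by its selection chance."""
    return sum(POP[g] / TOTAL_POP * DEATH_RATE[g] * TOTAL_POP for g in POP)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [paired_estimate(rng) for _ in range(100_000)]

print("true deaths:        ", TRUE_DEATHS)
print("expected estimate:  ", expected_paired_estimate())
print("mean of simulations:", statistics.mean(draws))
print("spread (std dev):   ", statistics.pstdev(draws))
```

Every individual estimate is either too high or too low, yet the expectation matches the true total. The price of pairing in this toy setup is the spread, which grows as the two provinces' rates move apart.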
The interesting point is whether, by chance or if you prefer, devious liberal plotting, the Lancet team happened to survey bloodier-than-average places. You were trying to show that with troop casualty figures, I think, and someone else disagreed with your figures. I have no idea who is right, but that’d be a much better use of energy than what you’ve been doing with the sloppy use of the term “biased”.
October 5th, 2005 at 5:59 am
On a non-Seixon related topic, I read part of the Roberts article that Tim linked and aside from the other estimates of Iraq casualties, there was also a statement that said (paraphrasing from memory) that sensitivity analysis shows that the Iraq Body Count coverage is 20 percent. Anyone know anything about sensitivity analysis and how you’d arrive at a figure in Iraq? I assume it means the fraction of the dead that are likely to be counted by IBC. I was trying to do something like this comparing the UNDP death toll for children in the first year with the IBC death toll for children in two years, but I don’t know the error bar on the UNDP number. Taken at face value, though, the UN survey found about 3000 deaths in people under 18 in their time frame, a little over a year, and IBC found about 1300 in the first two years, so IBC is finding much less than half. If you just assumed most of the UN death toll is composed of civilians and compared it to IBC, the ratio is about 2 to 1, I think, but of course we don’t know what fraction of the UN numbers are civilians.
October 5th, 2005 at 6:15 am
Seixon:
Now I don’t know anything about how the military worked in Iraq
Do you think it is likely that it would have worked in a way which had its soldiers at home living with their families during the two months leading up to the war?
Donald:
I think that Roberts is probably referring to a study that’s mentioned in the closing paras. of the Lancet study in which a passive reporting system similar to IBC counted roughly a seventh of the deaths later established.
October 5th, 2005 at 6:24 am
Yes, here you are pretending that the probability of Missan not winning the clusters at all is irrelevant.
No, I am not. As I explained previously, when calculating the probability that any particular household in Missan is sampled, we can make use of the fact that if Missan gets no clusters then the probability that the said household will be sampled is precisely zero. So the probabilities which have to be summed are: (Probability that Missan gets 1 cluster and said household is one of the 3,262 households surveyed) + (Probability that Missan gets 2 clusters and said household is one of the 1,631 households surveyed) + (Probability that Missan gets 3 clusters and said household is one of the 1,087 households surveyed) + …etc., up to a maximum of 33 clusters. The thing to notice is that as the number of clusters increases, the probability of the household being surveyed also increases.
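The sum can be written out directly. What follows is a sketch under stated assumptions, not the study's own calculation: the household counts are hypothetical (picked so that Missan is 2.8% of the country and about 34% of the Basrah and Missan pair), each cluster surveys 30 households, clusters within a governorate do not overlap, and the pairing step hands all of the pair's clusters to one governorate with probability proportional to population.

```python
from math import comb

# Hypothetical household counts, for illustration only.
HOUSEHOLDS = {"Missan": 98_000, "Basrah": 190_000, "rest": 3_212_000}
TOTAL = sum(HOUSEHOLDS.values())
N_CLUSTERS = 33      # clusters allocated across the country
CLUSTER_SIZE = 30    # households surveyed per cluster

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def household_prob_unpaired(gov):
    """Sum over k of P(gov gets k clusters) * P(household sampled | k clusters)."""
    p = HOUSEHOLDS[gov] / TOTAL
    return sum(
        binom_pmf(k, N_CLUSTERS, p) * k * CLUSTER_SIZE / HOUSEHOLDS[gov]
        for k in range(N_CLUSTERS + 1)
    )

def household_prob_paired(gov, partner):
    """The pair gets j clusters; a single population-weighted draw gives all j to one side."""
    pair = HOUSEHOLDS[gov] + HOUSEHOLDS[partner]
    p_pair = pair / TOTAL
    p_win = HOUSEHOLDS[gov] / pair
    return sum(
        binom_pmf(j, N_CLUSTERS, p_pair) * p_win * j * CLUSTER_SIZE / HOUSEHOLDS[gov]
        for j in range(N_CLUSTERS + 1)
    )

print(household_prob_unpaired("Missan"))
print(household_prob_paired("Missan", "Basrah"))
```

Under these assumptions both sums reduce to the same value, N_CLUSTERS * CLUSTER_SIZE / TOTAL: the smaller chance of Missan being visited at all is offset by the larger number of clusters it receives when it is.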
For nifty ways of calculating these sums, see any introductory textbook.
October 5th, 2005 at 6:54 am
As I understood it, calculating the expected value depends on a binomial distribution. The Lancet pairing does not provide a binomial distribution, as it is equal to one trial, whereas SRS would be equal to 3 trials.
Donald,
All that was a waste of space. I have asked anyone here to show me that clumping clusters is accepted statistical methodology, and so far Lambert and the crew have come up with Null.
Kevin,
Does your nifty textbook also talk about clumping clusters? Or is Tim Lambert the only one who has such a textbook? I’m seriously getting tired of waiting for verification from statistical literature that I am wrong about the cluster clumping being unsupported methodology.
Also, your leaning back on the expected value thing is really getting boring.
66% vs. 39%. That’s all I really have to say, and you haven’t had anything that defeats that single fact. (I’m wondering if Mr. Lambert is going to explain how it is really 0% instead of 39%…)
October 5th, 2005 at 7:11 am
66% vs. 39%. That’s all I really have to say….
Is 39% supposed to be a correction of one of the figures you gave in comment number 39? When it popped up in comment number 46 it looked like a typo. Lest you think it corrects the mistake I referred to, it doesn’t.
Incidentally, you don’t have to refer to a textbook. If you do it on a spreadsheet you will get the same result: A household in a governorate not selected for pairing has the same chance of being chosen as a household in a paired one.
October 5th, 2005 at 7:32 am
Kevin,
For Missan, with the pairing, there’s a 66% chance that none of Missan’s households will be sampled, no matter how many clusters there are. Regardless of anything, Missan’s households have a 66% chance of not being sampled with the pairing process.
Now, without the pairing, Missan’s households have a chance of being sampled if Missan gets one or more clusters. The chance of Missan getting 1 or more clusters is 61%. That is using a binomial distribution with p=2.8% (Missan’s PPS value) and with 33 trials. Which means that Missan has a 39% chance that it will not be sampled at all.
Now, when Missan gets paired, regardless of how many clusters the pair gets, Missan’s households will have a 66% chance of not being sampled.
In comparison, Basrah had a 16% chance of exclusion from a genuine random cluster sampling, as was done initially. Then, because of the pairing, it had a 34% chance of exclusion, regardless of any number of clusters or anything else.
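The 39% figure in this comment can be checked in a few lines of Python. This is a minimal sketch that assumes, as the comment does, 33 independent draws with Missan's share p = 2.8%; the function and variable names are illustrative only, and the thread goes on to dispute whether the initial allocation was actually done this way.

```python
# Chance that a governorate receives zero clusters when each of
# n_clusters is assigned independently with probability p (binomial model).
def prob_zero_clusters(p, n_clusters):
    return (1 - p) ** n_clusters

P_MISSAN = 0.028   # Missan's population share, as quoted in the comment
N_CLUSTERS = 33    # number of clusters in the survey

p_excluded = prob_zero_clusters(P_MISSAN, N_CLUSTERS)
p_included = 1 - p_excluded
print(f"no clusters: {p_excluded:.0%}, at least one: {p_included:.0%}")
```

Note that this is the probability that the governorate as a whole is skipped, which is not the same thing as the probability that a given household is sampled.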
October 5th, 2005 at 8:25 am
For Missan, with the pairing, there’s a 66% chance that none of Missan’s households will be sampled, no matter how many clusters there are.
Here you refer to the situation after the initial allocation of clusters and before it is determined whether Basrah or Missan will get them.
Now, without the pairing…. Missan has a 39% chance that it will not be sampled at all.
Here you refer to the situation before the initial allocation of clusters. Do you see why Tim Lambert misunderstood what you were trying to do earlier? You jump from one stage of the process to the other.
However, your main problem is that you continue to focus on the probability of Missan getting sampled. That’s part of the calculation of course. What you need to move on to is the probability of a Missan household getting sampled. Your best course is to do it on a spreadsheet. If you do it right the probability will be the same for the two sampling methods.
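The spreadsheet exercise suggested here can also be sketched in Python. The sketch assumes independent binomial draws for the initial allocation (the model being discussed at this point in the thread), takes Missan's 2.8% share from the comments, and uses a hypothetical 5.4% share for Basrah purely for illustration. It compares the expected number of clusters landing in Missan with and without pairing; since a household's chance of selection scales with the expected number of clusters in its governorate, equal expectations mean equal household probabilities.

```python
from math import comb

N_CLUSTERS = 33
P_MISSAN = 0.028   # Missan's population share, from the thread
P_BASRAH = 0.054   # hypothetical share for Basrah, for illustration only

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def expected_clusters_unpaired(n, p):
    # Each cluster goes to the governorate independently with probability p.
    return sum(k * binom_pmf(k, n, p) for k in range(n + 1))

def expected_clusters_paired(n, p_self, p_partner):
    # The pair collects clusters jointly; one governorate then receives
    # all of them, with probability proportional to its population.
    p_pair = p_self + p_partner
    p_win = p_self / p_pair
    return sum(p_win * k * binom_pmf(k, n, p_pair) for k in range(n + 1))

unpaired = expected_clusters_unpaired(N_CLUSTERS, P_MISSAN)
paired = expected_clusters_paired(N_CLUSTERS, P_MISSAN, P_BASRAH)
print(f"unpaired: {unpaired:.4f}  paired: {paired:.4f}")
```

Both quantities reduce algebraically to 33 × 0.028, whatever share is assumed for the partner governorate.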
October 5th, 2005 at 8:38 am
Kevin,
“Here you refer to the situation after the initial allocation of clusters and before it is determined whether Basrah or Missan will get them.”
Uh, it doesn’t matter how many they get, kiddo. The probability is tied to their populations, no matter what, Missan will have a 66% chance of exclusion.
“However, your main problem is that you continue to focus on the probability of Missan getting sampled. That’s part of the calculation of course. What you need to move on to is the probability of a Missan household getting sampled. Your best course is to do it on a spreadsheet. If you do it right the probability will be the same for the two sampling methods.”
Yes, if Missan isn’t sampled, neither are its households. Right?
We have to compare the initial sampling with the grouping process. Just forget everything about conducting SRS in the 2nd phase. That would be meaningless to do anyways.
The fact remains that Lancet’s 2nd phase is not random sampling. Random sampling entails distributing each element independently, or each cluster independently.
Now if you, or Lambert, or anyone wants to show me ANYTHING that indicates that clumping clusters together in a 2nd phase into a single cluster, and then breaking them off again once they are distributed as a block, is supported, scientific, and good statistical methodology, please, for the love of human intelligence, show it to me.
Tim claimed this clumping action was cluster sampling - bzzzt, wrong.
Tim claimed this clumping action was multistage clustering - bzzzt, wrong.
Come on Tim, put up or concede.
October 5th, 2005 at 9:42 am
Kevin,
I have a different probability question for you. If Seixon Googled “Total Probability Theorem” what would be the probability that he figures out what you’re talking about?
October 5th, 2005 at 11:03 am
The initial assignment of clusters to governorates was not done by simple random sampling as Seixon believes. I quote:
Since Missan has 685,000 people its chance of getting a cluster in the initial assignment was 685/739=93%. (In an earlier comment I incorrectly said that it was 100% because I thought Missan had more than 739,000 people.) Note that the probability of Missan being sampled is different from what you get with SRS. But the expected number of clusters in Missan and hence the probability of a household in Missan being sampled is the same whichever scheme you use.
Seixon, I’ve given you a reference on clustered sampling. Your failure to understand it is not my fault.
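The 685/739 figure can be illustrated with a small simulation. This is only a sketch of one common reading of the procedure, systematic sampling with a random start and a fixed interval of 739,000 people along a cumulative population list; the function name, the offsets, and the number of trials are assumptions made for the example, not details from the study.

```python
import random

INTERVAL = 739_000   # sampling interval implied by the 685/739 figure
MISSAN = 685_000     # Missan's population, as given in the comment

def hit_probability(size, offset, interval=INTERVAL, trials=200_000, seed=0):
    """Estimate the chance that at least one sampling point falls inside a
    governorate occupying [offset, offset + size) on the cumulative list."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = rng.uniform(0, interval)
        # Distance from the governorate's lower edge to the next sampling
        # point (points sit at start, start + interval, start + 2*interval, ...).
        gap = (start - offset) % interval
        if gap < size:
            hits += 1
    return hits / trials

p_first = hit_probability(MISSAN, offset=0)
p_later = hit_probability(MISSAN, offset=5_000_000)  # hypothetical position
print(p_first, p_later, MISSAN / INTERVAL)
```

Under these assumptions both estimates come out close to 685/739, wherever the governorate sits on the list.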
October 5th, 2005 at 11:26 am
Sorry to interrupt all the good fun you’re having beating up Seixon, but can I point out two things:
In fact, that’s probably the reason why the standard methodology for these types of surveys uses 30 clusters: 30 is the smallest number that can be considered “large” in this context.
I can’t be completely sure exactly what they did to calculate the SEs, but I’m reasonably confident that the Lancet SEs are based on ignoring the first stage province sampling, and treating the design as simply 33 clusters with 30 observations in each one.
In the end, getting the SEs exactly right isn’t the be all and end all (compared to reducing bias). So maybe it’s better to bash Seixon. You’re innumerate Seixon! You still haven’t posted your transcript Seixon!
October 5th, 2005 at 12:07 pm
Ragout, you seem to be implying that the criticism of Seixon is somehow unjustified. Do you agree with his claim that the two-stage design produces bias?
I’m open to persuasion on the question of whether they accounted for the pairing when they calculated the CIs. It certainly seems unlikely to make much difference if they treated it as a one stage design when boot-strapping.
October 5th, 2005 at 12:36 pm
Lambert,
Missan was not the first province on the list, thus it would not have the probability you say.
Again, you just operate with some assumption so that what you say will sound correct. Regardless of that, the provinces had the same proportional chance of receiving a cluster, as that process emulates SRS with PPS.
I love how you claim I don’t understand the link you gave me on cluster sampling. Your arrogance reeks all the way across the world. So does your dishonesty. Not once in that paper did it say anything about clumping clusters, in fact, it talked almost exclusively about using SRS to distribute clusters.
But hey, what do I know? I’m illiterate! Just another addition to the “adjectives to use for Seixon” list.
The two-stage design introduces a bias because of the way it was done. The provinces chosen were not randomly chosen, and the pairings were also not randomly chosen. The pairings weren’t chosen due to any rationale other than an arbitrary one. That no randomy a samply make.
Now I have whipped up a more honest simulation of a cluster sample vs. the Lancet sample.
Unlike Lambert, I do not think it realistic to think that a sample of mortality in a certain region will vary from 1-13 or 1-6. Instead of these unrealistic measures, I used the coalition death rates from Basrah and Missan. Using these as a mean in a normal distribution, I conducted 250 trials.
For Basrah, the random normal distribution was with u=27, sd=2. For Missan, u=13, sd=1.
Here are the results of this more realistic simulation:
As you can see, wildly different from Tim’s dishonest smokescreen “simulation”. The mean does not matter as much as the median, because we are looking at what result we will be getting most often. This shows that the Lancet method is vastly different than cluster sampling, and why there are serious problems with doing things in this manner.
This simulates the problem before us much better because it is implausible that the mortality rates in the provinces will vary as much as 1-13 or 1-6.
Yet again Lambert, instead of explaining or proving your case, you resort to arguing from authority. I mean damn, you can’t even find a single paragraph or a page that says that clumping of clusters is an accepted methodology? That’s all it would take. I’ve been waiting over a week for it, tick-tock.
Me’s a starting to a think that it’s not a going to be appearing….
October 5th, 2005 at 2:14 pm
This is amazing.
Seixon: A bias is when some part of the population is more or less likely to be chosen than the rest.
Then there was no bias… because the probability of any household being selected was equal. Unless you are suggesting fraud. And if you are suggesting fraud, then there is still no sampling bias. So admit your mistake.
Seixon: Whether this produces a higher or lower mortality rate doesn’t matter
Well then there is no bias in the expected value… only a potential increase in variance. Which is exactly what everybody has been trying to tell you.
Kevin, I admire your restraint and patience!
October 5th, 2005 at 2:18 pm
Seixon: “Uh, it doesn’t matter how many they get, kiddo. The probability is tied to their populations, no matter what, Missan will have a 66% chance of exclusion.”
But what matters is each household’s chance of exclusion. That is a function of the province’s chance of exclusion and the household’s chance of exclusion from within the province. And each Iraqi household’s chance of exclusion is identical (and equal to 1 minus its chance of inclusion, funnily enough).
Unless you’re suggesting a fraudulent survey, in which case there is still no sampling bias.
October 5th, 2005 at 2:39 pm
In order to demonstrate bias, you have to be able to point out exactly where it comes in. I.e. “in this step, a province with the higher death rate has a higher chance of getting selected than the province with the lower death rate”. I don’t see it. Yes, I think we’re all convinced that the error estimates are going to be off; but it’s not been demonstrated to be asymmetrical.
October 5th, 2005 at 3:09 pm
Lambert,
No of course I don’t agree with Seixon that the Lancet’s multi-stage sampling scheme causes bias. But hasn’t that point been kind of beaten to death?
And you’re ignoring Seixon’s potentially valid point: that the particular sample drawn in the Lancet study seems to have been more violent than average? Admittedly, someone (BruceR?) seems to have refuted this point.
Seixon,
Are you aware that almost every survey that involves a personal visit has a multi-stage sampling scheme much like the Lancet’s? Here’s one example.
In your example, what precisely is wrong with the “Lancet cluster grouping”? It’s true that the estimate from this sampling scheme will never be right on the nose: sometimes it will be too high, and sometimes it will be too low. It’s also true that it’s right on average (unbiased). Also, the estimate would get a lot better if you had a few more provinces (say, 12 as in Lancet).
October 5th, 2005 at 3:50 pm
It’s simple to see where asymmetry could have been introduced.
The goal of pairing regions was to reduce travel time. So they picked distant regions and paired them with neighbors that, to the authors’ best estimation (see? bias need not be fraudulent), were similar in levels of violence. However, levels of violence relate directly to the parameter of interest. We know that distant regions were less violent than proximal regions. We know that the distant regions were more often paired with more violent regions.
It doesn’t matter that the clusters were assigned randomly between one of two paired regions because it is the choice of paired compared to unpaired where the bias occurs.
Perhaps it’s easier to understand when this is turned around. Let’s say the authors wanted to increase travel time. The best way to do this would be to find some of the regions with main highway arteries and major cities, pair them up with “similar” neighboring regions, randomly distribute the clusters between regions in a pair and, with some probability, several of the most urban regions are left out of the study while all of the most distant regions are left in the study.
I must be missing something because this just seems too obvious.
October 5th, 2005 at 4:41 pm
Seixon, when you did your simulations I’m sure you discovered that no matter what distribution you used (even the ridiculous ones you used) the mean was the same. Your justification for this is absurd:
The mode is the result that you get most often, not the median. And neither is relevant since the Lancet was reporting the mean.
Also, this is wrong:
The ordering of the provinces on the list does not make a difference to their chance of being sampled. If it did, the process would be biased.
October 5th, 2005 at 8:16 pm
The Lancet article makes it fairly clear that the authors did not take into account the two-stage sample design when calculating the SEs. For example, the software they use can’t estimate SEs for 2-stage sample designs.
Ragout,
Which software are you referring to here? They mention three: Mark Myatt’s, EpiInfo and STATA. I’m not familiar with any of them, nor with bootstrapping; my stats are pretty old-fashioned. Given that the CIs were bootstrapped, do you reckon it matters? (I’m not implying it doesn’t, I don’t know enough about it to have a view.)
As to the possibility that the Lancet sample just happened to be unusual, I think if any point has been beaten to death it’s that one. Heiko Gerhauser (spelling?) specialised in that critique. Everyone seems to have finished up with the same view as they started with. I’m agnostic about it. The sensible thing to do is combine the Lancet figures with other sources and cobble together a guess.
October 5th, 2005 at 10:56 pm
Tim,
Correct, I mixed up mode and median. Median is still a better indicator than the mean in this instance. The mean was off by about 1 between the two graphs I gave.
Also, because of using a random start, Missan will not have the probability you speak of at all. It depends on what number they come up with for the random start.
Also, as long as you chose the order of the list randomly, there will be no bias according to which province is first on the list.
I see you still haven’t sourced the allegation that the Lancet methodology is supported…. zzzzzz….
Ragout,
That study you linked to isn’t even similar to the Lancet one at all. The study you linked to conducted sampling according to the norm. There was no cluster clumping in that study. So again, can anyone show a study that used cluster clumping, can anyone show that this is supported statistical methodology??
True, the study, if conducted hundreds of times, would eventually average out in result. How unfortunate, then, that the Lancet study was only conducted ONCE! It doesn’t matter that if you conducted the Lancet study 1000 times that the result will average out. We are talking about a one time thing there. As you can see from my simulation, Lancet’s result will either be too high or too low in the paired provinces (when they have different violence levels). With real cluster sampling, the result will always be within the right range. That’s what’s important here.
I’m taken aback that you, Ragout, cannot see the vast differences in methodology between the Lancet study and the one you linked to. It is like night and day.
z,
I did point out where this happened. It happened when they decided to pick 12 provinces arbitrarily; this made those provinces have different chances than those that were left alone. On top of that, each pair was also biased by the fact that they clumped the clusters. This biased it towards the more populous of the two. This would not have been the case if they had used SRS for distributing the clusters in the 2nd phase. As you could see from the study, 4 of the 5 pairs that were unequal in population ended up giving ALL the clusters to the most populous. As I have pointed out numerous times, this type of methodology is nonexistent and I have yet to receive any evidence from Lambert or anyone else that it is valid to do things in such a way.
eudoxis,
Thanks for nailing it right on the head. The denial “in the room” is astounding. I can’t even get a single source to show that the methodology is valid! I’ve been waiting for almost 2 weeks now….
October 5th, 2005 at 11:13 pm
The software I was referring to is EpiInfo, which allows a single clustering variable. They also mention using some specialized package “written for Save the Children.” I don’t know what that is, but it seems unlikely that it would allow for a very complex design.
I’m not that familiar with bootstrapping for survey SEs either. But as I understand it, the bootstrapping is supposed to replicate the original sampling scheme. So, I think they resampled clusters rather than individual observations. That is, they repeatedly drew a random sample of 33 clusters from their original dataset of 33 clusters (with replacement). They could have done something much more complex with bootstrapping, of course, but they present it like they’re just replicating the design they specified in EpiInfo.
I think the SEs will be underestimated from this bootstrapping procedure (or other methods that ignore the sampling of provinces) if the paired provinces have population means that are different. So it depends on how well they were able to pair similar provinces.
If Seixon is right that the pairings were poor, then the reported SEs could be underestimated substantially. Seixon isn’t that persuasive, so I think Lambert is probably right to guess that the pairings don’t affect the SEs that much.
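The cluster-level resampling described in this comment might look roughly like the following. This is a sketch of a generic percentile bootstrap over clusters, not a reconstruction of what the Lancet authors ran; the per-cluster values are randomly generated placeholders, and the function name and parameters are invented for the example.

```python
import random
import statistics

def cluster_bootstrap_ci(cluster_values, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling whole clusters
    with replacement (the first-stage province sampling is ignored)."""
    rng = random.Random(seed)
    n = len(cluster_values)
    means = sorted(
        statistics.fmean(rng.choices(cluster_values, k=n)) for _ in range(reps)
    )
    lower = means[int((alpha / 2) * reps)]
    upper = means[int((1 - alpha / 2) * reps) - 1]
    return lower, upper

# Placeholder data: 33 made-up per-cluster rates, NOT the survey's figures.
data_rng = random.Random(42)
cluster_rates = [data_rng.gauss(10, 3) for _ in range(33)]

lower, upper = cluster_bootstrap_ci(cluster_rates)
print(f"mean {statistics.fmean(cluster_rates):.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Because every replicate treats the 33 clusters as the only source of randomness, any extra variance from choosing which paired province gets the clusters is not reflected in the interval, which is the concern raised above.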
October 6th, 2005 at 1:02 am
The median or the mode is the best indicator to use in my simulation, as Wikipedia summarizes:
You sure that the mean is the best figure to use Tim? I think you’re being intellectually dishonest, again. In non-skewed distributions, the median and mean will be very similar, as is evidenced by my cluster example: mean = 21.65, median = 21.78. In the Lancet cluster grouping example, mean = 22.42, median = 26.
Now, I seem to recall from way back in the day that if the median was more than the mean, the distribution was skewed upwards, while if it was lower, the distribution was skewed downwards.
Correct? So since the Lancet cluster grouping is skewed upwards, it would be faulty to compare that with the other according to the mean.
Feel free to correct me, I’m just recalling what I was taught back in high school. Just seems that Wikipedia is in agreement with my methods.
October 6th, 2005 at 3:11 am
Seixon: A simple google search of (mean median skewed) would have corrected you. But since it appears that you need help, see:
www.amstat.org/publicatio…
Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the median under left skew. This rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they also contradict the textbook interpretation of the median.
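A small discrete example makes the failure of the rule concrete. The data set below is made up for illustration (it is not taken from the linked article): it has one long right tail, so its skewness is positive, yet its mean lies to the left of its median.

```python
import statistics

# Made-up discrete data: four 1s, six 2s, and a single 5.
data = [1] * 4 + [2] * 6 + [5]

mean = statistics.fmean(data)
median = statistics.median(data)
# The sign of the third central moment gives the direction of skew.
third_moment = sum((x - mean) ** 3 for x in data) / len(data)

print(f"mean={mean:.3f} median={median} third central moment={third_moment:.3f}")
```

Here the distribution is right-skewed by the usual moment definition, but the mean is about 1.91 while the median is 2.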
October 6th, 2005 at 3:17 am
Seixon, the Lancet study used the mean. Which you’ve shown is not affected by the pairing of governorates.
October 6th, 2005 at 4:07 am
Eudoxis, you could deliberately pair provinces with wildly different death rates and the expected value still wouldn’t be affected. The variance would go up, because when you made your choice you’d have picked a province which was either peaceful or exceptionally violent and so your calculated death toll for the sum of the two would be too high or too low.
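This point can be checked with a simulation along the lines of the ones discussed earlier in the thread. The sketch below borrows the normal distributions quoted above for Basrah (mean 27, sd 2) and Missan (mean 13, sd 1) and the 66% figure for Basrah's share of the pair; the choice of three clusters, the number of trials, and all function names are assumptions for illustration.

```python
import random
import statistics

P_BASRAH = 0.66  # Basrah's share of the pair's population, from the thread

def draw(rng, from_basrah):
    # One cluster's result, using the distributions quoted in the thread.
    return rng.gauss(27, 2) if from_basrah else rng.gauss(13, 1)

def paired_estimate(rng, n_clusters=3):
    # All clusters go to one governorate, chosen with probability
    # proportional to population.
    from_basrah = rng.random() < P_BASRAH
    return statistics.fmean([draw(rng, from_basrah) for _ in range(n_clusters)])

def independent_estimate(rng, n_clusters=3):
    # Each cluster is allocated to a governorate independently.
    return statistics.fmean(
        [draw(rng, rng.random() < P_BASRAH) for _ in range(n_clusters)]
    )

def simulate(estimator, trials=20_000, seed=1):
    rng = random.Random(seed)
    results = [estimator(rng) for _ in range(trials)]
    return statistics.fmean(results), statistics.stdev(results)

mean_paired, sd_paired = simulate(paired_estimate)
mean_indep, sd_indep = simulate(independent_estimate)
print(f"paired:      mean {mean_paired:.2f}, sd {sd_paired:.2f}")
print(f"independent: mean {mean_indep:.2f}, sd {sd_indep:.2f}")
```

Under these assumptions both schemes average out near 0.66 × 27 + 0.34 × 13 = 22.24, while the paired scheme shows the larger spread, which is the increase in variance described here.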
October 6th, 2005 at 4:10 am
Tim Lambert: The initial assignment of clusters to governorates was not done by simple random sampling as Seixon believes.
Thanks for that. I must admit I had been taking Seixon’s word for it instead of referring back to the study. That was foolish of me. My excuse is that I’m not the guy who is accusing a distinguished group of researchers of using a procedure which generates a biased sample, then sneaking it past the referees of a prominent journal. What’s Seixon’s excuse? Where does that leave his critique? Are we now asked to believe that the sampling was biased anyway, even though it was not actually done in the way he thought it was done? (Clearly there would have been no bias even if it had been done the way we both supposed, but that’s neither here nor there at this stage.)
October 6th, 2005 at 5:18 am
Chris,
In my haste, it seems I got it backwards. That still doesn’t mean that using the median isn’t the better choice in this example. Clearly with the Lancet distribution, the median gives us a better indication of the typical result, and not the mean. In fact, if you put the mean as a horizontal line on that graph, it will touch virtually none of the results. Now how’s that for a typical result?
Donald,
“Eudoxis, you could deliberately pair provinces with wildly different death rates and the expected value still wouldn’t be affected. The variance would go up, because when you made your choice you’d have picked a province which was either peaceful or exceptionally violent and so your calculated death toll for the sum of the two would be too high or too low.”
Quite correct sir, except there’s only one problem: the pairings were not random, and neither was the choice of provinces to be paired. If they were, there would be no bias. There would just be a horrifying lack of precision, as the Lancet study demonstrates quite well with its gigantic CI. You’re leaving out the facts, Donald.
Lambert,
How do you know that the Lancet study used the mean? It doesn’t say so. Regardless of that, in the case I was demonstrating, the median would be a better choice to show what result was typical. How you doing on finding documentation for cluster-clumping?
Kevin,
“What’s Seixon’s excuse? Where does that leave his critique? Are we now asked to believe that the sampling was biased anyway, even though it was not actually done in the way he thought it was done?”
The clusters were picked via SRS, just in a different way. Not only that, the method they used is heavily accepted and documented as statistical methodology. There is nothing wrong with how they distributed the clusters in the initial sampling. Claiming that it was not SRS is very misleading. I suggest you guys read up on a simple guide over at the Center for Disease Control website that details how to use this method.
SRS entails distributing each cluster independently with a random trial each. That was done. Lambert, you are starting to fall completely off the wagon now.
October 6th, 2005 at 5:51 am
Claiming that it was not SRS is very misleading.
Seixon,
Well, how would you describe your claim that “Missan has a 39% chance that it will not be sampled at all” – extremely misleading, perhaps?
Even if you had been right about the method used, that would still leave you with the wrong answer for the Lancet approach: your 66% figure for Basrah and 34% for Missan only makes sense if one or other of them gets at least one cluster in the initial allocation. With SRS, which is what you were assuming when you calculated those figures, there is no guarantee of that.
You simply haven’t thought this thing through as you should have done before publishing your critique. You are making up your case as you go along and tossing a baby off the sled whenever the need arises.
October 6th, 2005 at 6:54 am
No Seixon, even if you pair wildly different provinces with wildly different death rates there’s no bias if you choose which one to give the clusters to based on using the procedure outlined in the Lancet paper, where the chance of province A getting all the clusters is proportional to its population. It’d be dumb to pair wildly different provinces, because it decreases the precision (by increasing the variance), but the expected value wouldn’t be changed and it wouldn’t be a biased sample in the technical sense.
October 6th, 2005 at 7:57 am
A question for Seixon (don’t ask me why I bother, I really don’t know): you obviously attach great importance to the fact that the survey’s pairing method changes the probability that Missan will receive zero clusters. Even a cursory look through these threads will show that you consider this a very telling point. Yet until very recently you were under the impression that, without pairing, Missan would have a 39% chance of getting zero clusters.
Yet, when Tim Lambert informed us that “since Missan has 685,000 people its chance of getting a cluster in the initial assignment was 685/739=93%”, you were quite unmoved by this. You wrote: “There is nothing wrong with how they distributed the clusters in the initial sampling.”
So, without pairing, 39% is fine by you and 7% is also fine by you. Yet, because pairing changes the odds, pairing is reprehensible. You are quite unmoved by the argument that any particular Missan household has the same probability of being selected under any of the systems being considered (which is what matters to the rest of us). When pairing comes into it, and only then, Missan’s probability of getting zero clusters is important to you. How come? If that probability matters when we pair governorates, surely it should also matter when we don’t?
October 6th, 2005 at 8:46 am
Kevin,
How was 39% misleading? That was according to a binomial distribution with 33 trials with p=2.8%. Of course with the equal-step method used, this changes slightly, but I’m not sure how that would be calculated, especially since the list is made randomly, and a random start is used.
Moving on to the pairings, I said that Missan’s households had a 66% chance of exclusion. This does not depend on the number of clusters that either