As we have seen in an earlier post on the China Study data, which explored relationships hinted at by Denise Minger’s previous and highly perceptive analysis, one can use a multivariate analysis tool like WarpPLS to explore relationships based on data reported by others. This is true even when the dataset available is fairly small.
So I entered the data reported in the most recent (published online in March 2012) study looking at the relationship between red meat consumption and mortality into WarpPLS to do some exploratory analyses. I discussed the study in my previous post; it was conducted by Pan et al. (Frank B. Hu is the senior author) and published in the prestigious Archives of Internal Medicine. The data I used is from Table 1 of the article; it reports figures on several variables along 5 quintiles, based on separate analyses of two samples, called “Health Professionals” and “Nurses Health” samples. The Health Professionals sample comprised males; the Nurses Health sample, females.
Below is an interesting exploratory model, with results. It includes a number of hypotheses, represented by arrows, which seem to make sense. This is helpful, because a model incorporating hypotheses that make sense allows for easy identification of nonsense results, and thus rejection of the model or the data. (Refutability is one of the most important characteristics of good theoretical models.) Keep in mind that the sample size here is very small (N=10), as the authors of the study reported data along 5 quintiles for the Health Professionals sample, together with 5 quintiles for the Nurses Health sample. In a sense, this is somewhat helpful, because a small sample tends to be “unstable”, leading nonsense results and other signs of problems to show up easily – one example would be multivariate coefficients of association (the beta coefficients reported near the arrows) greater than 1 due to collinearity.
So what does the model above tell us? It tells us that smoking (Smokng) is associated with reduced physical activity (PhysAct); beta = -0.92. It tells us that smoking (Smokng) is associated with reduced food intake (FoodInt); beta = -0.36. It tells us that physical activity (PhysAct) is associated with reduced incidence of diabetes (Diabetes); beta = -0.25. It tells us that increased food intake (FoodInt) is associated with increased incidence of diabetes (Diabetes); beta = 0.93. It tells us that increased food intake (FoodInt) is associated with increased red meat intake (RedMeat); beta = 0.60. It tells us that increased incidence of diabetes (Diabetes) is associated with increased mortality (Mort); beta = 0.61. It tells us that being female (SexM1F2) is associated with reduced mortality (Mort); beta = -0.67.
Some of these betas are a bit too high (e.g., 0.93), due to the level of collinearity caused by such a small sample. Due to being quite high, they are statistically significant even in a small sample. Betas greater than 0.20 tend to become statistically significant when the sample size is 100 or greater; so all of the coefficients above would be statistically significant with a larger sample size. What is the common denominator of all of the associations above? The common denominator is that all of them make sense, qualitatively speaking; there is not a single case where the sign is the opposite of what we would expect. There is one association that is shown on the graph and that is missing from my summary of associations above; and it also makes sense, at least to me. The model also tells us that increased red meat intake (RedMeat) is associated with reduced mortality (Mort); beta = -0.25. More technically, it tells us that, when we control for biological sex (SexM1F2) and incidence of diabetes (Diabetes), increased red meat intake (RedMeat) is associated with reduced mortality (Mort).
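As a rough check of the rule of thumb above (betas above 0.20 tending to reach significance at N=100), here is a minimal sketch. It rests on two simplifying assumptions that are not part of the WarpPLS analysis: the beta is treated as if it were a simple correlation, and the t distribution is approximated by a normal one. WarpPLS calculates its P values through resampling, so this only illustrates the general pattern.

```python
import math
from statistics import NormalDist

def approx_two_tailed_p(beta, n):
    """Rough two-tailed P value for a standardized coefficient.

    Assumption: beta is treated like a simple correlation coefficient,
    and the t statistic is compared against a standard normal
    distribution (a crude approximation when n is small).
    """
    t = beta * math.sqrt((n - 2) / (1 - beta ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Compare a modest beta (0.20) and a very high one (0.93) at N=10 and N=100.
for beta in (0.20, 0.93):
    for n in (10, 100):
        p = approx_two_tailed_p(beta, n)
        print(f"beta={beta:.2f}, N={n}: approximate P = {p:.4f}")
```

Under these assumptions, a beta of 0.20 falls below the 0.05 threshold at N=100 but not at N=10, while a beta as high as 0.93 does so even at N=10.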
How do we roughly estimate this effect in terms of amounts of red meat consumed? The -0.25 means that, for each standard deviation in the amount of red meat consumed, there is a corresponding 0.25 standard deviation reduction of mortality. (This interpretation is possible because I used WarpPLS’ linear analysis algorithm; a nonlinear algorithm would lead to a more complex interpretation.) The standard deviation for red meat consumption is 0.897 servings. Each serving has about 84 g. And the highest number of servings in the dataset is 3.1 servings, or 260 g/d (calculated as: 3.1*84). To stay a bit shy of this extreme, let us consider a slightly lower intake amount, which is 3.1 standard deviations, or 234 g/d (calculated as: 3.1*0.897*84). Since the standard deviation for mortality is 0.3 percentage points, we can conclude that an extra 234 g of red meat per day is associated with a reduction in mortality of approximately 23 percent of a percentage point, or about 0.23 percentage points (calculated as: 3.1*0.25*0.3).
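The arithmetic in the paragraph above can be reproduced with a few lines of Python. All the numbers come from the post; only the variable names are mine. Note that the last product is in the units of the mortality standard deviation, which is percentage points.

```python
# Figures quoted in the post (variable names are illustrative).
SERVING_G = 84             # grams of red meat per serving
SD_RED_MEAT = 0.897        # standard deviation of red meat intake, servings/day
SD_MORTALITY_PP = 0.3      # standard deviation of mortality, percentage points
BETA_REDMEAT_MORT = -0.25  # standardized coefficient for RedMeat --> Mort
MAX_SERVINGS = 3.1         # highest intake in the dataset, servings/day
N_SD = 3.1                 # number of standard deviations considered

# Highest intake in the dataset, in grams per day.
max_intake_g = MAX_SERVINGS * SERVING_G

# Intake corresponding to 3.1 standard deviations, in grams per day.
extra_intake_g = N_SD * SD_RED_MEAT * SERVING_G

# Associated change in mortality, in percentage points (linear model).
mortality_change_pp = N_SD * BETA_REDMEAT_MORT * SD_MORTALITY_PP

print(f"Highest intake: {max_intake_g:.0f} g/d")
print(f"3.1 standard deviations of intake: {extra_intake_g:.0f} g/d")
print(f"Associated change in mortality: {mortality_change_pp:.4f} percentage points")
```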
Let me repeat for emphasis: the data reported by the authors suggests that, when we control for biological sex and incidence of diabetes, an extra 234 g of red meat per day is associated with a reduction in mortality of approximately 23 percent. This is exactly the opposite, qualitatively speaking, of what was reported by the authors in the article. I should note that this is also a minute effect, like the effect reported by the authors. (The mortality rates in the article are expressed as percentages, with the lowest being around 1 percent. So this 23 percent is a percentage of a percentage.) If you were to compare a group of 100 people who ate little red meat with another group of the same size that ate 234 g more of red meat every day, over a period of more than 20 years, you would not find a single additional death in either group. If you were to compare matched groups of 1,000 individuals, you would find only 2 additional deaths among the folks who ate little red meat.
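To put the “percentage of a percentage” in absolute terms, the sketch below converts the product 3.1*0.25*0.3 = 0.2325, read as percentage points of mortality, into expected additional deaths for groups of different sizes. The function name is hypothetical, and the calculation assumes the difference applies uniformly across each group.

```python
# Difference in mortality between the two groups, in percentage points
# (3.1 * 0.25 * 0.3, using the figures quoted in the post).
MORTALITY_DIFF_PP = 0.2325

def additional_deaths(group_size, diff_pp=MORTALITY_DIFF_PP):
    """Expected additional deaths in a group, for a difference given in percentage points."""
    return group_size * diff_pp / 100

for size in (100, 1000):
    print(f"Group of {size}: {additional_deaths(size):.1f} expected additional deaths")
```

With groups of 100 the expected difference is well under one death; with groups of 1,000 it is a little over two, which matches the figures given above.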
At the same time, we can also see that excessive food intake is associated with increased mortality via its effect on diabetes. The product beta coefficient for the mediated effect FoodInt --> Diabetes --> Mort is 0.57. This means that, for each standard deviation of food intake in calories, there is a corresponding 0.57 standard deviation increase in mortality, via an increase in the incidence of diabetes. This is very likely at levels of food consumption where significantly more calories are consumed than spent, ultimately leading to many people becoming obese. The standard deviation for food intake is 355 calories. The highest daily food intake quintile reported in the article is 2,396 calories, which happens to be associated with the highest mortality (and is probably an underestimation); the lowest is 1,202 (also probably underestimated).
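The mediated effect is simply the product of the two betas along the path, as the short sketch below shows. The conversion to percentage points at the end is my own extension: it applies the same linear logic used for red meat, with the 0.3 percentage point standard deviation for mortality, and should be read as a rough illustration only.

```python
# Standardized path coefficients reported for the model.
BETA_FOODINT_DIABETES = 0.93
BETA_DIABETES_MORT = 0.61
BETA_REDMEAT_MORT = -0.25

SD_FOOD_INTAKE_KCAL = 355  # standard deviation of food intake, calories
SD_MORTALITY_PP = 0.3      # standard deviation of mortality, percentage points

# Indirect (mediated) effect FoodInt --> Diabetes --> Mort.
indirect_effect = BETA_FOODINT_DIABETES * BETA_DIABETES_MORT

# Assumed linear conversion: mortality change per standard deviation of food intake.
change_pp_per_sd = indirect_effect * SD_MORTALITY_PP

print(f"Mediated effect: {indirect_effect:.2f}")
print(f"Per {SD_FOOD_INTAKE_KCAL} extra calories: "
      f"{change_pp_per_sd:.2f} percentage points higher mortality")
print("Mediated effect larger in magnitude than the direct red meat effect:",
      abs(indirect_effect) > abs(BETA_REDMEAT_MORT))
```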
So, in summary, the data suggests that, for the particular sample studied (made up of two subsamples): (a) red meat intake is protective in terms of overall mortality, through a direct effect; and (b) the deleterious effect of overeating on mortality is stronger than the protective effect of red meat intake. These conclusions are consistent with those of my previous post on the same study. The difference is that the previous post suggested a possible moderating protective effect; this post suggests a possible direct protective effect. Both effects are small, as was the negative effect reported by the authors of the study. Neither is statistically significant, due to sample size limitations (secondary data from an article; N=10). And all of this is based on a study that categorized various types of processed meat as red meat, and that did not distinguish grass-fed from non-grass-fed meat.
By the way, in discussions of red meat intake’s effect on health, often iron overload is mentioned. What many people don’t seem to realize is that iron overload is caused primarily by hereditary haemochromatosis. Another cause is “blood doping” to improve athletic performance. Hereditary haemochromatosis is a very rare genetic disorder; rare enough to be statistically “invisible” in any study that does not specifically target people with this disorder.
Monday, April 2, 2012
29 comments:
Ned, Thanks for this. We need more people like you who really understand statistics to interpret these idiotic conclusions people read about in the popular press.
Like you feed one group red meat and PCB-laden artificial food and you feed another group organic chicken and lots of veggies, and the PCB group doesn't do as well so you conclude that red meat is dangerous.
Hi, Ned.
You mention this article is in the prestigious Annals of Internal Medicine.
Mark Crislip suggests the Annals are no longer a reliable source. Reference: http://www.sciencebasedmedicine.org/index.php/feet-of-clay/
-Steve
What about red meat's effect on diabetes? That would provide a much stronger case for the protective effect, since a hypothesis could be that the deleterious effects are through diabetes? ...Unless I am missing something.
Yes Gretchen. That is the type of pattern of inter-correlations that we are seeing here. The main “toxin” possibly being excessive food intake, but also with other health-detrimental add-ons.
Thanks for the link Steve. Interesting, especially coming from an internal medicine practitioner.
Hi John. The best chance to find red meat’s true protective effect in connection with diabetes is to do a moderating effect’s analysis; like the one I did in my previous post, but with the original data.
At least that is what I think; did you have something else in mind?
The reality is that red meat is an important component of food intake, thus also contributing to obesity when consumed with other things. My guess is that the main culprits are high-caloric and addictive industrialized foods that come in boxes and bags; primarily those rich in refined carbohydrates, sugars and seed oils.
It’d be great if we could get data from heavy red meat eaters who ate primarily whole foods, exercised, and were reasonably lean. But this is not easy to find in modern urban environments.
A key set of confounders also comes from heavy red meat eaters that are the types who “don’t care” (which is why they eat red meat in the first place, going against their doctor’s advice), which is associated with other diet and lifestyle patterns that are detrimental to health – e.g., alcohol abuse, physical inactivity, smoking.
I think that we can make a logical case, because if you supplement with things from red meat you help to avoid diabetic neuropathy. Carnitine, carnosine/beta alanine, lipoic acid, taurine, etc. Those are provided by red meat! Those protect against the damaging effects of metabolic syndrome.
Point proven? Well not really, we have to actually show that red meat is a net improvement. Maybe it is actually harmful. I doubt it; its purported negative effects have been debunked and depend on context.
I think that correlations plus actual quantifiable evidence for mechanisms is generally a strong argument but it isn't a controlled trial, that's for sure.
I will continue to eat grass-fed beef in as healthy a manner as possible, taking into consideration the things that might be wrong with it and making sure to avoid them. I'm pretty sure that it has great nutritional usefulness and isn't pathological, based on empirical evidence.
Ned,
No, that idea is good.
I am not actually suggesting that red meat is causing diabetes and/or obesity by the way--just wondering about the statistics of this one study. The iron content is something to think about though when seeing many "paleo" diets.
Interesting research -- I have a few questions:
What was the sample size?
What were the model fit indices?
Did you check for endogeneity?
Hi Unknown. Are you referring to the original dataset or the small secondary dataset (N=10) that I used in this analysis?
Hi Ned,
In the original dataset how many cases did you have and where did you get such intriguing data from?
I also now realize that you used PLS so you won't have fit indices like in covariance structure analysis aka SEM and also you cannot model any endogeneity problem you might have had.
This implies that you cannot test whether or not the structural residuals are serially correlated, which might create model and parameter estimate misspecifications.
This brings me to the following questions:
Have you tried using covariance based modelling (SEM) or two or three stage least square estimation where you can also test the model in its entirety rather than relying on variance explained maximisation?
Andrea
Hi Andrea. The original data had thousands of cases; the dataset I used in this analysis, as explained in the post, had only 10 cases – 5 + 5 quintiles.
Covariance-based SEM would be unlikely to converge with such a small sample. There are other problems with covariance-based SEM as well, more technical in nature, which are why I use multiple variance-based SEM models when I analyze data (see under China Study above).
Variance-based SEM (aka PLS-based) generates fit indices, and also outputs that can be used for tests that are analogous to tests of endogeneity (Q-squared coefficients). It just depends on the software you are using.
I am including them in my next comment.
Model fit indices
-----------------------------------------------------------
APC=0.574, P<0.001
ARS=0.642, P<0.001
AVIF=5.381, Good if < 5
Q-squared coefficients
----------------------
RedMeat  PhysAct  Diabete  Mort   FoodInt
0.367    0.851    0.896    0.979  0.134
Mind you, there is massive collinearity in this N=10 dataset that I used, so the results in this post are meant to raise questions, as opposed to being a definitive revision of the results reported by the authors.
It seems that the disparity in the results between my analysis and that of the original authors is due to the fact that they didn’t control for a key confounder – biological sex (male/female).
They “kept sex constant” by analyzing data from males and females separately, and then pooling the results, but did not build a model with the entire dataset, including sex as a covariate.
This can severely bias the multivariate coefficients of association. Sex is correlated with several variables in this model, including mortality. This should have been obvious to the authors, particularly since about 70 percent of the women were premenopausal.
Premenopausal women are the closest we get to indestructible in the human species. Heart disease is much rarer among them than in men of the same age.
I controlled for sex in my analysis, but used a much smaller dataset, created with the quintile means that were reported in the article.
Hi Ned,
Thanks for the clarifications and forgive my questioning but as you know as scientists we always need to be a little skeptical about what we see.
I agree, apart from all the estimation issues given by a dataset of only 10 cases, one also needs to face the issue of external validity. So, perhaps the results given in your model are overestimated in terms of gamma, beta and r^2 coefficients.
I am not familiar with the fit ratios of PLS so I cannot comment, but I strongly doubt you could model endogeneity with your software unless it is able to produce the Wu-Hausman test, which I am quite hesitant about as not even software packages like LISREL or Mplus can do it.
Also a word of caution: the fact that a software package can produce a model estimation does not necessarily mean that the estimation is proper. Even PLS is not immune to the statistical assumptions of "power" even though it can produce bootstrapped standard errors.
Also, is the very last endogenous variable dichotomous (death YES/NO)? If the answer is yes, how does PLS estimate models with categorical variables when, as far as I know, it can only work with metric variables? That could be another source of model misspecification (??)
Andrea
Mortality is measured as a percentage here. As far as I can tell, the only categorical variable here is sex, coded as male = 1 and female = 2.
The above does not create a distributional problem for variance-based SEM, because P values are calculated through resampling. This is a nonparametric technique that does not require multivariate normality.
Resampling is very computing-intensive, but even an entry-level netbook these days has enough computing power to yield results for a relatively complex model in a few seconds.
Hi Ned,
Thanks for the clarifications. Be mindful though that as long as the dichotomous variables are exogenous you have no problems, but if they are endogenous it's a different kettle of fish, as bootstrapping the standard errors still implies that the variables need to be metric, that is continuous, having a Gaussian distribution.
The parameter estimates of a Bernoulli distributed variable require either a probit or logit estimation - don't get fooled by the PLS literature in MIS which states that PLS can manage all types of distributions because it cannot!
Actually I didn’t use bootstrapping, as it doesn’t perform well with very small samples. I used jackknifing.
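For readers unfamiliar with jackknifing, here is a minimal sketch of the general leave-one-out idea, applied to a simple correlation. It is not WarpPLS' implementation, the function names are my own, and the 10 data points are made up for illustration; they are not the values from the article.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def jackknife_se(xs, ys, stat=pearson_r):
    """Jackknife standard error: recompute the statistic leaving out one case at a time."""
    n = len(xs)
    leave_one_out = [
        stat(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)
    ]
    center = mean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out))

# Hypothetical dataset with 10 cases (NOT the data from the article).
x = [1.0, 1.5, 2.1, 2.4, 3.0, 3.3, 4.1, 4.6, 5.2, 5.9]
y = [2.1, 1.9, 2.8, 2.5, 3.6, 3.1, 4.4, 4.0, 5.1, 5.5]

r = pearson_r(x, y)
se = jackknife_se(x, y)
print(f"r = {r:.3f}, jackknife standard error = {se:.3f}")
```

With N cases the jackknife produces exactly N resamples, each of size N-1, whereas bootstrapping draws samples with replacement.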
Essentially it doesn't make a difference how you re-sample 10 cases; you still have 10 cases, which is not close to any asymptotic standard.
Ned,
Why do you think professionals like the Harvard group publish studies in this manner? If the interest is to really understand the issue, why do they promote their conclusions (meat = death) based upon such shaky analyses? I just don't get it!
A friend emailed me this--and I am now wondering the same--so in the spirit of friendly scientific inquiry, Ned, can you comment:
Note that Frank Hu, the senior author of the Harvard “red meat study” (anti-beef) is also one of the co-authors of the “saturated fat metastudy.” (pro-sat fat).
If one calls his statistical skills into question on one study then why not on the other?
Frank Hu bio and the study: (pro sat fat) meta study
Thanks for sharing your ideas and data with us. I would say that we still need to study and experiment in order to get our data correctly.
Kamagra, how do you run an experiment where the output variable is 'mortality rate'. Would you like to be part of the experimental group?
Andrea
@Andrea (and @Ned) ~
Looks like Kamagra and Conway are just spammers... But I really did want to know Ned, if you felt Frank Hu just blew it on this one result, statistically speaking, or if you feel a lot of his stats are light weight? See my post above on April 16.
Thanks.
~ Brad
just read a little learn a lot