Saturday, November 5, 2011

The China Study II: How gender takes us to the elusive and deadly factor X

The graph below shows the mortality in the 35-69 and 70-79 age ranges for men and women in the China Study II dataset. I discussed other results in my two previous posts () (), all taking us to this post. The full data for the China Study II is publicly available (). The mortality numbers are actually averages of male and female deaths per 1,000 people in each of several counties, in each of the two age ranges.

Men do tend to die earlier than women, but the difference above is too large.

Generally speaking, when you look at a set time period that is long enough for a good number of deaths (not to be confused with “a number of good deaths”) to be observed, you tend to see around 5-10 percent more deaths among men than among women. This is when other variables are controlled for, or when men and women do not adopt dramatically different diets and lifestyles. One of many examples is a study in Finland (); you have to go beyond the abstract on this one.

As you can see from the graph above, in the China Study II dataset this difference in deaths is around 50 percent!

This huge difference could be caused by there being significantly more men than women per county included in the dataset. But if you take a careful look at the description of the data collection methods employed (), this does not seem to be the case. In fact, the methodology descriptions suggest that the researchers tried to have approximately the same number of women and men studied in each county. The numbers reported also support this assumption.

As I said before, this is a well executed research project, for which Dr. Campbell and his collaborators should be commended. I may not agree with all of their conclusions, but this does not detract even a bit from the quality of the data they have compiled and made available to us all.

So there must be another factor X causing this enormous difference in mortality (and thus longevity) among men and women in the China Study II dataset.

What could be this factor X?

This situation helps me illustrate a point that I have made here before, mostly in the comments under other posts. Sometimes a variable, and its effects on other variables, are mostly a reflection of another unmeasured variable. Gender is a variable that is often involved in this type of situation. Frequently men and women do things very differently in a given population due to cultural reasons (as opposed to biological reasons), and those things can have a major effect on their health.

So, the search for our factor X is essentially a search for a health-relevant variable that is reflected by gender but that is not strictly due to the biological aspects that make men and women different (these can explain only a 5-10 percent difference in mortality). That is, we are looking for a variable that shows a lot of variation between men and women, that is behavioral, and that has a clear impact on health. Moreover, as should be clear from my last post, we are looking for a variable that is unrelated to wheat flour and animal protein consumption.

As it turns out, the best candidate for the factor X is smoking, particularly cigarette smoking.

The second best candidate for factor X is alcohol abuse. Alcohol abuse can be just as bad for one’s health as smoking is, if not worse, but it may not be as good a candidate for factor X because the difference in prevalence between men and women does not appear to be just as large in China (). But it is still large enough for us to consider it a close second as a candidate for factor X, or a component of a more complex factor X – a composite of smoking, alcohol abuse and a few other coexisting factors that may be reflected by gender.

I have had some discussions about this with a few colleagues and doctoral students who are Chinese (thanks William and Wei), and they mentioned stress to me, based on anecdotal evidence. Moreover, they pointed out that stressful lifestyles, smoking, and alcohol abuse tend to happen together - with a much higher prevalence among men than women.

What an anti-climax for this series of posts, eh?

With all the talk on the Internetz about safe and unsafe starches, animal protein, wheat bellies, and whatnot! C’mon Ned, give me a break! What about insulin!? What about leucine deficiency … or iron overload!? What about choline!? What about something truly mysterious, related to an obscure or emerging biochemistry topic; a hormone du jour like leptin perhaps? Whatever, something cool!

Smoking and alcohol abuse!? These are way too obvious. This is NOT cool at all!

Well, reality is often less mysterious than we want to believe it is.

Let me focus on smoking from here on, since it is the top candidate for factor X, although much of the following applies to alcohol abuse and a combination of the two as well.

One gets different statistics on cigarette smoking in China depending on the time period studied, but one thing seems to be a common denominator in these statistics. Men tend to smoke in much, much higher numbers than women in China. And this is not a recent phenomenon.

For example, a study conducted in 1996 () states that “smoking continues to be prevalent among more men (63%) than women (3.8%)”, and notes that these results are very similar to those in 1984, around the time when the China Study II data was collected.

A 1995 study () reports similar percentages: “A total of 2279 males (67%) but only 72 females (2%) smoke”. Another study () notes that in 1976 “56% of the men and 12% of the women were ever-smokers”, which together with other results suggests that the gap increased significantly in the 1980s, with many more men than women smoking. And, most importantly, smoking industrial cigarettes.

So we are possibly talking about a gigantic difference here; the prevalence of industrial cigarette smoking among men may have been over 30 times the prevalence among women in the China Study II dataset.
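To make the size of that gap concrete, here is a minimal sketch that divides the male prevalence by the female prevalence for each of the studies quoted above. The percentages come straight from the quotes; the dictionary labels and the function name are just illustrative choices.

```python
# Smoking prevalence figures quoted in the post (percent of each sex).
surveys = {
    "1996 study": {"men": 63.0, "women": 3.8},
    "1995 study": {"men": 67.0, "women": 2.0},
    "1976 ever-smokers": {"men": 56.0, "women": 12.0},
}


def prevalence_ratio(men_pct, women_pct):
    """How many times more prevalent smoking is among men than among women."""
    return men_pct / women_pct


ratios = {
    name: prevalence_ratio(pct["men"], pct["women"])
    for name, pct in surveys.items()
}

for name, ratio in ratios.items():
    print(f"{name}: men/women prevalence ratio = {ratio:.1f}")
```

With these figures the ratio is about 16.6 for the 1996 study, 33.5 for the 1995 study, and 4.7 for the 1976 ever-smoker numbers, so the "over 30 times" estimate corresponds to the 1995 percentages.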

Given the above, it is reasonable to conclude that the variable “SexM1F2” reflects very strongly the variable “Smoking”, related to industrial cigarette smoking, and in an inverse way. I did something that, grossly speaking, made the mysterious factor X explicit in the WarpPLS model discussed in my previous post. I replaced the variable “SexM1F2” in the model with the variable “Smoking” by using a reverse scale (i.e., 1 and 2, but reversing the codes used for “SexM1F2”). The results of the new WarpPLS analysis are shown on the graph below. This is of course far from ideal, but gives a better picture to readers of what is going on than sticking with the variable “SexM1F2”.
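The reverse coding described above can be sketched as follows. This is only an illustration: the formula `3 - code` is one simple way to flip a 1/2 scale (the post does not say how the recoding was done in practice), and the mortality values are made up, not taken from the dataset. The point is that the recoded variable carries exactly the same information, with every correlation flipping sign.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows coded as in "SexM1F2": 1 = male, 2 = female.
sex_m1f2 = [1, 2, 1, 2, 1, 2]
# Made-up mortality values, for illustration only.
mortality = [14.0, 9.0, 16.0, 10.0, 15.0, 11.0]

# Reverse the 1/2 scale: 1 becomes 2 and 2 becomes 1.
smoking = [3 - code for code in sex_m1f2]

r_sex = pearson(sex_m1f2, mortality)
r_smoking = pearson(smoking, mortality)

print(smoking)
print(round(r_sex, 3), round(r_smoking, 3))  # same magnitude, opposite sign
```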

With this revised model, the associations of smoking with mortality in the 35-69 and 70-79 age ranges are a lot stronger than those of animal protein and wheat flour consumption. The R-squared coefficients for mortality in both ranges are higher than 20 percent, which is a sign that this model has decent explanatory power. Animal protein and wheat flour consumption are still significantly associated with mortality, even after we control for smoking; animal protein seems protective and wheat flour detrimental. And smoking’s association with the amount of animal protein and wheat flour consumed is practically zero.

Replacing “SexM1F2” with “Smoking” would be particularly far from ideal if we were analyzing this data at the individual level. It could lead to some outlier-induced errors; for example, due to the possible existence of a minority of female chain smokers. But this variable replacement is not as harmful when we look at county-level data, as we are doing here.

In fact, this is as good and parsimonious a model of mortality based on the China Study II data as I’ve ever seen based on county level data.

Now, here is an interesting thing. Does the original China Study II analysis of univariate correlations show smoking as a major problem in terms of mortality? Not really.

The table below, from the China Study II report (), shows ALL of the statistically significant (P<0.05) univariate correlations with mortality in the 70-79 age range. I highlighted the only measure that is directly related to smoking; that is “dSMOKAGEm”, listed as “questionnaire AGE MALE SMOKERS STARTED SMOKING (years)”.

The high positive correlation with “dSMOKAGEm” does not even make a lot of sense, as one would expect a negative correlation here – i.e., the earlier in life folks start smoking, the higher should be the mortality. But this reverse-signed correlation may be due to smokers who get an early start dying in disproportionally high numbers before they reach age 70, and thus being captured by another age range mortality variable. The fact that other smoking-related variables are not showing up on the table above is likely due to distortions caused by inter-correlations, as well as measurement problems like the one just mentioned.

As one looks at these univariate correlations, most of them make sense, although several can be and probably are distorted by correlations with other variables, even unmeasured variables. And some unmeasured variables may turn out to be critical. Remember what I said in my previous post – the variable “SexM1F2” was introduced by me; it was not in the original dataset. “Smoking” is this variable, but reversed, to account for the fact that men are heavy smokers and women are not.

Univariate correlations are calculated without adjustments or control. To correct this problem one can adjust a variable based on other variables; as in “adjusting for age”. This is not such a good technique, in my opinion; it tends to be time-consuming to implement, and prone to errors. One can alternatively control for the effects of other variables; a better technique, employed in multivariate statistical analyses. This latter technique is the one employed in WarpPLS analyses ().
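The difference between a univariate association and one obtained while controlling for another variable can be illustrated with a small simulation. This is a sketch under loud assumptions: the data are synthetic, the variable names `x`, `z` and `y` are hypothetical, and the method is ordinary least squares with two predictors, not the WarpPLS algorithm. Here `z` is the real driver of `y`, while `x` merely tracks `z` and has no effect of its own.

```python
import random

random.seed(0)
n = 500

# Synthetic data: z drives y; x only tracks z and has no effect of its own.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [2.0 * zi + random.gauss(0, 1) for zi in z]


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


xc, zc, yc = centered(x), centered(z), centered(y)
sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
sxy, szy = dot(xc, yc), dot(zc, yc)

# Univariate: regress y on x alone.
slope_univariate = sxy / sxx

# Multivariate: regress y on x and z together (closed-form two-predictor OLS).
det = sxx * szz - sxz ** 2
slope_x_controlled = (szz * sxy - sxz * szy) / det
slope_z_controlled = (sxx * szy - sxz * sxy) / det

print(f"x alone:           {slope_univariate:.2f}")
print(f"x controlling z:   {slope_x_controlled:.2f}")
print(f"z controlling x:   {slope_z_controlled:.2f}")
```

Taken alone, `x` looks strongly associated with `y`; once `z` is in the model, the coefficient for `x` shrinks toward zero and the effect is assigned to `z`.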

Why don’t more smoking-related variables show up on the univariate correlations table above? The reason is that the table summarizes associations calculated based on data for both sexes. Since the women in the dataset smoked very little, including them in the analysis together with men lowers the strength of smoking-related associations, which would probably be much stronger if only men were included. It lowers the strength of the associations to the point that their P values become higher than 0.05, leading to their exclusion from tables like the one above. This is where the aggregation process that may lead to an ecological fallacy rears its ugly head.
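The dilution effect just described can be sketched with synthetic county-level numbers. Everything here is hypothetical (the number of counties, the smoking rates, the effect size, the noise levels); the only assumptions borrowed from the post are that men's smoking varies a lot across counties and affects male mortality, while women smoke very little.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(1)
n_counties = 200  # hypothetical, not the real number of counties

# Men: smoking rate varies widely by county and raises mortality.
male_smoking = [random.uniform(0.4, 0.9) for _ in range(n_counties)]
male_mortality = [10 + 20 * s + random.gauss(0, 2) for s in male_smoking]

# Women: almost no smoking; mortality varies for unrelated reasons.
female_smoking = [random.uniform(0.01, 0.06) for _ in range(n_counties)]
female_mortality = [8 + random.gauss(0, 4) for _ in range(n_counties)]

# Both sexes pooled: county averages over men and women.
pooled_smoking = [(m + f) / 2 for m, f in zip(male_smoking, female_smoking)]
pooled_mortality = [(m + f) / 2 for m, f in zip(male_mortality, female_mortality)]

r_men = pearson(male_smoking, male_mortality)
r_pooled = pearson(pooled_smoking, pooled_mortality)

print(f"men only:   r = {r_men:.2f}")
print(f"both sexes: r = {r_pooled:.2f}")
```

In this toy setup the pooled correlation comes out clearly weaker than the men-only one, because the women's data add variation in mortality that has nothing to do with smoking.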

No one can blame Dr. Campbell for not issuing warnings about smoking, even as they came mixed with warnings about animal food consumption (). The former warnings, about smoking, make a lot of sense based on the results of the analyses in this and the last two posts.

The latter warnings, about animal food consumption, seem increasingly ill-advised. Animal food consumption may actually be protective with regard to the factor X, as it seems to be protective in terms of wheat flour consumption ().


Anonymous said...

It's not about coolness Ned, it's about getting a very small mouse indeed. Nobody is interested in what has been long taken for granted. The only interesting piece of info in your analysis is that smoking mortality does not correlate with wheat consumption. So the alleged Asian cancer protecting factor, if one is there, which is far from certain, is certainly not rice but maybe lack of milk consumption. But Dr Campbell's database won't provide any counterfactual answer, as milk consumption has become a habit in the coastal regions too recently to see any impact yet.

Anonymous said...



There is a bit of evidence for meat being protective against lung cancer. Harsh cooking techniques seemed to make things worse, but boiling was inversely associated with lung cancer. At least I think so. What say shirtless statistical badass?

Nick said...

Ned, wouldn't the woman's health study, or many of the other long term studies mimic the results of the China Study with regard to smokers vs. non-smokers in terms of mortality? I would think that if gender didn't matter, but a factor x did, mortality would show up in the same ratios in a study that included only one gender, or both. I'm sure you thought about that, so I'm wondering what your conclusions were?

Ned Kock said...

Anon, this is not a speck of sand on the Moon’s surface that we can see only with an incredibly powerful telescope. This is more like a crater that we can see with our naked eyes, but that is on the dark side.

Actually, the best analogy here is that of a planet that is oddly changing orbit because of something that we cannot see.

Ned Kock said...

Stabby, SSB says “me no work miracles”.

Ned Kock said...

By the way, are you the same Stabby of Denise’s blog?

Ned Kock said...

Hi Nick. I am not sure I understand the question.

Ned Kock said...

Hi Nick. Maybe this is what you are getting at, so let me address this here.

An effect only shows up on stats if there is enough variation in the variables involved. Otherwise it remains hidden from view. See my post on the chain smokers “study”:

Also, if the variation happens in concert with another variable, stats may give the impression that the effect is caused by a variable that has nothing to do with it.

In the China Study II data, we happen to have a lot of variation in smoking associated with gender. Still, a problem remains, because it is possible that alcohol abuse and stress confound these results.

Ned Kock said...

It is useful to keep in mind that cigarette smoking is likely different from the more traditional forms of smoking (not that I think they are good), and that lung cancer is not the only effect of smoking.

David Isaak said...

Well done!

"But this reverse-signed correlation may be due to smokers who get an early start dying in disproportionally high numbers before they reach age 70, and thus being captured by another age range mortality variable."

Yes, indeed. People messing with stats on smoking often miss the fact that there are two ways to quit in midlife, and one of them is to die--thus removing yourself from any future statistics.

If they stay alive, health risks of being a former smoker generally converge to those of people who have never smoked over about a dozen years after quitting.

I suspect that one additional risk former smokers incur, however, is being the victim of homicide. Many former smokers are really annoying.

Anonymous said...

Ned: Yes the handful of times I have commented on her blog it was with this handle.

Alex J said...

Hi Ned,

I don't remember much stats... is it possible to see whether pufa consumption and smoking combined have a stronger association with mortality than we would expect by looking at those variables individually?

Pascal said...

Hi Ned,
Your blog is very interesting largely because of its numbers-based approach. You have mentioned how we can look at numbers and use them to optimize health and our chances of acquiring diseases.

For conditions like heart disease and diabetes there are a lot of numbers that we can use to guide our preventative efforts.

For Heart Disease prevention we can look at Calcium Scores, C-Reactive protein, Lp(a), HDL etc.
For Diabetes prevention we can look at HgbA1c, Triglycerides, Fasting Blood Glucose etc.

However, for cancer most lab values appear to be for people who already have cancer. Are you familiar with any lab values that can be used to guide our efforts in cancer prevention? HgbA1c is probably weakly correlated but I was looking for variables that correlate more strongly (like Calcium Score for heart disease or HgbA1c for diabetes).

Ned Kock said...

David, the homicide angle suggests a major confounder.

You’re funny.

Ned Kock said...

So you are Stabby the raccoon, right? As in the link below:

The Stabby of Denise’s blog seemed to be very interested in using gray matter.

Why not here as well?

I am referring to the “Zzzzzzz ...”

Ned Kock said...

Hi Alex. Yes, it is possible. That could be tested through a moderating effect analysis, like here:

Ned Kock said...

Hi Pascal. I am not a cancer researcher or oncologist, but I know a few oncologists, including one that is a very close friend. Cancer screening tests in general tend to give a lot of false positives.

We know that certain things increase our chances of developing cancer. Cigarette smoking and alcohol abuse are among them; so is obesity:

Cancer cells consume a lot of glucose, so it seems reasonable to me to believe that pushing your body in the direction of becoming more adapted to fat burning, as opposed to glucose burning, is a good thing. That can be achieved by removing refined carb-rich foods from one’s diet; these have high glycemic indices AND loads:

With the current screening tests, which are not very good, I think we are better off removing the risk factors that we can.

David Isaak said...

"Cancer cells consume a lot of glucose, so it seems reasonable to me to believe that pushing your body in the direction of becoming more adapted to fat burning, as opposed to glucose burning, is a good thing."

Indeed. Not only are ketogenic diets effective in slowing advanced cancers, but when they do PET scans for tumors they use fluorine-18-labeled fludeoxyglucose--a substituted glucose. Cancer cells suck up glucose at rates that tend to be proportional to their rate of growth, and at rates far above other tissues in the resting state. Carbs are to cancer as gasoline is to a bonfire.

Alex J said...

Hi Ned,
Thanks for the response. Glad I asked, just checked my old econometrics textbook, and there is no mention of moderating effect analysis. Is that an analysis that you would be interested in running?

The reason I'm curious is because of accounts of hunter gatherers who smoke heavily with seemingly no ill-effects. If lung cancer is multifactorial, pufa consumption seems like a good candidate for the second variable.

M. said...

Hi Ned and David,

With regard to cancer and glucose, what about the idea that people eating more carbs will have lower fasting blood glucose? In Jaminet’s latest “safe starch” post he mentions Peter and other low-carbers' fasting blood glucose starting out in the 80s and over years of low-carbing creeping up over 100.

Looking at just the glucose-cancer issue, would not someone be in better shape if their fasting blood glucose was 80 instead of 110?

It would seem that someone used to eating carbs and insulin-sensitive would "clear" post-prandial glucose just as effectively as a low-carber would even though the dietary dose may be higher, but a low-carber would have larger net glucose supply available to cells because of the higher fasting glucose levels.

M. said...

PS – re cancer and glucose - It would seem like you would want to maintain good insulin sensitivity and low fasting blood glucose (70-80) prior to getting cancer, THEN if cancer hits go ketogenic.

Ned Kock said...

Hi Alex. Sure, all I need is the data and time.

Ned Kock said...

Hi M. The higher FBG in LC is due to peripheral insulin resistance. Essentially various tissues such as muscle (not liver cells) are in glucose rejection mode. That is probably a good thing.

It is elevated postprandial BG that causes the most problems:

HyperPetro’s FBG may be (or have been) high by SAD standards, but his A1C is lower than 5 percent. Those two results are contradictory in folks eating the SAD. In LC, they have a different meaning; they are probably good news:

David Isaak said...

Jaminet is also recommending that cancer patients should consume at least 400-600 calories per day of carbs.(!)

I think Jaminet is an interesting guy, but frankly a lot of his ideas seem to be entirely off the top of his head. They often look to me a bit like the "eating fat makes you fat" logic of the 1970s nutritionists--plausible, apparently in tune with common sense, but not necessarily a description of how life actually works.

M. said...


It is funny you put the exclamation point there. That is less carbs than most oncologists would recommend, and Jaminet’s wife and co-author is a cancer researcher. It is a pretty complicated issue.

David Isaak said...

Most oncologists don't even think about diet. Although I'm not claiming that Dr. David Servan-Schreiber takes this position on carbs, one thing that he emphasizes in his book "Anticancer" and in interviews--after being diagnosed with cancer--is that oncologists have virtually no training in nutrition and most of them have never given it a moment's thought. According to Servan-Schreiber, his doctors told him to eat whatever he likes as it wouldn't make any difference.

Yes, it's complicated. Most of the handful of people doing real research on control of cancer through macronutrients (Thomas Seyfried, for example) are working with strongly ketogenic diets and achieving some success. 100-150 grams doesn't make the grade for most people in that regard, so it surprises me to see Jaminet making such specific medical recommendations based on sheer speculation.

Hence, "(!)".

Anonymous said...

For the record, Shou-Ching Jaminet is a molecular biologist, not a doctor. She does cancer research at Beth Israel Deaconess Medical Center and Harvard Medical School, and is the Director of BIDMC’s Multi-Gene Transcriptional Profiling Core.

And to state that Paul Jaminet's ideas seem "off the top of his head" has to be one of the more ridiculous statements I've ever heard. His research into nutrition was PERSONAL...he used it to heal himself. And he is ALL about the science. He backs up everything he says with research.

Just had to say that. I trust him more than any other health blogger in the blogosphere.