Hi Chris.<br /><br />Since you are such a nice guy I am almost agreeing with the "holding the others constant" metaphor to make you happy :)<br /><br />I see the problem of adding a variable like TC to a model like the one discussed in this post, and also see the problems associated with leaving the variable out. I think the latter outweigh the former.<br /><br />I am planning on writing a couple of posts on stats. One post showing that one should control for an effect indirectly (like with TC) if direct control is not possible, and the other post showing that a strong association may exist even with a zero correlation.
Ned Kock (https://www.blogger.com/profile/02755560885749335053)

2010-08-21T07:31:37.792-07:00
This comment has been removed by the author.
Ned Kock

2010-08-21T06:09:34.690-07:00
Hi Ned,<br /><br />Sorry for taking so long -- got delayed with comprehensive exams.<br /><br />So do you agree, then, that the coefficient from a multiple regression model is giving the effect of one variable "holding the others constant" in a sense?<br /><br />And do you see the potential problem of including an intermediate in the model as if it is a confounder?<br /><br />I agree a book of that sort would have good potential -- it would also be a massive undertaking, since the China Study data itself is only one chapter of Campbell's book.<br /><br />Chris
Chris Masterjohn (https://www.blogger.com/profile/09922003080748568167)

2010-08-18T06:34:26.162-07:00
Yes, analyzing the online data is a good option. I assume you are referring to this dataset:<br /><br />http://www.ctsu.ox.ac.uk/~china/monograph/<br /><br />With the dataset above, one could use data from both genders, and then control for the gender effect. This would double the size of the sample.
Ned Kock

2010-08-17T18:52:20.601-07:00
why not just use the online data? it is accessible to all, whereas the data denise is working from is in the 1990 printed monograph that few would have access to. it'd save LOTS of data entry time. and things seem to be "cleaner" in the online data (eg. certain dietary variables for tuoli are missing - likely because they were deemed unrepresentative of the typical dietary habits in that county). 
just a thought for anyone embarking on a huge data entry endeavor.
healthycritique

2010-08-17T18:46:42.368-07:00
hi ned, i continue to find your posts to be interesting and professionally presented. a few thoughts...<br /><br />1. one thing i find challenging in all of this is that we're talking about colorectal cancer *mortality* rather than incidence. and the two are not necessarily directly related; there are other factors related to mortality that do not play a role in incidence (eg. treatment). this of course, applies to anyone undertaking an analysis of the china survey data. therefore, i find it difficult to draw conclusions approaching causality because we're not talking about risk of developing the disease.<br /><br />2. the mortality rates are for 1975-1977 whereas the exposures were collected in 1983, so there is an even greater issue of temporality than we might normally have in a cross-sectional survey.<br /><br />3. it is possible - especially since we're using mortality as the outcome - that the u-shaped curve actually reflects some other variable (eg. BMI) that might be subject to some reverse causality.<br /><br />4. have you considered redoing your analyses using the online data? there are some differences between what denise has posted and what is online (denise was forthright about this). it would also be interesting to see the results for 1989 data, as the chinese population may have started to acquire more 'western' dietary patterns by then.<br /><br />anyway, nice reading your post!
healthycritique

2010-08-01T07:20:36.844-07:00
Hi Anon.<br /><br />O Primitivo from Canibais e Reis (link below) is putting together several files with the China Study data.<br /><br />http://www.canibaisereis.com/<br /><br />With those files anyone interested will be able to play with models ad nauseam.<br /><br />I have been looking at the bivariate (aka univariate) correlations, and they are very interesting. A lot of confounding variables.<br /><br />Btw, I deleted O Primitivo's comment above at his request. He had included his email and was concerned about spamming.
Ned Kock

2010-07-31T14:31:29.985-07:00
Just for the heck of it, I'd like to see a model that incorporates the rest of the macronutrients and dietary fiber as well. Would that be possible?
Anonymous

2010-07-31T10:22:26.493-07:00
Hi Chris.<br /><br />I should note that the path coefficients generated by WarpPLS should be identical to the standardized betas generated by an MR analysis, as long as you are using one of WarpPLS's two linear algorithms. The unstandardized coefficients can be easily obtained from the standardized ones.<br /><br />Actually, the equation you mentioned would become something like this:<br /><br />cancer' = (-0.2X + 0.5Z) + 0.3(P)<br /><br />Where cancer' would represent cancer within that subset of the population, and P would represent plant protein consumption also within that subset.<br /><br />I wonder if O Primitivo was referring to the China Study I data when he said he had the data in Excel. What do you think?<br /><br />Given the impact of the book, it is reasonable to expect that a book with a new set of analyses, showing different things, would have a lot of potential. Particularly if what it says goes against what the original book says.
Ned Kock

2010-07-31T09:10:18.844-07:00
Hi Ned,<br /><br />I understand that path coefficients are standardized regression coefficients and I understand the concept of standardization, and I understand that the Y intercept of a multiple regression model disappears during standardization. However, I don't think that changes the metaphor used.<br /><br />First, because I was addressing your comment that the metaphor was unjustified in general, and in general, multiple regression is quite frequently used without standardizing the coefficients.<br /><br />Second, conceptually, the Y intercept appears again when one performs the mental experiment of "holding the variables constant."<br /><br />My analogy could be easily modified:<br /><br />cancer = -0.2(animal protein) + 0.3(plant protein) + 0.5(cholesterol)<br /><br />For a subset of the population with a given intake of animal protein X and a given cholesterol level Z, the regression equation simplifies to:<br /><br />cancer = (-0.2X + 0.5Z) + 0.3(plant protein)<br /><br />where the first term is a constant (Y intercept) and the second represents the relationship between plant protein and cancer "holding the other variables constant."<br /><br />However if this still doesn't represent WarpPLS accurately, please correct me.<br /><br />I agree that mathematical observations can be used to generate hypotheses just like any other type of observation, and I think that playing with data in multiple ways can be useful for brainstorming hypotheses. I think where nutrition differs from Einstein's work is that 1) experimentation is readily performable and 2) an enormous mass of experimental evidence is readily available. 
Thus, mathematical and other observations must always be integrated into the theoretical framework that has as its basis experimental research, which provides definitive evidence of cause-and-effect relationships.<br /><br />I'm not saying that one should NOT include cholesterol in the model to see if it yields mathematical results that are useful for brainstorming hypotheses. I'm simply saying that one must acknowledge what can be discerned from the currently accumulated body of experimental research and make a well-developed argument from a cause-and-effect theory based on that body of research to justify the inclusion or exclusion of variables in the model.<br /><br />In many cases, there might be a compelling reason to argue both sides and it might be best to present several models, and that might provide a stimulus for further experimentation to clarify the ambiguities demonstrated by the alternative models.<br /><br />The China Study I data is only available in print form as a giant monograph, and you would have to interlibrary loan it and spend hundreds of hours (at least dozens, I would imagine) typing it into Excel. I do have a copy of the monograph but have not performed this task yet.<br /><br />Chris
Chris Masterjohn

2010-07-31T08:15:15.720-07:00
Hi Chris.<br /><br />Before you calculate standardized partial regression coefficients (path coefficients), normally you standardize the variables in a model. (This also helps reduce other problems, because it makes the variables dimensionless.) Once you do that, the constant (intercept) disappears.<br /><br />Path analysis deals with standardized variables. Classic multiple regression deals with non-standardized variables. The basis for structural equation modeling, and for the math used in software like WarpPLS, is actually path analysis. This is one of the reasons why (among other things) I told you earlier that what WarpPLS does is not exactly multiple regression.<br /><br />Of course you can always look at a plot with standardized values like the one on this post, and revert back to the non-standardized values, by multiplying them by the standard deviation and adding the mean of the variable they refer to.<br /><br />I am not saying that only mathematical considerations matter. What I am saying is that a mathematical understanding is key, and can actually lead to new discoveries.<br /><br />One example, which I don't have a reference for now, is that of Einstein's finding a result through math that contradicted his understanding of physics (which was way more sophisticated than the average person's).<br /><br />He then modified the math (introducing a term; a constant, if I recall it properly) to fit his understanding. Einstein later deeply regretted it. The math was telling him something new, and apparently true, even though it could not easily be understood in non-math language (e.g., in terms of metaphors).<br /><br />But of course we can always agree to disagree. 
You may well be right.<br /><br />Do you have a link for the China Study I data?
Ned Kock

2010-07-31T06:25:46.776-07:00
Hi Ned,<br /><br />The mathematical basis for the metaphor that a multiple regression model shows the effect of one variable while "keeping the others constant" is the fact that if the others are held constant, their terms collapse into the Y intercept.<br /><br />In other words if, to simplify this greatly,<br /><br />cancer = 10 - 2(animal protein) + 3(plant protein) + 7(cholesterol)<br /><br />Then for all subjects with 4.1 mmol/L cholesterol the equation would simplify to<br /><br />cancer = 38.7 - 2(animal protein) + 3(plant protein)<br /><br />since 10 + 7(4.1) = 38.7. So the regression coefficients are showing the relationship of the variable "while the others are held constant." When the others are not held constant (which, at the level of the combined sample or the population being estimated, is always), the relationship is much more complex as indicated by the inclusion of the other variables in the model.<br /><br />That's the reasoning put forward in an article I have by Garrett Fitzmaurice from Harvard's Department of Biostatistics.<br /><br />Your understanding of statistics is much more sophisticated and advanced than mine is, but if you believe that the only consideration for inclusion of a variable into the model is the mathematics of the equation then I think this is a philosophical difference on which we will probably just have to agree to disagree. I think the effect of this approach would be to generate mathematically convincing but biologically irrelevant regression models.<br /><br />Thanks for the correction about the VIF. I read the wiki article on it and it seems VIF diagnoses multicollinearity but the source of multicollinearity is still correlation between the independent variables, no?<br /><br />I learned from both reading and in stats class that using C rather than C-1 variables when dummy coding distorts the equation. 
I don't have the statistical expertise to decipher which is correct.<br /><br />The data from the original monograph, which Denise has been using and Campbell used in the book, divides between male and female. The data from the link you posted is, I believe, actually data from "The China Study II" so, while useful, the results will be different from analyses using the original data set.<br /><br />Chris
Chris Masterjohn
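
A minimal sketch of the "holding the others constant" arithmetic from the comment above, in Python. It uses the comment's illustrative coefficients (10, -2, 3, 7), which are made up for the argument and are not estimates from the China Study data. The function names `predict` and `fold_constant` are hypothetical, chosen only for this illustration. With cholesterol fixed at 4.1 mmol/L, the cholesterol term contributes 7 × 4.1 = 28.7, so the folded intercept is 10 + 28.7 = 38.7 and the remaining slopes do not change.

```python
# Illustrative coefficients from the comment's simplified example.
# Assumption: these are NOT estimates from the China Study data.
INTERCEPT = 10.0
B_ANIMAL = -2.0
B_PLANT = 3.0
B_CHOL = 7.0


def predict(animal, plant, chol):
    """Full model: cancer = 10 - 2*animal + 3*plant + 7*chol."""
    return INTERCEPT + B_ANIMAL * animal + B_PLANT * plant + B_CHOL * chol


def fold_constant(chol):
    """Hold cholesterol fixed and absorb its term into the intercept.

    Returns the new intercept and a reduced model in the two
    remaining variables.
    """
    new_intercept = INTERCEPT + B_CHOL * chol

    def reduced(animal, plant):
        return new_intercept + B_ANIMAL * animal + B_PLANT * plant

    return new_intercept, reduced


new_intercept, reduced = fold_constant(4.1)
print(round(new_intercept, 1))  # 38.7

# The reduced model agrees with the full model at chol = 4.1,
# and the plant protein slope is still 3.
print(abs(reduced(5.0, 2.0) - predict(5.0, 2.0, 4.1)) < 1e-9)  # True
```

The same folding applies to the standardized version discussed elsewhere in the thread: fixing animal protein at X and cholesterol at Z turns -0.2X + 0.5Z into a constant, leaving 0.3 as the plant protein slope.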