Abstract

The possibility of hormesis in individual dose-response relations undermines traditional epidemiological criteria and tests for causal relations between exposure and response variables. Nonmonotonic exposure-response relations in a large population may lack aggregate consistency, strength, biological gradient, and other hallmarks of traditional causal relations. For example, a U-shaped or inverted-U-shaped (n-shaped) curve may exhibit zero correlation between dose and response. Thus, possible hormesis requires new ways to detect potentially causal exposure-response relations. This paper introduces information-theoretic criteria for identifying potential causality in epidemiological data that may contain nonmonotonic or threshold dose-response nonlinearities. Roughly, exposure variable X is a potential cause of response variable Y if and only if: (a) INFORMATIVE: X is informative about Y, i.e., the mutual information between X and Y, I(X; Y), measured in bits, is positive (this provides the required generalization of statistical association measures for monotonic relations); (b) UNCONFOUNDED: X provides information about Y that cannot be removed by conditioning on other variables; (c) PREDICTIVE: past values of X are informative about future values of Y, even after conditioning on past values of Y; and (d) CAUSAL ORDERING: Y is conditionally independent of the parents of X (its direct causes), given X. These criteria yield practical algorithms for detecting potential causation in cohort, case-control, and time series data sets. We illustrate them by identifying potential causes of campylobacteriosis, a foodborne bacterial infectious diarrheal illness, in a recent case-control data set. In contrast to previous analyses, our information-theoretic approach identifies a hitherto unnoticed, highly statistically significant, hormetic (U-shaped) relation between recent fast-food consumption and women's risk of campylobacteriosis.
We also discuss how the new information-theoretic criteria can resolve ambiguities and apparent contradictions arising from confounding and from redundant or overlapping information among variables in epidemiological data sets.
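
The quantities behind criteria (a) through (d) can be illustrated with a minimal sketch. It assumes discrete (or pre-binned) variables and simple plug-in entropy estimates from observed frequencies, because the abstract does not state which estimators or significance tests the paper uses. The function names (`entropy`, `mutual_information`, `conditional_mutual_information`, `predictive_information`) and all toy data are hypothetical illustrations. The sketch reproduces the zero-correlation point for a U-shaped curve, shows an association that disappears after conditioning on a confounder, and computes the lagged conditional mutual information behind the PREDICTIVE criterion. Criterion (d) could be checked with the same `conditional_mutual_information` call, by testing whether I(Y; Pa(X) | X) is near zero.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Plug-in (empirical) joint entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Criterion (a): I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z), in bits."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def predictive_information(x, y, lag=1):
    """Criterion (c) (hypothetical helper): I(X_{t-lag}; Y_t | Y_{t-lag}) for aligned series."""
    return conditional_mutual_information(x[:-lag], y[lag:], y[:-lag])


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison with mutual information."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Toy U-shaped (hormetic) relation: response depends on dose only through dose**2.
dose = [-2, -1, 0, 1, 2] * 20
response = [d * d for d in dose]
print(pearson(dose, response))             # 0.0: no linear association
print(mutual_information(dose, response))  # about 1.52 bits: X is informative about Y

# Toy confounding: Z drives both X and Y; within each stratum of Z, X and Y
# are independent by construction (full cross product of their values).
rows = [(0, a, b) for a, b in product([0, 0, 0, 1], repeat=2)]
rows += [(1, a, b) for a, b in product([1, 1, 1, 0], repeat=2)]
z, x, y = (list(col) for col in zip(*rows))
print(mutual_information(x, y))                 # positive: marginal association
print(conditional_mutual_information(x, y, z))  # about 0: removed by conditioning on Z

# Toy predictive relation: Y copies X with a one-step delay.
xs = [0, 1, 1, 0, 1, 0, 0, 1] * 50
ys = [0] + xs[:-1]
print(predictive_information(xs, ys))  # positive: past X informs Y beyond past Y
```

Plug-in estimates like these are biased upward in small samples, so a positive value alone is not evidence. In practice it would need to be compared against a null distribution, for example one obtained by permuting X, before it could support any of the criteria.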
