Quantifying and estimating additive measures of interaction from case-control data

In this paper we develop a general framework for quantifying how binary risk factors jointly influence a binary outcome. Our key result is an additive expansion of odds ratios as a sum of marginal effects and interaction terms of varying order. These odds ratio expansions are used for estimating the excess odds ratio, attributable proportion and synergy index for a case-control dataset by means of maximum likelihood from a logistic regression model. The confidence intervals associated with these estimates of joint effects and interaction of risk factors rely on the delta method. Our methodology is illustrated with a large Nordic meta dataset for multiple sclerosis. It combines four studies, with a total of 6265 cases and 8401 controls. It has three risk factors (smoking and two genetic factors) and a number of other confounding variables.


Introduction
Many complex diseases are influenced by a number of risk factors that interact in a complicated way. This is often quantified by means of a regression model with affection status of a given disease as binary response, whereas the risk factors and possibly some other variables are chosen as covariates. Logistic regression models have often been used to quantify main effects and strength of interaction among the risk factors with regards to disease. There are several reasons for this. The logistic transformation is first of all the canonical link of a generalized linear model with a binomially distributed response, and the parameters of this model have a straightforward multiplicative odds interpretation [20]. A second reason is that many epidemiological datasets are collected retrospectively based on outcomes rather than on covariates. In particular, it is well known that many parameters of the logistic regression model can be estimated consistently for case-control studies under suitable sampling assumptions on the cases and controls [24]. There are also additive models of joint effects and interaction. They have recently gained in popularity, since they are believed to approximate a biological system with causal mechanisms more accurately than multiplicative odds [30,26]. The additive measures of interaction are functions of relative risks between individuals with or without exposure to the risk factors, such as the relative excess of risk due to interaction (RERI), the attributable proportion (AP) due to interaction or the synergy index (SI) [12,28]. Although these relative risks cannot be estimated consistently for case-control data, they are well approximated by estimable odds ratios when the disease risk is small [15,4,3]. When relative risks are replaced by odds ratios in the expressions for RERI, AP and SI, they correspond to additive odds models of interaction.
In previous work [16] we developed a unified theory for quantifying and estimating different measures of marginal effects, joint effects and interaction among risk factors on a multiplicative, additive, additive odds or some other scale. We also described how to estimate and produce confidence intervals for these quantities from a prospective study, with data sampled based on their covariates, or from a case-control study. Traditional definitions of attributable proportion have an unbounded negative range [18,29,19]. In [16], we introduced a novel normalization of AP that guarantees a range between −1 and 1, with negative or positive values depending on whether there is synergism or antagonism between the risk factors.
In this paper we concentrate on case-control data and additive odds measures of main effects, joint effects and interaction. Our main result is to express the odds ratio of the risk factors as a sum of terms, which include their main effects and different orders of interaction, when the effect of other confounding covariates is controlled for. In this way we extend and unify some previously used measures of interaction [23,17,16]. In order to find confidence intervals (CIs) for the attributable proportion, synergy index and excess odds ratio due to joint effects and interaction from case-control data, we first estimate the parameters of a logistic regression model by maximum likelihood. Then we use the delta method [27,6] for an appropriate transformation of these measures of joint effects and interaction in order to find their standard errors and CIs on the new scale, before transforming back to the original odds scale.
The paper is organized as follows: In Section 2 we define a logistic regression model that includes marginal and interaction effects for the risk factors of interest, as well as marginal effects of other confounding covariates. The additive odds measures of joint effects and interaction are defined in Section 3, and the procedures for estimating them are given in Section 4. Our methodology is illustrated in Section 5 for a multiple sclerosis (MS) dataset from Hedström et al. [14], and a concluding discussion appears in Section 6.

A logistic regression model
Let Y ∈ {0, 1} be a binary outcome variable, with Y = 1 for individuals that carry a certain disease, and Y = 0 for those that do not. Consider a large population, and let be the disease probability of a randomly chosen individual. It is assumed to be a function of p + q covariates x = (v, z) = (v 1 , . . . , v p , z 1 , . . . , z q ), of which the first p are binary risk factors v 1 , . . . , v p of main interest, with v j = 1 indicating presence and v j = 0 absence of each such factor j. The other q covariates z 1 , . . . , z q are not necessarily binary, and they are included in the model as possible confounders. We will parametrize the disease probability on a logit scale with w = (w 1 , . . . , w p ) a binary vector and 0 = (0, . . . , 0) a vector with zero components. Inequalities between vectors are interpreted componentwise, so that the first sum in (2) is taken over all nonzero vectors w such that w j ≤ v j , with w j = 1 for at least one factor j. This sum equals 0 when v = 0. The logistic model in (2) is saturated for the binary risk factors, including all orders of their interaction, whereas it is linear in the other covariates [1,2]. In particular, if v j = 1 and all other components of v are zero, ψ v quantifies the marginal effect of factor j in absence of the others, whereas ψ v for v such that |v| = j v j ≥ 2 quantifies interaction of order |v| among those factors j for which v j = 1.
It is assumed that κ = (κ 0 , κ 1 , . . . , κ q ) are nuisance parameters, whereas are the 2 p − 1 structural parameters of main interest that quantify marginal effects and interaction among the p risk factors, with 1 = (1, . . . , 1). Since they are defined on a logit scale, it is possible to estimate them from suitable case-control datasets. In the next section we will develop alternative measures of marginal effects, joint effects and interaction that are functions of ψ, but expressed in terms of odds ratios.

Measures of joint effects and interaction
Our purpose is to estimate and produce confidence intervals for parameters that quantify marginal effects, joint effects or interaction between a subset J ⊂ {1, . . . , p} of factors, when the levels of the remaining factors in K = {1, . . . , p} \ J, and the q other confounder variables are controlled for. It is assumed that (4) only involves the structural parameters ψ, not the nuisance parameters κ. The reason is that we consider parameters (4) that are functions of odds ratios i.e. ratios between the disease odds of two subjects with identical confounding variables z but different risk exposures v and 0 respectively, for some or all of the 2 p − 1 vectors 0 < v ≤ 1. In particular, when the disease risks in (1) are small, the odds ratio in (5) approximates the relative risk well.
After possible reordering of factors we may assume without loss of generality that those in J come first, so that v = (v J , v K ), with v J = (v j ; j ∈ J) and v K = (v j ; j ∈ K) being the exposure levels of the factors in J and K respectively. We will consider linear combinations of the odds ratios (5), and the following concept will be central for the rest of the paper: Definition 1. Suppose the exposure levels v K of the confounding risk factors are fixed. For any v J we introduce the additive increment of the odds ratio of order v J , where summation on the right-hand side of (7) is over all binary vectors w = (w j ; j ∈ J) of length |J| whose coordinates do not exceed those of v J .
When |v J | = 0, (7) is the odds ratio when none of the risk variables in J are turned on, and when |v J | = 1 it is the marginal odds ratio increment of one single factor. For |v J | ≥ 2 we interpret (7) as a measure of additive odds interaction among those factors {j ∈ J; v j = 1} in J that are at risk, when all interaction terms of lower order among these factors have been removed. More explicitly, the terms of order 0, 1, 2 and 3 in (7) have the form It is possible to give another interpretation of (7). Let w = (w j ; j ∈ J) be a vector with 0 ≤ w j ≤ 1, and extend the domain of the first |J| components of the odds ratio OR (w,vK ) Having defined the odds ratio increment in (7), we are ready to state the main result. It tells that the odds ratio for exposure v J is a sum of the odds ratio increments. This includes the baseline odds ratio when none of the factors in J are turned on, the additive marginal odds ratio increments for those |v J | factors in J that are exposed, and the odds ratio interactions of order 2, 3, . . . , |v J | among these factors (see Appendix A for a proof): for all v J , with ∆ (w,vK ) OR as defined in (7). Conversely, if (9) holds and ∆ (w,vK ) OR is a linear combination of {OR (u,vK ) ; 0 ≤ u ≤ w} for each 0 ≤ w ≤ v J , then necessarily ∆ (w,vK ) OR must satisfy (7).
The expansions in (7) and (9) define a combinatorial inclusion-exclusion principle. In order to see this, we associate to each j ∈ J a set A j and its complement A c j .
It is of interest to know how much of the odds ratio (5) can be explained by marginal effects and lower order interaction among the factors in J. For this reason we introduce the following concept: of the odds ratio (5) is obtained by including only terms up to order i in (9), for some It is possible to rewrite the predicted odds ratio (10) as a linear combination of lower order odds ratios {OR (w,vK ) ; 0 ≤ w ≤ v J , |w| ≤ i} for exposure vectors w that have at most i factors in J at risk (see Appendix B for a proof): Proposition 1. The prediction of the odds ratio OR (vJ ,vK ) in (10), based on marginal effects among the factors in J at risk and their interaction terms up to order i, satisfies if i = |v j |.
Assume from now on that v J = 1, so that all factors in J are at risk. We will quantify how much of the odds ratio OR (1,vK ) is left unexplained by marginal effects and lower orders of interaction among the factors in J. To this end, we introduce the following concept: Definition 3. The unadjusted excess odds ratio is the difference between the odds ratio (5) and a prediction (10) of it due to terms of order less than i, where 1 ≤ i ≤ |J|.
The unadjusted excess odds ratio can be interpreted as an unstandardized residual of a regression model, where only marginal effects and interaction terms up to order i − 1 are included as independent variables in order to predict the odds ratio (5). Equivalently, it quantifies the contribution of the odds ratio expansion (9) from terms of order at least i. The special case i = |J| is treated by Katsoulis and Bamia [17] when K = ∅. The definition of the unadjusted odds ratio then simplifies to ∆ (1,vK ) OR.
We will define three measures (4) of marginal, joint or interaction effects among the factors in J, and all of them are functions of UnadjEOR.

Definition 4. The excess odds ratio
expresses (13) in units of the odds ratio when no factor in J is at risk, but those in K are kept fixed at level v K . It has a range of (−∞, ∞), with EOR = 0 indicating absence of effect.
Definition 5. The quantity is the attributable proportion of the odds ratio due to terms (7) of order at least i. It uses the same denominator as in Hössjer et al. [16] in order to assure −1 ≤ AP ≤ 1, with AP = 0 indicating absence of effect.
Definition 6. The synergy index is only defined in order to quantify interaction (2 ≤ i ≤ |J|), since otherwise the denominator of (16) vanishes. It is also required that the joint and lower order effects of the factors in J are positive (OR (1,vK ) > OR (0,vK ) and OR (1,vK ),i−1 > OR (0,vK ) ) in order for SI to be meaningful. Then its range is (0, ∞), with SI = 1 indicating absence of effect.
All three quantities in equations (14)- (16) are stratified for the risk factors in K, like confounders that are controlled at their observed levels. In contrast, there is only partial (additive) control for the remaining q covariates of z. EOR and AP can be viewed as different types of standardized residuals of a regression model where marginal effects and lower order interactions among the factors in J are used to predict the odds ratio (5). The inverse synergy index SI −1 is the analogue of the coefficient of determination. It quantifies how large fraction of the odds ratio increment above the baseline level OR (0,vK ) that is explained by the regression model. Some special cases of formulas (14)- (16) are of particular interest. When i = |J| = 1, there is one single risk factor (J = {1}). Equation (14) then quantifies the excess odds ratio due the marginal effect (EORM) of factor 1, when those in K are controlled for at level v K . Similarly, equation (15) defines the attributable proportion of risk due to the marginal effect of factor 1, whereas the synergy index is not well defined.
When i = 1 and |J| ≥ 2, we refer to the quantity in (14) as EORJ, the excess odds ratio due to joint (marginal and interaction) effects of all factors in J when those in K are controlled at level v K . In the same way, AP is the attributable proportion due to joint effects among the factors in J, whereas the synergy index is undefined.
When i = 2 we refer to the quantity in (14) as the total excess odds ratio due to all levels of interaction (TotEORI) among the factors in J, when those in K are controlled at level v K . AP and SI are similarly referred to as the attributable proportion and synergy index due to all levels of interaction among the factors in J.
Finally, when i = |J| we refer to the quantity in (14) as the excess odds ratio due to the highest order |J| of interaction (EORI) among the factors in J, when those in K are controlled at level v K . Analogously, equations (15)-(16) quantify the attributable proportion and the synergy index due to the highest order |J| of interaction among the factors in J.
Since EOR, AP and SI are all functions of UnadjEOR, it suffices to specify the latter. In the subsections to follow we will do so for models with 1, 2, or 3 risk factors in J.

Two risk factors
When |J| = 2 there are only two possible unadjusted excess odds ratios obtained by inserting (10) into (13). The first unadjusted excess odds ratio is due to the joint (marginal and interaction) effect of factors 1 and 2, whereas the second only includes interaction between these two factors.

Three risk factors
When |J| = 3 there are three possible unadjusted excess odds ratios all of them derived by inserting (10) into (13). The first unadjusted excess odds ratio is caused by the joint (marginal and interaction) effect of factors 1, 2 and 3, the second includes second and third order interaction between these three factors but no marginal effects [23,16], whereas the third only includes the highest third order interaction between factors 1, 2 and 3 [17].

Inference of joint effects and interaction
Assume that a case-control dataset (x 1 , Y 1 ), . . . , (x n , Y n ) of size n = n 0 + n 1 is available, with n 0 controls (Y a = 0) and n 1 cases (Y a = 1). Since this is a retrospective sample, we cannot estimate all parameters of the logistic regression model (2). On the other hand, it is possible to estimate the structural parameters ψ consistently when controls are frequency matched with cases. This is the unconditional logistic regression approach, whereby the prospective log likelihood is maximized over the model parameters in (2). This approach can also be used if cases and controls are matched in strata, as long as all variables used for matching are included in the model as covariates, and the number of strata does not grow with sample size. On the other hand, conditional logistic regression is more appropriate if the number of strata is large [5].
be the plug-in estimator of (4) obtained from the maximum likelihood estimator (17). In order to produce a confidence interval for ξ with asymptotic coverage probability 1 − α we use the delta method in conjunction with some appropriately chosen monotone increasing and differentiable transformation h. This leads to where z β is the β-quantile of a standard normal distribution and SE is the standard error of h(ξ). This method relies on asymptotic normalitŷ of the structural part of (17) for large samples, with Σ = I(ψ, κ) −1 an asymptotic approximation of the covariance matrix Cov(ψ) ofψ, which equals the inverse of the Fisher information matrix I [6]. In order to find SE we first approximate the asymptotic variance ofξ by σ 2 = DΣD T , using (20) see for instance Rothman [25] and Hössjer et al. [16]. The three functions in (21) map the corresponding indices ξ from their original ranges ((−∞, ∞) for EOR, (−1, 1) for AP, (0, ∞) for SI) to (−∞, ∞).

Analysis of a real dataset
Multiple sclerosis (MS) is a complex and inflammatory disease causing damage to the central nervous system. Its prevalence is over 0.1% in many countries, affecting large regions of the world [21]. There is solid evidence for a genetic component of the disorder, with a major contribution from variants at the human leukocyte antigen (HLA) complex. It is also well known that presence of allele 15 of the HLA-DRB1 The four Nordic studies are from Hedström et al. [14] gene is a risk factor, whereas allele 02 of the HLA-A gene has a protective effect [7]. Several studies reveal that environmental factors, in particular smoking, impact the risk of the disease as well [22]. A Swedish study in Hedström et al. [13] demonstrated positive pairwise additive interaction between the two genetic factors, and also between smoking and each genetic factor. These results have more recently been replicated and refined in Hedström et al. [14], by merging case-control studies from several countries. Due to the size of this meta analysis, it was also possible to investigate whether third order interaction was present between the two genetic factors and smoking. In order to illustrate the methodology of this paper, we present some of the findings from the four Nordic studies (see Table 1) of Hedström et al. [14]. Apart from the two genetic factors and smoking, three other covariates (gender, age, study) were also part of the model. This gives a total of 8 covariates, encoded as The last three study covariates were only included for the meta analysis, so that p + q = 5, for each separate study, 8, for the combined Nordic study.
be the odds ratio for the joint effect of all the risk factors J, when the confounding risk factors in K are fixed at level v K , and the remaining covariates are {1, . . . , p + q} \ (J ∪ K). Table 2 lists maximum likelihood estimates (18) of these odds ratios for various choices of risk factors and confounders. The associated confidence intervals (19) are obtained using the logarithmic transformation in (21). Although The sets of risk factors is J and confounders fully controlled for is K. Each column is denoted as J when K = ∅, and otherwise as J|K. The confidence intervals are given in brackets. We use the notation DR15 for presence of allele 15 at HLA-DRB1, A2-for absence of allele 2 at HLA-A, sm for smoker and nsm for non-smoker MS is a common disorder, its prevalence is small enough to assume that the point estimates and confidence intervals of Table 2 are representative for the relative risks RR (1,vK ,z) /RR (0,vK ,z) as well. We find, for instance, that the point estimate of the marginal odds ratio (or relative risk) of having MS in the combined dataset is 3.6 for individuals with the DRB15 risk allele, compared to those that lack this allele. The corresponding marginal odds ratios for absence of the protecting A2 allele and for smoking are 1.75 and 2.0 respectively. Since the joint odds ratios for all pairs of risk factors are much larger than the corresponding marginal odds ratios, there are strong indications of two-way interactions between all pairs of risk factors. There is possibly some three-way interaction between DR15, A2-and smoking as well, since the joint OR for all three factors is higher than the pairwise odds ratios. On the other hand, the OR for the two genetic factors is only higher among smokers than among non-smokers for one study (EIMS).
The estimates of Table 2 motivate further analysis of the MS datasets in Hedström et al [14] based on the attributable proportion, excess odds ratio and synergy index for the three risk factors DR15, A2-and smoking. Table 3 gives confidence intervals for all three quantities when J = {1, 2, 3} and K = ∅, for various choices of i. It confirms a strong joint effect of all three factors, since AP and EOR are both significantly different from 0, and SI is significantly different from 1. For instance, the estimate AP = 0.92 for the combined Nordic data set indicates that about 92% of the odds ratio (or disease risk) for smokers with both genetic risk factors, is not present The measures either quantify joint marginal and interaction effects (i = 1), the total interaction of order 2 and 3 (i = 2), or the highest order 3 of interaction (i = 3) between the three risk factors DR15, A2-and smoking (J = {1, 2, 3}). No other covariates are fully controlled for (so that p = 3 andK = ∅). The confidence intervals are given in brackets.
Notice that SI is only defined for measures of interaction among those that lack all three factors. We also find that the total amount of second and third order interaction between DR15, A2-and smoking is strongly significant. In particular, AP = 0.52 for the Nordic metapopulation indicates that about half of the disease risk is due to interaction. One the other hand, SI = 2.38 tells that the disease risk increment of a smoker with both genetic risk factors, compared to an individual with none of these risk factors, is more than twice the disease risk increment due to marginal effects only. Finally, we find from the rightmost AP, EOR and SI columns of Table 3 that third order interaction between DR15, A2-and smoking is only significant for one dataset (EIMS). We have also estimated the attributable proportion separately for males and females (data not shown). The results are in quite good agreement with the upper part of Table 3, although the values for males are slightly larger than those for females, and since the gender specific datasets are smaller, the confidence intervals are wider. Such an analysis illustrates how joint effects of and interaction among three factors (J = {1, 2, 3}) is affected when one controls for a fourth factor (K = {4}).

Discussion
In this paper we studied how a collection J of binary risk factors jointly influence a binary outcome. Our objective was to develop a general framework for quantifying marginal effects and various orders of interaction between these factors on an additive scale, when stratifying or partially controlling for other confounding variables. This led to the key result; an expansion of the odds ratios as a sum of marginal effects and interaction terms of different orders, with a finite difference and a combinatorial inclusion-exclusion interpretation. We also showed how to use these odds ratio expansions for estimating and producing confidence intervals for the excess odds ratio, attributable proportion and synergy index from a case-control dataset. The methodology was illustrated using a Nordic meta study for multiple sclerosis. The inferential procedure relies on maximum likelihood estimates from a logistic regression model and the delta method. It has been implemented in a Matlab program that is available from the first author upon request.
Our approach makes it possible to stratify for some variables (in K) at various levels, and yet use the whole data set to estimate effect parameters of the other q confounders, for which there is only partial control. But this requires a large data set in order to estimate all 2 p + q = 2 |J|+|K| + q parameters of the logistic regression model. An alternative and simpler strategy is to use only those observations (x a , Y a ) for which the variables in K are at a pre-specified level. This gives a smaller model with only 2 |J| + q parameters to estimate.
The delta based confidence intervals are fast to compute. This is suitable in applications where a number of different putative risk factors are sought for. We have implemented confidence intervals based on resampling as well, using the bias-corrected accelerated percentile method [9,10,16]. There is generally good agreement between the resampling and delta-based intervals for odds ratios and attributable proportions, as long as the interaction effects are not too strong, the order of interaction is not too high and the data set is not too small. On the other hand, the delta based confidence intervals tend to be less accurate for excess odds ratios. For this reason we recommend to report excess odds ratios with resampling based confidence intervals. As a topic of future research, it would be of interest to compare the asymptotic accuracy of the delta and resampling based confidence intervals more systematically.
Another object of further study is to develop odds ratio expansions of main effects and interactions when some of the risk factors are continuous [11] and to find analogous expansions for non-binary outcomes.
Since ∆ 0 = 1, and the set {0, 1} p of binary vectors of length p is partially ordered, we can use (9) to compute ∆ v OR recursively for all v ∈ {0, 1} p . Such a procedure gives for some constants c v,w . In order to verify (7) and (9), it suffices to prove that We do this by induction with respect to v ∈ {0, 1} p . Starting with v = 0, we notice first that c 0,0 = 1, since ∆ 0 OR = OR 0 = 1. As a next step, consider a fixed v > 0.
In order to establish (25) for all w such that 0 ≤ w ≤ v, we use the induction hypothesis and assume that (25) holds for all (w, u) such that 0 ≤ w ≤ u < v. Then we insert (24)-(25) into each term on the right hand side of (23), and change order of summation. This gives It is clear from this expansion that c v,v = 1, so that (25) holds for w = v. When w < v, we identify the coefficient of OR w as and this finishes the proof of (25).

B Proof of Proposition 1
The second part of Proposition 1, equation (12), concerns the case when all orders of interaction are included for predicting the odds ratio, i.e. i = |v J |. This equation follows directly from the fact that the definition of OR (vJ ,vK ),i in (10) is identical to (9) when i = |v J |.
In order to prove the first part (11) of Proposition 1, when 0 ≤ i < |v J |, we insert the definition of ∆ (w,vK ) OR in (7) into equation (10). Then we change the order of summation. This leads to OR (vJ ,vK ),i = can be proved by induction with respect to m = 0, 1, . . . , n−1, using Pascal's triangle in each step. It implies that (27) is equivalent to (11), and this completes the proof.

C Expressions for D(ψ)
It will be convenient to introduce the notation respectively for the odds ratio when all the risk factors in J are turned on, the prediction of this odds ratio including main effects and interaction terms up to order i − 1, and the odds ratio when all the risk factors in J are turned off. The excess odds ratio (14), the attributable proportion (15), the synergy index (16) of a, b and c. Hence, in order to find the derivative D of ξ with respect to ψ, for any of these quantities, it suffices to find the derivatives of a, b and c with respect to ψ.
To this end, it is convenient to associate with any binary vector u of length p, another binary vector 1 ≤u = (1 w≤u ; 0 < w ≤ 1) of length 2 p − 1. As in the definition of ψ in (3), the indices of 1 ≤u run over all binary vectors 0 < w ≤ 1 of length p except the zero vector. For a given w, the corresponding coordinate of 1 ≤u is 1 if w ≤ u and zero otherwise.
With these preliminaries, we can state the following result, which follows from (28)- (29) and the definition of the odds ratio (5): Proposition 2. Let ξ = ξ(ψ) be a measure (4) of joint effects, marginal effects or interaction among the risk factors in J. Assume that it is a function of the three quantities a, b and c in (28). The derivative of ξ with respect to the parameter vector ψ, is then given by where A = da dψ = OR (1,vK ) 1 ≤ (1,vK ) , With a slight abuse of notation, we included binary vectors u and w of length |J| (not p) in the definition of B. In the last step of this equation we interchanged order of summation between u and w and then simplified the expression for the inner sum, similarly as in Appendix B.