Testing proportional hazards for specified covariates

Tests for proportional hazards assumption concerning specified covariates or groups of covariates are proposed. The class of alternatives is wide: log-hazard rates under different values of covariates may cross, approach, go away. The data may be right censored. The limit distribution of the test statistic is derived. Power of the test against approaching alternatives is given. Real data examples are considered.

We consider tests for proportional hazards assumption concerning specified covariates or groups of covariates. The class of alternatives is wide: log-hazard rates under different values of covariates may cross, approach, go away. The data may be right censored. The limit distribution of the test statistics is derived. Power of the test under approaching alternatives is given. Real examples are considered.
2 Modeling non-proportional hazards for specified covariates Let S(t|z), λ(t|z) and Λ(t|z) be the survival, hazard rate and cumulative hazard functions under a m-dimensional explanatory variable (or covariate) z = (z 1 , . . . , z m ) T .
Let us consider the proportional hazards (PH) model: where λ(t) is unknown baseline hazard function, the parameter β = (β 1 , . . . , β m ) T is m-dimensional. Under the PH model the ratios of hazard functions under any two different explanatory variables are constant over time, i.e. they are proportional.
Suppose that proportionality of hazard functions with respect to a specified covariate z j or a group of covariates z j1 , . . . , z j k may be violated. Our purpose is to detect such violations. So we seek a wider model which includes not only possibility of constant hazard rates ratios but also time-varying ratios.
We propose a model of the form λ(t|z) = g z, Λ(t), β, γ j λ(t), where g(z, Λ(t), β, γ j ) = e β T z+Λ(t)e γ j z j 1 + e γj zj [e Λ(t)e γ j z j − 1] . β = (β 1 , . . . , β m ) T and γ j are unknown regression parameters, λ(t) and Λ(t) are unknown baseline hazard and cumulative hazard, respectively. The PH model is a particular case of this model when γ j = 0. The model (1) is very wide as compared to the PH model. Indeed, suppose that two different values z (1) and z (2) of the covariate vector differ only by the value of the covariate z j and denote by c(t) the ratio of hazard rates. For the model (1) So this model gives a large choice of alternatives: the ratio of hazard rates c(t) can vary from any a > 0 to any b > 0, whereas c(t) is constant under the PH model. The difference of logarithms of hazard rates under different values of the covariate z j is constant under the PH model, so the logarithms of hazard rates as time functions are parallel, whereas in dependence on the values of its parameters the model (1) includes also the possibilities for these functions to approach, to go away, to intersect. So the model (1) may help to detect non-proportionality of hazard rates (or, equivalently, non-parallelism of log-hazard functions) in above mentioned directions. Other directions are very rare in real data. We do not discuss application of the model for analysis of survival regression data (which is a subject of another article). Such analysis could be done if the PH model would be rejected. Here the model is used only as a generator of a wide class of alternatives to the PH model.

Test statistic
Let us consider right censored failure time regression data: (X 1 , δ 1 , z (1) ), . . . , (X n , δ n , z (n) ), T i are failure times and C i are censoring times. X i is observation time of the ith unit, the event δ i = 1 indicates that X i is the failure time T i , and the event δ i = 0 indicates that X i is the censoring time C i . Set here 1 A denotes the indicator of the event A. The processes N i (t) and N (t) are right-continuous counting process showing the numbers of observed failures in the time interval [0, t] for the ith unit and for all n units, respectively. Y i (t) and Y (t) are decreasing (not strongly) left-continuous stochastic processes showing the numbers of units which are still not-failed and not-censored just prior to t for the ith and for all units, respectively. Suppose that the survival distribution of the i object given z (i) is absolutely continuous with the survival functions S i (t) and the hazard rates λ i (t).
Suppose that the multiplicative intensities model is verified: the compensators of the counting processes N i with respect to the history of the observed processes are Y i λ i du. It is equivalent to the assumption that for any i and t: P (X i > t) > 0 with almost all s ∈ [0, t] It means that for almost all s ∈ [0, t] the risk to fail just after the time s given that units were non-censored and did not fail to time s is equal to the risk to fail just after the time s given that units did not fail to time s when censoring does not exist. So censoring has no influence to the risk of failure. Information about time-to-failure distribution contains the points, where the counting processes N i have jumps. A jump at the point t is possible if Y i (t) = 1. Partial information give the points where the counting processes N i have not jumps but the predictable processes have jumps.
The multiplicative intensities model is verified in the case of type I, type II, independent random censoring.
In the parametric case with known λ the unknown finite-dimensional parameter consists of the parameters β and γ j . The component of the parametric score function U corresponding to γ j has the form where We consider semiparametric case when the baseline hazard rate λ is unknown. The test statistic is constructed in the following way. In the expression of U j the parameter β is replaced by its partial likelihood estimatorβ under the Cox model, the parameter γ j is replaced by 0, and the baseline cumulative intensity Λ is replaced by the Breslow estimator (see Andersen et al. [2]) The estimatorβ verifies the equation where It is an estimator of the baseline cumulative distribution function. After the above mentioned replacement the following statistic is obtained: where E j is the component of E corresponding to the covariate z j .
Let us consider asymptotic distribution of the statistic (4) under the PH model. Set is positively definite; here Assumption a) can be weakened considerably but we avoid writing complicated formulas for easier reading. Assumption b) simply means that at some finite time moment (perhaps very remote) observations are stopped. This is a usual assumption for asymptotic results to hold in survival analysis. Assumption c) also can be weakened. Assumption d) means that if censoring would be absent then units might survive after the moment τ with positive probability. Assumption e) is needed to have non-degenerated asymptotic distribution of the test statistic. This assumption is the usual assumption needed for asymptotic normality of the maximum partial likelihood estimatorβ of the regression parameter β.
Under Assumptions A [3] there exists a neighborhood Θ of β such that as n → ∞. Assumption A a) may be weakened assuming that there exist non-random s (i) such that the convergences (5) hold.
Convergences (5) and Assumption A d) imply that

Theorem 1. Under Assumptions A the following weak convergence holds:
Proof. Under the Cox model, where M i are martingales with respect to the history generated by the data. So using it and taking into account that the statisticÛ γj can be written in the form Assumptions A a)-d) and consistence of the partial likelihood estimatorβ imply that uniformly on [0, τ ]. Using convergences (5) and consistence of the estimatorβ we write the first integral in the form Under Assumptions A (now Assumption A e) is crucial) the following expression holds: Andersen et al. [2], Theorem VII.2.2, where even weaker assumptions instead of a)-d) are used). It implies The predictable variation of the local martingale M * (t) is Note that

So the predictable variation
uniformly on [0, τ ]; here Σ jj (t), Σ j (t), Σ(τ ) are the limits in probability of the random matricesΣ jj (t),Σ j (t),Σ(τ ). Under Assumptions A for any ε > 0 the predictable variation (see Andersen et al. [2]), Theorem VII.2.2, where even weaker assumptions are used) and β can be replaced byβ in the expression of the left side becauseβ is consistent. Similarly Hence, the Lindeberg conditions of the central limit theorem for martingales (see Andersen et al. [2]) are verified for M * . We saw that the predictable variations M * (t) converge in probability to non-random non-degenerated matrices. So the stochastic process n −1/2Û γj (·) converges in distribution to a zero mean Gaussian process on [0, τ ], in particular The test: the null hypothesis H 0 : γ j = 0 is rejected with approximate significance level α if |T | > z α/2 ; here z α/2 is the α/2 critical value of the standard normal distribution.

Power under approaching alternatives
Let us consider the alternative model (1) and suppose that alternatives are approaching: γ j = c √ n , where c = 0 is a fixed constant. So the model where M i are local martingales with respect to the history generated by the data. Note that Let us consider the stochastic process Note that the derivative with respect to β iṡ Constructing the test statistic we definedβ as a random vector satisfying the estimating equationl(τ, β) = 0. Let us show thatβ is a consistent estimator of β 0 .
Theorem 2. If Assumptions A are satisfied then the probability that the equatioṅ ℓ(β) = 0 has a unique solution, converges to 1 andβ P → β 0 .
Proof. Fix β ∈ R m . Using the Doob-Meier decomposition of N i (t) write the stochastic process Q n in the form The predictable variation of the martingaleM n is Note thatk a) the function k(β) is concave on E; b) the convergence is uniform in probability on compact subsets of the set E; c) if the function k(β) has a unique maximum at the point β 0 then the probability that the equationQ(β) = 0 has a unique rootβ in the set E tends to 1 andβ P → β 0 . Assumption A e) was crucial for application of Andersen's and Gill's theorem.

Theorem 3. If Assumptions A are satisfied then
where Z is an m-dimensional Gaussian process with components having independent increments, Z j (0) = 0 a.s. and for all 0 ≤ s ≤ t ≤ τ : here σ jj ′ (t) are the elements of the matrix Σ(t). In particular,

Proof. Set
Using the Doob-Meier decomposition we have where V j is the jth column of the matrix V . The first term converges weakly to Z(·, β 0 ) on (D[0, τ ]) m because the limit in probability of its predictable covariation matrix has the same expression as that of analogous term under the Cox model: Analogously, verification of the Lindeberg's condition is done by the same way as in the case of the Cox model. Let us consider the norm of the difference: Using the fact that the estimatorβ is consistent and that the first four terms have the same structure as analogous terms in the proof of Theorem VII.2.2 of Andersen et al., we have that the first four terms converge in probability to zero. Such convergence of the last term is obvious because the last term multiplied by √ n converges to a finite limit in probability.
The mean value theorem and consistency of the estimatorβ implẏ here β * j is a point on the line segment joining the pointsβ and β 0 . Sincel j (β) = 0, we obtain Note that for m = 1 because F is not equal to 1 a.s. on [0, τ ]. where µ = d/D j and D j is the limit in probability of the random variableD j .
Proof. SetM Similarly as in the proof of Theorem 1 using the equality S j − E j S (0) = 0, consistency ofβ, Assumptions A, uniform consistency ofF on [0, τ ], we write the following expression: uniformly on [0, τ ]. Applying Theorem 3 write the right side in the form Note that the non-martingale part is d and the martingale part of the expression is exactly of the same form as in the case of the PH model and has the same limit distribution as in latter case.
The power function of the test against approaching alternatives is where Φ and z α 2 are the c.d.f. and the upper α/2 critical value of the standard normal law, respectively. If d = 0 and c is large then µ is large and the power is near to 1.

The test statistic
The hypothesis H 0 : γ = 0 is rejected with approximate significance level α if T > χ 2 α (k); here χ 2 α (k) is the α critical value of the standard chi-squared distribution with k degrees of freedom.
6 Real data analysis Example 1 (Chemo-radio data, one-dimensional dichotomous covariate). Stablein and Koutrouvelis [14] studied the well-known two-sample data of the Gastrointestinal Tumor Study Group concerning effects of chemotherapy (z = 0) and chemotherapy plus radiotherapy (z = 1) on the survival times of gastric cancer patients.
The number of patients n = 90. The data are right-censored. The value of the test statistic T is 3.651, the p-value is 0.0003. The Cox model is rejected. The result is natural because the Kaplan-Meier estimators of the survival functions of two patient groups intersect.
Example 2 (Prisoners data, 7 covariates). The data are given in [1]. They consist of 432 male inmates who were released from Maryland state prisons in the early 1970s. These men were followed for 1 year after their release, and the dates of any arrests were recorded. Time is measured by the week of the first arrest after release. There were seven covariates: financial aid after release (FIN; 0 -no, 1 -yes), age in years at the time of release (AGE), race (RACE; 1 -black, 0 -otherwise), full-time work experience before incarceration (WEX; 1 -yes, 0 -no), marital status (MAR; 1 -was married at the time of release, 0 -otherwise), released on parole (PAR; 1 -released on parole, 0 -otherwise), convictions prior to incarceration (PRI).
The results of testing hypothesis for all covariates: the value of the test statistic T is 17.58, the p-value is 0.014. The assumption of the PH assumption is rejected.
The results for each covariate are given in Table 1. The PH assumption is rejected for AGE and WEXP covariates. Example 3 (UIS dataset, 10 covariates). Let us consider right censored UIS data set given in [7]. UIS was a 5-year research project comprised of two concurrent randomized trials of residential treatment for drug abuse. The purpose of the study was to compare treatment programs of different planned durations designed to reduce drug abuse and to prevent high-risk HIV behavior. The UIS sought to determine whether alternative residential treatment approaches are variable in effectiveness and whether efficacy depends on planned program duration. The time variable is time to return to drug use (measured from admission). The individuals who did not returned to drug use are right censored. We use the model with 10 covariates (which support PH assumption) given by Hosmer, Lemeshow and May (2008). The covariates are: AGE (years); Beck depression score (beckt; 0 -54); NDR1 = ((NDR + 1)/10) (−1) ; NDR2 = ((NDR + 1)/10) (−1) log((NDR + 1)/10); drug use history at admission (IVHX_3; 1 -recent, 0 -never or previous); RACE (0 -white, 1 -non-white); treatment randomization assigment (TREAT; 0 -short, 1 -long); treatment site (SITE; 0 -A, 1 -B); interaction of age and treatment site (AGEXS); interaction of race and treatment site (RACEXS). The NDR denotes number of prior drug treatments (0 -40). Due to missing data in covariates, the model is based on 575 of the 628 observations.
The results of testing hypothesis for all covariates: the value of the test statistic T is 6.781, the p-value is 0.7460. The assumption of the PH assumption is not rejected.
The results for each covariate are given in Table 2. The assumption of the PH assumption for individual covariates is not rejected.