**2.5 Proposal of a new indicator: the ABC**

**2.5.1 Article submitted to the Biometrical Journal**

This article presents the indicator and the proposed estimation method in depth, details its theoretical justification, and evaluates and compares its performance with that of the other indicators discussed in this chapter. The supplementary information submitted to the journal is available in Appendix B.

Yoann Blangero^{∗,1,2}, Muriel Rabilloud^{1,2}, Pierre Laurent-Puig^{3,4,5}, Karine Le Malicot^{6}, Côme Lepage^{6,7,8}, René Ecochard^{1,2}, Julien Taieb^{3,9}, and Fabien Subtil^{1,2}

1 Service de Biostatistique, Pôle Santé Publique, Hospices Civils de Lyon, Lyon, France

2 Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France

3 Université Paris Descartes, Sorbonne Paris Cité, Paris, France

4 Service de génétique, Hôpital Européen Georges Pompidou, Paris, France

5 INSERM UMR-S 1147, Paris, France

6 Fédération Francophone de Cancérologie Digestive, Dijon, France

7 Hépato-gastroentérologie et cancérologie digestive, Centre hospitalier universitaire Dijon Bourgogne, Dijon, France

8 INSERM U 866, Dijon, France

9 Chirurgie digestive générale et cancérologique, Hôpital Européen Georges Pompidou, Paris, France

Treatment selection markers are generally sought when the benefit of an innovative treatment in comparison with a reference treatment is considered and this benefit is suspected to vary according to the characteristics of the patients. Classically, such quantitative markers are detected by testing a marker-by-treatment interaction in a parametric regression model. Most alternative methods rely on modelling the risk of event occurrence in each treatment arm or the benefit of the innovative treatment over the marker values, but with assumptions that may be difficult to verify. Herein, a simple non-parametric approach is proposed to detect and assess the general capacity of a quantitative marker for treatment selection when no overall difference in efficacy could be demonstrated between two treatments in a clinical trial. This graphical method relies on the area between treatment-arm-specific ROC curves (ABC), which reflects the treatment selection capacity of the marker. A simulation study assessed the inference properties of the ABC estimator and compared them with those of other parametric and non-parametric indicators. The simulations showed that the estimate of the ABC had low bias and power comparable to that of parametric indicators, and that its confidence interval had a good coverage probability (better than that of the other non-parametric indicator in some cases). Thus, the ABC is a good alternative to parametric indicators. The ABC method was applied to data from the PETACC-8 trial, which investigated FOLFOX4 vs. FOLFOX4 + cetuximab in stage III colon adenocarcinoma. It enabled the detection of a treatment selection marker: the DDR2 gene.

*Key words: clinical trial; predictive marker; quantitative marker; receiver operating characteristic curve; treatment selection*

1 Introduction

A major aim of precision medicine is to determine the best treatment for individual patients. It is therefore essential to identify and assess markers able to guide treatment decisions so as to avoid the occurrence of a given event (e.g. disease progression, recurrence, or death) in a given post-treatment interval. When comparing the efficacy of two treatments (an innovative vs. a reference treatment), such markers are expected to improve patient outcomes by selecting the patients most likely to benefit from the innovative treatment and by avoiding treatment of those who would not benefit from it. There is currently no consensus on the naming of such a marker; whereas Italiano (2011) and Ballman (2015) used “predictive marker”, Janes et al. (2014a) used “treatment selection marker”, the term used herein.

A treatment selection marker is generally sought when the overall risk of event occurrence is nearly the same with two different treatments; it is then expected that a subgroup of patients would benefit more from one of the treatments than from the other. One example of a treatment selection marker is the mutated KRAS gene in metastatic colorectal cancer. The presence of this mutated gene is a marker of benefit from chemotherapy alone as opposed to chemotherapy + epidermal growth factor receptor (EGFR) inhibitor; patients with tumors harboring mutated KRAS exon 2 are known to be resistant to EGFR inhibitors, whereas those with KRAS wild-type tumors do benefit from the combined treatment (Di Fiore et al., 2007; Lièvre et al., 2008; De Roock et al., 2008).

In the case of a quantitative marker, it is necessary to find a threshold value of the marker that determines the optimal treatment allocation for the patients with a marker value above or below this threshold (Vickers et al., 2007; Janes et al., 2014a,b; Blangero et al., 2019). However, before defining a threshold, the first step in the assessment of a promising new quantitative marker is to quantify and test its overall performance for treatment selection. Various methods have been proposed for this purpose. The classical approach consists in modelling the risk of event given the treatment options and the marker values, and then testing for a statistical interaction between these two variables, as proposed by Byar (1985) and applied in several studies (for some examples, see Weidhaas et al. (2016) or Skougaard et al. (2016)). One limitation of this approach is that the interaction coefficient depends on the additive or multiplicative structure of the model; the interaction may be present in one type of model but not in the other, and vice versa (Byar, 1985). A marker is defined as a treatment selection marker when the difference in risk of event occurrence between the two treatment arms is not constant over the marker values (Song and Pepe, 2004), which means that the additive scale should be used to assess treatment selection markers.

In addition, although the interaction approach is straightforward with binary markers, it is quite complex with quantitative markers because of the difficulty of verifying the adequacy of the functional form retained in modeling the interaction. One extension of the previous approach is the use of graphical tools such as “marker-by-treatment predictiveness curves”, as proposed by Janes et al. (2011). These graphs plot the risk of event in each treatment arm given the marker value vs. the cumulative distribution function of the marker. Using the cumulative distribution function instead of the raw marker values provides a single scale ranging between 0 and 1, allows comparisons between marker-by-treatment predictiveness curves, and gives the proportions of patients who would receive each treatment according to the marker values, which is important in medical decision-making. Marker-by-treatment predictiveness curves allow visualization of the performance of a marker for treatment selection, but they rely on a good calibration of the risk model in each arm. Such a model often assumes a linear marker-by-treatment interaction on the linear predictor scale. Unfortunately, this assumption is not always valid and is not easy to check. Moreover, marker-by-treatment predictiveness curves are a graphical tool and do not quantify the performance of the marker for treatment selection.

Other methods assess the treatment selection capacity of a quantitative marker (Huang et al., 2012; Zhang et al., 2014) by measuring its ability to distinguish patients who would have a better outcome with the innovative treatment than with the reference treatment from patients who would have a worse outcome. However, this kind of approach needs to model, for each patient, the probability of having a better outcome with the innovative treatment than with the reference treatment. Except in cross-over trials, this requires modeling within a potential outcomes framework with complex assumptions that may be very difficult to verify (Janes et al., 2015b). For example, Huang et al. (2012) made the monotonicity assumption (one treatment is always at least as effective as the other one) to estimate the individual benefit, which is a strong assumption. Zhang et al. (2014) relaxed the latter assumption by assuming that the potential outcomes are independent given observed covariates. This means that the benefit of the innovative treatment for a patient may be calculated by comparing the patient's outcome with that of patients who are similar regarding the observed covariates but who received the reference treatment. This method assumes that the observed covariates are sufficient to explain the dependence between the two potential outcomes, an assumption that is difficult to verify.

In the present paper, a simple non-parametric method is proposed to investigate the capacity of a quantitative marker for treatment selection when the overall risk of event in each treatment arm is equal. The method relies on a special use of Receiver Operating Characteristic (ROC) curves and provides a bounded indicator able to quantify and test the treatment selection capacity of the marker. The method is described, tested, and compared with other methods in a simulation study; it is then applied to a real dataset.

2 Methods

Throughout this article, it is assumed that the marker under study (denoted $V$) is measured before treatment allocation within the context of a parallel randomized controlled clinical trial with two treatment arms (innovative vs. reference) and that the outcome of interest is a binary event measured after a fixed duration of follow-up.

The binary event of interest is denoted by $E$, where $E = 1$ indicates the presence of the event of interest and $E = 0$ its absence. Let us also denote the treatments under study by $T$, where $T = 1$ indicates the innovative treatment and $T = -1$ the reference one.

Moreover, it is assumed that:

$$\rho_{(-1)} = \rho_{(1)} = \rho \qquad (1)$$

where $\rho_{(-1)} = P(E = 1 \mid T = -1)$ and $\rho_{(1)} = P(E = 1 \mid T = 1)$ denote the overall risk of event in each arm, and $\rho = P(E = 1)$ denotes the marginal risk of event in the trial. Assumption (1) means that no overall difference in efficacy could be demonstrated between the two treatment arms.

Song and Pepe (2004) proposed a mathematical definition of a treatment selection marker. A marker has no capacity for treatment selection if

$$\delta(v) = \rho_{(-1)}(v) - \rho_{(1)}(v) = \rho_{(-1)} - \rho_{(1)} \quad \forall v \qquad (2)$$

where $\rho_{(-1)}(v) = P(E = 1 \mid T = -1, V = v)$ and $\rho_{(1)}(v) = P(E = 1 \mid T = 1, V = v)$ denote the risk of event in each treatment arm for a value $v$ of the marker.

Conversely, a marker has a capacity for treatment selection when the difference in risks between the two treatment arms depends on the marker values. As Song and Pepe (2004) and Janes et al. (2014a) suggested, the difference in risk is the key point in treatment selection marker assessment. The larger the changes in risk difference across the marker values, the more useful the marker is for treatment selection.

2.1 Marker-by-treatment predictiveness curves

Marker-by-treatment predictiveness curves are simple graphical tools that help in understanding the difference between a treatment selection marker and a simple prognostic marker.

In Figure 1, each panel presents two curves: one relative to the reference treatment and another relative to the innovative treatment.

– Panel A shows a case where the risk of event is independent of the marker value in each treatment arm. As the overall risk of event is the same in each treatment arm (assumption (1)), the marker-by-treatment predictiveness curves overlap; hence, the marker cannot be a treatment selection marker.

– Panel B shows a case where the risk changes with the marker value in both treatment arms. Thus, the marker may be called a “prognostic marker” in each arm. However, the difference in risk between the two arms ($\delta(V)$) is constant and equal to 0 in this case. Thus, there is no interaction between the treatment arm and the marker values, and the marker cannot be a treatment selection marker.

– Panel C shows a case where the risk of event occurrence decreases with the marker value in the innovative arm but increases in the reference arm: the prognostic value is different between treatment arms. $\delta(V)$ changes with the marker value: this marker is thus a treatment selection marker. In this case, the threshold of marker value that defines treatment allocation should be close to the marker value that corresponds to 50% of its cumulative distribution.

– Panel D shows another case where the risk of event occurrence decreases with the marker value in the innovative arm and increases in the reference arm, but the slopes are greater than in panel C: the prognostic value of the marker is stronger in the two treatment arms. This marker is also a treatment selection marker; furthermore, its capacity for treatment selection is greater than in panel C because of the greater magnitude of changes in $\delta(V)$. The larger the changes in $\delta(V)$ over the cumulative distribution function of the marker values, the greater the treatment selection capacity of the marker.

Thus, a marker is a treatment selection marker when its prognostic ability differs between the two treatment arms, which is the definition of a marker-by-treatment interaction. This is the basis for the development of the method presented hereafter.

2.2 Notations and illustration of the “area between curves”

A simple and non-parametric method to estimate the prognostic ability of a marker in a single treatment arm relies on the area under the ROC curve (AUC, $\theta$ in equations), which quantifies the ability of the marker to discriminate subjects who will experience the event in a given post-treatment interval from those who will not (Hanley and McNeil, 1982). We propose to estimate the treatment selection capacity of a marker by estimating the difference in prognostic ability between two treatment arms. This difference can be quantified by the area that separates the two treatment-arm-specific ROC curves, named “area between curves” (ABC).

A classical assumption in marker evaluation using ROC curves is that the risk of event in both arms is either monotonically increasing or monotonically decreasing over the marker values. Otherwise, the issue of improper ROC curves would arise (Metz and Pan, 1999).

When the two ROC curves do not intersect, the ABC can be measured by the difference between the two AUCs: $\Delta_\theta = \theta_{(-1)} - \theta_{(1)}$, $\theta_{(-1)}$ and $\theta_{(1)}$ being, respectively, the AUCs of the marker for $T = -1$ and $T = 1$. As both $\theta_{(-1)}$ and $\theta_{(1)}$ range between 0 and 1, $\Delta_\theta$ ranges between $-1$ and 1 (when $\Delta_\theta$ is negative, the ABC is the absolute value of $\Delta_\theta$).
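As an illustration of this definition, the following minimal Python sketch computes an empirical $\Delta_\theta$ as the difference between two arm-specific AUCs. The function names `auc` and `delta_theta` and the toy data are hypothetical and are not taken from the article; the AUC is computed here as the proportion of (event, non-event) pairs in which the patient with the event has the higher marker value, with ties counted as one half.

```python
import numpy as np

def auc(marker, event):
    """Empirical AUC: P(V_event > V_no_event) + 0.5 * P(tie)."""
    marker = np.asarray(marker, dtype=float)
    event = np.asarray(event).astype(bool)
    cases, controls = marker[event], marker[~event]
    diff = cases[:, None] - controls[None, :]   # all (event, non-event) pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def delta_theta(marker, event, arm):
    """AUC in the reference arm (T = -1) minus AUC in the innovative arm (T = 1)."""
    marker, event, arm = map(np.asarray, (marker, event, arm))
    ref, inn = arm == -1, arm == 1
    return auc(marker[ref], event[ref]) - auc(marker[inn], event[inn])

# Toy data (hypothetical): high marker values go with the event in the
# reference arm and with no event in the innovative arm.
marker = [1, 2, 3, 4, 1, 2, 3, 4]
event = [0, 0, 1, 1, 1, 1, 0, 0]
arm = [-1, -1, -1, -1, 1, 1, 1, 1]

d = delta_theta(marker, event, arm)
print(d, abs(d))   # signed difference, and the ABC when the ROC curves do not cross
```

In this toy dataset the marker separates events from non-events perfectly in each arm but in opposite directions, which corresponds to the extreme case where the difference reaches one of its bounds.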

The second row in Figure 1 presents the ROC curves that correspond to the marker-by-treatment predictiveness curves of the first row.

– Panel E shows two overlapping ROC curves on the diagonal; the marker has no prognostic ability in either arm, and $\Delta_\theta = 0$.

– Panel F shows two overlapping ROC curves that are distinct from the diagonal; the marker has the same prognostic ability in both arms but no capacity for treatment selection, and $\Delta_\theta = 0$.

– Panel G shows two distinct ROC curves located on either side of the diagonal; the marker has a prognostic ability in both arms but the risk is increasing in the innovative arm and decreasing in the reference arm. The marker in panel G has a capacity for treatment selection and $\Delta_\theta = -0.11$.

– Panel H also shows two distinct ROC curves located on either side of the diagonal. As shown by the marker-by-treatment predictiveness curves, the marker in panel H has a stronger capacity for treatment selection than the marker in panel G. This is reflected by a $\Delta_\theta = -0.48$, further from zero than in panel G.

To summarize, the capacity of a marker for treatment selection increases with the ABC, i.e. with the gap between the ROC curves, and thus with the distance of $\Delta_\theta$ from 0. When $\Delta_\theta = 0$ (overlapping ROC curves), the marker has no capacity for treatment selection. When $\Delta_\theta$ is equal to $-1$ or 1, the marker is a perfect treatment selection marker; i.e. the marker distinguishes perfectly patients with (or, alternatively, without) the event under the innovative treatment from those under the reference treatment (Appendix A.1 presents an illustration of the definition of a perfect marker). Furthermore, $\Delta_\theta$ may be used to test whether a marker has a statistically significant treatment selection capacity (i.e. testing whether $\Delta_\theta = 0$) and to compare the capacity for treatment selection of several markers.

2.3 Justification of the use of $\Delta_\theta$

The use of $\Delta_\theta$ to quantify the treatment selection capacity of a marker is justified by its close connection with the difference in risk between the two treatment arms over the marker values. Viallon and Latouche (2011) demonstrated that the AUC in a single treatment arm could be written as a function of the predictiveness curve:

$$\theta_{(T)} = \frac{\displaystyle\int_{-\infty}^{+\infty} F(v)\,\rho_{(T)}(v)\,dF(v) - \frac{\rho_{(T)}^{2}}{2}}{\rho_{(T)}\,(1 - \rho_{(T)})}$$

where $F(.)$ is the cumulative distribution function of marker $V$. With this expression, it is easy to show that when the overall risks of event occurrence in the reference and innovative treatment arms are equal (i.e. when $\rho_{(-1)} = \rho_{(1)} = \rho$), $\Delta_\theta$ can be expressed as a function of $\delta(V)$:

$$\Delta_\theta = \theta_{(-1)} - \theta_{(1)} = \frac{\displaystyle\int_{-\infty}^{+\infty} F(v)\,\delta(v)\,dF(v)}{\rho\,(1 - \rho)}$$

$\Delta_\theta$ is greater when the variations in the risk difference $\delta(V)$ are large over the range of marker values, hence when the marker has a greater capacity for treatment selection. According to equation (2), it can be shown that when a marker has no capacity for treatment selection then $\Delta_\theta = 0$, and conversely (Appendix A.2).
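The link between the two expressions above can be checked numerically. The sketch below is only an illustration under stated assumptions: the marker is represented by $u = F(v)$ on a regular grid of $(0, 1)$, the integrals are approximated by the midpoint rule, and the two linear risk functions are hypothetical choices that give the same overall risk in both arms (assumption (1)). None of the function names come from the article.

```python
import numpy as np

def f_grid(n=100_000):
    """Midpoints of n equal cells of (0, 1): a grid for u = F(v)."""
    return (np.arange(n) + 0.5) / n

def auc_from_risk(risk, u):
    """AUC in one arm computed from its risk function (predictiveness-curve form)."""
    r = risk(u)
    rho = r.mean()                        # overall risk in the arm
    return float((np.mean(u * r) - rho**2 / 2) / (rho * (1 - rho)))

def delta_theta_from_delta(risk_ref, risk_inn, u):
    """Delta_theta as the integral of F(v) * delta(v), assuming equal overall risks."""
    rho = risk_ref(u).mean()
    delta = risk_ref(u) - risk_inn(u)     # delta(v) = rho_(-1)(v) - rho_(1)(v)
    return float(np.mean(u * delta) / (rho * (1 - rho)))

# Hypothetical risk functions with overall risk 0.3 in both arms
rho, b = 0.3, 0.4
risk_ref = lambda u: rho + b * (u - 0.5)  # T = -1: risk rises with the marker
risk_inn = lambda u: rho - b * (u - 0.5)  # T = 1: risk falls with the marker

u = f_grid()
via_aucs = auc_from_risk(risk_ref, u) - auc_from_risk(risk_inn, u)
via_delta = delta_theta_from_delta(risk_ref, risk_inn, u)
print(via_aucs, via_delta)                # the two routes should agree closely
```

For these linear risks the integral can also be done by hand, giving $\Delta_\theta = b / (6\rho(1-\rho))$, which offers a third check on the two numerical values.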

2.4 Connection between $\Delta_\theta$ and two other indicators

Hereafter, two indicators are presented in order to show their connection with $\Delta_\theta$: the total gain indicator of Janes et al. (2014b) and the $\gamma$ indicator of Zhang et al. (2017).

2.4.1 The Total Gain

In their article, Janes et al. (2014a) proposed to evaluate the overall capacity of a marker for treatment selection using the total gain (TG) expressed as:

$$\mathrm{TG} = \int \left| \delta(v) - (\rho_{(-1)} - \rho_{(1)}) \right| \, dF_\delta$$

In this equation, $F_\delta$ is the cumulative distribution function of $\delta(V)$.

The TG indicator measures the overall treatment selection capacity of a marker. When the marker has no treatment selection capacity, the TG equals 0, and vice versa. However, the maximum TG value depends on $\rho_{(-1)}$ and $\rho_{(1)}$, and therefore the TG cannot be used to compare markers from different studies.

The TG and $\Delta_\theta$ are two closely connected overall indicators of the treatment selection capacity of a marker. From the expressions of TG and $\Delta_\theta$, one may see that there is a monotone relationship between these two indicators and that the intensity of this relationship depends on the overall risk of event occurrence in each treatment arm. However, whereas the TG is based on risks, $\Delta_\theta$ is based on ROC curves that reflect the ability of the marker to distinguish patients with (or, alternatively, without) the event under the innovative treatment from those under the reference treatment. Moreover, $\Delta_\theta$ is non-parametric regarding the functional form of the interaction, and is always bounded between $-1$ and 1.

Finally, note that Janes et al. (2014a) did not propose an inference method for the TG other than the bootstrap.
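The relationship between the TG and $\Delta_\theta$ described in this subsection can be illustrated on a hypothetical family of risk functions. The sketch below is an assumption-laden example, not the authors' computation: the marker is represented by $u = F(v)$ on a grid, the TG is read as the expectation of $|\delta(V) - (\rho_{(-1)} - \rho_{(1)})|$, the risks are linear in $u$ with slope $\pm b$, and both arms share the same overall risk $\rho$. All function names are illustrative.

```python
import numpy as np

def f_grid(n=100_000):
    """Midpoints of n equal cells of (0, 1): a grid for u = F(v)."""
    return (np.arange(n) + 0.5) / n

def total_gain(risk_ref, risk_inn, u):
    """TG read as the mean absolute deviation of delta(V) from the overall risk difference."""
    delta = risk_ref(u) - risk_inn(u)
    overall_diff = risk_ref(u).mean() - risk_inn(u).mean()
    return float(np.mean(np.abs(delta - overall_diff)))

def delta_theta(risk_ref, risk_inn, u):
    """Delta_theta from delta(v), valid only when both arms share the same overall risk."""
    rho = risk_ref(u).mean()
    delta = risk_ref(u) - risk_inn(u)
    return float(np.mean(u * delta) / (rho * (1 - rho)))

def linear_risks(rho, b):
    """Hypothetical pair of risk functions with overall risk rho in each arm."""
    return (lambda u: rho + b * (u - 0.5)), (lambda u: rho - b * (u - 0.5))

u = f_grid()
for rho in (0.3, 0.5):
    for b in (0.1, 0.2, 0.4):
        ref, inn = linear_risks(rho, b)
        print(rho, b, round(total_gain(ref, inn, u), 4), round(delta_theta(ref, inn, u), 4))
```

Within this particular family, both indicators increase with the slope $b$, while only $\Delta_\theta$ changes when $\rho$ changes at fixed $b$. This is consistent with a monotone link whose strength depends on the overall risk, although the example does not establish it beyond this family.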

2.4.2 The $\gamma$ concordance measure

In another article, Zhang et al. (2017) proposed a quantitative concordance measure for the assessment of the overall performance of treatment selection markers. This concordance measure is expressed as

$$\gamma = E(G_{ij})$$

where $G_{ij} = \mathrm{sgn}(V_i - V_j)\,[\delta(V_i) - \delta(V_j)]$, $i$ and $j$ are the indices of two independent patients, and $\mathrm{sgn}(.)$ is the sign function.

As one may see, when $\delta(V)$ is constant over the marker values then $\gamma = 0$; the greater the variations in $\delta(V)$, the greater $\gamma$, and the greater the performance of the marker for treatment selection.

There is a connection between this indicator and the two described above, as they are all functions of $\delta(V)$ and their values depend on the variations in $\delta(V)$. $\gamma$ is estimated non-parametrically using pairwise comparisons of patient outcomes; since it is a U-statistic, the estimator converges to a normal distribution (Hoeffding, 1948). The variance of the estimator follows from asymptotic theory and may rely on a working model that predicts the risk of event occurrence in order to be more efficient. The variance is optimal when the working model for risk prediction includes all the covariates that impact the risk of event (see Zhang et al. (2017) for more details).
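To make the definition of $\gamma$ concrete, the sketch below evaluates the average of $G_{ij}$ over all pairs of marker values when the function $\delta$ is assumed to be known. This is not the estimator of Zhang et al. (2017), which works from observed patient outcomes; the function name `gamma_index`, the grid of marker values, and the linear $\delta$ are hypothetical choices made for illustration only.

```python
import numpy as np

def gamma_index(marker, delta):
    """Average of sgn(V_i - V_j) * (delta(V_i) - delta(V_j)) over all pairs i != j."""
    v = np.asarray(marker, dtype=float)
    d = delta(v)
    g = np.sign(v[:, None] - v[None, :]) * (d[:, None] - d[None, :])
    n = len(v)
    return float(g.sum() / (n * (n - 1)))   # diagonal terms are zero

# Marker values taken as a regular grid of u = F(v) on (0, 1) (assumption)
n = 1000
u = (np.arange(n) + 0.5) / n

b = 0.4
linear_delta = lambda v: 2 * b * (v - 0.5)     # delta(V) varies with the marker
constant_delta = lambda v: np.zeros_like(v)    # delta(V) constant: no selection capacity

print(gamma_index(u, linear_delta))    # positive, close to 2 * b / 3
print(gamma_index(u, constant_delta))  # zero
```

The second call reproduces the property stated above: a constant $\delta(V)$ gives $\gamma = 0$, whereas a $\delta(V)$ that varies with the marker gives a non-zero value.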

2.5 Estimation and inference of $\Delta_\theta$

When estimated non-parametrically with the trapezoidal rule, the AUC estimate is asymptotically normally distributed with DeLong's variance (DeLong et al., 2011). As $\Delta_\theta$ is a difference between two independent AUCs, its estimator is also asymptotically normally distributed, with variance equal to the sum of the two AUC variances. Thus, a symmetric confidence interval can be obtained using the normal approximation:

$$\hat{\Delta}_\theta \pm z_{1-\alpha/2} \times \sqrt{\widehat{\mathrm{Var}}(\hat{\Delta}_\theta)}$$

In this expression, $\hat{\Delta}_\theta$ denotes the estimator of $\Delta_\theta$, and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ quantile of a standard normal distribution.
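The estimation steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the variance of each AUC is computed with the usual structural-components form of the DeLong estimator, the function names are hypothetical, and the simulated trial (uniform marker, linear risks with overall risk 0.3 in both arms) is an assumption chosen only to produce example data.

```python
import numpy as np
from statistics import NormalDist

def auc_delong(marker, event):
    """Empirical AUC and its DeLong-type variance in a single arm."""
    marker = np.asarray(marker, dtype=float)
    event = np.asarray(event).astype(bool)
    cases, controls = marker[event], marker[~event]
    diff = cases[:, None] - controls[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)      # pairwise comparison kernel
    v_cases = psi.mean(axis=1)                # one component per patient with the event
    v_controls = psi.mean(axis=0)             # one component per patient without the event
    var = v_cases.var(ddof=1) / len(cases) + v_controls.var(ddof=1) / len(controls)
    return float(psi.mean()), float(var)

def delta_theta_ci(marker, event, arm, alpha=0.05):
    """Estimate of Delta_theta with a symmetric normal-approximation interval."""
    marker, event, arm = map(np.asarray, (marker, event, arm))
    auc_ref, var_ref = auc_delong(marker[arm == -1], event[arm == -1])
    auc_inn, var_inn = auc_delong(marker[arm == 1], event[arm == 1])
    est = auc_ref - auc_inn
    se = float(np.sqrt(var_ref + var_inn))    # independent arms: variances add
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)

# Simulated trial (hypothetical data-generating model)
rng = np.random.default_rng(0)
n = 2000
arm = rng.choice([-1, 1], size=n)
marker = rng.uniform(size=n)
risk = 0.3 - arm * 0.4 * (marker - 0.5)       # opposite slopes in the two arms
event = rng.uniform(size=n) < risk

est, (lower, upper) = delta_theta_ci(marker, event, arm)
print(est, lower, upper)
```

Nothing in this sketch constrains the limits to the interval $[-1, 1]$; it implements only the symmetric interval.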

The symmetric confidence interval may yield limits $> 1$ or $< -1$, especially when $\Delta_\theta$ is close to 1 or $-1$. To obtain asymmetric confidence limits between $-1$ and 1, these limits may be calculated on the