Impacting Clinical Outcomes: Computational Analysis of Complex Genomics Data Sets

Data analysis, interpretation, and integration of high volumes of multidimensional data are the key to elucidating the mechanism of action of pharmaceutical compounds, understanding the pathways to disease onset and progression, and identifying biomarkers and targets. In today’s era of large data sets, biopharmaceutical companies are using computational biology to comb through massive sets of genomic data to find links between specific genotypes and diseases, screen drug data to identify therapeutic candidates, and identify responders to treatment.

The paradigm of ‘one drug, one target’ is evolving toward an understanding of complex biological systems, which can help predict both the adverse and the therapeutic effects of a drug. At Covance Genomics Lab (CGL), we provide end-to-end solutions to biopharmaceutical clients from sample receipt to quality control, data analysis, and interpretation of biological data. CGL’s Computational Biology group is responsible for ensuring high-quality genomics data through our industry-leading lab informatics, and for serving clients with experiment design, statistical analysis, and interpretation of customer data.

Covance Computational Biology Group Helps Predict Patient Response to RA Agent

Patients with Rheumatoid Arthritis (RA) exhibit substantial variability in both the magnitude and duration of their clinical response to treatment. Despite considerable research, it remains difficult to identify blood-borne biomarkers of compound efficacy and responder characterization, especially ones that can be used to predict the likelihood of clinical response prior to drug exposure.

Tabalumab is a monoclonal antibody that neutralizes membrane-bound and soluble B-cell activating factor (BAFF). It has been shown to reduce the signs and symptoms of RA and may be an effective long-term treatment for patients who have failed conventional treatments. Our Computational Biology group at Covance employed statistical analysis and predictive modeling in combination with pathway analysis to help one of our large pharmaceutical clients determine this compound’s mechanism of action and identify responders.

Through gene expression profiling of whole blood mRNA, obtained from phase 2 clinical trial samples, our Computational Biology Group identified statistically significant PD/efficacy markers of compound effect. These markers clearly demonstrated on-target compound engagement and correlated significantly with the clinical data, demonstrating disease obliteration and efficacy. Pathway analysis of these markers using GeneGo software, as well as gene set enrichment analysis using Gene Ontology, demonstrated that these markers formed a tight network with Tabalumab’s target, BAFF.

As part of this analysis, markers predictive of patient response to the treatment were also identified from the blood gene expression profiles prior to drug exposure (pre-treatment/baseline). These markers will help the client make early decisions and select patients who will likely respond to the treatment based on the blood gene expression data prior to the compound administration.

In addition, CGL and Computational Biology validated these markers via independent technology (qPCR), demonstrating that they perform robustly in an independent set of patients from another clinical trial with a similar compound. Thus, using gene expression data from patient blood, we were able to provide the client with a complete solution to a complex question of biomarker discovery and validation/verification from the clinical samples. Overall, the client can apply these identified biomarkers in new trials of similar compounds, as well as make critical early decisions on patient enrollment.

Methods Used

We obtained whole blood samples (at baseline and post-treatment) from 158 RA subjects who were previously enrolled in a phase 2 randomized, double-blind, placebo-controlled clinical trial and had an inadequate response to methotrexate, an anti-inflammatory agent also used for the treatment of RA. In addition to these 158 RA subjects, we also collected samples from 30 healthy blood donors, which we used to identify disease genes. These genes were later used to monitor disease reversal by the compound and constituted a valuable component in assessing the compound’s efficacy.

Affymetrix U133 Plus 2.0 arrays were used to measure gene expression in the blood from the clinical trial samples. The quality of the signals obtained from the whole blood samples profiled with these arrays was excellent, allowing the detection of robust and reproducible signals after the arrays were pre-processed using the Robust Multi-array Average (RMA) algorithm. The RMA-processed data were then analyzed with linear mixed-effects models to identify the PD/efficacy biomarkers of compound action. These models were carefully selected to reflect the longitudinal, repeated-measures design of the clinical trial and provided significant findings after multiple hypothesis adjustments and false discovery rate (FDR) corrections were applied to the raw p-values of the model.
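To illustrate the FDR correction step mentioned above, here is a minimal Python sketch of the Benjamini-Hochberg procedure, one common way of adjusting raw p-values for multiple testing. The article does not state which FDR method was used in the study, so this choice is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per probe set (not study data).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(raw_p)
# Flag probe sets that pass an assumed 5% FDR threshold.
significant = [q <= 0.05 for q in q_values]
```

In this toy example, two probe sets whose raw p-values fall below 0.05 no longer pass once the adjustment is applied, which is the kind of filtering the correction is meant to provide.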

Logistic regression analysis and various predictive modeling tools such as Support Vector Machines (SVM), as well as other classifiers, were trained and cross-validated in order to select biomarkers and predict the responders and non-responders to the treatment. Identified biomarkers were validated by means of independently performed qPCR in the same set of samples, as well as the samples obtained from the different patients in another clinical trial.
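The cross-validation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-marker midpoint-threshold rule stands in for the logistic regression and SVM classifiers named in the text, and every function name and data value is hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_threshold(values, labels):
    """Midpoint between class means; responders (label 1) assumed higher."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validated_accuracy(values, labels, k):
    """Fit on k-1 folds, predict the held-out fold, pool the accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(values), k):
        held_out = set(test_idx)
        train_v = [v for i, v in enumerate(values) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        threshold = fit_threshold(train_v, train_y)
        for i in test_idx:
            predicted = 1 if values[i] > threshold else 0
            correct += int(predicted == labels[i])
    return correct / len(values)


# Hypothetical baseline expression values and response labels (not study data).
baseline_expression = [7.9, 5.1, 8.3, 5.6, 7.4, 6.0, 8.8, 5.3]
responder = [1, 0, 1, 0, 1, 0, 1, 0]
accuracy = cross_validated_accuracy(baseline_expression, responder, k=4)
```

The point of the design is that the threshold is refit inside every fold, so each sample is scored by a model that never saw it. Real data would call for stratified or repeated folds and a metric such as AUROC rather than raw accuracy.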


The key highlights of our findings were the identification of compound response markers, such as TCL1A, and of the responder/non-responder classification marker CLEC4C. TCL1A was downregulated by the compound in a dose- and visit-dependent manner and was significantly anti-correlated with the clinical endpoint ACR, indicating disease reversal.
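As a rough illustration of what "anti-correlated" means here, the sketch below computes a Pearson correlation coefficient between a marker's expression change and a clinical score. The article does not say which correlation statistic was used, so Pearson is an assumption, and the variable names and numbers are hypothetical rather than trial data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-patient values: log2 change in marker expression
# from baseline, and percent improvement on a clinical score.
expression_change = [-1.8, -1.2, -0.9, -0.3, 0.1]
clinical_improvement = [70, 50, 50, 20, 0]
r = pearson_r(expression_change, clinical_improvement)
```

A value of `r` close to -1 means that patients with the largest drop in expression tend to show the largest clinical improvement. For the repeated-measures design described earlier, a per-visit or model-based analysis would be more appropriate than a single pooled coefficient.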

The significance of this marker could be observed earlier than the final visit of the clinical trial, suggesting that genomic markers can effectively shorten the duration of trials and result in significant commercial benefit.

The CLEC4C-responder/non-responder classification marker demonstrated a statistically significant increase in level of gene expression in the responder patients compared to non-responders. This marker assesses the likelihood of response based on the gene expression measurements from pre-treatment/baseline blood data and constitutes a powerful companion diagnostic utility for clinical trial enrollment and maximization of response. It is also a new and significant discovery that provides a gateway into the field of personalized medicine.

Overall, using mRNA expression profiling and pathway analysis, significant gene expression changes associated with the clinical response to Tabalumab were identified. For our client, these data may provide additional potential targets or biomarkers for future studies in RA. They can also be used to prevent exposure of patients to non-beneficial treatments.


This is just one of several examples of how the Computational Biology group at Covance uses its combined total of more than 50 years of experience in applying mathematical and statistical methods to help our clients solve problems of drug discovery. Our data analysis services cover diverse areas of drug discovery and assay types, including genomic, proteomic, genetic, next-generation sequencing, and clinical data. We routinely work on preclinical in vitro and in vivo compound screening and biomarker identification experiments, design and statistically analyze clinical trials, and perform sophisticated modeling and analysis of diverse data types to support decision making in the clinical and preclinical arenas. We use state-of-the-art tools for statistical analysis and predictive modeling available in SAS, R, and Matlab, and develop our own algorithms in C, C++ and Python to efficiently analyze massive data sets.

In addition to the models and methods applied in the biomarker discovery for Tabalumab, our Computational Biology team developed and tested a new proprietary mathematical methodology for the analysis of multidimensional data resulting from genomic experiments. This methodology uses multiple-marker aggregation techniques to assess the efficacy of compounds when traditional single-marker approaches fail. It alleviates the multiple hypothesis testing burden that comes with one-marker-at-a-time statistical modeling and helps in interpreting the disease reversal effect of the compound. It also provides key insights into responder and non-responder characterization. This method can be used in a wide variety of clinical applications and therapeutic areas.
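The proprietary methodology itself is not described in this article. Purely to illustrate the general idea of marker aggregation, the following Python sketch combines several markers into one composite score by averaging z-scores against a healthy reference group. This is a generic textbook-style approach and an assumption on our part, not the method referred to above; the gene names and values are hypothetical.

```python
import statistics


def composite_score(sample, reference):
    """Mean z-score of a sample's markers relative to a reference group."""
    z_scores = []
    for marker, value in sample.items():
        ref_values = reference[marker]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # sample standard deviation
        z_scores.append((value - mean) / sd)
    return statistics.mean(z_scores)


# Hypothetical expression values (not study data).
healthy = {"GENE_A": [5.0, 5.2, 4.8], "GENE_B": [7.1, 6.9, 7.0]}
patient_baseline = {"GENE_A": 6.0, "GENE_B": 7.6}
patient_treated = {"GENE_A": 5.3, "GENE_B": 7.1}

baseline_score = composite_score(patient_baseline, healthy)
treated_score = composite_score(patient_treated, healthy)
# A treated score closer to 0 suggests movement toward the healthy profile.
```

Because the markers are collapsed into one score per sample, a single hypothesis test on that score can replace one test per marker. A plain mean assumes all markers shift in the same direction with disease; markers that move in opposite directions would need sign alignment first.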

For more information, contact us today.

About the Authors

Sergey Stepaniants, Ph.D., is Head of Computational Biology, Covance Genomics Laboratory. Sergey received his M.S. from the Moscow Institute for Physics and Technology and his Ph.D. from the University of Rhode Island in Theoretical Physics and Applied Mathematics. He began his Computational Biology career as a postdoctoral fellow at the University of Illinois and Beckman Institute for Advanced Studies. Sergey has more than 15 years of experience in genomics and genetics data analysis, focusing on biomarker/target discovery in pre-clinical and clinical experiments. Sergey has also published in numerous peer-reviewed journals.

Anne Ho is a Senior Scientist in the Covance Genomics Laboratory’s Computational Biology Group. Anne has more than 10 years of experience in genomics and genetics data analysis. Before joining Covance, Anne worked for 6 years as a data analyst at Merck and 3 years at Incyte Genomics in their Bioinformatics Department. She holds a BS in Biochemistry from the University of California, Davis.

Acknowledgments: We truly appreciate the significant contributions of, and useful discussions of our work with, CGL’s Sinnathamby Gomathinayagam, Ph.D., Walter Jessen, Ph.D., Bin Li, Ph.D., Anup Madan, Ph.D., and Mark Parrish.

2 thoughts on “Impacting Clinical Outcomes: Computational Analysis of Complex Genomics Data Sets”

  1. Thank you for posting this very interesting summary of biomarker development capabilities at Covance. I am wondering if these data have been/will be published? Our group would like to see additional detail regarding the statistical results, e.g., what is the OOB AUROC for the CLEC4C responder/non-responder model in the test set, was a final model locked and pre-specified prior to validation, and what is the AUROC of the same classifier in the validation set?