Catalogue Search | MBRL

Modeling unobserved heterogeneity in multistate event history data using frailty and weighted survival approaches

by Tripathy, Abhipsa , Bhattacharjee, Atanu , Vishwakarma, Gajendra K. in 631/67 , 639/705 , Censoring

2025

Conventional survival analysis models typically assume that the hazard function depends solely on the baseline hazard and covariate values, overlooking unobserved factors that influence survival outcomes. In practice, however, unmeasured variables often contribute to heterogeneity among seemingly similar individuals. Frailty models offer an effective approach to account for such unobserved heterogeneity, providing a robust framework for analyzing naturally clustered survival data. This study applies frailty models to multistate event history data, emphasizing their ability to handle unobserved heterogeneity. We introduce individual-specific survival weights to adjust survival times, better reflecting the impact of unmeasured factors. These weighted survival times are critical when data exhibit bias or when standard models fail to fully capture the influence of investigated variables. Through a simulation study, we evaluate the effectiveness and performance of frailty models in a multistate framework, comparing mean, mean squared error (MSE), and bias of regression coefficients with and without frailty. For example, in the simulated dataset for age bias has reduced from -0.01 in unweighted survival time to -0.03 in weighted survival time for transition , similarly for bias has reduced from 0.01 to -0.05. Our findings underscore the importance of addressing unobserved heterogeneity in survival analysis, particularly in multistate models with weighted survival times.
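The simulation metrics named in this abstract (mean, bias, and MSE of a regression coefficient) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bias_and_mse`, the replicate estimates, and the true value 0.5 are all hypothetical.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Summarise replicate estimates of one coefficient from a simulation study.

    Returns (mean estimate, bias, mean squared error), where
    bias = mean estimate - true value and MSE = mean of squared errors.
    """
    estimates = list(estimates)
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return mean_est, bias, mse


# Hypothetical replicate estimates of a coefficient whose true value is 0.5.
toy_estimates = [0.48, 0.52, 0.50, 0.46]
mean_est, bias, mse = bias_and_mse(toy_estimates, true_value=0.5)
print(f"mean={mean_est:.4f} bias={bias:.4f} mse={mse:.6f}")
```

Running the same summary on estimates obtained with and without frailty (or with and without weighted survival times) would give the kind of side-by-side comparison the abstract describes.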

Journal Article

A two-stage joint model approach to handle incomplete time dependent markers in survival data through inverse probability weight and multiple imputation

by Rajbongshi, Bhrigu Kumar , Bhattacharjee, Atanu , Vishwakarma, Gajendra K. in 639/705/531 , 639/705/794 , Accuracy

2025

Joint models for longitudinal and survival data are essential in biomedical research, enabling the simultaneous analysis of biomarker progression and clinical events. These models account for the interdependence between longitudinal and survival outcomes, improving insights into disease progression. However, missing data in longitudinal studies pose challenges, particularly when time-dependent markers contain missing values, leading to biased estimates. This paper proposes a two-stage joint modeling framework integrating multiple imputation and inverse probability weighting. First, a linear mixed-effects model estimates biomarker trajectories, handling missing data using multiple imputation. Second, predicted biomarker values are incorporated into a Cox model, where inverse probability weighting corrects for selection bias in survival estimation. A detailed simulation study compares the performance of the proposed method with other common approaches. Results demonstrate the framework’s effectiveness in handling incomplete time-dependent covariates while providing precise estimates of the relationship between biomarker progression and survival outcomes.

Journal Article

Estimation of Treatment Effect with Missing Observations for Three Arms and Three Periods Crossover Clinical Trials

by Bhattacharjee, Atanu in Artificial Intelligence , Bayesian analysis , Business and Management

2020

Statistical analysis in the presence of missing data is challenging in any study. It has received more attention in recent years for clinical trials. There are several reasons why missing data occur in crossover trials. However, attempts to address missing data in crossover trials are negligible. This manuscript is dedicated to the development of a missing-data handling technique for three-arm, three-period crossover trials. Data obtained from a crossover trial with microarray gene expression values are considered. The gene expression values are treated as outcomes with therapeutic effects. The statistical methodology is explained through Multiple Imputation and a Bayesian approach separately. Further, their performance on the same data is documented. In the Bayesian context, it becomes feasible to estimate the causal effect jointly with imputation. However, we failed to do so jointly through a mixed effect model. We therefore performed Multiple Imputation separately to overcome the missing values in the dataset and then fitted the mixed effect model to explore the causal effect of therapeutic arm on gene expression values.

Journal Article

HER2 borderline is a negative prognostic factor for primary malignant breast cancer

by Bhattacharjee, Atanu , Dikshit, Rajesh , Dutt, Shilpee in Adult , Aged , Aged, 80 and over

2020

Background: HER2 (human epidermal growth factor receptor 2) gene amplification and protein overexpression are important predictive and prognostic markers and a therapeutic target for breast cancer, emphasizing the importance of categorizing patients into HER2-positive and HER2-negative. However, based on immunohistochemistry scores, 2% of patients are neither HER2-positive nor HER2-negative but borderline (HER2B). To make informed treatment decisions for these patients, it is important to know how different this group is from HER2-positive and HER2-negative patients. Methods: We analyzed n = 104,668 breast cancer patient samples from the Surveillance, Epidemiology, and End Results (SEER) database. Survival analysis was performed using the open source R (CRAN project, R version 3.5.0) “survival” package. Hazard ratios with confidence intervals were computed using the coxph function. Results: Of n = 104,668, 2239 (2.13%) patients were HER2-borderline, 87,157 (83.26%) HER2-negative, and 15,272 (14.6%) HER2-positive. Breast cancer as primary malignancy was observed in 84,944 (81.16%) patients. In primary malignant breast cancer (PMBC) patients, the hazard ratio among HER2-negative patients was significantly higher than HER2-positive patient samples (HR = 0.772, 95% CI 0.715–0.833, p < .001), whereas HER2-negative status was not significantly favorable in PMBC-negative patients in HER2-positive (HR = .919, 95% CI 0.797–1.06, p = .248). Most importantly, in PMBC patients, the HR for HER2-borderline was poor in comparison to HER2-negative (HR = 1.354, 95% CI 1.126–1.627, p < .001). Conclusion: This is the first report with a large cohort of patient samples and significant statistical power to demonstrate that HER2-borderline represents a negative prognostic factor for PMBC, thus providing a rationale for controlled clinical trials of HER2-targeted therapies in HER2-borderline patients.
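As a reading aid for the hazard ratios above, the short sketch below shows how a 95% confidence interval relates to a hazard ratio on the log scale. It assumes the reported intervals are standard Wald intervals (the usual Cox model summary), which the abstract does not state. It only does arithmetic on the published numbers and does not refit any model; the helper names `se_from_ci` and `wald_ci` are hypothetical.

```python
from math import exp, log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Back out the standard error of log(HR) from a Wald interval."""
    return (log(upper) - log(lower)) / (2 * z)


def wald_ci(hr, se, z=Z_95):
    """Wald interval for a hazard ratio: exp(log(HR) +/- z * SE)."""
    beta = log(hr)
    return exp(beta - z * se), exp(beta + z * se)


# Reported for HER2-borderline vs HER2-negative in PMBC patients:
# HR = 1.354, 95% CI 1.126 to 1.627.
se = se_from_ci(1.126, 1.627)
ci_low, ci_high = wald_ci(1.354, se)
print(f"SE(log HR)={se:.4f}  CI=({ci_low:.3f}, {ci_high:.3f})")
```

Under that assumption the reconstructed interval agrees with the published one to within rounding, which is a quick consistency check a reader can apply to any reported hazard ratio.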

Journal Article

Enhancing survival risk prediction through imputation and feature selection in high-dimensional protein biomarker data

by Kumar, Neelesh , Bhattacharjee, Atanu , Vishwakarma, Gajendra K. in 631/114 , 631/67 , 692/53

2026

Protein-based molecular biomarkers play an important role in prognostic modeling and risk stratification in precision medicine. However, longitudinal survival studies involving high-dimensional biomarker data are frequently challenged by pervasive missingness and limited sample sizes, which can compromise model stability and interpretability. In this study, we present and evaluate a reproducible analytical pipeline for survival risk prediction that integrates established methods for missing data handling, feature selection, and time-to-event modeling. Missing values are addressed using an unsupervised random forest-based imputation approach that leverages internal covariate structure without incorporating outcome information, thereby reducing the risk of information leakage. Feature dimensionality is subsequently reduced using penalized Cox regression with the least absolute shrinkage and selection operator, followed by refinement and stability assessment using random survival forests to capture nonlinear effects and interactions. The final set of selected biomarkers is examined using univariate and multivariable Cox proportional hazards models to support clinical interpretability and risk stratification. Using a publicly available proteomic dataset from cancer patients, we demonstrate how this sequential modeling strategy can identify stable prognostic biomarkers while highlighting the challenges of overfitting in small-sample, high-dimensional survival settings. The proposed workflow serves as a practical and transparent framework for biomarker-driven survival analysis rather than a new statistical methodology.

Journal Article

Competing risk multistate censored data modeling by propensity score matching method

by Tripathy, Abhipsa , Bhattacharjee, Atanu , Vishwakarma, Gajendra K. in 631/67 , 692/308 , 692/499

2024

The potential contribution of the paper is the use of the propensity score matching method for updating censored observations within the context of a multi-state model featuring two competing risks. The competing risks are modelled using a cause-specific Cox proportional hazards model. The simulation findings demonstrate that updating censored observations tends to reduce bias and mean squared error for all estimated parameters in the cause-specific Cox model. The results for a real chemoradiotherapy dataset are consistent with the simulation results.

Journal Article

jmBIG: enhancing dynamic risk prediction and personalized medicine through joint modeling of longitudinal and survival data in big routinely collected data

by Rajbongshi, Bhrigu Kumar , Bhattacharjee, Atanu , Vishwakarma, Gajendra K. in Algorithms , Automation , Bayes Theorem

2024

We have introduced the R package jmBIG to facilitate the analysis of large healthcare datasets and the development of predictive models. This package provides a comprehensive set of tools and functions specifically designed for the joint modelling of longitudinal and survival data in the context of big data analytics. The jmBIG package offers efficient and scalable implementations of joint modelling algorithms, allowing for integrating large-scale healthcare datasets. By utilizing the capabilities of jmBIG, researchers and analysts can effectively handle the challenges associated with big healthcare data, such as high dimensionality and complex relationships between multiple outcomes. With the support of jmBIG, analysts can seamlessly fit Bayesian joint models, generate predictions, and evaluate the performance of the models. The package incorporates cutting-edge methodologies and harnesses the computational capabilities of parallel computing to accelerate the analysis of large-scale healthcare datasets significantly. In summary, jmBIG empowers researchers to gain deeper insights into disease progression and treatment response, fostering evidence-based decision-making and paving the way for personalized healthcare interventions that can positively impact patient outcomes on a larger scale.

Journal Article

Glutathione S-transferasesP1 AA (105Ile) allele increases oral cancer risk, interacts strongly with c-Jun Kinase and weakly detoxifies areca-nut metabolites

by Boruah, Nabamita , Nongrum, Henry B. , Mukherjee, Souvik in 38/77 , 631/114 , 631/67/68

2020

Glutathione S-transferases (GSTs) protect cellular DNA against oxidative damage. The role of GSTP1 polymorphism (A313G; Ile105Val) as a susceptibility factor in oral cancer was evaluated in a hospital-based case-control study in North-East India, because the habit of chewing raw areca-nut (RAN) with/without tobacco is common in this region. Genetic polymorphism was investigated by genotyping 445 cases and 444 controls. Individuals with the GSTP1 AA-genotype showed an association with oral cancer (OR = 3.1, 95% CI = 2.4–4.2, p = 0.0002). Even after adjusting for age, sex and habit, the AA-genotype was significantly associated with oral cancer (OR = 2.4, 95% CI = 1.7–3.2, p = 0.0001). A protein-protein docking analysis demonstrated that in the GG-genotype the binding geometry between c-Jun Kinase and GSTP1 was disrupted. It was validated by immunohistochemistry in human samples, showing lower c-Jun-phosphorylation and down-regulation of pro-apoptotic genes in normal oral epithelial cells with the AA-genotype. In silico docking revealed that the AA-genotype weakly detoxifies the RAN/tobacco metabolites. In addition, experiments revealed a higher level of 8-Oxo-2′-deoxyguanosine induction in tumor samples with the AA-genotype. Thus, the habit of using RAN/tobacco and the GSTP1 AA-genotype together play a significant role in predisposition to oral cancer risk by showing higher DNA-lesions and lower c-Jun phosphorylation that may inhibit apoptosis.

Journal Article

Clinical biomarker discovery by SWATH-MS based label-free quantitative proteomics: impact of criteria for identification of differentiators and data normalization method

by Chawade, Aakash , Bhattacharjee, Atanu , Govekar, Rukmini in Analysis , Bioinformatics , Bioinformatics and Computational Biology (Methods development to be 10203)

2019

Background: SWATH-MS has emerged as the strategy of choice for biomarker discovery due to the proteome coverage achieved in acquisition and the provision to re-interrogate the data. However, in quantitative analysis using SWATH, each sample from the comparison group is run individually in the mass spectrometer, and the resulting inter-run variation may influence relative quantification and identification of biomarkers. Normalization of data to diminish this variation thereby becomes an essential step in SWATH data processing. In most reported studies, the data normalization methods used are those provided in instrument-based data analysis software or those used for microarray data. This study, for the first time, provides experimental evidence for selecting the normalization method optimal for biomarker identification. Methods: The efficiency of 12 normalization methods to normalize SWATH-MS data was evaluated based on statistical criteria in ‘Normalyzer’, a tool which provides comparative evaluation of normalization by different methods. Further, the suitability of normalized data for biomarker discovery was assessed by evaluating the clustering efficiency of differentiators, identified from the normalized data based on p-value, fold change and both, by hierarchical clustering in Genesis software v.1.8.1. Results: Conventional statistical criteria identified VSN-G as the optimal method for normalization of SWATH data. However, differentiators identified from VSN-G normalized data failed to segregate test and control groups. We thus assessed data normalized by eleven other methods for their ability to yield differentiators which segregate the study groups. Datasets in our study demonstrated that differentiators identified based on p-value from data normalized with Loess-R stratified the study groups optimally. Conclusion: This is the first report of an experimentally tested strategy for SWATH-MS data processing with an emphasis on identification of clinically relevant biomarkers. Normalization of SWATH-MS data by the Loess-R method and identification of differentiators based on p-value were found to be optimal for biomarker discovery in this study. The study also demonstrates the need to base the choice of normalization method on the application of the data.

Journal Article

Gene co-expression network construction and analysis for identification of genetic biomarkers associated with glioblastoma multiforme using topological findings

by Redekar, Seema Sandeep , Bhattacharjee, Atanu , Varma, Satishkumar L in Analysis , Biological markers , Biomarkers

2023

Glioblastoma multiforme (GBM) is one of the most malignant types of central nervous system tumors. GBM patients usually have a poor prognosis. Identification of genes associated with the progression of the disease is essential to explain the mechanisms or improve the prognosis of GBM by catering to targeted therapy. It is crucial to develop a methodology for constructing a biological network and analyzing it to identify potential biomarkers associated with disease progression. Gene expression datasets are obtained from the TCGA data repository to carry out this study. A survival analysis is performed to identify survival-associated genes of GBM patients. A gene co-expression network is constructed based on Pearson correlation between the genes' expression values. Various topological measures along with set operations from graph theory are applied to identify the most influential genes linked with the progression of GBM. Ten key genes are identified as potential biomarkers associated with GBM based on centrality measures applied to the disease network. These genes are SEMA3B, APS, SLC44A2, MARK2, PITPNM2, SFRP1, PRLH, DIP2C, CTSZ, and KRTAP4.2. Higher expression values of two genes, SLC44A2 and KRTAP4.2, are found to be associated with progression, and lower expression values of eight genes, SEMA3B, APS, MARK2, PITPNM2, SFRP1, PRLH, DIP2C, and CTSZ, are linked with the progression of GBM. The proposed methodology employs a network topological approach to identify genetic biomarkers associated with cancer.
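The network construction described in this abstract can be illustrated with a small sketch: connect two genes when the absolute Pearson correlation of their expression profiles passes a cutoff, then rank genes by a centrality measure. Everything specific here is an assumption for illustration only: the 0.8 cutoff, the use of degree centrality (the abstract says only "centrality measures"), the toy expression values, and the placeholder gene names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Link gene pairs whose |Pearson r| meets the (assumed) threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def degree_centrality(expr, edges):
    """Fraction of the other genes each gene is connected to."""
    degree = {gene: 0 for gene in expr}
    for g1, g2, _ in edges:
        degree[g1] += 1
        degree[g2] += 1
    n = len(expr)
    return {gene: d / (n - 1) for gene, d in degree.items()}


# Toy expression profiles (five samples per gene); names are placeholders.
toy_expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "gene_c": [5.0, 4.1, 2.9, 2.2, 1.0],
    "gene_d": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = coexpression_edges(toy_expr)
centrality = degree_centrality(toy_expr, edges)
for gene, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 3))
```

In this toy data three genes are mutually co-expressed and one is isolated, so the isolated gene ranks last. A real analysis would typically use a dedicated graph library and several centrality measures rather than degree alone.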

Journal Article
