Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
4,063 result(s) for "Sample size estimation"
scClassify: sample size estimation and multiscale classification of cells using single and multiple reference
2020
Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data.
Synopsis
scClassify is a multiscale classification framework based on ensemble learning and cell type hierarchies, enabling sample size estimation required for accurate cell type classification and joint classification of cells using multiple references.
scClassify performs multiscale cell type classification based on cell type hierarchies constructed from single or multiple reference datasets.
It implements a post‐hoc clustering procedure for discovering novel cell types from cells that are unassigned due to the absence of their types in the reference data.
It enables the estimation of the number of cells required in a reference dataset to accurately discriminate a given cell type in a cell type hierarchy.
Application to large atlas datasets such as Tabula Muris demonstrates its ability to refine cell types and identify cells from sub‐populations.
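Sample size estimation of the kind described in the synopsis is commonly done by fitting a learning curve of classification accuracy against training size and inverting it. The sketch below illustrates that general idea with an inverse power-law curve and a simple grid-search fit; the function names and fitting routine are ours for illustration, not scClassify's actual implementation.

```python
def fit_learning_curve(ns, accs):
    """Fit acc(n) ~ a - b * n**(-c) by brute-force grid search
    (illustrative only; real tools use proper nonlinear least squares)."""
    best = None
    for a in [x / 100 for x in range(80, 101)]:        # plateau accuracy
        for b in [x / 10 for x in range(1, 21)]:       # curve scale
            for c in [x / 100 for x in range(10, 101, 5)]:  # decay rate
                err = sum((acc - (a - b * n ** -c)) ** 2
                          for n, acc in zip(ns, accs))
                if best is None or err < best[0]:
                    best = (err, a, b, c)
    return best[1:]

def n_for_accuracy(a, b, c, target):
    # invert acc = a - b * n**(-c)  ->  n = (b / (a - target))**(1/c)
    return (b / (a - target)) ** (1 / c)
```

Given accuracies measured at a few training sizes, the fitted curve can then be inverted to estimate how many reference cells are needed to reach a target accuracy.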
Journal Article
Sample Size for Assessing Agreement between Two Methods of Measurement by Bland−Altman Method
2016
The Bland–Altman method has been widely used for assessing agreement between two methods of measurement. However, sample size estimation for this method remains an open problem. We propose a new method of sample size estimation for Bland–Altman agreement assessment. According to the Bland–Altman method, the conclusion on agreement is made based on the width of the confidence interval for the LOAs (limits of agreement) in comparison to a predefined clinical agreement limit. Under the theory of statistical inference, formulae for sample size estimation are derived, which depend on the pre-determined levels of α and β, the mean and the standard deviation of differences between the two measurements, and the predefined limits. With this new method, sample sizes are calculated under different parameter settings which occur frequently in method comparison studies, and Monte Carlo simulation is used to obtain the corresponding powers. The simulation results showed that the achieved powers coincided with the pre-determined levels, validating the correctness of the method. This method of sample size estimation can be applied in the Bland–Altman framework to assess agreement between two methods of measurement.
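The logic described in the abstract (grow n until the confidence bound around the limit of agreement fits inside the clinical limit with the desired power) can be sketched as a simple search. This is a normal-theory approximation with hypothetical names, not the paper's exact formulae.

```python
from math import sqrt
from statistics import NormalDist

def bland_altman_n(mean_d, sd_d, delta, alpha=0.05, beta=0.20):
    """Smallest n such that the upper confidence bound of the upper
    limit of agreement stays within the clinical limit delta with
    power 1 - beta (illustrative normal-theory approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided confidence
    z_b = NormalDist().inv_cdf(1 - beta)    # power margin
    for n in range(10, 100000):
        # approximate standard error of a limit of agreement
        se_loa = sd_d * sqrt(1.0 / n + 1.96 ** 2 / (2 * (n - 1)))
        if mean_d + 1.96 * sd_d + (z_a + z_b) * se_loa <= delta:
            return n
    return None
```

For example, with zero mean difference, unit standard deviation, and a clinical limit of 3 units, the search settles in the tens of subjects; tightening delta toward the LOA itself drives n up sharply.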
Journal Article
Blinded continuous monitoring of nuisance parameters in clinical trials
2012
Determination of a clinical trial's size is an important task in the planning of any trial because of the direct implications of the sample size on feasibility, costs and timelines. However, sample size calculations are often subject to substantial uncertainty due to limited prior information on the size of nuisance parameters such as variances or event rates. Continuous monitoring of the nuisance parameter in clinical trials has been proposed as a tool to size trials appropriately. With this approach, the nuisance parameter is continuously monitored during the trial. The trial is stopped when the actual estimate for the nuisance parameter and sample size fulfil a stopping criterion. Continuous monitoring can therefore be viewed as a stochastic process with stopping time. We describe the bias that occurs with unblinded continuous monitoring of the variance in clinical trials by means of a simulation study. Then we propose a procedure for blinded continuous monitoring that does not require breaking the treatment code during the on-going study and show that the procedure does not suffer from the same biases as observed in unblinded monitoring. Results on the performance properties of such designs are given and the designs are compared with blinded re-estimation procedures with a single data look. By means of asymptotic theoretical arguments and finite sample size simulations we find that the variability in sample size is smaller with blinded continuous monitoring than with blinded sample size re-estimation whenever the power for both designs is close to the target value. Repeated sample size re-estimation is in between continuous monitoring and sample size re-estimation in this respect. Furthermore, we present a hypertension trial where blinded sample size re-estimation with a single data look was applied and we investigate the properties of blinded continuous monitoring in this setting. Finally we close with a brief discussion.
Journal Article
Assessing data size requirements for training generalizable sequence-based TCR specificity models via pan-allelic MHC-I point-mutation ligandome evaluation
by Rayment, Isaac; Chaves García-Mascaraque, Sergio; Gorbushin, Nikolai
2025
Rapid identification of T cell receptors (TCRs) that specifically bind patient-unique neoepitopes is a critical challenge for personalized TCR-based therapies in oncology. Due to enormous diversity of both TCR and neoepitope repertoires, a machine learning predictor of TCR-pMHC specificity for personalized therapy must generalize to TCRs and epitopes not seen in the training data. We estimate the necessary size of such training data. We first confirm that published models fail to generalize beyond a single-residue dissimilarity to the epitope training set distribution. We then impute the point-mutation ligandome across the 34 most prevalent human MHC alleles and represent it as a graph based on our established dissimilarity cutoff. By finding the dominating set of this graph, we estimate that between one and 100 million epitopes are required to train a generalizable sequence-based TCR specificity prediction model—1000 times the size of current public data.
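The dominating-set step described above can be illustrated with the standard greedy approximation: repeatedly pick the node that covers the most still-uncovered nodes. This is a generic sketch on a toy adjacency map; the paper's graph construction and solver are not reproduced here.

```python
def greedy_dominating_set(adj):
    """Greedy dominating-set approximation. adj maps each node to the
    set of its neighbours; the returned nodes jointly cover (dominate)
    every node in the graph."""
    uncovered = set(adj)
    dom = []
    while uncovered:
        # gain of v = how many uncovered nodes v and its neighbours hit
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        dom.append(best)
        uncovered -= {best} | adj[best]
    return dom
```

The size of the resulting set is the kind of quantity used to estimate how many epitopes are needed to "cover" the ligandome at a chosen dissimilarity cutoff.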
Journal Article
Sample size estimation for randomised controlled trials with repeated assessment of patient-reported outcomes: what correlation between baseline and follow-up outcomes should we assume?
by Jacques, Richard M.; dos Anjos Henriques-Cadby, Inês Bonacho; Candlish, Jane, in Analysis; Analysis of covariance; Biomedicine
2019
Background
Patient-reported outcome measures (PROMs) are now frequently used in randomised controlled trials (RCTs) as primary endpoints. RCTs are longitudinal, and many have a baseline (PRE) assessment of the outcome and one or more post-randomisation assessments of outcome (POST). With such pre-test post-test RCT designs there are several ways of estimating the sample size and analysing the outcome data: analysis of post-randomisation treatment means (POST); analysis of mean changes from pre- to post-randomisation (CHANGE); analysis of covariance (ANCOVA).
Sample size estimation using the CHANGE and ANCOVA methods requires specification of the correlation between the baseline and follow-up measurements. With the other parameters in the sample size calculation unchanged, an assumed correlation of 0.70 between baseline and follow-up outcomes means that we can halve the required sample size at the study design stage if we use an ANCOVA method rather than a comparison of POST treatment means. So what correlation between baseline and follow-up outcomes should be assumed and used in the sample size calculation? The aim of this paper is to estimate the correlations between baseline and follow-up PROMs in RCTs.
Methods
The Pearson correlation coefficients between the baseline and repeated PROM assessments from 20 RCTs (with 7173 participants at baseline) were calculated and summarised.
Results
The 20 reviewed RCTs had sample sizes, at baseline, ranging from 49 to 2659 participants. The time points for the post-randomisation follow-up assessments ranged from 7 days to 24 months; 464 correlations, between baseline and follow-up, were estimated; the mean correlation was 0.50 (median 0.51; standard deviation 0.15; range − 0.13 to 0.91).
Conclusions
There is a general consistency in the correlations between the repeated PROMs, with the majority being in the range of 0.4 to 0.6. The implications are that we can reduce the sample size in an RCT by 25% if we use an ANCOVA model, with a correlation of 0.50, for the design and analysis. There is a decline in correlation amongst more distant pairs of time points.
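The halving claimed for a correlation of 0.70, and the 25% reduction at 0.50 reported in the conclusions, both follow from the standard design factors for pre-post trials (the multiplier applied to the POST-only sample size). A minimal sketch, with names of our own choosing:

```python
def design_factor(method, r):
    """Multiplier on the POST-only sample size for a pre-post RCT,
    given correlation r between baseline and follow-up:
    POST = 1, CHANGE = 2(1 - r), ANCOVA = 1 - r**2."""
    return {"post": 1.0,
            "change": 2 * (1 - r),
            "ancova": 1 - r ** 2}[method]
```

At r = 0.70 the ANCOVA factor is 0.51 (the sample size is roughly halved), and at the observed mean correlation of 0.50 it is 0.75 (a 25% reduction); note CHANGE only beats POST when r exceeds 0.5.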
Journal Article
Sample size variation in single-time post-dose assessment vs multi-time post-dose assessment [version 1; peer review: awaiting peer review]
2022
Background: Many randomized trials measure a continuous outcome at baseline and again after treatment. For a single continuous post-treatment outcome, the sample size calculation is simple, but if there are assessments at multiple post-treatment time points, these longitudinal data may give more insight when analysed with repeated measures methods. Moreover, if the sample size for longitudinal data is calculated using the single time-point method, it may lead to a larger than required sample size, increasing cost and time.
Methods: In this research, an effort is made to determine the sample size for the repeated measures case and to compare it with the single post-baseline case. Sample sizes were examined under different scenarios for a continuous response variable, and were calculated over a range of correlations under both a mean contrast and a difference contrast. These scenarios were examined under compound symmetry as well as first-order autoregressive (AR(1)) correlation structures for the longitudinal data. Graphical presentations are given for better visualization of the scenarios.
Results: For highly correlated longitudinal data, the multiple time-point sample size derivation method led to a much smaller required sample size than the single time-point calculation method.
Conclusions: This study will help researchers choose the right method for sample size determination, which may reduce the time and cost of carrying out an experiment. Care is needed in choosing the method when the correlation is weak. More complex correlation structures are not studied in this article but can be examined in the same fashion.
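The compound-symmetry case can be illustrated with the textbook formula for comparing the mean of m repeated post-dose measures between two arms: the variance of the per-subject mean is sigma^2 (1 + (m - 1) rho) / m. A sketch under a normal approximation, with parameter names of our own:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, m=1, rho=0.0, alpha=0.05, power=0.80):
    """Per-arm n for a two-arm comparison of the mean of m repeated
    measures under compound symmetry; m=1 recovers the usual
    single time-point formula."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var_of_mean = sigma ** 2 * (1 + (m - 1) * rho) / m
    return ceil(2 * var_of_mean * (z / delta) ** 2)
```

With a standardized effect of 0.5 the single time-point design needs 63 per arm; averaging four post-dose measures at rho = 0.5 cuts this to 40, and the saving shrinks as rho approaches 1, matching the abstract's caution about weak versus strong correlation.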
Journal Article
Estimation of the Number of Scans Required per Hard-to-Clean Location and Establishing the Limit of Quantification of a Partial Least Squares Calibration Model When the FTIR Is Used for Pharmaceutical Cleaning Verification
2022
This study aims to identify two critical components required for pharmaceutical cleaning verification when an FTIR instrument is used: (a) the number of scans required per hard-to-clean location, and (b) the limit of quantification (LOQ) of the FTIR instrument when measuring surface contamination. Current practice in pharmaceutical manufacturing does not require multiple samples, as it is standard to collect a single swab sample from a 25 × 25 cm area at a difficult-to-reach location on the manufacturing equipment. However, since the FTIR scans only a tiny portion of the surface compared with the swab, a sufficient number of samples (data points) is required to ensure that the measurement results are close to the true value with a specified degree of certainty. Similarly, calculating the LOQ for a linear regression would be straightforward, but complexity arises when the experimental data are complex; in this case, it arises from the nature of the measurement and the lack of a defined peak in the pre-processed spectra. This study therefore takes a practical approach to calculating both the sample size and the LOQ.
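Both quantities have simple textbook counterparts that convey the idea: the number of scans follows from a precision requirement on the mean reading, and the conventional LOQ for a linear calibration is the ICH-style rule of thumb 10s/b. These are generic formulas under our own assumptions, not the paper's PLS-specific procedure.

```python
from math import ceil
from statistics import NormalDist

def n_scans(sd, margin, conf=0.95):
    """Scans needed so the mean of repeated FTIR readings falls within
    +/- margin of the true surface value at the given confidence
    (simple normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return ceil((z * sd / margin) ** 2)

def loq_from_calibration(residual_sd, slope):
    """Conventional LOQ rule of thumb for a linear calibration:
    LOQ = 10 * s / b (a PLS model needs more care)."""
    return 10 * residual_sd / slope
```

For instance, a scan-to-scan standard deviation twice the acceptable error margin implies 16 scans per location at 95% confidence.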
Journal Article
Reporting and communication of sample size calculations in adaptive clinical trials: a review of trial protocols and grant applications
by
Dimairo, Munyaradzi
,
Julious, Steven A.
,
Zhang, Qiang
in
Adaptation
,
Adaptive Clinical Trials as Topic - methods
,
Adaptive Clinical Trials as Topic - statistics & numerical data
2024
Background
An adaptive design allows modifying the design based on accumulated data while maintaining trial validity and integrity. The final sample size may be unknown when designing an adaptive trial. It is therefore important to consider what sample size is used in the planning of the study and how that is communicated to add transparency to the understanding of the trial design and facilitate robust planning. In this paper, we reviewed trial protocols and grant applications on the sample size reporting for randomised adaptive trials.
Method
We searched protocols of randomised trials with comparative objectives on ClinicalTrials.gov (01/01/2010 to 31/12/2022). Contemporary eligible grant applications accessed from UK publicly funded researchers were also included. Suitable records of adaptive designs were reviewed, and key information was extracted and descriptively analysed.
Results
We identified 439 records, and 265 trials were eligible. Of these, 164 (61.9%) and 101 (38.1%) were sponsored by industry and public sectors, respectively, with 169 (63.8%) of all trials using a group sequential design, although the trial adaptations used were diverse.
The maximum and minimum sample sizes were the most reported or directly inferred (n = 199, 75.1%). The sample size assuming no adaptation would be triggered was usually set as the estimated target sample size in the protocol. However, of the 152 completed trials, 15 (9.9%) and 33 (21.7%) had their sample size increased or reduced, respectively, triggered by trial adaptations.
The sample size calculation process was generally well reported in most cases (n = 216, 81.5%); however, the justification for the sample size calculation parameters was missing in 116 (43.8%) trials. Less than half gave sufficient information on the study design operating characteristics (n = 119, 44.9%).
Conclusion
Although the reporting of sample sizes varied, the maximum and minimum sample sizes were usually reported. Most trials were planned for an estimated enrolment assuming no adaptation would be triggered, despite the fact that roughly a third of completed trials changed their sample size. The sample size calculation was generally well reported, but the justification of the calculation parameters and the reporting of the statistical behaviour of the adaptive design could still be improved.
Journal Article
Sensei: how many samples to tell a change in cell type abundance?
2022
Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining the sample sizes needed to ascertain changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells required to ascertain such changes between two groups of samples in single-cell studies. Sensei expands the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html.
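The beta-binomial setup can be illustrated by Monte-Carlo simulation: draw each sample's cell-type proportion from a beta distribution (overdispersion rho), count cells, and test the group difference. This is an illustrative stand-in for Sensei's closed-form calculation, with names and a z-test of our own choosing.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def beta_binom_prop(p, rho, n_cells, rng):
    # sample-specific proportion from a Beta, then per-cell draws
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    q = rng.betavariate(a, b)
    return sum(rng.random() < q for _ in range(n_cells)) / n_cells

def simulate_power(n, n_cells, p1, p2, rho=0.01, reps=200,
                   alpha=0.05, seed=0):
    """Monte-Carlo power of a two-sample z-test on per-sample
    cell-type proportions with beta-binomial overdispersion."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        g1 = [beta_binom_prop(p1, rho, n_cells, rng) for _ in range(n)]
        g2 = [beta_binom_prop(p2, rho, n_cells, rng) for _ in range(n)]
        se = sqrt(stdev(g1) ** 2 / n + stdev(g2) ** 2 / n)
        if se > 0 and abs(mean(g1) - mean(g2)) / se > z_crit:
            hits += 1
    return hits / reps
```

Sweeping n until the simulated power crosses the target reproduces, by brute force, the sample-size question Sensei answers analytically.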
Journal Article
Are Flexible Designs Sound?
2006
Flexible designs allow large modifications of a design during an experiment. In particular, the sample size can be modified in response to interim data or external information. A standard flexible methodology combines such design modifications with a weighted test, which guarantees the type I error level. However, this inference violates basic inference principles. In an example with independent N(μ, 1) observations, the test rejects the null hypothesis of μ ≤ 0 while the average of the observations is negative. We conclude that flexible design in its most general form with the corresponding weighted test is not valid. Several possible modifications of the flexible design methodology are discussed with a focus on alternative hypothesis tests.
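The weighted test at issue is the inverse-normal combination of stage-wise statistics with weights fixed at the pre-planned stage sizes, which is what preserves the type I error even after a data-driven change of the second-stage sample size. A minimal sketch:

```python
from math import sqrt

def weighted_z(z1, z2, n1_planned, n2_planned):
    """Inverse-normal combination test for a two-stage flexible design:
    weights come from the PLANNED stage sizes, not the sizes actually
    used, so the null distribution stays standard normal."""
    w1, w2 = sqrt(n1_planned), sqrt(n2_planned)
    return (w1 * z1 + w2 * z2) / sqrt(w1 ** 2 + w2 ** 2)
```

Because the weights ignore the realised second-stage size, observations end up unequally weighted after a redesign, which is exactly how the paper's counterexample produces a rejection of mu <= 0 while the plain average of the data is negative.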
Journal Article