Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
197 result(s) for "Shi, Leming"
Sort by:
Assessing and mitigating batch effects in large-scale omics studies
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning
Highly specific Cas9 nucleases derived from SpCas9 are valuable tools for genome editing, but their wide applications are hampered by a lack of knowledge governing guide RNA (gRNA) activity. Here, we perform a genome-scale screen to measure gRNA activity for two highly specific SpCas9 variants (eSpCas9(1.1) and SpCas9-HF1) and wild-type SpCas9 (WT-SpCas9) in human cells, and obtain indel rates of over 50,000 gRNAs for each nuclease, covering ~20,000 genes. We evaluate the contribution of 1,031 features to gRNA activity and develope models for activity prediction. Our data reveals that a combination of RNN with important biological features outperforms other models for activity prediction. We further demonstrate that our model outperforms other popular gRNA design tools. Finally, we develop an online design tool DeepHF for the three Cas9 nucleases. The database, as well as the designer tool, is freely accessible via a web server, http://www.DeepHF.com/ . Application of highly specific Cas9 variants can be restricted by the design of the guide RNA. Here the authors present DeepHF, a gRNA activity prediction tool built from genome-scale screens of 50,000 guides covering 20,000 genes.
A Comprehensive Mouse Transcriptomic BodyMap across 17 Tissues by RNA-seq
The mouse has been widely used as a model organism for studying human diseases and for evaluating drug safety and efficacy. Many diseases and drug effects exhibit tissue specificity that may be reflected by tissue-specific gene-expression profiles. Here we construct a comprehensive mouse transcriptomic BodyMap across 17 tissues of six-weeks old C57BL/6JJcl mice using RNA-seq. We find different expression patterns between protein-coding and non-coding genes. Liver expressed the least complex transcriptomes, that is, the smallest number of genes detected in liver across all 17 tissues, whereas testis and ovary harbor more complex transcriptomes than other tissues. We report a comprehensive list of tissue-specific genes across 17 tissues, along with a list of 4,781 housekeeping genes in mouse. In addition, we propose a list of 27 consistently and highly expressed genes that can be used as reference controls in expression-profiling analysis. Our study provides a unique resource of mouse gene-expression profiles, which is helpful for further biomedical research.
Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method
Background Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms are proposed to facilitate data integration. However, their respective advantages and limitations are not adequately assessed in terms of omics types, the performance metrics, and the application scenarios. Results As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability of accurately clustering cross-batch samples into their own donors. The ratio-based method, i.e., by scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interests. We further provide practical guidelines for implementing the ratio based approach in increasingly large-scale multiomics studies. Conclusions Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
AI-powered omics-based drug pair discovery for pyroptosis therapy targeting triple-negative breast cancer
Due to low success rates and long cycles of traditional drug development, the clinical tendency is to apply omics techniques to reveal patient-level disease characteristics and individualized responses to treatment. However, the heterogeneous form of data and uneven distribution of targets make drug discovery and precision medicine a non-trivial task. This study takes pyroptosis therapy for triple-negative breast cancer (TNBC) as a paradigm and uses data mining of a large TNBC cohort and drug databases to establish a biofactor-regulated neural network for rapidly screening and optimizing compound pyroptosis drug pairs. Subsequently, biomimetic nanococrystals are prepared using the preferred combination of mitoxantrone and gambogic acid for rational drug delivery. The unique mechanism of obtained nanococrystals regulating pyroptosis genes through ribosomal stress and triggering pyroptosis cascade immune effects are revealed in TNBC models. In this work, a target omics-based intelligent compound drug discovery framework explores an innovative drug development paradigm, which repurposes existing drugs and enables precise treatment of refractory diseases. Cancer-targeted drug discovery can be achieved by transcriptomics screening on patients. Here this group reports a drug target screening model built upon triple-negative breast cancer (TNBC) cohort and drug database with the selected drug pair exhibiting effective pyroptosis induction and TNBC tumor growth inhibition.
High quality 3C de novo assembly and annotation of a multidrug resistant ST-111 Pseudomonas aeruginosa genome: Benchmark of hybrid and non-hybrid assemblers
Genotyping methods and genome sequencing are indispensable to reveal genomic structure of bacterial species displaying high level of genome plasticity. However, reconstruction of genome or assembly is not straightforward due to data complexity, including repeats, mobile and accessory genetic elements of bacterial genomes. Moreover, since the solution to this problem is strongly influenced by sequencing technology, bioinformatics pipelines, and selection criteria to assess assemblers, there is no systematic way to select a priori the optimal assembler and parameter settings. To assembly the genome of Pseudomonas aeruginosa strain AG1 (PaeAG1), short reads (Illumina) and long reads (Oxford Nanopore) sequencing data were used in 13 different non-hybrid and hybrid approaches. PaeAG1 is a multiresistant high-risk sequence type 111 (ST-111) clone that was isolated from a Costa Rican hospital and it was the first report of an isolate of P. aeruginosa carrying both blaVIM-2 and blaIMP-18 genes encoding for metallo-β-lactamases (MBL) enzymes. To assess the assemblies, multiple metrics regard to contiguity, correctness and completeness (3C criterion, as we define here) were used for benchmarking the 13 approaches and select a definitive assembly. In addition, annotation was done to identify genes (coding and RNA regions) and to describe the genomic content of PaeAG1. Whereas long reads and hybrid approaches showed better performances in terms of contiguity, higher correctness and completeness metrics were obtained for short read only and hybrid approaches. A manually curated and polished hybrid assembly gave rise to a single circular sequence with 100% of core genes and known regions identified, >98% of reads mapped back, no gaps, and uniform coverage. The strategy followed to obtain this high-quality 3C assembly is detailed in the manuscript and we provide readers with an all-in-one script to replicate our results or to apply it to other troublesome cases. The final 3C assembly revealed that the PaeAG1 genome has 7,190,208 bp, a 65.7% GC content and 6,709 genes (6,620 coding sequences), many of which are included in multiple mobile genomic elements, such as 57 genomic islands, six prophages, and two complete integrons with blaVIM-2 and blaIMP-18 MBL genes. Up to 250 and 60 of the predicted genes are anticipated to play a role in virulence (adherence, quorum sensing and secretion) or antibiotic resistance (β-lactamases, efflux pumps, etc). Altogether, the assembly and annotation of the PaeAG1 genome provide new perspectives to continue studying the genomic diversity and gene content of this important human pathogen.
Genomic and immune profiling of pre-invasive lung adenocarcinoma
Adenocarcinoma in situ and minimally invasive adenocarcinoma are the pre-invasive forms of lung adenocarcinoma. The genomic and immune profiles of these lesions are poorly understood. Here we report exome and transcriptome sequencing of 98 lung adenocarcinoma precursor lesions and 99 invasive adenocarcinomas. We have identified EGFR , RBM10 , BRAF , ERBB2 , TP53 , KRAS , MAP2K1 and MET as significantly mutated genes in the pre/minimally invasive group. Classes of genome alterations that increase in frequency during the progression to malignancy are revealed. These include mutations in TP53 , arm-level copy number alterations, and HLA loss of heterozygosity. Immune infiltration is correlated with copy number alterations of chromosome arm 6p, suggesting a link between arm-level events and the tumor immune environment. The genomic and immune landscape of pre-invasive lung adenocarcinoma is poorly understood. Here, the authors perform exome and transcriptome sequencing on precursor legions and invasive lung adenocarcinomas, identifying recurrently mutated genes in pre/minimally invasive cases, and arm level alteration events linked to immune infiltration.
Prediction of base editor off-targets by deep learning
Due to the tolerance of mismatches between gRNA and targeting sequence, base editors frequently induce unwanted Cas9-dependent off-target mutations. Here, to develop models to predict such off-targets, we design gRNA-off- target pairs for adenine base editors (ABEs) and cytosine base editors (CBEs) and stably integrate them into the human cells. After five days of editing, we obtain valid efficiency datasets of 54,663 and 55,727 off-targets for ABEs and CBEs, respectively. We use the datasets to train deep learning models, resulting in ABEdeepoff and CBEdeepoff, which can predict off-target sites. We use these tools to predict off-targets for a panel of endogenous loci and achieve Spearman correlation values varying from 0.710 to 0.859. Finally, we develop an integrated tool that is freely accessible via an online web server http://www.deephf.com/#/bedeep/bedeepoff . These tools could facilitate minimizing the off-target effects of base editing. Base editors can induce unwanted off-target effects. Here the authors design libraries of gRNA-off-target pairs and perform a screen to obtain editing efficiencies for ABE and CBE: they use the datasets to train DL models (ABEdeepoff and CBEdeepoff) which can predict mutation tolerance at potential off-targets.
A real-world multi-center RNA-seq benchmarking study using the Quartet and MAQC reference materials
Translating RNA-seq into clinical diagnostics requires ensuring the reliability and cross-laboratory consistency of detecting clinically relevant subtle differential expressions, such as those between different disease subtypes or stages. As part of the Quartet project, we present an RNA-seq benchmarking study across 45 laboratories using the Quartet and MAQC reference samples spiked with ERCC controls. Based on multiple types of ‘ground truth’, we systematically assess the real-world RNA-seq performance and investigate the influencing factors involved in 26 experimental processes and 140 bioinformatics pipelines. Here we show greater inter-laboratory variations in detecting subtle differential expressions among the Quartet samples. Experimental factors including mRNA enrichment and strandedness, and each bioinformatics step, emerge as primary sources of variations in gene expression. We underscore the profound influence of experimental execution, and provide best practice recommendations for experimental designs, strategies for filtering low-expression genes, and the optimal gene annotation and analysis pipelines. In summary, this study lays the foundation for developing and quality control of RNA-seq for clinical diagnostic purposes. Here the authors report on an RNA-seq benchmarking study that demonstrates greater inter-lab variations in detecting subtle differential expression. The study reveals the impact of experimental execution, experimental designs, low-expression gene filtering, and analysis tool selection.
Similarities and differences between variants called with human reference genome HG19 or HG38
Background Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. Methods We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. Results The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected). Conclusion A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome.