Catalogue Search | MBRL

High-coverage allele-resolved single-cell DNA methylation profiling reveals cell lineage, X-inactivation state, and replication dynamics

by Milliron, Hsiao-yun , Morrison, Jacob , Zhou, Wanding in 13/106 , 13/31 , 45/22

2025

DNA methylation patterns at crucial short sequence features, such as enhancers and promoters, may convey key information about cell lineage and state. The need for high-resolution single-cell DNA methylation profiling has therefore become increasingly apparent. Existing single-cell whole-genome bisulfite sequencing (scWGBS) studies have both methodological and analytical shortcomings. Inefficient library generation and low CpG coverage mostly preclude direct cell-to-cell comparisons and necessitate the use of cluster-based analyses, imputation of methylation states, or averaging of DNA methylation measurements across large genomic bins. Such summarization methods obscure the interpretation of methylation states at individual regulatory elements and limit our ability to discern important cell-to-cell differences. We report an improved scWGBS method, single-cell Deep and Efficient Epigenomic Profiling of methyl-C (scDEEP-mC), which offers efficient generation of high-coverage libraries. scDEEP-mC allows for cell type identification, genome-wide profiling of hemi-methylation, and allele-resolved analysis of X-inactivation epigenetics in single cells. Furthermore, we combine methylation and copy-number data from scDEEP-mC to identify single, actively replicating cells and profile DNA methylation maintenance dynamics during and after DNA replication. These analyses unlock further avenues for exploring DNA methylation regulation and dynamics and illustrate the power of high-complexity, highly efficient scWGBS library construction as facilitated by scDEEP-mC. Here, the authors describe scDEEP-mC, an improved single-cell whole-genome bisulfite sequencing method for complex libraries and deep genomic coverage, and show advanced analyses of allele-specific methylation, replication dynamics, and X-inactivation.

Journal Article

Evaluation of whole-genome DNA methylation sequencing library preparation protocols

by Morrison, Jacob , Zhou, Wanding , Adams, Marie in Animal Genetics and Genomics , Biological products industry , Biomedical and Life Sciences

2021

Background: With sequencing costs dropping rapidly, whole-genome DNA methylation sequencing has grown in popularity, and multiple library preparation protocols now exist. We performed 22 whole-genome DNA methylation sequencing experiments on snap-frozen human samples and extensively benchmarked common library preparation protocols, including three traditional bisulfite-based protocols and a new enzyme-based protocol. In addition, different input DNA quantities were compared for two kits compatible with a reduced starting quantity. We also present bioinformatic analysis pipelines for sequencing data from each of these library types. Results: An assortment of metrics was collected for each kit, including raw read statistics, library quality and uniformity metrics, cytosine retention, and CpG beta value consistency between technical replicates. Overall, the NEBNext Enzymatic Methyl-seq and Swift Accel-NGS Methyl-Seq kits performed quantitatively better than the other two protocols. In addition, the NEB and Swift kits performed well at low-input amounts, validating their utility in applications where DNA is the limiting factor. Conclusions: The NEBNext Enzymatic Methyl-seq kit appeared to be the best option for whole-genome DNA methylation sequencing of high-quality DNA, closely followed by the Swift kit, which potentially works better for degraded samples. Further, a general bioinformatic pipeline is applicable across the four protocols, with the exception of extra trimming needed for the Swift Biosciences Accel-NGS Methyl-Seq protocol to remove the Adaptase sequence.

Journal Article

Impact of BRCA mutations, age, surgical indication, and hormone status on the molecular phenotype of the human Fallopian tube

by Jung, Euihye , Hechmer, Aaron , Morrison, Jacob in 38/39 , 45/91 , 631/208/176/1988

2025

The human Fallopian tube (FT) is an important organ in the female reproductive system and has been implicated as a site of origin for pelvic serous cancers, including high-grade serous tubo-ovarian carcinoma (HGSC). We have generated comprehensive whole-genome bisulfite sequencing, RNA-seq, and proteomic data of over 100 human FTs, with detailed clinical covariate annotations. Our results challenge existing paradigms that extensive epigenetic, transcriptomic and proteomic alterations exist in the FTs from women carrying heterozygous germline BRCA1/2 pathogenic variants. We find minimal differences between BRCA1/2 carriers and non-carriers prior to loss of heterozygosity. Covariates such as age and surgical indication can confound BRCA1/2-related differences reported in the literature, mainly through their impact on cell composition. We systematically document and highlight the degree of variation across normal human FTs, defining five groups capturing major cellular and molecular changes across various reproductive stages, pregnancy, and aging. We are able to associate gene, protein, and epigenetic changes with these and other clinical covariates, but not with heterozygous BRCA1/2 mutation status. This sheds new light on prevention and early detection of tumorigenesis in populations at high risk for ovarian cancer. The human Fallopian tube (FT) is implicated as a site of origin for pelvic serous cancers. Here the authors conduct multi-omics analysis on over 100 FTs. The results challenge the assumption that BRCA1/2 mutation carriers exhibit significant molecular alterations in normal FTs before loss of heterozygosity (LOH) occurs, and suggest that tumorigenesis in BRCA1/2 carriers requires LOH or secondary genetic events rather than haploinsufficiency alone.

Journal Article

Effect of New Samples in the T2K Off-Axis Near Detector for the T2K Oscillation Analysis

by Morrison, Jacob Alexander in Particle physics , Physics

2019

The Tokai-to-Kamioka (T2K) experiment is a long baseline neutrino oscillation experiment. T2K uses a beam of muon neutrinos (neutrino beam mode) or antineutrinos (antineutrino beam mode) produced at the Japan Proton Accelerator Research Complex and directed towards the Super-Kamiokande detector to study neutrino oscillations in two ways. One is the disappearance of muon neutrinos as they oscillate to other flavors of neutrinos, while the other is the appearance of electron neutrinos that have oscillated from muon neutrinos. In addition to the far detector, Super-Kamiokande, a suite of detectors is set close to the neutrino source to probe the beam composition prior to the neutrinos oscillating. Within the neutrino oscillation analysis, uncertainties due to the neutrino beam flux and the cross section of neutrinos serve as the largest sources of error on the oscillation parameters. By including data from the Near Detector at 280 m (ND280), the uncertainties on the flux and cross section can be constrained beyond what the data at the far detector can do on its own. This work describes the near detector maximum likelihood fit and how it is used to constrain uncertainties for the oscillation analysis. For this thesis, new data samples were included in the near detector fit so that the antineutrino beam mode samples would be treated in the same way as the neutrino beam mode samples. The results are consistent with those seen before; however, they also indicate that certain checks should be updated when the new neutrino interaction model is available before fully transitioning to the new samples. Additionally, tests were performed to study the effect of alternative cross section models on the near detector fit. These studies showed that there is not enough freedom in the current cross section model to fully describe any effects on the data if the underlying cross section differed from the current model.

Dissertation

Synthbar: A Lightweight Tool for Adding Synthetic Barcodes to Sequencing Reads

by Morrison, Jacob , Johnson, Benjamin K , Shen, Hui in Bioinformatics

2025

Preparation of single-cell sequencing libraries includes adding nucleotide barcodes to assist with pooling samples or cells together for sequencing. The popularity of droplet-based single-cell protocols has spurred the development of computational tools that expect the read structure of the assay to include a cell barcode (CB). Microwell plate-based protocols, such as the Switching Mechanism At the 5' end of the RNA Transcript (SMART) single-cell RNA sequencing (scRNA-seq) family of methods, typically do not add a CB as part of the library preparation method, as there is usually one cell per well and standard unique dual indices are sufficient for multiplexing. While several tools exist to manipulate and parse varying single-cell read structures, no tool is currently available to easily add synthetic CBs to enable use of computational tooling that expects the presence of a CB, such as STARsolo, zUMIs, and Alevin. Synthbar fills this gap as a lightweight, assay-agnostic tool that can add user-defined CBs and modify read structures.

Journal Article

Unsettled Law: Time to Generate New Approaches?

by Atkinson, David , Morrison, Jacob in Contract law , Ethics , Generative artificial intelligence

2024

We identify several important and unsettled legal questions with profound ethical and societal implications arising from generative artificial intelligence (GenAI), focusing on the characteristics that distinguish it from traditional software and earlier AI models. Our key contribution is formally identifying the issues that are unique to GenAI so scholars, practitioners, and others can conduct more useful investigations and discussions. While established legal frameworks, many originating from the pre-digital era, are currently employed in GenAI litigation, we question their adequacy. We argue that GenAI's unique attributes, including its general-purpose nature, reliance on massive datasets, and potential for both pervasive societal benefits and harms, necessitate a re-evaluation of existing legal paradigms. We explore potential areas for legal and regulatory adaptation, highlighting key issues around copyright, privacy, torts, contract law, criminal law, property law, and the First Amendment. Through an exploration of these multifaceted legal challenges, we aim to stimulate discourse and policy considerations surrounding GenAI, emphasizing a proactive approach to legal and ethical frameworks. While we refrain from advocating specific legal changes, we underscore the need for policymakers to carefully consider the issues raised. We conclude by summarizing key questions across these areas of law in a table for easy reference.

Paper

A Legal Risk Taxonomy for Generative Artificial Intelligence

by Atkinson, David , Morrison, Jacob in Generative artificial intelligence , Litigation , Taxonomy

2024

For the first time, this paper presents a taxonomy of legal risks associated with generative AI (GenAI) by breaking down complex legal concepts to provide a common understanding of potential legal challenges for developing and deploying GenAI models. The methodology is based on (1) examining the legal claims that have been filed in existing lawsuits and (2) evaluating the reasonably foreseeable legal claims that may be filed in future lawsuits. First, we identified 22 lawsuits against prominent GenAI entities and tallied the claims of each lawsuit. From there, we identified seven claims that are cited at least four times across these lawsuits as the most likely claims for future GenAI lawsuits. For each of these seven claims, we describe the elements of the claim (what the plaintiff must prove to prevail) and provide an example of how it may apply to GenAI. Next, we identified 30 other potential claims that we consider to be more speculative, because they have been included in fewer than four lawsuits or have yet to be filed. We further separated those 30 claims into 19 that are most likely to be made in relation to pre-deployment of GenAI models and 11 that are more likely to be made in connection with post-deployment of GenAI models, since the legal risks will vary between entities that create GenAI models and entities that deploy them. For each of these claims, we describe the elements of the claim and the potential remedies that plaintiffs may seek, to help entities determine their legal risks in developing or deploying GenAI. Lastly, we close the paper by noting the novelty of GenAI technology and proposing some applications for the paper's taxonomy in driving further research.

Paper

The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining

by Morrison, Jacob , Strubell, Emma , Smith, Noah A in Ablation , Data centers , Energy consumption

2026

Modern language model development extends far beyond pretraining, yet environmental reporting remains narrowly focused on the cost of training a single final model. In this work, we provide the first detailed breakdown of the environmental impact of a full model development pipeline, from pretraining through supervised fine-tuning, preference optimization, and reinforcement learning, for Olmo 3, a family of 7 billion and 32 billion parameter models in both instruction-following and reasoning variants. We find that reasoning models are 17x more expensive to post-train than their instruction-tuned counterparts in terms of datacenter energy, driven by reinforcement learning rollout generation. Development costs (including experimentation, failed runs, and ablations) account for 82.2% of total compute, a roughly 65% increase over the ~50% reported for pretraining-focused pipelines in prior work. In total, we estimate our model development process consumed ~12.3 GWh of datacenter energy, emitted 4,251 tCO2eq, and consumed 15,887 kL of water, with water consumption driven entirely by power generation infrastructure rather than data center cooling. These costs, which are almost entirely unreported by model developers, are growing rapidly as post-training pipelines become more complex, and must be accounted for in environmental reporting standards and by the research community working to reduce AI's environmental impact.

Paper

Intentionally Unintentional: GenAI Exceptionalism and the First Amendment

by Hwang, Jena D , Atkinson, David , Morrison, Jacob in Generative artificial intelligence , Speech

2025

This paper challenges the assumption that courts should grant First Amendment protections to outputs from large generative AI models, such as GPT-4 and Gemini. We argue that because these models lack intentionality, their outputs do not constitute speech as understood in the context of established legal precedent, so there can be no speech to protect. Furthermore, if the model outputs are not speech, users cannot claim a First Amendment speech right to receive the outputs. We also argue that extending First Amendment rights to AI models would not serve the fundamental purposes of free speech, such as promoting a marketplace of ideas, facilitating self-governance, or fostering self-expression. In fact, granting First Amendment protections to AI models would be detrimental to society because it would hinder the government's ability to regulate these powerful technologies effectively, potentially leading to the unchecked spread of misinformation and other harms.

Paper

Tranquillyzer: A Flexible Neural Network Framework for Structural Annotation and Demultiplexing of Long-Read Transcriptomes

by Jang, H Josh , Morrison, Jacob , Majewski, Mary F in Bioinformatics

2025

Long-read single-cell RNA sequencing using platforms such as Oxford Nanopore Technologies (ONT) enables full-length transcriptome profiling at single-cell resolution. However, high sequencing error rates, diverse library architectures, and increasing dataset scale introduce major challenges for accurately identifying cell barcodes (CBCs) and unique molecular identifiers (UMIs) - key prerequisites for reliable demultiplexing and deduplication, respectively. Existing pipelines rely on hard-coded heuristics or local transition rules that cannot fully capture this broader structural context and often fail to robustly interpret reads with indel-induced shifts, truncated segments, or non-canonical element ordering. We introduce Tranquillyzer (TRANscript QUantification In Long reads-anaLYZER), a flexible, architecture-aware deep learning framework for processing long-read single-cell RNA-seq data. Tranquillyzer employs a hybrid neural network architecture and a global, context-aware design, and enables precise identification of structural elements - even when elements are shifted, partially degraded, or repeated due to sequencing noise or library construction variability. In addition to supporting established single-cell protocols, Tranquillyzer accommodates custom library formats through rapid, one-time model training on user-defined label schemas, typically completed within a few hours on standard GPUs. Additional features such as scalability across large datasets and comprehensive visualization capabilities further position Tranquillyzer as a flexible and scalable framework for processing long-read single-cell transcriptomic datasets.

Journal Article
