Catalogue Search | MBRL
415 results for "Smith, Mike L."
beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types
by Pagès, Hervé; Smith, Mike L.; Lun, Aaron T. L.
in Algorithms, Application programming interface, Bioinformatics
2018
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
Journal Article
Haloquadratum walsbyi: Limited Diversity in a Global Pond
by Schuster, Stephan C.; Klee, Kathrin; Dyall-Smith, Mike L.
in Biochemistry, Bioinformatics, Biology
2011
Haloquadratum walsbyi commonly dominates the microbial flora of hypersaline waters. Its cells are extremely fragile squares requiring >14% (w/v) salt for growth, properties that should limit its dispersal and promote geographical isolation and divergence. To assess this, the genome sequences of two isolates recovered from sites at near-maximum distance on Earth were compared.
Both chromosomes are 3.1 Mb in size, and 84% of each sequence was highly similar to the other (98.6% identity), comprising the core sequence. ORFs of this shared sequence were completely syntenic (conserved in genomic orientation and order), without inversion or rearrangement. Strain-specific insertions/deletions could be precisely mapped, often allowing the genetic events to be inferred. Many inferred deletions were associated with short direct repeats (4-20 bp). Deletion-coupled insertions are frequent, producing different sequences at identical positions. In cases where the inserted and deleted sequences are homologous, this leads to variant genes in a common syntenic background (as already described by others). Cas/CRISPR systems are present in C23(T) but have been lost in HBSQ001 except for a few spacer remnants. Numerous types of mobile genetic elements occur in both strains, most of which appear to be active, and with some specifically targeting others. Strain C23(T) carries two ∼6 kb plasmids that show similarity to halovirus His1 and to sequences near halovirus/plasmid gene clusters commonly found in haloarchaea.
Deletion-coupled insertions show that Hqr. walsbyi evolves by uptake and precise integration of foreign DNA, probably originating from close relatives. Change is also driven by mobile genetic elements but these do not by themselves explain the atypically low gene coding density found in this species. The remarkable genome conservation despite the presence of active systems for genome rearrangement implies both an efficient global dispersal system, and a high selective fitness for this species.
Journal Article
Orchestrating single-cell analysis with Bioconductor
2020
Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.
This Perspective highlights open-source software for single-cell analysis released as part of the Bioconductor project, providing an overview for users and developers.
Journal Article
BeadArray Expression Analysis Using Bioconductor
by Smith, Mike L.; Shi, Wei; Ritchie, Matthew E.
in Biology, Computational Biology - methods, Data processing
2011
Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. The key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments will be covered.
Journal Article
illuminaio: An open source IDAT parsing tool for Illumina microarrays [version 1; peer review: 2 approved]
by Bengtsson, Henrik; Ritchie, Matthew E; Baggerly, Keith A
in Bioinformatics, Genomics, Web Tool
2013
The IDAT file format is used to store BeadArray data from the myriad of genomewide profiling platforms on offer from Illumina Inc. This proprietary format is output directly from the scanner and stores summary intensities for each probe-type on an array in a compact manner. A lack of open source tools to process IDAT files has hampered their uptake by the research community beyond the standard step of using the vendor's software to extract the data they contain in a human readable text format. To fill this void, we have developed the illuminaio package that parses IDAT files from any BeadArray platform, including the decryption of files from Illumina's gene expression arrays. illuminaio provides the first open-source package for this task, and will promote wider uptake of the IDAT format as a standard for sharing Illumina BeadArray data in public databases, in the same way that the CEL file serves as the standard for the Affymetrix platform.
Journal Article
NaCl-saturated brines are thermodynamically moderate, rather than extreme, microbial habitats
by McMullan, Phillip E; Stevenson, Andrew; McMullan, Geoffrey
in Activity recognition, Archaea, Astrobiology
2018
NaCl-saturated brines such as saltern crystalliser ponds, inland salt lakes, deep-sea brines and liquids-of-deliquescence on halite are commonly regarded as a paradigm for the limit of life on Earth. There are, however, other habitats that are thermodynamically more extreme. Typically, NaCl-saturated environments contain all domains of life and perform complete biogeochemical cycling. Despite their reduced water activity, ∼0.755 at 5 M NaCl, some halophiles belonging to the Archaea and Bacteria exhibit optimum growth/metabolism in these brines. Furthermore, the recognised water-activity limit for microbial function, ∼0.585 for some strains of fungi, lies far below 0.755. Other biophysical constraints on the microbial biosphere (temperatures of >121°C; pH > 12; and high chaotropicity; e.g. ethanol at >18.9% w/v (24% v/v) and MgCl2 at >3.03 M) can prevent any cellular metabolism or ecosystem function. By contrast, NaCl-saturated environments contain biomass-dense, metabolically diverse, highly active and complex microbial ecosystems; and this underscores their moderate character. Here, we survey the evidence that NaCl-saturated brines are biologically permissive, fertile habitats that are thermodynamically mid-range rather than extreme. Indeed, were NaCl sufficiently soluble, some halophiles might grow at concentrations of up to 8 M. It may be that the finite solubility of NaCl has stabilised the genetic composition of halophile populations and limited the action of natural selection in driving halophile evolution towards greater xerophilicity. Further implications are considered for the origin(s) of life and other aspects of astrobiology.
Journal Article
Genome-wide quantification of transcription factor binding at single-DNA-molecule resolution using methyl-transferase footprinting
by Zaugg, Judith B.; Smith, Mike L.; Barzaghi, Guido
in 631/114/1314, 631/208/200, 631/337/100/1701
2021
Precise control of gene expression requires the coordinated action of multiple factors at cis-regulatory elements. We recently developed single-molecule footprinting to simultaneously resolve the occupancy of multiple proteins including transcription factors, RNA polymerase II and nucleosomes on single DNA molecules genome-wide. The technique combines the use of cytosine methyltransferases to footprint the genome with bisulfite sequencing to resolve transcription factor binding patterns at cis-regulatory elements. DNA footprinting is performed by incubating permeabilized nuclei with recombinant methyltransferases. Upon DNA extraction, whole-genome or targeted bisulfite libraries are prepared and loaded on Illumina sequencers. The protocol can be completed in 4–5 d in any laboratory with access to high-throughput sequencing. Analysis can be performed in 2 d using a dedicated R package and requires access to a high-performance computing system. Our method can be used to analyze how transcription factors cooperate and antagonize to regulate transcription.
This protocol describes experimental and computational procedures for genome-wide mapping of transcription factor binding at single-molecule resolution using methyl-transferase footprinting.
Journal Article
Identification and correction of previously unreported spatial phenomena using raw Illumina BeadArray data
2010
Background
A key stage for all microarray analyses is the extraction of feature-intensities from an image. If this step goes wrong, then subsequent preprocessing and processing stages will stand little chance of rectifying the matter. Illumina employ random construction of their BeadArrays, making feature-intensity extraction even more important for the Illumina platform than for other technologies. In this paper we show that using raw Illumina data it is possible to identify, control, and perhaps correct for a range of spatial-related phenomena that affect feature-intensity extraction.
Results
We note that feature intensities can be unnaturally high when in the proximity of a number of phenomena relating either to the images themselves or to the layout of the beads on an array. Additionally we note that beads neighbour beads of the same type more often than one might expect, which may cause concern in some models of hybridization. We highlight issues in the identification of a bead's location, and in particular how this both affects and is affected by its intensity. Finally we show that beads can be wrongly identified in the image on either a local or array-wide scale, with obvious implications for data quality.
Conclusions
The image processing issues identified will often pass unnoticed by an analysis of the standard data returned from an experiment. We detail some simple diagnostics that can be implemented to identify problems of this nature, and outline approaches to correcting for such problems. These approaches require access to the raw data from the arrays, not just the summarized data usually returned, making the acquisition of such raw data highly desirable.
Journal Article
Genome Sequence of an Australian Monophasic Salmonella enterica subsp. enterica Typhimurium Isolate (TW-Stm6) Carrying a Large Plasmid with Multiple Antimicrobial Resistance Genes
2017
We report the genome sequence of a monophasic Salmonella enterica subsp. enterica Typhimurium strain (TW-Stm6) isolated in Australia that is similar to epidemic multidrug-resistant strains from Europe and elsewhere. This strain carries additional antibiotic and heavy-metal resistance genes on a large (275-kb) IncHI2 plasmid.
Journal Article
Publisher Correction: Orchestrating single-cell analysis with Bioconductor
by Pagès, Hervé; Waldron, Levi; Geistlinger, Ludwig
in 631/1647/2217, 631/1647/794, Bioinformatics
2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Journal Article