Catalogue Search | MBRL
Explore the vast range of titles available.
317 result(s) for "DNA intelligence database"
Genetic differentiation within and between four UK ethnic groups
by Lambert, James A; Foreman, Lindsey A
in African Continental Ancestry Group - genetics, Asia, Bayes Theorem
2000
In previous papers [L.A. Foreman, J.A. Lambert, I.W. Evett, Regional genetic variation in Caucasians, Forensic Sci. Int. 95 (1998) 27–37; L.A. Foreman, Analyses to investigate appropriate measures of differentiation between European Caucasian populations using short tandem repeat (STR) data, FSS Research Report FSS-RR-804 (1999)], we have carried out detailed investigations of the level of regional and national variation in STR characteristics exhibited within white Caucasian populations. The studies described here extend our earlier work to the black African/Caribbean and Asian (Indo-Pakistani) populations of the UK, routinely considered in casework calculations at the Forensic Science Service (FSS). In addition, estimation of allele distributions and database comparisons are carried out for two further populations, i.e. those classified as containing individuals of Oriental and Arabic appearance.
Journal Article
Efficient DNA database laboratory strategy for high through-put STR typing of reference samples
2001
DNA intelligence databases were installed successfully in various countries during the past few years. It is a general trend that laboratories performing STR analysis for DNA databases have to adjust to increased sample through-put, especially when dealing with a high number of reference samples. In contrast to routine forensic casework analysis, where samples of suspects and unknown samples are interpreted with regard to the specific circumstances of the case and are kept distinctly apart from other cases, DNA databases consist of single, primarily unlinked DNA profiles. Problem areas associated with the high number of anonymous DNA profiles are the risk of logistic errors, such as sample mix-up during the laboratory procedure, and the risk of typing errors during manual transcription of data and/or results. Thus, DNA databases clearly require new laboratory strategies to rise to the challenge.
This paper presents an efficient automated laboratory strategy on the platform of a laboratory management information system (LIMS) with the Austrian DNA Intelligence Database as an example. Two goals were tackled in particular: first, data safety, by avoiding both manual interaction during critical laboratory steps (i.e. when DNA is transferred from one tube into another) and errors due to manual transcription of sample information and results; second, efficient sample processing, by automating the laboratory procedure with the help of robotic instruments, thus giving the DNA staff more time to analyze data.
Journal Article
Regional genetic variation in Caucasians
by Lambert, James A; Foreman, Lindsey A; Evett, Ian W
in Bayes Theorem, Bayesian analysis, Bayesian statistics
1998
When evaluating DNA evidence, the necessary calculations are often carried out using databases drawn from broad populations; for example, the Forensic Science Service (FSS) maintains genetic databases for the 3 major racial groups of England and Wales—Caucasian, Afro-Caribbean and Asian (from the Indian subcontinent). The resulting figures may be challenged in court on the premise that they are not based on data from the population of most relevance in the particular case under consideration. One important factor might be the location of the crime. Since the recent establishment of a National DNA Intelligence Database, data have been made available from a wide range of geographical regions in England and Wales. This paper gives details of analyses conducted to measure the differentiation between white Caucasian populations from these regions and from other areas of the UK and abroad using a Bayesian approach.
Journal Article
The U.K. National DNA Database
by Lynch, Michael; McNally, Ruth; Cole, Simon A
in casework profiles, Contemporary History (Post 1945), criminal justice profiles
2009
This chapter focuses on the National DNA Database (NDNAD) of England and Wales, the oldest and largest national DNA intelligence database, the daily operation of which is managed by FSS Ltd., a former executive agency of the Home Office. The DNA profiles at the NDNAD are from three different sources: casework profiles from unknown persons, criminal justice profiles, and elimination or volunteer profiles.
Book Chapter
Accelerating promoter identification and design by deep learning
by Ma, Fuqiang; Huang, Zhongshi; Lin, Yanna
in Artificial intelligence, biochemical pathways, Biosynthesis
2025
Engineered promoters enable precise modulation of recombinant protein expression and metabolic pathways, facilitating natural product biosynthesis and biotechnological applications. Deep learning techniques are transforming the field by enabling accurate promoter identification, strength prediction across species, and de novo design through generative models. Combining generative models with predictive networks enables rapid generation and testing of large numbers of potential promoter designs. This accelerates the development process and enhances the precision of promoter engineering. Database quality, feature extraction, and model architecture are key factors that significantly impact the accuracy and reliability of deep learning models in promoter engineering.
Promoters are DNA sequences that govern the location, direction, and strength of gene transcription, playing a pivotal role in cellular growth and lifespan. Engineered promoters facilitate precise control of recombinant protein expression and metabolic pathway modulation for natural product biosynthesis. Traditional methods such as rational design and directed evolution have established the foundation for promoter engineering, and recent advances in deep learning (DL) have revolutionized the field. This review highlights the application of DL techniques for promoter identification, strength prediction, and de novo design using generative models. We describe how these tools are used and the impact of database quality, feature extraction, and model architecture on predictive accuracy. We discuss challenges and perspectives in developing robust models for promoter engineering.
Journal Article
Geminivirus data warehouse: a database enriched with machine learning approaches
by Vidigal, Pedro M. P.; Cerqueira, Fabio R.; Santos, Anésia A.
in Algorithms, Amplification, Artificial intelligence
2017
Background
The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics.
Results
Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes.
Conclusions
The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.
Journal Article
OncoLnc: linking TCGA survival data to mRNAs, miRNAs, and lncRNAs
2016
OncoLnc is a tool for interactively exploring survival correlations, and for downloading clinical data coupled to expression data for mRNAs, miRNAs, or long noncoding RNAs (lncRNAs). OncoLnc contains survival data for 8,647 patients from 21 cancer studies performed by The Cancer Genome Atlas (TCGA), along with RNA-SEQ expression for mRNAs and miRNAs from TCGA, and lncRNA expression from MiTranscriptome beta. Storing this data gives users the ability to separate patients by gene expression, and then create publication-quality Kaplan-Meier plots or download the data for further analyses. OncoLnc also stores precomputed survival analyses, allowing users to quickly explore survival correlations for up to 21 cancers in a single click. This resource allows researchers studying a specific gene to quickly investigate whether it may have a role in cancer. The supporting data allows researchers studying a specific cancer to identify the mRNAs, miRNAs, and lncRNAs most correlated with survival, and gives researchers looking for a novel lncRNA involved in cancer lists of potential candidates. OncoLnc is available at http://www.oncolnc.org.
Journal Article
TADKB: Family classification and a knowledge base of topologically associating domains
2019
Background
Topologically associating domains (TADs) are considered the structural and functional units of the genome. However, there is a lack of an integrated resource for TADs in the literature where researchers can obtain family classifications and detailed information about TADs.
Results
We built an online knowledge base, TADKB, integrating knowledge for TADs in eleven cell types of human and mouse. For each TAD, TADKB provides the predicted three-dimensional (3D) structures of chromosomes and TADs, and detailed annotations about the protein-coding genes and long non-coding RNAs (lncRNAs) present in each TAD. Besides the 3D chromosomal structures inferred by population Hi-C, the single-cell haplotype-resolved chromosomal 3D structures of 17 GM12878 cells are also integrated in TADKB. A user can submit a query gene/lncRNA ID or sequence to search for the TAD(s) that contain(s) the query gene or lncRNA. We also classified TADs into families. To achieve that, we used the TM-scores between reconstructed 3D structures of TADs as structural similarities and the Pearson’s correlation coefficients between the fold enrichment of chromatin states as functional similarities. All of the TADs in one cell type were clustered based on structural and functional similarities, respectively, using the spectral clustering algorithm with various predefined numbers of clusters. We have compared the overlapping TADs from structural and functional clusters and found that most of the TADs in the functional clusters with depleted chromatin states are clustered into one or two structural clusters. This novel finding indicates a connection between the 3D structures of TADs and their DNA functions in terms of chromatin states.
Conclusion
TADKB is available at http://dna.cs.miami.edu/TADKB/.
Journal Article
Distinct tissue-specific transcriptional regulation revealed by gene regulatory networks in maize
2018
Background
Transcription factors (TFs) are proteins that can bind to DNA sequences and regulate gene expression. Many TFs are master regulators in cells that contribute to tissue-specific and cell-type-specific gene expression patterns in eukaryotes. Maize has been a model organism for over one hundred years, but little is known about its tissue-specific gene regulation through TFs. In this study, we used a network approach to elucidate gene regulatory networks (GRNs) in four tissues (leaf, root, SAM and seed) in maize. We utilized GENIE3, a machine-learning algorithm, combined with a large quantity of RNA-Seq expression data to construct four tissue-specific GRNs. Unlike some other techniques, this approach is not limited by the availability of high-quality Position Weight Matrices (PWMs), and can therefore predict GRNs for over 2000 TFs in maize.
Results
Although many TFs were expressed across multiple tissues, a multi-tiered analysis predicted tissue-specific regulatory functions for many transcription factors. Some well-studied TFs emerged within the four tissue-specific GRNs, and the GRN predictions matched expectations based upon published results for many of these examples. Our GRNs were also validated by ChIP-Seq datasets (KN1, FEA4 and O2). Key TFs were identified for each tissue and matched expectations for key regulators in each tissue, including GO enrichment and identity with known regulatory factors for that tissue. We also found functional modules in each network by clustering analysis with the MCL algorithm.
Conclusions
By combining publicly available genome-wide expression data and network analysis, we can uncover GRNs at tissue-level resolution in maize. Since ChIP-Seq and PWMs are still limited in several model organisms, our study provides a uniform platform that can be adapted to any species with genome-wide expression data to construct GRNs. We also present a publicly available database, maize tissue-specific GRN (mGRN, https://www.bio.fsu.edu/mcginnislab/mgrn/), for easy querying. All source code and data are available at GitHub (https://github.com/timedreamer/maize_tissue-specific_GRN).
Journal Article