57 result(s) for "Wu, Leihong"
Benchmarking bias in embeddings of healthcare AI models: using SD-WEAT for detection and measurement across sensitive populations
Background: Artificial intelligence (AI) has been shown to exhibit and perpetuate human biases; recent research efforts have focused on measuring bias within the input embeddings of AI language models, especially with non-binary classifications that are common in medicine and healthcare scenarios. For instance, ethnicity-linked terms might include categories such as Asian, Black, Hispanic, and White, complicating the definition of – traditionally binary – attribute groups. In this study, we aimed to develop a new framework to detect and measure inherent medical biases based on SD-WEAT (Standard Deviation - Word Embedding Association Test). Compared to its predecessor, WEAT, SD-WEAT was able to measure bias among multi-level attribute groups common in the field of medicine, such as age, race, and region. Methods: We constructed a collection of medicine-based benchmarks that can be used to detect and measure biases among sex, ethnicities, and medical conditions. Then, we evaluated a collection of language models, including GloVe, BERT, LegalBERT, BioBERT, GPT-2, and BioGPT, and determined which had potential undesirable or desirable healthcare biases. Results: With the presented framework, we were able to detect and measure a significant presence of bias among gender-linked (P < 0.01) and ethnicity-linked (P < 0.01) medical conditions for a biomedicine-focused language model (e.g., BioBERT) compared to general BERT models. In addition, we demonstrated that SD-WEAT was capable of simultaneously handling multiple attribute groups, detecting and measuring bias among a collection of ethnicity-linked medical conditions and multiple ethnic/racial groups. Conclusions: To conclude, we presented an AI bias measurement framework, based on SD-WEAT. This framework provided a promising approach to detect and measure biases in language models that have been applied in biomedical/healthcare text analysis.
A Network Pharmacology Study of Chinese Medicine QiShenYiQi to Reveal Its Underlying Multi-Compound, Multi-Target, Multi-Pathway Mode of Action
Chinese medicine is a complex system guided by traditional Chinese medicine (TCM) theories, which has proven to be especially effective in treating chronic and complex diseases. However, the underlying modes of action (MOA) are not always systematically investigated. Herein, a systematic study was designed to elucidate the multi-compound, multi-target and multi-pathway MOA of a Chinese medicine, QiShenYiQi (QSYQ), on myocardial infarction. QSYQ is composed of Astragalus membranaceus (Huangqi), Salvia miltiorrhiza (Danshen), Panax notoginseng (Sanqi), and Dalbergia odorifera (Jiangxiang). Male Sprague Dawley rats in a myocardial infarction model were administered QSYQ intragastrically for 7 days, while the control group was not treated. The differentially expressed genes (DEGs) were identified from the myocardial infarction rat model treated with QSYQ, followed by constructing a cardiovascular disease (CVD)-related multilevel compound-target-pathway network connecting main compounds to those DEGs supported by literature evidence and the pathways that are functionally enriched in ArrayTrack. Fifty-five potential targets of QSYQ were identified, of which 14 were confirmed in the CVD-related literature with supporting experimental evidence. Furthermore, three sesquiterpene components of QSYQ, Trans-nerolidol, (3S,6S,7R)-3,7,11-trimethyl-3,6-epoxy-1,10-dodecadien-7-ol and (3S,6R,7R)-3,7,11-trimethyl-3,6-epoxy-1,10-dodecadien-7-ol from Dalbergia odorifera T. Chen, were validated experimentally in this study. Their anti-inflammatory effects and potential targets including extracellular signal-regulated kinase-1/2, peroxisome proliferator-activated receptor-gamma and heme oxygenase-1 were identified. Finally, through a three-level compound-target-pathway network with experimental analysis, our study depicts a complex MOA of QSYQ on myocardial infarction.
Study of serious adverse drug reactions using FDA-approved drug labeling and MedDRA
Background: Adverse Drug Reactions (ADRs) are of great public health concern. FDA-approved drug labeling summarizes ADRs of a drug product mainly in three sections, i.e., Boxed Warning (BW), Warnings and Precautions (WP), and Adverse Reactions (AR), where the severity of ADRs is intended to decrease in the order of BW > WP > AR. Several reported studies have extracted ADRs from labeling documents, but most, if not all, did not discriminate the severity of the ADRs by the different labeling sections. Such a practice could overstate or underestimate the impact of certain ADRs on public health. In this study, we applied the Medical Dictionary for Regulatory Activities (MedDRA) to drug labeling and systematically analyzed and compared the ADRs from the three labeling sections with a specific emphasis on analyzing serious ADRs presented in BW, which are of greatest drug safety concern. Results: This study investigated New Drug Application (NDA) labeling documents for 1164 single-ingredient drugs using Oracle Text search to extract MedDRA terms. We found that only a small portion of MedDRA Preferred Terms (PTs), 3819 out of 21,920 or 17.42%, were observed in the whole set of documents. In detail, 466/3819 (12.0%) PTs were in BW, 2023/3819 (53.0%) were in WP, and 2961/3819 (77.5%) were in AR sections. We also found a higher overlap of the top 20 occurring BW PTs with WP sections compared to AR sections. Within the MedDRA System Organ Class levels, serious ADRs (sADRs) from BW were prevalent in Nervous System disorders and Vascular disorders. A Hierarchical Cluster Analysis (HCA) revealed that drugs within the same therapeutic category shared the same ADR patterns in BW (e.g., the nervous system drug class is highly associated with drug abuse terms such as dependence, substance abuse, and respiratory depression). Conclusions: This study demonstrated that combining MedDRA standard terminologies with data mining techniques facilitated computer-aided ADR analysis of drug labeling. We also highlighted the importance of labeling sections that differ in seriousness and application in drug safety. Using sADRs primarily related to BW sections, we illustrated a prototype approach for computer-aided ADR monitoring and studies which can be applied to other public health documents.
HetEnc: a deep learning predictive model for multi-type biological dataset
Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. However, effective integration of multi-platform data into the solution of a single- or multi-task classification problem is critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach, for information domain separation. Results: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks to represent the original gene expression data using high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained using these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although we used multi-platform data in feature abstraction and model training, HetEnc does not need multi-platform data for prediction, enabling a broader application of the trained model by reducing the cost of gene expression profiling for new samples to a single platform. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.
Enhancing Bias Assessment for Complex Term Groups in Language Embedding Models: Quantitative Comparison of Methods
Artificial intelligence (AI) is rapidly being adopted to build products and aid in the decision-making process across industries. However, AI systems have been shown to exhibit and even amplify biases, causing a growing concern among people worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary. In natural language processing applications, the word embedding association test (WEAT) is a popular method of measuring bias in input embeddings, a common area of measured bias in AI. However, certain limitations of the WEAT have been identified (ie, its nonrobust measure of bias and its reliance on predefined and limited groups of words or sentences), which may lead to inadequate measurements and evaluations of bias. Thus, this study takes a new approach to modifying this popular measure of bias, with a focus on making it more robust and applicable in other domains. In this study, we introduce the SD-WEAT, which is a modified version of the WEAT that uses the SD of multiple permutations of the WEATs to calculate bias in input embeddings. With the SD-WEAT, we evaluated the biases and stability of several language embedding models, including Global Vectors for Word Representation (GloVe), Word2Vec, and bidirectional encoder representations from transformers (BERT). This method produces results comparable to those of the WEAT, with strong correlations between the methods' bias scores or effect sizes (r=0.786) and P values (r=0.776), while addressing some of its largest limitations. More specifically, the SD-WEAT is more accessible, as it removes the need to predefine attribute groups, and because the SD-WEAT measures bias over multiple runs rather than one, it reduces the impact of outliers and sample size. Furthermore, the SD-WEAT was found to be more consistent and reliable than its predecessor. Thus, the SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.
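The abstract above describes the SD-WEAT only as taking the SD of multiple permutations of the WEAT. A minimal Python sketch of that idea follows. The WEAT effect size uses the commonly cited form (difference of mean associations divided by the standard deviation of all associations). The resampling scheme in `sd_over_random_splits` (randomly splitting one pooled attribute list into two halves on each run) is an assumption made for illustration, not the authors' published procedure, and all function names are hypothetical.

```python
import math
import random
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def association(w, A, B):
    """Mean similarity of w to attribute set A minus its mean similarity to B."""
    to_a = statistics.mean([cosine(w, a) for a in A])
    to_b = statistics.mean([cosine(w, b) for b in B])
    return to_a - to_b


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Sample standard deviation over all target words (a common convention).
    return (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)


def sd_over_random_splits(X, Y, attributes, n_runs=100, seed=0):
    """ASSUMPTION: split the pooled attribute vectors into two random halves
    on each run, compute the WEAT effect size, and return the SD of the
    resulting effect sizes. This is an illustrative guess at the scheme."""
    rng = random.Random(seed)
    half = len(attributes) // 2
    effects = []
    for _ in range(n_runs):
        shuffled = list(attributes)
        rng.shuffle(shuffled)
        effects.append(weat_effect_size(X, Y, shuffled[:half], shuffled[half:]))
    return statistics.stdev(effects)


# Toy 2-D "embeddings" (made-up values, not real model output).
X = [[1.0, 0.0], [0.9, 0.2]]
Y = [[0.0, 1.0], [0.3, 0.9]]
pooled = [[1.0, 0.1], [0.8, 0.3], [0.1, 1.0], [0.2, 0.7]]
print(sd_over_random_splits(X, Y, pooled, n_runs=50))
```

Fixing the seed makes repeated runs reproducible, which is convenient when comparing embedding models on the same word lists.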
Optimized imaging methods for species-level identification of food-contaminating beetles
Identifying the exact species of pantry beetle responsible for food contamination is imperative in assessing the risks associated with contamination scenarios. Each beetle species is known to have unique patterns on its hardened forewings (known as elytra) through which it can be identified. Currently, this is done through manual microanalysis of the insect or its fragments in contaminated food samples. We envision that the use of automated pattern analysis would expedite and scale up the identification process. However, such automation would require images to be captured in a consistent manner, thereby enabling the creation of large repositories of high-quality images. Presently, there is no standard imaging technique for capturing images of beetle elytra, which consequently means there is no standard method of beetle species identification through elytral pattern analysis. This deficiency inspired us to optimize and standardize imaging methods, especially for food-contaminating beetles. For this endeavor, we chose multiple species of beetles belonging to different families or genera that have near-identical elytral patterns, and thus are difficult to identify correctly at the species level. Our optimized imaging method provides enhanced images such that the elytral patterns of individual species could easily be distinguished from each other through visual observation. We believe such standardization is critical in developing automated species identification of pantry beetles and/or other insects. This eventually may lead to improved taxonomical classification, allowing for better management of food contamination and ecological conservation.
DLI-IT: a deep learning approach to drug label identification through image and text embedding
Background: Drug labels, or package inserts, play a significant role in all operations from production through drug distribution channels to the end consumer. An image of the label, also called the Display Panel, could be used to identify illegal, illicit, unapproved, and potentially dangerous drugs. Due to the time-consuming process and high labor cost of investigation, an artificial intelligence-based deep learning model is necessary for fast and accurate identification of the drugs. Methods: In addition to image-based identification technology, we take advantage of the rich text information on the pharmaceutical package insert in drug label images. In this study, we developed the Drug Label Identification through Image and Text embedding model (DLI-IT) to model text-based patterns of historical data for detection of suspicious drugs. In DLI-IT, we first trained a Connectionist Text Proposal Network (CTPN) to crop the raw image into sub-images based on the text. The texts from the cropped sub-images are recognized independently through the Tesseract OCR Engine and combined as one document for each raw image. Finally, we applied universal sentence embedding to transform these documents into vectors and find the most similar reference images to the test image through cosine similarity. Results: We trained the DLI-IT model on 1749 opioid and 2365 non-opioid drug label images. The model was then tested on 300 external opioid drug label images; the results demonstrated that our model achieves up to 88% precision in drug label identification, outperforming previous image-based or text-based identification methods by up to 35%. Conclusion: To conclude, by combining Image and Text embedding analysis under a deep learning framework, our DLI-IT approach achieved a competitive performance in advancing drug label identification.
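The final retrieval step described above (ranking reference images by the cosine similarity of their document vectors to the test document vector) can be sketched as follows. This is a minimal illustration only: it assumes the text detection, OCR, and sentence-embedding stages have already produced one vector per label, and the function names, label names, and vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(query_vec, reference_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first.

    reference_vecs maps a reference label name to its document vector.
    """
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in reference_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Made-up 3-D vectors standing in for sentence embeddings of OCR text.
references = {
    "label_a": [1.0, 0.0, 0.0],
    "label_b": [0.0, 1.0, 0.0],
    "label_c": [0.9, 0.1, 0.0],
}
print(most_similar([1.0, 0.0, 0.0], references, top_k=2))
```

A linear scan like this is adequate for a few thousand reference labels; a larger reference set would likely call for an approximate nearest-neighbor index instead.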
Long noncoding RNA LINC00844-mediated molecular network regulates expression of drug metabolizing enzymes and nuclear receptors in human liver cells
Noncoding RNAs, such as long noncoding RNAs (lncRNAs) and microRNAs (miRNAs), regulate gene expression in many physiological and pathological processes, including drug metabolism. Drug metabolizing enzymes (DMEs) are critical components in drug-induced liver toxicity. In this study, we used human hepatic HepaRG cells treated with 5 or 10 mM acetaminophen (APAP) as a model system and identified LINC00844 as a toxicity-responsive lncRNA. We analyzed the expression profiles of LINC00844 in different human tissues. In addition, we examined the correlations between the levels of LINC00844 and those of key DMEs and nuclear receptors (NRs) for APAP metabolism in humans. Our results showed that lncRNA LINC00844 is enriched in the liver and its expression correlates positively with mRNA levels of CYP3A4, CYP2E1, SULT2A1, pregnane X receptor (PXR), and hepatocyte nuclear factor (HNF) 4α. We demonstrated that LINC00844 regulates the expression of these five genes in HepaRG cells using gain- and loss-of-function assays. Further, we discovered that LINC00844 is localized predominantly in the cytoplasm and acts as an hsa-miR-486-5p sponge, via direct binding, to protect SULT2A1 from miRNA-mediated gene silencing. Our data also demonstrated a functional interaction between LINC00844 and hsa-miR-486-5p in regulating DME and NR expression in HepaRG cells and primary human hepatocytes. We depicted a LINC00844-mediated regulatory network that involves miRNA and NRs and influences DME expression in response to APAP toxicity.
Transcriptome analysis reveals lung-specific miRNAs associated with impaired mucociliary clearance induced by cigarette smoke in an in vitro human airway tissue model
Exposure to cigarette smoke (CS) is strongly associated with impaired mucociliary clearance (MCC), which has been implicated in the pathogenesis of CS-induced respiratory diseases, such as chronic obstructive pulmonary disease (COPD). In this study, we aimed to identify microRNAs (miRNAs) that are associated with impaired MCC caused by CS in an in vitro human air–liquid-interface (ALI) airway tissue model. ALI cultures were exposed to CS (diluted with 0.5 L/min, 1.0 L/min, and 4.0 L/min of clean air) from smoking five 3R4F University of Kentucky reference cigarettes under the International Organization for Standardization (ISO) machine smoking regimen, every other day for 1 week (a total of 3 days, 40 min/day). Transcriptome analyses of ALI cultures exposed to the high concentration of CS identified 5090 differentially expressed genes and 551 differentially expressed miRNAs after the third exposure. Genes involved in ciliary function and ciliogenesis were significantly perturbed by repeated CS exposures, leading to changes in cilia beating frequency and ciliary protein expression. In particular, a time-dependent decrease in the expression of miR-449a, a conserved miRNA highly enriched in ciliated airway epithelia and implicated in motile ciliogenesis, was observed in CS-exposed cultures. Similar alterations in miR-449a have been reported in smokers with COPD. Network analysis further indicates that downregulation of miR-449a by CS may derepress cell-cycle proteins, which, in turn, interferes with ciliogenesis. Investigating the effects of CS on the transcriptome profile in human ALI cultures may provide not only mechanistic insights but also potential early biomarkers for CS exposure and harm.
Technical advance in targeted NGS analysis enables identification of lung cancer risk-associated low frequency TP53, PIK3CA, and BRAF mutations in airway epithelial cells
Background: Standardized Nucleic Acid Quantification for SEQuencing (SNAQ-SEQ) is a novel method that utilizes synthetic DNA internal standards spiked into each sample prior to next generation sequencing (NGS) library preparation. This method was applied to analysis of normal-appearing airway epithelial cells (AEC) obtained by bronchoscopy in an effort to define a somatic mutation field effect associated with lung cancer risk. There is a need for biomarkers that reliably detect those at highest lung cancer risk, thereby enabling more effective screening by annual low dose CT. The purpose of this study was to test the hypothesis that lung cancer risk is characterized by increased prevalence of low variant allele frequency (VAF) somatic mutations in lung cancer driver genes in AEC. Methods: Synthetic DNA internal standards (IS) were prepared for 11 lung cancer driver genes and mixed with each AEC genomic (g) DNA specimen prior to competitive multiplex PCR amplicon NGS library preparation. A custom Perl script was developed to separate IS reads and respective specimen gDNA reads from each target into separate files for parallel variant frequency analysis. This approach identified nucleotide-specific sequencing error and enabled reliable detection of specimen mutations with VAF as low as 5 × 10⁻⁴ (0.05%). This method was applied in a retrospective case-control study of AEC specimens collected by bronchoscopic brush biopsy from the normal airways of 19 subjects, including eleven lung cancer cases and eight non-cancer controls, and the association of lung cancer risk with AEC driver gene mutations was tested. Results: TP53 mutations with 0.05–1.0% VAF were more prevalent (p < 0.05) and also enriched for tobacco smoke and age-associated mutation signatures in normal AEC from lung cancer cases compared to non-cancer controls matched for smoking and age. Further, PIK3CA and BRAF mutations in this VAF range were identified in AEC from cases but not controls. Conclusions: Application of SNAQ-SEQ to measure mutations in the 0.05–1.0% VAF range enabled identification of an AEC somatic mutation field of injury associated with lung cancer risk. A biomarker comprising TP53, PIK3CA, and BRAF somatic mutations may better stratify individuals for optimal lung cancer screening and prevention outcomes.
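For readers unfamiliar with the VAF figures quoted above, the following short Python sketch shows the arithmetic: VAF is the fraction of reads at a position that carry the variant, so 5 variant reads out of 10,000 correspond to 5 × 10⁻⁴, or 0.05%. The function names and read counts are hypothetical, and the sketch does not reproduce the internal-standard error correction that SNAQ-SEQ itself performs.

```python
def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def in_low_vaf_range(vaf, low=5e-4, high=1e-2):
    """True if vaf lies in the 0.05%-1.0% range discussed in the abstract."""
    return low <= vaf <= high


# Hypothetical read counts, for illustration only.
vaf = variant_allele_frequency(variant_reads=6, total_reads=10_000)
print(f"VAF = {vaf:.4%}, in low-VAF range: {in_low_vaf_range(vaf)}")
```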