391 result(s) for "Perplexity"
A heuristic approach to determine an appropriate number of topics in topic modeling
Background: Topic modelling is an active research field in machine learning. While mainly used to build models from unstructured textual data, it offers an effective means of data mining where samples represent documents, and different biological endpoints or omics data represent words. Latent Dirichlet Allocation (LDA) is the most commonly used topic modelling method across a wide number of technical fields. However, model development can be arduous and tedious, and requires burdensome and systematic sensitivity studies in order to find the best set of model parameters. Often, time-consuming subjective evaluations are needed to compare models. Currently, research has yielded no easy way to choose the proper number of topics in a model beyond a major iterative approach. Methods and results: Based on analysis of variation of statistical perplexity during topic modelling, a heuristic approach is proposed in this study to estimate the most appropriate number of topics. Specifically, the rate of perplexity change (RPC) as a function of numbers of topics is proposed as a suitable selector. We test the stability and effectiveness of the proposed method for three markedly different types of grounded-truth datasets: Salmonella next generation sequencing, pharmacological side effects, and textual abstracts on computational biology and bioinformatics (TCBB) from PubMed. Conclusion: The proposed RPC-based method is demonstrated to choose the best number of topics in three numerical experiments of widely different data types, and for databases of very different sizes. The work required was markedly less arduous than if full systematic sensitivity studies had been carried out with number of topics as a parameter. We understand that additional investigation is needed to substantiate the method's theoretical basis, and to establish its generalizability in terms of dataset characteristics.
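The abstract above describes selecting the number of topics from the rate of perplexity change (RPC). A minimal sketch of that idea follows. The abstract does not give the exact formula or selection rule, so both are assumptions here: RPC is taken as the absolute perplexity difference divided by the difference in topic counts, and the selector returns the first candidate at which RPC stops decreasing. The function names and the perplexity values in the usage note are hypothetical.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """Return one RPC value per consecutive pair of candidate topic counts.

    Assumed definition: |P_i - P_(i-1)| / (t_i - t_(i-1)).
    """
    if len(topic_counts) != len(perplexities):
        raise ValueError("topic_counts and perplexities must have equal length")
    rpc = []
    for i in range(1, len(topic_counts)):
        delta_topics = topic_counts[i] - topic_counts[i - 1]
        delta_perplexity = perplexities[i] - perplexities[i - 1]
        rpc.append(abs(delta_perplexity / delta_topics))
    return rpc


def pick_topic_count(topic_counts, perplexities):
    """Pick a topic count using an assumed 'first change point' rule.

    rpc[j] describes the step from topic_counts[j] to topic_counts[j + 1].
    The first j with rpc[j] < rpc[j + 1] marks where the RPC curve stops
    decreasing; the candidate at the end of that step is returned.
    """
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    for j in range(len(rpc) - 1):
        if rpc[j] < rpc[j + 1]:
            return topic_counts[j + 1]
    # No change point found: fall back to the largest candidate.
    return topic_counts[-1]


# Hypothetical held-out perplexities for five candidate topic counts.
candidates = [10, 20, 30, 40, 50]
held_out_perplexity = [1000.0, 800.0, 700.0, 690.0, 650.0]
print(rate_of_perplexity_change(candidates, held_out_perplexity))
print(pick_topic_count(candidates, held_out_perplexity))
```

In practice the perplexity values would come from fitting one LDA model per candidate topic count and scoring each on held-out documents; the lists above are placeholders for that step.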
Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study
Artificial intelligence (AI) chatbots have recently gained use in medical practice by health care practitioners. Interestingly, the output of these AI chatbots was found to have varying degrees of hallucination in content and references. Such hallucinations generate doubts about their output and their implementation. The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations. Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference's relevance to prompts' keywords. RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=-0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=-0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex format prompts (β coefficient=0.486; P<.001). The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots. Further, the variations highlight the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed AI chatbots' RHS could contribute to ongoing efforts to enhance AI's general reliability in medical research.
Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures
Managers and researchers alike have long recognized the importance of corporate textual risk disclosures. Yet it is a nontrivial task to discover and quantify variables of interest from unstructured text. In this paper, we develop a variation of the latent Dirichlet allocation topic model and its learning algorithm for simultaneously discovering and quantifying risk types from textual risk disclosures. We conduct comprehensive evaluations in terms of both conventional statistical fit and substantive fit with respect to the quality of discovered information. Experimental results show that our proposed method outperforms all competing methods, and could find more meaningful topics (risk types). By taking advantage of our proposed method for measuring risk types from textual data, we study how risk disclosures in 10-K forms affect the risk perceptions of investors. Different from prior studies, our results provide support for all three competing arguments regarding whether and how risk disclosures affect the risk perceptions of investors, depending on the specific risk types disclosed. We find that around two-thirds of risk types lack informativeness and have no significant influence. Moreover, we find that the informative risk types do not necessarily increase the risk perceptions of investors: the disclosure of three types of systematic and liquidity risks will increase the risk perceptions of investors, whereas the other five types of unsystematic risks will decrease them. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.1930. This paper was accepted by Alok Gupta, special issue on business analytics.
A CiteSpace Analysis of Rural Accounting Research in the Perspective of Digital Rural Governance
In digital rural governance, the effective integration of accounting and rural development is crucial for maximizing rural economic benefits. This paper uses CiteSpace software and LDA modeling to examine the current state and hot topics in rural accounting research from 2018 to 2022. The study was visualized by analyzing valid literature and using perplexity to determine the best effect of topic extraction. The results show that the number of annual publications in this field continues to grow, and the number of universities and institutions publishing relevant papers is also increasing yearly. Regarding research hotspots, the keyword “rural” appears most frequently, reaching 122 times, followed by “financial management” with a frequency of 106 times. The keyword with the highest centrality is “rural revitalization”, with a centrality as high as 0.38. The research keywords are mainly divided into rural revitalization and financial management, with the financial management group being the most dominant. However, the number of studies on rural revitalization has been increasing yearly, and the gap between them has been narrowing. The importance of accounting is growing in promoting rural revitalization and improving rural economic efficiency.
How Useful are Current Chatbots Regarding Urology Patient Information? Comparison of the Ten Most Popular Chatbots’ Responses About Female Urinary Incontinence
This research evaluates the readability and quality of patient information material about female urinary incontinence (fUI) in ten popular artificial intelligence (AI) supported chatbots. We used the most recent versions of 10 widely-used chatbots, including OpenAI’s GPT-4, Claude-3 Sonnet, Grok 1.5, Mistral Large 2, Google PaLM 2, Meta’s Llama 3, HuggingChat v0.8.4, Microsoft’s Copilot, Gemini Advanced, and Perplexity. Prompts were created to generate texts about UI, stress type UI, urge type UI, and mixed type UI. The modified Ensuring Quality Information for Patients (mEQIP) technique and QUEST (Quality Evaluating Scoring Tool) were used to assess quality, and the Average Reading Level Consensus (ARLC), the average of 8 well-known readability formulas, was used to evaluate readability. When comparing the average scores, there were significant differences in the mean mEQIP and QUEST scores across the ten chatbots (p = 0.049 and p = 0.018). Gemini received the highest mean scores for mEQIP and QUEST, whereas Grok had the lowest values. The chatbots exhibited significant differences in mean ARLC, word count, and sentence count (p = 0.047, p = 0.001, and p = 0.001, respectively). For readability, Grok was the easiest to read, while Mistral was highly complex to understand. AI-supported chatbot technology needs to be improved in terms of readability and quality of patient information regarding female UI.
The great detectives: humans versus AI detectors in catching large language model-generated medical writing
Background: The application of artificial intelligence (AI) in academic writing has raised concerns regarding accuracy, ethics, and scientific rigour. Some AI content detectors may not accurately identify AI-generated texts, especially those that have undergone paraphrasing. Therefore, there is a pressing need for efficacious approaches or guidelines to govern AI usage in specific disciplines. Objective: Our study aims to compare the accuracy of mainstream AI content detectors and human reviewers in detecting AI-generated rehabilitation-related articles with or without paraphrasing. Study design: This cross-sectional study purposively chose 50 rehabilitation-related articles from four peer-reviewed journals, and then fabricated another 50 articles using ChatGPT. Specifically, ChatGPT was used to generate the introduction, discussion, and conclusion sections based on the original titles, methods, and results. Wordtune was then used to rephrase the ChatGPT-generated articles. Six common AI content detectors (Originality.ai, Turnitin, ZeroGPT, GPTZero, Content at Scale, and GPT-2 Output Detector) were employed to identify AI content for the original, ChatGPT-generated and AI-rephrased articles. Four human reviewers (two student reviewers and two professorial reviewers) were recruited to differentiate between the original articles and AI-rephrased articles, which were expected to be more difficult to detect. They were instructed to give reasons for their judgements. Results: Originality.ai correctly detected 100% of ChatGPT-generated and AI-rephrased texts. ZeroGPT accurately detected 96% of ChatGPT-generated and 88% of AI-rephrased articles. The area under the receiver operating characteristic curve (AUROC) of ZeroGPT was 0.98 for identifying human-written and AI articles. Turnitin showed a 0% misclassification rate for human-written articles, although it only identified 30% of AI-rephrased articles. Professorial reviewers accurately discriminated at least 96% of AI-rephrased articles, but they misclassified 12% of human-written articles as AI-generated. On average, students only identified 76% of AI-rephrased articles. Reviewers identified AI-rephrased articles based on ‘incoherent content’ (34.36%), followed by ‘grammatical errors’ (20.26%), and ‘insufficient evidence’ (16.15%). Conclusions and relevance: This study directly compared the accuracy of advanced AI detectors and human reviewers in detecting AI-generated medical writing after paraphrasing. Our findings demonstrate that specific detectors and experienced reviewers can accurately identify articles generated by Large Language Models, even after paraphrasing. The rationale employed by our reviewers in their assessments can inform future evaluation strategies for monitoring AI usage in medical education or publications. AI content detectors may be incorporated as an additional screening tool in the peer-review process of academic journals.
Evaluation of three artificial intelligence chatbots for generating clinical hematology multiple choice questions for medical students
The integration of artificial intelligence (AI) into medical education has shown promise in streamlining content creation, yet the reliability and validity of AI-generated assessments remain critical concerns. This study evaluates three AI models (ChatGPT, Perplexity, and DeepSeek) in generating hematology multiple-choice questions (MCQs), focusing on their alignment with clinical guidelines, cognitive complexity, and expert acceptability, to determine their practical utility in medical education. The objective was to quantitatively evaluate and compare the performance of the three models in generating MCQs relevant to hematology, with a focus on content validity, cognitive level alignment, and expert acceptance. In this study, each AI model was prompted to generate 50 MCQs across five key hematology topics, following standardized instructions emphasizing guideline alignment and cognitive diversity. Three hematology experts, blinded to question source, independently rated all 150 MCQs on criteria including accuracy, clinical relevance, clarity, distractor plausibility, and overall quality, using a structured rubric. Scores were averaged per model, and questions were categorized by Bloom’s taxonomy level. Acceptance was defined as a total score ≥ 15 out of 25. DeepSeek achieved the highest scores for accuracy (4.7 ± 0.4), clinical relevance (4.8 ± 0.3), and distractor plausibility (4.7 ± 0.4), with a perfect acceptance rate (100%) and no need for revision. Perplexity and ChatGPT also produced clinically relevant questions but required minor revisions (acceptance rates: 96% and 90%, respectively). All models favored higher-order cognitive questions. Knowledge and comprehension questions were limited across all models. AI models, particularly DeepSeek, can efficiently generate high-quality, clinically relevant hematology MCQs suitable for medical education and assessment. While DeepSeek demonstrated superior reliability and required minimal expert revision, all models underrepresented foundational knowledge questions and lacked autonomous image-based item generation. Hybrid human-AI workflows and targeted prompt engineering are recommended to optimize cognitive coverage and ensure educational rigor.
Advancing deep learning for expressive music composition and performance modeling
The pursuit of expressive and human-like music generation remains a significant challenge in the field of artificial intelligence (AI). While deep learning has advanced AI music composition and transcription, current models often struggle with long-term structural coherence and emotional nuance. This study presents a comparative analysis of three leading deep learning architectures: Long Short-Term Memory (LSTM) networks, Transformer models, and Generative Adversarial Networks (GANs), for AI-generated music composition and transcription using the MAESTRO dataset. Our key innovation lies in the integration of a dual evaluation framework that combines objective metrics (perplexity, harmonic consistency, and rhythmic entropy) with subjective human evaluations via a Mean Opinion Score (MOS) study involving 50 listeners. The Transformer model achieved the best overall performance (perplexity: 2.87, harmonic consistency: 79.4%, MOS: 4.3), indicating its superior ability to produce musically rich and expressive outputs. However, human compositions remained highest in perceptual quality (MOS: 4.8). Our findings provide a benchmarking foundation for future AI music systems and emphasize the need for emotion-aware modeling, real-time human-AI collaboration, and reinforcement learning to bridge the gap between machine-generated and human-performed music.
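The abstract above reports perplexity as one of its objective metrics. As a reminder of what that number means, here is a minimal sketch using the standard definition: the exponential of the average negative log-probability a model assigns to each observed token. The function name and the probability values are hypothetical; how the study tokenized music events is not stated in the abstract and is not assumed here.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model's probability for each token.

    Standard definition: exp(-(1/N) * sum(log p_i)).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must be in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that spreads probability uniformly over
# four possible next events has a perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values mean the model was less "surprised" by the sequence: a perplexity near 3 corresponds roughly to choosing among three equally likely next events at each step.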
Genomic perplexity and the evolution of context-dependent function
The fundamental principle that selection acts on a gene's function often assumes implicitly that this function is fixed and intrinsic. However, empirical evidence from pangenomics, synthetic biology, and GWAS consistently demonstrates that organismal function is highly context-dependent, varying across genomic backgrounds and cellular states, even for core genes. Drawing a conceptual parallel with modern large language models (LLMs), I propose that genomes, like LLMs, do not encode fixed functions but rather “probability distributions” over functional and phenotypic outcomes. This framework draws a conceptual analogy between epistasis and transformer-style “attention mechanisms,” suggesting that genomic context weights the influence of distant genetic elements. I also introduce the concept of “genomic perplexity,” an information-theoretic measure of the statistical unexpectedness and incompatibility of a genetic element within its host context. I demonstrate how perplexity serves as a quantifiable metric for the well-known fitness cost associated with interspecies gene flow (e.g., horizontal gene transfer (HGT) and introgression), where a new gene represents a high-perplexity token. This perspective formalizes long-standing observations of genomic fit and provides a testable framework for predicting the integration potential of accessory genes and directing future research in synthetic biology and evolutionary modeling.