65 results for "Zhou, Xuanhe"
DB-GPT: Large Language Model Meets Database
Large language models (LLMs) have shown superior performance in various areas, and they have the potential to revolutionize data management by serving as the "brain" of next-generation database systems. However, several challenges arise in utilizing LLMs to optimize databases. First, it is challenging to provide appropriate prompts (e.g., instructions and demonstration examples) that enable LLMs to understand database optimization problems. Second, LLMs capture only the logical characteristics of a database (e.g., SQL semantics) and are unaware of its physical characteristics (e.g., data distributions), so fine-tuning is required for them to capture both. Third, LLMs are not well trained for databases with strict constraints (e.g., query plan equivalence) and privacy-preserving requirements, and it is challenging to train database-specific LLMs while ensuring database privacy. To overcome these challenges, this vision paper proposes an LLM-based database framework (DB-GPT) that includes automatic prompt generation, DB-specific model fine-tuning, and DB-specific model design and pre-training. Preliminary experiments show that DB-GPT achieves relatively good performance on database tasks such as query rewrite and index tuning. The source code and datasets are available at github.com/TsinghuaDatabaseGroup/DB-GPT.
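The "automatic prompt generation" component described above combines an instruction with demonstration examples so the LLM understands the optimization task. A minimal sketch of that few-shot prompt assembly for query rewriting, assuming a simple template (the function name and format are illustrative, not the paper's API):

```python
# Sketch of few-shot prompt assembly for an LLM-based query rewriter,
# in the spirit of DB-GPT's automatic prompt generation component.
# Template and helper names are illustrative, not the paper's API.

def build_rewrite_prompt(query, demonstrations):
    """Combine a task instruction with (before, after) demonstration pairs."""
    lines = [
        "You are a database optimizer. Rewrite the SQL query so that it",
        "returns the same result but executes more efficiently.",
        "",
    ]
    for before, after in demonstrations:
        lines.append(f"Original: {before}")
        lines.append(f"Rewritten: {after}")
        lines.append("")
    lines.append(f"Original: {query}")
    lines.append("Rewritten:")
    return "\n".join(lines)

demos = [("SELECT * FROM t WHERE id IN (SELECT id FROM s)",
          "SELECT t.* FROM t JOIN s ON t.id = s.id")]
prompt = build_rewrite_prompt("SELECT * FROM a WHERE x IN (SELECT x FROM b)", demos)
```

The prompt ends with an open "Rewritten:" slot, so the LLM's completion is the rewritten query itself.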
The Atypical Antipsychotic Agent, Clozapine, Protects Against Corticosterone-Induced Death of PC12 Cells by Regulating the Akt/FoxO3a Signaling Pathway
Schizophrenia is one of the most severe psychiatric disorders. Increasing evidence indicates that neurodegeneration is a component of schizophrenia pathology and that some atypical antipsychotics are neuroprotective, successfully slowing the progressive morphological brain changes. As an antipsychotic agent, clozapine has superior and unique effects, but the intracellular signaling pathways that mediate its action remain to be elucidated. The phosphatidylinositol-3-kinase/protein kinase B/Forkhead box O3 (PI3K/Akt/FoxO3a) pathway is crucial for neuronal survival; however, little information is available regarding its role in clozapine's effects. In the present study, we investigated the protective effect of clozapine on PC12 cells against corticosterone toxicity. Our results showed that corticosterone decreases the phosphorylation of Akt and FoxO3a, leading to the nuclear localization of FoxO3a and the apoptosis of PC12 cells, whereas clozapine concentration-dependently protected PC12 cells against corticosterone insult. Pathway inhibitor studies showed that the protective effect of clozapine was reversed by LY294002 and wortmannin, two PI3K inhibitors, or by Akt inhibitor VIII, although several other inhibitors had no effect. shRNA knockdown experiments showed that downregulation of Akt1 or FoxO3a attenuated the protective effect of clozapine. Western blot analyses revealed that clozapine induced the phosphorylation of Akt and FoxO3a through the PI3K/Akt pathway and reversed both the corticosterone-evoked reduction in phosphorylated Akt and FoxO3a and the nuclear translocation of FoxO3a. Together, our data indicate that clozapine protects PC12 cells against corticosterone-induced cell death by modulating the activity of the PI3K/Akt/FoxO3a pathway.
Altered brain activation and functional connectivity in working memory related networks in patients with type 2 diabetes: An ICA-based analysis
Type 2 diabetes mellitus (T2DM) can cause multidimensional cognitive deficits, among which working memory (WM) is usually involved at an early stage. However, the neural substrates underlying impaired WM in T2DM patients are still unclear. To clarify this issue, we utilized functional magnetic resonance imaging (fMRI) and independent component analysis to evaluate T2DM patients for alterations in brain activation and functional connectivity (FC) in WM networks and to determine their associations with cognitive and clinical variables. Twenty complication-free T2DM patients and 19 matched healthy controls (HCs) were enrolled and fMRI data were acquired during a block-designed 1-back WM task. The WM metrics of the T2DM patients showed no differences compared with those of the HCs, except for a slightly lower accuracy rate in the T2DM patients. Compared with the HCs, the T2DM patients demonstrated increased activation within their WM fronto-parietal networks and activation strength was significantly correlated with WM performance. The T2DM patients also showed decreased FC within and between their WM networks. Our results indicate that the functional integration of WM sub-networks was disrupted in the complication-free T2DM patients and that strengthened regional activity in fronto-parietal networks may compensate for the WM impairment caused by T2DM.
The role of Akt/FoxO3a in the protective effect of venlafaxine against corticosterone-induced cell death in PC12 cells
Rationale Antidepressants can exert neuroprotective effects against various insults, and their antidepressant-like effect may result from these neuroprotective effects. The phosphatidylinositol-3-kinase/protein kinase B/Forkhead box O3 (PI3K/Akt/FoxO3a) pathway is a key signaling pathway in mediating cell survival. However, no information is available regarding the interaction of FoxO3a and antidepressants. Objectives PC12 cells treated with corticosterone were used as a model to study the protective effect of venlafaxine and its underlying mechanisms. Methods The methyl thiazolyl tetrazolium (MTT) assay, Hoechst staining, and observation of FoxO3a subcellular localization were used to study the protective effect of venlafaxine against cell damage caused by corticosterone. Pretreatment with various pathway inhibitors was used to investigate the possible pathways involved in the protection afforded by venlafaxine. The phosphorylation of Akt and FoxO3a was analyzed by Western blot. Results Corticosterone decreased the phosphorylation of Akt and FoxO3a and led to the nuclear localization of FoxO3a and the apoptosis of PC12 cells. Venlafaxine concentration-dependently protected PC12 cells against corticosterone. The protective effect of venlafaxine was reversed by LY294002 and wortmannin, two PI3K inhibitors, and by Akt inhibitor VIII, whereas the mitogen-activated protein kinase kinase (MAPK kinase) inhibitor PD98059 and the p38 MAPK inhibitor PD160316 had no effect. Western blot analyses showed that venlafaxine induced the phosphorylation of Akt and FoxO3a through the PI3K/Akt pathway and reversed both the corticosterone-induced reduction in phosphorylated Akt and FoxO3a and the nuclear translocation of FoxO3a. Conclusions Venlafaxine protects PC12 cells against corticosterone-induced cell death by modulating the activity of the PI3K/Akt/FoxO3a pathway.
R-Bot: An LLM-based Query Rewrite System
Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with limitations in quality and robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences that guide LLMs away from hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on widely used benchmarks and demonstrate the superior performance of our system, R-Bot, which surpasses state-of-the-art query rewrite methods.
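The hybrid structure-semantics retrieval described above can be pictured as scoring candidate rewrite evidences by a weighted mix of structural similarity (shared SQL operators) and semantic similarity. A toy sketch under that assumption, using set overlap as stand-ins for both signals (the scoring functions are illustrative, not R-Bot's actual method, which uses learned embeddings):

```python
# Toy hybrid structure-semantics retrieval: rank rewrite evidences by a
# weighted combination of structural overlap (shared SQL operator keywords)
# and semantic overlap (shared tokens). Illustrative only, not R-Bot's method.

OPERATORS = {"select", "join", "where", "group", "order", "exists", "in"}

def structural_sim(q1, q2):
    # Jaccard similarity restricted to operator keywords.
    a = OPERATORS & set(q1.lower().split())
    b = OPERATORS & set(q2.lower().split())
    return len(a & b) / max(len(a | b), 1)

def semantic_sim(q1, q2):
    # Jaccard over all tokens, a crude proxy for embedding similarity.
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / max(len(a | b), 1)

def retrieve(query, evidences, alpha=0.5):
    # Return the evidence with the highest combined score.
    return max(evidences, key=lambda e: alpha * structural_sim(query, e["query"])
                                        + (1 - alpha) * semantic_sim(query, e["query"]))

evidences = [
    {"query": "select a from t where exists (select 1 from s)", "rule": "decorrelate EXISTS"},
    {"query": "select x from t2 order by x", "rule": "push down sort"},
]
best = retrieve("select b from u where exists (select 1 from v)", evidences)
```

A structurally matching evidence (same EXISTS pattern) wins even when table and column names differ, which is the point of mixing both signals.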
Vector Search for the Future: From Memory-Resident, Static Heterogeneous Storage, to Cloud-Native Architectures
Vector search (VS) has become a fundamental component in multimodal data management, enabling core functionalities such as image, video, and code retrieval. As vector data scales rapidly, VS faces growing challenges in balancing search quality, latency, scalability, and cost. The evolution of VS has been closely driven by changes in storage architecture. Early VS methods rely on all-in-memory designs for low latency, but their scalability is constrained by memory capacity and cost. To address this, recent research has adopted heterogeneous architectures that offload space-intensive vectors and index structures to SSDs, while exploiting block locality and I/O-efficient strategies to maintain high search performance at billion scale. Looking ahead, the increasing demand for trillion-scale vector retrieval and cloud-native elasticity is driving a further shift toward memory-SSD-object-storage architectures, which enable cost-efficient data tiering and seamless scalability. In this tutorial, we review the evolution of VS techniques from a storage-architecture perspective. We first review memory-resident methods, covering classical IVF, hash, quantization, and graph-based designs. We then present a systematic overview of heterogeneous-storage VS techniques, including their index designs, block-level layouts, query strategies, and update mechanisms. Finally, we examine emerging cloud-native systems and highlight open research opportunities for future large-scale vector retrieval systems.
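The classical IVF (inverted file) design mentioned among the memory-resident methods buckets vectors by their nearest coarse centroid, so a query only scans the closest few buckets rather than the whole dataset. A minimal sketch of that idea (centroids are fixed here for clarity; real systems such as Faiss learn them with k-means and tune the `nprobe` bucket count):

```python
# Toy memory-resident IVF index: vectors are assigned to their nearest
# coarse centroid, and a query scans only the closest `nprobe` buckets.
# Fixed centroids for clarity; real systems learn them with k-means.

def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

class IVFIndex:
    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def add(self, vec):
        # Route each vector to the bucket of its nearest centroid.
        cid = min(range(len(self.centroids)),
                  key=lambda i: dist2(vec, self.centroids[i]))
        self.buckets[cid].append(vec)

    def search(self, query, nprobe=1):
        # Scan only the nprobe buckets whose centroids are closest to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dist2(query, self.centroids[i]))
        candidates = [v for cid in order[:nprobe] for v in self.buckets[cid]]
        return min(candidates, key=lambda v: dist2(query, v))

index = IVFIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
for v in [(0.5, 0.2), (0.1, 1.0), (9.5, 10.2)]:
    index.add(v)
nearest = index.search((0.4, 0.3), nprobe=1)  # scans only one bucket
```

The same bucketed layout is what makes the SSD offloading discussed in the tutorial natural: each bucket becomes a contiguous block that can be read with one sequential I/O.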
LLM-Enhanced Data Management
Machine learning (ML) techniques for optimizing data management problems have been extensively studied and widely deployed in the past five years. However, traditional ML methods have limitations in generalizability (adapting to different scenarios) and inference ability (understanding the context). Fortunately, large language models (LLMs) have shown high generalizability and human-competitive abilities in understanding context, which are promising for data management tasks (e.g., database diagnosis, database tuning). However, existing LLMs have several limitations: hallucination, high cost, and low accuracy on complicated tasks. To address these challenges, we design LLMDB, an LLM-enhanced data management paradigm that offers generalizability and high inference ability while avoiding hallucination, reducing LLM cost, and achieving high accuracy. LLMDB embeds domain-specific knowledge to avoid hallucination through LLM fine-tuning and prompt engineering. LLMDB reduces the high cost of LLMs through vector databases, which provide semantic search and caching abilities. LLMDB improves task accuracy through an LLM agent that provides multi-round inference and pipeline execution. We showcase three real-world scenarios that LLMDB supports well: query rewrite, database diagnosis, and data analytics. We also summarize the open research challenges of LLMDB.
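The vector-database caching idea above amounts to: before making an expensive LLM call, look up semantically similar past questions and reuse the cached answer. A minimal sketch under that assumption, with a toy bag-of-words embedding (a real deployment would use a learned embedding model and a vector database; the class and threshold are illustrative, not LLMDB's API):

```python
# Sketch of semantic caching for LLM cost reduction: reuse a cached answer
# when a new question is close enough to a previously answered one.
# Toy bag-of-words embedding; real systems use learned embeddings.

import math

def embed(text):
    # Toy embedding: word-count dictionary.
    v = {}
    for w in text.lower().split():
        v[w] = v.get(w, 0) + 1
    return v

def cosine(a, b):
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, answer) pairs
        self.threshold = threshold

    def lookup(self, question):
        q = embed(question)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: no LLM call needed
        return None        # cache miss: caller falls through to the LLM

    def store(self, question, answer):
        self.entries.append((embed(question), answer))

cache = SemanticCache(threshold=0.8)
cache.store("why is my query running slow", "Check for a missing index on the join column.")
hit = cache.lookup("why is my query running slow today")
miss = cache.lookup("how do I create a materialized view")
```

Only near-duplicate questions hit the cache; unrelated ones fall through to the LLM, which is how the cache cuts cost without hurting accuracy.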
CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models
Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches, including manual rewriting, rule-based systems, and large language model (LLM)-based techniques, often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., an LLM generating non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule-based and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy complex SQL via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options, including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases.
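One concrete syntactic discrepancy of the kind the abstract alludes to: PostgreSQL concatenates strings with `||`, while MySQL uses `CONCAT()`. A tiny rule-based rewrite for just that case shows why purely rule-based systems need a hand-written rule per difference (this is an illustration of the problem, not CrackSQL's method; the regex handles only the simple unnested case):

```python
# One hand-written dialect rule: rewrite PostgreSQL's `a || b` string
# concatenation into MySQL's CONCAT(a, b). Handles only simple identifier
# operands; nested or multi-term concatenations would each need more rules,
# which is exactly the maintenance burden rule-based translators carry.

import re

def translate_concat(sql):
    pattern = re.compile(r"(\w+)\s*\|\|\s*(\w+)")
    return pattern.sub(r"CONCAT(\1, \2)", sql)

pg = "SELECT first_name || last_name FROM users"
my = translate_concat(pg)
```

Every such discrepancy (customized functions, data types, syntax rules) multiplies the rule set, which motivates the hybrid rule-plus-LLM design.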
LLM As DBA
Database administrators (DBAs) play a crucial role in managing, maintaining, and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on cloud databases). Recently, large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, an LLM-based database administrator that can continuously acquire database maintenance experience from textual sources and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree-of-thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results show that D-Bot can efficiently and effectively diagnose root causes, and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
Large language models (LLMs) have shown increasing effectiveness in Text-to-SQL tasks. However, another closely related problem, Cross-System SQL Translation (a.k.a. SQL-to-SQL), which adapts a query written for one database system (e.g., MySQL) into its equivalent for another system (e.g., ClickHouse), is of great practical importance but remains underexplored. Existing SQL benchmarks are not well suited for SQL-to-SQL evaluation, as they (1) focus on a limited set of database systems (often just SQLite) and (2) cannot capture many system-specific SQL dialects (e.g., customized functions, data types, and syntax rules). Thus, in this paper, we introduce PARROT, a Practical And Realistic BenchmaRk for CrOss-System SQL Translation. PARROT comprises 598 translation pairs from 38 open-source benchmarks and real-world business services, specifically prepared to challenge system-specific SQL understanding (e.g., LLMs achieve lower than 38.53% accuracy on average). We also provide multiple benchmark variants, including PARROT-Diverse with 28,003 translations (for extensive syntax testing) and PARROT-Simple with 5,306 representative samples (for focused stress testing), covering 22 production-grade database systems. To promote future research, we release a public leaderboard and source code at: https://code4db.github.io/parrot-bench/.