Catalogue Search | MBRL

A Semantic Learning-Based SQL Injection Attack Detection Technology

by Lu, Dongzhe , Liu, Long , Fei, Jinlong in Applications programs , Computer crimes , Data integrity

2023

Over the years, injection vulnerabilities have been at the top of the Open Web Application Security Project Top 10 and are one of the most damaging and widely exploited types of vulnerabilities against web applications. Structured Query Language (SQL) injection attack detection remains a challenging problem due to the heterogeneity of attack loads, the diversity of attack methods, and the variety of attack patterns. It has been demonstrated that no single model can guarantee adequate security to protect web applications, and it is crucial to develop an efficient and accurate model for SQL injection attack detection. In this paper, we propose synBERT, a semantic learning-based detection model that explicitly embeds the sentence-level semantic information from SQL statements into an embedding vector. The model learns representations that can be mapped to SQL syntax tree structures, as evidenced by visualization work. We gathered a wide range of datasets to assess the classification performance of synBERT, and the results show that our approach outperforms previously proposed models. Even on brand-new, untrained models, accuracy can reach 90% or higher, indicating that the model has good generalization performance.

Journal Article

SamQL: a structured query language and filtering tool for the SAM/BAM file format

by Lee, Christopher T. , Maragkakis, Manolis in Algorithms , Analysis , Big data

2021

Background: The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can often be difficult to query and understand even for experienced bioinformaticians. As genomic data are rapidly growing, structured and efficient queries on data that are encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings should not only be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations. Results: Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other substantially less expressive software. Conclusions: SamQL is a complete query language that we envision as a step to a structured database engine for genomics. SamQL is written in Go, and is freely available as a standalone program and as an open-source library under an MIT license, https://github.com/maragkakislab/samql/ .

Journal Article

NLINQ: A natural language interface for querying network performance

by Gordon, Paul , Gillbrand, Tore , Saha, Barun Kumar in Artificial Intelligence , Communication networks , Computer Science

2023

Artificial Intelligence is finding increased applications in communication networks. In particular, the field of text-to-Structured Query Language (SQL) translation has great potential to improve customer experience by allowing the querying of network performance databases using natural language. Such adoption, however, is challenging, in general. On one hand, live production systems may have databases with non-semantic table and column names, which makes natural language parsing and text-to-SQL translation difficult. On the other hand, noisy input texts may lead to the generation of incorrect queries. Moreover, inaccurate transcription of speech input into text may further aggravate the problem. Motivated by these aspects, we investigate the problem of natural language-based querying of network performance databases used by Wireless Mesh Networks (WMNs). In particular, we fine-tune a state-of-the-art model to translate natural language questions into appropriate SQL queries. In order to mitigate the problem of non-semantic names, we generate database views with semantic column names, based on the existing tables. In addition, we make domain-specific corrections in the text in order to help generate accurate queries. We also design the Natural Language Interface for Network Query (NLINQ) prototype for a real-life industrial WMN solution. The results of the performance evaluation indicate that natural language text can be translated into SQL queries with an accuracy of 89.021–92.663%, on average. Moreover, the average turnaround time of NLINQ ranges between 1.263–2.013 seconds. The results indicate that NLINQ is suitable for real-time, interactive querying of network performance databases.

Journal Article

An Integrated Data-Driven System for Digital Bridge Management

by Quinci, Gianluca , de Felice, Gianmarco , Napolitano, Antonio in Algorithms , Bridges , building information modelling (BIM)

2024

Relational databases are established and widespread tools for storing and managing information. The efficient collection of information in a database appears to be a promising solution for bridge management (BM), thus facilitating the digital transition. The Italian regulatory framework on infrastructure operation and maintenance (O&M) is complex and is constantly being updated. The current plan for implementing its guidelines envisages that infrastructure managers, also on a regional scale, equip themselves with their own digital database for BM. Within this context, this research proposes an integrated methodology that collects information derived from project documentation, in situ inspections, digital surveys, and monitoring and field tests in a queryable database for digitalising, georeferencing, and creating models of many bridges. Structured query language (SQL) statements are used to efficiently export specific shared information, enabling network cross-analysis. Furthermore, the database represents the source of a geographic information system (GIS) catalogue and the basis for deriving models for building information modelling (BIM). The methodology focuses on the infrastructural context of the Lazio region, Italy, the first beneficiary of the research.

Journal Article

A systematic survey of LLM-based text-to-SQL: methodologies, security vulnerabilities, and future challenges

by Ngo, Son Tung , Bui, Chien Dinh , Nguyen, Ha Hoang in Accuracy , Analysis , Cost effectiveness

2026

Text-to-Structured Query Language (SQL) systems, which allow users to query databases using natural language, have advanced significantly with the rise of Large Language Models (LLMs). While this progress has boosted accuracy, it has also introduced serious security and practical deployment challenges that existing literature has not systematically analyzed. In this article, we systematically review the existing literature to examine various approaches, identify security vulnerabilities, discuss design trade-offs, and outline future challenges. Our goal is to offer a comprehensive overview. Our analysis shows that modern approaches fall into two main categories: prompt engineering on proprietary models and fine-tuning open-source models. Regarding security, using the Open Worldwide Application Security Project (OWASP) Top 10 framework, we identify critical threats such as Prompt Injection (P2SQL), data poisoning to create backdoors, and inference attacks. The analysis reveals that current defense mechanisms are not effective enough against these attacks. We also highlight the strategic trade-off between the superior accuracy of proprietary models and the control, security, and cost-effectiveness of open-source models. This survey provides a systematic analysis of security vulnerabilities in LLM-based Text-to-SQL systems, concluding that current countermeasures are inadequate. Our findings point to an urgent research direction: developing systems that are not only accurate but also robust, efficient, and fundamentally secure for reliable real-world deployment.

Journal Article

Extraction and processing of intensive care chart data from a patient data management system

by Meybohm, Patrick , Ertl, Maximilian , Schmid, Benedikt in Access control , anaesthesia , Blood pressure

2026

Routine clinical data captured in Patient Data Management Systems (PDMS) in intensive care and perioperative settings are an invaluable resource for clinical research. However, the proprietary, fragmented, and transaction-oriented architecture of many systems severely limits secondary data use and requires extensive Extract, Transform, and Load (ETL) processing. We developed a modular, Python-based ETL framework that enables flexible, domain-specific extraction of high-frequency, multimodal PDMS data. The system provides reusable components for data retrieval, preprocessing, harmonization, and de-identification, allowing extraction methods to be adapted or extended without modifying the core architecture. Each clinical domain is represented through dedicated Pydantic models enforcing consistent output schemas, type constraints, and automated plausibility checks. SQLAlchemy abstracts database access, while structured preprocessing logic resolves common documentation inconsistencies and transforms heterogeneous PDMS entries into standardized representations. The framework produces reproducible, analysis-ready datasets through a transparent, auditable workflow. An integrated audit logger records extraction parameters, transformations, and derived fields, providing full traceability. Salted, irreversible pseudonymization is embedded directly into the pipeline, supporting compliance with the European General Data Protection Regulation (GDPR; German: Datenschutz-Grundverordnung, DSGVO) and Art. 27 of the Bayerisches Krankenhausgesetz (BayKrG). By encapsulating extraction logic in modular processing units with consistent validation and automated de-identification, the system replaces complex queries with standardized, maintainable, and research-ready processes. The presented framework overcomes substantial technical and regulatory barriers to the secondary use of PDMS data by operationalizing a governance-first extraction pipeline. 
Its modular architecture encapsulates site-specific PDMS queries in a bounded adapter layer, while keeping validation, pseudonymization, and audit logging portable and reusable across domains and installations. By embedding domain-level validation models, irreversible pseudonymization, and structured auditing, the framework enables reproducible, governance-compliant access to high-frequency intensive care data. Rather than requiring immediate alignment to a common data model, it provides a pragmatic foundation on which semantic and syntactic interoperability can be added incrementally as requirements and resources evolve.

Journal Article

Enhancing text-to-structured query language translation for seamless electronic medical record access

by Balasubramanian, Saravana Balaji , Vadivel, Dhanushkumar , Bakthavatchalu, Gomathi in Accuracy , Adaptability , Adaptation

2026

Traditional models for natural language-to-SQL translation in Electronic Medical Record (EMR) systems struggle with understanding medical terminology, handling complex queries, and bridging the syntax-semantics gap, leading to scalability and accuracy issues. Advanced solutions like Large Language Model (LLM) based approaches address these challenges by leveraging deep learning and domain-specific training to enhance performance and usability. Hence, this article introduces an advanced medical Text-to-Structured Query Language (SQL) paradigm that simplifies accessing EMRs by translating natural language queries into SQL commands. This model is built on the advanced Code-T5 (Text-to-Text Transfer Transformer) architecture, further enhanced with Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques; it effectively addresses the challenges posed by the complexity of traditional SQL queries enabling seamless access to critical healthcare data. The innovation of the proposed model lies in its exceptional performance across multiple evaluation metrics. It achieves a Bilingual Evaluation Understudy (BLEU) score of 81.68, significantly outperforming leading models like T5, Fine-Tuned Language Net (FLAN) T5, and Bidirectional and Auto-Regressive Transformers (BART) while excelling in Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, underscoring its proficiency in generating semantically accurate and coherent SQL queries. Furthermore, the proposed model attains a high token-level F1-score, ensuring a balanced precision and recall and a Jaccard similarity score of 0.83, surpassing T5, Flan T5, and BART. The proposed model excels in handling complex medical queries, bridging natural language and SQL to empower data-driven decisions and advance medical informatics.

Journal Article

Medical nearest-word embedding technique implemented using an unsupervised machine learning approach for Bengali language

by Choudhury, Tanupriya , Arya, Pradeep Kumar , Vishnu, Devraj in Health care , Information retrieval , Language

2024

The rapid growth of natural language processing (NLP) applications, such as text summarization, speech recognition, information extraction, and machine translation, has led to the development of structured query language (SQL) for extracting information from structured data. However, due to limited resources, converting Natural Language (NL) queries to SQL in Bengali is challenging. This article proposes an unsupervised machine learning model to find semantically close Bengali words that can generate SQL from NL queries in Bengali. The main objective of the proposed system is to provide support in the creation of patient-oriented explanations and educational resources by simplifying intricate medical terminology. The major findings of the proposed system are as follows: The use of machine translation in the field of medicine facilitates the dissemination of healthcare information to a diverse international audience and improves the performance of entity recognition tasks, including the identification of medical conditions, drugs, or procedures within clinical notes or electronic health data. This system allows a naive user to extract health-related information from a healthcare-structured database without any knowledge of SQL. The system accepts a query and generates a response according to the query in the Bengali language. Query tokenization and stop word removal are carried out in the preprocessing stage, and unsupervised machine learning techniques are implemented to process the input query sentence. Tokenized words are converted into vectors using the skip-gram model, with noise-contrastive estimation (NCE) applied to discriminate between actual and irrelevant words. Stochastic gradient descent (SGD) optimizes the model by randomly choosing a small amount of data from the dataset and using cosine similarity to measure closer words. The semantically closer words are found using an unsupervised learning method to generate the SQL.

Journal Article

Teaching Case: SQL as a Tool for Civic Crime Analysis

by Sharma, Shwadhin in Ability , Arrests , Business analytics

2026

This case study engages students in the practice of civic data analysis by applying SQL to the Los Angeles Police Department's open crime datasets, covering the period from 2010 to the present. Participants step into the role of city analysts responsible for examining crime distribution, temporal rhythms, enforcement outcomes, and neighborhood variations. Through structured exercises, learners gain hands-on experience with SQL operations, including table creation, data cleaning, aggregation, and spatial joins. The project emphasizes how database queries can reveal actionable insights for policy discussions on public safety, community partnerships, and policing strategies. In doing so, students strengthen both their technical fluency in SQL and their ability to interpret real-world public data in a critical, applied context.

Journal Article

Bridging SQL Mastery and Career Confidence for Undergraduate Students Through Simulated Job Interviews

by Albert, Leslie J , Chen, Yu , Zheng, Dailin in Business communication , Business students , Candidates

2025

Employers increasingly prioritize candidates who can solve real-world Structured Query Language (SQL) problems, particularly during technical interviews. However, many undergraduate students feel underprepared for these interviews because they have not engaged in the deep learning needed to apply SQL concepts confidently. Additionally, students often fail to recognize the career relevance of SQL skills. This Teaching Tip introduces an immersive SQL lesson designed to bridge the gap between conceptual learning and practical application. The lesson includes a mock SQL technical interview, where students apply their knowledge to solve real-world business problems, class discussions on SQL-related careers, and a post-interview debrief to foster reflection and feedback. Results from pre- and post-lesson surveys indicate significant benefits, including enhanced student confidence in their SQL knowledge, student intention to continue learning and using SQL in the future, and student confidence in their ability to perform well in real SQL interviews. Open-ended survey responses support these findings and further reveal that the SQL lesson positively impacts students by clarifying concepts, reinforcing learned skills, and demonstrating the applicability of SQL in real-world scenarios. This approach demonstrates a practical and scalable framework for integrating immersive professional experiences into technical coursework that may be adapted to different class types (e.g., adopting an abridged version) and different courses (e.g., data analysis).

Journal Article
