Catalogue Search | MBRL

Elastic-Degenerate String Matching with 1 Error or Mismatch

by Sweering, Michelle , Gabory, Esteban , Pissis, Solon P in Algorithms , Combinatorial analysis , Computational geometry

2024

An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~(nmω-1)+O(N)-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~(·) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+kN) time, under edit distance, or O(kmG+kN) time, under Hamming distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k=1, the existing algorithms run in Ω(mN) time in the worst case. In this paper we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+N)logm) or O(nm3+N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm)-time algorithm. We also show that 1-EDSM can be solved in O(nm2+Nlogm) time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the k-errata trees for indexing with errors [Cole et al., STOC 2004]. This is an extended version of a paper presented at LATIN 2022.

Journal Article

Share this book

Add to My Shelf

Invariants and String Properties in the Analysis of the Knuth-Morris-Pratt Algorithm

by Ásványi, Tibor in Algorithms , Invariants , String matching

2025

This paper is about the string-matching problem. We find the occurrences of a pattern inside a text. First, we formally define the problem and summarise its naive solution. Next, we analyse an efficient method, the Knuth-Morris-Pratt (KMP) algorithm. We prove the correctness of the KMP algorithm. We also analyse its efficiency. Our reasoning is based on the properties of the pattern and the text. It is also based on the invariant properties of KMP. In this way, we could develop an extremely compact and elegant proof. And the method of proving program correctness with the invariant properties of the program is already familiar to our students at our university.

Journal Article

Share this book

Add to My Shelf

A novel technique to prevent SQL injection and cross-site scripting attacks using Knuth-Morris-Pratt string match algorithm

by Christiana, Abikoye Oluwakemi , Anthonia, Kayode Aderonke , Abubakar Abdullahi in Algorithms , Applications programs , Malware

2020

Structured Query Language (SQL) injection and cross-site scripting remain a major threat to data-driven web applications. Instances where hackers obtain unrestricted access to back-end database of web applications so as to steal, edit, and destroy confidential data are increasing. Therefore, measures must be put in place to curtail the growing threats of SQL injection and XSS attacks. This study presents a technique for detecting and preventing these threats using Knuth-Morris-Pratt (KMP) string matching algorithm. The algorithm was used to match user’s input string with the stored pattern of the injection string in order to detect any malicious code. The implementation was carried out using PHP scripting language and Apache XAMPP Server. The security level of the technique was measured using different test cases of SQL injection, cross-site scripting (XSS), and encoded injection attacks. Results obtained revealed that the proposed technique was able to successfully detect and prevent the attacks, log the attack entry in the database, block the system using its mac address, and also generate a warning message. Therefore, the proposed technique proved to be more effective in detecting and preventing SQL injection and XSS attacks

Journal Article

Share this book

Add to My Shelf

A method to convert non-numeric characters into numerical values in dynamic time warping for string matching

by Park, Hyun Jun in Accuracy , Algorithms , Correlation coefficients

2021

Dynamic time warping (DTW) is one of the well-known algorithms for measuring similarity between two temporal sequences, and it can be used for character matching. It uses a distance of two character strings. However, since the characters are non-numeric, it must be assigned to numerical values to calculate a distance between two character strings. Therefore, in this paper, we propose a method to convert non-numeric characters into numerical values in dynamic time warping for string matching. The proposed method uses normalized correlation coefficient, and it makes DTW gives more accurate results. Experimental results show that the proposed method gives excellent results.

Journal Article

Share this book

Add to My Shelf

A Bidirectional Arabic Sign Language Framework Using Deep Learning and Fuzzy Matching Score

by Gumaei, Abdu H. , Al-Qahtani, Manal , Mosleh, Mogeeb A. A. in Accuracy , Algorithms , Analysis

2024

Sign language is widely used to facilitate the communication process between deaf people and their surrounding environment. Sign language, like most other languages, is considered a complex language which cannot be mastered easily. Thus, technology can be used as an assistive tool to solve the difficulties and challenges that deaf people face during interactions with society. In this study, an automatic bidirectional translation framework for Arabic Sign Language (ArSL) is designed to assist both deaf and ordinary people to communicate and express themselves easily. Two main modules were intended to translate Arabic sign images into text by utilizing different transfer learning models and to translate the input text into Arabic sign images. A prototype was implemented based on the proposed framework by using several pre-trained convolutional neural network (CNN)-based deep learning models, including the DenseNet121, ResNet152, MobileNetV2, Xception, InceptionV3, NASNetLarge, VGG19, and VGG16 models. A fuzzy string matching score method, as a novel concept, was employed to translate the input text from ordinary people into appropriate sign language images. The dataset was constructed with specific criteria to obtain 7030 images for 14 classes captured from both deaf and ordinary people locally. The prototype was developed to conduct the experiments on the collected ArSL dataset using the utilized CNN deep learning models. The experimental results were evaluated using standard measurement metrics such as accuracy, precision, recall, and F1-score. The performance and efficiency of the ArSL prototype were assessed using a test set of an 80:20 splitting procedure, obtaining accuracy results from the highest to the lowest rates with average classification time in seconds for each utilized model, including (VGG16, 98.65%, 72.5), (MobileNetV2, 98.51%, 100.19), (VGG19, 98.22%, 77.16), (DenseNet121, 98.15%, 80.44), (Xception, 96.44%, 72.54), (NASNetLarge, 96.23%, 84.96), (InceptionV3, 94.31%, 76.98), and (ResNet152, 47.23%, 98.51). The fuzzy matching score is mathematically validated by computing the distance between the input and associative dictionary words. The study results showed the prototype’s ability to successfully translate Arabic sign images into Arabic text and vice versa, with the highest accuracy. This study proves the ability to develop a robust and efficient real-time bidirectional ArSL translation system using deep learning models and the fuzzy string matching score method.

Journal Article

Share this book

Add to My Shelf

Hybridizing Fuzzy String Matching and Machine Learning for Improved Ontology Alignment

by Rudwan, Mohammed Suleiman Mohammed , Fonou-Dombeu, Jean Vincent in Accuracy , Algorithms , Alignment

2023

Ontology alignment has become an important process for identifying similarities and differences between ontologies, to facilitate their integration and reuse. To this end, fuzzy string-matching algorithms have been developed for strings similarity detection and have been used in ontology alignment. However, a significant limitation of existing fuzzy string-matching algorithms is their reliance on lexical/syntactic contents of ontology only, which do not capture semantic features of ontologies. To address this limitation, this paper proposed a novel method that hybridizes fuzzy string-matching algorithms and the Deep Bidirectional Transformer (BERT) deep learning model with three machine learning regression classifiers, namely, K-Nearest Neighbor Regression (kNN), Decision Tree Regression (DTR), and Support Vector Regression (SVR), to perform the alignment of ontologies. The use of the kNN, SVR, and DTR classifiers in the proposed method resulted in the building of three similarity models (SM), encoded SM-kNN, SM-SVR, and SM-DTR, respectively. The experiments were conducted on a dataset obtained from the anatomy track in the Ontology Alignment and Evaluation Initiative 2022 (OAEI 2022). The performances of the SM-kNN, SM-SVR, and SM-DTR models were evaluated using various metrics including precision, recall, F1-score, and accuracy at thresholds 0.70, 0.80, and 0.90, as well as error rates and running times. The experimental results revealed that the SM-SVR model achieved the best recall of 1.0, while the SM-DTR model exhibited the best precision, accuracy, and F1-score of 0.98, 0.97, and 0.98, respectively. Furthermore, the results showed that the SM-kNN, SM-SVR, and SM-DTR models outperformed state-of-the-art alignment systems that participated in the OAEI 2022 challenge, indicating the superior capability of the proposed method.

Journal Article

Share this book

Add to My Shelf

Boyer Moore string-match framework for a hybrid short message service spam filtering technique

by Ojugo, Arnold Adimabua , Oyemade, David Ademola in Algorithms , Artificial neural networks , Deep learning

2021

Advances in technology and the proliferation of mobile device have continued to advance the ubiquitous nature of computing alongside their many prowess and improved features it brings as a disruptive technology to aid information sharing amongst many online users. This popularity, usage and adoption ease, mobility, and portability of the mobile smartphone devices have allowed for its acceptability and popularity. Mobile smartphones continue to adopt the use of short messages services accompanied with a scenario for spamming to thrive. Spams are unsolicited message or inappropriate contents. An effective spam filter studies are limited as short-text message service (SMS) are 140bytes, 160-characters, and rippled with abbreviation and slangs that further inhibits the effective training of models. The study proposes a string match algorithm used as deep learning ensemble on a hybrid spam filtering technique to normalize noisy features, expand text and use semantic dictionaries of disambiguation to train underlying learning heuristics and effectively classify SMS into legitimate and spam classes. Study uses a profile hidden Markov network to select and train the network structure and employs the deep neural network as a classifier network structure. Model achieves an accuracy of 97% with an error rate of 1.2%.

Journal Article

Share this book

Add to My Shelf

Entropy-Based Approach in Selection Exact String-Matching Algorithms

by Zorić, Marija , Stipaničev, Darko , Markić, Ivan in algorithm efficiency , algorithm performance , Algorithms

2020

The string-matching paradigm is applied in every computer science and science branch in general. The existence of a plethora of string-matching algorithms makes it hard to choose the best one for any particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured based on the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm execution identified with the computational resources the algorithm consumes. Resource usage in algorithm execution could be determined, and for maximum efficiency, the goal is to minimize resource usage. Guided by the fact that standard measures of algorithm efficiency, such as execution time, directly depend on the number of executed actions. Without touching the problematics of computer power consumption or memory, which also depends on the algorithm type and the techniques used in algorithm development, we have developed a methodology which enables the researchers to choose an efficient algorithm for a specific domain. String searching algorithms efficiency is usually observed independently from the domain texts being searched. This research paper aims to present the idea that algorithm efficiency depends on the properties of searched string and properties of the texts being searched, accompanied by the theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through character comparison count metrics. The character comparison count metrics is a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain by using appropriate domain data (patterns and texts) and provides for a specific domain the ranking of algorithms according to the patterns’ entropy. The proposed approach is limited to on-line exact string-matching problems based on information entropy for a search pattern. Meticulous empirical testing depicts the methodology implementation and purports soundness of the methodology.

Journal Article

Share this book

Add to My Shelf

Parallel String Matching with Linear Array, Butterfly and Divide and Conquer Models

by Raju, S. Viswanadha , Reddy, K. K. V. V. S. , Rao, Chinta Someswara in Algorithms , Artificial Intelligence , Business and Management

2018

String Matching is a technique of searching a pattern in a text. It is the basic concept to extract the fruitful information from large volume of text, which is used in different applications like text processing, information retrieval, text mining, pattern recognition, DNA sequencing and data cleaning etc., . Though it is stated some of the simple mechanisms perform very well in practice, plenty of research has been published on the subject and research is still active in this area and there are ample opportunities to develop new techniques. For this purpose, this paper has proposed linear array based string matching, string matching with butterfly model and string matching with divide and conquer models for sequential and parallel environments. To assess the efficiency of the proposed models, the genome sequences of different sizes (10–100 Mb) are taken as input data set. The experimental results have shown that the proposed string matching algorithms performs very well compared to those of Brute force, KMP and Boyer moore string matching algorithms.

Journal Article

Share this book

Add to My Shelf

Toward Effective Aircraft Call Sign Detection Using Fuzzy String-Matching between ASR and ADS-B Data

by Kasttet, Mohammed Saïd , Lyhyaoui, Abdelouahid , Zbakh, Douae in Accuracy , Acoustics , Air traffic control

2024

Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-end Recurrent Neural Network (RNN) toolkits were developed with higher speed and accuracy that can often achieve a Word Error Rate (WER) below 10%. These toolkits can nowadays be deployed, for instance, within aircraft cockpits and Air Traffic Control (ATC) systems in order to identify aircraft and display recognized voice messages related to flight data, especially for airports not equipped with radar. Hence, the performance of air traffic controllers and pilots can ultimately be improved by reducing workload and stress and enforcing safety standards. Our experiment conducted at Tangier’s International Airport ATC aimed to build an ASR model that is able to recognize aircraft call signs in a fast and accurate way. The acoustic and linguistic models were trained on the Ibn Battouta Speech Corpus (IBSC), resulting in an unprecedented speech dataset with approved transcription that includes real weather aerodrome observation data and flight information with a call sign captured by an ADS-B receiver. All of these data were synchronized with voice recordings in a structured format. We calculated the WER to evaluate the model’s accuracy and compared different methods of dataset training for model building and adaptation. Despite the high interference in the VHF radio communication channel and fast-speaking conditions that increased the WER level to 20%, our standalone and low-cost ASR system with a trained RNN model, supported by the Deep Speech toolkit, was able to achieve call sign detection rate scores up to 96% in air traffic controller messages and 90% in pilot messages while displaying related flight information from ADS-B data using the Fuzzy string-matching algorithm.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter