Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
1,082
result(s) for
"String matching"
Sort by:
Elastic-Degenerate String Matching with 1 Error or Mismatch
by
Sweering, Michelle
,
Gabory, Esteban
,
Pissis, Solon P
in
Algorithms
,
Combinatorial analysis
,
Computational geometry
2024
An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~(nmω-1)+O(N)-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~(·) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+kN) time, under edit distance, or O(kmG+kN) time, under Hamming distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k=1, the existing algorithms run in Ω(mN) time in the worst case. In this paper we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+N)logm) or O(nm3+N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm)-time algorithm. We also show that 1-EDSM can be solved in O(nm2+Nlogm) time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the k-errata trees for indexing with errors [Cole et al., STOC 2004]. This is an extended version of a paper presented at LATIN 2022.
Journal Article
Invariants and String Properties in the Analysis of the Knuth-Morris-Pratt Algorithm
2025
This paper is about the string-matching problem. We find the occurrences of a pattern inside a text. First, we formally define the problem and summarise its naive solution. Next, we analyse an efficient method, the Knuth-Morris-Pratt (KMP) algorithm. We prove the correctness of the KMP algorithm. We also analyse its efficiency. Our reasoning is based on the properties of the pattern and the text. It is also based on the invariant properties of KMP. In this way, we could develop an extremely compact and elegant proof. And the method of proving program correctness with the invariant properties of the program is already familiar to our students at our university.
Journal Article
A novel technique to prevent SQL injection and cross-site scripting attacks using Knuth-Morris-Pratt string match algorithm
by
Christiana, Abikoye Oluwakemi
,
Anthonia, Kayode Aderonke
,
Abubakar Abdullahi
in
Algorithms
,
Applications programs
,
Malware
2020
Structured Query Language (SQL) injection and cross-site scripting remain a major threat to data-driven web applications. Instances where hackers obtain unrestricted access to back-end database of web applications so as to steal, edit, and destroy confidential data are increasing. Therefore, measures must be put in place to curtail the growing threats of SQL injection and XSS attacks. This study presents a technique for detecting and preventing these threats using Knuth-Morris-Pratt (KMP) string matching algorithm. The algorithm was used to match user’s input string with the stored pattern of the injection string in order to detect any malicious code. The implementation was carried out using PHP scripting language and Apache XAMPP Server. The security level of the technique was measured using different test cases of SQL injection, cross-site scripting (XSS), and encoded injection attacks. Results obtained revealed that the proposed technique was able to successfully detect and prevent the attacks, log the attack entry in the database, block the system using its mac address, and also generate a warning message. Therefore, the proposed technique proved to be more effective in detecting and preventing SQL injection and XSS attacks
Journal Article
A method to convert non-numeric characters into numerical values in dynamic time warping for string matching
2021
Dynamic time warping (DTW) is one of the well-known algorithms for measuring similarity between two temporal sequences, and it can be used for character matching. It uses a distance of two character strings. However, since the characters are non-numeric, it must be assigned to numerical values to calculate a distance between two character strings. Therefore, in this paper, we propose a method to convert non-numeric characters into numerical values in dynamic time warping for string matching. The proposed method uses normalized correlation coefficient, and it makes DTW gives more accurate results. Experimental results show that the proposed method gives excellent results.
Journal Article
A Bidirectional Arabic Sign Language Framework Using Deep Learning and Fuzzy Matching Score
2024
Sign language is widely used to facilitate the communication process between deaf people and their surrounding environment. Sign language, like most other languages, is considered a complex language which cannot be mastered easily. Thus, technology can be used as an assistive tool to solve the difficulties and challenges that deaf people face during interactions with society. In this study, an automatic bidirectional translation framework for Arabic Sign Language (ArSL) is designed to assist both deaf and ordinary people to communicate and express themselves easily. Two main modules were intended to translate Arabic sign images into text by utilizing different transfer learning models and to translate the input text into Arabic sign images. A prototype was implemented based on the proposed framework by using several pre-trained convolutional neural network (CNN)-based deep learning models, including the DenseNet121, ResNet152, MobileNetV2, Xception, InceptionV3, NASNetLarge, VGG19, and VGG16 models. A fuzzy string matching score method, as a novel concept, was employed to translate the input text from ordinary people into appropriate sign language images. The dataset was constructed with specific criteria to obtain 7030 images for 14 classes captured from both deaf and ordinary people locally. The prototype was developed to conduct the experiments on the collected ArSL dataset using the utilized CNN deep learning models. The experimental results were evaluated using standard measurement metrics such as accuracy, precision, recall, and F1-score. The performance and efficiency of the ArSL prototype were assessed using a test set of an 80:20 splitting procedure, obtaining accuracy results from the highest to the lowest rates with average classification time in seconds for each utilized model, including (VGG16, 98.65%, 72.5), (MobileNetV2, 98.51%, 100.19), (VGG19, 98.22%, 77.16), (DenseNet121, 98.15%, 80.44), (Xception, 96.44%, 72.54), (NASNetLarge, 96.23%, 84.96), (InceptionV3, 94.31%, 76.98), and (ResNet152, 47.23%, 98.51). The fuzzy matching score is mathematically validated by computing the distance between the input and associative dictionary words. The study results showed the prototype’s ability to successfully translate Arabic sign images into Arabic text and vice versa, with the highest accuracy. This study proves the ability to develop a robust and efficient real-time bidirectional ArSL translation system using deep learning models and the fuzzy string matching score method.
Journal Article
Hybridizing Fuzzy String Matching and Machine Learning for Improved Ontology Alignment
by
Rudwan, Mohammed Suleiman Mohammed
,
Fonou-Dombeu, Jean Vincent
in
Accuracy
,
Algorithms
,
Alignment
2023
Ontology alignment has become an important process for identifying similarities and differences between ontologies, to facilitate their integration and reuse. To this end, fuzzy string-matching algorithms have been developed for strings similarity detection and have been used in ontology alignment. However, a significant limitation of existing fuzzy string-matching algorithms is their reliance on lexical/syntactic contents of ontology only, which do not capture semantic features of ontologies. To address this limitation, this paper proposed a novel method that hybridizes fuzzy string-matching algorithms and the Deep Bidirectional Transformer (BERT) deep learning model with three machine learning regression classifiers, namely, K-Nearest Neighbor Regression (kNN), Decision Tree Regression (DTR), and Support Vector Regression (SVR), to perform the alignment of ontologies. The use of the kNN, SVR, and DTR classifiers in the proposed method resulted in the building of three similarity models (SM), encoded SM-kNN, SM-SVR, and SM-DTR, respectively. The experiments were conducted on a dataset obtained from the anatomy track in the Ontology Alignment and Evaluation Initiative 2022 (OAEI 2022). The performances of the SM-kNN, SM-SVR, and SM-DTR models were evaluated using various metrics including precision, recall, F1-score, and accuracy at thresholds 0.70, 0.80, and 0.90, as well as error rates and running times. The experimental results revealed that the SM-SVR model achieved the best recall of 1.0, while the SM-DTR model exhibited the best precision, accuracy, and F1-score of 0.98, 0.97, and 0.98, respectively. Furthermore, the results showed that the SM-kNN, SM-SVR, and SM-DTR models outperformed state-of-the-art alignment systems that participated in the OAEI 2022 challenge, indicating the superior capability of the proposed method.
Journal Article
Boyer Moore string-match framework for a hybrid short message service spam filtering technique
by
Ojugo, Arnold Adimabua
,
Oyemade, David Ademola
in
Algorithms
,
Artificial neural networks
,
Deep learning
2021
Advances in technology and the proliferation of mobile device have continued to advance the ubiquitous nature of computing alongside their many prowess and improved features it brings as a disruptive technology to aid information sharing amongst many online users. This popularity, usage and adoption ease, mobility, and portability of the mobile smartphone devices have allowed for its acceptability and popularity. Mobile smartphones continue to adopt the use of short messages services accompanied with a scenario for spamming to thrive. Spams are unsolicited message or inappropriate contents. An effective spam filter studies are limited as short-text message service (SMS) are 140bytes, 160-characters, and rippled with abbreviation and slangs that further inhibits the effective training of models. The study proposes a string match algorithm used as deep learning ensemble on a hybrid spam filtering technique to normalize noisy features, expand text and use semantic dictionaries of disambiguation to train underlying learning heuristics and effectively classify SMS into legitimate and spam classes. Study uses a profile hidden Markov network to select and train the network structure and employs the deep neural network as a classifier network structure. Model achieves an accuracy of 97% with an error rate of 1.2%.
Journal Article
Entropy-Based Approach in Selection Exact String-Matching Algorithms
by
Zorić, Marija
,
Stipaničev, Darko
,
Markić, Ivan
in
algorithm efficiency
,
algorithm performance
,
Algorithms
2020
The string-matching paradigm is applied in every computer science and science branch in general. The existence of a plethora of string-matching algorithms makes it hard to choose the best one for any particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured based on the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm execution identified with the computational resources the algorithm consumes. Resource usage in algorithm execution could be determined, and for maximum efficiency, the goal is to minimize resource usage. Guided by the fact that standard measures of algorithm efficiency, such as execution time, directly depend on the number of executed actions. Without touching the problematics of computer power consumption or memory, which also depends on the algorithm type and the techniques used in algorithm development, we have developed a methodology which enables the researchers to choose an efficient algorithm for a specific domain. String searching algorithms efficiency is usually observed independently from the domain texts being searched. This research paper aims to present the idea that algorithm efficiency depends on the properties of searched string and properties of the texts being searched, accompanied by the theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through character comparison count metrics. The character comparison count metrics is a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain by using appropriate domain data (patterns and texts) and provides for a specific domain the ranking of algorithms according to the patterns’ entropy. The proposed approach is limited to on-line exact string-matching problems based on information entropy for a search pattern. Meticulous empirical testing depicts the methodology implementation and purports soundness of the methodology.
Journal Article
Parallel String Matching with Linear Array, Butterfly and Divide and Conquer Models
by
Raju, S. Viswanadha
,
Reddy, K. K. V. V. S.
,
Rao, Chinta Someswara
in
Algorithms
,
Artificial Intelligence
,
Business and Management
2018
String Matching is a technique of searching a pattern in a text. It is the basic concept to extract the fruitful information from large volume of text, which is used in different applications like text processing, information retrieval, text mining, pattern recognition, DNA sequencing and data cleaning etc., . Though it is stated some of the simple mechanisms perform very well in practice, plenty of research has been published on the subject and research is still active in this area and there are ample opportunities to develop new techniques. For this purpose, this paper has proposed linear array based string matching, string matching with butterfly model and string matching with divide and conquer models for sequential and parallel environments. To assess the efficiency of the proposed models, the genome sequences of different sizes (10–100 Mb) are taken as input data set. The experimental results have shown that the proposed string matching algorithms performs very well compared to those of Brute force, KMP and Boyer moore string matching algorithms.
Journal Article
Toward Effective Aircraft Call Sign Detection Using Fuzzy String-Matching between ASR and ADS-B Data
by
Kasttet, Mohammed Saïd
,
Lyhyaoui, Abdelouahid
,
Zbakh, Douae
in
Accuracy
,
Acoustics
,
Air traffic control
2024
Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-end Recurrent Neural Network (RNN) toolkits were developed with higher speed and accuracy that can often achieve a Word Error Rate (WER) below 10%. These toolkits can nowadays be deployed, for instance, within aircraft cockpits and Air Traffic Control (ATC) systems in order to identify aircraft and display recognized voice messages related to flight data, especially for airports not equipped with radar. Hence, the performance of air traffic controllers and pilots can ultimately be improved by reducing workload and stress and enforcing safety standards. Our experiment conducted at Tangier’s International Airport ATC aimed to build an ASR model that is able to recognize aircraft call signs in a fast and accurate way. The acoustic and linguistic models were trained on the Ibn Battouta Speech Corpus (IBSC), resulting in an unprecedented speech dataset with approved transcription that includes real weather aerodrome observation data and flight information with a call sign captured by an ADS-B receiver. All of these data were synchronized with voice recordings in a structured format. We calculated the WER to evaluate the model’s accuracy and compared different methods of dataset training for model building and adaptation. Despite the high interference in the VHF radio communication channel and fast-speaking conditions that increased the WER level to 20%, our standalone and low-cost ASR system with a trained RNN model, supported by the Deep Speech toolkit, was able to achieve call sign detection rate scores up to 96% in air traffic controller messages and 90% in pilot messages while displaying related flight information from ADS-B data using the Fuzzy string-matching algorithm.
Journal Article