Catalogue Search | MBRL

Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes

by Kilchoer, Laurent , Aguilar, Pablo S. , Dessimoz, Christophe in Analysis , Bioinformatics , Biological activity

2020

Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at

Journal Article
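
HogProf's speed comes from comparing fixed-length MinHash signatures instead of full profiles. Below is a minimal sketch of the MinHash step only. It assumes each profile is already a set of string labels for presence/loss events of a gene family on a species tree. The labels, the seeded blake2b hashing, and the 128-hash signature size are illustrative assumptions and do not come from HogProf's implementation. The construction of profiles from hierarchical orthologous groups is not reproduced.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """64-bit hash of `item`; each `seed` acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set, num_hashes: int = 128) -> list:
    """MinHash signature: minimum hash of the (non-empty) set under each hash function."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


# Hypothetical profiles: events of two gene families on a species tree.
fam_a = {"present:Metazoa", "present:Fungi", "lost:Plants", "present:Protists"}
fam_b = {"present:Metazoa", "present:Fungi", "lost:Plants", "lost:Protists"}

print(exact_jaccard(fam_a, fam_b))  # 0.6
print(estimate_jaccard(minhash_signature(fam_a), minhash_signature(fam_b)))  # approximately 0.6
```

Signatures have the same length whatever the profile size. This means they can be indexed with LSH banding, so similar profiles are retrieved without an all-against-all comparison, which is the quadratic cost that the abstract describes.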

Deep Relevance Hashing for Remote Sensing Image Retrieval

by Liu, Xiaojie , Chen, Xiliang , Zhu, Guobin in Analysis , Archives & records , Artificial intelligence

2025

With the development of remote sensing technologies, the volume of remote sensing data is growing dramatically, making efficient management and retrieval of large-scale remote sensing images increasingly important. Recently, deep hashing for content-based remote sensing image retrieval (CBRSIR) has attracted significant attention due to its computational efficiency and high retrieval accuracy. Although great advancements have been achieved, the imbalance between easy and difficult image pairs during training often limits the model’s ability to capture complex similarities and degrades retrieval performance. Additionally, distinguishing images with the same Hamming distance but different categories remains a challenge during the retrieval phase. In this paper, we propose a novel deep relevance hashing (DRH) method for remote sensing image retrieval, which consists of a global hash learning model (GHLM) and a local hash re-ranking model (LHRM). The goal of GHLM is to extract global features from remote sensing (RS) images and generate compact hash codes for initial ranking. To achieve this, GHLM employs a deep convolutional neural network to extract discriminative representations. A weighted pairwise similarity loss is introduced to emphasize difficult image pairs and reduce the impact of easy ones during training. The LHRM predicts relevance scores for images that share the same Hamming distance with the query to reduce confusion in the retrieval stage. Specifically, we represent the retrieval list as a relevance matrix and employ a lightweight CNN model to learn the relevance scores of image pairs and refine the list. Experimental results on three benchmark datasets demonstrate that the proposed DRH method outperforms other deep hashing approaches, confirming its effectiveness in CBRSIR.

Journal Article
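
The two-stage idea described above can be sketched in a few lines. First, rank by Hamming distance between binary codes. Then order items tied at the same distance by a relevance score. In DRH, those scores come from the learned LHRM. Here, `relevance` is a hypothetical stand-in callable, and the 4-bit codes and scores are made up for illustration.

```python
from typing import Callable, List, Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_with_tie_rerank(query: int, db_codes: Sequence[int],
                         relevance: Callable[[int], float]) -> List[int]:
    """Rank database indices by Hamming distance to the query; items tied at
    the same distance are ordered by a relevance score (higher first)."""
    return sorted(
        range(len(db_codes)),
        key=lambda i: (hamming(query, db_codes[i]), -relevance(i)),
    )


# Hypothetical 4-bit codes and relevance scores (stand-in for a learned re-ranker).
db = [0b1011, 0b1000, 0b1010, 0b0101]
scores = [0.2, 0.9, 0.5, 0.1]
print(rank_with_tie_rerank(0b1010, db, lambda i: scores[i]))  # [2, 1, 0, 3]
```

Items 0 and 1 are both at distance 1 from the query, so the Hamming ranking alone cannot separate them. The relevance score settles their order.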

Geohash-Based High-Definition Map Provisioning System Using Smart RSU

by Park, Wangyu , Lee, Jimin , Moon, Changjoo in Algorithms , Communication , Comparative analysis

2025

High-definition (HD) maps are essential for safe and reliable autonomous driving, but their growing size and the need for real-time updates pose significant challenges for in-vehicle storage and communication efficiency. This study proposes a lightweight and scalable HD map provisioning system based on Geohash spatial indexing and Smart Roadside Units (Smart RSUs). The system divides HD map data into Geohash-based spatial blocks and enables vehicles to request only the map segments corresponding to their current location, reducing storage burden and communication load. To validate the system’s effectiveness, we constructed a simulation environment where multiple vehicle clients simultaneously request map data from a Smart RSU. Experimental results showed that the proposed Geohash-based approach achieved an average response time (RTT) of 1244.82 ms—approximately 296.3% faster than the conventional GPS-based spatial query method—and improved database query performance by 1072.6%. Additionally, we demonstrate the system’s scalability by adjusting Geohash levels according to road density, using finer blocks in urban areas and coarser blocks in rural areas. The hierarchical nature of Geohash also enables consistent integration of blocks with different resolutions. These results confirm that the proposed method provides an efficient and real-time HD map delivery framework suitable for dynamic and dense traffic environments.

Journal Article
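
Geohash encodes a point by repeatedly halving the longitude and latitude intervals and interleaving the resulting bits. Every 5 bits become one base-32 character. Below is a minimal sketch of standard Geohash encoding. The system's own RSU and database code is not described in the abstract. The test coordinate is arbitrary, and the precision values labelled "rural" and "urban" are hypothetical choices.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a WGS84 point as a Geohash of `precision` characters.

    Bits alternate between longitude (first) and latitude; each bit halves the
    current interval, and every 5 bits are emitted as one base-32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=5)  # hypothetical rural block size
fine = geohash_encode(57.64911, 10.40744, precision=7)    # hypothetical urban block size
print(coarse)                   # u4pru
print(fine.startswith(coarse))  # True: the finer cell lies inside the coarser one
```

This prefix property is what lets blocks of different resolutions fit together. A server could key map blocks by Geohash string, and a vehicle would request the key for its current position at whatever precision its area uses.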

S-conLSH: alignment-free gapped mapping of noisy long reads

by Chakraborty, Angana , Bandyopadhyay, Sanghamitra , Morgenstern, Burkhard in Algorithms , Alignment , Alignment-free sequence comparison

2021

Background The advancement of SMRT technology has opened new opportunities for genome analysis with its longer read length and low GC bias. Alignment of the reads to their appropriate positions in the respective reference genome is the first but costliest step of any analysis pipeline based on SMRT sequencing. However, state-of-the-art aligners often fail to identify distant homologies due to a lack of conserved regions, caused by frequent genetic duplication and recombination. Therefore, we developed a novel alignment-free method of sequence mapping that is fast and accurate. Results We present a new mapper called S-conLSH that uses Spaced context based Locality Sensitive Hashing. With multiple spaced patterns, S-conLSH facilitates a gapped mapping of noisy long reads to the corresponding target locations of a reference genome. We have examined the performance of the proposed method on 5 different real and simulated datasets. S-conLSH is at least 2 times faster than the recently developed method lordFAST. It achieves a sensitivity of 99%, without using any traditional base-to-base alignment, on human simulated sequence data. By default, S-conLSH provides an alignment-free mapping in PAF format. However, it has an option of generating aligned output as a SAM file, if it is required for any downstream processing. Conclusions S-conLSH is one of the first alignment-free reference genome mapping tools achieving a high level of sensitivity. The spaced-context is especially suitable for extracting distant similarities. The variable-length spaced-seeds or patterns add flexibility to the proposed algorithm by introducing gapped mapping of the noisy long reads. Therefore, S-conLSH may be considered a prominent direction towards alignment-free sequence analysis.

Journal Article
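
Spaced patterns decide which positions of each window contribute to a seed. A '1' is a position that must match, and a '0' is a "don't care" position, so a mismatch that falls under a '0' does not break the seed. The sketch below shows only spaced-seed extraction and a simple offset-voting mapper. It does not reproduce S-conLSH's context-based locality-sensitive hashing or its use of multiple patterns. The function names, the pattern "101", and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def spaced_kmers(seq: str, pattern: str) -> List[str]:
    """One spaced k-mer per window: keep characters where `pattern` has '1'
    (care positions) and skip those where it has '0' (don't-care positions)."""
    care = [i for i, p in enumerate(pattern) if p == "1"]
    return ["".join(seq[start + i] for i in care)
            for start in range(len(seq) - len(pattern) + 1)]


def build_index(reference: str, pattern: str) -> Dict[str, List[int]]:
    """Map each spaced k-mer of the reference to the window starts where it occurs."""
    index: Dict[str, List[int]] = {}
    for pos, kmer in enumerate(spaced_kmers(reference, pattern)):
        index.setdefault(kmer, []).append(pos)
    return index


def map_read(read: str, index: Dict[str, List[int]], pattern: str) -> Optional[int]:
    """Vote for the read's start offset (ref_pos - read_pos) without base-level alignment."""
    votes: Counter = Counter()
    for read_pos, kmer in enumerate(spaced_kmers(read, pattern)):
        for ref_pos in index.get(kmer, []):
            votes[ref_pos - read_pos] += 1
    return votes.most_common(1)[0][0] if votes else None


reference = "ACGTAC"
idx = build_index(reference, "101")
print(spaced_kmers(reference, "101"))  # ['AG', 'CT', 'GA', 'TC']
print(map_read("AAGT", idx, "101"))    # 0: the C->A mismatch sits under a '0' of the first window
```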

Enhanced Image Retrieval Using Multiscale Deep Feature Fusion in Supervised Hashing

by Belalia, Amina , Belloulata, Kamel , Redaoui, Adil in Accuracy , Algorithms , Analysis

2025

In recent years, deep-network-based hashing has gained prominence in image retrieval for its ability to generate compact and efficient binary representations. However, most existing methods predominantly focus on high-level semantic features extracted from the final layers of networks, often neglecting structural details that are crucial for capturing spatial relationships within images. Achieving a balance between preserving structural information and maximizing retrieval accuracy is the key to effective image hashing and retrieval. To address this challenge, we introduce Multiscale Deep Feature Fusion for Supervised Hashing (MDFF-SH), a novel approach that integrates multiscale feature fusion into the hashing process. The hallmark of MDFF-SH lies in its ability to combine low-level structural features with high-level semantic context, synthesizing robust and compact hash codes. By leveraging multiscale features from multiple convolutional layers, MDFF-SH ensures the preservation of fine-grained image details while maintaining global semantic integrity, achieving a harmonious balance that enhances retrieval precision and recall. Our approach demonstrated superior performance on benchmark datasets, achieving significant gains in Mean Average Precision (MAP) over state-of-the-art methods: 9.5% on CIFAR-10, 5% on NUS-WIDE, and 11.5% on MS-COCO. These results highlight the effectiveness of MDFF-SH in bridging structural and semantic information, setting a new standard for high-precision image retrieval through multiscale feature fusion.

Journal Article
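
The reported gains are in Mean Average Precision (MAP). In hashing-based retrieval, MAP is typically computed as follows: rank database items for each query, compute the average precision (AP) of that ranked list, and then average over queries. Below is a minimal sketch that assumes each ranked list is already given as relevance flags. It normalizes AP by the number of relevant items retrieved in the list, which is one common convention. Other papers divide by the total number of relevant items in the database. The abstract does not state the exact protocol, such as the cut-off used.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """AP of one ranked list: mean of precision@k at each rank k holding a relevant item."""
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of per-query AP values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


print(average_precision([True, False, True]))  # about 0.833, i.e. (1/1 + 2/3) / 2
print(mean_average_precision([[True, False, True], [False, True]]))  # about 0.667
```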

BarWare: efficient software tools for barcoded single-cell genomics

by Reading, Julian , Graybuck, Lucas T. , Swanson, Elliott in Algorithms , Analysis , Automation

2022

Background Barcode-based multiplexing methods can be used to increase throughput and reduce batch effects in large single-cell genomics studies. Despite advantages in flexibility of sample collection and scale, there are additional complications in the data deconvolution steps required to assign each cell to its originating sample. Results To meet computational needs for efficient sample deconvolution, we developed the tools BarCounter and BarMixer that compute barcode counts and deconvolute mixed single-cell data into sample-specific files, respectively. Together, these tools are implemented as the BarWare pipeline to support demultiplexing from large sequencing projects with many wells of hashed 10x Genomics scRNA-seq data. Conclusions BarWare is a modular set of tools linked by shell scripting: BarCounter, a computationally efficient barcode sequence quantification tool implemented in C; and BarMixer, an R package for identification of barcoded populations, merging barcoded data from multiple wells, and quality-control reporting related to scRNA-seq data. These tools and a self-contained implementation of the pipeline are freely available for non-commercial use at https://github.com/AllenInstitute/BarWare-pipeline .

Journal Article
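
The general count-then-assign pattern behind hashtag demultiplexing can be sketched briefly. In BarWare, counting is done by BarCounter (C) and assignment by BarMixer (R). The Python below only illustrates the idea and is not BarWare's actual classification logic. The tag sequences, sample names, exact-match counting, and both thresholds (`min_count`, `min_fraction`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical hashtag sequences and sample names (not real oligos).
HASHTAGS = {"AAAACCCC": "sample_A", "GGGGTTTT": "sample_B"}


def count_hashtags(reads: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count exact hashtag matches per cell from (cell_barcode, tag_sequence) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cell, tag in reads:
        sample = HASHTAGS.get(tag)
        if sample is not None:  # exact match only; mismatch tolerance is not modelled
            counts[cell][sample] += 1
    return counts


def assign_cells(counts: Dict[str, Counter], min_count: int = 5,
                 min_fraction: float = 0.8) -> Dict[str, str]:
    """Call each cell a singlet (sample name), 'multiplet' or 'negative'."""
    calls = {}
    for cell, c in counts.items():
        total = sum(c.values())
        sample, top = c.most_common(1)[0]
        if total < min_count:
            calls[cell] = "negative"
        elif top / total >= min_fraction:
            calls[cell] = sample
        else:
            calls[cell] = "multiplet"
    return calls


reads = ([("cell1", "AAAACCCC")] * 6 + [("cell2", "AAAACCCC")] * 3
         + [("cell2", "GGGGTTTT")] * 3 + [("cell3", "GGGGTTTT")] * 2)
print(assign_cells(count_hashtags(reads)))
# {'cell1': 'sample_A', 'cell2': 'multiplet', 'cell3': 'negative'}
```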

Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

by Tenzer, Stefan , Hildebrandt, Andreas , Bob, Konstantin in Algorithms , Analysis , Availability

2022

Background Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties. Results In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through appropriate choice of algorithm parameters it is possible to balance false-positive and false-negative rates. On synthetic data, a superior performance compared to an intensity thresholding approach was achieved. Real data could be strongly reduced without losing relevant information. Our implementation scaled out up to 32 threads and supports acceleration by GPUs. Conclusions Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Availability Generated data and code are available at https://github.com/hildebrandtlab/mzBucket . Raw data is available at https://zenodo.org/record/5036526 .

Journal Article
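
The abstract does not specify which LSH family is used. The sketch below uses random-hyperplane (sign) LSH, a standard family chosen here only for illustration, to bucket hypothetical intensity windows. Windows with the same shape, pointing in the same direction, land in the same bucket regardless of overall intensity. Adding bits makes buckets stricter, which gives fewer false positives and more false negatives. Using several independent tables recovers recall. This is the same kind of parameter trade-off the abstract mentions. All names and values are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def make_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes; each contributes one bit of the bucket key."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def bucket_key(vector: Sequence[float], planes: List[List[float]]) -> int:
    """Sign-of-projection key: vectors in similar directions tend to share keys."""
    key = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def bucketize(windows: Dict[str, Sequence[float]],
              planes: List[List[float]]) -> Dict[int, List[str]]:
    """Group window names by bucket key."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, vec in windows.items():
        buckets[bucket_key(vec, planes)].append(name)
    return dict(buckets)


planes = make_hyperplanes(dim=5, num_bits=8)
peak = [0.0, 1.0, 5.0, 1.0, 0.0]  # hypothetical intensity window
windows = {"w1": peak, "w2": [2 * x for x in peak], "w3": [3.0, 0.0, 0.0, 0.0, 4.0]}
print(bucketize(windows, planes))  # w1 and w2 always share a bucket (same shape, doubled intensity)
```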

Hashes are not suitable to verify fixity of the public archived web

by Aturban, Mohamed , Weigle, Michele C. , Klein, Martin in Analysis , Archives , Archives & records

2023

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the hash values generated on the same resource are identical, then the fixity of the resource is verified. We tested this process by conducting a study on 16,627 mementos from 17 public web archives. We replayed and downloaded the mementos 39 times using a headless browser over a period of 442 days and generated a hash for each memento after each download, resulting in 39 hashes per memento. The hash is calculated by including not only the content of the base HTML of a memento but also all embedded resources, such as images and style sheets. We expected to always observe the same hash for a memento regardless of the number of downloads. However, our results indicate that 88.45% of mementos produce more than one unique hash value, and about 16% (or one in six) of those mementos always produce different hash values. We identify and quantify the types of changes that cause the same memento to produce different hashes. These results point to the need for defining an archive-aware hashing function, as conventional hashing functions are not suitable for replayed archived web pages.

Journal Article
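
The fixity check described above hashes the base HTML together with every embedded resource. The sketch below shows why any replay-time change flips the result. The exact way the authors combined resources is not given in the abstract. Here, URLs are sorted so download order does not matter, and each body is hashed separately so that resource boundaries are unambiguous. The function name and the sample memento are hypothetical.

```python
import hashlib
from typing import Dict


def memento_fixity_hash(resources: Dict[str, bytes]) -> str:
    """Single SHA-256 over the base HTML and all embedded resources of a replay.

    `resources` maps each URL (base HTML included) to the bytes downloaded at
    replay time; URLs are sorted so the digest is independent of fetch order."""
    outer = hashlib.sha256()
    for url in sorted(resources):
        outer.update(url.encode("utf-8") + b"\x00")
        outer.update(hashlib.sha256(resources[url]).digest())
    return outer.hexdigest()


# Hypothetical memento replayed twice; only an embedded image changed.
replay_1 = {
    "https://archive.example/page.html": b"<html><body>memento</body></html>",
    "https://archive.example/banner.png": b"\x89PNG-v1",
}
replay_2 = {**replay_1, "https://archive.example/banner.png": b"\x89PNG-v2"}
print(memento_fixity_hash(replay_1) == memento_fixity_hash(dict(replay_1)))  # True
print(memento_fixity_hash(replay_1) == memento_fixity_hash(replay_2))        # False
```

A single changed byte in any embedded resource produces a different digest, even when the archived content a reader sees is otherwise identical. This is the behaviour the study observed across replays.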

Random Number Generators: Principles and Applications

by Petroudis, Georgios , Nastou, Panagiotis E. , Bikos, Anastasios in Algorithms , Congruences , cryptographic key generation

2023

In this paper, we present approaches to generating random numbers, along with potential applications. Rather than trying to provide extensive coverage of several techniques or algorithms that have appeared in the scientific literature, we focus on some representative approaches, presenting their workings and properties in detail. Our goal is to delineate their strengths and weaknesses, as well as their potential application domains, so that the reader can judge what would be the best approach for the application at hand, possibly a combination of the available approaches. For instance, a physical source of randomness can be used for the initial seed; then, suitable preprocessing can enhance its randomness; then, the output of preprocessing can feed different types of generators, e.g., a linear congruential generator, a cryptographically secure one and one based on the combination of one-way hash functions and shared key cryptoalgorithms in various modes of operation. Then, if desired, the outputs of the different generators can be combined, giving the final random sequence. Moreover, we present a set of practical randomness tests that can be applied to the outputs of random number generators in order to assess their randomness characteristics. In order to demonstrate the importance of unpredictable random sequences, we present an application of cryptographically secure generators in domains where unpredictability is one of the major requirements, i.e., eLotteries and cryptographic key generation.

Journal Article
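
A linear congruential generator (LCG) is one of the generator types named above. The sketch below shows why an LCG is unsuitable where unpredictability matters. The constants are the widely quoted Numerical Recipes values for a modulus of 2^32. The class name and seeds are illustrative. Python's standard `secrets` module, which draws from the operating system's CSPRNG, is shown as the alternative for key generation.

```python
import secrets


class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_value(self) -> int:
        self.state = (self.a * self.state + self.c) % self.m
        return self.state


print(LCG(seed=0).next_value())  # 1013904223

# Predictability: one observed output plus known (a, c, m) reveals the next one.
g = LCG(seed=42)
x1, x2 = g.next_value(), g.next_value()
print(LCG(seed=x1).next_value() == x2)  # True

# For cryptographic keys, use the OS-backed CSPRNG instead.
key = secrets.token_bytes(32)  # 256-bit key
```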

Perceptual Image Hashing Fusing Zernike Moments and Saliency-Based Local Binary Patterns

by Wang, Tingting , Li, Wei , Liu, Kai in Accuracy , color vector angle , Data security

2025

This paper proposes a novel perceptual image hashing scheme that robustly combines global structural features with local texture information for image authentication. The method starts with image normalization and Gaussian filtering to ensure scale invariance and suppress noise. A saliency map is then generated from a color vector angle matrix using a frequency-tuned model to identify perceptually significant regions. Local Binary Pattern (LBP) features are extracted from this map to represent fine-grained textures, while rotation-invariant Zernike moments are computed to capture global geometric structures. These local and global features are quantized and concatenated into a compact binary hash. Extensive experiments on standard databases show that the proposed method outperforms state-of-the-art algorithms in both robustness against content-preserving manipulations and discriminability across different images. Quantitative evaluations based on ROC curves and AUC values confirm its superior robustness–uniqueness trade-off, demonstrating the effectiveness of the saliency-guided fusion of Zernike moments and LBP for reliable image hashing.

Journal Article
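
The scheme above extracts LBP features from a saliency map, combines them with Zernike moments, and quantizes the result into a binary hash. The sketch below covers only the plain 8-neighbour LBP step, a histogram of codes, a simple mean-threshold quantization, and Hamming comparison of the resulting bit strings. It uses a hypothetical grayscale grid. The neighbour ordering, the `>=` comparison, and the mean-threshold rule are common conventions assumed here and may differ from the paper's. Saliency detection and Zernike moments are not reproduced.

```python
from typing import List

Grid = List[List[int]]

# Neighbour order: clockwise starting at the top-left pixel (assumed convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Grid, r: int, c: int) -> int:
    """8-bit LBP code at (r, c): bit is 1 where the neighbour is >= the centre."""
    centre = image[r][c]
    code = 0
    for dr, dc in OFFSETS:
        code = (code << 1) | (1 if image[r + dr][c + dc] >= centre else 0)
    return code


def lbp_histogram(image: Grid) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


def binarize(features: List[int]) -> str:
    """Quantize features to bits: 1 where a value exceeds the vector mean."""
    mean = sum(features) / len(features)
    return "".join("1" if f > mean else "0" for f in features)


def hamming(h1: str, h2: str) -> int:
    """Number of differing bits between two equal-length hash strings."""
    return sum(a != b for a, b in zip(h1, h2))


flat = [[9, 9, 9], [9, 5, 9], [9, 9, 9]]
print(lbp_code(flat, 1, 1))  # 255: every neighbour is >= the centre
spot = [[6, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_code(spot, 1, 1))  # 128: only the top-left neighbour is >= the centre
```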
