Catalogue Search | MBRL
153 result(s) for "Locality sensitive hashing"
Propagation kernels: efficient graph kernels from propagated information
by Kersting, Kristian; Neumann, Marion; Garnett, Roman
in Algorithms, Artificial Intelligence, Computer Science
2016
We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We will also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains.
Journal Article
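The idea of comparing early-stage label distributions, as described in the abstract above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the row-normalized random-walk transition matrix, and the random-projection binning of node distributions (bin width `w`) are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def row_normalize(adj):
    """Turn an adjacency matrix into a random-walk transition matrix."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def propagate(T, P):
    """One propagation step: every node averages its neighbours' distributions."""
    n, k = len(P), len(P[0])
    return [[sum(T[i][j] * P[j][c] for j in range(n)) for c in range(k)]
            for i in range(n)]


def bin_ids(P, a, b, w):
    """Assumed binning: project each node distribution onto a and quantize."""
    return [math.floor((sum(ai * pi for ai, pi in zip(a, row)) + b) / w)
            for row in P]


def propagation_kernel(adj1, P1, adj2, P2, t_max=2, w=0.1, seed=0):
    """Sum, over iterations 0..t_max, of the number of node pairs sharing a bin."""
    rng = random.Random(seed)
    T1, T2 = row_normalize(adj1), row_normalize(adj2)
    value = 0.0
    for _ in range(t_max + 1):
        # Both graphs must be binned with the same random projection.
        a = [rng.gauss(0.0, 1.0) for _ in range(len(P1[0]))]
        b = rng.uniform(0.0, w)
        c1 = Counter(bin_ids(P1, a, b, w))
        c2 = Counter(bin_ids(P2, a, b, w))
        value += sum(c1[key] * c2[key] for key in c1)
        P1, P2 = propagate(T1, P1), propagate(T2, P2)
    return value
```

Here `P1` and `P2` hold one label distribution per node (for example, a one-hot row for each labeled node), and both graphs must use the same number of label columns.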
One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome
by Reymond, Jean-Louis; Probst, Daniel; Capecchi, Alice
in Analogs, Benchmarks, Big Data in Chemistry
2020
Background
Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules.
Results
Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints.
Conclusion
MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space. The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.
Journal Article
LSH-aware multitype health data prediction with privacy preservation in edge environment
2022
With the increasing development of electronic technology, traditional paper-driven medical systems have been converting to efficient electronic records that can be easily checked and transmitted. However, due to system updating and equipment failure, missing data problems are very common in the healthcare field. Health data can help people evaluate their health status and adjust their fitness. Therefore, predicting missing health data is a current pressing task. There are two challenges when predicting missing data: (1) people’s health data are complex. The data contain multiple data types (such as continuous data, discrete data and Boolean data) and (2) privacy issues are raised at the edge because huge amounts of health data are published while the edge devices can only provide limited computing and storage resources. Therefore, a novel multitype health data privacy-aware prediction approach based on locality-sensitive hashing is proposed in this paper. Through locality-sensitive hashing, our proposed method can realize a good tradeoff between prediction accuracy and privacy preservation. Finally, through a set of experiments deployed on the WISDM dataset, we verify the validity of our approach in dealing with multitype data and attaining user privacy.
Journal Article
LSH-GANSAD: A Spectrum Anomaly Detection Model Based on Local Sensitive Hashing and Generative Adversarial Networks
2025
Spectrum anomaly detection is critical for ensuring the security of wireless communications. However, traditional spectrum and time-series analysis models often suffer from the curse of dimensionality due to the high-dimensional nature of communication data sampled from complex systems. While deep learning-based approaches have shown promise, existing methods typically rely on a simple supervised framework that requires a large volume of labelled anomalous data. This reliance is problematic, as it is often impractical to obtain sufficient anomalous samples in real-world scenarios. To address this issue, we propose LSH-GANSAD, an unsupervised Generative Adversarial Network (GAN) for communication spectrum anomaly detection, driven by Local Sensitivity Hashing (LSH). First, we integrate the LSH module to generate hash space embeddings, which mitigates the curse of dimensionality. Next, we introduce a novel hashing-based latent space constraint grounded in the manifold hypothesis, along with an auxiliary anomaly indicator to enhance representation learning. Our approach utilizes unsupervised adversarial training, eliminating the need for anomalous samples while still achieving high-precision anomaly detection. Experimental results on a real-world communication spectrum dataset demonstrate that LSH-GANSAD achieves an average accuracy exceeding 95% across nine complex anomaly types, highlighting the scalability of our model for real-world spectrum monitoring applications.
Journal Article
An Exception Handling Approach for Privacy-Preserving Service Recommendation Failure in a Cloud Environment
by Qi, Lianyong; Xu, Xiaolong; Meng, Shunmei
in converse Locality-Sensitive Hashing, exception handling, failure
2018
Service recommendation has become an effective way to quickly extract insightful information from massive data. However, in the cloud environment, the quality of service (QoS) data used to make recommendation decisions are often monitored by distributed sensors and stored in different cloud platforms. In this situation, integrating these distributed data (monitored by remote sensors) across different platforms while guaranteeing user privacy is an important but challenging task for successful service recommendation in the cloud environment. Locality-Sensitive Hashing (LSH) is a promising way to achieve these data integration and privacy-preservation goals; however, current LSH-based recommendation studies seldom consider possible recommendation failures, which significantly reduces the robustness of recommender systems. In view of this challenge, we develop a new LSH variant, named converse LSH, and then suggest an exception handling approach for recommendation failures based on the converse LSH technique. Finally, we conduct several simulated experiments on the well-known MovieLens dataset to demonstrate the effectiveness and efficiency of our approach.
Journal Article
A probabilistic molecular fingerprint for big data settings
by Reymond, Jean-Louis; Probst, Daniel
in Algorithms, Analysis, Approximate k-nearest neighbor search
2018
Background
Among the various molecular fingerprints available to describe small organic molecules, the extended connectivity fingerprint up to four bonds (ECFP4) performs best in drug analog recovery benchmarks, as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high-dimensional representations (≥ 1024D) to perform well, so ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC are very slow due to the curse of dimensionality.
Results
Herein we report a new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. By leveraging locality sensitive hashing, LSH approximate nearest neighbor search methods perform as well on unfolded MHFP6 as comparable methods do on folded ECFP4 fingerprints in terms of speed and relative recovery rate, while operating in very sparse and high-dimensional binary chemical space.
Conclusion
MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub (https://github.com/reymond-group/mhfp).
Journal Article
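The MinHash step mentioned in the MHFP6 abstract above (apply MinHash to a set of substructure strings, then compare fingerprints) can be sketched as follows. This is a generic MinHash illustration under stated assumptions, not the MHFP6 code: the seeded BLAKE2b hash family, the function names, and the placeholder shingle strings are hypothetical, and extracting real circular-substructure SMILES would require a cheminformatics toolkit that is not shown here.

```python
import hashlib


def _seeded_hash(shingle, seed):
    """64-bit hash of a shingle under one member of an assumed hash family."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingles, num_hashes=128):
    """Keep, for each hash function, the smallest hash value over the set."""
    if not shingles:
        raise ValueError("cannot MinHash an empty set")
    return [min(_seeded_hash(s, seed) for s in shingles)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Placeholder shingle sets standing in for substructure SMILES strings.
mol_a = {"CC", "CCO", "C(C)O", "CO"}
mol_b = {"CC", "CCO", "C(C)N", "CN"}
similarity = estimate_jaccard(minhash(mol_a), minhash(mol_b))
```

The estimate approaches the true Jaccard similarity of the two sets (here 2 shared shingles out of 6 distinct ones) as `num_hashes` grows, which is what makes fixed-length signatures usable with LSH-based approximate nearest neighbor search.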
A robust method based on locality sensitive hashing for K-nearest neighbors searching
by Zhang, Sulan; Wu, Quanwang; Cheng, Dongdong
in Communications Engineering, Computer Communication Networks, Electrical Engineering
2024
K-nearest neighbors searching (KNNS) finds the K nearest neighbors of query points. It is a primary problem in clustering analysis, classification, outlier detection and pattern recognition, and has been widely used in various applications. Exact search algorithms, such as KD-tree and M-tree, are not suitable for high-dimensional data. Approximate KNNS algorithms for high-dimensional data based on locality sensitive hashing (LSH) are becoming popular. However, existing search strategies are sensitive to the parameters used to construct the LSH index. To solve this problem, a robust strategy for KNNS, called Robust-LSH, is proposed. It makes full use of points that frequently appear together with the query points to improve the diversity of candidates, so that it can use fewer hash tables to obtain more valuable candidates for KNNS. We conduct experiments on synthetic and real data. The results show that Robust-LSH outperforms the p-stable LSH, RLSH and KD-tree algorithms in searching accuracy and running time.
Journal Article
Query-aware locality-sensitive hashing scheme for lp norm
2017
The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image databases and data mining. Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer approximation ratio c ≥ 2. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the “anchor” for bucket partition. Accordingly, a query-aware LSH function under a specific lp norm with p ∈ (0,2] is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. The query-aware bucket partitioning strategy can be easily implemented so that query performance is guaranteed. For each lp norm (p ∈ (0,2]), based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH functions enables QALSH to work with any approximation ratio c > 1. In addition, we propose a heuristic variant named QALSH+ to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH+ outperform the state-of-the-art schemes, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.
Journal Article
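The contrast drawn in the abstract above, between a query-oblivious bucket with a random shift and a bucket anchored at the query, can be sketched for the l2 case as follows. This is a simplified in-memory illustration under stated assumptions, not the QALSH implementation: the function names, the parameters `m` and `w`, and the plain collision count are chosen for the example, and Gaussian projections are used because the normal distribution is 2-stable.

```python
import math
import random


def random_projection(dim, rng):
    """Draw a projection vector from the 2-stable (Gaussian) distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def oblivious_bucket(a, b, w, x):
    """Query-oblivious LSH: buckets are fixed in advance using a random shift b."""
    return math.floor((project(a, x) + b) / w)


def query_aware_collides(a, w, q, x):
    """Query-aware partition: a bucket of width w centered on the query's projection."""
    return abs(project(a, x) - project(a, q)) <= w / 2.0


def collision_counts(points, q, m=32, w=2.0, seed=0):
    """For each point, count how many of m projections place it in the query's bucket."""
    rng = random.Random(seed)
    projections = [random_projection(len(q), rng) for _ in range(m)]
    return [sum(query_aware_collides(a, w, q, x) for a in projections)
            for x in points]
```

Points with high collision counts are the natural candidates to verify by exact distance. Because the bucket is centered on the query, no random shift `b` is needed in the query-aware variant.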
Unbalanced power anomaly detection model based on improved transformer and countermeasure encoder
2025
Current intelligent grid anomaly detection faces challenges such as low minority-class recognition due to imbalanced data, high computational complexity in long-sequence processing, and model bias from scarce anomaly samples. To address these, we propose a hybrid architecture combining an enhanced Transformer with an Adversarial Autoencoder (AAE). We introduce a Locality-Sensitive Hashing (LSH) attention mechanism using Focal Loss with Temperature (FLT) to cluster similar features. A dynamic weighting module, implemented via a Spatial-Temporal Feature Disentanglement Network (STFDN), adaptively adjusts gradients by category. Our approach reduces memory usage for node sequences from 18.7GB to 8.9GB (52.4% less) via Spectral Normalization. Under Wasserstein distance constraints, the model achieves an FID score of 28.4, a 10.4% improvement. An innovative dynamic temperature scaling strategy elevates the AUPRC to 0.837 on the SGSC dataset. Tests on the UK-DALE dataset show an F1-score of 89.3% with 183ms inference latency, meeting edge deployment requirements. This research offers a promising new generation of automated detection tools for grid operation and maintenance.
Journal Article
Accuracy-enhanced E-commerce recommendation based on deep learning and locality-sensitive hashing
by Esquivel, James A.; Li, Dejuan
in Accuracy, Communications Engineering, Computer Communication Networks
2024
Recommender systems facilitate the discovery of relevant content in several online communities by analyzing users' past interactions and preferences. With the expansion of data-intensive online activities and online content, cybersecurity risks have increased. Users may not be adequately protected by traditional collaborative recommendation systems. Sparsity and cold-start are common challenges for traditional recommendation systems. Advances in deep learning have enabled recommender systems to enhance user behavior prediction precision, a task previously deemed unattainable. To enhance privacy and speed up neighbor searches, we propose locality sensitive hashing (LSH) in neighbor-based embedded learning. Through an adversarial approach, LSH enables efficient neighbor searching. Deriving multi-view embeddings from diverse behavioral data enhances the accuracy of predictions. By using multi-view preference embeddings, user preferences can be depicted more intricately. LSH, neighbor-centered embedding, self-embedding, and interaction-aware embedding are all used to accomplish this task. In addition to providing efficient similarity search capabilities, neighbor-based embedding learning and adversarial search provide robust privacy protection. As a result, the outcomes are consolidated into an advanced prediction system based on long short-term memory. Numerous empirical studies with authentic datasets demonstrate that our proposed methodology outperforms existing state-of-the-art benchmarks in terms of predictive accuracy, while maintaining robust security.
Journal Article