579 result(s) for "Jaccard"
Jaccard-constrained dense subgraph discovery
Finding dense subgraphs is a core problem in graph mining with many applications in diverse domains. At the same time, many real-world networks vary over time; that is, the dataset can be represented as a sequence of graph snapshots. Hence, it is natural to consider the question of finding dense subgraphs in a temporal network that are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and an input parameter α, we find a collection of dense subgraphs, with pairwise Jaccard index at least α, such that the sum of densities of the induced subgraphs is maximized. We prove that this problem is NP-hard and present a greedy, iterative algorithm which runs in O(nk² + m) time per iteration, where k is the length of the graph sequence and n and m denote the number of vertices and the total number of edges, respectively. We also consider an alternative problem in which subgraphs with large pairwise Jaccard indices are rewarded by incorporating the indices directly into the objective function. More formally, given a set of graph snapshots and a weight λ, we find a collection of dense subgraphs such that the sum of densities of the induced subgraphs, plus the sum of Jaccard indices weighted by λ, is maximized. We prove that this problem is also NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm which runs in O(n²k² + m log n + k³n) time per iteration, and a greedy algorithm which runs in O(n²k² + m log n + k³n) time. We show experimentally that our algorithms are efficient, can recover the ground truth in synthetic datasets, and provide good results on real-world datasets. Finally, we present two case studies that demonstrate the usefulness of our problem.
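The pairwise constraint in the first formulation is easy to state concretely. A minimal Python sketch (function and variable names are ours, not the paper's) of the Jaccard index on vertex sets and the α-constraint check:

```python
def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two vertex sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_jaccard_ok(subgraphs, alpha):
    """Check the constraint from the first problem: every pair of chosen
    vertex sets must have Jaccard index at least alpha."""
    return all(
        jaccard(subgraphs[i], subgraphs[j]) >= alpha
        for i in range(len(subgraphs))
        for j in range(i + 1, len(subgraphs))
    )
```

A feasible solution is any collection of snapshot subgraphs passing this check; the paper's algorithms then maximize the summed densities over such collections.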
Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data
Background: Surveys of the presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize the similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. Results: We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome the computational burden due to high dimensionality, we propose bootstrap and measurement-concentration algorithms to efficiently estimate the statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species on 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open-source R package called jaccard (https://cran.r-project.org/package=jaccard).
Conclusion: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures into analyses of species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
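The coefficient being tested is simple to compute; the significance machinery is the paper's contribution. As a rough illustration only (not the paper's exact, asymptotic, or bootstrap estimators), a naive permutation test on the Jaccard/Tanimoto coefficient might look like:

```python
import random

def jaccard_binary(x, y):
    """Jaccard/Tanimoto coefficient of two presence-absence vectors."""
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return inter / union if union else 0.0

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Crude permutation test: shuffle one vector to approximate the null
    distribution of the coefficient under independent occurrences.
    Illustrative only; the paper's estimators are far more refined."""
    rng = random.Random(seed)
    observed = jaccard_binary(x, y)
    y = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if jaccard_binary(x, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one to avoid a zero p-value
```

This brute-force approach is exactly what becomes infeasible in high dimensions, motivating the paper's bootstrap and measurement-concentration algorithms.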
Fast computation of the eigensystem of genomic similarity matrices
The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, the calculation of both a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way that allows for the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix are available, operations that are efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure amenable to the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in L² norm and angle) between the principal components of the Jaccard matrix and those of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and we derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data from the 1000 Genomes Project.
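For intuition, a dense brute-force version of a pairwise Jaccard similarity matrix over binary genotype vectors can be sketched as follows (names are illustrative); the article's point is precisely that at genomic scale this must instead be done in sparse algebra with a randomized SVD:

```python
def jaccard_matrix(genotypes):
    """Pairwise Jaccard similarity matrix over samples, where each sample
    is a binary variant-presence vector. This dense O(n^2 p) version is
    only for intuition: it does not scale to real genomic data."""
    n = len(genotypes)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            inter = sum(a and b for a, b in zip(genotypes[i], genotypes[j]))
            union = sum(a or b for a, b in zip(genotypes[i], genotypes[j]))
            s = inter / union if union else 0.0
            sim[i][j] = sim[j][i] = s  # the matrix is symmetric
    return sim
```

The principal components used for stratification correction are then the leading eigenvectors of such a matrix, which is where the randomized SVD enters.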
A Bibliometric Analysis of COVID-19 across Science and Social Science Research Landscape
The lack of knowledge about the COVID-19 pandemic has encouraged extensive research in the academic sphere, reflected in the exponentially growing scientific literature. While the state of COVID-19 research reveals it is currently at an early stage of knowledge development, a comprehensive and in-depth overview is still missing. Accordingly, the paper's main aim is to provide an extensive bibliometric analysis of COVID-19 research across the science and social science research landscape, using innovative bibliometric approaches (e.g., Venn diagram, Biblioshiny descriptive statistics, VOSviewer co-occurrence network analysis, Jaccard distance cluster analysis, and text mining based on binary logistic regression). The bibliometric analysis considers the Scopus database, including all relevant information on COVID-19-related publications (n = 16,866) available in the first half of 2020. The empirical results indicate the domination of the health sciences in terms of the number of relevant publications and total citations, while the physical sciences and the social sciences and humanities lag behind significantly. Nevertheless, there is evidence of COVID-19 research collaboration within and between different subject area classifications, with a gradual increase in the importance of non-health scientific disciplines. The findings emphasize the great need for a comprehensive and in-depth approach that considers various scientific disciplines in COVID-19 research, so as to benefit not only the scientific community but also evidence-based policymaking as part of efforts to respond properly to the COVID-19 pandemic.
Assessing similarity of n-dimensional hypervolumes
Aim: The n‐dimensional hypervolume framework (Glob. Ecol. Biogeogr. [2014] 23:595–609), implemented through the R package 'hypervolume', is being increasingly used in ecology and biogeography. This approach offers a reliable means for comparing the niches of two or more species through the calculation of the intersection between hypervolumes in a multidimensional space, as well as different distance metrics (minimum and centroid distance) and niche similarity indices based on volume ratios (Sørensen–Dice and Jaccard similarity). However, given that these metrics have conceptual differences, there is still no consensus on which one(s) should be routinely used to assess niche similarity. The aim of this study is to provide general guidance for constructing and comparing n‐dimensional hypervolumes. Location: Virtual study site. Taxon: Virtual species. Method: First, the literature was screened to verify the usage of the different metrics in studies (2014–2018) relying on this method. Subsequently, a comparative analysis based on simulated morphological and bioclimatic traits was performed, taking into consideration different analytical dimensions, sample sizes and algorithms for hypervolume construction. Results: The literature survey revealed no clear preference for one metric over the others in current studies relying on the n‐dimensional hypervolume method. In simulated data, a high correlation among similarity and distance metrics was found for all data types considered. For most analytical scenarios, using at least one overlap metric and one distance metric would therefore be the most appropriate approach for assessing niche overlap. Yet, when hypervolumes are fully disjunct, similarity metrics become uninformative and calculating the two distance metrics is recommended.
The sample size and the choice of algorithm and dimensionality can lead to significant variations in the overlap of hypervolumes in the hyperspace, and should therefore be carefully considered. Main conclusions: Best practices for constructing n‐dimensional hypervolumes and assessing their similarity are presented, representing a practical aid for scientists using the 'hypervolume' R package in their research. These recommendations apply to most data types and analytical scenarios. The R scripts published alongside this methodological study can be modified to perform large-scale analyses of species niches or to automatically assess pairwise similarity metrics among multiple hypervolume objects.
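The volume-ratio indices the study compares can be illustrated with a Monte Carlo estimate on axis-aligned boxes, a toy stand-in for kernel-based hypervolumes (this is not the 'hypervolume' package's algorithm, and the names are ours):

```python
import random

def overlap_indices(box_a, box_b, n=5000, seed=1):
    """Monte Carlo Jaccard and Sørensen-Dice overlap of two axis-aligned
    boxes, each given as a list of (lo, hi) intervals per dimension."""
    rng = random.Random(seed)
    # Sample uniformly in the bounding box enclosing both volumes.
    bounds = [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(box_a, box_b)]
    in_a = in_b = in_both = 0
    for _ in range(n):
        p = [rng.uniform(lo, hi) for lo, hi in bounds]
        ina = all(lo <= x <= hi for x, (lo, hi) in zip(p, box_a))
        inb = all(lo <= x <= hi for x, (lo, hi) in zip(p, box_b))
        in_a += ina
        in_b += inb
        in_both += ina and inb
    union = in_a + in_b - in_both
    jaccard = in_both / union if union else 0.0
    sorensen = 2 * in_both / (in_a + in_b) if in_a + in_b else 0.0
    return jaccard, sorensen
```

Note the study's caveat in miniature: for fully disjunct volumes both indices are identically zero, so they carry no information about how far apart the niches are, which is why distance metrics are recommended in that case.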
Skin Lesion Segmentation from Dermoscopic Images Using Convolutional Neural Network
Clinical treatment of skin lesions depends primarily on timely detection and delineation of lesion boundaries for accurate localization of the cancerous region. The prevalence of skin cancer is high, especially that of melanoma, which is aggressive in nature due to its high metastasis rate. Therefore, timely diagnosis is critical for treatment before the onset of malignancy. To address this problem, medical imaging is used for the analysis and segmentation of lesion boundaries from dermoscopic images. Various methods have been used, ranging from visual inspection to textural analysis of the images. However, the accuracy of these methods is too low for proper clinical treatment because of the sensitivity involved in surgical procedures or drug application. This presents an opportunity to develop an automated model with good accuracy so that it may be used in a clinical setting. This paper proposes an automated method for segmenting lesion boundaries that combines two architectures, the U-Net and the ResNet, collectively called Res-Unet. Moreover, we also used image inpainting for hair removal, which improved the segmentation results significantly. We trained our model on the ISIC 2017 dataset and validated it on the ISIC 2017 test set as well as the PH2 dataset. Our proposed model attained a Jaccard Index of 0.772 on the ISIC 2017 test set and 0.854 on the PH2 dataset, results comparable to currently available state-of-the-art techniques.
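The reported metric is the standard Jaccard index (intersection over union) of predicted and ground-truth masks; a minimal sketch for binary 2-D masks:

```python
def jaccard_index(pred, truth):
    """Jaccard index (intersection over union) of two binary segmentation
    masks, given as equal-sized 2-D lists of 0/1 pixels."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p and t
            union += p or t
    # Two empty masks are identical by convention.
    return inter / union if union else 1.0
```

Scores such as 0.772 and 0.854 are this ratio averaged over the test images.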
Multi-Person Tracking and Crowd Behavior Detection via Particles Gradient Motion Descriptor and Improved Entropy Classifier
To prevent disasters and to control and supervise crowds, automated video surveillance has become indispensable. In today's complex and crowded environments, manual surveillance and monitoring systems are inefficient, labor-intensive, and unwieldy. Automated video surveillance systems offer promising solutions, but challenges remain. One of the major challenges is the extraction of true foregrounds of pixels representing humans only. Furthermore, to accurately understand and interpret crowd behavior, human crowd behavior (HCB) systems require robust feature extraction methods, along with powerful and reliable decision-making classifiers. In this paper, we describe our approach to these issues by presenting a novel Particles Force Model for multi-person tracking, a vigorous fusion of global and local descriptors, and a robust improved entropy classifier for detecting and interpreting crowd behavior. In the proposed model, the necessary preprocessing steps are followed by the application of a first distance algorithm for the removal of background clutter; true-foreground elements are then extracted via the Particles Force Model. The detected human forms are then counted by labeling and performing cluster estimation, using a K-nearest neighbors search algorithm. After that, the locations of all the human silhouettes are fixed and, using the Jaccard similarity index and normalized cross-correlation as a cost function, multi-person tracking is performed. For HCB detection, we introduce human crowd contour extraction as a global feature and a particles gradient motion (PGD) descriptor, along with geometrical and speeded-up robust features (SURF), as local features. After feature extraction, we apply bat optimization to select optimal features; this step also works as a pre-classifier. Finally, we introduce a robust improved entropy classifier for decision making and automated crowd behavior detection in smart surveillance systems.
We evaluated the performance of our proposed system on the publicly available PETS2009 and UMN benchmark datasets. Experimental results show that our system outperformed existing well-known state-of-the-art methods by achieving higher accuracy rates. The proposed system can be deployed to great benefit in numerous public places, such as airports, shopping malls, city centers, and train stations, to control, supervise, and protect crowds.
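Using the Jaccard index as a tracking cost typically means intersection-over-union of detection boxes across frames; a minimal sketch (not the authors' implementation):

```python
def box_iou(a, b):
    """Jaccard similarity (IoU) of two axis-aligned boxes (x1, y1, x2, y2),
    usable as a matching cost when associating detections across frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A tracker would combine this with appearance terms (here, normalized cross-correlation) into a single cost to match silhouettes between consecutive frames.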
Online Book Recommendation System using Collaborative Filtering (With Jaccard Similarity)
A Recommendation System (RS) is software that suggests similar items to a purchaser based on his/her earlier purchases or preferences. An RS examines huge datasets of items and compiles a list of those items that would fulfil the requirements of the buyer. Nowadays, most e-commerce companies use recommendation systems to lure buyers into purchasing more by offering items the buyer is likely to prefer. Book recommendation systems are used by Amazon, Barnes and Noble, Flipkart, Goodreads, etc., to recommend books the customer would be tempted to buy because they match his/her choices. The challenges they face are to filter, prioritize, and give recommendations that are accurate. RS systems use Collaborative Filtering (CF) to generate lists of items similar to the buyer's preferences. Collaborative filtering is based on the assumption that if a user has rated two books, then the other book can be recommended to a user who has read one of them (collaboration). CF has difficulty giving accurate recommendations due to problems of scalability, sparsity, and cold start. Therefore, this paper proposes a recommendation system that uses collaborative filtering with Jaccard Similarity (JS) to give more accurate recommendations. JS is based on an index calculated for a pair of books: the number of common users (users who have rated both books) divided by the total number of distinct users who have rated either of the two books. The larger the number of common users, the higher the JS index, and hence the better the recommendations. Books with a high JS index (more recommended) appear at the top of the recommended books list.
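The JS index and the resulting ranking follow directly from the definition (book names and the `recommend` helper below are illustrative, not from the paper):

```python
def js_index(raters_a, raters_b):
    """Jaccard Similarity index for a pair of books: users who rated
    both, divided by the distinct users who rated either."""
    a, b = set(raters_a), set(raters_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(book, ratings, top_n=3):
    """Rank the other books by JS index with `book`; `ratings` maps
    each book to the set of user ids who rated it."""
    scores = [(other, js_index(ratings[book], users))
              for other, users in ratings.items() if other != book]
    scores.sort(key=lambda t: (-t[1], t[0]))  # highest JS first
    return scores[:top_n]
```

Books sharing many raters with the query book rise to the top of the list, exactly the behavior the abstract describes.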
A Rapid and Automated Urban Boundary Extraction Method Based on Nighttime Light Data in China
As urbanization has progressed over the past 40 years, continuous population growth and the rapid expansion of urban land use have caused some regions to experience various problems, such as insufficient resources and issues related to environmental carrying capacity. The urbanization process can be understood using nighttime light data to quickly and accurately extract urban boundaries at large scales, and a new method for doing so is proposed here. Three types of nighttime light data, from the DMSP/OLS (the US military's Defense Meteorological Satellite Program), NPP-VIIRS (National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite), and Luojia1-01 data sets, are selected, and high-precision urban boundaries obtained from a high-resolution image are taken as the true values. Next, 15 cities are selected as training samples and the Jaccard coefficient is introduced. The spatial data comparison method is then used to determine the optimal threshold function for urban boundary extraction. High-precision urban boundary truth values for a further 13 cities are then selected, and the accuracy of the urban boundary extraction results obtained using the optimal threshold function and the mutation detection method is evaluated. The following observations are made from the results: (i) The average relative errors of the urban boundary extraction results based on the three nighttime light data sources (DMSP/OLS, NPP-VIIRS, and Luojia1-01) using the optimal threshold functions are 29%, 20%, and 39%, respectively.
Compared with the mutation detection method, these relative errors are reduced by 83%, 18%, and 77%, respectively; (ii) the average overall classification accuracies of the extracted urban boundaries are 95%, 96%, and 93%, respectively, which are 5%, 1%, and 7% higher than those for the mutation detection method; (iii) the average Kappa coefficients of the extracted urban boundaries are 61%, 71%, and 61%, respectively, which are 5%, 4%, and 12% higher than those for the mutation detection method.
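The threshold-calibration idea (pick the light-intensity cutoff whose urban mask best matches a reference boundary under the Jaccard coefficient) can be sketched on flattened toy rasters; this is a simplification of the paper's optimal threshold functions, and real inputs are georeferenced rasters:

```python
def mask_jaccard(a, b):
    """Jaccard coefficient of two flattened binary masks."""
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union if union else 0.0

def best_threshold(light, truth, thresholds):
    """Pick the light-intensity cutoff whose urban mask (pixels at or
    above the cutoff) best matches the reference boundary, scored by
    the Jaccard coefficient."""
    best = max(thresholds,
               key=lambda t: mask_jaccard([1 if v >= t else 0 for v in light],
                                          truth))
    return best, mask_jaccard([1 if v >= best else 0 for v in light], truth)
```

Calibrating such thresholds on the training cities, then applying them to held-out cities, is the evaluation pattern the abstract describes.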
A novel essential protein identification method based on PPI networks and gene expression data
Background: Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is commonly used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression within a cluster of the PPI network. Our experiments show that the method improves accuracy in predicting essential proteins. Results: In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines degree centrality and the Jaccard similarity index to calculate a JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than that of DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) essential proteins generally lie in densely connected clusters in the PPI network; (2) binarizing gene expression data can screen out fluctuations in gene expression profiles; (3) the essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression within a cluster of the PPI network.
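A toy version of the JDC idea (binarize expression, then combine network neighborhood with Jaccard similarity of activity profiles) might look like the following; this is a hypothetical simplification, not the paper's exact measure, which chooses its binarization threshold dynamically:

```python
def binarize(expr, threshold):
    """Binarize a gene-expression series: 1 marks an "active" sample.
    (The paper picks this threshold dynamically; here it is fixed.)"""
    return [1 if v >= threshold else 0 for v in expr]

def profile_jaccard(p, q):
    """Jaccard similarity of two binarized activity profiles."""
    inter = sum(a and b for a, b in zip(p, q))
    union = sum(a or b for a, b in zip(p, q))
    return inter / union if union else 0.0

def jdc_score(adj, profiles):
    """Toy JDC-style score: for each protein, sum the Jaccard similarity
    between its binarized profile and each neighbour's. Summing over
    neighbours couples degree with co-activity, echoing the paper's
    combination of degree centrality and the Jaccard index."""
    return {u: sum(profile_jaccard(profiles[u], profiles[v]) for v in adj[u])
            for u in adj}
```

Proteins that are both highly connected and co-active with their neighbors score highest, matching the abstract's intuition that essential proteins sit in densely connected, coherently expressed clusters.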