Catalogue Search | MBRL
104 result(s) for "Gleich, David F."
Higher-order organization of complex networks
2016
Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks—at the level of small network subgraphs—remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.
Journal Article
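The framework described above reweights a graph so that each edge counts the motifs it participates in, then clusters the reweighted graph. A minimal sketch for the triangle motif, assuming an undirected 0/1 adjacency matrix (the paper's framework also handles directed motifs and provides spectral guarantees not shown here):

```python
import numpy as np

def triangle_motif_adjacency(A):
    # (A @ A)[i, j] counts common neighbors of i and j; masking by A
    # keeps only existing edges, so W[i, j] is the number of triangles
    # (3-node motifs) that edge (i, j) participates in.
    return (A @ A) * A

# Tiny example: nodes 1 and 2 sit in two triangles (0-1-2 and 1-2-3),
# so their shared edge gets motif weight 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
W = triangle_motif_adjacency(A)
```

Clustering then proceeds on `W` instead of `A`, so cuts are penalized for breaking triangles rather than single edges.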
Local hypergraph clustering using capacity releasing diffusion
by Gleich, David F.; Ibrahim, Rania
in Algorithms; Cluster Analysis; Computer and Information Sciences
2020
Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from max-flow and min-cut objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results in terms of a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD enhances the clustering quality.
Journal Article
A geometric approach to characterize the functional identity of single cells
2018
Single-cell transcriptomic data has the potential to radically redefine our view of cell-type identity. Cells that were previously believed to be homogeneous are now clearly distinguishable in terms of their expression phenotype. Methods for automatically characterizing the functional identity of cells, and their associated properties, can be used to uncover processes involved in lineage differentiation as well as sub-typing cancer cells. They can also be used to suggest personalized therapies based on molecular signatures associated with pathology. We develop a new method, called ACTION, to infer the functional identity of cells from their transcriptional profile, classify them based on their dominant function, and reconstruct regulatory networks that are responsible for mediating their identity. Using ACTION, we identify novel Melanoma subtypes with differential survival rates and therapeutic responses, for which we provide biomarkers along with their underlying regulatory networks.
Functional characterisation of single cells is crucial for uncovering the true extent of cellular heterogeneity. Here the authors offer an approach to infer functional identities of cells from their transcriptomes, identify their dominant function, and reconstruct the underlying regulatory networks.
Journal Article
Neighborhood and PageRank methods for pairwise link prediction
by Nassar, Huda; Gleich, David F.; Benson, Austin R.
in Algorithms; Applications of Graph Theory and Complex Networks; Cliques
2020
Link prediction is a common problem in network science that cuts across many disciplines. The goal is to forecast the appearance of new links or to find links missing in the network. Typical methods for link prediction use the topology of the network to predict the most likely future or missing connections between a pair of nodes. However, network evolution is often mediated by higher-order structures involving more than pairs of nodes; for example, cliques on three nodes (also called triangles) are key to the structure of social networks, but the standard link prediction framework does not directly predict these structures. To address this gap, in recent work, we proposed a new link prediction task called “pairwise link prediction” that directly targets the prediction of new triangles, where one is tasked with finding which nodes are most likely to form a triangle with a given edge. We extend this work in this manuscript, and we evaluate a variety of natural extensions to link prediction methods including neighborhood and PageRank-based methods. A key difference from our previous work is the definition of the neighborhood of an edge, which has a surprisingly large impact on the empirical performance. Our experiments on a variety of networks show that diffusion-based methods are less sensitive to the type of graphs used and more consistent in their results. We also show how our pairwise link prediction framework can be used to get better predictions within the context of standard link prediction evaluation.
Journal Article
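The neighborhood methods evaluated in this paper score each candidate node against both endpoints of the query edge. A minimal common-neighbor baseline in that spirit, assuming `adj` maps nodes to neighbor sets (the paper's actual scores and edge-neighborhood definitions are more varied than this sketch):

```python
def pairwise_scores(adj, u, v):
    # Score every other node w for forming a triangle with edge (u, v)
    # by its neighborhood overlap with both endpoints.
    scores = {}
    for w in adj:
        if w in (u, v):
            continue
        scores[w] = len(adj[u] & adj[w]) + len(adj[v] & adj[w])
    return scores

# 4-node example graph; score candidates for completing edge (0, 1).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
scores = pairwise_scores(adj, 0, 1)
```

Ranking nodes by `scores` gives the predicted triangle completions; PageRank-based variants replace the overlap counts with diffusion values seeded at `u` and `v`.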
Dimensionality of Social Networks Using Motifs and Eigenvalues
2014
We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an m-dimensional metric space with power-law distributed influence regions best fits samples from real-world networks when m scales logarithmically with the number of nodes of the network. This supports a logarithmic dimension hypothesis, and we provide evidence with two different social networks, Facebook and LinkedIn. Further, we employ two different methods for confirming the hypothesis: the first uses the distribution of motif counts, and the second exploits the eigenvalue distribution.
Journal Article
PageRank Beyond the Web
2015
Google's PageRank method was developed to evaluate the importance of web-pages via their link structure. The mathematics of PageRank, however, are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. It's even used for systems analysis of road networks, as well as biology, chemistry, neuroscience, and physics. We'll see the mathematics and ideas that unite these diverse applications.
Journal Article
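The generality claimed above comes from PageRank being the stationary solution of x = αPx + (1 − α)v for any column-stochastic matrix P, whatever the domain. A domain-agnostic power-iteration sketch (parameter names are notational, not tied to any particular library):

```python
import numpy as np

def pagerank(P, alpha=0.85, v=None, tol=1e-12):
    # Power iteration for x = alpha * P @ x + (1 - alpha) * v,
    # where P is column-stochastic and v is the teleportation vector.
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)
    x = v.copy()
    while True:
        x_new = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new

# On a 3-node directed cycle, every node is equally important.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = pagerank(P)
```

Swapping in a road-network or citation-graph transition matrix for `P`, or a localized `v`, yields the bibliometric and systems-analysis uses the abstract mentions.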
Gauss’s law for networks directly reveals community boundaries
2018
The study of network topology provides insight into the function and behavior of physical, social, and biological systems. A natural step towards discovering the organizing principles of these complex topologies is to identify a reduced network representation using cohesive subgroups or communities. This procedure often uncovers the underlying mechanisms governing the functional assembly of complex networks. A community is usually defined as a subgraph or a set of nodes that has more edges than would be expected from a simple, null distribution of edges over the graph. This view drives objectives such as modularity. Another perspective, corresponding to objectives like conductance or density, is that communities are groups of nodes that have extremal properties with respect to the number of internal edges and cut edges. Here we show that identifying community boundaries rather than communities results in a more accurate decomposition of the network into informative components. We derive a network analog of Gauss’s law that relates a measure of flux through a subgraph’s boundary to the connectivity among the subgraph’s nodes. Our Gauss’s law for networks naturally characterizes a community as a subgraph with high flux through its boundary. Aggregating flux over these boundaries gives rise to a Laplacian and forms the basis of our “Laplacian modularity” quality function for community detection that is applicable to general network types. This technique allows us to determine communities that are both overlapping and hierarchically organized.
Journal Article
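The boundary-centric view this abstract contrasts with modularity is what conductance measures: edges crossing a subgraph's boundary (its "flux") relative to the internal edge volume. A small sketch of that standard objective, assuming `adj` maps nodes to neighbor sets (the paper's Laplacian modularity builds on this flux idea but is not reproduced here):

```python
def conductance(adj, S):
    # cut(S) / min(vol(S), vol(rest)): few boundary edges relative to
    # internal edge volume indicates a good community.
    S = set(S)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    vol_in = sum(len(adj[u]) for u in S)
    vol_out = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge: cutting at the bridge
# gives low conductance (1 cut edge against volume 7 on each side).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})
```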
Topological structure of complex predictions
2023
Current complex prediction models are the result of fitting deep neural networks, graph convolutional networks or transducers to a set of training data. A key challenge with these models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into a simplified topological view of the prediction landscape. The result is a map of the predictions that enables inspection of the model results with more specificity than dimensionality-reduction methods such as tSNE and UMAP. The methods scale up to large datasets across different domains. We present a case study of a transformer-based model previously designed to predict expression levels of a piece of DNA in thousands of genomic tracks. When the model is used to study mutations in the BRCA1 gene, our topological analysis shows that it is sensitive to the location of a mutation and the exon structure of BRCA1 in ways that cannot be found with tools based on dimensionality reduction. Moreover, the topological framework offers multiple ways to inspect results, including an error estimate that is more accurate than model uncertainty. Further studies show how these ideas produce useful results in graph-based learning and image classification.
Deep learning is a powerful method to process large datasets, and shown to be useful in many scientific fields, but models are highly parameterized and there are often challenges in interpretation and generalization. David Gleich and colleagues develop a method rooted in computational topology, starting with a graph-based topological representation of the data, to help assess and diagnose predictions from deep learning and other complex prediction methods.
Journal Article
Estimating statewide carrying capacity of bobcats (Lynx rufus) using improved maximum clique algorithms
2022
Context: Maximum clique analysis (MCA) can approximate landscape carrying capacity (Nk) for populations of territorial wildlife. However, MCA has not been widely adopted for wildlife applications, mainly due to computational constraints and software wildlife biologists may find difficult to use. Moreover, MCA does not incorporate uncertainty into estimates of Nk. Objectives: We extended MCA by applying a vertex cover algorithm to compute Nk over a large (92,789 km²), continuous spatial scale for female bobcats (Lynx rufus) in Indiana, USA. We incorporated uncertainty by calculating confidence intervals for Nk across five thresholds of habitat suitability using 10 replicate suitability maps from bootstrapped datasets. For portions of the landscape too large to be solved with the vertex cover algorithm, we compared predictions from a linear model and a “greedy” algorithm. Results: Mean estimates of Nk for female bobcats in Indiana across habitat suitability thresholds ranged from 539 (0.75 threshold) to 1200 territories (0.25 threshold). On average, each 12.5-percentile reduction in the suitability threshold increased estimates of Nk by 1.2-fold. Both the predictive and greedy algorithms produced reasonable estimates of maximum cliques for areas that were too large to compute with the vertex cover algorithm. The greedy algorithm produced smaller confidence intervals than the predictive approach but underestimated maximum cliques by 1.2%. Conclusions: Our research demonstrates effective application of MCA to species occupying large landscapes while accounting for uncertainty. We believe our methods, coupled with the availability of annotated scripts developed in R, will make MCA more broadly accessible to wildlife biologists.
Journal Article
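The "greedy" fallback used for oversized landscape components can be sketched as a degree-ordered heuristic. This is an illustrative stand-in, assuming `adj` maps nodes (here, candidate territory centers) to neighbor sets, and is not the authors' R implementation:

```python
def greedy_clique(adj):
    # Visit nodes in decreasing degree order, adding each node that is
    # adjacent to everything already in the clique. Fast, but it can
    # undershoot the true maximum clique (cf. the ~1.2% underestimate
    # reported relative to the exact approach).
    clique = []
    for u in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        if all(u in adj[w] for w in clique):
            clique.append(u)
    return clique

# A triangle (0, 1, 2) with a pendant node 3: the heuristic recovers
# the triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
clique = greedy_clique(adj)
```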
An Inner-Outer Iteration for Computing PageRank
2010
We present a new iterative scheme for PageRank computation. The algorithm is applied to the linear system formulation of the problem, using inner-outer stationary iterations. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Our convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not sensitive to the choice of the parameters involved. The same idea can be used as a preconditioning technique for nonstationary schemes. Numerical examples featuring matrices of dimensions exceeding 100,000,000 in sequential and parallel environments demonstrate the merits of our technique. Our code is available online for viewing and testing, along with several large scale examples.
Journal Article
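The inner-outer structure described above can be sketched as an outer stationary iteration with a smaller damping parameter β < α, whose shifted system is itself solved by crude inner Richardson sweeps. A minimal dense-matrix sketch, assuming `P` is column-stochastic; parameter names and tolerances are illustrative, not the paper's code:

```python
import numpy as np

def inner_outer_pagerank(P, alpha=0.85, beta=0.5, eta=1e-2, tol=1e-10):
    # Targets (I - alpha*P) x = (1 - alpha) v via outer iterations on
    # the easier system (I - beta*P) y = (alpha - beta) P x + (1 - alpha) v,
    # each solved only to a crude inner tolerance eta.
    n = P.shape[0]
    v = np.full(n, 1.0 / n)
    x = v.copy()
    while True:
        f = (alpha - beta) * (P @ x) + (1 - alpha) * v
        y = x.copy()
        # inner Richardson sweeps for (I - beta*P) y = f
        while np.abs(f + beta * (P @ y) - y).sum() > eta:
            y = beta * (P @ y) + f
        if np.abs(y - x).sum() < tol:
            return y
        x = y

# 3-node cycle: the answer is uniform regardless of alpha.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = inner_outer_pagerank(P)
```

The point of the construction is that the inner systems with β < α are better conditioned than the original, so crude inner solves still drive the outer iteration toward the PageRank solution; final accuracy is limited by `eta` in this simplified sketch.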