Catalogue Search | MBRL
104 result(s) for "Gleich, David F."
Higher-order organization of complex networks
2016
Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks—at the level of small network subgraphs—remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.
Journal Article
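The framework described above reweights a graph so that each edge counts the motifs it participates in, then clusters the reweighted graph. A minimal sketch for the triangle motif, assuming an undirected 0/1 adjacency matrix (the paper's framework also handles directed motifs and provides spectral guarantees not shown here):

```python
import numpy as np

def triangle_motif_adjacency(A):
    # (A @ A)[i, j] counts common neighbors of i and j; masking by A
    # keeps only existing edges, so W[i, j] is the number of triangles
    # (3-node motifs) that edge (i, j) participates in.
    return (A @ A) * A

# Tiny example: nodes 1 and 2 sit in two triangles (0-1-2 and 1-2-3),
# so their shared edge gets motif weight 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
W = triangle_motif_adjacency(A)
```

Clustering then proceeds on `W` instead of `A`, so cuts are penalized for breaking triangles rather than single edges.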
Local hypergraph clustering using capacity releasing diffusion
by Gleich, David F.; Ibrahim, Rania
in Algorithms; Cluster Analysis; Computer and Information Sciences
2020
Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from max-flow and min-cut objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results in terms of a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD enhances the clustering quality.
Journal Article
A geometric approach to characterize the functional identity of single cells
2018
Single-cell transcriptomic data has the potential to radically redefine our view of cell-type identity. Cells that were previously believed to be homogeneous are now clearly distinguishable in terms of their expression phenotype. Methods for automatically characterizing the functional identity of cells, and their associated properties, can be used to uncover processes involved in lineage differentiation as well as sub-typing cancer cells. They can also be used to suggest personalized therapies based on molecular signatures associated with pathology. We develop a new method, called ACTION, to infer the functional identity of cells from their transcriptional profile, classify them based on their dominant function, and reconstruct regulatory networks that are responsible for mediating their identity. Using ACTION, we identify novel Melanoma subtypes with differential survival rates and therapeutic responses, for which we provide biomarkers along with their underlying regulatory networks.
Functional characterisation of single cells is crucial for uncovering the true extent of cellular heterogeneity. Here the authors offer an approach to infer functional identities of cells from their transcriptomes, identify their dominant function, and reconstruct the underlying regulatory networks.
Journal Article
Neighborhood and PageRank methods for pairwise link prediction
by Nassar, Huda; Gleich, David F.; Benson, Austin R.
in Algorithms; Applications of Graph Theory and Complex Networks; Cliques
2020
Link prediction is a common problem in network science that cuts across many disciplines. The goal is to forecast the appearance of new links or to find links missing in the network. Typical methods for link prediction use the topology of the network to predict the most likely future or missing connections between a pair of nodes. However, network evolution is often mediated by higher-order structures involving more than pairs of nodes; for example, cliques on three nodes (also called triangles) are key to the structure of social networks, but the standard link prediction framework does not directly predict these structures. To address this gap, in recent work, we proposed a new link prediction task called “pairwise link prediction” that directly targets the prediction of new triangles, where one is tasked with finding which nodes are most likely to form a triangle with a given edge. We extend this work in this manuscript, and we evaluate a variety of natural extensions to link prediction methods including neighborhood and PageRank-based methods. A key difference from our previous work is the definition of the neighborhood of an edge, which has a surprisingly large impact on the empirical performance. Our experiments on a variety of networks show that diffusion-based methods are less sensitive to the type of graphs used and more consistent in their results. We also show how our pairwise link prediction framework can be used to get better predictions within the context of standard link prediction evaluation.
Journal Article
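The neighborhood methods evaluated in this paper score each candidate node against both endpoints of the query edge. A minimal common-neighbor baseline in that spirit, assuming `adj` maps nodes to neighbor sets (the paper's actual scores and edge-neighborhood definitions are more varied than this sketch):

```python
def pairwise_scores(adj, u, v):
    # Score every other node w for forming a triangle with edge (u, v)
    # by its neighborhood overlap with both endpoints.
    scores = {}
    for w in adj:
        if w in (u, v):
            continue
        scores[w] = len(adj[u] & adj[w]) + len(adj[v] & adj[w])
    return scores

# 4-node example graph; score candidates for completing edge (0, 1).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
scores = pairwise_scores(adj, 0, 1)
```

Ranking nodes by `scores` gives the predicted triangle completions; PageRank-based variants replace the overlap counts with diffusion values seeded at `u` and `v`.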
Dimensionality of Social Networks Using Motifs and Eigenvalues
2014
We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an m-dimensional metric space with power-law distributed influence regions best fits samples from real-world networks when m scales logarithmically with the number of nodes of the network. This supports a logarithmic dimension hypothesis, and we provide evidence with two different social networks, Facebook and LinkedIn. Further, we employ two different methods for confirming the hypothesis: the first uses the distribution of motif counts, and the second exploits the eigenvalue distribution.
Journal Article
PageRank Beyond the Web
2015
Google's PageRank method was developed to evaluate the importance of web-pages via their link structure. The mathematics of PageRank, however, are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. It's even used for systems analysis of road networks, as well as biology, chemistry, neuroscience, and physics. We'll see the mathematics and ideas that unite these diverse applications.
Journal Article
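The generality claimed above comes from PageRank being the stationary solution of x = αPx + (1 − α)v for any column-stochastic matrix P, whatever the domain. A domain-agnostic power-iteration sketch (parameter names are notational, not tied to any particular library):

```python
import numpy as np

def pagerank(P, alpha=0.85, v=None, tol=1e-12):
    # Power iteration for x = alpha * P @ x + (1 - alpha) * v,
    # where P is column-stochastic and v is the teleportation vector.
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)
    x = v.copy()
    while True:
        x_new = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new

# On a 3-node directed cycle, every node is equally important.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = pagerank(P)
```

Swapping in a road-network or citation-graph transition matrix for `P`, or a localized `v`, yields the bibliometric and systems-analysis uses the abstract mentions.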
Gauss’s law for networks directly reveals community boundaries
2018
The study of network topology provides insight into the function and behavior of physical, social, and biological systems. A natural step towards discovering the organizing principles of these complex topologies is to identify a reduced network representation using cohesive subgroups or communities. This procedure often uncovers the underlying mechanisms governing the functional assembly of complex networks. A community is usually defined as a subgraph or a set of nodes that has more edges than would be expected from a simple, null distribution of edges over the graph. This view drives objectives such as modularity. Another perspective, corresponding to objectives like conductance or density, is that communities are groups of nodes that have extremal properties with respect to the number of internal edges and cut edges. Here we show that identifying community boundaries rather than communities results in a more accurate decomposition of the network into informative components. We derive a network analog of Gauss’s law that relates a measure of flux through a subgraph’s boundary to the connectivity among the subgraph’s nodes. Our Gauss’s law for networks naturally characterizes a community as a subgraph with high flux through its boundary. Aggregating flux over these boundaries gives rise to a Laplacian and forms the basis of our “Laplacian modularity” quality function for community detection that is applicable to general network types. This technique allows us to determine communities that are both overlapping and hierarchically organized.
Journal Article
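The boundary-centric view this abstract contrasts with modularity is what conductance measures: edges crossing a subgraph's boundary (its "flux") relative to the internal edge volume. A small sketch of that standard objective, assuming `adj` maps nodes to neighbor sets (the paper's Laplacian modularity builds on this flux idea but is not reproduced here):

```python
def conductance(adj, S):
    # cut(S) / min(vol(S), vol(rest)): few boundary edges relative to
    # internal edge volume indicates a good community.
    S = set(S)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    vol_in = sum(len(adj[u]) for u in S)
    vol_out = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge: cutting at the bridge
# gives low conductance (1 cut edge against volume 7 on each side).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})
```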
Topological structure of complex predictions
2023
Current complex prediction models are the result of fitting deep neural networks, graph convolutional networks or transducers to a set of training data. A key challenge with these models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into a simplified topological view of the prediction landscape. The result is a map of the predictions that enables inspection of the model results with more specificity than dimensionality-reduction methods such as tSNE and UMAP. The methods scale up to large datasets across different domains. We present a case study of a transformer-based model previously designed to predict expression levels of a piece of DNA in thousands of genomic tracks. When the model is used to study mutations in the BRCA1 gene, our topological analysis shows that it is sensitive to the location of a mutation and the exon structure of BRCA1 in ways that cannot be found with tools based on dimensionality reduction. Moreover, the topological framework offers multiple ways to inspect results, including an error estimate that is more accurate than model uncertainty. Further studies show how these ideas produce useful results in graph-based learning and image classification.
Deep learning is a powerful method to process large datasets, and shown to be useful in many scientific fields, but models are highly parameterized and there are often challenges in interpretation and generalization. David Gleich and colleagues develop a method rooted in computational topology, starting with a graph-based topological representation of the data, to help assess and diagnose predictions from deep learning and other complex prediction methods.
Journal Article
Estimating statewide carrying capacity of bobcats (Lynx rufus) using improved maximum clique algorithms
2022
Context: Maximum clique analysis (MCA) can approximate landscape carrying capacity (Nk) for populations of territorial wildlife. However, MCA has not been widely adopted for wildlife applications, mainly due to computational constraints and software wildlife biologists may find difficult to use. Moreover, MCA does not incorporate uncertainty into estimates of Nk. Objectives: We extended MCA by applying a vertex cover algorithm to compute Nk over a large (92,789 km²), continuous spatial scale for female bobcats (Lynx rufus) in Indiana, USA. We incorporated uncertainty by calculating confidence intervals for Nk across five thresholds of habitat suitability using 10 replicate suitability maps from bootstrapped datasets. For portions of the landscape too large to be solved with the vertex cover algorithm, we compared predictions from a linear model and a “greedy” algorithm. Results: Mean estimates of Nk for female bobcats in Indiana across habitat suitability thresholds ranged from 539 (0.75 threshold) to 1200 territories (0.25 threshold). On average, each 12.5-percentile reduction in the suitability threshold increased estimates of Nk by 1.2-fold. Both the predictive and greedy algorithms produced reasonable estimates of maximum cliques for areas that were too large to compute with the vertex cover algorithm. The greedy algorithm produced smaller confidence intervals than the predictive approach but underestimated maximum cliques by 1.2%. Conclusions: Our research demonstrates effective application of MCA to species occupying large landscapes while accounting for uncertainty. We believe our methods, coupled with the availability of annotated scripts developed in R, will make MCA more broadly accessible to wildlife biologists.
Journal Article
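The "greedy" fallback used for oversized landscape components can be sketched as a degree-ordered heuristic. This is an illustrative stand-in, assuming `adj` maps nodes (here, candidate territory centers) to neighbor sets, and is not the authors' R implementation:

```python
def greedy_clique(adj):
    # Visit nodes in decreasing degree order, adding each node that is
    # adjacent to everything already in the clique. Fast, but it can
    # undershoot the true maximum clique (cf. the ~1.2% underestimate
    # reported relative to the exact approach).
    clique = []
    for u in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        if all(u in adj[w] for w in clique):
            clique.append(u)
    return clique

# A triangle (0, 1, 2) with a pendant node 3: the heuristic recovers
# the triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
clique = greedy_clique(adj)
```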
An Inner-Outer Iteration for Computing PageRank
2010
We present a new iterative scheme for PageRank computation. The algorithm is applied to the linear system formulation of the problem, using inner-outer stationary iterations. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Our convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not sensitive to the choice of the parameters involved. The same idea can be used as a preconditioning technique for nonstationary schemes. Numerical examples featuring matrices of dimensions exceeding 100,000,000 in sequential and parallel environments demonstrate the merits of our technique. Our code is available online for viewing and testing, along with several large scale examples.
Journal Article
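The inner-outer structure described above can be sketched as an outer stationary iteration with a smaller damping parameter β < α, whose shifted system is itself solved by crude inner Richardson sweeps. A minimal dense-matrix sketch, assuming `P` is column-stochastic; parameter names and tolerances are illustrative, not the paper's code:

```python
import numpy as np

def inner_outer_pagerank(P, alpha=0.85, beta=0.5, eta=1e-2, tol=1e-10):
    # Targets (I - alpha*P) x = (1 - alpha) v via outer iterations on
    # the easier system (I - beta*P) y = (alpha - beta) P x + (1 - alpha) v,
    # each solved only to a crude inner tolerance eta.
    n = P.shape[0]
    v = np.full(n, 1.0 / n)
    x = v.copy()
    while True:
        f = (alpha - beta) * (P @ x) + (1 - alpha) * v
        y = x.copy()
        # inner Richardson sweeps for (I - beta*P) y = f
        while np.abs(f + beta * (P @ y) - y).sum() > eta:
            y = beta * (P @ y) + f
        if np.abs(y - x).sum() < tol:
            return y
        x = y

# 3-node cycle: the answer is uniform regardless of alpha.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = inner_outer_pagerank(P)
```

The point of the construction is that the inner systems with β < α are better conditioned than the original, so crude inner solves still drive the outer iteration toward the PageRank solution; final accuracy is limited by `eta` in this simplified sketch.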