Catalogue Search | MBRL

Customization scenarios for de-identification of clinical notes

by Szpektor, Idan , Dean, Jeff , Amira, Rony in Automatic data collection systems , Automation , Clinical notes

2020

Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets. Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized. Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset. Results: Fully customized systems remove 97–99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems. Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.

Journal Article

The non-backtracking spectrum of the universal cover of a graph

by Friedman, Joel , Hoory, Shlomo , Angel, Omer in Research article

2015

A non-backtracking walk on a graph, \(H\), is a directed path of directed edges of \(H\) such that no edge is the inverse of its preceding edge. Non-backtracking walks of a given length can be counted using the non-backtracking adjacency matrix, \(B\), indexed by \(H\)’s directed edges and related to Ihara’s Zeta function. We show how to determine \(B\)’s spectrum in the case where \(H\) is a tree covering a finite graph. We show that when \(H\) is not regular, this spectrum can have positive measure in the complex plane, unlike the regular case. We show that outside of \(B\)’s spectrum, the corresponding Green function has “periodic decay ratios”. The existence of such a “ratio system” can be effectively checked and is equivalent to being outside the spectrum. We also prove that the spectral radius of the non-backtracking walk operator on the tree covering a finite graph is exactly \(\sqrt{\mathrm{gr}}\), where \(\mathrm{gr}\) is the cogrowth of \(B\), or growth rate of the tree. This further motivates the definition of the graph theoretical Riemann hypothesis proposed by Stark and Terras. Finally, we give experimental evidence that for a fixed, finite graph, \(H\), a random lift of large degree has non-backtracking new spectrum near that of \(H\)’s universal cover. This suggests a new generalization of Alon’s second eigenvalue conjecture.
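The counting statement in this abstract can be illustrated with a minimal Python sketch. It assumes a finite simple graph (no loops or parallel edges) given as a list of undirected edges; the function names `nb_matrix` and `count_nb_walks` are hypothetical and are not taken from the paper.

```python
def nb_matrix(edges):
    """Build the non-backtracking matrix B of a simple undirected graph.

    Assumption: `edges` lists each undirected edge once as a pair (u, v),
    with no loops and no parallel edges.
    """
    # Each undirected edge {u, v} yields two directed edges (u, v) and (v, u).
    darts = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    m = len(darts)
    B = [[0] * m for _ in range(m)]
    for i, (u, v) in enumerate(darts):
        for j, (x, y) in enumerate(darts):
            # (u, v) may be followed by (x, y) when x == v,
            # unless (x, y) is the inverse edge (v, u).
            if x == v and y != u:
                B[i][j] = 1
    return darts, B


def count_nb_walks(edges, length):
    """Count non-backtracking walks made of `length` directed edges (length >= 1)."""
    darts, B = nb_matrix(edges)
    m = len(darts)
    # counts[i] = number of walks of the current length that start with dart i.
    counts = [1] * m
    for _ in range(length - 1):
        counts = [sum(B[i][j] * counts[j] for j in range(m)) for i in range(m)]
    return sum(counts)


triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(count_nb_walks(triangle, 5))  # 6: each directed edge has exactly one successor
print(count_nb_walks(k4, 3))        # 48: 12 directed edges, two successors each
```

On a \(d\)-regular graph every directed edge has \(d-1\) successors, so the count grows as \((d-1)^{\ell-1}\) times the number of directed edges, which matches both printed values.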

Journal Article

On the Girth of Graph Lifts

by Hoory, Shlomo in Combinatorial analysis , Lower bounds , Upper bounds

2025

The size of the smallest \(k\)-regular graph of girth at least \(g\) is denoted by the well-studied function \(n(k,g)\). We introduce an analogous function \(n(H,g)\), defined as the smallest size graph of girth at least \(g\) that is a lift (or cover) of the possibly non-regular graph \(H\). We prove that the two main combinatorial bounds on \(n(k,g)\), the Moore lower bound and the Erdős-Sachs upper bound, carry over to the new lift setting. We also consider two other functions: i) The smallest size graph of girth at least \(g\) sharing a universal cover with \(H\). We prove that it is the same as \(n(H,g)\) up to a multiplicative constant. ii) The smallest size graph of girth at least \(g\) with a prescribed degree distribution. We discuss this known generalization and argue that the new suggested definitions are superior. We conclude with experimental results for a specific base graph, followed by conjectures and open problems for future research.
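For orientation, the classical Moore lower bound on \(n(k,g)\) in the regular case can be computed as in the sketch below. This covers only the standard \(k\)-regular bound, not the lift version \(n(H,g)\) studied in the paper, and the function name `moore_bound` is hypothetical.

```python
def moore_bound(k, g):
    """Moore lower bound on the order of a k-regular graph of girth at least g."""
    if k < 2 or g < 3:
        raise ValueError("expects k >= 2 and g >= 3")
    r = g // 2
    # Geometric sum 1 + (k-1) + ... + (k-1)^(r-1).
    geom = sum((k - 1) ** i for i in range(r))
    if g % 2 == 1:
        # Odd girth g = 2r + 1: the vertices within distance r of a fixed
        # vertex are all distinct, giving the root plus k branches.
        return 1 + k * geom
    # Even girth g = 2r: the vertices within distance r - 1 of either
    # endpoint of a fixed edge are all distinct.
    return 2 * geom


print(moore_bound(3, 5))  # 10, attained by the Petersen graph
print(moore_bound(3, 6))  # 14, attained by the Heawood graph
```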

Paper

On the Girth of Graph Lifts

by Hoory, Shlomo in Combinatorial analysis , Lower bounds , Upper bounds

2024

The size of the smallest \(k\)-regular graph of girth \(g\) is denoted by the well-studied function \(n(k,g)\). We suggest generalizing this function to \(n(H,g)\), defined as the smallest size girth \(g\) graph covering the possibly non-regular graph \(H\). We prove that the two main combinatorial bounds on \(n(k,g)\), the Moore lower bound and the Erdős-Sachs upper bound, carry over to the new setting of lifts, even in their non-asymptotic form. We also consider two other generalizations of \(n(k,g)\): i) The smallest size girth \(g\) graph sharing a universal cover with \(H\). We prove that it is the same as \(n(H,g)\) up to a multiplicative constant. ii) The smallest size girth \(g\) graph with a prescribed degree distribution. We discuss this known generalization and argue that the new suggested definitions are superior. We conclude with experimental results for a specific base graph and with some conjectures and open problems.

Paper

A Note on Unsatisfiable k-CNF Formulas with Few Occurrences per Variable

by Szeider, Stefan , Hoory, Shlomo in Computer science , Construction , Variables

2006

The (k,s)-SAT problem is the satisfiability problem restricted to instances where each clause has exactly k literals and every variable occurs at most s times. It is known that there exists a function f such that for \(s \leq f(k)\) all (k,s)-SAT instances are satisfiable, but (k,f(k)+1)-SAT is already NP-complete (\(k \geq 3\)). We prove that \(f(k) = O(2^k \cdot \log k / k)\), improving upon the best known upper bound \(O(2^k / k^{\alpha})\), where \(\alpha = \log_3 4 - 1 \approx 0.26\). The new upper bound is tight up to a \(\log k\) factor with the best known lower bound \(\Omega(2^k / k)\).
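The (k,s)-SAT restriction defined in this abstract can be checked mechanically. The sketch below assumes clauses are lists of nonzero integers in the DIMACS style (a negative number is a negated variable) and that the k literals of a clause are over distinct variables; both the encoding and the name `is_ks_instance` are assumptions made for illustration.

```python
from collections import Counter


def is_ks_instance(clauses, k, s):
    """Return True if every clause has exactly k literals (over distinct
    variables) and every variable occurs at most s times in the formula."""
    for clause in clauses:
        variables = {abs(lit) for lit in clause}
        if len(clause) != k or len(variables) != k:
            return False
    # Count occurrences of each variable, ignoring polarity.
    occurrences = Counter(abs(lit) for clause in clauses for lit in clause)
    return all(count <= s for count in occurrences.values())


formula = [[1, 2, 3], [-1, 2, -3], [1, -2, 3]]
print(is_ks_instance(formula, 3, 3))  # True: each variable occurs three times
print(is_ks_instance(formula, 3, 2))  # False: the occurrence limit is exceeded
```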

Journal Article

Entropy and the growth rate of universal covering trees

by Eisner, Idan , Hoory, Shlomo in Entropy , Graphs , Random walk

2024

This work studies the relation between two graph parameters, \(\rho\) and \(\Lambda\). For an undirected graph \(G\), \(\rho(G)\) is the growth rate of its universal covering tree, while \(\Lambda(G)\) is a weighted geometric average of the vertex degree minus one, corresponding to the rate of entropy growth for the non-backtracking random walk (NBRW). It is well known that \(\rho(G) \geq \Lambda(G)\) for all graphs, and that graphs with \(\rho=\Lambda\) exhibit some special properties. In this work we derive an easy-to-check, necessary and sufficient condition for the equality to hold. Furthermore, we show that the variance of the number of random bits used by a length-\(\ell\) NBRW is \(O(1)\) if \(\rho = \Lambda\) and \(\Omega(\ell)\) if \(\rho > \Lambda\). As a consequence we exhibit infinitely many non-trivial examples of graphs with \(\rho = \Lambda\).
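A small numeric sketch of \(\Lambda(G)\) follows. The abstract does not state the weights, so the code assumes each vertex \(v\) of degree \(d_v\) carries weight \(d_v / (2|E|)\), that is, \(\Lambda(G) = \prod_v (d_v - 1)^{d_v/(2|E|)}\); it also assumes minimum degree at least 2. The function name `lambda_param` and the adjacency-dictionary input are hypothetical.

```python
import math


def lambda_param(adj):
    """Degree-weighted geometric average of (degree - 1).

    Assumptions: `adj` maps each vertex to a list of its neighbours, every
    vertex has degree >= 2, and vertex v is weighted by d_v / (2|E|).
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2|E|
    log_sum = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            raise ValueError("this sketch assumes minimum degree at least 2")
        log_sum += d / two_m * math.log(d - 1)
    return math.exp(log_sum)


# Complete graph K4: 3-regular, so the average is 3 - 1.
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
# Complete bipartite graph K_{2,3}: two vertices of degree 3, three of degree 2.
k23 = {"a": [1, 2, 3], "b": [1, 2, 3], 1: ["a", "b"], 2: ["a", "b"], 3: ["a", "b"]}
print(lambda_param(k4))   # approximately 2
print(lambda_param(k23))  # approximately sqrt(2)
```

For any \(d\)-regular graph the value is \(d-1\) regardless of the weighting, so regular graphs make a convenient sanity check.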

Paper

The Moore Bound for Irregular Graphs

by Alon, Noga , Hoory, Shlomo , Linial, Nathan in Graphs , Mathematical models

2002

What is the largest number of edges in a graph of order n and girth g? For d-regular graphs, essentially the best known answer is provided by the Moore bound. This result is extended here to cover irregular graphs as well, yielding an affirmative answer to an old open problem ([4] p. 163, problem 10).

Journal Article

A note on unsatisfiable k-CNF formulas with few occurrences per variable

by SZEIDER, Stefan , HOORY, Shlomo in Algebra , Combinatorics , Combinatorics. Ordered structures

2007

Journal Article

Reducing the size of resolution proofs in linear time

by Strichman, Ofer , Fuhrmann, Oded , Shacham, Ohad in Algorithms , Computer programs , Computer Science

2011

DPLL-based SAT solvers progress by implicitly applying binary resolution. The resolution proofs that they generate are used, after the SAT solver’s run has terminated, for various purposes. The most notable uses in formal verification are: extracting an unsatisfiable core, extracting an interpolant, and detecting clauses that can be reused in an incremental satisfiability setting (the latter uses the proof only implicitly, during the run of the SAT solver). Making the resolution proof smaller can benefit all of these goals: it can lead to smaller cores, smaller interpolants, and smaller clauses that are propagated to the next SAT instance in an incremental setting. We suggest two methods for doing so that are linear in the size of the proof. Our first technique, called Recycle-Units, uses each learned constant (unit clause) (x) for simplifying resolution steps in which x was the pivot, prior to when it was learned. Our second technique simplifies proofs in which there are several nodes in the resolution graph, one of which dominates the others, that correspond to the same pivot. Our experiments with industrial instances show that these simplifications reduce the core by ≈5% and the proof by ≈13%. They reduce the core less than competing methods such as run-till-fix, but whereas our algorithms are linear in the size of the proof, the latter and other competing techniques are all exponential as they are based on SAT runs. If we consider the size of the proof (the resolution graph) as being polynomial in the number of variables (this is not necessarily the case in general), this gives our method an exponential time reduction compared to existing tools for small core extraction. Our experiments show that this result is evident in practice, more so for the second method: it rarely takes more than a few seconds, even when competing tools time out, and hence it can be used as a cheap proof post-processing procedure.
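The basic operation behind these proofs, a single binary resolution step on a pivot, can be sketched as follows. This shows only the textbook resolution rule, not the proof-reduction techniques of the paper; the integer literal encoding and the name `resolve` are assumptions for the example.

```python
def resolve(clause_a, clause_b, pivot):
    """Resolve two clauses on `pivot`.

    Clauses are collections of nonzero integers; -x denotes the negation of x.
    `clause_a` must contain `pivot` and `clause_b` must contain `-pivot`.
    """
    a, b = frozenset(clause_a), frozenset(clause_b)
    if pivot not in a or -pivot not in b:
        raise ValueError("pivot must occur positively in clause_a and negatively in clause_b")
    # The resolvent keeps every literal except the two pivot literals.
    return (a - {pivot}) | (b - {-pivot})


print(sorted(resolve([1, 2], [-1, 3], 1)))  # [2, 3]
print(resolve([1], [-1], 1) == frozenset())  # True: two opposite unit clauses give the empty clause
```

A resolution proof of unsatisfiability is a graph of such steps ending in the empty clause, which is why shrinking the graph also shrinks the cores and interpolants extracted from it.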

Journal Article

Reducing the size of resolution proofs in linear time

by FUHRMANN, Oded , HOORY, Shlomo , STRICHMAN, Ofer in Applied sciences , Computer science; control theory; systems , Computer systems performance. Reliability

2011

Journal Article
