Catalogue Search | MBRL
16 result(s) for "Hoory, Shlomo"
Customization scenarios for de-identification of clinical notes
by Szpektor, Idan; Dean, Jeff; Amira, Rony
in Automatic data collection systems, Automation, Clinical notes
2020
Background
Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets.
Objective
We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized.
Methods
We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset.
Results
Fully customized systems remove 97–99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems.
Conclusion
Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.
Journal Article
The non-backtracking spectrum of the universal cover of a graph
2015
A non-backtracking walk on a graph \\(H\\) is a directed path of directed edges of \\(H\\) such that no edge is the inverse of its preceding edge. Non-backtracking walks of a given length can be counted using the non-backtracking adjacency matrix \\(B\\), indexed by \\(H\\)’s directed edges and related to Ihara’s Zeta function. We show how to determine \\(B\\)’s spectrum in the case where \\(H\\) is a tree covering a finite graph. We show that when \\(H\\) is not regular, this spectrum can have positive measure in the complex plane, unlike the regular case. We show that outside of \\(B\\)’s spectrum, the corresponding Green function has “periodic decay ratios”. The existence of such a “ratio system” can be effectively checked and is equivalent to being outside the spectrum. We also prove that the spectral radius of the non-backtracking walk operator on the tree covering a finite graph is exactly \\(\\sqrt{\\mathrm{gr}}\\), where \\(\\mathrm{gr}\\) is the cogrowth of \\(B\\), or growth rate of the tree. This further motivates the definition of the graph theoretical Riemann hypothesis proposed by Stark and Terras. Finally, we give experimental evidence that for a fixed, finite graph \\(H\\), a random lift of large degree has non-backtracking new spectrum near that of \\(H\\)’s universal cover. This suggests a new generalization of Alon’s second eigenvalue conjecture.
Journal Article
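As a small illustration of the non-backtracking matrix described in the abstract above, the following Python sketch builds \\(B\\) for a finite simple graph and counts non-backtracking walks by repeated multiplication. The function names and the edge-list input format are assumptions made for this example and are not taken from the article; the graph is assumed to have no loops or parallel edges.

```python
def directed_edges(edges):
    """Return both orientations of every undirected edge (u, v)."""
    darts = []
    for u, v in edges:
        darts.append((u, v))
        darts.append((v, u))
    return darts


def non_backtracking_matrix(edges):
    """Build B, indexed by directed edges of a simple graph.

    B[i][j] = 1 when directed edge j can follow directed edge i:
    j starts where i ends, and j is not the inverse of i.
    """
    darts = directed_edges(edges)
    n = len(darts)
    B = [[0] * n for _ in range(n)]
    for i, (u, v) in enumerate(darts):
        for j, (x, y) in enumerate(darts):
            if v == x and y != u:
                B[i][j] = 1
    return darts, B


def count_nb_walks(B, length):
    """Count non-backtracking walks made of `length` directed edges."""
    if length < 1:
        raise ValueError("length must be at least 1")
    n = len(B)
    # counts[i] = number of walks of the current length starting with edge i
    counts = [1] * n
    for _ in range(length - 1):
        counts = [sum(B[i][j] * counts[j] for j in range(n)) for i in range(n)]
    return sum(counts)


# Complete graph K4: 12 directed edges, each with 2 legal continuations.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
_, B_K4 = non_backtracking_matrix(K4)
print(count_nb_walks(B_K4, 3))
```

For a \\(d\\)-regular simple graph every row of \\(B\\) sums to \\(d-1\\), so the count for K4 at length \\(\\ell\\) is \\(12 \\cdot 2^{\\ell-1}\\).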
On the Girth of Graph Lifts
2025
The size of the smallest \\(k\\)-regular graph of girth at least \\(g\\) is denoted by the well-studied function \\(n(k,g)\\). We introduce an analogous function \\(n(H,g)\\), defined as the smallest size graph of girth at least \\(g\\) that is a lift (or cover) of the, possibly non-regular, graph \\(H\\). We prove that the two main combinatorial bounds on \\(n(k,g)\\) -- the Moore lower bound and the Erdős-Sachs upper bound -- carry over to the new lift setting. We also consider two other functions: i) The smallest size graph of girth at least \\(g\\) sharing a universal cover with \\(H\\). We prove that it is the same as \\(n(H,g)\\) up to a multiplicative constant. ii) The smallest size graph of girth at least \\(g\\) with a prescribed degree distribution. We discuss this known generalization and argue that the new suggested definitions are superior. We conclude with experimental results for a specific base graph, followed by conjectures and open problems for future research.
On the Girth of Graph Lifts
2024
The size of the smallest \\(k\\)-regular graph of girth \\(g\\) is denoted by the well-studied function \\(n(k,g)\\). We suggest generalizing this function to \\(n(H,g)\\), defined as the smallest size girth \\(g\\) graph covering the, possibly non-regular, graph \\(H\\). We prove that the two main combinatorial bounds on \\(n(k,g)\\), the Moore lower bound and the Erdős-Sachs upper bound, carry over to the new setting of lifts, even in their non-asymptotic form. We also consider two other generalizations of \\(n(k,g)\\): i) The smallest size girth \\(g\\) graph sharing a universal cover with \\(H\\). We prove that it is the same as \\(n(H,g)\\) up to a multiplicative constant. ii) The smallest size girth \\(g\\) graph with a prescribed degree distribution. We discuss this known generalization and argue that the new suggested definitions are superior. We conclude with experimental results for a specific base graph and with some conjectures and open problems.
A Note on Unsatisfiable k-CNF Formulas with Few Occurrences per Variable
2006
The \\((k,s)\\)-SAT problem is the satisfiability problem restricted to instances where each clause has exactly \\(k\\) literals and every variable occurs at most \\(s\\) times. It is known that there exists a function \\(f\\) such that for \\(s \\leq f(k)\\) all \\((k,s)\\)-SAT instances are satisfiable, but \\((k,f(k)+1)\\)-SAT is already NP-complete (\\(k \\geq 3\\)). We prove that \\(f(k) = O(2^k \\cdot \\log k / k)\\), improving upon the best known upper bound \\(O(2^k / k^{\\alpha})\\), where \\(\\alpha = \\log_3 4 - 1 \\approx 0.26\\). The new upper bound is tight up to a \\(\\log k\\) factor with the best known lower bound \\(\\Omega(2^k / k)\\).
Journal Article
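To make the \\((k,s)\\)-SAT restriction in the abstract above concrete, here is a minimal Python sketch that checks whether a CNF formula is a \\((k,s)\\)-SAT instance. The function name and the representation of clauses as lists of nonzero integers (positive for a variable, negative for its negation) are assumptions for this example. It also assumes the common convention that the \\(k\\) literals of a clause are over \\(k\\) distinct variables, which the abstract does not state.

```python
from collections import Counter


def is_ks_sat_instance(clauses, k, s):
    """Return True if every clause has exactly k literals (on distinct
    variables, by assumption) and every variable occurs at most s times."""
    occurrences = Counter()
    for clause in clauses:
        variables = {abs(literal) for literal in clause}
        if len(clause) != k or len(variables) != k:
            return False
        occurrences.update(variables)
    return all(count <= s for count in occurrences.values())


# Three clauses over variables 1..3; each variable occurs three times.
formula = [[1, 2, 3], [-1, 2, -3], [1, -2, 3]]
print(is_ks_sat_instance(formula, k=3, s=3))
print(is_ks_sat_instance(formula, k=3, s=2))
```

The check only decides membership in the restricted class; it says nothing about whether the formula is satisfiable.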
Entropy and the growth rate of universal covering trees
2024
This work studies the relation between two graph parameters, \\(\\rho\\) and \\(\\Lambda\\). For an undirected graph \\(G\\), \\(\\rho(G)\\) is the growth rate of its universal covering tree, while \\(\\Lambda(G)\\) is a weighted geometric average of the vertex degree minus one, corresponding to the rate of entropy growth for the non-backtracking random walk (NBRW). It is well known that \\(\\rho(G) \\geq \\Lambda(G)\\) for all graphs, and that graphs with \\(\\rho=\\Lambda\\) exhibit some special properties. In this work we derive an easy to check, necessary and sufficient condition for the equality to hold. Furthermore, we show that the variance of the number of random bits used by a length \\(\\ell\\) NBRW is \\(O(1)\\) if \\(\\rho = \\Lambda\\) and \\(\\Omega(\\ell)\\) if \\(\\rho > \\Lambda\\). As a consequence we exhibit infinitely many non-trivial examples of graphs with \\(\\rho = \\Lambda\\).
The Moore Bound for Irregular Graphs
2002
What is the largest number of edges in a graph of order n and girth g? For d-regular graphs, essentially the best known answer is provided by the Moore bound. This result is extended here to cover irregular graphs as well, yielding an affirmative answer to an old open problem ([4] p. 163, problem 10).
Journal Article
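For reference, the classical Moore bound for d-regular graphs mentioned in the abstract above can be computed directly. The sketch below uses the standard textbook form of the bound (a lower bound on the number of vertices of a d-regular graph of girth g); the function name is an assumption for this example, and the article's extension to irregular graphs is not reproduced here.

```python
def moore_bound(d, g):
    """Classical Moore lower bound on the order of a d-regular graph of girth g.

    Odd girth g = 2r + 1:  1 + d * sum_{i=0}^{r-1} (d - 1)^i
    Even girth g = 2r:     2 * sum_{i=0}^{r-1} (d - 1)^i
    """
    if d < 2 or g < 3:
        raise ValueError("expected degree d >= 2 and girth g >= 3")
    r = g // 2
    layers = sum((d - 1) ** i for i in range(r))
    if g % 2 == 1:
        return 1 + d * layers
    return 2 * layers


print(moore_bound(3, 5))  # 10, attained by the Petersen graph
print(moore_bound(3, 6))  # 14, attained by the Heawood graph
```

The odd case counts vertices in a breadth-first tree rooted at a vertex, and the even case counts them in a tree rooted at an edge; the girth condition forces all of these vertices to be distinct.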
Reducing the size of resolution proofs in linear time
by Strichman, Ofer; Fuhrmann, Oded; Shacham, Ohad
in Algorithms, Computer programs, Computer Science
2011
DPLL-based SAT solvers progress by implicitly applying binary resolution. The resolution proofs that they generate are used, after the SAT solver’s run has terminated, for various purposes. The most notable uses in formal verification are: extracting an unsatisfiable core, extracting an interpolant, and detecting clauses that can be reused in an incremental satisfiability setting (the latter uses the proof only implicitly, during the run of the SAT solver). Making the resolution proof smaller can benefit all of these goals: it can lead to smaller cores, smaller interpolants, and smaller clauses that are propagated to the next SAT instance in an incremental setting. We suggest two methods for doing so that are linear in the size of the proof. Our first technique, called Recycle-Units, uses each learned constant (unit clause) (x) for simplifying resolution steps in which x was the pivot, prior to when it was learned. Our second technique simplifies proofs in which there are several nodes in the resolution graph, one of which dominates the others, that correspond to the same pivot. Our experiments with industrial instances show that these simplifications reduce the core by ≈5% and the proof by ≈13%. They reduce the core less than competing methods such as run-till-fix, but whereas our algorithms are linear in the size of the proof, the latter and other competing techniques are all exponential, as they are based on SAT runs. If we consider the size of the proof (the resolution graph) as being polynomial in the number of variables (which is not necessarily the case in general), this gives our method an exponential time reduction compared to existing tools for small core extraction. Our experiments show that this result is evident in practice, more so for the second method: it rarely takes more than a few seconds, even when competing tools time out, and hence it can be used as a cheap proof post-processing procedure.
Journal Article