Catalogue Search | MBRL

Hidden Markov models with random restarts versus boosting for malware detection

by Di Troia, Fabio , Stamp, Mark , Raghavan, Aditya in Accuracy , Algorithms , Anti-virus software

2019

Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general—and malware detection in particular—is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult “cold start” cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.

Journal Article

Mic-hackathon 2024: hackathon on machine learning for electron and scanning probe microscopy

by Manganaris, Panayotis , Mishra, Himanshu , Paul, Yogesh in Application programming interface , Benchmarks , Data analysis

2025

Microscopy is one of the primary sources of information on materials structure and functionality at the nanometer and atomic scales. The data generated through microscopy is often contained in well-structured datasets, enriched with extensive metadata and sample histories, although not always with the same level of detail or storage format. The broad incorporation of data management plans by major funding agencies ensures the preservation and accessibility of this data. However, deriving insights from these rich datasets remains challenging due to the lack of established code ecosystems, standardized benchmarks, and integration strategies. Correspondingly, the efficiency of data usage is very low, and time expenditures at the analysis stage are enormous. In addition to post-acquisition data analysis, the emergence of application programming interfaces by major microscope manufacturers now creates opportunities for real-time ML-based data analytics to enable automated decision making, and particularly ML-agent controlled real-time microscope operation. Despite these opportunities, there is a significant gap in integrating the ML community with the broader microscopy community, limiting the value that these methods bring to physics and materials discovery and materials optimization. Hackathons address these challenges by fostering collaboration between ML experts and microscopy professionals, encouraging the development of innovative solutions that leverage ML for microscopy and preparing the workforce of the future for microscopy-intensive domain areas, instrument manufacturers, and ML scientists interested in real-world applications for fundamental research, materials optimization, and manufacturing. The hackathon generated benchmark datasets and digital twins of microscopes that further contribute to the development of the field and establish data analysis ecosystems. All the code can be found on GitHub (https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1) and Zenodo (https://zenodo.org/records/15579940).

Journal Article

Mic-hackathon 2024: hackathon on machine learning for electron and scanning probe microscopy

by Mishra, Himanshu , Paul, Yogesh , Narasimha, Ganesh in electron microscopy , hackathon , machine learning

2025

Microscopy is one of the primary sources of information on materials structure and functionality at the nanometer and atomic scales. The data generated through microscopy is often contained in well-structured datasets, enriched with extensive metadata and sample histories, although not always with the same level of detail or storage format. The broad incorporation of data management plans by major funding agencies ensures the preservation and accessibility of this data. However, deriving insights from these rich datasets remains challenging due to the lack of established code ecosystems, standardized benchmarks, and integration strategies. Correspondingly, the efficiency of data usage is very low, and time expenditures at the analysis stage are enormous. In addition to post-acquisition data analysis, the emergence of application programming interfaces by major microscope manufacturers now creates opportunities for real-time ML-based data analytics to enable automated decision making, and particularly ML-agent controlled real-time microscope operation. Despite these opportunities, there is a significant gap in integrating the ML community with the broader microscopy community, limiting the value that these methods bring to physics and materials discovery and materials optimization. Hackathons address these challenges by fostering collaboration between ML experts and microscopy professionals, encouraging the development of innovative solutions that leverage ML for microscopy and preparing the workforce of the future for microscopy-intensive domain areas, instrument manufacturers, and ML scientists interested in real-world applications for fundamental research, materials optimization, and manufacturing. The hackathon generated benchmark datasets and digital twins of microscopes that further contribute to the development of the field and establish data analysis ecosystems. All the code can be found on GitHub (https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1) and Zenodo (https://zenodo.org/records/15579940).

Journal Article

Rapid optimization in high dimensional space by deep kernel learning augmented genetic algorithms

by Mani Valleti , Kalinin, Sergei V , Raghavan, Aditya in Genetic algorithms , Machine learning , Optimization

2025

Exploration of complex high-dimensional spaces presents significant challenges in fields such as molecular discovery, process optimization, and supply chain management. Genetic Algorithms (GAs), while offering significant power for creating new candidate spaces, often entail high computational demands due to the need for evaluation of each new proposed solution. On the other hand, Deep Kernel Learning (DKL) efficiently navigates the spaces of preselected candidate structures but lacks generative capabilities. This study introduces an approach that amalgamates the generative power of GAs to create new candidates with the efficiency of DKL-based surrogate models to rapidly ascertain the behavior of new candidate spaces. This DKL-GA framework can be further used to build Bayesian Optimization (BO) workflows. We demonstrate the effectiveness of this approach through the optimization of the FerroSIM model, showcasing its broad applicability to diverse challenges, including molecular discovery and battery charging optimization.

Paper

Rapid optimization in high dimensional space by deep kernel learning augmented genetic algorithms

by Mani Valleti , Kalinin, Sergei V , Raghavan, Aditya in Genetic algorithms , Machine learning , Molecular chains

2024

Exploration of complex high-dimensional spaces presents significant challenges in fields such as molecular discovery, process optimization, and supply chain management. Genetic Algorithms (GAs), while offering significant power for creating new candidate spaces, often entail high computational demands due to the need for evaluation of each new proposed solution. On the other hand, Deep Kernel Learning (DKL) efficiently navigates the spaces of preselected candidate structures but lacks generative capabilities. This study introduces an approach that amalgamates the generative power of GAs to create new candidates with the efficiency of DKL-based surrogate models to rapidly ascertain the behavior of new candidate spaces. This DKL-GA framework can be further used to build Bayesian Optimization (BO) workflows. We demonstrate the effectiveness of this approach through the optimization of the FerroSIM model, showcasing its broad applicability to diverse challenges, including molecular discovery and battery charging optimization.

Paper

Properties of hard-core bosons in potential traps

by Raghavan, Aditya in Condensed matter physics

2009

Following the recent advances in controlling ultracold quantum gases that have led to the realization of boson and fermion condensation, we present computational studies of one-dimensional bosons on optical lattices. These systems have proven to be a great tool to study exciting new frontiers of condensed matter, from exotic phase transitions to nonequilibrium phenomena. In this thesis we study one-dimensional hard-core bosons, a limiting case of the Bose-Hubbard model in which the on-site repulsion becomes infinite. Previously, this model has been well studied and is known to admit superfluid and Mott insulating phases. The time evolution of hard-core bosons in a one-dimensional trap is studied. Using an exact numerical approach, we study the ratio of the breathing mode and sloshing mode vibrations of the one-dimensional gas. We then study the evolution in a parametrically driven trap. Using an exact numerical approach, the dynamics of the system is determined as the trap curvature is modulated. The response is found to be markedly different in the superfluid and Mott-insulating regimes. By measuring the frequency dependence of the zero-momentum peak of the momentum distribution function, parametric excitations are observed, which depend on the phases present in the system. It is shown that these excitations closely match the quasi-particle spectrum, thus providing a useful tool to probe the low energy excitations of the system in its various phases. We then revisit the one-dimensional Bose-Hubbard model with a finite on-site repulsion. We study the effects of disorder on interacting trapped bosons. At small to moderate disorder strengths it is observed that if there is a Mott plateau at the center of the trap in the clean limit, phase coherence increases as a result of turning on disorder. The localization effects due to correlation and disorder compete against each other, resulting in a partial delocalization of the particles in the Mott region. We show that in this regime delocalization can lead to an increase in phase coherence.

Dissertation

Hidden Markov Models with Random Restarts vs Boosting for Malware Detection

by Stamp, Mark , Di Troia, Fabio , Raghavan, Aditya in Algorithms , Digital systems , Machine learning

2023

Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general, and malware detection in particular, is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.

Paper

SAM*: Task-Adaptive SAM with Physics-Guided Rewards

by Kalinin, Sergei V , Pratiush, Utkarsh , Barakati, Kamyar in Cellular structure , Data analysis , Image segmentation

2025

Image segmentation is a critical task in microscopy, essential for accurately analyzing and interpreting complex visual data. This task can be performed using custom models trained on domain-specific datasets, transfer learning from pre-trained models, or foundational models that offer broad applicability. However, foundational models often present a considerable number of non-transparent tuning parameters that require extensive manual optimization, limiting their usability for real-time streaming data analysis. Here, we introduce a reward function-based optimization to fine-tune foundational models and illustrate this approach for the SAM (Segment Anything Model) framework by Meta. The reward functions can be constructed to represent the physics of the imaged system, including particle size distributions, geometries, and other criteria. By integrating a reward-driven optimization framework, we enhance SAM's adaptability and performance, leading to an optimized variant, SAM*, that better aligns with the requirements of diverse segmentation tasks and particularly allows for real-time streaming data segmentation. We demonstrate the effectiveness of this approach in microscopy imaging, where precise segmentation is crucial for analyzing cellular structures, material interfaces, and nanoscale features.

Paper

Invariant Discovery of Features Across Multiple Length Scales: Applications in Microscopy and Autonomous Materials Characterization

by Mani Valleti , Liu, Yongtao , Pratiush, Utkarsh in Astronomy , Atomic bonding , Chemical bonds

2024

Physical imaging is a foundational characterization method in areas from condensed matter physics and chemistry to astronomy and spans length scales from atomic to universe. Images encapsulate crucial data regarding atomic bonding, materials microstructures, and dynamic phenomena such as microstructural evolution and turbulence, among other phenomena. The challenge lies in effectively extracting and interpreting this information. Variational Autoencoders (VAEs) have emerged as powerful tools for identifying underlying factors of variation in image data, providing a systematic approach to distilling meaningful patterns from complex datasets. However, a significant hurdle in their application is the definition and selection of appropriate descriptors reflecting local structure. Here we introduce the scale-invariant VAE approach (SI-VAE) based on the progressive training of the VAE with the descriptors sampled at different length scales. The SI-VAE allows the discovery of the length scale dependent factors of variation in the system. Here, we illustrate this approach using the ferroelectric domain images and generalize it to the movies of the electron-beam induced phenomena in graphene and topography evolution across combinatorial libraries. This approach can further be used to initialize the decision making in automated experiments including structure-property discovery and can be applied across a broad range of imaging methods. This approach is universal and can be applied to any spatially resolved data including both experimental imaging studies and simulations, and can be particularly useful for exploration of phenomena such as turbulence, scale-invariant transformation fronts, etc.

Paper

Automated Materials Discovery Platform Realized: Scanning Probe Microscopy of Combinatorial Libraries

by Liu, Yu , Pratiush, Utkarsh , Dimitrov, Edgar in Automation , Combinatorial analysis , Composition

2025

Combinatorial materials libraries provide a powerful platform for mapping how physical properties evolve across binary and ternary cross-sections of multicomponent phase diagrams. While synthesis of such libraries has advanced since the 1960s and been accelerated by laboratory automation, their broader utility depends on rapid, quantitative measurements of composition-dependent structures and functionalities. Scanning probe microscopies (SPM), including piezoresponse force microscopy (PFM), offer unique potential for providing these functionally relevant, spatially resolved readouts. Here, we demonstrate a fully automated SPM framework for exploring ferroelectric properties across combinatorial libraries, focusing on binary Sm-doped BiFeO3 (SmBFO) and ternary Al$_{1-x-y}$Sc$_x$B$_y$N (Al,Sc,B)N systems. In SmBFO, automated exploration identifies the known morphotropic phase boundary with enhanced ferroelectric response and reveals a previously unreported double-peak fine structure. In the (Al,Sc,B)N library, ferroelectric behavior emerges at the phase-stability boundary, correlating with variations in morphology and defect concentration. By integrating automated SPM with wavelength-dispersive spectroscopy (WDS) and photoluminescence mapping, we resolve the composition-morphology-defect-property relationships underlying ferroelectric response and demonstrate a pathway toward a multi-tool, high-throughput characterization platform. Finally, we implement Gaussian-process-based single- and multi-objective Bayesian optimization to enable autonomous exploration, highlighting the Pareto front as a powerful framework for balancing competing physical rewards and accelerating data-driven physics discovery.

Paper
