Catalogue Search | MBRL

Multiple social platforms reveal actionable signals for software vulnerability awareness: A study of GitHub, Twitter and Reddit

by Volkova, Svitlana , Sathanur, Arun , Shrestha, Prasha in Community structure , Computer and Information Sciences , Computer programs

2020

Awareness of software vulnerabilities is crucial to ensuring effective cybersecurity practices, the development of high-quality software, and, ultimately, national security. This awareness can be better understood by studying the spread, structure, and evolution of software vulnerability discussions across online communities. This work is the first to evaluate and contrast how discussions about software vulnerabilities spread on three social platforms: Twitter, GitHub, and Reddit. Moreover, we measure how user-level characteristics (e.g., bot or not), content-level characteristics (e.g., vulnerability severity, post subjectivity, targeted operating systems), and social network topology influence the rate of vulnerability discussion spread. To lay the groundwork, we present a novel fundamental framework for measuring information spread across multiple social platforms that identifies spread mechanisms and observables, units of information, and groups of measurements. We then contrast the topologies of the three social networks and analyze the effect of network structure on the way discussions about vulnerabilities spread. We measure the scale and speed of discussion spread to understand how far and how wide discussions go, how many users participate, and how long they last. To demonstrate the awareness of more impactful vulnerabilities, a subset of our analysis focuses on vulnerabilities targeted during recent major cyber-attacks and those exploited by advanced persistent threat groups. One of our major findings is that most discussions start on GitHub not only before Twitter and Reddit, but even before a vulnerability is officially published. The severity of a vulnerability contributes to how much it spreads, especially on Twitter. Highly severe vulnerabilities have significantly deeper, broader, and more viral discussion threads. When analyzing vulnerabilities in software products, we found that different flavors of Linux received the highest discussion volume. We also observe that Twitter discussions started by humans have larger size, breadth, depth, adoption rate, lifetime, and structural virality than those started by bots. On Reddit, discussion threads of positive posts are larger, wider, and deeper than those of negative or neutral posts. We also found that all three networks have high modularity that encourages spread. However, the spread on GitHub differs from that on the other networks because GitHub is denser and has stronger community structure and assortativity, which enhance information diffusion. We anticipate that the results of our analysis will not only increase the understanding of software vulnerability awareness but also inform existing and new analytical frameworks for simulating information spread (e.g., disinformation) across multiple online social environments.

Journal Article

Opinions, influence, and zealotry: a computational study on stubbornness

by Arendt, Dustin L. , Blaha, Leslie M. in Adoption of innovations , Algorithms , Analysis

2015

We present a simple, efficient, and predictive model for opinion dynamics with zealots. Our model captures curvature-driven dynamics (e.g., clear, smooth boundaries separating domains whose curvature decreases over time) through a simple, individual rule, providing a method for rapidly testing basic hypotheses about innovation diffusion, opinion dynamics, and related phenomena. Our model belongs to a class of models called dimer automata, which are asynchronous, graph-based (i.e., non-uniform lattice) variants of cellular automata. Individuals in the model update their states via a dyadic update rule; population opinion dynamics emerge from these pairwise interactions. Zealots are stubborn individuals whose opinion is not susceptible to influence by others. We observe experimentally that a system without zealots usually converges to the majority opinion, but a relatively small number of zealots can sway the opinion of the whole population. The influence of zealots can be further increased by placing zealots at more effective locations within the network. These locations can be determined by rankings from standard social network analysis metrics, or by using a greedy algorithm for influence maximization. We apply the influence maximization technique to a politically polarized social network to explore opinion dynamics in a real-world network and to gain insight about influence and political entrenchment through the zealot model’s ability to sway the entire network to one side or the other.
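The pairwise update idea described in this abstract can be illustrated with a minimal sketch. The update rule below (a randomly chosen endpoint of a random edge copies the other endpoint's opinion) is a voter-model-style assumption chosen for illustration, not the rule from the paper; the function name `simulate_zealot_opinions` and its parameters are hypothetical.

```python
import random


def simulate_zealot_opinions(neighbors, opinions, zealots, steps, seed=0):
    """Asynchronous pairwise opinion updates on a graph with stubborn zealots.

    neighbors: dict mapping integer node -> list of adjacent nodes (undirected)
    opinions:  dict mapping node -> initial opinion (0 or 1)
    zealots:   set of nodes whose opinion never changes
    """
    rng = random.Random(seed)
    state = dict(opinions)
    # Each undirected edge is a candidate pair ("dimer") for one update.
    edges = [(u, v) for u in neighbors for v in neighbors[u] if u < v]
    for _ in range(steps):
        u, v = rng.choice(edges)
        # Assumed rule (not from the paper): a random endpoint adopts the
        # opinion of the other endpoint, unless it is a zealot.
        speaker, listener = (u, v) if rng.random() < 0.5 else (v, u)
        if listener not in zealots:
            state[listener] = state[speaker]
    return state


# Ring of 20 individuals holding opinion 0, plus one zealot holding opinion 1.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
start = {i: 0 for i in range(20)}
start[0] = 1
final = simulate_zealot_opinions(ring, start, zealots={0}, steps=5000)
```

Because the zealot never updates, its opinion is the only one guaranteed to survive; how quickly and how often it takes over the population depends on the graph and on where the zealots are placed.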

Journal Article

Comparing explanations in RL

by Pierson, Britt Davis , Taylor, Matthew E. , Arendt, Dustin in Algorithms , Artificial Intelligence , Computational Biology/Bioinformatics

2024

As the capabilities of deep reinforcement learning (RL) surpass those of traditional reinforcement learning, the community is working to make these black boxes less opaque. Explanations about algorithms’ choices and strategies serve this purpose. However, information about RL algorithms’ operations is not easily accessible. Our research aimed to extract such information, use it to build explanations, and test those explanations with users. For our user study, eight RL agents were trained using OpenAI Baselines. Then, the HIGHLIGHTS-VIS algorithm was created by altering previous HIGHLIGHTS algorithms to collect data about the agents’ interactions with the environment. The data were used to create explanations that were compared to previous works’ video-based summaries in a user study. The between-subjects user study had participants answer questions about different agents. Participants were measured using both self-reported trust and performance on downstream tasks. Downstream tasks are tasks that a participant is more likely to do correctly with the information contained in the explanation. Collecting data about both trust and the utility of explanations allowed comparison and analysis of the explanations’ effectiveness. Results showed that the alternative explanations built from the collected data led to more correct answers about the agents and their strategies. Additionally, explanations’ utility depended on the context. Finally, users’ reported trust in an explanation did not directly correlate with performance. These results suggest trust and effectiveness may need to be measured and calibrated separately in future examinations of explanations.

Journal Article

Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods

by Volkova, Svitlana , Saldanha, Emily , Aksoy, Sinan in Artificial intelligence , Behavior , Causal models

2023

The Ground Truth program was designed to evaluate social science modeling approaches using simulation test beds with ground truth intentionally and systematically embedded to understand and model complex Human Domain systems and their dynamics (Lazer et al., Science 369:1060–1062, 2020). Our multidisciplinary team of data scientists, statisticians, and experts in Artificial Intelligence (AI) and visual analytics had a unique role on the program: to investigate the accuracy, reproducibility, generalizability, and robustness of state-of-the-art (SOTA) causal structure learning approaches applied to fully observed and sampled simulated data across virtual worlds. In addition, we analyzed the feasibility of using machine learning models to predict future social behavior with and without causal knowledge explicitly embedded. In this paper, we first present our causal modeling approach to discover the causal structure of four virtual worlds produced by the simulation teams: Urban Life, Financial Governance, Disaster, and Geopolitical Conflict. Our approach adapts state-of-the-art causal discovery (including ensemble models), machine learning, data analytics, and visualization techniques to allow a human-machine team to reverse-engineer the true causal relations from sampled and fully observed data. We next present our reproducibility analysis of two research methods teams’ performance using a range of causal discovery models applied to both sampled and fully observed data, and analyze their effectiveness and limitations. We further investigate the generalizability and robustness to sampling of the SOTA causal discovery approaches on additional simulated datasets with known ground truth. Our results reveal the limitations of existing causal modeling approaches when applied to large-scale, noisy, high-dimensional data with unobserved variables and unknown relationships between them. We show that the SOTA causal models explored in our experiments are not designed to take advantage of vast amounts of data and have difficulty recovering ground truth when latent confounders are present; they do not generalize well across simulation scenarios and are not robust to sampling; and they are vulnerable to data and modeling assumptions, so their results are hard to reproduce. Finally, we outline lessons learned and provide recommendations to improve models for causal discovery and prediction of human social behavior from observational data; we highlight the importance of learning data-to-knowledge representations or transformations to improve causal discovery and describe the benefit of causal feature selection for predictive and prescriptive modeling.

Journal Article

Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows

by Guo, Grace , Arendt, Dustin , Endert, Alex in Best practice , Collaboration , Explainable artificial intelligence

2024

Explainable AI (XAI) tools represent a turn to more human-centered and human-in-the-loop AI approaches that emphasize user needs and perspectives in machine learning model development workflows. However, while the majority of ML resources available today are developed for Python computational environments such as JupyterLab and Jupyter Notebook, the same has not been true of interactive XAI systems, which are often still implemented as standalone interfaces. In this paper, we address this mismatch by identifying three design patterns for embedding front-end XAI interfaces into Jupyter, namely: 1) one-way communication from Python to JavaScript, 2) two-way data synchronization, and 3) bi-directional callbacks. We also provide an open-source toolkit, bonXAI, that demonstrates how each design pattern might be used to build interactive XAI tools for a PyTorch text classification workflow. Finally, we conclude with a discussion of best practices and open questions. Our aims for this paper are to discuss how interactive XAI tools might be developed for computational notebooks, and how they can better integrate into existing model development workflows to support more collaborative, human-centered AI.

Paper

Phishing in the Wild

by Yun, JY , Fallon, CK , Arendt, DL in Business hours , Cybercrime , Decision making

2023

In this research, 153 employees at a National Laboratory received one of four different phishing emails. All of the emails were similar in content, but systematically varied according to the number and combination of phishing tactics in the message. Participants were unaware they would be receiving the email, which was sent during regular business hours. After receiving the emails, participants completed online questionnaires designed to measure possible predictors of phishing attack susceptibility. The significant predictors included how suspicious participants were of the email and their reported level of distress related to their work prior to completing the study.

Journal Article

Steinhaus Filtration and Stable Paths in the Mapper

by Broussard, Matthew , Arendt, Dustin L , Thrall, Amber in Datasets , Filtration , Homology

2025

We define a new filtration called the Steinhaus filtration, built from a single cover and based on a generalized Steinhaus distance, a generalization of Jaccard distance. The homology persistence module of a Steinhaus filtration with infinitely many cover elements may not be \(q\)-tame, even when the covers are in a totally bounded space. While this may make stability results harder to derive, we show that the Steinhaus filtration is stable when the cover is finite. We show that while the Čech and Steinhaus filtrations are not isomorphic in general, they are isomorphic for a finite point set in dimension one. Furthermore, the Vietoris–Rips (VR) filtration completely determines the \(1\)-skeleton of the Steinhaus filtration in arbitrary dimension. We then develop a language and theory for stable paths within the Steinhaus filtration. We demonstrate how the framework can be used in several applications where a standard metric may not be defined but a cover is readily available. We introduce a new perspective for modeling recommendation system datasets. As an example, we look at a movies dataset and find that the stable paths identified in our framework represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper algorithm for model induction by building a filtration from a single Mapper complex, and provide explanations in the form of stable paths between subpopulations. For illustration, we build a Mapper complex from a supervised machine learning model trained on the FashionMNIST dataset. Stable paths in the Steinhaus filtration provide improved explanations of relationships between subpopulations of images.
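For readers unfamiliar with the distance named in this abstract, a minimal sketch follows. The Jaccard distance between two finite sets is 1 - |A ∩ B| / |A ∪ B|, and the Steinhaus distance replaces set size with a measure. The extension to more than two sets used here, 1 - μ(intersection) / μ(union), is an assumption about what "generalized" means and may differ from the paper's definition; the function name `steinhaus_distance` and the `weight` parameter are hypothetical.

```python
def steinhaus_distance(sets, weight=None):
    """Return 1 - mu(intersection) / mu(union) for a collection of finite sets.

    sets:   non-empty iterable of finite sets (for example, cover elements)
    weight: optional dict mapping element -> non-negative weight, playing the
            role of the measure mu; defaults to the counting measure, which
            for two sets gives the ordinary Jaccard distance
    """
    sets = [set(s) for s in sets]
    if not sets:
        raise ValueError("at least one set is required")
    union = set().union(*sets)
    intersection = set.intersection(*sets)

    def mu(s):
        if weight is None:
            return float(len(s))
        return float(sum(weight[x] for x in s))

    total = mu(union)
    if total == 0:
        # Convention assumed here: a collection of measure zero has distance 0.
        return 0.0
    return 1.0 - mu(intersection) / total


# Two cover elements that share two of four points.
d_pair = steinhaus_distance([{1, 2, 3}, {2, 3, 4}])
# Three cover elements that share a single point.
d_triple = steinhaus_distance([{1, 2}, {2, 3}, {2, 4}])
```

Under this reading, a group of cover elements with a large common overlap relative to its union has a small distance, so it would enter a filtration ordered by this value early.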

Paper

Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks

by Volkova, Svitlana , Arendt, Dustin , Wu, Winston in Robustness , Training

2020

We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level. We experiment with different amounts of perturbation to examine model confidence and misclassification rate, and contrast model performance in adversarial training with different embedding types on two benchmark datasets. We demonstrate that ensembling improves model performance. Finally, we analyze factors that affect model behavior under adversarial training and develop a model to predict model errors during adversarial attacks.

Paper

GPU Acceleration of Many Independent Mid-Sized Simulations on Graphs

by Cao, Yang , Arendt, Dustin

2012

Many GPU parallelizations exist to speed up the simulation of complex systems, but these approaches see less benefit when the simulation is not large. Simulation of many independent complex systems is useful for Monte Carlo sampling or for exploring the behavior of many different models at once. We present and evaluate an algorithm for simulating many mid-sized dimer automata (e.g., having tens of thousands of vertices) on the GPU. Our algorithm has, in the best case, a throughput of over 300 million edge updates per second and a speedup factor of over 37 on modest GPU hardware. Dimer automata can also be used to design and implement useful computations on graphs. As a test case, we implement a solution to the all-pairs shortest path problem using dimer automata and our GPU algorithm, but find that the structure of the graph has a significant effect on the efficiency of the algorithm.

Conference Proceeding

Elastic Dimer Automata: Discrete, Tunable Models for Complex Systems

by Cao, Yang , Arendt, Dustin

2012

Cellular automata, dimer automata, and other similar models are useful tools for understanding complex phenomena using simple rules and discrete states. They have simple implementations that run efficiently without problems such as numerical instability and round-off errors that are common with continuous models like partial differential equations (PDEs). However, these discrete models, in general, lack a mechanism to tune the desired level of detail or accuracy, an important feature of PDEs. We propose elastic dimer automata as a general way to reconcile the simplicity of discrete models with the tunable nature of continuous models. Furthermore, we present a simple method for measuring self-organization in elastic dimer automata, which we use to aid an exhaustive search for interesting rules. This search revealed simple rules for several interesting phenomena, including multiple types of wave phenomena as well as a mechanism for labyrinthine patterns and dislocation repair.

Conference Proceeding
