Catalogue Search | MBRL
9 results for "Al-Hossami, Erfan"
Predicting first-time-in-college students’ degree completion outcomes
by Smail, John; Dorodchi, Mohsen; Demeter, Elise
in Academic degrees; Admissions policies; Algorithms
2022
About one-third of college students drop out before finishing their degree. The majority of those remaining will take longer than 4 years to complete their degree at “4-year” institutions. This problem emphasizes the need to identify students who may benefit from support to encourage timely graduation. Here we empirically develop machine learning algorithms, specifically Random Forest, to accurately predict if and when first-time-in-college undergraduates will graduate based on admissions, academic, and financial aid records two to six semesters after matriculation. Credit hours earned, college and high school grade point averages, estimated family (financial) contribution, and enrollment and grades in required gateway courses within a student’s major were all important predictors of graduation outcome. We predicted students’ graduation outcomes with an overall accuracy of 79%. Applying the machine learning algorithms to currently enrolled students allowed identification of those who could benefit from added support. Identified students included many who may be missed by established university protocols, such as students with high financial need who are making adequate but not strong degree progress.
Journal Article
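The record above describes a Random Forest model trained on admissions, academic, and financial aid features. As a rough illustration of that kind of classifier, the sketch below uses scikit-learn; the CSV file, column names, and label are hypothetical placeholders, not the authors' actual data or pipeline.

```python
# Minimal sketch of a Random Forest graduation-outcome classifier (scikit-learn).
# The CSV file, column names, and label are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("student_records.csv")           # hypothetical extract of student records
features = ["credit_hours_earned", "college_gpa",
            "high_school_gpa", "est_family_contribution",
            "gateway_course_grade"]                # predictors named in the abstract
X, y = df[features], df["graduated_on_time"]       # hypothetical binary outcome label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```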
Can we generate shellcodes via natural language? An empirical study
by Natella, Roberto; Shaikh, Samira; Liguori, Pietro
in Accuracy; Artificial Intelligence; Assembly language
2022
Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.
Journal Article
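The approach above frames shellcode generation as neural machine translation from a natural language description to assembly. The sketch below shows generic encoder-decoder inference with the Hugging Face transformers library; the "t5-small" checkpoint is only a stand-in, not the model trained in the paper.

```python
# Generic encoder-decoder inference sketch for "NL description -> assembly" generation.
# "t5-small" is a stand-in checkpoint, not the model trained in the paper.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

description = "clear the eax register"            # natural language intent
inputs = tokenizer(description, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```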
A Computational Framework for Socratic Debugging Conversations
2025
The Socratic teaching method encourages students to solve problems through instructor-guided questioning, rather than providing direct answers. Although this method can enhance learning outcomes, it is both time-consuming and cognitively demanding, limiting instructors' ability to provide individualized attention at scale. Automated Socratic conversational agents offer a promising avenue for supplementing human instruction in programming education, yet their development has been constrained by the lack of appropriate datasets, evaluation frameworks, and principled approaches to dialogue generation. This dissertation presents a computational framework for automated Socratic debugging conversations in novice programming environments. The framework makes three important, interconnected contributions: (1) benchmarks and evaluation standards for Socratic debugging, (2) automated mining of student misconceptions from code submissions, and (3) generation of Socratic dialogue that guides students to discover and correct their errors. First, I introduce the novel task of Socratic debugging and present a benchmark dataset of expert-crafted multi-turn Socratic conversations, which has been used to evaluate various large language models in zero-shot and fine-tuned settings. Second, I describe an automated approach for mining known as well as novel student misconceptions in code submissions, which can provide crucial knowledge for targeted pedagogical interventions. Third, I introduce the concept of Reasoning Trajectories as intermediate representations of Socratic conversations that are designed to guide the student towards statements about code behavior that contradict their misconceptions. The ensuing cognitive dissonance is expected to lead to enduring belief updates that fix the misconception. Overall, the three contributions establish conceptual and computational foundations for automated Socratic agents. While the focus is on programming education, the framework described in this dissertation is generalizable to any domain that can benefit from Socratic teaching of problem-solving skills through guided discovery and correction of misconceptions. Furthermore, this work opens avenues for research on the optimization of personalized Socratic agents.
Dissertation
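The dissertation above introduces Reasoning Trajectories as intermediate representations of Socratic conversations. One possible, purely illustrative way to encode such a conversation is sketched below; the field names are hypothetical and do not reflect the dissertation's actual schema.

```python
# Hypothetical encoding of a Socratic debugging conversation with a reasoning
# trajectory; field names are illustrative, not the dissertation's schema.
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str          # "instructor" or "student"
    utterance: str


@dataclass
class SocraticDialogue:
    buggy_code: str
    misconception: str                                     # mined from the submission
    trajectory: list[str] = field(default_factory=list)    # statements about code behavior
    turns: list[Turn] = field(default_factory=list)

    def add_turn(self, speaker: str, utterance: str) -> None:
        self.turns.append(Turn(speaker, utterance))
```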
Extraction of Atypical Aspects from Customer Reviews: Datasets and Experiments with Language Models
2023
A restaurant dinner may become a memorable experience due to an unexpected aspect enjoyed by the customer, such as an origami-making station in the waiting area. If aspects that are atypical for a restaurant experience were known in advance, they could be leveraged to make recommendations that have the potential to engender serendipitous experiences, further increasing user satisfaction. Although relatively rare, whenever encountered, atypical aspects often end up being mentioned in reviews due to their memorable quality. Correspondingly, in this paper we introduce the task of detecting atypical aspects in customer reviews. To facilitate the development of extraction models, we manually annotate benchmark datasets of reviews in three domains - restaurants, hotels, and hair salons, which we use to evaluate a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and few-shot prompting of GPT-3.5.
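The paper above evaluates models ranging from fine-tuned Flan-T5 to zero- and few-shot GPT-3.5. A minimal zero-shot prompting sketch with the OpenAI Python client follows; the prompt wording, review text, and model choice are illustrative, not the paper's exact setup.

```python
# Zero-shot prompting sketch for atypical-aspect extraction; the prompt wording and
# model name are illustrative, not the exact configuration used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

review = ("The pasta was fine, but the origami-making station in the waiting "
          "area made the evening unforgettable.")
prompt = ("List any aspects mentioned in this restaurant review that are atypical "
          "for a restaurant experience, one per line. If none, answer 'none'.\n\n"
          f"Review: {review}")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```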
A Survey on Artificial Intelligence for Source Code: A Dialogue Systems Perspective
by Shaikh, Samira; Al-Hossami, Erfan
in Applications programs; Artificial intelligence; Natural language processing
2022
In this survey paper, we overview major deep learning methods used in Natural Language Processing (NLP) and source code over the last 35 years. Next, we present a survey of the applications of Artificial Intelligence (AI) for source code, also known as Code Intelligence (CI) and Programming Language Processing (PLP). We survey over 287 publications and present a software-engineering centered taxonomy for CI placing each of the works into one category describing how it best assists the software development cycle. Then, we overview the field of conversational assistants and their applications in software engineering and education. Lastly, we highlight research opportunities at the intersection of AI for code and conversational assistants and provide future directions for researching conversational assistants with CI capabilities.
Can We Generate Shellcodes via Natural Language? An Empirical Study
by Natella, Roberto; Shaikh, Samira; Liguori, Pietro
in Assembly language; Empirical analysis; Language
2022
Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.
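The record above mentions novel metrics for judging generated shellcodes; those metrics are not described here, so the sketch below shows only a simple baseline comparison (whitespace-normalized exact match per snippet) that one might use as a starting point, not the paper's proposed evaluation.

```python
# Simple baseline evaluation for generated assembly: whitespace-normalized exact match
# per snippet. This is NOT the paper's proposed metric, just an illustrative baseline.
def normalize(snippet: str) -> str:
    return " ".join(snippet.lower().split())

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Illustrative usage with made-up outputs:
preds = ["xor eax, eax", "mov ebx, 0"]
refs  = ["xor eax, eax", "xor ebx, ebx"]
print(exact_match_accuracy(preds, refs))   # 0.5
```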
EVIL: Exploiting Software via Natural Language
by Natella, Roberto; Shaikh, Samira; Al-Hossami, Erfan
in Feasibility studies; Machine translation; Natural language
2021
Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and both at generating individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness.
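EVIL, described above, likewise relies on neural machine translation, this time targeting assembly and Python exploit code. The sketch below shows a single teacher-forced fine-tuning step for an NL-to-code seq2seq model with transformers and PyTorch; the checkpoint and the example pair are placeholders, not the EVIL dataset or training recipe.

```python
# Minimal sketch of one fine-tuning step for an NL -> code seq2seq model.
# The checkpoint and the example pair are placeholders, not the EVIL setup.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

nl = "decode the payload with a key of 0x41"          # hypothetical description
code = "payload = bytes(b ^ 0x41 for b in encoded)"   # hypothetical target snippet

inputs = tokenizer(nl, return_tensors="pt")
labels = tokenizer(code, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss   # teacher-forced cross-entropy
loss.backward()
optimizer.step()
optimizer.zero_grad()
```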
Can Language Models Employ the Socratic Method? Experiments with Code Debugging
2023
When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale, however their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain of thought prompting of the much larger GPT-4. The code and datasets are made freely available for research at the link below. https://github.com/taisazero/socratic-debugging-benchmark
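The benchmark above evaluates models from fine-tuned Flan-T5 to zero-shot and chain-of-thought GPT-4. The sketch below asks an instruction-tuned model for a single Socratic question about a buggy function; the prompt wording is illustrative, not the benchmark's actual template.

```python
# Zero-shot sketch: ask an instruction-tuned model for one Socratic question about
# buggy code; the prompt is illustrative, not the benchmark's exact template.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

buggy_code = "def is_even(n):\n    return n % 2 == 1"
prompt = ("A student wrote this function to test whether a number is even:\n"
          f"{buggy_code}\n"
          "Ask one Socratic question that guides the student to find the bug, "
          "without revealing the fix.")

print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```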
Shellcode_IA32: A Dataset for Automatic Shellcode Generation
by Natella, Roberto; Shaikh, Samira; Al-Hossami, Erfan
in Datasets; Machine translation; Natural language
2022
We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.
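Shellcode_IA32 pairs assembly instructions with natural language descriptions. The sketch below shows the general shape of such pairs and how they would feed a standard NMT baseline as source/target text; the two records are made-up examples, not actual dataset entries.

```python
# Illustrative shape of NL-to-assembly pairs like those in Shellcode_IA32; these two
# records are made-up examples, not actual dataset entries.
from dataclasses import dataclass

@dataclass
class ShellcodeRecord:
    intent: str    # natural language description
    snippet: str   # corresponding IA-32 assembly instruction

records = [
    ShellcodeRecord("zero out the eax register", "xor eax, eax"),
    ShellcodeRecord("push the value of ebx onto the stack", "push ebx"),
]

# Pairs in this form can be fed to any standard NMT baseline as (source, target) text.
sources = [r.intent for r in records]
targets = [r.snippet for r in records]
```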