Dissertation

Model Interpretability for Natural Language Processing Applications

Chrysostomou, George

2022

Overview

This thesis focuses on model interpretability, an area concerned with understanding model predictions in Natural Language Processing (NLP) tasks. The growing adoption of opaque models, such as BERT, leads to an increasing need for explaining their predictions. This is typically done by extracting a subset of the input that is indicative of the true reasoning behind the model's prediction (i.e. a faithful explanation or rationale).

Whilst there are multiple approaches in the literature for extracting explanations (e.g. feature attribution methods), some have faced criticism about how faithful they are. Furthermore, explanation faithfulness also depends on the model employed: highly parametrised models have been shown to produce less faithful explanations. Previous research has also shown that there is no single best feature attribution method across models, tasks and even instances of the same dataset, whilst finding an appropriate rationale length is still an open problem. Additionally, a limitation of current evaluations of explanation faithfulness is that they are performed on a held-out dataset from the same domain (i.e. the evaluation data come from the same distribution as the training data). It is therefore not known how faithfulness is affected in out-of-domain settings.

The main aim of this thesis, therefore, is to improve and evaluate the faithfulness of explanations in NLP applications. First, we improve the faithfulness of explanations extracted using attention mechanisms, a popular component of neural NLP models. In a similar direction, we show improvements in the faithfulness of explanations from feature attribution approaches when using large language models. We then address the problem of having to specify a priori a feature scoring method, rationale length and rationale type. Finally, we evaluate the faithfulness of explanations in out-of-domain settings, highlighting a problem with popular faithfulness evaluation metrics.

Publisher

ProQuest Dissertations & Theses

Subject

Natural language processing