Catalogue Search | MBRL

The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

by Krüger, Frank , Schindler, David , Bensmann, Felix in Academic publications , Analysis , Bibliographical citations

2022

Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand provenance of individual research data and insights, is a prerequisite for reproducibility and can enable macro-analysis of the evolution of scientific methods over time. However, missing rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. Whereas, to the best of our knowledge, this is the most comprehensive analysis of software use and citation at the time, all data and models are shared publicly to facilitate further research into scientific use and citation of software.

Journal Article

Whom are you going to call? determinants of @-mentions in Github discussions

by Devanbu, Premkumar , Filkov, Vladimir , Kavaler, David in Determinants , Prediction models , Visibility

2019

Open Source Software (OSS) project success relies on crowd contributions. When an issue arises in pull-request based systems, @-mentions are used to call on people to task; previous studies have shown that @-mentions in discussions are associated with faster issue resolution. In most projects there may be many developers who could technically handle a variety of tasks. But OSS supports dynamic teams distributed across a wide variety of social and geographic backgrounds, as well as levels of involvement. It is, then, important to know whom to call on, i.e., who can be relied or trusted with important task-related duties, and why. In this paper, we sought to understand which observable socio-technical attributes of developers can be used to build good models of them being future @-mentioned in GitHub issues and pull request discussions. We built overall and project-specific predictive models of future @-mentions, in order to capture the determinants of @-mentions in each of two hundred GitHub projects, and to understand if and how those determinants differ between projects. We found that visibility, expertise, and productivity are associated with an increase in @-mentions, while responsiveness is not, in the presence of a number of control variables. Also, we find that though project-specific differences exist, the overall model can be used for cross-project prediction, indicating its GitHub-wide utility.

Journal Article

Social media in GitHub: the role of @-mention in assisting software development

by ZHANG, Yang , WANG, Huaimin , YIN, Gang , WANG, Tao , YU, Yue in Collaboration , Computer Science , Cooperation

2017

Recently, many researches propose that social media tools can promote the collaboration among developers, which are beneficial to the software development. Nevertheless, there is little empirical evidence to confirm that using @-mention has indeed a beneficial impact on the issues in GitHub. In order to begin investigating such claim, we examine data from two large and successful projects hosted on GitHub, the Ruby on Rails and the AngularJS. By using qualitative and quantitative analysis, we give an in-depth understanding on how @-mention is used in the issues and the role of @-mention in assisting software development. Our statistical results indicate that, @-mention attracts more participants and tends to be used in the difficult issues. @-mention favors the solving process of issues by enlarging the visibility of issues and facilitating the developers' collaboration. Our study also build an @-network based on the @-mention database we extracted. Through the @-network, we investigate its evolution over time and prove that we certainly have the potential to mine the relationships and characteristics of developers by exploiting the knowledge from the @-network.

Journal Article

Relation Enhanced Neural Model for Type Classification of Entity Mentions with a Fine-Grained Taxonomy

by Ma, Jun , Chen, Zhu-Min , Cui, Kai-Yuan in Artificial Intelligence , Classification , Computer Science

2017

Inferring semantic types of the entity mentions in a sentence is a necessary yet challenging task. Most of existing methods employ a very coarse-grained type taxonomy, which is too general and not exact enough for many tasks. However, the performances of the methods drop sharply when we extend the type taxonomy to a fine-grained one with several hundreds of types. In this paper, we introduce a hybrid neural network model for type classification of entity mentions with a fine-grained taxonomy. There are four components in our model, namely, the entity mention component, the context component, the relation component, the already known type component, which are used to extract features from the target entity mention, context, relations and already known types of the entity mentions in surrounding context respectively. The learned features by the four components are concatenated and fed into a softmax layer to predict the type distribution. We carried out extensive experiments to evaluate our proposed model. Experimental results demonstrate that our model achieves state-of-the-art performance on the FIGER dataset. Moreover, we extracted larger datasets from Wikipedia and DBpedia. On the larger datasets, our model achieves the comparable performance to the state-of-the-art methods with the coarse-grained type taxonomy, but performs much better than those methods with the fine-grained type taxonomy in terms of micro-F1, macro-F1 and weighted-F1.

Journal Article

Nested relation extraction with iterative neural network

by LI, Hongwei , CHEN, Dian , XU, Zhengqi in Annotations , Computer Science , Iterative methods

2021

Most existing researches on relation extraction focus on binary flat relations like BornIn relation between a Person and a Location. But a large portion of objective facts described in natural language are complex, especially in professional documents in fields such as finance and biomedicine that require precise expressions. For example, "the GDP of the United States in 2018 grew 2.9% compared with 2017" describes a growth rate relation between two other relations about the economic index, which is beyond the expressive power of binary flat relations. Thus, we propose the nested relation extraction problem and formulate it as a directed acyclic graph (DAG) structure extraction problem. Then, we propose a solution using the Iterative Neural Network which extracts relations layer by layer. The proposed solution achieves 78.98 and 97.89 F1 scores on two nested relation extraction tasks, namely semantic cause-and-effect relation extraction and formula extraction. Furthermore, we observe that nested relations are usually expressed in long sentences where entities are mentioned repetitively, which makes the annotation difficult and error-prone. Hence, we extend our model to incorporate a mention-insensitive mode that only requires annotations of relations on entity concepts (instead of exact mentions) while preserving most of its performance. Our mention-insensitive model performs better than the mention-sensitive model when the random level in mention selection is higher than 0.3.

Journal Article

Semi-automatic Annotation for Mentions in Hindi Text

by Dutta, Kamlesh , Lata, Kusum , Singh, Pardeep in Ancient languages , Annotations , Automation

2023

Annotated corpora are required for the development of modern, accurate, and robust techniques for Natural Language Processing (NLP) downstream applications. The annotated data contain additional information which is required to train the system in many tasks such as parsing, named entity recognizer, etc. For low-resource languages or domains, manually labeled datasets are costly to construct and often unavailable. An alternative way to annotate datasets in a faster and cheaper way is the automatic annotation approach. The evaluation and training of the Mention detection system need a lot of annotated resources. However, there is a scarcity of annotated corpus and annotation guidelines in the Hindi language. Annotation of mentions such as name, pronominal, and nominal in the document is very important for various NLP downstream applications such as Coreference Resolution (CR), Machine Translation (MT), Automatic Text Summarization (ATS), Information Extraction (IE), etc. Almost all these applications require correct identification of mentions. In contrast to English, Hindi is the language of free word order. This free word order imposes additional limitations on various NLP applications. The paper describes a proposal for semi-automatic annotation of mentions in Hindi text. The proposed method is based on Rule-based mention detection followed by post-editing and adopts the Begin-Inside-Outside (BIO) format for annotating mentions in Hindi text. The dataset consists of 3.6 K sentences and 78 K tokens. We conclude that our approach can reduce the effort of extending a seed training corpus and show the inter-annotator agreement for Hindi text sentences.

Journal Article

Locating targets through mention in Twitter

by Zhu, Hengshu , Ni, Zhiwei , Tang, Liyang in Computer Science , Database Management , Digital media

2015

With the explosive development of social networks, there are excessive amount of user-generated contents available on social media platforms. Indeed, in social networks, it is now a big challenge to promote the right information to the right audiences at the right time. To this end, in this paper, we propose an integrated study of the mention mechanism in social media platforms, such as Twitter, towards locating target audiences for specific information. The study goal is to identify effective targets with high relevance and achieve high response rate as well. Along this line, we formulate the problem of locating targets when posting promotion-oriented messages as a ranking based recommendation task, and present a context-aware recommendation framework as a solution. Specifically, we first extract four categories of features, namely content, social, location and time based features, to measure the relevance among publishers, targets and promotion messages. Then, we employ Ranking Support Vector Machine (SVM) model as the solution to our ranking based recommendation problem. By introducing two bias adjustment parameters, i.e., confidence contributions of publishers and the responsiveness of targets, our framework can effectively recommend top K proper users to mention. Finally, to validate the proposed approach, we conduct extensive experiments on a real world dataset collected from Twitter. The experimental results clearly show that our approach outperforms other baselines with a significant margin.

Journal Article

Mention Me Announced Winner of Retail Week Buzz Showcase

in Awards & honors , Electronic commerce , Electronics industry

2016

Newsletter

eHarmony.co.uk and Mention Me Announce Marketing Partnership

in Dating services , Distribution channels , eHarmony.co.uk and Mention Me

2016