Catalogue Search | MBRL

Kladenští type as a problem of automatic morphological analysis

by Žižková, Hana , Osolsobě, Klára in automatic morphological analysis , derivational type , part of speech transition

2021

The aim of our paper is to demonstrate the procedures by which the data needed to refine tools for automatic morphological analysis of Czech can be obtained using a corpus, namely the Araneum Bohemicum IV Maximum (Czech, 20.03) 7.10 G web corpus of the ARANEA series and Araneum Bohemicum Maximum (Czech, 15.04) 3,20 G (hereinafter Araneum). In particular, we focus on propria of the Kladenští type, i.e., substantivized adjectives denoting groups of persons according to affiliation. The probe into the Aranea web corpus has two goals: 1) a corpus-based description of frequent properties of the type, which can be used as a starting point for rule-based disambiguation; 2) a list of the most frequent lemmas belonging to the type, which can then be included in the dictionaries of automatic morphological analyzers (e.g. the MorfFlex dictionary by Hajič and Hlaváčová). We believe that the probe can help improve the results of tools for automatic morphological analysis of Czech.

Journal Article

Improving Nominalized Adjectives Tagging

by Žižková, Hana , Osolsobě, Klára in Adjectives , Alliances , automatic morphological analysis

2019

Part of speech transitions represent an interesting issue in terms of Automatic Morphological Analysis (AMA). In these cases, two parts of speech have to be considered: the initial one and the final one. However, their automatic recognition is complicated by the fact that both share the same form. This article presents the results of a corpus study that maps the tagging of nominalized adjectives, with a focus on detecting candidates for nominalization among frequent adjectives. Analysis of the data obtained from the ČNK SYN v5 corpus reveals several different reasons for incorrect tagging. Taking these reasons into account, we propose three solutions for improving the tagging of nominalized adjectives.

Journal Article

Nástroj na tvaroslovnou analýzu staré angličtiny : Morphological Analyser of Old English

by Ondřej Tichý in automatic morphological analysis , automatická morfologická analýza , computational linguistics

2017

The paper describes the construction and testing of an electronic application for semi-automatic morphological analysis of Old English. It introduces the state of the art in the field of electronic analysis of Old English, provides a brief overview of Old English morphology and discusses the reasoning behind our theoretical framework. An account of the chosen methodology is offered and a specific description of its implementation is provided: from the acquisition and preparation of the lexical input data, through the programming of the forms generator to the testing of the results by analysing Old English text. The resulting recall of 95% is a success; however, the paper also hints at how it may be improved. It also discusses further use and development of the analyser, especially the disambiguation of its results. The paper makes a future semi-automatic morphological tagging of Old English texts a real possibility.

Journal Article

Compound Adverbs as an Issue in Machine Analysis of Czech language

by Žižková, Hana in Adverbials , Adverbs , automatic morphological analysis

2017

Compound adverbs represent an interesting issue in terms of Automatic Morphological Analysis (AMA). The reason is that compound adverbs in Czech are expressions formed by compounding existing words that are different parts of speech without any change in their form. An indicative sign of compound adverbs is that they can always be decomposed again. Compound adverbs may be written as one word, but sometimes a multiword form coexists. A word that is originally a different part of speech gains an adverbial meaning and becomes an adverb. This article presents the results of a corpus probe aimed at mapping expressions that are demonstrably compound adverbs and were either not recognized by AMA or incorrectly tagged by AMA as another part of speech. Analysis of data obtained from the Czech National Corpus (ČNK) SYN v3 shows that the unrecognized and incorrectly tagged units can be divided into several groups. Based on knowledge of these groups, it is possible to refine part of speech tagging by AMA. The corpus probe examined units written in accordance with the current codification as well as substandard units.

Journal Article

Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na (ou)|ící z hlediska morfologického značkování) : Corpora as Data Sources for the Up-Grading of Morphological Tagging

by Osolsobě, Klára in automatic morphological analysis , automatická morfologická analýza , derivational

2015

Adjectives ending with -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. On the level of automatic morphological analysis (the dictionary) of Czech they should be generated from verbal roots and tagged as verbal adjectives (pos tag: AG.*). The data from Czech corpora prove a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives with endings -oucí/-ící that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives with -oucí/-ící. The link between the inflected and the word-formation variants will be demonstrated using material from the SYN corpus (2,6 billion tokens of written Czech) and the large web corpus czTenTen12 (5,2 billion tokens of Czech text from the Internet — cleaned and deduplicated).
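The link between verbal variants and adjective variants described in this abstract can be illustrated with a minimal sketch. It assumes the common textbook description that the -oucí/-ící adjective is formed by adding -cí to the 3rd person plural present form of the verb; the function name `verbal_adjectives` and the toy `LEXICON` are hypothetical and are not taken from the paper or from any existing analyser.

```python
def verbal_adjectives(third_plural_forms):
    """Derive -oucí/-ící adjectives from 3rd person plural present forms.

    Assumption: the adjective is the 3rd person plural form plus "cí".
    Each verbal variant therefore yields its own adjective variant.
    """
    result = []
    for form in third_plural_forms:
        # Only forms ending in -ou or -í can feed this derivation.
        if form.endswith("ou") or form.endswith("í"):
            result.append(form + "cí")
    return result


# Hypothetical toy lexicon: verb lemma -> 3rd person plural variants.
LEXICON = {
    "nést": ["nesou"],
    "prosit": ["prosí"],
    "psát": ["píší", "píšou"],  # two verbal variants, two adjectives
}

for lemma, forms in LEXICON.items():
    print(lemma, verbal_adjectives(forms))
```

A verb with two competing 3rd person plural forms, such as psát here, produces two adjective variants. A dictionary that lists only one of the verbal forms would miss the other adjective, which is the kind of gap the abstract refers to.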

Journal Article

Nástroj na tvaroslovnou analýzu staré angličtiny

by Tichý, Ondřej in Computational linguistics , Historical text analysis , Language history

2017

The paper describes the construction and testing of an electronic application for semi-automatic morphological analysis of Old English. It introduces the state of the art in the field of electronic analysis of Old English, provides a brief overview of Old English morphology and discusses the reasoning behind our theoretical framework. An account of the chosen methodology is offered and a specific description of its implementation is provided: from the acquisition and preparation of the lexical input data, through the programming of the forms generator to the testing of the results by analysing Old English text. The resulting recall of 95% is a success; however, the paper also hints at how it may be improved. It also discusses further use and development of the analyser, especially the disambiguation of its results. The paper makes a future semi-automatic morphological tagging of Old English texts a real possibility.

Journal Article

Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na (ou)|ící z hlediska morfologického značkování)

by Osolsobě, Klára in Adjectives , Computational linguistics , Corpus linguistics

2015

Adjectives ending with -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. On the level of automatic morphological analysis (the dictionary) of Czech they should be generated from verbal roots and tagged as verbal adjectives (pos tag: AG.*). The data from Czech corpora prove a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives with endings -oucí/-ící that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives with -oucí/-ící. The link between the inflected and the word-formation variants will be demonstrated using material from the SYN corpus (2,6 billion tokens of written Czech) and the large web corpus czTenTen12 (5,2 billion tokens of Czech text from the Internet — cleaned and deduplicated).

Journal Article

ANÁLISIS MORFOLÓGICO CON HERRAMIENTAS INFORMÁTICAS: RECONOCIMIENTO DE NOMBRES EN TEXTOS DE ESPAÑOL CON EL SISTEMA NOOJ

by Tramallino, Carolina Paola in análisis automático morfológico , Automatic Morphological Analysis , Computational Linguistics

2013

This paper aims to show the scope of computational linguistics in the use of software tools for automatic morphological analysis. Two programs are described: on the one hand, Smorph, software created by Gabriel Bes, whose formalization is based on lemmas and endings; on the other, the Nooj system, designed by Max Silberztein to carry out morphological, syntactic and semantic analysis of natural languages. Because Nooj does not yet have linguistic data for Spanish, the paper shows how the models for the noun category declared in Smorph are adapted to create the Spanish grammars and dictionaries that Nooj requires.

Journal Article

Automatic recognition of different types of acute leukaemia in peripheral blood by image analysis

by Acevedo, Andrea , Rodellar, José , Molina, Angel in Artificial intelligence , Automatic classification , Automation

2019

Aims: Morphological differentiation among different blast cell lineages is a difficult task and there is a lack of automated analysers able to recognise these abnormal cells. This study aims to develop a machine learning approach to predict the diagnosis of acute leukaemia using peripheral blood (PB) images. Methods: A set of 442 smears was analysed from 206 patients. It was split into a training set with 75% of these smears and a testing set with the remaining 25%. Colour clustering and mathematical morphology were used to segment cell images, which allowed the extraction of 2,867 geometric, colour and texture features. Several classification techniques were studied to obtain the most accurate classification method. Afterwards, the classifier was assessed with the images of the testing set. The final strategy was to predict the patient’s diagnosis using the PB smear, and the final assessment was done with the cell images of the smears of the testing set. Results: The highest classification accuracy was achieved with the selection of 700 features with linear discriminant analysis. The overall classification accuracy for the six groups of cell types was 85.8%, while the overall classification accuracy for individual smears was 94% as compared with the true confirmed diagnosis. Conclusions: The proposed method achieves a high diagnostic precision in the recognition of different types of blast cells among other mononuclear cells circulating in blood. It is the first encouraging step towards the idea of being a diagnostic support tool in the future.
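The smear-level 75%/25% split and the accuracy measure mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the split was drawn or rounded, so the random shuffle, the seed, the truncating cut and the names `split_smears` and `accuracy` are all hypothetical. The segmentation, feature extraction and linear discriminant analysis steps are not reproduced here.

```python
import random


def split_smears(smear_ids, train_fraction=0.75, seed=0):
    """Shuffle smear identifiers and split them into train and test sets.

    Assumption: a simple random split at smear level; the cut index is
    truncated, which is a choice made for this sketch only.
    """
    ids = list(smear_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the confirmed labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# 442 smears, as stated in the abstract; the ids themselves are made up.
train, test = split_smears(range(442))
print(len(train), len(test))
```

With 442 smears and a truncating cut, this sketch places 331 smears in the training set and 111 in the testing set; the paper's exact counts may differ.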

Journal Article

Towards Automatic Expressive Pipa Music Transcription Using Morphological Analysis of Photoelectric Signals

by Wang, Qiao , Zhang, Yunxiao , Wang, Yuancheng in Algorithms , amplitude modulation-frequency modulation (AM-FM) , Analysis

2025

The musical signal produced by plucked instruments often exhibits non-stationarity due to variations in the pitch and amplitude, making pitch estimation a challenge. In this paper, we assess different transcription processes and algorithms applied to signals captured by optical sensors mounted on a pipa—a traditional Chinese plucked instrument—played using a range of techniques. The captured signal demonstrates a distinctive arched feature during plucking. This facilitates onset detection to avoid the impact of the spurious energy peaks within vibration areas that arise from pitch-shift playing techniques. Subsequently, we developed a novel time–frequency feature, known as continuous time-period mapping (CTPM), which contains pitch curves. The proposed process can also be applied to playing techniques that mix pitch shifts and tremolo. When evaluated on four renowned pipa music pieces of varying difficulty levels, our fully time-domain-based onset detectors outperformed four short-time methods, particularly during tremolo. Our zero-crossing-based pitch estimator achieved a performance comparable to short-time methods with a far better computational efficiency, demonstrating its suitability for use in a lightweight algorithm in future work.
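To make the idea of a time-domain, zero-crossing-based pitch estimate concrete, here is a generic textbook sketch. It is not the estimator or the CTPM feature from the paper: the function name `zero_crossing_pitch`, the use of upward crossings with linear interpolation, and the synthetic 440 Hz test tone are all assumptions made for illustration.

```python
import math


def zero_crossing_pitch(samples, sample_rate):
    """Estimate frequency in Hz from the spacing of upward zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(i - 1 + (-prev) / (cur - prev))
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    # Mean period in samples between the first and last crossing.
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period


# Synthetic, stationary test tone (hypothetical input, not pipa data).
SAMPLE_RATE = 44100
tone = [math.sin(2 * math.pi * 440.0 * n / SAMPLE_RATE + 0.3)
        for n in range(4410)]
print(zero_crossing_pitch(tone, SAMPLE_RATE))
```

The estimator needs only one pass over the samples and no transform, which is why approaches of this kind are cheap to compute. On real plucked-string signals with pitch shifts or tremolo, a plain estimate like this one would need the onset handling and smoothing that the abstract alludes to.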

Journal Article
