Catalogue Search | MBRL
15,816 result(s) for "Data Collection - classification"
Using systematic data categorisation to quantify the types of data collected in clinical trials: the DataCat project
by Crowley, Evelyn; McDonald, Alison; Breeman, Suzanne
in Biomedicine; Clinical trials; Clinical Trials as Topic - statistics & numerical data
2020
Background
Data collection consumes a large proportion of clinical trial resources. Each data item requires time and effort for collection, processing and quality control procedures. In general, more data equals a heavier burden for trial staff and participants. It is also likely to increase costs. Knowing the types of data being collected, and in what proportion, will be helpful to ensure that limited trial resources and participant goodwill are used wisely.
Aim
The aim of this study is to categorise the types of data collected across a broad range of trials and assess what proportion of collected data each category represents.
Methods
We developed a standard operating procedure to categorise data into primary outcome, secondary outcome and 15 other categories. We categorised all variables collected on trial data collection forms from 18, mainly publicly funded, randomised superiority trials, including trials of an investigational medicinal product and trials of complex interventions. Categorisation was done independently by pairs of reviewers: one with in-depth knowledge of the trial, the other independent of it. Disagreements were resolved by reference to the trial protocol and discussion, with the project team consulted if necessary.
Key results
Primary outcome data accounted for 5.0% (median)/11.2% (mean) of all data items collected. Secondary outcomes accounted for 39.9% (median)/42.5% (mean) of all data items. Non-outcome data such as participant identifiers and demographic data represented 32.4% (median)/36.5% (mean) of all data items collected.
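The figures above are summaries across trials: each trial contributes its own proportion per category, and the median and mean are then taken over those per-trial proportions. A minimal Python sketch of that roll-up follows. The per-trial item counts are hypothetical placeholders, not DataCat data; only the two-step summary and the reported 39.9% and 5.0% medians come from the abstract.

```python
from statistics import mean, median

# Hypothetical per-trial counts of data items by category (NOT DataCat data).
trials = [
    {"primary": 10, "secondary": 80, "non_outcome": 60, "other": 50},
    {"primary": 40, "secondary": 90, "non_outcome": 70, "other": 20},
    {"primary": 5, "secondary": 120, "non_outcome": 100, "other": 75},
]

def category_share(trial: dict[str, int], category: str) -> float:
    """Percentage of one trial's data items that fall in `category`."""
    return 100 * trial[category] / sum(trial.values())

def summarise(category: str) -> tuple[float, float]:
    """Median and mean of the per-trial percentages for `category`."""
    shares = [category_share(t, category) for t in trials]
    return median(shares), mean(shares)

for cat in ("primary", "secondary", "non_outcome"):
    med, avg = summarise(cat)
    print(f"{cat}: {med:.1f}% (median) / {avg:.1f}% (mean)")

# Ratio behind the "eight times" statement, using the reported medians.
print(f"secondary/primary (medians): {39.9 / 5.0:.2f}")
```

In this toy data a single trial with a large primary-outcome share pulls the mean above the median; the abstract itself does not say why its own means and medians differ.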
Conclusion
A small proportion of the data collected in our sample of 18 trials related to the primary outcome. Secondary outcomes accounted for eight times as much data as the primary outcome. A substantial amount of the data collected is not related to trial outcomes. Trialists should ensure that they collect only the data essential to support the health and treatment decisions of those whom the trial is designed to inform.
Journal Article
Performance criteria for verbal autopsy-based systems to estimate national causes of death: development and application to the Indian Million Death Study
2014
Background
Verbal autopsy (VA) has been proposed to determine the cause of death (COD) distributions in settings where most deaths occur without medical attention or certification. We develop performance criteria for VA-based COD systems and apply these to the Registrar General of India’s ongoing, nationally-representative Indian Million Death Study (MDS).
Methods
Performance criteria include a low ill-defined proportion of deaths before old age; reproducibility, including consistency of COD distributions with independent resampling; differences in COD distribution of hospital, home, urban or rural deaths; age-, sex- and time-specific plausibility of specific diseases; stability and repeatability of dual physician coding; and the ability of the mortality classification system to capture a wide range of conditions.
Results
The introduction of the MDS in India reduced the proportion of ill-defined deaths before age 70 years from 13% to 4%. The cause-specific mortality fractions (CSMFs) at ages 5 to 69 years for independently resampled deaths and the MDS were very similar across 19 disease categories. By contrast, CSMFs at these ages differed between hospital and home deaths and between urban and rural deaths. Thus, reliance mostly on urban or hospital data can distort national estimates of CODs. Age-, sex- and time-specific patterns for various diseases were plausible. Initial physician agreement on COD occurred about two-thirds of the time. The MDS COD classification system was able to capture more eligible records than alternative classification systems. By these metrics, the Indian MDS performs well for deaths prior to age 70 years. The key implication for low- and middle-income countries where medical certification of death remains uncommon is to implement COD surveys that randomly sample all deaths, use simple but high-quality field work with built-in resampling, and use electronic rather than paper systems to expedite field work and coding.
Conclusions
Simple criteria can evaluate the performance of VA-based COD systems. Despite the misclassification inherent in VA, the MDS demonstrates that national surveys of CODs using VA are an order of magnitude better than the limited COD data previously available.
Journal Article
Studying user income through language, behaviour and affect in social media
by Lampos, Vasileios; Volkova, Svitlana; Bachrach, Yoram
in Affect; Age differences; Artificial intelligence
2015
Automatically inferring user demographics from social media posts is useful both for social science research and for a range of downstream applications in marketing and politics. We present the first extensive study in which user behaviour on Twitter is used to build a predictive model of income. We apply non-linear regression methods, namely Gaussian Processes, and achieve a strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and to analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income. Some reflect common beliefs (e.g. higher perceived education and intelligence indicate higher earnings) or known differences (e.g. gender and age differences), while others are novel: for example, higher-income users express more fear and anger, whereas lower-income users express more of the time emotion and opinions.
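The abstract names Gaussian Process regression but not the kernel, hyperparameters or feature set. As an illustration of the technique only, here is a minimal NumPy sketch of exact GP regression with a squared-exponential (RBF) kernel, using the standard Cholesky-based predictive equations. The kernel choice, the hyperparameter values and the one-dimensional toy data are assumptions, not the study's setup.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance between rows of a and rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
               - 2 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise=0.1, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k = rbf_kernel(x_train, x_train, lengthscale, variance)
    k += noise**2 * np.eye(len(x_train))      # observation noise on the diagonal
    chol = np.linalg.cholesky(k)              # K + noise^2 I = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_test, x_train, lengthscale, variance)
    mean = k_star @ alpha                     # predictive mean
    v = np.linalg.solve(chol, k_star.T)
    var = variance - np.sum(v**2, axis=0)     # k(x*, x*) equals `variance` for RBF
    return mean, var

# Toy usage: a smooth 1-D signal stands in for (hypothetical) user features.
x = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(x).ravel()
mean, var = gp_predict(x, y, np.array([[2.5], [50.0]]))
```

Far from the training data the prediction falls back to the prior (mean 0, variance equal to the kernel variance), which is the behaviour that makes GP uncertainty estimates useful for regression tasks such as this one.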
Journal Article
Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery
by Jones, Kasey; Amer, Safaa; Chew, Robert F.
in Clustering; Complex sample design; Data Collection - classification
2018
Background
Conducting surveys in low- and middle-income countries is often challenging because many areas lack a complete sampling frame, have outdated census information, or have limited data available for designing and selecting a representative sample. Geosampling is a probability-based, gridded population sampling method that addresses some of these issues by using geographic information system (GIS) tools to create logistically manageable area units for sampling. GIS grid cells are overlaid to partition a country’s existing administrative boundaries into area units that vary in size from 50 m × 50 m to 150 m × 150 m. To avoid sending interviewers to unoccupied areas, researchers manually classify grid cells as “residential” or “nonresidential” through visual inspection of aerial images. “Nonresidential” units are then excluded from sampling and data collection. This process of manually classifying sampling units has drawbacks since it is labor intensive, prone to human error, and creates the need for simplifying assumptions during calculation of design-based sampling weights. In this paper, we discuss the development of a deep learning classification model to predict whether aerial images are residential or nonresidential, thus reducing manual labor and eliminating the need for simplifying assumptions.
Results
On our test sets, the model performs comparably to a human-level baseline in both Nigeria (94.5% accuracy) and Guatemala (96.4% accuracy) and outperforms baseline machine learning models trained on crowdsourced or remotely sensed geospatial features. Additionally, our findings suggest that this approach can work well in new areas with relatively modest amounts of training data.
Conclusions
Gridded population sampling methods like geosampling are becoming increasingly popular in countries with outdated or inaccurate census data because of their timeliness, flexibility, and cost. Using deep learning models directly on satellite images, we provide a novel method for sample frame construction that identifies residential gridded aerial units. In cases where manual classification of satellite images is used to (1) correct for errors in gridded population data sets or (2) classify grids where population estimates are unavailable, this methodology can help reduce annotation burden with comparable quality to human analysts.
Journal Article
The Assignment of Scores Procedure for Ordinal Categorical Data
by Chen, Han-Ching; Wang, Nae-Sheng
in Alcohol Drinking - adverse effects; Alcohol Drinking - epidemiology; Analysis
2014
Ordinal data are the most frequently encountered type of data in the social sciences, and many statistical methods can be used to process them. One common method is to assign scores to the categories, thereby converting them into interval data, and then perform statistical analysis. Several authors have recently developed methods for assigning scores to ordered categorical data. This paper proposes a score-assignment system for an ordinal categorical variable based on an underlying continuous latent distribution and illustrates its interpretation with three case studies. The results show that the proposed scoring system performs well for skewed ordinal categorical data.
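The abstract does not give the authors' formula, so the following is a swapped-in, widely used latent-variable scheme rather than Chen and Wang's method. It assumes a standard normal latent variable Z, places thresholds a_k at the normal quantiles of the cumulative category proportions, and scores category k as E[Z | a_{k-1} < Z < a_k] = (phi(a_{k-1}) - phi(a_k)) / P(category k). A minimal Python sketch:

```python
from statistics import NormalDist

def latent_normal_scores(counts: list[int]) -> list[float]:
    """Score ordered categories as conditional means of a standard normal
    latent variable cut at thresholds implied by the observed proportions."""
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("need at least one category, all with positive counts")
    std_normal = NormalDist()            # assumed latent distribution: N(0, 1)
    n = sum(counts)
    scores: list[float] = []
    cumulative = 0.0
    lower_pdf = 0.0                      # density at the -infinity threshold
    for k, count in enumerate(counts):
        p = count / n
        cumulative += p
        if k == len(counts) - 1:
            upper_pdf = 0.0              # density at the +infinity threshold
        else:
            upper_pdf = std_normal.pdf(std_normal.inv_cdf(cumulative))
        # E[Z | a_{k-1} < Z < a_k] = (phi(a_{k-1}) - phi(a_k)) / P(category k)
        scores.append((lower_pdf - upper_pdf) / p)
        lower_pdf = upper_pdf
    return scores

symmetric = latent_normal_scores([10, 20, 40, 20, 10])
skewed = latent_normal_scores([50, 30, 15, 5])
```

Unlike equally spaced scores such as 1, 2, 3, 4, these scores are spaced according to how the observations are distributed, so a skewed variable gets unequal gaps between adjacent categories.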
Journal Article
Reaching black men who have sex with men: a comparison between respondent-driven sampling and time-location sampling
by Colfax, Grant N; McFarland, Willi; Raymond, H Fisher
in Adolescent; Adult; African Continental Ancestry Group
2012
Objectives
The authors explored whether respondent-driven sampling (RDS) can generate a more diverse sample of black men who have sex with men (MSM) than time-location sampling (TLS) by comparing the sample characteristics accrued by each method in two independent studies.
Methods
The first study exclusively recruited black MSM through RDS (N=256), while the second recruited MSM through TLS, including a subsample of black MSM (N=69). Crude and adjusted point estimates and 95% CIs were calculated for socio-demographic and behavioural characteristics, HIV prevalence and the prevalence of unrecognised infections, and were compared using the Z-test.
Results
The samples differed significantly in all socio-demographic and some behavioural characteristics. Compared with TLS, RDS estimated higher proportions of older, less educated, poorer, currently homeless and self-identified bisexual black MSM. Participants in RDS were less likely to have a main partner, had fewer male partners, were more likely to have a female partner and to have both male and female partners, and reported greater methamphetamine, crack and heroin use. The prevalence of HIV and of unrecognised infections was slightly higher among RDS participants.
Conclusions
The RDS sample comprised black MSM who were more diverse with respect to socio-demographic characteristics and who may also be at higher risk for HIV. Thus, RDS has advantages in reaching higher-risk black MSM who are most hidden from intervention research and service delivery. Future studies of black MSM using RDS could use steering strategies to recruit younger participants and other subgroups of greatest interest to public health and prevention.
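The abstract says the estimates were compared with a Z-test but reports neither the variant nor the underlying counts. A minimal Python sketch of the standard pooled two-proportion z-test follows; only the sample sizes (256 and 69) come from the abstract, the event counts are hypothetical, and the sketch ignores the RDS weighting and covariate adjustment that the study's adjusted estimates would involve.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test for H0: p1 == p2. Returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                   # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided tail probability
    return z, p_value

# Hypothetical counts: 128 of 256 RDS vs 20 of 69 TLS participants report a trait.
z, p = two_proportion_z_test(128, 256, 20, 69)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```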
Journal Article
A method for encoding clinical datasets with SNOMED CT
2010
Background
Over the past decade there has been a growing body of literature on how the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) can be implemented and used in different clinical settings. Yet, for those charged with incorporating SNOMED CT into their organisation's clinical applications and vocabulary systems, there are few detailed encoding instructions and examples available to show how this can be done and the issues involved. This paper describes a heuristic method that can be used to encode clinical terms in SNOMED CT and an illustration of how it was applied to encode an existing palliative care dataset.
Methods
The encoding process involves: identifying input data items; cleaning the data items; encoding the cleaned data items; and exporting the encoded terms as output term sets. Four outputs are produced: the SNOMED CT reference set; interface terminology set; SNOMED CT extension set and unencodeable term set.
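The four steps and four outputs can be pictured as a small pipeline. The Python sketch below is a toy under stated assumptions: the lookup tables, the placeholder concept identifiers (not real SNOMED CT IDs), the cleaning rule and the rule for routing a term into the reference, interface or extension set are all hypothetical, since the abstract names the steps and outputs but not the matching logic. The cut-off of five occurrences mirrors the frequency threshold reported in the results.

```python
from collections import Counter

# Hypothetical lookup tables standing in for a SNOMED CT release and local
# mappings; the concept identifiers below are placeholders, not real SCTIDs.
SNOMED_PREFERRED_TERMS = {"pain": "CONCEPT_1", "nausea": "CONCEPT_2"}
LOCAL_SYNONYMS = {"feeling sick": "CONCEPT_2"}       # local wording -> concept
LOCAL_EXTENSION = {"family meeting held": "EXT_1"}   # locally created concept

def clean(term: str) -> str:
    """Cleaning step (assumed rule): lower-case and collapse whitespace."""
    return " ".join(term.lower().split())

def encode_dataset(raw_values: list[str], min_frequency: int = 5) -> dict:
    """Identify, clean, encode and export terms into four output sets."""
    # Identify and clean input items, counting how often each cleaned term occurs.
    counts = Counter(clean(v) for v in raw_values if v.strip())
    reference, interface, extension = {}, {}, {}
    unencodeable, skipped = [], []
    for term, freq in counts.items():
        if freq < min_frequency:
            skipped.append(term)                 # too rare to encode
        elif term in SNOMED_PREFERRED_TERMS:
            reference[term] = SNOMED_PREFERRED_TERMS[term]
        elif term in LOCAL_SYNONYMS:
            interface[term] = LOCAL_SYNONYMS[term]
        elif term in LOCAL_EXTENSION:
            extension[term] = LOCAL_EXTENSION[term]
        else:
            unencodeable.append(term)
    # Export: the four output sets, plus the terms skipped for low frequency.
    return {
        "reference_set": reference,
        "interface_terminology_set": interface,
        "extension_set": extension,
        "unencodeable_term_set": unencodeable,
        "not_encoded_low_frequency": skipped,
    }

raw = (["Pain"] * 6 + ["feeling  sick"] * 5 + ["Family meeting held"] * 5
       + ["xyz"] * 7 + ["rare term"] * 2)
result = encode_dataset(raw)
```

Keeping the rare terms separate from the unencodeable set reflects the abstract's distinction between terms that were not attempted and terms that could not be encoded.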
Results
The original palliative care database contained 211 data elements, 145 coded values and 37,248 free-text values. We were able to encode ~84% of the terms; another ~8% required further encoding and verification, and terms with a frequency of fewer than five (~7%) were not encoded.
Conclusions
The pilot suggests that our SNOMED CT encoding method has the potential to become a general-purpose terminology encoding approach that can be used in different clinical systems.
Journal Article
A classification of tasks for the systematic study of immune response using functional genomics data
by Behnke, J. M.; Hamshere, M. G.; Else, K. J.
in Allergy and Immunology - classification; Animals; Biological and medical sciences
2006
A full understanding of the immune system and its responses to infection by different pathogens is important for the development of anti-parasitic vaccines. A growing number of large-scale experimental techniques, such as microarrays, are being used to gain a better understanding of the immune system. To analyse the data generated by these experiments, methods such as clustering are widely used. However, individual applications of these methods tend to analyse the experimental data without taking publicly available biological and immunological knowledge into account systematically and in an unbiased manner. To make best use of the experimental investment, to benefit from existing evidence, and to support the findings in the experimental data, available biological information should be included in the analysis in a systematic manner. In this review we present a classification of tasks that shows how experimental data produced by studies of the immune system can be placed in a broader biological context. Taking into account available evidence, the classification can be used to identify different ways of analysing the experimental data systematically. We have used the classification to identify alternative ways of analysing microarray data, and illustrate its application using studies of immune responses in mice to infection with the intestinal nematode parasites Trichuris muris and Heligmosomoides polygyrus.
Journal Article