Catalogue Search | MBRL

Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing

by Wallace, Jonathan , Mulvenna, Maurice , Epelde, Gorka in Datasets , Health care policy , Information sharing

2020

The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data. A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data.
When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.

Journal Article

Interferon Lambda in the Pathogenesis of Inflammatory Bowel Diseases

by Nice, Timothy J. , Constant, David A. , Wallace, Jonathan W. in Animals , Apoptosis - immunology , Cell death

2021

Interferon λ (IFN-λ) is critical for host viral defense at mucosal surfaces and stimulates immunomodulatory signals, acting on epithelial cells and few other cell types due to restricted IFN-λ receptor expression. Epithelial cells of the intestine play a critical role in the pathogenesis of Inflammatory Bowel Disease (IBD), and the related type II interferons (IFN-γ) have been extensively studied in the context of IBD. However, a role for IFN-λ in IBD onset and progression remains unclear. Recent investigations of IFN-λ in IBD are beginning to uncover complex and sometimes opposing actions, including pro-healing roles in colonic epithelial tissues and potentiation of epithelial cell death in the small intestine. Additionally, IFN-λ has been shown to act through non-epithelial cell types, such as neutrophils, to protect against excessive inflammation. In most cases IFN-λ demonstrates an ability to coordinate the host antiviral response without inducing collateral hyperinflammation, suggesting that IFN-λ signaling pathways could be a therapeutic target in IBD. This mini review discusses existing data on the role of IFN-λ in the pathogenesis of inflammatory bowel disease, current gaps in the research, and therapeutic potential of modulating the IFN-λ-stimulated response.

Journal Article

Exploring patient information needs in type 2 diabetes: A cross sectional study of questions

by McTear, Michael F. , Bradley, Colin , Kearney, Patricia M. in Biology and Life Sciences , Breast cancer , Care and treatment

2018

This study set out to analyze questions about type 2 diabetes mellitus (T2DM) from patients and the public. The aim was to better understand people's information needs by starting with what they do not know, discovered through their own questions, rather than starting with what we know about T2DM and subsequently finding ways to communicate that information to people affected by or at risk of the disease. One hundred and sixty-four questions were collected from 120 patients attending outpatient diabetes clinics and 300 questions from 100 members of the public through the Amazon Mechanical Turk crowdsourcing platform. Twenty-three general and diabetes-specific topics and five phases of disease progression were identified; these were used to manually categorize the questions. Analyses were performed to determine which topics, if any, were significant predictors of a question's being asked by a patient or the public, and similarly for questions from a woman or a man. Further analysis identified the individual topics that were assigned significantly more often to the crowdsourced or clinic questions. These were Causes (CI: [-0.07, -0.03], p < .001), Risk Factors ([-0.08, -0.03], p < .001), Prevention ([-0.06, -0.02], p < .001), Diagnosis ([-0.05, -0.02], p < .001), and Distribution of a Disease in a Population ([-0.05, -0.01], p = .0016) for the crowdsourced questions and Treatment ([0.03, 0.01], p = .0019), Disease Complications ([0.02, 0.07], p < .001), and Psychosocial ([0.05, 0.1], p < .001) for the clinic questions. No highly significant gender-specific topics emerged in our study, but questions about Weight were more likely to come from women and Psychosocial questions from men. There were significantly more crowdsourced questions about the time Prior to any Diagnosis ([-0.11, -0.04], p = .0013) and significantly more clinic questions about Health Maintenance and Prevention after diagnosis ([0.07, 0.17], p < .001).
A descriptive analysis pointed to the value provided by the specificity of questions, their potential to disclose emotions behind questions, and the as-yet unrecognized information needs they can reveal. Large-scale collection of questions from patients across the spectrum of T2DM progression and from the public (a significant percentage of whom are likely to be as yet undiagnosed) is expected to yield further valuable insights.

Journal Article

Identifying comorbidity patterns of mental health disorders in community-dwelling older adults: a cluster analysis

by McNulty, Helene , Wang, Jinling , Horigan, Geraldine in Aged , Aged, 80 and over , Aging

2025

As global life expectancy increases, understanding mental health patterns and their associated risk factors in older adults becomes increasingly critical. Using data from the cross-sectional Trinity Ulster Department of Agriculture study (TUDA, 2008-2012; n = 5186; mean age 74.0 years) and a subset of participants followed-up longitudinally (TUDA 5+, 2014-2018; n = 953), we perform a multi-view co-clustering analysis to identify distinct mental health profiles and their relationships with potential risk factors. The TUDA multi-view dataset consists of five views: (1) mental health, measured with Center for Epidemiologic Studies Depression Scale [CES-D] and Hospital Anxiety and Depression Scale [HADS], (2) cognitive and neuropsychological function, (3) illness diagnoses and medical prescription history, (4) lifestyle and nutritional attainment, and (5) physical well-being. That is, each participant is described by five distinct sets of features. The mental health view serves as the target feature set, while the other four views are analyzed as potential contributors to mental health risks. Under the multi-view co-clustering framework, for the data in each view, the participants (rows) are partitioned into different row-clusters, and the features (columns) are partitioned into different column-clusters. Each row-cluster is most effectively explained by the features in one or two column-clusters. Notably, the row-clusterings across views are dependent. By analyzing the associations between row-clusters in the mental health view and those in each of the other four views, we can identify which risk factors co-occur and contribute to an increased risk of poor mental health.
We identify five distinct row-clusters in the mental-health view data, characterized by varying levels of depression and anxiety: Group 1, mild depressive symptoms and no symptoms of anxiety; Group 2, acute depression and anxiety; Group 3, less severe but persistent depression and anxiety symptoms; Group 4, symptoms of anxiety with no depressive symptoms; and Group 5, no symptoms of either depression or anxiety. Cross-view association analysis revealed the following key insights: Participants in Group 3 exhibit lower neuropsychological function, are older, more likely to live alone, come from more deprived regions, and have reduced physical independence. Contrasting Group 3, participants in Group 2 show better neuropsychological function, greater physical independence, and higher socioeconomic status. Participants in Group 5 report fewer medical diagnoses and prescriptions, more affluent backgrounds, less solitary living, and stronger physical independence. A significant portion of this group aligns with cognitive health row-clusters 1 and 3, suggesting a strong link between cognitive and mental health in older age. Participants with only depressive (Group 1) or anxiety symptoms (Group 4) exhibit notable differences. Those with anxiety symptoms are associated with healthier clusters across other views. The co-clustering methodology also categorizes the questions in the CES-D and HADS scales into meaningful clusters, providing valuable insights into the underlying dimensions of mental health assessment. In the CES-D scale, the questions are divided into four clusters: those related to loneliness and energy, those addressing feelings of insecurity, worthlessness, and fear, those concerning concentration and effort, and those focused on sleep disturbances. Similarly, the HADS questions are grouped into clusters that reflect themes such as a strong sense of impending doom, nervousness or unease, and feelings of tension or restlessness. 
By organizing the questions from both scales into these smaller groups, the methodology highlights distinct symptom patterns and their varying severity among participants. This approach could be leveraged to develop abridged versions of the assessment scales, enabling faster and more efficient triage in clinical practice.

Journal Article

Discovering and comparing types of general practitioner practices using geolocational features and prescribing behaviours by means of K-means clustering

by McGlade, Kieran , Cleland, Brian , R Bond, Raymond in 639/705/117 , 639/705/531 , Business services

2021

Traditionally, General Practitioner (GP) practices have been labelled as being in Rural, Urban or Semi-Rural areas with no statistical method of identifying which practices fall into each category. The main aim of this study is to investigate whether location and other characteristics can provide a taxonomy to identify different types of GP practice and compare the prescribing behaviours associated with the different practice types. To achieve this, monthly open source prescription data were analysed by practice, considering location, practice size, population density and deprivation rankings. One year’s data were subjected to k-means clustering, with the results showing that only two different types of GP practice can be classified that are dependent on location characteristics in Northern Ireland. Traditional labels did not describe the two classifications fully and new classifications of Metropolitan and Non-Metropolitan were used. Whilst prescribing patterns were generally similar, it was found that Metropolitan practices generally had higher prescribing rates than Non-Metropolitan practices. Examining prescribing behaviours in accordance with British National Formulary (BNF) categories (known as chapters) showed that Chapter 4 (Central Nervous System) was responsible for most of the difference in prescribing levels. Within Chapter 4, higher prescribing levels were attributable to Analgesic and Antidepressant prescribing. The clusters were finally examined regarding the level of deprivation experienced in the area in which the practice was located. This showed that the Metropolitan cluster, having higher prescription rates, also had a higher proportion of practices located in highly deprived areas, making deprivation a contributing factor.

Journal Article

Teacher evaluation: A conversation among educators

by Wallace, Jonathan D. in Accountability , Catenas , Children

2012

Eleven educators came together in spring 2012 for a wide-ranging discussion of teacher evaluation and professional development in the era of high-stakes testing and data-based accountability. The participants in this conversation are alumni of the Mid-Career Doctoral Program in Educational Leadership at the University of Pennsylvania Graduate School of Education. The group included public and private school leaders, superintendents and other top administrators, higher education faculty and staff, and executives of educational nonprofits.

Journal Article

Teacher evaluation a conversation among educators: listen in as teachers and principals explore some of the themes involved in the broader teacher evaluation discussion

by Wallace, Jonathan D in Analysis , Conferences, meetings and seminars , Teacher evaluation

2012

Journal Article

What they wish they had learned

by Desimone, Laura M. , Gitomer, Madeline , Pottinger, Danielle in Alignment (Education) , Beginning Teachers , Children

2013

No one has clearer ideas about what is lacking in a teacher education program than a recent graduate of that program grappling with his or her first year of teaching. Researchers analyzed interviews with first-year, middle school math teachers, their principals and formal mentors to get their views on how their preservice training could have better prepared them for their first jobs. Three major themes emerged: their training could have better prepared them for the diverse student body they would encounter; their student teaching experience was poorly aligned with their first job; and they lacked sufficient math content knowledge in general or grade-level knowledge of math content in particular.

Journal Article

Quantifying the Pleistocene Incision and Integration History of the Middle Allegheny River, a Glacial Margin Continental Drainage, in Northwestern Pennsylvania, USA

by Wallace, Jonathan in Environmental science , Geology , Geomorphology

2025

A new 10Be terrestrial cosmogenic nuclide (TCN) and optically-stimulated luminescence (OSL)-based age model for eight fluvial terraces in the middle Allegheny River and upstream correlative glacial deposits has been constructed from a USGS EDMAP-funded surficial geologic map of the Parker and Emlenton 7.5 min quadrangles. The map, age model, and existing data for Glacial Lake Monongahela (GLM) test long-held views for when and where low divides were breached in the assembly of the modern Allegheny River and the respective roles of upstream glacial margin or downstream base level change in driving post-glacial river incision. The age model is anchored by a ~20 m thick paired fill terrace containing abundant rock-types exotic to the Allegheny watershed (Qt3), with a strath ~60 m above the modern channel (AMC). A TCN burial age in Qt3 of 1.1+0.4/-0.3 Ma indicates a south-flowing Allegheny River connected to the glacial margin in the early Pleistocene, and a long-term rate of incision of ~45 m/Ma. In contrast, above Qt3 are few, scattered strath terraces (Qt1 and Qt2) that lack exotic clasts, and have opposing north and south gradients astride a now breached low divide upstream of the Clarion-Allegheny rivers confluence. Inset 5 m below Qt3 lies an extensive, paired, low-relief strath terrace (Qt4, the Parker Strath), followed by scattered, unpaired, and poorly preserved strath terraces (Qt5) that decorate the steep bedrock valley walls and extend down to within ~20 m AMC. At least three thick, paired fill terraces containing abundant exotic material (Qt6, Qt7, and Qt8), the bases of which are not exposed, are inset into the inner Allegheny valley. The tread of Qt6 lies ~15 m AMC; the underlying alluvium has a TCN burial age of 0.513+0.15/-0.17 Ma, and it is subsequently capped by thick colluvial deposits with a TCN burial age of 0.24+0.071/-0.06 Ma. Qt7 is a late Pleistocene terrace with an OSL age of 0.017+/-0.002 Ma.
These middle and late Pleistocene terraces and colluvia have similar ages to two tills exposed ~45 km to the north at Franklin, PA, dated to 0.4+0.31/-0.18 Ma and 0.14+/-0.19 Ma using burial TCN and OSL respectively. Some preliminary geochemical data for the soils capping the Qt3 and Mapledale units are presented. A key finding is that the Allegheny River has experienced an average incision rate of ~40-45 m/Ma over the past half-million years or more, but this may have been as low as 25 m/Ma earlier in the Pleistocene, approaching the unglaciated basin-scale erosion rate of ~30 m/Ma. The location of an early Pleistocene integration reach near modern Foxburg, PA, is argued. Collectively, these data suggest reversal and assembly of the Allegheny River during a very early glacial advance, perhaps the same one that was responsible for the formation of the Ohio River via spillover of GLM >> 1 Ma.

Dissertation

What they wish they had learned: middle school math teachers feel unprepared for the diversity in their classrooms and short on content knowledge

by Desimone, Laura M , Gitomer, Madeline , Pottinger, Danielle in Influence , Mathematics , Mathematics education

2013

Journal Article
