Catalogue Search | MBRL

Survey Methods for Estimating the Size of Weak-Tie Personal Networks

by Son, Vo Hai , Feehan, Dennis M. , Abdul-Quader, Abu in Aggregate data , Censuses , Data

2022

Researchers increasingly use aggregate relational data to learn about the size and distribution of survey respondents’ weak-tie personal networks. Aggregate relational data are collected by asking questions about respondents’ connectedness to many different groups (e.g., “How many teachers do you know?”). This approach can be powerful, but to use aggregate relational data, researchers must locate external information about the size of each group from a census or administrative records (e.g., the number of teachers in the population). This need for external information makes aggregate relational data difficult or impossible to collect in many settings. Here, the authors show that relatively simple modifications can overcome this need for external data, significantly increasing the flexibility of the method and weakening key assumptions required by the associated estimators. The key idea is to estimate the size of these groups from the sample of survey respondents, rather than relying on external sources of information. These methods are appropriate for using a sample survey to study the size and distribution of weak-tie network connections. They can also be used as part of the network scale-up method to estimate the size of hidden populations. The authors illustrate this approach with two empirical studies: a large simulation study and original household survey data collected in Hanoi, Vietnam.
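
The key idea in the abstract above (estimating group sizes from the sample instead of from a census) can be illustrated with a minimal Python sketch. It is not the authors' estimator: it assumes an unweighted simple random sample, assumes each respondent also reports whether they themselves belong to each group, and uses the basic scale-up form for personal network size. The function names and the toy numbers are hypothetical.

```python
def estimate_group_sizes(membership, population_size):
    """Estimate each group's size from respondents' own membership (0/1 flags).

    Assumption: an unweighted simple random sample, so the sample share
    of members times the population size estimates the group size.
    """
    n = len(membership)
    num_groups = len(membership[0])
    return [
        population_size * sum(row[k] for row in membership) / n
        for k in range(num_groups)
    ]


def estimate_degrees(ard, group_sizes, population_size):
    """Basic scale-up estimate of each respondent's personal network size.

    ard[i][k] is the number of people respondent i reports knowing in group k.
    """
    total_group_size = sum(group_sizes)
    if total_group_size == 0:
        raise ValueError("at least one group must have a positive estimated size")
    return [population_size * sum(row) / total_group_size for row in ard]


# Toy data: 4 respondents, 2 groups, population of 1,000 (all hypothetical).
membership = [[1, 0], [0, 0], [0, 1], [1, 0]]
ard = [[3, 1], [6, 0], [0, 0], [15, 0]]

sizes = estimate_group_sizes(membership, 1000)
degrees = estimate_degrees(ard, sizes, 1000)
```

With real survey data, sampling weights and the reporting assumptions discussed in the article would need to be handled; this sketch ignores both.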

Journal Article

A survey on missing data in machine learning

by Tabona, Oteng , Emmanuel, Tlamelo , Mpoeleng, Dimane in Aggregate data , Algorithms , Big Data

2021

Machine learning has been the cornerstone in analysing and extracting information from data, and often a problem of missing values is encountered. Missing values occur because of various factors like missing completely at random, missing at random or missing not at random. All these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In the literature there have been several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data, particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of missing values imputation techniques, how they perform, their limitations and the kind of data they are most suitable for. We propose and evaluate two methods, the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris and novel power plant fan data with induced missing values at missingness rates of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values and offer some possible future research direction.
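
To make the k nearest neighbor imputation named in the abstract above concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions rather than the paper's implementation: missing entries are marked with `None`, distance is the root mean squared difference over features observed in both rows, and the function name `knn_impute` and the toy rows are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Only rows that have the missing feature observed can act as donors.
    Distance uses the features observed in both rows (an assumption of
    this sketch), normalised by how many features are shared.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no donor row is available
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (hypothetical): the second row is missing its third feature.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 50.0],
    [1.2, 1.9, 12.0],
]
filled = knn_impute(rows, k=2)
```

Here the two closest donors to the incomplete row are the first and last rows, so the gap is filled with the mean of their third features. Features on very different scales would normally be standardised before computing distances.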

Journal Article

Trade, Migration, and Productivity

by Tombe, Trevor , Zhu, Xiaodong in Aggregate data , Costs , Economic models

2019

We study how goods- and labor-market frictions affect aggregate labor productivity in China. Combining unique data with a general equilibrium model of internal and international trade, and migration across regions and sectors, we quantify the magnitude and consequences of trade and migration costs. The costs were high in 2000, but declined afterward. The decline accounts for 36 percent of the aggregate labor productivity growth between 2000 and 2005. Reductions in internal trade and migration costs are more important than reductions in external trade costs. Despite the decline, migration costs are still high and potential gains from further reform are large.

Journal Article

Natural Questions: A Benchmark for Question Answering Research

by Parikh, Ankur , Uszkoreit, Jakob , Redfield, Olivia in Aggregate data , Annotations , Answers

2019

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.

Journal Article

COVID-19 and the Drug Overdose Crisis: Uncovering the Deadliest Months in the United States, January‒July 2020

by Friedman, Joseph , Akre, Samir in Aggregate data , Aggregates , Algorithms

2021

Objectives. To determine the magnitude of increases in monthly drug-related overdose mortality during the COVID-19 pandemic in the United States. Methods. We leveraged provisional records from the Centers for Disease Control and Prevention provided as rolling 12-month sums, which are helpful for smoothing, yet may mask pandemic-related spikes in overdose mortality. We cross-referenced these rolling aggregates with previous monthly data to estimate monthly drug-related overdose mortality for January through July 2020. We quantified historical errors stemming from reporting delays and estimated empirically derived 95% prediction intervals (PIs). Results. We found that 9192 (95% PI = 8988, 9397) people died from drug overdose in May 2020—making it the deadliest month on record—representing a 57.7% (95% PI = 54.2%, 61.2%) increase over May 2019. Most states saw large-magnitude increases, with the highest in West Virginia, Kentucky, and Tennessee. We observed low concordance between rolling 12-month aggregates and monthly pandemic-related shocks. Conclusions. Unprecedented increases in overdose mortality occurred during the pandemic, highlighting the value of presenting monthly values alongside smoothed aggregates for detecting shocks. Public Health Implications. Drastic exacerbations of the US overdose crisis warrant renewed investments in overdose surveillance and prevention during the pandemic response and postpandemic recovery efforts.
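
The cross-referencing step described in the Methods above rests on a simple identity: consecutive rolling 12-month sums differ by the month that enters minus the month that drops out, so a new monthly value is the change in the rolling sum plus the known value from 12 months earlier. A minimal Python sketch follows. The function name and the counts are hypothetical, and the sketch does not reproduce the reporting-delay corrections or prediction intervals the authors estimate.

```python
def monthly_from_rolling(known_monthly, rolling_sums):
    """Recover monthly counts from rolling 12-month sums.

    known_monthly: the 12 monthly counts for months 0..11.
    rolling_sums:  rolling_sums[i] is the 12-month sum ending in month 11 + i,
                   so rolling_sums[0] equals sum(known_monthly).
    Returns the monthly counts for months 12, 13, ...
    """
    if len(known_monthly) != 12:
        raise ValueError("exactly 12 known monthly values are required")
    monthly = list(known_monthly)
    for i in range(1, len(rolling_sums)):
        t = 11 + i
        # R_t - R_(t-1) = m_t - m_(t-12), hence:
        monthly.append(rolling_sums[i] - rolling_sums[i - 1] + monthly[t - 12])
    return monthly[12:]


# Hypothetical counts: 12 known months followed by 3 months to recover.
true_monthly = [100 + i for i in range(12)] + [120, 90, 200]
rolling = [sum(true_monthly[i:i + 12]) for i in range(4)]

recovered = monthly_from_rolling(true_monthly[:12], rolling)
```

The spike of 200 in the final month moves the rolling sum only modestly, which is the masking effect the abstract points to.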

Journal Article

Unemployment Fluctuations, Match Quality, and the Wage Cyclicality of New Hires

by GERTLER, MARK , HUCKFELDT, CHRISTOPHER , TRIGARI, ANTONELLA in Aggregate data , Data quality , Equilibrium

2020

We revisit the issue of the high cyclicality of wages of new hires. We show that after controlling for composition effects likely involving procyclical upgrading of job match quality, the wages of new hires are no more cyclical than those of existing workers. The key implication is that the sluggish behaviour of wages for existing workers is a better guide to the cyclicality of the marginal cost of labour than is the high measured cyclicality of new hires' wages unadjusted for composition effects. Key to our identification is distinguishing between new hires from unemployment versus those who are job changers. We argue that to a reasonable approximation, the wages of the former provide a composition-free estimate of the wage flexibility, while the same is not true for the latter. We then develop a quantitative general equilibrium model with sticky wages via staggered contracting, on-the-job search, and heterogeneous match quality, and show that it can account for both the panel data evidence and aggregate evidence on labour market volatility.

Journal Article

The Impact of Regional and Sectoral Productivity Changes on the U.S. Economy

by CALIENDO, LORENZO , ROSSI-HANSBERG, ESTEBAN , SARTE, PIERRE-DANIEL in Aggregate data , Changes , Computer industry

2018

We study the impact of intersectoral and interregional trade linkages in propagating disaggregated productivity changes to the rest of the economy. Using U.S. regional and industry data, we obtain the aggregate, regional and sectoral elasticities of measured total factor productivity, GDP, and employment to regional and sectoral productivity changes. We find that the elasticities vary significantly depending on the sectors and regions affected, and are importantly determined by the spatial structure of the economy. We use our calibrated model to perform a variety of counterfactual exercises including several specific studies of the aggregate and disaggregate effects of shocks to productivity and infrastructure. The specific episodes we study include the boom in California’s computer industry, the productivity boom in North Dakota associated with the shale oil boom, the disruptions in New York’s finance and real estate industries during the 2008 crisis, as well as the effect of the destruction of infrastructure in Louisiana following Hurricane Katrina.

Journal Article

HOW DESTRUCTIVE IS INNOVATION?

by Klenow, Peter J. , Hsieh, Chang-Tai , Garcia-Macia, Daniel in Aggregate data , Competitors , creative destruction

2019

Entrants and incumbents can create new products and displace the products of competitors. Incumbents can also improve their existing products. How much of aggregate productivity growth occurs through each of these channels? Using data from the U.S. Longitudinal Business Database on all nonfarm private businesses from 1983 to 2013, we arrive at three main conclusions: First, most growth appears to come from incumbents. We infer this from the modest employment share of entering firms (defined as those less than 5 years old). Second, most growth seems to occur through improvements of existing varieties rather than creation of brand new varieties. Third, own-product improvements by incumbents appear to be more important than creative destruction. We infer this because the distribution of job creation and destruction has thinner tails than implied by a model with a dominant role for creative destruction.

Journal Article

MICRO DATA AND MACRO TECHNOLOGY

by Raval, Devesh , Oberfield, Ezra in Accumulation , Aggregate data , aggregation

2021

We develop a framework to estimate the aggregate capital-labor elasticity of substitution by aggregating the actions of individual plants. The aggregate elasticity reflects substitution within plants and reallocation across plants; the extent of heterogeneity in capital intensities determines their relative importance. We use micro data on the cross-section of plants to build up to the aggregate elasticity at a point in time. Interpreting our econometric estimates through the lens of several different models, we find that the aggregate elasticity for the U.S. manufacturing sector is in the range of 0.5–0.7, and has declined slightly since 1970. We use our estimates to measure the bias of technical change and assess the decline in labor’s share of income in the U.S. manufacturing sector. Mechanisms that rely on changes in the relative supply of factors, such as an acceleration of capital accumulation, cannot account for the decline.

Journal Article

STRUCTURAL CHANGE WITH LONG-RUN INCOME AND PRICE EFFECTS

by Lashkari, Danial , Mestieri, Martí , Comin, Diego in Aggregate data , Agriculture , Growth models

2021

We present a new multi-sector growth model that features nonhomothetic, constant elasticity of substitution preferences, and accommodates long-run demand and supply drivers of structural change for an arbitrary number of sectors. The model is consistent with the decline in agriculture, the hump-shaped evolution of manufacturing, and the rise of services over time. We estimate the demand system derived from the model using household-level data from the United States and India, as well as historical aggregate-level panel data for 39 countries during the postwar period. The estimated model parsimoniously accounts for the broad patterns of sectoral reallocation observed among rich, miracle, and developing economies. Our estimates support the presence of strong nonhomotheticity across time, income levels, and countries. We find that income effects account for the bulk of the within-country evolution of sectoral reallocation.

Journal Article
