Catalogue Search | MBRL
28 result(s) for "Yang, Mochen"
Understanding User-Generated Content and Customer Engagement on Facebook Business Pages
by Yang, Mochen; Adomavicius, Gediminas; Ren, Yuqing
in Business, Consumer behavior, customer engagement
2019
With the growth and prevalence of social media platforms, many companies have been using them to engage with customers and encourage user-generated content (UGC) about their products and services. However, there has not been much research on the characteristics of UGC on these platforms and, correspondingly, their impact on customer engagement. In this paper, we analyze user-generated posts from Facebook business pages of multiple companies to understand what users post on Facebook business pages and how post valence and content characteristics affect engagement, measured as the number of likes and comments received by a post. We control for a variety of factors, including post linguistic features, poster characteristics, and post context heterogeneity. Our analysis demonstrates that for user-generated posts on Facebook business pages, negative posts are significantly more prevalent than positive posts, which contrasts with the J-shaped valence distribution of online consumer reviews. We also show that engagement depends not only on the valence of a post but also on the specific ways in which a post is positive or negative. We observe three types of customer complaints, respectively, related to product and service quality, money issues, and social and environmental issues. Our analyses show that social complaints receive more likes, but fewer comments, than quality or money complaints. Such nuances can only be uncovered by analyzing the actual post content, going beyond the valence of the posts. Furthermore, we theoretically discuss and empirically demonstrate that liking and commenting are engagement behaviors with different antecedents. For example, positive posts tend to attract more likes yet fewer comments than neutral posts. Overall, our research shows that user-generated posts on Facebook business pages represent a distinctive form of UGC that is conceptually different from online consumer reviews. Our work advances the knowledge on UGC and has practical implications for firms’ social media marketing strategy.
Journal Article
Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining
by Yang, Mochen; Adomavicius, Gediminas; Burtch, Gordon
in Analysis, Data mining, Econometric models
2018
The application of predictive data mining techniques in information systems research has grown in recent years, likely because of their effectiveness and scalability in extracting information from large amounts of data. A number of scholars have sought to combine data mining with traditional econometric analyses. Typically, data mining methods are first used to generate new variables (e.g., text sentiment), which are added into subsequent econometric models as independent regressors. However, because prediction is almost always imperfect, variables generated from the first-stage data mining models inevitably contain measurement error or misclassification. These errors, if ignored, can introduce systematic biases into the second-stage econometric estimations and threaten the validity of statistical inference. In this commentary, we examine the nature of this bias, both analytically and empirically, and show that it can be severe even when data mining models exhibit relatively high performance. We then show that this bias becomes increasingly difficult to anticipate as the functional form of the measurement error or the specification of the econometric model grows more complex. We review several methods for error correction and focus on two simulation-based methods, SIMEX and MC-SIMEX, which can be easily parameterized using standard performance metrics from data mining models, such as error variance or the confusion matrix, and can be applied under a wide range of econometric specifications. Finally, we demonstrate the effectiveness of SIMEX and MC-SIMEX by simulations and subsequent application of the methods to econometric estimations employing variables mined from three real-world data sets related to travel, social networking, and crowdfunding campaign websites.
The online appendix is available at https://doi.org/10.1287/isre.2017.0727.
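The SIMEX procedure summarized in this abstract can be sketched in a few lines. The data-generating process, noise levels, simulation count, and quadratic extrapolant below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true model y = 2*x + noise, but only a noisy proxy
# w = x + u is observed (e.g., a variable generated by a data mining model).
n = 5000
x = rng.normal(size=n)
u_var = 0.5  # measurement-error variance, assumed known from model metrics
w = x + rng.normal(scale=np.sqrt(u_var), size=n)
y = 2.0 * x + rng.normal(size=n)

def ols_slope(x_, y_):
    # One-regressor OLS slope: cov(x, y) / var(x).
    return np.cov(x_, y_, bias=True)[0, 1] / np.var(x_)

# SIMEX: deliberately add extra noise at levels lambda * u_var, refit the
# model at each level, then extrapolate the slopes back to lambda = -1,
# i.e., the hypothetical error-free case.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for lam in lambdas:
    sims = [ols_slope(w + rng.normal(scale=np.sqrt(lam * u_var), size=n), y)
            for _ in range(50)]  # average over simulated contaminations
    slopes.append(np.mean(sims))

coefs = np.polyfit(lambdas, slopes, deg=2)  # quadratic extrapolant
simex_estimate = np.polyval(coefs, -1.0)

naive = ols_slope(w, y)  # attenuated toward zero by measurement error
print(naive, simex_estimate)
```

In this toy run the naive slope is biased well below the true coefficient of 2, while the SIMEX extrapolation recovers much, though not all, of the gap. MC-SIMEX treats misclassified categorical variables analogously, parameterized by the confusion matrix instead of the error variance.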
Journal Article
Designing Real-Time Feedback for Bidders in Homogeneous-Item Continuous Combinatorial Auctions
2019
Although combinatorial auctions are important mechanisms for many specialized applications, their adoption in general-purpose marketplaces is still fairly limited, partly due to the inherent difficulty in evaluating the efficacy of bids without the availability of comprehensive bidder support. In this paper, we present both theoretical results and computational designs to support real-time feedback to bidders in continuous combinatorial auctions, where bidders are free to join and leave the auction at any time. In particular, we focus on the broad class of single-item multi-unit (SIMU) combinatorial auctions, where multiple identical units of one homogeneous item are being auctioned. We also consider two common ways to express bidding preferences: OR bids and XOR bids. For SIMU auctions with each of the two bid types, we present comprehensive analyses of auction dynamics, which can determine winning bids that satisfy allocative fairness, and compute critical evaluative metrics needed to provide bidder support, including bid winning and deadness levels. We also design the data structures and algorithms needed to provide bidder support in real time for SIMU auctions of practically relevant sizes. The computational tools proposed in this paper can facilitate the efficient and more transparent implementation of SIMU combinatorial auctions in business- and consumer-oriented markets.
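For intuition only: with OR bids of the atomic (quantity, price) form, revenue-maximizing winner determination in a single-item multi-unit auction reduces to a 0/1 knapsack over the available supply. The sketch below is a generic illustration under that assumption, not the paper's algorithms or its bidder-support metrics:

```python
def winner_determination(bids, supply):
    """Pick a revenue-maximizing subset of OR bids (quantity, price)
    whose total quantity fits within the available supply (0/1 knapsack)."""
    best = [0.0] * (supply + 1)            # best[q]: max revenue using <= q units
    chosen = [[] for _ in range(supply + 1)]
    for i, (qty, price) in enumerate(bids):
        for q in range(supply, qty - 1, -1):  # descending: each bid used once
            if best[q - qty] + price > best[q]:
                best[q] = best[q - qty] + price
                chosen[q] = chosen[q - qty] + [i]
    return best[supply], chosen[supply]

# Four bids competing for 4 identical units.
bids = [(3, 30.0), (2, 24.0), (2, 22.0), (1, 15.0)]
print(winner_determination(bids, supply=4))  # prints: (46.0, [1, 2])
```

XOR bids, where at most one of a bidder's bids may win, would additionally require grouping bids by bidder and are omitted from this sketch.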
Journal Article
Efficient Computational Strategies for Dynamic Inventory Liquidation
by Gupta, Alok; Yang, Mochen; Adomavicius, Gediminas
in Analysis, Computational efficiency, Computational intelligence
2019
We develop efficient computational strategies for the inventory liquidation problem, which is characterized by a retailer disposing of a fixed amount of inventory over a period of time. Liquidating end-of-cycle products optimally represents a challenging problem owing to its inherent stochasticity. The growing scale of liquidation problems further increases the need for solutions that are revenue- and time-efficient. We propose to address the inventory liquidation problem by deriving deterministic representations of stochastic demand, which provides significant theoretical and practical benefits as well as an intuitive understanding of the problem and the proposed solution. First, this paper develops a dynamic programming approach and a greedy heuristic approach to find the optimal liquidation strategy under deterministic demand representation. Importantly, we show that our heuristic approach is optimal under realistic conditions and is computationally less complex than dynamic programming. Second, we explore the relationships between liquidation revenue and several key elements of the liquidation problem via both computational experiments and theoretical analyses. We derive multiple managerial implications and demonstrate how the proposed heuristic approach can serve as an efficient decision support tool for inventory managers. Third, under stochastic demand, we conduct a comprehensive set of simulation experiments to benchmark the performance of our proposed heuristic approach with alternatives, including other simple approaches (e.g., the fixed-price strategy) as well as advanced stochastic approaches (e.g., stochastic dynamic programming). In particular, we consider a strategy that uses the proposed greedy heuristic to determine prices iteratively throughout the liquidation period. Computational experiments demonstrate that such an iterative strategy stably produces higher total revenue than other alternatives and produces near-optimal total revenue in expectation while maintaining significant computational efficiency, compared with advanced techniques that solve the liquidation problem directly under stochastic demand. Our work advances the computational design for inventory liquidation and provides practical insights.
The online supplement is available at https://doi.org/10.1287/isre.2018.0819.
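As a toy illustration of the deterministic demand representation via expected order statistics: assuming Uniform(0,1) valuations and a single pricing decision (both simplifications; the paper's heuristic prices over multiple periods), the revenue-maximizing price can be found by scanning the expected order statistics:

```python
def best_single_price(n_arrivals):
    """Deterministic demand representation: for n i.i.d. Uniform(0,1)
    valuations, the expected m-th highest is (n + 1 - m) / (n + 1).
    Pricing at that level is expected to sell m units; choose the m
    that maximizes expected revenue m * price."""
    best = (0.0, 0, 0.0)  # (revenue, units sold, price)
    for m in range(1, n_arrivals + 1):
        price = (n_arrivals + 1 - m) / (n_arrivals + 1)
        revenue = m * price
        if revenue > best[0]:
            best = (revenue, m, price)
    return best

print(best_single_price(9))  # prints: (2.5, 5, 0.5)
```

Replacing the stochastic valuations with their expected order statistics is what makes the search deterministic; the same idea extends to other known valuation distributions.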
Journal Article
Toward a Comprehensive Understanding of User-Generated Content and Engagement Behavior on Facebook Business Pages
2018
Social media platforms such as Facebook empower individual users to interact with companies and with each other on company-managed business pages. Users can generate content by posting directly to the business pages, and other users can engage with the content through multiple engagement features. Although such user-generated content (UGC) and associated engagement behaviors carry important consequences for the companies, they are not well understood. The three essays of my dissertation fill this gap by analyzing data collected from Facebook business pages with multiple empirical methods. The first essay examines the valence and content characteristics of user-generated posts on the Facebook business pages of multiple large companies across key consumer-oriented industries. It demonstrates that user posts on Facebook business pages represent a new form of UGC that is distinct from online product reviews generated by consumers, in terms of valence distribution and content types. Further, it highlights the important valence and content factors that influence two canonical types of engagement activities, i.e., liking and commenting. The second essay discusses how user engagement behaviors are shaped by engagement features on Facebook, and in particular, how the introduction of a new engagement feature affects the usage of existing features as well as overall engagement activities. It aims to uncover new insights regarding the interplay of multiple engagement features. Analyses show that, despite distinct functionalities, the usage of different features is not independent, and user posts that have received engagement are likely to obtain even more engagement of various types. The third essay addresses a methodological challenge of studying UGC on social media or other online contexts, where researchers frequently seek to combine data mining with econometric modeling but ignore the issue of measurement error and misclassification. Findings of my dissertation advance understanding of UGC and engagement behavior on social media brand pages, and have practical implications for social media platforms as well as businesses that have a presence on these platforms.
Dissertation
Robustness is Important: Limitations of LLMs for Data Fitting
by Liu, Hejia; Yang, Mochen; Adomavicius, Gediminas
in Cognitive tasks, Large language models, Robustness
2025
Large Language Models (LLMs) are being applied in a wide array of settings, well beyond the typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for fitting data and generating predictions. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, can perform competitively with many tabular supervised learning techniques in terms of predictive performance. However, we identify a critical vulnerability of using LLMs for data fitting: making changes to data representation that are completely irrelevant to the underlying learning task can drastically alter LLMs' predictions on the same data. For example, simply changing variable names can sway the size of prediction error by as much as 82% in certain settings. Such prediction sensitivity with respect to task-irrelevant variations manifests under both in-context learning and supervised fine-tuning, for both closed-weight and open-weight general-purpose LLMs. Moreover, by examining the attention scores of an open-weight LLM, we discover a non-uniform attention pattern: training examples and variable names/values which happen to occupy certain positions in the prompt receive more attention when output tokens are generated, even though different positions are expected to receive roughly the same attention. This partially explains the sensitivity in the presence of task-irrelevant variations. We also consider a state-of-the-art tabular foundation model (TabPFN) trained specifically for data fitting. Despite being explicitly designed to achieve prediction robustness, TabPFN is still not immune to task-irrelevant variations. Overall, despite LLMs' impressive predictive capabilities, currently they lack even the basic level of robustness to be used as a principled data-fitting tool.
EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference
by Burtch, Gordon; McFowland, Edward; Yang, Mochen
in Deep learning, Error analysis, Machine learning
2024
Despite increasing popularity in empirical studies, the integration of machine learning generated variables into regression models for statistical inference suffers from the measurement error problem, which can bias estimation and threaten the validity of inferences. In this paper, we develop a novel approach to alleviate associated estimation biases. Our proposed approach, EnsembleIV, creates valid and strong instrumental variables from weak learners in an ensemble model, and uses them to obtain consistent estimates that are robust against the measurement error problem. Our empirical evaluations, using both synthetic and real-world datasets, show that EnsembleIV can effectively reduce estimation biases across several common regression specifications, and can be combined with modern deep learning techniques when dealing with unstructured data.
Regurgitative Training: The Value of Real Data in Training Large Language Models
by Zhang, Jinghui; Wei, Qiang; Yang, Mochen
in Large language models, Machine translation, Performance enhancement
2024
What happens if we train a new Large Language Model (LLM) using data that are at least partially generated by other LLMs? The explosive success of LLMs means that a substantial amount of content online will be generated by LLMs rather than humans, which will inevitably enter the training datasets of next-generation LLMs. We evaluate the implications of such "regurgitative training" on LLM performance. Through fine-tuning GPT-3.5 with data generated either by itself or by other LLMs in a machine translation task, we find strong evidence that regurgitative training clearly handicaps the performance of LLMs. The same performance loss of regurgitative training is observed on transformer models that we train from scratch. We find suggestive evidence that the performance disadvantage of regurgitative training can be attributed to at least two mechanisms: (1) higher error rates and (2) lower lexical diversity in LLM-generated data as compared to real data. Based on these mechanisms, we propose and evaluate three different strategies to mitigate the performance loss of regurgitative training. First, we devise data-driven metrics to gauge the quality of each LLM-generated data instance, and then carry out an ordered training process where high-quality data are added before low-quality ones. Second, we combine data generated by multiple different LLMs (as an attempt to increase lexical diversity). Third, we train an AI detection classifier to differentiate between LLM- and human-generated data, and include LLM-generated data in the order of resemblance to human-generated data. All three strategies can improve the performance of regurgitative training to some extent but are not always able to fully close the gap from training with real data. Our results highlight the value of real, human-generated data in training LLMs, which cannot be easily substituted by synthetic, LLM-generated data.