LLM-based digital twin simulation, where large language models are used to
emulate individual human behavior, holds great promise for research in AI,
social science, and digital experimentation. However, progress in this area has
been hindered by the scarcity of real, individual-level datasets that are both
large and publicly available. This lack of high-quality ground truth limits
both the development and validation of digital twin methodologies. To address
this gap, we introduce a large-scale, public dataset designed to capture a rich
and holistic view of individual human behavior. We survey a representative
sample of
N=2,058 participants (average 2.42 hours per person) in the US
across four waves with 500 questions in total, covering a comprehensive battery
of demographic, psychological, economic, personality, and cognitive measures,
as well as replications of behavioral economics experiments and a pricing
survey. The final wave repeats tasks from earlier waves to establish a
test-retest accuracy baseline. Initial analyses suggest the data are of high
quality and show promise for constructing digital twins that predict human
behavior well at the individual and aggregate levels. By making the full
dataset publicly available, we aim to establish a valuable testbed for the
development and benchmarking of LLM-based persona simulations. Beyond LLM
applications, the dataset's unique breadth and scale also enable broad
social science research, including studies of cross-construct correlations
and heterogeneous treatment effects.