LLM-based digital twin simulation, where large language models are used to
emulate individual human behavior, holds great promise for research in AI,
social science, and digital experimentation. However, progress in this area has
been hindered by the scarcity of real, individual-level datasets that are both
large and publicly available. This lack of high-quality ground truth limits
both the development and validation of digital twin methodologies. To address
this gap, we introduce a large-scale, public dataset designed to capture a rich
and holistic view of individual human behavior. We survey a representative
sample of
N=2,058 participants (average 2.42 hours per person) in the US
across four waves with 500 questions in total, covering a comprehensive battery
of demographic, psychological, economic, personality, and cognitive measures,
as well as replications of behavioral economics experiments and a pricing
survey. The final wave repeats tasks from earlier waves to establish a
test-retest accuracy baseline. Initial analyses suggest the data are of high
quality and show promise for constructing digital twins that predict human
behavior well at the individual and aggregate levels. By making the full
dataset publicly available, we aim to establish a valuable testbed for the
development and benchmarking of LLM-based persona simulations. Beyond LLM
applications, the dataset's unique breadth and scale also enable broad
social science research, including studies of cross-construct correlations
and heterogeneous treatment effects.