A curated research map for proactive agents: AI systems that infer latent user needs, decide when to intervene, ask for missing context or consent, and initiate useful assistance before the user gives a complete, explicit command.
If this list is useful, a ⭐ helps others find it.
- Research Map: question-driven clusters for quickly locating papers by intervention timing, inference, long-term intent, personalization, evaluation, and safety.
- Benchmark Matrix: side-by-side benchmark comparison by domain, input stream, proactive target, user model, data type, and metrics.
This list prioritizes papers where proactivity is a central research target. The list is broader than computer-use agents: it includes proactive dialogue, planning, recommendation, wearable assistance, GUI/mobile/OS agents, programming assistants, personalization, memory, benchmarks, optimization, and human factors.
Typical inclusion signals:
- The agent predicts latent intent or missing context before a complete user instruction.
- The agent decides when to ask, suggest, remind, intervene, execute, or stay silent.
- The paper evaluates proactive behavior, intervention timing, user control, consent, interruption cost, or personalization.
- The benchmark or dataset makes proactivity the primary task rather than a side effect of general tool use.
Resource labels:
- Paper: arXiv, ACL Anthology, DOI, OpenReview, ACM, Springer, or official proceedings page.
- Website: project page, conference page, lab page, or documentation.
- Code / Dataset: GitHub, released code, released benchmark, or released dataset.
- Notes: short English decision card with why the paper matters, proactivity signal, evaluation setup, limitations, and use cases.
Selected starting points for understanding the field.
For detailed comparison, see BENCHMARKS.md.
Tags are intentionally compact and reusable. They describe the paper's main contribution, not every detail.
| Tag | Meaning |
|---|---|
| Definition | Defines or reframes proactive agents, proactive dialogue, or design-space boundaries. |
| Survey | Synthesizes a broad proactive-agent subfield or taxonomy. |
| Human Factors | Studies interruption, control, satisfaction, workload, adoption, or developer experience. |
| Trust | Focuses on competence perception, calibrated reliance, or trustworthy interaction. |
| Safety & Consent | Covers confirmation, autonomy boundaries, reversibility, rejection, or risk control. |
| Privacy | Centers privacy management, data minimization, or personal-context governance. |
| Intervention Timing | Focuses on when an agent should act, ask, suggest, or remain silent. |
| Intent Inference | Infers latent goals, hidden constraints, future tasks, or missing information. |
| Clarification | Proactively asks questions before planning, execution, or recommendation. |
| Dialogue | Proactive behavior in conversational, persuasive, or task-oriented interaction. |
| Planning | Proactive decomposition, task planning, scheduling, or future-state reasoning. |
| Tool Use | Tool calling, API execution, GUI operation, or action orchestration. |
| Recommendation | Proactive recommendation or suggestion ranking. |
| Collaboration | Multi-party or human-agent collaborative problem solving. |
| Education | Learning, tutoring, reflection, or student engagement contexts. |
| Long-horizon | Multi-session, dynamic, future-event, or long-running task maintenance. |
| Personalization | User preferences, personas, profiles, long-term user history, or user-specific adaptation. |
| Memory | Persistent memory, episodic memory, visual memory, skill memory, or cognitive memory structures. |
| Simulation | User simulation, environment simulation, synthetic users, or synthetic workflows. |
| Optimization | RL, reward modeling, multi-objective optimization, self-evolution, or behavior tuning. |
| Skill Learning | Skill creation, skill internalization, skill memory, or reusable procedure learning. |
| Benchmark | Introduces a dataset, evaluation suite, benchmark, simulator, or diagnostic protocol. |
| Real-world Data | Uses real user traces, field-study data, or deployment-like logs. |
| Desktop | Desktop activity streams, workstation context, or event logs. |
| GUI | Graphical interface agents, browser/app screens, or visual UI interaction. |
| Mobile | Mobile GUI, Android/iOS workflows, phone sensors, or mobile user context. |
| OS | Operating-system agents, cross-app workflows, or OS-level verification. |
| IDE | Programming assistants, code editors, or developer tooling. |
| Multimodal / Wearable | Video, audio, AR, smart glasses, egocentric streams, or open-world sensory context. |
| Sensing | Active context acquisition, sensor selection, or on-demand sensory capture. |
| Embodied | Robots, physical environments, or human-populated embodied settings. |
Pull requests are welcome.
Before adding a paper, check that it satisfies at least one of:
- It predicts latent user intent before a complete explicit instruction.
- It decides when to intervene, ask, suggest, execute, remind, or stay silent.
- It evaluates proactive assistance, interruption cost, user control, consent, or personalization.
- It contributes a benchmark or dataset where proactivity is the primary task.
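For contributors who prefer a checklist they can run, the "at least one of" rule above could be sketched roughly as follows. This is only an illustration: the `PaperSignals` class, its field names, and `meets_inclusion_bar` are hypothetical and are not part of this repository or any tooling it ships.

```python
from dataclasses import dataclass


@dataclass
class PaperSignals:
    """Hypothetical checklist mirroring the four inclusion bullets above."""

    predicts_latent_intent: bool = False         # intent inferred before a complete instruction
    decides_when_to_act: bool = False            # intervene, ask, suggest, execute, remind, or stay silent
    evaluates_proactive_behavior: bool = False   # assistance, interruption cost, control, consent, personalization
    proactivity_primary_benchmark: bool = False  # benchmark or dataset with proactivity as the primary task


def meets_inclusion_bar(signals: PaperSignals) -> bool:
    """Return True when at least one inclusion signal holds."""
    return any(
        (
            signals.predicts_latent_intent,
            signals.decides_when_to_act,
            signals.evaluates_proactive_behavior,
            signals.proactivity_primary_benchmark,
        )
    )
```

A paper that only studies when to interrupt the user would pass via `PaperSignals(decides_when_to_act=True)`, while a general tool-use paper with no proactive component would leave every field at its default and fail the check.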
Suggested note template:
# Paper Title
## Why It Matters
...
## Proactivity Signal
...
## Evaluation Setup
...
## Key Limitations
...
## Use For
...

Maintained by Low Entropy AI.
