Merged
71 commits
e55db81
init commit for webarena verified
NicolasAG Sep 16, 2025
e8a5594
upd Makefile
NicolasAG Sep 17, 2025
8f33d10
adding the basic files
NicolasAG Sep 17, 2025
02dfc57
update dependencies
NicolasAG Sep 19, 2025
48acaeb
start adding integration with wa_verified
NicolasAG Sep 22, 2025
5be792d
upd readme
NicolasAG Sep 23, 2025
aae906c
use custom backend for webarena_verified
NicolasAG Sep 23, 2025
2b04c7d
pass the wa instance to the evaluator
NicolasAG Sep 23, 2025
0d7e8dc
pass the wa instance to the evaluator
NicolasAG Sep 23, 2025
b57c0f8
cleanup evaluator
NicolasAG Sep 26, 2025
0330f72
remove custom webarena verified instance
NicolasAG Oct 3, 2025
ab0437b
update requirements to latest wav code
NicolasAG Oct 3, 2025
bd43467
use simpler and cleaner wav eval
NicolasAG Oct 3, 2025
fecedb1
enable tracing
NicolasAG Oct 16, 2025
8fdebe6
fix wav
NicolasAG Oct 20, 2025
4bdfa7e
update to new webarena verified version
NicolasAG Oct 22, 2025
e59f754
update task name template to webarena_verified.templateID.taskID
NicolasAG Oct 22, 2025
5b05044
fix config
NicolasAG Oct 23, 2025
56574eb
fix csv file
NicolasAG Oct 23, 2025
81f930c
add webarena_verified backend
NicolasAG Oct 25, 2025
cbca5a2
fix wav tasks
NicolasAG Oct 25, 2025
8d4381b
do not check reachable if url is todo
NicolasAG Oct 25, 2025
63b4b07
fix tmp trace creation, update goal to prompt model to satisfy wav re…
NicolasAG Oct 27, 2025
b7f847a
create webarena_verified action space with special submit function to…
NicolasAG Oct 28, 2025
3e6b5b7
look for extra header file path in environment variable
NicolasAG Oct 29, 2025
525fd3b
undo special action set for webarena_verified
NicolasAG Oct 31, 2025
4272b5e
remove wav actions
NicolasAG Oct 31, 2025
fea25ed
load extra context headers for webarena(+lite)
NicolasAG Nov 3, 2025
fc090f0
update README
NicolasAG Nov 5, 2025
377dcca
update requirements
NicolasAG Nov 5, 2025
1f02f3f
update makefile and readme
NicolasAG Nov 5, 2025
f86a2b3
update readme
NicolasAG Nov 6, 2025
2bf2539
Merge remote-tracking branch 'origin/main' into wa_verified
NicolasAG Nov 6, 2025
df3bfa4
update requirements
NicolasAG Nov 6, 2025
bf6cd9a
update readme
NicolasAG Nov 6, 2025
76ab14e
update test
NicolasAG Nov 6, 2025
eae4152
black formater
NicolasAG Nov 6, 2025
afdf218
upd makefile
NicolasAG Nov 10, 2025
c3814bf
update to new webarena_verified dataset version
NicolasAG Nov 10, 2025
c0c0814
small debug
NicolasAG Nov 10, 2025
d7dc845
add massage of shopping_admin tasks
NicolasAG Nov 13, 2025
f7363c8
Merge remote-tracking branch 'origin/main' into wa_verified
NicolasAG Dec 2, 2025
b8a666a
assume all endpoints are running
NicolasAG Dec 2, 2025
49506a7
update to latest version before the public release
NicolasAG Dec 2, 2025
f326bbf
update instructions to fetch latest version before the public release
NicolasAG Dec 2, 2025
ced1021
exponential backoff
NicolasAG Dec 4, 2025
106a685
update README
NicolasAG Dec 4, 2025
0019f4e
compare json with the one in the library
NicolasAG Dec 4, 2025
e02a299
update install instructions
NicolasAG Dec 4, 2025
045d0e4
update makefile
NicolasAG Dec 9, 2025
5435db3
update pypi deployment with webarena-verified
amanjaiswal73892 Dec 9, 2025
e5c75ca
fix assets directory
amanjaiswal73892 Dec 9, 2025
c2d1536
fix task id template
NicolasAG Dec 12, 2025
75738c4
Merge branch 'wa_verified' of github.com:ServiceNow/BrowserGym into w…
NicolasAG Dec 12, 2025
55a57b0
remove task json file, use the one from the webarena-verified library…
NicolasAG Dec 12, 2025
cf11699
remove metadata and create it dynamically
NicolasAG Dec 15, 2025
29ce81b
do not hardcode revision number
NicolasAG Dec 15, 2025
4b73bb4
fix
NicolasAG Dec 15, 2025
ed6d668
run black formater
NicolasAG Dec 15, 2025
89b6460
fix format?
NicolasAG Dec 15, 2025
333d368
always create the metadata file
NicolasAG Dec 15, 2025
6535641
version-bump-dev
amanjaiswal73892 Dec 15, 2025
84c1246
Remove git dependency and add ins to install from source
amanjaiswal73892 Dec 15, 2025
ddeb2e7
version-bump-dev 0.14.3.dev3
amanjaiswal73892 Dec 15, 2025
0989834
Merge branch 'main' into wa_verified
amanjaiswal73892 Dec 16, 2025
e61b022
add webarena-verified package as a dependency
amanjaiswal73892 Jan 8, 2026
731852c
version-bump-dev 0.14.3.dev4
amanjaiswal73892 Jan 8, 2026
4367cc7
add webarena-verified in the dev requirements.txt
amanjaiswal73892 Jan 8, 2026
be63600
update gitignore
NicolasAG Jan 20, 2026
00448b4
fix links
NicolasAG Jan 20, 2026
4e2a80e
Merge remote-tracking branch 'origin/main' into wa_verified
NicolasAG Jan 20, 2026
2 changes: 1 addition & 1 deletion README.md
@@ -39,7 +39,7 @@ _Example of a GPT4-V agent executing openended tasks (top row, chat interactive)
BrowserGym includes the following benchmarks by default:
- [MiniWoB](https://miniwob.farama.org/)
- [WebArena](https://webarena.dev/)
-- [WebArenaVerified](https://github.com/ServiceNow/platform-labs-webarena-verified)
+- [WebArenaVerified](https://github.com/ServiceNow/webarena-verified)
- [VisualWebArena](https://jykoh.com/vwa)
- [WorkArena](https://github.com/ServiceNow/WorkArena)
- [AssistantBench](https://github.com/oriyor/assistantbench)
2 changes: 1 addition & 1 deletion browsergym/webarena_verified/README.md
@@ -1,6 +1,6 @@
# WebArena Verified benchmark for BrowserGym

-This package provides `browsergym.webarena_verified`, which integrates the [WebArena Verified benchmark](https://github.com/ServiceNow/platform-labs-webarena-verified) into BrowserGym.
+This package provides `browsergym.webarena_verified`, which integrates the [WebArena Verified benchmark](https://github.com/ServiceNow/webarena-verified) into BrowserGym.

## WebArena Server Deployment
