This repository contains a script-driven implementation of a risk-sensitive distributional actor-critic for sales dialogue control, together with the paper, synthetic simulator, benchmark tooling, and inference utilities.
The project is intentionally separated from the presentation repo. The nested bantr-presentation/ directory is ignored here and remains its own GitHub repository.
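To make "risk-sensitive distributional" concrete, here is a minimal sketch of how a critic that predicts return quantiles can rank actions by tail risk instead of by mean. It is an illustration under stated assumptions, not code from sales_rl_core.py: the choice of CVaR as the risk measure, the function name cvar_from_quantiles, and the example numbers are all hypothetical, and the risk measure actually used is the one defined in the paper.

```python
import math
from statistics import fmean


def cvar_from_quantiles(quantiles, alpha):
    """Average of the worst alpha-fraction of equally weighted quantile estimates.

    Assumption: the critic outputs N return quantiles per action, each with
    probability mass 1/N. CVaR is one common risk measure, used here only
    as an example.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    # Keep at least one quantile so very small alpha values still work.
    k = max(1, math.ceil(alpha * len(ordered)))
    return fmean(ordered[:k])


# Hypothetical return quantiles for two dialogue actions.
safe_action = [0.0, 1.0, 2.0, 3.0]     # mean 1.5, mild downside
risky_action = [-4.0, 2.0, 4.0, 6.0]   # mean 2.0, heavy downside

# A risk-neutral controller compares means (alpha = 1.0) and prefers risky_action.
neutral_choice = max(
    ("safe", safe_action), ("risky", risky_action),
    key=lambda item: cvar_from_quantiles(item[1], 1.0),
)[0]

# A risk-averse controller compares the worst 25% of outcomes and prefers safe_action.
averse_choice = max(
    ("safe", safe_action), ("risky", risky_action),
    key=lambda item: cvar_from_quantiles(item[1], 0.25),
)[0]

print(neutral_choice, averse_choice)
```

A scalar critic can only express the first comparison. The second needs the full return distribution, which is what a distributional critic estimates.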
- paper.tex and paper.pdf: research paper
- sales_rl_core.py: simulator, models, training loops, evaluation, plotting, checkpoint helpers
- train_sales_rl_agent.py: train scalar or distributional controllers and save checkpoints
- use_sales_rl_agent.py: load a checkpoint, score a manual state, or run simulator test rollouts
- run_sales_benchmark.py: reproduce the benchmark figures used in the paper
- generate_architecture_figure.py: regenerate the external architecture diagram used in the paper
- sample_state.json: example state input for the inference script
- figures/: paper figures
- artifacts/: benchmark summaries and metrics
For local CPU work:
python -m pip install torch
python -m pip install -r requirements.txt

For an NVIDIA A100:
python -m pip install --upgrade pip
python -m pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision torchaudio
python -m pip install -r requirements.txt

Train the distributional agent on GPU and save a checkpoint:
python train_sales_rl_agent.py \
--algorithm distributional_a2c \
--device cuda \
--batch-envs 256 \
--hidden-dim 256 \
--total-updates 480 \
--evaluate-episodes 512

Train both scalar and distributional baselines and regenerate comparison figures:
python train_sales_rl_agent.py \
--algorithm both \
--device cuda \
--batch-envs 256 \
--hidden-dim 256 \
--total-updates 480 \
--evaluate-episodes 512

Score a single manually specified sales state:
python use_sales_rl_agent.py \
--checkpoint checkpoints/distributional_a2c.pt \
--device cuda \
--state-file sample_state.json

Run greedy simulator tests with the saved checkpoint:
python use_sales_rl_agent.py \
--checkpoint checkpoints/distributional_a2c.pt \
--device cuda \
--simulate-episodes 5

Regenerate the figures and rebuild the paper:
python generate_architecture_figure.py
python run_sales_benchmark.py
pdflatex -interaction=nonstopmode -halt-on-error paper.tex
pdflatex -interaction=nonstopmode -halt-on-error paper.tex
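
The four commands above can also be chained from a small Python driver so that a failure in any step stops the run. This is a sketch under assumptions, not a script shipped with the repository: build_steps and run_steps are hypothetical names, the commands are assumed to run from the repository root, and dry_run defaults to True so the example only prints what it would execute.

```python
import shlex
import subprocess
import sys


def build_steps(python=sys.executable):
    """Return the figure and paper build commands in the order listed above."""
    latex = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "paper.tex"]
    return [
        [python, "generate_architecture_figure.py"],
        [python, "run_sales_benchmark.py"],
        latex,
        list(latex),  # second pdflatex pass, as in the commands above
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which aborts the remaining steps.
            subprocess.run(cmd, check=True)


# Dry run: lists the commands without executing anything.
run_steps(build_steps())
```

Calling run_steps(build_steps(), dry_run=False) would execute the steps for real. The second pdflatex pass is kept because LaTeX generally needs a second run to resolve cross-references written during the first.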