This project evaluates the robustness of Large Language Model (LLM) based re-rankers against prompt injection attacks. The project consists of two parts:
- Preference Vulnerability (ASR) Evaluation - Direct manipulation assessment
- Ranking Vulnerability (nDCG@10) Evaluation - Impact on full pipeline
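The two metrics can be sketched as follows. This is a minimal illustration with made-up data and simplified function names, not the project's evaluation code:

```python
import math

def attack_success_rate(outcomes):
    """ASR: fraction of attack attempts that flipped the re-ranker's preference."""
    return sum(outcomes) / len(outcomes)

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_10(ranked_relevances):
    """nDCG@10: DCG of the top-10 ranking, normalized by the ideal ordering."""
    top10 = ranked_relevances[:10]
    ideal = sorted(ranked_relevances, reverse=True)[:10]
    ideal_dcg = dcg(ideal)
    return dcg(top10) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: 3 of 4 injection attempts succeed -> ASR = 0.75
print(attack_success_rate([1, 1, 0, 1]))  # 0.75

# Toy ranking with graded relevance labels [0..3]; a perfect ordering scores 1.0
print(round(ndcg_at_10([3, 2, 1, 0]), 4))  # 1.0
```

An attack that demotes relevant documents lowers nDCG@10 relative to the unattacked pipeline, which is what Part 2 measures.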
Install required dependencies:
```bash
pip install -r requirements.txt
```

This module evaluates how often attacks successfully manipulate LLM outputs.
Try one of the minimal examples to test a single attack:

```bash
cd LLM_prompt_attack

# Run setwise attack example
bash example_setwise.sh
```

These examples use reduced sample sizes (1024 instead of 4096) for quick testing.
Use `generate_jobs.sh` to create individual scripts for systematic experiments:

```bash
cd LLM_prompt_attack
bash generate_jobs.sh
```

This generates multiple runnable scripts such as `run_Qwen3-1.7B_msmarco-passage-trec-dl-2019_setwise.sh`. Execute any of them:

```bash
bash run_Qwen3-1.7B_msmarco-passage-trec-dl-2019_setwise.sh
```

Edit the following parameters in `generate_jobs.sh` or the example scripts:
| Parameter | Options | Description |
|---|---|---|
| `MODELS` | Any HuggingFace model ID | Models to evaluate |
| `DATASETS` | See Supported Datasets | Datasets to test |
| `SETTINGS` | `setwise`, `listwise`, `pairwise` | Ranking methods |
| `ATTACKS` | `so` (DOH), `sd` (DCH) | Attack types |
| `POSITIONS` | `front`, `back` | Attack injection positions |
Example Configuration:

```bash
MODELS=(
  "Qwen/Qwen3-1.7B"
  "google/gemma-3-12b-it"
)
DATASETS=(
  "beir/trec-covid"
  "beir/scifact/test"
)
SETTINGS=(setwise listwise pairwise)
ATTACKS=(so sd)
POSITIONS=(front back)
```

| Parameter | Default | Description |
|---|---|---|
| `NUM_SAMPLES` | 4096 | Number of samples per experiment |
| `SET_SIZE` | 4 | Documents per ranking set |
| `N_JOBS` | 4 | Parallel workers for API calls |
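As a rough sketch, `generate_jobs.sh` presumably expands the Cartesian product of these parameter arrays into one script per combination. The loop below is an illustrative guess at that naming scheme (simplified to single values), not the actual script:

```shell
# Hypothetical sketch: expand MODELS x DATASETS x SETTINGS into script names
# like run_Qwen3-1.7B_msmarco-passage-trec-dl-2019_setwise.sh.
MODELS="Qwen/Qwen3-1.7B"
DATASETS="msmarco-passage/trec-dl-2019"
SETTINGS="setwise"
for model in $MODELS; do
  for dataset in $DATASETS; do
    for setting in $SETTINGS; do
      # Strip the HF org prefix and replace '/' in dataset IDs with '-'
      name="run_$(basename "$model")_$(echo "$dataset" | tr '/' '-')_${setting}.sh"
      echo "$name"
    done
  done
done
```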
| Parameter | Default | Description |
|---|---|---|
| `GPU_MEMORY_UTILIZATION` | 0.85 | Fraction of GPU memory (0.0-1.0) |
| `MAX_MODEL_LEN` | 32768 | Maximum sequence length |
| `MAX_NUM_SEQS` | 8 | Max concurrent requests |
| `SERVER_WAIT_TIMEOUT` | 900 | Server startup timeout (seconds) |
| `BASE_PORT` | 8000 | Server port |
Guidelines:
- Large models (>70B): set `GPU_MEMORY_UTILIZATION=0.90` and `tensor-parallel-size 2`
- Smaller models (<40B): can use `GPU_MEMORY_UTILIZATION=0.85`
- `MAX_MODEL_LEN` should be kept at 32768 for consistent comparisons
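If the scripts serve models through vLLM's OpenAI-compatible server (an assumption here; check the generated scripts for the real invocation), these settings map directly onto vLLM's CLI flags, e.g.:

```shell
# Hypothetical launch command assuming a vLLM OpenAI-compatible server;
# the flag names are vLLM's, but the model and values are illustrative.
vllm serve Qwen/Qwen3-1.7B \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 \
  --max-num-seqs 8 \
  --port 8000
```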
| Dataset | Description | Relevance Levels |
|---|---|---|
| `msmarco-passage/trec-dl-2019` | TREC DL 2019 | [0, 1, 2, 3] |
| `msmarco-passage/trec-dl-2020` | TREC DL 2020 | [0, 1, 2, 3] |
| `beir/trec-covid` | COVID-19 research | [-1, 0, 1, 2] |
| `beir/webis-touche2020/v2` | Argumentative search | [-2, 1, 2, 3, 4, 5] |
| `beir/scifact/test` | Scientific fact verification | [0, 1] |
| `beir/dbpedia-entity/test` | Entity retrieval | [0, 1, 2] |
You can integrate any dataset from `ir_datasets` by following the dataset specification in `dataset_config.py`.
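The real schema lives in `dataset_config.py`; as a purely hypothetical illustration (the field names below are invented), an entry might pair an `ir_datasets` ID with its graded relevance levels:

```python
# Hypothetical shape of a dataset entry; field names are invented for
# illustration -- follow the actual schema in dataset_config.py.
DATASET_CONFIG = {
    "beir/scifact/test": {
        "ir_dataset_name": "beir/scifact/test",
        "relevance_levels": [0, 1],  # graded labels used by the evaluator
    },
    "msmarco-passage/trec-dl-2019": {
        "ir_dataset_name": "msmarco-passage/trec-dl-2019",
        "relevance_levels": [0, 1, 2, 3],
    },
}

def relevant_labels(dataset_id, threshold=1):
    """Labels at or above the threshold count as relevant."""
    levels = DATASET_CONFIG[dataset_id]["relevance_levels"]
    return [lvl for lvl in levels if lvl >= threshold]

print(relevant_labels("msmarco-passage/trec-dl-2019", threshold=2))  # [2, 3]
```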
This module evaluates the impact of attacks on the full IR pipeline using nDCG@10.
```bash
cd LLM_re_ranker
bash generate_setwise_jobs.sh
```

This generates individual scripts. Run any of them:

```bash
bash run_Qwen3-32B_trec-dl-2019_none_back.sh
bash run_Qwen3-32B_trec-dl-2019_so_back.sh
```

We use BM25 as the first-stage retriever. Generate BM25 runs using Pyserini:
```bash
# TREC DL 2019 example
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics dl19-passage \
  --output run.msmarco-v1-passage.bm25-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4

# Evaluate BM25 baseline
python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 dl19-passage \
  run.msmarco-v1-passage.bm25-default.dl19.txt
```

Expected output:

```
ndcg_cut_10             all     0.5058
```
Manual execution example:

```bash
CUDA_VISIBLE_DEVICES=0 python3 run_attack.py \
  run --model_name_or_path Qwen/Qwen3-32B \
      --tokenizer_name_or_path Qwen/Qwen3-32B \
      --run_path run.msmarco-v1-passage.bm25-default.dl19.txt \
      --save_path outputs/run.setwise.heapsort.txt \
      --ir_dataset_name msmarco-passage/trec-dl-2019 \
      --hits 100 \
      --query_length 32 \
      --passage_length 128 \
      --scoring generation \
      --device cuda \
      --attack_type so \
      --attack_position back \
  setwise --num_child 3 \
      --method heapsort \
      --k 10

# Evaluate
python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 dl19-passage \
  outputs/run.setwise.heapsort.txt
```

Parameters:
- `--num_child`: Number of child documents to compare (3 means 3 children + 1 parent = 4 documents total)
- `--attack_type`: Attack method (`none`, `so`, `sd`)
- `--attack_position`: Where to inject the attack (`front`, `back`)
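To make the `--num_child`/`--method heapsort` combination concrete, here is a simplified sketch (not the `llmrankers` implementation): a heapsort over a heap of arity `num_child`, where each sift-down asks a "pick the most relevant" oracle once per parent-plus-children set. In the real pipeline that oracle is an LLM call; here a toy scoring function stands in for it:

```python
# Minimal setwise-heapsort sketch: each comparison set holds 1 parent +
# num_child children, mirroring the --num_child 3 -> 4-document sets above.
def setwise_heapsort(docs, score, num_child=3):
    """Sort docs by descending score using set comparisons of size num_child+1."""
    def best(indices, heap):
        # Stand-in for the LLM call: pick the most relevant doc in the set.
        return max(indices, key=lambda i: score(heap[i]))

    def sift_down(heap, i, size):
        while True:
            kids = [c for c in range(num_child * i + 1, num_child * i + num_child + 1)
                    if c < size]
            if not kids:
                return
            top = best([i] + kids, heap)
            if top == i:
                return
            heap[i], heap[top] = heap[top], heap[i]
            i = top

    heap = list(docs)
    n = len(heap)
    for i in range(n // num_child, -1, -1):  # build a max-heap
        sift_down(heap, i, n)
    out = []
    for end in range(n - 1, -1, -1):         # repeatedly extract the maximum
        heap[0], heap[end] = heap[end], heap[0]
        out.append(heap[end])
        sift_down(heap, 0, end)
    return out

# Toy relevance scores; most relevant document first in the output.
scores = {"d1": 2, "d2": 5, "d3": 1, "d4": 4, "d5": 3}
print(setwise_heapsort(["d1", "d2", "d3", "d4", "d5"], scores.get, num_child=3))
# ['d2', 'd4', 'd5', 'd1', 'd3']
```

With `--k 10`, only the top of the heap needs to be fully resolved, which is why heapsort is a natural fit for top-k re-ranking.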
Our detailed experiment results can be found in the `Results/` directory.
```
.
├── LLM_prompt_attack/                  # ASR evaluation
│   ├── generate_jobs.sh                # Generate experiment scripts
│   ├── example_setwise.sh              # Quick setwise example
│   ├── example_listwise.sh             # Quick listwise example
│   ├── example_pairwise.sh             # Quick pairwise example
│   ├── setwise_ranking_attack_openai.py
│   ├── listwise_ranking_attack_openai.py
│   ├── pairwise_ranking_attack_openai.py
│   └── dataset_config.py
│
├── LLM_re_ranker/                      # NDCG evaluation
│   ├── generate_setwise_jobs.sh
│   ├── run_attack.py
│   └── llmrankers/
│       ├── setwise_attack.py
│       └── rankers.py
│
├── Results/
│
├── requirements.txt
└── README.md
```
This project is released under the MIT License.
