forked from uf-hobi-informatics-lab/ClinicalTransformerNER
run_transformer_batch_prediction.sh
: '
This script runs multi-file batch prediction with a transformer NER model.
BERT is used as the example; RoBERTa and XLNet should work the same way.
The input files must contain offset information.
If they do not, combine all the files into one test.txt and use do_pred from run_transformer_ner.sh for prediction.
This script is designed mainly for production use, to generate BRAT/BioC formatted outputs with offset information.
'
################# BERT example #####################
export CUDA_VISIBLE_DEVICES=0
# config and tokenizer information can be found in the pretrained model dir
# --do_format selects the output format: 1 for BRAT, 2 for BioC, 0 (default) for BIO
python ./src/run_transformer_batch_prediction.py \
--model_type bert \
--pretrained_model <your pretrained model path> \
--raw_text_dir <path to the original text files> \
--preprocessed_text_dir <path to the bio formatted files> \
--output_dir <path to save predicted results> \
--max_seq_length 128 \
--do_lower_case \
--eval_batch_size 8 \
--log_file ./log.txt \
--do_format 1 \
--do_copy \
--data_has_offset_information
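
# The header note suggests combining all files into one test.txt when the
# inputs have no offset information. A minimal sketch of that step follows.
# The function name combine_bio_files and the example paths are hypothetical.
# It assumes the BIO formatted files use a .txt extension, that a blank line
# between files is an acceptable sentence separator, and that the output file
# lives outside the source directory.

```shell
# Hypothetical helper (not part of the repository): concatenate every *.txt
# file in a directory into a single file, writing a newline after each one.
# When each input already ends with a newline, this leaves a blank line
# between files so sentences from different files stay separated.
combine_bio_files() {
    local src_dir="$1"    # directory holding the BIO formatted files
    local out_file="$2"   # combined output, e.g. test.txt (outside src_dir)
    local f

    : > "$out_file"       # create or truncate the output file
    for f in "$src_dir"/*.txt; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        cat "$f" >> "$out_file"
        printf '\n' >> "$out_file"
    done
}

# Example call (placeholder paths):
# combine_bio_files ./bio_files ./test.txt
```

# The function is only defined here, not called, so adding it does not change
# what the batch prediction command above does.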