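Hello. The PP-OCRv5_rec model was trained on a very large training set. Fine-tuning it on a small amount of your own data will damage the original weights and cost the model its general-purpose ability. In particular, if you freeze the backbone and train only the head, the model may not even be able to fit your own data. You can instead use the zero-code pipeline on 星河 (Baidu AI Studio) at https://aistudio.baidu.com/pipeline/mine, select the text recognition model, and adapt it using your own data.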
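The dataset is not open source. You can train on 星河, then download the weights and continue training locally. Also, if your use case is English text, you could try the en_PP-OCRv5_mobile_rec model, which distinguishes o from 0 somewhat better.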
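Right, only the server model supports merged-data training. If you train locally, that also works provided you have a large, high-quality dataset: combine the two datasets into one.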
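For the local route, combining the two datasets can be as simple as merging their label files. Below is a minimal sketch, assuming both files use the usual `image_path<TAB>text` layout of `rec_gt.txt` and that their image paths resolve against the same `data_dir`. The function names and the `repeat_new` oversampling knob are my own illustration, not part of PaddleOCR.

```python
import random
from pathlib import Path


def read_labels(path):
    """Read a label file with one 'image_path<TAB>text' entry per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Skip blank or malformed lines that have no tab separator.
    return [line for line in lines if "\t" in line]


def merge_label_files(general_path, new_path, out_path, repeat_new=1, seed=0):
    """Write one shuffled label file: general data plus the new samples.

    The new samples are repeated `repeat_new` times (hypothetical knob) so a
    handful of images is not drowned out by the larger set.
    Returns the number of lines written.
    """
    merged = read_labels(general_path) + read_labels(new_path) * repeat_new
    random.Random(seed).shuffle(merged)  # fixed seed keeps runs reproducible
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

The merged file can then be listed under `label_file_list` in the `Train` section. How much to oversample the new images is something to tune against a held-out set of general data.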
No, there is nothing else. This look-alike character problem is genuinely hard to solve completely.
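Since the problem cannot be fully eliminated, it helps to at least measure it before and after any fine-tuning run. Here is a rough sketch that counts 0/o/O swaps and the overall exact-match rate from (ground truth, prediction) string pairs. The helper names are hypothetical, and comparing only equal-length pairs is a simplifying assumption; a proper alignment would need edit distance.

```python
from collections import Counter

# Characters treated as visually confusable (taken from the question).
LOOKALIKES = frozenset({"0", "o", "O"})


def lookalike_confusions(pairs, lookalikes=LOOKALIKES):
    """Count (expected, predicted) character swaps within `lookalikes`.

    `pairs` is an iterable of (ground_truth, prediction) strings. Pairs of
    different length are skipped because they cannot be compared position
    by position without an alignment step.
    """
    counts = Counter()
    for truth, pred in pairs:
        if len(truth) != len(pred):
            continue
        for expected, got in zip(truth, pred):
            if expected != got and expected in lookalikes and got in lookalikes:
                counts[(expected, got)] += 1
    return counts


def exact_match_rate(pairs):
    """Fraction of pairs where the prediction equals the ground truth."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(truth == pred for truth, pred in pairs) / len(pairs)
```

Running both functions on the same held-out set with the original and the fine-tuned weights shows whether a drop in 0/o/O swaps was paid for with a lower exact-match rate elsewhere.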
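Goal: fine-tune the model on a small number of new samples without degrading its existing recognition performance.
Problem: I downloaded the rec training model from the official site and fine-tuned it with every layer except the head frozen. The aim is to improve recognition of certain characters. The model recognizes the vast majority of characters accurately; only a few images have 0 recognized as o or O, so I wanted to train on just those images. After 15 epochs of training, however, overall recognition accuracy had dropped sharply: characters the model recognized very accurately before fine-tuning now come out garbled.
en_PP-OCRv5_mobile_rec.yaml parameters: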
Global:
  model_name: en_PP-OCRv5_mobile_rec # To use static model for inference.
  debug: false
  use_gpu: true
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/0202_freeze2/en_rec/model
  save_epoch_step: 1
  eval_batch_step: [0, 5]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img:
  character_dict_path: ./ppocr/utils/dict/ppocrv5_en_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/0202_freeze/en_rec/predicts_en_ppocrv5.txt
  d2s_train_image_shape: [3, 48, 320]
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.000005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 0.0005 #0.00003
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: 'C:/Users/user/Desktop/Vendor2'
    ext_op_transform_idx: 1
    label_file_list:
      - 'C:/Users/user/Desktop/Vendor2/rec_gt.txt'
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 4
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: 'C:/Users/user/Desktop/Vendor2'
    label_file_list:
      - 'C:/Users/user/Desktop/Vendor2/rec_gt.txt'
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 2
    num_workers: 4
Other: I searched for similar issues, but none of them seem to have useful replies. I would appreciate any pointers.