Add support for PaddleOCRv5 models for character recognition. #747
src/monolithic/gst/inference_elements/common/post_processor/converters/to_tensor/paddle_ocr.cpp
    double exp_sum = 0.0;
    for (size_t v = 0; v < vocab_size; ++v)
        exp_sum += std::exp(static_cast<double>(row[v] - row_max));
    log_conf_sum += std::log(1.0 / exp_sum + 1e-10);
This line computes log(1/exp_sum + 1e-10) as a log-softmax approximation. The 1e-10 is added after the division to prevent log(0), but this is numerically odd: standard log-softmax would be log_max - log(exp_sum), not this form.
Please check for correctness.
Reply: This is to avoid a divide-by-zero error; changed to an explicit check that the value is not zero.
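For reference, the quantity discussed in this thread can be written in the usual stable log-softmax form. The sketch below is a minimal illustration, not the code in the PR: `best_class_log_prob` is a hypothetical helper name, and it assumes finite logits and that `row_max` is the largest logit in the row. Under those assumptions the largest term contributes exp(0) = 1, so `exp_sum` is at least 1 for a non-empty row and neither the epsilon nor a zero check is needed; only the empty-row case has to be handled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PR): log-probability of the most
// likely class for one row of logits, using the stable log-softmax form.
double best_class_log_prob(const std::vector<float> &row) {
    if (row.empty())
        return 0.0; // assumption: an empty row contributes nothing

    const float row_max = *std::max_element(row.begin(), row.end());

    double exp_sum = 0.0;
    for (float logit : row)
        exp_sum += std::exp(static_cast<double>(logit - row_max));

    // log_softmax(max) = (row_max - row_max) - log(exp_sum) = -log(exp_sum).
    // exp_sum >= 1 here because the max term contributes exp(0) = 1.
    return -std::log(exp_sum);
}
```

Accumulating this value per time step gives the same result as log(1/exp_sum) without the bias introduced by the added 1e-10.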
    export_ppocr_v5_model() {
        local MODEL_NAME=$1
        MODEL_DIR="$MODELS_PATH/public/$MODEL_NAME"
MODEL_DIR should be declared as local.
The other variables in this function (MODEL_NAME, DST_FILE1, DST_FILE2) all use local, but MODEL_DIR is assigned without it, so it leaks into the global scope.
    // Output shape: [batch_size, seq_len, vocab_size]
    const auto &dims = blob->GetDims();
    const size_t vocab_size = (dims.size() == 3) ? dims[2] : 0;
These lines can set vocab_size and seq_len to 0 if the tensor rank is wrong, but the subsequent check at line 224 then produces a misleading error message, "Unexpected vocabulary size".
It would be clearer to add an explicit tensor rank check, which would give a clear error when the model output shape is wrong, rather than defaulting to 0 and failing later.
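A minimal sketch of the explicit rank check suggested above. It is an illustration under stated assumptions, not the code in the PR: `OcrOutputShape` and `parse_ocr_output_shape` are hypothetical names, the dimensions are assumed to arrive as a `std::vector<std::size_t>`, and the `[batch_size, seq_len, vocab_size]` layout is taken from the comment in the snippet.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the three expected dimensions.
struct OcrOutputShape {
    std::size_t batch_size;
    std::size_t seq_len;
    std::size_t vocab_size;
};

// Hypothetical helper: fail early with a rank-specific message instead of
// defaulting the sizes to 0 and tripping a later vocabulary-size check.
OcrOutputShape parse_ocr_output_shape(const std::vector<std::size_t> &dims) {
    if (dims.size() != 3) {
        throw std::invalid_argument(
            "Unexpected output tensor rank: expected 3 "
            "[batch_size, seq_len, vocab_size], got " +
            std::to_string(dims.size()));
    }
    return OcrOutputShape{dims[0], dims[1], dims[2]};
}
```

With a check like this in place, a later vocabulary-size comparison only runs on tensors of the right rank, so its error message stays accurate.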
oonyshch left a comment
So far, everything that I haven't mentioned in the comments LGTM.
Description
Add converters for PaddleOCR v5 recognition models; add support for downloading models from a HuggingFace repository and loading the vocabulary from the HF config file.
Fixes # (issue)
Any Newly Introduced Dependencies
N/A.
How Has This Been Tested?
Validated locally; to be added to CI tests as part of new sample.
Checklist: