weixin_39526872
2020-12-08 19:10

Text threshold vs detection threshold

I think you might have mixed up the detection_threshold and text_threshold variables in detection.py, because they differ from the original CRAFT implementation and because, in one instance, a comment and the variable name disagree.

Here you have:

```python
_, text_score = cv2.threshold(textmap,
                              thresh=text_threshold,
                              maxval=1,
                              type=cv2.THRESH_BINARY)
```

The original implementation uses the low_text parameter here (what you're calling detection_threshold). So I believe text_threshold should be replaced with detection_threshold.
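For reference, cv2.threshold with cv2.THRESH_BINARY sets each pixel to maxval when its value is strictly greater than thresh, and to 0 otherwise. The NumPy sketch below mimics that step so the effect of the threshold is easy to see. The helper binarize and the toy text map are illustrative assumptions, not code from keras-ocr or CRAFT; 0.4 and 0.7 are CRAFT's defaults for low_text and text_threshold.

```python
import numpy as np


def binarize(score_map: np.ndarray, threshold: float) -> np.ndarray:
    """Mimic cv2.threshold(..., maxval=1, type=cv2.THRESH_BINARY).

    Values strictly greater than `threshold` become 1; all others become 0.
    """
    return (score_map > threshold).astype(np.float32)


# Toy 1 x 5 "text map": two character peaks joined by a weaker region.
textmap = np.array([[0.9, 0.5, 0.45, 0.5, 0.8]], dtype=np.float32)

print(binarize(textmap, 0.4))  # [[1. 1. 1. 1. 1.]]
print(binarize(textmap, 0.7))  # [[1. 0. 0. 0. 1.]]
```

With 0.4 the two peaks end up in one connected region; with 0.7 they stay apart, so which value is passed at this step noticeably changes the detections.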

And then a few lines down you have:

```python
# If the maximum value within this connected component is less than
# text threshold, we skip it.
if np.max(textmap[labels == component_id]) < detection_threshold:
    continue
```
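To see what that check does in isolation, here is a sketch on a hand-made label image, where label 0 is background as in OpenCV's connected-component output. The helper keep_components and the toy arrays are hypothetical; 0.7 is keras-ocr's default value for detection_threshold.

```python
import numpy as np


def keep_components(textmap: np.ndarray, labels: np.ndarray, threshold: float) -> list[int]:
    """Return ids of labelled components whose peak text score is >= threshold.

    Label 0 is background; labels 1..N are assumed to be contiguous.
    """
    kept = []
    for component_id in range(1, int(labels.max()) + 1):
        # Same test as the snippet above: skip if the peak score is below the threshold.
        if np.max(textmap[labels == component_id]) < threshold:
            continue
        kept.append(component_id)
    return kept


textmap = np.array([[0.9, 0.6, 0.0, 0.5, 0.45]])
labels = np.array([[1, 1, 0, 2, 2]])  # two components separated by background

print(keep_components(textmap, labels, 0.7))  # [1]
print(keep_components(textmap, labels, 0.4))  # [1, 2]
```

A component whose peak equals the threshold exactly is kept, because the test uses `<` rather than `<=`.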

Should detection_threshold be text_threshold as the comment says and as it is in the original implementation?

If so, the fix seems easy. I could open a PR if you'd like, or you, as the maintainer, can take it.

Again, thanks for your awesome work on this 👏

This question comes from the open-source project faustomorales/keras-ocr.


4 answers

  • weixin_39775910 2020-12-08 19:10

    Thanks for taking a close look under the hood! You're right that the variable names don't match. But I believe the values used in the function (and therefore the net effect) do match.

    ```python
    import argparse

    # Parser setup added so the excerpt runs on its own.
    parser = argparse.ArgumentParser()
    parser.add_argument('--text_threshold', default=0.7, type=float, help='text confidence threshold')
    parser.add_argument('--low_text', default=0.4, type=float, help='text low-bound score')
    parser.add_argument('--link_threshold', default=0.4, type=float, help='link confidence threshold')
    ```

    [source]

    The variable names are hard for me to keep in my head, so I made a table to summarize the differences.

    | purpose | keras-ocr variable name | keras-ocr variable value | CRAFT-pytorch variable name | CRAFT-pytorch variable value |
    |------------------------|-------------------------|--------------------------|-----------------------------|------------------------------|
    | threshold the text map | text_threshold | 0.4 | low_text | 0.4 |
    | threshold the link map | link_threshold | 0.4 | link_threshold | 0.4 |
    | filter out detections | detection_threshold | 0.7 | text_threshold | 0.7 |
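    Anyone porting tuned values from CRAFT-pytorch's command-line arguments could apply the table mechanically. The helper below is hypothetical (it exists in neither project) and only encodes the three rows of the table:

    ```python
    # Hypothetical helper: maps each CRAFT-pytorch argument name to the
    # keras-ocr parameter that plays the same role, per the table.
    CRAFT_TO_KERAS_OCR = {
        "low_text": "text_threshold",             # threshold the text map
        "link_threshold": "link_threshold",       # threshold the link map
        "text_threshold": "detection_threshold",  # filter out detections
    }


    def craft_to_keras_ocr(craft_args: dict[str, float]) -> dict[str, float]:
        """Rename CRAFT-pytorch threshold arguments to their keras-ocr names."""
        unknown = set(craft_args) - set(CRAFT_TO_KERAS_OCR)
        if unknown:
            raise KeyError(f"no keras-ocr equivalent for: {sorted(unknown)}")
        return {CRAFT_TO_KERAS_OCR[name]: value for name, value in craft_args.items()}


    craft_defaults = {"text_threshold": 0.7, "low_text": 0.4, "link_threshold": 0.4}
    print(craft_to_keras_ocr(craft_defaults))
    # {'detection_threshold': 0.7, 'text_threshold': 0.4, 'link_threshold': 0.4}
    ```

    Note that the name text_threshold appears on both sides of the mapping with different meanings, which is exactly the source of the confusion in this issue.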

    That still leaves us with the question of whether we should change the variable names to match the original implementation. In my humble opinion, the new names are more semantically descriptive. text_threshold and link_threshold are used in similar ways, so I think it makes sense for their variable names to have similar structure. This is in contrast with the low_text / link_threshold naming which, to me, implies that these values are used in different ways. Could you share your thoughts on that?

    I'd also appreciate it if you could check whether I've made a mistake above; this can all be a little confusing, and I may have misread something.

  • weixin_39526872 2020-12-08 19:10

    You're right, your values do match the original implementation, so there's no impact on post-processing. It was, however, a little confusing when I adjusted the values and wasn't seeing the effects I expected.

    I do agree with you about not changing the variable names to match the original ones. Yours are more descriptive, particularly with detection_threshold, which is used with the connected-component labeling that actually does the "detecting". I also like how you added the size_threshold, which was originally just hardcoded.

    As such, I think a little more documentation that highlights the meaning and usage of these parameters is all you need. A docstring in the source would probably suffice, since most users likely won't want, need, or know to alter them for their use case. This discussion in the original repo does a pretty good job of describing each parameter's purpose in plain English.

  • weixin_39775910 2020-12-08 19:10

    I think updating the docstring is a fine idea. Unfortunately, the description in the linked comment defines the variables in terms of their perceived effect as opposed to how they are actually used. Below are what I believe are more accurate definitions that discuss how the values are used in addition to their effect. I'm interested in your feedback on the definitions. If you agree, I'll add them to the docstring for detector.detect. Or, since it was your idea, you're welcome to file a PR. I'd very much like you to get credit!

    • text_threshold: When the text map is processed, it is converted from confidence values (floats from zero to one) to classifications (0 for not text, 1 for text) using binary thresholding. Values strictly greater than text_threshold become 1; all others become 0. For example, if text_threshold is 0.4 and a value for a particular point on the text map is 0.5, that value gets converted to a 1. The higher this value is, the less likely it is that characters will be merged together into a single word. The lower this value is, the more likely it is that non-text will be detected. Therein lies the balance.
    • link_threshold: This is the same as text_threshold, but is applied to the link map instead of the text map.
    • detection_threshold: We want to avoid including boxes that may represent large regions of low-confidence text predictions. To do this, we do a final check on each word box to make sure its maximum text-map confidence value is at least some detection threshold. This is the threshold used for that check.
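    To make the three definitions concrete, here is a simplified, pure-NumPy sketch of the post-processing they describe. It is an approximation under stated assumptions, not keras-ocr's getBoxes: components are found with a small breadth-first search (4-connectivity) instead of OpenCV's connected-component routine, and each surviving word is reported as an axis-aligned box, whereas the real code does further work on each component before fitting a rotated box. The function name sketch_get_boxes and its return format are hypothetical; size_threshold is included because keras-ocr exposes it alongside the other three.

    ```python
    from collections import deque

    import numpy as np


    def sketch_get_boxes(textmap, linkmap, detection_threshold=0.7,
                         text_threshold=0.4, link_threshold=0.4, size_threshold=10):
        """Simplified post-processing sketch; returns (x_min, y_min, x_max, y_max) boxes."""
        # text_threshold and link_threshold are applied the same way: values
        # strictly greater than the threshold count as 1. The two binary maps
        # are then merged (equivalent to clipping their sum to [0, 1]).
        combined = (textmap > text_threshold) | (linkmap > link_threshold)

        height, width = combined.shape
        labels = np.zeros((height, width), dtype=int)
        next_label = 0
        boxes = []
        for y0 in range(height):
            for x0 in range(width):
                if not combined[y0, x0] or labels[y0, x0]:
                    continue
                # Breadth-first search over one 4-connected component.
                next_label += 1
                labels[y0, x0] = next_label
                queue = deque([(y0, x0)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and combined[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
                # size_threshold: skip components with too few pixels.
                if len(pixels) < size_threshold:
                    continue
                # detection_threshold: skip components whose peak text-map
                # confidence is below the threshold.
                ys, xs = zip(*pixels)
                if textmap[list(ys), list(xs)].max() < detection_threshold:
                    continue
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
        return boxes


    # Toy 3 x 8 maps: two "characters" separated by a weaker region.
    textmap = np.zeros((3, 8))
    textmap[:, 0:3] = 0.9
    textmap[:, 5:8] = 0.9
    textmap[:, 3:5] = 0.5
    linkmap = np.zeros((3, 8))

    print(sketch_get_boxes(textmap, linkmap, size_threshold=5))
    # [(0, 0, 7, 2)]  one merged word
    print(sketch_get_boxes(textmap, linkmap, text_threshold=0.6, size_threshold=5))
    # [(0, 0, 2, 2), (5, 0, 7, 2)]  two separate words
    ```

    Raising text_threshold from 0.4 to 0.6 drops the weaker middle region, so the two characters are no longer merged, which is the trade-off described for text_threshold.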
  • weixin_39775910 2020-12-08 19:10

    I've added this documentation in 717bfcde4e0447c6630487f78787ff62bdabe73e. Thanks for raising these questions!

