OCR Guidelines (1)
Chinese Version:
根据场景不同,每张图平均30-40框
标注内容包含文本框的位置坐标,4点标注,且顺序为左上、右上、右下、左下,每个点应该
有x坐标、y坐标。
准确率:所有交付数据准确率均为99%(文本框位置应准确包围文字,框与文字边缘应有1
像素空白间隙;文本框位置正确的框,继续评估文本框内容是否与图像上一致,统计时按字
段统计,即除了空格外,其他任意有一个字符错,该字段(该框)算错;文本框位置错误的
框,直接算该字段(该框)错,不再评估文本内容是否一致; 遗漏文本框也算入总文本框数
量内,算该文本框错误;标注准确率:位置准确且内容一致的文本框数量 / 总文本框数
量)。
1. 遇到非目标语言是要转写吗?一整句都是非目标语言怎么转写,一句话里目标语言句子里
夹杂几个非目标语言单词的情况怎么转写?
一整句内存在非目标语言,也需要转写,按实际转写即可,图文一致。
2. 图片上所有目标语言都要转写成文本嘛?针对不清楚的文本界限是什么样的?肉眼看不清
就不需要转写嘛?包括文本被曝光等情况。
图片上所有文本都需要转写文本。肉眼看不清,但确实是文本框的,文本内容可以用特殊标
签,###。
3. 一张图片上平均下来有多少行文字需要转写?是否有针对超过多少行文字则不需要转写有
个界限?
图片上的文字都需要转写,没有界限。最后交付总的>=图片个数同时>=有效文本框数
4. 若文本有截断,比如English的h被截断了,但我一看就知道是English,那h要不要转写出
来啊?
要转,转成English
被截断且无完整字,不用框
有完整的字,其他字被截断且无法联想出被截断的那个字是啥,拉框并标###
有完整的字,其他字被截断但可以联想出被截断的内容,正常进行拉框和转写,如图示需转
出完整内容【用编】
5. 拉框,框之间是否可以交叉或者重叠?
框之间不要出现重叠
6. 标点符号是用哪种输入模式还是半角?
用半角模式的标点
7. 一个框里有一个或几个字符看不清也猜不出是什么,这一个或几个字符可能是连在一起
的,也可能是分散在句子里不同位置的。这2种情况下都是整个框标记###吗?
是的
8.文本有下划线不需要转写下划线
9.文本有较长的空白,用一个空格表示就好
10.旋转图片是无效的,需要在全局属性标invalid
11.只能一行一个框吗,遇到书籍那种是不是也不可以一个段落拉一个框
不可以一个段落一个框
12.对水印和印章:在不同背景文字重叠时,仅标注水平或特别小角度倾斜的文本;与背景文
字重叠时,完全忽略印章、水印,只标背景文字。
13.date: ________
name: _______
这种很多个下划线的情况该怎么转写?
用一个下划线表示即可
14.
若图片有效,则必须拉框,否则会报错无法提交
若图片无效,则必须无框,否则会报错无法提交
多边形框必须是四个点,四个点的顺序应为左上右上右下左下,若不是四个点,则无法提交
English Version:
Depending on the scenario, each image contains 30-40 text boxes on average.
Each annotation records the position of a text box as 4 points, in the order upper left,
upper right, lower right, lower left. Each point must have an x coordinate and a y
coordinate.
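The 4-point format can be checked mechanically. The sketch below is only an illustration: the names Point, signed_area and has_valid_point_order are hypothetical, and it assumes image coordinates with y growing downward. It verifies that a box has exactly four points listed clockwise on screen, which is the direction implied by upper left, upper right, lower right, lower left. It cannot tell which corner is the upper left of rotated text.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward (assumption)


def signed_area(points: List[Point]) -> float:
    """Shoelace formula; positive when the points run clockwise on screen."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def has_valid_point_order(points: List[Point]) -> bool:
    """True if the box has exactly four points listed clockwise on screen."""
    return len(points) == 4 and signed_area(points) > 0


box = [(10, 20), (110, 20), (110, 50), (10, 50)]  # upper left, upper right, lower right, lower left
print(has_valid_point_order(box))        # True
print(has_valid_point_order(box[::-1]))  # False: same corners, reversed direction
```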
Accuracy: all delivered data must reach 99% accuracy. A text box must accurately enclose
the text, with a 1-pixel blank gap between the box and the edge of the text. For a box
whose position is correct, the transcription is then compared with the text in the image.
Scoring is per field (per box): apart from spaces, a single wrong character makes the
whole box wrong. A box whose position is wrong is counted as wrong directly, and its text
is not evaluated. A missed text box is included in the total number of text boxes and
counted as wrong.
Annotation accuracy = number of text boxes with accurate position and matching content /
total number of text boxes.
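The scoring rule can be written out as a short calculation. This is a minimal sketch, not the project's actual QA tooling: the BoxReview structure and the function names are hypothetical, and it assumes a reviewer has already judged whether each box position is acceptable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoxReview:
    """Review result for one text box in the image (hypothetical structure)."""
    position_ok: bool                     # False for misplaced or missed boxes
    expected_text: str = ""               # text as it appears in the image
    annotated_text: Optional[str] = None  # None when the box was missed


def strip_spaces(text: str) -> str:
    """Spaces are ignored when comparing transcriptions."""
    return text.replace(" ", "")


def box_is_correct(review: BoxReview) -> bool:
    """A box counts only if its position is right and every non-space character matches."""
    if not review.position_ok or review.annotated_text is None:
        return False
    return strip_spaces(review.annotated_text) == strip_spaces(review.expected_text)


def annotation_accuracy(reviews: List[BoxReview]) -> float:
    """Correct boxes divided by all boxes, missed boxes included in the total."""
    if not reviews:
        raise ValueError("no boxes to score")
    return sum(box_is_correct(r) for r in reviews) / len(reviews)


reviews = [
    BoxReview(True, "Total: 42", "Total: 42"),  # correct
    BoxReview(True, "Total: 42", "Total:42"),   # only a space differs, still correct
    BoxReview(True, "English", "Englsih"),      # wrong characters, box is wrong
    BoxReview(False, "Invoice"),                # missed box, counted as wrong
]
print(annotation_accuracy(reviews))  # 0.5
```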
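1. Should non-target-language text be transcribed? How should a sentence written entirely
in a non-target language be handled, and what about a target-language sentence that
contains a few non-target-language words?
Non-target-language text within a sentence must also be transcribed. Transcribe it as it
actually appears, so that the text matches the image.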
2. Does all target-language text in the image need to be transcribed? Where is the line
for unclear text? Can text that is illegible to the naked eye be skipped, including text
that is overexposed?
All text in the image must be transcribed. If a region is illegible to the naked eye but
is clearly a text box, its text content can be given the special tag ###.
3. On average, how many lines of text per image need to be transcribed? Is there a limit
above which the text no longer needs to be transcribed?
All text in the image must be transcribed; there is no limit. The final delivered totals
must be >= the number of images and >= the number of valid text boxes.
4. If text is truncated, for example the h in English is cut off but I can tell at a
glance that the word is English, should the h be transcribed?
Yes, transcribe it as English.
If the text is truncated and no complete character remains, do not draw a box.
If some characters are complete but the truncated ones cannot be inferred, draw a box and
mark it ###.
If some characters are complete and the truncated content can be inferred, draw the box
and transcribe as usual; as in the illustrated example, the complete content is
transcribed [用编].
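5. Can boxes cross or overlap each other?
Boxes must not overlap.
6. Which input mode should be used for punctuation? Is it half-width?
Use half-width punctuation.
7. One or more characters in a box are illegible and cannot be guessed. They may be
adjacent or scattered across different positions in the sentence. In both cases, is the
whole box marked ###?
Yes.
8. If text is underlined, do not transcribe the underline.
9. If text contains a long blank gap, represent it with a single space.
10. A rotated image is invalid and must be marked invalid in the global attributes.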
11. Must each line have its own box? For book pages, is it also not allowed to draw one
box around a whole paragraph?
No, a paragraph must not be drawn as a single box.
12. Watermarks and seals: when they do not overlap background text, annotate only text
that is horizontal or tilted at a very small angle; when they overlap background text,
ignore the seal or watermark completely and annotate only the background text.
13. date: ________
name: _______
How should a run of many underscores like this be transcribed?
Use a single underscore.
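Three of the transcription rules are mechanical: punctuation is half-width, a long blank is written as a single space, and a run of underscores is written as a single underscore. The sketch below shows one way to apply them before submission. The function name normalize_transcription is hypothetical. As an assumption, only full-width ASCII punctuation (U+FF01 to U+FF5E) and the ideographic space are converted; full-width letters and digits are left alone so the text still matches the image.

```python
import re

FULLWIDTH_OFFSET = 0xFEE0  # distance between U+FF01..U+FF5E and ASCII U+0021..U+007E


def normalize_transcription(text: str) -> str:
    """Apply the half-width punctuation, single-space and single-underscore rules."""
    chars = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            half = chr(code - FULLWIDTH_OFFSET)
            # Assumption: convert punctuation only, keep full-width letters and digits.
            chars.append(ch if half.isalnum() else half)
        elif code == 0x3000:  # ideographic (full-width) space
            chars.append(" ")
        else:
            chars.append(ch)
    result = "".join(chars)
    result = re.sub(r" {2,}", " ", result)  # a long blank becomes one space
    result = re.sub(r"_{2,}", "_", result)  # a run of underscores becomes one
    return result


print(normalize_transcription("date: ________"))    # date: _
print(normalize_transcription("name\uff1a   Li"))   # name: Li
```

Whether other CJK punctuation, such as the ideographic full stop, should also be converted is not stated in the guidelines, so the sketch leaves it unchanged.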
14.
If the image is valid, it must contain at least one box; otherwise an error is reported
and the image cannot be submitted.
If the image is invalid, it must contain no boxes; otherwise an error is reported and the
image cannot be submitted.
A polygon box must have exactly four points, in the order top left, top right, bottom
right, bottom left. A box that does not have four points cannot be submitted.
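The submission rules above amount to a small pre-submit check. The following is a minimal sketch under stated assumptions, not the annotation tool's real validation: the function name submission_errors and its arguments are hypothetical, and each box is modeled as a sequence of (x, y) points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y)


def submission_errors(is_valid_image: bool,
                      boxes: Sequence[Sequence[Point]]) -> List[str]:
    """Return the reasons a submission would be rejected; empty means it can be submitted."""
    errors: List[str] = []
    if is_valid_image and not boxes:
        errors.append("valid image has no boxes")
    if not is_valid_image and boxes:
        errors.append("invalid image must have no boxes")
    for index, points in enumerate(boxes):
        if len(points) != 4:
            errors.append(f"box {index} has {len(points)} points, expected 4")
    return errors


good_box = [(0, 0), (50, 0), (50, 20), (0, 20)]
print(submission_errors(True, [good_box]))      # []
print(submission_errors(True, []))              # ['valid image has no boxes']
print(submission_errors(False, [good_box]))     # ['invalid image must have no boxes']
print(submission_errors(True, [good_box[:3]]))  # ['box 0 has 3 points, expected 4']
```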