Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When '.' char not in output string, output is also considered to be a… #1457

Merged
merged 7 commits into from
Oct 19, 2022

Conversation

Enchanted0911
Copy link

… folder

Motivation

In my case, 'demo' is a folder containing multiple images, and 'demo_res' is the folder where I want the program prediction results to be saved, which has not been created yet. When I use the following code, the program executes as expected.

from mmocr.utils.ocr import MMOCR
ocr = MMOCR(det='TextSnake', recog=None)
results = ocr.readtext('demo/', output='demo_res/', export='demo_res/')

Note that the demo_res folder has not been created before the above program is executed.
The program will automatically create the 'demo_res' folder for me, thanks to the related logic of 'export'.

So I modified the above program slightly.

from mmocr.utils.ocr import MMOCR
ocr = MMOCR(det='TextSnake', recog=None)
results = ocr.readtext('demo/', output='demo_output_res/', export='demo_res/')

Note that 'demo_output_res' and 'demo_res' folders have not been created yet.

Then the program throws the following exception.

cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-h4wtvo23\opencv\modules\imgcodecs\src\loadsave.cpp:926: error: (-2:Unspecified error) could not find encoder for the specified extension in function 'cv::imencode'

Because the program treats the uncreated folder as a file and tries to write the file with opencv, and requires a specific file extension.

So I think files without file extension should be treated as folders in this program. Moreover, regardless of whether 'output' is a file or not, when it has no file extension, an exception will be throw in this program.

Modification

When the file extension identifies '.' does not exist, the file is considered to be a folder

Copy link
Collaborator

@gaotongxiao gaotongxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your insight! This is certainly a bug but the solution here might not exactly address the issue.

Path.is_dir() is robust enough to tell whether a path points to a directory or a file regardless of the form of the name. For example, Path('test_dir.py').is_dir() returns true if test_dir itself is a directory.

The root cause of this issue is that OpenCV can't write to a path where the target directory does not exist. So the solution here could be creating the targeting directory before opencv tries to write images.

@Enchanted0911
Copy link
Author

Thanks for your insight! This is certainly a bug but the solution here might not exactly address the issue.

Path.is_dir() is robust enough to tell whether a path points to a directory or a file regardless of the form of the name. For example, Path('test_dir.py').is_dir() returns true if test_dir itself is a directory.

The root cause of this issue is that OpenCV can't write to a path where the target directory does not exist. So the solution here could be creating the targeting directory before opencv tries to write images.

I debugged for both cases (folder created, folder not created) and the results confuse me because output_path.is_dir() returns true when the output I pass in is an existing folder , and when the output I pass in is a non-existing folder, output_path.is_dir() returns false

if output_path.is_dir():

@gaotongxiao
Copy link
Collaborator

@JunYao1020 Yes, since it can't tell if a non-existent path will be a path or folder. In this case, it's better to check the existence of the path with Path.exists() first and act accordingly.

Besides, I think always assuming the non-existent path to be a folder would be a safe option, since we don't want users to input several images but write everything to a single image.

@Enchanted0911
Copy link
Author

@JunYao1020 Yes, since it can't tell if a non-existent path will be a path or folder. In this case, it's better to check the existence of the path with Path.exists() first and act accordingly.

Besides, I think always assuming the non-existent path to be a folder would be a safe option, since we don't want users to input several images but write everything to a single image.

That's right.When the incoming ’output‘ is a non-existing file, it should be divided into two cases. The first is that the image input to be predicted is an image folder instead of a single image, so it is better to think that the output is a folder. Type, the image input a specific image, and the name of the output has a extension, then it can be assumed that the output is an image file.

@gaotongxiao
Copy link
Collaborator

gaotongxiao commented Oct 17, 2022

@JunYao1020 I don't think it's a good idea to act according to the type inferred from the path by simply checking if the filename contains an extension. The filename is not a reliable indicator of its type - "xxx.xxx" is always a valid name for both a file and a directory. Such an ambiguity prevents the program from behaving always as expected, which is undesirable for developers and users.

Hence I think it's better to change this interface a little bit in order to keep it simple and consistent. We can always treat the string in output as a directory path.

@Enchanted0911
Copy link
Author

@JunYao1020 I don't think it's a good idea to act according to the type inferred from the path by simply checking if the filename contains an extension. The filename is not a reliable indicator of its type - "xxx.xxx" is always a valid name for both a file and a directory. Such an ambiguity prevents the program from behaving always as expected, which is undesirable for developers and users.

Hence I think it's better to change this interface a little bit in order to keep it simple and consistent. We can always treat the string in output as a directory path.

I think this is good, sacrificing the user's ability to customize the output image name of a single image to ensure that the program runs as expected.

@gaotongxiao
Copy link
Collaborator

Great. Could you implement the patch in this PR?

@Enchanted0911
Copy link
Author

Great. Could you implement the patch in this PR?

sure

Copy link
Collaborator

@gaotongxiao gaotongxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to create a directory if it doesn't exist. On the other hand, if the path points to a file, an error must be raised.

@Enchanted0911
Copy link
Author

We need to create a directory if it doesn't exist. On the other hand, if the path points to a file, an error must be raised.

got it

@Enchanted0911
Copy link
Author

We need to create a directory if it doesn't exist. On the other hand, if the path points to a file, an error must be raised.

When the program receives a valid 'output', the program will execute to the location of mmcv.imwrite, which will automatically create a non-existing folder.

mmcv.imwrite(img, out_file)

In fact, the 'file_client' in mmcv.imwrite is automatically created for us. So, I thought maybe I don't need to create it beforehand for non-existing folders?
https://github.com/open-mmlab/mmcv/blob/bd1da5ab473eceb0dbeaf7cf3f57ba846a92e44b/mmcv/image/io.py#L282

@gaotongxiao
Copy link
Collaborator

gaotongxiao commented Oct 18, 2022

Good catch! Right, MMCV can handle this case.

The code overall looks good to me. The final step would be to revise the documentation (EN, CN). If you search for "output", there will be 2 places where the output path refers to a file. The parameter description also needs to be updated.

@Enchanted0911
Copy link
Author

Good catch! Right, MMCV can handle this case.

The code overall looks good to me. The final step would be to revise the documentation (EN, CN). If you search for "output", there will be 2 places where the output path refers to a file. The parameter description also needs to be updated.

I thought that the modification of the document should have another PR.

Copy link
Author

@Enchanted0911 Enchanted0911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done

@codecov
Copy link

codecov bot commented Oct 18, 2022

Codecov Report

Base: 83.63% // Head: 83.71% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (260ee65) compared to base (c0ebf3d).
Patch coverage: 0.00% of modified lines in pull request are covered.

❗ Current head 260ee65 differs from pull request most recent head 8a14929. Consider uploading reports for the commit 8a14929 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1457      +/-   ##
==========================================
+ Coverage   83.63%   83.71%   +0.07%     
==========================================
  Files         170      170              
  Lines       11365    11363       -2     
  Branches     1694     1693       -1     
==========================================
+ Hits         9505     9512       +7     
+ Misses       1481     1473       -8     
+ Partials      379      378       -1     
Flag Coverage Δ
unittests 83.71% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mmocr/utils/ocr.py 77.24% <0.00%> (-0.12%) ⬇️
mmocr/datasets/pipelines/transforms.py 83.99% <0.00%> (+1.42%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@gaotongxiao
Copy link
Collaborator

Thanks!

@gaotongxiao gaotongxiao merged commit 57b2a76 into open-mmlab:main Oct 19, 2022
@Enchanted0911 Enchanted0911 deleted the main branch October 19, 2022 02:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants