AssertionError from add_baseline(geom) #100

Open
MehmedGIT opened this issue May 22, 2024 · 3 comments

@MehmedGIT

I have hit an issue on one specific page, but I am not sure what exactly the problem is, other than that the page seems to be empty:

20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Running processor <class 'ocrd_cis.ocropy.segment.OcropySegment'>
20:43:59.071 DEBUG ocrd.processor.helpers.run_processor - Processor instance <ocrd_cis.ocropy.segment.OcropySegment object at 0x7f3d684b9220> (ocrd-cis-ocropy-segment v0.1.5 doing layout/segmentation/region)
20:43:59.072 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'mimetype': None, 'page_id': 'PHYS_0510', 'file_grp': 'OCR-D-CLIP'})
20:43:59.246 DEBUG ocrd.processor.base - adding file FILE_0510_OCR-D-CLIP for page PHYS_0510 to input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0004.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0007.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0008.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0009.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0010.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.processor.base - another file FILE_0510_OCR-D-CLIP_region0011.IMG-CLIP for page PHYS_0510 in input file group OCR-D-CLIP
20:43:59.246 DEBUG ocrd.workspace.download_file - 'local_filename' OCR-D-CLIP/FILE_0510_OCR-D-CLIP.xml already within /vd18_data/PPN831977752_513pages - nothing to do
20:43:59.249 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:43:59.706 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0510_DEFAULT.jpg'})
20:44:00.485 DEBUG ocrd.workspace.image_from_page - page 'FILE_0510_OCR-D-CLIP' has border, orientation=0 skew=0.00
20:44:00.485 DEBUG ocrd.workspace.image_from_page - Using AlternativeImage 5 {'', 'deskewed', 'cropped', 'binarized', 'clipped', 'despeckled'} for page 'FILE_0510_OCR-D-CLIP'
20:44:00.485 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-SEG-BLOCK-TESSERACT/FILE_0510_OCR-D-SEG-BLOCK-TESSERACT.IMG-BIN.png'})
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-46 -49]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-359.  -618.5]
20:44:01.273 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.00° around [359.  618.5]
20:44:01.273 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [359.  618.5]
20:44:01.274 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:01.277 DEBUG ocrd.utils.crop_image - cropping image to (346, 828, 381, 863)
20:44:01.278 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-346 -828]
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - segment 'region0000' has orientation=0 skew=0.01
20:44:01.279 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0000'
20:44:01.279 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0000.IMG-CLIP.png'})
20:44:02.078 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-17.5 -17.5]
20:44:02.078 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [17.5 17.5]
20:44:02.079 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [17.50371747 17.50371747]
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0000' by 0.01°
20:44:02.079 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.079 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0000'
20:44:02.080 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 35, 35)
20:44:02.080 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:02.083 DEBUG ocrd.utils.crop_image - cropping image to (261, 435, 367, 845)
20:44:02.085 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [-261 -435]
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - segment 'region0001' has orientation=0 skew=0.01
20:44:02.085 DEBUG ocrd.workspace.image_from_segment - Using AlternativeImage 1 {'despeckled', 'clipped', 'binarized'} for segment 'region0001'
20:44:02.085 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_vd18_data_PPN831977752_513pages_mets_xml.sock] - find_files({'local_filename': 'OCR-D-CLIP/FILE_0510_OCR-D-CLIP_region0001.IMG-CLIP.png'})
20:44:02.835 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ -53. -205.]
20:44:02.836 DEBUG ocrd.utils.coords.rotate_coordinates - rotating coordinates by 0.01° around [ 53. 205.]
20:44:02.836 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [ 53.04355096 205.0112552 ]
20:44:02.836 DEBUG ocrd.workspace.image_from_segment - Rotating AlternativeImage for segment 'region0001' by 0.01°
20:44:02.836 DEBUG ocrd.utils.rotate_image - rotating image by 0.01°
20:44:02.837 DEBUG ocrd.workspace.image_from_segment - Recropping AlternativeImage for segment 'region0001'
20:44:02.839 DEBUG ocrd.utils.crop_image - cropping image to (0, 0, 106, 410)
20:44:02.839 DEBUG ocrd.utils.coords.shift_coordinates - shifting coordinates by [0 0]
20:44:03.055 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-cis-ocropy-segment'
Traceback (most recent call last):
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/processor/helpers.py", line 130, in run_processor
    processor.process()
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 500, in process
    self._process_element(region, ignore, region_image, region_coords,
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 788, in _process_element
    line_polygons, _ = masks2polygons(line_labels, baselines, element_bin,
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 234, in masks2polygons
    base = join_baselines([baseline.intersection(polygon)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 959, in join_baselines
    add_baseline(geom)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 951, in add_baseline
    assert all(p1[0] < p2[0] for p1, p2 in zip(result[:-1], result[1:])), result
AssertionError: [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0), (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0), (91.0, 274.0), (91.0, 282.0), (92.0, 274.0), (92.0, 282.0), (93.0, 274.0), (93.0, 282.0), (94.0, 273.0), (94.0, 282.0), (95.0, 273.0), (95.0, 282.0), (96.0, 273.0), (96.0, 283.0), (97.0, 273.0), (97.0, 283.0), (98.0, 273.0), (98.0, 283.0), (99.0, 273.0), (99.0, 283.0), (100.0, 272.0), (100.0, 283.0), (101.0, 272.0), (101.0, 283.0), (102.0, 272.0), (102.0, 283.0), (103.0, 272.0), (103.0, 283.0), (104.0, 272.0), (104.0, 283.0), (105.0, 272.0)]

The workflow used:

cis-ocropy-binarize      -I DEFAULT                   -O OCR-D-BINPAGE             -P dpi 300
anybaseocr-crop          -I OCR-D-BINPAGE             -O OCR-D-SEG-PAGE-ANYOCR     -P dpi 300
cis-ocropy-denoise       -I OCR-D-SEG-PAGE-ANYOCR     -O OCR-D-DENOISE-OCROPY      -P dpi 300
cis-ocropy-deskew        -I OCR-D-DENOISE-OCROPY      -O OCR-D-DESKEW-OCROPY       -P level-of-operation page
tesserocr-segment-region -I OCR-D-DESKEW-OCROPY       -O OCR-D-SEG-BLOCK-TESSERACT -P dpi 300 -P padding 5.0  -P find_tables false
segment-repair           -I OCR-D-SEG-BLOCK-TESSERACT -O OCR-D-SEGMENT-REPAIR      -P plausibilize true       -P plausibilize_merge_min_overlap 0.7
cis-ocropy-clip          -I OCR-D-SEGMENT-REPAIR      -O OCR-D-CLIP
cis-ocropy-segment       -I OCR-D-CLIP                -O OCR-D-SEGMENT-OCROPY      -P dpi 300
cis-ocropy-dewarp        -I OCR-D-SEGMENT-OCROPY      -O OCR-D-DEWARP
tesserocr-recognize      -I OCR-D-DEWARP              -O OCR-D-OCR                 -P model Fraktur

Here is the problematic image of page 510:
FILE_0510_DEFAULT

It is worth mentioning that other similar pages did not fail. E.g. pages 508 and 509:

FILE_0508_DEFAULT
FILE_0509_DEFAULT

@bertsky
Collaborator

bertsky commented May 24, 2024

  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 951, in add_baseline
    assert all(p1[0] < p2[0] for p1, p2 in zip(result[:-1], result[1:])), result
AssertionError: [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0), (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0), (91.0, 274.0), (91.0, 282.0), (92.0, 274.0), (92.0, 282.0), (93.0, 274.0), (93.0, 282.0), (94.0, 273.0), (94.0, 282.0), (95.0, 273.0), (95.0, 282.0), (96.0, 273.0), (96.0, 283.0), (97.0, 273.0), (97.0, 283.0), (98.0, 273.0), (98.0, 283.0), (99.0, 273.0), (99.0, 283.0), (100.0, 272.0), (100.0, 283.0), (101.0, 272.0), (101.0, 283.0), (102.0, 272.0), (102.0, 283.0), (103.0, 272.0), (103.0, 283.0), (104.0, 272.0), (104.0, 283.0), (105.0, 272.0)]

Thanks @MehmedGIT for the detailed report!

What this means is that, while trying to join the baseline segments for the line and ordering them by their x coordinate, the resulting sequence did not turn out strictly monotonic (some x values repeat):

baseline-points

Obviously, we are trying to extract a single baseline from two neighbouring lines here.
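
To make the failure easier to see, here is a minimal sketch that replays the condition from the failing `assert` on the first few points of the error message. Only the `all(...)` expression is taken from the traceback; the helper names `first_violation` and `ys_by_repeated_x` are made up for illustration and are not part of `ocrd_cis`.

```python
from collections import defaultdict

# Leading points copied from the AssertionError message in this issue.
points = [(52.0, 277.0), (74.5, 279.5), (77.875, 279.875), (88.0, 281.0),
          (89.0, 275.0), (89.0, 281.0), (90.0, 275.0), (90.0, 281.0),
          (91.0, 274.0), (91.0, 282.0)]

def is_strictly_increasing_in_x(pts):
    """Same condition as the failing assert: each x must exceed the previous x."""
    return all(p1[0] < p2[0] for p1, p2 in zip(pts[:-1], pts[1:]))

def first_violation(pts):
    """Hypothetical helper: index of the first point whose x does not increase."""
    for i, (p1, p2) in enumerate(zip(pts[:-1], pts[1:]), start=1):
        if p1[0] >= p2[0]:
            return i
    return None

def ys_by_repeated_x(pts):
    """Hypothetical helper: map each x that occurs more than once to its y values."""
    by_x = defaultdict(list)
    for x, y in pts:
        by_x[x].append(y)
    return {x: ys for x, ys in by_x.items() if len(ys) > 1}

print(is_strictly_increasing_in_x(points))
print(first_violation(points), points[first_violation(points)])
print(ys_by_repeated_x(points))
```

In this excerpt, every repeated x carries two y values that are 6 to 8 pixels apart, which is consistent with points from two different baselines being merged into one sequence.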

I'll try to reproduce and see what I can do.

@bertsky
Collaborator

bertsky commented May 29, 2024

Note: I cannot reproduce this with the current head of https://github.com/bertsky/ocrd_cis/tree/fix-alpha-shape and ocrd_tesserocr (based on Tesseract 5.3.4) and ocrd_segment. The workflow runs through; here is a screenshot from OCRD Browser:

OCR-D-OCR

Could you please try to update said modules and try again?

@MehmedGIT
Author

> Could you please try to update said modules and try again?

I will. Thanks for having a look.
