mirror of https://github.com/mealie-recipes/mealie.git synced 2025-07-27 17:19:40 +02:00
mealie/tests/unit_tests/services_tests/test_ocr_service.py


feat (WIP): bring png OCR scanning support (#1670) * Add pytesseract * Add simple ocr endpoint replace extension argument * feat/ocr-editor gui * fix frontend linting issues * Add service unit tests * Add split text modes & single ingredient/instruction editing * make split mode really reactive * Remove default step and ingredient * make the linter haappy * Accept only image uploads * Add automatic recipe title suggestion * Correct regex * fix incorrect array.map method usage * make the linter happy again * Swap route to use asset name * Rearange buttons * fix test data * feat: Allow making image the recipe image * Add translation * Make the linter happy * Restrict function setPropertyValueByPath generic * Restrict template literal type * Add a more friendly icon to creation page * update poetry lock file * Correct sloppy ocr classes * Make MyPy happy * Rewrite safer tests * Add tesseract to backend test CI container dependencies * Make canvas element a component global * Remove unwanted spaces in selected text * Add way to know if recipe was created with ocr * Access to ocr-editor for ocr recipes * Update Alembic revision * Make the frontend build * Fix scrolling offset bug * Allow creation of recipes with custom settings * Fix rebasing mistakes * Add format_tsv_output test * Exclude the tests data directory only * Enforce camelCase for frontend functions * Remove import of unused component * Fix type and class initialization * Add multi-language support * Highlight words in mount * Fix image ratio bug * Better ocr creation page * Revert awkward feature to scroll in Selection mode * Rebasing alembic migrations sux * Remove obsolete getShared function * Add function docstring * Move down ocr creation option * Make toolbar icons more generic * Show help at the bottom of the page * move ocr types to own file * Use template ref for the canvas * Use i18n.tc to get strings directly * Correct naming mistake * Move Ocr editor to own directory * Create Ocr Editor parts * 
Safeguard recipe properties access * Add loading frontend animation due to longer request time * minor cleanup chores Co-authored-by: Miroito <alban.vachette@gmail.com>
2022-09-25 15:00:45 -08:00
from pathlib import Path

import pytest

from mealie.services.ocr.pytesseract import OcrService

ocr_service = OcrService()


@pytest.mark.skip("Tesseract is not reliable between environments")
def test_image_to_string():
    with open(Path("tests/data/images/test-ocr.png"), "rb") as image:
        result = ocr_service.image_to_string(image)
        with open(Path("tests/data/text/test-ocr.txt"), encoding="utf-8") as expected_result:
            assert result == expected_result.read()


@pytest.mark.skip("Tesseract is not reliable between environments")
def test_image_to_tsv():
    with open(Path("tests/data/images/test-ocr.png"), "rb") as image:
        result = ocr_service.image_to_tsv(image.read())
        with open(Path("tests/data/text/test-ocr.tsv"), encoding="utf-8") as expected_result:
            assert result == expected_result.read()


def test_format_tsv_output():
    tsv = " level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext \n1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t87.137558\tThis"

    expected_result = [
        {
            "level": 1,
            "page_num": 1,
            "block_num": 0,
            "par_num": 0,
            "line_num": 0,
            "word_num": 0,
            "left": 0,
            "top": 0,
            "width": 640,
            "height": 480,
            "conf": -1.0,
            "text": "",
        },
        {
            "level": 5,
            "page_num": 1,
            "block_num": 1,
            "par_num": 1,
            "line_num": 1,
            "word_num": 1,
            "left": 36,
            "top": 92,
            "width": 60,
            "height": 24,
            "conf": 87.137558,
            "text": "This",
        },
    ]

    assert ocr_service.format_tsv_output(tsv) == expected_result
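
For readers without the service source at hand, the conversion that `test_format_tsv_output` checks can be sketched roughly as follows. This is an illustrative stand-in, not Mealie's actual `format_tsv_output`: the function name `parse_tesseract_tsv` is hypothetical, and the typing rule (`text` kept as a string, `conf` cast to float, every other column cast to int) is an assumption inferred only from the expected values in the test.

```python
from __future__ import annotations


def parse_tesseract_tsv(tsv: str) -> list[dict[str, object]]:
    """Hypothetical helper: turn Tesseract-style TSV text into typed row dicts."""
    lines = tsv.split("\n")
    # Header names may carry stray spaces (" level", "text "), so strip each one.
    header = [name.strip() for name in lines[0].split("\t")]

    rows: list[dict[str, object]] = []
    for line in lines[1:]:
        if not line:
            continue  # ignore empty lines, e.g. after a trailing newline
        row: dict[str, object] = {}
        for name, value in zip(header, line.split("\t")):
            if name == "text":
                row[name] = value  # recognized text stays a string (may be empty)
            elif name == "conf":
                row[name] = float(value)  # assumed: confidence is a float
            else:
                row[name] = int(value)  # assumed: all other columns are integers
        rows.append(row)
    return rows


sample = "level\tconf\ttext\n1\t-1\t\n5\t87.137558\tThis"
parsed = parse_tesseract_tsv(sample)
```

The short `sample` string uses only three columns to keep the sketch readable; the same function also handles the twelve-column input used in the test above.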