
Tesseract

879 bytes added, 28 October 2021 (Thu) 07:00


= version 4 =

== LSTM (Long Short-Term Memory) based engine implementation ==

* LSTM is a type of RNN (Recurrent Neural Network)

== Other deep learning techniques ==

* CNN (Convolutional Neural Network): used when recognizing an image that contains a single character

== Installation ==

Official documentation: https://github.com/tesseract-ocr/tesseract/wiki#installation

=== Debian-based distributions ===

$ sudo apt install tesseract-ocr tesseract-ocr-kor

== Command usage ==

$ tesseract 영문텍스트.png stdout -l eng --oem 1 --psm 3
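The same call can be scripted. The following is a minimal Python sketch, assuming the `tesseract` binary is on `PATH`; the helper names `build_tesseract_command` and `run_tesseract` are hypothetical and not part of Tesseract. It only assembles the command-line options used on this page (`-l`, `--oem`, `--psm`) and passes them to the CLI.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base="stdout",
                            languages=("eng",), oem=1, psm=3):
    """Build the argv list for a tesseract CLI call (hypothetical helper)."""
    return [
        "tesseract",
        str(image_path),
        output_base,                 # "stdout" prints the text instead of writing a file
        "-l", "+".join(languages),   # several languages are joined with "+", e.g. eng+deu
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def run_tesseract(image_path, **options):
    """Run tesseract and return whatever it wrote to standard output."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, **options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `build_tesseract_command("myscan.png", output_base="out", languages=("eng", "deu"))` yields the arguments of a two-language call that writes its result to an output file based on `out`.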

=== Multiple languages ===

$ tesseract myscan.png out -l eng+deu

=== Parameters ===

==== oem (OCR Engine modes) ====

0 Legacy engine only.

1 Neural nets LSTM engine only.

2 Legacy + LSTM engines.

3 Default, based on what is available.
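When the engine mode is chosen from a script, naming the four values can make the intent easier to read. This is a small sketch only; `OcrEngineMode` and `oem_args` are hypothetical names introduced here, and the numeric values are the ones listed above.

```python
from enum import IntEnum


class OcrEngineMode(IntEnum):
    """Names for the --oem values listed above (the names are illustrative)."""
    LEGACY_ONLY = 0
    LSTM_ONLY = 1
    LEGACY_AND_LSTM = 2
    DEFAULT = 3


def oem_args(mode):
    """Return the --oem arguments for a mode, rejecting values outside 0-3."""
    mode = OcrEngineMode(mode)  # raises ValueError for an unknown value
    return ["--oem", str(int(mode))]
```

`oem_args(OcrEngineMode.LSTM_ONLY)` gives `["--oem", "1"]`, which matches the `--oem 1` used in the command-line example on this page.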

==== psm (Page segmentation modes) ====

0 Orientation and script detection (OSD) only.

1 Automatic page segmentation with OSD.

2 Automatic page segmentation, but no OSD, or OCR.

3 Fully automatic page segmentation, but no OSD. (Default)

4 Assume a single column of text of variable sizes.

5 Assume a single uniform block of vertically aligned text.

6 Assume a single uniform block of text.

7 Treat the image as a single text line.

8 Treat the image as a single word.

9 Treat the image as a single word in a circle.

10 Treat the image as a single character.

11 Sparse text. Find as much text as possible in no particular order.

12 Sparse text with OSD.

13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

using-different-page-segmentation-modes: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#using-different-page-segmentation-modes

== Improving results ==

* Use 300 DPI images

[[분류:라이브러리]][[분류:인공지능]]
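As a rough illustration of the 300 DPI tip, the sketch below computes the pixel size an image would need if it were resampled from its current resolution to 300 DPI. The function names are hypothetical, and the sketch assumes the current DPI of the scan is known. Resampling only changes the pixel dimensions, so rescanning at a higher resolution is likely the better option when it is available.

```python
def scale_factor_for_dpi(current_dpi, target_dpi=300):
    """Return the factor by which to resize an image to reach target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def scaled_size(width, height, current_dpi, target_dpi=300):
    """Return the (width, height) in pixels after resampling to target_dpi."""
    factor = scale_factor_for_dpi(current_dpi, target_dpi)
    return round(width * factor), round(height * factor)
```

For an 800 x 600 pixel image scanned at 150 DPI, `scaled_size(800, 600, 150)` returns `(1600, 1200)`; the resulting size can then be passed to whichever image library is used for resizing.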

Jhkim

Bureaucrat, administrator

2,431 edits

wwiki β
