How can I use an OCR tool to instantly extract text from a screen region?

Question

On Ubuntu 12.10, when I type

gnome-screenshot -a | tesseract output

it returns:

** Message: Unable to use GNOME Shell's builtin screenshot interface, resorting to fallback X11.

How can I select text on the screen and convert it to text (on the clipboard or in a document)?

Thank you!

Salem · Accepted Answer

There may already be a tool that does this, but you can also write a simple script around a screenshot tool and tesseract, which you are already trying to use.

As an example, take this script (on my system I saved it as /usr/local/bin/screen_ts):

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot

# Quick language menu, add more if you need other languages.
select tesseract_lang in eng rus equ; do break; done

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT

# Increase quality with option -q from the default 75 to 100.
# Typo "$SCR_IMG.png000" does not continue with same name.
scrot -s "$SCR_IMG.png" -q 100

# Should increase the detection rate.
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

# Pass the language chosen in the menu to tesseract (falls back to eng).
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null
cat "$SCR_IMG.txt"
exit

With clipboard support:

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot xsel

# Quick language menu, add more if you need other languages.
select tesseract_lang in eng rus equ; do break; done

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT

# Increase image quality with option -q from the default 75 to 100.
scrot -s "$SCR_IMG.png" -q 100

# Should increase the detection rate.
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

# Pass the language chosen in the menu to tesseract (falls back to eng).
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null

# Copy the recognized text to the clipboard.
xsel -bi < "$SCR_IMG.txt"
exit

It uses scrot to capture the selected screen region, tesseract to recognize the text, and cat to print the result. The clipboard version additionally uses xsel to pipe the output to the clipboard.
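The capture, recognize, and output steps described above can also be chained without the intermediate .txt file. This is only a minimal sketch, assuming a tesseract version that accepts `stdout` as the output base (older releases may not) and that xsel is installed; the function names `ocr_image` and `ocr_to_clipboard` are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Sketch only: ocr_image and ocr_to_clipboard are hypothetical helper names.

# Print the text recognized in image $1, using language $2 (default: eng).
# Assumes a tesseract version that accepts "stdout" as the output base.
ocr_image() {
    tesseract "$1" stdout -l "${2:-eng}" 2>/dev/null
}

# Same as ocr_image, but send the text to the clipboard via xsel.
ocr_to_clipboard() {
    ocr_image "$@" | xsel -bi
}
```

For example, after `scrot -s /tmp/shot.png`, calling `ocr_to_clipboard /tmp/shot.png eng` would place the recognized text on the clipboard.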

NOTE: scrot, xsel, imagemagick, and tesseract-ocr are not installed by default, but they are available from the default repositories.
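Because none of these tools is installed by default, a script can check for them before taking the screenshot. A minimal sketch follows; `missing_deps` is a hypothetical helper name, and note that it checks command names (for example `mogrify` and `tesseract`), which differ from the package names imagemagick and tesseract-ocr.

```shell
#!/bin/bash
# Sketch only: missing_deps is a hypothetical helper name.

# Print every command from the argument list that is not found in PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Possible use at the top of the OCR script:
#   missing=$(missing_deps scrot mogrify tesseract xsel)
#   [ -z "$missing" ] || { echo "Missing commands: $missing" >&2; exit 1; }
```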

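You can replace scrot with gnome-screenshot, but that may take a fair amount of work. As for the output, you can use anything that can read a text file (open it in a text editor, show the recognized text as a notification, and so on).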

GUI version of the script

Here is a simple graphical version of the OCR script that includes a language selection dialog:

#!/bin/bash
# DEPENDENCIES: tesseract-ocr imagemagick scrot yad xsel
# AUTHOR:       Glutanimate 2013 (http://askubuntu.com/users/81372/)
# NAME:         ScreenOCR
# LICENSE:      GNU GPLv3
#
# BASED ON:     OCR script by Salem (http://askubuntu.com/a/280713/81372)

TITLE=ScreenOCR # set yad variables
ICON=gnome-screenshot

# - tesseract won't work if LC_ALL is unset so we set it here
# - you might want to delete or modify this line if you
#   have a different locale:
export LC_ALL=en_US.UTF-8

# language selection dialog
# (stored in OCR_LANG so the LANG locale variable is not overwritten)
OCR_LANG=$(yad \
    --width 300 --entry --title "$TITLE" \
    --image=$ICON \
    --window-icon=$ICON \
    --button="ok:0" --button="cancel:1" \
    --text "Select language:" \
    --entry-text \
    "eng" "ita" "deu")
RET=$? # check return status

# - You can modify the list of available languages by editing the line above
# - Make sure to use the same ISO codes tesseract does (man tesseract for details)
# - Languages will of course only work if you have installed their respective
#   language packs (https://code.google.com/p/tesseract-ocr/downloads/list)

if [ "$RET" = 252 ] || [ "$RET" = 1 ] # WM-Close or "cancel"
then
    exit
fi

echo "Language set to $OCR_LANG"

SCR_IMG=$(mktemp)                  # create tempfile
trap 'rm -f "$SCR_IMG"*' EXIT      # make sure tempfiles get deleted afterwards

scrot -s "$SCR_IMG.png" -q 100     # take screenshot of area
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png" # postprocess to prepare for OCR
tesseract -l "$OCR_LANG" "$SCR_IMG.png" "$SCR_IMG"  # OCR in given language
xsel -bi < "$SCR_IMG.txt"          # pass tesseract's .txt output to clipboard
exit

Apart from the dependencies listed above, you need to install YAD, a Zenity fork, from the webupd8 PPA for the script to work.

Ronald · Answer

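I'm not sure whether anyone needs my solution. Here is one that runs on Wayland.

It shows the recognized text in a text editor, and if you add the parameter "yes" you get a translation from the google-trans tool (an internet connection is required). Before using it, install tesseract-ocr, imagemagick, and google-trans. When the text you want to recognize is on screen, start the script in GNOME with Alt+F2, then drag the cursor around the text. That's it. This script is GNOME-only; for other window managers you will need to adapt it. To translate the text into a different language, replace the language ID, that is, the :de argument passed to trans (line 25 in the author's numbering).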

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick google-trans

# Pass "yes" as the first argument to translate the recognized text.
translate="${1:-no}"

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT

# -a: select an area, -f: write the screenshot to the given file
gnome-screenshot -a -f "$SCR_IMG.png"

# should increase detection rate
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null

if [ "$translate" = "yes" ]; then
    trans :de "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
    gnome-text-editor "$SCR_IMG.translate.txt"
else
    gnome-text-editor "$SCR_IMG.txt"
fi
exit
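Rather than editing the script for each target language, the language ID could be taken from a second argument. This is only a sketch, assuming the `:de`-style target syntax already used in the script above; `trans_target` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: trans_target is a hypothetical helper name.

# Build the target-language argument for trans from a language code.
# Falls back to "de", the language hard-coded in the script above.
trans_target() {
    printf ':%s\n' "${1:-de}"
}

# Possible use inside the script, with the language code as second argument:
#   trans "$(trans_target "$2")" "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
```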

林果皞 · Answer

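I just blogged about a modern way of using screenshots. The post targets Chinese readers, but the screencast and code are in English. OCR is just one of the features.

OCR features:

Opens the result in Konsole + vimx or gedit for further editing.
Enables spell check when using vimx with English.
Supports dynamic language selection with nothing hard-coded.
Shows progress dialogs during the slow converting and tesseracting steps.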

Function code:

function ocr () {
    tmpj="$1"
    tmpocr="$2"
    tmpocr_p="$3"
    atom="$(tesseract --list-langs 2>&1)"
    atom=(`echo "${atom#*:}"`)
    atom=(`echo "$(printf 'FALSE\n%s\n' "${atom[@]}")"`)
    atom[0]='True'
    ans=(`yad --center --height=200 --width=300 --separator='|' --on-top \
        --list --title '' --text='Select Languages:' --radiolist \
        --column '✓' --column 'Languages' "${atom[@]}" 2>/dev/null`) &&
    ans="$(echo "${ans:5:-1}")" &&
    convert "$tmpj[x2000]" -unsharp 15.6x7.8+2.69+0 "$tmpocr_p" |
        yad --on-top --title '' --text='Converting ...' --progress --pulsate --auto-close 2>/dev/null &&
    tesseract "$tmpocr_p" "$tmpocr" -l "$ans" 2>>/tmp/tesseract.log |
        yad --percentage=50 --on-top --title '' --text='Tesseracting ...' --progress --pulsate --auto-close 2>/dev/null &&
    if [[ "$ans" == 'eng' ]]; then
        konsole -e "vimx -c 'setlocal spell spelllang=en_us' -n $tmpocr.txt" 2>/dev/null
    else
        gedit "$tmpocr.txt"
    fi
    rm "$tmpocr_p"
}

Caller code:

for cmd in "mktemp" "convert" "tesseract" "gedit" "konsole" "vimx" "yad"; do
    command -v $cmd >/dev/null 2>&1 || { LANG=POSIX; xmessage "Require $cmd but it's not installed. Aborting." >&2; exit 1; }; :;
done
tmpj="$(mktemp /tmp/`date +"%s_%Y-%m-%d"`_XXXXXXXXXX.png)"
tmpocr="$(mktemp -u /tmp/`date +"%s_%Y-%m-%d"`_ocr_XXXXX)"
tmpocr_p="$tmpocr"+'.png'
gnome-screenshot -a -f "$tmpj" 2>&1 >/dev/null | ts >>/tmp/gnome_area_PrtSc_error.log
ocr $tmpj $tmpocr $tmpocr_p &

Combine these two pieces of code into a single shell script and run it.

Eduard Florinescu · Answer

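The idea is that whenever a new screenshot file appears in a folder, tesseract OCR is run on it and the resulting text is opened in an editor.

You can leave this script running in the output directory of your favorite screenshot tool: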

# cat wait_for_it.sh
inotifywait -m . -e create -e moved_to |
while read path action file; do
    echo "The file '$file' appeared in directory '$path' via '$action'"
    cd "$path"
    # Only run OCR on files whose name ends in .png
    if [ "${file: -4}" == ".png" ]; then
        tesseract "$file" "$file"   # writes the text to "$file.txt"
        sleep 1
        gedit "$file".txt &
    fi
done

You will need to install these:

sudo apt install tesseract-ocr
sudo apt install inotify-tools
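The watcher script above matches only a lowercase .png suffix. If your screenshot tool writes names such as Shot.PNG, a case-insensitive check could be used instead. This is a minimal sketch; `is_png` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: is_png is a hypothetical helper name.

# Succeed if the file name ends in .png, ignoring case.
is_png() {
    case "$1" in
        *.[Pp][Nn][Gg]) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use inside the watcher loop:
#   if is_png "$file"; then tesseract "$file" "$file"; fi
```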

danpla · Answer

I wrote a free, open-source program for this purpose:

https://danpla.github.io/dpscreenocr/