スキャンしたドキュメントのテキスト行を分割する

Question

適応型しきい値処理されたスキャン文書のテキスト行を分割する方法を見つけようとしています。現在、ドキュメントのピクセル値を0〜255の符号なし整数として保存し、各行のピクセルの平均を取得し、ピクセル値の平均が250より大きい場合、これが成り立つ行の各範囲の中央値を取ります。ただし、画像に黒い斑点がある場合があるため、この方法は失敗することがあります。

このタスクを行うためのよりノイズに強い方法はありますか？

編集：ここにいくつかのコードがあります。「ワープ」は元の画像の名前、「カット」は画像を分割する場所です。

warped = threshold_adaptive(warped, 250, offset = 10) warped = warped.astype("uint8") * 255 # get areas where we can split image on whitespace to make OCR more accurate color_level = np.array([np.sum(line) / len(line) for line in warped]) cuts = [] i = 0 while(i < len(color_level)): if color_level[i] > 250: begin = i while(color_level[i] > 250): i += 1 cuts.append((i + begin)/2) # middle of the whitespace region else: i += 1

編集2：サンプル画像の追加

Miki · Accepted Answer

入力画像から、テキストを白、背景を黒にする必要があります

次に、紙幣の回転角度を計算する必要があります。簡単なアプローチは、すべての白色点（minAreaRect）のfindNonZeroを見つけることです。

次に、請求書を回転させて、テキストが水平になるようにします。

これで、水平射影（reduce）を計算できます。各行の平均値を取得できます。ヒストグラムにしきい値thを適用して、画像のノイズを考慮します（ここでは0、つまりノイズなし）。背景のみの行の値は>0、テキスト行の値は0ヒストグラム。次に、ヒストグラム内のホワイトビンの各連続シーケンスの平均ビン座標を取得します。これは、行のy座標になります。

ここにコード。これはC++ですが、ほとんどの作業はOpenCV関数を使用しているため、Pythonに簡単に変換できるはずです。少なくとも、これを参照として使用できます。

#include <opencv2/opencv.hpp> using namespace cv; using namespace std; int main() { // Read image Mat3b img = imread("path_to_image"); // Binarize image. Text is white, background is black Mat1b bin; cvtColor(img, bin, COLOR_BGR2GRAY); bin = bin < 200; // Find all white pixels vector<Point> pts; findNonZero(bin, pts); // Get rotated rect of white pixels RotatedRect box = minAreaRect(pts); if (box.size.width > box.size.height) { swap(box.size.width, box.size.height); box.angle += 90.f; } Point2f vertices[4]; box.points(vertices); for (int i = 0; i < 4; ++i) { line(img, vertices[i], vertices[(i + 1) % 4], Scalar(0, 255, 0)); } // Rotate the image according to the found angle Mat1b rotated; Mat M = getRotationMatrix2D(box.center, box.angle, 1.0); warpAffine(bin, rotated, M, bin.size()); // Compute horizontal projections Mat1f horProj; reduce(rotated, horProj, 1, CV_REDUCE_AVG); // Remove noise in histogram. White bins identify space lines, black bins identify text lines float th = 0; Mat1b hist = horProj <= th; // Get mean coordinate of white white pixels groups vector<int> ycoords; int y = 0; int count = 0; bool isSpace = false; for (int i = 0; i < rotated.rows; ++i) { if (!isSpace) { if (hist(i)) { isSpace = true; count = 1; y = i; } } else { if (!hist(i)) { isSpace = false; ycoords.Push_back(y / count); } else { y += i; count++; } } } // Draw line as final result Mat3b result; cvtColor(rotated, result, COLOR_GRAY2BGR); for (int i = 0; i < ycoords.size(); ++i) { line(result, Point(0, ycoords[i]), Point(result.cols, ycoords[i]), Scalar(0, 255, 0)); } return 0; }

Kinght 金 · Answer

基本的な手順@Miki、

ソースを読む

脱穀した

minAreaRectを見つける

回転した行列によるワープ

上限と下限を見つけて描く

WhilePythonのコード：

#!/usr/bin/python3 # 2018.01.16 01:11:49 CST # 2018.01.16 01:55:01 CST import cv2 import numpy as np ## (1) read img = cv2.imread("img02.jpg") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) ## (2) threshold th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU) ## (3) minAreaRect on the nozeros pts = cv2.findNonZero(threshed) ret = cv2.minAreaRect(pts) (cx,cy), (w,h), ang = ret if w>h: w,h = h,w ang += 90 ## (4) Find rotated matrix, do rotation M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0) rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0])) ## (5) find and draw the upper and lower boundary of each lines hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1) th = 2 H,W = img.shape[:2] uppers = [y for y in range(H-1) if hist[y]<=th and hist[y+1]>th] lowers = [y for y in range(H-1) if hist[y]>th and hist[y+1]<=th] rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR) for y in uppers: cv2.line(rotated, (0,y), (W, y), (255,0,0), 1) for y in lowers: cv2.line(rotated, (0,y), (W, y), (0,255,0), 1) cv2.imwrite("result.png", rotated)

最終的に結果：