テーブルの画像を処理してデータを取得する

Question

私はこのテーブルの画像を持っています（以下を参照）。そして、私はこのフォーム（テーブル画像の最初の行）のように、テーブルからデータを取得しようとしています：

rows[0] = [x,x, , , , ,x, ,x,x, ,x, ,x, , , , ,x, , , ,x,x,x, ,x, ,x, , , , ]

Xの数とスペースの数が必要です。これに似た他のテーブル画像もあります（すべてxと同じ数の列があります）。

enter image description here

これまでのところ、xの画像を使用してすべてのxを検出することができます。そして、私は線をいくらか検出することができます。私はPython用のopencv2を使用しています。また、houghTransformを使用して、水平線と垂直線を検出しています（これは非常にうまく機能します）。

行ごとに移動して情報をリストに保存する方法を見つけようとしています。

これらはトレーニング画像です：xを検出するために使用されます（コード内のtrain1.png） enter image description here

行の検出に使用されます（コード内のtrain2.png） enter image description here

行の検出に使用されます（コード内のtrain3.png） enter image description here

これは私がこれまでに持っているコードです：

# process images from pytesser import * from PIL import Image from matplotlib import pyplot as plt import pytesseract import numpy as np import cv2 import math import os # the table images images = ['table1.png', 'table2.png', 'table3.png', 'table4.png', 'table5.png'] # the template images used for training templates = ['train1.png', 'train2.png', 'train3.png'] def hough_transform(im): img = cv2.imread('imgs/'+im) gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) edges = cv2.Canny(gray, 50, 150, apertureSize=3) lines = cv2.HoughLines(edges, 1, np.pi/180, 200) i = 1 for rho, theta in lines[0]: a = np.cos(theta) b = np.sin(theta) x0 = a*rho y0 = b*rho x1 = int(x0 + 1000*(-b)) y1 = int(y0 + 1000*(a)) x2 = int(x0 - 1000*(-b)) y2 = int(y0 - 1000*(a)) #print '%s - 0:(%s,%s) 1:(%s,%s), 2:(%s,%s)' % (i,x0,y0,x1,y1,x2,y2) cv2.line(img, (x1,y1), (x2,y2), (0,0,255), 2) i += 1 fn = os.path.splitext(im)[0]+'-lines' cv2.imwrite('imgs/'+fn+'.png', img) def match_exes(im, te): img_rgb = cv2.imread('imgs/'+im) img_gry = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY) template = cv2.imread('imgs/'+te, 0) w, h = template.shape[::-1] res = cv2.matchTemplate(img_gry, template, cv2.TM_CCOEFF_NORMED) threshold = 0.71 loc = np.where(res >= threshold) pts = [] exes = [] blanks = [] for pt in Zip(*loc[::-1]): pts.append(pt) cv2.rectangle(img_rgb, pt, (pt[0]+w, pt[1]+h), (0,0,255), 1) fn = os.path.splitext(im)[0]+'-exes' cv2.imwrite('imgs/'+fn+'.png', img_rgb) return pts, exes, blanks def match_horizontal_lines(im, te, te2): img_rgb = cv2.imread('imgs/'+im) img_gry = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY) template = cv2.imread('imgs/'+te, 0) w1, h1 = template.shape[::-1] template2 = cv2.imread('imgs/'+te2, 0) w2, h2 = template2.shape[::-1] # first line template (the downward facing line) res1 = cv2.matchTemplate(img_gry, template, cv2.TM_CCOEFF_NORMED) threshold1 = 0.8 loc1 = np.where(res1 >= threshold1) # second line template (the upward facing line) res2 = cv2.matchTemplate(img_gry, template2, cv2.TM_CCOEFF_NORMED) threshold2 = 0.8 loc2 = np.where(res2 >= threshold2) pts = [] exes = [] blanks = [] # find first line template (the downward facing line) for pt in Zip(*loc1[::-1]): pts.append(pt) cv2.rectangle(img_rgb, pt, (pt[0]+w1, pt[1]+h1), (0,0,255), 1) # find second line template (the upward facing line) for pt in Zip(*loc2[::-1]): pts.append(pt) cv2.rectangle(img_rgb, pt, (pt[0]+w2, pt[0]+h2), (0,0,255), 1) fn = os.path.splitext(im)[0]+'-horiz' cv2.imwrite('imgs/'+fn+'.png', img_rgb) return pts, exes, blanks # process text = '' for img in images: print 'processing %s' % img hough_transform(img) pts, exes, blanks = match_exes(img, templates[0]) pts1, exes1, blanks1 = match_horizontal_lines(img, templates[1], templates[2]) text += '%s: %s x\'s & %s horizontal lines
' % (img, len(pts), len(pts1)) # statistics file outputFile = open('counts.txt', 'w') outputFile.write(text) outputFile.close()

また、出力画像は次のようになります（ご覧のとおり、すべてのxが検出されますが、すべての行は検出されません）x enter image description here

水平線 enter image description here

ハフ変換 enter image description here

私が言ったように、私は実際には次のフォーム（テーブル画像の最初の行）のように、テーブルからデータを取得しようとしています：

row a = [x,x, , , , ,x, ,x,x, ,x, ,x, , , , ,x, , , ,x,x,x, ,x, ,x, , , , ]

Xの数とスペースの数が必要です。これに類似した他のテーブルイメージもあります（すべてxがあり、列の数が同じで、行の数が異なります）。

また、私はpython 2.7を使用しています

user · Accepted Answer

わかりました、私はそれを理解しました。 @beakerから提供されたグリッド線の間を見るという提案を使用しました。

その前に、ハフ変換コードから重複行を削除する必要がありました。次に、残りの行を垂直と水平の2つのリストに並べ替えました。そこから、水平、次に垂直をループして、関心領域（roi）画像を作成できます。各roiイメージは、テーブルマスターイメージの「セル」を表します。これらの各セルの輪郭を確認したところ、セルに「x」がある場合はlen(contours) >= 2であることがわかりました。したがって、len(contours) < 2は空白スペースでした（これを理解するためにいくつかのテストプログラムを実行しました）。これが私がそれを機能させるために使用したコードです：

import cv2 import numpy as np import os # the list of images (tables) images = ['table1.png', 'table2.png', 'table3.png', 'table4.png', 'table5.png'] # the list of templates (used for template matching) templates = ['train1.png'] def remove_duplicates(lines): # remove duplicate lines (lines within 10 pixels of eachother) for x1, y1, x2, y2 in lines: for index, (x3, y3, x4, y4) in enumerate(lines): if y1 == y2 and y3 == y4: diff = abs(y1-y3) Elif x1 == x2 and x3 == x4: diff = abs(x1-x3) else: diff = 0 if diff < 10 and diff is not 0: del lines[index] return lines def sort_line_list(lines): # sort lines into horizontal and vertical vertical = [] horizontal = [] for line in lines: if line[0] == line[2]: vertical.append(line) Elif line[1] == line[3]: horizontal.append(line) vertical.sort() horizontal.sort(key=lambda x: x[1]) return horizontal, vertical def hough_transform_p(image, template, tableCnt): # open and process images img = cv2.imread('imgs/'+image) gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) edges = cv2.Canny(gray, 50, 150, apertureSize=3) # probabilistic hough transform lines = cv2.HoughLinesP(edges, 1, np.pi/180, 200, minLineLength=20, maxLineGap=999)[0].tolist() # remove duplicates lines = remove_duplicates(lines) # draw image for x1, y1, x2, y2 in lines: cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 1) # sort lines into vertical & horizontal lists horizontal, vertical = sort_line_list(lines) # go through each horizontal line (aka row) rows = [] for i, h in enumerate(horizontal): if i < len(horizontal)-1: row = [] for j, v in enumerate(vertical): if i < len(horizontal)-1 and j < len(vertical)-1: # every cell before last cell # get width & height width = horizontal[i+1][1] - h[1] height = vertical[j+1][0] - v[0] else: # last cell, width = cell start to end of image # get width & height width = tW height = tH tW = width tH = height # get roi (region of interest) to find an x roi = img[h[1]:h[1]+width, v[0]:v[0]+height] # save image (for testing) dir = 'imgs/table%s' % (tableCnt+1) if not os.path.exists(dir): os.makedirs(dir) fn = '%s/roi_r%s-c%s.png' % (dir, i, j) cv2.imwrite(fn, roi) # if roi contains an x, add x to array, else add _ roi_gry = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY) ret, thresh = cv2.threshold(roi_gry, 127, 255, 0) contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) if len(contours) > 1: # there is an x for 2 or more contours row.append('x') else: # there is no x when len(contours) is <= 1 row.append('_') row.pop() rows.append(row) # save image (for testing) fn = os.path.splitext(image)[0] + '-hough_p.png' cv2.imwrite('imgs/'+fn, img) def process(): for i, img in enumerate(images): # perform probabilistic hough transform on each image hough_transform_p(img, templates[0], i) if __name__ == '__main__': process()

したがって、サンプル画像： enter image description here

そして、出力（テキストファイルを生成するコードは簡潔にするために削除されました）： enter image description here

ご覧のとおり、テキストファイルには画像と同じ位置に同じ数のxが含まれています。難しい部分が終わったので、私は自分の割り当てを続けることができます！