Get the bounding box coordinates in the TensorFlow Object Detection API tutorial

Question

I am new to both Python and TensorFlow. I am trying to run the object_detection_tutorial file from the TensorFlow Object Detection API, but I cannot find where to get the coordinates of the bounding boxes when objects are detected.

Relevant code:

    # The following processing is only for single image
    detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
    detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])

...

This is the place where I assume the bounding boxes are drawn:

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)

I tried printing output_dict['detection_boxes'], but I am not sure what the numbers mean. There are a lot of them.

    array([[ 0.56213236,  0.2780568 ,  0.91445708,  0.69120586],
           [ 0.56261235,  0.86368728,  0.59286624,  0.8893863 ],
           [ 0.57073039,  0.87096912,  0.61292225,  0.90354401],
           [ 0.51422435,  0.78449738,  0.53994244,  0.79437423],
           ......
           [ 0.32784131,  0.5461576 ,  0.36972913,  0.56903434],
           [ 0.03005961,  0.02714229,  0.47211722,  0.44683522],
           [ 0.43143299,  0.09211366,  0.58121657,  0.3509962 ]], dtype=float32)

I found an answer to a similar question, but I do not have a variable called boxes as that answer does. How can I get the coordinates? Thank you!

MFisherKDX · Accepted Answer

"I tried printing output_dict['detection_boxes'], but I am not sure what the numbers mean."

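You can check the code yourself. visualize_boxes_and_labels_on_image_array is defined in object_detection/utils/visualization_utils.py, the module the tutorial imports as vis_util.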

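Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers [ 0.56213236, 0.2780568 , 0.91445708, 0.69120586] etc. are the values [ymin, xmin, ymax, xmax], each normalized to the range 0 to 1, where the image coordinates: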

    (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                  ymin * im_height, ymax * im_height)

are computed by the function:

    def draw_bounding_box_on_image(image,
                                   ymin,
                                   xmin,
                                   ymax,
                                   xmax,
                                   color='red',
                                   thickness=4,
                                   display_str_list=(),
                                   use_normalized_coordinates=True):
      """Adds a bounding box to an image.

      Bounding box coordinates can be specified in either absolute (pixel) or
      normalized coordinates by setting the use_normalized_coordinates argument.

      Each string in display_str_list is displayed on a separate line above the
      bounding box in black text on a rectangle filled with the input 'color'.
      If the top of the bounding box extends to the edge of the image, the strings
      are displayed below the bounding box.

      Args:
        image: a PIL.Image object.
        ymin: ymin of bounding box.
        xmin: xmin of bounding box.
        ymax: ymax of bounding box.
        xmax: xmax of bounding box.
        color: color to draw bounding box. Default is red.
        thickness: line thickness. Default value is 4.
        display_str_list: list of strings to display in box
                          (each to be shown on its own line).
        use_normalized_coordinates: If True (default), treat coordinates
          ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
          coordinates as absolute.
      """
      draw = ImageDraw.Draw(image)
      im_width, im_height = image.size
      if use_normalized_coordinates:
        (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                      ymin * im_height, ymax * im_height)
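The scaling in that excerpt can be tried on its own, without TensorFlow. The following is a minimal sketch, not code from the API: the helper name normalized_box_to_pixels and the 640x480 image size are assumptions made for illustration, while the box values are the first row of the array printed in the question.

```python
def normalized_box_to_pixels(box, im_width, im_height):
    """Scale one [ymin, xmin, ymax, xmax] box from the 0..1 range to pixels.

    Returns (left, right, top, bottom), the same tuple order used in
    draw_bounding_box_on_image.
    """
    ymin, xmin, ymax, xmax = box
    return (xmin * im_width, xmax * im_width,
            ymin * im_height, ymax * im_height)


# First box from the question's output. The image size is an assumed
# example value; with a real image use image.size (PIL) or image_np.shape.
first_box = [0.56213236, 0.2780568, 0.91445708, 0.69120586]
left, right, top, bottom = normalized_box_to_pixels(first_box, 640, 480)
print(round(left), round(right), round(top), round(bottom))
```

With a NumPy image such as image_np, the height and width would come from image_np.shape[0] and image_np.shape[1] respectively, so take care not to swap them.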

Vadim · Answer

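I have exactly the same story. I got an array with roughly a hundred boxes (output_dict['detection_boxes']) when only one was displayed on the image. Digging deeper into the code that draws the rectangles, I was able to extract that logic and use it in my inference.py: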

    # so detection has happened and you've got output_dict as a
    # result of your inference

    # then assume you've got this in your inference.py in order to draw rectangles
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)

    # This is the way I'm getting my coordinates
    boxes = output_dict['detection_boxes']
    # get all boxes from an array
    max_boxes_to_draw = boxes.shape[0]
    # get scores to get a threshold
    scores = output_dict['detection_scores']
    # this is set as a default but feel free to adjust it to your needs
    min_score_thresh = .5
    # iterate over all objects found
    for i in range(min(max_boxes_to_draw, boxes.shape[0])):
        if scores is None or scores[i] > min_score_thresh:
            # boxes[i] is the box which will be drawn
            class_name = category_index[output_dict['detection_classes'][i]]['name']
            print("This box is gonna get used", boxes[i], output_dict['detection_classes'][i])
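Combining both answers, the score filter and the pixel conversion can be done in one pass. This is only a sketch under stated assumptions: confident_pixel_boxes is a hypothetical helper, the example scores, class IDs and 640x480 image size are made up for illustration, and only the two box rows are taken from the question. It iterates with plain Python, so it should accept lists as well as the NumPy arrays found in output_dict.

```python
def confident_pixel_boxes(output_dict, im_width, im_height, min_score_thresh=0.5):
    """Return pixel-space boxes for detections scoring above the threshold."""
    results = []
    detections = zip(output_dict['detection_boxes'],
                     output_dict['detection_scores'],
                     output_dict['detection_classes'])
    for box, score, class_id in detections:
        if score > min_score_thresh:
            # Boxes are [ymin, xmin, ymax, xmax] in normalized coordinates.
            ymin, xmin, ymax, xmax = (float(v) for v in box)
            results.append({
                'class_id': int(class_id),
                'score': float(score),
                'left': xmin * im_width,
                'right': xmax * im_width,
                'top': ymin * im_height,
                'bottom': ymax * im_height,
            })
    return results


# Hypothetical stand-in for a real inference result: the boxes come from the
# question, but the scores and class IDs are invented for this example.
example_output = {
    'detection_boxes': [[0.56213236, 0.2780568, 0.91445708, 0.69120586],
                        [0.56261235, 0.86368728, 0.59286624, 0.8893863]],
    'detection_scores': [0.9, 0.1],
    'detection_classes': [1, 3],
}
kept = confident_pixel_boxes(example_output, im_width=640, im_height=480)
for det in kept:
    print(det['class_id'], det['score'],
          det['left'], det['right'], det['top'], det['bottom'])
```

Only the first example box passes the 0.5 threshold here, which mirrors the situation described above: many rows in detection_boxes, few of them actually drawn.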