Required libraries

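We use pandas for file handling, PIL for basic image processing, and numpy for basic numerical computation. All three can be installed with pip (PIL is provided by the Pillow package: pip install pandas Pillow numpy).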

Preparing the data

Go to https://signate.jp/competitions/31/data, open the "データをダウンロード" (Download data) tab, and download dtc_train_images_1.zip,
dtc_train_images_2.zip, and dtc_train_master.tsv. Unzip both image archives and gather all of the training images
into a single dtc_train_images folder in your working directory.
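If you would rather do the unzipping and merging from Python, a minimal sketch follows. It assumes both archives sit in the working directory and that image file names do not collide between the two archives; the helper name `collect_images` is hypothetical, and whatever folder structure exists inside the archives is flattened.

```python
import os
import shutil
import zipfile


def collect_images(zip_paths, out_dir="dtc_train_images"):
    """Extract every file in the given zip archives into out_dir.

    The folder layout inside each archive is ignored (files are flattened by
    base name), so images from all archives end up side by side in out_dir.
    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # skip directory entries
                name = os.path.basename(info.filename)
                if not name or name.startswith("."):
                    continue  # skip hidden/metadata files such as .DS_Store
                with zf.open(info) as src, open(os.path.join(out_dir, name), "wb") as dst:
                    shutil.copyfileobj(src, dst)
                count += 1
    return count
```

For example, `collect_images(["dtc_train_images_1.zip", "dtc_train_images_2.zip"])` would populate `dtc_train_images` in the current directory.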

Implementation

First, import the required libraries.

import pandas as pd
import numpy as np
import copy
import os
from PIL import Image

1. Implementing the evaluation function

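As stated at https://signate.jp/competitions/31, the evaluation metric for the dish-region detection track (料理領域検出部門) is mean average precision.
First, to decide whether a predicted bounding box has detected a ground-truth bounding box, we implement a function that computes how much the two boxes overlap.
The overlap is the area of their intersection divided by the area of their union (intersection over union, IoU). It ranges from 0 to 1, and a larger value means a larger overlap.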

def compute_ratio(bb_true, bb_predict):
    """
    bb_true: ground truth bounding box
    bb_predict: predicted bounding box
    bounding box = (x, y, width, height)
    """
    intersection_x = max(min(bb_true[0]+bb_true[2], bb_predict[0]+bb_predict[2]) - max(bb_true[0], bb_predict[0]), 0)
    intersection_y = max(min(bb_true[1]+bb_true[3], bb_predict[1]+bb_predict[3]) - max(bb_true[1], bb_predict[1]), 0)

    area_intersection = intersection_x*intersection_y
    area_true = bb_true[2]*bb_true[3]
    area_predict = bb_predict[2]*bb_predict[3]
    area_union = area_true + area_predict - area_intersection

    ratio = area_intersection/float(area_union)

    return ratio
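As a sanity check, the same intersection-over-union computation can be evaluated on a few boxes whose answer is easy to work out by hand. The function is repeated under the hypothetical name `iou` so the snippet runs on its own; the boxes are made up for illustration.

```python
def iou(bb_a, bb_b):
    """Intersection over union of two (x, y, width, height) boxes.

    Same computation as compute_ratio, repeated so this snippet is self-contained.
    """
    inter_w = max(min(bb_a[0] + bb_a[2], bb_b[0] + bb_b[2]) - max(bb_a[0], bb_b[0]), 0)
    inter_h = max(min(bb_a[1] + bb_a[3], bb_b[1] + bb_b[3]) - max(bb_a[1], bb_b[1]), 0)
    inter = inter_w * inter_h
    union = bb_a[2] * bb_a[3] + bb_b[2] * bb_b[3] - inter
    return inter / float(union)


# Identical boxes overlap completely.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0
# Shifted by half a side: intersection 5*5 = 25, union 100 + 100 - 25 = 175.
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))   # 0.14285714285714285 (= 1/7)
# Touching along an edge only: no overlapping area.
print(iou((0, 0, 10, 10), (10, 0, 10, 10)))  # 0.0
```

Note that boxes which merely touch get 0, because the intersection width is clamped at 0 rather than allowed to go negative.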

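Next, we implement a function that computes average precision. It takes the ground truth (y_true) and the predictions (y_predict) for a single sample, and counts a ground-truth box as detected when its overlap with a prediction, computed with the function above, exceeds threshold.
Predictions are examined in descending order of confidence, and each ground-truth box can be detected at most once. The average precision is the sum, over the predictions, of the precision up to that prediction multiplied by the recall it adds (1/len(y_true) if it detects a box, 0 otherwise).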

def compute_ap(y_true, y_predict, threshold):
    """
    y_true: list of ground truth bounding boxes -> [bb_0, bb_1, ...]
    y_predict: list of predicted bounding boxes sorted by confidence
               in descending order -> [bb_0, bb_1, ...]
    """
    delrecall = []   # recall gained at each prediction rank
    precision = []   # precision up to each prediction rank
    undetected = copy.copy(y_true)  # ground-truth boxes not matched yet
    results = []     # whether each prediction detected a box
    for y_pred in y_predict:
        # stop once every ground-truth box has been detected
        if len(undetected) == 0:
            break
        # overlap between this prediction and each remaining ground-truth box
        # (a plain list: building a NumPy array of (box, ratio) pairs is ragged
        # and fails on recent NumPy versions)
        ratios = [compute_ratio(y_t, y_pred) for y_t in undetected]
        best = int(np.argmax(ratios))
        detected = ratios[best] > threshold
        results.append(detected)
        if detected:
            delrecall.append(1.0/len(y_true))
            # each ground-truth box can be detected only once
            undetected.pop(best)
        else:
            delrecall.append(0.0)
        precision.append(float(sum(results))/len(results))
    ap = np.sum(np.array(precision)*np.array(delrecall))

    return ap
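To see how the order of predictions affects the score, here is a hand-checkable example. It re-implements the same greedy matching with plain Python lists (no NumPy) under the hypothetical names `iou` and `average_precision`, so it runs on its own; the boxes are made up for illustration.

```python
def iou(bb_a, bb_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    inter_w = max(min(bb_a[0] + bb_a[2], bb_b[0] + bb_b[2]) - max(bb_a[0], bb_b[0]), 0)
    inter_h = max(min(bb_a[1] + bb_a[3], bb_b[1] + bb_b[3]) - max(bb_a[1], bb_b[1]), 0)
    inter = inter_w * inter_h
    union = bb_a[2] * bb_a[3] + bb_b[2] * bb_b[3] - inter
    return inter / float(union)


def average_precision(y_true, y_predict, threshold):
    """Same greedy matching as compute_ap, using plain Python lists only."""
    undetected = list(y_true)
    hits = 0
    ap = 0.0
    for rank, y_pred in enumerate(y_predict, start=1):
        if not undetected:
            break  # every ground-truth box has already been found
        ratios = [iou(y_t, y_pred) for y_t in undetected]
        best = max(range(len(ratios)), key=ratios.__getitem__)  # first argmax
        if ratios[best] > threshold:
            hits += 1
            undetected.pop(best)
            # precision at this rank times the recall gained (1 / #ground truth);
            # misses add recall 0, so they only lower later precisions
            ap += (hits / rank) * (1.0 / len(y_true))
    return ap


box_a = (0, 0, 10, 10)
box_b = (50, 50, 20, 20)
miss = (100, 100, 5, 5)
# 1 * 1/2 + 2/3 * 1/2, approximately 0.8333
print(average_precision([box_a, box_b], [box_a, miss, box_b], 0.9))
# the miss ranked first: 1/2 * 1/2 + 2/3 * 1/2, approximately 0.5833
print(average_precision([box_a, box_b], [miss, box_a, box_b], 0.9))
```

Ranking correct boxes first gives a higher AP, and predictions that come after every ground-truth box has been found are ignored.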

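Next, we implement a function that computes mean average precision, that is, the average precision averaged over all samples (images).
It takes the submission (submit) and the ground truth (ans). The threshold used to decide detection is set to 0.9.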

def compute_map(submit, ans, threshold=0.9):
    """
    submit: pandas.DataFrame, columns 0-5 = file name, confidence, x, y, width, height
    ans: pandas.DataFrame, columns 0-4 = file name, x, y, width, height
    """
    ans_scores = []
    for file_name in set(ans[0]):
        # predictions for this image, most confident first
        pred = submit[submit[0]==file_name]
        pred = pred.sort_values(by=1, ascending=False)
        p_t_bb = []
        for _, row in pred.iterrows():
            p_t_bb.append((row[2], row[3], row[4], row[5]))

        # ground-truth boxes for this image
        ans_true = ans[ans[0]==file_name]
        ans_t_bb = []
        for _, row in ans_true.iterrows():
            ans_t_bb.append((row[1], row[2], row[3], row[4]))
        ans_scores.append(compute_ap(ans_t_bb, p_t_bb, threshold))

    return np.mean(np.array(ans_scores))
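The column layout that compute_map expects is easiest to see on a tiny hand-made example. The sketch below mirrors its logic with plain tuples instead of DataFrames, so it runs without pandas or NumPy; the rows, file names, and function names (`iou`, `average_precision`, `mean_average_precision`) are all hypothetical.

```python
from collections import defaultdict


def iou(bb_a, bb_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    inter_w = max(min(bb_a[0] + bb_a[2], bb_b[0] + bb_b[2]) - max(bb_a[0], bb_b[0]), 0)
    inter_h = max(min(bb_a[1] + bb_a[3], bb_b[1] + bb_b[3]) - max(bb_a[1], bb_b[1]), 0)
    inter = inter_w * inter_h
    union = bb_a[2] * bb_a[3] + bb_b[2] * bb_b[3] - inter
    return inter / float(union)


def average_precision(y_true, y_predict, threshold):
    """Greedy matching as in compute_ap; y_predict is already sorted by confidence."""
    undetected = list(y_true)
    hits, ap = 0, 0.0
    for rank, y_pred in enumerate(y_predict, start=1):
        if not undetected:
            break
        ratios = [iou(y_t, y_pred) for y_t in undetected]
        best = max(range(len(ratios)), key=ratios.__getitem__)
        if ratios[best] > threshold:
            hits += 1
            undetected.pop(best)
            ap += (hits / rank) * (1.0 / len(y_true))
    return ap


def mean_average_precision(submit_rows, ans_rows, threshold=0.9):
    """Mean of per-image AP, mirroring compute_map with plain tuples.

    submit_rows: (file_name, confidence, x, y, width, height), like columns 0-5 of submit
    ans_rows:    (file_name, x, y, width, height), like columns 0-4 of ans
    """
    preds = defaultdict(list)
    for file_name, conf, *bb in submit_rows:
        preds[file_name].append((conf, tuple(bb)))
    truths = defaultdict(list)
    for file_name, *bb in ans_rows:
        truths[file_name].append(tuple(bb))

    scores = []
    # only images that appear in the ground truth are scored
    for file_name, y_true in truths.items():
        ranked = sorted(preds[file_name], key=lambda p: p[0], reverse=True)
        y_predict = [bb for _, bb in ranked]
        scores.append(average_precision(y_true, y_predict, threshold))
    return sum(scores) / len(scores)


ans_rows = [
    ("img_a.jpg", 0, 0, 100, 100),
    ("img_b.jpg", 10, 10, 50, 50),
]
submit_rows = [
    ("img_a.jpg", 0.8, 0, 0, 100, 100),    # exact match: AP 1
    ("img_b.jpg", 0.9, 200, 200, 30, 30),  # far from the dish: AP 0
]
print(mean_average_precision(submit_rows, ans_rows))  # 0.5
```

Because each image contributes one AP value, an image with a single missed dish pulls the mean down just as much as an image with many dishes.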

2. Computing a score

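Now that the evaluation function is implemented, let's create a submission file and compute its score.
To keep things simple, each image gets a single predicted bounding box whose top-left corner is (0, 0) and whose size is the image's (width, height),
i.e. a box covering the whole image, with confidence 1.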

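file_name = []
confidence = []
x = []
y = []
width = []
height = []
image_path = 'dtc_train_images'
ans_path = 'dtc_train_master.tsv'
for f in os.listdir(image_path):
    file_name.append(f)
    confidence.append(1)
    # only the image size is needed, so close the file right away
    with Image.open(os.path.join(image_path, f)) as image:
        image_width, image_height = image.size
    # one box per image: top-left corner (0, 0), covering the whole image
    x.append(0)
    y.append(0)
    width.append(image_width)
    height.append(image_height)
submit = pd.DataFrame({0: file_name, 1: confidence, 2: x, 3: y, 4: width, 5: height})
ans = pd.read_csv(ans_path, sep='\t')
# compute_map refers to columns by position (0: file name, 1-4: x, y, width, height),
# so replace the header names with integer labels (assumes the TSV has a header row)
ans.columns = range(ans.shape[1])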
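With this baseline the overlap has a simple form: for a ground-truth box lying inside the image, the whole-image box contains it, so the intersection is the ground-truth box and the union is the whole image. The ratio therefore reduces to (ground-truth area) / (image area), and with threshold 0.9 a dish is detected only if it covers more than 90% of the image. The sketch below checks this on made-up sizes; the function name `whole_image_ratio` and the box coordinates are hypothetical.

```python
def whole_image_ratio(image_size, bb_true):
    """Overlap between a box covering the whole image and a ground-truth box.

    Uses the same intersection-over-union formula as compute_ratio, with the
    predicted box fixed to (0, 0, image width, image height).
    """
    image_w, image_h = image_size
    bb_pred = (0, 0, image_w, image_h)
    inter_w = max(min(bb_true[0] + bb_true[2], bb_pred[0] + bb_pred[2]) - max(bb_true[0], bb_pred[0]), 0)
    inter_h = max(min(bb_true[1] + bb_true[3], bb_pred[1] + bb_pred[3]) - max(bb_true[1], bb_pred[1]), 0)
    inter = inter_w * inter_h
    union = bb_true[2] * bb_true[3] + image_w * image_h - inter
    return inter / float(union)


# A dish filling most of a 640x480 image: 600*460 / (640*480) = 0.8984375, just under 0.9.
print(whole_image_ratio((640, 480), (20, 10, 600, 460)))
# A dish filling a quarter of the image: 320*240 / (640*480) = 0.25.
print(whole_image_ratio((640, 480), (0, 0, 320, 240)))
```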

Now compute the score.

score = compute_map(submit, ans)
print(score)

0.492847202381

Summary

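We implemented the evaluation function (mAP), showed an example of building a submission file, and computed its score.
Building a submission file on top of a proper model takes real effort, but try out various ideas,
evaluate them yourself, and submit your results. We look forward to your entries.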

(1) Example implementation of the evaluation function for the dish-region detection track (料理領域検出部門)
