NEDO Challenge, Motion Decoding Using Biosignals: Skateboard Trick Classification Challenge

create date : Jul. 24, 2024 at 15:16:26

Skateboard Trick Classification Challenge Tutorial

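This tutorial walks through data analysis and model building for the Skateboard Trick Classification Challenge. It shows one example workflow, from analyzing the distributed data to creating a submission file. Use it as a reference when working on the task. The tutorial is organized as follows.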

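Environment setup
Importing libraries
Preliminary analysis
Creating data for modeling
Building the data generator
Building the model
Training the model
Creating the submission file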

Environment setup

A GPU environment is assumed. The following Python libraries are required.

pandas==2.2.2
numpy==1.26.4
torch==2.2.1
matplotlib==3.8.4
pymatreader==0.0.32
scikit-learn==1.3.1
mne==1.7.1

Install these libraries first. If you do not have a GPU environment at hand, consider using Google Colab as needed.
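
Before running the notebook, it can help to confirm that the installed versions match the pins listed above. The following is a minimal sketch that uses only the standard library. The helper name `check_versions` is hypothetical and not part of the tutorial.

```python
from importlib import metadata

# Pinned versions copied from the list above.
REQUIRED = {
    "pandas": "2.2.2",
    "numpy": "1.26.4",
    "torch": "2.2.1",
    "matplotlib": "3.8.4",
    "pymatreader": "0.0.32",
    "scikit-learn": "1.3.1",
    "mne": "1.7.1",
}


def check_versions(required):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found}")
```

The comparison is an exact string match, so a build that reports a local suffix (for example a CUDA build of torch) would likely show up as a mismatch even though it may work fine.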

Importing libraries

Import the libraries used for analysis and modeling.

In [ ]:

import os
import pandas as pd
import numpy as np
import mne
import sys
import random
import glob
import torch
import matplotlib.pyplot as plt
from pymatreader import read_mat
from torch.utils.data import Dataset
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
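# The widget backend needs the ipympl package; switch to `%matplotlib inline` if it is not installed
%matplotlib widget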

Preliminary analysis

Training data

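Place the distributed training data train.zip in the same directory as this notebook and unzip it; this creates a directory named train, in which the data is stored per subject. Let's load train2.mat for subject0 and inspect its contents.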

In [ ]:

data = read_mat(os.path.join('.', 'train', 'subject0', 'train2.mat'))
print(data.keys()) # type: ignore

dict_keys(['__header__', '__version__', '__globals__', 'event', 'times', 'data', 'ch_labels'])

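'times', 'data', and 'ch_labels' are the timestamps (in milliseconds), the EEG data, and the list of channel names, respectively. The EEG data is a 2-D array of shape (channels, time points). Measurements are in μV. The time axis corresponds to 'times', with one time point every 2 ms (a sampling rate of 500 Hz), and the channel axis corresponds to 'ch_labels', with 72 channels in total.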
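
The relationship between timestamps, sampling rate, and array indices can be sketched as follows. This is a minimal illustration on synthetic timestamps; the helper names `infer_sfreq` and `time_to_index` are hypothetical, and the index conversion assumes that the first time point is at t = 0.

```python
import numpy as np

SFREQ = 500  # Hz, as stated above (2 ms between time points)


def infer_sfreq(times_ms):
    """Estimate the sampling rate in Hz from a vector of timestamps in ms."""
    step_ms = np.median(np.diff(times_ms))
    return 1000.0 / step_ms


def time_to_index(t_sec, sfreq=SFREQ):
    """Convert a time in seconds to the nearest index (assumes t = 0 at index 0)."""
    return int(round(t_sec * sfreq))


# Synthetic stand-in for data['times']: 10 time points spaced 2 ms apart.
times_ms = np.arange(10) * 2.0
print(infer_sfreq(times_ms))   # sampling rate implied by the timestamps
print(time_to_index(83.828))   # index of an event at 83.828 s
```

Rounding to the nearest index, rather than truncating, avoids off-by-one errors when a product such as `83.828 * 500` lands just below an integer in floating point.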

In [ ]:

def make_mne_data(data):
    # Set up the MNE channel information
    ch_names = [c.replace(' ', '') for c in data['ch_labels']]  # channel names with spaces removed
    ch_types = ['eeg'] * len(ch_names)  # channel types (all assumed to be EEG)

    # Assemble the channel info
    info = mne.create_info(ch_names=ch_names, sfreq=500, ch_types=ch_types) # type: ignore

    # Create the RawArray object
    raw = mne.io.RawArray(data['data']*1e-6, info) # convert from μV to V
    raw.set_montage(mne.channels.make_standard_montage('standard_1020'))

    return raw

In [ ]:

raw = make_mne_data(data)
print(raw)
print(raw.info)

Creating RawArray with float64 data, n_channels=72, n_times=357914
    Range : 0 ... 357913 =      0.000 ...   715.826 secs
Ready.
<RawArray | 72 x 357914 (715.8 s), ~196.7 MB, data loaded>
<Info | 8 non-empty values
 bads: []
 ch_names: Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, ...
 chs: 72 EEG
 custom_ref_applied: False
 dig: 75 items (3 Cardinal, 72 EEG)
 highpass: 0.0 Hz
 lowpass: 250.0 Hz
 meas_date: unspecified
 nchan: 72
 projs: []
 sfreq: 500.0 Hz
>

Let's visualize the time series of each channel.

In [ ]:

raw.plot(duration=5, n_channels=72)

Using matplotlib as 2D backend.

Out[ ]:

'event' contains 'init_index', 'type', and 'init_time', which are the event index, the trick type, and the time at which the event occurred (that is, when the trick was performed), in seconds.

In [ ]:

events = pd.DataFrame(data['event']).astype({'type': int, 'init_index':int}) # type: ignore
events.head()

Out[ ]:

	type	init_time	init_index
0	12	83.828	1
1	22	86.010	2
2	13	88.244	3
3	23	90.428	4
4	12	92.644	5

Let's visualize the tricks along the time axis. Look up what each trick actually involves on your own.

In [ ]:

events['init_time'] = (events['init_time']*500).round().astype(int) # convert seconds to indices at 2 ms spacing (round to avoid float truncation)
events = events.rename(columns={'init_time': 'id', 'init_index':'test', 'type':'event_id'})[['id', 'test', 'event_id']]
event_dict = {
    'led/frontside_kickturn': 11,
    'led/backside_kickturn': 12,
    'led/pumping': 13,
    'laser/frontside_kickturn': 21,
    'laser/backside_kickturn': 22,
    'laser/pumping': 23
} # mapping between trick names and event codes

In [ ]:

mne.viz.plot_events(events.to_numpy(), event_id=event_dict, sfreq=500) # the sampling rate is 500 Hz, so set sfreq=500

Out[ ]:

The plot shows the subject going back and forth on the half-pipe (led -> laser -> ...) while performing pumping, backside_kickturn, and frontside_kickturn a total of 80, 40, and 40 times, respectively.
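
The per-trick counts can also be checked numerically. The sketch below splits each two-digit event code into its led/laser prefix and its trick, following the `event_dict` mapping, and counts tricks regardless of prefix. The helper names are hypothetical, and only the first five codes from the table above are used as sample input.

```python
from collections import Counter

# Tens digit: led/laser prefix. Ones digit: trick. Both taken from event_dict.
PREFIXES = {1: "led", 2: "laser"}
TRICKS = {1: "frontside_kickturn", 2: "backside_kickturn", 3: "pumping"}


def split_code(code):
    """Split a two-digit event code into (prefix, trick) names."""
    prefix, trick = divmod(int(code), 10)
    return PREFIXES[prefix], TRICKS[trick]


def count_tricks(codes):
    """Count events per trick, ignoring the led/laser prefix."""
    return Counter(split_code(c)[1] for c in codes)


# First five event codes from the table above.
codes = [12, 22, 13, 23, 12]
print(split_code(12))
print(count_tricks(codes))
```

With the full recording, `count_tricks(pd.DataFrame(data['event'])['type'])` would be the corresponding call on the loaded data, assuming the codes are the six values listed in `event_dict`.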

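Although not covered here, further analysis is possible, such as visualizing the power spectral density or cutting the data into epochs and visualizing how the EEG changes. See the MNE tutorials for details.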
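
To give a rough idea of what a power spectral density looks like computationally, here is a minimal sketch of a plain FFT periodogram in NumPy, applied to a synthetic 10 Hz sine rather than real EEG. It is not the MNE implementation, and the function name `periodogram` is hypothetical; for real recordings the MNE tools are the better choice.

```python
import numpy as np


def periodogram(x, sfreq):
    """Return (freqs, power) for a 1-D signal of even length.

    Plain one-sided FFT periodogram without windowing or averaging.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    spectrum = np.fft.rfft(x)
    power = (np.abs(spectrum) ** 2) / (sfreq * len(x))
    power[1:-1] *= 2                      # fold negative frequencies (even length assumed)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sfreq)
    return freqs, power


# Synthetic stand-in for one EEG channel: a 10 Hz sine sampled at 500 Hz for 2 s.
sfreq = 500
t = np.arange(2 * sfreq) / sfreq
signal = np.sin(2 * np.pi * 10 * t)

freqs, power = periodogram(signal, sfreq)
peak_freq = freqs[np.argmax(power)]
print(peak_freq)
```

The frequency resolution is `sfreq / len(x)`, so a 0.5-second segment at 500 Hz resolves frequencies only in 2 Hz steps.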

Evaluation data

Place the distributed evaluation data test.zip in the same directory as this notebook and unzip it; this creates a directory named test, in which the data is stored per subject. Let's load subject0.mat and inspect its contents.

In [ ]:

test_data = read_mat(os.path.join('.', 'test', 'subject0.mat'))
print(test_data.keys()) # type: ignore

dict_keys(['__header__', '__version__', '__globals__', 'data', 'ch_labels'])

In [ ]:

print(test_data['data'].shape, test_data['ch_labels'].shape) # type: ignore

(160, 72, 250) (72,)

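'data' is an array of shape (samples, channels, time points), and 'ch_labels' is the same as in the training data. The task from here on is to devise an algorithm that predicts the trick type for each sample from its (channels, time points) array.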

Creating data for modeling

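For each subject, 'train1.mat' and 'train2.mat' are used for training and 'train3.mat' for validation. For each trick (which starts on the flat section and is performed heading toward the slope), a 0.5-second window of the time series is cut out, spanning 0.2 s to 0.7 s after the moment the skater enters the flat section. This is the same procedure used to create the evaluation data.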
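
The window described above can also be cut by index instead of by timestamp. The sketch below is one possible way to do this and always returns exactly 250 time points, which matches the shape of the evaluation data. The function name `cut_window` is hypothetical, and it assumes the first time point of the recording is at t = 0.

```python
import numpy as np

SFREQ = 500            # Hz
START_OFFSET = 0.2     # seconds after the event time
WINDOW_SAMPLES = 250   # 0.5 s at 500 Hz, matching the evaluation data


def cut_window(eeg, event_time_sec, sfreq=SFREQ):
    """Cut a fixed-length window of shape (channels, WINDOW_SAMPLES).

    `eeg` is a (channels, time points) array whose first time point is
    assumed to be at t = 0.
    """
    start = int(round((event_time_sec + START_OFFSET) * sfreq))
    stop = start + WINDOW_SAMPLES
    if start < 0 or stop > eeg.shape[1]:
        raise ValueError("window falls outside the recording")
    return eeg[:, start:stop]


# Synthetic recording: 72 channels, 10 s of data.
eeg = np.random.default_rng(0).normal(size=(72, 10 * SFREQ))
segment = cut_window(eeg, event_time_sec=3.0)
print(segment.shape)
```

By contrast, a timestamp comparison that is inclusive at both ends of the 0.5-second window may return 251 time points, which is why a fixed-length crop is still needed afterwards.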

In [ ]:

def make_data(src_dir, dst_dir, subject_id):
    print(subject_id)
    # Map event codes to trick names (the led/laser prefix digit is ignored)
    labels = {
        '11': 'frontside_kickturn',
        '12': 'backside_kickturn',
        '13': 'pumping',
        '21': 'frontside_kickturn',
        '22': 'backside_kickturn',
        '23': 'pumping'
    }
    counts = {'frontside_kickturn':0, 'backside_kickturn':0, 'pumping':0}
    # Sort the file names so that the output numbering is reproducible
    for fname in sorted(os.listdir(os.path.join(src_dir, 'train', subject_id))):
        data = read_mat(os.path.join(src_dir, 'train', subject_id, fname))
        event = pd.DataFrame(data['event'])[['init_time', 'type']] # type: ignore
        ts = pd.DataFrame(np.concatenate([np.array([data['times']]), data['data']]).T, columns=['Time']+list(data['ch_labels'])) # type: ignore
        # train3.mat goes to validation, every other file to training
        split = 'val' if fname == 'train3.mat' else 'train'
        for i, d in event.iterrows():
            it = d['init_time']+0.2  # window start [s]
            et = d['init_time']+0.7  # window end [s]
            label = labels[str(int(d['type']))]
            # 'Time' is in ms; select the window, then drop the time column
            ts_seg = ts[(ts['Time']>=it*1e3)&(ts['Time']<=et*1e3)].drop(columns='Time')

            out_dir = os.path.join(dst_dir, split, subject_id, label)
            os.makedirs(out_dir, exist_ok=True)
            ts_seg.to_csv(os.path.join(out_dir, '{:03d}.csv'.format(counts[label])), index=False, header=False)

            counts[label]+=1

In [ ]:

src_dir = '.'
dst_dir = 'test_modeling'
subject_ids = ['subject0', 'subject1', 'subject2', 'subject3', 'subject4']
for subject_id in subject_ids:
    make_data(src_dir=src_dir, dst_dir=dst_dir, subject_id=subject_id)

subject0
subject1
subject2
subject3
subject4

After running the cell above, the training data for each subject is stored under ./test_modeling/train/ and the validation data under ./test_modeling/val/, organized by trick type (frontside_kickturn, backside_kickturn, pumping).
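
A quick way to confirm that the directory layout came out as expected is to count the CSV files per split, subject, and trick. The following sketch does this with the standard library and demonstrates it on a small temporary directory tree; the function name `count_segments` is hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_segments(root):
    """Count CSV files under root/<split>/<subject>/<trick>/.

    Returns a Counter keyed by (split, subject, trick).
    """
    counts = Counter()
    for csv_path in Path(root).glob("*/*/*/*.csv"):
        trick_dir = csv_path.parent
        subject_dir = trick_dir.parent
        split_dir = subject_dir.parent
        counts[(split_dir.name, subject_dir.name, trick_dir.name)] += 1
    return counts


# Demonstration on a throwaway directory tree with the same layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("000.csv", "001.csv"):
        path = Path(tmp) / "train" / "subject0" / "pumping" / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_counts = count_segments(tmp)

print(demo_counts)
```

On the real output, `count_segments('test_modeling')` would be the corresponding call.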

Building the data generator

Build a data generator that produces the data passed to the deep learning model: mini-batches of EEG data paired with their labels.

In [ ]:

class SeqDataset(Dataset):
    def __init__(self, root, seq_length, is_train, transform=None):
        self.transform = transform
        self.seqs = []
        self.seq_labels = []
        self.class_names = os.listdir(root)
        self.class_names.sort()
        self.numof_classes = len(self.class_names)
        self.seq_length = seq_length
        self.is_train = is_train

        for (i,x) in enumerate(self.class_names):
            temp = glob.glob(os.path.join(root, x, '*'))
            temp.sort()
            self.seq_labels.extend([i]*len(temp))
            for t in temp:
                df = pd.read_csv(t, header=None)
                tensor = preprocess(df)
                self.seqs.append(tensor)

    def __getitem__(self, index):
        seq = self.seqs[index]
        if self.transform is not None:
            seq = self.transform(seq, is_train=self.is_train, seq_length=self.seq_length)
        return {'seq':seq, 'label':self.seq_labels[index]}


    def __len__(self):
        return len(self.seqs)


def preprocess(df: pd.DataFrame)->np.ndarray:
    # transpose from (time points, channels) to (channels, time points)
    mat = df.T.values

    # standardize each channel to zero mean and unit variance
    mat = standardization(mat, axis=1)

    return mat


def standardization(a, axis=None, ddof=0):
    a_mean = a.mean(axis=axis, keepdims=True)
    a_std = a.std(axis=axis, keepdims=True, ddof=ddof)
    a_std[np.where(a_std==0)] = 1

    return (a - a_mean) / a_std


def add_noise(data, noise_level=0.01):
    noise = np.random.normal(0, noise_level, data.shape)
    data_noisy = data + noise
    return data_noisy.astype(np.float32)


def time_shift(data, shift):
    # Roll along the time axis only; without axis, np.roll shifts the flattened array
    data_shifted = np.roll(data, shift, axis=-1)
    return data_shifted


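def transform(array, is_train, seq_length):
    if is_train:
        # Random crop of seq_length time points
        _, n = array.shape
        s = random.randint(0, n-seq_length)
        ts = array[:,s:s+seq_length]
        # Add Gaussian noise (add_noise already returns float32)
        ts = add_noise(ts)
        # Reverse the time axis with probability 0.5
        if random.randint(0,1):
            ts_r = ts[:,::-1].copy()
            return ts_r
        return ts
    else:
        # Deterministic crop: the first seq_length time points
        ts = array[:,:seq_length].astype(np.float32)
        return ts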

In [ ]:

batch_size=10
# subject_id still holds the last value from the loop above ('subject4')
train_dir = os.path.join('test_modeling', 'train', subject_id)
dataset = SeqDataset(root=train_dir, seq_length=250, is_train=True, transform=transform)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True) # type: ignore

Check the label class names and the shape of the mini-batch data.

In [ ]:

print(dataset.class_names)
for i, mini_batch in enumerate(data_loader):
    print(mini_batch['seq'].shape, mini_batch['label'])

['backside_kickturn', 'frontside_kickturn', 'pumping']
torch.Size([10, 72, 250]) tensor([0, 2, 2, 1, 1, 1, 2, 1, 1, 2])
torch.Size([10, 72, 250]) tensor([0, 2, 2, 2, 0, 0, 0, 0, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 0, 2, 1, 2, 0, 2, 0, 2, 0])
torch.Size([10, 72, 250]) tensor([2, 2, 0, 2, 2, 1, 2, 1, 1, 2])
torch.Size([10, 72, 250]) tensor([0, 1, 1, 1, 2, 2, 0, 2, 1, 0])
torch.Size([10, 72, 250]) tensor([2, 2, 2, 0, 2, 0, 2, 1, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 2, 2, 1, 2, 0, 0, 0, 1, 2])
torch.Size([10, 72, 250]) tensor([0, 0, 2, 2, 2, 2, 2, 2, 0, 0])
torch.Size([10, 72, 250]) tensor([2, 1, 2, 0, 2, 1, 2, 0, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 1, 2, 1, 2, 2, 2, 0, 1, 2])
torch.Size([10, 72, 250]) tensor([2, 2, 2, 1, 2, 1, 2, 0, 1, 2])
torch.Size([10, 72, 250]) tensor([2, 1, 1, 2, 2, 2, 1, 1, 2, 2])
torch.Size([10, 72, 250]) tensor([2, 2, 1, 2, 0, 2, 2, 2, 2, 1])
torch.Size([10, 72, 250]) tensor([0, 2, 2, 1, 2, 2, 1, 0, 2, 1])
torch.Size([10, 72, 250]) tensor([2, 2, 2, 1, 0, 2, 0, 0, 0, 2])
torch.Size([10, 72, 250]) tensor([0, 2, 0, 2, 2, 1, 0, 2, 0, 0])
torch.Size([10, 72, 250]) tensor([0, 2, 2, 2, 1, 1, 1, 2, 2, 2])
torch.Size([10, 72, 250]) tensor([1, 2, 1, 2, 1, 2, 2, 1, 2, 0])
torch.Size([10, 72, 250]) tensor([1, 2, 2, 2, 2, 2, 2, 2, 0, 0])
torch.Size([10, 72, 250]) tensor([0, 0, 2, 2, 1, 2, 1, 2, 2, 0])
torch.Size([10, 72, 250]) tensor([1, 2, 2, 0, 2, 2, 1, 2, 2, 1])
torch.Size([10, 72, 250]) tensor([0, 0, 2, 2, 2, 1, 0, 1, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 2, 1, 2, 2, 0, 0, 1, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 0, 0, 2, 2, 2, 2, 0, 2, 2])
torch.Size([10, 72, 250]) tensor([0, 1, 1, 2, 1, 1, 2, 0, 1, 1])
torch.Size([10, 72, 250]) tensor([1, 2, 0, 1, 2, 2, 0, 1, 1, 1])
torch.Size([10, 72, 250]) tensor([2, 0, 2, 2, 1, 2, 2, 0, 2, 0])
torch.Size([10, 72, 250]) tensor([2, 0, 1, 2, 2, 2, 1, 2, 1, 2])
torch.Size([10, 72, 250]) tensor([1, 0, 0, 1, 0, 0, 1, 2, 2, 2])
torch.Size([10, 72, 250]) tensor([2, 1, 2, 1, 1, 1, 1, 0, 1, 1])
torch.Size([10, 72, 250]) tensor([2, 0, 2, 2, 0, 0, 2, 2, 2, 0])
torch.Size([8, 72, 250]) tensor([0, 2, 0, 2, 0, 2, 2, 0])

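The EEG data is standardized per channel by the preprocess function. During training (is_train==True), a segment of fixed length seq_length is cut out at a random position, Gaussian noise is added by add_noise, and the segment is reversed along the time axis with probability 0.5. There are many other ways to augment the data during training, so try devising your own to improve accuracy. Outside of training (is_train==False), the first seq_length time points are simply cut out.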
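
As one illustration of additional augmentation, the sketch below applies a random overall gain and randomly zeroes out whole channels. These two techniques are examples chosen here, not part of the tutorial's `transform`, and the ranges and probabilities are arbitrary values with no claim that they improve accuracy. All function names are hypothetical.

```python
import numpy as np


def random_scale(ts, rng, low=0.9, high=1.1):
    """Multiply the whole segment by one random gain in [low, high)."""
    return ts * rng.uniform(low, high)


def channel_dropout(ts, rng, p=0.1):
    """Zero out each channel independently with probability p."""
    keep = rng.random(ts.shape[0]) >= p
    return ts * keep[:, None]


def augment(ts, rng):
    """Apply both augmentations and return float32, like `transform` does."""
    ts = random_scale(ts, rng)
    ts = channel_dropout(ts, rng)
    return ts.astype(np.float32)


rng = np.random.default_rng(0)
segment = np.ones((72, 250))           # stand-in for a standardized EEG segment
augmented = augment(segment, rng)
print(augmented.shape, augmented.dtype)
```

Functions like these could be called inside the `is_train` branch of `transform`, after the random crop. Whether they help should be judged on the validation data.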

Building the model

Build a 1D convolutional deep learning model that classifies the trick type from EEG data.

In [ ]:

class Net1DBN(torch.nn.Module):
    def __init__(self, num_channels, num_classes):
        super(Net1DBN, self).__init__()
        self.conv1 = torch.nn.Conv1d(num_channels, 128, kernel_size=3, stride=1)
        self.conv2 = torch.nn.Conv1d(128, 128, kernel_size=3, stride=1)
        self.conv3 = torch.nn.Conv1d(128, 128, kernel_size=3, stride=1)
        self.conv4 = torch.nn.Conv1d(128, 128, kernel_size=3, stride=1)
        self.bn1 = torch.nn.BatchNorm1d(128)
        self.bn2 = torch.nn.BatchNorm1d(128)
        self.bn3 = torch.nn.BatchNorm1d(128)
        self.bn4 = torch.nn.BatchNorm1d(128)
        self.maxpool = torch.nn.MaxPool1d(kernel_size=3, stride=2)
        self.gap = torch.nn.AdaptiveAvgPool1d(1)
        self.fc = torch.nn.Linear(128, num_classes)


    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = torch.relu(x)
        x = self.maxpool(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = torch.relu(x)
        x = self.maxpool(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = torch.relu(x)
        x = self.maxpool(x)

        x = self.conv4(x)
        x = self.bn4(x)
        x = self.gap(x)
        x = x.squeeze(2)

        x = self.fc(x)

        return x
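
Because the convolution and pooling layers above use no padding, each one shortens the time axis. The following sketch computes how many time steps remain just before global average pooling, using the standard output-length formula for unpadded layers, floor((L - kernel_size) / stride) + 1. The helper names are hypothetical.

```python
def conv_out_len(length, kernel_size, stride=1):
    """Output length of an unpadded 1-D convolution or pooling layer."""
    return (length - kernel_size) // stride + 1


def net1dbn_time_steps(seq_length):
    """Time steps remaining just before global average pooling in Net1DBN."""
    length = seq_length
    for block in range(4):
        length = conv_out_len(length, kernel_size=3, stride=1)      # conv1..conv4
        if block < 3:
            length = conv_out_len(length, kernel_size=3, stride=2)  # max pooling
    return length


print(net1dbn_time_steps(250))  # 26
print(net1dbn_time_steps(300))  # 32
```

Since the global average pooling layer reduces any remaining length to 1, the model accepts inputs of different lengths, as long as the input is long enough to survive all four blocks.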

Check the shape of the output. For each sample, it holds one score per trick (logits, which relate to the class probabilities).

In [ ]:

num_channels = 72  # number of channels
num_classes = 3    # number of trick types to classify
model = Net1DBN(num_channels, num_classes)
in_data = torch.randn(8, num_channels, 300)
out_data = model(in_data)
print(out_data)

tensor([[ 0.0580, -0.1046,  0.0523],
        [ 0.1909,  0.0649,  0.0362],
        [-0.0513,  0.1652,  0.1426],
        [-0.0546, -0.0892,  0.1308],
        [-0.0468,  0.0842,  0.1439],
        [-0.0731,  0.1884, -0.2715],
        [ 0.0436, -0.2393, -0.0940],
        [-0.0304,  0.0863,  0.0066]], grad_fn=<AddmmBackward0>)
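
To make the link between these scores and class probabilities concrete, the sketch below applies a softmax to the first two rows printed above, using NumPy so that it runs without a model. The class order is the sorted order reported by the dataset earlier. Since the scores come from an untrained model, the resulting predictions carry no meaning; only the mechanics are shown.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# First two rows of the scores printed above.
logits = np.array([[0.0580, -0.1046, 0.0523],
                   [0.1909,  0.0649, 0.0362]])
class_names = ['backside_kickturn', 'frontside_kickturn', 'pumping']

probs = softmax(logits)
predicted = [class_names[i] for i in probs.argmax(axis=1)]
print(probs.sum(axis=1))
print(predicted)
```

In PyTorch, `torch.softmax(out_data, dim=1)` performs the same operation on the model output. The predicted class is the same with or without the softmax, because softmax preserves the ordering of the scores.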

Training the model

Train a model for each subject.

In [ ]:

def train(log_interval, model, device, train_loader, optimizer, epoch, iteration):
    model.train()
    criterion = torch.nn.CrossEntropyLoss()
    for sample_batched in train_loader:
        data, target = sample_batched['seq'].to(device), sample_batched['label'].to(device)
        optimizer.zero_grad()
        output = model(data)
        pred = output.max(1, keepdim=True)[1]
        correct = pred.eq(target.view_as(pred)).sum().item()
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        iteration += 1
        if iteration % log_interval == 0:
            sys.stdout.write('\repoch:{0:>3} iteration:{1:>6} train_loss: {2:.6f} train_accuracy: {3:5.2f}%'.format(
                            epoch, iteration, loss.item(), 100.*correct/float(len(sample_batched['label']))))
            sys.stdout.flush()
    return iteration


def val(model, device, test_loader):
    model.eval()
    criterion = torch.nn.CrossEntropyLoss(reduction='sum')
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for sample_batched in test_loader:
            data, target = sample_batched['seq'].to(device), sample_batched['label'].to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= float(len(test_loader.dataset))
    correct /= float(len(test_loader.dataset))
    print('\n  Validation: Accuracy: {0:.2f}%  test_loss: {1:.6f}'.format(100. * correct, test_loss))
    return test_loss, 100. * correct


def evaluate(model, device, test_loader):
    preds = []
    trues = []
    model.eval()
    with torch.no_grad():
        for sample_batched in test_loader:
            data, target = sample_batched['seq'].to(device), sample_batched['label'].to(device)
            output = model(data)
            pred = [test_loader.dataset.class_names[i] for i in list(output.max(1)[1].cpu().detach().numpy())]
            preds += pred
            true = [test_loader.dataset.class_names[i] for i in list(target.cpu().detach().numpy())]
            trues += true
    labels = test_loader.dataset.class_names
    cm = confusion_matrix(trues, preds, labels=labels)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
    disp.plot()
    plt.show()
    cr = classification_report(trues, preds, labels=labels, target_names=labels)
    print(cr)
    correct = 0
    for pred, true in zip(preds, trues):
        if pred == true:
            correct += 1
    df = pd.DataFrame({'pred': preds, 'true': trues})

    return correct/len(trues), df


def train_evaluate(train_dir, val_dir, log_interval, num_epoches, seq_length, transform=None, num_channels=72, num_classes = 3):
    model = Net1DBN(num_channels=num_channels, num_classes=num_classes)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    train_loader = torch.utils.data.DataLoader(SeqDataset(root=train_dir, seq_length=seq_length, is_train=True, transform=transform), batch_size=20, shuffle=True) # type: ignore
    optimizer = torch.optim.Adam(model.parameters())
    val_loader = torch.utils.data.DataLoader(SeqDataset(root=val_dir, seq_length=seq_length, is_train=False, transform=transform), batch_size=20, shuffle=False) # type: ignore
    iteration = 0
    for epoch in range(1, 1+num_epoches):
        iteration = train(log_interval, model, device, train_loader, optimizer, epoch, iteration)
        if epoch%10==0:
            test_loss, test_acc = val(model, device, val_loader)
    acc, df = evaluate(model, device, val_loader)
    print(acc)
    return model

For each subject, train for 100 epochs and check the accuracy on the validation data.

In [ ]:

log_interval = 5000
num_epoches = 100
seq_length = 250
models = {}
for subject_id in subject_ids:
    train_dir = os.path.join('test_modeling', 'train', subject_id)
    val_dir = os.path.join('test_modeling', 'val', subject_id)
    model = train_evaluate(train_dir, val_dir, log_interval, num_epoches, seq_length, transform)
    models[subject_id] = model

  Validation: Accuracy: 54.09%  test_loss: 1.891572

  Validation: Accuracy: 66.67%  test_loss: 1.409148

  Validation: Accuracy: 77.36%  test_loss: 0.915978

  Validation: Accuracy: 74.21%  test_loss: 1.070139

  Validation: Accuracy: 79.25%  test_loss: 0.939362

  Validation: Accuracy: 78.62%  test_loss: 1.111083

  Validation: Accuracy: 76.73%  test_loss: 1.595384

  Validation: Accuracy: 79.25%  test_loss: 1.056733

  Validation: Accuracy: 77.99%  test_loss: 1.278809

  Validation: Accuracy: 80.50%  test_loss: 0.984952

                    precision    recall  f1-score   support

 backside_kickturn       0.75      0.62      0.68        39
frontside_kickturn       0.69      0.83      0.76        41
           pumping       0.90      0.89      0.89        79

          accuracy                           0.81       159
         macro avg       0.78      0.78      0.77       159
      weighted avg       0.81      0.81      0.80       159

0.8050314465408805

  Validation: Accuracy: 73.58%  test_loss: 0.712705

  Validation: Accuracy: 67.92%  test_loss: 1.141067

  Validation: Accuracy: 76.73%  test_loss: 0.941605

  Validation: Accuracy: 66.67%  test_loss: 1.316263

  Validation: Accuracy: 77.36%  test_loss: 0.949576

  Validation: Accuracy: 73.58%  test_loss: 1.196790

  Validation: Accuracy: 65.41%  test_loss: 1.830253

  Validation: Accuracy: 77.99%  test_loss: 1.007843

  Validation: Accuracy: 84.91%  test_loss: 0.765744

  Validation: Accuracy: 83.65%  test_loss: 0.819752

                    precision    recall  f1-score   support

 backside_kickturn       0.81      0.72      0.76        40
frontside_kickturn       0.77      0.85      0.81        40
           pumping       0.89      0.89      0.89        79

          accuracy                           0.84       159
         macro avg       0.82      0.82      0.82       159
      weighted avg       0.84      0.84      0.84       159

0.8364779874213837

  Validation: Accuracy: 69.62%  test_loss: 1.112549

  Validation: Accuracy: 70.89%  test_loss: 1.473466

  Validation: Accuracy: 60.13%  test_loss: 2.130931

  Validation: Accuracy: 72.78%  test_loss: 1.255365

  Validation: Accuracy: 68.99%  test_loss: 1.765945

  Validation: Accuracy: 78.48%  test_loss: 1.104978

  Validation: Accuracy: 71.52%  test_loss: 1.318160

  Validation: Accuracy: 74.68%  test_loss: 1.257874

  Validation: Accuracy: 65.19%  test_loss: 2.031269

  Validation: Accuracy: 76.58%  test_loss: 1.248977

                    precision    recall  f1-score   support

 backside_kickturn       0.62      0.87      0.72        39
frontside_kickturn       0.88      0.72      0.79        40
           pumping       0.83      0.73      0.78        79

          accuracy                           0.77       158
         macro avg       0.78      0.78      0.77       158
      weighted avg       0.79      0.77      0.77       158

0.7658227848101266

  Validation: Accuracy: 56.25%  test_loss: 1.645411

  Validation: Accuracy: 86.25%  test_loss: 0.724722

  Validation: Accuracy: 80.62%  test_loss: 0.847240

  Validation: Accuracy: 70.62%  test_loss: 1.354542

  Validation: Accuracy: 82.50%  test_loss: 0.787393

  Validation: Accuracy: 84.38%  test_loss: 0.758522

  Validation: Accuracy: 82.50%  test_loss: 1.203661

  Validation: Accuracy: 83.12%  test_loss: 0.739231

  Validation: Accuracy: 88.12%  test_loss: 0.706658

  Validation: Accuracy: 78.75%  test_loss: 1.444969

                    precision    recall  f1-score   support

 backside_kickturn       0.59      0.93      0.72        40
frontside_kickturn       0.85      0.85      0.85        40
           pumping       0.96      0.69      0.80        80

          accuracy                           0.79       160
         macro avg       0.80      0.82      0.79       160
      weighted avg       0.84      0.79      0.79       160

0.7875

  Validation: Accuracy: 66.25%  test_loss: 1.452248

  Validation: Accuracy: 71.88%  test_loss: 0.926929

  Validation: Accuracy: 70.62%  test_loss: 1.281376

  Validation: Accuracy: 66.25%  test_loss: 1.130868

  Validation: Accuracy: 65.00%  test_loss: 1.951477

  Validation: Accuracy: 75.62%  test_loss: 0.891186

  Validation: Accuracy: 68.75%  test_loss: 1.381740

  Validation: Accuracy: 65.00%  test_loss: 2.391677

  Validation: Accuracy: 72.50%  test_loss: 1.857807

  Validation: Accuracy: 76.25%  test_loss: 1.151513

                    precision    recall  f1-score   support

 backside_kickturn       0.95      0.50      0.66        42
frontside_kickturn       0.65      0.79      0.71        38
           pumping       0.77      0.89      0.83        80

          accuracy                           0.76       160
         macro avg       0.79      0.73      0.73       160
      weighted avg       0.79      0.76      0.75       160

0.7625
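
The five final accuracies printed above can be summarized with a few lines of standard-library code. This sketch assumes that the outputs appear in the order of `subject_ids`, since the training loop itself does not print the subject name.

```python
from statistics import mean

# Final validation accuracies printed above, assumed to be in subject_ids order.
val_accuracy = {
    'subject0': 0.8050314465408805,
    'subject1': 0.8364779874213837,
    'subject2': 0.7658227848101266,
    'subject3': 0.7875,
    'subject4': 0.7625,
}

mean_accuracy = mean(val_accuracy.values())
worst_subject = min(val_accuracy, key=val_accuracy.get)
print(f'mean accuracy: {mean_accuracy:.4f}')
print(f'lowest accuracy: {worst_subject}')
```

Note that the validation accuracy fluctuates by several points between consecutive checkpoints in the logs above, so small differences between subjects or runs should be read with caution.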

Try changing the number of epochs, the model architecture, or the optimization method (optimizer) to see whether accuracy improves.
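
One simple variation, given how much the validation accuracy moves between checkpoints, is to keep the weights from the best validation epoch instead of the last one. The sketch below is a hypothetical helper, not part of the tutorial. It relies only on the `state_dict()` and `load_state_dict()` methods that `torch.nn.Module` provides, and is demonstrated here with a dummy stand-in so that it runs without PyTorch.

```python
import copy


class BestCheckpoint:
    """Remember the model weights that gave the best validation accuracy."""

    def __init__(self):
        self.best_acc = float('-inf')
        self.best_state = None

    def update(self, model, acc):
        """Store a copy of model.state_dict() if acc improves; return True if stored."""
        if acc > self.best_acc:
            self.best_acc = acc
            self.best_state = copy.deepcopy(model.state_dict())
            return True
        return False

    def restore(self, model):
        """Load the best weights back into the model, if any were stored."""
        if self.best_state is not None:
            model.load_state_dict(self.best_state)


class DummyModel:
    """Tiny stand-in exposing the same two methods as torch.nn.Module."""

    def __init__(self):
        self.weights = {'w': 0.0}

    def state_dict(self):
        return self.weights

    def load_state_dict(self, state):
        self.weights = copy.deepcopy(state)


model = DummyModel()
tracker = BestCheckpoint()
for epoch, acc in enumerate([54.09, 79.25, 77.99], start=1):
    model.weights['w'] = float(epoch)   # pretend training changed the weights
    tracker.update(model, acc)
tracker.restore(model)
print(tracker.best_acc, model.weights)
```

In `train_evaluate`, `tracker.update(model, test_acc)` could be called after each `val(...)` call and `tracker.restore(model)` before `evaluate(...)`. Selecting the checkpoint on the validation data makes the reported validation accuracy optimistic, so the leaderboard score remains the more reliable check.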

Creating the submission file

Create the file to submit. It is a headerless CSV file containing the predicted trick type for each sample ID (one sample of a given subject).

Sample ID
- Format: subject{0,1,2,3,4}_{sample_id}
- Type: str
Trick type
- One of frontside_kickturn, backside_kickturn, pumping
- Type: str
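
The format rules above can be checked automatically before submitting. The following is a minimal sketch of such a check using the standard library. The function name `validate_submission` is hypothetical, and the pattern for the numeric part of the sample ID (one or more digits) is an assumption, since the exact width is not specified here.

```python
import csv
import io
import re

# Assumption: the sample_id part is one or more digits.
SAMPLE_ID = re.compile(r'^subject[0-4]_\d+$')
TRICKS = {'frontside_kickturn', 'backside_kickturn', 'pumping'}


def validate_submission(lines):
    """Return a list of (row_number, message) for rows that break the format."""
    errors = []
    seen = set()
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != 2:
            errors.append((row_number, 'expected exactly two columns'))
            continue
        sample_id, trick = row
        if not SAMPLE_ID.match(sample_id):
            errors.append((row_number, 'bad sample ID: ' + sample_id))
        elif sample_id in seen:
            errors.append((row_number, 'duplicate sample ID: ' + sample_id))
        seen.add(sample_id)
        if trick not in TRICKS:
            errors.append((row_number, 'unknown trick: ' + trick))
    return errors


good = io.StringIO('subject0_000,pumping\nsubject0_001,backside_kickturn\n')
bad = io.StringIO('sample_id,label\nsubject9_000,pumping\nsubject0_000,ollie\n')
good_errors = validate_submission(good)
bad_errors = validate_submission(bad)
print(good_errors)
print(bad_errors)
```

For a file on disk, the function can be called as `validate_submission(open('submit.csv', newline=''))`. A header row is reported as an error, which matches the headerless requirement.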

In [ ]:

def output_pred(src_dir, root_dir, subject_ids, models, seq_length, transform, device):
    predictions = {}
    for subject_id in subject_ids:
        # Recover the class order used in training from the sorted directory names
        train_dir = os.path.join(root_dir, subject_id)
        class_names = os.listdir(train_dir)
        class_names.sort()
        data = read_mat(os.path.join(src_dir, 'test', '{}.mat'.format(subject_id)))
        model = models[subject_id].to(device)
        model.eval()  # use the BatchNorm running statistics at inference time
        with torch.no_grad():
            for i, ts in enumerate(data['data']): # type: ignore
                # Standardize per channel, crop to seq_length, and add a batch dimension
                tensor = torch.from_numpy(transform(standardization(ts, axis=1), is_train=False, seq_length=seq_length)).unsqueeze(0).to(device) # type: ignore
                pred = model(tensor)
                output_index = pred.argmax(dim=1).item()
                predictions['{}_{:03d}'.format(subject_id, i)] = class_names[output_index]
    result = pd.Series(predictions)

    return result

In [ ]:

src_dir = '.'
root_dir = os.path.join('test_modeling', 'train')
subject_ids = ['subject0', 'subject1', 'subject2', 'subject3', 'subject4']
seq_length = 250
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
result = output_pred(src_dir, root_dir, subject_ids, models, seq_length, transform, device)
result.to_csv('submit.csv', header=False)

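If the run succeeds, ./submit.csv is created. Submit it to check your accuracy, then improve the model and resubmit to push the accuracy higher. This concludes the tutorial. We look forward to many submissions.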
