2019-05-25

Metabase Starts the Week on Sunday

Java Clojure

While trying out Metabase, I noticed the following.

When data is aggregated by week, the week starts on Sunday (each bucket runs from Sunday to Saturday).

(Screenshot)

f:id:fits:20190525210517p:plain

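In databases and most other systems the week normally starts on Monday (ISO 8601), so Metabase must be shifting it to Sunday itself.

So I clicked "View the SQL" to inspect the query. As expected, it shifts the week start to Sunday: it adds one day, truncates to the Monday-based week, and then subtracts one day, which gives the Sunday on or before the date. (The target database is PostgreSQL.)

SQL generated by the query builder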

SELECT (date_trunc('week', CAST((CAST("public"."stock_move"."date" AS timestamp) + INTERVAL '1 day') AS timestamp)) - INTERVAL '1 day') AS "date", sum("public"."stock_move"."product_qty") AS "sum"
FROM "public"."stock_move"
GROUP BY (date_trunc('week', CAST((CAST("public"."stock_move"."date" AS timestamp) + INTERVAL '1 day') AS timestamp)) - INTERVAL '1 day')
ORDER BY (date_trunc('week', CAST((CAST("public"."stock_move"."date" AS timestamp) + INTERVAL '1 day') AS timestamp)) - INTERVAL '1 day') ASC
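To see why this expression lands on Sunday, here is a minimal Python sketch of the same arithmetic. It assumes that PostgreSQL's date_trunc('week', ...) returns midnight of the Monday that starts the ISO week. trunc_week and metabase_week are hypothetical names used only for this illustration.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def trunc_week(d: datetime) -> datetime:
    """Emulate date_trunc('week', d): midnight of the Monday that starts d's ISO week."""
    monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0 ... Sunday == 6
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def metabase_week(d: datetime) -> datetime:
    """Mirror the generated SQL: add one day, truncate to the week, subtract one day."""
    return trunc_week(d + ONE_DAY) - ONE_DAY


# 2019-05-19 (Sun) through 2019-05-26 (Sun)
for day in range(19, 27):
    d = datetime(2019, 5, day, 12, 0)
    print(d.strftime("%Y-%m-%d %a"),
          "ISO week start:", trunc_week(d).date(),
          "shifted week start:", metabase_week(d).date())
```

Under the shifted expression, every date from Sunday 2019-05-19 to Saturday 2019-05-25 maps to 2019-05-19. The plain date_trunc maps Monday 2019-05-20 through Sunday 2019-05-26 to 2019-05-20.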

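I could not find any setting for this (changing the time zone or language made no difference). So I looked at the relevant source and found that it only ever accounts for a Sunday week start.

src/metabase/driver/postgres.clj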

・・・

(defmethod sql.qp/date [:postgres :week]            [_ _ expr] (hx/- (date-trunc :week (hx/+ (hx/->timestamp expr)
                                                                                             one-day))
                                                                     one-day))

・・・

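For now, changing the week start to Monday appears to require modifying the Metabase source. That is a concern, because the code is built on the assumption of a Sunday start.

Also, judging from issues such as Allow organizations to determine the start of their week #1779, it seems unlikely that Metabase itself will address this soon.

Changing the Week Start to Monday for PostgreSQL Queries

As an experiment, I use Java's Instrumentation feature to make the week start on Monday when querying PostgreSQL.

I looked into how the relevant Clojure code (the postgres.clj code above) is executed on the Java side and found the following:

During the static initializer of the metabase.driver.postgres__init class, addMethod is called on the rawRoot of the date var in the metabase.driver.sql.query-processor namespace, with [:postgres :week] as the key.

In other words, replacing the [:postgres :week] method after this initialization has finished should do the trick.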
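This works because a Clojure multimethod is essentially a dispatch function plus a mutable table of implementations, and the table is consulted on every call. The following Python sketch is only a toy model of that idea. ToyMultiFn and its method names are hypothetical and do not reproduce clojure.lang.MultiFn. The sketch shows that replacing the entry under the same key after initialization changes what later calls return.

```python
from typing import Any, Callable, Dict, Hashable


class ToyMultiFn:
    """A toy multimethod: a dispatch function computes a key, and a mutable
    table maps keys to implementations (hypothetical, not Clojure's API)."""

    def __init__(self, dispatch: Callable[..., Hashable]) -> None:
        self.dispatch = dispatch
        self.methods: Dict[Hashable, Callable[..., Any]] = {}

    def add_method(self, key: Hashable, fn: Callable[..., Any]) -> None:
        self.methods[key] = fn

    def remove_method(self, key: Hashable) -> None:
        self.methods.pop(key, None)

    def __call__(self, *args: Any) -> Any:
        # The implementation is looked up at call time, so later replacements take effect
        return self.methods[self.dispatch(*args)](*args)


# Dispatch on (driver, unit), mirroring the [:postgres :week] key
date = ToyMultiFn(lambda driver, unit, expr: (driver, unit))

# Registered during initialization (the postgres__init step): Sunday-start week
date.add_method(("postgres", "week"),
                lambda _driver, _unit, expr: f"date_trunc('week', {expr} + 1 day) - 1 day")

# Replaced afterwards (what the agent does): Monday-start week
date.remove_method(("postgres", "week"))
date.add_method(("postgres", "week"),
                lambda _driver, _unit, expr: f"date_trunc('week', {expr})")

print(date("postgres", "week", "x"))  # date_trunc('week', x)
```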

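Implementation

Source code: http://github.com/fits/try_samples/tree/master/blog/20190525/

The replacement must run after the postgres__init class has been initialized. I therefore use a ClassFileTransformer, which lets code be hooked into the loading of an arbitrary class (the transform method is called whenever a class is loaded).

Here the replacement runs when the org/postgresql/Driver class is loaded, but any point after postgres__init has been initialized should work.

The replacement function, PgWeekFunc, is implemented to evaluate the Clojure expression (date-trunc :week expr).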

sample/SampleAgent.java

package sample;

import java.lang.instrument.*;
import java.security.*;
import clojure.lang.*;

public class SampleAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PgWeekTransformer());
    }

    static class PgWeekTransformer implements ClassFileTransformer {
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // run when the org/postgresql/Driver class is loaded
            if (className.equals("org/postgresql/Driver")) {
                replacePgWeek();
            }

            return null;
        }

        // replaces the implementation registered for [:postgres :week]
        private void replacePgWeek() {
            // look up the metabase.driver.sql.query-processor namespace
            Namespace n = Namespace.find(
                Symbol.intern("metabase.driver.sql.query-processor")
            );

            // look up the date var
            Var v = n.findInternedVar(Symbol.intern("date"));

            MultiFn root = (MultiFn)v.getRawRoot();

            // build the key [:postgres :week]
            IPersistentVector k = Tuple.create(
                RT.keyword(null, "postgres"), 
                RT.keyword(null, "week")
            );

            // swap the method bound to [:postgres :week]
            root.removeMethod(k);
            root.addMethod(k, new PgWeekFunc());
        }
    }

    // replacement function for the week unit
    static class PgWeekFunc extends AFunction {
        public static final Var const__1 = RT.var(
            "metabase.driver.postgres", 
            "date-trunc"
        );

        public static final Keyword const__2 = RT.keyword(null, "week");

        public static Object invokeStatic(Object obj1, Object obj2, Object expr) {
            // evaluate (date-trunc :week expr)
            return ((IFn)const__1.getRawRoot()).invoke(const__2, expr);
        }

        public Object invoke(Object obj1, Object obj2, Object obj3) {
            return invokeStatic(obj1, obj2, obj3);
        }
    }
}

META-INF/MANIFEST.MF

Manifest-Version: 1.0
Premain-Class: sample.SampleAgent

Build the source above and package it as a JAR file (e.g. sample-agent.jar).

Running

When starting Metabase, apply the sample-agent.jar built above with the -javaagent option.

Running Metabase (with Instrumentation applied)

> java -javaagent:sample-agent.jar -jar metabase.jar

The generated SQL no longer adds or subtracts INTERVAL '1 day', which confirms that the replacement took effect.

SQL generated by the query builder (after the replacement)

SELECT date_trunc('week', CAST("public"."stock_move"."date" AS timestamp)) AS "date", sum("public"."stock_move"."product_qty") AS "sum"
FROM "public"."stock_move"
GROUP BY date_trunc('week', CAST("public"."stock_move"."date" AS timestamp))
ORDER BY date_trunc('week', CAST("public"."stock_move"."date" AS timestamp)) ASC

I also checked the server response and confirmed that the dates are now Mondays and that the aggregated values have changed.

Response (excerpt)

"data":{
    "rows":[
        ["2019-04-22T00:00:00.000+09:00",2139.0],
        ["2019-05-06T00:00:00.000+09:00",30.0],
        ["2019-05-13T00:00:00.000+09:00",13.0]
    ],
    "columns":["date","sum"],
    ・・・
}

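However, the web UI generates the week range in JavaScript, so the displayed range (for date: Week) is still Sunday to Saturday.

(Screenshot)

f:id:fits:20190525210556p:plain

So changing the week start to Monday will also require dealing with JavaScript code such as the following.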

frontend/src/metabase/lib/formatting.js

・・・

function formatWeek(m: Moment, options: FormattingOptions = {}) {
  // force 'en' locale for now since our weeks currently always start on Sundays
  m = m.locale("en");
  return formatMajorMinor(m.format("wo"), m.format("gggg"), options);
}

・・・
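The frontend formats the week with Moment's "wo" token under the 'en' locale, whose weeks begin on Sunday. The displayed range therefore follows that convention regardless of the dates the server returns. The effect is easy to reproduce outside Metabase. This Python sketch uses the strftime directives %U and %W to show how the same Sunday falls into different weeks depending on whether weeks start on Sunday or Monday. week_numbers is a hypothetical helper, not Metabase code.

```python
from datetime import date


def week_numbers(d: date) -> dict:
    """Week-of-year of d under three conventions (all from the standard library)."""
    iso_year, iso_week, _ = d.isocalendar()
    return {
        # %U: weeks start on Sunday; days before the year's first Sunday are week 0
        "sunday_start": int(d.strftime("%U")),
        # %W: weeks start on Monday; days before the year's first Monday are week 0
        "monday_start": int(d.strftime("%W")),
        # ISO 8601: weeks start on Monday; week 1 contains the year's first Thursday
        "iso_8601": (iso_year, iso_week),
    }


for d in (date(2019, 5, 19), date(2019, 5, 20)):  # a Sunday and the following Monday
    print(d, d.strftime("%a"), week_numbers(d))
```

Sunday 2019-05-19 shares a week number with the following Monday under %U, but not under %W or ISO 8601.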

2019-05-06

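Building a Web Application for Landmark Detection with Keras.js 2

Deeplearning JavaScript HTML5 Node.js

Last time, the image size for landmark detection was fixed at 256x256. This time I modify the app so that it can handle images of any size.

Keras.js 1.0.3

Source code: http://github.com/fits/try_samples/tree/master/blog/20190506/

Supporting variable sizes

Landmark detection now runs at the size of the image that is dragged and dropped. (The file layout is the same as last time.)

Note, however, that this uses Keras.js in an unconventional way. Problems may occur, and it may not work with other versions.

(a) UI processing (src/app.js)

The canvas is resized to match the image, and the image size (width and height) is passed to the landmark detection process.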

src/app.js

・・・
const loadImage = url => new Promise(resolve => {
    const img = new Image()

    img.addEventListener('load', () => {
        // resize the canvas to match the image
        canvas.width = img.width
        canvas.height = img.height

        ctx.clearRect(0, 0, canvas.width, canvas.height)

        ctx.drawImage(img, 0, 0)

        const d = ctx.getImageData(0, 0, canvas.width, canvas.height)
        resolve({width: img.width, height: img.height, data: imgToArray(d)})
    })

    img.src = url
})

・・・

const ready = () => {
    ・・・

    canvas.addEventListener('drop', ev => {
        ev.preventDefault()
        canvas.classList.remove('dragging')

        const file = ev.dataTransfer.files[0]

        if (imageTypes.includes(file.type)) {
            clearLandmarksInfo()

            const reader = new FileReader()

            reader.onload = ev => {
                loadImage(reader.result)
                    .then(d => {
                        detectDialog.showModal()
                        // include the image size
                        worker.postMessage({type: 'predict', input: d.data, width: d.width, height: d.height})
                    })
            }

            reader.readAsDataURL(file)
        }
    }, false)
}

・・・

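(b) Landmark detection (src/worker.js)

Normally the input and output shapes of every layer in a (Keras.js) model are fixed, so as is, the model cannot handle arbitrary image sizes.

To support variable sizes, I add the following steps, which forcibly reset the input and output shapes on every detection run:

(1) Change the shape of the input data (the image size)
(2) Clear the output shape of each layer
(3) Reset inputTensorsMap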
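After the reset, each layer's output shape has to be derived again from the new input size. Here is a rough sketch of what that yields. It assumes the encoder-decoder from the earlier "Landmark Detection with CNN" post: four MaxPool2D layers with the default pool size 2 and padding='valid', four UpSampling2D layers, and 'same'-padded convolutions and a per-pixel Dense that keep the size. Under those assumptions, the output side length is n // 16 * 16. Whether Keras.js rounds exactly this way is an assumption, and output_size is a hypothetical helper.

```python
def output_size(n: int, stages: int = 4) -> int:
    """Side length after `stages` pooling steps followed by `stages` upsampling steps."""
    for _ in range(stages):
        n //= 2  # MaxPool2D(pool_size=2, padding='valid') floors odd sizes
    for _ in range(stages):
        n *= 2   # UpSampling2D(size=2) doubles the size
    return n


# The image sizes tried in the verification below, as (width, height)
for w, h in [(128, 128), (307, 307), (100, 128), (200, 256)]:
    print((w, h), "->", (output_size(w), output_size(h)))
```

This is one reason the worker reads the actual output shape from the output layer after predict, instead of assuming it equals the input size.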

src/worker.js

・・・

onmessage = ev => {
    switch (ev.data.type) {
        ・・・
        case 'predict':
            const inputLayerName = model.inputLayerNames[0]
            const outputLayerName = model.outputLayerNames[0]

            const w = ev.data.width
            const h = ev.data.height

            // (1) change the shape of the input data (the image size)
            const inputLayer = model.modelLayersMap.get(inputLayerName)
            inputLayer.shape[0] = h
            inputLayer.shape[1] = w

            // (2) clear the output shape of each layer
            model.modelLayersMap.forEach(n => {
                if (n.outputShape) {
                    n.outputShape = null
                    n.imColsMat = null
                }
            })

            // (3) reset inputTensorsMap
            model.resetInputTensors()

            const data = {}
            data[inputLayerName] = ev.data.input

            Promise.resolve(model.predict(data))
                .then(r => {
                    const shape = model.modelLayersMap.get(outputLayerName)
                                                .output.tensor.shape

                    return new KerasJS.Tensor(r[outputLayerName], shape)
                })
                .then(detectLandmarks)
                .then(r => 
                    postMessage({type: ev.data.type, output: r})
                )
                .catch(err => {
                    console.log(err)
                    postMessage({type: ev.data.type, error: err.message})
                })

            break
    }
}

Verification

(1) Image size 128x128

f:id:fits:20190506114638p:plain

(2) Image size 307x307

f:id:fits:20190506114655p:plain

(3) Image size 100x128

f:id:fits:20190506114713p:plain

(4) Image size 200x256

f:id:fits:20190506114736p:plain

2019-04-22

Calculating Cyclomatic Complexity with SonarAnalyzer.CSharp

C# .NET Metrics

I built a sample that uses the API of SonarC# (SonarAnalyzer.CSharp) to calculate the cyclomatic complexity of each method in a C# source file.

The environment used is as follows.

SonarC# 7.13
.NET Core SDK 3.0 preview3

Source code: http://github.com/fits/try_samples/tree/master/blog/20190422/

Preparation

Create a project with the dotnet command.

Creating the project

> dotnet new console

The C# source has to be parsed, so add the Microsoft.CodeAnalysis.CSharp package.

Adding Microsoft.CodeAnalysis.CSharp

> dotnet add package Microsoft.CodeAnalysis.CSharp

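Next, add the SonarAnalyzer.CSharp package. This appears to be a package meant for the IDE (Visual Studio), so simply running add package does not make it referenceable from the project, because its .dll is placed in the analyzers directory.

As a workaround, I placed the package in a specific directory (*) and edited the .csproj as follows.

 * Another option would be to run add package normally and reference
   the path of the dll placed under the .nuget/packages directory

Adding SonarAnalyzer.CSharp (placed in the pkg directory)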

> dotnet add package SonarAnalyzer.CSharp --package-directory pkg

Comment out the PackageReference element added by the command above. In its place, add a Reference element whose HintPath points to SonarAnalyzer.CSharp.dll.

Editing sonar_sample.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="3.0.0" />
    <!-- commented out:
    <PackageReference Include="SonarAnalyzer.CSharp" Version="7.13.0.8313">
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
      <PrivateAssets>all</PrivateAssets>
    </PackageReference>
    -->
    <!-- added: -->
    <Reference Include="SonarAnalyzer.CSharp">
      <HintPath>./pkg/sonaranalyzer.csharp/7.13.0.8313/analyzers/SonarAnalyzer.CSharp.dll</HintPath>
    </Reference>
  </ItemGroup>

</Project>

Implementation

The C# source code is parsed and each MethodDeclarationSyntax is extracted. Each one is then passed to the CSharpCyclomaticComplexityMetric.GetComplexity method to calculate its cyclomatic complexity.

Program.cs

using System;
using System.Linq;
using System.IO;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using SonarAnalyzer.Metrics.CSharp;

namespace CyclomaticComplexity
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var reader = new StreamReader(args[0]))
            {
                // parse the source code
                var tree = CSharpSyntaxTree.ParseText(reader.ReadToEnd());
                var root = tree.GetCompilationUnitRoot();

                // collect the MethodDeclarationSyntax nodes
                var methods = root.DescendantNodes()
                                    .OfType<MethodDeclarationSyntax>();

                foreach(var m in methods)
                {
                    var c = CSharpCyclomaticComplexityMetric.GetComplexity(m);

                    Console.WriteLine("{0},{1}", m.Identifier, c.Complexity);
                }
            }
        }
    }
}
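The same idea can be sketched in Python with the standard ast module: parse the source, find each function, and count the decision points. This is only an illustration under a simplified McCabe-style rule set chosen for this example. It counts one path, plus one per if, loop, except, conditional expression, or comprehension clause, plus one per extra and/or operand. It does not reproduce SonarC#'s rules for C#.

```python
import ast
from typing import Dict

# Nodes that add one decision point each (a simplified rule set chosen for this
# sketch; SonarC# applies its own rules to C#)
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(func: ast.AST) -> int:
    complexity = 1  # a function without branches has exactly one path
    for node in ast.walk(func):  # note: nested functions are counted into the outer one
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits: one extra path per additional operand
            complexity += len(node.values) - 1
    return complexity


def complexity_per_function(source: str) -> Dict[str, int]:
    """Map each function name to its complexity, similar to the Name,Complexity lines of Program.cs."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


sample = '''
def main(args):
    for a in args:
        print(a)

def is_positive(x):
    if x is not None and x > 0:
        return True
    return False
'''
print(complexity_per_function(sample))  # {'main': 2, 'is_positive': 3}
```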

Running

Build and run it.

Build

> dotnet build

・・・
Build succeeded.
    0 Warning(s)
    0 Error(s)

Calculate the complexity of Program.cs itself.

Run 1

> dotnet run Program.cs

Main,2

Next, try it on the SonarC# source.

Run 2

> cd ..
> git clone https://github.com/SonarSource/sonar-dotnet.git
・・・

> cd sonar_sample
> dotnet run ../sonar-dotnet/sonaranalyzer-dotnet/src/SonarAnalyzer.CSharp/Metrics/CSharpMetrics.cs

GetCognitiveComplexity,1
GetCyclomaticComplexity,1
IsClass,4
IsCommentTrivia,1
IsDocumentationCommentTrivia,4
IsEndOfFile,1
IsFunction,16
IsNoneToken,1
IsStatement,28

2019-03-31

Building a Web Application for Landmark Detection with Keras.js

Deeplearning JavaScript HTML5 Node.js

Here I use Keras.js to run the trained model from the previous post, "Landmark Detection with CNN", in the web browser.

Keras.js 1.0.3

Source code: http://github.com/fits/try_samples/tree/master/blog/20190331/

Preparation

Install Keras.js with npm.

Installing Keras.js

> npm install --save keras-js

Use the encoder.py script included in Keras.js to convert the model trained with Keras in Python (model/cnn_landmark_400.h5) for Keras.js.

Converting the model file (HDF5 format) for Keras.js

> python node_modules/keras-js/python/encoder.py model/cnn_landmark_400.h5

The path (URL) of the generated .bin file (model/cnn_landmark_400.bin) is what gets passed to KerasJS.Model.

Also install webpack (webpack-cli is needed to use the webpack command).

Installing webpack

> npm install --save-dev webpack webpack-cli

Creating the web application

The web application consists of the following files.

index.html
js/bundle_app.js
js/bundle_worker.js
model/cnn_landmark_400.bin

All processing runs in the web browser. The Keras.js processing (landmark detection here) is fairly heavy, so it runs in a Web Worker.

index.html looks like this.

index.html

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <style type="text/css">
        canvas {
            border: 1px solid;
        }

        canvas.dragging {
            border: 3px solid red;
        }

        table {
            text-align: center;
            border-collapse: collapse;
        }

        table th, table td {
            border: 1px solid;
            padding: 8px;
        }
    </style>
</head>
<body>
    <dialog id="load-dialog">loading model ...</dialog>
    <dialog id="detect-dialog">detecting landmarks ...</dialog>
    <dialog id="error-dialog">ERROR</dialog>

    <div>
        <canvas width="256" height="256"></canvas>
    </div>

    <br>

    <div>
        <div id="landmarks"></div>
    </div>

    <script src="./js/bundle_app.js"></script>
</body>
</html>

To generate the bundle_xxx.js files, prepare a webpack configuration file like the following.

The fs: 'empty' setting is needed for webpack to process Keras.js. Without it, you get an error such as Module not found: Error: Can't resolve 'fs'.

webpack.config.js

module.exports = {
    entry: {
        bundle_app: __dirname + '/src/app.js',
        bundle_worker: __dirname + '/src/worker.js'
    },
    output: {
        path: __dirname + '/js',
        filename: '[name].js',
    },
    // setting required for Keras.js
    node: {
        fs: 'empty'
    }
}

A Web Worker exchanges data with the main UI code through message passing, much like the actor model (postMessage to send, onmessage to receive).

Here, the following messages are exchanged with the Web Worker (the Keras.js processing).

Messages exchanged with the Web Worker (Keras.js processing)

Operation	Sent message	Received message (on success)	Received message (on error)
Initialization	`{type: 'init', url: <model file URL>}`	`{type: 'init'}`	`{type: 'init', error: <error message>}`
Landmark detection	`{type: 'predict', input: <input data>}`	`{type: 'predict', output: <landmark detection result>}`	`{type: 'predict', error: <error message>}`
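The request/response protocol in this table can also be sketched outside the browser. The following Python analogue uses a thread and two queue.Queue objects in place of a Web Worker, postMessage and onmessage. The model loading and prediction are hypothetical placeholders, so only the message shapes from the table are meaningful.

```python
import queue
import threading
from typing import Any, Dict, Optional

Message = Dict[str, Any]


def worker_loop(inbox: "queue.Queue[Optional[Message]]", outbox: "queue.Queue[Message]") -> None:
    """Handle one message at a time, like onmessage in a Web Worker."""
    model = None
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel used here to stop the thread
            break
        try:
            if msg["type"] == "init":
                model = f"model from {msg['url']}"  # placeholder for loading the model
                outbox.put({"type": "init"})
            elif msg["type"] == "predict":
                if model is None:
                    raise RuntimeError("model is not loaded")
                output = max(msg["input"])  # placeholder for predict + landmark detection
                outbox.put({"type": "predict", "output": output})
        except Exception as err:  # report failures as {type, error}, as in the table
            outbox.put({"type": msg["type"], "error": str(err)})


if __name__ == "__main__":
    inbox: "queue.Queue[Optional[Message]]" = queue.Queue()
    outbox: "queue.Queue[Message]" = queue.Queue()
    worker = threading.Thread(target=worker_loop, args=(inbox, outbox))
    worker.start()
    inbox.put({"type": "init", "url": "model/cnn_landmark_400.bin"})
    print(outbox.get())  # {'type': 'init'}
    inbox.put({"type": "predict", "input": [0.1, 0.7, 0.2]})
    print(outbox.get())  # {'type': 'predict', 'output': 0.7}
    inbox.put(None)
    worker.join()
```

As with a Web Worker, the UI side never touches the model directly. It only sends a message and reacts to the reply, so the heavy work stays off the UI thread.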

(a) Landmark detection (src/worker.js)

Because this runs as a Web Worker, it sends messages to the UI code with postMessage and receives messages with onmessage.

Problem with Dense in Keras.js 1.0.3

In fact, Keras.js 1.0.3 cannot process this CNN model correctly.

The reason is that Dense in Keras.js 1.0.3 (when the GPU is not used) is implemented as follows.

Problematic part of node_modules/keras-js/lib/layers/core/Dense.js

  _callCPU(x) {
    this.output = new _Tensor.default([], [this.units]);

    ・・・
  }

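In this CNN model the Dense output must be 3-dimensional, (256, 256, 7). The implementation above always creates a 1-dimensional output, (7), so the result is wrong. *

 * The Keras.js softmax implementation had an issue as well

So, assuming the GPU is not used, I worked around this by overwriting Dense's _callCPU at runtime.

The replacement simply runs the original computation inside a double loop over the height and width.

Workaround for the Dense problem (src/worker.js)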

import KerasJS from 'keras-js'
import { gemv } from 'ndarray-blas-level2'
import ops from 'ndarray-ops'

・・・

// replace Dense's _callCPU at runtime
KerasJS.layers.Dense.prototype._callCPU = function(x) {
    const h = x.tensor.shape[0]
    const w = x.tensor.shape[1]

    this.output = new KerasJS.Tensor([], [h, w, this.units])

    for (let i = 0; i < h; i++) {
        for (let j = 0; j < w; j++) {

            const xt = x.tensor.pick(i, j)
            const ot = this.output.tensor.pick(i, j)

            if (this.use_bias) {
                ops.assign(ot, this.weights['bias'].tensor)
            }

            gemv(1, this.weights['kernel'].tensor.transpose(1, 0), xt, 1, ot)

            this.activationFunc({tensor: ot})
        }
    }
}
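The patched _callCPU applies the same Dense layer (bias plus the transposed kernel times the input vector) independently at every (row, column) position. This matches how Keras applies Dense to inputs of rank greater than 2, along the last axis. Below is a dependency-free Python sketch of that computation, with the activation omitted. dense_vector and dense_per_pixel are hypothetical helpers, and the kernel is laid out as (input features, units) as in Keras.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # kernel[i][u]: weight from input feature i to unit u


def dense_vector(x: Vector, kernel: Matrix, bias: Vector) -> Vector:
    """y[u] = bias[u] + sum_i x[i] * kernel[i][u] (the gemv step for one pixel)."""
    return [bias[u] + sum(x[i] * kernel[i][u] for i in range(len(x)))
            for u in range(len(bias))]


def dense_per_pixel(x: List[List[Vector]], kernel: Matrix, bias: Vector) -> List[List[Vector]]:
    """Apply the same Dense weights at every pixel: (h, w, in) -> (h, w, units)."""
    return [[dense_vector(px, kernel, bias) for px in row] for row in x]


x = [[[1.0, 2.0], [0.0, 1.0]],
     [[2.0, 2.0], [1.0, 0.0]]]   # shape (2, 2, 2)
kernel = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]       # shape (2, 3)
bias = [0.5, 0.0, 0.0]
print(dense_per_pixel(x, kernel, bias)[0][0])  # [1.5, 2.0, 3.0]
```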

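Implementing landmark detection

Passing input data to predict on KerasJS.Model and retrieving the results requires layer names. These can be obtained from inputLayerNames and outputLayerNames respectively.

The result of predict is the probability of each landmark at each coordinate, with shape (256, 256, 7). The worker returns only the coordinate with the highest probability for each landmark *.

 * Landmark 0 means "not a landmark", so it is not included in the result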

src/worker.js

import KerasJS from 'keras-js'
import { gemv } from 'ndarray-blas-level2'
import ops from 'ndarray-ops'

let model = null

// load the model data
const loadModel = file => {
    const model = new KerasJS.Model({ filepath: file })

    return model.ready().then(r => model)
}

// workaround for the Keras.js Dense problem
KerasJS.layers.Dense.prototype._callCPU = function(x) {
    ・・・
}

// process the predict result (pick the most probable coordinate for each landmark)
const detectLandmarks = ts => {
    const res = {}

    for (let h = 0; h < ts.tensor.shape[0]; h++) {
        for (let w = 0; w < ts.tensor.shape[1]; w++) {
            const t = ts.tensor.pick(h, w)

            const wrkProb = {landmark: 0, prob: 0, x: w, y: h}

            for (let c = 0; c < t.shape[0]; c++) {
                const prob = t.get(c)

                if (prob > wrkProb.prob) {
                    wrkProb.landmark = c
                    wrkProb.prob = prob
                }
            }
            // exclude landmark 0 (not a landmark)
            if (wrkProb.landmark > 0) {
                const curProb = res[wrkProb.landmark]

                if (!curProb || curProb.prob < wrkProb.prob) {
                    res[wrkProb.landmark] = wrkProb
                }
            }
        }
    }

    return res
}

// receive messages from the UI code
onmessage = ev => {
    switch (ev.data.type) {
        case 'init':
            loadModel(ev.data.url)
                .then(m => {
                    model = m
                    postMessage({type: ev.data.type})
                })
                .catch(err => {
                    console.log(err)
                    postMessage({type: ev.data.type, error: err.message})
                })

            break
        case 'predict':
            const outputLayerName = model.outputLayerNames[0]

            const shape = model.modelLayersMap.get(outputLayerName)
                                                .output.tensor.shape

            const data = {}
            // set the input data
            data[model.inputLayerNames[0]] = ev.data.input

            Promise.resolve(model.predict(data))
                .then(r => new KerasJS.Tensor(r[outputLayerName], shape)) // extract the predict result
                .then(detectLandmarks)
                .then(r => 
                    // send the result to the UI code
                    postMessage({type: ev.data.type, output: r})
                )
                .catch(err => {
                    console.log(err)
                    postMessage({type: ev.data.type, error: err.message})
                })

            break
    }
}
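The selection rule in detectLandmarks can be restated compactly. The following Python sketch is a hypothetical re-expression, with nested lists in place of the Keras.js tensor. At each pixel it picks the most probable class, skips class 0, and keeps the highest-probability pixel for each landmark.

```python
from typing import Dict, List


def detect_landmarks(probs: List[List[List[float]]]) -> Dict[int, dict]:
    """probs has shape (height, width, classes); class 0 means "not a landmark"."""
    res: Dict[int, dict] = {}
    for y, row in enumerate(probs):
        for x, p in enumerate(row):
            # most probable class at this pixel (ties go to the lowest index, like the > test)
            c = max(range(len(p)), key=p.__getitem__)
            if c == 0:
                continue  # not a landmark
            best = res.get(c)
            if best is None or best["prob"] < p[c]:
                res[c] = {"landmark": c, "prob": p[c], "x": x, "y": y}
    return res


probs = [[[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
         [[0.1, 0.6, 0.3], [0.3, 0.1, 0.6]]]
print(detect_landmarks(probs))
```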

(b) UI processing (src/app.js)

Converting the image data (creating the input data)

To call predict on KerasJS.Model, the image data here has to be converted into a one-dimensional Float32Array of 256 (height) × 256 (width) × 3 (RGB) elements.

The conversion is done with a canvas, as follows:

(1) Draw the image dropped onto the canvas
(2) Get the ImageData from the canvas with getImageData
(3) Convert the contents of ImageData.data into a Float32Array in RGB order

ImageData.data is a one-dimensional Uint8ClampedArray in RGBA order. The Float32Array is therefore built from the RGB components only, dropping A.
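The same filtering can be written as a Python sketch. rgba_to_rgb is a hypothetical helper, array('f') stands in for Float32Array, and the index test i % 4 != 3 is the one used in imgToArray. For a 256x256 canvas, this turns 262,144 RGBA values into 196,608 RGB values.

```python
from array import array
from typing import Sequence


def rgba_to_rgb(rgba: Sequence[int]) -> array:
    """Drop the alpha channel (every 4th value) and return the RGB values as floats."""
    if len(rgba) % 4 != 0:
        raise ValueError("RGBA data length must be a multiple of 4")
    # array('f') stores C floats, the closest stdlib analogue of Float32Array
    return array("f", (v for i, v in enumerate(rgba) if i % 4 != 3))


pixels = [255, 0, 0, 255, 0, 128, 255, 64]  # two RGBA pixels
print(rgba_to_rgb(pixels).tolist())  # [255.0, 0.0, 0.0, 0.0, 128.0, 255.0]
```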

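Incidentally, the CNN model itself does not depend on the image size (it is structured like a Fully Convolutional Network).

So it could process images of any size. The current Keras.js does not account for that, however, so making it work takes some effort. (It turned out to be possible.)

Here I simply use only the 256x256 region drawn on the canvas, that is, a fixed size. *

 * With this approach, images whose size is not 256x256 end up cropped or with blank margins

Image data conversion (src/app.js)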

・・・
const imageTypes = ['image/jpeg']

const canvas = document.getElementsByTagName('canvas')[0]
const ctx = canvas.getContext('2d')

・・・

// convert an RGBA Uint8ClampedArray into an RGB Float32Array
const imgToArray = imgData => new Float32Array(
    imgData.data.reduce(
        (acc, v, i) => {
            // skip the A of RGBA
            if (i % 4 != 3) {
                acc.push(v)
            }
            return acc
        },
        []
     )
)
// load the image
const loadImage = url => new Promise(resolve => {
    const img = new Image()

    img.addEventListener('load', () => {
        ctx.clearRect(0, 0, canvas.width, canvas.height)

        // handle images smaller than the canvas
        const w = Math.min(img.width, canvas.width)
        const h = Math.min(img.height, canvas.height)

        // draw the image onto the canvas
        ctx.drawImage(img, 0, 0, w, h, 0, 0, w, h)

        // get the ImageData
        const d = ctx.getImageData(0, 0, canvas.width, canvas.height)

        resolve(imgToArray(d))
    })

    img.src = url
})

・・・

// called once the model data has been loaded
const ready = () => {
    canvas.addEventListener('dragover', ev => {
        ev.preventDefault()
        canvas.classList.add('dragging')
    }, false)

    canvas.addEventListener('dragleave', ev => {
        canvas.classList.remove('dragging')
    }, false)

    // handle the drop
    canvas.addEventListener('drop', ev => {
        ev.preventDefault()
        canvas.classList.remove('dragging')

        const file = ev.dataTransfer.files[0]

        if (imageTypes.includes(file.type)) {
            ・・・
            const reader = new FileReader()

            reader.onload = ev => {
                loadImage(reader.result)
                    .then(img => {
                        ・・・
                    })
            }

            reader.readAsDataURL(file)
        }
    }, false)
}

・・・

Communicating with the Web Worker

This part exchanges messages with the Web Worker and draws the detected landmarks.

Web Worker communication (src/app.js)

const colors = ['rgb(255, 255, 255)', 'rgb(255, 0, 0)', 'rgb(0, 255, 0)', 'rgb(0, 0, 255)', 'rgb(255, 255, 0)', 'rgb(0, 255, 255)', 'rgb(255, 0, 255)']

const radius = 5
const imageTypes = ['image/jpeg']

const modelFile = '../model/cnn_landmark_400.bin'

// create the Web Worker
const worker = new Worker('./js/bundle_worker.js')

・・・

// draw the detected landmarks on the canvas
const drawLandmarks = lms => {
    Object.values(lms).forEach(v => {
        ctx.fillStyle = colors[v.landmark]
        ctx.beginPath()
        ctx.arc(v.x, v.y, radius, 0, Math.PI * 2, false)
        ctx.fill()
    })
}

・・・

// show the detected landmarks as an HTML table
const showLandmarksInfo = lms => {
    ・・・

    infoNode.innerHTML = `
      <table>
        <tr>
          <th>landmark</th>
          <th>coordinate</th>
          <th>prob</th>
        </tr>
        ${rowsHtml}
      </table>
    `
}

// after the model data has been loaded
const ready = () => {
    ・・・
    canvas.addEventListener('drop', ev => {
        ・・・
        if (imageTypes.includes(file.type)) {
            ・・・
            reader.onload = ev => {
                loadImage(reader.result)
                    .then(img => {
                        detectDialog.showModal()
                        // ask the Web Worker to detect landmarks
                        worker.postMessage({type: 'predict', input: img})
                    })
            }

            reader.readAsDataURL(file)
        }
    }, false)
}

// receive messages from the Web Worker
worker.onmessage = ev => {
    if (ev.data.error) {
        ・・・
    }
    else {
        switch (ev.data.type) {
            case 'init':
                ready()

                loadDialog.close()

                break
            case 'predict':
                const res = ev.data.output

                console.log(res)
                detectDialog.close()

                drawLandmarks(res)
                showLandmarksInfo(res)

                break
        }
    }
}

loadDialog.showModal()
// ask the Web Worker to load the model data
worker.postMessage({type: 'init', url: modelFile})

(c) Build

Run the webpack command to generate js/bundle_app.js and js/bundle_worker.js.

Building with webpack

> webpack

(d) Verification

Verify the app over HTTP. Here I used http-server.

Running http-server

> http-server

Starting up http-server, serving ./
Available on:
  ・・・
  http://127.0.0.1:8080
Hit CTRL-C to stop the server

Opening http://localhost:8080/index.html in Chrome * and dragging and dropping an image file gives the following result.

f:id:fits:20190331192210j:plain

 * Because HTMLDialogElement.showModal() is used, this currently
   works only in Chrome, but everything other than the dialogs
   (the Keras.js processing, etc.) also works in Firefox

2019-02-17

Landmark Detection with CNN

Python Keras Deeplearning

I adapted the technique from the previous post, "Contour Detection with CNN", and applied it to landmark (feature point) detection.

Keras + Tensorflow
Jupyter Notebook

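Source code: http://github.com/fits/try_samples/tree/master/blog/20190217/

In contour detection, each pixel was classified into two classes (non-contour = 0, contour = 1). This time that becomes a multi-class classification (non-landmark = 0, landmark 1 = 1, landmark 2 = 2, ...).

Incidentally, an obvious approach to landmark detection with deep learning is to predict the landmark coordinates directly. In my attempts, however, I could not get satisfactory results that way (in coordinate accuracy, generality, etc.), and the technique here is what I came up with instead.

Introduction

Dataset

This time I use the landmark data from DeepFashion: In-shop Clothes Retrieval, limited to entries that meet the following conditions: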

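clothes_type is 1 (upper-body clothes)
variation_type is 1 (normal pose)
landmark_visibility_1 to 6 are all 0 (visible)

Six landmarks are used (coordinates landmark_location_x_1 to 6 and landmark_location_y_1 to 6).

Training data

Images are used as input, so the input shape is (<batch size>, 256, 256, 3) *.

 * (<batch size>, <height>, <width>, <channels>)

The label data is generated on the fly from landmark_location 1 to 6.

Each pixel is classified as non-landmark (= 0) or one of landmarks 1 to 6, so the label shape is (<batch size>, 256, 256, 7).

Labeling just one pixel per landmark does not train well *, so a region of a certain size must be labeled as the landmark.

 * The model ends up classifying everything as non-landmark (= 0)

So a region around each landmark is labeled as that landmark. As in the figure below (the landmark is at the center), the probability decreases with distance from the landmark.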

f:id:fits:20190217023108p:plain
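Concretely, the around_square(k) helper used later returns the square ring of offsets whose Chebyshev distance max(|dx|, |dy|) from the landmark is k, and ring k receives prob[k]. Below is a small sketch of that rule. label_prob is a hypothetical helper, not part of the notebook. The background class then gets 1 - p.

```python
from typing import Sequence


def label_prob(dx: int, dy: int, prob: Sequence[float]) -> float:
    """Landmark-class probability at offset (dx, dy) from the landmark pixel."""
    k = max(abs(dx), abs(dy))  # Chebyshev distance = index of the square ring
    return prob[k] if k < len(prob) else 0.0


prob = [1.0, 1.0, 1.0, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5]  # the profile used in this post
print([label_prob(dx, 0, prob) for dx in range(-10, 11)])
```

The printed row has the same values as the landmark column of labels[0, 59, 105:126] shown later.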

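Training

Training was run in Jupyter Notebook.

(1) Preparing the input data

First, read the list_landmarks_inshop.txt file and extract the required data.

To shorten training time, only the first 100 matching entries are used.

Loading and filtering the data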

import pandas as pd

df = pd.read_table('list_landmarks_inshop.txt', sep = r'\s+', skiprows = 1)

s = 100

dfa = df[(df['clothes_type'] == 1) & (df['variation_type'] == 1) &
         (df['landmark_visibility_1'] == 0) & (df['landmark_visibility_2'] == 0) & 
         (df['landmark_visibility_3'] == 0) & (df['landmark_visibility_4'] == 0) &
         (df['landmark_visibility_5'] == 0) & (df['landmark_visibility_6'] == 0)][:s]

Next, load the images used as input data.

Loading the input data (images)

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

imgs = np.array([ img_to_array(load_img(f)) for f in dfa['image_name']])

The input data has the following shape.

imgs.shape

(100, 256, 256, 3)

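(2) Generating the label data

As described above, probability values are assigned around the landmark coordinates taken from landmark_location.

Both the probability profile and the function that returns the surrounding offsets are passed in as arguments.

Where the regions of different landmarks overlap, this version simply overwrites them (the last one wins). Picking the larger probability, or splitting the probability between the landmarks, would probably be better.

Label data generation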

cols = [f'landmark_location_{t}_{i + 1}' for i in range(6) for t in ['x', 'y'] ]
labels_t = dfa[cols].values.astype(int)

def gen_labels(prob, around_func):
    res = np.zeros(imgs.shape[:-1] + (int(len(cols) / 2) + 1,))
    res[:, :, :, 0] = 1.0
    
    for i in range(len(res)):
        r = res[i]
        
        # for each landmark
        for j in range(0, len(labels_t[i]), 2):
            # landmark coordinates
            x = labels_t[i, j]
            y = labels_t[i, j + 1]
            
            # landmark class (1 to 6)
            c = int(j / 2) + 1
            
            for k in range(len(prob)):
                p = prob[k]
                
                # get the (relative) surrounding offsets
                for a in around_func(k):
                    ax = x + a[0]
                    ay = y + a[1]
                    
                    if ax >= 0 and ax < imgs.shape[2] and ay >= 0 and ay < imgs.shape[1]:
                        # handle overlap with another landmark's region (clear existing values)
                        r[ay, ax, :] = 0.0
                        
                        # probability of being landmark c
                        r[ay, ax, c] = p
                        # probability of not being a landmark
                        r[ay, ax, 0] = 1.0 - p

    return res

The label data was created with the following settings.

Creating the label data

def around_square(n):
    return [(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1) if abs(x) == n or abs(y) == n]

labels = gen_labels([1.0, 1.0, 1.0, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5], around_square)

The label data has the following shape.

labels.shape

(100, 256, 256, 7)

Checking the label data

The values around a landmark look like this, which seems correct.

labels[0, 59, 105:126]

array([[1. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0.5, 0.5, 0. , 0. , 0. , 0. , 0. ],
       [0.4, 0.6, 0. , 0. , 0. , 0. , 0. ],
       [0.4, 0.6, 0. , 0. , 0. , 0. , 0. ],
       [0.3, 0.7, 0. , 0. , 0. , 0. , 0. ],
       [0.3, 0.7, 0. , 0. , 0. , 0. , 0. ],
       [0.2, 0.8, 0. , 0. , 0. , 0. , 0. ],
       [0.2, 0.8, 0. , 0. , 0. , 0. , 0. ],
       [0. , 1. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 1. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 1. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 1. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 1. , 0. , 0. , 0. , 0. , 0. ],
       [0.2, 0.8, 0. , 0. , 0. , 0. , 0. ],
       [0.2, 0.8, 0. , 0. , 0. , 0. , 0. ],
       [0.3, 0.7, 0. , 0. , 0. , 0. , 0. ],
       [0.3, 0.7, 0. , 0. , 0. , 0. , 0. ],
       [0.4, 0.6, 0. , 0. , 0. , 0. , 0. ],
       [0.4, 0.6, 0. , 0. , 0. , 0. , 0. ],
       [0.5, 0.5, 0. , 0. , 0. , 0. , 0. ],
       [1. , 0. , 0. , 0. , 0. , 0. , 0. ]])

labels[0, 50:71, 149]

array([[1. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0.5, 0. , 0.5, 0. , 0. , 0. , 0. ],
       [0.4, 0. , 0.6, 0. , 0. , 0. , 0. ],
       [0.4, 0. , 0.6, 0. , 0. , 0. , 0. ],
       [0.3, 0. , 0.7, 0. , 0. , 0. , 0. ],
       [0.3, 0. , 0.7, 0. , 0. , 0. , 0. ],
       [0.2, 0. , 0.8, 0. , 0. , 0. , 0. ],
       [0.2, 0. , 0.8, 0. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0. , 0. , 0. , 0. ],
       [0.2, 0. , 0.8, 0. , 0. , 0. , 0. ],
       [0.2, 0. , 0.8, 0. , 0. , 0. , 0. ],
       [0.3, 0. , 0.7, 0. , 0. , 0. , 0. ],
       [0.3, 0. , 0.7, 0. , 0. , 0. , 0. ],
       [0.4, 0. , 0.6, 0. , 0. , 0. , 0. ],
       [0.4, 0. , 0.6, 0. , 0. , 0. , 0. ],
       [0.5, 0. , 0.5, 0. , 0. , 0. , 0. ],
       [1. , 0. , 0. , 0. , 0. , 0. , 0. ]])

That is hard to read, so here is a simple visualization (the landmark probabilities are just summed for each pixel).

Visualizing the label data

%matplotlib inline

import matplotlib.pyplot as plt

def imshow_label(index):
    plt.imshow(labels[index, :, :, 1:].sum(axis = -1), cmap = 'gray')

imshow_label(0)

f:id:fits:20190217023149p:plain

imshow_label(1)

f:id:fits:20190217023204p:plain

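No problems here either.

(3) CNN model

As before, an Encoder-Decoder architecture is used, with the Encoder and the Decoder each made one stage deeper (downsampled in four stages, then upsampled).

For multi-class classification, the output layer uses the softmax activation and the loss function is categorical_crossentropy.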
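For each pixel, softmax turns the 7 output scores into a probability distribution over the classes, and categorical_crossentropy compares that distribution with the soft label built by gen_labels. Below is a plain-Python sketch of both for a single pixel. The eps clipping value is an assumption made here to avoid log(0), and Keras then averages the per-pixel losses.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target: Sequence[float], pred: Sequence[float],
                             eps: float = 1e-7) -> float:
    """-sum(t * log(p)) for one pixel; p is clipped to avoid log(0)."""
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(target, pred))


# A soft label as produced by gen_labels (0.2 background, 0.8 landmark 1)
target = [0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0]
pred = softmax([0.5, 2.0, 0.1, 0.0, -1.0, 0.3, 0.2])
print(round(sum(pred), 6), categorical_crossentropy(target, pred))
```

Because the labels are soft rather than one-hot, the loss for a pixel is smallest when the predicted distribution equals the label distribution, not when the landmark class alone gets probability 1.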

Model definition

from keras.models import Model
from keras.layers import Input, Dense, Dropout, UpSampling2D
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.layers.pooling import MaxPool2D
from keras.layers.normalization import BatchNormalization

input = Input(shape = imgs.shape[1:])

x = input

x = BatchNormalization()(x)

x = Conv2D(16, 3, padding='same', activation = 'relu')(x)
x = Conv2D(16, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(128, 3, padding='same', activation = 'relu')(x)
x = Conv2D(128, 3, padding='same', activation = 'relu')(x)
x = Conv2D(128, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(256, 3, padding='same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = UpSampling2D()(x)
x = Conv2DTranspose(128, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(128, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(128, 3, padding = 'same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = UpSampling2D()(x)
x = Conv2DTranspose(64, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(64, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(64, 3, padding = 'same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = UpSampling2D()(x)
x = Conv2DTranspose(32, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(32, 3, padding = 'same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = UpSampling2D()(x)
x = Conv2DTranspose(16, 3, padding = 'same', activation = 'relu')(x)
x = Conv2DTranspose(16, 3, padding = 'same', activation = 'relu')(x)

x = Dropout(0.3)(x)

output = Dense(labels.shape[-1], activation = 'softmax')(x)

model = Model(inputs = input, outputs = output)

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['acc'])

model.summary()

model.summary() result

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 256, 256, 3)       0         
_________________________________________________________________
batch_normalization_46 (Batc (None, 256, 256, 3)       12        
_________________________________________________________________
conv2d_56 (Conv2D)           (None, 256, 256, 16)      448       
_________________________________________________________________
conv2d_57 (Conv2D)           (None, 256, 256, 16)      2320      
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 128, 128, 16)      0         
_________________________________________________________________
batch_normalization_47 (Batc (None, 128, 128, 16)      64        
_________________________________________________________________
dropout_46 (Dropout)         (None, 128, 128, 16)      0         
_________________________________________________________________
conv2d_58 (Conv2D)           (None, 128, 128, 32)      4640      
_________________________________________________________________
conv2d_59 (Conv2D)           (None, 128, 128, 32)      9248      
_________________________________________________________________
conv2d_60 (Conv2D)           (None, 128, 128, 32)      9248      
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 64, 64, 32)        0         
_________________________________________________________________
batch_normalization_48 (Batc (None, 64, 64, 32)        128       
_________________________________________________________________
dropout_47 (Dropout)         (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_61 (Conv2D)           (None, 64, 64, 64)        18496     
_________________________________________________________________
conv2d_62 (Conv2D)           (None, 64, 64, 64)        36928     
_________________________________________________________________
conv2d_63 (Conv2D)           (None, 64, 64, 64)        36928     
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 32, 32, 64)        0         
_________________________________________________________________
batch_normalization_49 (Batc (None, 32, 32, 64)        256       
_________________________________________________________________
dropout_48 (Dropout)         (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_64 (Conv2D)           (None, 32, 32, 128)       73856     
_________________________________________________________________
conv2d_65 (Conv2D)           (None, 32, 32, 128)       147584    
_________________________________________________________________
conv2d_66 (Conv2D)           (None, 32, 32, 128)       147584    
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 16, 16, 128)       0         
_________________________________________________________________
batch_normalization_50 (Batc (None, 16, 16, 128)       512       
_________________________________________________________________
dropout_49 (Dropout)         (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_67 (Conv2D)           (None, 16, 16, 256)       295168    
_________________________________________________________________
batch_normalization_51 (Batc (None, 16, 16, 256)       1024      
_________________________________________________________________
dropout_50 (Dropout)         (None, 16, 16, 256)       0         
_________________________________________________________________
up_sampling2d_20 (UpSampling (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_transpose_44 (Conv2DT (None, 32, 32, 128)       295040    
_________________________________________________________________
conv2d_transpose_45 (Conv2DT (None, 32, 32, 128)       147584    
_________________________________________________________________
conv2d_transpose_46 (Conv2DT (None, 32, 32, 128)       147584    
_________________________________________________________________
batch_normalization_52 (Batc (None, 32, 32, 128)       512       
_________________________________________________________________
dropout_51 (Dropout)         (None, 32, 32, 128)       0         
_________________________________________________________________
up_sampling2d_21 (UpSampling (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_transpose_47 (Conv2DT (None, 64, 64, 64)        73792     
_________________________________________________________________
conv2d_transpose_48 (Conv2DT (None, 64, 64, 64)        36928     
_________________________________________________________________
conv2d_transpose_49 (Conv2DT (None, 64, 64, 64)        36928     
_________________________________________________________________
batch_normalization_53 (Batc (None, 64, 64, 64)        256       
_________________________________________________________________
dropout_52 (Dropout)         (None, 64, 64, 64)        0         
_________________________________________________________________
up_sampling2d_22 (UpSampling (None, 128, 128, 64)      0         
_________________________________________________________________
conv2d_transpose_50 (Conv2DT (None, 128, 128, 32)      18464     
_________________________________________________________________
conv2d_transpose_51 (Conv2DT (None, 128, 128, 32)      9248      
_________________________________________________________________
batch_normalization_54 (Batc (None, 128, 128, 32)      128       
_________________________________________________________________
dropout_53 (Dropout)         (None, 128, 128, 32)      0         
_________________________________________________________________
up_sampling2d_23 (UpSampling (None, 256, 256, 32)      0         
_________________________________________________________________
conv2d_transpose_52 (Conv2DT (None, 256, 256, 16)      4624      
_________________________________________________________________
conv2d_transpose_53 (Conv2DT (None, 256, 256, 16)      2320      
_________________________________________________________________
dropout_54 (Dropout)         (None, 256, 256, 16)      0         
_________________________________________________________________
dense_13 (Dense)             (None, 256, 256, 7)       119       
=================================================================
Total params: 1,557,971
Trainable params: 1,556,525
Non-trainable params: 1,446
_________________________________________________________________

(4) Training

100 training samples is probably far too few, but this time I use only 80 of them for training and the remaining 20 for validation (specified with validation_split).
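Keras's validation_split takes the validation samples from the end of the input arrays, before any shuffling. The hypothetical helper below mimics that split with NumPy; it suggests why, later on, negative indices such as predict(-1) correspond to validation data while indices such as 0 and 40 correspond to training data.

```python
import numpy as np

def validation_split_indices(n_samples, validation_split):
    # Mimic Keras: the last `validation_split` fraction becomes validation data
    split_at = int(n_samples * (1.0 - validation_split))
    indices = np.arange(n_samples)
    return indices[:split_at], indices[split_at:]

train_idx, val_idx = validation_split_indices(100, 0.2)
print(len(train_idx), len(val_idx))  # 80 training samples, 20 validation samples
print(val_idx[0], val_idx[-1])       # validation covers indices 80..99
```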

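Because the data is heavily imbalanced between landmarks and everything else (the vast majority of pixels are not landmarks), the model may not learn well as-is.

So below, class_weight is used to give the landmark classes a larger weight.

Example run (epochs 351 to 400)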

# Per-class weights (the landmark classes are set to 256*256)
wg = np.ones(labels.shape[-1]) * (imgs.shape[1] * imgs.shape[2])
# Weight for the non-landmark class (= 0)
wg[0] = 1

hist = model.fit(imgs, labels, initial_epoch = 350, epochs = 400, batch_size = 10, class_weight = wg, validation_split = 0.2)
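The wg array above gives each landmark class a weight of 256*256 and the background class a weight of 1. As a hedged, NumPy-only sketch of what such per-class weighting does to a per-pixel loss (each pixel's cross-entropy multiplied by the weight of its labelled class), consider a toy 4x4 image with a single landmark pixel. This illustrates the idea only; it is not Keras's internal handling of class_weight, and the function name and toy data are hypothetical.

```python
import numpy as np

def weighted_categorical_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    # Plain per-pixel categorical cross-entropy
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    ce = -(y_true * np.log(y_pred)).sum(axis=-1)
    # Per-pixel weight = weight of the labelled class (a blend for soft labels)
    pixel_weights = (y_true * class_weights).sum(axis=-1)
    return (pixel_weights * ce).mean()

h, w, n_classes = 4, 4, 3

# Same scheme as wg: landmark classes get H*W, background (class 0) gets 1
class_weights = np.ones(n_classes) * (h * w)
class_weights[0] = 1

# Ground truth: background everywhere except one landmark pixel of class 1
y_true = np.zeros((h, w, n_classes))
y_true[..., 0] = 1.0
y_true[0, 0] = [0.0, 1.0, 0.0]

# A "lazy" prediction that says background (0.9) for every pixel
y_pred = np.tile([0.9, 0.05, 0.05], (h, w, 1))

unweighted = weighted_categorical_crossentropy(y_true, y_pred, np.ones(n_classes))
weighted = weighted_categorical_crossentropy(y_true, y_pred, class_weights)
print(unweighted, weighted)
```

Without weights, the single missed landmark pixel barely moves the average; with them, its error counts as much as 16 background pixels here (256*256 in the real data), which is the intent behind wg.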

Example output

Train on 80 samples, validate on 20 samples
Epoch 351/400
80/80 [===・・・ - loss: 0.0261 - acc: 0.9924 - val_loss: 0.1644 - val_acc: 0.9782
Epoch 352/400
80/80 [===・・・ - loss: 0.0263 - acc: 0.9924 - val_loss: 0.1638 - val_acc: 0.9784
・・・
Epoch 399/400
80/80 [===・・・ - loss: 0.0255 - acc: 0.9930 - val_loss: 0.1719 - val_acc: 0.9775
Epoch 400/400
80/80 [===・・・ - loss: 0.0253 - acc: 0.9931 - val_loss: 0.1720 - val_acc: 0.9777

(5) Checking the results

Plot the training and validation loss and acc values taken from the return value of fit.

Plotting the fit history

%matplotlib inline

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (16, 4)

plt.subplot(1, 4, 1)
plt.plot(hist.history['loss'])

plt.subplot(1, 4, 2)
plt.plot(hist.history['acc'])

plt.subplot(1, 4, 3)
plt.plot(hist.history['val_loss'])

plt.subplot(1, 4, 4)
plt.plot(hist.history['val_acc'])

Example output (epochs 351 to 400)

f:id:fits:20190217023309p:plain

I think the poor val_loss and val_acc values are caused by the far too small amount of data.

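(6) Landmark detection

I output the four images below to compare the landmark detection (predict) results with the label data (ground truth).

(a) Classes in the label data (each pixel colored by the class with the highest probability)
(b) Classes in the prediction (predict) result (each pixel colored by the class with the highest probability)
(c) (b) overlaid on the original image
(d) Landmarks drawn on the image (a circle at the coordinate with the highest probability for each landmark)

Since each class has only one landmark point this time, the coordinate with the maximum probability can be taken as the landmark.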
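As an alternative to the pandas-based lookup in the code that follows, the "coordinate with the maximum probability" rule can be written directly with np.argmax and np.unravel_index. This sketch assumes a prediction of shape (H, W, C) with class 0 as background, and the function name is hypothetical. Unlike the pandas version, it takes each class's maximum over all pixels rather than only over pixels where that class wins the argmax, so the two can differ in edge cases.

```python
import numpy as np

def landmark_positions(p):
    # p: (H, W, C) per-pixel class probabilities; class 0 = not a landmark
    positions = {}
    for c in range(1, p.shape[-1]):
        # Flat index of the highest probability for class c -> (row, col)
        y, x = np.unravel_index(np.argmax(p[..., c]), p.shape[:2])
        positions[c] = (int(x), int(y), float(p[y, x, c]))
    return positions

# Toy 5x6 prediction with two landmark classes
p = np.zeros((5, 6, 3))
p[..., 0] = 1.0
p[1, 4] = [0.1, 0.9, 0.0]   # landmark 1 at x=4, y=1
p[3, 2] = [0.2, 0.0, 0.8]   # landmark 2 at x=2, y=3

print(landmark_positions(p))
```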

Landmark detection and result output

import cv2
import numpy as np
import pandas as pd

colors = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255), (255, 0, 255), (255, 165, 0), (210, 180, 140)]

def predict(index, n = 0, c_size = 5, s = 5.0):
    plt.rcParams['figure.figsize'] = (s * 4, s)
    
    img = imgs[index]

    # Prediction result (landmark classification)
    p = model.predict(np.array([img]))[0]

    # (a) Classes in the label data (each pixel colored by the class with the highest probability)
    img1 = np.apply_along_axis(lambda x: colors[x.argmax()], -1, labels[index])
    # (b) Classes in the prediction (each pixel colored by the class with the highest probability)
    img2 = np.apply_along_axis(lambda x: colors[x.argmax()], -1, p)
    # (c) Overlay on the original image
    img3 = cv2.addWeighted(img.astype(int), 0.4, img2, 0.6, 0)
    
    plt.subplot(1, 4, 1)
    plt.imshow(img1)
    
    plt.subplot(1, 4, 2)
    plt.imshow(img2)
    
    plt.subplot(1, 4, 3)
    plt.imshow(img3)

    img4 = img.astype(int)

    pdf = pd.DataFrame(
        [[np.argmax(vx), x, y, np.max(vx)] for y, vy in enumerate(p) for x, vx in enumerate(vy)], 
        columns = ['landmark', 'x', 'y', 'prob']
    )
    
    for c, v in pdf[pdf['landmark'] > 0].sort_values('prob', ascending = False).groupby('landmark'):
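        # (d) Draw the landmark (a circle at the coordinate with the highest probability);
        # the coordinates are converted to Python ints, which cv2.circle expects
        img4 = cv2.circle(img4, tuple(int(a) for a in v[['x', 'y']].values[0]), c_size, colors[c], -1)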
        
        if n > 0:
            print(f"landmark {c} : x = {labels_t[index, (c - 1) * 2]}, {labels_t[index, (c - 1) * 2 + 1]}")
            print(v[:n])

    plt.subplot(1, 4, 4)
    plt.imshow(img4)

Example results on training data

From left to right: (a) label data classes, (b) predicted classes, (c) overlay on the original image, (d) detected landmarks.

f:id:fits:20190217035001p:plain f:id:fits:20190217035116p:plain

The results look quite close to the label data.

Comparing the top 3 probabilities for each landmark with the label data numerically, as below, confirms that the values are quite close.

predict(0, n = 3)

landmark 1 : x = 115, 59
       landmark    x   y      prob
15475         1  115  60  0.893763
15476         1  116  60  0.893605
15220         1  116  59  0.893044

landmark 2 : x = 149, 60
       landmark    x   y      prob
15510         2  150  60  0.878173
15766         2  150  61  0.872413
15509         2  149  60  0.872222

landmark 3 : x = 82, 153
       landmark   x    y      prob
39250         3  82  153  0.882741
39249         3  81  153  0.881362
39248         3  80  153  0.879979

landmark 4 : x = 185, 150
       landmark    x    y      prob
38841         4  185  151  0.836826
38585         4  185  150  0.836212
38840         4  184  151  0.836164

landmark 5 : x = 93, 198
       landmark   x    y      prob
50782         5  94  198  0.829380
50526         5  94  197  0.825815
51038         5  94  199  0.825342

landmark 6 : x = 171, 197
       landmark    x    y      prob
50602         6  170  197  0.881702
50603         6  171  197  0.880731
50858         6  170  198  0.877772

predict(40, n = 3)

landmark 1 : x = 120, 42
      landmark    x   y      prob
8820         1  116  34  0.568582
9075         1  115  35  0.566257
9074         1  114  35  0.561259

landmark 2 : x = 134, 40
       landmark    x   y      prob
10372         2  132  40  0.812515
10371         2  131  40  0.807980
10628         2  132  41  0.807899

landmark 3 : x = 109, 48
       landmark    x   y      prob
12652         3  108  49  0.839624
12653         3  109  49  0.838190
12396         3  108  48  0.837235

landmark 4 : x = 148, 43
       landmark    x   y      prob
11156         4  148  43  0.837879
10900         4  148  42  0.837810
11157         4  149  43  0.836910

landmark 5 : x = 107, 176
       landmark    x    y      prob
45164         5  108  176  0.845494
45420         5  108  177  0.841054
45163         5  107  176  0.839846

landmark 6 : x = 154, 182
       landmark    x    y      prob
46746         6  154  182  0.865920
46747         6  155  182  0.863970
46490         6  154  181  0.862724

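Note that the poor result for landmark 1 (red) in predict(40) is probably caused by how the label data was created. (It should improve if, instead of overwriting, the larger probability value were kept.)

Example results on validation data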

f:id:fits:20190217035136p:plain f:id:fits:20190217035148p:plain

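As expected, the results are worse on this data, which was not used for training, but the model still seems to partially detect plausible positions.

Considering how little data was used for training, the results may even be fairly good.

To begin with, an image taken from behind, where the landmarks are mirrored left to right as in predict(-3), seems like asking too much, and for landmark 5 (cyan) in predict(-8), I suspect the label data is wrong (the detection result looks more correct).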

predict(-1, n = 3)

landmark 1 : x = 96, 60
       landmark   x   y      prob
15969         1  97  62  0.872259
16225         1  97  63  0.869837
15970         1  98  62  0.869681

landmark 2 : x = 126, 59
       landmark    x   y      prob
16254         2  126  63  0.866628
16255         2  127  63  0.865502
15998         2  126  62  0.864939

landmark 3 : x = 66, 125
       landmark   x    y      prob
30521         3  57  119  0.832024
30520         3  56  119  0.831721
30777         3  57  120  0.829537

landmark 4 : x = 157, 117
       landmark    x    y      prob
29099         4  171  113  0.814012
29098         4  170  113  0.813680
28843         4  171  112  0.812420

predict(-8, n = 3)

landmark 1 : x = 133, 40
       landmark    x   y      prob
10629         1  133  41  0.812287
10628         1  132  41  0.810564
10373         1  133  40  0.808298

landmark 2 : x = 157, 47
       landmark    x   y      prob
12704         2  160  49  0.767413
12448         2  160  48  0.764577
12703         2  159  49  0.762571

landmark 3 : x = 105, 77
       landmark    x   y     prob
19300         3  100  75  0.79014
19301         3  101  75  0.78945
19556         3  100  76  0.78496

landmark 4 : x = 181, 86
       landmark    x    y      prob
56242         4  178  219  0.768471
55986         4  178  218  0.768215
56243         4  179  219  0.766977

landmark 5 : x = 137, 211
       landmark    x    y      prob
54370         5   98  212  0.710897
54626         5   98  213  0.707652
54372         5  100  212  0.707127

2019-01-14

Contour detection with a CNN

Python Keras Deeplearning

I tried detecting the contours of objects in an image with a CNN (convolutional neural network).

Keras + Tensorflow
Jupyter Notebook

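The source code is at http://github.com/fits/try_samples/tree/master/blog/20190114/

Overview

This time, I tried to detect contours by classifying each pixel of an image as contour or not (contour = 1, non-contour = 0).

For the training data, I prepared images of a single garment (jpg) like the one below, together with images in which only the garment's contour is filled in white (png).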

f:id:fits:20190114225837j:plain

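Since preparing a large amount of training data was difficult, training uses 160 image files of 240x288.

Training

The training was run in Jupyter Notebook.

(1) Preparing the input data

First, load the input images (jpg). (The training images are placed in the img directory.)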

import glob
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

files = glob.glob('img/*.jpg')

imgs = np.array([img_to_array(load_img(f)) for f in files])

imgs.shape

The shape of the input data is as follows.

imgs.shape result

(160, 288, 240, 3)

(2) Preparing the label data

Load the contour images (png) and binarize them with 128 as the threshold (contour = 1, non-contour = 0).

import os

th = 128

labels = np.array([img_to_array(load_img(f"{os.path.splitext(f)[0]}.png", color_mode = 'grayscale')) for f in files])

labels[labels < th] = 0
labels[labels >= th] = 1

labels.shape

The shape of the label data is as follows.

labels.shape result

(160, 288, 240, 1)

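(3) CNN model

Since I didn't know what kind of network architecture would suit this task, I based it on the Encoder-Decoder architecture used in semantic segmentation and similar tasks.

The image is downsampled step by step to 30x36 (Encoder) and then upsampled step by step back to its original size of 240x288 (Decoder).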
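Because MaxPool2D() halves each side (rounding down) and each Conv2DTranspose(strides = 2, padding = 'same') doubles it, the output only returns to the input size when height and width are divisible by 2^3 = 8 for three pooling stages. The hypothetical helper below traces the (height, width) shapes in plain Python; for 288x240 it agrees with the model.summary() output.

```python
def encoder_decoder_shapes(height, width, n_stages=3):
    # Trace (height, width) through the encoder and decoder
    shapes = [(height, width)]
    for _ in range(n_stages):
        # MaxPool2D() with the default pool_size=2 halves each side (floor)
        height, width = height // 2, width // 2
        shapes.append((height, width))
    for _ in range(n_stages):
        # Conv2DTranspose(strides=2, padding='same') doubles each side
        height, width = height * 2, width * 2
        shapes.append((height, width))
    return shapes

print(encoder_decoder_shapes(288, 240))
# A height not divisible by 8 does not come back to its original value
print(encoder_decoder_shapes(290, 240)[-1])
```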

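The final layer uses sigmoid as its activation function so that the output falls between 0 and 1.

For the loss function, binary_crossentropy seemed to make slow progress (*), so mean_squared_error is used instead.

 * In this case, contour pixels (= 1) are the minority,
    so when using binary_crossentropy it would probably have been
    necessary to adjust the weighting with fit's class_weight argument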
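The footnote's point can be made concrete: when contour pixels are rare, one common remedy is to scale the loss on positive pixels by roughly the ratio of negatives to positives. The NumPy sketch below computes that ratio from a binarized label array and applies it inside a binary cross-entropy. The function name, the pos_weight term and the toy 8x8 label are assumptions for illustration; the model in this post does not use them.

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_pred, pos_weight=1.0, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Loss on contour pixels (y_true == 1) is multiplied by pos_weight
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

# Toy label shaped like the contour labels, (N, H, W, 1), with one contour row
labels = np.zeros((1, 8, 8, 1))
labels[0, 2, :, 0] = 1.0

pos_ratio = labels.mean()                   # fraction of contour pixels (8 / 64)
pos_weight = (1.0 - pos_ratio) / pos_ratio  # negatives per positive

# A prediction that assigns a low contour probability everywhere
pred = np.full(labels.shape, 0.1)

print(pos_weight)
print(weighted_binary_crossentropy(labels, pred),
      weighted_binary_crossentropy(labels, pred, pos_weight))
```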

For reference, mean_absolute_error (mae) is also reported by specifying it in metrics.

from keras.models import Model
from keras.layers import Input, Dropout
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.layers.pooling import MaxPool2D
from keras.layers.normalization import BatchNormalization

input = Input(shape = imgs.shape[1:])

x = input

x = BatchNormalization()(x)

# Encoder

x = Conv2D(16, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = MaxPool2D()(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2D(128, 3, padding='same', activation = 'relu')(x)
x = Conv2D(128, 3, padding='same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

# Decoder

x = Conv2DTranspose(64, 3, strides = 2, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)
x = Conv2D(64, 3, padding='same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2DTranspose(32, 3, strides = 2, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)
x = Conv2D(32, 3, padding='same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

x = Conv2DTranspose(16, 3, strides = 2, padding='same', activation = 'relu')(x)
x = Conv2D(16, 3, padding='same', activation = 'relu')(x)

x = BatchNormalization()(x)
x = Dropout(0.3)(x)

output = Conv2D(1, 1, activation = 'sigmoid')(x)

model = Model(inputs = input, outputs = output)

model.compile(loss = 'mse', optimizer = 'adam', metrics = ['mae'])

model.summary()

model.summary() result

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 288, 240, 3)       0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 288, 240, 3)       12        
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 288, 240, 16)      448       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 144, 120, 16)      0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 144, 120, 16)      64        
_________________________________________________________________
dropout_1 (Dropout)          (None, 144, 120, 16)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 144, 120, 32)      4640      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 144, 120, 32)      9248      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 144, 120, 32)      9248      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 72, 60, 32)        0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 72, 60, 32)        128       
_________________________________________________________________
dropout_2 (Dropout)          (None, 72, 60, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 72, 60, 64)        18496     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 60, 64)        36928     
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 72, 60, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 36, 30, 64)        0         
_________________________________________________________________
batch_normalization_4 (Batch (None, 36, 30, 64)        256       
_________________________________________________________________
dropout_3 (Dropout)          (None, 36, 30, 64)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 36, 30, 128)       73856     
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 36, 30, 128)       147584    
_________________________________________________________________
batch_normalization_5 (Batch (None, 36, 30, 128)       512       
_________________________________________________________________
dropout_4 (Dropout)          (None, 36, 30, 128)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 72, 60, 64)        73792     
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 72, 60, 64)        36928     
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 72, 60, 64)        36928     
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 72, 60, 64)        36928     
_________________________________________________________________
batch_normalization_6 (Batch (None, 72, 60, 64)        256       
_________________________________________________________________
dropout_5 (Dropout)          (None, 72, 60, 64)        0         
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 144, 120, 32)      18464     
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 144, 120, 32)      9248      
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 144, 120, 32)      9248      
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 144, 120, 32)      9248      
_________________________________________________________________
batch_normalization_7 (Batch (None, 144, 120, 32)      128       
_________________________________________________________________
dropout_6 (Dropout)          (None, 144, 120, 32)      0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 288, 240, 16)      4624      
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 288, 240, 16)      2320      
_________________________________________________________________
batch_normalization_8 (Batch (None, 288, 240, 16)      64        
_________________________________________________________________
dropout_7 (Dropout)          (None, 288, 240, 16)      0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 288, 240, 1)       17        
=================================================================
Total params: 576,541
Trainable params: 575,831
Non-trainable params: 710
_________________________________________________________________

(4) Training

Since there is little training data, all of it is used for training.

Example run (epochs 441 to 480)

hist = model.fit(imgs, labels, initial_epoch = 440, epochs = 480, batch_size = 10)

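In Keras, calling fit repeatedly resumes training from where it left off, so I repeated 40-epoch runs several times. (I started with a batch size of 20 and switched to 10 along the way, among other changes.)

In that case, initial_epoch and epochs have to be adjusted for the correct epoch numbers to be printed (*).

 * Even without specifying initial_epoch, simply running fit
    repeatedly continues the training, but in that case the
    printed epoch numbers are reset (counting starts again from 1)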
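When used together with initial_epoch, Keras treats epochs as the index of the final epoch rather than a count, so each repeated fit call needs an absolute (initial_epoch, epochs) pair. The hypothetical helper below only computes those pairs for repeated 40-epoch runs; the last pair is the (440, 480) used above.

```python
def fit_schedule(n_runs, epochs_per_run=40):
    # (initial_epoch, epochs) for each successive fit() call;
    # `epochs` is the absolute index of the last epoch, not a count
    return [(i * epochs_per_run, (i + 1) * epochs_per_run) for i in range(n_runs)]

schedule = fit_schedule(12)
print(schedule[:2], schedule[-1])

# Hypothetical usage with the model from this post:
# for initial_epoch, epochs in fit_schedule(12):
#     hist = model.fit(imgs, labels, initial_epoch=initial_epoch,
#                      epochs=epochs, batch_size=10)
```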

Example output

Epoch 441/480
160/160 [=====・・・ - loss: 0.0048 - mean_absolute_error: 0.0126
Epoch 442/480
160/160 [=====・・・ - loss: 0.0048 - mean_absolute_error: 0.0125
Epoch 443/480
160/160 [=====・・・ - loss: 0.0048 - mean_absolute_error: 0.0126
・・・
Epoch 478/480
160/160 [=====・・・ - loss: 0.0045 - mean_absolute_error: 0.0116
Epoch 479/480
160/160 [=====・・・ - loss: 0.0046 - mean_absolute_error: 0.0117
Epoch 480/480
160/160 [=====・・・ - loss: 0.0044 - mean_absolute_error: 0.0115

(5) Checking the results

Plot how mean_squared_error (loss) and mean_absolute_error change, using the return value of fit.

%matplotlib inline

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (8, 4)

plt.subplot(1, 2, 1)
plt.plot(hist.history['loss'])

plt.subplot(1, 2, 2)
plt.plot(hist.history['mean_absolute_error'])

Example output (epochs 441 to 480)

f:id:fits:20190114231037p:plain

(6) Evaluation

(a) Training data

Display a training input image, its contour image (label data), and the model.predict result side by side.

def predict(index, s = 6.0):
    plt.rcParams['figure.figsize'] = (s, s)

    sh = imgs.shape[1:-1]

    # Detect contours (prediction)
    pred = model.predict(np.array([imgs[index]]))[0]
    pred *= 255

    plt.subplot(1, 3, 1)
    # Show the input image
    plt.imshow(imgs[index].astype(int))

    plt.subplot(1, 3, 2)
    # Show the contour image (label data)
    plt.imshow(labels[index].reshape(sh), cmap = 'gray')

    plt.subplot(1, 3, 3)
    # Show the predict result
    plt.imshow(pred.reshape(sh).astype(int), cmap = 'gray')

Example output (480 epochs): input image, contour image, model.predict result

f:id:fits:20190114231107j:plain

The output is now broadly close to the training data.

(b) Data other than the training data

Run model.predict on images that were not used as training data to detect their contours.

def predict_eval(file, s = 4.0):
    plt.rcParams['figure.figsize'] = (s, s)

    img = img_to_array(load_img(file))

    # Detect contours (prediction)
    pred = model.predict(np.array([img]))[0]
    pred *= 255

    plt.subplot(1, 2, 1)
    # Show the input image
    plt.imshow(img.astype(int))

    plt.subplot(1, 2, 2)
    # Show the predict result
    plt.imshow(pred.reshape(pred.shape[:-1]).astype(int), cmap = 'gray')

Example output (480 epochs): input image, model.predict result

f:id:fits:20190114231154j:plain

The contours break up in places, but they seem to be detected to some extent.

(7) Saving

Save the trained model.

model.save('model/c1_480.h5')

Contour detection

I turned contour detection with the trained model into a script.

predict_contours.py

import sys
import os
import glob
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.models import load_model
import cv2

model_file = sys.argv[1]
img_files = sys.argv[2]
dest_dir = sys.argv[3]

model = load_model(model_file)

for f in glob.glob(img_files):
    img = img_to_array(load_img(f))

    # Detect contours
    pred = model.predict(np.array([img]))[0]
    pred *= 255

    file, ext = os.path.splitext(os.path.basename(f))

    # Save the image
    cv2.imwrite(f"{dest_dir}/{file}_predict.png", pred)

    print(f"done: {f}")

Example run

python predict_contours.py model/c1_480.h5 img_eval2/*.jpg result

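I tried the model trained for 480 epochs on images with a type of background not present in the training data (shadows also have an effect).

Example contour detection results

Input image	Result

These seem to have been difficult.

Incidentally, for the second image, the model trained for 120 epochs gave a better result (more of the contour was detected).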

2018-12-10

Extracting the child elements that match a condition in MongoDB

MongoDB

I looked into how to extract only the child elements that match a given condition in MongoDB.

MongoDB 4.0.4

First, assume the following three documents are stored in the sample collection.

Documents

{ "_id" : 1, "items" : [
    { "color" : "black", "size" : "S" }, 
    { "color" : "white", "size" : "S" }
] }

{ "_id" : 2, "items" : [
    { "color" : "red",   "size" : "L" }, 
    { "color" : "blue",  "size" : "S" }
] }

{ "_id" : 3, "items" : [
    { "color" : "white", "size" : "L" }, 
    { "color" : "red",   "size" : "L" }, 
    { "color" : "white", "size" : "S" }
] }

The goal here is to extract only the elements whose items.color is white and obtain the result below (so that items contains only the white elements).

Target query result

{ "_id" : 1, "items" : [
    { "color" : "white", "size" : "S" }
] }

{ "_id" : 3, "items" : [
    { "color" : "white", "size" : "L" }, 
    { "color" : "white", "size" : "S" }
] }

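(a) Specifying a condition on items.color

First, here is the result of find with the condition {"items.color": "white"}.

Only the documents that contain white are returned, but their contents are unchanged, so they still include unwanted elements such as black.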

> db.sample.find({"items.color": "white"})

{ "_id" : 1, "items" : [ { "color" : "black", "size" : "S" }, { "color" : "white", "size" : "S" } ] }
{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" }, { "color" : "red", "size" : "L" }, { "color" : "white", "size" : "S" } ] }

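(b) Using $elemMatch

Next, let's try $elemMatch.

The result differs depending on whether $elemMatch is used in find's query (first argument) or in its projection (second argument).

Used in the query, it gives the same result as (a) above.

Used in the query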

> db.sample.find({"items": {$elemMatch: {"color": "white"}}})

{ "_id" : 1, "items" : [ { "color" : "black", "size" : "S" }, { "color" : "white", "size" : "S" } ] }
{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" }, { "color" : "red", "size" : "L" }, { "color" : "white", "size" : "S" } ] }

Here is the result of using $elemMatch in the projection without any query condition.

items is filtered in every document, but not all matching child elements are returned; only the first matching element is included.

Used in the projection (1)

> db.sample.find({}, {"items": {$elemMatch: {"color": "white"}}})

{ "_id" : 1, "items" : [ { "color" : "white", "size" : "S" } ] }
{ "_id" : 2 }
{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" } ] }

Adding a query condition removes the unwanted documents, but it still does not return all matching child elements.

Used in the projection (2)

> db.sample.find({"items.color": "white"}, {"items": {$elemMatch: {"color": "white"}}})

{ "_id" : 1, "items" : [ { "color" : "white", "size" : "S" } ] }
{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" } ] }

So, with documents like these, $elemMatch appears to extract only the first child element that matches the condition. (There may be some workaround.)
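The difference between what the $elemMatch projection returned above and what we actually want can be mimicked in plain Python over the three sample documents. This only reproduces the observed behavior (first match only, field omitted when nothing matches); it is not how MongoDB evaluates queries, and the helper names are hypothetical.

```python
docs = [
    {"_id": 1, "items": [{"color": "black", "size": "S"}, {"color": "white", "size": "S"}]},
    {"_id": 2, "items": [{"color": "red", "size": "L"}, {"color": "blue", "size": "S"}]},
    {"_id": 3, "items": [{"color": "white", "size": "L"}, {"color": "red", "size": "L"},
                         {"color": "white", "size": "S"}]},
]

def is_white(item):
    return item["color"] == "white"

def elem_match_projection(doc):
    # Keep only the FIRST matching element; omit "items" when none match
    first = next((it for it in doc["items"] if is_white(it)), None)
    if first is None:
        return {"_id": doc["_id"]}
    return {"_id": doc["_id"], "items": [first]}

def all_matches(doc):
    # Desired behavior: keep EVERY matching element
    return {"_id": doc["_id"], "items": [it for it in doc["items"] if is_white(it)]}

for d in docs:
    print(elem_match_projection(d), all_matches(d))
```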

(c) Using aggregate

Finally, let's try aggregate.

Since $unwind lets us process the individual elements of an array, it should work to restrict them to white with $match and then regroup them with $group.

> db.sample.aggregate([
  {$unwind: "$items"}, 
  {$match: {"items.color": "white"}}, 
  {$group: {_id: "$_id", "items": {$push: "$items"}}}
])

{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" }, { "color" : "white", "size" : "S" } ] }
{ "_id" : 1, "items" : [ { "color" : "white", "size" : "S" } ] }

This produces the result we were aiming for.

If the target documents are narrowed down first and a sort is added, the pipeline becomes the following.

> db.sample.aggregate([
  {$match: {"items.color": "white"}},
  {$unwind: "$items"},
  {$match: {"items.color": "white"}},
  {$group: {_id: "$_id", "items": {$push: "$items"}}},
  {$sort:  {_id: 1}}
])

{ "_id" : 1, "items" : [ { "color" : "white", "size" : "S" } ] }
{ "_id" : 3, "items" : [ { "color" : "white", "size" : "L" }, { "color" : "white", "size" : "S" } ] }
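To show what each stage of the final pipeline does to the sample documents, here is a plain-Python emulation ($match, $unwind, $match, $group with $push, $sort). It is an illustration of the stage semantics under the assumption that the documents look like the samples above, not a MongoDB client; the function name is hypothetical.

```python
docs = [
    {"_id": 1, "items": [{"color": "black", "size": "S"}, {"color": "white", "size": "S"}]},
    {"_id": 2, "items": [{"color": "red", "size": "L"}, {"color": "blue", "size": "S"}]},
    {"_id": 3, "items": [{"color": "white", "size": "L"}, {"color": "red", "size": "L"},
                         {"color": "white", "size": "S"}]},
]

def is_white(item):
    return item["color"] == "white"

def emulate_pipeline(docs):
    # $match: keep documents that have at least one white item
    matched = [d for d in docs if any(is_white(it) for it in d["items"])]
    # $unwind: one record per array element, each carrying its parent's _id
    unwound = [{"_id": d["_id"], "items": it} for d in matched for it in d["items"]]
    # $match again: keep only the white elements
    white = [r for r in unwound if is_white(r["items"])]
    # $group by _id, using $push to collect the elements back into an array
    grouped = {}
    for r in white:
        grouped.setdefault(r["_id"], []).append(r["items"])
    # $sort by _id ($group itself does not guarantee any output order)
    return [{"_id": k, "items": v} for k, v in sorted(grouped.items())]

for doc in emulate_pipeline(docs):
    print(doc)
```

One design consequence: $group builds new documents, so any fields besides _id and the pushed items would be dropped unless carried over explicitly (for example with the $first accumulator). MongoDB 3.2 and later also provide the $filter array operator, which can filter items inside a $project stage without unwinding, and may be worth comparing.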