2014-07-21

Function composition with Roy

Node.js JavaScript

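I tried the same function composition in Roy that I did earlier in "Function and closure composition in Groovy, Scala, F#, and Haskell" and "Function composition in Java SE 8".

Roy is a functional programming language that compiles to JavaScript.

Roy 0.2.2

Source: http://github.com/fits/try_samples/tree/master/blog/20140721/

Introduction

To run Roy on Node.js, all you need to do is npm install roy.

Example: installing Roy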

> npm install roy

roy@0.2.2 node_modules\roy
├── unicode-categories@0.9.1
├── underscore@1.2.0
├── source-map@0.1.8 (amdefine@0.1.0)
└── escodegen@0.0.22 (estraverse@0.0.4, esprima@1.0.4)

This installs roy into node_modules under the current directory.

Function composition with Underscore.js

To compose functions in Roy, it seems you can use compose from Underscore.js. (A compose function is defined in roy/lib/prelude.roy, but it is commented out.)

sample.roy

let _ = require 'underscore'

let plus n = n + 3
let times n = n * 2

let f1 = _.compose times plus
let f2 = _.compose plus times

// times(plus(4)) = 14
console.log(f1 4)
// plus(times(4)) = 11
console.log(f2 4)

To run a Roy script file directly, pass the -r option.

Output

> roy -r sample.roy

14
11

Running roy without options compiles the Roy script to JavaScript and generates a .js file.

Example: compiling

> roy sample.roy

sample.js (generated file)

var _ = require('underscore');
var plus = function (n) {
    return n + 3;
};
var times = function (n) {
    return n * 2;
};
var f1 = _.compose(times, plus);
var f2 = _.compose(plus, times);
// times(plus(4)) = 14
console.log(f1(4));
// plus(times(4)) = 11
console.log(f2(4));//@ sourceMappingURL=sample.js.map

Function composition with a hand-written compose function

It is also easy to define your own compose function instead of using Underscore.js.

sample2.roy

let plus n = n + 3
let times n = n * 2

let compose f g = \x -> f (g x)

let f1 = compose times plus
let f2 = compose plus times

// times(plus(4)) = 14
console.log(f1 4)
// plus(times(4)) = 11
console.log(f2 4)

Output

> roy -r sample2.roy

14
11
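
For comparison, the same two-argument compose can be sketched in Python. This is not part of the Roy samples; it only mirrors sample2.roy, and the function names are carried over from it for illustration.

```python
def compose(f, g):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))


def plus(n):
    return n + 3


def times(n):
    return n * 2


f1 = compose(times, plus)
f2 = compose(plus, times)

print(f1(4))  # times(plus(4)) = 14
print(f2(4))  # plus(times(4)) = 11
```

As in the Roy version, the argument order decides which function runs first: the right-hand function is applied first.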

2014-07-10

Running Apache Solr embedded

Java Groovy

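Apache Solr, the open-source full-text search engine, is implemented as a servlet and normally runs in a servlet engine (container) such as Jetty. This time I tried running it embedded instead.

Apache Solr 4.9.0

Source: http://github.com/fits/try_samples/tree/master/blog/20140710/

Introduction

To run Solr embedded (that is, to execute operations directly without starting a server), the following approaches come to mind.

(1) Use EmbeddedSolrServer
(2) Use SolrCore

Roughly speaking, (1) goes through the SolrJ API, so it is the high-level API, and (2) uses the core API directly, so it is the low-level API.

(1) Using EmbeddedSolrServer

Implementing a search with EmbeddedSolrServer is straightforward.

Here the script runs a match-all query against the core specified by the command-line arguments (first argument: the Solr home directory; second argument: the core name).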

search1.groovy

@Grab('org.apache.solr:solr-core:4.9.0')
@Grab('org.slf4j:slf4j-nop:1.7.7')
import org.apache.solr.core.CoreContainer
import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer

def cores = new CoreContainer(args[0])
cores.load()

def server = new EmbeddedSolrServer(cores, args[1])
// Match-all query
def q = new SolrQuery(query: '*:*')
// Run the search
def result = server.query(q)
// Print the results
result.results.each {
    println it
}

server.shutdown()

Note that if you replace the line that instantiates EmbeddedSolrServer with something like new HttpSolrServer('http://localhost:8983/solr/collection1'), the search runs against a Solr server hosted in a servlet engine instead.

The output is shown below.

Output

> groovy search1.groovy /solr-4.9.0/example/solr collection1

[id:GB18030TEST, name:Test with some GB18030 encoded characters, features:[No accents here, ?是一个功能, This is a feature (translated), ??文件是很有光?, This document is very shiny (translated)], price:0.0, price_c:0,USD, inStock:true, _version_:1473119790218346496]
・・・

(2) Using SolrCore

Next, I implemented the same processing with SolrCore.

With SolrCore, you write the result to a stream using an XXXResponseWriter (BinaryResponseWriter, JSONResponseWriter, and so on).

search2.groovy

@Grab('org.apache.solr:solr-core:4.9.0')
@Grab('org.slf4j:slf4j-nop:1.7.7')
import org.apache.solr.core.CoreContainer
import org.apache.solr.request.LocalSolrQueryRequest
import org.apache.solr.response.SolrQueryResponse
import org.apache.solr.response.BinaryResponseWriter
import org.apache.solr.common.util.JavaBinCodec

def cores = new CoreContainer(args[0])
cores.load()

def core = cores.getCore(args[1])
// Get the search handler
def handler = core.getRequestHandler('/query')

def req = new LocalSolrQueryRequest(core, [
    q: ['*:*'] as String[] // match all documents
])
def res = new SolrQueryResponse()
// Run the search
core.execute(handler, req, res)

def writer = new ByteArrayOutputStream()

def rw = new BinaryResponseWriter()
// Write the search result to the stream
rw.write(writer, req, res)

def resolver = new BinaryResponseWriter.Resolver(req, res.getReturnFields())
def bais = new ByteArrayInputStream(writer.toByteArray())
// Decode the byte array into Java objects (a SimpleOrderedMap is returned here)
def result = new JavaBinCodec(resolver).unmarshal(bais)
// Print the results
result.response.each {
    println it
}

core.close()
cores.shutdown()

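When using the collection1 core from example/solr bundled with Solr, the /select handler is available in addition to /query. The only difference appears to be the default settings (for example, /query has indent set to true).

The output is shown below.

Output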

> groovy search2.groovy /solr-4.9.0/example/solr collection1

[id:GB18030TEST, name:Test with some GB18030 encoded characters, features:[No accents here, ?是一个功能, This is a feature (translated), ??文件是很有光?, This document is very shiny (translated)], price:0.0, price_c:0,USD, inStock:true, _version_:1473119790218346496]
・・・

JSON output

To output the search result as JSON, use JSONResponseWriter instead of BinaryResponseWriter.

search2_json.groovy

・・・
// Run the search
core.execute(handler, req, res)

def writer = new StringWriter()

def rw = new JSONResponseWriter()
rw.write(writer, req, res)
// Print the results (JSON)
println writer.toString()
・・・

Output

> groovy search2_json.groovy /solr-4.9.0/example/solr collection1

{
  "responseHeader":{
    "status":0,
    "QTime":94,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":32,"start":0,"docs":[
      {
        "id":"GB18030TEST",
        "name":"Test with some GB18030 encoded characters",
        "features":["No accents here",
          "?是一个功能",
          "This is a feature (translated)",
          "??文件是很有光?",
          "This document is very shiny (translated)"],
        "price":0.0,
        "price_c":"0,USD",
        "inStock":true,
        "_version_":1473119790218346496},
  ・・・
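
A response of this shape can be read with any JSON parser. The following is a minimal Python sketch, not part of the original scripts: the raw string is a hand-trimmed copy of the output above (one document, fewer fields), and summarize is a hypothetical helper name.

```python
import json

# Hand-trimmed copy of the response shown above (assumption: one document only).
raw = '''
{
  "responseHeader": {"status": 0, "QTime": 94, "params": {"q": "*:*"}},
  "response": {"numFound": 32, "start": 0, "docs": [
    {"id": "GB18030TEST",
     "name": "Test with some GB18030 encoded characters",
     "price": 0.0,
     "inStock": true}
  ]}
}
'''


def summarize(text):
    """Return (numFound, list of document ids) from a Solr JSON response string."""
    body = json.loads(text)["response"]
    return body["numFound"], [doc["id"] for doc in body["docs"]]


num_found, ids = summarize(raw)
print(num_found, ids)
```

Note that numFound is the total number of matches, while docs holds only the documents actually returned in this response.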

2014-06-29

Using Apache Spark from Groovy

Groovy

Following "Using Apache Spark from Java" and "Using Apache Spark from Scala 2.10", this time I implemented the same processing in Groovy.

Apache Spark 1.0

money_count.groovy

@Grab('org.apache.spark:spark-core_2.10:1.0.0')
import org.apache.spark.api.java.JavaSparkContext

def spark = new JavaSparkContext('local', 'MoneyCount')

spark.textFile(args[0]).countByValue().each { k, v ->
    println "$k = $v"
}

If all you need is a count per value, as in this case, countByValue() makes it easy. Note that countByValue() returns a Map.

Output

> groovy money_count.groovy input_sample.txt

100 = 2
50 = 1
5 = 3
500 = 1
10 = 2
1 = 2
2000 = 1
1000 = 3
10000 = 2
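
To make the semantics of countByValue() concrete, here is a plain Python sketch that counts identical lines with collections.Counter. It does not use Spark, and the sample list is hypothetical data, not the contents of input_sample.txt.

```python
from collections import Counter


def count_by_value(lines):
    """Count how many times each distinct line occurs (trailing newlines ignored)."""
    return Counter(line.rstrip("\n") for line in lines)


# Hypothetical input; in the Groovy script the lines come from the file in args[0].
sample = ["100\n", "50\n", "100\n", "1\n", "5\n"]

counts = count_by_value(sample)
for value, count in counts.items():
    print(f"{value} = {count}")
```

Like the Map returned by countByValue(), the result maps each distinct value to its number of occurrences.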

Contents of input_sample.txt

Source for this post: http://github.com/fits/try_samples/tree/master/blog/20140629/

2014-06-22

Computing Gini impurity, part 3: Python, R, CoffeeScript

python R coffeescript nodejs

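Continuing from the previous two posts, I implemented the same Gini impurity (Gini coefficient) calculation in Python, R, and CoffeeScript.

Source for this post: http://github.com/fits/try_samples/tree/master/blog/20140622/

Implementation in Python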

Python 2.7
IronPython 2.7

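In Python, the groupby and combinations functions from the itertools module are available.

Like Haskell's group, groupby groups adjacent equal values. (In this case the input has to be sorted with sorted first.)

The group part of each groupby result is an iterator, so len cannot be applied to it directly; it is converted with list before len is applied.

The combinations function returns combinations of elements, like Scala's combinations. (Below, the three pairs AB, AC, and BC.)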

gini.py

from itertools import *

def size(xs):
    return float(len(xs))

# (a) 1 - (AA + BB + CC)
def giniA(xs):
    return 1 - sum(map(lambda (k, g): (size(list(g)) / size(xs)) ** 2, groupby(sorted(xs))))

def countby(xs):
    return map(lambda (k, v): (k, size(list(v))), groupby(sorted(xs)))

# (b) AB * 2 + AC * 2 + BC * 2
def giniB(xs):
    return sum(map(
        lambda ((xk, xv), (yk, yv)): xv / size(xs) * yv / size(xs) * 2, 
        combinations(countby(xs), 2)
    ))

vlist = ["A", "B", "B", "C", "B", "A"]

print giniA(vlist)
print giniB(vlist)

Output

> python gini.py

0.611111111111
0.611111111111

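Running on Python 3

To run on Python 3.4, the lambda expressions and the print statements need to be rewritten.

Python 3 lambdas cannot unpack a tuple argument into separate parameters, so you cannot write lambda (k, v): ... as in Python 2. Instead you write lambda x: ... and access the individual elements by index (x[0] and so on).

gini3.py (Python 3.4)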

・・・
# (a) 1 - (AA + BB + CC)
def giniA(xs):
    return 1 - sum(map(lambda x: (size(list(x[1])) / size(xs)) ** 2, groupby(sorted(xs))))
・・・
# (b) AB * 2 + AC * 2 + BC * 2
def giniB(xs):
    return sum(map(
        lambda x: x[0][1] / size(xs) * x[1][1] / size(xs) * 2, 
        combinations(countby(xs), 2)
    ))

vlist = ["A", "B", "B", "C", "B", "A"]

print(giniA(vlist))
print(giniB(vlist))

Output (Python 3.4)

> python gini3.py

0.6111111111111112
0.611111111111111
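
As an alternative for Python 3, the index-based lambdas can be avoided altogether with generator expressions, which can still unpack tuples in their for clause. The following self-contained sketch is my own variation, not the gini3.py from the repository; it swaps groupby for collections.Counter, and the names gini_a and gini_b are chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def gini_a(xs):
    """(a) 1 - (AA + BB + CC)"""
    n = len(xs)
    return 1 - sum((count / n) ** 2 for count in Counter(xs).values())


def gini_b(xs):
    """(b) AB * 2 + AC * 2 + BC * 2"""
    n = len(xs)
    # combinations yields each unordered pair of counts once, hence the factor 2.
    return sum((cx / n) * (cy / n) * 2
               for cx, cy in combinations(Counter(xs).values(), 2))


vlist = ["A", "B", "B", "C", "B", "A"]

print(gini_a(vlist))
print(gini_b(vlist))
```

Counter does not need sorted input, so the sorted call required by groupby disappears as well.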

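Implementation in R

In R, the table function returns the count of each element, and the combn function returns combinations as a matrix, similar to combinations in Scala and Python.

sum does not seem to work on the result of lapply (a list), so Reduce is used to add up the values.

Passing 2 as the second argument of apply processes the data column by column.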

gini.R

# (a) 1 - (AA + BB + CC)
giniA <- function(xs) {
  1 - Reduce("+", lapply(table(xs), function(x) (x / length(xs)) ^ 2))
}

# (b) AB * 2 + AC * 2 + BC * 2
giniB <- function(xs) {
  sum(apply(combn(table(xs), 2), 2, function(x) (x[1] / length(xs)) * (x[2] / length(xs)) * 2))
}

list <- c("A", "B", "B", "C", "B", "A")

giniA(list)
giniB(list)

Output

・・・
> giniA(list)
[1] 0.6111111

> giniB(list)
[1] 0.6111111

Notes

The intermediate results are as follows.

Result of table(list)

> table(list)

list
A B C 
2 3 1

Result of combn(table(list), 2)

> combn(table(list), 2)

     [,1] [,2] [,3]
[1,]    2    2    3
[2,]    3    1    1

These are the counts for the following combinations.

     [,1] [,2] [,3]
[1,] "A"  "A"  "B" 
[2,] "B"  "C"  "C"

Computing Gini impurity, part 2: Ruby, C#, F#, Erlang

Ruby C# F# Erlang

Continuing from the previous post, this time I implemented the same Gini impurity (Gini coefficient) calculation in the following languages.

Ruby
C#
F#
Erlang

Source for this post: http://github.com/fits/try_samples/tree/master/blog/20140608/

Implementation in Ruby

Ruby 2.0
JRuby 1.7

In Ruby, group_by returns a Hash keyed by element.
(Below: {"A"=>["A", "A"], "B"=>["B", "B", "B"], "C"=>["C"]})

Note that calling map on a Hash returns an array.
(Below, list.group_by {|x| x }.map {|k, v| [k, v.size.to_f / list.size] } evaluates to [["A", 0.33...], ["B", 0.5], ["C", 0.16...]])

Also, combination(2) returns the 2-element combinations, like the Scala method in the previous post.
(Below: [["A", 0.33...], ["B", 0.5]], [["A", 0.33...], ["C", 0.16...]], [["B", 0.5], ["C", 0.16...]])

gini.rb

#coding:utf-8

# (a) 1 - (AA + BB + CC)
def giniA(xs)
    1 - xs.group_by {|x| x }.inject(0) {|a, (k, v)| a + (v.size.to_f / xs.size) ** 2 }
end

# (b) AB × 2 + AC × 2 +  BC × 2
def giniB(xs)
    xs.group_by {|x| x }.map {|k, v| [k, v.size.to_f / xs.size] }.combination(2).inject(0) {|a, t| a + t.first.last * t.last.last * 2}
end

list = ["A", "B", "B", "C", "B", "A"]

puts giniA(list)
puts giniB(list)

Output

> ruby gini.rb

0.6111111111111112
0.611111111111111

Implementation in C#

.NET Framework 4.5

LINQ's GroupBy method returns a collection of IGrouping<TKey, TSource> objects, one per distinct element.

Element combinations are also easy to build with a LINQ query expression (the combination method below).

gini.cs

using System;
using System.Collections.Generic;
using System.Linq;

class Gini
{
    public static void Main(string[] args)
    {
        var list = new List<string>() {"A", "B", "B", "C", "B", "A"};

        Console.WriteLine("{0}", giniA(list));
        Console.WriteLine("{0}", giniB(list));
    }

    // (a) 1 - (AA + BB + CC)
    private static double giniA<K>(IEnumerable<K> xs)
    {
         return 1 - xs.GroupBy(x => x).Select(x => Math.Pow((double)x.Count() / xs.Count(), 2)).Sum();
    }

    // (b) AB + AC + BA + BC + CA + CB
    private static double giniB<K>(IEnumerable<K> xs)
    {
        return
            combination(
                countBy(xs).Select(t =>
                    Tuple.Create(t.Item1, (double)t.Item2 / xs.Count())
                )
            ).Select(x => x.Item1.Item2 * x.Item2.Item2).Sum();
    }

    private static IEnumerable<Tuple<K, int>> countBy<K>(IEnumerable<K> xs) {
        return xs.GroupBy(x => x).Select(g => Tuple.Create(g.Key, g.Count()));
    }

    // Build the pairs of distinct elements
    private static IEnumerable<Tuple<Tuple<K, V>, Tuple<K, V>>> combination<K, V>(IEnumerable<Tuple<K, V>> data) {
        return
            from x in data
            from y in data
            where !x.Item1.Equals(y.Item1)
            select Tuple.Create(x, y);
    }
}

Output

> csc gini.cs
> gini.exe

0.611111111111111
0.611111111111111

Implementation in F#

F# 3.1

In F#, Seq.countBy returns the count of each element.
(Below: seq [("A", 2); ("B", 3); ("C", 1)])

Element combinations are easy to build with a comprehension (combinationCount below).

gini.fs

let size xs = xs |> Seq.length |> float

// (a) 1 - (AA + BB + CC)
let giniA xs = xs |> Seq.countBy id |> Seq.sumBy (fun (k, v) -> (float v / size xs) ** 2.0) |> (-) 1.0

let combinationCount cs = [
    for x in cs do
        for y in cs do
            if fst x <> fst y then
                yield (snd x, snd y)
]

// (b) AB + AC + BA + BC + CA + CB
let giniB xs = xs |> Seq.countBy id |> combinationCount |> Seq.sumBy (fun (x, y) -> (float x / size xs) * (float y / size xs))

let list = ["A"; "B"; "B"; "C"; "B"; "A";]

printfn "%A" (giniA list)
printfn "%A" (giniB list)

Output

> fsc gini.fs
> gini.exe

0.6111111111
0.6111111111

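Implementation in Erlang

Erlang 5.10

Erlang does not seem to provide grouping functions out of the box, so I implemented them myself (using the dict module this time).

The right-hand side of a generator (<variable> <- <expression>) in a list comprehension ([<constructor> || <qualifier>, ...]) must be an expression that evaluates to a list, so the dict is converted with dict:to_list().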

gini.erl

-module(gini).
-export([main/1]).

groupBy(Xs) -> lists:foldr(fun(X, Acc) -> dict:append(X, X, Acc) end, dict:new(), Xs).

countBy(Xs) -> dict:map( fun(_, V) -> length(V) end, groupBy(Xs) ).

% (a) 1 - (AA + BB + CC)
giniA(Xs) -> 1 - lists:sum([ math:pow(V / length(Xs), 2) || {_, V} <- dict:to_list(countBy(Xs)) ]).

combinationProb(Xs) -> [ {Vx, Vy} || {Kx, Vx} <- Xs, {Ky, Vy} <- Xs, Kx /= Ky ].

% (b) AB + AC + BA + BC + CA + CB
giniB(Xs) -> lists:sum([ (Vx / length(Xs)) * (Vy / length(Xs)) || {Vx, Vy} <- combinationProb(dict:to_list(countBy(Xs))) ]).

main(_) ->
    List = ["A", "B", "B", "C", "B", "A"],

    io:format("~p~n", [ giniA(List) ]),
    io:format("~p~n", [ giniB(List) ]).

Output

> escript gini.erl

0.6111111111111112
0.611111111111111

Printing the results of groupBy(List) and countBy(List) gives the following.

Output of groupBy(List)

{dict,3,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],
        [["B","B","B","B"]],
        [],[],[],[],
        [["C","C"]],
        [],[],[],[],[],
        [["A","A","A"]],
        [],[],[]}}}

Output of countBy(List)

{dict,3,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],
        [["B"|3]],
        [],[],[],[],
        [["C"|1]],
        [],[],[],[],[],
        [["A"|2]],
        [],[],[]}}}
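
The dict dumps above show Erlang's internal representation, which is hard to read. As a rough illustration of what the two helpers compute, here is the same fold-into-a-dictionary idea in Python. It is a sketch of the technique only, not a translation of dict's internals, and group_by and count_by are names chosen for this example.

```python
def group_by(xs):
    """Collect equal elements under one key, like folding with dict:append."""
    groups = {}
    for x in xs:
        groups.setdefault(x, []).append(x)
    return groups


def count_by(xs):
    """Replace each group with its length, like dict:map with length/1."""
    return {key: len(values) for key, values in group_by(xs).items()}


items = ["A", "B", "B", "C", "B", "A"]

print(group_by(items))
print(count_by(items))
```

In the Erlang dump, an entry such as ["B","B","B","B"] is the key "B" consed onto its value list ["B","B","B"], which is why one extra "B" appears there.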

2014-06-01

Computing Gini impurity: Groovy, Scala, Java 8, Frege

Java Groovy Scala Frege

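I implemented the Gini impurity (Gini coefficient) calculation from "Chapter 7: Modeling with Decision Trees" of the book "Programming Collective Intelligence" (集合知プログラミング) in a functional style in the JVM languages Groovy, Scala, Java 8, and Frege.

Source for this post: http://github.com/fits/try_samples/tree/master/blog/20140601/

Introduction

Gini impurity is the probability that steps (1) and (2) below yield two different items.

(1) Draw one item from a set
(2) Put the item back and draw one item again

For example, for a set such as ["A", "B", "B", "C", "B", "A"], the probabilities of drawing A, B, and C are as follows.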

A = 2/6 = 1/3
B = 3/6 = 1/2
C = 1/6

The Gini impurity can then be computed in either of the following two ways. (Below, XY denotes the probability of drawing X in (1) and Y in (2).)

(a) Gini impurity = 1 - (AA + BB + CC) = 1 - (1/3 × 1/3 + 1/2 × 1/2 + 1/6 × 1/6) = 11/18 ≈ 0.61
(b) Gini impurity = AB + AC + BA + BC + CA + CB = 1/3 × 1/2 + 1/3 × 1/6 + ・・・ = 11/18 ≈ 0.61

I think (a) leads to the simpler implementation.
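
The arithmetic in (a) and (b) can be checked with exact fractions. The short Python sketch below is only a verification aid and is not one of the implementations from this post; the variable names are my own.

```python
from collections import Counter
from fractions import Fraction

items = ["A", "B", "B", "C", "B", "A"]
n = len(items)

# Probability of drawing each item: A = 1/3, B = 1/2, C = 1/6
prob = {key: Fraction(count, n) for key, count in Counter(items).items()}

# (a) 1 - (AA + BB + CC)
gini_a = 1 - sum(p * p for p in prob.values())

# (b) AB + AC + BA + BC + CA + CB (all ordered pairs of different items)
gini_b = sum(prob[x] * prob[y] for x in prob for y in prob if x != y)

print(gini_a, gini_b)  # 11/18 11/18
```

Both forms agree because the probabilities of all ordered pairs, same-item pairs included, sum to 1.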

Implementation in Groovy

Now let's implement it in each language.

In Groovy, the countBy method makes it easy to get the count of each element.

To get the pairs of distinct elements, I used nCopies, combinations, and findAll this time.

Below, list.countBy {it} evaluates to [A:2, B:3, C:1],
nCopies(2, list.countBy { it }) to [[A:2, B:3, C:1], [A:2, B:3, C:1]],
and nCopies(2, list.countBy { it }).combinations() to [[A=2, A=2], [B=3, A=2], ・・・, [B=3, C=1], [C=1, C=1]].

gini.groovy

import static java.util.Collections.nCopies

// (a) 1 - (AA + BB + CC)
def giniA = { xs ->
    1 - xs.countBy { it }*.value.sum { (it / xs.size()) ** 2 }
}

// (b) AB + AC + BA + BC + CA + CB
def giniB = { xs ->
    nCopies(2, xs.countBy { it }).combinations().findAll {
        // Exclude pairs of the same element
        it.first().key != it.last().key
    }.sum {
        (it.first().value / xs.size()) * (it.last().value / xs.size())
    }
}

def list = ['A', 'B', 'B', 'C', 'B', 'A']

println giniA(list)
println giniB(list)

Output

> groovy gini.groovy

0.61111111112222222222
0.61111111112222222222

Implementation in Scala

Scala did not seem to have a method corresponding to Groovy's countBy, so I used groupBy.

Calling combinations(2) on a List returns the 2-element combinations of its elements (below, the three combinations AB, AC, and BC).

Below, list.groupBy(identity) evaluates to Map(A -> List(A, A), C -> List(C), B -> List(B, B, B)).

gini.scala

import scala.math.pow

// (a) 1 - (AA + BB + CC)
val giniA = (xs: List[_]) => 1 - xs.groupBy(identity).mapValues( v =>
    pow(v.size.toDouble / xs.size, 2)
).values.sum

// (b) AC × 2 + AB × 2 + CB × 2
val giniB = (xs: List[_]) => xs.groupBy(identity).mapValues( v =>
    v.size.toDouble / xs.size
).toList.combinations(2).map( x =>
    x.head._2 * x.last._2 * 2
).sum

val list = List("A", "B", "B", "C", "B", "A")

println( giniA(list) )
println( giniB(list) )

Output

> scala gini.scala

0.6111111111111112
0.611111111111111

Implementation in Java 8

With the Java 8 Stream API, combining groupingBy and counting in a collect call gives the count of each element.

There did not seem to be a method for getting combinations of elements, so I implemented one myself.

Below, countBy(list) evaluates to {A=2, B=3, C=1},
and combination(countBy(list)) to [[A=2, B=3], [A=2, C=1], [B=3, A=2], [B=3, C=1], [C=1, A=2], [C=1, B=3]].

Gini.java

import static java.util.stream.Collectors.*;

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

class Gini {
    public static void main(String... args) {
        List<String> list = Arrays.asList("A", "B", "B", "C", "B", "A");

        System.out.println( giniA(list) );
        System.out.println( giniB(list) );
    }

    // (a) 1 - (AA + BB + CC)
    private static double giniA(List<String> xs) {
        return 1 - countBy(xs).values().stream().mapToDouble( x -> Math.pow(x.doubleValue() / xs.size(), 2) ).sum();
    }

    // (b) AB + AC + BA + BC + CA + CB
    private static double giniB(List<String> xs) {
        return combination(countBy(xs)).stream().mapToDouble( s ->
            s.stream().mapToDouble( t -> 
                t.getValue().doubleValue() / xs.size()
            ).reduce(1.0, (a, b) -> a * b ) 
        ).sum();
    }

    private static <T> Map<T, Long> countBy(Collection<T> xs) {
        return xs.stream().collect(groupingBy(Function.identity(), counting()));
    }

    private static <T, S> Collection<? extends List<Map.Entry<T, S>>> combination(Map<T, S> data) {
        return data.entrySet().stream().flatMap( x ->
            data.entrySet().stream().flatMap ( y ->
                (x.getKey().equals(y.getKey()))? Stream.empty(): Stream.of(Arrays.asList(x, y))
            )
        ).collect(toList());
    }
}

Output

> java Gini

0.6111111111111112
0.611111111111111

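Implementation in Frege

Frege's group function groups consecutive equal elements, so the list has to be sorted first.

Below, group . sort $ list evaluates to [["A", "A"], ["B", "B", "B"], ["C"]].

I used a list comprehension to compute the probabilities of the combinations (AB, AC, and so on).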

gini.fr

package sample.Gini where

import frege.prelude.Math (**)
import Data.List

size = fromIntegral . length

-- (a) 1 - (AA + BB + CC)
giniA xs = (1 - ) . sum . map calc . group . sort $ xs
    where
        listSize = size xs
        calc x = (size x / listSize) ** 2

-- (b) AB + AC + BA + BC + CA + CB
giniB xs = fold (+) 0 . calcProb . map prob . group . sort $ xs
    where
        listSize = size xs
        prob ys = (head ys, size ys / listSize)
        calcProb zs = [ snd x * snd y | x <- zs, y <- zs, fst x /= fst y]

main args = do
    let list = ["A", "B", "B", "C", "B", "A"]

    println $ giniA list
    println $ giniB list

Output

> java -cp .;frege3.21.586-g026e8d7.jar sample.Gini

0.6111111111111112
0.611111111111111
runtime ・・・

Notes

It seems that sum should work just as well as fold (+) 0 in the giniB function, but using sum produced the following error.

Error when using sum in the giniB function

E sample.fr:14: inferred type is more constrained than expected type
    inferred:  (Real t17561,Show t17561) => [String] -> IO ()
    expected:  [String] -> IO ()

Incidentally, almost the same code works in Haskell, and there sum caused no problem. (See gini.hs.)

2014-05-18

Applying AspectJ to a JAR file with Gradle

Java AspectJ Groovy

I used Gradle to apply AspectJ to an existing JAR file.

Gradle 1.12
AspectJ 1.8.0

There is a Gradle plugin for AspectJ called gradle-aspectj, but this time I use the Ant AjcTask (iajc) included in AspectJ (aspectjtools) from Gradle.

Using AjcTask

To use AjcTask from Gradle, define it in the Gradle build script (build.gradle) as follows.

ant.taskdef(resource: 'org/aspectj/tools/ant/taskdefs/aspectjTaskdefs.properties', classpath: <path to aspectjtools>)

This makes AjcTask available as ant.iajc().

Applying AspectJ to the Struts JAR file

This time I apply the aspect definition below to the Struts 1.3.10 JAR file (struts-core-1.3.10.jar).

It weaves in logic that throws an IllegalArgumentException when a property name matches the pattern. The advice targets the BeanUtils.populate() call sites in classes that belong to org.apache.struts and its subpackages, excluding the org.apache.struts.mock package.

Aspect definition: src/main/java/fits/sample/StrutsAspect.java

package fits.sample;

import java.util.Map;
import java.util.regex.Pattern;

import org.aspectj.lang.annotation.*;
import org.aspectj.lang.*;

@Aspect
public class StrutsAspect {
    private final static Pattern PATTERN = Pattern.compile("(^|\\W)[cC]lass\\W");

    @Around(
        "call(void org.apache..BeanUtils.populate(Object, Map)) &&" + 
        "within(org.apache.struts..*) && " +
        "!within(org.apache.struts.mock.*) && " +
        "args(bean, properties)"
    )
    public void aroundPopulate(ProceedingJoinPoint pjp, Object bean, Map properties) throws Throwable {
        if (properties != null) {
            checkProperties(properties);

            pjp.proceed();
        }
    }

    private void checkProperties(Map properties) {
        for (Object key : properties.keySet()) {
            String property = (String)key;

            if (PATTERN.matcher(property).find()) {
                throw new IllegalArgumentException(key + " is invalid");
            }
        }
    }
}
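
To get a feel for which property names the pattern rejects, the same regular expression can be tried in Python. This is only an illustration: the property names are hypothetical examples, is_rejected is a name chosen here, and re.ASCII is set on the assumption that it matches the default ASCII-only meaning of \W in java.util.regex.

```python
import re

# Same expression as PATTERN in StrutsAspect; re.ASCII keeps \W ASCII-only.
PATTERN = re.compile(r"(^|\W)[cC]lass\W", re.ASCII)


def is_rejected(prop):
    """True when the property name contains 'class' or 'Class' as a path segment."""
    return PATTERN.search(prop) is not None


# Hypothetical property names, for illustration only.
for name in ["class.classLoader", "user.Class.name", "className", "subclass.x", "class"]:
    print(name, is_rejected(name))
```

Only the first two names match. The trailing \W requires at least one non-word character after "class", so a bare "class" and names such as "className" pass through.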

The build script is shown below.

To obtain the path to aspectjtools, I defined ajc in configurations, and I defined the task that applies AspectJ under the name ajc.

The dependencies needed to apply AspectJ to struts-core-1.3.10.jar are taken from configurations.compile and set as the classpath of ant.iajc.

Note that dependencies such as commons-fileupload and servlet-api will vary depending on the aspect definition.

Build script: build.gradle

apply plugin: 'java'

repositories {
    mavenCentral()
}

configurations {
    ajc
}

dependencies {
    ajc "org.aspectj:aspectjtools:1.8.0"
    compile "org.aspectj:aspectjrt:1.8.0"
    compile "org.apache.struts:struts-core:1.3.10"
    compile 'commons-fileupload:commons-fileupload:1.3.1'
    compile 'javax.servlet:servlet-api:2.5'
}

task ajc << {
    ant.taskdef(resource: 'org/aspectj/tools/ant/taskdefs/aspectjTaskdefs.properties', classpath: configurations.ajc.asPath)

    ant.iajc(outJar: "struts-core_custom.jar", source: '1.7', showWeaveInfo: 'true') {
        sourceroots {
            sourceSets.main.java.srcDirs.each {
                pathelement(location: it.absolutePath)
            }
        }
        classpath {
            // Dependency libraries
            pathelement(location: configurations.compile.asPath)
        }
        inpath {
            // Path to struts-core
            pathelement(location: configurations.compile.files.find { it.path.contains 'struts-core' }.absolutePath)
        }
    }
}

The output is shown below. A JAR file (struts-core_custom.jar) in which StrutsAspect has been applied to the ActionServlet and RequestUtils classes is created in the current directory.

Output

> gradle ajc --info
・・・
:ajc (Thread[main,5,main]) started.
:ajc
Executing task ':ajc' (up-to-date check took 0.0 secs) due to:
  Task has not declared any outputs.
[ant:iajc] weaveinfo Join point 'method-call(void org.apache.commons.beanutils.BeanUtils.populate(java.lang.Object, java.util.Map))' in Type 'org.apache.struts.action.ActionServlet' (ActionServlet.java:845) advised by around advice from 'fits.sample.StrutsAspect' (StrutsAspect.java:19)
[ant:iajc] weaveinfo Join point 'method-call(void org.apache.commons.beanutils.BeanUtils.populate(java.lang.Object, java.util.Map))' in Type 'org.apache.struts.util.RequestUtils' (RequestUtils.java:473) advised by around advice from 'fits.sample.StrutsAspect' (StrutsAspect.java:19)
:ajc (Thread[main,5,main]) completed. Took 5.211 secs.

BUILD SUCCESSFUL

Source for the sample used in this post: http://github.com/fits/try_samples/tree/master/blog/20140518/
