2011-09-17

Running a Sinatra + MongoMapper application on Micro Cloud Foundry

I ran the Sinatra + Haml + MongoMapper sample from "Using MongoDB with Sinatra + Haml id:fits:20110306" on Micro Cloud Foundry, a private-cloud PaaS environment.

The sample source is at http://github.com/fits/try_samples/tree/master/blog/20110917/

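Micro Cloud Foundry is distributed free of charge as a virtual machine image for VMware, which makes it easy to set up a private-cloud PaaS environment.
Personally, I like that it supports frameworks such as Sinatra, Node.js and Grails, and that MongoDB is available (MySQL and Redis can be used as well).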

Changes to the application

I reuse the MongoMapper version of the sample from id:fits:20110306 as-is and only rewrite the MongoDB connection settings for Micro Cloud Foundry.

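In Micro Cloud Foundry, information about the services the application is bound to (MongoDB in this case) is stored as JSON in the environment variable VCAP_SERVICES, so the application parses this value to get the database connection details.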

Example of MongoDB connection information in VCAP_SERVICES

{"mongodb-1.8":[{"name":"mongodb-cad3a","label":"mongodb-1.8","plan":"free","tags":["mongodb","mongodb-1.8","nosql"],"credentials":{"hostname":"127.0.0.1","host":"127.0.0.1","port":25001,"username":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","password":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","name":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","db":"db"}}]}
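The key selection and URI construction that site.rb performs can be tried outside Cloud Foundry by feeding it a string shaped like the example above. The following is a minimal sketch; the method name `mongo_uri_from_vcap` is hypothetical, and the user name and password are shortened placeholders rather than real credentials.

```ruby
require "json"

# Hypothetical helper: builds a MongoDB URI from a VCAP_SERVICES JSON string,
# picking the first service key that matches /mongodb/i (as site.rb does).
def mongo_uri_from_vcap(vcap_json)
  services = JSON.parse(vcap_json)
  mongo_key = services.keys.find { |k| k =~ /mongodb/i }
  raise "no MongoDB service is bound" if mongo_key.nil?

  c = services[mongo_key].first["credentials"]
  "mongodb://#{c['username']}:#{c['password']}@#{c['host']}:#{c['port']}/#{c['db']}"
end

# Shortened version of the VCAP_SERVICES example (placeholder credentials)
vcap = '{"mongodb-1.8":[{"name":"mongodb-cad3a","credentials":' \
       '{"host":"127.0.0.1","port":25001,"username":"u","password":"p","db":"db"}}]}'

puts mongo_uri_from_vcap(vcap)   # => mongodb://u:p@127.0.0.1:25001/db
```

As in site.rb, the user name and password are inserted without URL escaping. That works for the UUID-style values in the example (hex digits and hyphens only), but a value containing characters such as `@` or `:` would need escaping.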

The code below extracts the MongoDB connection information from VCAP_SERVICES, builds a connection URI, and configures MongoMapper with it.

site.rb

require "rubygems"
require "sinatra"
require "haml"
require "mongo_mapper"
require "json"

require "models/book"
require "models/user"
require "models/comment"

configure do
    # Extract the MongoDB connection information
    services = JSON.parse(ENV['VCAP_SERVICES'])
    mongoKey = services.keys.select{|s| s =~ /mongodb/i}.first
    mongo = services[mongoKey].first['credentials']

    # Build the MongoDB connection URI
    uri = "mongodb://#{mongo['username']}:#{mongo['password']}@#{mongo['host']}:#{mongo['port']}/#{mongo['db']}"

    # Configure MongoMapper
    MongoMapper.connection = Mongo::Connection.from_uri(uri)
    MongoMapper.database = mongo['db']
end

get '/' do
    haml :index, {}, :books => Book.all(:order => 'title'), :users => User.all(:order => 'name'), :action => '/comments'
end
・・・

The only part that has to change for Micro Cloud Foundry is the MongoDB connection handling.

Deploying the application

This assumes Micro Cloud Foundry is already set up and the client can connect to it with vmc. (See http://support.cloudfoundry.com/entries/20316811-micro-cloud-foundry-installation-setup)

If you deploy the application (vmc push) in this state, an error like the following occurs because MongoMapper is not installed.

Example deployment error (MongoMapper not installed)

> vmc push msample
・・・
Staging Application: OK
Starting Application: .
Error: Application [msample] failed to start, logs information below.
====> logs/stderr.log <====

/var/vcap/data/packages/dea_ruby18/3/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `gem_original_require': no such file to load -- mongo_mapper (LoadError)
        from /var/vcap/data/packages/dea_ruby18/3/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
        from site.rb:4

Should I delete the application? (Y/n)?

I couldn't find the proper Micro Cloud Foundry way to handle this, so for now I logged in to the Micro Cloud Foundry environment over SSH and installed MongoMapper with the gem command under /var/vcap/data/packages/dea_ruby18.

Note that the SSH user name is vcap and the password is the one you set on the Micro Cloud Foundry console screen. (This is not the user name used for vmc login.)

Example MongoMapper installation (logged in to the Micro Cloud Foundry environment over SSH)

vcap@micro:~$ sudo /var/vcap/data/packages/dea_ruby18/3/bin/gem install mongo_mapper

The application should now run correctly, so deploy it again. (Below, vmc push is run with the directory containing site.rb as the current directory.)

Example application deployment

> vmc push msample
Would you like to deploy from the current directory? [Yn]:
Application Deployed URL: 'msample.fits.cloudfoundry.me'?
Detected a Sinatra Application, is this correct? [Yn]:
Memory Reservation [Default:128M] (64M, 128M, 256M or 512M)
Creating Application: OK
Would you like to bind any services to 'msample'? [yN]: y
Would you like to use an existing provisioned service [yN]?
The following system services are available:
1. mongodb
2. mysql
3. redis
Please select one you wish to provision: 1
Specify the name of the service [mongodb-cad3a]:
Creating Service: OK
Binding Service: OK
Uploading Application:
  Checking for available resources: OK
  Packing application: OK
  Uploading (3K): OK
Push Status: OK

Staging Application: OK

Starting Application: OK

The deployment succeeded.

Above, the service bound to the application (mongodb-cad3a) is created at deploy time, but you can also create the service beforehand with vmc create-service and bind it to the application later with vmc bind-service.

The status of the registered applications can be checked with vmc apps.

Application list

> vmc apps
+-------------+----+---------+-------------------------------+---------------+
| Application | #  | Health  | URLS                          | Services      |
+-------------+----+---------+-------------------------------+---------------+
| msample     | 1  | RUNNING | msample.fits.cloudfoundry.me  | mongodb-cad3a |
+-------------+----+---------+-------------------------------+---------------+

Accessing msample.fits.cloudfoundry.me from a web browser should now confirm that the application works.

2011-07-18

Extracting data from CSV files with LINQ and collection APIs - C#, F#, Scala, Groovy, Ruby

.NET C# F# Java Scala Groovy Ruby

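I redid the data extraction that I performed with SQL in id:fits:20110702 and id:fits:20110709, this time using LINQ and collection APIs. (This time I don't sort by station_g_cd, among other differences, so the result is not exactly the same as the SQL used before.)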

This time I implemented it in C#, F#, Scala, Groovy and Ruby (JRuby). Each language provides similar APIs, so the implementations end up looking much alike.

SQL used previously

SELECT *
FROM (
    SELECT
        pref_name,
        station_g_cd,
        station_name,
        count(*) as lines
    FROM
      CSVREAD('m_station.csv') S
      JOIN CSVREAD('m_pref.csv') P ON S.pref_cd = P.pref_cd
    GROUP BY station_g_cd, station_name
    ORDER BY lines DESC, station_g_cd
)
WHERE ROWNUM <= 10

See id:fits:20110702 for the contents of the CSV files.
The sample source is at http://github.com/fits/try_samples/tree/master/blog/20110718/

C# (LINQ)

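I implemented it in C# 4.0 using LINQ.
The implementation works as follows.

File.ReadAllLines(...) reads the file as a collection of lines
Skip ignores the header line
join joins the stations with the prefectures
group by groups the rows
- The grouping key is an anonymous type
- The resulting type is IEnumerable<IGrouping<anonymous type, string[]>>
order by sorts the groups
Take gets the first 10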

With LINQ, the code can be written in an SQL-like style.

listup_station.cs

using System;
using System.Linq;
using System.IO;
using System.Text;

class ListUpStation
{
    public static void Main(string[] args)
    {
        // Load the prefectures
        var plines = File.ReadAllLines("m_pref.csv", Encoding.Default);
        var prefs = 
            from pline in plines.Skip(1)
                let p = pline.Split(',')
            select new {
                PrefCode = p[0],
                PrefName = p[1]
            };

        // Extract the stations served by the most lines
        var slines = File.ReadAllLines("m_station.csv", Encoding.Default);
        var list = (
            from sline in slines.Skip(1)
                let s = sline.Split(',')
            join p in prefs on s[10] equals p.PrefCode
            group s by new { 
                StationName = s[9],
                PrefName = p.PrefName, 
                StationGroupCode = s[5]
            } into stGroup
            orderby stGroup.Count() descending
            select stGroup
        ).Take(10);

        // Print the results
        foreach(var s in list)
        {
            Console.WriteLine("{0}駅 ({1}) : {2}", s.Key.StationName, s.Key.PrefName, s.Count());
        }
    }
}

Result

> csc listup_station.cs
> listup_station.exe
新宿駅 (東京都) : 12
横浜駅 (神奈川県) : 11
東京駅 (東京都) : 10
渋谷駅 (東京都) : 10
池袋駅 (東京都) : 9
大宮駅 (埼玉県) : 9
新橋駅 (東京都) : 7
大船駅 (神奈川県) : 7
上野駅 (東京都) : 7
千葉駅 (千葉県) : 7

F#

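For F# 2.0.0, I used the collection functions instead of LINQ.
The implementation works as follows.

File.ReadAllLines(...) reads the file as a collection of lines
Seq.skip ignores the header line
Map.ofSeq turns the prefectures into a Map
Seq.groupBy groups the rows
- The grouping key is a record type
List.sortWith sorts the groups
- List.ofSeq converts the sequence to a List so that sortWith can be used
Seq.take gets the first 10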

I use a record type for the grouping key, but a tuple works just as well.

listup_station.fsx

open System
open System.IO
open System.Text

// Record definition
type Station = {
    StationName: string
    PrefName: string
    StationGroupCode: string
}

// Load the prefectures
let prefMap = File.ReadAllLines("m_pref.csv", Encoding.Default)
                |> Seq.skip 1
                |> Seq.map (fun l -> 
                        let items = l.Split(',')
                        (items.[0], items.[1])
                    )
                |> Map.ofSeq

// Extract the stations served by the most lines
let lines = File.ReadAllLines("m_station.csv", Encoding.Default)
let list = lines 
            |> Seq.skip 1 
            |> Seq.map (fun l -> l.Split(',')) 
            |> Seq.groupBy (fun s -> 
                    {
                        StationName = s.[9]
                        PrefName = Map.find s.[10] prefMap
                        StationGroupCode = s.[5]
                    }
                ) 
            |> List.ofSeq 
            |> List.sortWith (fun a b -> Seq.length(snd b) - Seq.length(snd a)) 
            |> Seq.take 10

// Print the results
for s in list do
    let st = fst s
    stdout.WriteLine("{0}駅 ({1}) : {2}", st.StationName, st.PrefName, Seq.length((snd s)))

Result

> fsi listup_station.fsx
新宿駅 (東京都) : 12
横浜駅 (神奈川県) : 11
東京駅 (東京都) : 10
渋谷駅 (東京都) : 10
池袋駅 (東京都) : 9
大宮駅 (埼玉県) : 9
新橋駅 (東京都) : 7
大船駅 (神奈川県) : 7
上野駅 (東京都) : 7
千葉駅 (千葉県) : 7

Scala

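The Scala 2.9.0.1 version also uses collections.
The implementation works as follows.

Source.fromFile(...).getLines() reads the lines of the file
drop removes the header line
toMap turns the prefectures into a Map
groupBy groups the rows
- The grouping key is a case class
sortWith sorts the groups
take gets the first 10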

I use a case class for the grouping key, but as with F#, a tuple works just as well.

listup_station.scala

import scala.io.Source

case class Station(val stationName: String, val prefName: String, val stationGroupCode: String)

// Load the prefectures
val prefMap = Source.fromFile("m_pref.csv").getLines().drop(1).map {l =>
    val items = l.split(",")
    items(0) -> items(1)
}.toMap

// Extract the stations served by the most lines
val lines = Source.fromFile("m_station.csv").getLines()
val list = lines.drop(1).toList.map(_.split(",")).groupBy {s =>
    Station(s(9), prefMap.get(s(10)).get, s(5))
}.toList.sortWith {(a, b) => 
    a._2.length > b._2.length
} take 10

// Print the results
list.foreach {s =>
    printf("%s駅 (%s) : %d\n", s._1.stationName, s._1.prefName, s._2.length)
}

Result

> scala listup_station.scala
新宿駅 (東京都) : 12
横浜駅 (神奈川県) : 11
東京駅 (東京都) : 10
渋谷駅 (東京都) : 10
池袋駅 (東京都) : 9
大宮駅 (埼玉県) : 9
大船駅 (神奈川県) : 7
京都駅 (京都府) : 7
新橋駅 (東京都) : 7
千葉駅 (千葉県) : 7

Groovy

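The Groovy 1.8.0 version also uses collections.
The implementation works as follows.

new File(...).readLines() reads the file as a collection of lines
tail gets every line except the header line
collectEntries turns the prefectures into a Map
groupBy groups the rows
- The grouping key is a list
sort sorts the groups
List's getAt(Range) gets the first 10 (the asList()[0..9] part)
- The Set returned by entrySet() is converted to a List with asList() so that getAt(Range) can be used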

Note that a <=> b is the same as a.compareTo(b).

listup_station.groovy

// Load the prefectures
def prefMap = new File("m_pref.csv").readLines() tail() collectEntries {
    def items = it.split(",")
    [items[0], items[1]]
}

// Extract the stations served by the most lines
def list = new File("m_station.csv").readLines() tail() collect {
    it.split(",")
} groupBy {
    [it[9], prefMap[it[10]], it[5]]
} sort {a, b -> 
    b.value.size <=> a.value.size
} entrySet() asList()[0..9]

// Print the results
list.each {
    println "${it.key[0]}駅 (${it.key[1]}) : ${it.value.size}"
}

Result

> groovy listup_station.groovy
新宿駅 (東京都) : 12
横浜駅 (神奈川県) : 11
東京駅 (東京都) : 10
渋谷駅 (東京都) : 10
池袋駅 (東京都) : 9
大宮駅 (埼玉県) : 9
新橋駅 (東京都) : 7
大船駅 (神奈川県) : 7
上野駅 (東京都) : 7
千葉駅 (千葉県) : 7

Ruby

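For Ruby (JRuby 1.6.3), I implemented it without the csv module.
The implementation works as follows.

IO.readlines(...) reads the file as a collection of lines
drop removes the header line
chop removes the newline character
Hash[] turns the prefectures into a Hash
group_by groups the rows
- The grouping key is an array
sort sorts the groups
take gets the first 10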
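listup_station.rb relies on every line ending with a newline: `chop` removes the last character unconditionally, so on a final line without a trailing newline it would cut off real data. A small comparison with `chomp`, which removes a line ending only when one is present; the two prefecture rows are made-up sample data in romanized form.

```ruby
# chop always removes the last character ("\r\n" is removed as one unit);
# chomp removes a trailing line ending only when one is present.
with_crlf  = "13,Tokyo\r\n"
no_newline = "14,Kanagawa"   # e.g. a last line without a trailing newline

p with_crlf.chop     # => "13,Tokyo"
p with_crlf.chomp    # => "13,Tokyo"
p no_newline.chop    # => "14,Kanagaw"   (a real character is lost)
p no_newline.chomp   # => "14,Kanagawa"

# Hash[] builds a lookup table from [code, name] pairs, as in listup_station.rb
lines = ["pref_cd,pref_name\n", with_crlf, no_newline]
pref_map = Hash[lines.drop(1).map { |l| l.chomp.split(",") }]
p pref_map["14"]     # => "Kanagawa"
```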

listup_station.rb

# Load the prefectures
prefMap = Hash[IO.readlines("m_pref.csv").drop(1).map {|l| l.chop.split(',')}]

# Extract the stations served by the most lines
list = IO.readlines("m_station.csv").drop(1).map {|l|
    l.chop.split(',')
}.group_by {|s|
    [s[9], prefMap[s[10]], s[5]]
}.sort {|a, b|
    b[1].length <=> a[1].length
}.take 10

# Print the results
list.each do |s|
    puts "#{s[0][0]}駅 (#{s[0][1]}) : #{s[1].length}"
end

Result

> jruby listup_station.rb
新宿駅 (東京都) : 12
横浜駅 (神奈川県) : 11
渋谷駅 (東京都) : 10
東京駅 (東京都) : 10
大宮駅 (埼玉県) : 9
池袋駅 (東京都) : 9
新橋駅 (東京都) : 7
大船駅 (神奈川県) : 7
京都駅 (京都府) : 7
岡山駅 (岡山県) : 7
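The five outputs differ only among stations with the same number of lines, because none of the implementations sorts by station_g_cd as the original SQL did (ORDER BY lines DESC, station_g_cd). Below is a sketch of adding station_g_cd as a secondary sort key in Ruby with sort_by. The rows and their station_g_cd values are hypothetical, and only the columns read by listup_station.rb (5, 9 and 10) are filled in.

```ruby
# Hypothetical rows: only the columns read by listup_station.rb are filled
# (s[5] = station_g_cd, s[9] = station_name, s[10] = pref_cd).
def row(g_cd, name, pref_cd)
  r = Array.new(11, "")
  r[5], r[9], r[10] = g_cd, name, pref_cd
  r
end

pref_map = { "13" => "Tokyo", "14" => "Kanagawa" }
rows = [
  row("1130208", "Shibuya", "13"), row("1130208", "Shibuya", "13"),
  row("1130101", "Tokyo", "13"), row("1130101", "Tokyo", "13"),
  row("1140101", "Yokohama", "14")
]

# Sort by the number of lines (descending), then by station_g_cd (ascending),
# like ORDER BY lines DESC, station_g_cd in the SQL
list = rows.group_by { |s| [s[9], pref_map[s[10]], s[5]] }
           .sort_by { |key, members| [-members.length, key[2].to_i] }
           .take(10)

list.each { |key, members| puts "#{key[0]} (#{key[1]}) : #{members.length}" }
# Tokyo (Tokyo) : 2
# Shibuya (Tokyo) : 2
# Yokohama (Kanagawa) : 1
```

With the secondary key, stations with equal counts always come out in the same order, regardless of how the collection API happens to order its groups.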

2011-06-18

WebSocket server programs on the JVM - trying Jetty, Grizzly, Netty and EM-WebSocket

Java Ruby Groovy html5

I implemented simple WebSocket server programs in Groovy and JRuby, using Jetty, Grizzly, Netty and EM-WebSocket respectively.

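The WebSocket protocol specification is not finalized yet and appears to have gone through incompatible revisions, so this time I write server programs that can talk to the WebSocket client of Google Chrome 12.0.742.100, which supports draft-ietf-hybi-thewebsocketprotocol-00.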

In practice you have to watch which protocol version the server and the client each support: for example, Sec-WebSocket-Key1 and Sec-WebSocket-Key2, used in draft-ietf-hybi-thewebsocketprotocol-00, are no longer used in draft-ietf-hybi-thewebsocketprotocol-06. (The latest specification at the moment appears to be draft-ietf-hybi-thewebsocketprotocol-09.)
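In the -00 draft, the server answers the handshake with a 16-byte body derived from the two keys and the 8 bytes sent after the request headers (called key3 here): for each key, the number formed by its digits is divided by the number of spaces in the key; the two results are written as big-endian 32-bit integers, followed by the 8 bytes, and the MD5 digest of those 16 bytes is sent back. A minimal sketch in Ruby; the method names are hypothetical and the key strings are sample values for illustration. Note that the digit value of the first key, 4146546015, does not fit in a signed 32-bit integer, so the division has to be done on a wider value.

```ruby
require "digest/md5"

# Number formed by the key's digits, divided by the number of spaces in the key.
def key_number(key)
  digits = key.delete("^0-9").to_i
  spaces = key.count(" ")
  # the draft expects a non-zero space count and an exact multiple
  raise ArgumentError, "invalid key: #{key}" if spaces.zero? || digits % spaces != 0
  digits / spaces
end

# 16-byte challenge: two big-endian 32-bit integers followed by the 8-byte key3
def challenge_bytes(key1, key2, key3)
  [key_number(key1), key_number(key2)].pack("NN") + key3.b
end

# Response body: MD5 digest of the challenge
def handshake_response(key1, key2, key3)
  Digest::MD5.digest(challenge_bytes(key1, key2, key3))
end

key1 = "4 @1  46546xW%0l 1 5"   # digits 4146546015, 5 spaces
key2 = "12998 5 Y3 1  .P00"     # digits 1299853100, 5 spaces
key3 = "^n:ds[4U"

puts key_number(key1)   # => 829309203
puts key_number(key2)   # => 259970620
puts handshake_response(key1, key2, key3).unpack("H*").first
```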

The environment used is as follows.

Client
- Google Chrome 12.0.742.100
Server

Incidentally, EM-WebSocket was the easiest to use and Netty the most tedious.

Also, Grizzly 2.1 and later support draft-ietf-hybi-thewebsocketprotocol-06 but not draft-ietf-hybi-thewebsocketprotocol-00, so they could not be used for this purpose.

The sample source is at http://github.com/fits/try_samples/tree/master/blog/20110618/

WebSocket client

First, the WebSocket client that runs in Google Chrome.
I open it as a local file in Chrome and use it to check that each server program works.

index.html

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
    <script>
        var ws = new WebSocket("ws://localhost:8080/");

        ws.onopen = function(event) {
            console.log("websocket open");
            stateChange("opened")
        };

        ws.onmessage = function(event) {
            document.getElementById("log").innerHTML += "<li>" + event.data + "</li>";
        };

        ws.onclose = function(event) {
            console.log("websocket close");
            stateChange("closed")
        };

        ws.onerror = function(event) {
            console.log("error");
            stateChange("error")
        };

        function sendMessage() {
            var msg = document.getElementById("message").value;
            ws.send(msg);
        }

        function stateChange(state) {
            document.getElementById("state").innerHTML = state;
        }
    </script>
</head>
<body>
    <input id="message" type="text" />
    <input type="button" value="send" onclick="sendMessage()" />
    <span id="state">closed</span>
    <ul id="log"></ul>
</body>
</html>

WebSocket server with Jetty

Jetty 8.0.0 M3

The server simply prepends "echo : " to the string sent by the client and sends it back.

jetty_groovy/echo_server.groovy

import javax.servlet.http.HttpServletRequest
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.websocket.WebSocket
import org.eclipse.jetty.websocket.WebSocket.Connection
import org.eclipse.jetty.websocket.WebSocketHandler

class EchoWebSocket implements WebSocket.OnTextMessage {
    def outbound

    void onOpen(Connection outbound) {
        println("onopen : ${this}")
        this.outbound = outbound
    }

    void onMessage(String data) {
        println("onmessage : ${this} - ${data}")
        this.outbound.sendMessage("echo : ${data}")
    }

    void onClose(int closeCode, String message) {
        println("onclose : ${this} - ${closeCode}, ${message}")
    }
}

def server = new Server(8080)
// Set a Handler for WebSocket connections
server.handler = new WebSocketHandler() {
    WebSocket doWebSocketConnect(HttpServletRequest request, String protocol) {
        println("websocket connect : ${protocol} - ${request}")
        new EchoWebSocket()
    }
}

server.start()
server.join()

Place the JAR files from Jetty's lib directory in the .groovy/lib directory under your home directory, then run the script. (Editing Groovy's conf/groovy-starter.conf works too.)

Example run

> groovy echo_server.groovy

WebSocket server with Grizzly

Grizzly 2.0.1 b2

Grizzly 2.1 does not support draft-ietf-hybi-thewebsocketprotocol-00, so Grizzly 2.0.1 has to be used.

The implementation is similar to the Jetty version.

grizzly_groovy/echo_server.groovy

import org.glassfish.grizzly.http.server.*
import org.glassfish.grizzly.http.HttpRequestPacket
import org.glassfish.grizzly.websockets.*
import org.glassfish.grizzly.websockets.frame.*

class EchoWebSocketApplication extends WebSocketApplication {
    boolean isApplicationRequest(HttpRequestPacket req) {
        println("${req}")
        true
    }

    void onConnect(WebSocket websocket) {
        println("onConnect : ${websocket}")
        super.onConnect(websocket)
    }

    void onMessage(WebSocket websocket, Frame data) {
        println("onMessage : ${data}")
        websocket.send(Frame.createTextFrame("echo : ${data.asText}"))
    }

    void onClose(WebSocket websocket) {
        println("onClose : ${websocket}")
        super.onClose(websocket)
    }
}

def server = HttpServer.createSimpleServer()
server.getListener("grizzly").registerAddOn(new WebSocketAddOn())

WebSocketEngine.engine.registerApplication("/", new EchoWebSocketApplication())

server.start()
System.in.read()
server.stop()

Place the following JAR files in the .groovy/lib directory under your home directory, then run the script. (Editing Groovy's conf/groovy-starter.conf works too.)

gmbal-api-only-3.0.0-b023.jar
grizzly-framework-2.0.1-b2.jar
grizzly-http-2.0.1-b2.jar
grizzly-http-server-2.0.1-b2.jar
grizzly-http-servlet-2.0.1-b2.jar
grizzly-rcm-2.0.1-b2.jar
grizzly-websockets-2.0.1-b2.jar
management-api-3.0.0-b012.jar

Example run

> groovy echo_server.groovy

WebSocket server with Netty

Netty 3.2.4

Unlike Jetty and Grizzly, Netty requires you to implement the handshake yourself.
The handshake is first processed as HTTP, and then the pipeline is reconfigured for WebSocket.
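After the handshake, draft-ietf-hybi-thewebsocketprotocol-00 sends each text message as a frame that starts with the byte 0x00, contains the UTF-8 text, and ends with the byte 0xFF; this is the framing that the WebSocket decoder and encoder handle once they replace the HTTP ones in the pipeline. A minimal sketch of that text framing in Ruby (the method names are hypothetical; other frame types and the closing handshake are not handled):

```ruby
# Text frame of draft-ietf-hybi-thewebsocketprotocol-00: 0x00, UTF-8 payload, 0xFF
def encode_text_frame(text)
  "\x00".b + text.encode("UTF-8").b + "\xFF".b
end

# Splits a byte buffer into complete text frames.
# Returns [messages, remaining bytes]; an incomplete frame stays in the remainder.
def decode_text_frames(buffer)
  messages = []
  buf = buffer.b
  # UTF-8 never contains the byte 0xFF, so the first 0xFF ends the frame
  while buf.getbyte(0) == 0x00 && (stop = buf.index("\xFF".b))
    messages << buf[1...stop].force_encoding("UTF-8")
    buf = buf[(stop + 1)..-1]
  end
  [messages, buf]
end

frame = encode_text_frame("echo : hello")
msgs, rest = decode_text_frames(frame + encode_text_frame("hi") + "\x00par".b)
p msgs   # => ["echo : hello", "hi"]
```

Here `rest` holds the bytes of the unfinished third frame, which would be completed by data arriving later.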

netty_groovy/echo_server.groovy

import java.net.InetSocketAddress
import java.security.MessageDigest
import java.util.concurrent.Executors
import static org.jboss.netty.handler.codec.http.HttpHeaders.*
import org.jboss.netty.bootstrap.ServerBootstrap
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory
import org.jboss.netty.buffer.ChannelBuffers
import org.jboss.netty.channel.*
import org.jboss.netty.channel.Channels
import org.jboss.netty.handler.codec.http.*
import org.jboss.netty.handler.codec.http.websocket.*

class ChatServerHandler extends SimpleChannelUpstreamHandler {
    // Handle a received message
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
        def msg = e.message
        println("message received : ${msg}")
        handleRequest(ctx, msg)
    }

    // Handshake for WebSocket draft-ietf-hybi-thewebsocketprotocol-00
    // (handles the HTTP request)
    def handleRequest(ChannelHandlerContext ctx, HttpRequest req) {
        // Build the handshake response
        def res = new DefaultHttpResponse(HttpVersion.HTTP_1_1, 
                new HttpResponseStatus(101, "Web Socket Protocol Handshake"))

        res.addHeader(Names.UPGRADE, Values.WEBSOCKET)
        res.addHeader(Names.CONNECTION, Values.UPGRADE)

        res.addHeader(Names.SEC_WEBSOCKET_ORIGIN, req.getHeader(Names.ORIGIN))
        res.addHeader(Names.SEC_WEBSOCKET_LOCATION, "ws://localhost:8080/")

        def key1 = req.getHeader(Names.SEC_WEBSOCKET_KEY1)
        def key2 = req.getHeader(Names.SEC_WEBSOCKET_KEY2)

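        // Take the number formed by the digits in each key and divide it by the number of spaces in the key.
        // The digit value can exceed Integer.MAX_VALUE, so divide as a long (intdiv) before narrowing to int.
        int key1res = (int) Long.parseLong(key1.replaceAll("[^0-9]", "")).intdiv(key1.replaceAll("[^ ]", "").length())
        int key2res = (int) Long.parseLong(key2.replaceAll("[^0-9]", "")).intdiv(key2.replaceAll("[^ ]", "").length())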

        long content = req.content.readLong()

        def input = ChannelBuffers.buffer(16)
        input.writeInt(key1res)
        input.writeInt(key2res)
        input.writeLong(content)

        res.content = ChannelBuffers.wrappedBuffer(MessageDigest.getInstance("MD5").digest(input.array))

        // Upgrade the connection
        // (switch the decoder and encoder to the WebSocket ones)
        def pipeline = ctx.channel.pipeline
        pipeline.replace("decoder", "wsdecoder", new WebSocketFrameDecoder())
        // Send the response
        ctx.channel.write(res)
        // The HTTP encoder is needed to send the response, so switch it to the WebSocket one afterwards
        pipeline.replace("encoder", "wsencoder", new WebSocketFrameEncoder())
    }

    // Handle WebSocket frames
    def handleRequest(ChannelHandlerContext ctx, WebSocketFrame msg) {
        ctx.channel.write(new DefaultWebSocketFrame("echo : ${msg.textData}"))
    }
}

def server = new ServerBootstrap(new NioServerSocketChannelFactory(
    Executors.newCachedThreadPool(),
    Executors.newCachedThreadPool()
))

// A WebSocket connection starts out as HTTP, so set up
// the HTTP decoder and encoder first
server.setPipelineFactory({
    def pipeline = Channels.pipeline()
    pipeline.addLast("decoder", new HttpRequestDecoder())
    pipeline.addLast("encoder", new HttpResponseEncoder())
    pipeline.addLast("handler", new ChatServerHandler())
    pipeline
} as ChannelPipelineFactory)

server.bind(new InetSocketAddress(8080))

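As with the other Groovy samples, the following JAR file needs to be on Groovy's classpath (for example in the .groovy/lib directory under your home directory).

netty-3.2.4.Final.jar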

Example run

> groovy echo_server.groovy

WebSocket server with EM-WebSocket

EM-WebSocket 0.3.0

First, install EM-WebSocket.

Installing EM-WebSocket

> gem install em-websocket

A WebSocket server using EM-WebSocket looks like this. It is far simpler than the previous samples.

em-websocket_jruby/echo_server.rb

require 'rubygems'
require 'em-websocket'

EventMachine::WebSocket.start(:host => "localhost", :port => 8080, :debug => true) do |ws|
    ws.onopen {puts "onopen"}
    ws.onmessage {|msg| ws.send "echo : #{msg}"}
    ws.onclose {puts "onclose"}
end

Example run

> jruby echo_server.rb

2011-03-21

BDD with Maven - Specs, Specs2, RSpec, Easyb, spock

Java BDD Ruby Scala Groovy

I summarized how to do BDD (behavior-driven development) in a Maven 3 project.
The BDD tools I tried are as follows.

Specs
Specs2
RSpec
Easyb
spock

In conclusion, Specs/Specs2 or spock look like the best choices among these. (RSpec and Easyb have problems.)

The execution environment is as follows.

Java SE 6 Update 24
Maven 3.0.3 (Maven 2.2.1 for Easyb only)

The sample source is at http://github.com/fits/try_samples/tree/master/blog/20110321/

Specs

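To use Specs, a BDD tool for Scala, configure pom.xml as follows.
With its default settings the surefire plugin does not run XXXSpec classes, so **/*Spec.java is added with include. (Note that the pattern uses the .java extension even though the actual files have the .scala extension.)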

pom.xml

<project ・・・>
  ・・・
  <properties>
    <scala.version>2.8.1</scala.version>
  </properties>
  <!-- Repository for Specs and other Scala-related libraries -->
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>releases</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <includes>
            <!--
              Makes XXXSpec classes run as tests.
              Note that the include must use the .java extension.
             -->
            <include>**/*Spec.java</include>
          </includes>
        </configuration>
      </plugin>
      <!-- Compile the Scala spec files during the test phase -->
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <!-- junit must be declared -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
    <!-- Specs -->
    <dependency>
      <groupId>org.scala-tools.testing</groupId>
      <artifactId>specs_${scala.version}</artifactId>
      <version>1.6.7.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Next, the spec file looks like this. Place it in src/test/scala.
There are other ways to define it, such as using org.specs.runner.JUnit4, but here I use the simplest one.

src/test/scala/BookSpec.scala

package fits.sample

import scala.collection.JavaConversions._
import org.specs._

class BookSpec extends SpecificationWithJUnit {

    "Initial state" should {
        val b = new Book()

        "comments is not null" in {
            b.getComments() must notBeNull
        }

        "comments is empty" in {
            b.getComments() must haveSize(0)
        }
    }

    "Adding a Comment" should {
        val b = new Book()
        b.getComments().add(new Comment())

        "the Comment has been added" in {
            b.getComments() must haveSize(1)
        }
    }
}

Run it with mvn test.

Example run

> mvn test
・・・
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running fits.sample.BookSpec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.483 sec

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

Specs2

Specs2, the next version of Specs, looks like this. It is basically the same as Specs, but the groupId, the package names and the way not is used differ.

pom.xml

<project ・・・>
  ・・・
  <properties>
    <scala.version>2.8.1</scala.version>
  </properties>
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>releases</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <includes>
            <include>**/*Spec.java</include>
          </includes>
        </configuration>
      </plugin>
      <!-- Compile the Scala spec files during the test phase -->
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <!-- Specs2 -->
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2_${scala.version}</artifactId>
      <version>1.0.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

src/test/scala/BookSpec.scala

package fits.sample

import scala.collection.JavaConversions._
import org.specs2.mutable._

class BookSpec extends SpecificationWithJUnit {

    "Initial state" should {
        val b = new Book()

        "comments is not null" in {
            b.getComments() must not beNull
        }

        "comments is empty" in {
            b.getComments() must haveSize(0)
        }
    }

    "Adding a Comment" should {
        val b = new Book()
        b.getComments().add(new Comment())

        "the Comment has been added" in {
            b.getComments() must haveSize(1)
        }
    }
}

As with Specs, run it with mvn test.

Example run

> mvn test
・・・
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running fits.sample.BookSpec
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.546 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

For some reason the test count (5) differs from that of Specs (3).

RSpec

Using RSpec, the Ruby BDD tool, with Maven took some extra work. (It may not be practical.)
This time I use the rspec-maven-plugin from de.saumya.mojo, but there also seem to be ways to do it with other plugins.

In pom.xml, I use the TorqueBox RubyGems Maven Proxy Repository so that Maven can fetch the RSpec gem. (Configuring RubyGems directly as a repository did not work.)
The JRuby version to run can be specified with the jruby.version element inside the properties element.

pom.xml

<project ・・・>
  ・・・
  <properties>
    <!-- JRuby version -->
    <jruby.version>1.6.0</jruby.version>
    <jruby.plugins.version>0.25.1</jruby.plugins.version>
  </properties>

  <repositories>
    <!-- TorqueBox RubyGems Maven Proxy Repository -->
    <repository>
      <id>rubygems-proxy</id>
      <name>Rubygems Proxy</name>
      <url>http://rubygems-proxy.torquebox.org/releases</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
      </snapshots>
    </repository>
  </repositories>
  <build>
    <plugins>
      <!-- rspec-maven-plugin -->
      <plugin>
        <groupId>de.saumya.mojo</groupId>
        <artifactId>rspec-maven-plugin</artifactId>
        <version>${jruby.plugins.version}</version>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <!-- RSpec -->
    <dependency>
      <groupId>rubygems</groupId>
      <artifactId>rspec</artifactId>
      <version>2.5.0</version>
      <type>gem</type>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

The spec file looks like this. Place it in the spec directory.

spec/book_spec.rb

require "java"

module Fits
    include_package "fits.sample"
end

describe "Book" do
    context "initial state" do
        before do
            @b = Fits::Book.new
        end

        it "comments is not nil" do
            @b.comments.should_not be_nil
        end

        it "comments is empty" do
            @b.comments.size.should == 0
        end
    end

    context "adding a Comment" do
        before do
            @b = Fits::Book.new
            @b.comments.add(Fits::Comment.new)
        end

        it "the Comment has been added" do
            @b.comments.size.should == 1
        end
    end
end

mvn rspec:test runs RSpec, but as it stands it actually fails with an error like the following.

Example run (before the workaround)

> mvn rspec:test
・・・
[INFO] Running RSpec tests from ・・・\20110321\rspec\spec
[WARNING] NameError: uninitialized constant RSpec::Core::Formatters::BaseFormatter::StringIO
・・・
[WARNING]            load at org/jruby/RubyKernel.java:1062
[WARNING]          (root) at -e:1
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.308s
[INFO] Finished at: Mon Mar 21 16:46:28 JST 2011
[INFO] Final Memory: 3M/15M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal de.saumya.mojo:rspec-maven-plugin:0.25.1:test (default-cli) on project maven-rspec-sample1: Execution default-cli of goal de.saumya.mojo:rspec-maven-plugin:0.25.1:test failed: Java returned: 1 -> [Help 1]
・・・

To work around this, comment out line 48 of target/rspec-runner.rb, which is generated automatically when rspec:test runs, and make the file read-only.

Commenting out line 48 of target/rspec-runner.rb

::RSpec.configure do |config|
 # config.formatter = ::MultiFormatter
end
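The manual edit above can also be scripted. A sketch, assuming the generated file contains the `config.formatter = ::MultiFormatter` line exactly as shown; the method name `patch_rspec_runner` is hypothetical. It matches the line by content rather than by line number, in case the line moves in another plugin version.

```ruby
# Hypothetical helper: comments out the MultiFormatter line in the generated
# rspec-runner.rb and makes the file read-only so the plugin cannot overwrite it.
# Returns true if the file content was changed.
def patch_rspec_runner(path = "target/rspec-runner.rb")
  lines = File.readlines(path)
  patched = lines.map do |line|
    if line =~ /^(\s*)config\.formatter = ::MultiFormatter/
      "#{$1}# #{line.lstrip}"
    else
      line
    end
  end
  File.chmod(0644, path)   # writable again, in case it was already patched
  File.write(path, patched.join)
  File.chmod(0444, path)   # read-only
  patched != lines
end
```

Running it a second time leaves the file unchanged, because an already commented-out line no longer matches the pattern.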

With this, the build is still reported as a failure, but RSpec does at least run.

Example run (after the workaround)

> mvn rspec:test
・・・
[ERROR] error emitting .rb
java.io.FileNotFoundException: ・・・\rspec\target\rspec-runner.rb (Access is denied)
・・・
[INFO] ...
[INFO]
[INFO] Finished in 0.016 seconds
[INFO] 3 examples, 0 failures
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
・・・

If you work around the problem by rewriting rspec-runner.rb like this, it would probably be somewhat cleaner to rewrite the return value of the getRSpecRunnerScript method of the de.saumya.mojo.rspec.RSpec2ScriptFactory class with something like AspectJ.

Easyb

Next is Easyb, a BDD tool for Groovy. Running it with Maven 3.0.3 causes an error (http://code.google.com/p/easyb/issues/detail?id=209), so I run it with Maven 2.2.1.
The pom.xml file looks like this.

pom.xml (for Maven 2.2.1)

<project ・・・>
  ・・・
  <build>
    <plugins>
      <!-- Use Java SE 6 source with Maven 2.2.1 -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
      <!-- Easyb -->
      <plugin>
        <groupId>org.easyb</groupId>
        <artifactId>maven-easyb-plugin</artifactId>
        <version>0.9.7-1</version>
        <executions>
          <execution>
            <goals>
              <goal>test</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

The spec file is as follows. Place it in src/test/easyb.
Note that using Japanese text in it causes an error at run time.

src/test/easyb/BookStory.groovy

package fits.sample

scenario "init state", {
    given "Book", {
        b = new Book()
    }
    when ""
    then "comments is not null", {
        b.comments.shouldNotBe null
    }
    and
    then "comments is empty", {
        b.comments.size.shouldBe 0
    }
}

scenario "add Comment", {
    given "Book", {
        b = new Book()
    }
    when "add Comment", {
        b.comments.add(new Comment())
    }
    then "added Comment", {
        b.comments.size.shouldBe 1
    }
}

Run it with mvn test.

Example run (with Maven 2.2.1)

> mvn test
・・・
[INFO] Using easyb dependency org.easyb:easyb:jar:0.9.7:compile
[INFO] Using easyb dependency commons-cli:commons-cli:jar:1.1:compile
[INFO] Using easyb dependency org.codehaus.groovy:groovy-all:jar:1.7.2:compile
     [java] Running book story (BookStory.groovy)
     [java] Scenarios run: 2, Failures: 0, Pending: 0, Time elapsed: 0.577 sec
     [java] 2 total behaviors ran with no failures
・・・

Using Spock

Finally, Spock, another BDD tool for Groovy.
The pom.xml file looks like this. As with Specs, surefire has to be configured.

pom.xml

<project ・・・>
  ・・・
  <build>
    <plugins>
      <!-- Make classes named XXXSpec run as tests -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <includes>
            <include>**/*Spec.java</include>
          </includes>
        </configuration>
      </plugin>
      <!-- Compile the Groovy spec files for testing -->
      <plugin>
        <groupId>org.codehaus.gmaven</groupId>
        <artifactId>gmaven-plugin</artifactId>
        <version>1.3</version>
        <configuration>
          <providerSelection>1.7</providerSelection>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <!-- Spock dependency -->
    <dependency>
      <groupId>org.spockframework</groupId>
      <artifactId>spock-core</artifactId>
      <version>0.5-groovy-1.7</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

The spec file is shown below. Place it in src/test/groovy.

src/test/groovy/BookSpec.groovy

package fits.sample

import spock.lang.*

class InitBookSpec extends Specification {
    def b = new Book()

    def "comments は null ではない"() {
        expect:
            b.comments != null
    }

    def "comments は空"() {
        expect:
            b.comments.size == 0
    }
}

class AddCommentSpec extends Specification {
    def b = new Book()

    def "Comment を追加"() {
        when:
            b.comments.add(new Comment())
        then:
            b.comments.size == 1
    }
}

Run it with mvn test.

Example run

> mvn test
・・・
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running fits.sample.AddCommentSpec
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.468 sec
Running fits.sample.InitBookSpec
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.015 sec

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

2011-03-06

Using MongoDB with Sinatra + Haml - implementing associations and composition with Mongoid and MongoMapper

Java Ruby NoSQL

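I tried Mongoid and MongoMapper as ways to use MongoDB from Sinatra.
We will implement the following model structure: a Book contains many Comments (composition), and each Comment refers to a User (one-way association).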

The environment is as follows. Haml is used as the template engine.

JRuby 1.5.6
- Sinatra 1.1.3
- Haml 3.0.25
- Mongoid 1.9.5
- MongoMapper 0.8.6
MongoDB 1.7.6

The sample source code is at http://github.com/fits/try_samples/tree/master/blog/20110306/

Preparation

First, install the packages used here with gem.

Installation example

> gem install sinatra
> gem install haml
> gem install mongoid
> gem install mongo_mapper

Using Mongoid

Mongoid model classes are defined as follows:

Include Mongoid::Document to define a model class
Define fields with field
- Specify the type with :type
Define composition with embeds_many and embedded_in (Book and Comment)
Define a one-way association with belongs_to_related (from Comment to User)

models_mongoid/book.rb（Mongoid）

class Book
    include Mongoid::Document

    field :title
    field :isbn

    # Composition
    embeds_many :comments
end

models_mongoid/comment.rb（Mongoid）

class Comment
    include Mongoid::Document

    field :content
    field :created_date, :type => Date

    # Composition
    embedded_in :book, :inverse_of => :comments

    # Association to User
    belongs_to_related :user
end

models_mongoid/user.rb（Mongoid）

class User
    include Mongoid::Document

    field :name
end

The Sinatra implementation is as follows.

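To render a Haml template, pass the haml method the template's symbol, an options hash, and the parameters used inside the template.
Below, we pass all Book and User objects (:books, :users) and the path that new comments are posted to (:action).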

Mongoid settings can be specified in the block passed to Mongoid.configure. Below, book_review is specified as the database to use.

sample_mongoid.rb（Sinatra）

require "rubygems"
require "sinatra"
require "haml"
require "mongoid"

require "models_mongoid/book"
require "models_mongoid/user"
require "models_mongoid/comment"

# Mongoid settings
Mongoid.configure do |config|
    config.master = Mongo::Connection.new.db('book_review')
end

# Top page
get '/' do
    haml :index, {}, :books => Book.all.order_by([[:title, :asc]]), :users => User.all.order_by([[:name, :asc]]), :action => '/comments'
end
・・・
# Add a Book
post '/books' do
    Book.create(params[:post])
    redirect '/books'
end

# Add a Comment
post '/comments' do
    b = Book.find(params[:post][:book_id])
    b.comments << Comment.new(:content => params[:post][:content], :created_date => Time.now, :user_id => params[:post][:user_id])
    b.save

    # The following also works:
    # Book.find(params[:post][:book_id]).comments.create(:content => params[:post][:content], :created_date => Time.now, :user_id => params[:post][:user_id])

    redirect '/'
end
・・・

The Haml template for the top page is as follows.

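Note that plain = and #{...} do not HTML-escape their output by default. If you want output escaped, for example to guard against cross-site scripting, either set the :escape_html option to true (it can be passed in the second argument of Sinatra's haml method) or use &= and & #{...} instead.
This time I used the latter approach.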
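The difference between = and &= can be seen with a few lines of plain Ruby. This is only a sketch: ERB::Util.html_escape from the standard library stands in for Haml's own escaping helper, and the character set Haml escapes may differ slightly between versions. The comment text is a made-up example.

```ruby
require "erb"

# A hypothetical comment body that contains markup
content = %q{<script>alert("xss")</script>}

# Plain "= c.content" (with :escape_html off) emits the text unchanged
raw_output = content

# "&= c.content" escapes it first; ERB::Util.html_escape is used here
# as a stand-in for Haml's own escaping helper
escaped_output = ERB::Util.html_escape(content)

puts raw_output
puts escaped_output # => &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

With the escaped form the browser shows the markup as text instead of running the script.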

views/index.haml（Haml）

.menu Menu
%ul
  %li
    %a(href="/books") Books List
  %li
    %a(href="/users") Users List

.list Book Comments
%form.post(action='#{action}' method='post')
  %select(name='post[user_id]')
    - users.each do |u|
      %option(value='#{u._id}')&= u.name
  %select(name='post[book_id]')
    - books.each do |b|
      %option(value='#{b._id}')&= b.title
  %input(name='post[content]' type='text')
  %input(type='submit' value='Add')

- books.each do |b|
  %ul
    %li&= b.title
    %ul
      - b.comments.each do |c|
        %li
          & #{c.content} : #{c.user.name}, #{c.created_date}

Example run

Start MongoDB first.

> mongod -dbpath db

Run the Sinatra application.

> jruby sample_mongoid.rb

The running application's screen looks like the following.

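Incidentally, checking the contents of the books collection with the mongo command makes the difference between how associations and composition are stored quite clear. The comments are embedded inside the book document, while each comment holds only the user_id of its User.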

Checking the books collection with the mongo command

> mongo
・・・
> use book_review
switched to db book_review
> db.books.find()
{
  "_id" : "4d7253961875e20940000003", 
  "comments" : [{
      "content" : "check",
      "created_date" : ISODate("2011-03-06T00:00:00Z"),
      "user_id" : "4d7253811875e20940000001",
      "_id" : "4d7253b31875e20940000006"
  }], 
  "isbn" : "1234", 
  "title" : "Railsレシピブック"
}
・・・
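The same document shape can be imitated with plain Ruby hashes to show what each approach means when the data is read back. This is an illustrative sketch, not Mongoid code. The IDs and comment fields come from the output above, while the users hash and the user name "user1" are hypothetical.

```ruby
# The book document as stored: comments are embedded inside it (composition)
book = {
  "_id" => "4d7253961875e20940000003",
  "title" => "Railsレシピブック",
  "isbn" => "1234",
  "comments" => [
    { "_id" => "4d7253b31875e20940000006",
      "content" => "check",
      "user_id" => "4d7253811875e20940000001" }
  ]
}

# Users live in their own collection; a comment stores only the user's _id
# (association). This in-memory "collection" and its name are made up.
users = {
  "4d7253811875e20940000001" => { "_id" => "4d7253811875e20940000001", "name" => "user1" }
}

# The comments come along with the book, but each comment's user needs a
# second lookup by user_id, roughly what c.user does in the Haml template
lines = book["comments"].map do |c|
  user = users.fetch(c["user_id"])
  "#{c["content"]} : #{user["name"]}"
end

puts lines # => check : user1
```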

Using MongoMapper

MongoMapper model classes are defined as follows:

Include MongoMapper::Document or MongoMapper::EmbeddedDocument to define a model class
Define fields with key
- Specify the type with the second argument
Define composition with many and MongoMapper::EmbeddedDocument (Book and Comment)
Define a one-way association with belongs_to (from Comment to User)

Note that _id is declared as String below so that the database created by the Mongoid sample can be reused as-is. (In MongoMapper, _id is an ObjectId by default, so without this change the database from the Mongoid version could not be used.)

models_mongomapper/book.rb（MongoMapper）

class Book
    include MongoMapper::Document

    # _id defaults to ObjectId, so declare it as String
    key :_id, String
    key :title
    key :isbn

    # Composition
    many :comments
end

models_mongomapper/comment.rb（MongoMapper）

class Comment
    include MongoMapper::EmbeddedDocument

    # _id defaults to ObjectId, so declare it as String
    key :_id, String
    key :content
    key :created_date, Date

    # Association to User
    belongs_to :user
end

models_mongomapper/user.rb（MongoMapper）

class User
    include MongoMapper::Document

    # _id defaults to ObjectId, so declare it as String
    key :_id, String
    key :name
end

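The Sinatra implementation follows.
It is almost the same as the Mongoid version, except for how the sort order is specified (Book.all(:order => 'title') instead of order_by([[:title, :asc]])).
The Haml template is shared with the Mongoid version.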

sample_mongomapper.rb（Sinatra）

require "rubygems"
require "sinatra"
require "haml"
require "mongo_mapper"

require "models_mongomapper/book"
require "models_mongomapper/user"
require "models_mongomapper/comment"

# MongoMapper settings
MongoMapper.connection = Mongo::Connection.new('localhost')
MongoMapper.database = 'book_review'

# Top page
get '/' do
    haml :index, {}, :books => Book.all(:order => 'title'), :users => User.all(:order => 'name'), :action => '/comments'
end
・・・
# Add a Comment
post '/comments' do
    b = Book.find(params[:post][:book_id])
    b.comments << Comment.new(:content => params[:post][:content], :created_date => Time.now, :user_id => params[:post][:user_id])
    b.save

    redirect '/'
end
・・・

Example run

> jruby sample_mongomapper.rb

2011-01-03

Parsing CSV files with parser combinators in Ruby - using RParsec

Java Ruby

I redid the parser-combinator CSV parsing from id:fits:20101226 and id:fits:20101231, this time in JRuby with RParsec.

The environment is as follows.

JRuby 1.5.6
RParsec 1.0

The sample source code is at http://github.com/fits/try_samples/tree/master/blog/20110103/

Preparation - installing RParsec

Install RParsec with RubyGems.

Installation

> gem install rparsec

Parsing the CSV file

We will parse the following CSV file.

test.csv

1,テスト1,"改行
含み"
2,test2,"カンマ,含み"
3,てすと3,"ダブルクォーテーション""含み"

Points to note:

Line endings are matched as \r\n (eol)
>> and << are equivalent to Scala's ~> and <~
separated is equivalent to Haskell's sepBy
value is equivalent to Haskell's return
many and separated are methods of Parser
Instead of Haskell's noneOf, use not_char or not_string, or a regular expression via regexp

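Also, because many in quotedCell ends up returning an array of the character codes produced by quotedChar, I convert it to a string with pack inside the block passed to bind.
(fragment is another way to get a string, but it extracts the matched part of the input verbatim, so an escaped "" would stay doubled. That makes it unsuitable here.)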
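The step that turns character codes back into a string can be tried on its own. This is a minimal sketch that assumes a current Ruby (1.9 or later), where String#ord gives a character's code. The code array is a hypothetical stand-in for what quotedChar.many would produce for the quoted cell "ab""c".

```ruby
# Code of the double-quote character. In Ruby 1.8, '"'[0] also returned 34,
# but since 1.9 String#[] returns a one-character String instead
quote_code = '"'.ord

# Hypothetical result of quotedChar.many for the cell "ab""c":
# one character code per element, with the doubled "" reduced to one quote
codes = ["a".ord, "b".ord, quote_code, "c".ord]

# pack("c*") turns the array of 8-bit codes back into a String
cell = codes.pack("c*")

p quote_code # => 34
p cell       # => "ab\"c"
```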

parse_csv.rb

require 'rubygems'
require 'rparsec'

include RParsec::Parsers

eol = string "\r\n"
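# Return the character code of " (the value part) so that pack can turn the result into a string
# ('"'[0] is the Integer 34 in Ruby 1.8 mode, JRuby 1.5's default; Ruby 1.9+ would need '"'.ord)
quotedChar = not_char('"') | string('""') >> value('"'[0])
# many ends up returning an array of character codes, so convert it to a string with pack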
quotedCell = char('"') >> quotedChar.many.bind {|s| value(s.pack("c*"))} << char('"')
cell = quotedCell | regexp(/[^,\r\n]*/)
line = cell.separated(char ',')
csvFile = (line << eol).many

cs = $stdin.readlines.join
res = csvFile.parse cs

p res
puts res

Output

> jruby parse_csv.rb < test.csv
[["1", "\203e\203X\203g1", "\211\374\215s\r\n\212\334\202\335"], 
["2", "test2","\203J\203\223\203},\212\334\202\335"], 
["3", "\202\304\202\267\202\3063", "\203_\203u\203\213\203N\203H\201[\203e\201[\203V\203\207\203\223\"\212\334\202\335"]]
1
テスト1
改行
含み
2
test2
カンマ,含み
3
てすと3
ダブルクォーテーション"含み

This confirms that line breaks and the other special cases are handled correctly.
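For comparison, the same grammar can be written by hand with StringScanner from Ruby's standard library, without RParsec. This is a sketch of a small recursive-descent parser, not a translation of RParsec's internals. Like the combinator version, it requires every line to end with \r\n. The input is an ASCII stand-in for test.csv.

```ruby
require "strscan"

# Same grammar as the RParsec version:
#   csvFile    = (line eol)*
#   line       = cell ("," cell)*
#   cell       = quotedCell | /[^,\r\n]*/
#   quotedCell = '"' (non-quote chars | '""')* '"'
def parse_csv(text)
  s = StringScanner.new(text)
  lines = []
  until s.eos?
    lines << parse_line(s)
    # Each line must be terminated by \r\n, like eol in the RParsec version
    raise ArgumentError, "expected \\r\\n at #{s.pos}" unless s.scan(/\r\n/)
  end
  lines
end

def parse_line(s)
  cells = [parse_cell(s)]
  cells << parse_cell(s) while s.scan(/,/)
  cells
end

def parse_cell(s)
  # Unquoted cell: everything up to the next comma or line break
  return s.scan(/[^,\r\n]*/) unless s.scan(/"/)
  buf = +""
  loop do
    if s.scan(/""/)                   # escaped quote becomes a single "
      buf << '"'
    elsif s.scan(/"/)                 # lone quote closes the cell
      return buf
    elsif (chunk = s.scan(/[^"]+/))   # anything else, including \r\n and ","
      buf << chunk
    else
      raise ArgumentError, "unterminated quoted cell"
    end
  end
end

input = "1,test1,\"line\r\nbreak\"\r\n2,test2,\"comma,inside\"\r\n3,test3,\"quote\"\"inside\"\r\n"
rows = parse_csv(input)
rows.each { |row| p row }
```

Checking "" before " is what makes the escaped quote win over the closing quote, which corresponds to the two alternatives inside quotedChar.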

2010-11-29

Parsing Excel-compatible CSV files in Ruby, Groovy, and Scala - using opencsv, Iterator.continually(), etc.

Java Scala Groovy Ruby

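I wrote samples in Ruby, Groovy, and Scala that parse an Excel-compatible CSV file like the one below (with fields containing line breaks, commas, and double quotes) and print the first and third fields of each record to standard output.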

Example CSV file: test.csv

1,テスト1,"改行
含み"
2,test2,"カンマ,含み"
3,てすと3,"ダブルクォーテーション""含み"

Example output

1 : 改行
含み
2 : カンマ,含み
3 : ダブルクォーテーション"含み

The sample source code is at http://github.com/fits/try_samples/tree/master/blog/20101129/

In Ruby

In Ruby, we use the CSV library that ships with the standard distribution.

parse_csv.rb

require "csv"

CSV.foreach(ARGV[0]) do |r|
    puts "#{r[0]} : #{r[2]}"
end

I ran it in the following environments.

Ruby 1.9.2
JRuby 1.5.3

Example run

> ruby parse_csv.rb test.csv

> jruby parse_csv.rb test.csv
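To check the behaviour without creating a file, the same parsing can be done on a string with CSV.parse. This is a minimal sketch. The inline data is an ASCII stand-in for test.csv, and line endings are \n rather than \r\n.

```ruby
require "csv"

# ASCII stand-in for test.csv: a quoted field containing a line break,
# one containing a comma, and one containing an escaped double quote ("")
data = <<~TEXT
  1,test1,"line
  break"
  2,test2,"comma,inside"
  3,test3,"quote""inside"
TEXT

rows = CSV.parse(data)

# Print the 1st and 3rd fields, as parse_csv.rb does with CSV.foreach
rows.each { |r| puts "#{r[0]} : #{r[2]}" }
```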

In Groovy

Neither Groovy nor Scala (covered below) comes with a standard library that can easily parse CSV files like this one, so we use opencsv.

parse_csv.groovy

import java.io.FileReader
import au.com.bytecode.opencsv.CSVReader

def reader = new CSVReader(new FileReader(args[0]))

while((r = reader.readNext()) != null) {
    println "${r[0]} : ${r[2]}"
}

I ran it in the following environment.

Groovy 1.7.5
opencsv 2.2

Example run

> set CLASSPATH=opencsv-2.2.jar
> groovy parse_csv.groovy test.csv

In Scala

Scala also uses opencsv.

However, in Scala an assignment evaluates to Unit rather than to the assigned value, so you cannot write while((r = reader.readNext()) != null). Instead we use Iterator.continually(reader.readNext).takeWhile(_ != null).

parse_csv.scala

import java.io.FileReader
import au.com.bytecode.opencsv.CSVReader

val reader = new CSVReader(new FileReader(args(0)))

Iterator.continually(reader.readNext).takeWhile(_ != null).foreach {r =>
    println(r(0) + " : " + r(2))
}

Alternatively, you could convert each record to a List and use pattern matching, as below.

parse_csv2.scala

import java.io.FileReader
import au.com.bytecode.opencsv.CSVReader

val reader = new CSVReader(new FileReader(args(0)))

Iterator.continually(reader.readNext).takeWhile(_ != null).map(_.toList).foreach {
    case no :: title :: content :: _ => println(no + " : " + content)
    case _ =>
}

I ran it in the following environment.

Scala 2.8.1
opencsv 2.2

Example run

> scala -cp opencsv-2.2.jar parse_csv.scala test.csv
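For comparison, Ruby has a direct counterpart of the Scala idiom. Ruby, unlike Scala, does allow while (r = csv.shift), but Enumerator.produce (Ruby 2.7 or later) combined with take_while mirrors Iterator.continually(...).takeWhile(_ != null) closely. This sketch uses the standard library's CSV#shift, which returns nil at the end of input, in place of opencsv's readNext. The inline data is made up.

```ruby
require "csv"

csv = CSV.new("1,a,\"x,y\"\n2,b,z\n")

# Like Iterator.continually(reader.readNext): call csv.shift repeatedly,
# and like takeWhile(_ != null): stop at the first nil (end of input)
records = Enumerator.produce { csv.shift }.take_while { |r| !r.nil? }

# Destructuring the row plays the role of the List pattern match in parse_csv2.scala
records.each { |(no, _title, content)| puts "#{no} : #{content}" }
```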

Building opencsv

Download opencsv-2.2-src-with-libs.tar.gz from http://sourceforge.net/projects/opencsv/, extract it into any directory, and build it with Ant. This produces deploy/opencsv-2.2.jar.

Build example

> cd opencsv-2.2
> ant
・・・
BUILD SUCCESSFUL