Using Eijiro data on Ubuntu

I chose GoldenDict as the dictionary tool.

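I used the following pages as references.
How to convert Eijiro (5th edition) to StarDict: Let's use Eijiro on iPad/iPhone: iPad English: Self-study English conversation with iPad/iPhone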
 aaa555 no Zakki (Daily Memo)

The actual procedure was as follows.

Steps on Windows

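1) Install "Eijiro, 5th edition" (『英辞郎　第五版』).
2) Launch the installed "PDIC Unicode for EIJIRO V", open "File" > "辞書設定<詳細>" (Dictionary Settings <Details>), and in the "辞書設定" (Dictionary Settings) dialog display the following four files:
・EIJI-118.dic
・WAEI-118.dic
・RYAKU118.dic
・REIJI118.dic
Then choose "辞書の変換" (Convert Dictionary) to open the "辞書変換の設定" (Dictionary Conversion Settings) dialog. For "変換先ファイル形式" (target file format), select "PDIC1行テキスト形式" (PDIC one-line text format).
3) Character encoding conversion: convert the files from "Unicode" to "UTF-8".
I used "文字コード変換ツール for .NET 2.0" (a character encoding conversion tool):
http://www.vector.co.jp/soft/winnt/util/se372195.html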
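
If the Windows tool is not at hand, the re-encoding in step 3 can also be done with a few lines of Python. This is only a minimal sketch: it assumes that "Unicode" in the export means UTF-16 with a byte order mark, and the function name and the file names in the usage note are made up for illustration.

```python
def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a text file from UTF-16 to UTF-8.

    Assumption: the exported file is UTF-16 with a BOM, which is what
    Windows tools usually mean by "Unicode".
    """
    # newline="" keeps the original line endings (e.g. CRLF) untouched.
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

Usage would look like `utf16_to_utf8("EIJI-118.txt", "EIJI-118.utf8.txt")`, where both file names are hypothetical.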

Steps on Ubuntu

4) Convert the files with the following two scripts.

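#!/usr/bin/perl
# Script 1: sort the entries by headword.
# Lines that produce the same key overwrite each other (the last one wins).
use strict;
use warnings;
use utf8;
use Encode;

my $infile = $ARGV[0];
if (!$infile) {
    die "Usage: $0 [infile]\n";
}

my $file;
my %W;    # sort key => original (undecoded) line
open($file, '<', $infile) || die "$!\n";
while (<$file>) {
    # The input is decoded as cp932 (Shift_JIS) here; use 'UTF-8' instead
    # if the file was saved as UTF-8 in step 3.
    my $line = decode('cp932', $_);
    if ($line =~ /^(.+) : /) {
        my $key = $1;
        #$key =~ s/\{.+\}//;
        #$key =~ s/\s+$//;
        # Add a space to the key so that these entries sort first.
        $key =~ s/\{/ \{/;
        $_ =~ s/\r//;
        $W{$key} = $_;
    }
}
close($file);

# Print the original lines in sorted key order.
foreach (sort keys %W) {
    print $W{$_};
}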
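
The extra space that the first script inserts before "{" is easy to overlook, so here is a small Python illustration of its effect on the sort order. It is a sketch only: `sort_key` is a hypothetical helper that mimics the `s/\{/ \{/` substitution, and the three keys are made-up samples rather than real Eijiro entries.

```python
def sort_key(headword: str) -> str:
    """Insert a space before the first '{', like s/\\{/ \\{/ in the Perl script."""
    return headword.replace("{", " {", 1)

keys = ["abc def", "abc {名}", "abc"]

# Plain sort: "d" sorts before "{", so the phrase comes before the labelled entry.
plain = sorted(keys)
# plain == ["abc", "abc def", "abc {名}"]

# Adjusted sort: the doubled space sorts before any letter, so the labelled
# entry for the headword comes before phrases that start with it.
adjusted = sorted(keys, key=sort_key)
# adjusted == ["abc", "abc {名}", "abc def"]
```

This keeps all lines for one headword next to each other, which the second script relies on when it merges consecutive lines.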

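#!/usr/bin/perl
# Script 2: merge the lines for each headword into a single
# "headword<TAB>definition" line for StarDict.
use strict;
use utf8;
use Encode;

my ($key, $value, $word, $class, $prevword, $in) = ('', '', '', '', '', '');
my %desc;    # class label => definition, for the headword being collected

# Display priority of each class label (higher values are printed first).
my %classvalue = (
    '語源' => 23,
    '名' => 22,
    '代' => 21,
    '動' => 20,
    '自他動' => 19,
    '他動' => 18,
    '自動' => 17,
    '形' => 16,
    '副' => 15,
    '助動' => 14,
    '前' => 13,
    '助' => 12,
    '接続' => 11,
    '接頭' => 10,
    '接尾' => 9,
    '間' => 8,
    '句動' => 7,
    '句他動' => 6,
    '句自動' => 5,
    '略' => 4,
    '人名' => 3,
    '地名' => 2,
    '組織' => 1
);

while (<>) {
    chomp;
    # The input is decoded as cp932 (Shift_JIS) here; use 'UTF-8' instead
    # if the file was saved as UTF-8 in step 3.
    $in = decode('cp932', $_);
    if ( ($key, $value) = ($in =~ m/^■(.+) : (.+)$/) ) {
        # Split "word {class}" into the headword and its class label.
        if ( $key =~ m/^([^\{]*[^\{\s])\s+\{([^\}]+)\}$/ ) {
            ($word, $class) = ($1, $2);
        }
        else {
            ($word, $class) = ($key, '-');
        }
        # Escape backslashes and turn the "■" separators into a literal "\n".
        # The punctuation in these patterns must match the form (full-width
        # or half-width) that the data actually uses.
        $value =~ s/\\/\\\\/g;
        $value =~ s/■･/\\n /g;
        $value =~ s/■/\\n/g;
        # The "?" below looks like a character lost to an encoding problem;
        # the intended character is unknown, so it is left as is.
        $value =~ s/([a-zA-Z]+)･/$1?/g;

        # $value =~ s/｛[ぁ-?／（）･＿ｰ ]+｝//g; # to remove (most of) FURIGANAs
        # $value =~ s/【＠】([^､【]+､)+//; # to remove KANA pronunciations

        if ( $word !~ m/[｡､]/ ) { # to remove sentences, i.e. non-words, in order to avoid StarDict crashing
            if ( ($word ne $prevword) && ($prevword ne '') ) {
                flush($prevword, %desc);
                undef %desc;
            }
            $desc{$class} = $value;
            $prevword = $word;
        }
    }
    else {
        print STDERR encode('cp932', "irregular line[$.]: $in\n");
    }
    ($key, $value, $word, $class) = ('', '', '', '');
}
flush($prevword, %desc);

# Print one headword followed by all of its collected definitions.
sub flush {
    my ($word, %desc) = @_;
    my ($key);
    print encode('utf8', "$word\t");
    if ( $desc{'-'} ) {
        print encode('utf8', "$desc{'-'}\\n");
        delete $desc{'-'};
    }
    foreach $key (sort byclass keys %desc) {
        # The two "?" here also look like characters lost to an encoding
        # problem (presumably brackets around the class label).
        print encode('utf8', "?$key?$desc{$key}\\n");
    }
    print "\n";
}

# Order class labels such as "1-名-2": by leading number, then by priority
# in %classvalue, then by name, then by trailing number.
sub byclass {
    my ($as, $ac, $am) = ( $a =~ m/^(?:([0-9]+)\-)?([^\-\s]+)(?:\-([0-9]+))?/ );
    my ($bs, $bc, $bm) = ( $b =~ m/^(?:([0-9]+)\-)?([^\-\s]+)(?:\-([0-9]+))?/ );
    my @diffs = ($as - $bs, $ac - $bc, $classvalue{$bc} - $classvalue{$ac}, $ac cmp $bc, $am - $bm);
    my $diff;

    foreach $diff (@diffs) {
        if ($diff != 0) {
            return $diff;
        }
    }
    return 0;
}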
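
To make the parsing in the second script easier to follow, here is a rough Python rendering of its two regular expressions. It is a sketch for illustration, not a replacement for the script: `parse_entry` is a hypothetical name, and the sample lines in the usage note are invented.

```python
import re

# "■headword : definition", as matched by the Perl script.
ENTRY_RE = re.compile(r"^■(.+) : (.+)$")
# "word {class}" at the end of the headword part.
CLASS_RE = re.compile(r"^([^{]*[^{\s])\s+\{([^}]+)\}$")

def parse_entry(line: str):
    """Return (word, class_label, definition), or None for an irregular line."""
    entry = ENTRY_RE.match(line.rstrip("\r\n"))
    if entry is None:
        return None
    key, value = entry.groups()
    labelled = CLASS_RE.match(key)
    if labelled:
        return labelled.group(1), labelled.group(2), value
    # No "{class}" part: the script files these under the label "-".
    return key, "-", value
```

For example, `parse_entry("■book {名-1} : 本")` gives `("book", "名-1", "本")`, and `parse_entry("■by the way : ところで")` gives `("by the way", "-", "ところで")`.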

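5) Apply the following two substitutions to every file. I used vim.
・Replace "///" with a tab, written as "\t" (a half-width backslash followed by t).
:%s;///;\t;g

・Replace "\" (a half-width backslash) with "\n" (a half-width backslash followed by n). ← This may be unnecessary.
:%s;\\;\\n;g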
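
For files too large to edit comfortably in vim, the same two substitutions can be sketched in Python. The function name `pdic_line_to_tab` is hypothetical and the sample line in the usage note is invented. The sketch does exactly what the two substitutions describe and nothing more, so any spaces around "///" are kept.

```python
def pdic_line_to_tab(line: str) -> str:
    """Apply the two step-5 substitutions to one line of text."""
    # 1. "///" becomes a real tab character.
    line = line.replace("///", "\t")
    # 2. Each backslash becomes the two characters backslash + "n".
    return line.replace("\\", "\\n")
```

With the invented input `apple /// fruit \ red`, the result is `apple`, a space, a tab, then ` fruit \n red`, where `\n` is two literal characters rather than a line break.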

6) Run the stardict-editor command, specify the .tab file, and click [Compile].
7) Launch GoldenDict and import the resulting files via Edit > Dictionaries.
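
If the compile in step 6 complains about the input, a quick check of the .tab file can help locate the offending lines. The sketch below assumes that each line should have the shape the second script prints, that is, a headword, one tab, then a non-empty definition. `find_bad_lines` is a hypothetical helper, not part of StarDict.

```python
def find_bad_lines(lines):
    """Return 1-based numbers of lines that are not 'headword<TAB>definition'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Split on the first tab only; the definition may contain anything else.
        headword, tab, definition = line.rstrip("\r\n").partition("\t")
        if not tab or not headword.strip() or not definition.strip():
            bad.append(number)
    return bad
```

It can be run over an open file, for example `find_bad_lines(open("eijiro.tab", encoding="utf-8"))`, where the file name is only an example.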