Unicode::RecursiveDowngrade, a module that drops utf8 flags all at once

http://blog.livedoor.jp/nipotan/archives/50228106.html
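Ooh.
Apparently this is the module to use when you want to drop the utf8 flag from every string in a hash or array in one go.
Let me try it on the Web::Scraper test code from earlier.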

Before dropping the flag

use Web::Scraper;
use LWP::Simple;
use Data::Dumper;

my $scraper = scraper {
    process '//th[@class="rank2"]/../td', 'items[]' => scraper {
      process '//div[@class="title"]/p/a', 'text[]' => 'TEXT';
      process '//div[@class="item"]/a', 'link[]' => '@href';
      process '//div[@class="item"]/a/img', 'image[]' => '@src';
      process '//div[@class="price"]/span', 'price[]' => 'TEXT';
    };
};
my $result = $scraper->scrape(get("http://kakaku.com/game/game-console/"));
print Dumper($result);

Result:

$VAR1 = {
          'items' => [
                       {
                         'link' => [
                                     'http://kakaku.com/item/20502010150/'
                                   ],
                         'text' => [
                                     "Wii [\x{30a6}\x{30a3}\x{30fc}] (Wii\x{30ea}\x{30e2}\x{30b3}\x{30f3}\x{30b8}\x{30e3}\x{30b1}\x{30c3}\x{30c8}\x{540c}\x{68b1})"
                                   ],
                         'price' => [
                                      "\x{a5}17,759"
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/20502010150.jpg'
                                    ]
                       },
                       {
                         'link' => [
                                     'http://kakaku.com/item/K0000080401/'
                                   ],
                         'text' => [
                                     "\x{30d7}\x{30ec}\x{30a4}\x{30b9}\x{30c6}\x{30fc}\x{30b7}\x{30e7}\x{30f3}3 HDD 250GB \x{30c1}\x{30e3}\x{30b3}\x{30fc}\x{30eb}\x{30fb}\x{30d6}\x{30e9}\x{30c3}\x{30af} CECH-2000B"
                                   ],
                         'price' => [
                                      "\x{a5}31,078"
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/K0000080401.jpg'
                                    ]
                       },
                       {
                         'link' => [
                                     'http://kakaku.com/item/20504010167/'
                                   ],
                         'text' => [
                                     "PSP \x{30d7}\x{30ec}\x{30a4}\x{30b9}\x{30c6}\x{30fc}\x{30b7}\x{30e7}\x{30f3}\x{30fb}\x{30dd}\x{30fc}\x{30bf}\x{30d6}\x{30eb} \x{30d4}\x{30a2}\x{30ce}\x{30fb}\x{30d6}\x{30e9}\x{30c3}\x{30af} PSP-3000 PB"
                                   ],
                         'price' => [
                                      "\x{a5}15,400"
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/20504010167.jpg'
                                    ]
                       }
                     ]
        };

The strings come back utf8-flagged, which is why Data::Dumper prints them as \x{...} escapes.
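To check whether a particular string really carries the flag, the built-in utf8::is_utf8 function should do. This is only a minimal sketch: the sample string is hand-built for illustration and is not the scraped data.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical sample: the UTF-8 bytes of "Wii " followed by three katakana,
# written as byte escapes so the script does not depend on source encoding.
my $bytes = "Wii \xe3\x82\xa6\xe3\x82\xa3\xe3\x83\xbc";

# decode_utf8 turns the bytes into a character string (utf8 flag on).
my $chars = decode_utf8($bytes);

printf "bytes: flag=%d length=%d\n", (utf8::is_utf8($bytes) ? 1 : 0), length $bytes;
printf "chars: flag=%d length=%d\n", (utf8::is_utf8($chars) ? 1 : 0), length $chars;
```

The flagged string counts characters (7) while the unflagged one counts bytes (13), which is the difference that matters when the data is later printed or written out.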

With Unicode::RecursiveDowngrade...

use Web::Scraper;
use LWP::Simple;
use Data::Dumper;
use Unicode::RecursiveDowngrade;

my $scraper = scraper {
    process '//th[@class="rank2"]/../td', 'items[]' => scraper {
      process '//div[@class="title"]/p/a', 'text[]' => 'TEXT';
      process '//div[@class="item"]/a', 'link[]' => '@href';
      process '//div[@class="item"]/a/img', 'image[]' => '@src';
      process '//div[@class="price"]/span', 'price[]' => 'TEXT';
    };
};
my $result = $scraper->scrape(get("http://kakaku.com/game/game-console/"));
# Walk the whole structure and turn off the utf8 flag on every string in it
$result = Unicode::RecursiveDowngrade->new->downgrade($result);
print Dumper($result);

Result:

$VAR1 = {
          'items' => [
                       {
                         'link' => [
                                     'http://kakaku.com/item/20502010150/'
                                   ],
                         'text' => [
                                     'Wii [ウィー] (Wiiリモコンジャケット同梱)'
                                   ],
                         'price' => [
                                      '¥17,759'
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/20502010150.jpg'
                                    ]
                       },
                       {
                         'link' => [
                                     'http://kakaku.com/item/K0000080401/'
                                   ],
                         'text' => [
                                     'プレイステーション3 HDD 250GB チャコール・ブラック CECH-2000B'
                                   ],
                         'price' => [
                                      '¥31,078'
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/K0000080401.jpg'
                                    ]
                       },
                       {
                         'link' => [
                                     'http://kakaku.com/item/20504010167/'
                                   ],
                         'text' => [
                                     'PSP プレイステーション・ポータブル ピアノ・ブラック PSP-3000 PB'
                                   ],
                         'price' => [
                                      '¥15,400'
                                    ],
                         'image' => [
                                      'http://img.kakaku.com/images/productimage/m/20504010167.jpg'
                                    ]
                       }
                     ]
        };

Ooh. Wonderful.
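For comparison, here is roughly what a recursive walk like this looks like with core modules only. It is a sketch under my own assumptions, not the module's actual implementation: downgrade_recursive and _bytes are hypothetical names, and it uses Encode::encode_utf8 to produce unflagged UTF-8 byte strings rather than flipping the flag directly.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper, not part of Unicode::RecursiveDowngrade:
# returns a copy of $data in which every flagged string
# (hash keys included) has been encoded to UTF-8 bytes.
sub downgrade_recursive {
    my ($data) = @_;
    my $type = ref $data;
    if ($type eq 'HASH') {
        my %copy;
        $copy{ _bytes($_) } = downgrade_recursive($data->{$_}) for keys %$data;
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { downgrade_recursive($_) } @$data ];
    }
    return $data if $type;    # other references (code, objects) pass through untouched
    return _bytes($data);
}

# Encode a single scalar only if it is defined and flagged.
sub _bytes {
    my ($str) = @_;
    return $str unless defined $str && utf8::is_utf8($str);
    return encode_utf8($str);
}

# Hypothetical sample shaped like the scraper result above.
my $sample = { items => [ { text => [ "Wii [\x{30a6}\x{30a3}\x{30fc}]" ] } ] };
my $plain  = downgrade_recursive($sample);
my $text   = $plain->{items}[0]{text}[0];
print utf8::is_utf8($text) ? "flagged\n" : "not flagged\n";
```

Unlike the module call above, which is handed $result and assigned back to it, this sketch builds a fresh copy and leaves the input alone. That costs some memory but keeps the flagged original available if it is still needed.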