This snippet scrapes the Daily Sentence from iCIBA (金山词霸). Note that the audio URL cannot be matched directly. What the selector returns is a string like this:
iciba_common_top_onSecondDelay('http://news.iciba.com/admin/tts/2015-02-27.mp3');
So a callback function is needed to filter the audio URL out of that string.
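The filtering step can be tried on its own, without QueryList or a network request. The sketch below is only an illustration: the helper name `extractAudioUrl` is hypothetical (the scraper in this post inlines the same logic as an anonymous callback), and the sample string is the one quoted above.

```php
<?php
// Hypothetical helper: the name is an assumption made for this example.
function extractAudioUrl($attr)
{
    // Match from "http" up to, but not including, the next single quote.
    if (preg_match('/http[^\']+/', $attr, $m)) {
        return $m[0];
    }
    // No URL found: return the input unchanged, as the post's callback does.
    return $attr;
}

$raw = "iciba_common_top_onSecondDelay('http://news.iciba.com/admin/tts/2015-02-27.mp3');";
echo extractAudioUrl($raw), "\n";
// prints: http://news.iciba.com/admin/tts/2015-02-27.mp3
```

Returning the input unchanged when nothing matches keeps the raw attribute visible in the result, which makes it easier to notice if the page markup changes.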
1. [Code] PHP code
<?php
require '../QueryList/QueryList.class.php';

$url = 'http://news.iciba.com/dailysentence';
$reg = ".pic";
$rang = array(
    // Match the English sentence
    "en" => array(".en>a", "text"),
    // Match the Chinese translation
    "cn" => array(".cn>a", "text"),
    // Match the audio
    "audio" => array(".sound", "onmouseover", '', function ($v, $k) {
        // Callback that extracts the audio URL from the attribute value
        if (preg_match('/http[^\']+/', $v, $arr)) {
            $v = $arr[0];
        }
        return $v;
    }),
    // Match the picture
    "pic" => array("a:eq(0)>img", "src"),
    // Match the page URL
    "page" => array("a:eq(0)", "href")
);

$data = QueryList::Query($url, $rang, $reg)->jsonArr[0];
print_r($data);
2. [Code] Scraped result
Array
(
    [en] => Courage is not the absence of fear, but rather the judgement that something else is more important than fear.
    [cn] => 勇者并非无所畏惧,而是判断出还有比恐惧更值得重视的东西。
    [audio] => http://news.iciba.com/admin/tts/2015-02-27.mp3
    [pic] => http://cdn.iciba.com/news/word/big_2015-02-27b.jpg?rand=8637
    [page] => http://news.iciba.com/dailysentence/detail-1212.html
)