<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[沧海一粟]]></title> 
<link>http://www.dzhope.com/index.php</link> 
<description><![CDATA[Web system architecture, server operations, and PHP development]]></description> 
<language>zh-cn</language> 
<copyright><![CDATA[沧海一粟]]></copyright>
<item>
<link>http://www.dzhope.com/post//</link>
<title><![CDATA[Filtering Sensitive Words with a Word List Using the PHP Extension trie_filter]]></title> 
<author>jed &lt;jed521@163.com&gt;</author>
<category><![CDATA[Server Technology]]></category>
<pubDate>Sat, 22 Oct 2016 12:10:57 +0000</pubDate> 
<guid>http://www.dzhope.com/post//</guid> 
<description>
<![CDATA[ 
	trie_filter is a keyword-filtering extension for PHP that checks whether a piece of text contains sensitive words. It is implemented on top of a Double-Array Trie.<br/><br/>Installation steps<br/><br/>In the commands below, $LIB_PATH is the directory where the dependency library is installed, and $INSTALL_PHP_PATH is the PHP 5 installation directory.<br/><br/>Install the libdatrie dependency<br/><div class="code"><br/>$ tar zxvf libdatrie-0.2.4.tar.gz<br/>$ cd libdatrie-0.2.4<br/>$ make clean<br/>$ ./configure --prefix=$LIB_PATH<br/>$ make<br/>$ make install<br/><br/></div><br/><br/>Install the trie_filter extension (<a href="https://github.com/wulijun/php-ext-trie-filter" target="_blank">https://github.com/wulijun/php-ext-trie-filter</a>), running these commands from the extension's source directory<br/><div class="code"><br/>$ $INSTALL_PHP_PATH/bin/phpize<br/>$ ./configure --with-php-config=$INSTALL_PHP_PATH/bin/php-config --with-trie_filter=$LIB_PATH<br/>$ make<br/>$ make install<br/></div><br/><br/>Then edit php.ini, add the line extension=trie_filter.so, and restart PHP.<br/><br/>PHP test example<br/><br/><div class="code"><br/>&lt;?php<br/>ini_set(&#039;memory_limit&#039;, &#039;512M&#039;);<br/>// Read the word list, one word per line, dropping trailing newlines and blank lines<br/>$arrWord = file(&#039;dict.txt&#039;, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);<br/><br/>// Build the trie from the word list<br/>$resTrie = trie_filter_new();<br/><br/>foreach ($arrWord as $k =&gt; $v) &#123;<br/>&nbsp;&nbsp;&nbsp;&nbsp;trie_filter_store($resTrie, $v);<br/>&#125;<br/>// Save the trie to disk, then load it back the way the filtering step would<br/>trie_filter_save($resTrie, __DIR__ . &#039;/blackword.tree&#039;);<br/>$resTrie = trie_filter_load(__DIR__ . &#039;/blackword.tree&#039;);<br/><br/>$str = &#039;王玉鹏的媳妇叫刘敏，王玉鹏的邮箱地址是wangyupeng@jiayuan.com，想不想知道他的QQ号呢？&#039;;<br/>// Find every sensitive word that occurs in the text<br/>$arrRet = trie_filter_search_all($resTrie, $str);<br/><br/>print_all($str, $arrRet);<br/><br/>// Print the text, then each match as index=&gt;position-length-word<br/>function print_all($str, $res) &#123;<br/>&nbsp;&nbsp;&nbsp;&nbsp;echo &quot;$str&#92;n&quot;;<br/>&nbsp;&nbsp;&nbsp;&nbsp;foreach ($res as $k =&gt; $v) &#123;<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo $k.&quot;=&gt;&#123;$v&#91;0&#93;&#125;-&#123;$v&#91;1&#93;&#125;-&quot;.substr($str, $v&#91;0&#93;, $v&#91;1&#93;).&quot;&#92;n&quot;;<br/>&nbsp;&nbsp;&nbsp;&nbsp;&#125;<br/>&#125;<br/></div><br/><br/>Test result. Output format: (index=&gt;position of the sensitive word-length of the sensitive word-sensitive word)<br/><br/>Execution is very fast.<br/><br/><a href="http://www.dzhope.com/attachment.php?fid=77" target="_blank"><img src="http://www.dzhope.com/attachment.php?fid=77" class="insertimage" alt="Click to view this image in a new window" title="Click to view this image in a new window" border="0"/></a><br/><br/>Notes<br/><br/>dict.txt is the sensitive-word dictionary, with one word per line.<br/><br/><a href="http://www.dzhope.com/attachment.php?fid=78" target="_blank"><img src="http://www.dzhope.com/attachment.php?fid=78" class="insertimage" alt="Click to view this image in a new window" title="Click to view this image in a new window" border="0"/></a><br/><br/>Optimization suggestions<br/><br/>Building the tree from the text dictionary takes time, so that step can be done asynchronously; the filtering step then only needs to load the saved tree.<br/><br/><br/><br/>PHP 5.2 or later is required (the example above uses __DIR__, which was added in PHP 5.3).<br/><br/><br/><br/>Related downloads:<br/><br/><a href="http://linux.thai.net/pub/thailinux/software/libthai/" target="_blank">http://linux.thai.net/pub/thailinux/software/libthai/</a><br/><br/><a href="https://github.com/wulijun/php-ext-trie-filter" target="_blank">https://github.com/wulijun/php-ext-trie-filter</a><br/><br/>[Reposted from 松鼠先生: http://blog.41ms.com/post/39.html]<br/><br/>Tags - <a href="http://www.dzhope.com/tags/php/" rel="tag">php</a> , <a href="http://www.dzhope.com/tags/%25E6%2595%258F%25E6%2584%259F%25E8%25AF%258D/" rel="tag">sensitive words</a> , <a href="http://www.dzhope.com/tags/trie/" rel="tag">trie</a>
]]>
</description>
</item><item>
<link>http://www.dzhope.com/post//#blogcomment</link>
<title><![CDATA[[Comments] Filtering Sensitive Words with a Word List Using the PHP Extension trie_filter]]></title> 
<author> &lt;user@domain.com&gt;</author>
<category><![CDATA[Comments]]></category>
<pubDate>Thu, 01 Jan 1970 00:00:00 +0000</pubDate> 
<guid>http://www.dzhope.com/post//#blogcomment</guid> 
<description>
<![CDATA[ 
	
]]>
</description>
</item>
</channel>
</rss>