<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[沧海一粟]]></title> 
<link>http://www.dzhope.com/index.php</link> 
<description><![CDATA[Web system architecture, server operations, and PHP development]]></description> 
<language>zh-cn</language> 
<copyright><![CDATA[沧海一粟]]></copyright>
<item>
<link>http://www.dzhope.com/post//</link>
<title><![CDATA[Filtering Chinese Sensitive Words with the PHP Extension trie_filter]]></title> 
<author>jed &lt;jed521@163.com&gt;</author>
<category><![CDATA[Server Technology]]></category>
<pubDate>Sat, 22 Oct 2016 12:28:25 +0000</pubDate> 
<guid>http://www.dzhope.com/post//</guid> 
<description>
<![CDATA[ 
	1. Install libiconv, which is a dependency of libdatrie.<br/><div class="code"><br/>wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz<br/>tar zxvf libiconv-1.14.tar.gz<br/>cd libiconv-1.14<br/>./configure<br/>make<br/>make install<br/></div><br/><br/>
2. Install libdatrie (<a href="http://linux.thai.net/~thep/datrie/datrie.html#Download" target="_blank">http://linux.thai.net/~thep/datrie/datrie.html#Download</a>).<br/>Reader 囚蝶i reported that the download on this site no longer works.<br/>You can check the source out with svn instead. Checkout URL: <a href="http://Linux.thai.NET/svn/software/datrie/tags/r_0_2_4" target="_blank">http://Linux.thai.NET/svn/software/datrie/tags/r_0_2_4</a><br/>Thanks to 囚蝶i for the report.<br/><br/><div class="code"><br/>wget http://linux.thai.net/pub/ThaiLinux/software/libthai/libdatrie-0.2.4.tar.gz<br/>tar zxf libdatrie-0.2.4.tar.gz<br/>cd libdatrie-0.2.4<br/>./configure --prefix=/usr/local<br/>make<br/>make install<br/><br/></div><br/><br/>If compilation fails with the error trietool.c:125: undefined reference to `libiconv'<br/>the fix is to run: ./configure LDFLAGS=-L/usr/local/lib LIBS=-liconv<br/><br/>
3. Install the trie_filter extension.<br/>The official trie_filter extension does not handle Chinese well, so I found a fork on GitHub that rewrites the official extension. It worked without problems in my tests.<br/>To install it, download the source package from <a href="https://github.com/wulijun/PHP-ext-trie-filter" target="_blank">https://github.com/wulijun/PHP-ext-trie-filter</a> and run the following in the source directory:<br/><br/><div class="code"><br/>phpize<br/>./configure --with-php-config=/usr/local/bin/php-config<br/>make<br/>make install<br/></div><br/><br/>
4. Edit php.ini and add the trie_filter extension: extension=trie_filter.so, then restart PHP.<br/>Check phpinfo to confirm that the trie_filter extension is available.<br/>[Screenshot: phpinfo output listing the trie_filter extension]<br/><br/>
5. Generate the dictionary used for detection. The source package downloaded above does not include the command that builds the dictionary, so you also need to download the official source package<br/>(<a href="https://code.google.com/p/as3chat/downloads/detail?name=trie_filter-2011-03-21.tar.gz" target="_blank">https://code.google.com/p/as3chat/downloads/detail?name=trie_filter-2011-03-21.tar.gz</a>).<br/><br/><div class="code"><br/>tar zxf trie_filter-2011.03.21.tar.gz<br/>cd trie_filter-2011.03.21<br/><br/>gcc -o dpp dpp.c -ldatrie&nbsp;&nbsp;# build the dpp command, which compiles the dictionary<br/><br/>./dpp words.txt words.dic&nbsp;&nbsp;# compile words.txt into the dictionary used by trie_filter; words.txt holds one word per line<br/><br/></div><br/><br/>Generating the dictionary may fail with: ./dpp: error while loading shared libraries: libdatrie.so.1: cannot open shared object file: No such file or directory<br/><br/>Fix: run<br/><div class="code"><br/>ldconfig<br/></div><br/>and then run<br/><div class="code"><br/>./dpp words.txt words.dic<br/></div><br/><br/>again, and it works.<br/><br/>
6. Test:<br/><div class="code"><br/>&lt;?php<br/>/**<br/> * trie_filter sensitive-word filtering example<br/> */<br/><br/>// Load the dictionary; returns a Trie_Filter resource handle on success, NULL on failure<br/>$file = trie_filter_load(&#039;./words.dic&#039;);<br/>var_dump($file);<br/>// $str1 contains the dictionary word 敏感词 (sensitive word); $str2 does not<br/>$str1 = &#039;今天利用trie_filter做敏感词过滤示例&#039;;<br/>$str2 = &#039;今天利用trie_filter做过滤示例&#039;;<br/>// Check whether the text contains any sensitive word defined in the dictionary (assuming the dictionary contains the word 敏感词)<br/>$res1 = trie_filter_search_all($file, $str1);&nbsp;&nbsp;// detect all sensitive words in one call<br/>$res2 = trie_filter_search($file, $str2);&nbsp;&nbsp;// detect only one sensitive word per call<br/>var_dump($res1);<br/>echo &quot;&lt;br/&gt;&quot;;<br/>var_dump($res2);<br/>trie_filter_free($file); // do not forget to call free at the end<br/></div><br/>PHP 5.3.3 or later is recommended; I used 5.3.3.<br/><br/>With PHP 5.2.17, the trie_filter_search_all function produced errors.<br/>
Tags - <a href="http://www.dzhope.com/tags/php/" rel="tag">php</a> , <a href="http://www.dzhope.com/tags/%25E6%2595%258F%25E6%2584%259F%25E8%25AF%258D/" rel="tag">sensitive words</a> , <a href="http://www.dzhope.com/tags/trie_filter/" rel="tag">trie_filter</a>
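<br/><br/>Step 5 expects words.txt to hold one word per line. As a rough sketch that is not part of the original steps, the shell snippet below tidies a raw word list before it is handed to dpp. The file name raw_words.txt is hypothetical, and stripping carriage returns, blank lines, and duplicates is a precaution I am assuming is helpful, not a documented requirement of dpp.

```shell
#!/bin/sh
# raw_words.txt is a hypothetical input file created only for this illustration.
# It deliberately contains Windows line endings, a blank line, and a duplicate.
printf '%s\r\n' '敏感词' '' 'badword' 'badword' > raw_words.txt

# Strip carriage returns, drop blank lines, and remove duplicates so that
# words.txt ends up with exactly one word per line.
tr -d '\r' < raw_words.txt | sed '/^[[:space:]]*$/d' | LC_ALL=C sort -u > words.txt

cat words.txt
```

The resulting words.txt can then be compiled with ./dpp words.txt words.dic as in step 5.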
]]>
</description>
</item><item>
<link>http://www.dzhope.com/post//#blogcomment</link>
<title><![CDATA[[Comments] Filtering Chinese Sensitive Words with the PHP Extension trie_filter]]></title> 
<author> &lt;user@domain.com&gt;</author>
<category><![CDATA[Comments]]></category>
<pubDate>Thu, 01 Jan 1970 00:00:00 +0000</pubDate> 
<guid>http://www.dzhope.com/post//#blogcomment</guid> 
<description>
<![CDATA[ 
	
]]>
</description>
</item>
</channel>
</rss>