<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Fast Chinese Word Segmentation</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css">
body {
    margin: 5px;
    background-color: #FFFFFF;
    color: #000000;
    font-family: Arial, SimSun;
    font-size: 100%;
}

td {
    padding-right: 2ex;
    background: #FFFFFF;
    text-align: right;
    font-size: 100%;
}

a {
    color: #004080;
    text-decoration: none;
}

a:visited {
    color: #800080;
    text-decoration: none;
}

a:hover {
    color: #FF0000;
    text-decoration: underline;
}

.header_title {
    margin-bottom: 0.2em;
    text-align: center;
    font-size: x-large;
    font-weight: bold;
}

.header_subtitle {
    text-align: center;
    line-height: 2.5ex;
    font-size: small;
}

h1 {
    margin-bottom: 0.5ex;
    color: #006699;
    font-size: large;
    font-weight: bold;
}

h2 {
    margin-bottom: 0.5ex;
    color: #006699;
    font-size: medium;
    font-weight: bold;
}

pre {
    padding: 1ex;
    border: 1px solid #999999;
    background-color: #EEEEEE;
    color: #000000;
    font-family: "Courier New";
}
</style>
</head>
<body>
<div class="header_title">Fast Chinese Word Segmentation</div>
<div class="header_subtitle">
    Current version: <b>0.05.2</b>, Last updated: <b>08/11/2005</b><br>
    Wudi &lt;<a href="mailto:wudicgi&#64;yahoo&#46;de">wudicgi-at-yahoo.de</a>&gt;, <a href="http://spaces.msn.com/members/wudicgi" target="_blank">MSN Space</a>
</div>
<br>
<h1>NAME</h1>
<p>Fast Chinese Word Segmentation</p>
<h1>SYNOPSIS</h1>
<pre>include_once 'cwordseg_fast.lib.php';

// Sample Chinese text to segment
$str = '&#20320;&#19981;&#23601;&#26159;&#36825;&#26679;&#19968;&#22825;&#19968;&#22825;&#26179;&#36807;&#26469;&#30340;&#22043;';

$Segmentation = new Segmentation();
// Load the dictionary before segmenting
$Segmentation->load('cwordict_fast.tab');
// Options: leave letter case unchanged, segment English text as well
$Segmentation->setLowercase(FALSE);
$Segmentation->setSegmentEnglish(TRUE);

$result = $Segmentation->segmentString($str);
echo $result;</pre>
<h1>DESCRIPTION</h1>
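<p>This class segments Chinese text into words using the RMM (reverse maximum matching) approach: it scans the text from the end and, at each position, takes the longest dictionary word that matches. Because the method is purely dictionary-based, some segmentation errors on ambiguous text are unavoidable. English text is handled, but only in a very simple way.</p>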
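<p>As a rough illustration of reverse maximum matching (not the code of this class), the sketch below scans a string from its end and takes the longest dictionary word ending at the current position, falling back to a single character when nothing matches. The function name rmmSegment, the four-entry toy dictionary, and the maximum word length of 4 are assumptions made for this example. It also assumes PHP 7 or later with the mbstring extension and UTF-8 strings.</p>
<pre>// Toy dictionary (an assumption for this sketch, not the format of cwordict_fast.tab).
// Words are stored as array keys so that isset() gives a fast lookup.
$dict = array_flip(array(
    "\u{7814}\u{7A76}",         // "research"
    "\u{7814}\u{7A76}\u{751F}", // "graduate student"
    "\u{751F}\u{547D}",         // "life"
    "\u{8D77}\u{6E90}",         // "origin"
));

// Reverse maximum matching: returns the words of $text in reading order.
function rmmSegment($text, array $dict, $maxLen = 4)
{
    $words = array();
    $end = mb_strlen($text, 'UTF-8');
    while ($end > 0) {
        // Try the longest candidate ending at $end first, then shrink it.
        $len = min($maxLen, $end);
        while ($len > 1) {
            $candidate = mb_substr($text, $end - $len, $len, 'UTF-8');
            if (isset($dict[$candidate])) {
                break;
            }
            $len--;
        }
        // $len is now the length of a dictionary match, or 1 as a fallback.
        array_unshift($words, mb_substr($text, $end - $len, $len, 'UTF-8'));
        $end -= $len;
    }
    return $words;
}

$text = "\u{7814}\u{7A76}\u{751F}\u{547D}\u{7684}\u{8D77}\u{6E90}";
echo implode(' / ', rmmSegment($text, $dict)), "\n";</pre>
<p>With this toy dictionary the sample string &#30740;&#31350;&#29983;&#21629;&#30340;&#36215;&#28304; is split into &#30740;&#31350; / &#29983;&#21629; / &#30340; / &#36215;&#28304;. A forward scan with the same dictionary would take the longer word &#30740;&#31350;&#29983; first, so the two directions can disagree on ambiguous text.</p>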
<h1>SPEED</h1>
<table cellSpacing="1" cellPadding="4" border="0" style="margin-top: 2ex; width: 92ex; background-color: #C0C0C0;">
    <tr>
        <td style="background-color: #EEEEEE;">Chinese Text Ratio</td>
        <td style="background-color: #EEEEEE;">File Size</td>
        <td style="background-color: #EEEEEE;">Time</td>
        <td style="background-color: #EEEEEE;">Speed</td>
        <td style="background-color: #EEEEEE;">Change vs. Previous Version</td>
    </tr>
    <tr>
        <td colspan="5" style="text-align: left;"><b>v0.05.2</b>, P-M 1.6G, WinXP SP2, PHP 5.0.4, CLI, Default options, Dict size: 73,270 words</td>
    </tr>
    <tr>
        <td>99%</td>
        <td>211KB</td>
        <td>2.65s</td>
        <td>79.62KB/s</td>
        <td>- 02.16KB/s</td>
    </tr>
    <tr>
        <td>39%</td>
        <td>213KB</td>
        <td>2.05s</td>
        <td>103.90KB/s</td>
        <td>+ 12.09KB/s</td>
    </tr>
    <tr>
        <td>0%</td>
        <td>413KB</td>
        <td>3.25s</td>
        <td>127.08KB/s</td>
        <td>+ 20.09KB/s</td>
    </tr>
    <tr>
        <td colspan="5" style="text-align: left;"><b>v0.05.0 - v0.05.1</b>, P-M 1.6G, WinXP SP2, PHP 5.0.4, CLI, Default options, Dict size: 73,270 words</td>
    </tr>
    <tr>
        <td>99%</td>
        <td>211KB</td>
        <td>2.58s</td>
        <td>81.78KB/s</td>
        <td>+ 00.00KB/s</td>
    </tr>
    <tr>
        <td>39%</td>
        <td>213KB</td>
        <td>2.32s</td>
        <td>91.81KB/s</td>
        <td>+ 00.00KB/s</td>
    </tr>
    <tr>
        <td>0%</td>
        <td>413KB</td>
        <td>3.86s</td>
        <td>106.99KB/s</td>
        <td>+ 00.00KB/s</td>
    </tr>
</table>
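<p>The Speed and Change columns appear to be derived from the other figures: speed is the file size divided by the elapsed time, and change is the difference from the speed of the previous version on the same file. The short sketch below reproduces the first row under that assumption; the function name throughputKBps is made up for this example.</p>
<pre>// Speed in KB/s, rounded to two decimals like the table above.
function throughputKBps($sizeKB, $seconds)
{
    return round($sizeKB / $seconds, 2);
}

$current  = throughputKBps(211, 2.65); // v0.05.2, 99% Chinese
$previous = throughputKBps(211, 2.58); // v0.05.0 - v0.05.1, same file
printf("%.2f KB/s (%+.2f KB/s)\n", $current, $current - $previous);</pre>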
<h1>CLASS METHODS</h1>
<h2>bool load ( string filename )</h2>
<h2>bool setLowercase ( bool enable )</h2>
<h2>bool setSegmentEnglish ( bool enable )</h2>
<h2>string segmentString ( string data )</h2>
<h2>string segmentFile ( string filename )</h2>
<p></p>
<h1>HISTORY</h1>
<h2>v0.05.2 (08/11/2005)</h2>
<ul type="square">
<li> Added method getDictName()</li>
<li> Added private method _segmentLines()</li>
<li> Changed the dictionary format: the first line now defines the dictionary's type and name. Users must update their dictionary file, otherwise the class will not work</li>
<li> No longer checks the ASCII value of the low-order byte</li>
<li> Some minor changes</li>
</ul>
<h2>v0.05.1 (08/04/2005)</h2>
<ul type="square">
<li> Checks whether the file exists before loading a dictionary</li>
<li> Method names now use camelCase: the first word is lowercase and each subsequent word starts with a capital letter</li>
<li> Some minor changes</li>
</ul>
<h2>v0.05.0 (07/12/2005)</h2>
<ul type="square">
<li> The first public alpha version</li>
</ul>
<h1>AUTHOR</h1>
<p>2005, Wudi &lt;<a href="mailto:wudicgi&#64;yahoo&#46;de">wudicgi-at-yahoo.de</a>&gt;, <a href="http://spaces.msn.com/members/wudicgi" target="_blank">MSN Space</a></p>
<br>
</body>
</html>