			<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><title>ConvertCharset</title><link rel="stylesheet" href="styles/interface.css"/><script language="javascript" type="text/javascript" src="tools/ua.js"></script><script language="javascript" type="text/javascript" src="tools/VisibilitySwitch.js"></script><script language="javascript" type="text/javascript">
	function showAll(){
		
       	showNode('ConvertCharset-RecognizedEncoding');
		
       	showNode('ConvertCharset-Entities');
		
       	showNode('ConvertCharset-UnicodeEntity');
		
       	showNode('ConvertCharset-HexToUtf');
		
       	showNode('ConvertCharset-MakeConvertTable');
		
       	showNode('ConvertCharset-Convert');
		
       	showNode('ConvertCharset-DebugOutput');
		
	}
	function hideAll(){
		
       	hideNode('ConvertCharset-RecognizedEncoding');
		
       	hideNode('ConvertCharset-Entities');
		
       	hideNode('ConvertCharset-UnicodeEntity');
		
       	hideNode('ConvertCharset-HexToUtf');
		
       	hideNode('ConvertCharset-MakeConvertTable');
		
       	hideNode('ConvertCharset-Convert');
		
       	hideNode('ConvertCharset-DebugOutput');
		
	}
</script></head><body bgcolor="#FFFFFF" onLoad="javascript:hideAll();"><a name="Top"></a>
[ <a href="javascript:showAll()">Expand All</a> ]
[ <a href="javascript:hideAll()">Collapse All</a> ] - 
[ <a href="#Properties">Properties (2)</a> ] 
[ <a href="#Methods">Methods (5)</a> ]
- 
[ <a href="tools/Legend.html">Legend</a> ]
<div class="separator"></div><h1><img src="images/iconClass.png" height="22" width="22" alt="" border="0" align="center"/><strong>ConvertCharset</strong> Class <span class="version">v. 1.0 2004-07-27 23:11</span></h1><table class="none" border="0" cellspacing="0" cellpadding="0" width="100%"><tr valign="top"><td width="20%"><div style="border: 1px dotted #CACACA; padding: 3px; margin-right: 10px;"><center><table border="0" cellspacing="1" cellpadding="3"><tr><td class="btCell" align="center" style="padding-left: 10px; padding-right: 10px;"><a href="class.ConvertCharset.html">ConvertCharset</a></td></tr></table></center></div></td><td width="80%"><p><p><h3>1.0 2004-07-28</h3></p><p><h3>The most important thing</h3>
        I want to thank all the people who helped me fix bugs, both small and big ones.   
        I hope that you don't mind that your names are in this file.</p><p><h3>Some Apache issues</h3>
        Lukas Lisa informed me that with some special Apache configurations   
        you have to call the header() function with the proper encoding to get your result   
        displayed correctly.   
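        A minimal sketch of such a call, assuming the page is sent as UTF-8 (the charset value and the $convertedText placeholder are illustrative and are not taken from demo.php):
<pre class="code">
&lt;?php
// Assumption: the target charset is whatever you pass to Convert() as $ToCharset.
$targetCharset = 'utf-8';

// header() has to be called before any output is sent to the browser.
header('Content-Type: text/html; charset=' . $targetCharset);

// Placeholder for a string that has already been converted to $targetCharset.
$convertedText = 'Hello';
echo $convertedText;
</pre>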
        If you want to see what I mean, go to demo.php and demo1.php</p><p><h3>BETA 1.0 2003-10-21</h3></p><p><h3>You should know about...</h3>
        To understand this class well you should read all of this first :) but if you are   
        in a hurry, just run demo.php and see what's inside.  <ol><li><span class="li">That I'm not good at English at 03:45 :) - so forgive me all mistakes</span></li><li><span class="li">This class is a BETA version because I haven't tested it enough</span></li><li><span class="li">Feel free to contact me with questions, bug reports and mistakes in PHP and this documentation (email below)</span></li></ol></p><p><h3>In a few words...</h3>
        Why ConvertCharset class?</p><p>I have made this class because I had a lot of problems with different charsets. First because people   
        from Microsoft wanted to have their own encoding, second because people from Macromedia didn't   
        think about other languages, third because sometimes I need to use text written on a Mac, and of course   
        it has its own encoding :)</p><p>Notice &amp; remember:  <ul><li><span class="li">When I say 1 byte string I mean 1 byte per char.</span></li><li><span class="li">When I say multibyte string I mean more than one byte per char.</span></li></ul></p><p>So, these are the main FEATURES of this class:  <ul><li><span class="li">conversion between 1 byte charsets</span></li><li><span class="li">conversion from a 1 byte charset to a multibyte charset (utf-8)</span></li><li><span class="li">conversion from a multibyte charset (utf-8) to a 1 byte charset</span></li><li><span class="li">every conversion output can be saved with numeric entities (browser charset independent - not the whole truth)</span></li></ul></p><p>This is a list of charsets you can operate with. The basic rule is that a char has to be in both charsets,   
        otherwise you'll get an error.</p><p><ul><li><span class="li">WINDOWS</span></li><li><span class="li">windows-1250 - Central Europe</span></li><li><span class="li">windows-1251 - Cyrillic</span></li><li><span class="li">windows-1252 - Latin I</span></li><li><span class="li">windows-1253 - Greek</span></li><li><span class="li">windows-1254 - Turkish</span></li><li><span class="li">windows-1255 - Hebrew</span></li><li><span class="li">windows-1256 - Arabic</span></li><li><span class="li">windows-1257 - Baltic</span></li><li><span class="li">windows-1258 - Viet Nam</span></li><li><span class="li">cp874 - Thai - this file is also for DOS</span></li></ul></p><p><ul><li><span class="li">DOS</span></li><li><span class="li">cp437 - Latin US</span></li><li><span class="li">cp737 - Greek</span></li><li><span class="li">cp775 - BaltRim</span></li><li><span class="li">cp850 - Latin1</span></li><li><span class="li">cp852 - Latin2</span></li><li><span class="li">cp855 - Cyrillic</span></li><li><span class="li">cp857 - Turkish</span></li><li><span class="li">cp860 - Portuguese</span></li><li><span class="li">cp861 - Iceland</span></li><li><span class="li">cp862 - Hebrew</span></li><li><span class="li">cp863 - Canada</span></li><li><span class="li">cp864 - Arabic</span></li><li><span class="li">cp865 - Nordic</span></li><li><span class="li">cp866 - Cyrillic Russian (this is the one used in IE as "Cyrillic (DOS)")</span></li><li><span class="li">cp869 - Greek2</span></li></ul></p><p><ul><li><span class="li">MAC (Apple)</span></li><li><span class="li">x-mac-cyrillic</span></li><li><span class="li">x-mac-greek</span></li><li><span class="li">x-mac-icelandic</span></li><li><span class="li">x-mac-ce</span></li><li><span class="li">x-mac-roman</span></li></ul></p><p><ul><li><span class="li">ISO (Unix/Linux)</span></li><li><span class="li">iso-8859-1</span></li><li><span class="li">iso-8859-2</span></li><li><span class="li">iso-8859-3</span></li><li><span 
class="li">iso-8859-4</span></li><li><span class="li">iso-8859-5</span></li><li><span class="li">iso-8859-6</span></li><li><span class="li">iso-8859-7</span></li><li><span class="li">iso-8859-8</span></li><li><span class="li">iso-8859-9</span></li><li><span class="li">iso-8859-10</span></li><li><span class="li">iso-8859-11</span></li><li><span class="li">iso-8859-12</span></li><li><span class="li">iso-8859-13</span></li><li><span class="li">iso-8859-14</span></li><li><span class="li">iso-8859-15</span></li><li><span class="li">iso-8859-16</span></li></ul></p><p><ul><li><span class="li">MISCELLANEOUS</span></li><li><span class="li">gsm0338 (ETSI GSM 03.38)</span></li><li><span class="li">cp037</span></li><li><span class="li">cp424</span></li><li><span class="li">cp500 </span></li><li><span class="li">cp856</span></li><li><span class="li">cp875</span></li><li><span class="li">cp1006</span></li><li><span class="li">cp1026</span></li><li><span class="li">koi8-r (Cyrillic)</span></li><li><span class="li">koi8-u (Cyrillic Ukrainian)</span></li><li><span class="li">nextstep</span></li><li><span class="li">us-ascii</span></li><li><span class="li">us-ascii-quotes</span></li></ul></p><p><ul><li><span class="li">DSP implementation for NeXT</span></li><li><span class="li">stdenc</span></li><li><span class="li">symbol</span></li><li><span class="li">zdingbat</span></li></ul></p><p><ul><li><span class="li">And specially for old Polish programs</span></li><li><span class="li">mazovia</span></li></ul></p><p><h3>Now, to the point...</h3>
        Here are the main variables.</p><p>DEBUG_MODE</p><p>You can set this value to:  <ul><li><span class="li">-1 - No errors or comments</span></li><li><span class="li">0  - Only error messages, no comments</span></li><li><span class="li">1  - Error messages and comments</span></li></ul></p><p>The default value is 1, and during your first steps with the class it should be left as is. </p><p>CONVERT_TABLES_DIR</p><p>This is the place where you store all files with charset encodings. Filenames should have   
        the same names as the encodings. My advice is to keep the existing names, because they   
        were taken from unicode.org (  <a href="http://www.unicode.org">www.unicode.org</a>
        ), and after an update to Unicode 3.0 or 4.0   
        the names of the files will be the same, so if you want to save your time...uff, leave the   
        names as they are for future updates.</p><p>The directory with the encoding files should be in the class location directory by default,   
        but of course you can change it if you like. </p></p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead">Related Topics</th></tr><tr><td class="btCell"><a href="http://www.unicode.org"><img src="images/link.gif" height="11" width="11" alt="" border="0"/></a> <a href="http://www.unicode.org">Unicode Homepage</a></td></tr></table><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">
		Author</th><td class="btCell"><a href="mailto:hide@address.com">Mikolaj Jedrzejak</a></td></tr><tr><th class="btHead" align="center">Copyright</th><td class="btCell"> Copyright Mikolaj Jedrzejak (c) 2003-2004
</td></tr></table></td></tr></table><div class="separator"><a name="Properties"></a><strong>Properties</strong> implemented by ConvertCharset</div><a name="m_Entities"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-Entities');" class="property"><img id="imgConvertCharset-Entities" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/propertyPublic.gif" border="0" alt="public property"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-Entities');" class="property"><strong class="property">Entities</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-Entities"><p>This value indicates whether the output should use numeric entities.</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><a name="m_RecognizedEncoding"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-RecognizedEncoding');" class="property"><img id="imgConvertCharset-RecognizedEncoding" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/propertyPublic.gif" border="0" alt="public property"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-RecognizedEncoding');" class="property"><strong class="property">RecognizedEncoding</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-RecognizedEncoding"><p>This value indicates whether the string contains multibyte chars.</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><div class="separator"><a name="Methods"></a><strong>Methods</strong> implemented by ConvertCharset</div><a name="m_Convert"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-Convert');" class="method"><img id="imgConvertCharset-Convert" height="9" width="9" border="0" hspace="3" 
src="images/minus.gif"/></a><img src="images/methodPublic.gif" border="0" alt="public method"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-Convert');" class="method"><strong class="method">Convert</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-Convert"><p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">PHP</th><td class="btCell">string <strong>Convert</strong>(string <strong>$StringToChange</strong>, string <strong>$FromCharset</strong>, string <strong>$ToCharset</strong>, boolean <strong>$TurnOnEntities</strong>)
		</td><td class="btCell">v. 1.0 2004-07-27 01:09</td></tr></table></p><p><p>ConvertCharset::Convert()</p><p>This is the basic function you will be using. I hope that you can figure out its syntax :-)</p></p><p><strong>Arguments</strong><ul><li><span class="li"><strong>$StringToChange</strong><p><p>The string you want to change :)</p></p></span></li><li><span class="li"><strong>$FromCharset</strong><p><p>Name of the $StringToChange encoding; you have to know it.</p></p></span></li><li><span class="li"><strong>$ToCharset</strong><p><p>Name of the charset you want to get for $StringToChange.</p></p></span></li><li><span class="li"><strong>$TurnOnEntities</strong> [optional, default value = false]<p><p>Set to true or 1 if you want to use numeric entities instead of regular chars.</p></p></span></li></ul></p><p><strong>Return</strong></p><p>Converted string in a brand new encoding :)</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><a name="m_DebugOutput"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-DebugOutput');" class="method"><img id="imgConvertCharset-DebugOutput" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/methodPublic.gif" border="0" alt="public method"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-DebugOutput');" class="method"><strong class="method">DebugOutput</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-DebugOutput"><p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">PHP</th><td class="btCell">string <strong>DebugOutput</strong>(integer <strong>$Group</strong>, integer <strong>$Number</strong>, mixed <strong>$Value</strong>)
		</td></tr></table></p><p><p>ConvertCharset::DebugOutput()</p><p>This function is not really necessary; the debug output could stay inside the   
          source code, but this way it's easier to manage and translate.   
          Besides, I couldn't find a good comment/debug class :-) Maybe I'll write one someday... </p><p>All messages depend on the DEBUG_MODE level. As I wrote before, you can set this value to:  <ul><li><span class="li">-1 - No errors or notices are shown</span></li><li><span class="li">0  - Only error messages are shown, no notices </span></li><li><span class="li">1  - Error messages and notices are shown</span></li></ul></p></p><p><strong>Arguments</strong><ul><li><span class="li"><strong>$Group</strong><p><p>Message group: error - 0, notice - 1</p></p></span></li><li><span class="li"><strong>$Number</strong><p><p>Number of the message </p></p></span></li><li><span class="li"><strong>$Value</strong> [optional, default value = false]<p><p>This value is whatever you want; usually it's some parameter value that makes the message easier to understand.</p></p></span></li></ul></p><p><strong>Return</strong></p><p>String with a proper message.</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><a name="m_HexToUtf"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-HexToUtf');" class="method"><img id="imgConvertCharset-HexToUtf" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/methodPublic.gif" border="0" alt="public method"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-HexToUtf');" class="method"><strong class="method">HexToUtf</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-HexToUtf"><p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">PHP</th><td class="btCell">string <strong>HexToUtf</strong>(string <strong>$UtfCharInHex</strong>)
		</td></tr></table></p><p><p>ConvertCharset::HexToUtf()</p><p>This simple function takes a Unicode char of up to 4 bytes and returns it as a regular char.   
          It is very similar to the UnicodeEntity function (link below). There is one difference    
          in the returned format: this time it's regular char(s), in most cases one or two chars. </p></p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead">See also</th></tr><tr><td class="btCell"><a href="class.ConvertCharset.html#m_UnicodeEntity">ConvertCharset::UnicodeEntity()</a></td></tr></table><p><strong>Arguments</strong><ul><li><span class="li"><strong>$UtfCharInHex</strong><p><p>Hexadecimal value of a Unicode char.</p></p></span></li></ul></p><p><strong>Return</strong></p><p>Encoded hexadecimal value as a regular char.</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><a name="m_MakeConvertTable"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-MakeConvertTable');" class="method"><img id="imgConvertCharset-MakeConvertTable" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/methodPublic.gif" border="0" alt="public method"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-MakeConvertTable');" class="method"><strong class="method">MakeConvertTable</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-MakeConvertTable"><p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">PHP</th><td class="btCell">array <strong>MakeConvertTable</strong>(string <strong>$FirstEncoding</strong>, string <strong>$SecondEncoding</strong>)
		</td></tr></table></p><p><p>ConvertCharset::MakeConvertTable()</p><p>This function creates a table from two SBCS (Single Byte Character Set) encodings. Every conversion   
          goes through this table.</p><p><ul><li><span class="li">The files with encoding tables have to be saved in "Format A" of the unicode.org charset table format! This is usually written in the header of every charset file.</span></li><li><span class="li">BOTH charsets MUST be SBCS</span></li><li><span class="li">The files with encoding tables have to be complete (none of the chars can be missing, unless you are sure you are not going to use it)</span></li></ul></p><p>A "Format A" encoding file, if you have to build it yourself, should follow these rules:  <ul><li><span class="li">you can comment everything with #</span></li><li><span class="li">the first column contains 1 byte chars in hex, starting with 0x..</span></li><li><span class="li">the second column contains the Unicode equivalent in hex, starting with 0x....</span></li><li><span class="li">every following column is optional, but in "Format A" it should contain the Unicode char name and/or your own comment</span></li><li><span class="li">the columns can be separated by "spaces", "tabs", "," or any combination of these</span></li><li><span class="li">below is an example</span></li></ul></p><pre class="code">
#
#	The entries are in ANSI X3.4 order.
#
0x00	0x0000	#	NULL and an extra comment, if needed
0x01	0x0001	#	START OF HEADING
# Oh, one more thing, you can put comment lines between the rows if you like.
0x02	0x0002	#	START OF TEXT
0x03	0x0003	#	END OF TEXT
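# Hypothetical extra row (not from the original file), assuming the separator rule above:
# the columns are split with spaces and a comma instead of tabs.
0x04 , 0x0004 # END OF TRANSMISSION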
next line, and so on...
</pre><p></p><p>You can get full tables with encodings from   <a href="http://www.unicode.org">http://www.unicode.org</a></p></p><p><strong>Arguments</strong><ul><li><span class="li"><strong>$FirstEncoding</strong><p><p>Name of the first encoding and the first encoding filename (they have to be the same)</p></p></span></li><li><span class="li"><strong>$SecondEncoding</strong> [optional, default value = ""]<p><p>Name of the second encoding and the second encoding filename (they have to be the same). Optional for building a joined table.</p></p></span></li></ul></p><p><strong>Return</strong></p><p>Table necessary to change one encoding to another.</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><a name="m_UnicodeEntity"/><div><a href="javascript:toggleNodeVisibility('ConvertCharset-UnicodeEntity');" class="method"><img id="imgConvertCharset-UnicodeEntity" height="9" width="9" border="0" hspace="3" src="images/minus.gif"/></a><img src="images/methodPublic.gif" border="0" alt="public method"/> <a href="javascript:toggleNodeVisibility('ConvertCharset-UnicodeEntity');" class="method"><strong class="method">UnicodeEntity</strong></a><div class="hideableItem" style="display: block;" id="paneConvertCharset-UnicodeEntity"><p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead" align="center">PHP</th><td class="btCell">string <strong>UnicodeEntity</strong>(string <strong>$UnicodeString</strong>)
		</td></tr></table></p><p><p>ConvertCharset::UnicodeEntity()</p><p>Unicode (UTF-8) encoding: bytes, bits and binary representation.   
          Each b represents a bit that can be used to store character data.  <ul><li><span class="li">bytes, bits, binary representation</span></li><li><span class="li">1,   7,  0bbbbbbb</span></li><li><span class="li">2,  11,  110bbbbb 10bbbbbb</span></li><li><span class="li">3,  16,  1110bbbb 10bbbbbb 10bbbbbb</span></li><li><span class="li">4,  21,  11110bbb 10bbbbbb 10bbbbbb 10bbbbbb</span></li></ul></p><p>This function is written the "long" way, for everyone who would like to analyze   
          the process of Unicode encoding and understand it. All other functions, like HexToUtf,   
          will be written in the "shortest" way I can write them :) which doesn't mean they are short,   
          of course. You can check it in HexToUtf() (link below) - a very similar function.</p><p>IMPORTANT: Remember that the $UnicodeString input CANNOT contain single byte upper half   
          extended ASCII codes. Why? Because there is a possibility that this function will eat   
          the following char, thinking it's a multibyte Unicode char.</p></p><table border="0" cellspacing="1" cellpadding="3"><tr><th class="btHead">See also</th></tr><tr><td class="btCell"><a href="class.ConvertCharset.html#m_HexToUtf">ConvertCharset::HexToUtf()</a></td></tr></table><p><strong>Arguments</strong><ul><li><span class="li"><strong>$UnicodeString</strong><p><p>Input Unicode string (1 char can take more than 1 byte)</p></p></span></li></ul></p><p><strong>Return</strong></p><p>This is the input string, also with Unicode chars, but saved as entities</p><div class="separator"><a href="#Top"><img src="images/goTop.gif" height="7" width="11" alt="Top" border="0"/></a><a href="#Top">Top</a></div></div></div><div class="separator">
Generated by PHPEdit - <a href="http://www.phpedit.net/">http://www.phpedit.net/</a> - Copyright © 1999-2003 - <a href="mailto:hide@address.com">Sébastien Hordeaux</a></div></body></html>