PHP
Last updated: Tue, 30 Jun 2009


utf8_encode

(PHP 4, PHP 5)

utf8_encode: Encodes an ISO-8859-1 string to UTF-8

Description

string utf8_encode ( string $data )

This function encodes the string data to UTF-8 and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronizing (meaning a program can determine where in the byte stream each character starts), and can be used with normal string comparison functions for sorting and similar tasks. UTF-8 encodes characters in up to four bytes, like this:

UTF-8 encoding
bytes | bits | representation
1     | 7    | 0bbbbbbb
2     | 11   | 110bbbbb 10bbbbbb
3     | 16   | 1110bbbb 10bbbbbb 10bbbbbb
4     | 21   | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

Each b represents a bit that can be used to store character data.
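Because every ISO-8859-1 byte value is also the Unicode code point of that character (U+0000 to U+00FF), utf8_encode() only ever needs the first two rows of the table: bytes below 0x80 are copied as-is and every other byte becomes a two-byte sequence. The sketch below spells this out in plain PHP; `latin1_to_utf8()` is a hypothetical name used for illustration, written to produce the same bytes as utf8_encode().

```php
<?php
// Hypothetical illustration of what utf8_encode() does for ISO-8859-1 input.
function latin1_to_utf8($data)
{
    $out = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($data[$i]);          // ISO-8859-1 byte value == Unicode code point
        if ($c < 0x80) {
            // 1 byte, 7 bits: 0bbbbbbb
            $out .= $data[$i];
        } else {
            // 2 bytes, 11 bits: 110bbbbb 10bbbbbb (high bits first, then low 6 bits)
            $out .= chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
        }
    }
    return $out;
}

// "\xE9" is e-acute in ISO-8859-1 (code point U+00E9)
echo bin2hex(latin1_to_utf8("caf\xE9")), "\n";   // prints 636166c3a9
```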

Parameters

data

An ISO-8859-1 string.

Return Values

Returns the UTF-8 translation of data.



User Contributed Notes
utf8_encode
rabby
28-Apr-2009 08:29
There is a little auto-detect script for encodings that decides whether it is necessary to call utf8_encode or not. It can easily be modified to work with ISO-8859-1 scripts, too, and decide whether to call utf8_decode.
<?php
// $s is the string to test. preg_match() returns 1 if it matches (well-formed
// UTF-8 with tab, LF and CR as the only control characters), 0 if not.
$is_utf8 = preg_match('%^(?:
      [\x09\x0A\x0D\x20-\x7E]              # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
    )*$%xs',
    $s);
?>
Because preg_match() can fail on bigger strings $s (the repeated group can exceed PCRE's backtracking or recursion limits), let me share the fixed function, called autoencode: http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59
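A hedged alternative that sidesteps the large-string problem: with the /u modifier, PCRE validates the whole subject as UTF-8 before matching and preg_match() returns false if it is malformed, while an empty pattern leaves nothing to backtrack over. This sketch is not the linked autoencode function; `maybe_utf8_encode()` and `latin1_to_utf8()` are hypothetical names, the latter a pure-PHP stand-in for utf8_encode().

```php
<?php
// Hypothetical pure-PHP stand-in for utf8_encode(): bytes 0x80-0xFF become
// two-byte UTF-8 sequences, ASCII bytes are copied unchanged.
function latin1_to_utf8($s)
{
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        $c = ord($m[0]);
        return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
    }, $s);
}

// Hypothetical autoencode-style helper: leave valid UTF-8 alone, otherwise
// assume ISO-8859-1 and convert. preg_match('//u', ...) returns 1 for valid
// UTF-8 and false for malformed input.
function maybe_utf8_encode($s)
{
    return preg_match('//u', $s) === 1 ? $s : latin1_to_utf8($s);
}
```

Like any such heuristic, it cannot recognize ISO-8859-1 text that happens to form valid UTF-8 (for example the bytes "\xC3\xA9", which read as "Ã©" in ISO-8859-1); such input is returned unchanged.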
bassam at saprinna dot com
28-Apr-2009 03:17
You can convert windows-1256 (Arabic) text, including serialized arrays of such text, to UTF-8 before saving it to MySQL with this function:

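<?php
// Converts windows-1256 text, or a serialized array of such strings, to UTF-8
function convert_charset($item)
{
    // @ silences the notice unserialize() raises when $item is a plain string
    if ($unserialize = @unserialize($item))
    {
        foreach ($unserialize as $key => $value)
        {
            $unserialize[$key] = @iconv('windows-1256', 'UTF-8', $value);
        }
        $serialize = serialize($unserialize);
        return $serialize;
    }
    else
    {
        // Plain string: convert it directly
        return @iconv('windows-1256', 'UTF-8', $item);
    }
}
?>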
mrezair at azarbod dot com
23-Mar-2009 11:49
I found this little function very useful for fixing strings that are not in UTF-8 but need to be converted:

<?php
// Fixes the encoding to UTF-8
function fixEncoding($in_str)
{
  $cur_encoding = mb_detect_encoding($in_str);
  if ($cur_encoding == "UTF-8" && mb_check_encoding($in_str, "UTF-8"))
    return $in_str;
  else
    return utf8_encode($in_str);
}
// fixEncoding
?>
dan at birminghampr dot co dot uk
19-Mar-2009 06:31
I use a function like this, rather than utf8_encode() alone, to fix the encoding of data from unknown sources, for example the output of get_meta_tags():

<?php
function FixEncoding($x) {
  if (mb_detect_encoding($x) == 'UTF-8') {
    return $x;
  } else {
    return utf8_encode($x);
  }
}
?>
rogeriogirodo at gmail dot com
19-Mar-2009 05:54
This function may be useful for encoding array keys and values (it first checks whether each string is already UTF-8):

<?php
// Plain function version; inside a class, declare it public static and
// make the recursive calls with self::to_utf8()
function to_utf8($in)
{
    if (is_array($in)) {
        $out = array();
        foreach ($in as $key => $value) {
            // encode keys and values recursively
            $out[to_utf8($key)] = to_utf8($value);
        }
        return $out;
    } elseif (is_string($in)) {
        if (mb_detect_encoding($in) != "UTF-8")
            return utf8_encode($in);
        else
            return $in;
    } else {
        return $in;
    }
}
?>

Hope this helps.

[NOTE BY danbrown AT php DOT net: Original function written by (cmyk777 AT gmail DOT com) on 28-JAN-09.]
Julio Cesar
20-Jan-2009 06:38
With this script you can convert a lot of files in subfolders to UTF-8 without problems!

I thought of it when I was converting an Eclipse project to UTF-8 and lost all the accented characters O.o

But with this script YOU WILL NOT! ;-)

I made this based on Aidan Kehoe's script and the one by webmaster at asylum-et dot com on http://www.php.net/scandir:

<?php
ini_set("implicit_flush", "on");
ini_set("max_execution_time", 0);
ini_set("register_argc_argv", "on");
ini_set("html_errors", "Off");

// utf8_encode() treats bytes 0x80-0x9F as C1 control characters, but in
// Windows-1252 they are printable (euro sign, curly quotes, dashes...),
// so the resulting UTF-8 sequences are remapped to the right characters.
function cp1252_to_utf8($str) {
    $cp1252_map = array(
        "\xc2\x80" => "\xe2\x82\xac",
        "\xc2\x82" => "\xe2\x80\x9a",
        "\xc2\x83" => "\xc6\x92",
        "\xc2\x84" => "\xe2\x80\x9e",
        "\xc2\x85" => "\xe2\x80\xa6",
        "\xc2\x86" => "\xe2\x80\xa0",
        "\xc2\x87" => "\xe2\x80\xa1",
        "\xc2\x88" => "\xcb\x86",
        "\xc2\x89" => "\xe2\x80\xb0",
        "\xc2\x8a" => "\xc5\xa0",
        "\xc2\x8b" => "\xe2\x80\xb9",
        "\xc2\x8c" => "\xc5\x92",
        "\xc2\x8e" => "\xc5\xbd",
        "\xc2\x91" => "\xe2\x80\x98",
        "\xc2\x92" => "\xe2\x80\x99",
        "\xc2\x93" => "\xe2\x80\x9c",
        "\xc2\x94" => "\xe2\x80\x9d",
        "\xc2\x95" => "\xe2\x80\xa2",
        "\xc2\x96" => "\xe2\x80\x93",
        "\xc2\x97" => "\xe2\x80\x94",
        "\xc2\x98" => "\xcb\x9c",
        "\xc2\x99" => "\xe2\x84\xa2",
        "\xc2\x9a" => "\xc5\xa1",
        "\xc2\x9b" => "\xe2\x80\xba",
        "\xc2\x9c" => "\xc5\x93",
        "\xc2\x9e" => "\xc5\xbe",
        "\xc2\x9f" => "\xc5\xb8"
    );
    return strtr(utf8_encode($str), $cp1252_map);
}

// Recursively walks $base and converts every file whose extension is not excluded
function rscandir($base = "", &$data = array()) {

    $array = array_diff(scandir($base), array(".", ".."));

    foreach ($array as $value) :

        if (is_dir($base.$value)) :
            //$data[] = $base.$value."/";
            $data = rscandir($base.$value."/", $data);

        // put the unwanted (binary or non-text) extensions in this pattern
        elseif (is_file($base.$value) &&
                !preg_match('/\.(jpg|gif|png|ttf|dataModel|wsdlDataModel|project|jsdtscope|prefs|name|container|exe|bat|cmd|src|dll|ini|swf|fla|bmp)$/i', $value)) :

            echo "Converting to UTF8 " . $base.$value . "\r\n";

            // note: a file that is already UTF-8 would be encoded twice
            file_put_contents(
                $base.$value,
                cp1252_to_utf8(file_get_contents($base.$value)));

        endif;

    endforeach;

    return $data;
}

echo "Type a folder (with a trailing slash): ";
$folder = trim(fgets(STDIN));

rscandir($folder);

?>

You can put this script in your Windows directory and create a batch file like this:

@echo off
php -n C:\windows\ConvertUTF8.php
pause

Then you can convert your files from anywhere: just type the batch file's name, for example ConvertFilesToUTF8, in the Run dialog.

I think this will help everyone! Enjoy ;-)

P.S.: I removed the comments because of the word wrap.
bitseeker
22-Sep-2008 02:37
...or just use this simple piece of code to check for a valid UTF-8 string:

<?php
    /**
     * Returns true if $string is valid UTF-8 and false otherwise.
     *
     * @since        1.14
     * @param [mixed] $string     string to be tested
     * @subpackage
     */
    function is_utf8($string) {

        // From http://w3.org/International/questions/qa-forms-utf-8.html
        // (bool) cast: preg_match() itself returns 1 or 0
        return (bool) preg_match('%^(?:
              [\x09\x0A\x0D\x20-\x7E]            # ASCII
            | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
            |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
            | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
        )*$%xs', $string);

    }
?>
hmdker at gmail dot com
24-Aug-2008 10:19
Here's my is_utf8 function, to detect valid UTF-8 text.

<?php
function is_utf8($str) {
    $c = 0; $b = 0;
    $bits = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c >= 128) {                 // non-ASCII: must be a lead byte
            if (($c >= 254)) return false;
            elseif ($c >= 252) $bits = 6;
            elseif ($c >= 248) $bits = 5;
            elseif ($c >= 240) $bits = 4;
            elseif ($c >= 224) $bits = 3;
            elseif ($c >= 192) $bits = 2;
            else return false;           // stray continuation byte (128-191)
            if (($i + $bits) > $len) return false;
            // each following byte must be a continuation byte 10xxxxxx
            while ($bits > 1) {
                $i++;
                $b = ord($str[$i]);
                if ($b < 128 || $b > 191) return false;
                $bits--;
            }
        }
    }
    return true;
}

?>
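The loop above still accepts 5- and 6-byte sequences and overlong forms such as "\xC0\xAF", which RFC 3629 no longer allows. A minimal sketch of a stricter byte loop that mirrors the ranges of the W3C pattern from the earlier notes (and, being a plain loop, cannot run into PCRE limits on large strings); `is_strict_utf8()` is a hypothetical name.

```php
<?php
// Hypothetical stricter variant: UTF-8 as restricted by RFC 3629
// (at most 4 bytes, no overlong forms, no surrogates, nothing above U+10FFFF).
function is_strict_utf8($str)
{
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $c = ord($str[$i]);
        if ($c < 0x80) {                       // ASCII
            $i++;
            continue;
        }
        // Lead byte: number of continuation bytes ($n) and the allowed range
        // of the FIRST continuation byte ($lo..$hi), as in the W3C pattern.
        if ($c >= 0xC2 && $c <= 0xDF)     { $n = 1; $lo = 0x80; $hi = 0xBF; }
        elseif ($c == 0xE0)               { $n = 2; $lo = 0xA0; $hi = 0xBF; } // no overlongs
        elseif ($c == 0xED)               { $n = 2; $lo = 0x80; $hi = 0x9F; } // no surrogates
        elseif ($c >= 0xE1 && $c <= 0xEF) { $n = 2; $lo = 0x80; $hi = 0xBF; }
        elseif ($c == 0xF0)               { $n = 3; $lo = 0x90; $hi = 0xBF; } // no overlongs
        elseif ($c >= 0xF1 && $c <= 0xF3) { $n = 3; $lo = 0x80; $hi = 0xBF; }
        elseif ($c == 0xF4)               { $n = 3; $lo = 0x80; $hi = 0x8F; } // max U+10FFFF
        else return false;                // 0x80-0xC1 and 0xF5-0xFF never start a character
        if ($i + $n >= $len) return false;     // truncated sequence
        for ($k = 1; $k <= $n; $k++) {
            $b = ord($str[$i + $k]);
            $min = ($k == 1) ? $lo : 0x80;
            $max = ($k == 1) ? $hi : 0xBF;
            if ($b < $min || $b > $max) return false;
        }
        $i += $n + 1;
    }
    return true;
}
```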
akam
30-Jun-2008 07:44
<?php
// Author akam at akameng dot com
// Supports sequences of up to 6 bytes
function UTF_to_Unicode($input, $array=False) {

    $bit1  = pow(64, 0);
    $bit2  = pow(64, 1);
    $bit3  = pow(64, 2);
    $bit4  = pow(64, 3);
    $bit5  = pow(64, 4);
    $bit6  = pow(64, 5);

    $value = '';
    $val   = array();

    for ($i = 0; $i < strlen($input); $i++) {

        $ints = ord($input[$i]);

        // Lead byte and up to five continuation bytes (payload = byte - 128);
        // bytes past the end of the string are read as 0 instead of raising warnings
        $z = ord($input[$i]);
        $y = isset($input[$i+1]) ? ord($input[$i+1]) - 128 : 0;
        $x = isset($input[$i+2]) ? ord($input[$i+2]) - 128 : 0;
        $w = isset($input[$i+3]) ? ord($input[$i+3]) - 128 : 0;
        $v = isset($input[$i+4]) ? ord($input[$i+4]) - 128 : 0;
        $u = isset($input[$i+5]) ? ord($input[$i+5]) - 128 : 0;

        if ($ints >= 0 && $ints <= 127) {
            // 1 byte
            $value .= '&#'.($z * $bit1).';';
            $val[]  = $value;
        }
        if ($ints >= 192 && $ints <= 223) {
            // 2 bytes
            $value .= '&#'.(($z-192) * $bit2 + $y * $bit1).';';
            $val[]  = $value;
        }
        if ($ints >= 224 && $ints <= 239) {
            // 3 bytes
            $value .= '&#'.(($z-224) * $bit3 + $y * $bit2 + $x * $bit1).';';
            $val[]  = $value;
        }
        if ($ints >= 240 && $ints <= 247) {
            // 4 bytes
            $value .= '&#'.(($z-240) * $bit4 + $y * $bit3 + $x * $bit2 + $w * $bit1).';';
            $val[]  = $value;
        }
        if ($ints >= 248 && $ints <= 251) {
            // 5 bytes
            $value .= '&#'.(($z-248) * $bit5 + $y * $bit4 + $x * $bit3 + $w * $bit2 + $v * $bit1).';';
            $val[]  = $value;
        }
        if ($ints == 252 || $ints == 253) {
            // 6 bytes
            $value .= '&#'.(($z-252) * $bit6 + $y * $bit5 + $x * $bit4 + $w * $bit3 + $v * $bit2 + $u * $bit1).';';
            $val[]  = $value;
        }
        if ($ints == 254 || $ints == 255) {
            echo 'Wrong Result!<br>';
        }
        // continuation bytes (128-191) match no branch and are skipped
    }

    if ($array === False) {
        return $unicode = $value;
    }
    if ($array === True) {
        $val = str_replace('&#', '', $value);
        $val = explode(';', $val);
        $len = count($val);
        unset($val[$len-1]);

        return $unicode = $val;
    }
}


function Unicode_to_UTF($input, $array=TRUE) {

    $utf = '';
    // Accept an array of code points or a string of "&#NNN;" entities
    if (!is_array($input)) {
        $input = str_replace('&#', '', $input);
        $input = explode(';', $input);
        $len = count($input);
        unset($input[$len-1]);
    }
    for ($i = 0; $i < count($input); $i++) {

        if ($input[$i] < 128) {
            $byte1 = $input[$i];
            $utf  .= chr($byte1);
        }
        if ($input[$i] >= 128 && $input[$i] <= 2047) {
            $byte1 = 192 + (int)($input[$i] / 64);
            $byte2 = 128 + ($input[$i] % 64);
            $utf  .= chr($byte1).chr($byte2);
        }
        if ($input[$i] >= 2048 && $input[$i] <= 65535) {
            $byte1 = 224 + (int)($input[$i] / 4096);
            $byte2 = 128 + ((int)($input[$i] / 64) % 64);
            $byte3 = 128 + ($input[$i] % 64);
            $utf  .= chr($byte1).chr($byte2).chr($byte3);
        }
        if ($input[$i] >= 65536 && $input[$i] <= 2097151) {
            $byte1 = 240 + (int)($input[$i] / 262144);
            $byte2 = 128 + ((int)($input[$i] / 4096) % 64);
            $byte3 = 128 + ((int)($input[$i] / 64) % 64);
            $byte4 = 128 + ($input[$i] % 64);
            $utf  .= chr($byte1).chr($byte2).chr($byte3).chr($byte4);
        }
        if ($input[$i] >= 2097152 && $input[$i] <= 67108863) {
            $byte1 = 248 + (int)($input[$i] / 16777216);
            $byte2 = 128 + ((int)($input[$i] / 262144) % 64);
            $byte3 = 128 + ((int)($input[$i] / 4096) % 64);
            $byte4 = 128 + ((int)($input[$i] / 64) % 64);
            $byte5 = 128 + ($input[$i] % 64);
            $utf  .= chr($byte1).chr($byte2).chr($byte3).chr($byte4).chr($byte5);
        }
        if ($input[$i] >= 67108864 && $input[$i] <= 2147483647) {
            // (int) casts so chr() receives integers, as in the other branches
            $byte1 = 252 + (int)($input[$i] / 1073741824);
            $byte2 = 128 + ((int)($input[$i] / 16777216) % 64);
            $byte3 = 128 + ((int)($input[$i] / 262144) % 64);
            $byte4 = 128 + ((int)($input[$i] / 4096) % 64);
            $byte5 = 128 + ((int)($input[$i] / 64) % 64);
            $byte6 = 128 + ($input[$i] % 64);
            $utf  .= chr($byte1).chr($byte2).chr($byte3).chr($byte4).chr($byte5).chr($byte6);
        }
    }
    return $utf;
}
?>
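For comparison, the same two conversions can be written with shifts and masks instead of divisions by powers of 64. This sketch is limited to the 1 to 4 byte forms allowed by RFC 3629 (code points up to U+10FFFF) and lets PCRE validate the input before decoding; `codepoint_to_utf8()` and `utf8_to_codepoints()` are hypothetical names, not functions from the note above.

```php
<?php
// Hypothetical encoder: one code point (0..0x10FFFF) -> UTF-8 bytes.
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80)    return chr($cp);
    if ($cp < 0x800)   return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    if ($cp < 0x10000) return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
                            . chr(0x80 | ($cp & 0x3F));
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical decoder: returns an array of code points, or false if $s is
// not valid UTF-8.
function utf8_to_codepoints($s)
{
    if (preg_match('//u', $s) !== 1) return false;   // PCRE validates first
    $cps = array();
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $c = ord($s[$i]);
        // the lead byte gives the sequence length and the top payload bits
        if ($c < 0x80)     { $cp = $c;        $n = 0; }
        elseif ($c < 0xE0) { $cp = $c & 0x1F; $n = 1; }
        elseif ($c < 0xF0) { $cp = $c & 0x0F; $n = 2; }
        else               { $cp = $c & 0x07; $n = 3; }
        // append 6 payload bits from each continuation byte
        for ($k = 1; $k <= $n; $k++) {
            $cp = ($cp << 6) | (ord($s[$i + $k]) & 0x3F);
        }
        $cps[] = $cp;
        $i += $n + 1;
    }
    return $cps;
}
```

To rebuild a string from a list of code points, implode('', array_map('codepoint_to_utf8', $codes)) is enough.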
www.tricinty.com
11-Jun-2008 04:13
<?php
/**
 * Encodes an ISO-8859-1 mixed variable to UTF-8 (PHP 4, PHP 5 compat)
 * @param    mixed    $input An array, associative or simple
 * @param    boolean  $encode_keys optional
 * @return   mixed    (utf-8 encoded $input)
 */
function utf8_encode_mix($input, $encode_keys=false)
{
    if (is_array($input))
    {
        $result = array();
        foreach ($input as $k => $v)
        {
            $key = ($encode_keys) ? utf8_encode($k) : $k;
            $result[$key] = utf8_encode_mix($v, $encode_keys);
        }
    }
    else
    {
        $result = utf8_encode($input);
    }

    return $result;
}
?>
klein at buchung-24 dot de
04-Jun-2008 04:52
If you don't use the function from ethan dot nelson at ltd dot org inside a class, you'll get an error (because of the $this-> call), so try this instead:

function utf_prepare(&$array)
{
    foreach($array AS $key => &$value)
    {
        if (is_array($value))
        {
            utf_prepare($value);
        } else
        {
            $value = utf8_encode($value);
        }
    }
}
www.qaiser.net
17-Apr-2008 08:26
That isUTF8 function is a killer...

Wouldn't something like

if ( preg_match( "~(\x00[\x80-\xff]|[\x00-\x07][\x00-\xff])~", $string ) ) { /* is utf */ };

be a lot more efficient? It doesn't take all the ranges into account, but it has to be a better method and a simple start, since it'll quit on the first successful match. Think of encoding and decoding a 1 MB string: not good. I'm having to work with 20+ MB XML files.
renardo13 at free dot fr
01-Apr-2008 06:26
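Another nice way to implement an isUTF8 function (it only works for text whose characters are all in ISO-8859-1: utf8_decode() turns any other character into '?', so the round trip fails even for valid UTF-8 such as Cyrillic text):

<?php
// True if decoding to ISO-8859-1 and re-encoding gives back the same bytes
function isUTF8($string)
{
    return (utf8_encode(utf8_decode($string)) == $string);
}
?>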
tacchete at gmail dot com
13-Dec-2007 06:05
The Byte Order Mark (BOM) is a known problem in combination with header() in a site's pages.

It also bites, for example, when sending headers, or when dynamically producing output in an encoding other than UTF-8 by means of XSLT (<xsl:output encoding="windows-1251"/>).

To remove all BOM characters from the page text:

1. remove the BOM from the main file;
2. write an output-buffer callback function:

<?php
header('Content-Type: text/html; charset=utf-8');
ob_start('ob');
// Output-buffer callback: strip every UTF-8 BOM (bytes EF BB BF) from the page
function ob($buffer)
{
    return str_replace("\xef\xbb\xbf", '', $buffer);
}
?>

it will remove the BOM from the output of included files;
3. stop worrying about BOMs in included files;
4. enjoy.
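The callback above removes every EF BB BF sequence anywhere in the page. If you only need to clean up a string you read yourself, such as file contents, a narrower sketch strips the mark only when it is the first thing in the text; `strip_utf8_bom()` is a hypothetical name.

```php
<?php
// Hypothetical helper: drop a UTF-8 byte order mark (bytes EF BB BF) only
// when it appears at the very start of $text; everything else is untouched.
function strip_utf8_bom($text)
{
    // (string) keeps the result a string on old PHP versions, where
    // substr() could return false for an empty remainder
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? (string) substr($text, 3) : $text;
}

echo strip_utf8_bom("\xEF\xBB\xBFHello"), "\n";   // prints Hello
```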
ethan dot nelson at ltd dot org
07-Nov-2007 07:11
This does the same thing as some of the posts below (minus the keys), but I thought I'd share it anyway because it is slightly more elegant. It's also a good example of using references, so that it could be used as a callback function.

  function utf_prepare(&$array) {

    foreach($array AS $key => &$value) {

      if (is_array($value)) {
        $this->utf_prepare($value);
      } else {
        $value = utf8_encode($value);
      }

    }

  }
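The same by-reference idea can be handed to array_walk_recursive() (available since PHP 5), which does the recursion for you and, like the function above, leaves keys alone. A minimal sketch, assuming the values are ISO-8859-1 strings; `latin1_to_utf8()` is a hypothetical stand-in for utf8_encode(), and the closures need PHP 5.3 or later.

```php
<?php
// Hypothetical pure-PHP stand-in for utf8_encode() (ISO-8859-1 -> UTF-8).
function latin1_to_utf8($s)
{
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        $c = ord($m[0]);
        return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
    }, $s);
}

// Sample data, assumed to be ISO-8859-1: a nested array and a non-string value.
$data = array(
    'name' => "Jos\xE9",
    'tags' => array("caf\xE9", 'plain'),
    'id'   => 7,
);

// array_walk_recursive() visits every leaf (never the nested arrays
// themselves); because the callback takes &$value it can replace the leaf in
// place. Keys are unchanged and non-string values are skipped.
array_walk_recursive($data, function (&$value) {
    if (is_string($value)) {
        $value = latin1_to_utf8($value);
    }
});
```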
luka8088 at gmail dot com
22-Jun-2007 07:49
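Simple conversion of numeric HTML entities to UTF-8:

function html_to_utf8 ($data)
    {
    // the /e modifier was removed in PHP 7, so a callback is used instead
    return preg_replace_callback("/\\&\\#([0-9]{3,10})\\;/", function ($m) { return _html_to_utf8($m[1]); }, $data);
    }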

function _html_to_utf8 ($data)
    {
    if ($data > 127)
        {
        $i = 5;
        while (($i--) > 0)
            {
            if ($data != ($a = $data % ($p = pow(64, $i))))
                {
                $ret = chr(base_convert(str_pad(str_repeat(1, $i + 1), 8, "0"), 2, 10) + (($data - $a) / $p));
                for ($i; $i > 0; $i--)
                    $ret .= chr(128 + ((($data % pow(64, $i)) - ($data % ($p = pow(64, $i - 1)))) / $p));
                break;
                }
            }
        }
        else
        $ret = "&#$data;";
    return $ret;
    }

Example:
echo html_to_utf8("a b &#269; &#263; &#382; &#12371; &#12395; &#12385; &#12431; ()[]{}!#$?* &lt; &#62;");

Output:
a b č ć ž こ に ち わ ()[]{}!#$?* &lt; &#62;
hillar dot petersen at gmail dot com
30-May-2007 11:29
In addition to my previous post: if your values are already in UTF-8, you may want to utf8_encode only the array keys. This will do it (it relies on the array_type() function from that post):

<?php
/**
 * (Recursively) utf8_encode all array keys.
 *
 * @param array $array
 * @return array with utf8_encoded keys
 */
function utf8_encode_array_keys($array)
{
  $array_type = array_type($array);

  if ($array_type == "map")
  {
    $result_array = array();

    foreach ($array as $key => $value)
    {
      if (is_array($value))
      {
        // recursion
        $result_array[utf8_encode($key)] = utf8_encode_array_keys($value);
      }
      else
      {
        // value is not an array, no recursion
        $result_array[utf8_encode($key)] = $value;
      }
    }

    return $result_array;
  }

  else if ($array_type == "vector")
  {
    // do not encode anything, just follow the value if it is an array
    $result_array = array();

    foreach ($array as $key => $value)
    {
      if (is_array($value))
      {
        // recursion
        $result_array[$key] = utf8_encode_array_keys($value);
      }
      else
      {
        // value is not an array, no recursion
        $result_array[$key] = $value;
      }
    }

    return $result_array;
  }

  return false;     // argument is not an array, return false
}
?>

Also note that both this operation (keys only) and the operation on both keys and values can be reversed by replacing "encode" with "decode".
hillar dot petersen at gmail dot com
29-May-2007 07:36
If you are interested in recursively converting ISO-8859-1-encoded arrays into UTF-8, this is one way to do it. It could use a small refactor, though. (I used it to prepare some ISO-8859-1 arrays for json_encode. Note that for this to work your values, and for associative arrays also your keys, must be ISO-8859-1-encoded.)

<?php
/**
 * (Recursively) utf8_encode each value in an array.
 *
 * @param array $array
 * @return array utf8_encoded
 */
function utf8_encode_array($array)
{
  if (is_array($array))
  {
    $result_array = array();

    foreach ($array as $key => $value)
    {
      if (array_type($array) == "map")
      {
        // encode both key and value
        if (is_array($value))
        {
          // recursion
          $result_array[utf8_encode($key)] = utf8_encode_array($value);
        }
        else
        {
          // no recursion
          if (is_string($value))
          {
            $result_array[utf8_encode($key)] = utf8_encode($value);
          }
          else
          {
            // do not re-encode non-strings, just copy data
            $result_array[utf8_encode($key)] = $value;
          }
        }
      }
      else if (array_type($array) == "vector")
      {
        // encode value only
        if (is_array($value))
        {
          // recursion
          $result_array[$key] = utf8_encode_array($value);
        }
        else
        {
          // no recursion
          if (is_string($value))
          {
            $result_array[$key] = utf8_encode($value);
          }
          else
          {
            // do not re-encode non-strings, just copy data
            $result_array[$key] = $value;
          }
        }
      }
    }

    return $result_array;
  }

  return false;     // argument is not an array, return false
}

/**
 * Determines array type ("vector" or "map"). Returns false if not an array at all.
 * (I hope a native function will be introduced in some future release of PHP, because
 * this check is inefficient and quite costly in worst case scenario.)
 *
 * @param array $array The array to analyze
 * @return string array type ("vector" or "map") or false if not an array
 */
function array_type($array)
{
  if (is_array($array))
  {
    $next = 0;

    $return_value = "vector";  // we have a vector until proved otherwise

    foreach ($array as $key => $value)
    {
      // strict comparison, so a string key is never taken as equal to $next
      if ($key !== $next)
      {
        $return_value = "map";  // we have a map
        break;
      }

      $next++;
    }

    return $return_value;
  }

  return false;    // not array
}
?>
nikooo adog bk adot ru - Nickolaz
03-May-2007 07:32
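You can use this simple code to convert Windows-1251 into Unicode (hexadecimal HTML entities):

    function rus2uni($str,$isTo = true)
    {
        // the source file must be saved in Windows-1251, so that 'ё', 'Ё' and
        // [а-я] are single bytes, like the chr($i) keys below
        $arr = array('ё'=>'&#x451;','Ё'=>'&#x401;');
        // bytes 192-255 (А..я) map to U+0410..U+044F
        for($i=192;$i<256;$i++)
            $arr[chr($i)] = '&#x4'.dechex($i-176).';';
        // replace spaces before/after Cyrillic letters with non-breaking spaces
        $str =preg_replace(array('@([а-я]) @i','@ ([а-я])@i'),array('$1&#x0a0;','&#x0a0;$1'),$str);
        return strtr($str,$isTo?$arr:array_flip($arr));
    }

This is useful with xml_parser (to parse Windows-1251 files as if they were UTF-8).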
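If you need actual UTF-8 bytes rather than entities, the same byte arithmetic works directly: in Windows-1251, bytes 0xC0 to 0xFF are А to я (U+0410 to U+044F, an offset of 0x350), and 0xA8/0xB8 are Ё/ё. A minimal sketch, assuming the input contains only ASCII plus these letters; `cp1251_letters_to_utf8()` is a hypothetical name, and iconv('windows-1251', 'UTF-8', $str) is the general-purpose alternative when the iconv extension is available.

```php
<?php
// Hypothetical helper: convert the Russian letters of a Windows-1251 string
// straight to UTF-8. Only 0xC0-0xFF (А-я), 0xA8 (Ё) and 0xB8 (ё) are mapped;
// every other byte is copied unchanged (assumed to be ASCII).
function cp1251_letters_to_utf8($str)
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c >= 0xC0) {
            $cp = $c + 0x350;          // 0xC0 -> U+0410 ... 0xFF -> U+044F
        } elseif ($c == 0xA8) {
            $cp = 0x401;               // Ё
        } elseif ($c == 0xB8) {
            $cp = 0x451;               // ё
        } else {
            $out .= $str[$i];
            continue;
        }
        // U+0401..U+0451 all fit the two-byte form 110bbbbb 10bbbbbb
        $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return $out;
}
```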
18-Apr-2007 09:36
I just reread what I wrote; sorry for the typos, it was a long day.

Here's the corrected code:

xml_tpl.php
<?php
    header("Content-Type: text/html;charset=ISO-8859-1");
    print "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
    $names = array('jack','bob','vanessa','catherine','valerie');
?>
<parent>
<?php foreach ($names as $name) { ?>
    <child name="<?php print $name ?>" />
<?php } ?>
</parent>

<?php
function create_xml() {
    // Capture the template's output instead of sending it to the browser
    ob_start();
    include "xml_tpl.php";
    $trapped_content = ob_get_contents();
    ob_end_clean();

    // Write the captured ISO-8859-1 text to disk as UTF-8
    $file_path = "./somefile.xml";
    $file_handle = fopen($file_path, 'w');
    fwrite($file_handle, utf8_encode($trapped_content));
    fclose($file_handle);
}

?>
penda ekoka
17-Apr-2007 11:45
Creating UTF-8 XML files:
this is something that has wasted a lot of my time; I hope this will spare you the headaches.

My method consists of creating an XML template that looks like this (this is probably optional; I'm sure you can use good ol' print or echo statements):

xml_tpl.php
<?php
header("Content-Type: text/html;charset=ISO-8859-1");
print "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$names = array('jack','bob','vanessa','catherine','valerie');
?>
<parent>
<?php foreach ($names as $name) { ?>
    <child name="<?php print $name ?>" />
<?php } ?>
</parent>

From a function or a method, I include the previous template and trap the output in an output buffer. The buffered content is then written to a file:

<?php
function create_xml() {
    ob_start();
    include "xml_tpl.php";
    $trapped_content = ob_get_contents();
    ob_end_clean();
    $file_path = "./somefile.xml";
    $file_handle = fopen($file_path, 'w');
    fwrite($file_handle, utf8_encode($trapped_content));
    fclose($file_handle);
}

?>

Some side notes:
- note that the utf8_encode() call goes inside the fwrite() call.
- when troubleshooting, make sure to transfer text files (XML included) and scripts in ASCII mode when using FTP. For some unknown reason my FTP client did not have XML set as an ASCII transfer candidate and was automatically transferring those files in binary. That little "feature" ended up costing me hours of frustration: the encoding information would just "vanish" during the transfer, and I kept scratching my head over why manually created UTF-8 files were not behaving as they should.
29-Mar-2007 03:37
<?php
// Converts Russian letters to hexadecimal HTML entities ($to_uni = true)
// or back again ($to_uni = false)
function unicon($str, $to_uni = true) {
    $cp = Array (
        "А" => "&#x410;", "а" => "&#x430;",
        "Б" => "&#x411;", "б" => "&#x431;",
        "В" => "&#x412;", "в" => "&#x432;",
        "Г" => "&#x413;", "г" => "&#x433;",
        "Д" => "&#x414;", "д" => "&#x434;",
        "Е" => "&#x415;", "е" => "&#x435;",
        "Ё" => "&#x401;", "ё" => "&#x451;",
        "Ж" => "&#x416;", "ж" => "&#x436;",
        "З" => "&#x417;", "з" => "&#x437;",
        "И" => "&#x418;", "и" => "&#x438;",
        "Й" => "&#x419;", "й" => "&#x439;",
        "К" => "&#x41A;", "к" => "&#x43A;",
        "Л" => "&#x41B;", "л" => "&#x43B;",
        "М" => "&#x41C;", "м" => "&#x43C;",
        "Н" => "&#x41D;", "н" => "&#x43D;",
        "О" => "&#x41E;", "о" => "&#x43E;",
        "П" => "&#x41F;", "п" => "&#x43F;",
        "Р" => "&#x420;", "р" => "&#x440;",
        "С" => "&#x421;", "с" => "&#x441;",
        "Т" => "&#x422;", "т" => "&#x442;",
        "У" => "&#x423;", "у" => "&#x443;",
        "Ф" => "&#x424;", "ф" => "&#x444;",
        "Х" => "&#x425;", "х" => "&#x445;",
        "Ц" => "&#x426;", "ц" => "&#x446;",
        "Ч" => "&#x427;", "ч" => "&#x447;",
        "Ш" => "&#x428;", "ш" => "&#x448;",
        "Щ" => "&#x429;", "щ" => "&#x449;",
        "Ъ" => "&#x42A;", "ъ" => "&#x44A;",
        "Ы" => "&#x42B;", "ы" => "&#x44B;",
        "Ь" => "&#x42C;", "ь" => "&#x44C;",
        "Э" => "&#x42D;", "э" => "&#x44D;",
        "Ю" => "&#x42E;", "ю" => "&#x44E;",
        "Я" => "&#x42F;", "я" => "&#x44F;"
    );

    if ($to_uni) {
        $str = strtr($str, $cp);
    } else {
        foreach ($cp as $c) {
            $cpp[$c] = array_search($c, $cp);
        }
        $str = strtr($str, $cpp);
    }

    return $str;
}

?>
emze at donazga dot net
17-Dec-2006 11:12
/*
Every function seen so far is incomplete or resource-consuming. Here are two: integer to UTF-8 sequence (i3u) and UTF-8 sequence to integer (u3i). Below is a code snippet that checks their behavior at the range boundaries.

Someday they might be built into PHP...
*/

function i3u($i) { // returns the UTF-8 sequence (1 to 6 bytes) for an integer code point
  $i=(int)$i; // integer?
  if ($i<0) return false; // positive?
  if ($i<=0x7f) return chr($i); // range 0
  if (($i & 0x7fffffff) <> $i) return '?'; // 31 bit?
  if ($i<=0x7ff) return chr(0xc0 | ($i >> 6)) . chr(0x80 | ($i & 0x3f));
  if ($i<=0xffff) return chr(0xe0 | ($i >> 12)) . chr(0x80 | ($i >> 6) & 0x3f)
      . chr(0x80  | $i & 0x3f);
  if ($i<=0x1fffff) return chr(0xf0 | ($i >> 18)) . chr(0x80 | ($i >> 12) & 0x3f)
      . chr(0x80 | ($i >> 6) & 0x3f) . chr(0x80  | $i & 0x3f);
  if ($i<=0x3ffffff) return chr(0xf8 | ($i >> 24)) . chr(0x80 | ($i >> 18) & 0x3f)
      . chr(0x80 | ($i >> 12) & 0x3f) . chr(0x80 | ($i >> 6) & 0x3f) . chr(0x80  | $i & 0x3f);
  return chr(0xfc | ($i >> 30)) . chr(0x80 | ($i >> 24) & 0x3f) . chr(0x80 | ($i >> 18) & 0x3f)
      . chr(0x80 | ($i >> 12) & 0x3f) . chr(0x80 | ($i >> 6) & 0x3f) . chr(0x80  | $i & 0x3f);
}

function u3i($s,$strict=1) { // returns integer on valid UTF-8 seq, NULL on empty, else FALSE
  // NOT strict: takes only DATA bits, present or not; strict: length and bits checking
  if ($s=='') return NULL;
  $l=strlen($s); $o=ord($s[0]);
  if ($o <= 0x7f && $l==1) return $o;
  if ($l>6 && $strict) return false;
  if ($strict) for ($i=1;$i<$l;$i++) if (ord($s[$i]) > 0xbf || ord($s[$i])< 0x80) return false;
  if ($o < 0xc2) return false; // no-go even if strict=0
  // strict: the length must match the lead byte; NOT strict: any length is accepted
  if ($o <= 0xdf) return ($l==2 || !$strict) ? (($o & 0x1f) << 6 | (ord($s[1]) & 0x3f)) : false;
  if ($o <= 0xef) return ($l==3 || !$strict) ? (($o & 0x0f) << 12 | (ord($s[1]) & 0x3f) << 6
     |  (ord($s[2]) & 0x3f)) : false;
  if ($o <= 0xf7) return ($l==4 || !$strict) ? (($o & 0x07) << 18 | (ord($s[1]) & 0x3f) << 12
     | (ord($s[2]) & 0x3f) << 6 |  (ord($s[3]) & 0x3f)) : false;
  if ($o <= 0xfb) return ($l==5 || !$strict) ? (($o & 0x03) << 24 | (ord($s[1]) & 0x3f) << 18
     | (ord($s[2]) & 0x3f) << 12 | (ord($s[3]) & 0x3f) << 6 |  (ord($s[4]) & 0x3f)) : false;
  if ($o <= 0xfd) return ($l==6 || !$strict) ? (($o & 0x01) << 30 | (ord($s[1]) & 0x3f) << 24
     | (ord($s[2]) & 0x3f) << 18 | (ord($s[3]) & 0x3f) << 12
     | (ord($s[4]) & 0x3f) << 6 |  (ord($s[5]) & 0x3f)) : false;
  return false;
}

// boundary behavior checking
$do=array(0x7f,0x7ff,0xffff,0x1fffff,0x3ffffff,0x7fffffff);
foreach ($do as $ii) for ($i=$ii;$i<=$ii+1; $i++) {
  $o=i3u($i);
  for ($j=0;$j<strlen($o);$j++) print "O[$j]=" . sprintf('%08b',ord($o[$j])) . ", ";
  print "c=$i, o=[$o].\n";
  print "Back: [$o] => [" . u3i($o) . "]\n";
}
sadikkeskin at hotmail dot com
21-Nov-2006 04:19
I wrote a function to convert UTF-8 text to ISO-8859-9 (Turkish). It is very useful if you want to use it with Ajax.
You can apply the same approach to other languages.
<?php
function str_encode ($string,$to="iso-8859-9",$from="utf8") {
    if ($to=="iso-8859-9" && $from=="utf8") {
        // UTF-8 byte pairs => ISO-8859-9 bytes
        $str_array = array(
            chr(196).chr(177) => chr(253),  // ı
            chr(196).chr(176) => chr(221),  // İ
            chr(195).chr(182) => chr(246),  // ö
            chr(195).chr(150) => chr