Then have a look at the UTF-8 page: http://en.wikipedia.org/wiki/UTF-8

Basically, you are accustomed to thinking 1 byte = 1 character, as is the case with extended ASCII. UTF-8 is a variable-length encoding scheme, so it can represent every Unicode code point (a bit over 1.1 million are possible): 1 character may take just 1 byte, or it may need a sequence of up to 4 bytes. The same goes for UTF-16, though there 1 character is always at least 2 bytes (and characters outside the Basic Multilingual Plane take 4). With UTF-32 there is no variable-length scheme at all: every character is a fixed 4 bytes, which is more than enough room for all of Unicode.
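You can see the difference directly in Python, which lets you encode the same character into each of the three schemes and count the bytes. The sample characters here are just illustrative picks: a plain ASCII letter, an accented letter, the euro sign, and a musical symbol outside the Basic Multilingual Plane.

```python
# Encoded byte length of one character in each Unicode encoding scheme.
# Using the -be (big-endian) variants so no byte-order mark is added.
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:  # A, é, €, 𝄞
    print(ch,
          len(ch.encode("utf-8")),      # 1 to 4 bytes
          len(ch.encode("utf-16-be")),  # 2 or 4 bytes
          len(ch.encode("utf-32-be")))  # always 4 bytes
```

Running this shows UTF-8 using 1, 2, 3, and 4 bytes respectively for those characters, UTF-16 using 2 bytes for the first three and 4 for the last (a surrogate pair), and UTF-32 using 4 bytes every time.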