Java: Unicode

Java Notes by Fred Swartz

Java and Unicode

Unicode is a standard that assigns a number (a code point) to each character of the world's writing systems. All characters and Strings in Java are stored as Unicode, which allows truly international programming.
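A minimal sketch of what this means in source code: each char holds a 16-bit Unicode value, and a \uXXXX escape writes a character by its hexadecimal number. The class name UnicodeChars and the helper codeOf are hypothetical, chosen only for illustration.

```java
public class UnicodeChars {
    // Returns the numeric Unicode value of a char (a char widens directly to int).
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char latinA = '\u0041';         // the same char as 'A'
        char eAcute = '\u00E9';         // LATIN SMALL LETTER E WITH ACUTE
        char alpha  = '\u03B1';         // GREEK SMALL LETTER ALPHA
        String nihao = "\u4F60\u597D";  // two CJK ideographs ("ni hao")

        System.out.println(latinA == 'A');              // true
        System.out.println(codeOf(eAcute));             // 233
        System.out.println(Integer.toHexString(alpha)); // 3b1
        System.out.println(nihao.length());             // 2
    }
}
```

Only ASCII is printed here, so the output does not depend on the console's character encoding.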

About Unicode

  • Unicode is developed independently of Java. When Java was started, every character Unicode had defined fit into 16 bits (2 bytes). Consequently, Java's char type was made 16 bits wide; this fixed 2-byte representation is sometimes called UCS-2, and Java strings are now interpreted as UTF-16.

    However, Unicode 4.0 (the version supported by Java 5) defines more characters than fit into two bytes. To accommodate this, Java 5 added facilities to work with surrogate pairs, which represent a single character above U+FFFF as a sequence of two char values. As a practical matter, most Java programs are written with the assumption that every character occupies exactly one 2-byte char. When these notes were written, the characters that don't fit into two bytes were rarely used, so this did not seem a serious deficiency; today, however, many emoji lie outside the 16-bit range, so code that processes arbitrary text should not rely on that assumption.

  • ASCII. Most programming languages before Java (C/C++, Pascal, Basic, ...) use an 8-bit encoding of ASCII (American Standard Code for Information Interchange). ASCII defines only 128 characters (7 bits), so the other 128 values of an 8-bit byte are often used for various extensions.
  • All of the world's major human languages can be represented in Unicode (including Chinese, Japanese, and Korean).
  • The first 128 characters of Unicode have the same values as the equivalent ASCII characters, and the first 256 characters are the same as ISO-8859-1 (Latin-1).
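The surrogate-pair facilities added in Java 5 can be sketched as follows. String.codePointAt, String.codePointCount, Character.charCount, Character.toChars, Character.isHighSurrogate and Character.isLowSurrogate are standard methods; the class SurrogatePairs and its helper codePointsOf are hypothetical names for this sketch, and the diamond operator assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class SurrogatePairs {
    // Collects the code points of s, treating each surrogate pair as one character.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);    // combines a high/low surrogate pair into one value
            result.add(cp);
            i += Character.charCount(cp); // 2 for characters above U+FFFF, otherwise 1
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies above U+FFFF, so it needs a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));
        String s = "A" + clef + "B";

        System.out.println(s.length());                             // 4 (char values)
        System.out.println(s.codePointCount(0, s.length()));        // 3 (characters)
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true
        for (int cp : codePointsOf(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
        // prints U+41, U+1D11E, U+42 on separate lines
    }
}
```

The point of the example is the mismatch between length(), which counts 16-bit char values, and codePointCount, which counts characters; a loop that steps by one char at a time would split the G clef into two meaningless halves.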
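The correspondence between Unicode, ASCII and Latin-1 can be checked directly with the standard charsets in java.nio.charset.StandardCharsets (Java 7 or later). The class AsciiLatin1 and the helper sameValuesBelow are hypothetical names for this sketch; it simply encodes each char value and compares the resulting byte with the Unicode value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiLatin1 {
    // True if every char value below 'limit' encodes in 'cs' as a single byte
    // whose unsigned value equals the char's Unicode value.
    static boolean sameValuesBelow(int limit, Charset cs) {
        for (int v = 0; v < limit; v++) {
            byte[] bytes = String.valueOf((char) v).getBytes(cs);
            if (bytes.length != 1 || (bytes[0] & 0xFF) != v) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameValuesBelow(128, StandardCharsets.US_ASCII));   // true
        System.out.println(sameValuesBelow(256, StandardCharsets.ISO_8859_1)); // true
        // Value 128 is outside ASCII; the encoder substitutes '?' (63), so this fails.
        System.out.println(sameValuesBelow(129, StandardCharsets.US_ASCII));   // false
    }
}
```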

Unicode Fonts

Although Java stores characters as Unicode, there are still practical operating system problems in entering or displaying many Unicode characters. Most fonts contain glyphs for only a small subset of all Unicode characters.

All logos and trademarks in this site are property of their respective owners. The comments are property of their posters, all the rest 1999-2006 by Java FAQs Daily Tips.