
The Java Specialists' Newsletter [Issue 036] - Using Unicode Variable Names
Author: Dr. Heinz M. Kabutz
JDK version:
Category: Language
You can subscribe from our home page:
http://www.javaspecialists.co.za (which also hosts all previous issues,
available free of charge).
Welcome to the 36th edition of "The Java(tm) Specialists' Newsletter". This
week, we will look at the strange things that happen when we try to use unicode
characters in our code.
I am sitting outside in my garden, with beautiful sunshine and a pitbull
terrier at my command. Approximately a month ago, the biggest software vendor
in South Africa went bankrupt, severely affecting the availability of software
in this country. Fortunately for me, I have friends in convenient places: I
purchased the software that I needed (Dragon NaturallySpeaking) from Amazon in
Germany and had it shipped to infor AG, who I have spoken about in other
newsletters - they very kindly shipped it down to the end of the earth.
As a result of using Dragon NaturallySpeaking, you will probably notice that
my newsletters will have an even more conversational style than before. I am
always looking at ways in which I can improve my newsletters and serve you
better. Please remember to forward this newsletter to friends and colleagues who
are interested in Java.
A special welcome to country No 56, Malta! My wife's previous boss at a hotel
was the Maltese ambassador for Cape Town, which was really cool, as he had
diplomatic immunity from parking fines and speeding fines. Mind you, traffic
laws are rather lax in this country, I have only had one speeding fine in my
life, and I drive an Alfa Romeo!
South Africa has just become the cheapest country in the world! We are the
first country where a Big Mac costs less than US$ 1. It is cheaper here even
than in the Philippines and China. I had a good response to my advert for my Java
Course (thank you for your patience in this regard) and so I definitely want to
develop the idea of running courses in South Africa, combined with a holiday.
How do you go from being an OO beginner to an OO guru? Simple answer:
Experience! But what if you can't wait 10 years to get that experience? Simple
answer: Design Patterns! How can you learn Design Patterns in a relaxed setting
from someone who has used them in the real world? Simple answer:
Ask about my new course "Design Patterns - The Timeless Way of Coding".
1707 members are currently subscribed from 56 countries
Using Unicode Variable Names
A few months ago, I was reading a book written by the authors of Java, when I
stumbled across a piece of code that was using Unicode characters as variable
names. Being the curious type, I immediately tried writing a piece of code that
used funny characters. Easier said than done! I don't know of any Java IDE that
supports Unicode. The common e-mail systems in this world would also choke like
a dog on a chicken bone if I sent you a newsletter containing Unicode characters.
Before I get into how we could use Unicode characters in our
variables, let's just take a step back and think about it: Imagine being called
in by a Japanese company with a memory leak in their program that they want
fixed (one of the most common tasks I have been asked to perform), and imagine
that in their company they used Japanese characters for their variables.
Yes, it would compile if you follow the ideas in this newsletter, but what would
the result be for me? I would probably pack my bags and head back home! It's bad
enough having to read code where the variable names are in German or in
Afrikaans; I cannot imagine trying to understand code where I don't even know
the characters used in variable names!
Since I could not find an IDE that supported Unicode, my first job was to
write a Unicode editor. Also easier said than done. I had learned many years ago
that Writers and Readers are used for Unicode characters, but I had never really
used Unicode before. My first approach at reading and writing Unicode files
looked something like this:
  public void load() throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(filename));
    String s;
    while((s = in.readLine()) != null) {
      // ...
    }
  }
Did you know that FileReader extends InputStreamReader? In its constructor it
constructs a FileInputStream that it passes to its parent. InputStreamReader
has a constructor that takes the encoding to use when reading as an argument.
Unfortunately, FileReader does not expose that constructor; it simply uses an
operating-system dependent default encoding. One cannot but wonder what the
author of the FileReader had been smoking the day he/she wrote that code ...
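To see which encoding you would silently get on a particular machine, you can ask the JVM. The following is only a small sketch: the class name DefaultEncodingDemo and the helper readerDefault() are made up for illustration, and the names printed will differ from one platform and JDK to the next.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the newsletter's code.
public class DefaultEncodingDemo {
  // Returns the name of the encoding an InputStreamReader falls back to
  // when it is constructed without an explicit encoding.
  static String readerDefault() throws IOException {
    try (InputStreamReader r =
             new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
      return r.getEncoding();
    }
  }

  public static void main(String[] args) throws IOException {
    // Both lines describe the platform default; the reader may report
    // a historical alias of the same charset.
    System.out.println("Default charset:        " + Charset.defaultCharset());
    System.out.println("Reader without charset: " + readerDefault());
  }
}
```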
(Actually, when I wrote the Sun Microsystems Java programmer examination a
few years ago, the only non-GUI question that I got wrong was a question
relating to reading ISO-8859-1 data. Perhaps there has always been a hole in my
knowledge regarding this topic.)
Should you want to read a file in an encoding different to the standard one,
you have to bypass FileReader and construct the InputStreamReader yourself:
  public void load() throws IOException {
    BufferedReader in = new BufferedReader(
      new InputStreamReader(
        new FileInputStream(filename), "UTF-16BE"));
    String s;
    while((s = in.readLine()) != null) {
      // ...
    }
  }
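To convince yourself that the encoding argument really matters, here is a small round-trip sketch. The class EncodingRoundTrip and its write() and read() helpers are my own names for illustration; it writes a short string as UTF-16BE into a temporary file and reads it back twice, once with the matching encoding and once with ISO-8859-1.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical demo class, not part of the newsletter's code.
public class EncodingRoundTrip {
  // Writes the text to the file using the named encoding.
  static void write(File f, String text, String encoding) throws IOException {
    try (Writer out = new OutputStreamWriter(new FileOutputStream(f), encoding)) {
      out.write(text);
    }
  }

  // Reads the whole file back, decoding it with the named encoding.
  static String read(File f, String encoding) throws IOException {
    StringBuilder buf = new StringBuilder();
    try (Reader in = new BufferedReader(
             new InputStreamReader(new FileInputStream(f), encoding))) {
      int c;
      while ((c = in.read()) != -1) buf.append((char) c);
    }
    return buf.toString();
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("roundtrip", ".txt");
    f.deleteOnExit();
    String text = "pi = \u03C0";
    write(f, text, "UTF-16BE");
    // Same encoding on both sides: the text survives.
    System.out.println(read(f, "UTF-16BE").equals(text));   // true
    // Wrong encoding: every byte is turned into its own character.
    System.out.println(read(f, "ISO-8859-1").equals(text)); // false
  }
}
```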
Without further ado, here is the code for a Unicode text editor. It allows
you to insert Unicode characters by entering their decimal values and pressing
the appropriate button. For the design, I have followed an approach I saw a few
years ago on jGuru, where all the GUI elements are created lazily. It makes the
GUI code very nicely maintainable, as you never have to worry in what order
elements are constructed.
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;

public class UnicodeEditor extends JFrame {
  private JPanel buttonPanel;
  private JScrollPane editorPanel;
  private JTextArea editor;
  private final String filename;
  private final String encoding;

  public UnicodeEditor(String filename, String encoding)
      throws IOException {
    this.filename = filename;
    this.encoding = encoding;
    getContentPane().add(getButtonPanel(), BorderLayout.NORTH);
    getContentPane().add(getEditorPanel(), BorderLayout.CENTER);
    load();
  }

  // All GUI elements are created lazily on first access.
  protected JPanel getButtonPanel() {
    if (buttonPanel == null) {
      buttonPanel = new JPanel();
      JButton unicodeInsert = new JButton("Insert Unicode:");
      final JTextField unicodeField = new JTextField(8);
      JButton saveExit = new JButton("Save & Exit");
      unicodeInsert.addActionListener(new ActionListener() {
        public void actionPerformed(ActionEvent e) {
          // the text field holds the decimal value of the character
          getEditor().insert(
            "" + (char)Integer.parseInt(unicodeField.getText()),
            getEditor().getCaretPosition());
        }
      });
      saveExit.addActionListener(new ActionListener() {
        public void actionPerformed(ActionEvent e) {
          try {
            save();
            System.exit(0);
          } catch(IOException ex) { ex.printStackTrace(); }
        }
      });
      buttonPanel.add(unicodeInsert);
      buttonPanel.add(unicodeField);
      buttonPanel.add(saveExit);
    }
    return buttonPanel;
  }

  protected JTextArea getEditor() {
    if (editor == null) {
      editor = new JTextArea();
    }
    return editor;
  }

  protected JScrollPane getEditorPanel() {
    if (editorPanel == null) {
      editorPanel = new JScrollPane(getEditor());
    }
    return editorPanel;
  }

  protected void load() throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(
      new FileInputStream(filename), encoding));
    StringBuffer buf = new StringBuffer();
    int i;
    while((i = in.read()) != -1) buf.append((char)i);
    in.close();
    getEditor().setText(buf.toString());
  }

  protected void save() throws IOException {
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
      new FileOutputStream(filename), encoding));
    char[] text = getEditor().getText().toCharArray();
    for (int i=0; i<text.length; i++) out.write(text[i]);
    out.close();
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1)
      throw new IllegalArgumentException(
        "usage: UnicodeEditor filename [encoding]");
    String encoding = (args.length == 2)?args[1]:"UTF-16BE";
    UnicodeEditor editor = new UnicodeEditor(args[0], encoding);
    editor.setSize(500,500);
    editor.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    editor.setVisible(true);
  }
}
By default the editor uses the UTF-16BE format, which stands for 16-bit Unicode
Transformation Format, big-endian byte order. You can specify any other
encoding when you start the editor, such as UTF-8, ISO-8859-1, etc. But, before
we use this editor, we first need to have a file containing Unicode characters.
I've written a code generator that generates two files, MathsSymbols.java and
MathsSymbolsTest.java:
import java.io.*;

public class UnicodeVariableGenerator {
  public static void generateMathsSymbols() throws IOException {
    PrintWriter out = new PrintWriter(new OutputStreamWriter(
      new FileOutputStream("MathsSymbols.java"), "UTF-16BE"));
    out.println("public interface MathsSymbols {");
    out.print( "  public static final double ");
    out.print((char)960);
    out.println(" = 3.14159265358979323846;");
    out.print( "  public static final double ");
    out.print((char)949);
    out.println(" = 2.7182818284590452354;");
    out.println("}");
    out.close();
  }

  public static void generateMathsSymbolsTest() throws IOException {
    PrintWriter out = new PrintWriter(new OutputStreamWriter(
      new FileOutputStream("MathsSymbolsTest.java"), "UTF-16BE"));
    out.println("public class MathsSymbolsTest implements MathsSymbols {");
    out.println("  public static void main(String args[]) {");
    out.println("    System.out.println(\"The value of PI is: \" + \\u03C0);");
    out.println("    System.out.println(\"The value of E is: \" + \\u03B5);");
    out.println("  }");
    out.println("}");
    out.close();
  }

  public static void main(String[] args) throws IOException {
    generateMathsSymbols();
    generateMathsSymbolsTest();
  }
}
I won't include the code for MathsSymbols.java and MathsSymbolsTest.java;
please run the UnicodeVariableGenerator class to generate that code. I already
bomb out enough mailing systems by sending my newsletters in HTML (*evil grin*),
no use in causing more trouble by using Unicode. Once you've run the
UnicodeVariableGenerator, please load the MathsSymbols.java file with the
UnicodeEditor, using UTF-16BE, and have a look at it: you should see the Greek
symbol for PI.
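The generator refers to the same two characters in two different ways: as the decimal values 960 and 949 in MathsSymbols.java, and as the Unicode escapes \u03C0 and \u03B5 in MathsSymbolsTest.java. The sketch below shows how the two notations line up, and that a Unicode escape can be used as an identifier even in a plain ASCII source file. The class UnicodeEscapeDemo and its helper methods are names I made up for this illustration.

```java
// Hypothetical demo class, not part of the newsletter's code.
public class UnicodeEscapeDemo {
  // The escape below is translated to the Greek letter pi before the
  // compiler splits the source into tokens, so this declares a constant
  // whose name is that single Greek letter.
  static final double \u03C0 = 3.14159265358979323846;

  // Turns a decimal character value into the escape notation used in
  // Java source, for example 960 gives backslash, u, 03C0.
  static String escapeFor(int decimal) {
    return String.format("\\u%04X", decimal);
  }

  // Reports whether the character may start a Java identifier.
  static boolean isIdentifierStart(int decimal) {
    return Character.isJavaIdentifierStart((char) decimal);
  }

  public static void main(String[] args) {
    System.out.println("960 -> " + escapeFor(960));
    System.out.println("949 -> " + escapeFor(949));
    System.out.println("960 may start an identifier: " + isIdentifierStart(960));
    System.out.println("The value of PI is: " + \u03C0);
  }
}
```

This file needs no -encoding switch when compiling, since it contains only ASCII characters.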
The last "trick" you need to know about is how to compile the
MathsSymbols.java and MathsSymbolsTest.java. If you open the files with notepad
or vi, you will probably see a rather strangely formatted file, with two bytes
being used per character. When you compile these files, you therefore have to
specify the character encoding used:
javac -encoding UTF-16BE MathsSymbols*.java
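If you are curious what those two bytes per character look like, the following sketch dumps the bytes of a two-character string in UTF-16BE and, for comparison, in UTF-8. EncodedBytesDemo and its hex() helper are illustrative names of my own, not something from the JDK.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of the newsletter's code.
public class EncodedBytesDemo {
  // Returns the bytes of the text in the given encoding as
  // space-separated two-digit hex values.
  static String hex(String text, String encoding)
      throws UnsupportedEncodingException {
    StringBuilder sb = new StringBuilder();
    for (byte b : text.getBytes(encoding)) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    String text = "A\u03C0"; // the letter A followed by the Greek letter pi
    System.out.println(hex(text, "UTF-16BE")); // 00 41 03 C0
    System.out.println(hex(text, "UTF-8"));    // 41 CF 80
  }
}
```

The zero byte in front of every ASCII character is what makes the generated files look so odd in notepad or vi.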
That's it! And it has kept me busy longer than just about all the other
newsletters to try and get it right. Another interesting variation of this is
where David Treves (who I met through a really cool advanced Java chat list -
JavaDesk on YahooGroups - where you get shouted at if you ask beginner
questions) tried to write Hebrew to and read it from a database. He doggedly tried to get
it working until eventually he succeeded - after I had given up hope of ever
figuring it out. Stay tuned for the next few weeks to see how he did it.
Until next week, when we celebrate our first anniversary as the most
interesting Java newsletter on the Internet.
Kind regards
Heinz
Copyright 2000-2004 Maximum Solutions, South Africa
Reprint Rights. Copyright subsists in all the material included
in this email, but you may freely share the entire email with anyone you feel
may be interested, and you may reprint excerpts both online and offline provided
that you acknowledge the source as follows: This material from The Java(tm)
Specialists' Newsletter by Maximum Solutions (South Africa). Please contact
Maximum Solutions for more
information.
Java and Sun are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Maximum Solutions is independent
of Sun Microsystems, Inc.