A special word of thanks also to Robin Beetge for hacking together a dirty
Perl script to generate XML tags for the syntax highlighting. Your little script
is saving me a lot of time.
This article deals primarily with the subject of multicast communication in
Java. I have, however, included some background information to refresh the
memory of those who have forgotten how much they know about data communications.
If the concepts "datagram", "IP fragment", "reliable protocol" or "multicast"
are not clear to you, try referring to the appendices. If the appendices appear
shrouded in mystery, go back to your data comms lecturer and demand a refund.
2. Sending multicast datagrams
In order to send any kind of datagram in Java, be it unicast, broadcast or
multicast, one needs a java.net.DatagramSocket.
One can optionally supply a local port to the DatagramSocket constructor to
which the socket must bind. This is only necessary if one needs other parties to
be able to reach us at a specific port. A third constructor takes the local port
AND the local IP address to which to bind. This is used (rarely) with
multi-homed hosts where it is important on which network adapter the traffic is
received. Neither of these is necessary for this example.
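To make the three constructor variants concrete, here is a minimal sketch. The class and method names are mine, purely for illustration, and I use port 0 (let the system pick a free port) and the loopback address only so that it runs anywhere.

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;

class SocketCreation {
    // Let the system pick any free local port: enough for a pure sender.
    static DatagramSocket anyPort() throws SocketException {
        return new DatagramSocket();
    }

    // Bind to a known local port so that other parties can reach us there.
    static DatagramSocket fixedPort(int port) throws SocketException {
        return new DatagramSocket(port);
    }

    // Bind to a port on one specific local address (multi-homed hosts).
    static DatagramSocket fixedPortAndAddress(int port, InetAddress local)
            throws SocketException {
        return new DatagramSocket(port, local);
    }

    public static void main(String[] args) throws Exception {
        // Port 0 asks the system for a free port; loopback is just an example address.
        try (DatagramSocket a = anyPort();
             DatagramSocket b = fixedPortAndAddress(0, InetAddress.getLoopbackAddress())) {
            System.out.println("a is bound to port " + a.getLocalPort());
            System.out.println("b is bound to " + b.getLocalAddress() + ":" + b.getLocalPort());
        }
    }
}
```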
This sample code creates the socket and a datagram to send and then simply
sends the same datagram every second:
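A minimal sketch of such a sender might look like the following. The group address 235.1.1.1, the 256-byte payload, the DGRAM_LENGTH name and the default of five sends are my own illustrative assumptions; pass a larger count on the command line to keep it running longer.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

class MulticastSender {
    // Illustrative values only; choose your own group, port and size.
    static final String MCAST_ADDR = "235.1.1.1"; // assumed class D address
    static final int DEST_PORT = 7777;
    static final int DGRAM_LENGTH = 256;          // assumed payload size

    static DatagramPacket buildPacket() throws IOException {
        byte[] b = new byte[DGRAM_LENGTH];
        return new DatagramPacket(b, b.length,
                InetAddress.getByName(MCAST_ADDR), DEST_PORT);
    }

    public static void main(String[] args) throws Exception {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        // A plain DatagramSocket is enough for sending multicasts.
        try (DatagramSocket socket = new DatagramSocket()) {
            DatagramPacket dgram = buildPacket();
            System.err.println("Sending " + dgram.getLength() + " bytes to "
                    + dgram.getAddress() + ':' + dgram.getPort());
            for (int i = 0; i < count; i++) {
                socket.send(dgram);   // the same datagram every time
                System.err.print(".");
                Thread.sleep(1000);   // once a second
            }
        }
    }
}
```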
DEST_PORT: an unsigned 16-bit integer, e.g. 7777
It is important to note the following points:
- DatagramPacket does not make a copy of the byte array given to it, so
any change to the byte array before the socket.send() will be reflected in
the data actually sent;
- One can send the same DatagramPacket to several different destinations
by changing the address and/or port using the setAddress() and setPort()
methods;
- One can send different data to the same destination by changing the byte
array referred to using setData() and setLength(), or by changing the
contents of the byte array the DatagramPacket is referring to;
- One can send a subset of the data in the byte array by manipulating
offset and length through the setData(byte[], int, int) and setLength()
methods (there is no separate setOffset()).
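These points can be checked without touching the network at all. The sketch below uses a helper of my own, payload(), to show what would go out on the wire if the packet were sent at that moment; the address and ports are arbitrary examples.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class PacketReuseDemo {
    // The bytes that socket.send(p) would transmit right now.
    static String payload(DatagramPacket p) {
        return new String(p.getData(), p.getOffset(), p.getLength(),
                StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        byte[] buf = "hello world".getBytes(StandardCharsets.US_ASCII);
        DatagramPacket dgram = new DatagramPacket(buf, buf.length,
                InetAddress.getByName("235.1.1.1"), 7777);

        buf[0] = 'j';                        // no copy was made, so the packet sees this
        System.out.println(payload(dgram));  // jello world

        dgram.setData(buf, 6, 5);            // offset 6, length 5: a subset of the array
        System.out.println(payload(dgram));  // world

        dgram.setPort(7778);                 // same packet, different destination port
        System.out.println(dgram.getPort()); // 7778
    }
}
```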
3. Receiving multicast datagrams
One can use a normal DatagramSocket to send and receive unicast and broadcast
datagrams and to send multicast datagrams, as seen in section 2. In order to
receive multicast datagrams, however, one needs a MulticastSocket. The reason
for this is simple: all the protocol layers below UDP need to do additional
work to control and receive multicast traffic.
The example given below opens a multicast socket, binds it to a specific
port and joins a specific multicast group:
byte[] b = new byte[BUFFER_LENGTH];
DatagramPacket dgram = new DatagramPacket(b, b.length);
MulticastSocket socket =
    new MulticastSocket(DEST_PORT); // must bind receive side
socket.joinGroup(InetAddress.getByName(MCAST_ADDR));
while (true) {
    socket.receive(dgram); // blocks until a datagram is received
    System.err.println("Received " + dgram.getLength() +
        " bytes from " + dgram.getAddress());
    dgram.setLength(b.length); // must reset length field!
}
Values for DEST_PORT and MCAST_ADDR must match
those in the sending code for the listener to receive the datagrams sent there.
BUFFER_LENGTH should be at least as long as the data we intend to
receive. If BUFFER_LENGTH is shorter, the data will be truncated
silently and dgram.getLength() will return b.length.
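The silent truncation is easy to see for yourself. In this sketch I send a unicast datagram over the loopback interface rather than a multicast, purely so that it runs on a single machine; the receive buffer behaves the same way in both cases. The class and method names are illustrative.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

class TruncationDemo {
    // Sends payloadLen bytes to ourselves and reports how many bytes
    // a receive buffer of bufferLen bytes ends up holding.
    static int receivedLength(int payloadLen, int bufferLen) throws IOException {
        try (DatagramSocket receiver =
                     new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not hang forever if the datagram is lost
            byte[] payload = new byte[payloadLen];
            sender.send(new DatagramPacket(payload, payload.length,
                    receiver.getLocalAddress(), receiver.getLocalPort()));

            byte[] b = new byte[bufferLen];
            DatagramPacket dgram = new DatagramPacket(b, b.length);
            receiver.receive(dgram);     // no exception, even if data was cut off
            return dgram.getLength();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("100 bytes into a 16 byte buffer: " + receivedLength(100, 16));
        System.out.println("10 bytes into a 16 byte buffer: " + receivedLength(10, 16));
    }
}
```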
The MulticastSocket.joinGroup() method causes the lower protocol
layers to be informed that we are interested in multicast traffic to a
particular group address. One may execute joinGroup() many times to
subscribe to different groups. If multiple MulticastSockets bind to the same
port and join the same multicast group, they will all receive copies of
multicast traffic sent to that group/port.
As with the sending side, one can re-use one's DatagramPacket and byte-array
instances. The receive() method sets length to the amount of data
received, so remember to reset the length field in the DatagramPacket before
subsequent receives, otherwise you will be silently truncating all your incoming
data to the length of the shortest datagram previously received.
One can set a timeout on the receive() operation using
socket.setSoTimeout(timeoutInMilliseconds). If the timeout is reached
before a datagram is received, receive() throws a
java.io.InterruptedIOException (from Java 1.4 onwards, more specifically its
subclass java.net.SocketTimeoutException). The socket is still valid and
usable for sending and receiving if this happens.
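As a rough sketch of how one might wrap this, the helper below (my own name, not part of any API) turns the timeout into a boolean result. It catches java.net.SocketTimeoutException, which is a subclass of InterruptedIOException.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

class ReceiveTimeoutDemo {
    // Returns true if a datagram arrived, false if the timeout expired first.
    static boolean receiveWithTimeout(DatagramSocket socket, DatagramPacket dgram,
                                      int timeoutMs) throws IOException {
        socket.setSoTimeout(timeoutMs);
        try {
            socket.receive(dgram);
            return true;
        } catch (SocketTimeoutException e) {
            return false; // the socket itself is still usable
        }
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket =
                     new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            byte[] b = new byte[512];
            DatagramPacket dgram = new DatagramPacket(b, b.length);
            // Nobody is sending to this port, so we expect the timeout.
            boolean got = receiveWithTimeout(socket, dgram, 200);
            System.out.println(got ? "received a datagram" : "timed out");
            System.out.println("socket closed? " + socket.isClosed());
        }
    }
}
```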
4. Multicasting and serialization
We have seen in the previous sections that we can multicast anything we can
fit into a byte array. Conveniently for us, one of those things is a serialized
object.
Object serialization is based on the assumption of a stream (ObjectOutputStream,
ObjectInputStream), so we have to do a little massaging to
squeeze this into our datagram paradigm. ObjectOutputStream writes
a stream header (containing a magic number and version number) to the stream on
construction and ObjectInputStream reads and checks this on
construction (ever wondered why ObjectInputStream's constructor
blocks until the ObjectOutputStream has been constructed on the
sending side?). This is the reason one always attaches the
ObjectOutputStream to the outgoing side of a socket before attaching the
ObjectInputStream to the incoming side.
In order to multicast objects, we need to arrange that the stream header
information is in each datagram. The simplest way to ensure this is to create a
new ObjectOutputStream for each datagram we send and a new
ObjectInputStream for each one we receive. We could probably avoid these
instantiations by extending the two classes in question, but I'm not going into
that here.
On the sending side, we can do something like this:
ByteArrayOutputStream b_out = new ByteArrayOutputStream();
ObjectOutputStream o_out = new ObjectOutputStream(b_out);
o_out.writeObject(new Message()); // Message: any Serializable class of yours
o_out.flush(); // make sure everything has reached b_out
byte[] b = b_out.toByteArray();
DatagramPacket dgram = new DatagramPacket(b, b.length,
    InetAddress.getByName(MCAST_ADDR), DEST_PORT); // multicast
socket.send(dgram);
In addition, on the receiving side we can do something like this:
byte[] b = new byte[65535];
ByteArrayInputStream b_in = new ByteArrayInputStream(b);
DatagramPacket dgram = new DatagramPacket(b, b.length);
socket.receive(dgram); // blocks
ObjectInputStream o_in = new ObjectInputStream(b_in);
Object o = o_in.readObject();
dgram.setLength(b.length); // must reset length field!
b_in.reset(); // reset so next read is from start of byte[] again
Note that one can re-use the ByteArray*Streams, byte arrays and
DatagramPackets on both sides. Only the Object*Streams need be recreated.
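If you would rather hide the stream juggling behind two small helpers, something like the following sketch would do. The class and method names are my own, and for simplicity it creates fresh ByteArray*Streams per datagram instead of re-using them. The main() method only simulates a receive by filling a packet by hand, so no network is needed to try it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.DatagramPacket;

class ObjectDatagrams {
    // Serializes one object, stream header included, into a byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bOut = new ByteArrayOutputStream();
        try (ObjectOutputStream oOut = new ObjectOutputStream(bOut)) {
            oOut.writeObject(obj);
        } // closing flushes everything into bOut
        return bOut.toByteArray();
    }

    // Reads one object back from exactly the bytes that were received.
    static Object fromDatagram(DatagramPacket dgram)
            throws IOException, ClassNotFoundException {
        ByteArrayInputStream bIn = new ByteArrayInputStream(
                dgram.getData(), dgram.getOffset(), dgram.getLength());
        try (ObjectInputStream oIn = new ObjectInputStream(bIn)) {
            return oIn.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = toBytes("hello group");

        // Simulate socket.receive(): data lands in a big buffer, length is set.
        byte[] b = new byte[65535];
        System.arraycopy(wire, 0, b, 0, wire.length);
        DatagramPacket dgram = new DatagramPacket(b, b.length);
        dgram.setLength(wire.length);

        System.out.println(fromDatagram(dgram)); // hello group
    }
}
```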
5. Datagram sizes
The IP spec allows for datagrams up to 65535 bytes in length, including
the IP header. If the underlying protocol layers cannot support this size
(Ethernet's MTU is 1500 bytes), IP fragments the datagrams into several smaller
datagrams. On the receive side, IP reassembles the datagram before delivering it
to higher layer protocols, like UDP. If any of the fragments do not arrive at
the destination, the entire datagram is discarded, i.e. there is no partial
delivery of IP and therefore UDP datagrams.
Since the normal IP header is 20 bytes long and the UDP header is always 8
bytes long, one would expect the maximum UDP data length to be 65535-8-20 =
65507. Somehow, however, the combination of Win2k and JDK1.3.1 manages to
successfully send as much as 65527 bytes per datagram. I would be interested to
hear whether users of a real operating system experienced the same.
It is very important to note that although the IP spec allows for
datagrams up to 65535 bytes, it only requires implementations to support up to
576 byte IP datagrams including IP and higher protocol headers. Since the
maximum IP header length is 60 bytes and the UDP header length is 8, it is safe
to send up to 508 byte UDP datagrams and expect the receiving side to
handle them (yes, even your Palm Pilot if it has a TCP/IP stack). I have not come
across a full size (i.e. non-embedded) system that cannot handle the full 64k-1,
though.
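The arithmetic behind the two limits can be spelled out as below. The constant names are mine; the largest IPv4 header is taken as 60 bytes because the header length field is 4 bits wide and counts 32-bit words (15 x 4).

```java
class DatagramSizes {
    static final int MAX_IP_DATAGRAM = 65535; // largest IP datagram, headers included
    static final int MIN_SUPPORTED = 576;     // size every host must be able to accept
    static final int MIN_IP_HEADER = 20;      // IP header without options
    static final int MAX_IP_HEADER = 15 * 4;  // 4-bit length field, in 32-bit words
    static final int UDP_HEADER = 8;

    // Largest UDP payload that fits in one IP datagram at all.
    static int maxUdpPayload() {
        return MAX_IP_DATAGRAM - MIN_IP_HEADER - UDP_HEADER;
    }

    // Largest UDP payload every compliant receiver must be able to handle,
    // assuming the worst case of a maximum-length IP header.
    static int safeUdpPayload() {
        return MIN_SUPPORTED - MAX_IP_HEADER - UDP_HEADER;
    }

    public static void main(String[] args) {
        System.out.println("theoretical maximum: " + maxUdpPayload());  // 65507
        System.out.println("always safe:         " + safeUdpPayload()); // 508
    }
}
```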
6. Effect of fault conditions
UDP does not guarantee delivery or notification of non-delivery. If you send
a unicast packet to a host that does not exist, is down or is not listening on
that port, you will not know about it. If you send a broadcast or multicast
packet and nobody receives it or is even listening, you will not know about it.
On Win2k the network adapter settings are reset if it is detected that the
link is not available. With Ethernet, for example, if you unplug the LAN cable
so that there is no link available, Win2K detects this and effectively shuts
down the adapter at the IP level. It clears its IP address and will not attempt
to use it. The effect of this is that sockets cannot bind to a port, so all new
*Socket calls fail. Sockets that are already created function correctly if you
unplug and replug the cable.
On my notebook, local communication (sender and listener on the same machine)
began to fail when I unplugged the LAN cable. It gets nastier than this: a
listener started before I unplugged the cable could not hear traffic from a
sender started after I had plugged the cable back in. But wait, there's more! I
started another listener after the cable was back in, and then both it and the
listeners started before I unplugged the cable received the multicasts again.
On WinNT4, my experience has been that the adapter is not "shut down" when the
cable is unplugged and one does not have these weird effects.
7. Multiple listeners and unicast packets
Since one can send unicast packets using the same MulticastSocket instance as
for one's multicasts, it makes sense to mention how unicasts are handled when
there is more than one listener, which can only be when they are all on the same
machine.
Unicast traffic sent to the port will be received by only one of the
listeners with a socket bound to the port. With my test setup, the last socket
to bind to the port receives the unicast traffic. On WinNT4, the first one to
bind receives it. I don't know of any rules covering how unicast traffic should
be handled in the case of multiple listeners, so don't rely on it being handled
in any particular way.
8. Further reading
See the RFCs for IP (791), UDP (768) and IP multicasting (1112). Compared to
some of the ISO and IEEE stuff I've seen, they're recreational reading material.
APPENDICES
A. Protocol "reliability"
You may have heard TCP described as a "reliable" protocol and UDP as an
"unreliable" protocol. It is easy, but dangerous, to jump to conclusions about
what this means. Being "reliable" does not mean that TCP will deliver your data
under all circumstances (try unplugging the LAN cable for a day and see). Being
"unreliable" does not mean UDP will arbitrarily throw away your data.
"Unreliable" is a loaded term and I prefer to use "non-reliable", which indicates
more that it lacks the guarantees of a "reliable" protocol, rather than
labelling it as some sort of untrustworthy servant.
Enough about what reliability, or lack of it, does not mean. A "reliable"
protocol like TCP guarantees that it will deliver your data correctly and in
order of transmission or inform you that it could not.
A "non-reliable" protocol, like UDP, does what is called "best-effort
delivery". Essentially, given enough available resources (buffers, bandwidth
etc) UDP will deliver your data correctly. It will not deliver incorrect data,
but it could deliver data in a different order from that in which it was sent,
or not at all.
The NFS (Network File System) protocol uses UDP to communicate between the
server and the client. IMHO, this is a testament to the "reliability" of UDP as
a transport. Of course, NFS implements its own reliability mechanisms (timeouts
and retransmissions) on top of UDP to be sure.
B. Stream vs Datagrams
The differences between TCP and UDP don't end with reliability. They are
fundamentally different in their data model. TCP is stream based and UDP is
datagram based. This means that with UDP, if data is lost or delivered out of
order, it happens with datagram granularity.
Since TCP is stream based, it does not honour your message boundaries. If you
implement your own message passing system using TCP, you will find that doing a
send() call of n bytes on one side of the connection does not necessarily result
in n bytes being returned by the "corresponding" read() call on the other side.
TCP rides on top of IP, which is datagram based, so there is packetizing
happening when TCP data is sent, but TCP is at liberty to split your send() up
into several actual packets or to coalesce several send() operations into one
packet.
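The usual way around this, if you want messages over TCP, is to put the boundaries back yourself. One common approach (my choice here, not the only one) is a length prefix: write a 4-byte length, then the message, and on the other side read exactly that many bytes. In this sketch a byte array stands in for the socket's streams so that it runs without a connection; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedFraming {
    // Writes a 4-byte length followed by the message bytes.
    static void writeMessage(DataOutputStream out, byte[] msg) throws IOException {
        out.writeInt(msg.length);
        out.write(msg);
        out.flush();
    }

    // Reads exactly one message, however the stream chopped it up in transit.
    static byte[] readMessage(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] msg = new byte[len];
        in.readFully(msg); // keeps reading until all len bytes have arrived
        return msg;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for socket.getOutputStream() / socket.getInputStream().
        ByteArrayOutputStream pipe = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(pipe);
        writeMessage(out, "first".getBytes(StandardCharsets.UTF_8));
        writeMessage(out, "second message".getBytes(StandardCharsets.UTF_8));

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(pipe.toByteArray()));
        System.out.println(new String(readMessage(in), StandardCharsets.UTF_8));
        System.out.println(new String(readMessage(in), StandardCharsets.UTF_8));
    }
}
```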
C. nCasting
In the case of TCP, the number of intended recipients of transmitted data is
always exactly one (like a telephone call). In general, this is not the case.
Everybody is aware of broadcast communication (like radio or television) where
there is one sender and any number of recipients. As most people know, the same
exists in data communications.
Broadcast communication is frowned upon by network admins because they spend
a huge portion of their budget trying to provide bandwidth using network
switches, only to have this all defeated by broadcast traffic being delivered to
every segment of their LANs. Broadcast communication also causes an interrupt
and the associated processing on every node on the connected LAN, always. One's
Ethernet hardware, for example, cannot determine whether the host is interested
in any particular broadcast packet and must therefore deliver the packet to the
upper protocol layers to make the decision. This is the reason Doom 1.1 network
games were banned on many LANs: the number of broadcasts used caused very high
interrupt processing loads on all the hosts on networks where it was played.
Thankfully, Doom 1.2 came along to avert boredom during my time at university.
Where broadcasting is a mechanism intended to deliver data to all hosts on a
network or subnetwork, multicasting is a mechanism to deliver data to a group of
interested hosts on a network. Many network adapters provide some sort of
rudimentary multicast filtering. In many cases, a host not interested in a
particular multicast group will not even be interrupted by its network hardware.
In the TCP/IP protocol family, UDP is used for broadcast and multicast (and
some unicast) traffic. As a result, broadcast and multicast traffic is datagram
based and non-reliable.
Reliability, datagram vs stream based and unicast vs multicast/broadcast
traffic are all orthogonal concepts. It is not inconceivable to have a reliable,
stream based multicast protocol, or any other combination of those features.
D. IP Multicast addresses
All class D IP addresses are multicast addresses. Class D IP addresses are those
that begin with 1110, that is, all addresses from 224.0.0.0 to 239.255.255.255.
Some are pre-assigned for specific applications, but most are available for
forming ad hoc multicast groups. There is a mapping between IP multicast
addresses and Ethernet addresses, described in RFC1112: "An IP host
group address is mapped to an Ethernet multicast address by placing the
low-order 23-bits of the IP address into the low-order 23 bits of the Ethernet
multicast address 01-00-5E-00-00-00 (hex). Because there are 28 significant bits
in an IP host group address, more than one host group address may map to the
same Ethernet multicast address."
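The mapping quoted above is simple enough to sketch in a few lines. The class and method names are mine; the code keeps the low-order 23 bits of the IPv4 group address and drops them into 01-00-5E-00-00-00. The example addresses are arbitrary and chosen to show two groups landing on the same Ethernet address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class MulticastMac {
    // Maps an IPv4 multicast group address to its Ethernet multicast address.
    static String ethernetAddressFor(String group) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(group);
        byte[] ip = addr.getAddress();
        if (ip.length != 4 || !addr.isMulticastAddress()) {
            throw new IllegalArgumentException(group + " is not an IPv4 multicast address");
        }
        // Low-order 23 bits: drop the top bit of the second byte, keep the rest.
        return String.format("01-00-5E-%02X-%02X-%02X",
                ip[1] & 0x7F, ip[2] & 0xFF, ip[3] & 0xFF);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ethernetAddressFor("224.0.0.1"));   // 01-00-5E-00-00-01
        // Differs from 224.0.0.1 only in the bits that are dropped,
        // so both groups share one Ethernet address.
        System.out.println(ethernetAddressFor("224.128.0.1")); // 01-00-5E-00-00-01
        System.out.println(ethernetAddressFor("235.1.2.3"));   // 01-00-5E-01-02-03
    }
}
```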
Copyright 2000-2004 Maximum Solutions, South Africa
Reprint Rights. Copyright subsists in all the material included
in this email, but you may freely share the entire email with anyone you feel
may be interested, and you may reprint excerpts both online and offline provided
that you acknowledge the source as follows: This material from The Java(tm)
Specialists' Newsletter by Maximum Solutions (South Africa). Please contact
Maximum Solutions for more
information.