The Java Specialists' Newsletter [Issue 047] - Lack of Streaming
leads to Screaming
Author: Dr. Heinz M. Kabutz
JDK version:
Category: Language
You can subscribe from our home page:
http://www.javaspecialists.co.za (which also hosts all previous issues,
available free of charge
Welcome to the 47th edition of The Java(tm) Specialists' Newsletter,
read by over 3600 Java programmers in
82 countries. I
have put my Mauritius trip with photos on my website
http://www.javaspecialists.co.za,
please have a look under "Courses". Warning: prolonged looking at that
webpage is known to cause envy - please enter at own risk!
I am planning Java and Design Patterns courses in South Africa for June.
Please have a look at
our website for more information.
Is your company thinking of venturing into new business but you do not have
the necessary resources? Are you scared of subcontracting work to a non-English
speaking country due to the communication problems involved? (no offense to my
non-English readers - this is my advert, ok? Then South Africa is your dream
come true. In South Africa we speak and write English fluently, so you will not
have the typical communication problems that you would find in non-English
speaking countries. Our software developers are highly skilled, very good
at solving problems and able to pull up their sleeves and get to work.
Internationally, South Africans are known for their hard work and dedication to
the task at hand. If your company is looking for such resources, please
contact me by simply replying to this email, and I will personally see to it
that you are contacted within 24 hours.
In the last two weeks I received two questions from readers who ran out of
memory when trying to read a big object from the database. In this newsletter I
want to explore how you can read a big object from a database without killing
your poor JVM, and be scalable as well.
Lack of Streaming leads to Screaming
How do you retrieve big objects from the database in Java? Say you have a
database containing previews of movies in DivX format, stored as IMAGE
columns. How do you retrieve the 25 megabyte file from the database using JDBC?
Simple. We write a SELECT statement, execute it, and say
result_set.getBytes(1). We run the
code and it works well for small movie snippets, but as soon as we have a 25
megabyte file, our poor JVM throws an OutOfMemoryError. What's
annoying about an OutOfMemoryError is that the stack trace is not
filled in (because hey, you've run out of memory!), so you cannot exactly
determine where the error occurred, unless you add trace logging. What makes it
even more tricky is that some JDBC drivers try to be too clever, resulting in
OutOfMemoryErrors.
Let's look at some test code. I have written two test classes,
TestDatabaseBlobInsert and TestDatabaseBlobFetch. What
surprised me was that of the several drivers that I tested (DataDirect, iNet
SPRINTA, Avenir, MS SQL Server Type 4, JDBC/ODBC bridge), the JDBC/ODBC bridge
was the fastest for inserting big objects into MS SQL Server. For fetching the
data it was the slowest:
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.io.ByteArrayInputStream;
public class TestDatabaseBlobInsert {
private static final String TABLE_DROP =
"DROP TABLE MovieArchive";
private static final String TABLE_CREATE =
"Create Table MovieArchive (moviedata image, title varchar(255))";
private static final String TABLE_INSERT =
"INSERT INTO MovieArchive (title, moviedata) VALUES (?,?)";
private static final int size = 25 * 1024 * 1024;
private final byte[] data = new byte[size];
private final Connection con;
public TestDatabaseBlobInsert(String driver, String url,
String user, String password)
throws SQLException, ClassNotFoundException {
Class.forName(driver);
con = DriverManager.getConnection(url, user, password);
System.out.println("Driver: " + driver);
for (int i=0; ibyte)(Math.random()*255);
}
public void setUp() throws SQLException {
Statement st = con.createStatement();
try {
System.out.println("Dropping old table");
st.executeUpdate(TABLE_DROP);
} catch(SQLException ex) {} // table might not exist
System.out.println("Creating new table");
st.executeUpdate(TABLE_CREATE);
st.close();
}
public void testInsertWithBinaryStream() throws SQLException {
long start = -System.currentTimeMillis();
System.out.println("Inserting via BinaryStream");
PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
ByteArrayInputStream bis = new ByteArrayInputStream(data);
stmt.setString(1, "Babe");
stmt.setBinaryStream(2, bis, data.length);
stmt.executeUpdate();
start += System.currentTimeMillis();
System.out.println("That took " + start + "ms");
stmt.close();
}
public void testInsertWithSetBytes() throws SQLException {
long start = -System.currentTimeMillis();
System.out.println("Inserting via setBytes()");
PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
stmt.setString(1, "On Her Majesty's Secret Service");
stmt.setBytes(2, data);
stmt.executeUpdate();
start += System.currentTimeMillis();
System.out.println("That took " + start + "ms");
stmt.close();
}
public void testAll() throws SQLException {
setUp();
testInsertWithBinaryStream();
testInsertWithSetBytes();
}
public static void main(String[] args) throws Exception {
if (args.length != 4) usage();
TestDatabaseBlobInsert test = new TestDatabaseBlobInsert(
args[0], args[1], args[2], args[3]);
test.testAll();
}
private static void usage() {
System.out.println(
"Usage: TestDatabaseBlobInsert driver url username password");
System.exit(1);
}
}
I ran this code by setting up an ODBC source pointing to the MS SQL Server
database Movies, and then running it with a maximum heap space of 256MB:
java -Xmx256m -classpath . TestDatabaseBlobInsert sun.jdbc.odbc.JdbcOdbcDriver jdbc:odbc:Movies sa ""
The result on my little notebook was the following:
Driver: sun.jdbc.odbc.JdbcOdbcDriver
Dropping old table
Creating new table
Inserting via BinaryStream
That took 78975ms
Inserting via setBytes()
That took 73419ms
Back to the issue at hand - how do we get this data out of the database? The
seemingly easiest way is to do the following:
1: PreparedStatement st = con.prepareStatement(
"SELECT moviedata FROM MovieArchive WHERE title = ?");
2: st.setString(1, "Babe");
3: ResultSet rs = st.executeQuery();
4: if (rs.next()) {
5: byte[] data = rs.getBytes(1);
}
This code can easily cause an OutOfMemoryError if the available
heap memory is less than the size of the data that you are reading. Now for the
1'000'000 dollar question: Where does OutOfMemoryError occur? That depends on
your driver. If you are using the iNet SPRINTA or the Avenir drivers, then you
will run out of memory on line 4, i.e. when you call rs.next(). If
you are using the DataDirect, Microsoft or ODBC bridge drivers, you will only
get the out of memory error on line 5.
How can we write this so that we won't get an out of memory error? Here is
some sample code. It is very important that you read the data blocks of
bytes at a time, rather than in one big chunk, otherwise your system will
definitely not scale to support many users.
import java.sql.*;
import java.io.*;
public class TestDatabaseBlobFetch {
private static final String TABLE_SELECT =
"SELECT moviedata FROM MovieArchive WHERE title = ?";
private final Connection con;
public TestDatabaseBlobFetch(String driver, String url,
String user, String password)
throws SQLException, ClassNotFoundException {
Class.forName(driver);
con = DriverManager.getConnection(url, user, password);
System.out.println("Driver: " + driver);
}
public void testSelectBlocksAtATime() throws SQLException {
long start = -System.currentTimeMillis();
System.out.println("SELECT: 64kb blocks at a time");
PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
stmt.setString(1, "Babe");
ResultSet rs = stmt.executeQuery();
int count=0;
if (rs.next()) {
try {
System.out.println("Retrieving Data");
OutputStream out = new BufferedOutputStream(
new FileOutputStream("Data.1"));
InputStream in = new BufferedInputStream(
rs.getBinaryStream(1));
byte[] buf = new byte[65536];
int i;
while((i = in.read(buf, 0, buf.length)) != -1) {
out.write(buf, 0, i);
count += i;
}
out.close();
} catch(IOException ex) { ex.printStackTrace(); }
}
System.out.println("fetched " + count + " bytes");
start += System.currentTimeMillis();
System.out.println("That took " + start + "ms");
stmt.close();
}
public void testSelectWithGetBytes() throws SQLException {
long start = -System.currentTimeMillis();
System.out.println("SELECT: all at once");
PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
stmt.setString(1, "Babe");
ResultSet rs = stmt.executeQuery();
byte[] data = null;
if (rs.next()) {
System.out.println("Retrieving Data");
data = rs.getBytes(1);
try {
FileOutputStream out = new FileOutputStream("Data.2");
out.write(data, 0, data.length);
out.close();
} catch(IOException ex) { ex.printStackTrace(); }
}
System.out.println("fetched " + data.length + " bytes");
start += System.currentTimeMillis();
System.out.println("That took " + start + "ms");
stmt.close();
}
public void testAll() throws SQLException {
testSelectBlocksAtATime();
testSelectWithGetBytes();
}
public static void main(String[] args) throws Exception {
if (args.length != 4) usage();
TestDatabaseBlobFetch test = new TestDatabaseBlobFetch (
args[0], args[1], args[2], args[3]);
test.testAll();
}
private static void usage() {
System.out.println(
"usage: TestDatabaseBlobFetch driver url username password");
System.exit(1);
}
}
I tried this with several JDBC drivers, the only Type 4 driver that worked
correclty was the DataDirect driver (now released under the Microsoft label). I
will not go into the differences between the Microsoft driver and the others,
that's for another article. If you want to try this out, you can run it like
this:
java -Xmx2m -classpath .;msbase.jar;mssqlserver.jar;msutil.jar
TestDatabaseBlobFetch com.microsoft.jdbc.sqlserver.SQLServerDriver
jdbc:microsoft:sqlserver://localhost:1433;DatabaseName=Movies sa ""
Naturally you have to download the Microsoft SQL Server Type 4 driver and put
the jar files into the directory from which you are running this code. The
output from using the DataDirect Microsoft driver is the following on my
machine:
Driver: com.microsoft.jdbc.sqlserver.SQLServerDriver
SELECT: 64kb blocks at a time
Retrieving Data
fetched 26214400 bytes
That took 62746ms
SELECT: all at once
Retrieving Data
Exception in thread "main" java.lang.OutOfMemoryError
<>
The iNet SPRINTA driver falls over much sooner - actually when you call
rs.next():
Driver: com.inet.tds.TdsDriver
SELECT: 64kb blocks at a time
Exception in thread "main" java.lang.OutOfMemoryError
<>
What is wrong here?
I am finding it very hard to think of a reason to store 25mb files in a
database. They are too big to stay in the database's cache for very long. I
think that the design is flawed to start with. I would personally rather store
the URL to the file in the database, instead of the actual data, and then
retrieve the data directly from the file system.
I don't know all the conditions why someone would want to do that, but just
remember, you have to stream such big data out of the database chunk by chunk,
otherwise you have a serious problem.
Until the next issue ...
Heinz
Copyright 2000-2004 Maximum Solutions, South Africa
Reprint Rights. Copyright subsists in all the material included
in this email, but you may freely share the entire email with anyone you feel
may be interested, and you may reprint excerpts both online and offline provided
that you acknowledge the source as follows: This material from The Java(tm)
Specialists' Newsletter by Maximum Solutions (South Africa). Please contact
Maximum Solutions for more
information.
Java and Sun are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Maximum Solutions is independent
of Sun Microsystems, Inc.
15551 bytes more | comments? | | Score: 0
|