Easy to Learn Java: Programming Articles, Examples and Tips

Start with Java in a few days with Java Lessons or Lectures

Home

Code Examples

Java Tools

More Java Tools!

Java Forum

All Java Tips

Books

Submit News
Search the site here...
Search...
 

Diff -- text file difference utility, using native code Java code example - Click here to copy ->>>

   Can't find what you're looking for? Try our search:

Really working examples categorized by API, package, class. You can compile and run our examples right away! Not from source code for Java projects - only working examples! Copy, compile and run!
------------------

Code:

 
 /***************************************************************************
*
* diff         Text file difference utility.
* ----         Copyright 1987, 1989 by Donald C. Lindsay,
*              School of Computer Science,  Carnegie Mellon University.
*              Copyright 1982 by Symbionics.
*              Use without fee is permitted when not for direct commercial
*              advantage, and when credit to the source is given. Other uses
*              require specific permission.
*
* USEAGE:      diff oldfile newfile
*
* This program assumes that "oldfile" and "newfile" are text files.
* The program writes to stdout a description of the changes which would
* transform "oldfile" into "newfile".
*
* The printout is in the form of commands, each followed by a block of
* text. The text is delimited by the commands, which are:
*
*    DELETE AT n
*         ..deleted lines
*
*    INSERT BEFORE n
*         ..inserted lines
*
*    n MOVED TO BEFORE n
*         ..moved lines
*
*    n CHANGED FROM
*         ..old lines
*    CHANGED TO
*         ..newer lines
*
* The line numbers all refer to the lines of the oldfile, as they are
*    numbered before any commands are applied.
* The text lines are printed as-is, without indentation or prefixing. The
*    commands are printed in upper case, with a prefix of ">>>>", so that
*    they will stand out. Other schemes may be preferred.
* Input lines which are longer than MAXLINELEN characters will be chopped
*    into multiple lines.
* Files which contain more than MAXLINECOUNT lines cannot be processed.
* The algorithm is taken from Communications of the ACM, Apr78 (21, 4, 264-),
*    "A Technique for Isolating Differences Between Files."
*    Ignoring I/O, and ignoring the symbol table, it should take O(N) time.
*    This implementation takes fixed space, plus O(U) space for the symbol
*    table (where U is the number of unique lines). Methods exist to change
*    the fixed space to O(N) space.
* Note that this is not the only interesting file-difference algorithm. In
*    general, different algorithms draw different conclusions about the
*    changes that have been made to the oldfile. This algorithm is sometimes
*    "more right", particularly since it does not consider a block move to be
*    an insertion and a (separate) deletion. However, on some files it will be
*    "less right". This is a consequence of the fact that files may contain
*    many identical lines (particularly if they are program source). Each
*    algorithm resolves the ambiguity in its own way, and the resolution
*    is never guaranteed to be "right". However, it is often excellent.
* This program is intended to be pe*****gic.  Specifically, this program was
*    the basis of the Literate Programming column which appeared in the
*    Communications of the ACM (CACM), in the June 1989 issue (32, 6,
*    740-755).
* By "pe*****gic", I do not mean that the program is gracefully worded, or
*    that it showcases language features or its algorithm. I also do not mean
*    that it is highly accessible to beginners, or that it is intended to be
*    read in full, or in a particular order. Rather, this program is an
*    example of one professional's style of keeping things organized and
*    maintainable.
* This program is a de-engineered version of a program which uses less
*    memory and less time.  The article points out that the "symbol" arrays
*    can be implemented as arrays of pointers to arrays, with dynamic
*    allocation of the subarrays.  (Macros are very useful for hiding the
*    two-level accesses.) This allows an extremely large value for
*    MAXLINECOUNT, without dedicating fixed arrays. (The "other" and
*    "blocklen" arrays can be allocated after the input phase, when the exact
*    sizes are known.) The only slow piece of code is the "strcmp" in the tree
*    descent: it can be speeded up by keeping a hash in the tree node, and
*    only using "strcmp" when two hashes happen to be equal.
* This program has been coded with 5-column tab settings.
*
* Change Log
* ----------
* 10jun89 D.C.Lindsay, CMU: posted version created.
*         Copyright notice changed to ACM style, and Dept. is now School.
*         ACM article referenced in docn.
* 26sep87 D.C.Lindsay, CMU: publication version created.
*         Condensed all 1982/83 change log entries.
*         Removed all command line options, and supporting code. This
*         simplified the input code (no case reduction etc). It also
*         simplified the symbol table, which was capable of remembering
*         offsets into files (instead of strings), and trusting (!) hash
*         values to be unique.
*         Removed dynamic allocation of arrays: now fixed static arrays.
*         Removed speed optimizations in symtab package.
*         Removed string compression/decompression code.
*         Recoded to Unix standards from old Lattice/MSDOS standards.
*         (This affected only the #include's and the IO.)
*         Some renaming of variables, and rewording of comments.
* 1982/83 D.C.Lindsay, Symbionics: created.
*
*/

#define MAXLINECOUNT 20000         /* arbitrary */
#define MAXLINELEN  255            /* arbitrary */

#include <stdio.h>
/************************************************/
#define UNREAL (MAXLINECOUNT+2)  /* block len > any possible real block len */

/************************************************/

struct info{                         /* This is the info kept per-file.     */
     FILE *file;                     /* File handle that is open for read.  */
     int maxline;                    /* After input done, # lines in file.  */
     char *symbol[ MAXLINECOUNT+2 ]; /* The symtab handle of each line.     */
     int other   [ MAXLINECOUNT+2 ]; /* Map of line# to line# in other file */
                                     /* ( -1 means don't-know ).            */
} oldinfo, newinfo;

int blocklen[ MAXLINECOUNT+2 ];
/* The above is the info about found blocks. It will be set to 0, except
*  at the line#s where blocks start in the old file. At these places it
*  will be set to the # of lines in the block. During the printout phase,
*  this value will be reset to -1 if the block is printed as a MOVE block.
*  (This is because the printout phase will encounter the block twice, but
*  must only print it once. )
*  The array declarations are to MAXLINECOUNT+2 so that we can have two
*  extra lines (pseudolines) at line# 0 and line# MAXLINECOUNT+1 (or less).
*/

               /* function predeclarations */
               /* (These could be eliminated by a reordering) */
FILE *openfile();
void inputscan();
void storeline();
void transform();
void scanunique();
void scanafter();
void scanbefore();
void scanblocks();
void printout();
void newconsume();
void oldconsume();
void showdelete();
void showinsert();
void showchange();
void skipold();
void showsame();
void showmove();
void initsymtab();
char *addsymbol();
int symbolisunique();
int lineofsymbol();
void showsymbol();
/***************************************************************************
*
* main         Entry point.
* ----
*
* NOTE: no routines return error codes. Instead, any routine may complain
*       to stderr and then exit with error to the system. This property
*       is not mentioned in the various routine headers.
*
***************************************************************************/
main( argcount, argstrings )
int argcount;
char *argstrings[];
{
     if( argcount != 3 ) {
          fprintf( stderr, "TRY: diff oldfile newfile\n\007" );  /* (bel) */
          exit(1);
     }
     printf( ">>>> Difference of file \"%s\" and file \"%s\".\n\n",
          argstrings[1], argstrings[2] );
     initsymtab();
     oldinfo.file = openfile( argstrings[1] );
     newinfo.file = openfile( argstrings[2] );
     /* note, we don't process until we know both files really do exist. */
     inputscan( &oldinfo );
     inputscan( &newinfo );
     transform();
     printout();
     exit(0);       /* OK */
}
/***************************************************************************
*
* openfile     Opens the filename for reading.
* --------     Returns the file handle.
*
***************************************************************************/
FILE *openfile( filename )
char *filename;
{
     FILE *file = fopen( filename, "r" );
     if( file == NULL ) {
          fprintf( stderr, "CAN'T OPEN FILE %s\n\007", filename );  /* (bel) */
          exit(1);
     }
     return file;
}
/***************************************************************************
*
* inputscan    Reads the file specified by pinfo->file.
* ---------    Places the lines of that file in the symbol table.
*              Sets pinfo->maxline to the number of lines found.
*              Expects initsymtab has been called.
*
****************************************************************************/
void inputscan( pinfo )
struct info *pinfo;
{
     char ch, linebuffer[ MAXLINELEN+1 ];    /* ac*****ulate lines to here */
     int linelen = 0;                        /* # of chars in linebuffer */

     pinfo-> maxline = 0;
     for(;;) {
          ch = getc( pinfo-> file );
          if( ch == EOF ) break;
          if( ch == '\n' ){                                 /* end of line */
               storeline( linebuffer, linelen, pinfo );
               linelen = 0;
          }else if( linelen >= MAXLINELEN ) {               /* overflow */
               storeline( linebuffer, linelen, pinfo );
               linelen = 1;
               linebuffer[ 0 ] = ch;                   /* start next line */
          }else linebuffer[ linelen++ ] = ch;
     }
     if( linelen != 0 ) storeline( linebuffer, linelen, pinfo );  /* runt */
}
/**************************************************************************
*
* storeline    Places line into symbol table.
* ---------    Expects pinfo-> maxline initted: increments.
*              Places symbol table handle in pinfo->symbol.
*              Expects pinfo is either &oldinfo or &newinfo.
*              Expects linebuffer contains linelen nonnull chars.
*              Expects linebuffer has room to write a trailing nul into.
*              Expects initsymtab has been called.
*
***************************************************************************/
void storeline( linebuffer, linelen, pinfo )
char linebuffer[];
int linelen;
struct info *pinfo;
{
     int linenum = ++( pinfo-> maxline );    /* note, no line zero */
     if( linenum > MAXLINECOUNT ) {
          fprintf( stderr, "MAXLINECOUNT exceeded, must stop.\n\007" );
          exit(1);
     }
     linebuffer[ linelen ] = '\0';           /* nul terminate */
     pinfo-> symbol[ linenum ] =
          addsymbol( linebuffer, linelen, pinfo == &oldinfo, linenum );
}
/***************************************************************************
*
* transform    Expects both files in symtab.
* ---------    Expects valid "maxline" and "symbol" in oldinfo and newinfo.
*              Analyzes the file differences and leaves its findings in
*              the global arrays oldinfo.other, newinfo.other, and blocklen.
*
***************************************************************************/
void transform()
{                                 
     int oldline, newline;
     int oldmax = oldinfo.maxline + 2;  /* Count pseudolines at  */
     int newmax = newinfo.maxline + 2;  /* ..front and rear of file */

     for(oldline=0; oldline < oldmax; oldline++ ) oldinfo.other[oldline]= -1;
     for(newline=0; newline < newmax; newline++ ) newinfo.other[newline]= -1;

     scanunique();  /* scan for lines used once in both files */
     scanafter();   /* scan past sure-matches for non-unique blocks */
     scanbefore();  /* scan backwards from sure-matches */
     scanblocks();  /* find the fronts and lengths of blocks */
}
/***************************************************************************
*
* scanunique   Expects both files in symtab, and oldinfo and newinfo valid.
* ----------   Scans for lines which are used exactly once in each file.
*              The appropriate "other" array entries are set to the line# in
*              the other file.
*              Claims pseudo-lines at 0 and XXXinfo.maxline+1 are unique.
*
***************************************************************************/
void scanunique()
{
     int oldline, newline;
     char *psymbol;

     for( newline = 1; newline <= newinfo.maxline; newline++ ) {
          psymbol = newinfo.symbol[ newline ];
          if( symbolisunique( psymbol )) {        /* 1 use in each file */
               oldline = lineofsymbol( psymbol );
               newinfo.other[ newline ] = oldline;     /* record a 1-1 map */
               oldinfo.other[ oldline ] = newline;
          }
     }
     newinfo.other[ 0 ] = 0;
     oldinfo.other[ 0 ] = 0;
     newinfo.other[ newinfo.maxline + 1 ] = oldinfo.maxline + 1;
     oldinfo.other[ oldinfo.maxline + 1 ] = newinfo.maxline + 1;
}
/***************************************************************************
*
* scanafter    Expects both files in symtab, and oldinfo and newinfo valid.
* ---------    Expects the "other" arrays contain positive #s to indicate
*              lines that are unique in both files.
*              For each such pair of places, scans past in each file.
*              Contiguous groups of lines that match non-uniquely are
*              taken to be good-enough matches, and so marked in "other".
*              Assumes each other[0] is 0.
*
***************************************************************************/
void scanafter()
{
     int oldline, newline;

     for( newline = 0; newline <= newinfo.maxline; newline++ ) {
          oldline = newinfo.other[ newline ];
          if( oldline >= 0 ) {          /* is unique in old & new */
               for(;;) {                /* scan after there in both files */
                    if( ++oldline > oldinfo.maxline   ) break;
                    if( oldinfo.other[ oldline ] >= 0 ) break;
                    if( ++newline > newinfo.maxline   ) break;
                    if( newinfo.other[ newline ] >= 0 ) break;

                    /* oldline & newline exist, and aren't already matched */

                    if( newinfo.symbol[ newline ] !=
                        oldinfo.symbol[ oldline ] ) break;  /* not same */

                    newinfo.other[ newline ] = oldline;   /* record a match */
                    oldinfo.other[ oldline ] = newline;
               }
          }
     }
}
/***************************************************************************
*
* scanbefore   As scanafter, except scans towards file fronts.
* ----------   Assumes the off-end lines have been marked as a match.
*
***************************************************************************/
void scanbefore()
{
     int oldline, newline;

     for( newline = newinfo.maxline + 1; newline > 0; newline-- ) {
          oldline = newinfo.other[ newline ];
          if( oldline >= 0 ) {               /* unique in each */
               for(;;) {
                    if( --oldline <= 0                ) break;
                    if( oldinfo.other[ oldline ] >= 0 ) break;
                    if( --newline <= 0                ) break;
                    if( newinfo.other[ newline ] >= 0 ) break;
     
                    /* oldline and newline exist, and aren't marked yet */

                    if( newinfo.symbol[ newline ] !=
                        oldinfo.symbol[ oldline ] ) break;  /* not same */

                    newinfo.other[ newline ] = oldline;   /* record a match */
                    oldinfo.other[ oldline ] = newline;
               }
          }
     }
}
/***************************************************************************
*
* scanblocks   Expects oldinfo valid.
* ----------   Finds the beginnings and lengths of blocks of matches.
*              Sets the blocklen array (see definition).
*
***************************************************************************/
void scanblocks()
{
     int oldline, newline;
     int oldfront = 0;      /* line# of front of a block in old file, or 0  */
     int newlast = -1;      /* newline's value during the previous iteration*/

     for( oldline = 1; oldline <= oldinfo.maxline; oldline++ )
               blocklen[ oldline ] = 0;
     blocklen[ oldinfo.maxline + 1 ] = UNREAL;    /* starts  a mythical blk */

     for( oldline = 1; oldline <= oldinfo.maxline; oldline++ ) {
          newline = oldinfo.other[ oldline ];
          if( newline < 0 ) oldfront = 0;         /* no match: not in block */
          else{                                   /* match. */
               if( oldfront == 0 )         oldfront = oldline;
               if( newline != (newlast+1)) oldfront = oldline;
               ++blocklen[ oldfront ];           
          }
          newlast = newline;
     }
}
/***************************************************************************
*
* printout     Expects all data structures have been filled out.
* --------     Prints summary to stdout.
*
***************************************************************************/
          /* The following are global to printout's subsidiary routines */

enum{ idle, delete, insert, movenew, moveold, same, change } printstatus;
enum{ false, true } anyprinted;
int printoldline, printnewline;         /* line numbers in old & new file */

void printout()
{
     printstatus = idle;
     anyprinted = false;
     for( printoldline = printnewline = 1; ; ) {
          if( printoldline > oldinfo.maxline ) { newconsume(); break;}
          if( printnewline > newinfo.maxline ) { oldconsume(); break;}
          if(      newinfo.other[ printnewline ] < 0 ) {
               if( oldinfo.other[ printoldline ] < 0 )           showchange();
               else                                              showinsert();
          }
          else if( oldinfo.other[ printoldline ] < 0 )           showdelete();
          else if( blocklen[ printoldline ] < 0 )                  skipold();
          else if( oldinfo.other[ printoldline ] == printnewline ) showsame();
          else                                                     showmove();
     }
     if( anyprinted == true ) printf( ">>>> End of differences.\n"  );
     else                     printf( ">>>> Files are identical.\n" );
}
/***************************************************************************
*
* newconsume        Part of printout. Have run out of old file.
* ----------        Print the rest of the new file, as inserts and/or moves.
*
***************************************************************************/
void newconsume()
{
     for(;;) {
          if( printnewline > newinfo.maxline ) break;        /* end of file */
          if( newinfo.other[ printnewline ] < 0 ) showinsert();
          else                                    showmove();
     }
}
/***************************************************************************
*
* oldconsume        Part of printout. Have run out of new file.
* ----------        Process the rest of the old file, printing any
*                   parts which were deletes or moves.
*
***************************************************************************/
void oldconsume()
{
     for(;;) {
          if( printoldline > oldinfo.maxline ) break;       /* end of file */
          printnewline = oldinfo.other[ printoldline ];
          if( printnewline < 0 ) showdelete();
          else if( blocklen[ printoldline ] < 0 ) skipold();
          else showmove();
     }
}
/***************************************************************************
*
* showdelete        Part of printout.
* ----------        Expects printoldline is at a deletion.
*
***************************************************************************/
void showdelete()
{
     if( printstatus != delete ) printf( ">>>> DELETE AT %d\n", printoldline);
     printstatus = delete;
     showsymbol( oldinfo.symbol[ printoldline ]);
     anyprinted = true;
     printoldline++;
}
/***************************************************************************
*
* showinsert        Part of printout.
* ----------        Expects printnewline is at an insertion.
*
***************************************************************************/
void showinsert()
{
     if( printstatus == change ) printf( ">>>>     CHANGED TO\n" );
     else if( printstatus != insert )
          printf( ">>>> INSERT BEFORE %d\n", printoldline );
     printstatus = insert;
     showsymbol( newinfo.symbol[ printnewline ]);
     anyprinted = true;
     printnewline++;
}
/***************************************************************************
*
* showchange        Part of printout.
* ----------        Expects printnewline is an insertion.
*                   Expects printoldline is a deletion.
*
***************************************************************************/
void showchange()
{
     if( printstatus != change )
          printf( ">>>> %d CHANGED FROM\n", printoldline );
     printstatus = change;
     showsymbol( oldinfo.symbol[ printoldline ]);
     anyprinted = true;
     printoldline++;
}
/***************************************************************************
*
* skipold           Part of printout.
* -------           Expects printoldline at start of an old block that has
*                   already been announced as a move.
*                   Skips over the old block.
*
***************************************************************************/
void skipold()
{
     printstatus = idle;
     for(;;) {
          if( ++printoldline > oldinfo.maxline ) break;     /* end of file  */
          if( oldinfo.other[ printoldline ] < 0 ) break;    /* end of block */
          if( blocklen[ printoldline ]) break;          /* start of another */
     }
}
/***************************************************************************
*
* skipnew           Part of printout.
* -------           Expects printnewline is at start of a new block that has
*                   already been announced as a move.
*                   Skips over the new block.
*
***************************************************************************/
void skipnew()
{
     int oldline;
     printstatus = idle;
     for(;;) {
          if( ++printnewline > newinfo.maxline ) break;    /* end of file  */
          oldline = newinfo.other[ printnewline ];
          if( oldline < 0 ) break;                         /* end of block */
          if( blocklen[ oldline ]) break;              /* start of another */

     }
}
/***************************************************************************
*
* showsame          Part of printout.
* --------          Expects printnewline and printoldline at start of
*                   two blocks that aren't to be displayed.
*
***************************************************************************/
void showsame()
{
     int count;
     printstatus = idle;
     if( newinfo.other[ printnewline ] != printoldline ) {
          fprintf( stderr, "BUG IN LINE REFERENCING\n\007" );     /* (bel) */
          exit(1);
     }
     count = blocklen[ printoldline ];
     printoldline += count;
     printnewline += count;
}
/***************************************************************************
*
* showmove          Part of printout.
* --------          Expects printoldline, printnewline at start of
*                   two different blocks ( a move was done).
*
***************************************************************************/
void showmove()
{
     int oldblock = blocklen[ printoldline ];
     int newother = newinfo.other[ printnewline ];
     int newblock = blocklen[ newother ];

     if( newblock < 0 ) skipnew();           /* already printed */
     else if( oldblock >= newblock ) {       /* assume new's blk moved */
          blocklen[ newother ] = -1;         /* stamp block as "printed" */
          printf( ">>>> %d MOVED TO BEFORE %d\n", newother, printoldline );
          for( ; newblock > 0; newblock--, printnewline++ )
               showsymbol( newinfo.symbol[ printnewline ]);
          anyprinted = true;
          printstatus = idle;

     }else                         /* assume old's block moved */
          skipold();               /* target line# not known, display later */

}
/***************************************************************************
*
* The symbol table routines follow. They are a "package" and all
* understand the symbol table format, which is a binary tree.
* The "entry points" are: initsymtab, addsymbol, symbolisunique,
* lineofsymbol, and showsymbol.
*
***************************************************************************/

enum linestates{ freshnode, oldonce, newonce, bothonce, other };

struct node{                       /* the tree is made up of these nodes */
     struct node *pleft, *pright;
     int linenum;
     enum linestates linestate;
     /* The text of the line is stored after this, as a varying-length
     *  nul-terminated string.
     */
};
struct node *panchor;    /* symtab is a tree hung from this */

/* The following macro computes the address of the string part of a node. */
#define PTEXT( PNODE )   (sizeof( struct node) + (char *)PNODE)
/***************************************************************************
*
* initsymtab        Must be called, once, before any calls to addsymbol.
* ----------
*
***************************************************************************/
void initsymtab()
{
     panchor = NULL;
}
/***************************************************************************
*
* newnode        Allocates a new symbol table node and fills in its fields.
* -------        Expects pline -> a nul-terminated string of linelen
*                non-nul characters. Copies this string into the new node.
*                Sets linestate = freshnode. Does not set linenum.
*                Returns a pointer to the new node.
*
***************************************************************************/
struct node *newnode( pline, linelen )
char *pline;
int linelen;
{
     struct node *new =
          (struct node *) malloc( 1 + linelen + sizeof( struct node ));
     if( new == NULL ) {
          fprintf( stderr, "Out of memory, sorry.\n\007");       /* (bel) */
          exit(1);
     }
     new-> pleft = new-> pright = NULL;
     new-> linestate = freshnode;
     /* linenum field is not always valid */     
     strcpy( PTEXT( new ), pline );
     return new;
}
/***************************************************************************
*
* matchsymbol       Searches tree for a match to the line.
* ----------        Expects pline-> a nul-terminated string of linelen
*                   non-null chars.
*                   Returns a ptr to a matching node.
*                   If node's linestate == freshnode, then created the node.
*
***************************************************************************/
struct node *matchsymbol( pline, linelen )
char *pline;
int linelen;
{
     int comparison;
     struct node *pnode = panchor;
     if( panchor == NULL ) return panchor = newnode( pline, linelen );
     for(;;) {
          comparison = strcmp( PTEXT(pnode), pline );
          if( comparison == 0 ) return pnode;          /* found */

          if( comparison < 0 ) {
               if( pnode-> pleft == NULL ) {
                    pnode->pleft = newnode( pline, linelen );
                    return pnode-> pleft;
               }
               pnode = pnode-> pleft;
          }
          if( comparison > 0 ) {
               if( pnode-> pright == NULL ) {
                    pnode->pright = newnode( pline, linelen );
                    return pnode-> pright;
               }
               pnode = pnode-> pright;
          }
     }
     /* NOTE: There are return stmts, so control does not get here. */
}
/***************************************************************************
*
* addsymbol    Expects pline-> a string with linelen non-nul chars.
* ---------    Saves that line into the symbol table.
*              Returns a handle to the symtab entry for that unique line.
*              If inoldfile nonzero, then linenum is remembered.
*              Expects initsymbtab has been called, once.
*
****************************************************************************/
char *addsymbol( pline, linelen, inoldfile, linenum )
char *pline;
int linelen, inoldfile, linenum;
{
     struct node *pnode;
     pnode = matchsymbol( pline, linelen );  /* find the node in the tree */
     if( pnode-> linestate == freshnode ) {
          pnode-> linestate = inoldfile ? oldonce : newonce;
     }else{
          if(( pnode-> linestate == oldonce && !inoldfile ) ||
             ( pnode-> linestate == newonce &&  inoldfile ))
               pnode-> linestate = bothonce;
          else pnode-> linestate = other;
     }
     if( inoldfile ) pnode-> linenum = linenum;
     return (char *)pnode;
}
/***************************************************************************
*
* symbolisunique    Arg is a ptr previously returned by addsymbol.
* --------------    Returns true if the line was added to the
*                   symbol table exactly once with inoldfile true,
*                   and exactly once with inoldfile false.
*
***************************************************************************/
int symbolisunique( psymbol )
struct node *psymbol;
{
     return (psymbol-> linestate == bothonce );
}
/***************************************************************************
*
* lineofsymbol      Arg is a ptr previously returned by addsymbol.
* ------------      Returns the line number stored with the line.
*
***************************************************************************/
int lineofsymbol( psymbol )
struct node *psymbol;
{
     return psymbol-> linenum;
}
/***************************************************************************
*
* showsymbol        Arg is a ptr previously returned by addsymbol.
* ----------        Prints the line to stdout.
*
***************************************************************************/
void showsymbol( psymbol )
struct node *psymbol;
{
     printf( "%s\n", PTEXT( psymbol ) );
}
/***************************************************************************/

 
/*
 * Copyright (c) Ian F. Darwin, http://www.darwinsys.com/, 1996-2002.
 * All rights reserved. Software written by Ian F. Darwin and others.
 * $Id: LICENSE,v 1.8 2004/02/09 03:33:38 ian Exp $
*/
// Diff -- text file difference utility.
// See full docu-comment at beginning of Diff class.

// $Id: Diff.java,v 1.6 2004/02/09 03:34:05 ian Exp $

import java.io.*;

/** This is the info kept per-file.     */
class fileInfo {

  static final int MAXLINECOUNT = 20000;

  DataInputStream file;  /* File handle that is open for read.  */
  public int maxLine;  /* After input done, # lines in file.  */
  node symbol[]/* The symtab handle of each line. */
  int other[]/* Map of line# to line# in other file */
                                /* ( -1 means don't-know ).            */
        /* Allocated AFTER the lines are read. */

  /**
   * Normal constructor with one filename; file is opened and saved.
   */
  fileInfoString filename ) {
    symbol = new node MAXLINECOUNT+];
    other  = null;    // allocated later!
    try {
      file = new DataInputStream(
        new FileInputStreamfilename));
    catch (IOException e) {
        System.err.println("Diff can't read file " +
        filename );
        System.err.println("Error Exception was:" + e );
        System.exit(1);
    }
  }
  // This is done late, to be same size as # lines in input file.
  void alloc() {
    other  = new int[symbol.length + 2];
  }
};

/**
 * diff         Text file difference utility.
 * ----         Copyright 1987, 1989 by Donald C. Lindsay,
 *              School of Computer Science,  Carnegie Mellon University.
 *              Copyright 1982 by Symbionics.
 *              Use without fee is permitted when not for direct commercial
 *              advantage, and when credit to the source is given. Other uses
 *              require specific permission.
 *
 * Converted from C to Java by Ian F. Darwin, http://www.darwinsys.com/, January, 1997.
 * Copyright 1997, Ian F. Darwin.
 *
 * Conversion is NOT FULLY TESTED.
 *
 * USAGE:      diff oldfile newfile
 *
 * This program assumes that "oldfile" and "newfile" are text files.
 * The program writes to stdout a description of the changes which would
 * transform "oldfile" into "newfile".
 *
 * The printout is in the form of commands, each followed by a block of
 * text. The text is delimited by the commands, which are:
 *
 *    DELETE AT n
 *         ..deleted lines
 *
 *    INSERT BEFORE n
 *         ..inserted lines
 *
 *    n MOVED TO BEFORE n
 *         ..moved lines
 *
 *    n CHANGED FROM
 *         ..old lines
 *    CHANGED TO
 *         ..newer lines
 *
 * The line numbers all refer to the lines of the oldfile, as they are
 *    numbered before any commands are applied.
 * The text lines are printed as-is, without indentation or prefixing. The
 *    commands are printed in upper case, with a prefix of ">>>>", so that
 *    they will stand out. Other schemes may be preferred.
 * Files which contain more than MAXLINECOUNT lines cannot be processed.
 *    This can be fixed by changing "symbol" to a Vector.
 * The algorithm is taken from Communications of the ACM, Apr78 (21, 4, 264-),
 *    "A Technique for Isolating Differences Between Files."
 *    Ignoring I/O, and ignoring the symbol table, it should take O(N) time.
 *    This implementation takes fixed space, plus O(U) space for the symbol
 *    table (where U is the number of unique lines). Methods exist to change
 *    the fixed space to O(N) space.
 * Note that this is not the only interesting file-difference algorithm. In
 *    general, different algorithms draw different conclusions about the
 *    changes that have been made to the oldfile. This algorithm is sometimes
 *    "more right", particularly since it does not consider a block move to be
 *    an insertion and a (separate) deletion. However, on some files it will be
 *    "less right". This is a consequence of the fact that files may contain
 *    many identical lines (particularly if they are program source). Each
 *    algorithm resolves the ambiguity in its own way, and the resolution
 *    is never guaranteed to be "right". However, it is often excellent.
 * This program is intended to be pe*****gic.  Specifically, this program was
 *    the basis of the Literate Programming column which appeared in the
 *    Communications of the ACM (CACM), in the June 1989 issue (32, 6,
 *    740-755).
 * By "pe*****gic", I do not mean that the program is gracefully worded, or
 *    that it showcases language features or its algorithm. I also do not mean
 *    that it is highly accessible to beginners, or that it is intended to be
 *    read in full, or in a particular order. Rather, this program is an
 *    example of one professional's style of keeping things organized and
 *    maintainable.
 * The program would be better if the "print" variables were wrapped into
 *    a struct. In general, grouping related variables in this way improves
 *    do*****entation, and adds the ability to pass the group in argument lists.
 * This program is a de-engineered version of a program which uses less
 *    memory and less time.  The article points out that the "symbol" arrays
 *    can be implemented as arrays of pointers to arrays, with dynamic
 *    allocation of the subarrays.  (In C, macros are very useful for hiding
 *    the two-level accesses.) In Java, a Vector would be used. This allows an
 *    extremely large value for MAXLINECOUNT, without dedicating fixed arrays.
 *    (The "other" array can be allocated after the input phase, when the exact
 *    sizes are known.) The only slow piece of code is the "strcmp" in the tree
 *    descent: it can be speeded up by keeping a hash in the tree node, and
 *    only using "strcmp" when two hashes happen to be equal.
 *
 * Change Log
 * ----------
 *  1Jan97 Ian F. Darwin: first working rewrite in Java, based entirely on
 *   D.C.Lindsay's reasonable C version.
 *  Changed comments from /***************** to /**, shortened, added
 *  whitespace, used tabs more, etc.
 *  6jul89 D.C.Lindsay, CMU: fixed portability bug. Thanks, Gregg Wonderly.
 *         Just changed "char ch" to "int ch".
 *         Also added comment about way to improve code.
 * 10jun89 D.C.Lindsay, CMU: posted version created.
 *         Copyright notice changed to ACM style, and Dept. is now School.
 *         ACM article referenced in docn.
 * 26sep87 D.C.Lindsay, CMU: publication version created.
 *         Condensed all 1982/83 change log entries.
 *         Removed all command line options, and supporting code. This
 *         simplified the input code (no case reduction etc). It also
 *         simplified the symbol table, which was capable of remembering
 *         offsets into files (instead of strings), and trusting (!) hash
 *         values to be unique.
 *         Removed dynamic allocation of arrays: now fixed static arrays.
 *         Removed speed optimizations in symtab package.
 *         Removed string compression/decompression code.
 *         Recoded to Unix standards from old Lattice/MSDOS standards.
 *         (This affected only the #include's and the IO.)
 *         Some renaming of variables, and rewording of comments.
 * 1982/83 D.C.Lindsay, Symbionics: created.
 *
 @author  Ian F. Darwin, Java version
 @version  Java version 0.9, 1997
 @author  D. C. Lindsay, C version (1982-1987)
 *
 */
public class Diff {

  /** block len > any possible real block len */
  final int UNREAL=Integer.MAX_VALUE;

  /** Keeps track of information about file1 and file2 */
  fileInfo oldinfo, newinfo;

  /** blocklen is the info about found blocks. It will be set to 0, except
   * at the line#s where blocks start in the old file. At these places it
   * will be set to the # of lines in the block. During printout ,
   * this # will be reset to -1 if the block is printed as a MOVE block
   * (because the printout phase will encounter the block twice, but
   * must only print it once.)
   * The array declarations are to MAXLINECOUNT+2 so that we can have two
   * extra lines (pseudolines) at line# 0 and line# MAXLINECOUNT+1
   * (or less).
   */
  int blocklen[];

  /**
   * main - entry point when used standalone.
   * NOTE: no routines return error codes or throw any local
   * exceptions. Instead, any routine may complain
   * to stderr and then exit with error to the system.
   */
  public static void main(String argstrings[])
  {
    if argstrings.length != ) {
      System.err.println("Usage: diff oldfile newfile" );
      System.exit(1);
    }
    Diff d = new Diff();
    d.doDiff(argstrings[0], argstrings[1]);
    return;
  }

  /** Construct a Diff object. */
  Diff() {
  }

  /** Do one file comparison. Called with both filenames. */
  public void doDiff(String oldFile, String newFile) {
    println">>>> Difference of file \"" + oldFile +
      "\" and file \"" + newFile + "\".\n");
    oldinfo = new fileInfo(oldFile);
    newinfo = new fileInfo(newFile);
    /* we don't process until we know both files really do exist. */
    try {
      inputscanoldinfo );
      inputscannewinfo );
    catch (IOException e) {
      System.err.println("Read error: " + e);
    }

    /* Now that we've read all the lines, allocate some arrays.
     */
    blocklen = new int[ (oldinfo.maxLine>newinfo.maxLine?
      oldinfo.maxLine : newinfo.maxLine];
    oldinfo.alloc();
    newinfo.alloc();

    /* Now do the work, and print the results. */
    transform();
    printout();
  }

  /**
   * inputscan    Reads the file specified by pinfo.file.
   * ---------    Places the lines of that file in the symbol table.
   *              Sets pinfo.maxLine to the number of lines found.
   */
  void inputscanfileInfo pinfo throws IOException
  {
       String linebuffer;

       pinfo.maxLine = 0;
       while ((linebuffer = pinfo.file.readLine()) != null) {
           storelinelinebuffer, pinfo );
       }
  }

  /**
   * storeline    Places line into symbol table.
   * ---------    Expects pinfo.maxLine initted: increments.
   *              Places symbol table handle in pinfo.ymbol.
   *              Expects pinfo is either oldinfo or newinfo.
   */
  void storelineString linebuffer, fileInfo pinfo )
  {
       int linenum = ++pinfo.maxLine;    /* note, no line zero */
       if linenum > fileInfo.MAXLINECOUNT ) {
      System.err.println"MAXLINECOUNT exceeded, must stop." );
      System.exit(1);
       }
       pinfo.symbollinenum =
      node.addSymbollinebuffer, pinfo == oldinfo, linenum );
  }

  /*
   * transform
   * Analyzes the file differences and leaves its findings in
   * the global arrays oldinfo.other, newinfo.other, and blocklen.
   * Expects both files in symtab.
   * Expects valid "maxLine" and "symbol" in oldinfo and newinfo.
   */
  void transform()
  {
       int oldline, newline;
       int oldmax = oldinfo.maxLine + 2;  /* Count pseudolines at  */
       int newmax = newinfo.maxLine + 2;  /* ..front and rear of file */

       for (oldline=0; oldline < oldmax; oldline++ )
      oldinfo.other[oldline]= -1;
       for (newline=0; newline < newmax; newline++ )
      newinfo.other[newline]= -1;

       scanunique();  /* scan for lines used once in both files */
       scanafter();   /* scan past sure-matches for non-unique blocks */
       scanbefore();  /* scan backwards from sure-matches */
       scanblocks();  /* find the fronts and lengths of blocks */
  }

  /*
   * scanunique
   * Scans for lines which are used exactly once in each file.
   * Expects both files in symtab, and oldinfo and newinfo valid.
   * The appropriate "other" array entries are set to the line# in
   * the other file.
   * Claims pseudo-lines at 0 and XXXinfo.maxLine+1 are unique.
   */
  void scanunique()
  {
       int oldline, newline;
       node psymbol;

       fornewline = 1; newline <= newinfo.maxLine; newline++ ) {
      psymbol = newinfo.symbolnewline ];
      if psymbol.symbolIsUnique()) {        // 1 use in each file
           oldline = psymbol.linenum;
           newinfo.othernewline = oldline; // record 1-1 map
           oldinfo.otheroldline = newline;
      }
       }
       newinfo.other0;
       oldinfo.other0;
       newinfo.othernewinfo.maxLine + = oldinfo.maxLine + 1;
       oldinfo.otheroldinfo.maxLine + = newinfo.maxLine + 1;
  }

  /*
   * scanafter
   * Expects both files in symtab, and oldinfo and newinfo valid.
   * Expects the "other" arrays contain positive #s to indicate
   * lines that are unique in both files.
   * For each such pair of places, scans past in each file.
   * Contiguous groups of lines that match non-uniquely are
   * taken to be good-enough matches, and so marked in "other".
   * Assumes each other[0] is 0.
   */
  void scanafter()
  {
       int oldline, newline;

       fornewline = 0; newline <= newinfo.maxLine; newline++ ) {
      oldline = newinfo.othernewline ];
      if oldline >= ) {  /* is unique in old & new */
           for(;;) {  /* scan after there in both files */
          if ++oldline > oldinfo.maxLine   break;
          if oldinfo.otheroldline >= break;
          if ++newline > newinfo.maxLine   break;
          if newinfo.othernewline >= break;

          /* oldline & newline exist, and
        aren't already matched */

          if newinfo.symbolnewline !=
        oldinfo.symbololdline ] ) break;  // not same

          newinfo.other[newline= oldline; // record a match
          oldinfo.other[oldline= newline;
           }
      }
       }
  }

  /**
   * scanbefore
   * As scanafter, except scans towards file fronts.
   * Assumes the off-end lines have been marked as a match.
   */
  void scanbefore()
  {
       int oldline, newline;

       fornewline = newinfo.maxLine + 1; newline > 0; newline-- ) {
      oldline = newinfo.othernewline ];
      if oldline >= ) {               /* unique in each */
           for(;;) {
          if --oldline <= 0                break;
          if oldinfo.otheroldline >= break;
          if --newline <= 0                break;
          if newinfo.othernewline >= break;

          /* oldline and newline exist,
        and aren't marked yet */

          if newinfo.symbolnewline !=
        oldinfo.symbololdline ] ) break;  // not same

          newinfo.other[newline= oldline; // record a match
          oldinfo.other[oldline= newline;
           }
      }
       }
  }

  /**
   * scanblocks - Finds the beginnings and lengths of blocks of matches.
   * Sets the blocklen array (see definition).
   * Expects oldinfo valid.
   */
  void scanblocks()
  {
       int oldline, newline;
       int oldfront = 0;      // line# of front of a block in old, or 0
       int newlast = -1;      // newline's value during prev. iteration

       foroldline = 1; oldline <= oldinfo.maxLine; oldline++ )
           blocklenoldline 0;
       blocklenoldinfo.maxLine + = UNREAL; // starts a mythical blk

       foroldline = 1; oldline <= oldinfo.maxLine; oldline++ ) {
      newline = oldinfo.otheroldline ];
      if newline < oldfront = 0;  /* no match: not in block */
      else{                                   /* match. */
           if oldfront == )         oldfront = oldline;
           if newline != (newlast+1)) oldfront = oldline;
           ++blocklenoldfront ];
      }
      newlast = newline;
       }
  }

  /* The following are global to printout's subsidiary routines */
  // enum{ idle, delete, insert, movenew, moveold,
  // same, change } printstatus;
  public static final int
    idle = 0, delete = 1, insert = 2, movenew = 3, moveold = 4,
    same = 5, change = 6;
  int printstatus;
  boolean anyprinted;
  int printoldline, printnewline;     // line numbers in old & new file

  /**
   * printout - Prints summary to stdout.
   * Expects all data structures have been filled out.
   */
  void printout()
  {
       printstatus = idle;
       anyprinted = false;
       forprintoldline = printnewline = 1; ; ) {
      if printoldline > oldinfo.maxLine ) { newconsume()break;}
      if printnewline > newinfo.maxLine ) { oldconsume()break;}
      if (      newinfo.otherprintnewline ) {
           if oldinfo.otherprintoldline )
        showchange();
           else
        showinsert();
      }
      else if oldinfo.otherprintoldline )
      showdelete();
      else if blocklenprintoldline )
      skipold();
      else if oldinfo.otherprintoldline == printnewline )
      showsame();
      else
      showmove();
       }
       if anyprinted == true println">>>> End of differences."  );
       else                     println">>>> Files are identical." );
  }

  /*
   * newconsume        Part of printout. Have run out of old file.
   * Print the rest of the new file, as inserts and/or moves.
   */
  void newconsume()
  {
       for(;;) {
      if printnewline > newinfo.maxLine )
      break;        /* end of file */
      if newinfo.otherprintnewline showinsert();
      else                                    showmove();
       }
  }

  /**
   * oldconsume        Part of printout. Have run out of new file.
   * Process the rest of the old file, printing any
   * parts which were deletes or moves.
   */
  void oldconsume()
  {
       for(;;) {
      if printoldline > oldinfo.maxLine )
      break;       /* end of file */
      printnewline = oldinfo.otherprintoldline ];
      if printnewline < showdelete();
      else if blocklenprintoldline skipold();
      else showmove();
       }
  }

  /**
   * showdelete        Part of printout.
   * Expects printoldline is at a deletion.
   */
  void showdelete()
  {
    if printstatus != delete )
      println">>>> DELETE AT " + printoldline);
    printstatus = delete;
    oldinfo.symbolprintoldline ].showSymbol();
    anyprinted = true;
    printoldline++;
  }

  /*
   * showinsert        Part of printout.
   * Expects printnewline is at an insertion.
   */
  void showinsert()
  {
       if printstatus == change println">>>>     CHANGED TO" );
       else if printstatus != insert )
      println">>>> INSERT BEFORE " + printoldline );
       printstatus = insert;
       newinfo.symbolprintnewline ].showSymbol();
       anyprinted = true;
       printnewline++;
  }

  /**
   * showchange        Part of printout.
   * Expects printnewline is an insertion.
   *  Expects printoldline is a deletion.
   */
  void showchange()
  {
       if printstatus != change )
      println">>>> " + printoldline + " CHANGED FROM");
       printstatus = change;
       oldinfo.symbolprintoldline ].showSymbol();
       anyprinted = true;
       printoldline++;
  }

  /**
   * skipold           Part of printout.
   * Expects printoldline at start of an old block that has
   * already been announced as a move.
   * Skips over the old block.
   */
  void skipold()
  {
       printstatus = idle;
       for(;;) {
      if ++printoldline > oldinfo.maxLine )
      break;     /* end of file  */
      if oldinfo.otherprintoldline )
      break;    /* end of block */
      if blocklenprintoldline ]!=0)
      break;          /* start of another */
       }
  }

  /**
   * skipnew           Part of printout.
   * Expects printnewline is at start of a new block that has
   * already been announced as a move.
   * Skips over the new block.
   */
  void skipnew()
  {
       int oldline;
       printstatus = idle;
       for(;;) {
      if ++printnewline > newinfo.maxLine )
      break;    /* end of file  */
      oldline = newinfo.otherprintnewline ];
      if oldline < )
      break;                         /* end of block */
      if blocklenoldline != 0)
      break;              /* start of another */
       }
  }

  /**
   * showsame          Part of printout.
   * Expects printnewline and printoldline at start of
   * two blocks that aren't to be displayed.
   */
  void showsame()
  {
       int count;
       printstatus = idle;
       if newinfo.otherprintnewline != printoldline ) {
      System.err.println("BUG IN LINE REFERENCING");
      System.exit(1);
       }
       count = blocklenprintoldline ];
       printoldline += count;
       printnewline += count;
  }

  /**
   * showmove          Part of printout.
   * Expects printoldline, printnewline at start of
   * two different blocks ( a move was done).
   */
  void showmove()
  {
       int oldblock = blocklenprintoldline ];
       int newother = newinfo.otherprintnewline ];
       int newblock = blocklennewother ];

       if newblock < skipnew();         // already printed.
       else if oldblock >= newblock ) {     // assume new's blk moved.
      blocklen[newother= -1;         // stamp block as "printed".
      println">>>> " + newother +
      " THRU " (newother + newblock - 1+
      " MOVED TO BEFORE " + printoldline );
      for; newblock > 0; newblock--, printnewline++ )
           newinfo.symbolprintnewline ].showSymbol();
      anyprinted = true;
      printstatus = idle;

       else                /* assume old's block moved */
      skipold();      /* target line# not known, display later */
  }

  /** Convenience wrapper for println */
  public void println(String s) {
    System.out.println(s);
  }
};        // end of main class!

/**
 * Class "node". The symbol table routines in this class all
 * understand the symbol table format, which is a binary tree.
 * The methods are: addSymbol, symbolIsUnique, showSymbol.
 */
class node{                       /* the tree is made up of these nodes */
  node pleft, pright;
  int linenum;

  static final int freshnode = 0,
  oldonce = 1, newonce = 2, bothonce = 3, other = 4;

  int /* enum linestates */ linestate;
  String line;

  static node panchor = null;    /* symtab is a tree hung from this */

  /**
   * Construct a new symbol table node and fill in its fields.
   @param        string  A line of the text file
   */
  nodeString pline)
  {
       pleft = pright = null;
       linestate = freshnode;
       /* linenum field is not always valid */
       line = pline;
  }

  /**
   * matchsymbol       Searches tree for a match to the line.
   @param  String  pline, a line of text
   * If node's linestate == freshnode, then created the node.
   */
  static node matchsymbolString pline )
  {
       int comparison;
       node pnode = panchor;
       if panchor == null return panchor = new nodepline);
       for(;;) {
      comparison = pnode.line.compareTo(pline);
      if comparison == return pnode;          /* found */

      if comparison < ) {
           if pnode.pleft == null ) {
          pnode.pleft = new nodepline);
          return pnode.pleft;
           }
           pnode = pnode.pleft;
      }
      if comparison > ) {
           if pnode.pright == null ) {
          pnode.pright = new nodepline);
          return pnode.pright;
           }
           pnode = pnode.pright;
      }
       }
       /* NOTE: There are return stmts, so control does not get here. */
  }

  /**
   * addSymbol(String pline) - Saves line into the symbol table.
   * Returns a handle to the symtab entry for that unique line.
   * If inoldfile nonzero, then linenum is remembered.
   */
  static node addSymbolString pline, boolean inoldfile, int linenum )
  {
    node pnode;
    pnode = matchsymbolpline );  /* find the node in the tree */
    if pnode.linestate == freshnode ) {
      pnode.linestate = inoldfile ? oldonce : newonce;
    else {
      if (( pnode.linestate == oldonce && !inoldfile ||
          pnode.linestate == newonce &&  inoldfile ))
           pnode.linestate = bothonce;
      else pnode.linestate = other;
    }
    if (inoldfilepnode.linenum = linenum;
    return pnode;
  }

  /**
   * symbolIsUnique    Arg is a ptr previously returned by addSymbol.
   * --------------    Returns true if the line was added to the
   *                   symbol table exactly once with inoldfile true,
   *                   and exactly once with inoldfile false.
   */
  boolean symbolIsUnique()
  {
    return (linestate == bothonce );
  }

  /**
   * showSymbol        Prints the line to stdout.
   */
  void showSymbol()
  {
    System.out.println(line);
  }
}
/**
* Java Code from Java Cookbook, Second Edition
* By Ian F. Darwin
* Second Edition June 2004
* Series: Cookbooks
* ISBN: 0-596-00701-9
* Details: http://www.oreilly.com/catalog/javacook2/index.html
* Copyright (c) Ian F. Darwin, http://www.darwinsys.com/, 1996-2002.
* All rights reserved. Software written by Ian F. Darwin and others.
* $Id: LICENSE,v 1.8 2004/02/09 03:33:38 ian Exp $
*/



References.

The list of classes which were used on this page you can find below. The links to Java API contain official SUN documentation about all used classes.

  java.io.BufferedReader

  org.w3c.dom.Text

  java.io.File

  java.util.Map

  javax.print.attribute.standard.Copies

  java.io.DataInputStream

  java.io.FileInputStream

  java.io.IOException

  java.lang.Error

  java.lang.Exception

  java.util.Vector

  java.lang.Process

  java.lang.Class




[ Go Back ]



Home Code Examples Java Forum All Java Tips Books Submit News, Code... Search... Offshore Software Tech Doodling

RSS feed Java FAQ RSS feed Java FAQ News     

    RSS feed Java Forums RSS feed Java Forums

All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest 1999-2006 by Java FAQs Daily Tips.

Interactive software released under GNU GPL, Code Credits, Privacy Policy