Programming Languages 17 May 2008 11:25 am

Weird Designs in Java

So, maybe I have become warped. I was writing a little utility last night to extract any files from a jar that matched a given regex pattern. I did it all test-first, but it evolved in a weird way. It seems like a weird mix of functional and object-oriented programming.

My first pass was to identify all the files in the jar that matched my regex, but ultimately, I wanted the contents of those matching files. So, I wrote the version that loops through the jar entries and collects matches first. Once that worked and the tests passed, I wrote the next iteration to gather the contents.

In a language that supports functions, both would have been dead simple. Here’s the mostly-working code. (I didn’t run these.)

For getting all matching names:


jar.entries.select {|e| e.name =~ /regex/ }.map {|e| e.name }

For getting all matching names’ content:


jar.entries.select {|e| e.name =~ /regex/ }.map {|e| jar.read(e) }
# see below about the weird that is reading from jars

I might extract the common select call, but it is dead simple; kind of seems like excess.

def content_of_matching(jar, pattern, &block)
  jar.entries.select {|e| e.name =~ pattern }.map(&block)
end

names = content_of_matching(jar, regex) {|e| e.name }
contents = content_of_matching(jar, regex) {|e| jar.read(e) }

With the Java version, once I had written the version that gets all matching files’ names, I realized that it was just a variant of the map-a-function-over-entries problem. So, I used the template method pattern to make retrieveMatches call out to getContent for each entry.

Then I created two public static final anonymous subclasses of my RegexJarFileExtractor, each overriding getContent to provide the variant behavior of getting the content, or getting the name. (I later put these behind creator methods as you’ll see below.)

At first, I was really happy that it was so clean. I just subclassed for the variant behavior. Then I kind of wondered if I should actually create real, named subclass definitions. That seems like overkill.

Next, I kibitzed about getContent, the hook method that each type of extractor overrides. Due to the way the Java JarFile library is written, you iterate over JarEntry objects and then ask the JarFile for an InputStream on an entry. So when getContent really has to return the content, it needs both the JarFile and the JarEntry as parameters. However, in the case where getContent just returns the JarEntry’s name, I only need the JarEntry itself. So, that feels a little dirty.
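
To make that two-object dance concrete, here is a minimal sketch of reading one entry's text. The class and method names (JarEntryReader, readEntry) are made up for illustration, the UTF-8 charset is an assumption, and try-with-resources needs Java 7 or later. Like the extractor in this post, it joins the lines into one long string without line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarEntryReader {

  // Hypothetical helper: the entry only identifies the file;
  // the stream itself has to come from the enclosing JarFile.
  static String readEntry(JarFile jarFile, JarEntry entry) throws IOException {
    StringBuilder text = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(jarFile.getInputStream(entry),
                              StandardCharsets.UTF_8))) {  // charset is an assumption
      String line;
      while ((line = reader.readLine()) != null) {
        text.append(line);  // line breaks are dropped, as in the post's version
      }
    }
    return text.toString();
  }
}
```

The name-only case never touches the JarFile argument, which is exactly the asymmetry described above.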

It also got me thinking: the way you would normally do both of these variants in Java is just to create two static methods, repeating all the iterator code and changing the internal work of the loop in each one to get the content, or the name, of the entry. Nowadays, that repetition seems repugnant, but separate classes seem like overkill.
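
For comparison, here is a rough sketch of what that copy-and-paste version might look like. The names (DuplicatedJarSearch, matchingNames, matchingContent, read) are hypothetical and not part of the utility in this post; UTF-8 is assumed, and the try-with-resources and diamond syntax need Java 7 or later. The two public methods are identical except for the single line inside the loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Pattern;

public class DuplicatedJarSearch {

  // Variant 1: collect the names of matching entries.
  static List<String> matchingNames(File jarPath, String regex) throws IOException {
    Pattern pattern = Pattern.compile(regex);
    List<String> results = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        if (pattern.matcher(entry.getName()).matches()) {
          results.add(entry.getName());   // the only line that differs
        }
      }
    }
    return results;
  }

  // Variant 2: the same loop again, collecting content instead.
  static List<String> matchingContent(File jarPath, String regex) throws IOException {
    Pattern pattern = Pattern.compile(regex);
    List<String> results = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        if (pattern.matcher(entry.getName()).matches()) {
          results.add(read(jar, entry));  // the only line that differs
        }
      }
    }
    return results;
  }

  // Reads the whole entry as text (UTF-8 is an assumption).
  private static String read(JarFile jar, JarEntry entry) throws IOException {
    try (InputStream in = jar.getInputStream(entry)) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toString("UTF-8");
    }
  }
}
```

Every fix to the loop (closing the jar, changing how matching works) would have to be made twice, which is the repetition the template-method version avoids.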

Also, I had developed it test-first, piecemeal, so the individual methods that do work are package-protected so as to still be available to the test. They aren’t marked private. One could argue that some of the internal methods are useful themselves, like retrieveMatches and getJarFileForClass. I would just argue that package scope says, “leave it alone”.

By creating instances of extractors in creator methods, instead of the static final instances as fields, I now create a new instance of the given extractor each time a retrieval entry-point method is called. I originally just declared a public static final instance field for each anonymous subclass variant.

  /**
   * Instance of RegexJarFileExtractor that returns the content of the matching entry
   *     as one long string.
   */
  public static final RegexJarFileExtractor CONTENT_EXTRACTOR = 
    new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader reader = null;
        try {
          reader = new BufferedReader(
                          new InputStreamReader(
                               jarFile.getInputStream(entry)));
          String line;
          while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
          }
        } finally {
          // reader is still null if opening the stream threw
          if (reader != null) {
            reader.close();
          }
        }
        return stringBuilder.toString();
      }
    };

  /**
   * Instance of RegexJarFileExtractor that returns the name of the matching entry.
   */
  public static final RegexJarFileExtractor NAME_EXTRACTOR = 
    new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        return entry.getName();
      }
    };

This meant only one instance of each was ever in the system, and all work is done in the retrieveMatches and getContent methods, so I don’t think there is shared state that could get messed up by multiple threads. But somehow, it seemed too gratuitously functional-style. I probably would’ve kept it that way for myself.

Here’s the class. I can provide the test if you like. The first two methods are the entry points. The first retrieves content for matching files, and the second retrieves the names of matching files.

Note: It has some extra code that finds the right jarFile to load. This utility will be used in a running system to retrieve files from a jar on the classpath.

So, the first two methods use that to identify the jarFile, then proceed to find matching files in the jar file.
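
The jar lookup boils down to slicing a resource path such as file:/home/bob/my.jar!/com/google/MyClass.class between the first ":" and the first "!". Here is a small standalone sketch of just that string step; the class and method names (JarPathParser, extractJarPath) are hypothetical, and it assumes the path has already been obtained from the class loader.

```java
public class JarPathParser {

  // Hypothetical helper mirroring the index arithmetic described above:
  // keep whatever sits between the protocol's ':' and the '!' that
  // separates the jar from the entry inside it.
  static String extractJarPath(String resourcePath) {
    int protocolIndex = resourcePath.indexOf(':');
    int jarPathEndIndex = resourcePath.indexOf('!');
    if (protocolIndex == -1 || jarPathEndIndex == -1
        || jarPathEndIndex < protocolIndex) {
      throw new IllegalArgumentException(
          "Not a jar resource path: " + resourcePath);
    }
    return resourcePath.substring(protocolIndex + 1, jarPathEndIndex);
  }
}
```

Calling extractJarPath("file:/home/bob/my.jar!/com/google/MyClass.class") yields /home/bob/my.jar. A class loaded from a plain directory has no "!" in its path, so it falls into the exception branch.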

What do you think? I have barely coded in the last 2 months, so maybe I am just getting back into Java-land. Would this confuse people who only do Java?

import java.io.*;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Utility to retrieve files from a jar that 
 * match a given regex pattern.
 *
 *
 * @author bobevans (Bob Evans)
 *
 */
public abstract class RegexJarFileExtractor {

  /**
   * Returns the contents of files, matching the given pattern, 
   *     from the jar that holds className.
   *
   * @param classNameInJar
   * @param pattern
   * @return
   * @throws IOException
   * @throws ClassNotFoundException
   */
  public static List<String> retrieveMatchingContent(String classNameInJar,
                                                     String pattern)
      throws IOException, ClassNotFoundException {
    String jarPath = RegexJarFileExtractor.getJarFileForClass(classNameInJar);
    return createContentExtractor().retrieveMatches(new File(jarPath), 
                                                    pattern);
  }

  /**
   * Returns the names of files, matching the given pattern, from
   *  the jar that holds className.
   *
   * @param classNameInJar
   * @param pattern
   * @return
   * @throws IOException
   * @throws ClassNotFoundException
   */
  public static List<String> retrieveMatchingNames(String classNameInJar,
                                                   String pattern)
      throws IOException, ClassNotFoundException {
    String jarPath = RegexJarFileExtractor.getJarFileForClass(classNameInJar);
    return createNameExtractor().retrieveMatches(new File(jarPath), 
                                                 pattern);
  }

  /**
   * Overridable behavior for retrieving some part of the 
   * matching file to return, e.g., retrieve the name of 
   * the matching file, or perhaps the content of the matching file.
   *
   * @param entry A File in the jar that matches the search pattern.
   * @param jarFile The container of the files and their contents.
   * @return String Data from the entry in the jarFile.
   * @throws IOException
   */
  protected abstract String getContent(JarEntry entry, JarFile jarFile) 
       throws IOException;

  /**
   * Makes an instance of RegexJarFileExtractor that returns 
   *      the content of the matching entry.
   * Note: returns one long string.
   * 
   * @return RegexJarFileExtractor
   */
  private static RegexJarFileExtractor createContentExtractor() {
    return new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader reader = null;
        try {
          reader = new BufferedReader(       
                         new InputStreamReader(
                               jarFile.getInputStream(entry)));
          String line;
          while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
          }
        } finally {
          if (reader != null) {
            reader.close();
          }
        }
        return stringBuilder.toString();
      }
    };
  }

  /**
   * Makes an instance of RegexJarFileExtractor that returns 
   * the name of the matching entry.
   * Mostly used for test purposes.
   *
   * @return RegexJarFileExtractor 
   */
  static RegexJarFileExtractor createNameExtractor() {
    return new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        return entry.getName();
      }
    };
  }

  /**
   * Given the name of a class that was loaded from a jar classloader,
   *    return the absolute file path to that jar on the local disk.
   *
   * @param className Name of the class in the jar.
   * @return String The file system path for the jar that contains 
   *    the class named className.
   * @throws ClassNotFoundException
   */
  static String getJarFileForClass(String className) 
        throws ClassNotFoundException {
    checkArgument(className);
    String classPath = getClassFullPath(className);
    final int protocolIndex = classPath.indexOf(":");
    final int jarPathEndIndex = classPath.indexOf("!");
    checkValidjarFilePath(protocolIndex, jarPathEndIndex);
    return classPath.substring(protocolIndex + 1, jarPathEndIndex);
  }

  /**
   * For a given jarFile, retrieve data from files matching 
   *     the regex pattern string.
   *
   * @param jarFile A jarfile of interest.
   * @param patternString A regex pattern to match against names of jar files.
   * @return List A collection of data/content from files in the jar
   *      that matched the patternString. One entry per file.
   * @throws IOException
   */
  List<String> retrieveMatches(File jarFile, String patternString) 
       throws IOException {
    checkMatchArguments(jarFile, patternString);
    Pattern pattern = Pattern.compile(patternString);
    JarFile jar = new JarFile(jarFile);

    List<String> matchingContent = new ArrayList<String>();
    try {
      final Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        Matcher m = pattern.matcher(entry.getName());
        if (m.matches()) {
          matchingContent.add(getContent(entry, jar));
        }
      }
    } finally {
      jar.close();  // release the file handle even if getContent throws
    }
    return matchingContent;
  }

  /**
   * Get the path for a classfile on disk, inside the jar.
   * E.g. file:/home/bob/my.jar!/com/google/MyClass.class
   *
   * @param className
   * @return Absolute path to class, with file: protocol, 
   *             jarName and filename.   
   * @throws ClassNotFoundException
   */
  private static String getClassFullPath(String className) 
         throws ClassNotFoundException {
    Class<?> jarClass = Class.forName(className);
    // Class loader resources are addressed with '/' separators, not '.'
    return getResourceFullPath(className.replace('.', '/') + ".class", jarClass);
  }

  /**
   * Retrieve a resourceName from the classLoader for a given jarClass.
   *
   * @param resourceName
   * @param jarClass
   * @return
   */
  private static String getResourceFullPath(String resourceName, 
                                            Class<?> jarClass) {
    return jarClass.getClassLoader().getResource(resourceName).getFile();
  }

  private static void checkArgument(String className) {
    if (isEmpty(className)) {
      throw new IllegalArgumentException("Invalid className");
    }
  }

  private static void checkMatchArguments(File file, String pattern) {
    if (isEmpty(pattern) || badFile(file)) {
      throw new IllegalArgumentException("Invalid pattern or jarFile.");
    }
  }

  private static void checkValidjarFilePath(int protocolIndex, 
                                            int jarPathEndIndex) {
    if (protocolIndex == -1 || jarPathEndIndex == -1) {
      throw new IllegalArgumentException("Invalid jarFile path.");
    }
  }

  private static boolean badFile(File jarFile) {
    return jarFile == null || !jarFile.exists();
  }

  private static boolean isEmpty(String pattern) {
    return pattern == null || pattern.length() == 0;
  }
}
