Category Archive: Programming Languages



Programming Languages 17 May 2008 11:25 am

Weird Designs in Java

So, maybe I have become warped. I was writing a little utility last night to extract any files from a jar that matched a given regex pattern. I did it all test-first, but it evolved in a weird way. It seems like a weird mix of functional and object-oriented programming.

My first pass was to identify all the files in the jar that matched my regex, but ultimately, I wanted the contents of those matching files. So, I wrote the version that loops through the jar entries and collects matches first. Once that worked and the tests passed, I wrote the next iteration to gather the contents.

In a language with first-class functions or blocks, such as Ruby, both would have been dead simple. Here’s the mostly working code. (I didn’t run these.)

For getting all matching names:


jar.entries.select {|e| e.name =~ /regex/ }.map {|e| e.name }

For getting all matching names’ content:


jar.entries.select {|e| e.name =~ /regex/ }.map {|e| jar.read(e) }
# see below about the weird that is reading from jars

I might extract the common select call, but it is dead simple; kind of seems like excess.

def content_of_matching(jar, pattern, &block)
  # match against the pattern argument, not a hard-coded regex
  jar.entries.select {|e| e.name =~ pattern }.map(&block)
end

names = content_of_matching(jar, regex) {|e| e.name }
contents = content_of_matching(jar, regex) {|e| jar.read(e) }
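
Since the snippets above were never run, here is a minimal runnable sketch of the same select-then-map idea. Ruby has no jar reader in its standard library, so `Entry` and `FakeJar` are hypothetical stand-ins invented for this example, as are the entry names and contents.

```ruby
# Hypothetical stand-ins for a jar and its entries; not a real jar API.
Entry = Struct.new(:name, :body)

class FakeJar
  attr_reader :entries

  def initialize(entries)
    @entries = entries
  end

  # Pretend to read an entry's content out of the archive.
  def read(entry)
    entry.body
  end
end

# Select the entries whose name matches pattern, then map the block over them.
def content_of_matching(jar, pattern, &block)
  jar.entries.select { |e| e.name =~ pattern }.map(&block)
end

jar = FakeJar.new([
  Entry.new("conf/app.properties", "debug=true"),
  Entry.new("com/example/Main.class", "(bytes)"),
  Entry.new("conf/log.properties", "level=info")
])

regex = /\.properties\z/
names    = content_of_matching(jar, regex) { |e| e.name }
contents = content_of_matching(jar, regex) { |e| jar.read(e) }
```

Note that the second block closes over `jar`, so the block itself only ever takes the entry as a parameter.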

With the Java version, once I had written the version that gets all matching files’ names, I realized that it was just a variant of the map-a-function-over-entries problem. So, I used the template method pattern to make retrieveMatches call out to getContent for each entry.

Then I created two public static final anonymous subclasses of my RegexJarFileExtractor, each overriding getContent to provide the variant behavior of getting the content, or getting the name. (I later put these behind creator methods as you’ll see below.)

At first, I was really happy that it was so clean. I just subclassed for the variant behavior. Then I wondered whether I should create real, named subclass definitions instead. That seems like overkill.

Next, I kibitzed about the template method getContent, which is the variant overridden in each type of extractor. Because of the way the java.util.jar library is written, you iterate over JarEntry objects and then ask the JarFile for an InputStream on an entry. So for getContent to return the actual content, it needs both the jarFile and the jarEntry as parameters. However, in the case where getContent just returns the jarEntry’s name, it only needs the jarEntry itself. So, that feels a little dirty.

It also got me thinking: the way you would normally do both of these variants in Java is just to create two static methods, repeating all the iterator code and changing the internal work of the loop in each one to get the content, or the name, of the entry. Nowadays, that repetition seems repugnant, but separate classes seem like overkill.

Also, I had developed it test-first, piecemeal, so the individual methods that do work are package-private (no access modifier) so as to still be available to the test. They aren’t marked private. One could argue that some of the internal methods are useful themselves, like retrieveMatches and getJarFileForClass. I would just argue that package scope says, “leave it alone”.

By creating instances of extractors in creator methods, instead of the static final instances as fields, I now create a new instance of the given extractor each time a retrieval entry-point method is called. I originally just declared a public static final instance field for each anonymous subclass variant.

  /**
   * Instance of RegexJarFileExtractor that returns the content of the matching entry
   *     as one long string.
   */
  public static final RegexJarFileExtractor CONTENT_EXTRACTOR = 
    new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader reader = null;
        try {
          reader = new BufferedReader(
                          new InputStreamReader(
                               jarFile.getInputStream(entry)));
          String line;
          while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
          }
        } finally {
          // Guard against a null reader if opening the stream failed.
          if (reader != null) {
            reader.close();
          }
        }
        return stringBuilder.toString();
      }
    };

  /**
   * Instance of RegexJarFileExtractor that returns the name of the matching entry.
   */
  public static final RegexJarFileExtractor NAME_EXTRACTOR = 
    new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        return entry.getName();
      }
    };

This meant only one instance of each was ever in the system, and all work is done in the retrieveMatches and getContent methods using only local variables, so I don’t think there is shared state that could get messed up by multiple threads. But somehow, it seemed too gratuitously functional-style. I probably would’ve kept it that way for myself.
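
The no-shared-state reasoning can be sketched in Ruby terms. This is only an illustration, assuming a hypothetical `Entry` struct and a lambda standing in for the shared extractor instance: the extractor reads its argument and writes to nothing outside it, so several threads can call the same object while each builds its own result.

```ruby
# Stand-in entry type; hypothetical, for illustration only.
Entry = Struct.new(:name)

# One shared, stateless extractor: it only reads its argument.
NAME_EXTRACTOR = ->(entry) { entry.name }

entries = Array.new(100) { |i| Entry.new("file#{i}.txt") }

# Four threads share the same extractor; each builds its own result array.
threads = Array.new(4) do
  Thread.new { entries.map(&NAME_EXTRACTOR) }
end

# Thread#value waits for the thread and returns its block's result.
results = threads.map(&:value)
```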

Here’s the class. I can provide the test if you like. The first two methods are the entry points. The first retrieves content for matching files, and the second retrieves the names of matching files.

Note: It has some extra code that finds the right jarFile to load. This utility will be used in a running system to retrieve files from a jar on the classpath.

So, the first two methods use that to identify the jarFile, then proceed to find matching files in the jar file.

What do you think? I have barely coded in the last 2 months, so maybe I am just getting back into Java-land. Would this confuse people who only do Java?

import java.io.*;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Utility to retrieve files from a jar that 
 * match a given regex pattern.
 *
 *
 * @author bobevans (Bob Evans)
 *
 */
public abstract class RegexJarFileExtractor {

  /**
   * Returns the contents of files, matching the given pattern, 
   *     from the jar that holds className.
   *
   * @param classNameInJar
   * @param pattern
   * @return
   * @throws IOException
   * @throws ClassNotFoundException
   */
  public static List<String> retrieveMatchingContent(String classNameInJar, 
                                                     String pattern)
      throws IOException, ClassNotFoundException {
    String jarPath = RegexJarFileExtractor.getJarFileForClass(classNameInJar);
    return createContentExtractor().retrieveMatches(new File(jarPath), 
                                                    pattern);
  }

  /**
   * Returns the names of files, matching the given pattern, from
   *  the jar that holds className.
   *
   * @param classNameInJar
   * @param pattern
   * @return
   * @throws IOException
   * @throws ClassNotFoundException
   */
  public static List<String> retrieveMatchingNames(String classNameInJar, 
                                                   String pattern)
      throws IOException, ClassNotFoundException {
    String jarPath = RegexJarFileExtractor.getJarFileForClass(classNameInJar);
    return createNameExtractor().retrieveMatches(new File(jarPath), 
                                                 pattern);
  }

  /**
   * Overridable behavior for retrieving some part of the 
   * matching file to return, e.g., retrieve the name of 
   * the matching file, or perhaps the content of the matching file.
   *
   * @param entry A File in the jar that matches the search pattern.
   * @param jarFile The container of the files and their contents.
   * @return String Data from the entry in the jarFile.
   * @throws IOException
   */
  protected abstract String getContent(JarEntry entry, JarFile jarFile) 
       throws IOException;

  /**
   * Makes an instance of RegexJarFileExtractor that returns 
   *      the content of the matching entry.
   * Note: returns one long string.
   * 
   * @return RegexJarFileExtractor
   */
  private static RegexJarFileExtractor createContentExtractor() {
    return new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader reader = null;
        try {
          reader = new BufferedReader(       
                         new InputStreamReader(
                               jarFile.getInputStream(entry)));
          String line;
          while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
          }
        } finally {
          if (reader != null) {
            reader.close();
          }
        }
        return stringBuilder.toString();
      }
    };
  }

  /**
   * Makes an instance of RegexJarFileExtractor that returns 
   * the name of the matching entry.
   * Mostly used for test purposes.
   *
   * @return RegexJarFileExtractor 
   */
  static RegexJarFileExtractor createNameExtractor() {
    return new RegexJarFileExtractor() {
      protected String getContent(JarEntry entry, 
                                  JarFile jarFile) throws IOException {
        return entry.getName();
      }
    };
  }

  /**
   * Given the name of a class that was loaded from a jar classloader,
   *    return the absolute file path to that jar on the local disk.
   *
   * @param className Name of the class in the jar.
   * @return String The file system path for the jar that contains 
   *    the class named className.
   * @throws ClassNotFoundException
   */
  static String getJarFileForClass(String className) 
        throws ClassNotFoundException {
    checkArgument(className);
    String classPath = getClassFullPath(className);
    final int protocolIndex = classPath.indexOf(":");
    final int jarPathEndIndex = classPath.indexOf("!");
    checkValidjarFilePath(protocolIndex, jarPathEndIndex);
    return classPath.substring(protocolIndex + 1, jarPathEndIndex);
  }

  /**
   * For a given jarFile, retrieve data from files matching 
   *     the regex pattern string.
   *
   * @param jarFile A jarfile of interest.
   * @param patternString A regex pattern to match against names of jar files.
   * @return List A collection of data/content from files in the jar
   *      that matched the patternString. One entry per file.
   * @throws IOException
   */
  List<String> retrieveMatches(File jarFile, String patternString) 
       throws IOException {
    checkMatchArguments(jarFile, patternString);
    Pattern pattern = Pattern.compile(patternString);
    JarFile jar = new JarFile(jarFile);

    List<String> matchingContent = new ArrayList<String>();
    try {
      final Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        Matcher m = pattern.matcher(entry.getName());
        // matches() requires the whole entry name to match the pattern.
        if (m.matches()) {
          matchingContent.add(getContent(entry, jar));
        }
      }
    } finally {
      // Release the file handle even if reading an entry fails.
      jar.close();
    }
    return matchingContent;
  }

  /**
   * Get the path for a classfile on disk, inside the jar.
   * E.g. file:/home/bob/my.jar!/com/google/MyClass.class
   *
   * @param className Fully qualified, dot-separated class name.
   * @return Absolute path to class, with file: protocol, 
   *             jarName and filename.   
   * @throws ClassNotFoundException
   */
  private static String getClassFullPath(String className) 
         throws ClassNotFoundException {
    Class<?> jarClass = Class.forName(className);
    // Class loaders look up resources by slash-separated path,
    // so convert com.foo.Bar to com/foo/Bar.class.
    String resourceName = className.replace('.', '/') + ".class";
    return getResourceFullPath(resourceName, jarClass);
  }

  /**
   * Retrieve a resourceName from the classLoader for a given jarClass.
   *
   * @param resourceName
   * @param jarClass
   * @return
   */
  private static String getResourceFullPath(String resourceName, 
                                            Class<?> jarClass) {
    return jarClass.getClassLoader().getResource(resourceName).getFile();
  }

  private static void checkArgument(String className) {
    if (isEmpty(className)) {
      throw new IllegalArgumentException("Invalid className");
    }
  }

  private static void checkMatchArguments(File file, String pattern) {
    if (isEmpty(pattern) || badFile(file)) {
      throw new IllegalArgumentException("Invalid pattern or jarFile.");
    }
  }

  private static void checkValidjarFilePath(int protocolIndex, 
                                            int jarPathEndIndex) {
    if (protocolIndex == -1 || jarPathEndIndex == -1) {
      throw new IllegalArgumentException("Invalid jarFile path.");
    }
  }

  private static boolean badFile(File jarFile) {
    return jarFile == null || !jarFile.exists();
  }

  private static boolean isEmpty(String pattern) {
    return pattern == null || pattern.length() == 0;
  }
}
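
For readers skimming the class, the jar lookup in getJarFileForClass boils down to slicing a string: keep whatever sits between the first ":" and the first "!" of the resource path. Here is a rough Ruby sketch of just that step. The method name `jar_path_from_resource` is made up for this example, and the sample path follows the form given in the Javadoc above.

```ruby
# Slice the jar's file-system path out of a resource path of the form
# "file:<jar path>!/<entry path>". The method name is hypothetical.
def jar_path_from_resource(resource_path)
  protocol_index = resource_path.index(":")
  jar_path_end_index = resource_path.index("!")
  if protocol_index.nil? || jar_path_end_index.nil?
    raise ArgumentError, "Invalid jarFile path."
  end
  # Keep everything after the first ":" and before the first "!".
  resource_path[(protocol_index + 1)...jar_path_end_index]
end

jar_path = jar_path_from_resource("file:/home/bob/my.jar!/com/google/MyClass.class")
# => "/home/bob/my.jar"
```

A path with no ":" or no "!" raises ArgumentError, mirroring checkValidjarFilePath in the Java version.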

Math & Programming Languages 09 Feb 2008 06:01 pm

Project Euler

Project Euler: 188 little programming math problems to amuse you.

Humor & Programming Languages & Science & Software 16 Jan 2008 05:22 pm

Problems Worthy of Attack

I found a great quote in the comments of Diomidis Spinellis’ blog entry Rational Metaprogramming: ‘Problems worthy of attack prove their worth by fighting back.’ (Piet Hein)

That’s a great way to put it!

Programming Languages & Software 16 Jan 2008 05:19 am

Vis a tergo: Baseball cards as objects

Rob’s discussing object models for baseball cards. He is evolving a design bit-by-bit.

My first question is what use cases do you want for your database of baseball cards? Searching real stats for the season? Or just searching for cards? To me, that changes the way I’d lay it out.

To play along, here is a primitive set of plain old Ruby objects to model what he has so far. I thought it might be interesting to contrast the two.

OK, well, I lied, it is a little different, because I went ahead and subclassed for the variant card type.

class Card
  attr_reader :year, :number
  
  def initialize(number, year)
    @number = number
    @year = year
  end
end

class PlayerCard < Card
  attr_reader :player
  
  def initialize(number, year, player)
    super(number, year)
    @player = player    
  end
end

class BattingLeadersCard < Card
  attr_reader :top_50_batting_players # this is a list of players
  
  def initialize(number, year, top_50_batting_players)
    super(number, year)
    @top_50_batting_players = top_50_batting_players
  end

  def top_3
    top_50_batting_players[0..2]
  end
end


class Player
  attr_reader :name, :team
  
  def initialize(name, team)
    @name = name
    @team = team    
  end
end
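
To make the use-case question concrete, here is a small sketch of one possible use case, "just searching for cards". It restates the classes compactly so the snippet runs on its own. The `cards_for_team` helper, the player names, and the team names are all made up for illustration.

```ruby
# Compact restatement of the classes above so this snippet runs on its own.
class Card
  attr_reader :year, :number

  def initialize(number, year)
    @number = number
    @year = year
  end
end

class PlayerCard < Card
  attr_reader :player

  def initialize(number, year, player)
    super(number, year)
    @player = player
  end
end

# A Struct stands in for the Player class: same name and team readers.
Player = Struct.new(:name, :team)

# Hypothetical use case: find every player card for a given team.
def cards_for_team(cards, team)
  cards.select { |c| c.respond_to?(:player) && c.player.team == team }
end

cards = [
  PlayerCard.new(12, 1987, Player.new("Player A", "Team X")),
  PlayerCard.new(40, 1987, Player.new("Player B", "Team Y")),
  Card.new(99, 1988)
]

team_x = cards_for_team(cards, "Team X")
```

If searching real season stats were the goal instead, Player would probably need to carry far more than a name and a team, which is why the use case changes the layout.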

History & Language & Philosophy & Politics & Programming Languages & Science & Software 02 Jan 2008 08:23 pm

Reading List that Inspired Smalltalk

Squeakland has a list put together by Alan Kay for his students that gives background on the ideas behind Smalltalk.

It looks like quite an interesting list.

Programming Languages & Science & Software 29 Nov 2007 04:32 pm

Comments on “A Program is an Idea”

“A computer is like a violin. You can imagine a novice trying first a phonograph and then a violin. The latter, he says, sounds terrible.”

Eugene Wallingford comments on comments received on his recent entry, ‘A Program is an Idea.’

This is timely given recent conversations I’ve had about getting kids interested in programming. Mark Guzdial’s recent discussion of introducing programming is relevant as well.

(Via Knowing and Doing.)

Programming Languages & Science 16 Nov 2007 02:08 pm

Ruby used on the Enterprise RubyWorks overview – ThoughtWorks Studios

RubyWorks at ThoughtWorks Studios: “More than forty percent of ThoughtWorks’ new consulting projects in the U.S. are now developed using Ruby on Rails.”

Sweet. I am not surprised at all.

Language & Philosophy & Programming Languages & Software 12 Nov 2007 11:35 am

Document the Difficult Stuff, not the Easy: Rails Tutorials

I found Rails Tutorials pretty handy. They are short and sweet, each focusing on one particular problem. I also found some good discussion in the comments of each tutorial. With their help, I finally got my head around how to create complex relationships of objects and dependent objects so that they are put in the database atomically, and so that they are available for correction on the form if there are validation errors.

A big part of the problem I was having is that there are a million tutorials on the simple stuff in Rails and ActiveRecord, but very few on doing real work. There are many features of the Rails framework that I have just happened across accidentally, or that I had to scour the net and books looking for examples.

The one particular bug, err… feature, that has been haunting me is how to create an object that has a complex join model, and have all the other objects in that model get created automatically.

It is obvious enough how to do it by hand. Take, for example, creating a Registration for multiple People for multiple Courses, with other related information such as Payment choices attached to the Registration. When I get back a form, I can pick out each parameter set from my form fields, create each object (the Registration, the People objects, the Payment details, and so on), and then set them on the Registration object.

@registration = Registration.new(params[:registration])
new_payment = Payment.new(params[:payment])
@registration.payment = new_payment
# params[:people] is assumed to map a form index to each person's attributes; the index is ignored
@registration.people = params[:people].collect { |_index, person| Person.new(person) }
@registration.save

But Rails should be able to do all this automatically, since I already specified the relationships in the model objects. Well, it turns out it can. You can use the build method on the model’s associations.

params[:people].each_value {|person| @registration.people.build(person) }

However, this doesn’t work for has_one or belongs_to objects. For those you have to call:

@registration.build_payment(params[:payment])

Notice that this seems to be some sort of method_missing magic. I haven’t hunted down where this gets handled, but more troubling to me is that I never knew it even existed. I am glad I found it of course, but it is the first time in Ruby I have had the feeling that dynamic features could be bad.

I know the horror stories of dynamic languages and the claims for static typing and so on, but by and large I have not had this problem with Ruby, and I have probably written about 15-20kloc of Ruby now. Not a great amount, but not a trivial amount either given the expressiveness of the language.

Maybe I should have learned Rails by looking at the test suites. Perhaps within that code I would’ve found examples of build_x and that would have clued me in. I don’t know if those tests exist, but still I think that tutorials can be more efficient communication than test suites.

Of course that depends on what tutorials get written. My Request:

When writing tutorials for programmers, just write the tutorials for the hard stuff.

This should be the priority. Assume that programmers can figure out the obvious stuff, and spend your time on explaining the hard, hidden parts of the framework.

Programming Languages & Software 12 Nov 2007 10:29 am

LLVM Tutorials

LLVM Tutorials shows how to build your own language targeting the LLVM. The LLVM is a cross-platform low-level virtual machine that already comes with tools for bytecode analysis and optimization.

I’ll be curious to see how flexible it is for creating dynamic languages in comparison to the JVM.

Programming Languages & Software 16 Oct 2007 05:04 pm

More New Rails Screencasts from RailsCasts.com

Reference for myself mostly.

More New Rails Screencasts from RailsCasts.com: “Ryan Bates is being a total champ in rolling out more and more consistently good Rails related screencasts for free at RailsCasts.com. Some of the latest include:”

(Via Ruby Inside.)
