Friday, May 04, 2007

FindBugs - Writing custom detectors (Part 2)

This is the second part of the "Howto write a FindBugs Bug Detector" (see the first part here). To understand why one would write the kind of detector mentioned here, you should read the first part if you do not already know it.

Last time I presented a detector that is able to detect static fields of the types java.util.Calendar and java.text.DateFormat. While declaring fields like this may be suspicious, there is not necessarily something wrong with the code. The real danger comes from calling methods on suchlike fields, especially if they are not synchronized to protect them against concurrent access. So in this article we will extend and improve the existing detector to cope with this problem.

Something to chew on

This a simple class that uses a static Calendar instance. It does not really do much, but the shorter the program, the easier to understand its bytecode (this is what we need to do this time).

import java.util.Calendar;

public class StaticCalendarSample3 {
    public static final Calendar cal = Calendar.getInstance();
    public void testCal() {
        cal.clear(); // this is dangerous
    }
}

What I "like" most about concurrency programs (or multi-threading bugs) is that even the simplest and most harmless looking code can be a source of nasty and hard to find problems. To emphasize this I even marked the cal field final, because that's advice you hear often (e. g. Securing Java, Ch. 7, Rule 3 or Effective Java, Item 13, "Favor immutability" (Google Account required to view)) and for good reason. However in this particular case it will not help.

In order to write a detector that will alert the cal.clear(); line we need to take a look at the method's bytecode, because that is what the detector will have to look at:

ds@yavin:~/... $ javap -c StaticCalendarSample3
Compiled from "StaticCalendarSample3.java"
public class de.bismarck.fb.test.StaticCalendarSample3 extends java.lang.Object{
public static final java.util.Calendar cal;

static {};
  Code:
   0:   invokestatic    #10; //Method java/util/Calendar.getInstance:()Ljava/util/Calendar;
   3:   putstatic       #16; //Field cal:Ljava/util/Calendar;
   6:   return

public de.bismarck.fb.test.StaticCalendarSample3();
  Code:
   0:   aload_0
   1:   invokespecial   #21; //Method java/lang/Object."<init>":()V
   4:   return

public void testCal();
  Code:
   0:   getstatic       #16; //Field cal:Ljava/util/Calendar;
   3:   invokevirtual   #26; //Method java/util/Calendar.clear:()V
   6:   return

}

The javap tool is part of the JDK installation, the -c option outputs bytecode instead of trying to recreate the Java source. Here you see the "machine code" instructions the compiler generated from the .java file. In the static intializer you can see the invocation of Calendar.getInstance() and the subsequent store (putstatic) of the result of that call to a static field which is denoted by #16. javap automatically adds a more human readable comment that allows you to identify the original variable name (cal) and type.

So a possible way to detect a call on a static Calendar is to look for a getstatic that accesses a Calendar or a subclass thereof, immediately followed by an invokevirtual opcode. For a list of all opcodes and their meanings, you can check The Java Virtual Machine Specification. This would work, at least for the simple cases, and in fact the first version of my detector worked liked this. There is a problem with this approach however (it stored the offset of the getstatic call and looked for a invokevirtual call less than 4 counters from there). Have a look at this Java/bytecode:

import java.util.Calendar;

public class StaticCalendarSample3 {
    public static final Calendar cal = Calendar.getInstance();
    public void testCal() {
        Calendar localCal = cal;
        System.out.println("Hello world");
        localCal.clear(); // still the static field!
    }
}
 
Compiled from "StaticCalendarSample3.java"
public class de.bismarck.fb.test.StaticCalendarSample3 extends java.lang.Object{
public static final java.util.Calendar cal;

static {};
  Code:
   0:   invokestatic    #10; //Method java/util/Calendar.getInstance:()Ljava/util/Calendar;
   3:   putstatic       #16; //Field cal:Ljava/util/Calendar;
   6:   return

public de.bismarck.fb.test.StaticCalendarSample3();
  Code:
   0:   aload_0
   1:   invokespecial   #21; //Method java/lang/Object."<init>":()V
   4:   return

public void testCal();
  Code:
   0:   getstatic       #16; //Field cal:Ljava/util/Calendar;
   3:   astore_1
   4:   getstatic       #26; //Field java/lang/System.out:Ljava/io/PrintStream;
   7:   ldc     #32; //String Hello world
   9:   invokevirtual   #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   12:  aload_1
   13:  invokevirtual   #40; //Method java/util/Calendar.clear:()V
   16:  return

}

In this example the call to the static field is still there, however it is not as easy to detect, because a reference to it first gets stored in a local variable, then some other code follows. Only then is the Calendar accessed again (12: aload_1), this time from a register into which is was stored earlier (3: astore_1). In this situation the detector will not work.

The documentation on FindBugs is somewhat sketchy, but with some very kind help from the Findbugs-discuss mailing list, including some very helpful sample code, I managed to get it more reliable. These are the things that need to be done to generate a bug warning:

  • locate a invokevirtual call
  • determine if it is directed at a type of Calendar/DateFormat or a subclass thereof
  • if so, check if it corresponds to a static field
  • if it does, check if it is outside a synchronization block (assuming that if there is such a block, someone has already thought about multi-threading)

The first two were rather easy:

@Override
public void sawOpcode(int seen) {
 // we are only interested in method calls
 if (seen != INVOKEVIRTUAL) {
  return;
 }

 // determine type of the object the method is invoked on
 ObjectType tType = ObjectTypeFactory.getInstance(getClassConstantOperand());
  // if it is not compatible with Calendar or DateFormat, we are not
 // interested anymore
 if (!tType.subclassOf(calendarType) && !tType.subclassOf(dateFormatType)) {
  return;
 }
 ...

While the first two are rather easy, Bill Pugh provided valuable information for the third one: There is a class called the OpcodeStack which allows to identify the item a method was called on by filtering out the "superfluous" code in between 0: getstatic #16 and 13: invokevirtual #40 in the example above. In order to use it, your detector should subclass OpcodeStackDetector which in turn inherits from BytecodeScanningDetector. As this was the base class of our detector before, too, we can simply change our parent class. This enables allows for use of the OpcodeStack automatically. Have a look at the implementation for details. I am glad I did not have to figure that out myself :)

The code using it looks like this:

 ...
 // determine the number of arguments the method expects
 int numArguments = getNumberArguments(getSigConstantOperand());
 // go back on the stack to find what the receiver of the method is
 OpcodeStack.Item invokedOn = stack.getStackItem(numArguments);
 XField field = invokedOn.getXField();
 // find out, if the field is static. if not, we are not interested
 // anymore
 if (field == null || !field.isStatic()) {
  return;
 }
 ...

As for the synchronization check, David Hovemeyer provided me with the neccessary pointers and some pseudocode which resulted in the following:

 ...
 try {
  if (currentMethod != null && currentLockDataFlow != null && currentCFG != null) {
   Collection<Location> tLocations = currentCFG.getLocationsContainingInstructionWithOffset(getPC());
   for (Location tLoc : tLocations) {
    LockSet lockSet = currentLockDataFlow.getFactAtLocation(tLoc);
    if (lockSet.getNumLockedObjects() > 0) {
     // within a synchronized block
     return;
    }
   }
  }
 } catch (DataflowAnalysisException e) {
  reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
 }
 ...

I still need to try and find more information about the implementation of the so called CFG - the Control Flow Graph - to fully understand what the LockDataFlow, Location and LockSet classes do in full detail. What you can see however in the code above is, that it does not matter what the synchronisation block uses as its monitor object. This is something that might be worth some further improvements in the future, but for now it should suffice. Probably this is a case where JSR305 might come in handy.

Anyway, this is it. If all the above checks have not caused an early return from the method, we can generate a bug report:

 ...
 // if we get here, we want to generate a report, depending on the type
 String tBugType = null;
 if (tType.subclassOf(calendarType)) {
  tBugType = "STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE";
 } else if (tType.subclassOf(dateFormatType)) {
  tBugType = "STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE";
 }
 if (tBugType != null) {
  reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY)
   .addClassAndMethod(this).addCalledMethod(this)
   .addOptionalField(field).addSourceLine(this));
 }

The whole class is contained in the following listing. It especially contains some more housekeeping methods that take care of the CFG and LockDataFlow instances:

/*
 * FindBugs - Find bugs in Java programs
 * Copyright (C) 2003,2004 University of Maryland
 * 
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 * 
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 * 
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

package edu.umd.cs.findbugs.detect;

import java.text.DateFormat;
import java.util.Calendar;
import java.util.Collection;

import org.apache.bcel.classfile.Field;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ObjectType;

import edu.umd.cs.findbugs.BugInstance;
import edu.umd.cs.findbugs.BugReporter;
import edu.umd.cs.findbugs.OpcodeStack;
import edu.umd.cs.findbugs.ba.*;
import edu.umd.cs.findbugs.bcel.OpcodeStackDetector;

/**
 * Detector for static fields of type {@link java.util.Calendar} or
 * {@link java.text.DateFormat} and their subclasses. Because {@link Calendar}
 * is unsafe for multithreaded use, static fields look suspicous. To work
 * correctly, all access would need to be synchronized by the client which
 * cannot be guaranteed.
 * 
 * @author Daniel Schneller
 */
public class StaticCalendarDetector extends OpcodeStackDetector {

 /** External Debug flag set? */
 private static final boolean DEBUG = Boolean.getBoolean("debug.staticcal");

 /**
  * External flag to determine whether to skip the test for synchronized
  * blocks (default: if a call on a static Calendar or DateFormat is detected
  * inside a synchronizationb block, it will not be reported). Setting this
  * to <code>true</code> will report method calls on static fields if they
  * are in a synchronized block. As the check currently does not take into
  * account the lock's mutex it may be useful to switch allow
  */
 private static final String PROP_SKIP_SYNCHRONIZED_CHECK = "staticcal.skipsynccheck";

 /** The reporter to report to */
 private BugReporter reporter;

 /** Name of the class being inspected */
 private String currentClass;

 /**
  * {@link ObjectType} for {@link java.util.Calendar}
  */
 private final ObjectType calendarType = ObjectTypeFactory.getInstance("java.util.Calendar");

 /**
  * {@link ObjectType} for {@link java.text.DateFormat}
  */
 private final ObjectType dateFormatType = ObjectTypeFactory.getInstance("java.text.DateFormat");

 /** Stores the current method */
 private Method currentMethod = null;

 /** Stores current Control Flow Graph */
 private CFG currentCFG;

 /** Stores current LDF */
 private LockDataflow currentLockDataFlow;

 /**
  * Creates a new instance of this Detector.
  * 
  * @param aReporter
  *            {@link BugReporter} instance to report found problems to.
  */
 public StaticCalendarDetector(BugReporter aReporter) {
  reporter = aReporter;
 }

 /**
  * Remembers the class name and resets temporary fields.
  */
 @Override
 public void visit(JavaClass someObj) {
  currentClass = someObj.getClassName();
  currentMethod = null;
  currentCFG = null;
  currentLockDataFlow = null;
  super.visit(someObj);
 }

 /**
  * Checks if the visited field is of type {@link Calendar} or
  * {@link DateFormat} or a subclass of either one. If so and the field is
  * static it is suspicious and will be reported.
  */
 @Override
 public void visit(Field aField) {
  super.visit(aField);
  String tFieldSig = aField.getSignature();
  if (aField.getType() instanceof ObjectType) {
   String tBugType = null;
   ObjectType tType = (ObjectType) aField.getType();
   try {
    if (tType.subclassOf(calendarType) && aField.isStatic()) {
     tBugType = "STCAL_STATIC_CALENDAR_INSTANCE";
    } else if (tType.subclassOf(dateFormatType) && aField.isStatic()) {
     tBugType = "STCAL_STATIC_SIMPLE_DATA_FORMAT_INSTANCE";
    }
    if (tBugType != null) {
     reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY).addClass(currentClass).addField(
             currentClass, aField.getName(), tFieldSig, true));
    }
   } catch (ClassNotFoundException e) {
    AnalysisContext.reportMissingClass(e);
   }
  }
 }

 /*
  * (non-Javadoc)
  * 
  * @see edu.umd.cs.findbugs.visitclass.BetterVisitor#visitMethod(org.apache.bcel.classfile.Method)
  */
 @Override
 public void visitMethod(Method obj) {
  try {
   super.visitMethod(obj);
   currentMethod = obj;
   currentLockDataFlow = getClassContext().getLockDataflow(currentMethod);
   currentCFG = getClassContext().getCFG(currentMethod);
  } catch (CFGBuilderException e) {
   reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
  } catch (DataflowAnalysisException e) {
   reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
  }
 }

 /**
  * Checks for method invocations ({@link org.apache.bcel.generic.INVOKEVIRTUAL})
  * call on a static {@link Calendar} or {@link DateFormat} fields. The
  * {@link OpcodeStack} is used to determine if an invocation is done on such
  * a static field.
  * 
  * @param seen
  *            An opcode to be analyzed
  * @see edu.umd.cs.findbugs.visitclass.DismantleBytecode#sawOpcode(int)
  */
 @Override
 public void sawOpcode(int seen) {
  // we are only interested in method calls
  if (seen != INVOKEVIRTUAL) {
   return;
  }

  try {
   // determine type of the object the method is invoked on
   ObjectType tType = ObjectTypeFactory.getInstance(getClassConstantOperand());

   // if it is not compatible with Calendar or DateFormat, we are not
   // interested anymore
   if (!tType.subclassOf(calendarType) && !tType.subclassOf(dateFormatType)) {
    return;
   }

   // determine the number of arguments the method expects
   int numArguments = getNumberArguments(getSigConstantOperand());
   // go back on the stack to find what the receiver of the method is
   OpcodeStack.Item invokedOn = stack.getStackItem(numArguments);
   XField field = invokedOn.getXField();
   // find out, if the field is static. if not, we are not interested
   // anymore
   if (field == null || !field.isStatic()) {
    return;
   }

   if (getNameConstantOperand().equals("equals") && numArguments == 1) {
    OpcodeStack.Item passedAsArgument = stack.getStackItem(0);
    field = passedAsArgument.getXField();
    if (field == null || !field.isStatic()) {
     return;
    }
   }

   if (!Boolean.getBoolean(PROP_SKIP_SYNCHRONIZED_CHECK)) {
    // check synchronization
    try {
     if (currentMethod != null && currentLockDataFlow != null && currentCFG != null) {
      Collection<Location> tLocations = currentCFG.getLocationsContainingInstructionWithOffset(getPC());
      for (Location tLoc : tLocations) {
       LockSet lockSet = currentLockDataFlow.getFactAtLocation(tLoc);
       if (lockSet.getNumLockedObjects() > 0) {
        // within a synchronized block
        return;
       }
      }
     }
    } catch (DataflowAnalysisException e) {
     reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
    }
   }

   // if we get here, we want to generate a report, depending on the type
   String tBugType = null;
   if (tType.subclassOf(calendarType)) {
    tBugType = "STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE";
   } else if (tType.subclassOf(dateFormatType)) {
    tBugType = "STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE";
   }
   if (tBugType != null) {
    reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY).addClassAndMethod(this).addCalledMethod(this)
            .addOptionalField(field).addSourceLine(this));
   }
  } catch (ClassNotFoundException e) {
   AnalysisContext.reportMissingClass(e);
  }
 }

}

In order to use the new bug type, the findbugs.xml and messages.xml files have to be completed, too. Add the following lines to them:

findbugs.xml:
  ...
      <BugPattern abbrev="STCAL" type="STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE" category="MT_CORRECTNESS" />
      <BugPattern abbrev="STCAL" type="STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE" category="MT_CORRECTNESS" />
  ...
  
  
messages.xml
<BugPattern type="STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE">
<ShortDescription>Call to static Calendar</ShortDescription>
<LongDescription>Call to method of static java.util.Calendar in {1}</LongDescription>
<Details>
<![CDATA[
<p>Even though the JavaDoc does not contain a hint about it, Calendars are inherently unsafe for multihtreaded use. 
The detector has found a call to an instance of Calendar that has been obtained via a static
field. This looks suspicous.</p>
<p>For more information on this see <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6231579">Sun Bug #6231579</a>
and <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6178997">Sun Bug #6178997</a>.</p>
]]>
</Details>
</BugPattern>

<BugPattern type="STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE">
<ShortDescription>Call to static DateFormat</ShortDescription>
<LongDescription>Call to method of static java.text.DateFormat in {1}</LongDescription>
<Details>
<![CDATA[
<p>As the JavaDoc states, DateFormats are inherently unsafe for multithreaded use. 
The detector has found a call to an instance of DateFormat that has been obtained via a static
field. This looks suspicous.</p>
<p>For more information on this see <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6231579">Sun Bug #6231579</a>
and <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6178997">Sun Bug #6178997</a>.</p>
]]>
</Details>
</BugPattern>

When you download the current version of FindBugs either as a ZIP package or from the SVN trunk, it will contain an earlier version of this check, however disabled by default in the findbugs.xml file. I will submit the newer one as a patch to the tracker on SourceForge, so I guess it will be included in one of the later builds.

Some final words

While I am still far from really deeply understanding all the concepts and techniques in FindBugs, I found it relatively easy to get going, even though the primary source of "documentation" is the source code of existing detectors. I was also very happy to get quick, friendly and helpful responses on the mailing list. So I'd really like to encourage anyone who comes across a bug pattern that is not yet detected to give it a shot and try to implement a detector.

The future of FindBugs is open for ideas and discussions, Bill Pugh just yesterday started a FindBugs Google Group where you can take part in discussions and present any ideas you might have :)

2 comments:

Anonymous said...

Can u please develop a plug in or detector to improve xss detection in findbug ...because it can not detc xss error as fortify will do so can u plz implement it and send to me its my academic project.reply me as soon as possible in asishagarwalla@gmail.com

arvind said...

I am modifying one bug detector(SwitchFallthrough) of FindBugs 1.3.7.
In this bugdetector i am adding new bugpattern(SF_SWITCH_NO_DEFAULT)
which is in 1.3.8, i.e. identifying "missing default case" in a
switch.
I changed the SwitchFallthrough.java(added changes from 1.3.8), also
in findbugs.xml & messages.xml as needed.

But while i am deploying the updated findbugs.jar in plugin folder &
running findbugs, its giving me error (An internal error occurred
during: "Finding bugs in [project]...").

Can you tell me why it is giving this error?