Tuesday, December 2, 2014

XPath evaluation performance tweaks

I have recently been playing with XPath evaluation in Java. I must admit that the default configuration of the XPath processor in the JDK performs really poorly out of the box. I was able to achieve significant performance gains with just a couple of simple tricks.
The simplest implementation of XPath evaluation looks something like this:
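import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

public class XPathEvaluator {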
    
    public static Object query(String xPathExpression, Node document, QName resultType)
    {
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression expression = xpath.compile(xPathExpression);
            return expression.evaluate(document, resultType);
        }
        catch (XPathExpressionException e)
        {
            throw new IllegalStateException("Error while executing XPath evaluation!", e);
        }
    }
}
I executed that code against some XML documents and it took 13168 ms to complete. Let's try to improve that time a little bit. According to XALANJ-2540 (linked in the code comment below), the default way of looking up the DTMManager is really inefficient. Let's modify our code to set it explicitly:
public class XPathEvaluator {
    private static final String DTM_MANAGER_NAME = "com.sun.org.apache.xml.internal.dtm.DTMManager";
    private static final String DTM_MANAGER_VALUE = "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";
    static
    {
        // performance improvement: https://issues.apache.org/jira/browse/XALANJ-2540
        System.setProperty(DTM_MANAGER_NAME, DTM_MANAGER_VALUE);
    }
    
    public static Object query(String xPathExpression, Node document, QName resultType)
    {
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression expression = xpath.compile(xPathExpression);
            return expression.evaluate(document, resultType);
        }
        catch (XPathExpressionException e)
        {
            throw new IllegalStateException("Error while executing XPath evaluation!", e);
        }
    }
}
The result of XPath evaluation against the same set of XML documents is promising: it took 5949 ms. Not bad for such a small change. Let's go further. What about reusing the XPathFactory rather than creating a new one for each request? It should do the trick, but we need to keep in mind that XPathFactory is not thread safe, so it cannot be used by multiple threads concurrently. On the other hand, our XPathEvaluator is intended to be thread safe. We can deal with that using ThreadLocal:
public class XPathEvaluator {
    private static final String DTM_MANAGER_NAME = "com.sun.org.apache.xml.internal.dtm.DTMManager";
    private static final String DTM_MANAGER_VALUE = "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";
    static
    {
        // performance improvement: https://issues.apache.org/jira/browse/XALANJ-2540
        System.setProperty(DTM_MANAGER_NAME, DTM_MANAGER_VALUE);
    }
    private static final ThreadLocal<XPathFactory> XPATH_FACTORY = new ThreadLocal<XPathFactory>()
    {
        @Override
        protected XPathFactory initialValue()
        {
            return XPathFactory.newInstance();
        }
    }; 
    
    public static Object query(String xPathExpression, Node document, QName resultType)
    {
        try
        {
            XPath xpath = XPATH_FACTORY.get().newXPath();
            XPathExpression expression = xpath.compile(xPathExpression);
            return expression.evaluate(document, resultType);
        }
        catch (XPathExpressionException e)
        {
            throw new IllegalStateException("Error while executing XPath evaluation!", e);
        }
    }
} 
Thanks to that, the processing time dropped to 1948 ms, down from the initial 13168 ms, with just a couple of simple tricks. Not bad. You can go even further and try to cache compiled XPathExpressions. Have fun!
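For the curious, here is a minimal sketch of what caching compiled expressions could look like. It was not part of the measurements above, so I make no claim about how much time it saves. The class name CachingXPathEvaluator and the per-thread HashMap are my own assumptions for illustration. The cache is kept per thread because XPathExpression, like XPathFactory, is not thread safe. The DTMManager system property from the earlier listings is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical variant of XPathEvaluator that reuses compiled expressions.
final class CachingXPathEvaluator {

    // One factory per thread, as in the previous listing.
    private static final ThreadLocal<XPathFactory> XPATH_FACTORY = new ThreadLocal<XPathFactory>()
    {
        @Override
        protected XPathFactory initialValue()
        {
            return XPathFactory.newInstance();
        }
    };

    // One cache per thread: compiled expressions must not be shared between threads.
    // Assumption: the set of distinct expressions is small, so the map is unbounded.
    private static final ThreadLocal<Map<String, XPathExpression>> CACHE =
            new ThreadLocal<Map<String, XPathExpression>>()
    {
        @Override
        protected Map<String, XPathExpression> initialValue()
        {
            return new HashMap<String, XPathExpression>();
        }
    };

    public static Object query(String xPathExpression, Node document, QName resultType)
    {
        try
        {
            Map<String, XPathExpression> cache = CACHE.get();
            XPathExpression expression = cache.get(xPathExpression);
            if (expression == null)
            {
                // Compile only on the first use of this expression in this thread.
                expression = XPATH_FACTORY.get().newXPath().compile(xPathExpression);
                cache.put(xPathExpression, expression);
            }
            return expression.evaluate(document, resultType);
        }
        catch (XPathExpressionException e)
        {
            throw new IllegalStateException("Error while executing XPath evaluation!", e);
        }
    }
}
```

If the expressions are generated dynamically and their number can grow without limit, a bounded cache (for example an LRU map) would be a safer choice than the plain HashMap used here.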
