Finding a node in an org.w3c.dom Document via XPath takes forever and returns null

My XpathUtility class has the following method:

public Node findElementByXpath(Document doc, String axpath) throws Exception {
    XPath xPath = XPathFactory.newInstance().newXPath();
    Node node = (Node) xPath.evaluate(axpath, doc, XPathConstants.NODE);
    return node;
}

In my main method I load an org.w3c.dom Document and try to locate an element via XPath:

XpathUtility xu = new XpathUtility();
Node foundElement = xu.findElementByXpath(domdoc, "/html[1]/body[1]/div[32]/a[1]");

I have checked manually in Firebug that the element exists at that XPath.

When this code runs, it hangs and becomes unresponsive for about 30 seconds, and then throws a NullPointerException because foundElement is null.

Answer

An XHTML document is an XML document with a DTD reference, which the XML parser downloads and evaluates in order to properly build the XML infoset. Its elements are also bound to the XHTML namespace.

So, it appears that you have two problems:

  1. The XHTML DTD is taking a really long time to download from the W3C website.

    The W3C addresses this in its own FAQ:

    "The W3C servers are slow to return DTDs. Is the delay intentional?

    Yes. Due to various software systems downloading DTDs from our site millions of times a day (despite the caching directives of our servers), we have started to serve DTDs from our site with an artificial delay. Our goals in doing so are to bring more attention to our ongoing issues with excessive DTD traffic, and to protect the stability and response time of the rest of our site."

    You can overcome this by using a local entity resolver that loads a local copy of the DTD, rather than reaching out to the W3C website on every request.

  2. The elements in the document are bound to the XHTML namespace, but your XPath uses unprefixed names, which only match elements in no namespace.

    There are several things that you can do to ensure that your XPath matches what you want:

    • Register the XHTML namespace with your XPath engine and adjust your XPath expressions to use the registered XHTML namespace prefix.
    • Use an XPath expression that matches on both the XHTML namespace URI and the local name inside a predicate filter, e.g.
      /*[local-name()='html' and namespace-uri()='http://www.w3.org/1999/xhtml'][1]/*[local-name()='body' and namespace-uri()='http://www.w3.org/1999/xhtml'][1]/*[local-name()='div' and namespace-uri()='http://www.w3.org/1999/xhtml'][32]/*[local-name()='a' and namespace-uri()='http://www.w3.org/1999/xhtml'][1]
    • Use an XPath expression that matches on the local name only, for a more generic match that ignores the namespace, e.g.
      /*[local-name()='html'][1]/*[local-name()='body'][1]/*[local-name()='div'][32]/*[local-name()='a'][1]
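Both fixes can be sketched together in one small class. This is a minimal illustration, not a drop-in replacement for XpathUtility: the class name XhtmlXPathSketch, the prefix "x", the sample document, and the one-line stand-in DTD are all assumptions made for the example. In real code the resolver would return an InputSource that reads a saved copy of the XHTML DTD from disk or the classpath. The sketch also assumes the document is parsed with a namespace-aware factory; without that, namespace-based matching has nothing to work with.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XhtmlXPathSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Assumption: a tiny stand-in for a locally stored copy of the XHTML DTD.
    // It only declares the nbsp entity used by the sample document below.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Hypothetical sample document with an XHTML doctype and namespace.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
        + "<body><div><a href=\"#\">first&nbsp;link</a></div></body></html>";

    // Maps the (arbitrarily chosen) prefix "x" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("x").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for namespace-based XPath matching
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Fix 1: answer DTD requests locally instead of fetching them from w3.org.
        // Returning null for anything else leaves the parser's default behaviour in place.
        builder.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith(".dtd")
                ? new InputSource(new StringReader(LOCAL_DTD))
                : null);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    static Node find(Document doc, String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Fix 2: let the XPath engine resolve the "x" prefix to the XHTML namespace.
        xPath.setNamespaceContext(XHTML_CONTEXT);
        return (Node) xPath.evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Unprefixed names only match elements in no namespace.
        System.out.println("unprefixed: " + find(doc, "/html[1]/body[1]/div[1]/a[1]"));
        // Prefixed names resolve through XHTML_CONTEXT.
        System.out.println("prefixed:   " + find(doc, "/x:html[1]/x:body[1]/x:div[1]/x:a[1]"));
        // local-name() ignores the namespace altogether.
        System.out.println("local-name: " + find(doc,
            "/*[local-name()='html'][1]/*[local-name()='body'][1]"
            + "/*[local-name()='div'][1]/*[local-name()='a'][1]"));
    }
}
```

The prefix used in the XPath does not have to match any prefix in the document; only the namespace URI it maps to matters. Because find can still return null when nothing matches, callers should check the result before dereferencing it.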
