Lucene query parser to use filters for wildcard queries


The question was published by the Tutorial Guruji team.

My problem is how to parse wildcard queries with Lucene so that the query term is passed through a TokenFilter.

I’m using a custom Analyzer with several filters (e.g. ASCIIFoldingFilter, but that’s only an example). My problem is that whenever Lucene’s QueryParser detects that one of the sub-queries is a WildcardQuery, it ignores the Analyzer by design [1].

This means that a query for über is filtered correctly,

über -> uber

but a query for über* (with a wildcard) is not passed through a filter at all:

über* -> über*

Since all tokens are filtered on the index side, the index only ever contains the folded form (uber), so a wildcard query containing ü can never match.
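To make the mismatch concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class `FoldingMismatch` and its methods are hypothetical names invented for this illustration, and `fold` is only a rough stand-in for LowerCaseFilter plus ASCIIFoldingFilter (it strips combining marks, whereas the real filter covers many more mappings). A trailing-`*` wildcard is approximated by a simple prefix check.

```java
import java.text.Normalizer;
import java.util.Locale;

final class FoldingMismatch {

    // Assumption: approximates lowercasing + ASCII folding by decomposing
    // the string (NFD) and dropping the combining marks.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Stand-in for a trailing-* wildcard query: strip the '*' and
    // test whether the indexed term starts with the remaining prefix.
    static boolean prefixMatches(String indexedTerm, String wildcardTerm) {
        String prefix = wildcardTerm.endsWith("*")
                ? wildcardTerm.substring(0, wildcardTerm.length() - 1)
                : wildcardTerm;
        return indexedTerm.startsWith(prefix);
    }
}
```

With the indexed term folded and the wildcard term left as typed, `prefixMatches(fold("überall"), "über*")` is false. Folding the query term as well, `prefixMatches(fold("überall"), fold("über*"))`, gives a match, and the `*` survives the folding step untouched.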

Q: How do I force Lucene to filter the query term for wildcard queries, too? I’m looking for a way that would at least marginally re-use Lucene’s codebase 😉

Note: As an input I receive a query string, so building queries programmatically is not an option.

Note: I’m using Lucene 4.5.1.

[1] http://www.gossamer-threads.com/lists/lucene/java-user/14224

Context:

// analyzer applies filters in Analyzer#createComponents (String, Reader)
Analyzer analyzer = new CustomAnalyzer (Version.LUCENE_45); 

// I'm using org.apache.lucene.queryparser.classic.MultiFieldQueryParser
QueryParser parser = new MultiFieldQueryParser (Version.LUCENE_45, fields, analyzer);
parser.setAllowLeadingWildcard (true);
parser.setMultiTermRewriteMethod (MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

// actual parsing of the input query
Query query = parser.parse (input);

Answer

Ok, I found a solution: I’m extending QueryParser to override #getWildcardQuery (String, String). This way I can intercept and alter the term after a wildcard query is detected and before it is created:

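// Imports needed by the method below (Lucene 4.5.x).
// IOUtils.closeQuietly is presumably Apache Commons IO; LOGGER is presumably an SLF4J logger.
import java.io.IOException;
import java.io.StringReader;
import org.apache.commons.io.IOUtils;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

@Override
protected Query getWildcardQuery (String field, String termStr) throws ParseException
{
    String term = termStr;
    TokenStream stream = null;
    try
    {
        // we want only a single token and we don't want to lose special characters
        // (KeywordTokenizer emits the whole input as one token, so '*' and '?' survive)
        stream = new KeywordTokenizer (new StringReader (term));

        // same filters as applied on the index side
        stream = new LowerCaseFilter (Version.LUCENE_45, stream);
        stream = new ASCIIFoldingFilter (stream);

        CharTermAttribute charTermAttribute = stream.addAttribute (CharTermAttribute.class);

        // TokenStream contract: reset, incrementToken until false, end, close
        stream.reset ();
        while (stream.incrementToken ())
        {
            term = charTermAttribute.toString ();
        }
        stream.end ();
    }
    catch (IOException e)
    {
        // on failure, fall back to the unfiltered term
        LOGGER.debug ("Failed to filter search query token {}", term, e);
    }
    finally
    {
        IOUtils.closeQuietly (stream);
    }
    // build the wildcard query from the filtered term
    return super.getWildcardQuery (field, term);
}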

This solution is based on similar questions:

Using a Combination of Wildcards and Stemming

How to get a Token from a Lucene TokenStream?

Note: in my code it’s actually a bit more convoluted, in order to keep all filters in a single location…
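The author's actual arrangement is not shown, so the following is only one possible way to read "a single location", sketched in plain Java without Lucene. The class `SharedTermFilters` and its members are hypothetical, and the two steps are simplified stand-ins for LowerCaseFilter and ASCIIFoldingFilter. The point is that the list of normalisation steps is defined once and applied to both indexed terms and wildcard query terms, so the two sides cannot drift apart.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class SharedTermFilters {

    // Single definition of the per-term steps (assumed stand-ins, not Lucene filters):
    // 1. lowercase, 2. strip combining marks after NFD decomposition.
    static final List<UnaryOperator<String>> CHAIN = Arrays.<UnaryOperator<String>>asList(
            s -> s.toLowerCase(Locale.ROOT),
            s -> Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")
    );

    // Runs a term through every step in order; used for index terms and query terms alike.
    static String apply(String term) {
        String result = term;
        for (UnaryOperator<String> step : CHAIN) {
            result = step.apply(result);
        }
        return result;
    }
}
```

In a real Lucene setup the equivalent would be a helper that wraps a given TokenStream in the filter chain and is called from both the Analyzer and the overridden getWildcardQuery; that wiring is left out here because it depends on the custom Analyzer, which the question does not show.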

I still feel that there should be a better solution, though.
