
Faceting – or drilldown – search using Solr

November 26th, 2010

Overview

Faceted searching, also called drilldown searching, refers to incrementally refining search results by a different criterion at each level. Popular e-shopping sites like Amazon and eBay provide this on their search pages.

Solr has excellent support for faceting. The sections below describe how to use faceting in Java applications, using the SolrJ client API.

 

Steps

Step 1: Do the first-level search and get the first-level facets

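// Imports needed by the snippets in this step
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.FacetField.Count;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// strQuery is the user's search string; solrServer is an already initialized SolrJ server object
SolrQuery qry = new SolrQuery(strQuery);
String[] fetchFacetFields = new String[]{"categories"};
qry.setFacet(true);                    // turn faceting on
qry.addFacetField(fetchFacetFields);   // request facet counts for these fields
qry.setIncludeScore(true);
qry.setShowDebugInfo(true);
QueryRequest qryReq = new QueryRequest(qry);

QueryResponse resp = qryReq.process(solrServer);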

// Print every hit along with all of its stored fields
SolrDocumentList results = resp.getResults();
int count = results.size();
System.out.println(count + " hits");
for (int i = 0; i < count; i++) {
    SolrDocument hitDoc = results.get(i);
    System.out.println("#" + (i+1) + ":" + hitDoc.getFieldValue("name"));
    for (Iterator<Entry<String, Object>> flditer = hitDoc.iterator(); flditer.hasNext();) {
        Entry<String, Object> entry = flditer.next();
        System.out.println(entry.getKey() + ": " + entry.getValue());
    }
}

// Print each facet instance with its count and the filter query that drills down into it
List<FacetField> facetFields = resp.getFacetFields();
for (int i = 0; i < facetFields.size(); i++) {
    FacetField facetField = facetFields.get(i);
    List<FacetField.Count> facetInfo = facetField.getValues();
    for (FacetField.Count facetInstance : facetInfo) {
        System.out.println(facetInstance.getName() + " : " + facetInstance.getCount() + " [drilldown qry:" + facetInstance.getAsFilterQuery() + "]");
    }
}

 

The response contains the number of hits for each instance of the facet.

For example, if the field categories has the values movies and songs in the set of matching hits, then each of those values is called a facet instance.

Each facet instance of a FacetField has a name (“songs”), an associated facet instance count, and a filter query.

A facet instance count of 10 for “categories:songs” means that, in the set of all search results, 10 results have songs as the value of categories.

The facet instance filter query is the subquery used to go down to the next level of the drilldown search, by filtering on the facet instance value.

At this point in a typical drilldown search user interface, the left sidebar with all the filters would display those facet instances that have a nonzero instance count, with checkboxes and their respective counts. The user can then select the most promising facet to drill down along and check its checkbox...

 

Step 2: Add the facet filter query for the next level of refined results

Add the filter query of the facet instance to the main query, using addFilterQuery.

The filter query for a single facet instance has the format "<field>:<value>". Example: addFilterQuery("categories:movies");

// filterQueries is a String[] of facet filter queries obtained via getAsFilterQuery() from the previous search
SolrQuery qry = new SolrQuery(strQuery);
if (filterQueries != null) {
    for (String fq : filterQueries) {
        qry.addFilterQuery(fq);
    }
}
qry.setFacet(true);
qry.addFacetField(fetchFacetFields);
qry.setIncludeScore(true);
qry.setShowDebugInfo(true);
QueryRequest qryReq = new QueryRequest(qry);
QueryResponse resp = qryReq.process(solrServer);

For subsequent levels of refinement, add the facet instance filter queries to the current level’s main query, and add the list of facet fields required for the next level.

 

Facet filter query syntax

Facet filter queries have some rather intricate syntax for achieving various search behaviours, as described below.

 

Selecting multiple facets

In some drilldown search designs, a user is allowed to specify multiple facet instances for the same field. For example, a categories field may have multiple category facet instances. In such cases, the facet instances should be combined using an OR operator.

Categories [ ]

  Movies (300) [ ]

  Songs (400) [ ]

  Ads (150) [ ]

 

If the user selects “Movies” and “Songs”, the filter query should have the semantics of an OR operator:

“..where category=movies OR category=songs”.

This can be specified in Solr filter queries by enclosing the facet instances inside parentheses (this relies on OR being the default query operator, which it is unless configured otherwise):

<fqfield>:(value1 value2 value3…)

Examples:

In a URL:

fq=categories%3A%28songs+movies%29

where %3A is the character ‘:’, %28 is the character ‘(’ and %29 is the character ‘)’

OR, equivalently

In Java:

qry.addFilterQuery("categories:(songs movies)");
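
The correspondence between the URL form and the Java form can be checked with the JDK's own form encoder. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes Java 10 or later for the URLEncoder overload that takes a Charset. SolrJ does this encoding itself when it sends a request, so it matters only when building URLs by hand.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SolrJ.
final class FilterQueryEncoding {

    // Form-encodes a raw filter query so it can be used as the value of fq= in a URL.
    static String encodeFq(String rawFilterQuery) {
        return URLEncoder.encode(rawFilterQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ':' becomes %3A, '(' becomes %28, ')' becomes %29, and a space becomes '+'
        System.out.println("fq=" + encodeFq("categories:(songs movies)"));
        // prints fq=categories%3A%28songs+movies%29
    }
}
```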

Whitespace in facet instances

If facet instances contain whitespace, then each instance should be enclosed in double quotes (%22) when multiple facet instances are specified.

For example, for a facet field "crn" with facet instances “M.Tech. Computer Sc. & Engg.” and “ELECTRICAL ENGINEERING” (note the whitespace), the syntax is:

In URLs:

fq=crn%3A%28%22M.Tech.+Computer+Sc.+%26+Engg.%22+%22ELECTRICAL+ENGINEERING%22%29

OR

In Java:

qry.addFilterQuery("crn:(\"M.Tech. Computer Sc. & Engg.\" \"ELECTRICAL ENGINEERING\")");
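
When the selected facet values come from checkboxes rather than being hard-coded, the filter query string has to be assembled at runtime. The following is a rough sketch of one way to do that in plain Java. The class and method names are hypothetical, and the rule it uses (quote any value that is not purely letters and digits, and backslash-escape embedded quotes and backslashes) is an assumption made for this example, not something SolrJ provides.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SolrJ.
final class FacetFilterBuilder {

    // Wraps a value in double quotes unless it is purely letters and digits;
    // embedded backslashes and double quotes are backslash-escaped.
    static String quoteIfNeeded(String value) {
        boolean plain = !value.isEmpty()
                && value.chars().allMatch(Character::isLetterOrDigit);
        if (plain) {
            return value;
        }
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Builds field:(v1 v2 ...), the multi-value form described above.
    static String orFilter(String field, List<String> values) {
        return values.stream()
                .map(FacetFilterBuilder::quoteIfNeeded)
                .collect(Collectors.joining(" ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(orFilter("categories", List.of("songs", "movies")));
        // prints categories:(songs movies)
        System.out.println(orFilter("crn",
                List.of("M.Tech. Computer Sc. & Engg.", "ELECTRICAL ENGINEERING")));
        // prints crn:("M.Tech. Computer Sc. & Engg." "ELECTRICAL ENGINEERING")
    }
}
```

The resulting string can be passed straight to addFilterQuery.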

 

 

Handling a large number of facet values using pagination

Solr provides pagination for facet values and, by default, limits the number of values returned for each facet field to 100. This limit can be set using the facet.limit query parameter or the setFacetLimit() API, and the facet value offset can be set using the facet.offset query parameter.

However, there is no direct API like setFacetOffset() in SolrJ. Instead, use

solrQry.add(FacetParams.FACET_OFFSET, "100");
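
To page through facet values, the offset is usually derived from a page number. The sketch below shows that arithmetic only, under the assumption that pages are zero-based and every page has the same size. The class and method names are hypothetical. The two parameter names are the Solr ones mentioned above.

```java
// Hypothetical helper, not part of SolrJ.
final class FacetPaging {

    // Offset of the first facet value on the given zero-based page.
    static int offsetForPage(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        return Math.multiplyExact(page, pageSize);
    }

    // Renders the two facet paging parameters as they would appear in a URL.
    static String facetPagingParams(int page, int pageSize) {
        return "facet.limit=" + pageSize
                + "&facet.offset=" + offsetForPage(page, pageSize);
    }

    public static void main(String[] args) {
        // Third page (index 2) with 50 facet values per page
        System.out.println(facetPagingParams(2, 50));
        // prints facet.limit=50&facet.offset=100
    }
}
```

In SolrJ, the computed offset would be passed as a string through the FacetParams.FACET_OFFSET parameter, and the page size through setFacetLimit().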

 

 

Facet Query vs Filter Query of facet

The Solr API also contains methods that refer to "facet queries". It’s important not to confuse facet queries with the filter queries of facets. At first glance, it looks like the facet query concept is what provides the drilldown capability, but that is not so.

A facet query is a kind of dynamic facet field, applicable only to use cases where it makes sense to categorize items into ranges, either numerical or date ranges.

For example, if items have to be categorized into price ranges like [$100-$200], [$200-$300] etc, then facet queries have to be used to “get the count of all items whose price is between $100 and $200”. Just specifying the price field as a facet field would not be useful here, because it just returns the list of all unique prices available in the search results. What really provides the drilldown capability in this case is the facet query concept.

Facet queries are specified using the syntax field:[start TO end]. In a URL, it should be in encoded form:

facet.query=age%3A%5B20+TO+22%5D

In the API, it’s specified as

solrQuery.addFacetQuery("age:[20 TO 22]");
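
For a series of ranges such as the price buckets mentioned earlier, the facet query strings can be generated in a loop. This is a small sketch with hypothetical class and method names, assuming integer bounds and a fixed step. Square brackets make both ends of a range inclusive, so a value exactly on a boundary (200 here) is counted in both neighbouring buckets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SolrJ.
final class RangeFacetQueries {

    // Builds one "field:[lower TO upper]" query per bucket between start and end.
    static List<String> rangeQueries(String field, int start, int end, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int lower = start; lower < end; lower += step) {
            int upper = Math.min(lower + step, end); // last bucket is clipped to end
            queries.add(field + ":[" + lower + " TO " + upper + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String fq : rangeQueries("price", 100, 300, 100)) {
            System.out.println(fq);
        }
        // prints price:[100 TO 200] and then price:[200 TO 300]
    }
}
```

Each generated string would then be passed to addFacetQuery, one call per bucket.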

 

Understanding facet counts

The facet counts are always computed in the context of the set of search results of the main query + filter queries.
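
The effect can be imitated without Solr at all. The sketch below is a toy in-memory model, not how Solr computes facets internally. The Item class, its language field, and the sample data are all invented for illustration. It counts category values once over every item and once over only the items that pass a filter, which is the role a filter query plays.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model only; every name and value here is made up for illustration.
final class FacetCountDemo {

    // Hypothetical stand-in for an indexed document with two fields.
    static final class Item {
        final String category;
        final String language;

        Item(String category, String language) {
            this.category = category;
            this.language = language;
        }
    }

    static List<Item> sampleItems() {
        return List.of(
                new Item("movies", "en"),
                new Item("movies", "fr"),
                new Item("songs", "en"),
                new Item("songs", "en"),
                new Item("ads", "fr"));
    }

    // Counts category values over only those items that pass the filter,
    // mimicking how facet counts are scoped to the current result set.
    static Map<String, Long> countByCategory(List<Item> items, Predicate<Item> filter) {
        Map<String, Long> counts = new TreeMap<>();
        for (Item item : items) {
            if (filter.test(item)) {
                counts.merge(item.category, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Item> items = sampleItems();
        System.out.println(countByCategory(items, item -> true));
        // prints {ads=1, movies=2, songs=2}
        System.out.println(countByCategory(items, item -> item.language.equals("en")));
        // prints {movies=1, songs=2}
    }
}
```

After the filter is applied, the ads category disappears and the movies count drops, because the counts describe only the filtered result set.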


