silikonsexy.blogg.se - Apache Lucene scoring


Now you are ready to explore how synonym query expansion works and how boosts are applied.

Query Time

At query time, the weight you configured for a synonym is used to build a boost query that wraps the synonym. At scoring time, that weight is a multiplicative factor applied to the score produced by the synonym match.

Given a pair (query1, document1), where document1 is a search result of query1, consider just the tigre^0.9 component of Score(document1):

2.5526304 = weight(title:tigre in 14), result of:

Apache Solr currently supports three different Synonym Query Styles:

as_same_term (default): blends the terms' document frequencies, i.e., SynonymQuery(tshirt,tee), where each term is treated as equally important regardless of its rarity in the corpus (blended document frequency).
pick_best: selects the most significant synonym (the one with the rarest document frequency) when scoring, i.e., Dismax(tee,tshirt) with a tie factor of 0.
as_distinct_terms: biases scoring towards documents that contain more of the synonyms, i.e., (pants OR slacks).
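To make the scoring behaviour concrete, here is a minimal Python sketch of how a synonym weight acts as a multiplicative factor, and how two of the query styles combine per-synonym scores. It is not Lucene or Solr code: the function names and the raw scores are hypothetical, and as_same_term is left out because its blended document frequency cannot be reproduced from per-term scores alone.

```python
from typing import Dict


def boosted_scores(raw_scores: Dict[str, float],
                   boosts: Dict[str, float]) -> Dict[str, float]:
    """Multiply each synonym's raw match score by its configured weight.

    Assumption: a synonym without a configured weight gets a neutral boost of 1.0.
    """
    return {term: score * boosts.get(term, 1.0)
            for term, score in raw_scores.items()}


def pick_best(scores: Dict[str, float]) -> float:
    """Dismax with a tie factor of 0: only the best-scoring synonym counts."""
    return max(scores.values(), default=0.0)


def as_distinct_terms(scores: Dict[str, float]) -> float:
    """Boolean OR of the synonyms: every matching synonym adds to the score."""
    return sum(scores.values())


# Hypothetical raw scores for one document matching both synonyms.
raw = {"tee": 2.0, "tshirt": 3.0}
# Hypothetical weight: "tee" is configured as tee|0.5, "tshirt" has no weight.
scores = boosted_scores(raw, {"tee": 0.5})

best = pick_best(scores)
combined = as_distinct_terms(scores)
```

With as_distinct_terms a document matching several synonyms scores higher than one matching a single synonym, whereas pick_best ignores every synonym except the strongest match.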

By default, ‘|’ is used as the separator for the weights; if you prefer a different character, the filter accepts a “delimiter” parameter.

Defining a fieldType in schema.xml that applies the delimitedBoost filter after synonyms are expanded at query time.

This new feature will be available with Apache Lucene/Solr 8.5.

The changes happened mostly on the Lucene side:
– a new token filter that extracts the weight and stores it as a token boost attribute
– query building that checks for boost attributes and, when present, uses them to build boosted queries

This makes the contribution usable by both Apache Solr and Elasticsearch (coming soon). On the Solr side, the change affected the Solr base query parser, to make it compatible with the synonym query style approach.

Apache Solr Configuration

Enabling query-time weighted synonyms requires two configurations:

Defining the synonyms with their associated weights in the synonyms.txt file, following the syntax of the delimitedBoost token filter (you can use the managed REST API to do that if you prefer):

Panthera blytheae, oldest|0.5 ancient|0.9 panthera
Panthera onca => jaguar|0.95, big cat|0.85, black panther|0.65
Snow leopard, panthera uncia|0.9, big cat|0.8, white_leopard|0.6
Leopard, big cat|0.8, bagheera|0.9, panthera pardus|0.85
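As an illustration of the term|weight syntax used in synonyms.txt, the following Python sketch splits one rule into (synonym, weight) pairs. It only mimics the idea of extracting a weight from a delimited token and is not the Lucene filter itself. The function name is hypothetical, the default weight of 1.0 for unweighted entries is an assumption, and the sketch handles only a single weight at the end of each comma-separated entry.

```python
from typing import List, Tuple


def parse_weighted_synonyms(rule: str,
                            delimiter: str = "|") -> List[Tuple[str, float]]:
    """Split one synonyms.txt rule into (synonym, weight) pairs.

    For an explicit mapping ("a => b, c") only the right-hand side is parsed;
    for an equivalence rule ("a, b, c") every entry is parsed.
    """
    _, arrow, right_hand_side = rule.partition("=>")
    entries = right_hand_side if arrow else rule

    parsed = []
    for entry in entries.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last delimiter so the text before it is the synonym.
        term, found, weight = entry.rpartition(delimiter)
        if found:
            parsed.append((term.strip(), float(weight)))
        else:
            # Assumption: an entry without a weight gets a neutral boost of 1.0.
            parsed.append((entry, 1.0))
    return parsed


mapping = parse_weighted_synonyms(
    "Panthera onca => jaguar|0.95, big cat|0.85, black panther|0.65")
equivalents = parse_weighted_synonyms(
    "Leopard, big cat|0.8, bagheera|0.9, panthera pardus|0.85")
custom = parse_weighted_synonyms("jaguar#0.95, big cat#0.85", delimiter="#")
```

Passing a different delimiter argument mirrors the role of the filter's “delimiter” parameter: only the separator character changes, not the parsing logic.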


This contribution aims to help users who deal with complex synonym dictionaries, where it’s important to associate a numerical weight with each synonym, for example to boost the ones that are more important in the domain or closer to the original concept. The contribution is detailed in the official Jira issues, and the code review and merge process has been tracked in the Pull Request.

This blog post is about our latest contribution to the Apache Lucene/Solr project: introducing the ability to assign different weights to synonyms.