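I'm struggling to get boosting in Elasticsearch to work the way I want.
Suppose I have an index of profiles containing gender, interests, and age. Suppose also that I consider a gender match the most relevant criterion, then interests, with the user's age the least important. I expected the query below to order the matching profiles according to that principle, but when I run it I first get the males, then Anna, the 50-year-old female, and only then Maria, the female who likes cars. Why doesn't Maria score higher than Anna?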
{
"query": {
"bool" : {
"should" : [
{ "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
{ "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
{ "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
],
"minimum_number_should_match" : 1
}
}
}
Any hints would be appreciated,
Stan
These are the curl commands I executed:
$curl -XPUT http://localhost:9200/users/profile/1 -d '{
"nickname" : "bob",
"gender" : "male",
"age" : 48,
"likes" : "airplanes"
}'
$curl -XPUT http://localhost:9200/users/profile/2 -d '{
"nickname" : "carlos",
"gender" : "male",
"age" : 24,
"likes" : "food"
}'
$curl -XPUT http://localhost:9200/users/profile/3 -d '{
"nickname" : "julio",
"gender" : "male",
"age" : 18,
"likes" : "ladies"
}'
$curl -XPUT http://localhost:9200/users/profile/4 -d '{
"nickname" : "maria",
"gender" : "female",
"age" : 25,
"likes" : "cars"
}'
$curl -XPUT http://localhost:9200/users/profile/5 -d '{
"nickname" : "anna",
"gender" : "female",
"age" : 50,
"likes" : "clothes"
}'
$curl -XGET http://localhost:9200/users/profile/_search -d '{
"query": {
"bool" : {
"should" : [
{ "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
{ "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
{ "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
],
"minimum_number_should_match" : 1
}
}
}'
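Boost values are not absolute: they are combined with other factors to determine the relevance of each term.
You have two "genders" (I assume), but many different "likes". So male is considered almost irrelevant, because it occurs so frequently in your data. However, cars probably occurs only a few times, so it is considered more relevant.
This logic is useful for full-text search, but not for enums, which are intended to be used as filters.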
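To make the frequency effect concrete, here is a minimal sketch in Python. It assumes the inverse document frequency formula used by Lucene's classic TF-IDF similarity, `1 + ln(numDocs / (docFreq + 1))`, and the five test profiles from the question; the function and variable names are mine, not part of Elasticsearch. The real `_score` also includes query normalization, the coordination factor, field norms, and per-shard statistics, so these numbers will not match the scores Elasticsearch reports. They only show the direction of the effect.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as in Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Document frequencies taken from the five example profiles:
# three profiles have gender "male", one profile likes "cars".
NUM_DOCS = 5
DOC_FREQ = {"gender:male": 3, "likes:cars": 1}

for term, doc_freq in DOC_FREQ.items():
    print(f"{term}: idf = {classic_idf(doc_freq, NUM_DOCS):.3f}")
```

The rarer term gets the larger weight, which works against the boost you gave the common one.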
Fortunately, you can disable this behaviour on a per-field basis, using omit_term_freq_and_positions and omit_norms.
Try setting up your mapping as follows:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"likes" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"omit_norms" : 1,
"type" : "string"
},
"gender" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"omit_norms" : 1,
"type" : "string"
},
"age" : {
"type" : "integer"
}
}
}
}
}
'
UPDATE: Full working example:
Delete the existing index:
curl -XDELETE 'http://127.0.0.1:9200/users/?pretty=1'
Create the index with the new mapping:
curl -XPUT 'http://127.0.0.1:9200/users/?pretty=1' -d '
{
"mappings" : {
"profile" : {
"properties" : {
"likes" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"type" : "string",
"omit_norms" : 1
},
"age" : {
"type" : "integer"
},
"gender" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"type" : "string",
"omit_norms" : 1
}
}
}
}
}
'
Index the test docs:
curl -XPOST 'http://127.0.0.1:9200/users/profile/_bulk?pretty=1' -d '
{"index" : {"_id" : 1}}
{"nickname" : "bob", "likes" : "airplanes", "age" : 48, "gender" : "male"}
{"index" : {"_id" : 2}}
{"nickname" : "carlos", "likes" : "food", "age" : 24, "gender" : "male"}
{"index" : {"_id" : 3}}
{"nickname" : "julio", "likes" : "ladies", "age" : 18, "gender" : "male"}
{"index" : {"_id" : 4}}
{"nickname" : "maria", "likes" : "cars", "age" : 25, "gender" : "female"}
{"index" : {"_id" : 5}}
{"nickname" : "anna", "likes" : "clothes", "age" : 50, "gender" : "female"}
'
Refresh the index (to make sure the latest docs are visible to search):
curl -XPOST 'http://127.0.0.1:9200/users/_refresh?pretty=1'
Search:
curl -XGET 'http://127.0.0.1:9200/users/profile/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"minimum_number_should_match" : 1,
"should" : [
{
"term" : {
"gender" : {
"boost" : 10,
"term" : "male"
}
}
},
{
"term" : {
"likes" : {
"boost" : 5,
"term" : "cars"
}
}
},
{
"range" : {
"age" : {
"boost" : 1,
"from" : 50
}
}
}
]
}
}
}
'
Result:
# {
# "hits" : {
# "hits" : [
# {
# "_source" : {
# "nickname" : "bob",
# "likes" : "airplanes",
# "age" : 48,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "1",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "carlos",
# "likes" : "food",
# "age" : 24,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "2",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "julio",
# "likes" : "ladies",
# "age" : 18,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "3",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "anna",
# "likes" : "clothes",
# "age" : 50,
# "gender" : "female"
# },
# "_score" : 0.029695695,
# "_index" : "users",
# "_id" : "5",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "maria",
# "likes" : "cars",
# "age" : 25,
# "gender" : "female"
# },
# "_score" : 0.015511602,
# "_index" : "users",
# "_id" : "4",
# "_type" : "profile"
# }
# ],
# "max_score" : 0.053500723,
# "total" : 5
# },
# "timed_out" : false,
# "_shards" : {
# "failed" : 0,
# "successful" : 5,
# "total" : 5
# },
# "took" : 4
# }
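UPDATE: Alternative approach
Here I present an alternative query which, while more verbose, gives you more predictable results. It uses the custom filters score query. First, we restrict the documents to those which match at least one of the conditions. Because we use a constant score query, every document starts with a score of 1.
The custom filters score query then lets us boost each document for every filter it matches: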
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"custom_filters_score" : {
"query" : {
"constant_score" : {
"filter" : {
"or" : [
{
"term" : {
"gender" : "male"
}
},
{
"term" : {
"likes" : "cars"
}
},
{
"range" : {
"age" : {
"gte" : 50
}
}
}
]
}
}
},
"score_mode" : "total",
"filters" : [
{
"boost" : "10",
"filter" : {
"term" : {
"gender" : "male"
}
}
},
{
"boost" : "5",
"filter" : {
"term" : {
"likes" : "cars"
}
}
},
{
"boost" : "1",
"filter" : {
"range" : {
"age" : {
"gte" : 50
}
}
}
}
]
}
}
}
'
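You'll see that the scores associated with each doc are nice round numbers, which can easily be traced back to the clauses that matched: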
# [Fri Jun 8 21:30:24 2012] Response:
# {
# "hits" : {
# "hits" : [
# {
# "_source" : {
# "nickname" : "bob",
# "likes" : "airplanes",
# "age" : 48,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "1",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "carlos",
# "likes" : "food",
# "age" : 24,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "2",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "julio",
# "likes" : "ladies",
# "age" : 18,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "3",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "maria",
# "likes" : "cars",
# "age" : 25,
# "gender" : "female"
# },
# "_score" : 5,
# "_index" : "users",
# "_id" : "4",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "anna",
# "likes" : "clothes",
# "age" : 50,
# "gender" : "female"
# },
# "_score" : 1,
# "_index" : "users",
# "_id" : "5",
# "_type" : "profile"
# }
# ],
# "max_score" : 10,
# "total" : 5
# },
# "timed_out" : false,
# "_shards" : {
# "failed" : 0,
# "successful" : 20,
# "total" : 20
# },
# "took" : 6
# }
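The scores in that response can be checked by hand. The following is a rough Python sketch of the arithmetic, not Elasticsearch code: it assumes a base score of 1 from the constant score query and sums the boosts of the matching filters, as "score_mode": "total" does. The names `BOOSTED_FILTERS` and `total_score` are made up for this illustration.

```python
# The five test profiles from the example index.
PROFILES = [
    {"nickname": "bob", "likes": "airplanes", "age": 48, "gender": "male"},
    {"nickname": "carlos", "likes": "food", "age": 24, "gender": "male"},
    {"nickname": "julio", "likes": "ladies", "age": 18, "gender": "male"},
    {"nickname": "maria", "likes": "cars", "age": 25, "gender": "female"},
    {"nickname": "anna", "likes": "clothes", "age": 50, "gender": "female"},
]

# (boost, predicate) pairs mirroring the three filters in the query.
BOOSTED_FILTERS = [
    (10, lambda p: p["gender"] == "male"),
    (5, lambda p: p["likes"] == "cars"),
    (1, lambda p: p["age"] >= 50),
]


def total_score(profile: dict) -> int:
    """Sum the boosts of every filter the profile matches."""
    return sum(boost for boost, matches in BOOSTED_FILTERS if matches(profile))


# Keep only profiles matching at least one filter, highest score first.
ranked = sorted(
    ((total_score(p), p["nickname"]) for p in PROFILES if total_score(p) > 0),
    key=lambda pair: -pair[0],
)

for score, nickname in ranked:
    print(f"{nickname}: {score}")
```

A profile that matched more than one filter would get the sum of the boosts, for example 15 for a male who likes cars.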
