Load-testing Elasticsearch's built-in vector search

Setup

Elasticsearch 7.14.1 with a 16 GB JVM.

Creating the mapping

{
    "feature":{
        "path_match":"feature",
        "mapping":{
            "type":"dense_vector",
            "dims":256
        }
    }
}
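
The fragment above has the shape of a single entry in a `dynamic_templates` list. As a minimal sketch, assuming that is how it was used here, the complete index-creation body could be assembled like this. The helper name `build_index_body` is hypothetical, and the post does not show the actual create-index request.

```python
import json


def build_index_body(dims=256):
    """Wrap the dynamic-template fragment in a full index-creation body.

    Assumption: the fragment is one entry of `mappings.dynamic_templates`,
    which maps any field whose path matches "feature" to a dense_vector.
    """
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    "feature": {
                        "path_match": "feature",
                        "mapping": {"type": "dense_vector", "dims": dims},
                    }
                }
            ]
        }
    }


body = build_index_body()
# This JSON would be the body of a `PUT <index name>` request.
print(json.dumps(body, indent=2))
```

With this template in place, the first document that carries a `feature` field gets that field mapped as a 256-dimensional `dense_vector`.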

Query

GET features_self_1w/_search
{
  "size": 1,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "doc['feature'].size() ==0 ? 0 : cosineSimilarity(params.query_vector, 'feature') + 1.0",
        "params": {
          "query_vector": [
            -0.291748046875,
            0.096435546875,
            -0.11865234375,
            0.107177734375,
            -0.117919921875,
            ...
            ...
            ...
            -0.105712890625,
            -0.01171875
          ]
        }
      }
    }
  }
}
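
To make clear what this query asks Elasticsearch to do, here is a rough pure-Python sketch of the same scoring logic: every document matched by `match_all` is scored, and the best one is returned. The `+ 1.0` is there because `script_score` does not allow negative scores, and cosine similarity ranges from -1 to 1. The function names and the toy data are made up for illustration; this is not how Elasticsearch implements it internally.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def script_score(query_vector, doc_vector):
    """Mirror of the script: score 0 if the field is missing, else cosine + 1.0."""
    if not doc_vector:
        return 0.0
    return cosine_similarity(query_vector, doc_vector) + 1.0


def brute_force_top1(query_vector, docs):
    """Score every document (as match_all does) and return the best (id, score)."""
    scored = ((doc_id, script_score(query_vector, vec)) for doc_id, vec in docs.items())
    return max(scored, key=lambda pair: pair[1])


# Toy 2-dimensional data; the real index uses 256 dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": None}
print(brute_force_top1([1.0, 0.0], docs))
```

Since every document is visited on every request, the cost of one query is proportional to the number of vectors in the index.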

Load test results

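features_exact_1w (about 10,000 vectors)

10 concurrent clients, 10 seconds
[screenshot: 2021-11-12T09:11:39.png]

100 concurrent clients, 10 seconds
[screenshot: 2021-11-12T09:11:46.png]

features_self_10w (about 100,000 vectors)

10 concurrent clients, 10 seconds
[screenshot: 2021-11-12T09:11:53.png]

100 concurrent clients, 10 seconds
[screenshot: 2021-11-12T09:12:01.png]

features_self_50w (about 500,000 vectors)

10 concurrent clients, 10 seconds
[screenshot: 2021-11-12T09:12:06.png]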
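
The post does not say which tool produced these runs. For anyone who wants to repeat the "N concurrent clients for T seconds" setup, a minimal sketch with Python threads might look like the following. `run_load_test`, `summarize`, and `fake_search` are hypothetical names, and the request function is a stub that would need to be replaced with a real call to the `_search` endpoint.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, concurrency, duration_s):
    """Run `concurrency` workers that call send_request() in a loop for
    duration_s seconds; return all per-request latencies in seconds."""
    deadline = time.monotonic() + duration_s

    def worker():
        latencies = []
        while time.monotonic() < deadline:
            start = time.monotonic()
            send_request()
            latencies.append(time.monotonic() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        per_worker = [f.result() for f in futures]
    return [lat for worker_lats in per_worker for lat in worker_lats]


def summarize(latencies):
    """Return request count, mean latency and max latency (milliseconds)."""
    return {
        "requests": len(latencies),
        "mean_ms": statistics.mean(latencies) * 1000.0,
        "max_ms": max(latencies) * 1000.0,
    }


def fake_search():
    """Stub standing in for one HTTP search request (assumption)."""
    time.sleep(0.01)


if __name__ == "__main__":
    # The runs in this post would correspond to concurrency=10 or 100, duration_s=10.
    print(summarize(run_load_test(fake_search, concurrency=10, duration_s=1)))
```

Threads are enough here because each worker spends nearly all of its time waiting on the network rather than computing.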

Conclusions

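  1. With 10 concurrent clients, latency is 70 to 100 ms at around 10,000 vectors, 700 to 1000 ms at around 100,000 vectors, and about 3 s at around 500,000 vectors.
  2. For workloads with a small feature set and low concurrency, this is usable as long as roughly 100 ms of latency is acceptable. The computation is brute force, so latency grows roughly linearly with the amount of data.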
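
The linear-growth claim can be checked with a little arithmetic on the numbers above. The sketch below extrapolates from the 10,000-vector measurement; the function names are made up for illustration, and the figures are latencies under 10 concurrent clients, not pure per-document compute time.

```python
def per_doc_cost_us(latency_ms, num_docs):
    """Amortized latency per scored document, in microseconds."""
    return latency_ms * 1000.0 / num_docs


def predict_latency_ms(base_latency_ms, base_docs, target_docs):
    """Linear extrapolation: latency scales with the number of documents."""
    return base_latency_ms * target_docs / base_docs


# Measured at ~10,000 vectors: 70 to 100 ms.
print(per_doc_cost_us(70, 10_000), per_doc_cost_us(100, 10_000))  # 7.0 10.0

# Extrapolated to ~100,000 vectors; the measured range was 700 to 1000 ms.
print(predict_latency_ms(70, 10_000, 100_000), predict_latency_ms(100, 10_000, 100_000))  # 700.0 1000.0

# Extrapolated to ~500,000 vectors; the measured value was about 3 s.
print(predict_latency_ms(70, 10_000, 500_000), predict_latency_ms(100, 10_000, 500_000))  # 3500.0 5000.0
```

The 100,000-vector result matches the extrapolation exactly, while the 500,000-vector result of about 3 s comes in a little under the extrapolated 3.5 to 5 s, so "linear" is a good approximation rather than an exact law.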