当前位置 : 主页 > 操作系统 > centos >

openGauss索引优化以及虚拟索引

来源:互联网 收集:自由互联 发布时间:2023-08-21
一、 索引推荐 1、测试数据导入 gsql -d database_test -p 26000 -U joe -W Mysql@123456 -rCREATE TABLE tab_ysl_1 (col1 int, col2 int, col3 text);INSERT INTO tab_ysl_1 VALUES(generate_series(1, 3000),generate_series(1, 3000),repeat(

一、索引推荐

1、测试数据导入

gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
CREATE TABLE tab_ysl_1 (col1 int, col2 int, col3 text);
INSERT INTO tab_ysl_1 VALUES(generate_series(1, 3000),generate_series(1, 3000),repeat( chr(int4(random()*26)+65),4));
ANALYZE tab_ysl_1;
CREATE TABLE tab_ysl_2 (col1 int, col2 int);
INSERT INTO tab_ysl_2 VALUES(generate_series(1, 1000),generate_series(1, 1000));
ANALYZE tab_ysl_2;

2、gs_index_advise函数

1.测试where
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 = 10');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col1   |
(1 row)

2.测试join
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col1   |
 joe    | tab_ysl_2 |        |
(2 rows)

3.测试多表
SELECT  * FROM gs_index_advise('SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 = 
tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col2   |
 joe    | tab_ysl_2 | col1   |
(2 rows)

4.测试order by
SELECT  * FROM gs_index_advise('SELECT *, col2 FROM tab_ysl_1 ORDER BY 1, 3');
 schema |   table   |  column   | indextype
--------+-----------+-----------+-----------
 joe    | tab_ysl_1 | col1,col3 |
(1 row)
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 > 10 ORDER BY 1,col2');
 schema |   table   |  column   | indextype
--------+-----------+-----------+-----------
 joe    | tab_ysl_1 | col1,col2 |
(1 row)

5.测试过长字符串
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 where col3 in (''aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa'',''bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'',''cccccccccccccccccccccccccccccccccccccc
c'',''ddddddddddddddddddddddddddddddddddddddd'',''ffffffffffffffffffffffffffffffffffffffff'',''gggggggggggggggggggg
ggggggggggggggggggggggggggggggg'',''ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt'',''vvv
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv'',''ggmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
mmm'')');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col3   |
(1 row)

3、Workload级别索引推荐

这种方式可以针对多条SQL,可以将待优化的SQL写到文件里,通过脚本获得推荐索引。
脚本目录在安装目录的bin/dbmind/index_advisor下边,我的目录为
/opt/gaussdb/app/bin/dbmind/index_advisor/index_advisor_workload.py
将待优化的SQL放到文件里
[omm@node1 index_advisor]$ cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 = tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1;
SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1;

使用如下方式调用脚本,可以批量获取推荐索引,26000为我的数据库端口,ysla为我的数据库名,1.sql为我待优化的SQL存放的文件
[omm@node1 index_advisor]$ pwd
/opt/gaussdb/app/bin/dbmind/index_advisor
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
table: tab_ysl_1 columns: col2
table: tab_ysl_2 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);

4、索引效率查看

1、未优化的情况下
cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
 col1 | col2 | col3
------+------+------
   10 |   10 | HHHH
(1 row)
total time: 1  ms
real    0m0.020s
user    0m0.007s
sys     0m0.001s

2、分析并创建索引
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);
通过Index-advisor获取推荐索引。并创建索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r -c "create index ind0 on tab_ysl_1(col1);"

[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
 col1 | col2 | col3
------+------+------
   10 |   10 | HHHH
(1 row)
total time: 0  ms
real    0m0.019s
user    0m0.006s
sys     0m0.002s
#短了一丢丢,可能是数据量太少了,不明显

5、总结

1、未添加索引的查询效率
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"
real 0m4.916s
user 0m0.014s
sys  0m0.001s

2、使用索引推荐函数gs_index_advise获取优化建议
select * from gs_index_advise('select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;')
table    |  column
lineitem |  (1_orderkey)

3、创建索引
create index idx1 on lineitem(1_orderkey);

4、查看优化结果
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"

单索引推荐:适用于表中数据量大的情况,数据量过小不会进行推荐。
(1)当where中的查询条件只有一个的时候,推荐单一索引,如只有id在where中,只推荐id为索引;当where中的查询条件有多个的时候推荐多重索
引,如id,name在where中被当做条件,则一起被推荐为联合索引,但是如果同时存在id、person_id则默认推荐id( 目前不知道原因)
(2)当query中除了where这个语句时,还存在order by 和 group by等条件时,将where、order by、group by中的属性全部作为联合索引进行
推荐。
(3)使用like模糊查询时或精准查询都不对该属性进行索引建立,并且=属性一定给予索引推荐建立。
(4)当query中的条件过多时存在的属性也超过三个时,依旧推荐联合属性是在三个以上,会不会导致推荐索引过多从而性能下降,这个有待商榷,最
好推荐索引中的属性在三个以内最好。不过这个可能需要通过DRL来学习,判断究竟选择一条query中的哪几个属性来建立索引。

二、虚拟索引

一般在加索引时,会堵塞DML(不过PG支持并发加索引,不堵塞DML) 。只有索引真正能起到优化作用,我们建立索引才是有意义的。虚拟索引是一个很有用的东西,没有副作用,只是虚拟的索引,建立虚拟索引后,可以通过EXPLAIN来查看加索引后的成本估算,判断是否加索引COST会降低。

可以用虚拟索引检验索引的效果,根据效果可选择是否创建真实的索引优化查询。

1、清理之前的索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
\d+ tab_ysl_1
drop index ind0;
\d+ tab_ysl_1
                        Table "joe.tab_ysl_1"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              |
 col2   | integer |           | plain    |              |
 col3   | text    |           | extended |              |
Has OIDs: no
Options: orientation=row, compression=no

2、#测试建立虚拟索引(hypopg_create_index)
SELECT * FROM hypopg_create_index('CREATE INDEX ON tab_ysl_1(col1)');
 indexrelid |          indexname
------------+-----------------------------
      54410 | <54410>btree_tab_ysl_1_col1
(1 row)

3、显示所有创建的虚拟索引信息(enable_hypo_index)
select * from hypopg_display_index();
          indexname          | indexrelid |   table   | column
-----------------------------+------------+-----------+--------
 <54410>btree_tab_ysl_1_col1 |      54410 | tab_ysl_1 | (col1)
(1 row)

4、执行分析
set enable_hypo_index = on;explain SELECT * FROM tab_ysl_1 WHERE col1 = 100;
SET
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Index Scan using <54410>btree_tab_ysl_1_col1 on tab_ysl_1  (cost=0.00..8.27 rows=1 width=13)
   Index Cond: (col1 = 100)
(2 rows)

5、测试删除指定虚拟索引(hypopg_display_index)
使用函数hypopg_drop_index删除指定oid的虚拟索引
 select * from hypopg_drop_index(54410);
 hypopg_drop_index
-------------------
 t
(1 row)

6、再次查看虚拟索引
database_test=> select * from hypopg_display_index();
 indexname | indexrelid | table | column
-----------+------------+-------+--------
(0 rows)

7、#使用函数hypopg_reset_index一次性清除所有创建的虚拟索引
SELECT * FROM hypopg_reset_index();
 hypopg_reset_index
--------------------
上一篇:Nginx upstream采集
下一篇:没有了
网友评论