作者
digoal
日期
2022-05-31
标签
PostgreSQL , 事务号
也许是最后一次讲这个问题, 因为PostgreSQL社区正在将xid从32位改成64位, 未来的PG版本不会因为某些原因未及时freeze导致事务号耗尽需要停库维护. 当然了, 事务号改成64位并不代表未来都不需要freeze了, commit log的推进依然可能需要freeze的动作. (否则需要保留所有commit log来判断事务状态, 所有事务状态都从commit log查询会导致物理IO增加可能导致性能变差).
事务号耗尽前会发生什么? 数据库会发出告警, 就好像我们身体出现问题时会发出告警一样, 例如胃不舒服会胀气、胃痛. - 1、剩余4000万可分配事务号时, 开始警告, 必须执行库级别vacuum freeze.- 2、剩余300万可分配事务号时, 报错, 必须停库, 重启进入单用户模式, 执行集群级别vacuum freeze.
相关代码:
src/backend/access/transam/varsup.c
/* * Allocate the next FullTransactionId for a new transaction or * subtransaction. * * The new XID is also stored into MyProc->xid/ProcGlobal->xids[] before * returning. * * Note: when this is called, we are actually already inside a valid * transaction, since XIDs are now not allocated until the transaction * does something. So it is safe to do a database lookup if we want to * issue a warning about XID wrap. */ FullTransactionId GetNewTransactionId(bool isSubXact) { ...... /*---------- * Check to see if it's safe to assign another XID. This protects against * catastrophic data loss due to XID wraparound. The basic rules are: * * If we're past xidVacLimit, start trying to force autovacuum cycles. * If we're past xidWarnLimit, start issuing warnings. * If we're past xidStopLimit, refuse to execute transactions, unless * we are running in single-user mode (which gives an escape hatch * to the DBA who somehow got past the earlier defenses). * * Note that this coding also appears in GetNewMultiXactId. *---------- */ ...... TransactionId xidWarnLimit = ShmemVariableCache->xidWarnLimit; TransactionId xidStopLimit = ShmemVariableCache->xidStopLimit; TransactionId xidWrapLimit = ShmemVariableCache->xidWrapLimit; Oid oldest_datoid = ShmemVariableCache->oldestXidDB; ...... if (IsUnderPostmaster /* complain even if that DB has disappeared */ if (oldest_datname) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"", oldest_datname), errhint("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u", oldest_datoid), errhint("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit)) { char *oldest_datname = get_database_name(oldest_datoid); /* complain even if that DB has disappeared */ if (oldest_datname) ereport(WARNING, (errmsg("database \"%s\" must be vacuumed within %u transactions", oldest_datname, xidWrapLimit - xid), errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(WARNING, (errmsg("database with OID %u must be vacuumed within %u transactions", oldest_datoid, xidWrapLimit - xid), errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); }
/* * VariableCache is a data structure in shared memory that is used to track * OID and XID assignment state. For largely historical reasons, there is * just one struct with different fields that are protected by different * LWLocks. * * Note: xidWrapLimit and oldestXidDB are not "active" values, but are * used just to generate useful messages when xidWarnLimit or xidStopLimit * are exceeded. */ typedef struct VariableCacheData { ... /* * These fields are protected by XidGenLock. */ FullTransactionId nextXid; /* next XID to assign */ TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */ TransactionId xidVacLimit; /* start forcing autovacuums here */ TransactionId xidWarnLimit; /* start complaining here */ TransactionId xidStopLimit; /* refuse to advance nextXid beyond here */ TransactionId xidWrapLimit; /* where the world ends */ Oid oldestXidDB; /* database with minimum datfrozenxid */ ... /* * The place where we actually get into deep trouble is halfway around * from the oldest potentially-existing XID. (This calculation is * probably off by one or two counts, because the special XIDs reduce the * size of the loop a little bit. But we throw in plenty of slop below, * so it doesn't matter.) */ xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit
2、事务号有什么用? - 结合commit log和事务快照, 判断tuple对其他活跃会话的可见性 - 事务分配的先后顺序
3、为什么要擦除(freeze)事务号? - 事务号只有32位(值的上限约40亿), 需要循环使用, 可以理解为需要降低已使用xid的水位. (解释: 把40亿xid分布在一个圆圈上, 以frozen xid为分界点劈成两半(一半即20亿), 以frozen xid为起点顺时针的一半圆内是可分配或已消耗事务号. 随着事务号的分配, 半圆被逐渐消耗, freeze的动作是针对已分配事务的, freeze的目的是擦除xid(使这些事务对应tuple对所有人可见), 从而在圆上移动frozen xid, 保持半圆内一直都有可分配事务号. ) - 被擦除(freeze)事务号的tuple, 对所有事务可见. 因此tuple的xid信息能不能擦出取决于它还有没有可能被用到(判断可见性)?
4、事务号耗尽原因: 4.1、 没有及时擦除(freeze)事务号- 高并发小事务, 消耗事务号快. 在非常极端的情况下(例如硬件性能差, 但是写入事务开启了异步模式. 使得写入可以非常快, 但是擦除(freeze)慢.), 擦除速度可能跟不上. - 建议选择低延迟高带宽的存储- 表太大, 设计不合理. 因为单表(如果是分区表, 指的是最小粒度的分区大小)太大, freeze暂时还不支持单表并行, 所以速度可能比较慢. - 建议采用分区表 - 参数设置不当, 垃圾回收进程睡眠过多, 垃圾回收进程数太少等, 导致freeze慢 - autovacuum_max_workers 太小 - autovacuum_vacuum_cost_delay 太大- 未开启自动回收(请注意这个参数可以全局设置, 也可以表级别设置.) - autovacuum=off - 此时freeze只会发生在表的年龄超过autovacuum_freeze_max_age时, 强制freeze, 但是这个时候可能年龄已经较大, 剩余可分配xid已经不多, 那么有更大的可能性导致在耗尽xid之前都无法完成freeze. - 硬件性能差(IO延迟高、吞吐低), 回收慢 - 建议选择低延迟高带宽的存储- 内存配置不合理, 大表索引多次扫描. 单表的垃圾比较多的情况下, 垃圾记录的行号过多打爆 autovacuum_work_mem, 使得索引需要多次扫描, 影响速度. - 《PostgreSQL 14 preview - 支持 vacuum 强制加速参数 vacuum_failsafe_age , vacuum_multixact_failsafe_age. (跳过vacuum index, 不受delay限制等) 防止 xid wraparound》 - 《PostgreSQL 垃圾回收参数优化之 - maintenance_work_mem , autovacuum_work_mem》 - 调大autovacuum_work_mem或者用分区表解决.
4.2、 存在无法擦除(freeze)的事务号主库 或 开启了hot_standby_feedback的只读从库 存在如下情况时, 可能影响freeze操作擦除某些事务号 : - 存在长事务 , 检查 pg_stat_activity : backend_xid , backend_xmin , xact_start , query_start - 存在未结束的2pc , 检查 pg_prepared_xacts : prepared , database - 存在长sql , 检查 pg_stat_activity : backend_xid , backend_xmin , xact_start , query_start
主库存在如下情况时, 可能影响freeze操作擦除某些事务号: - 未正常消费的 replication slot, 检查 pg_replication_slots. 由于逻辑订阅需要从wal中解析数据逻辑变更, 需要用到当时对应的catalog元数据(例如结构定义数据), 所以不消费或者消费慢的话, catalog的表需要保留比较旧的版本. 妨碍catalog的freeze. - 设置了延迟回收参数, vacuum_defer_cleanup_age. 强制延迟保留多少个垃圾版本.
假设你的数据库已经不允许执行事务, 报错了.
"database is not accepting commands to avoid wraparound data loss in database \"%s\"", oldest_datname), ("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
你必须进入单用户模式进行修复.
如果你想最快的速度恢复数据库的使用, 可以不要freeze整个集群. 进入单用户, 找出年龄最大的对象(表、物化视图, 包括系统表), 依次 freeze. 集群的年龄下降后即可开启, 然后在正常模式执行freeze.
排序方法:
先找最大年龄的库 age(pg_database.datfrozenxid), age(pg_database.datminmxid) 进入对应库, 找最大年龄的表或者物化视图 age(pg_class.relfrozenxid), age(pg_class.relminmxid) 然后 vacuum freeze 对应TABLE; 差不多了就可以检查数据库的剩余可分配事务是否大于300万;
单用户进入方法:- 《PostgreSQL single-user mode 单用户模式修复 database is not accepting commands to avoid wraparound data loss in database "...." 问题 - usage,like Repair Database》 - 《PostgreSQL 单用户模式修复 系统索引损坏 - single user mode, reindex system》
如果尝试了以上方法, 数据库年龄还是降不下来, 需要分析是不是因为集群中存在未结束的prepare事务或者僵尸slot导致的, 是的话要先要把prepare事务结束, 或把slot干掉(或者推进slot位移). 然后重复以上操作.
slot处理: pg_drop_replication_slot(name) 删除slot pg_replication_slot_advance(slot_name name, upto_lsn pg_lsn) 推进slot位移 2pc事务处理: ROLLBACK PREPARED transaction_id; COMMIT PREPARED transaction_id;
1、硬
提升硬件性能: - 更大内存; - 低RT, 高带宽的存储硬件; - 多CPU核数;
2、软
1、打开snapshot too old参数. old_snapshot_threshold, 根源上杜绝存在无法擦除(freeze)的事务号的问题, 但是有性能副作用, 有对长事务或者2pc事务的伤害. 对slot原因无效.
2、使用分区表. 缩短大表freeze时间.
3、分区, 并设置每个分区的autovacuum_freeze_max_age, 错开freeze风暴. - 《PostgreSQL的"天气预报" - 如何预测Freeze IO风暴》 - 《PostgreSQL freeze 风暴导致的IOPS飙升 - 事后追溯》 - 《PostgreSQL Freeze 风暴预测续 - 珍藏级SQL》
4、调大autovacuum_work_mem内存配置, 避免大表垃圾多时需要多次扫描索引.
5、调大autovacuum_max_workers. 但是也不能太多, 可能对正常业务有影响.
6、调高vacuum的工作频率, 减少休息, 涉及autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit等参数.
7、监控主库和从库的: 年龄、长SQL、长事务、2PC、僵尸SLOT, 尽早发现处理隐患.
8、采用批量提交, 减少事务号的消耗.