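I. MySQL LOAD: Background
During routine database operations we often need to process text data and import it into the database. This article walks through common import and export scenarios with worked examples.
Note: the demo environment reports mysql Ver 14.14 Distrib 5.7.32, for linux-glibc2.12 (x86_64) using EditLine wrapper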
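II. MySQL LOAD: Basic Parameters
All later examples use sample data in csv format exported by the command below (a comma , as the field separator and a double quote " as the enclosing character).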
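As a rough guide to how the format described above maps onto statement clauses, the sketch below annotates each one. The file path and table name are placeholders rather than names from this article, and the secure_file_priv check is included on the assumption that the server restricts file import and export to a single directory.

```sql
-- Placeholder path and table name, for illustration only.
-- Show the directory (if any) that the server restricts INFILE/OUTFILE to.
SHOW VARIABLES LIKE 'secure_file_priv';

LOAD DATA INFILE '/path/to/sample.csv'
    INTO TABLE example_db.example_table
    CHARACTER SET utf8mb4        -- encoding used to interpret the file
    FIELDS TERMINATED BY ','     -- separator between fields
           ENCLOSED BY '"'       -- character wrapped around each value
    LINES TERMINATED BY '\n';    -- separator between rows
```

With the default escape character, an unquoted \N in the file is read as SQL NULL, while "" is read as an empty string.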
The test table structure is as follows:
Create Table: CREATE TABLE `t_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)

-- Basic export parameters
localhost "mgr01" 10:52:02 test01>select * into outfile '/data/mysql/tmp/b_menu.txt' character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' from test01.b_menu limit 10;
Query OK, 10 rows affected (0.00 sec)

[root@test ~]# cat /data/mysql/tmp/b_menu.txt
"1","核心数据指标","30","2","/index",\N,"1","2019-06-19 19:58:10","2019-10-31 20:27:37","1"
"2","易机数据","29","2","/auction-dashboard",\N,"1","2019-06-19 19:58:24","2019-10-24 20:21:36","1"
"3","产品滞留数据","31","2","/product-dashboard",\N,"1","2019-06-19 19:58:42","2019-10-24 20:21:36","1"
"4","发货数据","42","3","/product-data",\N,"1","2019-08-29 17:44:35","2019-11-18 17:22:29","1"
"6","退租数据","14","2","/tuizushuju","","3","2019-09-25 19:05:47","2019-11-18 17:23:40","1"
"7","呆滞数据","14","2","/daizhishuju","","2","2019-09-25 19:12:29","2019-11-18 17:23:40","1"
"10","发货数据明细","14","2","/shujumingxi","","4","2019-09-25 19:15:37","2019-11-18 17:23:40","1"
"12","增率统计","32","3","/branch-dashboard",\N,"1","2019-09-26 21:23:16","2020-01-15 21:03:38","1"
"13","增率详细","32","3","/customer-dashboard",\N,"2","2019-09-26 21:23:46","2020-01-15 21:03:38","1"
"14","产品部数据","0","1","/svn7kezaqe9","","5","2019-09-29 21:58:09","2020-07-28 21:18:50","0"

-- Basic import parameters
load data infile '/data/mysql/tmp/b_menu.txt' replace into table `menu.tmp` character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n';

Create the temporary test table menu.tmp:
CREATE TABLE `menu.tmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

localhost "mgr01" 10:59:07 test01>load data infile '/data/mysql/tmp/b_menu.txt' replace into table `menu.tmp` character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n';
Query OK, 10 rows affected (0.03 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:00:28 test01>select * from `menu.tmp`;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name | parent_id | level | url | icon | order | create_time | update_time | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| 1 | 核心数据指标 | 30 | 2 | /index | NULL | 1 | 2019-06-19 19:58:10 | 2019-10-31 20:27:37 | 1 |
| 2 | 易机数据 | 29 | 2 | /auction-dashboard | NULL | 1 | 2019-06-19 19:58:24 | 2019-10-24 20:21:36 | 1 |
| 3 | 产品滞留数据 | 31 | 2 | /product-dashboard | NULL | 1 | 2019-06-19 19:58:42 | 2019-10-24 20:21:36 | 1 |
| 4 | 发货数据 | 42 | 3 | /product-data | NULL | 1 | 2019-08-29 17:44:35 | 2019-11-18 17:22:29 | 1 |
| 6 | 退租数据 | 14 | 2 | /tuizushuju | | 3 | 2019-09-25 19:05:47 | 2019-11-18 17:23:40 | 1 |
| 7 | 呆滞数据 | 14 | 2 | /daizhishuju | | 2 | 2019-09-25 19:12:29 | 2019-11-18 17:23:40 | 1 |
| 10 | 发货数据明细 | 14 | 2 | /shujumingxi | | 4 | 2019-09-25 19:15:37 | 2019-11-18 17:23:40 | 1 |
| 12 | 增率统计 | 32 | 3 | /branch-dashboard | NULL | 1 | 2019-09-26 21:23:16 | 2020-01-15 21:03:38 | 1 |
| 13 | 增率详细 | 32 | 3 | /customer-dashboard | NULL | 2 | 2019-09-26 21:23:46 | 2020-01-15 21:03:38 | 1 |
| 14 | 产品部数据 | 0 | 1 | /svn7kezaqe9 | | 5 | 2019-09-29 21:58:09 | 2020-07-28 21:18:50 | 0 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)

III. LOAD Scenario Examples
Scenario 1. The LOAD file has more fields than the table
Only part of the data in the text file needs to be imported into the table.
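One way to express this, sketched under the assumption that the target table has only id, name, level and url (like the menu_tmp01 table built later in this scenario): name the wanted columns directly in the field list and send the unwanted file fields to throwaway user variables. The @skip_* variable names are made up for illustration.

```sql
-- Fields are consumed in file order; a field read into a user variable
-- that is never assigned to a column is simply discarded.
LOAD DATA INFILE '/data/mysql/tmp/b_menu.txt'
    REPLACE INTO TABLE test01.menu_tmp01
    CHARACTER SET utf8mb4
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (id, name, @skip_parent_id, level, url,
     @skip_icon, @skip_order, @skip_create_time,
     @skip_update_time, @skip_menu_type);
```

This is equivalent in effect to reading every field into a variable and mapping the wanted ones in a SET clause, which is the form this article uses.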
Temporarily create a table with 4 fields (id, name, level, url):
localhost "mgr01" 11:09:48 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
localhost "mgr01" 11:00:38 test01>create table `menu.tmp01` select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.

The cause is that GTID is enabled on this MySQL instance:
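MySQL normally supports the create table XXX as select * from XXX; syntax for creating a table. On this MySQL 5.7.x instance, however, GTID is enabled and GTID consistency is enforced, so the statement fails with ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT. Official documentation: https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-restrictions.html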
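If turning GTID off is not an option, a common workaround is to split the statement in two: create the table explicitly, then fill it with INSERT ... SELECT. The sketch below assumes the four-column layout wanted in this scenario; the column definitions are written by hand here and are an assumption, not output copied from the server.

```sql
-- Step 1: plain DDL, allowed under GTID consistency.
CREATE TABLE menu_tmp01 (
    id    int(11) NOT NULL DEFAULT '0',
    name  varchar(255) DEFAULT NULL,
    level int(11) DEFAULT '1',
    url   varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Step 2: plain DML, copying the rows from the source table.
INSERT INTO menu_tmp01 (id, name, level, url)
SELECT id, name, level, url FROM `menu.tmp`;
```

The rest of this section takes the other route and turns GTID off, which is reasonable on a test instance but deserves more care on a replicated production server.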
There are two ways to turn off GTID. The first is to edit MySQL's my.cnf configuration file and restart the MySQL service:
gtid_mode = off
enforce_gtid_consistency = 0
The second is to change the parameters online, one step at a time:
Errors returned when first attempting the online change:
localhost "mgr01" 11:15:36 test01>SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = off;
ERROR 1779 (HY000): GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON.
localhost "mgr01" 11:16:49 test01> set global GTID_MODE = off;
ERROR 1788 (HY000): The value of @@GLOBAL.GTID_MODE can only be changed one step at a time: OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON. Also note that this value must be stepped up or down simultaneously on all servers. See the Manual for instructions.

The message says that to go from ON to OFF you must first set GTID_MODE=ON_PERMISSIVE, then GTID_MODE=OFF_PERMISSIVE, and only then GTID_MODE=OFF. To go from OFF to ON, apply the same steps in reverse order.
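For the reverse direction mentioned above (turning GTID back on after the test), a minimal sketch might look like the following. It assumes a single standalone server; in a replication topology each step would need to be applied on every server before moving to the next.

```sql
-- Consistency enforcement has to be ON before GTID_MODE can reach ON.
SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;

-- Step up one level at a time.
SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

-- Wait until this status value is 0 before the final step.
SHOW STATUS LIKE 'ONGOING_ANONYMOUS_TRANSACTION_COUNT';

SET @@GLOBAL.GTID_MODE = ON;
```

Values set this way do not survive a restart in MySQL 5.7, so my.cnf would need to agree with the final state.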
Continue with the settings:
localhost "mgr01" 11:25:51 test01>set @@GLOBAL.GTID_MODE=ON_PERMISSIVE;
Query OK, 0 rows affected (0.03 sec)
localhost "mgr01" 11:25:52 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.01 sec)

If set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE; fails, the error is usually the following:
mysql> set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
ERROR 1766 (HY000): The system variable gtid_mode cannot be set when there is an ongoing transaction.

This error means the variable cannot be changed while a transaction is still open in the session, so issue a COMMIT first:
localhost "mgr01" 11:26:00 test01>commit;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:27:48 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:28:01 test01>set @@GLOBAL.GTID_MODE=OFF;
Query OK, 0 rows affected (0.02 sec)
localhost "mgr01" 11:28:19 test01> show variables like 'GTID_MODE';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode | OFF |
+---------------+-------+
1 row in set (0.00 sec)

Then run SET GLOBAL ENFORCE_GTID_CONSISTENCY = off and check the result:
localhost "mgr01" 11:29:03 test01>show variables like 'ENFORCE_GTID_CONSISTENCY';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| enforce_gtid_consistency | OFF |
+--------------------------+-------+

GTID has now been turned off online. Run the two statements again:
create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
create table menu_tmp02 select id,name,level,url from `menu.tmp`;
localhost "mgr01" 11:29:17 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0

localhost "mgr01" 11:30:10 test01>desc menu_tmp01;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | int(11) | NO | | 0 | |
| name | varchar(255) | YES | | NULL | |
| level | int(11) | YES | | 1 | |
| url | varchar(255) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

localhost "mgr01" 11:30:20 test01>create table menu_tmp02 select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0

localhost "mgr01" 11:30:45 test01>desc menu_tmp02;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | int(11) | NO | | 0 | |
| name | varchar(255) | YES | | NULL | |
| level | int(11) | YES | | 1 | |
| url | varchar(255) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

Returning to Scenario 1 (the LOAD file has more fields than the table), the following demonstrates importing only part of the text file's data into the table.
-- Import statement
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10)
set id=@C1,
    name=@C2,
    level=@C4,
    url=@C5;

Import the data:
localhost "mgr01" 11:46:19 test01>load data infile '/data/mysql/tmp/b_menu.txt'
    -> replace into table test01.menu_tmp01
    -> character set utf8mb4
    -> fields terminated by ','
    -> enclosed by '"'
    -> lines terminated by '\n'
    -> (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- corresponds to the 10 columns of data in the b_menu.txt file
    -> -- only the 4 specified columns of the exported data are matched to table fields; the order in which the mappings are listed does not affect the import result
    -> set id=@C1,
    -> name=@C2,
    -> level=@C4,
    -> url=@C5;
Query OK, 10 rows affected (0.01 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:46:26 test01>select * from menu_tmp01;
+----+--------------------+-------+---------------------+
| id | name | level | url |
+----+--------------------+-------+---------------------+
| 1 | 核心数据指标 | 2 | /index |
| 2 | 易机数据 | 2 | /auction-dashboard |
| 3 | 产品滞留数据 | 2 | /product-dashboard |
| 4 | 发货数据 | 3 | /product-data |
| 6 | 退租数据 | 2 | /tuizushuju |
| 7 | 呆滞数据 | 2 | /daizhishuju |
| 10 | 发货数据明细 | 2 | /shujumingxi |
| 12 | 增率统计 | 3 | /branch-dashboard |
| 13 | 增率详细 | 3 | /customer-dashboard |
| 14 | 产品部数据 | 1 | /svn7kezaqe9 |
+----+--------------------+-------+---------------------+
10 rows in set (0.00 sec)

Scenario 2. The LOAD file has fewer fields than the table
Note: the table contains every field present in the text file, plus additional fields of its own.
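A minimal sketch of this case, assuming a file that holds only id, name, url and create_time in that order (such as the c_menu.txt exported in this scenario) and the a_menu table used in this scenario: list the target columns directly, and every column left out of the list falls back to its default value.

```sql
-- Each file field is assigned to the column in the same position of the list.
-- Columns not listed (parent_id, level, icon, order, update_time, menu_type)
-- receive their column defaults.
LOAD DATA INFILE '/data/mysql/tmp/c_menu.txt'
    REPLACE INTO TABLE test01.a_menu
    CHARACTER SET utf8mb4
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (id, name, url, create_time);
```

The walkthrough in this article reaches the same result with user variables and a SET clause, which is the more flexible form once values need to be transformed.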
Export selected fields of the MySQL table test01.b_menu to a text file:
select id,name,url,create_time into outfile '/data/mysql/tmp/c_menu.txt' character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' from test01.b_menu limit 10;

[root@test tmp]# cat /data/mysql/tmp/c_menu.txt
"1","核心数据指标","/index","2019-06-19 19:58:10"
"2","易机数据","/auction-dashboard","2019-06-19 19:58:24"
"3","产品滞留数据","/product-dashboard","2019-06-19 19:58:42"
"4","发货数据","/product-data","2019-08-29 17:44:35"
"6","退租数据","/tuizushuju","2019-09-25 19:05:47"
"7","呆滞数据","/daizhishuju","2019-09-25 19:12:29"
"10","发货数据明细","/shujumingxi","2019-09-25 19:15:37"
"12","增率统计","/branch-dashboard","2019-09-26 21:23:16"
"13","增率详细","/customer-dashboard","2019-09-26 21:23:46"
"14","产品部数据","/svn7kezaqe9","2019-09-29 21:58:09"

Create the test table a_menu:
CREATE TABLE `a_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- 10 variables, matching the 10 columns of test01.a_menu; each row of c_menu.txt supplies only 4 fields, so only @C1 to @C4 receive values from the file
-- only the 4 specified columns of the exported data are matched to table fields; the order in which the mappings are listed does not affect the import result. The extra fields of a_menu are not processed and take their defined default value or NULL
set id=@C1,
    name=@C2,
    url=@C3,
    create_time=@C4; -- @C1 @C2 @C3 @C4 in this set clause are the 4 column values of the export file /data/mysql/tmp/c_menu.txt, in file order

The SQL below is the correct way to run it:
localhost "mgr01" 12:50:02 (none)>load data infile '/data/mysql/tmp/c_menu.txt' replace into table test01.a_menu character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) set id=@C1, name=@C2, url=@C3, create_time=@C4;
Query OK, 10 rows affected (0.02 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 12:50:23 (none)>select * from test01.a_menu;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name | parent_id | level | url | icon | order | create_time | update_time | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| 1 | 核心数据指标 | 0 | 1 | /index | NULL | NULL | 2019-06-19 19:58:10 | 2021-03-27 12:50:23 | 2 |
| 2 | 易机数据 | 0 | 1 | /auction-dashboard | NULL | NULL | 2019-06-19 19:58:24 | 2021-03-27 12:50:23 | 2 |
| 3 | 产品滞留数据 | 0 | 1 | /product-dashboard | NULL | NULL | 2019-06-19 19:58:42 | 2021-03-27 12:50:23 | 2 |
| 4 | 发货数据 | 0 | 1 | /product-data | NULL | NULL | 2019-08-29 17:44:35 | 2021-03-27 12:50:23 | 2 |
| 6 | 退租数据 | 0 | 1 | /tuizushuju | NULL | NULL | 2019-09-25 19:05:47 | 2021-03-27 12:50:23 | 2 |
| 7 | 呆滞数据 | 0 | 1 | /daizhishuju | NULL | NULL | 2019-09-25 19:12:29 | 2021-03-27 12:50:23 | 2 |
| 10 | 发货数据明细 | 0 | 1 | /shujumingxi | NULL | NULL | 2019-09-25 19:15:37 | 2021-03-27 12:50:23 | 2 |
| 12 | 增率统计 | 0 | 1 | /branch-dashboard | NULL | NULL | 2019-09-26 21:23:16 | 2021-03-27 12:50:23 | 2 |
| 13 | 增率详细 | 0 | 1 | /customer-dashboard | NULL | NULL | 2019-09-26 21:23:46 | 2021-03-27 12:50:23 | 2 |
| 14 | 产品部数据 | 0 | 1 | /svn7kezaqe9 | NULL | NULL | 2019-09-29 21:58:09 | 2021-03-27 12:50:23 | 2 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)

Scenario 3. LOAD with generated custom field data:
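As the Scenario 2 check shows, table fields that are absent from the file are not processed during import and end up as NULL or their default value. The example below uses an emp table whose added fields fullname, modify_date and delete_flag are in that position. To populate such fields, LOAD can assign values through functions supported by MySQL or through fixed values. Fields that do exist in the file can be passed through functions as well, so export plus import gives a simple ETL capability, as shown below: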
-- Import statement
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6) -- corresponds to the 6 columns of data in the employees.txt file
-- The part below explicitly maps table fields to fields in the data file; values that do not exist in the file are generated by functions (or can be set to fixed values)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3), -- convert the imported value to upper case
    last_name=lower(@C4), -- convert the imported value to lower case
    fullname=concat(first_name,' ',last_name), -- concatenate first_name and last_name
    gender=@C5,
    hire_date=@C6,
    modify_date=now(), -- generate the current timestamp
    delete_flag=if(hire_date<'1988-01-01','Y','N'); -- conditional value computed from another column

Scenario 4. LOAD fixed-length data
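A minimal sketch of one way to load fixed-length rows: read each whole line into a single user variable and cut it up by position. The file path, table name and column widths below are hypothetical (characters 1 to 10 hold emp_no, 11 to 20 hire_date, 21 to 40 first_name), and the sketch assumes the rows contain no tab characters, since tab is the default field terminator.

```sql
-- Hypothetical layout, for illustration only:
--   chars  1-10  emp_no
--   chars 11-20  hire_date (YYYY-MM-DD)
--   chars 21-40  first_name, padded with spaces
LOAD DATA INFILE '/path/to/fixed_width.txt'
    INTO TABLE example_db.emp_fixed
    CHARACTER SET utf8mb4
    LINES TERMINATED BY '\n'
    (@row)                                        -- the whole line lands in @row
    SET emp_no     = TRIM(SUBSTR(@row, 1, 10)),   -- start position, then length
        hire_date  = TRIM(SUBSTR(@row, 11, 10)),
        first_name = TRIM(SUBSTR(@row, 21, 20));
```

TRIM removes the padding spaces that fixed-length formats typically use to fill each field.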
Reference: https://mp.weixin.qq.com/s/WNXRshkvC3bFcc5NDaWlrw