Several Ways to Use MySQL LOAD DATA

Source: Internet  Collected by: 自由互联  Published: 2022-06-20
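I. MySQL LOAD: Background

In day-to-day database operations we inevitably need to process text data and import it into a database. This article collects several common import and export scenarios and demonstrates each one with an example.

Note: the demo environment runs MySQL 5.7.32 (mysql Ver 14.14 Distrib 5.7.32, for linux-glibc2.12 (x86_64) using EditLine wrapper).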

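II. MySQL LOAD: Basic Parameters

All later examples use sample data in CSV format exported with the command below (fields are separated by a comma , and enclosed in double quotes ").

The test table has the following structure: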

Create Table: CREATE TABLE `t_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)

-- Basic import parameters

load data infile '/data/mysql/tmp/b_menu.txt'
replace into table `menu.tmp`
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

-- Basic export parameters

localhost "mgr01" 10:52:02 test01>select * into outfile '/data/mysql/tmp/b_menu.txt'
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
from test01.b_menu limit 10;
Query OK, 10 rows affected (0.00 sec)

[root@test ~]# cat /data/mysql/tmp/b_menu.txt
"1","核心数据指标","30","2","/index",\N,"1","2019-06-19 19:58:10","2019-10-31 20:27:37","1"
"2","易机数据","29","2","/auction-dashboard",\N,"1","2019-06-19 19:58:24","2019-10-24 20:21:36","1"
"3","产品滞留数据","31","2","/product-dashboard",\N,"1","2019-06-19 19:58:42","2019-10-24 20:21:36","1"
"4","发货数据","42","3","/product-data",\N,"1","2019-08-29 17:44:35","2019-11-18 17:22:29","1"
"6","退租数据","14","2","/tuizushuju","","3","2019-09-25 19:05:47","2019-11-18 17:23:40","1"
"7","呆滞数据","14","2","/daizhishuju","","2","2019-09-25 19:12:29","2019-11-18 17:23:40","1"
"10","发货数据明细","14","2","/shujumingxi","","4","2019-09-25 19:15:37","2019-11-18 17:23:40","1"
"12","增率统计","32","3","/branch-dashboard",\N,"1","2019-09-26 21:23:16","2020-01-15 21:03:38","1"
"13","增率详细","32","3","/customer-dashboard",\N,"2","2019-09-26 21:23:46","2020-01-15 21:03:38","1"
"14","产品部数据","0","1","/svn7kezaqe9","","5","2019-09-29 21:58:09","2020-07-28 21:18:50","0"

Create the temporary test table menu.tmp:

CREATE TABLE `menu.tmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

localhost "mgr01" 10:59:07 test01>load data infile '/data/mysql/tmp/b_menu.txt' replace into table `menu.tmp` character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n';
Query OK, 10 rows affected (0.03 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:00:28 test01>select * from `menu.tmp`;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name               | parent_id | level | url                 | icon | order | create_time         | update_time         | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
|  1 | 核心数据指标       |        30 |     2 | /index              | NULL |     1 | 2019-06-19 19:58:10 | 2019-10-31 20:27:37 |         1 |
|  2 | 易机数据           |        29 |     2 | /auction-dashboard  | NULL |     1 | 2019-06-19 19:58:24 | 2019-10-24 20:21:36 |         1 |
|  3 | 产品滞留数据       |        31 |     2 | /product-dashboard  | NULL |     1 | 2019-06-19 19:58:42 | 2019-10-24 20:21:36 |         1 |
|  4 | 发货数据           |        42 |     3 | /product-data       | NULL |     1 | 2019-08-29 17:44:35 | 2019-11-18 17:22:29 |         1 |
|  6 | 退租数据           |        14 |     2 | /tuizushuju         |      |     3 | 2019-09-25 19:05:47 | 2019-11-18 17:23:40 |         1 |
|  7 | 呆滞数据           |        14 |     2 | /daizhishuju        |      |     2 | 2019-09-25 19:12:29 | 2019-11-18 17:23:40 |         1 |
| 10 | 发货数据明细       |        14 |     2 | /shujumingxi        |      |     4 | 2019-09-25 19:15:37 | 2019-11-18 17:23:40 |         1 |
| 12 | 增率统计           |        32 |     3 | /branch-dashboard   | NULL |     1 | 2019-09-26 21:23:16 | 2020-01-15 21:03:38 |         1 |
| 13 | 增率详细           |        32 |     3 | /customer-dashboard | NULL |     2 | 2019-09-26 21:23:46 | 2020-01-15 21:03:38 |         1 |
| 14 | 产品部数据         |         0 |     1 | /svn7kezaqe9        |      |     5 | 2019-09-29 21:58:09 | 2020-07-28 21:18:50 |         0 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)
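The import above uses the REPLACE keyword, so an input row whose primary key or unique key matches an existing row overwrites that row. As a minimal sketch of the alternative, assuming the same file and table, IGNORE keeps the rows already in the table and discards the conflicting input rows, which should then show up in the Skipped counter of the result line:

```sql
-- Sketch: same import, but keep rows that are already in the table.
-- With IGNORE, input rows that collide on the primary key or on the
-- unique key (name, level, parent_id) are discarded instead of replacing.
load data infile '/data/mysql/tmp/b_menu.txt'
ignore into table `menu.tmp`
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';
```

REPLACE suits a full refresh from the file, while IGNORE suits topping up a table without touching rows that may have been edited since the last load.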

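III. LOAD Scenario Examples

Scenario 1. The LOAD file has more fields than the table

Only part of the data in the text file needs to be imported into the table.

Create a temporary table with four columns (id, name, level, url):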

localhost "mgr01" 11:09:48 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
localhost "mgr01" 11:00:38 test01>create table `menu.tmp01` select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.

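The error occurs because GTID is enabled on this MySQL instance:

CREATE TABLE ... SELECT (create table XXX as select * from XXX;) is normally supported, but this MySQL 5.7 instance runs with GTID enabled, and with enforce_gtid_consistency on the statement is rejected with ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT. Official documentation: https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-restrictions.html

There are two ways to turn off GTID. The first is to edit the MySQL configuration file my.cnf and restart the MySQL service:
gtid_mode = off
enforce_gtid_consistency = 0

The second way is to change the parameters online, one step at a time:

The errors reported when first attempting the online change: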

localhost "mgr01" 11:15:36 test01>SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = off;
ERROR 1779 (HY000): GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON.
localhost "mgr01" 11:16:49 test01> set global GTID_MODE = off;
ERROR 1788 (HY000): The value of @@GLOBAL.GTID_MODE can only be changed one step at a time: OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON. Also note that this value must be stepped up or down simultaneously on all servers. See the Manual for instructions.

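As the message indicates, to go from ON to OFF you must first set GTID_MODE=ON_PERMISSIVE, then GTID_MODE=OFF_PERMISSIVE, and finally GTID_MODE=OFF. To go from OFF to ON, apply the same steps in reverse order.

Continuing with the change: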

localhost "mgr01" 11:25:51 test01>set @@GLOBAL.GTID_MODE=ON_PERMISSIVE;
Query OK, 0 rows affected (0.03 sec)
localhost "mgr01" 11:25:52 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.01 sec)

If set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE; fails, the error is usually the following:

mysql> set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
ERROR 1766 (HY000): The system variable gtid_mode cannot be set when there is an ongoing transaction.

The error means the variable cannot be set while a transaction is in progress, so issue a COMMIT first:

localhost "mgr01" 11:26:00 test01>commit;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:27:48 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:28:01 test01>set @@GLOBAL.GTID_MODE=OFF;
Query OK, 0 rows affected (0.02 sec)
localhost "mgr01" 11:28:19 test01> show variables like 'GTID_MODE';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

Then run SET GLOBAL ENFORCE_GTID_CONSISTENCY = off; and check the result:

localhost "mgr01" 11:29:03 test01>show variables like 'ENFORCE_GTID_CONSISTENCY';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| enforce_gtid_consistency | OFF   |
+--------------------------+-------+

At this point GTID has been turned off online.

Run both forms again: create table menu_tmp01 as select id,name,level,url from `menu.tmp`; and create table menu_tmp02 select id,name,level,url from `menu.tmp`;

localhost "mgr01" 11:29:17 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0

localhost "mgr01" 11:30:10 test01>desc menu_tmp01;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | NO   |     | 0       |       |
| name  | varchar(255) | YES  |     | NULL    |       |
| level | int(11)      | YES  |     | 1       |       |
| url   | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

localhost "mgr01" 11:30:20 test01>create table menu_tmp02 select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0

localhost "mgr01" 11:30:45 test01>desc menu_tmp02;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | NO   |     | 0       |       |
| name  | varchar(255) | YES  |     | NULL    |       |
| level | int(11)      | YES  |     | 1       |       |
| url   | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
localhost "mgr01" 11:30:50 test01>
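Switching GTID off affects the replication setup of the whole instance, so where that is not acceptable the same four-column table can be built without CREATE TABLE ... SELECT. The following is a minimal sketch rather than part of the original walkthrough: the table name menu_tmp03 is hypothetical, and the column definitions are copied from the desc output above. It creates the table explicitly and then fills it with INSERT ... SELECT, which is not among the statements restricted by GTID consistency.

```sql
-- Sketch: build the four-column table in two separate statements.
-- menu_tmp03 is a hypothetical name; types follow `desc menu_tmp01`.
create table menu_tmp03 (
  `id` int(11) not null default 0,
  `name` varchar(255) collate utf8mb4_unicode_ci default null,
  `level` int(11) default 1,
  `url` varchar(255) collate utf8mb4_unicode_ci default null
) engine=InnoDB default charset=utf8mb4 collate=utf8mb4_unicode_ci;

-- Copy the four columns from the source table.
insert into menu_tmp03 (id, name, level, url)
select id, name, level, url from `menu.tmp`;
```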

Returning to Scenario 1 (the LOAD file has more fields than the table), the following demonstrates importing only part of the data in the text file into the table.

-- Import statement

load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10)
set id=@C1,
name=@C2,
level=@C4,
url=@C5;

Importing the data:

load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- corresponds to the 10 columns of data in b_menu.txt
-- Only the four chosen fields of the exported data are mapped to table columns; the order in which the mappings are written does not affect the result
set id=@C1,
name=@C2,
level=@C4,
url=@C5;

localhost "mgr01" 11:46:19 test01>load data infile '/data/mysql/tmp/b_menu.txt'
    -> replace into table test01.menu_tmp01
    -> character set utf8mb4
    -> fields terminated by ','
    -> enclosed by '"'
    -> lines terminated by '\n'
    -> (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- corresponds to the 10 columns of data in b_menu.txt
    -> -- Only the four chosen fields of the exported data are mapped to table columns; the order in which the mappings are written does not affect the result
    -> set id=@C1,
    -> name=@C2,
    -> level=@C4,
    -> url=@C5;
Query OK, 10 rows affected (0.01 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:46:26 test01>select * from menu_tmp01;
+----+--------------------+-------+---------------------+
| id | name               | level | url                 |
+----+--------------------+-------+---------------------+
|  1 | 核心数据指标       |     2 | /index              |
|  2 | 易机数据           |     2 | /auction-dashboard  |
|  3 | 产品滞留数据       |     2 | /product-dashboard  |
|  4 | 发货数据           |     3 | /product-data       |
|  6 | 退租数据           |     2 | /tuizushuju         |
|  7 | 呆滞数据           |     2 | /daizhishuju        |
| 10 | 发货数据明细       |     2 | /shujumingxi        |
| 12 | 增率统计           |     3 | /branch-dashboard   |
| 13 | 增率详细           |     3 | /customer-dashboard |
| 14 | 产品部数据         |     1 | /svn7kezaqe9        |
+----+--------------------+-------+---------------------+
10 rows in set (0.00 sec)
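The ten variables in the statement above exist only to consume the ten fields of each line. A slightly shorter sketch of the same import, assuming the table and file from this scenario, names the wanted columns directly in the list and sends every unwanted field to a throwaway variable (@dummy is an arbitrary name), so no SET clause is needed:

```sql
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
-- file order: id, name, parent_id, level, url, icon, order,
--             create_time, update_time, menu_type
(id, name, @dummy, level, url, @dummy, @dummy, @dummy, @dummy, @dummy);
```

Because menu_tmp01 has no primary or unique key (the Key column in the desc output is empty), REPLACE has nothing to match on and each run appends rows. Truncate the table first when repeating the test.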

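Scenario 2. The LOAD file has fewer fields than the table

Note: the table contains every field found in the text file plus additional columns.

Export some of the columns of the MySQL table test01.b_menu to a text file: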

select id,name,url,create_time into outfile '/data/mysql/tmp/c_menu.txt'
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
from test01.b_menu limit 10;

[root@test tmp]# cat /data/mysql/tmp/c_menu.txt
"1","核心数据指标","/index","2019-06-19 19:58:10"
"2","易机数据","/auction-dashboard","2019-06-19 19:58:24"
"3","产品滞留数据","/product-dashboard","2019-06-19 19:58:42"
"4","发货数据","/product-data","2019-08-29 17:44:35"
"6","退租数据","/tuizushuju","2019-09-25 19:05:47"
"7","呆滞数据","/daizhishuju","2019-09-25 19:12:29"
"10","发货数据明细","/shujumingxi","2019-09-25 19:15:37"
"12","增率统计","/branch-dashboard","2019-09-26 21:23:16"
"13","增率详细","/customer-dashboard","2019-09-26 21:23:46"
"14","产品部数据","/svn7kezaqe9","2019-09-29 21:58:09"

Create the test table a_menu:

CREATE TABLE `a_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- ten variables, one per column of test01.a_menu; only the first four receive values from the file
-- Only the four exported fields are mapped to table columns; the order in which the mappings are written does not affect the result. The extra columns of a_menu are not assigned and take their defined default or NULL
set id=@C1,
name=@C2,
url=@C3,
create_time=@C4;
-- @C1, @C2, @C3 and @C4 in the SET clause are the four fields of the export file /data/mysql/tmp/c_menu.txt, in file order.

Typed on a single line, the statement has to be written without the inline comments, as below:

localhost "mgr01" 12:50:02 (none)>load data infile '/data/mysql/tmp/c_menu.txt' replace into table test01.a_menu character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) set id=@C1, name=@C2, url=@C3, create_time=@C4;
Query OK, 10 rows affected (0.02 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 12:50:23 (none)>select * from test01.a_menu;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name               | parent_id | level | url                 | icon | order | create_time         | update_time         | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
|  1 | 核心数据指标       |         0 |     1 | /index              | NULL |  NULL | 2019-06-19 19:58:10 | 2021-03-27 12:50:23 |         2 |
|  2 | 易机数据           |         0 |     1 | /auction-dashboard  | NULL |  NULL | 2019-06-19 19:58:24 | 2021-03-27 12:50:23 |         2 |
|  3 | 产品滞留数据       |         0 |     1 | /product-dashboard  | NULL |  NULL | 2019-06-19 19:58:42 | 2021-03-27 12:50:23 |         2 |
|  4 | 发货数据           |         0 |     1 | /product-data       | NULL |  NULL | 2019-08-29 17:44:35 | 2021-03-27 12:50:23 |         2 |
|  6 | 退租数据           |         0 |     1 | /tuizushuju         | NULL |  NULL | 2019-09-25 19:05:47 | 2021-03-27 12:50:23 |         2 |
|  7 | 呆滞数据           |         0 |     1 | /daizhishuju        | NULL |  NULL | 2019-09-25 19:12:29 | 2021-03-27 12:50:23 |         2 |
| 10 | 发货数据明细       |         0 |     1 | /shujumingxi        | NULL |  NULL | 2019-09-25 19:15:37 | 2021-03-27 12:50:23 |         2 |
| 12 | 增率统计           |         0 |     1 | /branch-dashboard   | NULL |  NULL | 2019-09-26 21:23:16 | 2021-03-27 12:50:23 |         2 |
| 13 | 增率详细           |         0 |     1 | /customer-dashboard | NULL |  NULL | 2019-09-26 21:23:46 | 2021-03-27 12:50:23 |         2 |
| 14 | 产品部数据         |         0 |     1 | /svn7kezaqe9        | NULL |  NULL | 2019-09-29 21:58:09 | 2021-03-27 12:50:23 |         2 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)
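Because c_menu.txt carries only four fields per line, the list in parentheses does not need ten entries. A minimal sketch under the same assumptions (same file, same table) lists just the four target columns in file order and lets every other column of a_menu fall back to its default:

```sql
load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
-- one entry per field in c_menu.txt, in file order
(id, name, url, create_time);
```

Keeping the list the same length as the file makes a mismatch between file and list easier to spot when the export format changes.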

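Scenario 3. Generating custom column data with LOAD

As the verification in Scenario 2 shows, columns that receive no value during the import are left at their default or NULL. The example below uses a different table, emp, whose added columns fullname, modify_date and delete_flag would likewise be set to NULL if left unhandled. To populate such columns, supply values during LOAD through functions supported by MySQL or through fixed values. Fields that do exist in the file can also be passed through functions. Combined with import and export, this provides a simple ETL capability, as shown below: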

-- Import statement

load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6) -- corresponds to the 6 columns of data in employees.txt
-- The part below maps table columns to fields in the data file explicitly; values that are not in the file are generated by functions (or can be set to fixed values)
set emp_no=@C1,
birth_date=@C2,
first_name=upper(@C3), -- convert the imported value to upper case
last_name=lower(@C4), -- convert the imported value to lower case
fullname=concat(first_name,' ',last_name), -- concatenate first_name and last_name
gender=@C5,
hire_date=@C6,
modify_date=now(), -- generate the current timestamp
delete_flag=if(hire_date<'1988-01-01','Y','N'); -- derive a value from a condition on another column
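The same technique can be applied to the tables used earlier in this article. The following is only a sketch: it assumes the ten-field file b_menu.txt and the table test01.a_menu from Scenario 2, and the transformation rules (trimming the name, storing empty icons as NULL, stamping update_time, deriving menu_type from level) are made up for illustration.

```sql
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@C7,@C8,@C9,@C10)
-- @C6 is NULL where the file has \N and '' where it has "";
-- nullif() turns the empty strings into NULL as well.
-- The menu_type rule is hypothetical: level 1 becomes a directory (0),
-- every other level becomes a page (1).
set id = @C1,
    name = trim(@C2),
    parent_id = @C3,
    level = @C4,
    url = lower(@C5),
    icon = nullif(@C6, ''),
    `order` = @C7,
    create_time = @C8,
    update_time = now(),
    menu_type = if(@C4 = 1, 0, 1);
```

Note that order is a reserved word and must be quoted with backticks in the SET clause.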

Scenario 4. Loading fixed-width data

Reference: https://mp.weixin.qq.com/s/WNXRshkvC3bFcc5NDaWlrw

IV. LOAD Summary
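The linked article is not reproduced here. As a rough sketch of one common way to load fixed-width text, and not necessarily the method of that article, the whole line can be read into a single variable and cut up with substr. The file /data/mysql/tmp/fixed_menu.txt and its layout are hypothetical: each line is assumed to hold id in characters 1 to 4, level in characters 5 to 6 and url in the remaining characters, with no tab or backslash characters (the default field terminator and escape character).

```sql
-- Sketch only: the file name and the column offsets are hypothetical.
load data infile '/data/mysql/tmp/fixed_menu.txt'
into table test01.menu_tmp01
character set utf8mb4
lines terminated by '\n'
(@line)
set id    = trim(substr(@line, 1, 4)),
    level = trim(substr(@line, 5, 2)),
    url   = trim(substr(@line, 7));
```

The name column is not assigned in this sketch and therefore takes its default, NULL.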

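  • By default, data is imported in the order it appears in the text file: columns from left to right, rows from top to bottom.
  • If the table structure and the text data do not match, number the columns of the text file in order and map them explicitly to table columns, so that data does not end up in the wrong column.
  • When the text file to import is large, split it by lines into several smaller files, for example with split.
  • After importing a file, run the following statements to check for Warnings and Errors and to confirm the number of rows imported: GET DIAGNOSTICS @p1=NUMBER,@p2=ROW_COUNT; select @p1 AS ERROR_COUNT,@p2 as ROW_COUNT;
  • If the text data differs too much from the table structure, or the data needs cleaning and transformation, use a dedicated ETL tool, or import the data roughly into MySQL first and transform it there.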
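The verification step from the list above could look like the following sketch, which assumes the Scenario 2 import. GET DIAGNOSTICS and SHOW WARNINGS are diagnostic statements that, as described in the MySQL manual, do not clear the diagnostics area, so they are placed right after the LOAD and before any ordinary query.

```sql
load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(id, name, url, create_time);

-- Capture the number of conditions and the affected-row count of the LOAD.
get diagnostics @p1 = number, @p2 = row_count;

-- Inspect the first few warnings raised by the LOAD, if any.
show warnings limit 10;

-- Display the captured values.
select @p1 as ERROR_COUNT, @p2 as ROW_COUNT;

-- Compare with the number of lines in the source file.
select count(*) from test01.a_menu;
```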