SQLite数据库的一些高级特性
- 数据库使用入门
- 多数据库
- 时间处理
- coalesce函数
- case表达式
- with clause
- window functions
- SQL语句中的表连接(Where)和过滤(Where)
- sqlite独有特性
- sqlite作为数据分析工具
- 备份
本文介绍了SQL的一些高级特性(case函数, json操作, 窗口函数,coalesce函数等).
兼容mysql8.0以上版本以及postgresql7.x以上版本. 用sqlite介绍是因为sqlite数据库方便.
希望能够帮助大家对sqlite的特性有更全面的认识, 也对sql的认识会更上一个台阶.
sqlite是一个体积小, 执行效率高, 独立自含, 高可靠性, 全功能的开源数据库.
sqlite是全球使用最广泛的数据库, 实际上已经无所不在, 所有的操作系统windows/mac/linux/Android/iOS, 都预装了sqlite库.
据信平均每台电脑上有12个运行实例. 文件管理器, 浏览器, 微信, qq, 网易云音乐, 百度网盘, office, vscode上都有使用.
查看了以下自己的电脑中: 至少有184个sqite的数据库.
mdfind -name "\.db" | grep "\.db$" | wc -lc
有这么多主要是因为钉钉和微信中有多个slqite库.
sqlite数据库用c语言写成, 是一个库, 不能单独执行, 只为调用它的进程服务. 不存在外部通讯的可能性, 机制上安全.
因为功能单纯, 项目规模不大, 代码只有14万行, 编译后总共只有700k大小.
代码完全开源, 经过充分测试, 覆盖率100%, 经过gcc/llvm/msvc等各种编译器的静态检查, 0编译警告, 具有航空界级别的代码质量.
slqite性能优异, 针对小文件操作, 性能比直接读写文件系统还要快30%. 而且由于直接嵌入在程序内部, 避免了额外的进程通讯和网络调用.
也是一个广大程序员了解不充分的一个数据库. 本文是在sqlite使用过程中形成的笔记.
笔记的初衷是为记录自己的学习和思考, 虽然已经做了增补, 还是会有缺少连贯性, 且代码远远多于文字说明的问题.
数据库使用入门
介绍数据库的基础使用, 数据库语言的语句的构成.
基本的数据库命令, 即创建修改一个表怎, 插入/更新数据.
基础的数据库.命令行, 以及怎如何导入/导出出完整的数据库结构.
设置sqlite命令行的显示样式
修改用户根目录下的 .sqliterc 文件.
.headers on 显示查询结果的表头
.mode column 以column对齐的形式显示查询结果
.nullvalue ¤ 用符号 ¤ 表示null值
.timer on 显示每次查询的时间消耗.
更多的信息可以参考sqlite的帮助 .help
.headers on
.mode column
.nullvalue ¤
.prompt "> \n"
.timer on
语句和注释和关键词
-- 这是一个单行注释
-- 想多写几行也不是不行
/*
如果确实太长, 就得
用这种注释
和C系的注释一致
*/
-- 关键字: select, update, insert, create, drop, and begin
-- 不能用关键词来当标识符
create table table (a, b);
Parse error: near "table": syntax error
create table table (a, b);
^--- error here
-- 此外sqlite不区分大小写
create table a(c1);
CREATE TABLE A(C1);
arse error: table A already exists
CREATE TABLE A(C1);
^--- error here
创建数据库 & sqlite3命令行基础操作
- sqlite3 mydb.sqlite
- sqlite3
- .help [.cmd] 使用.help来查看某个命令的具体帮助
- .schema 查看数据库结构
- .index 查看所有的索引
- .output 指定输出到某个文件, 如果不指定, 输出到控制台
- [filename]: output to file
- [] : output to stdout
- .mode 输出的格式
- list, tabs, csv, html, markdown, qbox. etc
- etc 命令有很多, 上面仅仅是列举一下
创建表
create [temp] table table_name (column_definitions [, constraints]);
create table contacts ( id integer primary key,
name text not null collate nocase,
phone text not null default 'UNKNOWN',
unique (name,phone) );
修改表结构
支持表改名, 增加新字段, 字段重命名, 删除字段.
alter table table { rename to name |
add column column_def |
drop column column_name |
rename column_name to new_column_name};
举例子:
CREATE TABLE contacts ( id integer primary key,
name text not null collate nocase,
phone text not null default 'UNKNOWN',
email text not null default '' collate nocase,
unique (name,phone) );
alter table contacts
add column email text not null default '' collate nocase;
.schema contacts
插入数据/更新数据
-- 支持多行或者单行插入
insert into tbl values (...), (...);
update tbl set .. where ...;
数据查询
select [distinct] heading
from tables
where predicate
group by columns
having predicate
order by columns
limit count, offset;
获得数据库结构
.schema.output schema.sql.read schema.sql
导入/导出数据
- .dump fulldb.sql & .read fulldb.sql
- .backup binary.file
- .restore binary.file
多数据库
sqlite支持同时挂载多个数据库(最多挂载125个).
挂载数据库的语法
ATTACH [DATABASE] database_name;
卸载数据库的语法
DETACH [DATABASE] database_name;
命令行中可以使用.database来进行查询当前已经挂载的数据库
sqlite> .database
seq name file
--- --------------- ----------------------------
0 main /techonthenet/test.sqlite
2 example /techonthenet/example.sqlite
也可以从数据库字典表pragma_database_list中查询
select * from pragma_database_list where name = 'task';
对于已经挂载的数据库中的表, 加上数据库前缀后, 可以和普通的表一样使用, 可以使用跨数据库的join.
select * from tbl, another_db.tbl where tbl.id = another_db.tbl.id;
时间处理
sqlite的数据结构和数据类型. 以及时间类型的各种处理方式和文本转化.
时间类型
要深入了解时间类型, 首先得了解sqlite的数据类型. 数据类型可能是对其他数据库有了解的人最迷惑的地方.
| type | format |
|---------|--------------------------------------------|
| TEXT | "YYYY-MM-DD HH:MM:SS.SSS" |
| REAL | float number of days since Nov 24, 4714 BC |
| Integer | seconds from 1970-01-01 00:00:00 UTC |
时间函数
以下是sqlite支持的还算丰富的时间函数
| function | examples | |-----------------------------------------------------|---------------------| | date(timestring, modifier, modifier, …) | YYYY-MM-DD | | time(timestring, modifier, modifier, …) | HH:MM:SS | | datetime(timestring, modifier, modifier, …) | YYYY-MM-DD HH:MM:SS | | julianday(timestring, modifier, modifier, …) | | | strftime(format, timestring, modifier, modifier, …) | |时间字符串
支持的格式化字符串
| time string | examples | |-------------------------|-------------------------| | YYYY-MM-DD | 2010-12-30 | | YYYY-MM-DD HH:MM | 2010-12-30 12:10 | | YYYY-MM-DD HH:MM:SS.SSS | 2010-12-30 12:10:04.100 | | MM-DD-YYYY HH:MM | 30-12-2010 12:10 | | HH:MM | 12:10 | | YYYY-MM-DDTHH:MM | 2010-12-30 12:10 | | HH:MM:SS | 12:10:01 | | YYYYMMDD HHMMSS | 20101230 121001 | | now | 2013-05-07 |修饰符
支持的修饰符
- NNN days
- NNN hours
- NNN minutes
- NNN.NNNN seconds
- NNN months
- NNN years
- start of month
- start of year
- start of day
- weekday N
- unixepoch
- localtime
- utc
格式化字符串
数据库支持的格式化的字符串
| 格式 | 意义 | |------|-----------------------------------| | %d | 一月中的第几天,01-31 | | %f | 带小数部分的秒,SS.SSS | | %H | 小时,00-23 | | %j | 一年中的第几天,001-366 | | %J | 儒略日数,DDDD.DDDD | | %m | 月,00-12 | | %M | 分,00-59 | | %s | 从 1970-01-01 算起的秒数 | | %S | 秒,00-59 | | %w | 一周中的第几天,0-6 (0 is Sunday) | | %W | 一年中的第几周,01-53 | | %Y | 年,YYYY | | %% | % symbol |日期函数的例子
-- 当前日期 SELECT date('now');-- 当前月份的最后一天: SELECT date('now','start of month','+1 month','-1 day');-- UNIX 时间戳 1092941466 的格林尼治日期和时间: SELECT datetime(1092941466, 'unixepoch');-- UNIX 时间戳 1092941466 本地时区的日期和时间: SELECT datetime(1092941466, 'unixepoch', 'localtime');-- 当前的 UNIX 时间戳: SELECT strftime('%s','now');-- 美国"独立宣言"签署以来的天数: SELECT julianday('now') - julianday('1776-07-04');-- 从 2004 年某一特定时刻以来的秒数: SELECT strftime('%s','now') - strftime('%s','2004-01-01 02:34:56');-- 今年10月份的第一个星期二 select date('now','start of year','+9 months','weekday 2');-- 从 UNIX 纪元算起的以秒为单位的时间(类似 strftime('%s','now') ,不同的是这里有包括小数部分): SELECT (julianday('now') - 2440587.5)*86400.0;-- 在 UTC 与本地时间值之间进行转换,当格式化日期时,使用 utc 或 localtime 修饰符,如下所示: -- utc时间转本地时间 SELECT time('12:00', 'localtime');-- 本地时间转utc时间 SELECT time('12:00', 'utc');
coalesce函数
这个函数在特定场景下能够极大减少冗余的代码.
函数的涵义
返回第一个非空值
SELECT COALESCE(10,20);
SELECT COALESCE(NULL,20,10);
具体举例
CREATE TABLE IF NOT EXISTS memberships (
membership_id INT PRIMARY KEY,
program_name TEXT NOT NULL,
net_price NUMERIC NOT NULL,
discount NUMERIC
);
INSERT INTO memberships(program_name, net_price, discount)
VALUES('1 Month', 100, null),
('3 Months', 300, 10),
('6 Months', 600, 30);
SELECT program_name,
(net_price - discount) AS amount
FROM memberships;
SELECT program_name,
(net_price - coalesce(discount, 0) ) AS amount
FROM memberships;
简化sql
下面通过一个例子, 把一个156行的sql变为了只有10行的例子 https://smallthingssql.com/having-a-less-understood-sql-clause
-- 业务涉及两张表 国家表: country 有 country code, name 两个字段
-- 和发票表 : invoice, 有 invoice number, country code, year, month, item count这么几个字段
-- 以及金额, amount.
select * from country;
ctry_code ctry_name
US USA
CA Canada
CN China
.. ..
rows
1select * from invoice;
invoice_nbr ctry_cd year_nbr month_nbr item_cnt invoice_amt
2014001 BR 2014 3 13 162875
2021172 CA 2021 10 200 1299355
2020435 CN 2020 1 12 145654
2016201 US 2016 8 900 7125125
2021662 US 2021 4 100 800135
2018743 MX 2018 11 5 76124
.. .. .. .. .. ..
4 million rows
-- 对发票信息做如下聚合处理
-- 所有国家和所有年份, 所有USA以外的国家
-- 所有国家, 所有USA以外的国家
-- 所有年份(包括美国数据)
-- 全部合计
-- 类似以下结果
ctry_name year_nbr tot_cnt tot_amt
Canada 2014 200 1300000
Canada 2015 220 1500000
Brazil 2014 200 1150000
Brazil 2015 180 1000000
Brazil 2018 150 750000
.. .. .. ..
Canada null 420 2800000
Brazil null 530 2900000
.. .. .. ..
All countries 2014 900 6406325
All countries 2015 1000 7306368
All countries 2018 1200 8206334
.. .. .. ..
All countries null 6124 41261346
-- 以上是我们预期的结果。它使用6个sql表示6个国家(美国除外)的年度总计,
-- 6个sql表示所有年份的国家总计,1个sql表示所有国家的年度总计,
-- 最后使用1个sql表示总计。
-- 然后将这14个sql与13个UNION操作相结合,如下所示。
-- 各个国家的年合计
select c.ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name, i.year_nbr -- grouping by country and year
UNION
....
.... -- USA以外的6个国家需要UNION起来
UNION
-- 国家合计
select c.ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name -- grouping by just country
UNION
....
.... -- USA以外的6个国家需要UNION起来
UNION
-- 所有国家的年合计
select 'All countries' as ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by i.year_nbr -- grouping by just year for all countries
UNION
-- 所有国家的年合计
select 'All countries' as ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
-- 此查询返回预期结果,但除了难以维护外,它还缓慢且效率低下。
-- 它从发票表中获取数据14次,以执行14次聚合,然后是一组昂贵的UNION操作。
-- 我们可以编写一个更简单、更高效的查询,更好地了解group和having的能力。
-- 正如where子句允许我们过滤原始数据一样,它允许我们按操作过滤组的结果集。
-- 以下sql使用分组按立方体()生成国家和年份所有组合的汇总,同时过滤美国的汇总行。
-- 可谓是又臭又长且性能不高
This query returns the expected results, but apart from being difficult to maintain, it is also slow and inefficient. It fetches data from the invoice table 14 times to perform
14 aggregations followed by an expensive set of UNION operations.
-- 以下是简化板
select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by cube(c.ctry_name, i.year_nbr)
having coalesce(c.ctry_name,'x') != 'USA' -- 排除USA的总结行
What is this query doing? It uses cube with group by to get-
* one row for each country, year combination
* one row for each country (including data for all years)
* one row for each year (including data for all countries)
* one row for all years, all countries
select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name != 'USA' -- exclude USA in all summaries
group by cube(c.ctry_name, i.year_nbr)
case表达式
-- nothing is = to NULL
select NULL = NULL;
-----------
¤
select NULL is NULL;
------------
1
-- nullif function takes two arguments and returns null if they have the same values; otherwise, it returns the first argument:
select nullif(1,1);
-----------
¤
select nullif(1,2);
------------
1
create table episodes (
id integer primary key,
season int,
name text );
create table foods(
id integer primary key,
type_id integer,
name text );
create table food_types(
id integer primary key,
name text );
create table foods_episodes(
food_id integer,
episode_id integer );
delete from foods;
insert into foods values (1, 1, '汉堡');
insert into foods values (2, 2, '苹果');
insert into foods values (3, 3, '茅台');
insert into foods values (4, 2, '西瓜');
insert into foods values (5, 4, '海参');
insert into foods values (6, 5, '可乐');
select * from foods;
select name || case type_id
when 5 then ' 是饮料'
when 2 then ' 是水果'
when 1 then ' 是垃圾食品'
when 4 then ' 是海鲜'
else null
end description
from foods
where description is not null
order by name
limit 10;
delete from foods_episodes;
insert into foods_episodes values (1, 1);
insert into foods_episodes values (1, 2);
insert into foods_episodes values (1, 3);
insert into foods_episodes values (1, 4);
insert into foods_episodes values (1, 5);
insert into foods_episodes values (2, 1);
insert into foods_episodes values (2, 2);
insert into foods_episodes values (3, 3);
insert into foods_episodes values (4, 4);
insert into foods_episodes values (5, 1);
select name,(select
case
when count(*) > 4 then '非常多'
when count(*) = 4 then '比较多'
when count(*) in (2,3) then '一般般'
else '比较低'
end
from foods_episodes
where food_id=f.id) frequency
from foods f
where frequency like '%High';
-- json
select json_set('[0,1,2]','$[0]','new');
["new",1,2]
select json_set('[0,1,2]','$[#-0]','new');
[0,1,2,"new"]
select json_set('[0,1,2]','$[#]','new');
[0,1,2,"new"]
select json_object('ex','[52,3.14159]');
{"ex":"[52,3.14159]"}
select json_object('ex',('52,3.14159]'->>'$'));
select json_object('ex',('52,3.14159]'->>'$'));
select json_object('ex',json('[52,3.14159]'));
→ '{"ex":[52,3.14159]}'
select json_object('ex',json_array(52,3.14159));
→ '{"ex":[52,3.14159]}'
select json_object('ex','[52,3.14159]'->'$');
→ '{"ex":[52,3.14159]}'
-- function json
-- str->minified json, remove duplicate
select json(' { "this" : "is", "a": [ "test" ] } ') ;
→ '{"this":"is","a":["test"]}';
-- function json_array
select json_array(1,2,'3',4);
→ '[1,2,"3",4]'
select json_array('[1,2]');
→ '["[1,2]"]'
select json_array(json_array(1,2)) ;
→ '[[1,2]]'
select json_array(1,null,'3','[4,5]','{"six":7.7}') ;
→ '[1,null,"3","[4,5]","{\"six\":7.7}"]'
select json_array(1,null,'3',json('[4,5]'),json('{"six":7.7}'));
→ '[1,null,"3",[4,5],{"six":7.7}]'
-- function The json_array_length()
select json_array_length('[1,2,3,4]');
→ 4
select json_array_length('[1,2,3,4]', '$');
→ 4
select json_array_length('[1,2,3,4]', '$[2]');
→ 0
select json_array_length('{"one":[1,2,3]}');
→ 0
select json_array_length('{"one":[1,2,3]}', '$.one');
→ 3
select json_array_length('{"one":[1,2,3]}', '$.two');
→ NULL
-- json_extract()
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$');
→ '{"a":2,"c":[4,5,{"f":7}]}'
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c');
→ '[4,5,{"f":7}]'
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2]');
→ '{"f":7}'
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2].f');
→ 7
select json_extract('{"a":2,"c":[4,5],"f":7}','$.c','$.a');
→ '[[4,5],2]'
select json_extract('{"a":2,"c":[4,5],"f":7}','$.c[#-1]');
→ 5
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.x');
→ NULL
select json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.x', '$.a');
→ '[null,2]'
select json_extract('{"a":"xyz"}', '$.a');
→ 'xyz'
select json_extract('{"a":null}', '$.a');
→ NULL
-- The -> and ->> operators
-- The -> operator returns a JSON representation of the selected subcomponent or NULL if that subcomponent does not exist.
-- The ->> operator returns an SQL TEXT, INTEGER, REAL, or NULL value that represents the selected subcomponent,
-- or NULL if the subcomponent does not exist.
select '{"a":2,"c":[4,5,{"f":7}]}' -> '$'
→ '{"a":2,"c":[4,5,{"f":7}]}'
select '{"a":2,"c":[4,5,{"f":7}]}' -> '$.c'
→ '[4,5,{"f":7}]'
select '{"a":2,"c":[4,5,{"f":7}]}' -> 'c'
→ '[4,5,{"f":7}]'
select '{"a":2,"c":[4,5,{"f":7}]}' -> '$.c[2]'
→ '{"f":7}'
select '{"a":2,"c":[4,5,{"f":7}]}' -> '$.c[2].f'
→ '7'
select '{"a":2,"c":[4,5],"f":7}' -> '$.c[#-1]'
→ '5'
select '{"a":2,"c":[4,5,{"f":7}]}' -> '$.x'
→ NULL
select '[11,22,33,44]' -> 3
→ '44'
select '[11,22,33,44]' ->> 3
→ 44
select '{"a":"xyz"}' -> '$.a'
→ '"xyz"'
select '{"a":"xyz"}' ->> '$.a'
→ 'xyz'
select '{"a":null}' -> '$.a'
→ 'null'
select '{"a":null}' ->> '$.a'
→ NULL
+----------------+------------------------------+---------------------------+
| Function | Overwrite if already exists? | Create if does not exist? |
+----------------+------------------------------+---------------------------+
| json_insert() | No | Yes |
+----------------+------------------------------+---------------------------+
| json_replace() | Yes | No |
+----------------+------------------------------+---------------------------+
| json_set() | Yes | Yes |
+----------------+------------------------------+---------------------------+
-- append to the end of an array
select json_insert('[1,2,3,4]','$[#]',99);
→ '[1,2,3,4,99]'
select json_insert('[1,[2,3],4]','$[1][#]',99;
→ '[1,[2,3,99],4]'
-- json_set/json_insert/json_replace
select json_insert('{"a":2,"c":4}', '$.a', 99);
→ '{"a":2,"c":4}'
select json_insert('{"a":2,"c":4}', '$.e', 99);
→ '{"a":2,"c":4,"e":99}'
select json_replace('{"a":2,"c":4}', '$.a', 99);
→ '{"a":99,"c":4}'
select json_replace('{"a":2,"c":4}', '$.e', 99);
→ '{"a":2,"c":4}'
select json_set('{"a":2,"c":4}', '$.a', 99);
→ '{"a":99,"c":4}'
select json_set('{"a":2,"c":4}', '$.e', 99);
→ '{"a":2,"c":4,"e":99}'
select json_set('{"a":2,"c":4}', '$.c', '[97,96]');
→ '{"a":2,"c":"[97,96]"}'
select json_set('{"a":2,"c":4}', '$.c', json('[97,96]'));
→ '{"a":2,"c":[97,96]}'
select json_set('{"a":2,"c":4}', '$.c', json_array(97,96));
→ '{"a":2,"c":[97,96]}'
-- json_object function
select json_object('a',2,'c',4);
→ '{"a":2,"c":4}'
select json_object('a',2,'c','{e:5}');
→ '{"a":2,"c":"{e:5}"}'
select json_object('a',2,'c',json_object('e',5));
→ '{"a":2,"c":{"e":5}}'
--- The json_patch()
select json_patch('{"a":1,"b":2}','{"c":3,"d":4}');
→ '{"a":1,"b":2,"c":3,"d":4}'
select json_patch('{"a":[1,2],"b":2}','{"a":9}');
→ '{"a":9,"b":2}'
select json_patch('{"a":[1,2],"b":2}','{"a":null}');
→ '{"b":2}'
-- remote element
select json_patch('{"a":1,"b":2}','{"a":9,"b":null,"c":8}');
→ '{"a":9,"c":8}'
select json_patch('{"a":{"x":1,"y":2},"b":3}','{"a":{"y":9},"c":8}');
→ '{"a":{"x":1,"y":9},"b":3,"c":8}'
-- json_type() function
select json_type('{"a":[2,3.5,true,false,null,"x"]}');
→ 'object'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$');
→ 'object'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a');
→ 'array'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[0]');
→ 'integer'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[1]');
→ 'real'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[2]');
→ 'true'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[3]');
→ 'false'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[4]');
→ 'null'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[5]');
→ 'text'
select json_type('{"a":[2,3.5,true,false,null,"x"]}','$.a[6]');
→ NULL
--The json_valid() function
select json_valid('{"x":35}');
→ 1
select json_valid('{"x":35');
→ 0
-- json_quote()
select json_quote(3.14159);
→ 3.14159
select json_quote('verdant');
→ "verdant"
select json_quote('[1]');
→ '[1]'
select json_quote('[1,');
→ '\"[1\"'
drop table json_tree;
CREATE TABLE json_tree(
key ANY, -- key for current element relative to its parent
value ANY, -- value for the current element
type TEXT, -- 'object','array','string','integer', etc.
atom ANY, -- value for primitive types, null for array & object
id INTEGER, -- integer ID for this element
parent INTEGER, -- integer ID for the parent of this element
fullkey TEXT, -- full path describing the current element
path TEXT, -- path to the container of the current row
json JSON HIDDEN, -- 1st input parameter: the raw JSON
root TEXT HIDDEN -- 2nd input parameter: the PATH at which to start
);
DROP TABLE user;
CREATE TABLE user(name,phone);
insert into user values ("李思思", "[18660236100, 18660236101, 18660236102]");
insert into user values ("张三", "[18660236100, 18660236101, 18660236102]");
insert into user values ("吴秀秀", "18660236100");
select * from user;
SELECT user.name
FROM user, json_each(user.phone)
WHERE json_valid(user.phone)
AND json_each.value LIKE '186%';
SELECT DISTINCT user.name
FROM user, json_each(user.phone)
WHERE json_each.value LIKE '186%';
SELECT name FROM user WHERE phone LIKE '186%'
UNION
SELECT user.name
FROM user, json_each(user.phone)
WHERE json_valid(user.phone)
AND json_each.value LIKE '186%';
DROP TABLE big;
CREATE TABLE big(json JSON);
insert into big values ("[]");
SELECT big.rowid, fullkey, value
FROM big, json_tree(big.json)
WHERE json_tree.type NOT IN ('object','array');
SELECT DISTINCT json_extract(big.json,'$.id')
FROM big, json_tree(big.json, '$.partlist')
WHERE json_tree.key='uuid'
AND json_tree.value='6fa5181e-5721-11e5-a04e-57f3d7b32808';
with clause
也称CTE(common table expression) 有普通和递归两种模式, 普通的CTE创建一个作用域仅为当前查询的临时视图, 而递归模式的的CTE给我们提供了遍历树或者图的能力
非递归
是重构sql的工具
-- QUERY 1 :
drop table emp;
create table emp
( emp_ID int
, emp_NAME varchar(50)
, SALARY int);
insert into emp values(101, 'Mohan', 40000);
insert into emp values(102, 'James', 50000);
insert into emp values(103, 'Robin', 60000);
insert into emp values(104, 'Carol', 70000);
insert into emp values(105, 'Alice', 80000);
insert into emp values(106, 'Jimmy', 90000);
select * from emp;
with avg_sal(avg_salary) as
(select cast(avg(salary) as int) from emp)
select *
from emp e
join avg_sal av on e.salary > av.avg_salary
-- QUERY 2 :
DROP table sales ;
create table sales
(
store_id int,
store_name varchar(50),
product varchar(50),
quantity int,
cost int
);
insert into sales values
(1, 'Apple Originals 1','iPhone 12 Pro', 1, 1000),
(1, 'Apple Originals 1','MacBook pro 13', 3, 2000),
(1, 'Apple Originals 1','AirPods Pro', 2, 280),
(2, 'Apple Originals 2','iPhone 12 Pro', 2, 1000),
(3, 'Apple Originals 3','iPhone 12 Pro', 1, 1000),
(3, 'Apple Originals 3','MacBook pro 13', 1, 2000),
(3, 'Apple Originals 3','MacBook Air', 4, 1100),
(3, 'Apple Originals 3','iPhone 12', 2, 1000),
(3, 'Apple Originals 3','AirPods Pro', 3, 280),
(4, 'Apple Originals 4','iPhone 12 Pro', 2, 1000),
(4, 'Apple Originals 4','MacBook pro 13', 1, 2500);
select * from sales;
-- 每个门店的总销售额
select s.store_id, sum(s.cost) as total_sales_per_store
from sales s
group by s.store_id;
-- 所有门店的总销售额
select cast(avg(total_sales_per_store) as int) avg_sale_for_all_store
from (select s.store_id, sum(s.cost) as total_sales_per_store
from sales s
group by s.store_id) x;
-- 门店中高于平均销售额的
select *
from (select s.store_id, sum(s.cost) as total_sales_per_store
from sales s
group by s.store_id
) total_sales
join (select cast(avg(total_sales_per_store) as int) avg_sale_for_all_store
from (select s.store_id, sum(s.cost) as total_sales_per_store
from sales s
group by s.store_id) x
) avg_sales
on total_sales.total_sales_per_store > avg_sales.avg_sale_for_all_store;
-- 使用with语句重写
WITH total_sales as
(select s.store_id, sum(s.cost) as total_sales_per_store
from sales s
group by s.store_id),
avg_sales as
(select cast(avg(total_sales_per_store) as int) avg_sale_for_all_store
from total_sales)
select *
from total_sales
join avg_sales
on total_sales.total_sales_per_store > avg_sales.avg_sale_for_all_store;
递归
-- 简单例子
-- 使用UNION ALL 避免了去重的步骤, 可以逐条生成结果, 性能上有很大优势,
-- 特别是当条数很多的时候
WITH RECURSIVE
cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<100)
SELECT x FROM cnt;
-- 和上面的sql其实是一样的
-- select 1 和value(1) 是等价的
-- limit保证返回最大条数, 是个好的编程习惯
WITH RECURSIVE
cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt
LIMIT 1000000
)
SELECT x FROM cnt;
-- 层级查询
DROP TABLE org;
CREATE TABLE org (
name TEXT PRIMARY KEY,
boss TEXT REFERENCES org,
height INT
);
INSERT INTO org VALUES('Alice',NULL, 190);
INSERT INTO org VALUES('Bob','Alice', 134);
INSERT INTO org VALUES('Cindy','Alice', 129);
INSERT INTO org VALUES('Dave','Bob', 179);
INSERT INTO org VALUES('Emma','Bob', 197);
INSERT INTO org VALUES('Fred','Cindy', 180);
INSERT INTO org VALUES('Gail','Cindy', 169);
-- sqlite CAST 函数
WITH RECURSIVE
works_for_alice(n) AS (
VALUES('Alice')
UNION
SELECT name FROM org, works_for_alice
WHERE org.boss=works_for_alice.n
)
SELECT CAST(avg(height) as int) FROM org
WHERE org.name IN works_for_alice;
-- 家庭关系比上面的组织关系更复杂一些, 因为上级有父母两个人
DROP TABLE family;
DELETE from family;
select * from family;
CREATE TABLE family(
name TEXT PRIMARY KEY,
mom TEXT REFERENCES family,
dad TEXT REFERENCES family,
born DATETIME,
died DATETIME
);
insert into family values ('Alice', 'Lisa', 'Jason', '1970-01-01', null);
insert into family values ('Lisa', 'Vense', 'Vencent', '1950-01-01', null);
insert into family values ('Jason', 'Beautify', 'Lucifer', '1965-01-01', null);
insert into family values ('Beautify', 'Alex', 'Eve', '1905-01-01', '2020-09-09');
insert into family values ('Lucifer', 'Wang', 'Li', '1905-01-01', null);
insert into family values ('Kevin', 'Anlin', 'Yuezhen', '1950-01-01', NULL);
-- 以下sql找出所有Alice健在的先祖
WITH RECURSIVE
parent_of(name, parent) AS
(SELECT name, mom FROM family UNION SELECT name, dad FROM family),
ancestor_of_alice(name) AS
(SELECT parent FROM parent_of WHERE name='Alice'
UNION ALL
SELECT parent FROM parent_of JOIN ancestor_of_alice USING(name))
SELECT family.name FROM ancestor_of_alice, family
WHERE ancestor_of_alice.name=family.name
AND died IS NULL
ORDER BY born;
window functions
以slqite为例, 参照官方文档, 实现了postgres的规范
介绍
语法上, window函数伴随着over语句出现. 如果一个函数有over语句,
它就是一个window function. 如果没有over语句, 函数就只能是一个
aggregate函数或者scalar函数. window 函数还可以带一个过滤语句
有别于普通函数, widnwo函数不能使用 DISTINCT 关键字, 此外, window
函数只能出现在select查询的结果以及order by子句中 WINDOW 子句只能在
HAVING 之后, ORDER BY 之前
CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT);
INSERT INTO t0 VALUES (1, 'aaa'), (2, 'ccc'), (3, 'bbb'), (4, 'aaa');
SELECT x, y, row_number() OVER (ORDER BY y) AS row_number
FROM t0 ORDER BY x;
> x y row_number
- --- ----------
1 aaa 1
2 ccc 3
3 bbb 2
SELECT x, y, row_number() OVER win1, rank() OVER win2
FROM t0
WINDOW win1 AS (ORDER BY y RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
win2 AS (PARTITION BY y ORDER BY x)
ORDER BY x;
x y row_number() OVER win1 rank() OVER win2
- --- ---------------------- ----------------
1 aaa 1 1
2 ccc 3 1
3 bbb 2 1
支持的函数
row~number~() over 语句中的顺序, 从1开始, 是连续的数字
rank() 在特定的排序下, rank和同级别的行row~number一致~, 如果没有顺序(没有进行order by), rank返回1, 如果有并列的情况, 会出现排序断层
dense~rank~() 不会出现断层
percent~rank~() 0.0 and 1.0
cume~dist~() 此行据顶的位置0.0 到 1.0
ntile(N) 以N为底的比率, 比如N是100, 则是百分比
DROP TABLE t2; CREATE TABLE t2(a, b); INSERT INTO t2 VALUES('a', 'one'), ('a', 'two'), ('a', 'three'), ('b', 'four'), ('c', 'five'), ('c', 'six'); SELECT a AS a, b AS b, row_number() Over win AS row_num, ntile(2) OVER win AS ntile_2, ntile(4) OVER win AS ntile_4, ntile(100) OVER win AS ntile_100 FROM t2 WINDOW win AS (ORDER BY a);lag(expr)
lag(expr, offset)
lag(expr, offset, default)
lead(expr)
lead(expr, offset)
lead(expr, offset, default)
first~value~(expr)
last~value~(expr)
nth~value~(expr, N)
窗口统计函数
在以下的查询语句中, 一个 window frame 由当前行 + 前一行 + 后一行构成
CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c);
INSERT INTO t1 VALUES (1, 'A', 'one' ),
(2, 'B', 'two' ),
(3, 'C', 'three'),
(4, 'D', 'one' ),
(5, 'E', 'two' ),
(6, 'F', 'three'),
(7, 'G', 'one' );
SELECT a, b, group_concat(b, '.') OVER (
ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS group_concat FROM t1;
a b group_concat
- - ------------
1 A A.B
2 B A.B.C
3 C B.C.D
4 D C.D.E
5 E D.E.F
6 F E.F.G
7 G F.G
适用于window-fun的统计函数有:
- avg(X)
- count(*)
- count(X)
- group~concat~(X)
- group~concat~(X,Y)
- max(X)
- min(X)
- sum(X)
PARTITION BY 子句
出于统计目的, 查询的结果集会被分为一个或者多个分区(partition), 分区由
PARTITION BY 语句决定. 如果没有分组子句, 结果集中所有的行分为一组.
-- PARTITION BY c 把结果集分为三个分组
SELECT c, a, b, group_concat(b, '.') OVER (
PARTITION BY c ORDER BY a RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS group_concat
FROM t1 ORDER BY c, a;
c a b group_concat
----- - - ------------
one 1 A A.D.G
one 4 D D.G
one 7 G G
three 3 C C.F
three 6 F F
two 2 B B.E
two 5 E E
Run Time: real 0.001 user 0.000553 sys 0.000534
SELECT c, a, b, group_concat(b, '.') OVER (
PARTITION BY c ORDER BY a RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS group_concat
FROM t1 ORDER BY a;
c a b group_concat
----- - - ------------
one 1 A A.D.G
two 2 B B.E
three 3 C C.F
one 4 D D.G
two 5 E E
three 6 F F
one 7 G G
Run Time: real 0.001 user 0.000238 sys 0.000268
Frame的指定
默认的frame是这么指定的
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS
SELECT a, b, c,
group_concat(b, '.') OVER (ORDER BY c) AS group_concat
FROM t1 ORDER BY a;
a b c group_concat
- - ----- -------------
1 A one A.D.G
2 B two A.D.G.C.F.B.E
3 C three A.D.G.C.F
4 D one A.D.G
5 E two A.D.G.C.F.B.E
6 F three A.D.G.C.F
7 G one A.D.G
Run Time: real 0.000 user 0.000245 sys 0.000246
Frame 的类型
| type | spec |
|---|---|
| ROWS | starting & ending boundaries for the frame are detemined by counting boundaries of the frame are measured |
| GROUPS | |
| RANGE | ORDER BY clause of the window have excetly one term. |
Frame 边界
SELECT c, a, b, group_concat(b, '.') OVER (
ORDER BY c, a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS group_concat
FROM t1 ORDER BY c, a;
c a b group_concat
----- - - -------------
one 1 A A.D.G.C.F.B.E
one 4 D D.G.C.F.B.E
one 7 G G.C.F.B.E
three 3 C C.F.B.E
three 6 F F.B.E
two 2 B B.E
two 5 E E
Run Time: real 0.003 user 0.000241 sys 0.000858
NBOUNDED PRECEDING: 分区里的第一条<expr> PRECEDING- ROWS
- GROUPS
- RANGE
CURRENT ROW<expr> FOLLOWINGUNBOUNDED FOLLOWING
EXCLUDE 语句
SELECT c, a, b,
group_concat(b, '.') OVER (
ORDER BY c GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS
) AS no_others,
group_concat(b, '.') OVER (
ORDER BY c GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
) AS current_row,
group_concat(b, '.') OVER (
ORDER BY c GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE GROUP
) AS grp,
group_concat(b, '.') OVER (
ORDER BY c GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE TIES
) AS ties
FROM t1 ORDER BY c, a;
c a b no_others current_row grp ties
----- - - ------------- ----------- --------- -----------
one 1 A A.D.G D.G ¤ A
one 4 D A.D.G A.G ¤ D
one 7 G A.D.G A.D ¤ G
three 3 C A.D.G.C.F A.D.G.F A.D.G A.D.G.C
three 6 F A.D.G.C.F A.D.G.C A.D.G A.D.G.F
two 2 B A.D.G.C.F.B.E A.D.G.C.F.E A.D.G.C.F A.D.G.C.F.B
two 5 E A.D.G.C.F.B.E A.D.G.C.F.B A.D.G.C.F A.D.G.C.F.E
Run Time: real 0.001 user 0.000416 sys 0.000421
- EXCLUDE NO OTHERS: 默认
- EXCLUDE CURRENT ROW:
- EXCLUDE GROUP:
- EXCLUDE TIES:
FILTER 语句
SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c!='two') OVER (
ORDER BY a
) AS group_concat
FROM t1 ORDER BY a;
c a b group_concat
----- - - ------------
one 1 A A
two 2 B A
three 3 C A.C
one 4 D A.C.D
two 5 E A.C.D
three 6 F A.C.D.F
one 7 G A.C.D.F.G
Run Time: real 0.000 user 0.000120 sys 0.000134
SQL语句中的表连接(Where)和过滤(Where)
首先, 可视化的介绍下连接的各种情况, 以表1 和表2为例, 下面分别列出了各种可能的连接的情况
- 笛卡尔集: cross join
- 内连接: inner join
- 左外连接: left outer join
- 右外连接:right outer join
- 全外连接:full outer join

以上连接的sql实现
重要的事情: 避免在`where`中写隐含的join, join要放到join语句里
下面的代码在sqlite中执行成功
create table zhihu(
real_name text,
nick_name text
);
create table weibo(
real_name text,
nick_name text
);
select * from zhihu;
select * from weibo;
insert into zhihu (real_name, nick_name)
values ("李照宇", "KL@zhihu"),
("马海强", "MHQ@zhihu"),
("迪丽热巴", "DLRB@zhihu");
insert into weibo (real_name, nick_name)
values ("李照宇", "KL@weibo"),
("马海强", "MHQ@weibo"),
("杨超越", "YCY@weibo");
-- cross join
select * from weibo, zhihu;
real_name nick_name real_name nick_name
--------- --------- --------- ----------
李照宇 KL@weibo 李照宇 KL@zhihu
李照宇 KL@weibo 马海强 MHQ@zhihu
李照宇 KL@weibo 迪丽热巴 DLRB@zhihu
马海强 MHQ@weibo 李照宇 KL@zhihu
马海强 MHQ@weibo 马海强 MHQ@zhihu
马海强 MHQ@weibo 迪丽热巴 DLRB@zhihu
杨超越 YCY@weibo 李照宇 KL@zhihu
杨超越 YCY@weibo 马海强 MHQ@zhihu
杨超越 YCY@weibo 迪丽热巴 DLRB@zhihu
-- 隐式内联, 我们要避免的写法
select * from zhihu, weibo where zhihu.real_name = weibo.real_name;
real_name nick_name real_name nick_name
--------- --------- --------- ---------
李照宇 KL@zhihu 李照宇 KL@weibo
马海强 MHQ@zhihu 马海强 MHQ@weibo
-- 显式内联, 结果同上,是我们推荐的写法
select * from weibo
join zhihu on weibo.real_name = zhihu.real_name;
real_name nick_name real_name nick_name
--------- --------- --------- ---------
李照宇 KL@zhihu 李照宇 KL@weibo
马海强 MHQ@zhihu 马海强 MHQ@weibo
-- 全外连接, sqlite不支持
select * from weibo
outer join zhihu on weibo.real_name = zhihu.real_name;
Error: RIGHT and FULL OUTER JOINs are not currently supported
-- 左外连接
select * from weibo
left outer join zhihu on weibo.real_name = zhihu.real_name;
-- 右外连接不支持哟
select * from weibo
right outer join zhihu on weibo.real_name = zhihu.real_name;
right outer join zhihu on weibo.real_name = zhihu.real_name;
-- 左右其实是等价的, 右可以改成做
select * from zhihu
left outer join weibo on weibo.real_name = zhihu.real_name;
real_name nick_name real_name nick_name
--------- ---------- --------- ---------
李照宇 KL@zhihu 李照宇 KL@weibo
马海强 MHQ@zhihu 马海强 MHQ@weibo
迪丽热巴 DLRB@zhihu
-- 区分join和filter
select * from weibo
join zhihu on weibo.real_name = zhihu.real_name
where weibo.real_name = "李照宇";
real_name nick_name real_name nick_name
--------- --------- --------- ---------
李照宇 KL@weibo 李照宇 KL@zhihu
-- 最后再强调一遍: 避免下隐式join
select * from weibo, zhihu
where weibo.real_name = zhihu.real_name and weibo.real_name = "李照宇";
sqlite独有特性
存储数据类型(storage type) 指的是slqite存储的5种基础数据类型
type explain integer They can vary in size: 1-8 bytes. maximum range is {-9223372036854775808,-1,0,1, 9223372036854775807} real 8 byte floats number text UTF-8 and UTF-16 (big and little endian), 1,000,000,000 bytes. blob Binary large object (BLOB) data is any kind of data. 1,000,000,000 bytes. null SQLite has full support for NULL handling. select typeof(3.14), typeof('3.14'), typeof(314), typeof(x'3142'), typeof(NULL); typeof(3.14) typeof('3.14') typeof(314) typeof(x'3142') typeof(NULL) ------------ -------------- ----------- --------------- ------------ real text integer blob null Run Time: real 0.000 user 0.000123 sys 0.000037 drop table domain; create table domain(x); insert into domain values (3.142); insert into domain values ('3.142'); insert into domain values (3142); insert into domain values (x'3142'); insert into domain values (null); select rowid, x, typeof(x) from domain; rowid x typeof(x) ----- ----- --------- 1 3.142 real 2 3.142 text 3 3142 integer 4 1B blob 5 ¤ null -- 顺序上, NULL最小, real/integer 次之, TEXT再次之, Blob最小, 比较时候通过c的 memcmp()函数 select rowid, x, typeof(x) from domain order by x; rowid x typeof(x) ----- ----- --------- 5 ¤ null 1 3.142 real 3 3142 integer 2 3.142 text 4 1B blob声明类型 sqlite是存储类型区分的, 表声明的类型完全无效
drop table abc; create table abc (int a, float b); insert into abc values(100, 100.1); insert into abc values(100.1, 100.1); insert into abc values(100.1, 'abc'); insert into abc values(x'1000', 100.1); select int, float, typeof(int), typeof(float) from abc; create table xyz(a haha, b xixi);字符串比较类型, 也是一种约束
BINARY 二进制比较,直接使用memcmp()比较 NOCASE 将26个大写字母转换为小写字母后进行与BINARY一样的比较 RTRIM 和BINARY一样,忽略结尾的空格 PRAGMA Statements 官方文档 查询/设置数据库状态, 一般这些信息并不是以table形式保存的. 有这么几个要点
- 不保证版本兼容性
- 语句如果有错误, 不会给出任何提示
- PRAGMA提供的某些查询可以使用sql进行
- 比如
PRAGMA database_list等价于select * from pragma_database_list
- 比如
- PRAGMA 语句是sqlite特有的, 其他SQL引擎并不支持
例子
select last_insert_rowid(); -- 查看数据库编码 PRAGMA encoding; -- 查看所有sqlite支持的函数 PRAGMA function_list; -- 查看/设置数据库是否只读 PRAGMA query_only;
sqlite作为数据分析工具
sqlite分析大的csv文件
.import a.txt" foo --csv
备份
cronjob自动备份
#!/bin/bash -x
# Backup & compress our database to the temp directory.
sqlite3 /path/to/db '.backup /tmp/db'
gzip /tmp/db
# Upload backup to S3 using a rolling daily naming scheme.
aws s3 cp /tmp/db.gz s3://mybucket/db-`date +%d`.gz
# Notify dead man that back up completed successfully.
curl -d s=$? https://nosnch.in/xxxxxxxxxx &> /dev/null
# Edit your cron jobs
crontab -e
# Add this to the end of the crontab
0 0 * * * /path/to/my_backup_script.sh