August 12, 2020
By: Kevin
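File encodings: transcoding in JS and downloading to local disk
Files have encodings
After downloading the file, file reports its encoding as ISO-8859, and cat prints garbage.
cat is a very simple tool: it does no decoding at all and just copies the file's bytes to the terminal, which interprets them in its own encoding (usually UTF-8). ISO-8859 is the family of Latin character sets; file only guesses from the byte values, and here the guess is wrong.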
file 345201317205065728.csv
345201317205065728.csv: ISO-8859 text, with CRLF line terminators
cat 345201317205065728.csv
���,Ա������,Ա������,ǰ�³�,���³�,��Χ,��Χ,���,�䳤(��),��Χ,�ϱ�Χ
1,����-������ɾ��,,�ο���Χ1~150,45~125,75~180,60~170,30~65,15~90,,20~70
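To make the garbage above less mysterious, here is a minimal Python sketch (Python is used only for illustration; it is not part of the original workflow). It takes the first 13 bytes of the file and decodes them two ways: as UTF-8, which is what a typical terminal assumes, and as GBK, which is the encoding the file was actually written in. The variable names are my own.

```python
# First 13 bytes of the CSV header, taken from the file's hex dump.
raw = bytes.fromhex("d0f2bac52cd4b1b9a4d0d5c3fb")

# A UTF-8 terminal cannot interpret these bytes; each invalid sequence
# is rendered as U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the file was written in recovers the text.
as_gbk = raw.decode("gbk")

print(as_utf8)
print(as_gbk)
```

The ASCII comma survives in both decodings, which is why the commas and digits in the cat output stay readable while the Han characters do not.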
Look at the raw bytes. The tool is a matter of habit; I open the file in vim and run :%!xxd
00000000: d0f2 bac5 2cd4 b1b9 a4d0 d5c3 fb2c d4b1 ....,........,.. <= in this encoding each Han character takes 2 bytes,
00000010: b9a4 b9a4 bac5 2cc7 b0d2 c2b3 a42c baf3 ......,......,.. typical of Chinese text that file labels ISO-8859; it is really GBK/GB2312
00000020: d2c2 b3a4 2cd0 d8ce a72c d1fc cea7 2cbc ....,....,....,.
00000030: e7bf ed2c d0e4 b3a4 28d3 d229 2cb0 dace ...,....(..),...
00000040: a72c c9cf b1db cea7 0d0a 312c c0fd d7d3 .,........1,....
00000050: 2dba f3c6 dac7 ebc9 beb3 fd2c 2cb2 cebf -..........,,...
00000060: bcb7 b6ce a731 7e31 3530 2c34 357e 3132 .....1~150,45~12
00000070: 352c 3735 7e31 3830 2c36 307e 3137 302c 5,75~180,60~170,
00000080: 3330 7e36 352c 3135 7e39 302c 2c32 307e 30~65,15~90,,20~
00000090: 3730 0d0a 70..
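The two-bytes-per-character observation can be checked directly. This is a small sketch, assuming the first two header cells are "序号,员工姓名" (as the converted file shows): it encodes that text as GBK and compares the result with the start of the dump.

```python
header = "序号,员工姓名"  # first two header cells of the CSV

gbk_bytes = header.encode("gbk")

# 6 Han characters * 2 bytes + 1 ASCII comma * 1 byte = 13 bytes
print(len(gbk_bytes))
print(gbk_bytes.hex())

# Per-character view: every Han character maps to exactly two bytes,
# while the comma stays a single ASCII byte (2c).
for ch in header:
    print(ch, ch.encode("gbk").hex())
```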
Encodings can be converted
iconv -f gb2312 -t utf-8 345201317205065728.csv > u.csv
file u.csv
u.csv: UTF-8 Unicode text, with CRLF line terminators
cat u.csv
序号,员工姓名,员工工号,前衣长,后衣长,胸围,腰围,肩宽,袖长(右),摆围,上臂围
1,例子-后期请删除,,参考范围1~150,45~125,75~180,60~170,30~65,15~90,,20~70
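For readers without iconv at hand, the same conversion can be sketched in Python. This is only a rough equivalent of the iconv command above, not a replacement for it; the `transcode` helper and the `sample.csv` stand-in file are hypothetical names introduced for the demo.

```python
import os
import tempfile


def transcode(src_path, dst_path, src_enc="gb2312", dst_enc="utf-8"):
    """Re-encode a text file, roughly like `iconv -f src_enc -t dst_enc`."""
    with open(src_path, "rb") as src:
        raw = src.read()
    text = raw.decode(src_enc)  # raises UnicodeDecodeError on invalid bytes
    with open(dst_path, "wb") as dst:  # binary mode keeps CRLF untouched
        dst.write(text.encode(dst_enc))


# Demo on a small stand-in file: two header cells plus CRLF, in GB2312.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.csv")
dst_path = os.path.join(workdir, "u.csv")
with open(src_path, "wb") as f:
    f.write(bytes.fromhex("d0f2bac52cd4b1b9a4d0d5c3fb0d0a"))

transcode(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
print(converted.hex())
```

Reading and writing in binary mode is a deliberate choice: it keeps the CRLF line terminators exactly as they were, so only the character encoding changes.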
Raw bytes again. Note that this file has no BOM. The BOM is not important here; this is purely for illustration.
00000000: e5ba 8fe5 8fb7 2ce5 9198 e5b7 a5e5 a793 ......,......... <= three bytes per Han character, typical of UTF-8; note the recurring lead byte e5 (code points U+5xxx)
00000010: e590 8d2c e591 98e5 b7a5 e5b7 a5e5 8fb7 ...,............
00000020: 2ce5 898d e8a1 a3e9 95bf 2ce5 908e e8a1 ,.........,.....
00000030: a3e9 95bf 2ce8 83b8 e59b b42c e885 b0e5 ....,......,....
00000040: 9bb4 2ce8 82a9 e5ae bd2c e8a2 96e9 95bf ..,......,......
00000050: 28e5 8fb3 292c e691 86e5 9bb4 2ce4 b88a (...),......,...
00000060: e887 82e5 9bb4 0d0a 312c e4be 8be5 ad90 ........1,......
00000070: 2de5 908e e69c 9fe8 afb7 e588 a0e9 99a4 -...............
00000080: 2c2c e58f 82e8 8083 e88c 83e5 9bb4 317e ,,............1~
00000090: 3135 302c 3435 7e31 3235 2c37 357e 3138 150,45~125,75~18
000000a0: 302c 3630 7e31 3730 2c33 307e 3635 2c31 0,60~170,30~65,1
000000b0: 357e 3930 2c2c 3230 7e37 300d 0a 5~90,,20~70..
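Why three bytes, and why does e5 keep showing up? Code points from U+0800 to U+FFFF are laid out in UTF-8 as 1110xxxx 10xxxxxx 10xxxxxx, so any code point in U+5000..U+5FFF gets the lead byte e5. A minimal sketch that does the bit packing by hand (the helper name `utf8_three_bytes` is my own, and it skips the surrogate check a real encoder would perform):

```python
def utf8_three_bytes(code_point):
    """Hand-encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("not in the three-byte UTF-8 range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


for ch in "序号":
    manual = utf8_three_bytes(ord(ch))
    # Compare the hand-built bytes with Python's own encoder.
    print(ch, hex(ord(ch)), manual.hex(), manual == ch.encode("utf-8"))
```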
Transcoding in JS with FileReader
The transcoding can also happen in the browser. Note that this time I deliberately added a BOM to the output file.
(ns show.core.async
  (:require [ajax.core :as ajax]
            [ajax.protocols :refer [-body]]))

(defn container
  "The klipse container element."
  []
  js/klipse-container)

(defn create-a-add-click!
  "Create a new <a> element, append it to the current klipse container, and click it to trigger the download."
  [msg]
  (let [el (container)
        a  (.createElement js/document "a")]
    ;; %EF%BB%BF is the percent-encoded UTF-8 BOM
    (.setAttribute a "href" (str "data:text/plain;charset=utf-8,%EF%BB%BF" (js/encodeURIComponent msg)))
    (.setAttribute a "download" "my.csv")
    (.appendChild el a)
    (.click a)))

(def reader
  "A FileReader, which lets us name the encoding used to read a Blob as text."
  (js/FileReader.))

(set! (.. reader -onload)
      (fn [e]
        (prn (.-result reader))
        (create-a-add-click! (.-result reader))))

(ajax/GET "https://aidingshan-qiniu.3vyd.com/store-message/345201317205065728.csv"
          {:response-format {:type :blob
                             :read -body}
           :handler (fn [body]
                      ;; decode the GB2312 bytes of the Blob into a JS string
                      (.readAsText reader body "GB2312"))})
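The step that actually produces UTF-8 is easy to miss: readAsText turns the GB2312 bytes into a string, and encodeURIComponent then writes that string out as percent-encoded UTF-8, with the BOM prepended by hand. The sketch below mimics the href construction in Python so the resulting bytes can be inspected outside a browser. `build_data_uri` is a hypothetical helper, and `quote` with this `safe` set is only an approximation of encodeURIComponent.

```python
from urllib.parse import quote, unquote_to_bytes

BOM_PERCENT = "%EF%BB%BF"  # UTF-8 BOM, percent-encoded as in the post


def build_data_uri(text):
    """Build the same kind of data: URI that the <a href> above receives."""
    # quote() emits UTF-8 percent escapes; the safe set approximates the
    # characters that JavaScript's encodeURIComponent leaves unescaped.
    escaped = quote(text, safe="-_.!~*'()")
    return "data:text/plain;charset=utf-8," + BOM_PERCENT + escaped


uri = build_data_uri("序号,员工姓名\r\n")

# Everything after the first comma is the payload the browser saves.
payload = uri.split(",", 1)[1]
downloaded = unquote_to_bytes(payload)

print(uri)
print(downloaded[:3].hex())
```

The first three bytes of the decoded payload are the BOM, and the rest is the text in UTF-8, which is what the downloaded file should contain.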
This time the file has a BOM
file u-with-bom.csv
u-with-bom.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators
Verify the bytes
00000000: efbb bfe5 ba8f e58f b72c e591 98e5 b7a5 .........,...... <= note the efbbbf prepended at the start
00000010: e5a7 93e5 908d 2ce5 9198 e5b7 a5e5 b7a5 ......,.........
00000020: e58f b72c e589 8de8 a1a3 e995 bf2c e590 ...,.........,..
00000030: 8ee8 a1a3 e995 bf2c e883 b8e5 9bb4 2ce8 .......,......,.
00000040: 85b0 e59b b42c e882 a9e5 aebd 2ce8 a296 .....,......,...
00000050: e995 bf28 e58f b329 2ce6 9186 e59b b42c ...(...),......,
00000060: e4b8 8ae8 8782 e59b b40d 0a31 2ce4 be8b ...........1,...
00000070: e5ad 902d e590 8ee6 9c9f e8af b7e5 88a0 ...-............
00000080: e999 a42c 2ce5 8f82 e880 83e8 8c83 e59b ...,,...........
00000090: b431 7e31 3530 2c34 357e 3132 352c 3735 .1~150,45~125,75
000000a0: 7e31 3830 2c36 307e 3137 302c 3330 7e36 ~180,60~170,30~6
000000b0: 352c 3135 7e39 302c 2c32 307e 3730 0d0a 5,15~90,,20~70..
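The same check can be done without reading a hex dump. A minimal sketch, using only the first nine bytes of u-with-bom.csv as input (the variable names are my own):

```python
import codecs

# First 9 bytes of u-with-bom.csv: the BOM followed by "序号" in UTF-8.
head = bytes.fromhex("efbbbfe5ba8fe58fb7")

# codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
has_bom = head.startswith(codecs.BOM_UTF8)

# "utf-8-sig" strips a leading BOM; plain "utf-8" keeps it as U+FEFF.
without_bom = head.decode("utf-8-sig")
with_bom = head.decode("utf-8")

print(has_bom, without_bom, repr(with_bom))
```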
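If our question is "how do I convert a GB2312 CSV file to UTF-8 and download it", then it is solved. But... is that really our question?
Question
If the CSV file were UTF-8 encoded in the first place, we would not need to take this long detour, would we?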