Monthly Archives: 五月 2019

Hello World

python读取csv/json/xml

python(3.x)读取csv:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import csv
 
filename = 'xxx.csv'
fi = [] 
ls = []
 
with open(filename, 'r', encoding='gbk') as csvfile:
    reader = csv.reader(csvfile)
    fi = next(reader)
    for lr in reader:
        ls.append(lr)
 
for lr in ls[:5]:
    print(lr)

当然,使用专业类库,会更简单:

1
2
3
4
5
6
7
import pandas as pd
 
filename = 'xxx.csv'
 
data = pd.read_csv(filename, encoding='gbk')
 
print(data.head(5))

读取json:

1
2
3
4
5
6
7
8
import json
 
filename = 'xxx.json'
 
with open(filename) as f:
    jsondata = json.load(f)
 
print(jsondata)

当然,pandas里也有更高效的读取json的read_json(),不过用它读取多维不定格式的json时,会报错“ValueError: arrays must all be same length”,这里就不再演示了

读取xml,并转变为json:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import json
import xml.etree.ElementTree as ET
import xmltodict
 
filename = 'xxx/s1.xml'
 
tree = ET.parse(filename)
xmldata = tree.getroot()
 
xmlstr = ET.tostring(xmldata, encoding='utf8', method='xml')
xmldict = xmltodict.parse(xmlstr, encoding='utf8')
 
datadict = dict(xmldict)
 
print(datadict)
 
filename2 = 'xxxx/s1.json'
with open(filename2, 'w+') as jsonfile:
    json.dump(datadict, jsonfile, indent=4, sort_keys=True)

注:
1. xmltodict可能需要安装:pip install xmltodict
2. xmltodict.parse对xml格式要求较严格,如果某节点字符串开头第一个字符是数值,createElement时可能会报错