I get CSV data like this when I make an HTTP request for a CSV file. The string is badly malformed:
response = '"Subject";"Start Date";"Start Time";"End Date";"End Time";"All day event";"Description""Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""'
I want to convert it into a list of dictionaries:
[{"Subject": "Play football", "Start Date": "16/11/2009", "Start Time": "10:00 PM", "End Date": "16/11/2009", "End Time": "11:00 PM", "All day event": false, "Description": ""},
{"Subject": "Watch 2012", "Start Date": "20/11/2009", "Start Time": "07:00 PM", "End Date": "20/11/2009", "End Time": "08:00 PM", "All day event": false, "Description": ""}]
I tried to solve this with Python's csv module, but it did not work:
>>> import csv
>>> from cStringIO import StringIO
>>> str_obj = StringIO(response)
>>> reader = csv.reader(str_obj, delimiter=';')
>>> [x for x in reader]
[['Subject',
'Start Date',
'Start Time',
'End Date',
'End Time',
'All day event',
'Description"Play football',
'16/11/2009',
'10:00 PM',
'16/11/2009',
'11:00 PM',
'false',
'"Watch 2012',
'20/11/2009',
'07:00 PM',
'20/11/2009',
'08:00 PM',
'false',
'']]
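That is the result I got: everything ends up in a single row, and the fields on either side of each row boundary are merged.
Any help would be greatly appreciated. Thanks in advance.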
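The merged fields in that output ('Description"Play football' and '"Watch 2012') are consistent with the csv module's default doublequote handling, where two consecutive quote characters inside a quoted field are read as one literal quote rather than as the end of one field and the start of the next. A minimal sketch of that behaviour on a made-up three-field string (the sample value is hypothetical, not taken from the response, and the snippet uses Python 3's io.StringIO):

```python
import csv
from io import StringIO

# Hypothetical sample: the "" in the middle is meant as a field boundary,
# but csv treats it as an escaped quote inside one quoted field.
sample = '"a";"b""c";"d"'

rows = list(csv.reader(StringIO(sample), delimiter=';'))
print(rows)
```

Because the response has no line breaks and no delimiter between rows, csv.reader has nothing to split the rows on.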
Here is a pyparsing solution:
from pyparsing import QuotedString, Group, delimitedList, OneOrMore
# a row of headings or data is a list of quoted strings, delimited by ';'s
qs = QuotedString('"')
datarow = Group(delimitedList(qs, ';'))
# an entire data set is a single data row containing the headings, followed by
# one or more data rows containing the data
dataset_parser = datarow("headings") + OneOrMore(datarow)("rows")
# parse the returned response
data = dataset_parser.parseString(response)
# create dict by zipping headings with each row's data values
datadict = [dict(zip(data.headings, row)) for row in data.rows]
print(datadict)
Prints:
[{'End Date': '16/11/2009', 'Description': '', 'All day event': 'false',
'Start Time': '10:00 PM', 'End Time': '11:00 PM', 'Start Date': '16/11/2009',
'Subject': 'Play football'},
{'End Date': '20/11/2009', 'Description': '', 'All day event': 'false',
'Start Time': '07:00 PM', 'End Time': '08:00 PM', 'Start Date': '20/11/2009',
'Subject': 'Watch 2012'}]
If a quoted string contains an embedded semicolon, this will handle that case as well.
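If installing pyparsing is not an option, the same idea can be sketched with only the standard library. This is a rough alternative rather than the approach above: it assumes every field is double-quoted, that no field contains a literal double quote, and that a row ends wherever a closing quote is not followed by a semicolon. The names FIELD and split_rows are made up for this example.

```python
import re

response = ('"Subject";"Start Date";"Start Time";"End Date";"End Time";'
            '"All day event";"Description"'
            '"Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";""'
            '"Watch 2012";"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""')

# One quoted field, optionally followed by the ';' delimiter.
# Assumption: fields never contain a double quote themselves.
FIELD = re.compile(r'"([^"]*)"(;?)')

def split_rows(text):
    """Split the text into rows of field values.

    A row ends at any field that is not followed by ';'.
    """
    rows, current = [], []
    for value, sep in FIELD.findall(text):
        current.append(value)
        if not sep:
            rows.append(current)
            current = []
    return rows

rows = split_rows(response)
# The first row holds the headings; zip it with each following data row.
records = [dict(zip(rows[0], row)) for row in rows[1:]]
print(records)
```

As with the pyparsing version, a semicolon inside a quoted field stays part of that field, because the pattern only looks for the delimiter after the closing quote.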
