I have a long list of integers that I want to turn into an MD5 hash. What is the fastest way to do this? I have tried three options, all similar. I just want to know whether I'm missing an obviously faster method.

    import random
    import hashlib
    import cPickle as pickle

    r = [random.randrange(1, 1000) for _ in range(0, 1000000)]

    def method1(r):
        p = pickle.dumps(r, -1)
        return hashlib.md5(p).hexdigest()

    def method2(r):
        p = str(r)
        return hashlib.md5(p).hexdigest()

    def method3(r):
        p = ','.join(map(str, r))
        return hashlib.md5(p).hexdigest()

Then timing them in IPython:

    timeit method1(r)
    timeit method2(r)
    timeit method3(r)

Gives me this:

    In [8]: timeit method1(r)
    10 loops, best of 3: 68.7 ms per loop

    In [9]: timeit method2(r)
    10 loops, best of 3: 176 ms per loop

    In [10]: timeit method3(r)
    1 loops, best of 3: 270 ms per loop

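So option 1 is the best I've got. But I have to do this a lot, and it is currently the rate-determining step in my code.
Using Python 2.7, are there any tips or tricks for getting a unique hash from a large list faster than the methods shown here?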
You might find this useful. It uses my own custom benchmarking framework (based on timeit) to gather and print the results. Since the variations in speed are primarily due to the need to convert the list r into something that hashlib.md5() can work with, I've updated the suite of test cases to show how storing the values in an array.array instead, as @DSM suggested in a comment, would dramatically speed things up. Note that since the integers in the list are all relatively small, I've stored them in an array of short (2-byte) values.

    from __future__ import print_function
    import sys
    import timeit

    setup = """
    import array
    import random
    import hashlib
    import marshal
    import cPickle as pickle
    import struct

    r = [random.randrange(1, 1000) for _ in range(0, 1000000)]
    ra = array.array('h', r)  # create an array of shorts equivalent

    def method1(r):
        p = pickle.dumps(r, -1)
        return hashlib.md5(p).hexdigest()

    def method2(r):
        p = str(r)
        return hashlib.md5(p).hexdigest()

    def method3(r):
        p = ','.join(map(str, r))
        return hashlib.md5(p).hexdigest()

    def method4(r):
        fmt = '%dh' % len(r)
        buf = struct.pack(fmt, *r)
        return hashlib.md5(buf).hexdigest()

    def method5(r):
        a = array.array('h', r)
        return hashlib.md5(a).hexdigest()

    def method6(r):
        m = marshal.dumps(r)
        return hashlib.md5(m).hexdigest()

    # using pre-built array...
    def pb_method1(ra):
        p = pickle.dumps(ra, -1)
        return hashlib.md5(p).hexdigest()

    def pb_method2(ra):
        p = str(ra)
        return hashlib.md5(p).hexdigest()

    def pb_method3(ra):
        p = ','.join(map(str, ra))
        return hashlib.md5(p).hexdigest()

    def pb_method4(ra):
        fmt = '%dh' % len(ra)
        buf = struct.pack(fmt, *ra)
        return hashlib.md5(buf).hexdigest()

    def pb_method5(ra):
        return hashlib.md5(ra).hexdigest()

    def pb_method6(ra):
        m = marshal.dumps(ra)
        return hashlib.md5(m).hexdigest()
    """

    statements = {
        "pickle.dumps(r, -1)": """
            method1(r)
        """,
        "str(r)": """
            method2(r)
        """,
        "','.join(map(str, r))": """
            method3(r)
        """,
        "struct.pack(fmt, *r)": """
            method4(r)
        """,
        "array.array('h', r)": """
            method5(r)
        """,
        "marshal.dumps(r)": """
            method6(r)
        """,
        # versions using pre-built array...
        "pickle.dumps(ra, -1)": """
            pb_method1(ra)
        """,
        "str(ra)": """
            pb_method2(ra)
        """,
        "','.join(map(str, ra))": """
            pb_method3(ra)
        """,
        "struct.pack(fmt, *ra)": """
            pb_method4(ra)
        """,
        "ra (pre-built)": """
            pb_method5(ra)
        """,
        "marshal.dumps(ra)": """
            pb_method6(ra)
        """,
    }

    N = 10
    R = 3

    timings = [(
        idea,
        min(timeit.repeat(statements[idea], setup=setup, repeat=R, number=N)),
        ) for idea in statements]

    longest = max(len(t[0]) for t in timings)  # length of longest name

    print('fastest to slowest timings (Python {}.{}.{})\n'.format(*sys.version_info[:3]),
          '  ({:,d} calls, best of {:d})\n'.format(N, R))

    ranked = sorted(timings, key=lambda t: t[1])  # sort by speed (fastest first)

    for timing in ranked:
        print("{:>{width}} : {:.6f} secs, rel speed {rel:>8.6f}x".format(
              timing[0], timing[1], rel=timing[1]/ranked[0][1], width=longest))

Results:

    fastest to slowest timings (Python 2.7.6)
      (10 calls, best of 3)

            ra (pre-built) : 0.037906 secs, rel speed 1.000000x
         marshal.dumps(ra) : 0.177953 secs, rel speed 4.694626x
          marshal.dumps(r) : 0.695606 secs, rel speed 18.350932x
       pickle.dumps(r, -1) : 1.266096 secs, rel speed 33.401179x
       array.array('h', r) : 1.287884 secs, rel speed 33.975950x
      pickle.dumps(ra, -1) : 1.955048 secs, rel speed 51.576558x
      struct.pack(fmt, *r) : 2.085602 secs, rel speed 55.020743x
     struct.pack(fmt, *ra) : 2.357887 secs, rel speed 62.203962x
                    str(r) : 2.918623 secs, rel speed 76.996860x
                   str(ra) : 3.686666 secs, rel speed 97.258777x
     ','.join(map(str, r)) : 4.701531 secs, rel speed 124.032173x
    ','.join(map(str, ra)) : 4.968734 secs, rel speed 131.081303x
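
The fastest entry, hashing a pre-built array directly, can be sketched on its own as below. This is only a minimal illustration, not part of the benchmark: the helper name `md5_of_small_ints` is made up here, and it assumes every value fits in a signed 16-bit short, as the values in the test data do. Two things to keep in mind: the array is hashed in the machine's native byte order, so digests may differ between little-endian and big-endian platforms, and the digest will not match one computed from `pickle.dumps()` or `str()` of the same list, since each approach hashes different bytes.

```python
import array
import hashlib
import struct


def md5_of_small_ints(values):
    """Return the MD5 hex digest of ints that fit in a signed 16-bit short.

    Hypothetical helper for illustration; the name is not from the benchmark.
    """
    # array.array('h', ...) raises OverflowError if a value does not fit
    # in a signed short, so out-of-range data fails loudly instead of
    # being silently truncated.
    buf = array.array('h', values)
    # hashlib reads the array's raw bytes through the buffer protocol,
    # so no intermediate string has to be built.
    return hashlib.md5(buf).hexdigest()


digest = md5_of_small_ints([1, 2, 999])

# The array holds the same bytes as packing the values as native-order shorts.
assert digest == hashlib.md5(struct.pack('3h', 1, 2, 999)).hexdigest()
```

If the data can be kept in an `array.array` from the start, the conversion cost disappears and only the `hashlib.md5(buf)` call remains, which is what the "ra (pre-built)" row measures. For larger values a wider typecode such as `'i'` or `'l'` would be needed, at the cost of more bytes to hash.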