The reason for this update: at the time I asked this question I wasn't aware that I had discovered something about how Python 3 works "under the hood".

The conclusion from all that follows below is:

If you write your own Python 3 code for an iterator and care about speed of execution, you should write it as a generator function and not as an iterator class.

Below a minimalistic code example demonstrating that the same algorithm (here: a self-made Python range()) expressed as a generator function runs much faster than when expressed as an iterator class:
```python
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

N = 10000000
print(" Size of created list N = {} elements (ints 1 to N)".format(N))

from time import time as t
from customRange import gnrtYieldRange as cthnYieldRange
from customRange import cintYieldRange
from customRange import iterClassRange as cthnClassRange
from customRange import cdefClassRange

iterPythnRangeObj =      range(1, N+1)
gnrtYieldRangeObj = gnrtYieldRange(1, N)
cthnYieldRangeObj = cthnYieldRange(1, N)
cintYieldRangeObj = cintYieldRange(1, N)
iterClassRangeObj = iterClassRange(1, N)
cthnClassRangeObj = cthnClassRange(1, N)
cdefClassRangeObj = cdefClassRange(1, N)

sEXECs = [
    "liPR = list(iterPythnRangeObj)",
    "lgYR = list(gnrtYieldRangeObj)",
    "lcYR = list(cthnYieldRangeObj)",
    "liGR = list(cintYieldRangeObj)",
    "liCR = list(iterClassRangeObj)",
    "lcCR = list(cthnClassRangeObj)",
    "ldCR = list(cdefClassRangeObj)",
]

sCOMMENTs = [
    "Python3 own range(1, N+1) used here as reference for timings ",
    "self-made range generator function using yield (run as it is) ",
    "self-made range (with yield) run from module created by Cython",
    "Cython-optimized self-made range (using yield) run from module",
    "self-made range as iterator class using __next__() and return ",
    "self-made range (using __next__) from module created by Cython",
    "Cython-optimized self-made range (using __next__) from module ",
]

for idx, sEXEC in enumerate(sEXECs):
    s = t()
    exec(sEXEC)
    e = t()
    print("{} takes: {:3.1f} sec.".format(sCOMMENTs[idx], e-s))

print("All created lists are equal:", all([liPR == lgYR, lgYR == lcYR, lcYR == liGR,
                                           liGR == liCR, liCR == lcCR, lcCR == ldCR]))
print("Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'")
```
Put into a file and run, the code above prints to stdout:
```
>python3.6 -u "gnrtFunction-fasterThan-iterClass_runMe.py"
 Size of created list N = 10000000 elements (ints 1 to N)
Python3 own range(1, N+1) used here as reference for timings  takes: 0.2 sec.
self-made range generator function using yield (run as it is)  takes: 1.1 sec.
self-made range (with yield) run from module created by Cython takes: 0.5 sec.
Cython-optimized self-made range (using yield) run from module takes: 0.3 sec.
self-made range as iterator class using __next__() and return  takes: 3.9 sec.
self-made range (using __next__) from module created by Cython takes: 3.3 sec.
Cython-optimized self-made range (using __next__) from module  takes: 0.2 sec.
All created lists are equal: True
Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'
>Exit code: 0
```
From the timings above you can see that the generator-function variant of the self-made range() iterator runs considerably faster than the iterator-class variant, and that this behavior propagates down into the C code created by Cython, as long as no Cython-specific optimization is involved.
If you are curious why, you can read through the provided answer or play around a bit with the provided code yourself.

Below the code pieces necessary to run the code above:

customRange.pyx – the file Cython creates the customRange module from:
```python
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

def cintYieldRange(int startWith, int endAt, int step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

cdef class cdefClassRange:
    cdef int startWith
    cdef int endAt
    cdef int step
    def __init__(self, int startWith, int endAt, int step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration
```
and the setup file customRange-setup.py used to create the customRange Python module:
```python
import sys
sys.argv += ['build_ext', '--inplace']
from distutils.core import setup
from Cython.Build import cythonize
setup(
  name = 'customRange',
  ext_modules = cythonize("customRange.pyx"),
)
```
Now some further information making it easier to understand the provided answer:
At the time I asked this question I was busy with a quite complex algorithm for generating unique combinations out of a non-unique list, available in the form of a generator function using yield. My goal was to create a Python module written in C out of this algorithm to make it run faster. For this purpose I rewrote the generator function, which used yield, into an iterator class using __next__() and return. Comparing the speed of the two variants of the algorithm, I was surprised that the iterator class was two times slower than the generator function, and I had (wrongly) assumed it had something to do with the way I had rewritten the algorithm (you need to know this if you want to better understand what the answers here are about).
Originally I therefore asked how to make the iterator-class version run at the same speed as the generator function, and where the speed difference comes from.
Below some more information about the history of the question:

In the Python script code provided below, the exact same algorithm for creating unique combinations from a list of non-unique elements was implemented both as a Python function using yield and as a class using __next__. The code is ready to run after copy/paste, so you can see for yourself what I am talking about.

The same phenomenon observed for pure Python code propagates into the C code of a Python extension module created from the script code by Cython, so it is not limited to Python-level code: it doesn't vanish at the C-code level.
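To make precise what "unique combinations" means here: for a small input the expected result can be reproduced (inefficiently) with itertools alone. This brute-force sketch is only for illustration and is not the algorithm discussed in this post:

```python
from itertools import combinations

items = [1, 1, 2]
all_combos = list(combinations(items, 2))   # positional combinations may repeat
unique_combos = sorted(set(all_combos))     # deduplicated result

print(all_combos)      # [(1, 1), (1, 2), (1, 2)]
print(unique_combos)   # [(1, 1), (1, 2)]
```

The point of the algorithm below is to produce `unique_combos` directly, without first materializing and deduplicating all positional combinations.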
The questions were:
Where does the huge difference in speed of execution come from?
Is there anything that can be done to get both code variants to run at comparable speed? Did something go wrong in the class/__next__ implementation compared with the function/yield variant? Both are, to my knowledge, exactly the same algorithm …
Here the code (adjusting the value in the highlighted line changes the level of uniqueness of the elements in the list the combinations are generated from, which has a huge impact on the running time):
```python
def uniqCmboYieldIter(lstItems, lenCmbo):
    dctCounter = {}
    lenLstItems = len(lstItems)
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    idxIntoCmbo, idxIntoUniqs = 0, 0

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break


class uniqCmboClassIter:
    # ----------------------------------------------------------------------------------------------
    def __iter__(self):
        return self

    # ----------------------------------------------------------------------------------------------
    def __init__(self, lstItems, lenCmbo):
        dctCounter = {}
        lenLstItems = len(lstItems)
        for idx in range(lenLstItems):
            item = lstItems[idx]
            if item in dctCounter.keys():
                dctCounter[item] += 1
            else:
                dctCounter[item] = 1
            #:if
        #:for
        self.lstUniqs = sorted(dctCounter.keys())
        self.lenUniqs = len(self.lstUniqs)
        self.lstCntRpts = [dctCounter[item] for item in self.lstUniqs]
        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = [0] * self.lenUniqs
        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = None
        self.y = None
        return

    # ----------------------------------------------------------------------------------------------
    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo

# ============================================================================================================================================
lstSize = 48  # 48
uniqLevel = 7  # <-- adjust this value to change the level of uniqueness of the list elements
aList = []
from random import randint
for _ in range(lstSize):
    aList.append( ( randint(1, uniqLevel), randint(1, uniqLevel) ) )
lenCmbo = 6
percUnique = 100.0 - 100.0*(lstSize-len(set(aList)))/lstSize
print("======================== lenCmbo:", lenCmbo,
      "   sizeOfList:", len(aList),
      "   noOfUniqueInList", len(set(aList)),
      "   percUnique", int(percUnique) )

import time
from itertools import combinations  # itertools.combinations
# ---
#   def uniqCmboYieldIter(lstItems, lenCmbo):
#   class uniqCmboClassIter:  def __init__(self, lstItems, lenCmbo):
# ---

start_time = time.time()
print("Combos:%9i"%len(list(combinations(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list( combinations(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")

start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboYieldIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")

start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboClassIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboClassIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
```
and the timings on my box:
```
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6    sizeOfList: 48    noOfUniqueInList 32    percUnique 66
Combos: 12271512   print(len(list( combinations(aList, lenCmbo)))):   2.04635 seconds.
Combos:  1296058   print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):   3.25447 seconds.
Combos:  1296058   print(len(list(uniqCmboClassIter(aList, lenCmbo)))):   5.97371 seconds.
>Exit code: 0

>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6    sizeOfList: 48    noOfUniqueInList 22    percUnique 45
Combos: 12271512   print(len(list( combinations(aList, lenCmbo)))):   2.05199 seconds.
Combos:   191072   print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):   0.47343 seconds.
Combos:   191072   print(len(list(uniqCmboClassIter(aList, lenCmbo)))):   0.89860 seconds.
>Exit code: 0

>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6    sizeOfList: 48    noOfUniqueInList 43    percUnique 89
Combos: 12271512   print(len(list( combinations(aList, lenCmbo)))):   2.17285 seconds.
Combos:  6560701   print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):  16.72573 seconds.
Combos:  6560701   print(len(list(uniqCmboClassIter(aList, lenCmbo)))):  31.17714 seconds.
>Exit code: 0
```
UPDATE (status 2017-05-07):
At the time of asking the question and offering a bounty it was not known to me that there is a way to easily create C code of an extension module for an iterator object out of Python script code using Cython, and that such C code can also be created from an iterator function using `yield`.
Considering that the generated faster version of the C extension module is still not fast enough to compete with itertools.combinations, it doesn't make much sense to dive deep into finding out what exactly causes the slowdown when using an iterator class compared to an iterator function, and how to overcome it. It makes much more sense to find a way to speed up the faster version using Cython, especially because I am a total novice in writing Python extension modules: after hours of focused work spent on adapting self-modified C code of itertools.combinations, I failed to produce working code because of a Segmentation Fault error whose cause I was unable to grasp.

Currently I think that there is still room to speed up the Cython code I use, without needing to go down the much harder path of writing the C code myself.
Below the Cython code that runs fine, followed by speed-optimized Cython code which somehow (I currently can't see why) changes the way the algorithm works and therefore produces wrong results. The idea behind the Cython optimization was to use Python/Cython arrays in the Cython code instead of Python lists. Any hints on how to obtain a faster-running Python extension module out of the used algorithm in a way that is "safe" for a novice are welcome.
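The idea behind that optimization, in plain Python terms: array.array stores machine-level C ints in a homogeneous buffer instead of a list of boxed Python int objects. A minimal illustration:

```python
from array import array

# typed, homogeneous C-int storage instead of a list of Python int objects
multiplicities = array('i', [0] * 5)
multiplicities[2] = 3

print(multiplicities.tolist())   # [0, 0, 3, 0, 0]
print(multiplicities.itemsize)   # bytes per element (platform-dependent, usually 4 for 'i')
```

In Cython, such arrays (or memoryviews over them) can be indexed without touching Python objects at all, which is where the speedup is expected to come from.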
```python
def subbags_by_loops_with_dict_counter(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo = 0
    cdef int idxIntoUniqs = 0
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y
    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
```
Below the optimized Cython code which produces wrong results:
```python
# imports needed for array.array / array.resize (not present in the pasted snippet)
from cpython cimport array
import array

def subbags_loops_dict_cython_optimized(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cdef array.array cmboAsIdxUniqs = array.array('i', [])
    array.resize(cmboAsIdxUniqs, lenCmbo)
    # cmboAsIdxUniqs = [None] * lenCmbo
    cdef array.array multiplicities = array.array('i', [])
    array.resize(multiplicities, lenUniqs)
    # multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo = 0
    cdef int maxIdxCmbo
    cdef int curIdxCmbo
    cdef int idxIntoUniqs = 0
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        maxIdxCmbo = idxIntoCmbo + count
        curIdxCmbo = idxIntoCmbo
        while curIdxCmbo < maxIdxCmbo:
            cmboAsIdxUniqs[curIdxCmbo] = idxIntoUniqs
            curIdxCmbo += 1
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1
    # print("multiplicities:", multiplicities)
    # print("cmboAsIdxUniqs:", cmboAsIdxUniqs)

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y
    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
```

I gained some experience rewriting some of the recipes from the itertools documentation as C extensions, so I think I may have some insights that can help you.
Generators vs. iterator classes

When you write pure Python code it's a trade-off between speed (generator) and features (iterator class).

yield-functions (known as generators) are for speed, and usually they can be written without bothering about internal state. So they are less work to write, and they are fast because Python just manages all the "state".

The reason generators are faster (or at least not slower) is mostly because:
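The "features" side of that trade-off can be illustrated: an iterator class can expose its state and extra methods, which a plain generator cannot. This is a made-up `peek()` example, not something from the original code:

```python
class PeekableCount:
    """Iterator class with a feature a generator can't easily offer."""
    def __init__(self, start=0):
        self.i = start          # the state is a plain, inspectable attribute
    def __iter__(self):
        return self
    def __next__(self):
        self.i += 1
        return self.i
    def peek(self):
        # extra feature: look at the next value without consuming it
        return self.i + 1

it = PeekableCount()
print(it.peek())   # 1 (nothing consumed yet)
print(next(it))    # 1
print(next(it))    # 2
```

With a generator you would have to wrap it in another object to get such behavior, because a generator's locals are hidden inside its frame.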
>They implement the __next__-slot directly (typically tp_iternext), besides the __next__-method. In that case Python doesn't have to look up the __next__ method – that's essentially what makes them faster in the following example:
```python
from itertools import islice

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

%timeit list(islice(test(), 1000))
# 173 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 499 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
So it's almost 3 times faster, just because the generator directly fills the __next__-slot.
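The slot behavior can be made visible: next() goes through the type's tp_iternext slot, so even shadowing __next__ on the instance is ignored. A small demonstration relying on documented CPython special-method lookup:

```python
class Count:
    def __init__(self):
        self.i = 0
    def __iter__(self):
        return self
    def __next__(self):
        self.i += 1
        return self.i

c = Count()
c.__next__ = lambda: 999   # shadow on the instance; next() does not see this
print(next(c))             # 1 -- the type's __next__ (via the slot) wins
```

This implicit-lookup-bypasses-the-instance rule is exactly why a generator, which fills the slot directly, skips a method lookup that an ordinary `obj.__next__()` call would need.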
>A yield-function and a class both have state, but a yield-function saves and loads its state much faster than a class with attribute access:
```python
def test():
    i = 0
    while True:
        yield i
        i += 1

class Test(object):
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current

%timeit list(islice(test(), 1000))
# 296 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test(), 1000))
# 1.22 ms ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
This time the class is already 4 times slower (compared to almost 3 times when no state was involved). This is a cumulative effect: the more "state" you have, the slower the class variant becomes.

So much for the yield vs. class approaches. Note that the actual timings depend on the kind of operations performed. For example, if the actual code that runs when next is called is slow (i.e. time.sleep(1)), there's almost no difference between a generator and a class!
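The "state" a generator carries is simply its local variables, which CPython freezes and restores automatically in the generator's frame. A minimal illustration:

```python
def fib():
    a, b = 0, 1           # these locals are the generator's entire state
    while True:
        yield a
        a, b = b, a + b   # saved automatically until the next next() call

g = fib()
print([next(g) for _ in range(6)])   # [0, 1, 1, 2, 3, 5]
```

The equivalent class would need `self.a`/`self.b` attribute loads and stores on every step, which is exactly the per-step overhead measured above.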
With Cython

If you want a fast Cython iterator class, it has to be a cdef class. Otherwise you don't get a really fast class. The reason is that only a cdef class creates an extension type that implements the tp_iternext field directly! I'll use IPython's %%cython to compile the code (so I don't have to include the setup):
```python
%%cython

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

cdef class Test_cdef(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1
```

```python
%timeit list(islice(test(), 1000))
# 113 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 407 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test_cdef(), 1000))
# 62.8 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The timings already show that the generator and the basic class are faster than the pure-Python equivalents, but their relative performance roughly stayed the same. However, the cdef class variant beats both of them, and that is mainly because the tp_iternext slot is used instead of just implementing the __next__ method. (Inspect the Cython-generated C code if you don't trust me :) )

However, it's only 2 times faster than the Python generator, which isn't bad but it's not exactly overwhelming either. To get really amazing speedups, you need to find a way to express your program without Python objects (the fewer Python objects, the bigger the speedup). For example, if you use a dictionary to store each item and its multiplicity, you still store Python objects, and any lookup has to be done with Python dictionary methods – even if you can call them via C API functions instead of having to look up the actual methods:
```python
%%cython

cpdef cython_count(items):
    cdef dict res = dict()
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res
```

```python
import random

def count(items):
    res = {}
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

l = [random.randint(0, 100) for _ in range(10000)]

%timeit cython_count(l)
# 2.06 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count(l)
# 3.63 ms ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
There's one catch here: you aren't using collections.Counter, which has optimized C code (at least in Python 3) for this kind of operation:
```python
from collections import Counter

%timeit Counter(l)
# 1.17 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
A quick note here: don't use something in some_dict.keys(), because in Python 2 keys() is list-like and only implements O(n) containment tests, while something in some_dict is typically O(1) (both Pythons)! Dropping keys() makes things faster in both versions, but especially in Python 2:
```python
def count2(items):
    res = {}
    for item in items:
        if item in res.keys():   # with "keys()"
            res[item] += 1
        else:
            res[item] = 1
    return res

# Python3
l = [random.randint(0, 100) for _ in range(10000)]
%timeit count(l)
# 3.63 ms ± 29 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count2(l)
# 5.9 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Python2
l = [random.randint(0, 10000) for _ in range(10000)]
%timeit count(l)
# 100 loops, best of 3: 4.59 ms per loop
%timeit count2(l)
# 1 loop, best of 3: 2.65 s per loop  <--- WHOOPS!!!
```
This shows that you can only hope for a 3–4x speedup with Cython (and C extensions) while you keep using Python structures, and that even small mistakes like using ".keys()" can cost you far more in performance if used incorrectly.
Optimizing Cython

So what can you do if you want it faster? The answer is relatively easy: create your own data structure based on C types instead of Python types.

That means you have to think about the design:
>Which types do you want to support in your uniqComb**? Do you want just integers (the examples suggest so, but I suppose you want arbitrary Python objects)?

>Do you want introspection from Python (like the current state)? If you do, it makes sense to keep the multiplicities as a Python object, but if you don't care, you can save them as integer-like values instead of Python objects.

>Do you need the objects passed to the uniqComb** function to be sortable? You used sorted, but you could also use an OrderedDict and keep the keys in order of appearance instead of in sorted order.
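The last point can be sketched in plain Python (a hypothetical `count_in_order` helper, assuming order of first appearance is acceptable): keeping the keys in insertion order removes the requirement that the items support "<" comparisons at all:

```python
from collections import OrderedDict

def count_in_order(items):
    # keys keep their order of first appearance, so the items never
    # need to be comparable/sortable -- only hashable
    counts = OrderedDict()
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

print(list(count_in_order(['b', 'a', 'b', 'c']).items()))
# [('b', 2), ('a', 1), ('c', 1)]
```

Whether that ordering is acceptable depends on whether the caller relies on combinations coming out in sorted order.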
The answers to these questions (these are only the questions I immediately asked myself; there are probably many more!) can help you decide which structure you can use internally. For example, with Cython you can interface with C++, and you could use a map containing integer keys and integer values instead of a dictionary. It is sorted by default, so you wouldn't need to sort it manually, and you would operate on native integers instead of Python objects. But you give up the ability to process arbitrary Python objects in uniqComb, and you need to know how to operate with C++ types in Cython. It could be amazingly fast, though!

I won't go down that route because I assume you want to support arbitrary orderable Python types, so I stick with the Counter as a starting point, but I will save the multiplicities as integer array.arrays instead of as lists. Let's call it the "least invasive" optimization. It actually doesn't matter much for performance whether you use a list or an array for lstCntRpts and multiplicities, because they aren't the bottleneck – but it's a bit faster, saves a bit of memory, and, more importantly, it shows how you can include homogeneous arrays with cython:
```python
%%cython

from cpython.list cimport PyList_Size   # (most) C API functions can be used with cython!
from array import array
from collections import Counter

cdef class uniqCmboClassIter:
    cdef list lstUniqs
    cdef Py_ssize_t lenUniqs
    cdef int[:] lstCntRpts        # memoryview
    cdef Py_ssize_t lenCmbo
    cdef list cmboAsIdxUniqs
    cdef int[:] multiplicities    # memoryview
    cdef Py_ssize_t idxIntoCmbo
    cdef Py_ssize_t idxIntoUniqs
    cdef bint stopIteration
    cdef Py_ssize_t x
    cdef Py_ssize_t y

    def __init__(self, lstItems, lenCmbo):
        dctCounter = Counter(lstItems)

        self.lstUniqs = sorted(dctCounter)
        self.lenUniqs = PyList_Size(self.lstUniqs)
        self.lstCntRpts = array('i', [dctCounter[item] for item in self.lstUniqs])

        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = array('i', [0] * self.lenUniqs)
        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = 0
        self.y = 0
        return

    def __iter__(self):
        return self

    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
```
You actually didn't share the parameters you used for your timings, but I tried it with some of mine:
```python
from itertools import combinations
import random
import time

def create_values(maximum):
    vals = [random.randint(0, maximum) for _ in range(48)]
    print('length:             ', len(vals))
    print('sorted values:      ', sorted(vals))
    print('uniques:            ', len(set(vals)))
    print('uniques in percent: {:%}'.format(len(set(vals)) / len(vals)))
    return vals

class Timer(object):
    def __init__(self):
        pass
    def __enter__(self):
        self._time = time.time()
    def __exit__(self, *args, **kwargs):
        print(time.time() - self._time)

vals = create_values(maximum=50)  # and 22 and 75 and 120
n = 6

with Timer():
    list(combinations(vals, n))

with Timer():
    list(uniqCmboClassIter(vals, n))

with Timer():
    list(uniqCmboClassIterOriginal(vals, n))

with Timer():
    list(uniqCmboYieldIterOriginal(vals, n))
```
It definitely performs better than the original approaches; in fact, the type declarations alone make it several times faster. There is a lot more that could be optimized (disabling bounds checking, using Python C API function calls, using unsigned or smaller integers if you know the "maximum" and "minimum" of your multiplicities, …) – but the fact that it isn't much slower than itertools.combinations even for 80% unique items, and that it's much faster than any of the original implementations, is good enough for me.