在我正在处理的问题中有这样一部分代码,如下所示.定义部分只是为了向您展示数组的大小.下面我贴了矢量化版本 – 它慢了2倍.为什么会这样?我知道如果矢量化要求大的临时变量,我
通常,我可以做些什么(除了parfor,我已经使用过)来加速这段代码?
maxN = 100; levels = maxN+1; xElements = 101; umn = complex(zeros(levels, levels)); umn2 = umn; bessels = ones(xElements, xElements, levels); % 1.09 GB posMcontainer = ones(xElements, xElements, maxN); tic for j = 1 : xElements for i = 1 : xElements for n = 1 : 2 : maxN nn = n + 1; mm = 1; for m = 1 : 2 : n umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m); mm = mm + 1; end end end end toc % 0.520594 seconds tic for j = 1 : xElements for i = 1 : xElements for n = 1 : 2 : maxN nn = n + 1; m = 1:2:n; numOfEl = ceil(n/2); umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, m); end end end toc % 1.275926 seconds sum(sum(umn-umn2)) % veryfying, if all done right
最好的祝福,
亚历克斯
从探查器:
编辑:
在回复@Jason answer时,此替代方案需要相同的时间:
for n = 1:2:maxN nn(n) = n + 1; numOfEl(n) = ceil(n/2); end for j = 1 : xElements for i = 1 : xElements for n = 1 : 2 : maxN umn2(nn(n), 1:numOfEl(n)) = bessels(i, j, nn(n)) * posMcontainer(i, j, 1:2:n); end end end
EDIT2:
回复@EBH:
关键是要做到以下几点:
parfor i = 1 : xElements for j = 1 : xElements umn = complex(zeros(levels, levels)); % cleaning for n = 0:maxN mm = 1; for m = -n:2:n nn = n + 1; % for indexing if m < 0 umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m)); end if m > 0 umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m); end if m == 0 umn(nn, mm) = bessels(i, j, nn); end mm = mm + 1; % for indexing end % m end % n beta1 = sum(sum(Aj1.*umn)); betaSumSq1(i, j) = abs(beta1).^2; beta2 = sum(sum(Aj2.*umn)); betaSumSq2(i, j) = abs(beta2).^2; end % j end % i
我尽可能地加快了速度.你所写的只是最后的bessels和posMcontainer值,所以它不会产生相同的结果.在实际代码中,这两个容器不是用1填充,而是用一些预先计算的值填充.
编辑之后,我可以看到umn只是另一个计算的临时变量.它仍然可以主要是可矢量化的:betaSumSq1 = zeros(xElements); % preallocating betaSumSq2 = zeros(xElements); % preallocating % an index matrix to fetch the right values from negMcontainer and % posMcontainer: indmat = tril(repmat([0 1;1 0],ceil((maxN+1)/2),floor(levels/2))); indmat(end,:) = []; % an index matrix to fetch the values in correct order for umn: b_ind = repmat([1;0],ceil((maxN+1)/2),1); b_ind(end) = []; tempind = logical([fliplr(indmat) b_ind indmat+triu(ones(size(indmat)))]); % permute the arrays to prevent squeeze: PM = permute(posMcontainer,[3 1 2]); NM = permute(negMcontainer,[3 1 2]); B = permute(bessels,[3 1 2]); for k = 1 : maxN+1 % third dim for jj = 1 : xElements % columns b = B(:,jj,k); % get one vector of B % perform b*NM for every row of NM*indmat, than flip the result: neg = fliplr(bsxfun(@times,bsxfun(@times,indmat,NM(:,jj,k).'),b)); % perform b*PM for every row of PM*indmat: pos = bsxfun(@times,bsxfun(@times,indmat,PM(:,jj,k).'),b); temp = [neg mod(1:levels,2).'.*b pos].'; % concat neg and pos % assign them to the right place in umn: umn = reshape(temp(tempind.'),[levels levels]).'; beta1 = Aj1.*umn; betaSumSq1(jj,k) = abs(sum(beta1(:))).^2; beta2 = Aj2.*umn; betaSumSq2(jj,k) = abs(sum(beta2(:))).^2; end end
这样可以将运行时间从约95秒减少到不到3秒(两者都没有标准时间),因此几乎可以提高97%.