我有一个用Matlab编写的代码,它使用’intersect’来查找在两个大矩阵中相交的向量(和它们的索引).我发现’intersect’是我代码中最慢的一行(差别很大).不幸的是,到目前为止我找不到更快
例如,在我的电脑上运行以下代码大约需要5秒钟:
profile on for i = 1 : 500 a = rand(10000,5); b = rand(10000,5); [intersectVectors, ind_a, ind_b] = intersect(a,b,'rows'); end profile viewer
我想知道是否有更快的方法.注意,矩阵(a)和(b)有5列.对于两个矩阵,行数不必相同.
任何帮助都会很棒.
谢谢
您可以使用利用fast matrix multiplication in MATLAB
将这5列输入数组转换为一列的方法,将每列视为单个数字的重要“数字”.因此,您最终会得到一个只有列的数组,然后,您可以使用不带’行’的intersect或ismember,这必须大大加快代码的速度!
以下是作为功能代码的承诺实现,以方便使用 –
intersectrows_fast_v1.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b) %// Calculate equivalent one-column versions of input arrays mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//' acol1 = a*mult; bcol1 = b*mult; %// Use intersect without 'rows' option for a good speedup [~, ind_a, ind_b] = intersect(acol1,bcol1); intersectVectors = a(ind_a,:); return;
intersectrows_fast_v2.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b) %// Calculate equivalent one-column versions of input arrays mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//' acol1 = a*mult; bcol1 = b*mult; %// Use ismember to get indices of the common elements [match_a,idx_b] = ismember(acol1,bcol1); %// Now, with ismember, duplicate items are not taken care of automatically as %// are done with intersect. So, we need to find the duplicate items and %// remove those from the outputs of ismember [~,a_sorted_ind] = sort(acol1); a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed match_a(a_rm_ind)=0; intersectVectors = a(match_a,:); ind_a = find(match_a); ind_b = idx_b(match_a); return;
快速测试和结论
使用问题中列出的数据,运行时间是 –
-------------------------- With original approach Elapsed time is 3.885792 seconds. -------------------------- With Proposed approach - Version - I Elapsed time is 0.581123 seconds. -------------------------- With Proposed approach - Version - II Elapsed time is 0.963409 seconds.
结果似乎表明了支持版本的一个很大的优势 – 我提出的两种方法,比原始方法的速度提高了大约6.7倍!
此外,请注意,如果您不需要原始三个输出中的任何一个或两个与基于’行’的方法相交,那么可以进一步缩短所提出的方法以获得更好的运行时性能!