Modified unigram precision
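A translation is judged by how many of its words also appear in the reference sentences: the more words match, the more accurate the translation. In other words, we look at precision (the fraction of the words produced that are correct). This, however, runs into the following problem.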
Candidate: the the the the the the the
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
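The candidate above is obviously gibberish, yet every word in it appears in a reference, so its precision is 7/7 = 100%. That is clearly unreasonable, which motivates modified unigram precision.
The idea is:
"the" appears at most 2 times in any single reference (in Reference 1, ignoring case). So even though the candidate consists entirely of "the", only the first two occurrences count as matches and the rest are not counted. Hence modified unigram precision = 2/7.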
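The clipping rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the tokenizer (lowercasing and dropping punctuation) are assumptions made for this example, and the result is returned as a (clipped matches, candidate length) pair so the fraction stays visible.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase the text and keep only alphanumeric runs,
    # so "The" matches "the" and the trailing "." is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def modified_unigram_precision(candidate, references):
    """Return (clipped matches, number of candidate words)."""
    cand_counts = Counter(tokenize(candidate))
    # For each word, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(tokenize(ref)).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # Each candidate word is credited at most max_ref_counts[word] times.
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

references = ["The cat is on the mat.", "There is a cat on the mat."]
print(modified_unigram_precision("the the the the the the the", references))
# (2, 7)
```

Without the `min(...)` clipping the same call would report 7 matches out of 7, which is exactly the problem described above.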
Test example 1:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
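In Candidate 1: the most occurrences of "the" in any single reference is 4 (in Reference 2), so "the" in Candidate 1 can be counted at most 4 times. "the" occurs 3 times in Candidate 1, so all 3 are counted. Only "obeys" has no match in any reference. Summing up: It(1)+is(1)+a(1)+guide(1)+to(1)+action(1)+which(1)+ensures(1)+that(1)+the(1)+military(1)+always(1)+the(1)+commands(1)+of(1)+the(1)+party(1)=17
So modified unigram precision = 17 / (total number of words in Candidate 1, which is 18) = 17/18
In Candidate 2:
It(1)+is(1)+to(1)+the(1)+forever(1)+the(1)+that(1)+party(1)=8
So modified unigram precision = 8 / (total number of words in Candidate 2, which is 14) = 8/14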
For bigrams and higher-order n-grams, matching works like this:
w1 w2 w3: "w1 w2" counts as one bigram and "w2 w3" counts as another.
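The same clipped counting extends from single words to n-grams. The sketch below is one possible way to write it; `ngrams`, `modified_precision`, and the tokenizer (lowercase, punctuation dropped) are names and choices made for this illustration only. It reproduces the unigram counts worked out by hand for test example 1.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    # For w1 w2 w3 and n=2 this yields (w1, w2) and (w2, w3).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Return (clipped n-gram matches, number of candidate n-grams)."""
    cand_counts = Counter(ngrams(tokenize(candidate), n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(tokenize(ref), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

REFS = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CAND1 = "It is a guide to action which ensures that the military always obeys the commands of the party."
CAND2 = "It is to insure the troops forever hearing the activity guidebook that party direct."

print(modified_precision(CAND1, REFS, 1))  # (17, 18)
print(modified_precision(CAND2, REFS, 1))  # (8, 14)
```

Passing `n=2` to the same function scores bigrams, with the denominator becoming the number of bigrams in the candidate.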
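The above applies to a single sentence. For a combination of sentences, such as a whole document, the clipped counts and the word totals are each summed over all sentences before dividing:
Taking test example 1 as an illustration:
Modified unigram precision = (17+8)/(18+14) = 25/32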
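A short sketch of this pooling, under the same assumptions as before (hypothetical function names, a lowercase tokenizer that drops punctuation). Following the text, the two candidates of test example 1 are treated here as if they were two sentences of one document, purely to illustrate the arithmetic.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def clipped_and_total(candidate, references):
    """Clipped unigram matches and candidate length for one sentence."""
    cand_counts = Counter(tokenize(candidate))
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(tokenize(ref)).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

def corpus_modified_precision(pairs):
    """pairs: iterable of (candidate, references). Sums numerators and
    denominators separately instead of averaging per-sentence ratios."""
    clipped_sum = total_sum = 0
    for candidate, references in pairs:
        clipped, total = clipped_and_total(candidate, references)
        clipped_sum += clipped
        total_sum += total
    return clipped_sum, total_sum

refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
pairs = [
    ("It is a guide to action which ensures that the military always obeys the commands of the party.", refs),
    ("It is to insure the troops forever hearing the activity guidebook that party direct.", refs),
]
print(corpus_modified_precision(pairs))  # (25, 32)
```

Summing before dividing means long sentences weigh more than short ones, which differs from averaging 17/18 and 8/14.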
The biggest weakness of modified precision is that when the candidate sentence is very short, it can still score very high even if it is gibberish, as the following example shows:
Candidate: of the
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
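Here, modified unigram precision = 2/2 and modified bigram precision = 1/1.
To address this problem caused by very short candidates, one might consider bringing in recall as a counterbalance.
However, recall turns out to be inappropriate for the following example.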
Candidate 1: I always invariably perpetually do.
Candidate 2: I always do.
Reference 1: I always do.
Reference 2: I invariably do.
Reference 3: I perpetually do.
Candidate 1 contains more of the reference words than Candidate 2, so its recall is higher, yet Candidate 1 is clearly worse than Candidate 2.
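BLEU in detail:
Each candidate sentence has several references; pick the reference whose length is closest to the candidate's. Summing these reference lengths over the whole corpus gives r, and summing the candidate lengths gives c.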
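The two lengths can be computed as sketched below. The function names and tokenizer are assumptions for this example, and so is the tie-breaking rule: when two references are equally close in length, this sketch picks the shorter one, which the text above does not specify.

```python
import re

def tokenize(text):
    # Assumption: lowercase and keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def closest_ref_length(cand_len, ref_lens):
    # Smallest absolute difference wins; ties go to the shorter
    # reference (an assumption made for this sketch).
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len))

def corpus_lengths(pairs):
    """pairs: iterable of (candidate, references). Returns (c, r)."""
    c = r = 0
    for candidate, references in pairs:
        cand_len = len(tokenize(candidate))
        c += cand_len
        r += closest_ref_length(cand_len, [len(tokenize(ref)) for ref in references])
    return c, r

refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
print(corpus_lengths([("of the", refs)]))  # (2, 16)
```

For the two-word candidate "of the", c = 2 while r = 16, which is the kind of gap a length penalty can act on.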
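The score so far considers only precision. To handle the problem caused by short candidate sentences, BLEU introduces a penalty term, the brevity penalty (BP). The derivation is as follows: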
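For reference, the formulas can be written out as below. This is the standard definition from the original BLEU paper rather than something stated in this post, so the symbols are introduced here: p_n is the corpus-level modified n-gram precision, w_n are positive weights summing to 1 (commonly w_n = 1/N with N = 4), and c and r are the candidate and effective reference lengths defined above.

```latex
\mathrm{BP} =
\begin{cases}
1, & c > r \\
e^{\,1 - r/c}, & c \le r
\end{cases}

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right)

\log \mathrm{BLEU} = \min\!\left( 1 - \frac{r}{c},\; 0 \right) + \sum_{n=1}^{N} w_n \log p_n
```

With uniform weights, the exponential of the weighted sum of logs is the geometric mean of p_1, ..., p_N.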
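The reason for taking logarithms: it keeps the values from differing too widely when sparsity makes some of them very small, and it leaves monotonicity unchanged.
The reason for using the geometric mean: it gives a balanced overall measure of quantities with different characteristics. Here, the modified n-gram precisions of the different orders, each pooled over all sentences, are combined into a single score for how well the whole text is translated.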
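Putting the pieces together, the final score can be sketched as the brevity penalty times the geometric mean of the modified precisions, computed through logarithms. This follows the standard BLEU definition with uniform weights, which is an assumption here since the post does not spell out the weights; `brevity_penalty` and `bleu_from_stats` are hypothetical names, and the inputs are the (clipped, total) pairs and the lengths c and r computed separately.

```python
import math

def brevity_penalty(c, r):
    # 1 when the candidate text is longer than the effective reference
    # length, exp(1 - r/c) otherwise.
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1.0 - r / c)

def bleu_from_stats(precision_counts, c, r):
    """precision_counts: list of (clipped, total) pairs for n = 1..N.
    Assumption: uniform weights 1/N over the N orders supplied."""
    if not precision_counts:
        return 0.0
    # log(0) is undefined, so any zero precision gives a score of 0.
    if any(clipped == 0 or total == 0 for clipped, total in precision_counts):
        return 0.0
    log_mean = sum(math.log(clipped / total)
                   for clipped, total in precision_counts) / len(precision_counts)
    return brevity_penalty(c, r) * math.exp(log_mean)

# Candidate "of the": unigram 2/2, bigram 1/1, c = 2, r = 16.
print(bleu_from_stats([(2, 2), (1, 1)], c=2, r=16))
```

Both precisions of the two-word candidate are perfect, so the whole score reduces to the brevity penalty exp(1 - 16/2) = exp(-7), which is below 0.001.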