
Fully Aligned Java-Python Tokenizer (Character Level)

Source: Internet | Collected by: 自由互联 | Published: 2022-07-20


Python side:

def tokenize_to_str_list(textString):
    # One token per character (i.e. per Unicode code point in Python 3)
    split_tokens = []
    for i in range(len(textString)):
        split_tokens.append(textString[i])
    return split_tokens

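def convert_to_int_list(split_tokens):
    # char2id: dict mapping each character to its id, assumed to be loaded elsewhere
    output = []
    for token in split_tokens:
        # Characters missing from the vocabulary are skipped
        if token in char2id:
            output.append(char2id[token])
    return output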
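To sanity-check the Python side, here is a condensed, behaviorally equivalent restatement of the two functions with a small usage run. The `char2id` vocabulary below is hypothetical (the original assumes a global `char2id` dict built elsewhere, e.g. from a vocab file). Note that unknown characters are silently dropped, so the id list can be shorter than the token list; the Java side has to reproduce exactly this behavior.

```python
# Hypothetical vocabulary for illustration only; a real char2id would
# normally be built from the model's vocab file.
char2id = {"我": 1, "爱": 2, "中": 3, "国": 4}

def tokenize_to_str_list(textString):
    # One token per Python str element, i.e. per Unicode code point.
    return list(textString)

def convert_to_int_list(split_tokens):
    # Look up each token; tokens missing from char2id are skipped.
    return [char2id[token] for token in split_tokens if token in char2id]

tokens = tokenize_to_str_list("我爱中国!")
ids = convert_to_int_list(tokens)
print(tokens)  # ['我', '爱', '中', '国', '!']
print(ids)     # [1, 2, 3, 4]
```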

Java side:

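public String[] tokenize_to_str_list(final String textString) {
    int textLength = textString.length();
    String[] split_tokens = new String[textLength];
    // One token per char. charAt works on UTF-16 code units, so this matches
    // the Python side only for characters in the BMP (which covers common CJK text)
    for (int i = 0; i < textLength; i++) {
        split_tokens[i] = String.valueOf(textString.charAt(i));
    }
    return split_tokens;
}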

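public int[] convert_to_int_list(final String[] split_tokens) {
    // char2id: assumed to be a Map<String, Integer> field loaded elsewhere
    int seqLen = split_tokens.length;
    int[] output = new int[seqLen];
    int index = 0;
    for (int i = 0; i < seqLen; i++) {
        // Skip characters not in the vocabulary, as the Python side does
        if (char2id.containsKey(split_tokens[i])) {
            output[index] = char2id.get(split_tokens[i]);
            index = index + 1;
        }
    }
    // Trim the unused tail: otherwise skipped characters leave trailing zeros
    // and the result no longer matches the Python list
    return java.util.Arrays.copyOf(output, index);
}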
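One caveat on "fully aligned": Java's `length()` and `charAt(i)` count UTF-16 code units, while Python 3 `str` indexing counts Unicode code points. For characters in the Basic Multilingual Plane the two agree, but a character above U+FFFF (an emoji, for example) becomes two surrogate "tokens" on the Java side. The sketch below emulates the Java `charAt` loop in Python to make the difference visible; the helper name `java_style_char_split` is hypothetical and not part of the original code.

```python
def java_style_char_split(text):
    # Hypothetical helper: mimics Java's charAt loop by expanding every
    # code point above U+FFFF into its UTF-16 surrogate pair.
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            units.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            units.append(ch)
    return units

text = "中😀"
python_tokens = list(text)                 # what tokenize_to_str_list returns
java_tokens = java_style_char_split(text)  # what the charAt loop would return

print(len(python_tokens))  # 2
print(len(java_tokens))    # 3
```

For ordinary Chinese text both sides produce identical tokens, so this only matters if inputs may contain supplementary-plane characters (emoji, CJK Extension B, and so on). In that case, iterating over `String.codePoints()` on the Java side instead of `charAt` would bring it back in line with Python.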

