
Image Retrieval

Source: Internet | Collected by: 自由互联 | Published: 2021-06-16

Approximate nearest neighbor search is an efficient strategy for large-scale image retrieval. Encouraged by the recent advances in convolutional neural networks (CNNs), we propose an effective deep learning framework to generate binary hash codes for fast image retrieval. Our idea is that when the data labels are available, binary codes can be learned by employing a hidden layer for representing the latent concepts that dominate the class labels. The utilization of the CNN also allows for learning image representations. Unlike other supervised methods that require pairwise inputs for binary code learning, our method learns hash codes and image representations in a pointwise manner, making it suitable for large-scale datasets. Experimental results show that our method outperforms several state-of-the-art hashing algorithms on the CIFAR-10 and MNIST datasets. We further demonstrate its scalability and efficacy on a large-scale dataset of 1 million clothing images.

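To make the pipeline described in the abstract concrete, here is a minimal PyTorch-style sketch of a classification network with an extra latent hash layer: a fully connected layer with sigmoid activations is inserted between the penultimate features and the softmax classifier, and binary codes are read off by thresholding those activations at 0.5. The AlexNet backbone, the 48-bit code length, and all names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class LatentHashNet(nn.Module):
    """Hypothetical sketch: classification CNN with a latent hash layer H."""

    def __init__(self, num_classes: int, code_bits: int = 48):
        super().__init__()
        backbone = models.alexnet(weights=None)    # stand-in for a pre-trained CNN
        self.features = backbone.features
        self.avgpool = backbone.avgpool
        # keep the backbone's fc6/fc7 layers, drop its original classifier
        self.fc = nn.Sequential(*list(backbone.classifier.children())[:-1])
        # latent layer H: sigmoid activations approximate binary codes
        self.latent = nn.Sequential(nn.Linear(4096, code_bits), nn.Sigmoid())
        # class labels drive the codes through an ordinary softmax classifier
        self.classifier = nn.Linear(code_bits, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(self.avgpool(x), 1)
        h = self.latent(self.fc(x))          # activations in (0, 1)
        return self.classifier(h), h

    @torch.no_grad()
    def binary_codes(self, x):
        _, h = self.forward(x)
        return (h > 0.5).to(torch.uint8)     # threshold at 0.5 to get binary codes


# usage sketch: train with ordinary cross-entropy on the class labels
# logits, _ = LatentHashNet(num_classes=10)(torch.randn(2, 3, 224, 224))
```

Because the codes are driven only by the class labels of individual images, training stays point-wise and no pairwise similarity matrix is needed.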

Content-based image retrieval aims at searching for similar images through the analysis of image content; hence image representations and similarity measures become critical to such a task. Along this research track, one of the most challenging issues is associating the pixel-level information with the semantics of human perception [25, 27]. Although several hand-crafted features have been proposed to represent the images [19, 2, 22], the performance of these visual descriptors remained limited until the recent breakthrough of deep learning. Recent studies have shown that deep CNNs significantly improve performance on various vision tasks, such as object detection, image classification, and segmentation. These accomplishments are attributed to the ability of deep CNNs to learn rich mid-level image representations.


As deep CNNs learn rich mid-level image descriptors, Krizhevsky et al. [14] used the feature vectors from the 7th layer for image retrieval and demonstrated outstanding performance on ImageNet. However, because the CNN features are high-dimensional and directly computing the similarity between two 4096-dimensional vectors is inefficient, Babenko et al. [1] proposed to compress the CNN features using PCA and discriminative dimensionality reduction, and obtained good performance. In CBIR, both image representations and computational cost play an essential role. Owing to the recent growth of visual content, rapid search in large databases has become an emerging need. Many studies aim at answering the question of how to efficiently retrieve relevant data from a large-scale database. Because of its high computational cost, traditional linear search (or exhaustive search) is not appropriate for searching a large corpus. A practical alternative is to use Approximate Nearest Neighbor (ANN) or hashing-based methods [6, 29, 18, 20, 15, 30] for speedup. These methods project the high-dimensional features into a lower-dimensional space and then generate compact binary codes. With these binary codes, fast image search can be carried out via binary pattern matching or Hamming distance measurement, which dramatically reduces the computational cost and further improves search efficiency. Some of these methods are pairwise: they use a similarity matrix (containing the pairwise similarities of the data) to describe the relationships between image pairs or data pairs, and employ this similarity information to learn hash functions. However, constructing such a matrix and generating the codes is demanding when dealing with a large-scale dataset.
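As a rough illustration of the Hamming-distance matching mentioned above, the NumPy sketch below packs binary codes into bytes and ranks database items by XOR followed by a per-byte popcount. The 48-bit code length, the lookup-table popcount, and the random placeholder codes are assumptions made for the example, not part of any cited method.

```python
import numpy as np


def pack_codes(bits: np.ndarray) -> np.ndarray:
    """(n, code_bits) array of 0/1 values -> (n, code_bits // 8) uint8 array."""
    return np.packbits(bits.astype(np.uint8), axis=1)


# popcount lookup table for all 256 possible byte values
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def hamming_search(query: np.ndarray, database: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k database codes closest to the query in Hamming distance."""
    xor = np.bitwise_xor(database, query)    # differing bits, byte by byte
    dist = _POPCOUNT[xor].sum(axis=1)        # Hamming distance per database item
    return np.argsort(dist)[:k]


# usage with random 48-bit codes standing in for learned codes
rng = np.random.default_rng(0)
db = pack_codes(rng.integers(0, 2, size=(100_000, 48)))
q = pack_codes(rng.integers(0, 2, size=(1, 48)))[0]
top10 = hamming_search(q, db, k=10)
```

Since each comparison touches only a handful of bytes, this kind of matching scales to databases with millions of images, which is the efficiency argument made above.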
