期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

q-Gaussian mixture models for image and video semantic indexing

Nakamasa Inoue Koichi Shinoda 《Journal of Visual Communication and Image Representation》2013,24(8):1450-1457

相似文献

2.

An experimental study on the universality of visual vocabularies

Jian Hou Wei-Xue Liu Xu E Qi Xia Nai-Ming Qi 《Journal of Visual Communication and Image Representation》2013,24(7):1204-1211

Bag-of-visual-words has been shown to be a powerful image representation and attained success in many computer vision and pattern recognition applications. Usually for a given classification task, researchers choose to build a specific visual vocabulary, and the problem of building a universal visual vocabulary is rarely addressed. In this paper we conduct extensive classification experiments with three features on four image datasets and show that the visual vocabularies built from different datasets can be exchanged without apparent performance loss. Furthermore, we investigate the correlation between the visual vocabularies built from different datasets and find that they are nearly identical, which explains why they are universal across classification tasks. We believe that this work reveals what is behind the universality of visual vocabularies and narrows the gap between bag-of-visual-words and bag-of-words in text domain. 相似文献

3.

Sequence of the most informative joints (SMIJ): A new representation for human skeletal action recognition

《Journal of Visual Communication and Image Representation》2014,25(1):24-38

Much of the existing work on action recognition combines simple features with complex classifiers or models to represent an action. Parameters of such models usually do not have any physical meaning nor do they provide any qualitative insight relating the action to the actual motion of the body or its parts. In this paper, we propose a new representation of human actions called sequence of the most informative joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed to be the most informative for performing the current action based on highly interpretable measures such as the mean or variance of joint angle trajectories. We then represent the action as a sequence of these most informative joints. Experiments on multiple databases show that the SMIJ representation is discriminative for human action recognition and performs better than several state-of-the-art algorithms. 相似文献

4.

《Journal of Visual Communication and Image Representation》2016

Geometric image re-ranking is a widely adopted phrase to refine the large-scale image retrieval systems built based upon popular paradigms such as Bag-of-Words (BoW) model. Its main idea can be treated as a sort of geometric verification targeting at reordering the initial returning list by previous similarity ranking metrics, e.g. Cosine distance over the BoW vectors between query image and reference ones. In the literature, to guarantee the re-ranking accuracy, most existing schemes requires the initial retrieval to be conducted by using a large vocabulary (codebook), corresponding to a high-dimensional BoW vector. However, in many emerging applications such as mobile visual search and massive-scale retrieval, the retrieval has to be conducted by using a compact BoW vector to accomplish the memory or time requirement. In these scenarios, the traditional re-ranking paradigms are questionable and new algorithms are urgently demanded. In this paper, we propose an accurate yet efficient image re-ranking algorithm specific for small vocabulary in aforementioned scenarios. Our idea is inspired by Hough Voting in the transformation space, where votes come from local feature matches. Most notably, this geometry re-ranking can easily been aggregated to the cutting-edge image based retrieval systems yielding superior performance with a small vocabulary and being able to store in mobile end facilitating mobile visual search systems. We further prove that its time complexity is linear in terms of the re-ranking instance, which is a significant advantage over the existing scheme. In terms of mean Average Precision, we show that its performance is comparable or in some cases better than the state-of-the-art re-ranking schemes. 相似文献

5.

Action recognition via structured codebook construction

《Signal Processing: Image Communication》2014,29(4):546-555

Bag-of-words models have been widely used to obtain the global representation for action recognition. However, these models ignored the structure information, such as the spatial and temporal contextual information for action representation. In this paper, we propose a novel structured codebook construction method to encode spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, we encode the spatial and temporal contextual information among local features by constructing correlation matrices for local spatio-temporal features. Then, we discover the common patterns of movements to construct the structured codebook. After that, actions can be represented by a set of sparse coefficients with respect to the structured codebook. Finally, a simple linear SVM classifier is applied to predict the action class based on the action representation. Our method has two main advantages compared to traditional methods. First, our method automatically discovers the mid-level common patterns of movements that capture rich spatial and temporal contextual information. Second, our method is robust to unwanted background local features mainly because most unwanted background local features cannot be sparsely represented by the common patterns and they are treated as residual errors that are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks: KTH action dataset and UCF sports dataset. Experimental results demonstrate the advantages of our structured codebook construction. 相似文献

6.

Pin-Zhong Pan Chung-Lin Huang 《电子科技学刊:英文版》2016,14(4):370-376

This paper presents a human action recognition method. It analyzes the spatio-temporal grids along the dense trajectories and generates the histogram of oriented gradients (HOG) and histogram of optical flow (HOF) to describe the appearance and motion of the human object. Then, HOG combined with HOF is converted to bag-of-words (BoWs) by the vocabulary tree. Finally, it applies random forest to recognize the type of human action. In the experiments, KTH database and URADL database are tested for the performance evaluation. Comparing with the other approaches, we show that our approach has a better performance for the action videos with high inter-class and low inter-class variabilities. 相似文献

7.

一种SVM集成的图像分类方法研究*

罗会兰杜连平《电视技术》2012,36(23):39-42

针对单分类器没有充分考虑数据集的特征而不能很好地完成分类识别,提出了一种基于集成学习技术的SVM集成的图像分类方法。该方法是在基于较为流行的词袋(Bag-of-Words,BOW)模型的图像分类方法的基础上,利用训练生成的不同SVM分类器分类测试图像,并将分类结果采用集成学习算法进行集成。分别采用传统的BOW模型的图像分类方法和本文提出的方法进行分类实验,实验结果表明采用SVM集成的图像分类方法明显提高了分类精度,具有一定的稳健性。相似文献