Deep Learning through Sparse and Low-Rank Modeling
Ebook, 548 pages, 4 hours


About this ebook

Deep Learning through Sparse and Low-Rank Modeling bridges classical sparse and low-rank models—those that emphasize problem-specific interpretability—with recent deep network models that have enabled a larger learning capacity and better utilization of big data. It shows how the toolkit of deep learning is closely tied to sparse/low-rank methods and algorithms, providing a rich variety of theoretical and analytic tools to guide the design and interpretation of deep learning models. The development of the theory and models is supported by a wide variety of applications in computer vision, machine learning, signal processing, and data mining.

This book will be highly useful for researchers, graduate students and practitioners working in the fields of computer vision, machine learning, signal processing, optimization and statistics.

  • Combines classical sparse and low-rank models and algorithms with the latest advances in deep learning networks
  • Shows how the structure and algorithms of sparse and low-rank methods improve the performance and interpretability of deep learning models
  • Provides tactics on how to build and apply customized deep learning models for various applications
Language: English
Release date: Apr 11, 2019
ISBN: 9780128136607
Author

Zhangyang Wang

Dr. Zhangyang (Atlas) Wang has been an Assistant Professor of Computer Science and Engineering (CSE) at Texas A&M University (TAMU) since August 2017. During 2012–2016, he was a Ph.D. student in the Electrical and Computer Engineering (ECE) Department at the University of Illinois at Urbana-Champaign (UIUC). He was a research intern with Microsoft Research (2015), Adobe Research (2014), and the US Army Research Lab (2013). Dr. Wang has published over 70 papers in top-tier venues, in the broad fields of machine learning, computer vision, artificial intelligence, and interdisciplinary data science. He has published 2 books and 1 chapter, has been granted 3 patents, and has received over 20 research awards and scholarships. Dr. Wang regularly serves as a tutorial speaker, guest editor, area chair, session chair, TPC member, and workshop organizer at leading conferences and journals.


    Book preview

    Deep Learning through Sparse and Low-Rank Modeling - Zhangyang Wang

    Authors

    Chapter 1

    Introduction

    Zhangyang Wang⁎; Ding Liu†

    ⁎Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States

    †Beckman Institute for Advanced Science and Technology, Urbana, IL, United States

    Abstract

    Deep learning has achieved prevailing success across a wide range of machine learning and computer vision fields. On the other hand, sparsity and low-rankness have been popular regularizers in classical machine learning. This section is intended as a brief introduction to the basics of deep learning, and then focuses on its inherent connections to the concepts of sparsity and low-rankness.

    Keywords

    Sparsity; Low rank; Deep learning

    Chapter Outline

    1.1  Basics of Deep Learning

    1.2  Basics of Sparsity and Low-Rankness

    1.3  Connecting Deep Learning to Sparsity and Low-Rankness

    1.4  Organization

    References

    1.1 Basics of Deep Learning

    Machine learning enables computers to learn from data without being explicitly programmed. However, classical machine learning algorithms often find it challenging to extract semantic features directly from raw data, e.g., due to the well-known semantic gap [1], which calls for assistance from domain experts to hand-craft well-engineered feature representations, on which the machine learning models operate more effectively. In contrast, the recently popular deep learning relies on multilayer neural networks to derive semantically meaningful representations, by composing multiple simple features to represent a sophisticated concept. Deep learning thus requires fewer hand-engineered features and less expert knowledge. Taking image classification as an example [2], a deep learning-based image classification system represents an object by gradually extracting edges, textures, and structures from lower- to middle-level hidden layers, and the representation becomes more and more associated with the target semantic concept as the model grows deeper. Driven by the emergence of big data and hardware acceleration, deep learning can extract increasingly abstract representations from raw inputs, gaining the power to solve complicated, even traditionally intractable, problems. Deep learning has achieved tremendous success in visual object recognition [2–5], face recognition and verification [6,7], object detection [8–11], image restoration and enhancement [12–17], clustering [18], emotion recognition [19], aesthetics and style recognition [20–23], scene understanding [24,25], speech recognition [26], machine translation [27], image synthesis [28], and even playing Go [29] and poker [30].

    A basic neural network is composed of a set of perceptrons (artificial neurons), each of which maps inputs to output values with a simple activation function. Among recent deep neural network architectures, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are the two main streams, differing in their connectivity patterns. CNNs deploy convolution operations on hidden layers for weight sharing and parameter reduction. CNNs can extract local information from grid-like input data, and have mainly shown success in computer vision and image processing, with many popular instances such as LeNet [31], AlexNet [2], VGG [32], GoogLeNet [33], and ResNet [34]. RNNs are dedicated to processing sequential input data of variable length, producing an output at each time step. The hidden neurons at each time step are calculated based on the input data and the hidden neurons at the previous time step. To avoid vanishing/exploding gradients in RNNs when modeling long-term dependencies, long short-term memory (LSTM) [35] and gated recurrent unit (GRU) [36] architectures with controllable gates are widely used in practical applications. Interested readers are referred to a comprehensive deep learning textbook [37].
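    As a minimal illustration of these building blocks, the following sketch (in NumPy, with made-up weights and sizes, not tied to any particular architecture) shows a single artificial neuron and one recurrent hidden-state update:

```python
import numpy as np

def sigmoid(u):
    # logistic activation squashing any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-u))

def neuron(x, w, b):
    # a single perceptron-style unit: weighted sum plus nonlinearity
    return sigmoid(w @ x + b)

def rnn_step(x_t, h_prev, Wx, Wh, b):
    # one recurrent update: the new hidden state depends on the
    # current input and the hidden state of the previous time step
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)
```

    Stacking many such units layer by layer (or, for the RNN, applying `rnn_step` across the time steps of a sequence) yields the deep architectures discussed above.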

    1.2 Basics of Sparsity and Low-Rankness

    In signal processing, the classical way to represent a multidimensional signal is to express it as a linear combination of the components in a basis, which is either chosen in advance or learned. The goal of linearly transforming a signal with respect to a basis is to obtain a more predictable pattern in the resulting coefficients. With an appropriate basis, such coefficients often exhibit desired characteristics. One important observation is that, for most natural signals such as images and audio, most of the coefficients are zero or close to zero if the basis is properly selected. The technique is usually termed sparse coding [38], where the sparsity of the coefficients is typically enforced via an ℓ0- or ℓ1-norm penalty [39]. Beyond the element-wise sparsity model, more elaborate structured sparse models have also been developed [40,41]. The learning of the basis (called the dictionary) further boosts the power of sparse coding [42–44].
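    To make the idea concrete, here is a small sketch (in NumPy; the random dictionary, penalty weight, and iteration count are illustrative choices, not tuned values) of the classical iterative soft-thresholding algorithm (ISTA) for the ℓ1-regularized sparse coding problem:

```python
import numpy as np

def soft_threshold(u, t):
    # element-wise shrinkage: move each coefficient toward zero by t
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=200):
    # step size from the largest squared singular value of D,
    # which guarantees convergence of the iteration
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # gradient step on the data-fit term, then shrinkage
        a = soft_threshold(a + step * D.T @ (x - D @ a), lam * step)
    return a
```

    The shrinkage step is exactly what drives most coefficients to zero, producing the sparse representation described above.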

    More generally, sparsity belongs to the well-received principle of parsimony, i.e., preferring a simple representation over a more complex one. The sparsity level (the number of nonzero elements) is a natural measure of the representation complexity of vector-valued features. In the case of matrix-valued features, the matrix rank provides another notion of parsimony, under the assumption that high-dimensional data lie close to a low-dimensional subspace or manifold. Similarly to sparse optimization, a series of works have shown that rank minimization can be achieved through convex optimization [45] or efficient heuristics [46], paving the path to high-dimensional data analysis such as video processing [47–52].
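    For the matrix case, the best rank-r approximation in the least-squares sense is given by the truncated singular value decomposition (the Eckart–Young theorem); a minimal sketch:

```python
import numpy as np

def low_rank_approx(M, r):
    # truncated SVD: keep only the r largest singular values/vectors
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

    If the data matrix is genuinely close to low rank, the discarded singular values are small and the approximation error is correspondingly small.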

    1.3 Connecting Deep Learning to Sparsity and Low-Rankness

    Beyond their proven success in conventional machine learning algorithms, sparse and low-rank structures have been widely found effective for regularizing deep learning, improving model generalization, training behavior, and data efficiency. For example, the well-known weight decay regularization is essentially an ℓ2 (or ℓ1) decay term that limits the magnitudes of the neurons' weights. Another popular tool to avoid overfitting, dropout [2], is a simple regularization approach that improves the generalization of deep networks by randomly setting hidden neurons to zero in the training stage, which could be viewed as a stochastic form of enforcing sparsity. Besides, the inherent sparse properties of both deep network weights and activations have been widely observed and utilized for compressing deep models [55] and improving their energy efficiency [56,57]. As for low-rankness, much research has also been devoted to learning low-rank convolutional filters [58] and network compression [59].
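    The dropout-as-stochastic-sparsity view can be illustrated in a few lines (the common "inverted" training-time formulation; the rate and sizes here are arbitrary):

```python
import numpy as np

def dropout(h, p, rng):
    # randomly zero each hidden activation with probability p,
    # rescaling the survivors so the expected value is unchanged
    mask = rng.random(h.shape) >= p
    return mask * h / (1.0 - p)
```

    Each forward pass thus operates on a randomly sparsified activation vector, which is exactly the stochastic sparsity interpretation mentioned above.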

    The focus of this book is to explore a deeper structural connection between sparse/low-rank models and deep models. While many examples will be detailed in the remainder of the book, we here briefly state the main idea. We start from the following regularized regression form, which represents a large family of feature learning models, such as ridge regression, sparse coding, and low-rank representation:

    (1.1) $\min_{\mathbf{a}} \frac{1}{2}\|\mathbf{x}-\mathbf{D}\mathbf{a}\|_2^2+\Omega(\mathbf{a}),$

    where $\mathbf{x}$ is the input signal, $\mathbf{D}$ is the basis (dictionary), $\mathbf{a}$ is the feature representation, and the regularizer $\Omega(\mathbf{a})$ further incorporates the problem-specific prior knowledge. Not surprisingly, many instances of Eq. (1.1) could be solved by a similar class of iterative algorithms

    (1.2) $\mathbf{z}^{k+1}=\mathcal{N}(\mathcal{L}_1(\mathbf{x})+\mathcal{L}_2(\mathbf{z}^{k})),$

    where $\mathbf{z}^{k}$ denotes the intermediate output of the $k$-th iteration, $\mathcal{L}_1$ and $\mathcal{L}_2$ are linear operators, and $\mathcal{N}$ is a simple nonlinear operator. Equation (1.2) could be expressed by a recursive system, whose fixed point is expected to be the solution $\mathbf{a}$ of Eq. (1.1). Furthermore, the recursive system could be unfolded and truncated to $k$ iterations, to construct a $(k+1)$-layer feed-forward network. Without any further tuning, the resulting architecture will output a $k$-iteration approximation of the exact solution $\mathbf{a}$ by default. Taking sparse coding, where $\Omega(\mathbf{a})=\lambda\|\mathbf{a}\|_1$, as an example, the concrete function forms are given as

    (1.3) $\mathcal{L}_1(\mathbf{x})=\mathbf{D}^{T}\mathbf{x},\quad \mathcal{L}_2(\mathbf{z})=(\mathbf{I}-\mathbf{D}^{T}\mathbf{D})\mathbf{z},\quad [\mathcal{N}(\mathbf{u})]_i=\operatorname{sign}(u_i)\max(|u_i|-\lambda,0),$

    where $u_i$ is the $i$-th element of $\mathbf{u}$, so that $\mathcal{N}$ is an element-wise soft shrinkage function. The unfolded and truncated version of Eq. (1.3) was first proposed in [60], called the learned iterative shrinkage and thresholding algorithm (LISTA). Recent works [61,18,62–64] followed LISTA and developed various models, and many jointly optimized the unfolded model with discriminative tasks [65].
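    The unfolding of Eq. (1.2) can be sketched directly. In the sketch below, the layer weights W and S and the threshold are derived analytically from the dictionary (with a step size chosen from its spectral norm, an assumption of this illustration); in LISTA proper, those quantities would instead be learned from data by backpropagation:

```python
import numpy as np

def soft_threshold(u, t):
    # element-wise soft shrinkage (the nonlinearity N of the unfolded net)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def unfolded_ista(x, D, lam, k):
    # analytically derived weights of the unfolded network;
    # LISTA would learn W, S, and the threshold instead
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    W = step * D.T
    S = np.eye(D.shape[1]) - step * D.T @ D
    theta = lam * step
    z = soft_threshold(W @ x, theta)   # layer 0
    for _ in range(k):                 # k further unrolled layers
        z = soft_threshold(W @ x + S @ z, theta)
    return z
```

    With these particular weights, the (k+1)-layer forward pass coincides with k+1 iterations of the iterative algorithm started from zero; training the weights then lets the truncated network outperform that fixed iteration.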

    By further enforcing the nonnegativity constraint $\mathbf{a}\geq 0$, Eq. (1.2) could be adapted to solve the nonnegative sparse coding problem

    (1.4) $\min_{\mathbf{a}} \frac{1}{2}\|\mathbf{x}-\mathbf{D}\mathbf{a}\|_2^2+\lambda\|\mathbf{a}\|_1,\quad \text{s.t. } \mathbf{a}\geq 0.$

    A by-product of applying nonnegativity is that the original sparsity coefficient $\lambda$ in Eq. (1.4) could be absorbed into a bias term $-\lambda$, and we have

    (1.5) $\mathbf{z}^{k+1}=\operatorname{ReLU}(\mathbf{D}^{T}\mathbf{x}+(\mathbf{I}-\mathbf{D}^{T}\mathbf{D})\mathbf{z}^{k}-\lambda),$

    where $\operatorname{ReLU}(u)=\max(u,0)$ is applied element-wise. Equation (1.5) is exactly a fully-connected layer followed by ReLU neurons, one of the most standard building blocks in existing deep models. Convolutional layers could be derived similarly by looking at a convolutional sparse coding model [66] rather than a linear one. Such a hidden structural resemblance reveals the potential to bridge many sparse and low-rank models with current successful deep models, potentially enhancing the generalization, compactness, and interpretability of the latter.
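    This ReLU equivalence is easy to verify numerically: a one-sided (nonnegative) soft shrinkage with threshold λ is exactly a ReLU with bias −λ, so one unfolded iteration is a standard dense layer followed by ReLU (a toy check with an arbitrary vector and identity-style weights):

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def nonneg_soft_threshold(u, lam):
    # one-sided shrinkage used for nonnegative sparse coding
    return np.maximum(u - lam, 0.0)

def layer(x, z, W, S, lam):
    # one unfolded iteration written as a dense layer + ReLU,
    # with the sparsity coefficient absorbed into the bias -lam
    return relu(W @ x + S @ z - lam)
```

    Any library implementation of a fully-connected ReLU layer therefore computes exactly this nonnegative shrinkage step once its bias is set to −λ.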

    1.4 Organization

    In the remainder of this book, Chapter 2 will first introduce the bi-level sparse coding model, using the example of hyperspectral image classification. Chapters 3, 4 and 5 will then present three concrete examples (classification, superresolution, and clustering), to show how (bi-level) sparse coding models could be naturally converted to and trained as deep networks. From Chapter 6 to Chapter 9, we will delve into the extensive applications of deep learning aided by sparsity and low-rankness, in signal processing, dimensionality reduction, action recognition, style recognition and kinship understanding, respectively.

    References

    [1] R. Zhao, W.I. Grosky, Narrowing the semantic gap-improved text-based web document retrieval using visual features, IEEE Transactions on Multimedia 2002;4(2):189–200.

    [2] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, NIPS. 2012.

    [3] Z. Wang, S. Chang, Y. Yang, D. Liu, T.S. Huang, Studying very low resolution recognition using deep networks, Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:4792–4800.

    [4] D. Liu, B. Cheng, Z. Wang, H. Zhang, T.S. Huang, Enhance visual recognition under adverse conditions via deep networks, arXiv preprint arXiv:1712.07732; 2017.

    [5] Z. Wu, Z. Wang, Z. Wang, H. Jin, Towards privacy-preserving visual recognition via adversarial training: a pilot study, arXiv preprint arXiv:1807.08379; 2018.

    [6] N. Bodla, J. Zheng, H. Xu, J. Chen, C.D. Castillo, R. Chellappa, Deep heterogeneous feature fusion for template-based face recognition, 2017 IEEE winter conference on applications of computer vision, WACV 2017. Santa Rosa, CA, USA, March 24–31, 2017. 2017:586–595.

    [7] R. Ranjan, A. Bansal, H. Xu, S. Sankaranarayanan, J. Chen, C.D. Castillo, et al., Crystal loss and quality pooling for unconstrained face verification and recognition, CoRR 2018. arXiv:1804.01159 [abs].

    [8] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, Advances in neural information processing systems. 2015:91–99.

    [9] J. Yu, Y. Jiang, Z. Wang, Z. Cao, T. Huang, Unitbox: an advanced object detection network, Proceedings of the 2016 ACM on multimedia conference. ACM; 2016:516–520.

    [10] J. Gao, Q. Wang, Y. Yuan, Embedding structured contour and location prior in siamesed fully convolutional networks for road detection, Robotics and automation (ICRA), 2017 IEEE international conference on. IEEE; 2017:219–224.

    [11] H. Xu, X. Lv, X. Wang, Z. Ren, N. Bodla, R. Chellappa, Deep regionlets for object detection, The European conference on computer vision (ECCV). 2018.

    [12] R. Timofte, E. Agustsson, L. Van Gool, M.H. Yang, L. Zhang, B. Lim, et al., NTIRE 2017 challenge on single image super-resolution: methods and results, Computer vision and pattern recognition workshops (CVPRW), 2017 IEEE conference on. IEEE; 2017:1110–1121.

    [13] B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, AOD-Net: all-in-one dehazing network, Proceedings of the IEEE international conference on computer vision. 2017:4770–4778.

    [14] B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, An all-in-one network for dehazing and beyond, arXiv preprint arXiv:1707.06543; 2017.

    [15] B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, End-to-end united video dehazing and detection, arXiv preprint arXiv:1709.03919; 2017.

    [16] D. Liu, B. Wen, J. Jiao, X. Liu, Z. Wang, T.S. Huang, Connecting image denoising and high-level vision tasks via deep learning, arXiv preprint arXiv:1809.01826; 2018.

    [17] R. Prabhu, X. Yu, Z. Wang, D. Liu, A. Jiang, U-finger: multi-scale dilated convolutional network for fingerprint image denoising and inpainting, arXiv preprint arXiv:1807.10993; 2018.

    [18] Z. Wang, S. Chang, J. Zhou, M. Wang, T.S. Huang, Learning a task-specific deep architecture for clustering, SDM 2016.

    [19] B. Cheng, Z. Wang, Z. Zhang, Z. Li, D. Liu, J. Yang, et al., Robust emotion recognition from low quality and low bit rate video: a deep learning approach, arXiv preprint arXiv:1709.03126; 2017.

    [20] Z. Wang, J. Yang, H. Jin, E. Shechtman, A. Agarwala, J. Brandt, et al., DeepFont: identify your font from an image, Proceedings of the 23rd ACM international conference on multimedia. ACM; 2015:451–459.

    [21] Z. Wang, J. Yang, H. Jin, E. Shechtman, A. Agarwala, J. Brandt, et al., Real-world font recognition using deep network and domain adaptation, arXiv preprint arXiv:1504.00028; 2015.

    [22] Z. Wang, S. Chang, F. Dolcos, D. Beck, D. Liu, T.S. Huang, Brain-inspired deep networks for image aesthetics assessment, arXiv preprint arXiv:1601.04155; 2016.

    [23] T.S. Huang, J. Brandt, A. Agarwala, E. Shechtman, Z. Wang, H. Jin, et al., Deep learning for font recognition and retrieval, Applied cloud deep semantic recognition. Auerbach Publications; 2018:109–130.

    [24] C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence 2013;35(8):1915–1929.

    [25] Q. Wang, J. Gao, Y. Yuan, A joint convolutional neural networks and context transfer for street scenes labeling, IEEE Transactions on Intelligent Transportation Systems 2017.

    [26] G. Saon, H.K.J. Kuo, S. Rennie, M. Picheny, The IBM 2015 English conversational telephone speech recognition system, arXiv preprint arXiv:1505.05899; 2015.

    [27] I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, Advances in neural information processing systems. 2014:3104–3112.

    [28] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial nets, Advances in neural information processing systems. 2014:2672–2680.

    [29] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al., Mastering the game of go with deep neural networks and tree search, Nature 2016;529(7587):484–489.

    [30] M. Moravčík, M. Schmid, N. Burch, V. Lisỳ, D. Morrill, N. Bard, et al., DeepStack: expert-level artificial intelligence in no-limit poker, arXiv preprint arXiv:1701.01724; 2017.

    [31] Y. LeCun, et al., LeNet-5, convolutional neural networks, URL: http://yann.lecun.com/exdb/lenet; 2015.

    [32] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556; 2014.

    [33] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition. 2015:1–9.

    [34] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:770–778.

    [35] F.A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: continual prediction with LSTM, Neural Computation 2000;12(10):2451–2471.

    [36] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555; 2014.

    [37] I. Goodfellow, Y. Bengio, A. Courville, Deep learning. MIT Press; 2016.

    [38] Z. Wang, J. Yang, H. Zhang, Z. Wang, Y. Yang, D. Liu, et al., Sparse coding and its applications in computer vision. World Scientific; 2015.

    [39] R.G. Baraniuk, Compressive sensing [lecture notes], IEEE Signal Processing Magazine 2007;24(4):118–121.

    [40] J. Huang, T. Zhang, D. Metaxas, Learning with structured sparsity, Journal of Machine Learning Research Nov. 2011;12:3371–3412.

    [41] H. Xu, J. Zheng, A. Alavi, R. Chellappa, Template regularized sparse coding for face verification, 23rd International conference on pattern recognition, ICPR 2016. Cancún, Mexico, December 4–8, 2016. 2016:1448–1454.

    [42] H. Xu, J. Zheng, A. Alavi, R. Chellappa, Cross-domain visual recognition via domain adaptive dictionary learning, CoRR 2018. arXiv:1804.04687 [abs].

    [43] H. Xu, J. Zheng, R. Chellappa, Bridging the domain shift by domain adaptive dictionary learning, Proceedings of the British machine vision conference 2015, BMVC 2015. Swansea, UK, September 7–10, 2015. 2015 p. 96.1–96.12.

    [44] H. Xu, J. Zheng, A. Alavi, R. Chellappa, Learning a structured dictionary for video-based face recognition, 2016 IEEE winter conference on applications of computer vision, WACV 2016. Lake Placid, NY, USA, March 7–10, 2016. 2016:1–9.

    [45] E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? Journal of the ACM (JACM) 2011;58(3):11.

    [46] Z. Wen, W. Yin, Y. Zhang, Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm, Mathematical Programming Computation 2012:1–29.

    [47] Z. Wang, H. Li, Q. Ling, W. Li, Robust temporal-spatial decomposition and its applications in video processing, IEEE Transactions on Circuits and Systems for Video Technology
