Meta Learning With Medical Imaging and Health Informatics Applications

Ebook · 910 pages · 8 hours

About this ebook

Meta-Learning, or learning to learn, has become increasingly popular in recent years. Instead of building AI systems from scratch for each machine learning task, Meta-Learning constructs computational mechanisms to systematically and efficiently adapt to new tasks. The meta-learning paradigm has great potential to address deep neural networks’ fundamental challenges such as intensive data requirement, computationally expensive training, and limited capacity for transfer among tasks.

This book provides a concise summary of Meta-Learning theories and their diverse applications in medical imaging and health informatics. It covers the unifying theory of meta-learning and its popular variants such as model-agnostic learning, memory augmentation, prototypical networks, and learning to optimize. The book brings together thought leaders from both machine learning and health informatics fields to discuss the current state of Meta-Learning, its relevance to medical imaging and health informatics, and future directions.
  • The first book on applying Meta-Learning to medical imaging
  • Contributing authors who are pioneers in the field explain the theory and its development
  • Includes a GitHub repository of code examples and documentation to help readers quickly set up Meta-Learning algorithms for their applications
Language: English
Release date: Sep 24, 2022
ISBN: 9780323998529

    Book preview

    Meta Learning With Medical Imaging and Health Informatics Applications - Hien Van Nguyen

    Part 1: Introduction to meta learning

    Outline

    Chapter 1. Learning to learn in medical applications

    Chapter 2. Introduction to meta learning

    Chapter 3. Metric learning algorithms for meta learning

    Chapter 4. Meta learning by optimization

    Chapter 5. Model-based meta learning

    Chapter 6. Meta learning for domain generalization

    Chapter 1: Learning to learn in medical applications

    A journey through optimization

    Azade Farshad (a); Yousef Yeganeh (a); Nassir Navab (a,b)

    (a) Technical University of Munich, Munich, Germany

    (b) Johns Hopkins University, Baltimore, MD, United States

    Abstract

    Meta learning, or learning to learn, has been an attractive topic of research in recent years. Different methods in this area have been proposed to solve existing problems in machine learning. One common problem that has received much attention in recent years is few-shot learning, and meta learning has been a natural solution to many few-shot learning problems. In this chapter, we introduce some background on meta learning. Then, we provide examples of its applications in different areas, especially medical imaging.

    Keywords

    Meta learning; Few-shot learning; Medical imaging

    1.1 Introduction

    Meta learning has many different definitions, but it is generally known as learning to learn. The term was first introduced in 1987 by Jürgen Schmidhuber in [1]. Works in the field of meta learning focus on different problems such as few-shot learning and neural architecture search; however, most of these works have something in common: the use of prior knowledge to improve the learning capability of the model. This prior knowledge [2] can be obtained from:

    1.  the similarity or distance of the data points;

    2.  the learning algorithm, e.g., for fast adaptation of the parameters or for learning an optimal update rule;

    3.  the data, e.g., using automated augmentations or learning from the data statistics.

    Advances in deep learning began with supervised learning on labeled datasets, which made deep learning a hot topic in many fields, including medical imaging. Most early works are nevertheless dependent on large amounts of labeled data, increasingly so as the depth of the model grows. While gathering and annotating data is difficult in most fields, it is far more challenging in the medical field, where annotation requires the expert knowledge of physicians.

    Meta learning is generally applied to the few-shot learning problem and, similar to transfer learning, works by pretraining (metatraining) a model on a set of tasks and fine-tuning it on the limited labeled data. Fig. 1.1 presents a taxonomy of the meta learning works discussed in this chapter.

    Figure 1.1 A Taxonomy of Meta learning and its Applications.

    This chapter first goes through the background and notations in meta learning and the well-known works on this topic. Then, we delve into the task construction problem and representation learning for meta learning. We go through a few works on unsupervised learning for meta learning. After that, we explore some applications of meta learning in medical imaging and other related domains such as few-shot segmentation and few-shot image generation. Later, we discuss the relation of meta learning to federated learning and review several ideas that could also benefit meta learning. Finally, we conclude the chapter and discuss the outlook of this field.

    1.2 Problem statement

    In few-shot classification, for an N-way-K-shot problem, each task τ of the total T tasks consists of a support set with N classes of K samples each and a query set. The support set is used for learning how to learn the task; the query set includes further examples of the same classes and is used for evaluating the task.

    The model is meta-trained for a total of E steps and then fine-tuned on the few-shot data. At each meta-training step i, a task $\tau_i$ is randomly sampled from the set of all training tasks. The loss is usually defined by the classification performance on the query set $\mathcal{Q}_i$ of the model trained on the support set $\mathcal{S}_i$. Training in this manner helps the model generalize to newly seen tasks in each training episode. Therefore, the final metatrained model is able to adapt quickly to previously unobserved data. A general meta learning pipeline is demonstrated in Fig. 1.2.

    Figure 1.2 A general meta learning pipeline. Given a dataset $\mathcal{D}$, the data is first split into training and testing subsets, $\mathcal{D}_{train}$ and $\mathcal{D}_{test}$ respectively. The meta learning tasks are constructed by sampling from the train and test subsets. Each sampled task $\tau_i$ consists of a support set $\mathcal{S}_i$ and a query set $\mathcal{Q}_i$. The model is optimized on the support set, and its performance on the query set is used to optimize the metatrainer. Finally, the model is evaluated on $\mathcal{D}_{test}$ by training on each support set and testing on its corresponding query set.
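    To make the episodic setup concrete, the following minimal sketch samples one N-way-K-shot episode from an array of integer labels. It is an illustration of the protocol above, not code from the book's GitHub repository; the function and argument names are ours.

        import numpy as np

        def sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=None):
            # Sample an N-way-K-shot episode: N classes, each with K support
            # and n_query query examples; returns indices into the dataset.
            rng = rng or np.random.default_rng()
            classes = rng.choice(np.unique(labels), size=n_way, replace=False)
            support, query = [], []
            for c in classes:
                idx = rng.permutation(np.where(labels == c)[0])
                support.extend(idx[:k_shot])
                query.extend(idx[k_shot:k_shot + n_query])
            return np.array(support), np.array(query), classes

    At each meta-training step, one such episode is drawn; the model is adapted on the support indices, and its loss on the query indices drives the meta-update.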

    1.3 Background

    Earlier meta learning works [1,44,45,17] aim at learning update functions or learning rules. Recent works in meta learning are more diverse in their usage of prior knowledge and are generally grouped into the following categories [46]:

    1.  Metric learning, which takes the similarity or dissimilarity of classes into account

    2.  Optimization-based, which optimizes the training task using a learning algorithm

    3.  Model-based, which is based on the architecture design of the model

    In this section, we go through these categories and the well-known works on each topic.

    1.3.1 Metric learning

    Metric learning is based on a metric that measures the similarities or dissimilarities of the data. In this area, contrastive learning focuses on learning from the similarity of data across different classes. The goal is to learn separated and disentangled embeddings for the different classes in the training tasks. This makes it possible to separate unseen classes with only a few examples.

    The first work in this category is the Siamese Network with a contrastive loss, first proposed in [47]. Pairs of randomly selected images are passed to two identical networks sharing the same parameters θ; the distance between the embeddings is minimized if the pair comes from the same class and pushed beyond a margin otherwise. A one-shot learning approach using the Siamese architecture has been proposed in [48]. The contrastive loss is shown in Eq. (1.1):

    (1.1)  $L(x_1, x_2, y) = \frac{y}{2}\, E_\theta(x_1, x_2)^2 + \frac{1 - y}{2}\, \max\big(0,\; m - E_\theta(x_1, x_2)\big)^2$

    (1.2)  $E_\theta(x_1, x_2) = \lVert f_\theta(x_1) - f_\theta(x_2) \rVert_1$

    where Eq. (1.2) defines the energy function, which computes the L1 distance between the embeddings of the data pair using the embedding function $f_\theta$. $y$ denotes a binary flag that is 0 for a negative pair and 1 for a positive pair, the network parameters are denoted by $\theta$, and $m$ is a constant margin.
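    As a minimal PyTorch sketch of Eqs. (1.1)-(1.2), assuming f is the shared Siamese embedding network (the function names are ours):

        import torch

        def energy(f, x1, x2):
            # Eq. (1.2): L1 distance between the embeddings of a pair.
            return (f(x1) - f(x2)).abs().sum(dim=-1)

        def contrastive_loss(f, x1, x2, y, margin=1.0):
            # Eq. (1.1): pull positive pairs (y = 1) together and push
            # negative pairs (y = 0) at least `margin` apart.
            e = energy(f, x1, x2)
            pos = y * e.pow(2)
            neg = (1 - y) * torch.clamp(margin - e, min=0).pow(2)
            return 0.5 * (pos + neg).mean()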

    The triplet loss [49] in Eq. (1.3) brings in an anchor data point $x^a$. The similarity between the anchor and positive embeddings is maximized, while the negative embeddings are pushed away based on their distance to the anchor point $x^a$:

    (1.3)  $L(x^a, x^+, x^-) = \max\big(0,\; d(f(x^a), f(x^+)) - d(f(x^a), f(x^-)) + m\big)$

    where $x^+$, $x^-$, $d$, and $f$ are the positive data point, the negative data point, the distance metric, and the embedding function, respectively.
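    A corresponding sketch of Eq. (1.3), with squared Euclidean distance as a common but here assumed choice for d:

        import torch

        def triplet_loss(f, x_a, x_p, x_n, margin=1.0):
            # Eq. (1.3): the anchor-positive distance should be smaller than
            # the anchor-negative distance by at least `margin`.
            d_pos = (f(x_a) - f(x_p)).pow(2).sum(dim=-1)
            d_neg = (f(x_a) - f(x_n)).pow(2).sum(dim=-1)
            return torch.clamp(d_pos - d_neg + margin, min=0).mean()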

    Triplet networks [13] on the other hand, are similar to Siamese Network. Instead of a pair of networks sharing the same weights, triplet networks have three instances of the same network with a similar objective to the triplet loss. Fig. 1.3 shows a simple visualization of the triplet networks.

    Figure 1.3 In metric learning approaches, the goal is to minimize the distance in the latent space for the images belonging to the same class (positive pair) and to maximize the distance between a negative pair (from different class categories).

    Matching Networks [14] rely on an attention mechanism applied over the learned embeddings of the support set to predict the classes of the points in the query set. The class of a query point is predicted as the sum of the support labels weighted by an attention kernel. In its simplest form, the attention kernel for two data points is defined as the cosine similarity of their corresponding embeddings.

    (1.4)  $a(\hat{x}, x_i) = \dfrac{\exp\big(\cos(f(\hat{x}), g(x_i))\big)}{\sum_{j=1}^{k} \exp\big(\cos(f(\hat{x}), g(x_j))\big)}$

    Eq. (1.4) shows the attention kernel, where $\hat{x}$ is a test example, and $f$ and $g$ are embedding functions that could potentially be equal, i.e., $f = g$. The embedding functions can be a simple neural network or, for complex scenarios, an LSTM (Long Short-Term Memory [50]). The final objective is:

    (1.5)  $\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$

    where $a$ is the attention kernel from Eq. (1.4) and $\hat{x}$ is the test data sample.
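    A compact sketch of Eqs. (1.4)-(1.5) in PyTorch, assuming f and g are embedding modules and y_support holds integer class labels (the names are ours):

        import torch
        import torch.nn.functional as F

        def matching_predict(f, g, x_query, x_support, y_support):
            # Eq. (1.4): softmax over cosine similarities between each query
            # embedding and the support embeddings (attention kernel).
            e_q = F.normalize(f(x_query), dim=-1)        # (q, d)
            e_s = F.normalize(g(x_support), dim=-1)      # (k, d)
            attn = F.softmax(e_q @ e_s.t(), dim=-1)      # (q, k)
            # Eq. (1.5): attention-weighted sum of one-hot support labels.
            return attn @ F.one_hot(y_support).float()   # (q, n_classes)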

    Prototypical Networks [16] are proposed for the problem of few-shot classification. An embedding function $f_\phi$ is used to encode each input into its corresponding features. For each class $k$ in the training set, a prototype feature vector $c_k$ is defined by averaging the embeddings of the data samples of that class in the support set $S_k$:

    (1.6)  $c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)$

    The classification is done by computing a softmax over the Euclidean distances to the prototype vectors of the classes:

    (1.7)  $p_\phi(y = k \mid x) = \dfrac{\exp\big(-d(f_\phi(x), c_k)\big)}{\sum_{k'} \exp\big(-d(f_\phi(x), c_{k'})\big)}$

    where $k$ is the ground-truth class and $d$ is the distance function. This method has been shown to perform well in zero-shot learning as well.
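    The following sketch implements Eqs. (1.6)-(1.7) for one episode, assuming f is the embedding module and y_support holds labels in {0, ..., n_way - 1}:

        import torch

        def proto_classify(f, x_query, x_support, y_support, n_way):
            # Eq. (1.6): one prototype per class, the mean support embedding.
            emb = f(x_support)                                   # (n*k, d)
            protos = torch.stack([emb[y_support == k].mean(dim=0)
                                  for k in range(n_way)])        # (n_way, d)
            # Eq. (1.7): softmax over negative Euclidean distances.
            dists = torch.cdist(f(x_query), protos)              # (q, n_way)
            return torch.softmax(-dists, dim=-1)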

    Similar to [48], Relation Network [15] is based on an embedding module and a similarity measure. The Relation Net consists of two modules: 1. The embedding network, generating feature representation from the input 2. A relation module, computing a relation score based on the similarity of its inputs. The support and query data points are passed to the embedding module to acquire their corresponding feature embeddings. Finally, for the few-shot classification task, the embeddings from each support data sample are concatenated with the embeddings from the query sample and passed to the relation module to classify whether the support and query samples are from the same class or not, by producing a similarity score between 0 and 1.

    (1.8)  $r_{i,j} = g_\phi\big(\mathcal{C}(f_\varphi(x_i), f_\varphi(x_j))\big)$

    Eq. (1.8) defines the relation score $r_{i,j}$ between the data points $x_i$ and $x_j$, where $g_\phi$ is the relation module and $\mathcal{C}$ denotes concatenation of the feature embeddings. The objective function is the mean square error (MSE):

    (1.9)  $L = \sum_{i=1}^{m} \sum_{j=1}^{n} \big(r_{i,j} - \mathbf{1}(y_i = y_j)\big)^2$
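    A short sketch of Eqs. (1.8)-(1.9), assuming f is the embedding module and g a small MLP mapping each concatenated pair to a sigmoid score (names and shapes are ours):

        import torch

        def relation_scores(f, g, x_support, x_query):
            # Eq. (1.8): concatenate every support/query embedding pair and
            # let the relation module g map each pair to a score in [0, 1].
            e_s, e_q = f(x_support), f(x_query)            # (n, d), (q, d)
            pairs = torch.cat([e_s.unsqueeze(1).expand(-1, e_q.size(0), -1),
                               e_q.unsqueeze(0).expand(e_s.size(0), -1, -1)],
                              dim=-1)                      # (n, q, 2d)
            return g(pairs).squeeze(-1)                    # (n, q)

        def relation_loss(scores, same_class):
            # Eq. (1.9): MSE between the scores and the indicator 1{y_i = y_j};
            # same_class is a boolean (n, q) matrix of label matches.
            return ((scores - same_class.float()) ** 2).sum()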

    1.3.2 Optimization-based learning

    The optimization-based methods focus on the fast adaptation of models to newly seen data: the optimization steps are modified towards gradients that support fast adaptation and better generalization, given only a few examples of data.

    One of the well-known early works in meta learning is Optimization as a Model for Few-shot Learning [17]. The learner M, parameterized by θ, is the primary neural network classifier and is trained in the few-shot setting by an LSTM-based metalearner network R. The metalearner learns the update rule for training the learner, and the parameters of the learner are assigned to the cell state of the LSTM. The method is formulated as a bi-level optimization, where the first level focuses on quick learning within each separate task and the second level on slower learning across all tasks.

    (1.10)  $\mathcal{L}_\tau = \frac{1}{|\mathcal{D}_\tau|} \sum_{(x, y) \in \mathcal{D}_\tau} -\log p_\theta(y \mid x)$

    where $-\log p_\theta(y \mid x)$ is the negative log-probability assigned by M to the correct class, so that $\mathcal{L}_\tau$, its average over the task data $\mathcal{D}_\tau$, is the task-specific loss optimized by the metalearner.

    Another work using reinforcement learning for better convergence is Learning to Optimize [18]. Here, the problem of learning an optimization algorithm is formulated as finding an optimal policy: algorithms that converge quickly are rewarded, those that do not converge are penalized, and the resulting policy search is solved using reinforcement learning.

    Another approach in meta learning is automating the update rule design for optimization methods. [19] tackles this problem by using a recurrent neural network (RNN) as a controller for generating an update function for the optimizer. The controller's objective is to train the primary model optimally. The controller is optimized using reinforcement learning for maximizing the accuracy of the primary model. In this work, two new update rules, namely PowerSign and AddSign, and a new learning rate annealing scheme, linear cosine decay, are presented, which are discovered by the proposed method.

    MAML [20], or model-agnostic meta learning, is the first of the recent optimization-based works on meta learning for few-shot learning. Unlike the previously mentioned works, MAML focuses on optimizing the model weights themselves for fast adaptation. MAML performs the optimization in two steps, an inner loop and an outer loop, which makes it a second-order optimization problem. To overcome the computational cost of second-order optimization, Reptile [21] and First-order MAML (FOMAML) [20] have been proposed as first-order approximations of MAML that are simpler to train. A visual comparison of MAML and Reptile is presented in Fig. 1.4.

    Figure 1.4 MAML [20] and Reptile [21] are examples of optimization-based meta learning. While MAML uses second-order derivatives for model optimization, Reptile simplifies this by averaging the model parameters over the different tasks.
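    A condensed sketch of one MAML outer step, assuming PyTorch 2.x (torch.func) and a single inner gradient step; with first_order=True it reduces to FOMAML, and the function and argument names are ours. Reptile would instead move the meta-parameters toward the task-adapted parameters.

        import torch

        def maml_outer_step(model, loss_fn, tasks, inner_lr=0.01,
                            first_order=False):
            # tasks: iterable of ((x_support, y_support), (x_query, y_query)).
            meta_loss = 0.0
            for (x_s, y_s), (x_q, y_q) in tasks:
                params = dict(model.named_parameters())
                # Inner loop: one adaptation step on the support set.
                loss = loss_fn(torch.func.functional_call(model, params, x_s), y_s)
                grads = torch.autograd.grad(loss, list(params.values()),
                                            create_graph=not first_order)
                adapted = {k: p - inner_lr * g
                           for (k, p), g in zip(params.items(), grads)}
                # Outer objective: post-adaptation loss on the query set.
                meta_loss = meta_loss + loss_fn(
                    torch.func.functional_call(model, adapted, x_q), y_q)
            return meta_loss  # call .backward() and step the meta-optimizer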

    1.3.3 Model-based learning

    Model-based learning is a class of meta learning frameworks with models that are specifically designed for fast learning.

    Meta learning with Memory-Augmented Neural Networks [22] (MANN) is one of the initial works in meta learning, proposed as a solution to the one-shot learning problem. In this work, a Neural Turing Machine (NTM) allows the model to encode and retrieve new information quickly.

    In Metanetworks [23], the architecture is designed for fast adaptation through fast and slow weight layers. The fast weights are produced by processing the metainformation from gradients with an LSTM network. The method has two objectives: 1. the embedding loss, which applies the contrastive loss, in a manner similar to Eq. (1.1), to the output of a representation module and the support label y, where the representation module takes slow weights as input and produces fast weights; 2. the task loss, which optimizes the base learner using the slow weights.

    Simple Neural AttentIve Learner (SNAIL) [24] formalizes meta learning as a sequence-to-sequence problem. The main idea of this work is the combination of temporal convolution layers with causal attention layers. This allows the metalearner to aggregate contextual information from past experience.

    1.4 Task construction in meta learning

    The construction of tasks is an essential part of meta learning since it directly affects the optimization process. Generally, the tasks are defined from randomly selected classes with no overlap, i.e., n distinct classes are chosen for the support set, and m classes with no overlap with the support set are chosen for the query set. Commonly, the tasks are hand-designed [51]; however, some works focus on automating the task construction problem, and we discuss them in this section.

    An unsupervised task design approach for few-shot medical image classification has been proposed in [3]. They apply deep clustering methods to the dataset and get pairs of data samples from different clusters (e.g., data from clusters 1 and 3 or 2 and 5). Then, the network is optimized for the classification problem given the clustered data pairs. They show that the proposed task design can outperform other methods for the breast cancer classification task. Similarly, [4] performs meta learning by hierarchical clustering of the tasks.

    An effective way of task selection has been explored in [5]. They demonstrate that if the training tasks are selected appropriately, learning can be faster and more effective, yielding better performance on the test tasks. To select the tasks, they take the difference and relevance of the training tasks into account. This is performed in the reinforcement learning setting.

    Here, the difference between two tasks is the average KL divergence of their respective policies over the states of the validation tasks, and the relevance of task $\tau_i$ to task $\tau_j$ is the expected difference in the entropy of the policies before and after learning, over the states of the validation tasks with regard to the on-policy distribution. The on-policy method aims at improving the policy that is used for action selection.

    A probabilistic task modeling for meta learning is first proposed in [6]. The probabilistic modeling makes it possible to quantitatively measure task uncertainty for the goal of active task selection in meta learning. Initially, a variational autoencoder (VAE) [52] is adapted to reduce the data dimensionality and acquire the feature embeddings. These inferred embeddings are employed to model each task as a mixture of Gaussians using LDA (Latent Dirichlet Allocation) [53]. In contrast to the standard LDA, the embedding space of VAE is continuous; therefore, the categorical word-topic distributions in LDA are replaced by Gaussian task-theme distributions.

    1.5 Representation learning in meta learning

    This section focuses on studies that attempt to learn better representations or distributions using meta learning or for meta learning.

    The PAC-Bayes framework [54], in which PAC stands for probably approximately correct, provides generalization error bounds for i.i.d. (independent and identically distributed) data. It was initially used in [55] for lifelong learning, a setting very similar to meta learning. The framework permits a quantitative assessment of the quality of transferred information by comparing the expected loss on a future learning task to the average loss on the observed tasks. The agent is forced to identify prior knowledge in the observed tasks that improves performance on new, unobserved tasks. This framework is intended for solving single-task problems. [56] explore the same problem in meta learning for deep neural networks and provide a tighter bound in the PAC-Bayes setting by taking the union of multiple single-task bounds. Nguyen et al. [57] assert that generalization errors have not been well investigated for unseen tasks, resulting in limited generalization guarantees. Furthermore, they argue that variational functions may not correctly represent the underlying distributions; therefore, they model both the prior and posterior distributions implicitly using a deep generator network. The proposed approach does not use the KL divergence term directly; instead, it estimates it with a probabilistic classification approach. Using this new estimation, they provide a tighter bound in the PAC-Bayes framework.

    In an attempt at semisupervised few-shot learning, [7,58] build on top of Prototypical Networks [16]. A prototypical random-walk semisupervised loss (PRW) is proposed for learning compact and well-separated representations through a similarity graph between the prototypes and the embeddings of the unlabeled points. The random-walk matrix T is defined such that each entry denotes the probability of a walk starting at prototype i and ending at prototype j; the probabilities of the walker returning to its starting prototype therefore lie on the diagonal of T. Since the objective for the walker is to return to the prototype it started from, the loss aims at maximizing the probabilities on the diagonal of T. The walker loss is computed as the cross-entropy between the matrix T and the identity matrix I. As it is preferable for the walker to visit as many unlabeled points as possible, the visitor loss computes the overall probability that each point is visited when walking from the prototypes to the points. The final PRW loss is the sum of the visitor and walker losses. The proposed method is robust to labeled/unlabeled class-distribution mismatch owing to its resistance to unlabeled data that does not belong to any of the training classes: the random walker avoids such distractor points rather than attracting them to the class prototypes.
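    A sketch in the spirit of the PRW loss, assuming L2-normalized prototype and unlabeled embeddings and dot-product similarity; the exact step definitions in [7,58] may differ:

        import torch
        import torch.nn.functional as F

        def prw_loss(protos, unlabeled, visit_weight=1.0, eps=1e-8):
            sim = protos @ unlabeled.t()           # (P, U) similarity graph
            p_pu = F.softmax(sim, dim=1)           # step: prototype -> point
            p_up = F.softmax(sim.t(), dim=1)       # step: point -> prototype
            T = p_pu @ p_up                        # (P, P) round-trip matrix
            # Walker loss: cross-entropy between T and the identity, i.e.,
            # the walk should return to the prototype it started from.
            walker = -(torch.log(T.diagonal() + eps)).mean()
            # Visitor loss: every unlabeled point should be visited with
            # roughly uniform probability.
            visit = p_pu.mean(dim=0)               # (U,)
            visitor = -(torch.log(visit + eps)).mean()
            return walker + visit_weight * visitor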

    Variational Agnostic Modeling that Performs Inference for Robust Estimation (VAMPIRE) [8] introduces uncertainty into MAML using variational inference. VAMPIRE learns a probability distribution over model parameters before few-shot learning by approximating the task-specific parameters with a variational distribution. In contrast to MAML, which keeps point estimates of the parameters, the goal here is to maintain probability distributions over the metaparameters for more robust learning.

    1.6 Unsupervised / self-supervised meta learning

    Although meta learning makes it easier to learn from few data, some labeled data still needs to exist for pretraining. To overcome this issue, several approaches have been proposed that use unlabeled data in the pretraining step and a few labeled examples in the fine-tuning step.

    The first method in this line of work is unsupervised learning via meta learning [9], or CACTUS. CACTUS constructs tasks from unlabeled data automatically by clustering the feature embeddings of the data points and using the cluster IDs as pseudo-labels for the metatraining step, as shown in Fig. 1.5. It is shown that even a simple task construction method, such as clustering the embeddings, leads to performance improvements on different sets of tasks. Different meta learning approaches, such as MAML [20] and Prototypical Networks [16], are applied in the metatraining step to evaluate the few-shot classification performance.

    Figure 1.5 Few-shot learning approaches require data from a set of tasks with label information. This is solved in unsupervised meta learning approaches [9, 10] using pseudo-labels generated by clustering or data augmentation. The generated tasks from unlabeled data are later used for standard few-shot learning models such as MAML [20], and Prototypical Networks [16].
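    A CACTUS-style sketch of the pseudo-label step using scikit-learn (the hyperparameters are illustrative):

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_pseudo_labels(embeddings, n_clusters=64):
            # Cluster unlabeled feature embeddings; the cluster IDs act as
            # pseudo-labels from which N-way-K-shot episodes are sampled
            # exactly as in the supervised case.
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

    The episode sampler sketched in Section 1.2 can then be applied to these pseudo-labels unchanged.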

    On the same topic, Khodadadeh et al. propose UMTRA (Unsupervised Meta learning for Few-shot Image Classification) [10]. Instead of using cluster IDs as pseudo labels as presented in CACTUS, UMTRA takes each data sample and applies different augmentations to the data sample. The task is constructed by assigning the same label to each data sample and its augmentations. They use the AutoAugment [59] approach for augmentation of images on mini-Imagenet and random augmentation such as random flip or crop on Omniglot. Similar to CACTUS, UMTRA uses its proposed task construction method with different meta learning frameworks.
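    A minimal sketch of UMTRA-style task construction, where augment stands for an assumed augmentation function such as a random flip or crop (the names are ours):

        import numpy as np

        def umtra_task(images, augment, n_way=5, rng=None):
            # Pick N random images and give each its own pseudo-class: the
            # image itself forms the support set, an augmented copy the query.
            rng = rng or np.random.default_rng()
            idx = rng.choice(len(images), size=n_way, replace=False)
            support = [images[i] for i in idx]
            query = [augment(images[i]) for i in idx]
            labels = np.arange(n_way)
            return support, query, labels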

    In follow-up work, LASIUM (Latent-Space Interpolation in Unsupervised Meta learning) [11] focuses on automatic task construction using generative models. LASIUM is based on sampling data from generative models, such as GANs or VAEs, by modeling the data distribution and interpolating the latent space of the generative model for generating new data samples. The generated samples are then used for constructing tasks for pretraining.

    Another direction towards less supervision is self-supervised learning (SSL). It is based on employing information in the underlying structure of the data for pretraining a model on a pretext task that is later used for a downstream task. A related SSL approach to meta learning is SSL for few-shot segmentation [12]. This work has three main components: 1. An adaptive tuning mechanism, 2. A self-supervised base, 3. A metalearner.

    Initially, the features of the input support and query images are obtained from a Siamese Network [48] with shared parameters. The distribution of the support features is tuned by the self-supervised module (SSM) using the underlying semantic information of the data. Then the similarity between the tuned feature map and the query features is measured using a deep nonlinear metric inspired by Relation Network [15]. The similarity metric is used to determine the regions of interest in the query image. Finally, a segmentation decoder network produces and refines the segmentation map at the original image size. The self-supervised module operates by duplicating the support feature maps, multiplying one copy by the corresponding segmentation mask, and applying a cross-entropy loss between the masked and unmasked feature maps; the gradients from this cross-entropy loss are used to update the SSM. The optimization-based metalearner inspired by [15] is employed as the outer loop of the proposed framework for refining the segmentation results.

    1.7 Meta learning applications

    In this section, we explore some of the typical applications of meta learning, such as few-shot segmentation and few-shot image generation. After that, we go through other, less common applications.

    1.7.1 Segmentation

    Semantic segmentation is an essential task in the medical field for identifying different diseases or detecting anomalies in organs and pathology. Deep neural networks have achieved outstanding performance in semantic segmentation, but they usually need large amounts of labeled data. Hence, few-shot learning is a natural solution to this problem, learning new classes from only a few annotated samples. U-Net [60] is a commonly used architecture for 2D semantic segmentation using pairs of images and segmentation labels; V-Net [61] is similar to U-Net but operates on 3D volumes.

    In this section, we list a number of the recent works in few-shot segmentation for medical applications.

    Squeeze & excite guided few-shot segmentation of volumetric images [25] One of the challenges in few-shot medical image segmentation is the scarcity of pretrained models compared to computer vision tasks and the difficulty of training on volumetric data. Roy et al. propose a few-shot learning framework [25] for segmentation of volumetric medical images with few annotated slices. Their proposed method enables stable training without relying on a pretrained model, which is usually unavailable for medical data. The proposed method consists of three building blocks:

    1.  A conditioner arm

    2.  Interaction blocks with their proposed ‘channel squeeze & spatial excitation’ (sSE) modules

    3.  A segmenter arm

    The conditioner and segmenter arms are both encoder-decoder-based architectures with a few differences. The segmentation is performed by matching a few slices of the support volume to all the slices of the query volume. A task-specific representation is generated from the annotated support input using the conditioner arm. This task-specific representation models how a new semantic class looks in the image. The generated representation is fed to the segmenter arm for segmenting the new query image using the interaction blocks.

    One of the main contributions of this work is the sSE module in the interaction block, which is based on the squeeze-and-excite blocks initially proposed in [62,63]. The goal of the sSE module is efficient interaction between the conditioner and segmenter arms. sSE is a lightweight module with low computational complexity that also improves gradient flow. These blocks are used between all the encoder, bottleneck, and decoder blocks of the network. sSE performs a 'channel squeeze' on the learned representation obtained from the conditioner block to learn a spatial map. Subsequently, the learned spatial map performs a 'spatial excitation' on the segmenter feature map.
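    A minimal PyTorch sketch of the sSE idea; the class name and exact wiring are ours, assuming a 1x1 convolution for the channel squeeze and an element-wise product for the spatial excitation:

        import torch
        import torch.nn as nn

        class SpatialExcitation(nn.Module):
            # 'Channel squeeze' of the conditioner features into a single-
            # channel spatial map, then 'spatial excitation' of the segmenter
            # features by that map.
            def __init__(self, cond_channels):
                super().__init__()
                self.squeeze = nn.Conv2d(cond_channels, 1, kernel_size=1)

            def forward(self, cond_feat, seg_feat):
                spatial_map = torch.sigmoid(self.squeeze(cond_feat))  # (B,1,H,W)
                return seg_feat * spatial_map  # broadcast over channels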

    The models are trained using 2D images and tested with 3D volume queries; hence, the support set needs to be selected from a diverse set of annotated slices. The proposed method is evaluated for organ segmentation on whole-body contrast-enhanced CT scans from the Visceral Dataset [64].

    Learn to segment organs with a few bounding boxes [26] This work [26] proposes a method for 3D medical segmentation in a low-data setting. The method has two main modules: 1. a prototype learner and 2. a segmenter. The prototype learner takes an input image and its corresponding label and produces a prototype. The prototype, along with the query images, is then fed to the segmenter model to predict segmentation maps. The prototype learner is optimized using a nearest-neighbor loss, while the segmenter is trained using weighted cross-entropy. They also propose a semisupervised approach that provides weak supervision from bounding boxes, as these are easier to acquire than semantic segmentation labels. The method has been evaluated on the Visceral dataset [64].

    Differentiable meta learning model for few-shot semantic segmentation [27] Most methods in semantic segmentation target the single-object-per-image (1-way) segmentation problem; nevertheless, this is not a realistic scenario. Tian et al. [27] propose MetaSegNet, a meta learning-based method for K-way multiobject few-shot semantic segmentation. Here, the few-shot segmentation task is formulated as a pixel-classification problem. They propose a differentiable metalearner based on ridge regression, together with an embedding network composed of two main components: 1. a feature extractor and 2. a feature fusion module. The feature extractor provides local and global features for both support and query sets; it is a slightly modified ResNet-9 with two branches, one for local and one for global context extraction, trained from scratch. The extracted global and local features are then passed to the feature fusion module, which fuses them by concatenating the local feature map and the unpooled global feature for better prediction of each pixel. After applying the L2 norm in each channel for normalization, the combined feature map is reshaped for pixel-wise classification. This approach can be applied to the multiorgan segmentation problem and similar topics.

    Few-shot microscopy image cell segmentation [28] Cell segmentation in microscopy is a challenging task because deep neural networks require large amounts of labeled data, which are rarely available. Meta learning makes it possible to obtain a well-generalized model by pretraining on a large amount of data from other domains and fine-tuning on the limited labeled data from the target domain. [28] addresses this issue by proposing a meta learning-based algorithm for cell segmentation. They employ data from different domains with different image appearances and cell types and transfer the knowledge to the target domain. They use Reptile [21] as the meta learning framework due to its simplicity of optimization compared to previous works. They implement two encoder-decoder architectures for the segmentation task: 1. a fully convolutional regression network (FCRN) [65] and 2. U-Net. They slightly modify FCRN by replacing the bilinear upsampling layers with transposed convolution layers and replacing the heat-map predictions with sigmoid activations for better performance. They propose three loss terms for metatraining: 1. a standard binary cross-entropy loss for segmentation; 2. an entropy regularization that moves the segmentation results away from the classification boundary; 3. a distillation loss that enforces the learning of a common feature representation across different tasks.

    SML: semantic meta learning for few-shot semantic segmentation [31] Incorporating semantic information in the few-shot setting can be extremely beneficial. SML [31] proposes combining visual features with semantic information to improve the class prototypes in prototypical learning [16]. Inspired by [66], which obtains better class prototypes by feeding class names to the base learner, the semantic information is acquired from existing pretrained language models such as Word2Vec [67] or FastText [68] by obtaining attribute vectors for the class names. Using these attribute vectors, the image pixels in the support and query sets that belong to the same class are enforced to have the same semantic representation. SML has two main modules: 1. a feature extractor and 2. an attribute injector. Given the support images of each episode, visual features are obtained from the feature extractor, a VGG-16 or ResNet-50 network pretrained on ImageNet. The background and foreground regions of the image, obtained from the segmentation masks, are used to compute the average foreground and background feature embeddings for each support image. These embeddings, combined with the semantic information that the attribute injector derives from the class names of the support set, are then incorporated into the base learner. The incorporation of the semantic information into the embeddings is done by ridge regression, which effectively combines the class-level semantic information with the information from multiple images for prototype computation. After metatraining, the final segmentation map is predicted using the class prototypes and the background features. Although this work is not on medical applications, it could be used to process semantic information from medical reports and employ it for medical image segmentation.

    Few-shot segmentation of medical images based on meta learning with implicit gradients [32] In a work [32] proposed by Khadga et al., implicit MAML (iMAML) [69] combined with the attention U-Net [70] is used for medical image segmentation. iMAML removes the need to differentiate through the optimization path by using an iterative optimization solver. The attention U-Net is metatrained on two datasets, and the metatrained network is then fine-tuned on an unseen dataset for few-shot segmentation. Conjugate gradients are used to compute the metagradients in the metatraining steps of iMAML. The method is applied to skin and polyp data for identifying cancerous samples.

    Segmentation in style [30] Unsupervised segmentation can be performed by discovering knowledge from unsupervised models, such as Generative Adversarial Networks [71] or as done in [30] using a trained StyleGAN V2 [72] model. In this work, unsupervised segmentation is performed by clustering the latent space of the StyleGAN model into the number of segmentation classes. Some rare classes such as beard, hat, or glasses are not segmented by the model since they are not normally assigned to a specific cluster. Therefore, the data is augmented by manipulating the latent space using CLIP [73] and generating more samples of the rare classes. This approach can outperform semisupervised methods and is on par with fully supervised ones.

    MetaMedSeg: volumetric meta learning for few-shot organ segmentation [29] In our recent work on few-shot organ segmentation, we propose a meta learning framework that learns from organs with abundant labeled data to obtain a model for organs with limited labeled data. The model segments 2D slices extracted from 3D volumes. The main contributions of this work are: 1. a volumetric task definition, where the images in a task are sampled from the same volume; 2. an inverse distance weighting scheme for model parameter aggregation, designed for the high cross-domain shift between the organs used for metatraining and the organs used for testing (a simplified sketch follows below). An overview of this work is shown in Fig. 1.6.

    Figure 1.6 Segmentation of medical images with limited labeled data is a challenging task that can be addressed with meta learning. In MetaMedSeg [29], the model is first metatrained on a set of organs with a large amount of labeled data. Then the model is fine-tuned on the organ with limited data.
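    As a purely illustrative sketch of inverse-distance-weighted parameter aggregation (the exact weighting used in MetaMedSeg may differ, and the names are ours):

        import numpy as np

        def inverse_distance_aggregate(task_params, eps=1e-8):
            # task_params: list of flattened parameter vectors, one per task.
            # Tasks whose adapted parameters drift far from the mean receive
            # lower weight in the aggregated update.
            stacked = np.stack(task_params)                  # (T, P)
            mean = stacked.mean(axis=0)
            dist = np.linalg.norm(stacked - mean, axis=1)    # (T,)
            w = 1.0 / (dist + eps)
            w = w / w.sum()
            return (w[:, None] * stacked).sum(axis=0)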

    1.7.2 Few-shot image generation

    Another use case of meta learning is few-shot image generation. Generative Adversarial Networks (GANs) are commonly used for generating or manipulating images [74], but these networks require large amounts of data to generate high-quality images. The few-shot image generation problem is first introduced by Clouâtre and Demers in FIGR (Few-shot Image Generation with Reptile) [33], which combines the idea of GANs with meta learning using Reptile [21]. Similarly, MIGS (Meta Image Generation from Scene Graphs) [35] uses Reptile for few-shot generation of images from scene graphs, improving the image generation quality. Even though current works in few-shot image generation are not used in the medical field, this direction of research can help generate images as a form of augmentation for unbalanced datasets, e.g., benign vs. malignant [75].

    Another attempt at few-shot image generation is made in [34], where the goal is to regularize the changes of the weights during the fine-tuning step while preserving the information and diversity of the source domain and at the same time adapting to the appearance of the target domain. To preserve the essential weights during the adaptation, they use a metric for quantifying the importance of the weights, namely Elastic Weight Consolidation (EWC) [76], which evaluates the importance of each parameter by estimating its Fisher Information relative to the objective likelihood.

    1.7.3 Other applications

    Meta learning has been used for many applications in medical imaging in recent years. We go through some of these works in this section.

    1.7.3.1 Denoising

    Noisy data are ubiquitous in medical datasets; electroencephalography (EEG), for example, can be affected by many conditions, such as facial muscle movements. Deep denoising approaches have recently been explored for noise removal from images and other data modalities. Few-shot metadenoising [36] is a denoising approach based on meta learning, shown in Fig. 1.7. Given a dataset of clean and noisy data, a deep network is trained for the denoising task: the network receives a noisy data sample and is trained to output the clean one. However, this approach is prone to overfitting if the training set is small, which is the typical scenario. Therefore, [36] proposes to generate a large dataset of pairs of clean and synthetically noised samples for pretraining with a meta learning approach, namely Reptile [21]. The pretrained model is then fine-tuned on the small dataset of paired clean and real noisy samples as a few-shot learning task. Adopting a meta learning approach makes it possible to design the training tasks carefully and to improve the method further with a potentially unlimited amount of synthetic data in the training set. Unlike transfer learning, meta learning can provide a model that adapts quickly to new data even when pretrained on synthetic data. The method is evaluated on CT scan [77] and ECG [78] denoising.

    Figure 1.7 Pairs of real noisy and denoised data are difficult to obtain. However, generating synthetic noise and adding it to normal data is an easy task that provides a large number of labeled data. This information is used in [36] for few-shot denoising of medical images.
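    The synthetic-pair construction can be sketched in a few lines; Gaussian noise is used here as a stand-in, whereas [36] designs the synthetic noise per modality:

        import numpy as np

        def synthetic_denoising_pairs(clean, sigma=0.1, rng=None):
            # Corrupt clean samples with synthetic noise to obtain abundant
            # (noisy, clean) training pairs for metatraining.
            rng = rng or np.random.default_rng()
            noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
            return noisy, clean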

    1.7.3.2 Anomaly detection

    Anomaly detection in medical data is of high importance because of the unbalanced nature of diseases: most medical datasets contain many labeled samples from healthy subjects and only a few from unhealthy ones, and it is also challenging to acquire data for diseases, such as cancer, that can be considered anomalies. To this end, a few works have studied few-shot anomaly detection in medical scenarios. Standard anomaly detection methods generally intend to learn the distribution of normal images, i.e., healthy samples, and perform anomaly detection at test time by flagging outlier samples, i.e., diseased samples far from the learned distribution. This works in many cases, yet it is challenging in some medical scenarios, such as colonoscopy images with a small polyp, because standard approaches are sensitive to outliers that lie close to the inlier distribution. To overcome this issue, few-shot anomaly detection for polyp frames from colonoscopy [37] is
