1 Introduction
With the recent explosion of deep learning, signal recognition has made remarkable advances O'Shea et al. (2016); Jas et al. (2017); Song et al. (2020); Dong et al. (2021a). To achieve these results, a large volume of data is required to obtain satisfactory performance. However, deep learning models trained with traditional supervised learning methods often perform poorly or even fail when only a small amount of data is available or when they need to adapt to unseen or time-varying tasks. In practical signal recognition tasks, collecting and annotating abundant data is notoriously expensive, especially for rare but important signals. Another critical challenge is the presence of noise: signal data varies across signal-to-noise ratios (SNRs), and in real-world scenarios deep neural networks (DNNs) have to adapt to real-time variations in SNR.
Meta-learning techniques Finn et al. (2017, 2018); Yoon et al. (2018); Zhang et al. (2018); Balaji et al. (2018) seek to resolve the above challenges by learning how to learn, as humans do. Humans can effectively utilize prior knowledge and experience to learn new skills rapidly from very few examples. Similarly, a meta-learner is trained on a distribution of homogeneous tasks, with the goal of learning internal features that are broadly applicable to all tasks rather than to a single task. Equipped with these sensitive internal features, the meta-learner is able to produce significant improvements in adaptation ability via fine-tuning. Recently, meta-learning has demonstrated promising performance in many fields Liu et al. (2019); Khodadadeh et al. (2019); Xie et al. (2019); Alet et al. (2019); Jerfel et al. (2019); Khodak et al. (2019); Zhou et al. (2019); Rajeswaran et al. (2019); Zhuang et al. (2020); Kong et al. (2020); Denevi et al. (2020); Chen et al. (2020); Yao et al. (2020); Baik et al. (2020); Sitzmann et al. (2020); Harrison et al. (2020); Ji et al. (2020); Boutilier et al. (2020); Confavreux et al. (2020); Goldblum et al. (2020). Please see the supplementary material for detailed related work in Section G. However, in some particular fields, especially signal recognition, existing meta-learning methods generally neglect the prior knowledge of the signals, i.e., temporal information and complex-domain information. For models with insufficient training data, it is crucial to incorporate this prior knowledge.
As such, we take into account the attention mechanism Vaswani et al. (2017) and the complex-valued neural network Trabelsi et al. (2018a); Hirose (2012); Tu et al. (2020) for signal recognition. Attention mechanisms have been widely adopted in many time-series learning tasks, such as natural language processing. Attention became an integral component of recurrent neural networks (RNNs), including long short-term memory Hochreiter and Schmidhuber (1997) and gated recurrent Chung et al. (2014) networks, until the Transformer Vaswani et al. (2017) was proposed. Since then, self-attention has been able to replace RNNs with better performance and parallel computation. Therefore, we adapt the attention mechanism to the signal recognition task. Since signals contain both magnitude and phase, complex numbers are used to represent signals, and complex arithmetic operations are an essential part of signal processing. Intuitively, complex-valued neural networks should be built to address the signal recognition problem. However, to the best of our knowledge, a meta-learning method equipped with attention mechanisms in complex-valued neural networks has not been investigated.

In this paper, we propose a Complex-valued Attentional MEta Learner (CAMEL) for few-shot signal recognition, which generalizes meta-learning and attention to the complex domain. With the help of these novel designs, CAMEL succeeds in capturing more information from the signal data, and this prior knowledge assists CAMEL in preventing overfitting and improving its performance. For a better understanding of the proposed architecture, an overview of CAMEL is illustrated in Figure 1. Notice that CAMEL can be applied to any kind of complex-valued data. Compared with existing meta-learning and few-shot learning methods in extensive experiments, the proposed method shows consistently better performance than the state-of-the-art methods. The effectiveness of each novel component in CAMEL is verified via ablation studies. The convergence analysis of complex-valued MAML shows that CAMEL is able to find an $\epsilon$-first-order stationary point for any positive $\epsilon$ after at most $\mathcal{O}(1/\epsilon^{2})$ iterations when second-order information is used.
The code of this paper will be released upon acceptance. Please see the supplementary material for the notations, detailed derivations of the lemmas, and more experimental results.
2 Motivation
Meta-learning is one of the techniques best suited to signal recognition problems because, in the real world, signal annotation is expensive and models need to adapt to changing SNRs, whereas meta-learning has the explicit goal of fast adaptation. To further improve the effectiveness of meta-learning in signal processing applications, we incorporate prior knowledge of the signal data into the model through CAMEL, which generalizes meta-learning with attention to the complex domain so that we are able to extract complex-domain and temporal information from signal data.
However, many so-called complex-valued neural networks treat a complex number as two real numbers, i.e., the real and imaginary parts of the complex number, and design special network structures to recover complex operations using these real numbers. We refer to these special complex-valued neural networks as in-phase/quadrature complex-valued neural networks (IQ-CVNNs). Although IQ-CVNNs can deal with complex-valued problems, the networks essentially still work with real values, since IQ-CVNNs operate without defining complex derivatives and the complex chain rule in backpropagation. We refer to complex-valued neural networks that define complex derivatives and the complex chain rule as complex-derivative complex-valued neural networks (CD-CVNNs). It turns out that, compared with IQ-CVNNs, CD-CVNNs can perform complex operations with fewer parameters. To be more specific, we give the following lemmas to show the significance of CD-CVNNs compared with IQ-CVNNs with respect to time complexity.
Lemma 1
If a function $f$ is complex analytic, the time complexity of the derivative of $f$ in IQ-CVNNs is twice that of the complex derivative of $f$ in CD-CVNNs.
Lemma 2
The complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic.
As we know, the convolutional and fully connected layers are the most computationally intensive parts of a neural network. Therefore, although IQ-CVNNs have a similar effect to CD-CVNNs, they far exceed CD-CVNNs in terms of the time complexity of backpropagation. In particular, meta-learning requires second-order information of the objective function to guarantee convergence Fallah et al. (2020), which compels us to implement CD-CVNNs. The complex chain rule is the key to implementing CD-CVNNs. According to the complex chain rule, we are able to derive the outer-loop update process of CAMEL, which is different from that of MAML.
Complex-valued attention is also necessary for CAMEL to capture temporal information from signal data. However, complex-valued attention requires differentiating a mapping from the complex domain to the real domain, since the similarity coefficients fed into the softmax of the attention must be real numbers. Given the following lemma, such a complex-to-real mapping must be non-analytic, because a constant function is useless for identifying features of the data.
Lemma 3
For a function $f:\mathbb{C}^{n}\to\mathbb{R}$, $f$ is analytic if and only if $f$ is a constant function.
To the best of our knowledge, attention in the complex domain has rarely been studied.¹ Therefore, we study complex-valued attention and propose CAMEL, as presented in the next section.

¹A closely related work is Yang et al. (2020), which proposed a complex Transformer and developed attention and an encoder-decoder network operating on complex inputs. However, they utilized eight attention functions to represent complex-valued attention without considering the nonlinear components of attention, such as the softmax and activation functions.
3 CAMEL
Please see the supplementary material for the definitions of the complex derivative, the analytic function, and the Cauchy-Riemann equations in Section D.
3.1 Algorithm Design
CAMEL utilizes complex-valued neural networks and attention to provide prior knowledge, i.e., complex-domain and temporal information, to prevent overfitting during training. It resembles its namesake animal, the camel, which stores water and nutrients in its hump to ensure its survival in extreme conditions.

CAMEL updates parameters through backpropagation by the chain rule. However, the traditional chain rule does not work here, because CAMEL is non-analytic.
The chain rule for complex variables The chain rule is different when the function is non-analytic. For a non-analytic composite function $g(\mathbf{f}(\mathbf{z}))$, where $\mathbf{f}:\mathbb{C}^{n}\to\mathbb{C}^{m}$ and $g:\mathbb{C}^{m}\to\mathbb{C}$, we can apply the following chain rule:

$$\frac{\partial g(\mathbf{f}(\mathbf{z}))}{\partial \mathbf{z}} = \frac{\partial g}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{z}} + \frac{\partial g}{\partial \mathbf{f}^{*}}\frac{\partial \mathbf{f}^{*}}{\partial \mathbf{z}} \quad (1)$$

where $\mathbf{f}$ is a continuous function and $\mathbf{f}^{*}$ denotes the conjugate vector of $\mathbf{f}$. Note that if the function is analytic, the second term equals zero and (1) reduces to the usual chain rule. In the case of matrix derivatives, the chain rule can be written as:

$$\frac{\partial g(\mathbf{F}(\mathbf{Z}))}{\partial Z_{ij}} = \mathrm{Tr}\!\left[\left(\frac{\partial g}{\partial \mathbf{F}}\right)^{T}\frac{\partial \mathbf{F}}{\partial Z_{ij}} + \left(\frac{\partial g}{\partial \mathbf{F}^{*}}\right)^{T}\frac{\partial \mathbf{F}^{*}}{\partial Z_{ij}}\right] \quad (2)$$

where $g$ is non-analytic, $\mathbf{Z}$ and $\mathbf{F}(\mathbf{Z})$ are two complex matrices, and $(\cdot)^{T}$ denotes the transpose of a matrix.
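As a numerical illustration of the Wirtinger derivatives that underlie (1), the following NumPy sketch checks $\partial f/\partial z$ and $\partial f/\partial z^{*}$ for the non-analytic toy function $f(z)=z\,z^{*}=|z|^{2}$; the function, step size, and printed comparison are illustrative assumptions rather than part of CAMEL.

```python
import numpy as np

# Wirtinger derivatives via partials with respect to the real and imaginary parts:
#   df/dz  = 0.5 * (df/dx - 1j * df/dy),   df/dz* = 0.5 * (df/dx + 1j * df/dy)
def wirtinger(f, z, h=1e-6):
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # partial w.r.t. real part
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial w.r.t. imaginary part
    return 0.5 * (dfdx - 1j * dfdy), 0.5 * (dfdx + 1j * dfdy)

# Non-analytic example f(z) = z * conj(z) = |z|^2:
# analytically, df/dz = conj(z) and df/dz* = z, so the conjugate term in (1) is nonzero.
f = lambda z: z * np.conj(z)
z0 = 1.0 + 2.0j
d_z, d_zbar = wirtinger(f, z0)
print(d_z, np.conj(z0))   # ~ (1-2j) vs (1-2j)
print(d_zbar, z0)         # ~ (1+2j) vs (1+2j)
```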
Under (1) and (2), CAMEL is able to update its parameters as expected. Formally, we define the base model of CAMEL to be a complex-valued attentional neural network with meta-parameters $\boldsymbol{\theta}$. The goal is to learn a sensitive initialization $\boldsymbol{\theta}$, from which the network performs well on the $i$-th query set after a few gradient update steps on the $i$-th support set to obtain $\boldsymbol{\theta}_i'$. Here, $\mathcal{T}_i$ is a task randomly sampled from the task probability distribution $p(\mathcal{T})$. The update steps above are termed the inner-loop update process, which can be represented as:

$$\boldsymbol{\theta}_i' = \boldsymbol{\theta} - \alpha \nabla_{\boldsymbol{\theta}} \mathcal{L}^{s}_{\mathcal{T}_i}(\boldsymbol{\theta}) \quad (3)$$

where $\alpha$ is a learning rate and $\nabla_{\boldsymbol{\theta}} \mathcal{L}^{s}_{\mathcal{T}_i}(\boldsymbol{\theta})$ denotes the gradient on the support set of task $\mathcal{T}_i$. The meta-parameters $\boldsymbol{\theta}$ are trained by optimizing the performance of the adapted model. Consequently, the meta-objective is defined as follows:

$$\min_{\boldsymbol{\theta}} \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})}\!\left[ \mathcal{L}^{q}_{\mathcal{T}_i}(\boldsymbol{\theta}_i') \right] \quad (4)$$

where $\mathcal{L}^{q}_{\mathcal{T}_i}(\boldsymbol{\theta}_i')$ denotes the loss on the query set of task $\mathcal{T}_i$ after the inner-loop update process. As the underlying distribution $p(\mathcal{T})$ is unknown, evaluating the expectation on the right-hand side of (4) is often computationally prohibitive. Therefore, we minimize the objective over a batch of $B$ tasks drawn independently from $p(\mathcal{T})$, which can be expressed as:

$$\min_{\boldsymbol{\theta}} \; F(\boldsymbol{\theta}) = \frac{1}{B}\sum_{i=1}^{B} \mathcal{L}^{q}_{\mathcal{T}_i}(\boldsymbol{\theta}_i') \quad (5)$$

The optimization of the meta-objective is referred to as the outer-loop update process, which can be expressed as:

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \beta\, \nabla_{\boldsymbol{\theta}} \frac{1}{B}\sum_{i=1}^{B} \mathcal{L}^{q}_{\mathcal{T}_i}(\boldsymbol{\theta}_i') \quad (6)$$

where $\beta$ denotes the meta learning rate. Define
(7) 
Lemma 4
With respect to the complex meta-parameters $\boldsymbol{\theta}$, we have
(8) 
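To make the inner-loop update (3) and outer-loop update (6) concrete, the following is a minimal PyTorch sketch of one meta-update on complex-valued parameters. The toy linear model, the squared-magnitude loss, and the randomly generated tasks are illustrative placeholders rather than the actual CAMEL architecture; the sketch relies on PyTorch's complex autograd, which for real-valued losses returns gradients that can be used directly for gradient descent.

```python
import torch

# Toy complex-valued linear model: params are complex tensors.
def predict(params, x):
    W, b = params
    return x @ W + b                      # complex linear map

def loss_fn(params, x, y):
    # Real-valued loss on complex outputs (mean squared magnitude of the error).
    err = predict(params, x) - y
    return (err.conj() * err).real.mean()

def inner_update(params, support, alpha=0.1):
    # Inner loop (3): one gradient step on the support set.
    x_s, y_s = support
    loss = loss_fn(params, x_s, y_s)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # PyTorch's complex autograd returns a descent direction for real losses
    # (conjugate-Wirtinger convention), so a plain SGD step is valid here.
    return [p - alpha * g for p, g in zip(params, grads)]

def meta_step(params, tasks, alpha=0.1, beta=0.001):
    # Outer loop (5)-(6): average the query losses of the adapted parameters.
    meta_loss = 0.0
    for support, query in tasks:
        adapted = inner_update(params, support, alpha)
        x_q, y_q = query
        meta_loss = meta_loss + loss_fn(adapted, x_q, y_q)
    meta_loss = meta_loss / len(tasks)
    meta_grads = torch.autograd.grad(meta_loss, params)   # second-order through (3)
    with torch.no_grad():
        for p, g in zip(params, meta_grads):
            p -= beta * g
    return meta_loss.item()

# Example with random complex data (2 tasks, 5-shot support/query).
torch.manual_seed(0)
params = [torch.randn(4, 3, dtype=torch.cfloat, requires_grad=True),
          torch.zeros(3, dtype=torch.cfloat, requires_grad=True)]
make_set = lambda: (torch.randn(5, 4, dtype=torch.cfloat),
                    torch.randn(5, 3, dtype=torch.cfloat))
tasks = [(make_set(), make_set()) for _ in range(2)]
print(meta_step(params, tasks))
```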
3.2 Complex-valued Attention
Attention mechanisms are widely used in various areas of deep learning, but attention in the complex domain has rarely been addressed. A significant reason is that attention relies on the softmax function to compute similarity coefficients, which must be real rather than complex numbers. According to Lemma 3, such a complex-to-real mapping is either a constant function or a non-analytic function. Constant functions are useless and discardable in neural networks, while non-analytic functions cannot be differentiated at arbitrary points in the complex domain. As a result, we have to utilize the complex gradient vector.
Complex gradient vector If $f$ is a real-valued function of a complex vector $\mathbf{z}$, then the complex gradient vector is given by Hjørungnes (2011):

$$\nabla_{\mathbf{z}} f = 2\,\frac{\partial f}{\partial \mathbf{z}^{*}} \quad (10)$$
Complex-valued softmax function Under (10), we are able to define the generalized complex-valued softmax function as:

$$\mathrm{softmax}_{\mathbb{C}}(\mathbf{z}) = \mathrm{softmax}\big(g(\mathbf{z})\big) \quad (11)$$

where $\mathrm{softmax}(\cdot)$ denotes the softmax function in the real case and $g(\cdot)$ denotes any function that maps complex numbers to real numbers, such as $|\cdot|$ (i.e., the magnitude of the complex numbers), $\mathrm{Re}(\cdot)$, and $\mathrm{Im}(\cdot)$, etc.
Given a complex matrix $\mathbf{X}$, we can compute the complex matrices $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ using linear transformations, which are similar to complex-valued fully connected layers. Then the complex-valued attention can be written as:

$$\mathrm{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \mathrm{softmax}_{\mathbb{C}}\!\left(\frac{\mathbf{Q}\mathbf{K}^{H}}{\sqrt{d_k}}\right)\mathbf{V} \quad (12)$$

where $\mathrm{softmax}_{\mathbb{C}}$ acts on each row of the matrix and $d_k$ denotes the row dimension of $\mathbf{K}$, i.e., the scaling factor.
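A minimal PyTorch sketch of the complex-valued scaled dot-product attention in (11)-(12) is given below, using the magnitude $|\cdot|$ as the complex-to-real mapping $g$; the conjugate-transpose similarity and the tensor shapes in the toy usage are illustrative assumptions.

```python
import torch

def complex_softmax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # (11): map complex scores to real numbers with g(z) = |z|, then apply the real softmax.
    return torch.softmax(scores.abs(), dim=dim)

def complex_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    # (12): complex scaled dot-product attention.
    # Q, K, V: (seq_len, d_k) complex tensors; conj().transpose gives the conjugate transpose.
    d_k = K.shape[-1]
    scores = Q @ K.conj().transpose(-2, -1) / d_k ** 0.5
    weights = complex_softmax(scores, dim=-1)          # real attention weights
    return weights.to(V.dtype) @ V                     # complex output

# Toy usage with random complex queries/keys/values.
Q = torch.randn(10, 16, dtype=torch.cfloat)
K = torch.randn(10, 16, dtype=torch.cfloat)
V = torch.randn(10, 16, dtype=torch.cfloat)
print(complex_attention(Q, K, V).shape)   # torch.Size([10, 16])
```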
Complex-valued multi-head attention Complex-valued multi-head attention allows the model to jointly attend to information from different representation subspaces:

$$\mathrm{MultiHead}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,\mathbf{W}^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(\mathbf{Q}\mathbf{W}_i^{Q}, \mathbf{K}\mathbf{W}_i^{K}, \mathbf{V}\mathbf{W}_i^{V}) \quad (13)$$

where $\mathbf{W}_i^{Q}$, $\mathbf{W}_i^{K}$, $\mathbf{W}_i^{V}$, and $\mathbf{W}^{O}$ are complex projection matrices and $\mathrm{Concat}(\cdot)$ denotes the concatenation of the input matrices.
Complex-valued normalization Normalization, such as batch normalization Ioffe and Szegedy (2015) and layer normalization Ba et al. (2016), is an important component of neural networks; batch normalization in particular is commonly employed. However, for a complex vector, the variance that has to be computed during normalization is real. According to Lemma 3, the variance is therefore non-analytic, so in the backpropagation of complex-valued normalization we have to utilize the complex gradient vector (10). Defining $\boldsymbol{\gamma}$ as the complex scaling parameters and $\boldsymbol{\beta}$ as the complex shift parameters, the complex-valued normalization can be expressed as:

$$\mathrm{CN}(\mathbf{z}) = \boldsymbol{\gamma}\,\frac{\mathbf{z} - \mathbb{E}[\mathbf{z}]}{\sqrt{\mathrm{Var}[\mathbf{z}]}} + \boldsymbol{\beta}, \qquad \mathrm{Var}[\mathbf{z}] = \mathbb{E}\big[(\mathbf{z} - \mathbb{E}[\mathbf{z}])^{H}(\mathbf{z} - \mathbb{E}[\mathbf{z}])\big] \quad (14)$$

where $\mathbb{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ denote the expectation and variance, respectively, and $(\cdot)^{H}$ denotes the conjugate transpose of a complex vector.
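A possible PyTorch sketch of the complex-valued normalization in (14), applied per feature vector in the style of layer normalization, is shown below; the feature dimension, the small stabilizing constant, and the parameter shapes are assumptions made for illustration.

```python
import torch

def complex_layer_norm(z, gamma, beta, eps=1e-5):
    # (14): normalize a complex feature vector along its last dimension.
    mean = z.mean(dim=-1, keepdim=True)                      # complex mean E[z]
    centered = z - mean
    # Var[z] is real: the mean squared magnitude of the centered entries.
    var = (centered.conj() * centered).real.mean(dim=-1, keepdim=True)
    z_hat = centered / torch.sqrt(var + eps)                 # complex, unit power
    return gamma * z_hat + beta                              # complex scale and shift

# Toy usage: batch of 4 complex feature vectors of size 8.
z = torch.randn(4, 8, dtype=torch.cfloat)
gamma = torch.ones(8, dtype=torch.cfloat)
beta = torch.zeros(8, dtype=torch.cfloat)
print(complex_layer_norm(z, gamma, beta).shape)   # torch.Size([4, 8])
```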
Complex-valued activation function Activation functions are nonlinear, so they are rarely analytic. Most well-known activation functions, such as Sigmoid, Tanh, and ReLU Goodfellow et al. (2016), are not analytic in the complex domain. In particular, the complex Sigmoid and Tanh are unbounded, while for a complex ReLU the complex inputs cannot be compared with zero. To this end, the complex-valued activation function can be defined as:

$$\sigma_{\mathbb{C}}(\mathbf{z}) = \sigma\big(\mathrm{Re}(\mathbf{z})\big) + i\,\sigma\big(\mathrm{Im}(\mathbf{z})\big) \quad (15)$$

where $\sigma(\cdot)$ denotes the activation function in the real case. In this way, the complex Sigmoid and Tanh are bounded because their real and imaginary parts are bounded. Meanwhile, the complex ReLU can be compared with zero because the real and imaginary parts of the inputs can be compared with zero. However, since the complex-valued activation functions defined above are non-analytic in most cases, the complex chain rule is required for their derivatives.
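The split activation in (15) is straightforward to implement; a minimal PyTorch sketch (here with ReLU as the real activation $\sigma$, chosen only as an example) is:

```python
import torch

def complex_relu(z: torch.Tensor) -> torch.Tensor:
    # (15): apply the real activation to the real and imaginary parts separately.
    return torch.complex(torch.relu(z.real), torch.relu(z.imag))

z = torch.tensor([1.0 - 2.0j, -0.5 + 3.0j])
print(complex_relu(z))   # tensor([1.+0.j, 0.+3.j])
```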
Please see the supplementary material for the detailed complex-valued convolutional layer and complex-valued fully connected layer in Section F.
4 Convergence of CAMEL
In this section, we show the convergence behavior of complex-valued MAML by following the previous work Fallah et al. (2020), which proves the convergence of MAML in the real domain. To analyze complex-valued MAML, we need notions such as twice continuous differentiability, smoothness, Lipschitz continuity, and the Hessian in the complex domain. Please see the supplementary material for the detailed assumptions, lemma, and proof of Theorem 1 in Section H.
Theorem 1
Suppose that Assumptions 1–5 hold. Consider running complex-valued MAML with batch sizes chosen following the definition in Lemma 5. Then, for any $\epsilon > 0$, complex-valued MAML finds a solution $\boldsymbol{\theta}_{\epsilon}$ such that
(16) 
after running for
(17) 
iterations, where the lower-bound constant is defined in Assumption 1 and the two batch sizes denote the sizes of the support set and the query set, respectively.
The result in Theorem 1 demonstrates that, after running CAMEL for the number of iterations given in (17), we are able to find a point at which the expected gradient norm satisfies (16).
5 Experiments
We train the model on three datasets. RadioML 2016.10A O'Shea et al. (2016), a dataset with 220,000 total samples (20,000 samples for each class and 11,000 samples for each SNR), consists of 2×128-dimensional inputs X in 11 classes. The 11 classes correspond to 11 modulation types: 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM. RadioML 2016.04C O'Shea et al. (2016), a synthetic dataset generated with GNU Radio, consists of about 110 thousand signals. These samples are uniformly distributed in SNR from −20 dB to +20 dB and tagged so that we can evaluate performance on specific subsets; 2016.10A represents a cleaner and more normalized version of the 2016.04C dataset. The third dataset is SIGNAL2020.02 Dong et al. (2021b), whose data is modulated at a rate of 8 samples per symbol with 128 samples per frame, covering 20 different SNRs (even values in [2 dB, 40 dB]).

5.1 Experimental setup
CAMEL is implemented in PyTorch Paszke et al. (2019) with Python on an RTX 3090 GPU and trained using the Adam optimizer Kingma and Ba (2014). In the classification experiments on the three datasets (RadioML 2016.04C, RadioML 2016.10A, and SIGNAL2020.02), the default hyperparameters are as follows: the number of training epochs is 400,000; the meta batch size is 2; the meta-level outer learning rate is 0.001 and the task-level inner update learning rate is 0.1; the task-level inner update step is 5 and the update step for fine-tuning is 10. All of our experiments use the same hyperparameters as the default setting. We vary the support-set shot number between 1 and 5 to obtain results for the 5-way 1-shot and 5-way 5-shot cases.
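For reference, the default hyperparameters listed above can be collected into a single configuration; the dictionary keys below are illustrative names rather than those of the released code.

```python
# Default training configuration described in Section 5.1 (key names are illustrative).
config = {
    "epochs": 400_000,            # training epochs
    "meta_batch_size": 2,         # tasks per meta-update
    "outer_lr": 0.001,            # meta-level (outer-loop) learning rate
    "inner_lr": 0.1,              # task-level (inner-loop) learning rate
    "inner_steps": 5,             # inner-loop update steps during training
    "finetune_steps": 10,         # update steps for fine-tuning / evaluation
    "n_way": 5,                   # classes per task
    "k_shot": (1, 5),             # support-set shots for the two reported settings
}
```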
5.2 Our Model
First, we study the influence of adding a multi-head self-attention mechanism to this network, which can focus attention on important information. We perform multi-head attention with 8 heads. Instead of performing a single attention function with $d_{\text{model}}$-dimensional keys, values, and queries, it is found beneficial to linearly project the queries, keys, and values $h$ times with different, learned linear projections to $d_k$, $d_k$, and $d_v$ dimensions, respectively; we then perform the attention function in parallel, concatenate the outputs, and project them again to obtain the final result Dauphin and Schoenholz (2019). In our experiments, as illustrated in Table 1, the performance is much better with the addition of the multi-head attention mechanism. As the batch size increases, the performance improves while computation and time consumption increase. As a trade-off, we set the batch size to 64 when using multi-head attention. We observe that the model with the attention mechanism achieves notably higher accuracy owing to these improvements.
Table 1: Accuracy on RADIOML 2016.10A and SIGNAL2020.02 (5-way classification). ± shows 95% confidence intervals over tasks.

Method | RADIOML 2016.10A 1-shot | RADIOML 2016.10A 5-shot | SIGNAL2020.02 1-shot | SIGNAL2020.02 5-shot
MAML Finn et al. (2017) | 86.57% | 94.50% | 43.26% | 67.77%
MAML+attention | 95.80% | 97.70% | 54.44% | 63.33%
MAML+complex | 91.40% | 96.38% | 59.50% | 64.00%
SNAIL Mishra et al. (2018) | 71.18% | 78.48% | 35.01% | 36.34%
Reptile Nichol et al. (2018) | 69.16% | 92.32% | 55.01% | 69.39%
MAML+complex+CT Yang et al. (2020) | 96.40% | 97.50% | 58.40% | 69.80%
CAMEL (ours) | 97.23% ± 0.13% | 98.22% ± 0.08% | 64.80% ± 0.10% | 74.27% ± 0.15%
Table 2: Accuracy on RADIOML 2016.04C (5-way classification). ± shows 95% confidence intervals over tasks.

Method | 1-shot | 5-shot
MAML Finn et al. (2017) | 88.93% ± 0.13% | 93.59% ± 0.62%
MAML+attention | 92.12% ± 0.22% | 95.51% ± 0.05%
MAML+complex | 91.65% ± 0.35% | 96.28% ± 0.53%
SNAIL Mishra et al. (2018) | 89.21% ± 0.75% | 96.90% ± 0.19%
Reptile Nichol et al. (2018) | 87.08% ± 2.88% | 92.07% ± 5.65%
MAML+complex+CT Yang et al. (2020) | 93.58% ± 1.15% | 96.52% ± 0.08%
CAMEL (ours) | 96.30% ± 0.22% | 97.51% ± 0.15%
Our further study concerns the influence of adding a complex-valued neural network, motivated by the observation that complex numbers can have a richer representational capacity. For signal inputs, using complex numbers can capture more useful details than real numbers and could also facilitate noise-robust memory retrieval mechanisms Trabelsi et al. (2018b). To construct a complex-valued neural network, we need to deal with the following complex building blocks: the representation of complex numbers, complex gradient vectors, complex weight initialization, complex convolutions, complex-valued activation, complex-valued normalization, and the complex-valued multi-head attention mechanism. These blocks are determined by their own algorithms and by complex arithmetic. The results in Table 1 and Table 2 show that these complex features improve the classification accuracy in both the 5-way 1-shot and 5-way 5-shot cases across datasets. In the training process, we set the number of convolution kernels to 128. For the multi-head attention part, we set the source and output sequence lengths to 64 and the number of heads to 8. We observe that such complex-valued models are more competitive than their real-valued counterparts. These components build our final model, CAMEL: model-agnostic meta-learning with complex-valued multi-head attention and a complex-valued neural network. Compared with the other meta-learning models, CAMEL achieves the best classification accuracy.
The Complex Transformer Yang et al. (2020) implements complex attention in another way: it rewrites all complex functions into two separate real functions and computes the multiplications of queries, keys, and values, obtaining complex attention from eight attention functions with different inputs. We also evaluate SNAIL Mishra et al. (2018), which combines a causal attention operation over the context produced by temporal convolutions, and Reptile Nichol et al. (2018), which uses only first-order derivatives for meta-learning updates. For comparison, Table 1 and Table 2 list the accuracies of several MAML-based models on different datasets. The results in these two tables demonstrate that our model CAMEL achieves state-of-the-art performance among all of them. In particular, some models perform poorly on the tasks from the SIGNAL2020.02 dataset, but CAMEL still delivers stable and strong performance on this challenging task. Figure 3 indicates that CAMEL attains the highest accuracy at a relatively fast convergence speed. The results also show that, on these challenging signal classification tasks, the CAMEL model clearly outperforms the other meta-learning models in accuracy and stability, which can be seen from the smooth accuracy curves and narrow confidence intervals of CAMEL in both the 1-shot and 5-shot cases.
5.3 Ablation study
In this section, we conduct ablation studies on CAMEL in three scenarios, as shown in Table 3. The first scenario uses samples whose SNR ≥ 0, of which 75% are selected as the training set and 25% as the test set. For the second scenario, shown in the column "SNR = 0" in Table 3, we pick samples with SNR = 0 and randomly select 75% of them to form the training set and the remaining 25% as the test set. The third scenario forms the (Prediction-Other) PO set as follows: pick 5 classes of signal samples (SNR ≥ 0) as set P, and let the other 5 classes of samples (SNR ≥ 0) form set O. All samples in set O and 5% of the samples in set P constitute the training set; the remaining 95% of the samples in set P constitute the test set.

On the three training and test sets described above, we first construct the MAML model and then add features to it step by step: we add the attention components and complex numbers separately and together. From the results we observe that, in CAMEL, all the features added to the original MAML model help improve the classification accuracy.
Table 3: Ablation study accuracy in the three scenarios.

Accuracy | SNR ≥ 0 | SNR = 0 | PO set
MAML | 87.20% | 81.64% | 89.06%
MAML+attention | 93.00% | 87.26% | 91.90%
MAML+complex | 91.10% | 91.75% | 91.30%
CAMEL (ours) | 93.70% | 92.10% | 96.30%
6 Conclusion
In this paper, we have proposed a complex-domain attentional meta-learning framework for signal recognition named CAMEL. CAMEL utilizes complex-valued neural networks and attention to provide prior knowledge, i.e., complex-domain and temporal information, which helps CAMEL improve performance and prevent overfitting. As two by-products of CAMEL, we have designed complex-valued meta-learning and complex-valued attention, which can be of independent interest. With second-order information, CAMEL is able to find first-order stationary points of general non-convex problems. Furthermore, CAMEL has achieved state-of-the-art results on multiple datasets. Finally, the ablation studies in three scenarios have demonstrated the effectiveness of the components of CAMEL.
References
Neural relational inference with fast modular meta-learning. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: §1.
Layer normalization. arXiv preprint arXiv:1607.06450. Cited by: §3.2.
Meta-learning with adaptive hyperparameters. In Advances in Neural Information Processing Systems, Vol. 33, pp. 20755–20765. Cited by: Appendix G, §1.
MetaReg: towards domain generalization using meta-regularization. In Advances in Neural Information Processing Systems, Vol. 31, pp. 998–1008. Cited by: §1.
Differentiable meta-learning of bandit policies. In Advances in Neural Information Processing Systems, Vol. 33, pp. 2122–2134. Cited by: §1.
Modular meta-learning with shrinkage. In Advances in Neural Information Processing Systems, Vol. 33, pp. 2858–2869. Cited by: Appendix G, §1.
Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: §1.
A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network. In Advances in Neural Information Processing Systems, Vol. 33, pp. 16398–16408. Cited by: §1.
MetaInit: initializing learning by learning to initialize. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: §5.2.
The advantage of conditional meta-learning for biased regularization and fine tuning. In Advances in Neural Information Processing Systems, Vol. 33, pp. 964–974. Cited by: §1.
SSRCNN: a semi-supervised learning framework for signal recognition. IEEE Transactions on Cognitive Communications and Networking, pp. 1–1. Cited by: §1.
SR2CNN: zero-shot learning for signal recognition. IEEE Transactions on Signal Processing 69, pp. 2316–2329. Cited by: §5.
On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In International Conference on Artificial Intelligence and Statistics, pp. 1082–1092. Cited by: §2, §4, Lemma 5, Proof 5.
Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126–1135. Cited by: §1, Table 1, Table 2.
Probabilistic model-agnostic meta-learning. In Advances in Neural Information Processing Systems, Vol. 31. Cited by: §1.
Adversarially robust few-shot learning: a meta-learning approach. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17886–17895. Cited by: Appendix G, §1.
Deep learning. Vol. 1, MIT Press, Cambridge. Cited by: §3.2.
Continuous meta-learning without tasks. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17571–17581. Cited by: Appendix G, §1.
Complex-valued neural networks. Vol. 400, Springer Science & Business Media. Cited by: §1, Assumption 2.
Complex-valued matrix derivatives: with applications in signal processing and communications. Cambridge University Press. Cited by: §3.2, Assumption 2, Proof 5.
Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §1.
Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456. Cited by: §3.2.
Learning the morphology of brain signals using alpha-stable convolutional sparse coding. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: §1.
Reconciling meta-learning and continual learning with online mixtures of tasks. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
Convergence of meta-learning with task-specific adaptation over partial parameters. In Advances in Neural Information Processing Systems, Vol. 33, pp. 11490–11500. Cited by: §1.
Unsupervised meta-learning for few-shot image classification. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
Adaptive gradient-based meta-learning methods. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.1.
Robust meta-learning for mixed linear regression with small batches. In Advances in Neural Information Processing Systems, Vol. 33, pp. 4683–4696. Cited by: §1.
Self-supervised generalisation with meta auxiliary learning. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
A simple neural attentive meta-learner. In International Conference on Learning Representations. Cited by: §5.2, Table 1, Table 2.
On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999. Cited by: §5.2, Table 1, Table 2.
Convolutional radio modulation recognition networks. In International Conference on Engineering Applications of Neural Networks, pp. 213–226. Cited by: §1, §5.
PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §5.1.
Meta-learning with implicit gradients. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
MetaSDF: meta-learning signed distance functions. In Advances in Neural Information Processing Systems, Vol. 33, pp. 10136–10147. Cited by: §1.
Graph signal processing approach to QSAR/QSPR model learning of compounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. Cited by: §1.
Deep complex networks. In 6th International Conference on Learning Representations (ICLR). Cited by: §1.
Deep complex networks. In International Conference on Learning Representations. Cited by: §5.2.
Complex-valued networks for automatic modulation classification. IEEE Transactions on Vehicular Technology 69 (9), pp. 10085–10089. Cited by: §1.
Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: §1.
Meta learning with relational information for short sequences. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
Complex Transformer: a framework for modeling complex-valued sequence. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4232–4236. Cited by: §5.2, Table 1, Table 2, footnote 1.
Online structured meta-learning. In Advances in Neural Information Processing Systems, Vol. 33, pp. 6779–6790. Cited by: Appendix G, §1.
Bayesian model-agnostic meta-learning. In Advances in Neural Information Processing Systems, pp. 7343–7353. Cited by: §1.
Convergence analysis of fully complex backpropagation algorithm based on Wirtinger calculus. Cognitive Neurodynamics 8 (3), pp. 261–266. Cited by: Assumption 3.
MetaGAN: an adversarial approach to few-shot learning. Advances in Neural Information Processing Systems 2, pp. 8. Cited by: §1.
A complex-valued projection neural network for constrained optimization of real functions in complex variables. IEEE Transactions on Neural Networks and Learning Systems 26 (12), pp. 3227–3238. Cited by: Assumption 2, Assumption 3.
Efficient meta learning via minibatch proximal update. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix G, §1.
No-regret non-convex online meta-learning. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3942–3946. Cited by: §1.
Appendix A Proof of Lemma 1
Lemma 1
If a function $f$ is complex analytic, the time complexity of the derivative of $f$ in IQ-CVNNs is twice that of the complex derivative of $f$ in CD-CVNNs.
Proof 1
We consider two scenarios: the derivative of a simple analytic function and the derivative of a composite analytic function.
1. For a simple analytic function $f$, in CD-CVNNs the complex derivative of $f$ with respect to a complex vector $\mathbf{z}$ is the single quantity $\partial f/\partial \mathbf{z}$, while IQ-CVNNs consider the derivatives with respect to the real and imaginary parts $\mathbf{x}$ and $\mathbf{y}$ separately; therefore

(18)

Thus, in this scenario, the time complexity of the derivative of $f$ in IQ-CVNNs is twice that of the complex derivative of $f$ in CD-CVNNs.
2. For a composite analytic function $g(f(\mathbf{z}))$, where $f$ is also complex analytic, the complex derivative of $g(f(\mathbf{z}))$ with respect to a complex vector $\mathbf{z}$ in CD-CVNNs can be computed according to the complex chain rule:

(19)

Owing to the fact that $f$ is complex analytic, the term involving the conjugate is equal to zero. So, (19) can be simplified to

(20)

from which the time complexity of the complex derivative in CD-CVNNs follows. However, in IQ-CVNNs, we have

(21)

where the tensors in (21) are formed from the real and imaginary parts, so the time complexity of the derivative in IQ-CVNNs is again twice that of the complex derivative in CD-CVNNs. Note that a composite function of several layers can be seen as composite functions of two layers calculated serially. As a result, the time complexity of the complex derivative in CD-CVNNs remains half of the time complexity of the derivative in IQ-CVNNs. Hence, the Lemma holds in the scenario of the derivative of a composite analytic function.

To sum up, Lemma 1 is established in both scenarios. This completes the proof.
Appendix B Proof of Lemma 2
Lemma 2
The complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic.
Proof 2
It is obvious that the complex-valued convolutional layer and the complex-valued fully connected layer are linear and continuous. Assume that a linear function $f$ is continuous with respect to a complex vector $\mathbf{z}$; then we can obtain

In a similar way,

Therefore,

Since the function is continuous and satisfies the Cauchy-Riemann equations, the linear function is complex analytic. Hence, the complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic. This completes the proof.
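As a concrete one-dimensional illustration of this argument (an added example, not part of the original proof), consider a scalar linear map and check the Cauchy-Riemann equations directly:

```latex
% Scalar example: f(z) = a z + b with a = a_r + i a_i, b = b_r + i b_i, z = x + i y.
\[
f(z) = \underbrace{(a_r x - a_i y + b_r)}_{u(x,y)} + i\,\underbrace{(a_i x + a_r y + b_i)}_{v(x,y)},
\]
\[
\frac{\partial u}{\partial x} = a_r = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -a_i = -\frac{\partial v}{\partial x},
\]
% so the Cauchy-Riemann equations hold everywhere and f is complex analytic.
```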
Appendix C Proof of Lemma 3
Lemma 3
For a function $f:\mathbb{C}^{n}\to\mathbb{R}$, $f$ is analytic if and only if $f$ is a constant function.
Proof 3
Assume a function $f:\mathbb{C}^{n}\to\mathbb{R}$ is analytic. Write $f(\mathbf{z}) = u(\mathbf{x},\mathbf{y}) + iv(\mathbf{x},\mathbf{y})$ with $v \equiv 0$, where $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ is the complex input vector. Then $f$ has to satisfy the Cauchy-Riemann equations:

$$\frac{\partial u}{\partial x_k} = \frac{\partial v}{\partial y_k} = 0, \qquad \frac{\partial u}{\partial y_k} = -\frac{\partial v}{\partial x_k} = 0.$$

Since the partial derivatives of $u$ are all equal to 0, $f$ is a constant function. This completes the proof.
Appendix D Definition Recall
In this section, we recall the definitions of the complex derivative, the analytic function, and the Cauchy-Riemann equations.

Complex derivative Let $f(z) = u(x,y) + iv(x,y)$, where $z = x + iy \in \mathbb{C}$. If $f$ is continuous at a point $z_0$, we can define its complex derivative as:

$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \quad (22)$$

This is similar to the definition of the derivative for a function of a real variable. In the real case, the existence of the derivative implies that the limits of the difference quotient exist and are equal when the point converges to $z_0$ from both the left and the right. However, in the complex case, it means that the limits of the difference quotient exist and are equal when the point $z$ converges to $z_0$ from any direction in the complex plane. If a function satisfies this property at a point $z_0$, we say that the function is complex-differentiable at $z_0$.
Analytic function If a function $f$ is complex-differentiable at all points in some domain $\mathcal{D}$, then $f$ is said to be analytic in $\mathcal{D}$, i.e., $f$ is a complex analytic function, also known as a holomorphic function, in $\mathcal{D}$.
The Cauchy-Riemann equations The Cauchy-Riemann equations are a pair of real partial differential equations that a complex analytic function needs to satisfy:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \quad (23)$$

where $u$ and $v$ denote the real and imaginary parts of the complex function, respectively. The necessary and sufficient condition for $f$ to be a complex analytic function in a domain $\mathcal{D}$ is that the function is continuous and satisfies the Cauchy-Riemann equations in $\mathcal{D}$.
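As a quick illustration of these definitions (an added example, not from the original text), $f(z)=z^{2}$ satisfies the Cauchy-Riemann equations everywhere, whereas $f(z)=z^{*}$ does not:

```latex
% f(z) = z^2: u = x^2 - y^2, v = 2xy.
\[
\frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -2y = -\frac{\partial v}{\partial x}
\quad\Rightarrow\quad f(z)=z^{2} \text{ is analytic, with } f'(z)=2z.
\]
% f(z) = z^* (the conjugate): u = x, v = -y.
\[
\frac{\partial u}{\partial x} = 1 \neq -1 = \frac{\partial v}{\partial y}
\quad\Rightarrow\quad f(z)=z^{*} \text{ is nowhere complex-differentiable.}
\]
```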
Appendix E Proof of Lemma 4
Lemma 4
With respect to the complex meta-parameters $\boldsymbol{\theta}$, we have
(8)
Proof 4
According to (5), it is obvious that
(24) 
Note that, since the loss is real-valued, following the definition of the complex gradient vector (10), we have
(25)  
where the second equality holds because the output of the loss is real, the third equality follows the complex chain rule, and the last equality is given by the definition of the complex gradient vector. Next, according to the inner-loop update process (3), we have
(26)  
Similarly,
(27)  
Now, using (7), we can write (25) as
(28) 
(29) 
This completes the proof.
Appendix F Complex-valued Neural Networks
Neural networks require backpropagation to update their parameters via first-order derivatives, and so do complex-valued neural networks. We would prefer the functions in complex-valued neural networks to be analytic. Define $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ and $\mathbf{b} = \mathbf{b}_r + i\mathbf{b}_i$ as the complex input vector and complex bias vector for each layer, respectively.
Complex-valued convolutional layer The complex-valued convolutional layer implements the convolution operation on complex input signals. Define $\mathbf{W} = \mathbf{A} + i\mathbf{B}$ as the complex convolution kernel. Given $\mathbf{z}$, $\mathbf{W}$, and $\mathbf{b}$, since the complex-valued convolutional layer is linear, we are able to compute the real and imaginary parts of its outputs separately as follows:

$$\mathrm{Re}\big(\mathrm{CConv}(\mathbf{z})\big) = \mathbf{A} * \mathbf{x} - \mathbf{B} * \mathbf{y} + \mathbf{b}_r \quad (30a)$$

$$\mathrm{Im}\big(\mathrm{CConv}(\mathbf{z})\big) = \mathbf{A} * \mathbf{y} + \mathbf{B} * \mathbf{x} + \mathbf{b}_i \quad (30b)$$

Then, according to (30a) and (30b), the complex-valued convolutional layer can be represented as follows:

$$\mathrm{CConv}(\mathbf{z}) = (\mathbf{A} * \mathbf{x} - \mathbf{B} * \mathbf{y} + \mathbf{b}_r) + i\,(\mathbf{A} * \mathbf{y} + \mathbf{B} * \mathbf{x} + \mathbf{b}_i)$$

where $*$ denotes the convolution operation in the real case.
Complex-valued fully connected layer The complex-valued fully connected layer achieves a linear transformation of complex inputs. Define $\mathbf{W} = \mathbf{A} + i\mathbf{B}$ as the complex weight matrix. Given $\mathbf{z}$, $\mathbf{W}$, and $\mathbf{b}$, the real and imaginary parts of the outputs of the complex-valued fully connected layer can be computed as:

$$\mathrm{Re}\big(\mathrm{CFC}(\mathbf{z})\big) = \mathbf{A}\mathbf{x} - \mathbf{B}\mathbf{y} + \mathbf{b}_r \quad (31a)$$

$$\mathrm{Im}\big(\mathrm{CFC}(\mathbf{z})\big) = \mathbf{A}\mathbf{y} + \mathbf{B}\mathbf{x} + \mathbf{b}_i \quad (31b)$$

Similarly, the complex-valued fully connected layer can be expressed as:

$$\mathrm{CFC}(\mathbf{z}) = \mathbf{W}\mathbf{z} + \mathbf{b} = (\mathbf{A}\mathbf{x} - \mathbf{B}\mathbf{y} + \mathbf{b}_r) + i\,(\mathbf{A}\mathbf{y} + \mathbf{B}\mathbf{x} + \mathbf{b}_i) \quad (32)$$
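A minimal PyTorch sketch of the complex-valued 1-D convolution in (30a)-(30b), built from two real convolutions, is given below; the kernel size, channel counts, and the omission of the complex bias are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ComplexConv1d(nn.Module):
    """Complex conv via (A + iB) * (x + iy) = (A*x - B*y) + i(A*y + B*x)."""
    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        # Complex bias omitted for clarity; A and B are the real and imaginary kernels.
        self.conv_re = nn.Conv1d(in_ch, out_ch, kernel_size, bias=False, **kwargs)  # A
        self.conv_im = nn.Conv1d(in_ch, out_ch, kernel_size, bias=False, **kwargs)  # B

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x, y = z.real, z.imag
        real = self.conv_re(x) - self.conv_im(y)   # cf. (30a)
        imag = self.conv_re(y) + self.conv_im(x)   # cf. (30b)
        return torch.complex(real, imag)

# Toy usage: batch of 4 complex signals with 2 channels and 128 samples.
z = torch.randn(4, 2, 128, dtype=torch.cfloat)
layer = ComplexConv1d(in_ch=2, out_ch=8, kernel_size=3, padding=1)
print(layer(z).shape)   # torch.Size([4, 8, 128])
```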
Appendix G Related work
Recently, meta-learning has demonstrated promising performance in many fields. Khodadadeh et al. Khodadadeh et al. (2019) proposed an unsupervised algorithm for model-independent meta-learning for classification tasks. The work Liu et al. (2019) proposed a new method that automatically learns appropriate labels for auxiliary tasks. The work Xie et al. (2019) proposed a new meta-learning method to learn heterogeneous point-process models from short event sequences and relational networks. In addition, the work Jerfel et al. (2019) proposed a Dirichlet process mixture for hierarchical Bayesian models over the parameters of arbitrary parametric models. Khodak et al. Khodak et al. (2019) built a theoretical framework for the design and understanding of practical meta-learning methods. The authors in Zhou et al. (2019) proposed a meta-learning method based on minibatch proximal updates for learning an effective hypothesis transfer. Moreover, the work Rajeswaran et al. (2019) proposed an implicit MAML algorithm which relies only on the solution to the inner-level optimization. The work Chen et al. (2020) proposed a meta-learning approach that avoids the need for this often suboptimal hand-selection. The work Yao et al. (2020) proposed an online structured meta-learning framework. Additionally, the authors in Baik et al. (2020) proposed a new weight-update rule that greatly enhances the fast adaptation process. The work Harrison et al. (2020) proposed a meta-learning approach via online changepoint analysis, augmented with a differentiable Bayesian changepoint detection scheme. The work Goldblum et al. (2020) proposed an adversarial querying algorithm for generating adversarially robust meta-learners and thoroughly investigated the causes of adversarial vulnerability.
Appendix H Convergence Analysis
For ease of writing and derivation, in our notation, $\mathcal{L}_{\mathcal{T}_i}(\boldsymbol{\theta})$ represents the loss function on the task $\mathcal{T}_i$, $\mathcal{L}_{\mathcal{T}_i}(\boldsymbol{\theta}_i')$ represents the loss on the task $\mathcal{T}_i$ after the inner-loop update process, and $F(\boldsymbol{\theta})$ represents the meta-objective. By drawing tasks $\mathcal{T}_i$ from the task probability distribution $p(\mathcal{T})$, our optimization problem can be rewritten as
(33) 
Definition 1
A random vector $\boldsymbol{\theta}_{\epsilon}$ is called an $\epsilon$-approximate first-order stationary point for problem (33) if it satisfies $\mathbb{E}\big[\|\nabla F(\boldsymbol{\theta}_{\epsilon})\|\big] \le \epsilon$.
Then, we formally state our assumptions as below.
Assumption 1
is bounded below, and