Overcoming the Vanishing Gradient Problem during Learning Recurrent Neural Nets (RNN)

Authors

  • Takudzwa Fadziso, Institute of Lifelong Learning and Development Studies, Chinhoyi University of Technology, ZIMBABWE

Keywords:

Vanishing Gradient
Recurrent Neural Network
Deep Learning
Error Flow

Abstract

Artificial neural networks have long struggled with the difficulties that arise from exploding and vanishing gradients, and the problem grows exponentially worse in deep architectures. With gradient-based learning methods, the current error signal has to “flow back in time” over the feedback connections to past inputs in order to build up adequate input storage. To address the vanishing gradient problem, adaptive optimization methods are presented. With an adaptive learning rate, the adaptive gradient algorithm adjusts the step size of each parameter individually, reducing the need for substantial hyperparameter fine-tuning. Recurrent neural networks (RNNs) have contributed numerous outstanding advances to the field of deep learning in the past. The objective of this paper is to give a concise synopsis of this evolving topic, with a focus on how to overcome the vanishing gradient problem when training RNNs. Four types of methods are adopted in this study to provide solutions to the vanishing gradient problem: approaches that do not employ gradients, approaches that enforce larger gradients, approaches that operate at a higher level, and approaches that make use of special architectures. The error flow of gradient-based recurrent learning methods was theoretically analyzed. This analysis showed that learning to bridge long time lags can be difficult. State-of-the-art approaches to solving the vanishing gradient problem were reviewed, but these methods have serious disadvantages; for example, some are practicable only for discrete data. The study confirmed that conventional learning algorithms for recurrent neural networks are unable to learn long time lag problems in reasonable time.
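As a rough, self-contained illustration of the two mechanisms discussed above, the NumPy sketch below shows how the backpropagated error signal decays exponentially as it flows back through an RNN's recurrent weights, and how the adaptive gradient (AdaGrad) update gives each parameter its own learning rate. This is a minimal sketch, not code from the paper; the network size, weight scale, and variable names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden = 32  # illustrative hidden-state size

    # Recurrent weight matrix scaled so its spectral norm stays below 1:
    # the regime in which backpropagated error signals vanish exponentially.
    W = 0.05 * rng.standard_normal((hidden, hidden))

    # Error signal at the final time step, propagated backward through time.
    grad = np.ones(hidden)
    for t in range(1, 51):
        grad = W.T @ grad  # one step of backpropagation through time
        if t % 10 == 0:
            print(f"step {t:2d}: ||grad|| = {np.linalg.norm(grad):.3e}")

    # AdaGrad update: accumulate squared gradients per parameter and shrink
    # each parameter's step size by the root of its accumulated history.
    theta = rng.standard_normal(hidden)  # illustrative parameter vector
    accum = np.zeros(hidden)
    lr, eps = 0.1, 1e-8
    g = rng.standard_normal(hidden)      # a stand-in gradient for one step
    accum += g ** 2
    theta -= lr * g / (np.sqrt(accum) + eps)

With the 0.05 scaling, the printed gradient norm collapses toward zero within a few dozen steps; scaling W up instead (say by 2.0) makes the same loop exhibit exploding gradients.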

Published

2020-12-31

How to Cite

Fadziso, T. (2020). Overcoming the Vanishing Gradient Problem during Learning Recurrent Neural Nets (RNN). Asian Journal of Applied Science and Engineering, 9(1), 197–208. https://doi.org/10.18034/ajase.v9i1.41
