Schedule
Date | Speaker | Papers to be presented |
---|---|---|
Dec. 27, 2023 | Zhaorui Dong | [B-1-1] |
Jan. 3, 2024 | Zhuofan Xu | [B-3-5] |
Jan. 10, 2024 | Xuran Meng | [C-2] |
Jan. 17, 2024 | Jing Chen | [F-1] |
Jan. 24, 2024 | Xingkai Wen | [B-3-6] |
Jan. 31, 2024 | Tingting Zou | [C-5] |
Feb. 7, 2024 | Mengze Li | [F-3] |
Feb. 28, 2024 | Xiaoyi Wang | [C-1] |
Mar. 6, 2024 | Muen Wu | [F-2] |
Mar. 13, 2024 | Chengmei Niu | [C-3] |
Mar. 20, 2024 | Zhenyu Liao | [A-1] |
Mar. 27, 2024 | Zhenyu Liao | [A-1] |
Apr. 3, 2024 | | |
Apr. 10, 2024 | Zhenyu Liao | [A-1] |
Apr. 17, 2024 | Mengze Li | [B-3-2] |
Apr. 24, 2024 | Xuran Meng | [F-6] |
May 1, 2024 | | |
May 8, 2024 | Tingting Zou | [H-1] |
May 15, 2024 | | |
May 22, 2024 | | |
May 29, 2024 | Jing Chen | [D-2] |
June 5, 2024 | Zhaorui Dong | [B-2-1] |
June 12, 2024 | | |
June 19, 2024 | Jie Wei | [C-6] [C-7] |
June 26, 2024 | Zhuofan Xu | [D-6] |
List of papers
[A] Overview paper
- [A-1] Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin. “Deep Learning: A Statistical Viewpoint”. In: Acta Numerica 30 (May 2021), pp. 87–201
[B] On the interface between RMT and deep neural networks (a toy spectrum sketch follows the [B-3] list)
[B-1] Pastur’s papers
- [B-1-1] Leonid Pastur. “On Random Matrices Arising in Deep Neural Networks. Gaussian Case”. 2020.
- [ ] [B-1-2] Leonid Pastur and Victor Slavin. “On Random Matrices Arising in Deep Neural Networks: General I.I.D. Case”. In: Random Matrices: Theory and Applications 12.01 (Jan. 2023), p. 2250046
- [ ] [B-1-3] Leonid Pastur. “Eigenvalue Distribution of Large Random Matrices Arising in Deep Neural Networks: Orthogonal Case”. In: Journal of Mathematical Physics 63.6 (2022), p. 063505
[B-2] Péché’s papers
- [B-2-1] Lucas Benigni and Sandrine Péché. “Eigenvalue Distribution of Some Nonlinear Models of Random Matrices”. In: Electronic Journal of Probability 26.none (Jan. 2021), pp. 1–37
- [ ] [B-2-2] Lucas Benigni and Sandrine Péché. “Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural Networks”. 2022.
[B-3] Others
- [ ] [B-3-1] Jeffrey Pennington and Pratik Worah. “Nonlinear random matrix theory for deep learning”. In: Advances in Neural Information Processing Systems. 2017, pp. 2634–2643 (GOOD TO KNOW)
- [B-3-2] Jeffrey Pennington, Samuel Schoenholz, and Surya Ganguli. “Resurrecting the Sigmoid in Deep Learning through Dynamical Isometry: Theory and Practice”. In: Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc., 2017, pp. 4785–4795 (GOOD TO KNOW)
- [ ] [B-3-3] Charles H. Martin, Michael W. Mahoney. “Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning”. 2018. (mostly empirical)
- [ ] [B-3-4] Charles H. Martin, Michael W. Mahoney. “Traditional and Heavy-Tailed Self Regularization in Neural Network Models”. 2019. (mostly empirical)
- [B-3-5] Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney. “Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data”. 2022. (mostly empirical)
- [B-3-6] Matthias Thamm, Max Staats, Bernd Rosenow. “Random matrix analysis of deep neural network weight matrices”. Physical Review E, 2022.
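To get a hands-on feel for the spectral objects studied in the [B] papers (e.g. [B-1-1] and [B-3-6]), here is a minimal NumPy/Matplotlib sketch that compares the empirical eigenvalue distribution of a purely i.i.d. Gaussian "weight matrix" at initialization (an assumed baseline, not trained weights) with the Marchenko-Pastur law; all dimensions and variable names are illustrative choices.

```python
# Minimal sketch (assumption: i.i.d. N(0, 1) weights at initialization, no training):
# compare the eigenvalue histogram of W W^T / n with the Marchenko-Pastur density.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
p, n = 1000, 2000                          # W is p x n; aspect ratio c = p / n < 1
c = p / n
W = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(W @ W.T / n)     # spectrum of the sample covariance-type matrix

# Marchenko-Pastur density with ratio c and unit variance
lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(lam_minus, lam_plus, 500)
density = np.sqrt(np.maximum((lam_plus - x) * (x - lam_minus), 0.0)) / (2 * np.pi * c * x)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical spectrum")
plt.plot(x, density, "r", label="Marchenko-Pastur law")
plt.xlabel("eigenvalue"); plt.legend(); plt.show()
```

Papers such as [B-3-3] and [B-3-6] study how the spectra of trained weight matrices depart from this random baseline.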
[C] Double descent (a toy double-descent simulation follows this list)
- [C-1] Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation”. In: The Annals of Statistics 50.2 (Apr. 2022), pp. 949–986
- [C-2] Song Mei and Andrea Montanari. “The generalization error of random features regression: Precise asymptotics and double descent curve”. Communications on Pure and Applied Mathematics, 2021.
- [C-3] Denny Wu and Ji Xu. “On the Optimal Weighted ℓ2 Regularization in Overparameterized Linear Regression”. NeurIPS 2020.
- [ ] [C-4] Ben Adlam, Jeffrey Pennington. “The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization”. ICML 2020.
- [C-5] Francis Bach. “High-dimensional analysis of double descent for linear regression with random projections”. 2023.
- [C-6] Bryan Kelly, Semyon Malamud, Kangying Zhou. “The virtue of complexity in return prediction”. 2024.
- [C-7] Antoine Didisheim, Shikun Barry Ke, Bryan T. Kelly, Semyon Malamud. “Complexity in factor pricing models”. 2023.
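As a quick numerical companion to the double-descent papers above (in particular [C-1] and [C-5]), the following minimal sketch fits minimum-norm least squares on isotropic Gaussian data with a linear teacher (an assumed toy data model, not the exact setting of any single paper) and shows the test error peaking near the interpolation threshold p = n.

```python
# Minimal sketch (assumed data model): minimum-norm ("ridgeless") least squares on
# isotropic Gaussian data with a linear teacher; the test error peaks near p = n.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, noise = 200, 0.5                                     # training samples, label noise level

def test_mse(p, n_test=2000):
    beta = rng.standard_normal(p) / np.sqrt(p)          # teacher with norm roughly 1
    X = rng.standard_normal((n, p))
    y = X @ beta + noise * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y                    # minimum-norm (pseudo-inverse) solution
    X_test = rng.standard_normal((n_test, p))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)   # excess test error on noiseless targets

ps = np.arange(20, 801, 20)
errors = [np.mean([test_mse(p) for _ in range(5)]) for p in ps]

plt.plot(ps / n, errors)
plt.axvline(1.0, ls="--", color="gray")                 # interpolation threshold p = n
plt.xlabel("overparameterization ratio p / n")
plt.ylabel("test MSE"); plt.yscale("log"); plt.show()
```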
[D] Neural Tangent Kernel and linearized neural networks (a toy CK/NTK spectrum sketch follows this list)
- [D-1] Arthur Jacot, Franck Gabriel, and Clément Hongler. “Neural tangent kernel: Convergence and generalization in neural networks”. In: Advances in neural information processing systems. 2018, pp. 8571–8580
- [D-2] Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Jascha Sohl-Dickstein, Jeffrey Pennington. “Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent”. NeurIPS 2019.
- [ ] [D-3] Arthur Jacot. “Theory of Deep Learning: Neural Tangent Kernel and Beyond”. Ph.D. thesis, 2023.
- [D-4] Zhou Fan and Zhichao Wang. “Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc., 2020, pp. 7710–7721
- [ ] [D-5] Hong Hu, Yue M. Lu. “Universality laws for high-dimensional learning with random features”. IEEE Transactions on Information Theory 69 (3), 1932-1964. 2022.
- [ ] [D-6] Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang. “Spectral evolution and invariance in linear-width neural networks”. 2022.
- [ ] [D-7] Theodor Misiakiewicz and Andrea Montanari. “Six Lectures on Linearized Neural Networks”. 2023.
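For the [D] papers, and [D-4] in particular, the relevant objects are the conjugate kernel (CK) and neural tangent kernel (NTK) matrices of a random network whose width is proportional to the sample size; the sketch below builds both for an assumed two-layer ReLU network at random initialization (the architecture, scaling, and dimensions are illustrative assumptions) and plots their eigenvalue histograms.

```python
# Minimal sketch (assumed architecture): empirical conjugate kernel (CK) and NTK of a
# random two-layer ReLU network f(x) = a^T relu(W x / sqrt(d)) / sqrt(m), with n, d, m
# all of comparable size, and the eigenvalue histograms of both n x n kernel matrices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, d, m = 500, 400, 600                        # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))                # first-layer weights at random init
a = rng.standard_normal(m)                     # second-layer weights at random init

U = X @ W.T / np.sqrt(d)                       # pre-activations, shape (n, m)
act, dact = np.maximum(U, 0.0), (U > 0).astype(float)

CK = act @ act.T / m                                             # gradients w.r.t. a only
NTK = CK + (X @ X.T / d) * ((dact * a) @ (dact * a).T / m)       # add gradients w.r.t. W

for K, name in [(CK, "conjugate kernel"), (NTK, "NTK")]:
    plt.hist(np.linalg.eigvalsh(K), bins=80, density=True, alpha=0.5, label=name)
plt.xlabel("eigenvalue"); plt.legend(); plt.show()
```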
[E] High-dimensional dynamics of DNNs (a toy online-SGD sketch follows this list)
- [ ] [E-1] Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang. “High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation”. 2022.
- [ ] [E-2] Behrad Moniri, Donghwan Lee, Hamed Hassani, Edgar Dobriban. “A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks”. 2023.
- [ ] [E-3] Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. “High-Dimensional Limit Theorems for SGD: Effective Dynamics and Critical Scaling”. In: Advances in Neural Information Processing Systems 35 (Dec. 2022), pp. 25349–25362
- [ ] [E-4] Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath. “High-Dimensional SGD Aligns with Emerging Outlier Eigenspaces”. 2023.
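The [E] papers track low-dimensional summary statistics of high-dimensional SGD, such as the overlap between the iterate and a planted direction; as a toy illustration (an assumed single-index teacher with a tanh link and step size of order 1/d, not the exact model of [E-3] or [E-4]), the sketch below runs online SGD and records that overlap.

```python
# Minimal sketch (assumed model): online SGD for a single-index teacher y = tanh(<w*, x>),
# step size of order 1/d, tracking the overlap between the iterate and the teacher direction.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d, n_steps = 500, 20_000
eta = 2.0 / d                                   # step size scaling as 1/d

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
w = rng.standard_normal(d)
w /= np.linalg.norm(w)                          # random init: overlap of order 1/sqrt(d)

overlaps = []
for _ in range(n_steps):
    x = rng.standard_normal(d)                  # fresh sample each step (online / one-pass SGD)
    y, pred = np.tanh(w_star @ x), np.tanh(w @ x)
    grad = (pred - y) * (1.0 - pred ** 2) * x   # gradient of 0.5 * (pred - y)^2 in w
    w -= eta * grad
    overlaps.append(w @ w_star / np.linalg.norm(w))

plt.plot(overlaps)
plt.xlabel("SGD step"); plt.ylabel("overlap with teacher direction"); plt.show()
```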
[F] Transformer-based models and in-context learning (a toy in-context regression sketch follows this list)
- [F-1] Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”. NeurIPS 2022.
- [F-2] Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou. “What learning algorithm is in-context learning? Investigations with linear models”. ICLR 2023.
- [F-3] Johannes von Oswald et al. “Transformers Learn In-Context by Gradient Descent”. ICML 2023.
- [ ] [F-4] Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak. “Transformers as Algorithms: Generalization and Stability in In-context Learning”. ICML 2023.
- [ ] [F-5] Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection”. 2023.
- [F-6] Ruiqi Zhang, Spencer Frei, and Peter L. Bartlett. “Trained Transformers Learn Linear Models In-Context”. arXiv preprint arXiv:2306.09927, 2023.
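The in-context learning papers [F-1] to [F-3] and [F-6] study prompts made of (x, y) pairs from a random linear task and compare a trained transformer's prediction on a query point against classical estimators; the sketch below generates such prompts under an assumed Gaussian linear-regression task distribution and evaluates only the ridge-regression baseline (no transformer is trained here).

```python
# Minimal sketch (assumed prompt distribution, as in linear-regression ICL setups):
# each prompt holds k in-context pairs (x_i, y_i) with y_i = <w, x_i> for a fresh random w,
# plus a query x_q; we evaluate the ridge-regression baseline fit on the in-context pairs.
import numpy as np

rng = np.random.default_rng(0)
d, n_prompts, lam = 20, 200, 1e-3               # task dimension, prompts per context length, ridge

def query_error(k):
    w = rng.standard_normal(d) / np.sqrt(d)     # a fresh random linear task for this prompt
    X = rng.standard_normal((k, d))
    y = X @ w
    x_q = rng.standard_normal(d)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # ridge fit on the context
    return (x_q @ w_hat - x_q @ w) ** 2

for k in [5, 10, 20, 40, 80]:
    err = np.mean([query_error(k) for _ in range(n_prompts)])
    print(f"context length {k:3d}: mean query error {err:.4f}")
```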
[H] Random kernel matrices (a toy kernel-spectrum sketch follows this list)
- [H-1] Xiuyuan Cheng, Amit Singer. “The Spectrum of Random Inner-product Kernel Matrices”. 2012.
- [ ] [H-2] Zhou Fan, Andrea Montanari. “The Spectral Norm of Random Inner-Product Kernel Matrices”. 2017.
- [ ] [H-3] Yue M. Lu, Horng-Tzer Yau. “An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings”. 2023.
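For the [H] papers, the central object is the n x n random inner-product kernel matrix with off-diagonal entries f(<x_i, x_j>/sqrt(d)) in the regime where n and d are proportional; the sketch below plots its eigenvalue histogram for an assumed odd nonlinearity (tanh) and a 1/sqrt(n) normalization, both of which are illustrative choices.

```python
# Minimal sketch (assumed setup, in the spirit of [H-1]): eigenvalue histogram of the random
# inner-product kernel matrix with off-diagonal entries f(<x_i, x_j> / sqrt(d)) / sqrt(n),
# zero diagonal, in the proportional regime where n and d grow together.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, d = 2000, 1000                               # proportional regime: n / d fixed
f = np.tanh                                     # an illustrative odd nonlinearity

X = rng.standard_normal((n, d))
G = X @ X.T / np.sqrt(d)                        # rescaled Gram matrix of the data
K = f(G) / np.sqrt(n)                           # entrywise nonlinearity, 1/sqrt(n) normalization
np.fill_diagonal(K, 0.0)                        # keep only the off-diagonal kernel entries

plt.hist(np.linalg.eigvalsh(K), bins=100, density=True)
plt.xlabel("eigenvalue"); plt.ylabel("density"); plt.show()
```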
More information
Contact and Thanks
- Zhenyu Liao, EIC, Huazhong University of Science and Technology
- Prof. Shurong Zheng, School of Mathematics and Statistics, Northeast Normal University
- Prof. Jeff Yao, School of Data Science, The Chinese University of Hong Kong, Shenzhen
The organizers are grateful for support from NSFC via grants NSFC-62206101 and NSFC-12141107.