Schedule
Date | Speaker | Papers to be presented |
---|---|---|
Dec. 27, 2023 | Zhaorui Dong | [B-1-1] |
Jan. 3, 2024 | Zhuofan Xu | [B-3-5] |
Jan. 10, 2024 | Xuran Meng | [C-2] |
Jan. 17, 2024 | Jing Chen | [F-1] |
Jan. 24, 2024 | Xingkai Wen | [B-3-6] |
Jan. 31, 2024 | Tingting Zou | [C-5] |
Feb. 7, 2024 | Mengze Li | [F-3] |
Feb. 28, 2024 | Xiaoyi Wang | [C-1] |
Mar. 6, 2024 | Muen Wu | [F-2] |
Mar. 13, 2024 | Chengmei Niu | [C-3] |
Mar. 20, 2024 | Zhenyu Liao | [A-1] |
Mar. 27, 2024 | Zhenyu Liao | [A-1] |
Apr. 3, 2024 | | |
Apr. 10, 2024 | Zhenyu Liao | [A-1] |
Apr. 17, 2024 | Mengze Li | [B-3-2] |
Apr. 24, 2024 | Xuran Meng | [F-6] |
May 1, 2024 | | |
May 8, 2024 | Tingting Zou | [H-1] |
May 15, 2024 | | |
May 22, 2024 | | |
May 29, 2024 | Jing Chen | [D-2] |
June 5, 2024 | Zhaorui Dong | [B-2-1] |
June 12, 2024 | | |
June 19, 2024 | Jie Wei | [C-6] [C-7] |
June 26, 2024 | Zhuofan Xu | [D-6] |
List of papers
[A] Overview paper
- [A-1] Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin. “Deep Learning: A Statistical Viewpoint”. In: Acta Numerica 30 (May 2021), pp. 87–201
[B] On the interface between RMT and deep neural networks (a toy spectrum sketch follows the [B-3] list)
[B-1] Pastur’s papers
- [B-1-1] Leonid Pastur. “On Random Matrices Arising in Deep Neural Networks. Gaussian Case”. 2020.
- [ ] [B-1-2] Leonid Pastur and Victor Slavin. “On Random Matrices Arising in Deep Neural Networks: General I.I.D. Case”. In: Random Matrices: Theory and Applications 12.01 (Jan. 2023), p. 2250046
- [ ] [B-1-3] Leonid Pastur. “Eigenvalue Distribution of Large Random Matrices Arising in Deep Neural Networks: Orthogonal Case”. In: Journal of Mathematical Physics 63.6 (2022), p. 063505
[B-2] Péché’s papers
- [B-2-1] Lucas Benigni and Sandrine Péché. “Eigenvalue Distribution of Some Nonlinear Models of Random Matrices”. In: Electronic Journal of Probability 26.none (Jan. 2021), pp. 1–37
- [ ] [B-2-2] Lucas Benigni and Sandrine Péché. “Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural Networks”. 2022.
[B-3] Others
- [ ] [B-3-1] Jeffrey Pennington and Pratik Worah. “Nonlinear random matrix theory for deep learning”. In: Advances in Neural Information Processing Systems. 2017, pp. 2634–2643 (GOOD TO KNOW)
- [B-3-2] Jeffrey Pennington, Samuel Schoenholz, and Surya Ganguli. “Resurrecting the Sigmoid in Deep Learning through Dynamical Isometry: Theory and Practice”. In: Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc., 2017, pp. 4785–4795 (GOOD TO KNOW)
- [ ] [B-3-3] Charles H. Martin, Michael W. Mahoney. “Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning”. 2018. (mostly empirical)
- [ ] [B-3-4] Charles H. Martin, Michael W. Mahoney. “Traditional and Heavy-Tailed Self Regularization in Neural Network Models”. 2019. (mostly empirical)
- [B-3-5] Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney. “Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data”. 2022. (mostly empirical)
- [B-3-6] Matthias Thamm, Max Staats, Bernd Rosenow. “Random matrix analysis of deep neural network weight matrices”. Physical Review E, 2022.
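To get a hands-on feel for the spectral objects studied in the [B] papers (e.g. [B-1-1] and [B-3-6]), here is a minimal NumPy/Matplotlib sketch that compares the empirical eigenvalue distribution of a purely i.i.d. Gaussian "weight matrix" at initialization (an assumed baseline, not trained weights) with the Marchenko-Pastur law; all dimensions and variable names are illustrative choices.

```python
# Minimal sketch (assumption: i.i.d. N(0, 1) weights at initialization, no training):
# compare the eigenvalue histogram of W W^T / n with the Marchenko-Pastur density.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
p, n = 1000, 2000                          # W is p x n; aspect ratio c = p / n < 1
c = p / n
W = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(W @ W.T / n)     # spectrum of the sample covariance-type matrix

# Marchenko-Pastur density with ratio c and unit variance
lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(lam_minus, lam_plus, 500)
density = np.sqrt(np.maximum((lam_plus - x) * (x - lam_minus), 0.0)) / (2 * np.pi * c * x)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical spectrum")
plt.plot(x, density, "r", label="Marchenko-Pastur law")
plt.xlabel("eigenvalue"); plt.legend(); plt.show()
```

Papers such as [B-3-3] and [B-3-6] study how the spectra of trained weight matrices depart from this random baseline.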
[C] Double descent (a toy double-descent simulation follows this list)
- [C-1] Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation”. In: The Annals of Statistics 50.2 (Apr. 2022), pp. 949–986
- [C-2] Song Mei and Andrea Montanari. “The generalization error of random features regression: Precise asymptotics and double descent curve”. Communications on Pure and Applied Mathematics, 2021.
- [C-3] Denny Wu and Ji Xu. “On the Optimal Weighted ℓ2 Regularization in Overparameterized Linear Regression”. NeurIPS 2020.
- [ ] [C-4] Ben Adlam, Jeffrey Pennington. “The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization”. ICML 2020.
- [C-5] Francis Bach. “High-dimensional analysis of double descent for linear regression with random projections”. 2023.
- [C-6] Bryan Kelly, Semyon Malamud, Kangying Zhou. “The virtue of complexity in return prediction”. 2024.
- [C-7] Antoine Didisheim, Shikun Barry Ke, Bryan T. Kelly, Semyon Malamud. “Complexity in factor pricing models”. 2023.
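As a quick numerical companion to the double-descent papers above (in particular [C-1] and [C-5]), the following minimal sketch fits minimum-norm least squares on isotropic Gaussian data with a linear teacher (an assumed toy data model, not the exact setting of any single paper) and shows the test error peaking near the interpolation threshold p = n.

```python
# Minimal sketch (assumed data model): minimum-norm ("ridgeless") least squares on
# isotropic Gaussian data with a linear teacher; the test error peaks near p = n.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, noise = 200, 0.5                                     # training samples, label noise level

def test_mse(p, n_test=2000):
    beta = rng.standard_normal(p) / np.sqrt(p)          # teacher with norm roughly 1
    X = rng.standard_normal((n, p))
    y = X @ beta + noise * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y                    # minimum-norm (pseudo-inverse) solution
    X_test = rng.standard_normal((n_test, p))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)   # excess test error on noiseless targets

ps = np.arange(20, 801, 20)
errors = [np.mean([test_mse(p) for _ in range(5)]) for p in ps]

plt.plot(ps / n, errors)
plt.axvline(1.0, ls="--", color="gray")                 # interpolation threshold p = n
plt.xlabel("overparameterization ratio p / n")
plt.ylabel("test MSE"); plt.yscale("log"); plt.show()
```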
[D] Neural Tangent Kernel and linearized neural networks (a toy CK/NTK spectrum sketch follows this list)
- [D-1] Arthur Jacot, Franck Gabriel, and Clément Hongler. “Neural tangent kernel: Convergence and generalization in neural networks”. In: Advances in neural information processing systems. 2018, pp. 8571–8580
- [D-2] Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Jascha Sohl-Dickstein, Jeffrey Pennington. “Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent”. NeurIPS 2019.
- [ ] [D-3] Arthur Jacot. “Theory of Deep Learning: Neural Tangent Kernel and Beyond”. Ph.D. thesis, 2023.
- [D-4] Zhou Fan and Zhichao Wang. “Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc., 2020, pp. 7710–7721
- [ ] [D-5] Hong Hu, Yue M. Lu. “Universality laws for high-dimensional learning with random features”. IEEE Transactions on Information Theory 69 (3), 1932-1964. 2022.
- [ ] [D-6] Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang. “Spectral evolution and invariance in linear-width neural networks”. 2022.
- [ ] [D-7] Theodor Misiakiewicz and Andrea Montanari. “Six Lectures on Linearized Neural Networks”. 2023.
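For the [D] papers, and [D-4] in particular, the relevant objects are the conjugate kernel (CK) and neural tangent kernel (NTK) matrices of a random network whose width is proportional to the sample size; the sketch below builds both for an assumed two-layer ReLU network at random initialization (the architecture, scaling, and dimensions are illustrative assumptions) and plots their eigenvalue histograms.

```python
# Minimal sketch (assumed architecture): empirical conjugate kernel (CK) and NTK of a
# random two-layer ReLU network f(x) = a^T relu(W x / sqrt(d)) / sqrt(m), with n, d, m
# all of comparable size, and the eigenvalue histograms of both n x n kernel matrices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, d, m = 500, 400, 600                        # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))                # first-layer weights at random init
a = rng.standard_normal(m)                     # second-layer weights at random init

U = X @ W.T / np.sqrt(d)                       # pre-activations, shape (n, m)
act, dact = np.maximum(U, 0.0), (U > 0).astype(float)

CK = act @ act.T / m                                             # gradients w.r.t. a only
NTK = CK + (X @ X.T / d) * ((dact * a) @ (dact * a).T / m)       # add gradients w.r.t. W

for K, name in [(CK, "conjugate kernel"), (NTK, "NTK")]:
    plt.hist(np.linalg.eigvalsh(K), bins=80, density=True, alpha=0.5, label=name)
plt.xlabel("eigenvalue"); plt.legend(); plt.show()
```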
[E] High-dimensional dynamics of DNNs (a toy online-SGD sketch follows this list)
- [ ] [E-1] Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang. “High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation”. 2022.
- [ ] [E-2] Behrad Moniri, Donghwan Lee, Hamed Hassani, Edgar Dobriban. “A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks”. 2023.
- [ ] [E-3] Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. “High-Dimensional Limit Theorems for SGD: Effective Dynamics and Critical Scaling”. In: Advances in Neural Information Processing Systems 35 (Dec. 2022), pp. 25349–25362
- [ ] [E-4] Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath. “High-Dimensional SGD Aligns with Emerging Outlier Eigenspaces”. 2023.
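The [E] papers track low-dimensional summary statistics of high-dimensional SGD, such as the overlap between the iterate and a planted direction; as a toy illustration (an assumed single-index teacher with a tanh link and step size of order 1/d, not the exact model of [E-3] or [E-4]), the sketch below runs online SGD and records that overlap.

```python
# Minimal sketch (assumed model): online SGD for a single-index teacher y = tanh(<w*, x>),
# step size of order 1/d, tracking the overlap between the iterate and the teacher direction.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d, n_steps = 500, 20_000
eta = 2.0 / d                                   # step size scaling as 1/d

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
w = rng.standard_normal(d)
w /= np.linalg.norm(w)                          # random init: overlap of order 1/sqrt(d)

overlaps = []
for _ in range(n_steps):
    x = rng.standard_normal(d)                  # fresh sample each step (online / one-pass SGD)
    y, pred = np.tanh(w_star @ x), np.tanh(w @ x)
    grad = (pred - y) * (1.0 - pred ** 2) * x   # gradient of 0.5 * (pred - y)^2 in w
    w -= eta * grad
    overlaps.append(w @ w_star / np.linalg.norm(w))

plt.plot(overlaps)
plt.xlabel("SGD step"); plt.ylabel("overlap with teacher direction"); plt.show()
```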
[F] Transformer-based models and in-context learning (a toy in-context regression sketch follows this list)
- [F-1] Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”. NeurIPS 2022.
- [F-2] Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou. “What learning algorithm is in-context learning? Investigations with linear models”. ICLR 2023.
- [F-3] Johannes von Oswald et al. “Transformers Learn In-Context by Gradient Descent”. ICML 2023.
- [ ] [F-4] Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak. “Transformers as Algorithms: Generalization and Stability in In-context Learning”. ICML 2023.
- [ ] [F-5] Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection”. 2023.
- [F-6] Ruiqi Zhang, Spencer Frei, and Peter L. Bartlett. “Trained Transformers Learn Linear Models In-Context”. arXiv preprint arXiv:2306.09927, 2023.
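The in-context learning papers [F-1] to [F-3] and [F-6] study prompts made of (x, y) pairs from a random linear task and compare a trained transformer's prediction on a query point against classical estimators; the sketch below generates such prompts under an assumed Gaussian linear-regression task distribution and evaluates only the ridge-regression baseline (no transformer is trained here).

```python
# Minimal sketch (assumed prompt distribution, as in linear-regression ICL setups):
# each prompt holds k in-context pairs (x_i, y_i) with y_i = <w, x_i> for a fresh random w,
# plus a query x_q; we evaluate the ridge-regression baseline fit on the in-context pairs.
import numpy as np

rng = np.random.default_rng(0)
d, n_prompts, lam = 20, 200, 1e-3               # task dimension, prompts per context length, ridge

def query_error(k):
    w = rng.standard_normal(d) / np.sqrt(d)     # a fresh random linear task for this prompt
    X = rng.standard_normal((k, d))
    y = X @ w
    x_q = rng.standard_normal(d)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # ridge fit on the context
    return (x_q @ w_hat - x_q @ w) ** 2

for k in [5, 10, 20, 40, 80]:
    err = np.mean([query_error(k) for _ in range(n_prompts)])
    print(f"context length {k:3d}: mean query error {err:.4f}")
```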
[H] Random kernel matrices (a toy kernel-spectrum sketch follows this list)
- [H-1] Xiuyuan Cheng, Amit Singer. “The Spectrum of Random Inner-product Kernel Matrices”. 2012.
- [ ] [H-2] Zhou Fan, Andrea Montanari. “The Spectral Norm of Random Inner-Product Kernel Matrices”. 2017.
- [ ] [H-3] Yue M. Lu, Horng-Tzer Yau. “An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings”. 2023.
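For the [H] papers, the central object is the n x n random inner-product kernel matrix with off-diagonal entries f(<x_i, x_j>/sqrt(d)) in the regime where n and d are proportional; the sketch below plots its eigenvalue histogram for an assumed odd nonlinearity (tanh) and a 1/sqrt(n) normalization, both of which are illustrative choices.

```python
# Minimal sketch (assumed setup, in the spirit of [H-1]): eigenvalue histogram of the random
# inner-product kernel matrix with off-diagonal entries f(<x_i, x_j> / sqrt(d)) / sqrt(n),
# zero diagonal, in the proportional regime where n and d grow together.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, d = 2000, 1000                               # proportional regime: n / d fixed
f = np.tanh                                     # an illustrative odd nonlinearity

X = rng.standard_normal((n, d))
G = X @ X.T / np.sqrt(d)                        # rescaled Gram matrix of the data
K = f(G) / np.sqrt(n)                           # entrywise nonlinearity, 1/sqrt(n) normalization
np.fill_diagonal(K, 0.0)                        # keep only the off-diagonal kernel entries

plt.hist(np.linalg.eigvalsh(K), bins=100, density=True)
plt.xlabel("eigenvalue"); plt.ylabel("density"); plt.show()
```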
More information
Contact and Thanks
- Zhenyu Liao, EIC, Huazhong University of Science and Technology
- Prof. Shurong Zheng, School of Mathematics and Statistics, Northeast Normal University
- Prof. Jeff Yao, School of Data Science, The Chinese University of Hong Kong, Shenzhen
The organizers are grateful for support from NSFC via grants NSFC-62206101 and NSFC-12141107.