List of papers
[A] Tensor Program
- [][A-1] Yang, Greg. “Wide feedforward or recurrent neural networks of any architecture are Gaussian processes.” Advances in Neural Information Processing Systems 32 (2019).
- [][A-2] Yang, Greg. “Tensor Programs II: Neural tangent kernel for any architecture.” arXiv preprint arXiv:2006.14548 (2020).
- [][A-3] Yang, Greg, and Etai Littwin. “Tensor Programs IIb: Architectural universality of neural tangent kernel training dynamics.” International Conference on Machine Learning. PMLR, 2021.
- [][A-4] Yang, Greg, et al. “Tensor Programs VI: Feature learning in infinite-depth neural networks.” arXiv preprint arXiv:2310.02244 (2023).
- [][A-5] Noci, Lorenzo, et al. “Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning.” arXiv preprint arXiv:2402.17457 (2024).
- [][A-6] Li, Ping, and Phan-Minh Nguyen. “On random deep weight-tied autoencoders: Exact asymptotic analysis, phase transitions, and implications to training.” International Conference on Learning Representations. 2018.
[B] Theory and Practice for Transformers
- [][B-1] Cowsik, Aditya, et al. “Geometric Dynamics of Signal Propagation Predict Trainability of Transformers.” arXiv preprint arXiv:2403.02579 (2024).
- [][B-2] Noci, Lorenzo, et al. “Signal propagation in transformers: Theoretical perspectives and the role of rank collapse.” Advances in Neural Information Processing Systems 35 (2022): 27198-27211.
- [][B-3] Malladi, Sadhika, et al. “A kernel-based view of language model fine-tuning.” International Conference on Machine Learning. PMLR, 2023.
- [][B-4] Hayou, Soufiane, Nikhil Ghosh, and Bin Yu. “LoRA+: Efficient Low Rank Adaptation of Large Models.” arXiv preprint arXiv:2402.12354 (2024). To be read together with the original LoRA paper.
[C] Random kernel matrices
- [][C-1] Cheng, Xiuyuan, and Amit Singer. “The Spectrum of Random Inner-product Kernel Matrices.” 2012.
- [][C-2] Fan, Zhou, and Andrea Montanari. “The Spectral Norm of Random Inner-Product Kernel Matrices.” 2017.
- [][C-3] Liao, Zhenyu, and Romain Couillet. “Inner-product Kernels are Asymptotically Equivalent to Binary Discrete Kernels.” 2019.
- [][C-4] Liao, Zhenyu, Romain Couillet, and Michael W. Mahoney. “Sparse Quantized Spectral Clustering.” 2021.
- [][C-5] Lu, Yue M., and Horng-Tzer Yau. “An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings.” 2023.
- [][C-6] Dubova, Sofiia, et al. “Universality for the global spectrum of random inner-product kernel matrices in the polynomial regime.” 2023.
[D] Others
- [][D-1] Bordelon, Blake, Alexander Atanasov, and Cengiz Pehlevan. “A Dynamical Model of Neural Scaling Laws.” arXiv preprint arXiv:2402.01092 (2024).
- [][D-2] Bahri, Yasaman, et al. “Explaining Neural Scaling Laws.” 2021.
- [][D-3] Kumar, Tanishq, et al. “Grokking as the transition from lazy to rich training dynamics.” arXiv preprint arXiv:2310.06110 (2023).
- [][D-4] Papyan, Vardan, X. Y. Han, and David L. Donoho. “Prevalence of neural collapse during the terminal phase of deep learning training.” Proceedings of the National Academy of Sciences 117.40 (2020): 24652-24663.
- [][D-5] Radhakrishnan, Adityanarayanan, et al. “Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features.” 2023.
- [][D-6] Levi, Noam, Alon Beck, and Yohai Bar-Sinai. “Grokking in Linear Estimators – A Solvable Model that Groks without Understanding.” 2023.
More information
Contact and Thanks
- Zhenyu Liao, School of Electronic Information and Communications (EIC), Huazhong University of Science and Technology
The organizers are grateful for support from NSFC via fund NSFC-62206101.