Schedule
| # | Date | Speaker | Papers to be presented |
|---|---|---|---|
| 1 | June 14, 2024 | Yanlei Liu | [D-9] |
| 2 | June 21, 2024 | Chengmei Niu | [D-8] |
| 3 | June 28, 2024 | Jaiqing Liu | [D-2] |
| 4 | July 05, 2024 | Kexin Chen | [D-10] |
| 5 | July 12, 2024 | Muen Wu | [D-5] |
| 6 | July 19, 2024 | Yue Xu | [D-4] |
List of papers
[A] Tensor Programs
- [ ] [A-1] Yang, Greg. “Wide feedforward or recurrent neural networks of any architecture are Gaussian processes.” Advances in Neural Information Processing Systems 32 (2019).
- [ ] [A-2] Yang, Greg. “Tensor Programs II: Neural tangent kernel for any architecture.” arXiv preprint arXiv:2006.14548 (2020).
- [ ] [A-3] Yang, Greg, and Etai Littwin. “Tensor Programs IIb: Architectural universality of neural tangent kernel training dynamics.” International Conference on Machine Learning. PMLR, 2021.
- [ ] [A-4] Yang, Greg, et al. “Tensor Programs VI: Feature learning in infinite-depth neural networks.” arXiv preprint arXiv:2310.02244 (2023).
- [ ] [A-5] Noci, Lorenzo, et al. “Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning.” arXiv preprint arXiv:2402.17457 (2024).
- [ ] [A-6] Li, Ping, and Phan-Minh Nguyen. “On random deep weight-tied autoencoders: Exact asymptotic analysis, phase transitions, and implications to training.” International Conference on Learning Representations. 2018.
[B] Theory and Practice for Transformers
- [ ] [B-1] Cowsik, Aditya, et al. “Geometric Dynamics of Signal Propagation Predict Trainability of Transformers.” arXiv preprint arXiv:2403.02579 (2024).
- [ ] [B-2] Noci, Lorenzo, et al. “Signal propagation in transformers: Theoretical perspectives and the role of rank collapse.” Advances in Neural Information Processing Systems 35 (2022): 27198-27211.
- [ ] [B-3] Malladi, Sadhika, et al. “A kernel-based view of language model fine-tuning.” International Conference on Machine Learning. PMLR, 2023.
- [ ] [B-4] Hayou, Soufiane, Nikhil Ghosh, and Bin Yu. “LoRA+: Efficient Low Rank Adaptation of Large Models.” arXiv preprint arXiv:2402.12354 (2024). To be presented together with the original LoRA paper (Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models.” ICLR 2022).
[C] Random kernel matrices
- [C-1] Xiuyuan Cheng, Amit Singer. “The Spectrum of Random Inner-product Kernel Matrices”. 2012.
- [C-2] Zhou Fan, Andrea Montanari. “The Spectral Norm of Random Inner-Product Kernel Matrices”. 2017.
- [C-3] Z. Liao, R. Couillet, “Inner-product Kernels are Asymptotically Equivalent to Binary Discrete Kernels”, 2019.
- [C-4] Z. Liao, R. Couillet, and M. Mahoney. “Sparse Quantized Spectral Clustering.” 2021.
- [C-5] Yue M. Lu, Horng-Tzer Yau. “An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings”. 2023.
- [C-6] Sofiia Dubova et al. “Universality for the global spectrum of random inner-product kernel matrices in the polynomial regime.” 2023.
[D] Transformer-based models and in-context learning
- [ ] [D-1] Ruiqi Zhang, Spencer Frei, and Peter L. Bartlett. “Trained transformers learn linear models in-context”. Journal of Machine Learning Research, 25(49):1–55, 2024.
- [D-2] Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”. NeurIPS 2022.
- [ ] [D-3] Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou. “What learning algorithm is in-context learning? Investigations with linear models”. ICLR 2023.
- [D-4] Johannes von Oswald et al. “Transformers Learn In-Context by Gradient Descent”. ICML 2023.
- [D-5] Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak. “Transformers as Algorithms: Generalization and Stability in In-context Learning”. ICML 2023.
- [ ] [D-6] Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection”. 2023.
- [ ] [D-7] Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, and Peter Bartlett. “How many pretraining tasks are needed for in-context learning of linear regression?”. ICLR 2024.
- [D-8] Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, and Cengiz Pehlevan. “Asymptotic theory of in-context learning by linear attention”. 2024.
- [D-9] Aaditya K Singh, Stephanie C.Y. Chan, Ted Moskovitz, Erin Grant, Andrew M Saxe, Felix Hill. “The Transient Nature of Emergent In-Context Learning in Transformers”. NeurIPS 2023.
- [D-10] Allan Raventos, Mansheej Paul, Feng Chen, Surya Ganguli. “Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression”. NeurIPS 2023.
- [ ] [D-11] Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang. “Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality”. 2024.
[F] Others
- [ ] [F-1] Bordelon, Blake, Alexander Atanasov, and Cengiz Pehlevan. “A Dynamical Model of Neural Scaling Laws.” arXiv preprint arXiv:2402.01092 (2024).
- [ ] [F-2] Bahri, Yasaman, et al. “Explaining Neural Scaling Laws.” 2021.
- [ ] [F-3] Kumar, Tanishq, et al. “Grokking as the transition from lazy to rich training dynamics.” arXiv preprint arXiv:2310.06110 (2023).
- [ ] [F-4] Papyan, Vardan, X. Y. Han, and David L. Donoho. “Prevalence of neural collapse during the terminal phase of deep learning training.” Proceedings of the National Academy of Sciences 117.40 (2020): 24652-24663.
- [ ] [F-5] Adityanarayanan Radhakrishnan, et al. “Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features.” 2023.
- [ ] [F-6] Noam Levi, Alon Beck, and Yohai Bar Sinai. “Grokking in Linear Estimators – A Solvable Model that Groks without Understanding.” 2023.
More information
Contact and Thanks
- Zhenyu Liao, EIC, Huazhong University of Science and Technology
The organizers are grateful for support from the NSFC via grant NSFC-62206101.