MoCaO Lectures: Data Science

July 11-15, 2022

We are pleased to announce the inaugural MoCaO Lectures in Computation and Optimisation. For 2022 we are focusing on data science and, in particular, machine learning: its algorithms, mathematical foundations and applications. These lectures are designed to be accessible to newcomers to the field with a mathematics and computational background, such as PhD students, postdocs and inquisitive academics who wish to gain a better understanding of recent advances in this dynamic field.

These one-hour lectures will be held each day during the week of July 11-15. The Monday to Thursday lectures are scheduled at 12 noon; the Friday session starts at 12:30 and runs for two hours. All lectures will be broadcast via Zoom.

Speakers:

Prof. Stephen Wright is the George B. Dantzig Professor of Computer Sciences at the University of Wisconsin-Madison. He is a past chair of the Mathematical Optimization Society and a SIAM Fellow, and he currently directs the Institute for Foundations of Data Science at the University of Wisconsin-Madison. Steve is a world-renowned expert in optimization and the author of several highly cited books in the field.

Video 1 (July 11) can be accessed here. Passcode: 5RYL3+r3.

Video 2 (July 12) can be accessed here. Passcode: ?xYUkh!9

Video 3 (July 13) can be accessed here. Passcode: *Q#5EvkE

Presentation Slides: Formulations and Algorithms

Prof. Guoyin Li is a professor in the School of Mathematics and Statistics at the University of New South Wales. He was awarded an Australian Research Council Future Fellowship (for mid-career researchers), which he held during 2014-2018. His research interests include optimisation, variational analysis, machine learning and tensor computations.

Video (July 15) can be accessed here. Passcode: hMV!83Sk

Dr. Quoc Thong Le Gia is a Senior Lecturer in the School of Mathematics and Statistics, UNSW Sydney. His research interests include numerical analysis, approximation theory, partial differential equations, machine learning and stochastic processes.

Video (July 14) can be accessed here. Passcode: 8o9*7dRR

====================================================

Registration: details can be found at the end of this webpage.

Topics intended to be discussed include the following:

FORMULATIONS: Optimization formulations of data science (DS) problems; a small illustrative sketch follows the list.

  • linear least squares
  • robust linear regression
  • nonlinear least squares
  • matrix sensing / completion
  • nonnegative matrix factorization
  • recovering dependencies in graphs: sparse inverse covariance
  • sparse PCA (principal components analysis)
  • “sparse plus low-rank” matrix recovery
  • subspace identification
  • linear SVM
  • kernel SVM
  • binary logistic regression
  • multiclass logistic regression
  • atomic-norm regularization
  • community detection in graphs
  • adversarial ML (robust optimization formulations)
  • K-means
  • benign nonconvexity in matrix problems
  • neural networks:
    • different architectures (CNN, RNN, ResNet)
    • formulating the training problem as a nonsmooth nonconvex optimization of finite sums
    • overparametrization and a sketch of theories of effectiveness
    • two big theoretical issues: why convergence to zero-loss solutions? why good generalization (no overfitting)?
  • reinforcement learning
    • relationship to control
    • optimization perspectives (see Bertsekas)
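
To make the phrase "optimization formulations" concrete, here is a minimal sketch, with invented data, of how two of the problems listed above (linear least squares and binary logistic regression) can be written as finite-sum objectives over data pairs (a_i, b_i). It is an illustration only, not material from the lectures.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 5))                                    # rows a_i (features)
    b_ls = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)   # real-valued targets
    b_lr = rng.choice([-1.0, 1.0], size=100)                             # +/-1 class labels

    def least_squares(x):
        # f(x) = (1/2m) * sum_i (a_i^T x - b_i)^2
        return 0.5 * np.mean((A @ x - b_ls) ** 2)

    def logistic_regression(x):
        # f(x) = (1/m) * sum_i log(1 + exp(-b_i * a_i^T x))
        return np.mean(np.log1p(np.exp(-b_lr * (A @ x))))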

ALGORITHMS AND THEORY:

Continuous optimization algorithms suited to ML and DS problems.

The algorithms will be sketched and theoretical issues mentioned.

Common structures:

  • expectation and finite sum
  • nonsmooth regularization terms, e.g. l_1
  • composite objectives (e.g. least squares + linear, Tukey biweight + linear); a small sketch of an l_1-regularized composite objective follows this list
  • convex in many cases, but nonconvex problems are now more intensely studied
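
The sketch below is a minimal, assumed illustration (not taken from the lectures) of this structure: a smooth finite-sum least-squares term plus a nonsmooth l_1 regularizer, F(x) = (1/2m) ||Ax - b||^2 + lam * ||x||_1, together with the soft-thresholding map that acts as the proximal operator of the l_1 term in the prox-gradient methods listed in the next section.

    import numpy as np

    def composite_objective(A, b, x, lam):
        # smooth part (1/2m) ||Ax - b||^2 plus nonsmooth part lam * ||x||_1
        return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

    def soft_threshold(z, t):
        # proximal operator of t * ||.||_1: shrink each entry of z toward zero by t
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)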

Algorithms:

  • First-order
    • full gradient steepest descent
    • momentum: heavy ball, accelerated gradient, CG
    • prox-gradient for regularized problems
  • nonconvex issues
    • behavior of gradient method near saddle points
    • perturbed gradients
    • simple algorithm with Hessian for avoiding saddle points
    • complexity results
  • stochastic gradient (a minimal sketch follows this list)
    • basic version
    • minibatch
    • momentum
    • diagonal scaling: Adam
  • coordinate descent
    • single coordinate, block
    • random vs. sequential
  • augmented Lagrangian and ADMM
  • computing gradients using backprop / chain rule
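
As with the earlier sketches, the following is a minimal, assumed illustration (not the lecturers' code) of minibatch stochastic gradient with a momentum buffer, applied to the finite-sum logistic loss from the first sketch. The step size, batch size and momentum parameter are arbitrary choices made for the example, and a prox step or Adam-style diagonal scaling would replace the plain update marked in the inner loop.

    import numpy as np

    def sgd_momentum(A, b, lr=0.1, beta=0.9, batch=16, epochs=20, seed=0):
        # minimise (1/m) * sum_i log(1 + exp(-b_i * a_i^T x)) over x
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x, v = np.zeros(n), np.zeros(n)
        for _ in range(epochs):
            for idx in np.array_split(rng.permutation(m), max(1, m // batch)):
                margin = b[idx] * (A[idx] @ x)
                # minibatch gradient of the logistic loss
                g = A[idx].T @ (-b[idx] / (1.0 + np.exp(margin))) / len(idx)
                v = beta * v + g       # heavy-ball / momentum buffer
                x = x - lr * v         # plain step; a prox or Adam scaling would go here
        return x

    # e.g. x_hat = sgd_momentum(A, b_lr) with the data from the first sketch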

Registration for MoCaO Lectures: Data Science, July 11-15, 2022

Attention: Due to unforeseen problems with the registration system, all registrations up until 29/06/2022 have been lost. We encourage those who have already registered to re-register using the new Google Form. We apologise for any inconvenience. If you have any enquiries, please send an email to MoCaO@austms.org.au. Please check the website prior to the lectures for last-minute information or announcements.

Please register using this Google Form.