Liu Liu
About Me
I am a Ph.D student in the ECE department of The University of Texas at Austin.
I am fortunately advised by Prof. Constantine Caramanis.
My research interests are in theoretical machine learning, optimization and high dimensional statistics.
Before moving to Austin, I received the B.E. degree from Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2016.
Publications and Preprints
Expand all abstracts
Collapse all abstracts
High Dimensional Robust Estimation of Sparse Models via Trimmed Hard Thresholding
Liu Liu, Tianyang Li, Constantine Caramanis.
ArXiv preprint arXiv:1901.08237, 2019. [Arxiv Link] [PDF]
See abstract
We study the problem of sparsity constrained Mestimation with arbitrary corruptions to both explanatory and response variables in
the highdimensional regime, where the number of variables d is larger than the sample size n. Our main contribution is a highly efficient gradientbased optimization algorithm that we call Trimmed Hard Thresholding – a robust variant of Iterative Hard Thresholding (IHT) by using trimmed mean in gradient computations. Our algorithm can deal with a wide class of sparsity constrained Mestimation problems, and we can tolerate a nearly dimension independent fraction of arbitrarily corrupted samples. More specifically, when the corrupted fraction satisfies \(\epsilon \lesssim {1} /\left({\sqrt{k} \log (nd)}\right)\), where k is the sparsity of the parameter, we obtain accurate estimation and model selection guarantees with optimal sample complexity. Furthermore, we extend our algorithm to sparse Gaussian graphical model (precision matrix) estimation via a neighborhood selection approach. We demonstrate the effectiveness of robust estimation in sparse linear, logistic regression, and sparse precision matrix estimation on synthetic
and realworld US equities data.
High Dimensional Robust Sparse Regression
Liu Liu, Yanyao Shen, Tianyang Li, Constantine Caramanis.
ArXiv preprint arXiv:1805.11643, 2018. [Arxiv Link] [PDF] [Slides]
See abstract
We provide a novel – and to the best of our knowledge, the first – algorithm for high dimensional sparse regression with a constant fraction of corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators with sample complexity sublinear in d: when the covariance matrix in sparse regression is identity, our error guarantee is near informationtheoretically optimal. We propose a filtering algorithm which consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: it is orderwise more efficient computationally than existing algorithms, and succeeds with high probability, thus making it suitable for general use in iterative algorithms. We then deal with robust sparse regression with unknown covariance matrix, where our algorithm achieves the best known error guarantee for any polynomial time statistical query algorithms for a wide class of structured covariance matrices; and our algorithm only requires sublinear sample complexity. We demonstrate the effectiveness on largescale sparse regression problems with arbitrary corruptions.
Statistical inference using SGD
Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis.
Proceedings of the ThirtySecond AAAI Conference on Artificial Intelligence, 2018. [Arxiv Link] [PDF]
See abstract
We present a novel method for frequentist statistical inference in Mestimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the OrnsteinUhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGDbased inference procedure is a first order method, and is wellsuited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
Approximate Newtonbased statistical inference using only stochastic gradients
Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis.
ArXiv preprint arXiv:1805.08920, 2018. [Arxiv Link] [PDF]
See abstract
We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessianvector product from firstorder information. In theory, our method efficiently computes the statistical error covariance in Mestimation, both for unregularized convex learning problems and highdimensional LASSO regression, without using exact second order information, or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on largescale machine learning problems, that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
