References:
[1] Zhang Tong.Solving large scale linear prediction problems using stochastic gradient descent algorithms[EB/OL].[2020-08-11].https://dl.acm.org/doi/abs/10.1145/1015330.1015332.
[2] Rakhlin A,Shamir O,Sridharan K.Making gradient descent optimal for strongly convex stochastic optimization[EB/OL].[2020-08-11].https://arxiv.org/abs/1109.5647.
[3] Shamir O,Zhang Tong.Stochastic gradient descent for non-smooth optimization:convergence results and optimal averaging schemes[EB/OL].[2020-08-11].http://adsabs.harvard.edu/abs/2012arXiv1212.1824S.
[4] Duchi J,Singer Y.Efficient online and batch learning using forward backward splitting[J].Journal of Machine Learning Research,2009,10:2899-2934.
[5] Luo Zhiquan,Tseng P.On the convergence of the coordinate descent method for convex differentiable minimization[J].Journal of Optimization Theory and Applications,1992,72(1):7-35.
[6] Mangasarian O L,Musicant D R.Successive overrelaxation for support vector machines[J].IEEE Transactions on Neural Networks,1999,10(5):1032-1037.
[7] Hsieh C J,Chang Kaiwei,Lin C J,et al.A dual coordinate descent method for large-scale linear SVM[EB/OL].[2020-08-11].http://dl.acm.org/citation.cfm?id=1390208.
[8] Shalev-Shwartz S,Tewari A.Stochastic methods for l1-regularized loss minimization[J].Journal of Machine Learning Research,2011,12:1865-1892.
[9] Lacoste-Julien S,Jaggi M,Schmidt M,et al.Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs[EB/OL].[2020-08-11].http://arxiv.org/pdf/1207.4747v1.pdf.
[10] Nesterov Y.Efficiency of coordinate descent methods on huge-scale optimization problems[J].SIAM Journal on Optimization,2012,22(2):341-362.
[11] Shalev-Shwartz S,Zhang Tong.Proximal stochastic dual coordinate ascent[EB/OL].[2020-08-11].https://arxiv.org/abs/1211.2717.
[12] Shalev-Shwartz S,Zhang Tong.Stochastic dual coordinate ascent methods for regularized loss minimization[J].Journal of Machine Learning Research,2013,14:567-599.
[13] Zhao Peilin,Zhang Tong.Stochastic optimization with importance sampling for regularized loss minimization[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.5555/3045118.3045120.
[14] Zhang Lingang,Yan Guangle,Lu Xiaowei.Nested partitions method:a new parallel stochastic optimization algorithm[J].Application Research of Computers,2007,24(6):79-81.
[15] Mi Yongqiang,Gao Yuelin.Improved particle swarm optimization algorithm for solving constrained optimization problems[J].Journal of Jiangxi Normal University(Natural Science Edition),2015,39(1):59-63.
[16] Xia Hongwei,Wen Chuanjun.Trust region method for general nonlinear constrained optimization problems[J].Journal of Jiangxi Normal University(Natural Science Edition),2012,36(3):253-256.
[17] Robbins H,Monro S.A stochastic approximation method[J].The Annals of Mathematical Statistics,1951,22(3):400-407.
[18] Shalev-Shwartz S,Zhang Tong.Accelerated mini-batch stochastic dual coordinate ascent[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.5555/2999611.2999654.
[19] Defazio A,Bach F,Lacoste-Julien S.SAGA:a fast incremental gradient method with support for non-strongly convex composite objectives[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.5555/2968826.2969010.
[20] Li Mu,Zhang Tong,Chen Yuqiang,et al.Efficient mini-batch training for stochastic optimization[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.1145/2623330.2623612.
[21] Luo Zhiquan,Tseng P.On the convergence of the coordinate descent method for convex differentiable minimization[J].Journal of Optimization Theory and Applications,1992,72(1):7-35.
[22] Tseng P.Convergence of a block coordinate descent method for nondifferentiable minimization[J].Journal of Optimization Theory and Applications,2001,109(3):475-494.
[23] Saha A,Tewari A.On the nonasymptotic convergence of cyclic coordinate descent methods[J].SIAM Journal on Optimization,2013,23(1):576-601.
[24] Wright S J.Coordinate descent algorithms[J].Mathematical Programming,2015,151(1):3-34.
[25] Gurbuzbalaban M,Ozdaglar A,Parrilo P A,et al.When cyclic coordinate descent outperforms randomized coordinate descent[EB/OL].[2020-08-11].http://mert.lids.mit.edu/w-content/uploads/sites/10/2017/11/CCDvsRCD.pdf.
[26] Nutini J,Schmidt M,Laradji I H,et al.Coordinate descent converges faster with the Gauss-Southwell rule than random selection[EB/OL].[2020-08-11].http://arxiv.org/abs/1506.00552.
[27] Qu Zheng,Richtárik P.Coordinate descent with arbitrary sampling I:algorithms and complexity[J].Optimization Methods and Software,2016,31(5):829-857.
[28] Stich S U,Raj A,Jaggi M.Approximate steepest coordinate descent[C]∥Precup D,Teh Y W.Proceedings of the 34th International Conference on Machine Learning,Sydney,Australia,Aug 6-11,2017.Sydney:PMLR,2017,70:3251-3259.
[29] Nesterov Y.Efficiency of coordinate descent methods on huge-scale optimization problems[J].SIAM Journal on Optimization,2012,22(2):341-362.
[30] Richtárik P,Takáč M.Distributed coordinate descent method for learning with big data[J].Journal of Machine Learning Research,2016,17(75):1-25.
[31] Richtárik P,Takáč M.On optimal probabilities in stochastic coordinate descent methods[J].Optimization Letters,2016,10(6):1233-1243.
[32] Richtárik P,Takáč M.Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function[J].Mathematical Programming,2014,144(1/2):1-38.
[33] Lin Qihang,Lu Zhaosong,Xiao Lin.An accelerated proximal coordinate gradient method[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.5555/2969033.2969168.
[34] Byrd R H,Chin G M,Nocedal J,et al.Sample size selection in optimization methods for machine learning[J].Mathematical Programming,2012,134(1):127-155.
[35] Yang Jieming,Yan Xin,Qu Zhaoyang,et al.Research on under-sampling method based on data density distribution[J].Application Research of Computers,2016,33(10):2997-3000.
[36] Shapiro A,Homem-de-Mello T.On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs[J].SIAM Journal on Optimization,2000,11(1):70-86.
[37] Shapiro A,Wardi Y.Convergence analysis of stochastic algorithms[J].Mathematics of Operations Research,1996,21(3):615-628.
[38] Kleywegt A J,Shapiro A,Homem-de-Mello T.The sample average approximation method for stochastic discrete optimization[J].SIAM Journal on Optimization,2001,12(2):479-502.
[39] Shapiro A,Homem-de-Mello T.A simulation-based approach to two-stage stochastic programming with recourse[J].Mathematical Programming,1998,81(3):301-325.
[40] Homem-de-Mello T.Variable-sample methods for stochastic optimization[J].ACM Transactions on Modeling and Computer Simulation,2003,13(2):108-133.
[41] Bastin F,Cirillo C,Toint P L.An adaptive Monte Carlo algorithm for computing mixed logit estimators[J].Computational Management Science,2006,3(1):55-79.
[42] Sutskever I,Martens J,Dahl G,et al.On the importance of initialization and momentum in deep learning[EB/OL].[2020-08-11].https://dl.acm.org/doi/10.5555/3042817.3043064.
[43] Qian Ning.On the momentum term in gradient descent learning algorithms[J].Neural Networks,1999,12(1):145-151.
[44] Kingma D P,Ba J L.Adam:a method for stochastic optimization[EB/OL].[2020-08-11].https://arxiv.org/abs/1412.6980.