Abstract
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, and amenability to parallelization. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. We explore these three building blocks and propose variations for each that can lead to significantly faster BCD methods. We support all of our findings with numerical results for the classic machine learning problems of logistic regression, multi-class logistic regression, support vector machines, and label propagation.
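To make the three building blocks named above concrete, the following is a minimal sketch of a generic BCD loop, assuming the problem is to minimize a differentiable function f(x). The helper names `partitioning` (fixed contiguous blocks), the uniform random selection rule, and the fixed-step gradient update are illustrative placeholders for one simple instance of each choice, not the specific strategies proposed in the paper.

```python
import numpy as np

def block_coordinate_descent(f_grad, x0, n_iters=100, block_size=5, rng=None):
    """Generic BCD loop; f_grad(x) returns the full gradient of f at x."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    d = x.size

    # (1) Block partitioning strategy: here, a fixed partition of the
    # coordinates into contiguous blocks of size `block_size`.
    blocks = [np.arange(i, min(i + block_size, d))
              for i in range(0, d, block_size)]

    for _ in range(n_iters):
        # (2) Block selection rule: here, uniform random selection
        # (cyclic or greedy Gauss-Southwell rules are alternatives).
        b = blocks[rng.integers(len(blocks))]

        # (3) Block update rule: here, a gradient step on the selected
        # block with a fixed step size (exact or second-order block
        # updates are alternatives).
        g = f_grad(x)
        x[b] -= 0.01 * g[b]

    return x
```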