Abstract
In the last 18 months, Natural Language Processing has been transformed by the success of deep contextual word representations, that is, the output of ELMo, BERT, and their rapidly growing circle of friends, as a universal representation suitable as a starting point for fine-tuning on any language task. The simple training loss of predicting a word in context, played out over mountains of text, has been so successful that regardless of whether you're doing parsing, sentiment analysis, or question answering, you now cannot win on a benchmark test without these models. In one sense these models are full of knowledge. In another sense they still just seem to reflect local text statistics. At any rate, can these models reason? Against this backdrop, what are more successful ways to build neural networks that can reason? And can we use them to tackle more of the problems of old-fashioned AI?