Li, L. H.; Chen, P. H.; Hsieh, C.-J.; and Chang, K.-W. 2019. Efficient Contextual Representation Learning With Continuous Outputs. Transactions of the Association for Computational Linguistics 7: 611–624.
Lin, C.-Y.; and Hovy, E. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In NAACL-HLT.
Liu, X.; Han, Z.; Wen, X.; Liu, Y.-S.; and Zwicker, M. 2019. L2G Auto-encoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention. In Proceedings of the 27th ACM International Conference on Multimedia.
Luan, Y.; Eisenstein, J.; Toutanova, K.; and Collins, M. 2020. Sparse, Dense, and Attentional Representations for Text Retrieval. arXiv preprint arXiv:2005.00181.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NeurIPS.
Milajevs, D.; Kartsaklis, D.; Sadrzadeh, M.; and Purver, M. 2014. Evaluating Neural Word Representations in Tensor-Based Compositional Settings. In EMNLP.
Mimno, D. M.; and McCallum, A. 2008. Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. In UAI.
Neelakantan, A.; Shankar, J.; Passos, A.; and McCallum, A. 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In EMNLP.
Pagliardini, M.; Gupta, P.; and Jaggi, M. 2018. Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. In NAACL-HLT.
Paul, R.; Chang, H.-S.; and McCallum, A. 2021. Multi-facet Universal Schema. In EACL.
Pavlick, E.; Rastogi, P.; Ganitkevitch, J.; Van Durme, B.; and Callison-Burch, C. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In ACL.
Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In EMNLP.
Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. In NAACL-HLT.
Qin, K.; Li, C.; Pavlu, V.; and Aslam, J. A. 2019. Adapting RNN Sequence Prediction Model to Multi-label Set Prediction. In NAACL-HLT.
Reimers, N.; and Gurevych, I. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP-IJCNLP.
Rezatofighi, S. H.; Kaskman, R.; Motlagh, F. T.; Shi, Q.; Cremers, D.; Leal-Taixé, L.; and Reid, I. 2018. Deep Perm-Set Net: Learn to predict sets with unknown permutation and cardinality using deep neural networks. arXiv preprint arXiv:1805.00613.
See, A.; Liu, P. J.; and Manning, C. D. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In ACL.
Shu, R.; and Nakayama, H. 2018. Compressing Word Embeddings via Deep Compositional Code Learning. In ICLR.
Singh, S. P.; Hug, A.; Dieuleveut, A.; and Jaggi, M. 2020. Context mover’s distance & barycenters: Optimal transport of contexts for building representations. In International Conference on Artificial Intelligence and Statistics.
Srivastava, A.; and Sutton, C. A. 2017. Autoencoding Variational Inference For Topic Models. In ICLR.
Stern, M.; Chan, W.; Kiros, J.; and Uszkoreit, J. 2019. Insertion Transformer: Flexible Sequence Generation via Insertion Operations. In ICML.
Stewart, R.; Andriluka, M.; and Ng, A. Y. 2016. End-to-end people detection in crowded scenes. In CVPR.
Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In NeurIPS.
Tieleman, T.; and Hinton, G. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4(2): 26–31.
Turney, P. D. 2012. Domain and function: A dual-space model of semantic relations and compositions. Journal of Artificial Intelligence Research.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In NeurIPS.
Verga, P.; Belanger, D.; Strubell, E.; Roth, B.; and McCallum, A. 2016. Multilingual Relation Extraction using Compositional Universal Schema. In NAACL-HLT.
Vilnis, L.; and McCallum, A. 2015. Word Representations via Gaussian Embedding. In ICLR.
Wang, T.; Cho, K.; and Wen, M. 2019. Attention-based mixture density recurrent networks for history-based recommendation. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data.
Welleck, S.; Brantley, K.; Daumé III, H.; and Cho, K. 2019. Non-Monotonic Sequential Text Generation. In ICML.
Welleck, S.; Yao, Z.; Gai, Y.; Mao, J.; Zhang, Z.; and Cho, K. 2018. Loss Functions for Multiset Prediction. In NeurIPS.
Yang, Y.; Feng, C.; Shen, Y.; and Tian, D. 2018a. FoldingNet: Point cloud auto-encoder via deep grid deformation. In CVPR.
Yang, Z.; Dai, Z.; Salakhutdinov, R.; and Cohen, W. W. 2018b. Breaking the softmax bottleneck: A high-rank RNN language model. In ICLR.
Yu, M.; and Dredze, M. 2015. Learning composition models for phrase embeddings. Transactions of the Association for Computational Linguistics 3: 227–242.
Zheng, H.; and Lapata, M. 2019. Sentence Centrality Revisited for Unsupervised Summarization. In ACL.