A comparative study of language Transformers for video question answering

Jul 1, 2021
Zekun Yang, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, Haruo Takemura

Type: Journal article
Publication: Neurocomputing
Last updated on Jul 1, 2021