Joint representation of video and text using deep neural networks with help of web images
Yuta Nakashima · April 1, 2016
Type: Conference paper
Publication: Microsoft Research Asia, Beijing
Last updated: April 1, 2016