Multimodal Intelligent Media Lab (mimlab)
  • Top
  • Members
  • Publications
  • Research
  • Access
  • Research
    • Bias in AI and its mitigation
    • Explainable AI
    • Applications of large-scale models
    • Vision and Language
  • News
    • 🔄 The Multimodal Intelligent Media area is being relaunched
    • 🧑‍🎓 For students hoping to join the Nakashima Lab (MIMLab)
  • Publications
    • No Annotations for Object Detection in Art through Stable Diffusion
    • PALADIN: Understanding Video Intentions in Political Advertisement Videos
    • Cross-modal Guided Visual Representation Learning for Social Image Retrieval
    • DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models
    • From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
    • Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks
    • Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
    • A picture may be worth a hundred words for visual question answering
    • Is cardiovascular risk profiling from UK Biobank retinal images using explicit deep learning estimates of traditional risk factors equivalent to actual risk measurements? A prospective cohort study design
    • MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Subtle Clue Dynamics in Video Dialogues
    • Stable Diffusion Exposed: Gender Bias from Prompt to Image
    • Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
    • Situating the social issues of image generation models in the model life cycle: a sociotechnical approach
    • Auditing Image-based NSFW Classifiers for Content Filtering
    • Exploring Emotional Stimuli Detection in Artworks: A Benchmark Dataset and Baselines Evaluation
    • GOYA: Leveraging Generative Art for Content-Style Disentanglement
    • Would Deep Generative Models Amplify Bias in Future Models?
    • Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis
    • Retrieving Emotional Stimuli in Artworks
    • Instruct me more! Random prompting for visual in-context learning
    • Revisiting pixel-level contrastive pre-training on scene images
    • Societal Bias in Vision-and-Language Datasets and Models
    • Automatic evaluation of atlantoaxial subluxation in rheumatoid arthritis by a deep learning model
    • Enhancing Fake News Detection in Social Media via Label Propagation on Cross-Modal Tweet Graph
    • ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation
    • Learning bottleneck concepts in image classification
    • Model-agnostic gender debiased image captioning
    • Multi-modal humor segment prediction in video
    • Not only generative art: Stable diffusion for content-style disentanglement in art analysis
    • Toward verifiable and reproducible human evaluation for text-to-image generation
    • Uncurated image-text datasets: Shedding light on demographic bias
    • Real-time estimation of the remaining surgery duration for cataract surgery using deep convolutional neural networks and long short-term memory
    • Improving facade parsing with vision transformers and line integration
    • Explainability matters in medical applications
    • Development of a vertex finding algorithm using recurrent neural network
    • Inference Time Evidences of Adversarial Attacks for Forensic on Transformers
    • Toward better communication between humans and AI: What do neural networks see?
    • Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
    • Emotional Intensity Estimation based on Writer’s Personality
    • Foundation of AI
    • What do models see? Bias in neural networks
    • Deep Gesture Generation for Social Robots Using Type-Specific Libraries
    • Corpus Construction for Historical Newspapers: A Case Study on Public Meeting Corpus Construction Using OCR Error Correction
    • Depthwise spatio-temporal STFT convolutional neural networks for human action recognition
    • 深層学習の最近の話題と医療分野への応用
    • 分野を超えた人工知能研究と最新の話題について
    • Match them up: Visually explainable few-shot image classification
    • Multi-label disengagement and behavior prediction in online learning
    • A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain
    • AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
    • Gender and racial bias in visual question answering datasets
    • Optimal Correction Cost for Object Detection Evaluation
    • Quantifying Societal Bias Amplification in Image Captioning
    • Tone Classification for Political Advertising Video using Multimodal Cues
    • Information Extraction from Public Meeting Articles
    • Recent Machine Learning Techniques and Exploration of New Physics
    • Anonymous identity sampling and reusable synthesis for sensitive face camouflage
    • Integration of gesture generation system using gesture library with DIY robot design kit
    • The semantic typology of visually grounded paraphrases
    • Explain me the painting: Multi-topic knowledgeable art description generation
    • GCNBoost: Artwork Classification by Label Propagation Through a Knowledge Graph
    • Image Retrieval by Hierarchy-aware Deep Hashing Based on Multi-task Learning
    • SCOUTER: Slot attention-based classifier for explainable image recognition
    • Transferring domain-agnostic knowledge in video question answering
    • Built year prediction from Buddha face with heterogeneous labels
    • Visual question answering with textual representations for images
    • Learners' efficiency prediction using facial behavior analysis
    • Museum Experience into a Souvenir: Generating Memorable Postcards from Guide Device Behavior Log
    • PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation
    • Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers
    • 機械は世界をどう見ているのか?
    • A comparative study of language Transformers for video question answering
    • MTUNet: Few-shot image classification with visual explanations
    • WRIME: A new dataset for emotional intensity estimation with subjective and objective annotations
    • Noisy-LSTM: Improving temporal awareness for video semantic segmentation
    • Generation and detection of media clones
    • Preventing fake information generation against media clone attacks
    • The laughing machine: Predicting humor in video
    • ContextNet: Representation and exploration for painting classification and retrieval in context
    • Cross-lingual visual grounding
    • IDSOU at WNUT-2020 Task 2: Identification of informative COVID-19 English tweets
    • Improving topic modeling through homophily for legal documents
    • Uncovering hidden challenges in query-based video moment retrieval
    • Visually grounded paraphrase identification via gating and phrase localization
    • A dataset and baselines for visual question answering on art
    • Demographic Influences on Contemporary Art with Unsupervised Style Embeddings
    • Knowledge-based video question answering with unsupervised scene descriptions
    • Privacy sensitive large-margin model for face de-identification
    • Joint learning of vessel segmentation and artery/vein classification with post-processing
    • Knowledge-Based Visual Question Answering in Videos
    • Yoga-82: A new dataset for fine-grained classification of human poses
    • Constructing a public meeting corpus
    • Warmer environments increase implicit mental workload even if learning efficiency is enhanced
    • BERT representations for video question answering
    • IterNet: Retinal image segmentation utilizing structural redundancy in vessel networks
    • Toward predicting learners' efficiency for adaptive e-learning
    • Video analytics in blended learning: Insights from learner-video interaction patterns
    • KnowIT VQA: Answering knowledge-based questions about videos
    • 3D image reconstruction from multi-focus microscopic images
    • Speech-driven face reenactment for a video sequence
    • Public Meeting Corpus Construction and Content Delivery
    • Human shape reconstruction with loose clothes from partially observed data by pose specific deformation
    • Legal information as a complex network: Improving topic modeling through homophily
    • Adaptive gating mechanism for identifying visually grounded paraphrases
    • BUDA.ART: A multimodal content-based analysis and retrieval system for Buddha statues
    • Historical and modern features for Buddha statue classification
    • Using external knowledge in the deep learning framework
    • Facial expression recognition with skip-connection to leverage low-level features
    • Buddha statues archive retrieval system
    • Collecting relation-aware video captions
    • GANを用いた顔のRGB画像と奥行画像の同時生成
    • Video meets knowledge in visual question answering
    • Video question answering with BERT
    • AI/機械学習/深層学習入門
    • Context-aware embeddings for automatic art analysis
    • Rethinking the evaluation of video summaries
    • コメディドラマにおける字幕と表情を用いた笑い予測
    • Multimodal learning analytics: Society 5.0 project in Japan
    • Problems dealt with machine learning/deep learning and its applications to nuclear physics
    • Talking Head Generation with Deep Phoneme and Viseme Representation and Generative Adversarial Networks
    • 情報学と物理学のクロスオーバー
    • Faces in an Archive of Buddhism Pictures
    • 多重焦点顕微鏡画像列からの細胞の3次元形状復元
    • Finding important people in a video using deep neural networks with conditional random fields
    • Exploration and Mining of 50,000 Buddha Pictures
    • iParaphrasing: Extracting visually grounded paraphrases via an image
    • Iterative applications of image completion with CNN-based failure detection
    • OpenCVとPythonによる機械学習プログラミング
    • Phrase localization-based visually grounded paraphrase identification
    • Representing a partially observed non-rigid 3D human using eigen-texture and eigen-deformation
    • Summarization of user-generated sports video by using deep action recognition features
    • Synthesis of human shape in loose cloth using eigen-deformation
    • Linking videos and languages: Representations and their applications
    • Extracting Paraphrases Grounded by an Image
    • Finding Video Parts with Natural Language
    • 自由視点画像生成のためのEigen-Texture法における係数の回帰
    • Augmented reality marker hiding with texture deformation
    • Video question answering to find a desired video segment
    • Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD
    • 画像処理・機械学習プログラミングOpenCV 3対応
    • Video summarization using textual descriptions for authoring video blogs
    • DNNを用いたカメラの6自由度相対運動推定
    • 最近の重要な論文の紹介 -- テキストとの対応付けによる映像の理解に関連して
    • Increasing pose comprehension through augmented reality reenactment
    • ReMagicMirror: Action learning using human reenactment with the mirror metaphor
    • Flexible human action recognition in depth video sequences using masked joint trajectories
    • 深層学習を利用した映像要約への取り組み
    • Video summarization using deep semantic features
    • Learning joint representations of videos and sentences with web image search
    • Human action recognition-based video summarization for RGB-D personal sports video
    • Joint representation of video and text using deep neural networks with help of web images
    • Privacy protection for social video via background estimation and CRF-based videographer's intention modeling
    • Novel View Synthesis Based on View-dependent Texture Mapping with Geometry-aware Color Continuity
    • 3D shape template generation from RGB-D images capturing a moving and deforming object
    • 畳み込みニューラルネットワークを用いた修復失敗領域の自動検出による画像修復の反復的適用
    • Acceleration of View-dependent Texture Mapping-based Novel View Synthesis for stereoscopic HMD
    • Evaluating protection capability for visual privacy information
    • 画像修復における畳み込みニューラルネットワークを用いた修復失敗領域の自動検出
    • 2035年のマルチメディアの姿を予想--ICME 2015 会議レポート
    • OpenCV 3 プログラミングブック
    • 単一のRGB-Dカメラを用いた非剛体物体の3次元形状復元
    • Facial expression preserving privacy protection using image melding
    • Textual description-based video summarization for video blogs
    • テクスチャの連続性を考慮した視点依存テクスチャマッピングによる自由視点画像生成
    • 特徴点の明示的な対応付けを伴わないカメラ位置姿勢推定
    • AR image generation using view-dependent geometry modification and texture mapping
    • Protection and utilization of privacy information via sensing
    • RGB-Dカメラを用いた非剛体物体の動き復元のためのRGB画像上の対応点に基づく3次元テンプレート生成
    • テキストと映像の類似度を用いた映像要約
    • RGB-Dカメラを用いた非剛体物体の動き復元のための3次元テンプレート形状生成
    • 特徴点の類似度尺度による対応付けを伴わないカメラ位置姿勢推定手法の検討
    • Background estimation for a single omnidirectional image sequence captured with a moving camera
    • Free-viewpoint AR human-motion reenactment based on a single RGB-D video stream
    • 画像のコンテキストを保持した視覚的に自然なプライバシー保護処理
    • 自由視点画像生成に基づく移動撮影した全方位動画像からの動物体除去
    • Single RGB-D Video-stream Based Human-motion Reenactment
    • Augmented reality image generation with virtualized real objects using view-dependent texture and geometry
    • Inferring what the videographer wanted to capture
    • Real-time privacy protection system for social videos using intentionally-captured persons detection
    • 拡張現実感のための視点依存テクスチャ・ジオメトリに基づく仮想化実物体の輪郭形状の修復
    • Markov random field-based real-time detection of intentionally-captured persons
    • 顔画像に対するプライバシー保護処理の有効性の定量的評価
    • Intended human object detection for automatically protecting privacy in mobile video surveillance
    • Extracting intentionally captured regions using point trajectories
    • Indoor positioning system using digital audio watermarking
    • Automatic generation of privacy-protected videos using background estimation
    • カメラの動きと映像特徴からの撮影者が意図した領域の推定
    • Automatically protecting privacy in consumer generated videos using intended human object detector
    • Discriminating intended human objects in consumer videos
    • Real-time user position estimation in indoor environments using digital watermarking for audio signals
    • Detecting intended human objects in human-captured videos
    • Digital diorama: Sensing-based real-world visualization
    • 映像中の撮影者が意図した人物被写体の検出
    • 音響電子透かしを用いた屋内での録音位置推定
    • 映像特徴に基づく撮影者が意図した人物被写体の推定
    • Watermarked movie soundtrack finds the position of the camcorder in a theater
    • 音響電子透かしの検出強度を用いた位置推定
    • Maximum-likelihood estimation of recording position based on audio watermarking
    • Determining Recording Location Based on Synchronization Positions of Audio watermarking
    • Estimation of recording location using audio watermarking

Video meets knowledge in visual question answering

August 1, 2019 · Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima

Type: Conference paper
Published in: Meeting on Image Recognition and Understanding (MIRU), 4 pages

Last updated August 1, 2019


© 2025 Multimodal Intelligent Media Research Area, First Research Division, The Institute of Scientific and Industrial Research, Osaka University
