Department of Intelligent Media (mimlab)

PyTorch

Oct 26, 2023 · 1 min read
Go to Project Site

PyTorch is a Python package that provides tensor computation (like NumPy) with strong GPU acceleration, along with deep neural networks built on a tape-based automatic differentiation (autograd) system.
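A minimal sketch of those two ideas — NumPy-like tensor math (moved to a GPU when one is available) and gradient tracking via autograd:

```python
import torch

# Tensor computation, NumPy-style: multiply two random matrices.
a = torch.rand(3, 4)
b = torch.rand(4, 2)
c = a @ b
print(c.shape)  # torch.Size([3, 2])

# The same computation runs on a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
c_gpu = a.to(device) @ b.to(device)

# Autograd: gradients flow through recorded tensor operations.
x = torch.ones(2, requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # dy/dx = 2x
print(x.grad)        # tensor([2., 2.])
```

Here `device` selection and the toy matrices are illustrative choices, not part of any particular workflow.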

Last updated on Oct 26, 2023
Authors
Department of Intelligent Media, MIMLab (Nakashima Lab)


© 2025 Department of Intelligent Media, First Research Division, Institute of Scientific and Industrial Research, Osaka University
