Aashish's Portfolio
CORDIAL: Can Multimodal Large Language Models effectively understand Coherence Relationships?
We assess how well Multimodal Large Language Models (MLLMs) perform Multimodal Discourse Analysis (MDA) using Coherence Relations. Our benchmark, CORDIAL, covers a broad spectrum of Coherence Relations across three discourse domains at varying levels of granularity. Through experiments on more than 10 MLLMs using different prompting strategies, we show that even top models such as Gemini 1.5 Pro and GPT-4o fail to match the performance of simple classifier-based baselines.
Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee
ANCHOR: LLM-driven news subject conditioning for Text-to-Image Synthesis
To evaluate how well text-to-image (T2I) models capture the intended subjects of news captions, we introduce the Abstractive News Captions with High-level cOntext Representation (ANCHOR) dataset, which contains more than 70K samples sourced from five news media organizations. Our proposed method, Subject-Aware Finetuning (SAFE), selects key subjects and enhances their representation in synthesized images by leveraging LLM-generated subject weights. It also adapts to the domain distribution of news images and captions through custom Domain Fine-tuning, and it outperforms current T2I baselines on ANCHOR.
Aashish Anantha Ramakrishnan, Sharon X Huang, Dongwon Lee