Why Are Vision Transformers Focusing on Boring Backgrounds?
2023-10-03 05:14:21 · Source: hackernoon.com



Too Long; Didn't Read

Vision Transformers (ViTs) have gained popularity for image-related tasks but exhibit strange behavior: focusing on unimportant background patches instead of the main subjects in images. Researchers found that a small fraction of patch tokens with abnormally high L2 norms cause these spikes in attention. They hypothesize that ViTs recycle low-information patches to store global image information, leading to this behavior. To fix it, they propose adding "register" tokens to provide dedicated storage, resulting in smoother attention maps, better performance, and improved object discovery abilities. This study highlights the need for ongoing research into model artifacts to advance transformer capabilities.
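The register-token fix described above can be sketched in a few lines: extra learnable tokens are appended to the patch sequence before the transformer blocks (giving the model dedicated slots for global information) and simply discarded at the output. This is a minimal NumPy illustration of the token bookkeeping only, not the paper's actual implementation; the shapes and helper names are hypothetical.

```python
import numpy as np

def add_registers(patch_tokens, registers):
    """Append learnable register tokens to the patch-token sequence.

    patch_tokens: (n_patches, dim) array of image patch embeddings.
    registers:    (n_registers, dim) array of learnable tokens
                  (learned during training; fixed values here).
    """
    return np.concatenate([patch_tokens, registers], axis=0)

def drop_registers(tokens, n_registers):
    """After the transformer blocks, discard the register slots so the
    output again contains one token per image patch."""
    return tokens[:-n_registers] if n_registers > 0 else tokens

# Toy dimensions (hypothetical): a 14x14 patch grid of dim-8 tokens,
# plus 4 register tokens.
patches = np.random.randn(196, 8)
registers = np.zeros((4, 8))  # would be learned parameters in practice

seq = add_registers(patches, registers)   # (200, 8) fed to the blocks
out = drop_registers(seq, 4)              # (196, 8) patch tokens only
```

The point of the sketch is that registers change nothing about the output interface: downstream code still sees exactly one token per patch, while the extra slots absorb the global information that would otherwise hijack low-information background patches.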


Author: Mike Young (@mikeyoung44)



Source: https://hackernoon.com/why-are-vision-transformers-focusing-on-boring-backgrounds?source=rss