Abstract: Online test-time adaptation (OTTA) of vision-language models (VLMs) has recently garnered increased attention to take advantage of data observed along a stream to improve future predictions.
One night in 2010, Mohit Gupta decided to try something before leaving the lab. Then a Ph.D. student at Carnegie Mellon ...
Abstract: Image captioning is an emerging field at the intersection of computer vision and natural language processing (NLP). It has shown great potential to enhance accessibility by automatically ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results