Abstract: Online test-time adaptation (OTTA) of vision-language models (VLMs) has recently garnered increased attention to take advantage of data observed along a stream to improve future predictions.
One night in 2010, Mohit Gupta decided to try something before leaving the lab. Then a Ph.D. student at Carnegie Mellon ...
Abstract: Image captioning is an emerging field at the intersection of computer vision and natural language processing (NLP). It has shown great potential to enhance accessibility by automatically ...