Memory and reasoning are two crucial components of self-evolving AI. Prior work has explored their potential for LLM agents and VLMs separately, without considering the subtle connection between them. Recently, MemGen showed that memory and reasoning are not discrete but interweave in the context of LLM agents. However, few works investigate memory mechanisms in the context of visual intelligence. In this repo, we ask: can memory benefit visual understanding by incentivizing visual reasoning?
This work builds on VLM-R1 and MemGen. We greatly appreciate their valuable contributions to the community.