Self-KD [ACL 2025 Findings] Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models