This talk addresses computational cognitive vision and perception at the interface of (spatial) language, (spatial) logic, (spatial) cognition, and artificial intelligence. Summarising recent work, I present general methods for the semantic interpretation of dynamic visuospatial imagery, with an emphasis on the ability to perform abstraction, reasoning, and learning with cognitively rooted, structured characterisations of commonsense knowledge pertaining to space and motion. I will particularly highlight:
- explainable models of computational visuospatial commonsense at the interface of symbolic and neural techniques;
- deep semantics, entailing systematically formalised declarative (neurosymbolic) reasoning and learning with aspects pertaining to space, space-time, motion, actions and events, and spatio-linguistic conceptual knowledge; and
- general foundational commonsense abstractions of space, time, and motion needed for representation-mediated (grounded) reasoning and learning with dynamic visuospatial stimuli.
The presented works – demonstrated against the backdrop of applications in autonomous driving, cognitive robotics, visuoauditory media, and cognitive psychology – are intended to serve as a systematic model and general methodology integrating diverse, multi-faceted AI methods pertaining to knowledge representation and reasoning, computer vision, and machine learning towards realising practical, human-centred, computational visual intelligence. I will conclude by highlighting a bottom-up interdisciplinary approach – at the confluence of cognition, AI, interaction, and design science – necessary to better appreciate the complexity and spectrum of varied human-centred challenges for the design and (usable) implementation of (explainable) artificial visual intelligence solutions in diverse human-system interaction contexts.