Video-text retrieval techniques endeavour to bridge the semantic gap between visual content and natural language descriptions. By learning joint representations for both video and text, these ...