Multimodal mid-level representations for semantic analysis of broadcast video

- Duan, Lingyu