Visual model exploits the visual similarity of 打, 拍, 拉 (all three share the hand radical 扌); text model starts from embeddings
Three renditions of 人工智能 (full, 80% retained, 50% retained) appear side by side. You can read each instantly, even though the last two show only part of the original image.
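A minimal sketch of how such partial renditions could be produced, under stated assumptions: the text does not say which part of the image is retained, so this example keeps the top rows of the image and drops the rest. The `Bitmap` type, the `retain_top` helper, and the toy checkerboard image are hypothetical stand-ins for a real rendered glyph image.

```python
from typing import List

# Rows of 0/1 pixels; a hypothetical stand-in for a rendered text image.
Bitmap = List[List[int]]


def retain_top(bitmap: Bitmap, fraction: float) -> Bitmap:
    """Keep the top `fraction` of rows and drop the rest.

    Cropping from the top is an assumption; the source only says that
    a portion of the original image is shown.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(len(bitmap) * fraction))
    # Copy each kept row so the result does not alias the input.
    return [row[:] for row in bitmap[:keep]]


# A toy 10-row by 4-column checkerboard standing in for a rendered line of text.
toy = [[(r + c) % 2 for c in range(4)] for r in range(10)]

full = retain_top(toy, 1.0)    # all 10 rows
eighty = retain_top(toy, 0.8)  # top 8 rows
fifty = retain_top(toy, 0.5)   # top 5 rows
```

With a real rendering pipeline, the same idea would apply to the pixel rows (or columns) of the rendered image rather than to a toy grid.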