StepFun was one of the pioneering AI labs in China specialising in multimodal models, company founder and CEO Jiang Daxin said in an interview with Tencent’s news portal on Saturday. Multimodal models are designed to understand multiple types of input data such as text, video and audio, unlike traditional models that only handle one type.
Jiang emphasised StepFun’s strengths across audio, image, video and music generation models, along with its focus on foundational AI technology.
“We’re doing pretty well in these areas, which allow us to integrate them and explore cutting-edge directions,” he was quoted as saying.

This comes as the Chinese AI market has become more homogeneous following a rapid influx, since earlier this year, of model releases that lack significant differentiation, according to analysts.