According to Gartner’s latest research, 40% of GenAI solutions will be multimodal (text, image, audio, and video) by 2027.

This shift is expected to enhance human-to-AI interaction and give vendors an opportunity to differentiate their GenAI-enabled offerings.

“As the GenAI market evolves towards models natively trained on more than one modality, this helps capture relationships between different data streams and has the potential to scale the benefits of GenAI across all data types and applications. It also allows AI to support humans in performing more tasks, regardless of the environment,” says Erick Brethenoux, VP analyst at Gartner.

Multimodal GenAI will have a transformational impact on enterprise applications by enabling new features and functionality that would otherwise be unachievable. The impact is not limited to specific industries or use cases; it can apply at any touchpoint between AI and humans.

Today, many multimodal models are limited to two or three modalities, though that number is expected to grow over the next few years.

“Multimodal GenAI is important because data is typically multimodal. When single-modality models are combined or assembled to support multimodal GenAI applications, it often leads to latency and less accurate results, resulting in a lower quality experience,” said Brethenoux.
