UA24 — Новини України

Microsoft has recently announced the launch of three new foundational artificial intelligence models capable of generating text, audio, and images, aiming to enhance competition in the AI sector despite its existing partnership with OpenAI.

Microsoft has recently unveiled three new foundational artificial intelligence models designed to generate text, audio, and images. This strategic move is intended to bolster competition against other laboratories in the artificial intelligence field, even as the company maintains its partnership with OpenAI. The announcement was reported by TechCrunch.

The newly introduced models are named MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. The MAI-Transcribe-1 model specializes in speech recognition and converts spoken language into text across 25 different languages. According to the company, this model operates 2.5 times faster than the previous version, Azure Fast, indicating significant advancements in speech recognition technology.

The second model, MAI-Voice-1, is capable of generating sound and can create up to 60 seconds of audio in just one second. It also allows users to customize their own voice, opening new avenues for personalizing audio content. The third model, MAI-Image-2, is designed for video creation, marking an important step in the development of visual artificial intelligence technologies.

It is noteworthy that the MAI-Image-2 model was already showcased in the MAI Playground testing environment on March 19. Now, all three models are available on the Microsoft Foundry platform, along with new tools for working with text and voice in the MAI Playground, making them accessible to a wide range of users.

The development of these new models was undertaken by the MAI Superintelligence team, led by Microsoft AI CEO Mustafa Suleyman. This team was established in November 2025 with the goal of accelerating research in the field of artificial intelligence.

Microsoft emphasized that it is focusing on a 'human-centered' approach to model development. This means that the training of the models takes into account real-world communication methods, making them more user-friendly and practical. Furthermore, Microsoft plans to continue releasing new models and integrating them into its products, reflecting a strategic approach to the advancement of artificial intelligence.

The company also aims to compete in the market by reducing the cost of its services. Prices for the new models start at $0.36 per hour for speech recognition, $22 for 1 million characters for voice generation, $5 for 1 million tokens of text, and $33 for 1 million tokens of images.

Despite launching its own models, Microsoft confirmed that it continues to collaborate with OpenAI. According to Mustafa Suleyman, a review of the partnership terms has allowed the company to more actively develop its own research in superintelligence. Microsoft has invested over $13 billion in the development of OpenAI and continues to utilize its models in its products as part of a multi-year agreement.

Thus, the launch of Microsoft's new artificial intelligence models not only highlights the growing competition in the market but also demonstrates the company’s commitment to staying at the forefront of technologies that have the potential to transform the way we interact with information and technology as a whole.

Microsoft Launches New AI Models to Compete with OpenAI