
Meta Releases Four New Public AI Models for Developers to Use

The top figure illustrates the temporal blurring process: source separation, merging, and broadcasting. The bottom figure shows the high-level JASCO representation: the conditioning terms are first projected into a low-dimensional representation and then merged along the channel dimension. Green blocks contain learnable parameters, while blue blocks are frozen. Source: arXiv (2024). DOI: 10.48550/arxiv.2406.10970

The Fundamental AI Research (FAIR) team at Meta is making four new AI models publicly available to researchers and developers building new applications. The team has also published a paper on the arXiv preprint server describing one of the new models, JASCO, and how it is used.

As interest in AI applications grows, key players in the field are creating AI models that can be used by others to add AI capabilities to their own applications. In this new endeavor, the Meta team has released four new models: JASCO, AudioSeal, and two versions of Chameleon.

JASCO is designed to accept several types of conditioning input and generate music that follows them. The model, the team says, lets users control features such as drum tracks, guitar chords, and even whole melodies to shape the generated piece. It can also accept text input and use it to further steer the result.

A user might, for example, ask the model to generate a blues track with prominent bass and drums, then add similar descriptions for other instruments. The Meta team also compared JASCO with other systems designed for similar tasks and found that it outperformed them on three key metrics.

AudioSeal can be used to add watermarks to speech generated by an AI application, allowing the output to be easily identified as artificially generated. The team notes that it can also watermark AI-generated speech segments that have been spliced into real speech, and that it will be released with a commercial license.
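To make the watermarking idea concrete, here is a minimal sketch of a classical spread-spectrum approach: a low-amplitude pseudorandom signal is added to the audio samples, and its presence is later detected by correlating against the same key. This is a toy illustration only, not AudioSeal's actual method, which relies on trained neural generator and detector networks; all function names here are hypothetical.

```python
import random

def make_key(length, seed=42):
    # Pseudorandom +/-1 sequence shared by the embedder and the detector.
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(samples, key, strength=0.01):
    # Add the key at low amplitude so the change is barely audible.
    return [s + strength * k for s, k in zip(samples, key)]

def detect(samples, key):
    # Correlate the signal with the key; a clearly positive score
    # suggests the watermark is present.
    return sum(s * k for s, k in zip(samples, key)) / len(samples)

rng = random.Random(0)
audio = [rng.uniform(-0.5, 0.5) for _ in range(10_000)]
key = make_key(len(audio))
marked = embed(audio, key)
print(detect(marked, key) > detect(audio, key))  # the watermarked copy scores higher
```

With a long enough signal, the correlation of unmarked audio with the key averages out near zero, while the watermarked copy retains the embedded strength; a neural detector such as AudioSeal's additionally localizes which segments carry the mark.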

Both Chameleon models convert text into visual representations and are being released with limited capabilities. The 7B and 34B versions, the team notes, are trained to understand both text and images. Because of this, they can also perform the reverse task, such as generating captions for photos.

More information:
Or Tal et al., Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation, arXiv (2024). DOI: 10.48550/arxiv.2406.10970

Demo page: pages.cs.huji.ac.il/adiyoss-lab/JASCO/

Journal information:
arXiv

© 2024 Science X Network

Citation: Meta Releases Four New Public AI Models for Developer Use (2024, July 3) retrieved July 3, 2024, from https://techxplore.com/news/2024-07-meta-ai.html

This document is subject to copyright. Apart from any fair use for private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.