Models and Machine Learning | The Engine and the Pistons of Generative AI
Models are the workhorses of Generative AI: think of the model as the engine behind the scenes that does the work. Something is fed into one side of the model, and something is generated out the other side.
The two types of models that Nomad Media has used for many years are:
Audio To Text - The audio track of a video is fed into the model, and what comes out the other side is a subtitle containing everything the person was saying. Typically, models are referred to as "something to something." In this case the model is audio to text because audio goes in and text comes out the other side. Nomad Media has used audio to text models for many years to generate subtitles.
Image To Text - Think of video content as a series of images. Nomad Media Generative AI takes each image and generates text for it, creating text labels and text concepts. A concept is something like a concert, gambling, or friendship. A label is an object, for example, a basketball, chair, or car. Other examples include reading a license plate in an image and turning it into text.
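The "something to something" naming convention can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Model class, the fake_transcribe and fake_label_frame functions, and their canned outputs are hypothetical stand-ins, not Nomad Media's implementation or any real library's API. A real model would run inference where these stubs return fixed text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    """A model described by what goes in and what comes out (hypothetical)."""
    input_kind: str
    output_kind: str
    run: Callable[[object], str]

    @property
    def name(self) -> str:
        # Follows the "something to something" naming convention.
        return f"{self.input_kind} to {self.output_kind}"


# Hypothetical stand-ins: a real model would run inference here.
def fake_transcribe(audio_track: object) -> str:
    return "placeholder subtitle text"


def fake_label_frame(image: object) -> str:
    return "basketball, concert"


audio_to_text = Model("audio", "text", fake_transcribe)
image_to_text = Model("image", "text", fake_label_frame)

for model in (audio_to_text, image_to_text):
    print(model.name)
```

The point of the sketch is only that a model's name states its input and output; the work inside run is what differs from one model to the next.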
The underpinning of Generative AI is Machine Learning: it's the engine. Machine Learning sits at the deepest level and uses the models to generate content. The models have been trained, validated, and tuned, and the Generative AI world is making them consistent, repeatable, and safe. The term AI represents the overarching service or product: think of AI as the car, and Machine Learning and the models as the car's engine.
Nomad Media is now able to take advantage of the fast, frantic pace at which Machine Learning models are being created. Today there are thousands of research scientists all across the world figuring out how to make models smarter and faster. They train the models, meaning they literally give a model a billion pictures and say, "figure out the similarities in all of these," and they use algorithms to carry out that training.
Nomad Media can use models from sources like Hugging Face. For example, you can say, "I'm looking for the best model for medical term analysis from audio tracks." Such a model has been trained to be highly proficient at identifying medical terminology: the data scientists who created it used that type of terminology to build and tune it, so it knows those medical words best from audio. When it does Generative AI, it produces words that most people would have no idea how to write.
Generative AI models today are following two tracks: generic and specific.