AI startup xAI has unveiled Grok-1.5 Vision, a multimodal generative artificial intelligence model.
In addition to understanding text, the new model can process visual information in documents, diagrams, charts, screenshots, and photographs.
The company, owned by billionaire Elon Musk, plans to make Grok-1.5 Vision, also called Grok-1.5V, available soon to its early testers and existing Grok users.
The company said in a tweet that Grok-1.5 Vision is competitive with existing multimodal models in several domains, ranging from multidisciplinary reasoning to understanding scientific documents, diagrams, charts, screenshots, and photographs.
The multimodal Grok-1.5 Vision model was unveiled several weeks after xAI announced the updated Grok-1.5 model for its AI chatbot.
The company highlighted several examples of Grok-1.5 Vision's capabilities, including converting a diagram into Python code, writing a simple story based on a child's drawing, and converting a table into a CSV file.
The company touts Grok-1.5 Vision's lead over its competitors on RealWorldQA, a new benchmark created to assess spatial understanding of the real world.
xAI explained that the RealWorldQA benchmark consists of more than 700 images, each paired with a question and an answer.
The images include anonymized pictures captured from vehicles as well as other real-world samples. xAI has released the RealWorldQA benchmark for public use under a Creative Commons license.
The AI startup has made steady progress since launching its chatbot in November 2023 as it strives to keep pace with OpenAI and other major players in the market.
Grok-1.5 Vision is set to launch less than a month after the company released Grok as an open-source project, though its efforts have drawn some controversy.
Earlier this month, researchers reported that the Grok chatbot can give users guidance on criminal activities.
xAI continues to work on developing artificial intelligence that can understand the world and provide public benefits.