New fully open source vision encoder OpenVision arrives to improve on OpenAI’s CLIP, Google’s SigLIP


The University of California, Santa Cruz has announced the release of OpenVision, a family of vision encoders that aims to offer an alternative to models such as OpenAI’s four-year-old CLIP and Google’s SigLIP, released last year.

A vision encoder is a type of AI model that transforms visual material (typically still images uploaded by a model’s creators) into numerical data that other, non-visual AI models such as large language models (LLMs) can process. It is the component that lets many leading LLMs work with images uploaded by users, making it possible for an LLM to identify subjects, colors, locations, and other features within an image.

OpenVision, with its permissive Apache 2.0 license and family of 26 (!) models ranging from 5.9 million to 632.1 million parameters, lets any developer or AI model maker within an enterprise or organization take and deploy an encoder that can ingest everything from images of a construction job site to photos of a user’s washing machine, so that an AI model can offer guidance and troubleshooting, among myriad other use cases. The Apache 2.0 license permits use in commercial applications.

The models were developed by a team led by Cihang Xie, assistant professor at UCSC, along with contributors Xianhang Li, Yanqing Liu, Haoqin Tu, and Hongru Zhu.

The project builds upon the CLIPS training pipeline and leverages the Recap-DataComp-1B dataset, a billion-scale web image corpus that was re-captioned using LLaVA-powered language models.

Scalable architecture for different enterprise deployment use cases

OpenVision’s design supports multiple use cases.

Larger models are well-suited for server-grade workloads that require high accuracy and detailed visual understanding, while smaller variants—some as lightweight as 5.9M parameters—are optimized for edge deployments where compute and memory are limited.

The models also support adaptive patch sizes (8×8 and 16×16), allowing for configurable trade-offs between detail resolution and computational load.
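To make the trade-off concrete, here is a minimal sketch of how patch size changes the number of tokens an encoder has to process per image. It assumes a ViT-style encoder that cuts a square image into non-overlapping square patches; the function name `tokens_per_image` is illustrative and not part of the OpenVision code.

```python
def tokens_per_image(resolution: int, patch_size: int) -> int:
    """Count the patch tokens for a square image split into non-overlapping patches.

    Assumption: ViT-style patching, with no extra class or register tokens counted.
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


for patch_size in (8, 16):
    count = tokens_per_image(224, patch_size)
    print(f"224x224 image, {patch_size}x{patch_size} patches: {count} tokens")
# prints:
# 224x224 image, 8x8 patches: 784 tokens
# 224x224 image, 16x16 patches: 196 tokens
```

Under these assumptions, halving the patch side quadruples the token count, which is why smaller patches capture finer detail at a higher computational cost.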

Strong results across multimodal benchmarks

In a series of benchmarks, OpenVision demonstrates strong results across multiple vision-language tasks.

While traditional CLIP benchmarks such as ImageNet and MSCOCO remain part of the evaluation suite, the OpenVision team cautions against relying solely on those metrics.

Their experiments show that strong performance on image classification or retrieval does not necessarily translate to success in complex multimodal reasoning. Instead, the team advocates for broader benchmark coverage and open evaluation protocols that better reflect real-world multimodal use cases.

Evaluations were conducted using two standard multimodal frameworks—LLaVA-1.5 and Open-LLaVA-Next—and showed that OpenVision models consistently match or outperform both CLIP and SigLIP across tasks like TextVQA, ChartQA, MME, and OCR.

Under the LLaVA-1.5 setup, OpenVision encoders trained at 224×224 resolution scored higher than OpenAI’s CLIP in both classification and retrieval tasks, as well as in downstream evaluations like SEED, SQA, and POPE.

At higher input resolutions (336×336), OpenVision-L/14 outperformed CLIP-L/14 in most categories. Even the smaller models, such as OpenVision-Small and Tiny, maintained competitive accuracy while using significantly fewer parameters.

Efficient progressive training reduces compute costs

One notable feature of OpenVision is its progressive resolution training strategy, adapted from CLIPA. Models begin training on low-resolution images and are incrementally fine-tuned on higher resolutions.

This results in a more compute-efficient training process—often 2 to 3 times faster than CLIP and SigLIP—with no loss in downstream performance.

Ablation studies — where components of a machine learning model are selectively removed to identify their importance or lack thereof to its functioning — further confirm the benefits of this approach, with the largest performance gains observed in high-resolution, detail-sensitive tasks like OCR and chart-based visual question answering.
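As a rough illustration of why starting at low resolution saves compute, the sketch below estimates the cost of a progressive schedule relative to training every sample at a single fixed resolution. Everything in it is an assumption made for illustration: the schedule (resolutions and the share of training samples at each stage), the 14×14 patch size, and the simplification that cost scales linearly with the number of patch tokens. None of these numbers come from the OpenVision team.

```python
def tokens(resolution: int, patch_size: int = 14) -> int:
    """Patch tokens for a square image (assumes ViT-style non-overlapping patches)."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    return (resolution // patch_size) ** 2


def relative_cost(schedule, baseline_resolution: int, patch_size: int = 14) -> float:
    """Cost of a multi-stage schedule relative to a fixed-resolution baseline.

    schedule: list of (resolution, fraction_of_samples) pairs; fractions must sum to 1.
    Simplifying assumption: per-sample cost is proportional to the token count.
    """
    total_fraction = sum(fraction for _, fraction in schedule)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("sample fractions must sum to 1")
    weighted_tokens = sum(
        fraction * tokens(resolution, patch_size) for resolution, fraction in schedule
    )
    return weighted_tokens / tokens(baseline_resolution, patch_size)


# Hypothetical schedule: most samples at low resolution, a short high-resolution finish.
HYPOTHETICAL_SCHEDULE = [(84, 0.80), (224, 0.15), (336, 0.05)]

ratio = relative_cost(HYPOTHETICAL_SCHEDULE, baseline_resolution=224)
print(f"Estimated cost relative to training only at 224x224: {ratio:.3f}")
```

Because self-attention cost grows faster than linearly with token count, a linear estimate like this one likely understates the savings from the low-resolution stages; it is meant only to show the shape of the calculation.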

Another factor in OpenVision’s performance is its use of synthetic captions and an auxiliary text decoder during training.

These design choices enable the vision encoder to learn more semantically rich representations, improving accuracy in multimodal reasoning tasks. Removing either component led to consistent performance drops in ablation tests.

Optimized for lightweight systems and edge computing use cases

OpenVision is also designed to work effectively with small language models.

In one experiment, a vision encoder was paired with a 150M-parameter Smol-LM to build a full multimodal model under 250M parameters.

Despite the tiny size, the system retained robust accuracy across a suite of VQA, document understanding, and reasoning tasks.

This capability suggests strong potential for edge-based or resource-constrained deployments, such as consumer smartphones or on-site manufacturing cameras and sensors.

Why OpenVision matters to enterprise technical decision makers

OpenVision’s fully open and modular approach to vision encoder development has strategic implications for enterprise teams working across AI engineering, orchestration, data infrastructure, and security.

For engineers overseeing LLM development and deployment, OpenVision offers a plug-and-play solution for integrating high-performing vision capabilities without depending on opaque, third-party APIs or restricted model licenses.

This openness allows for tighter optimization of vision-language pipelines and ensures that proprietary data never leaves the organization’s environment.

For engineers focused on creating AI orchestration frameworks, OpenVision provides models at a broad range of parameter scales—from ultra-compact encoders suitable for edge devices to larger, high-resolution models suited for multi-node cloud pipelines.

This flexibility makes it easier to design scalable, cost-efficient MLOps workflows without compromising on task-specific accuracy. Its support for progressive resolution training also allows for smarter resource allocation during development, which is especially beneficial for teams operating under tight budget constraints.

Data engineers can leverage OpenVision to power image-heavy analytics pipelines, where structured data is augmented with visual inputs (e.g., documents, charts, product images). Since the model zoo supports multiple input resolutions and patch sizes, teams can experiment with trade-offs between fidelity and performance without retraining from scratch. Integration with tools like PyTorch and Hugging Face simplifies model deployment into existing data systems.

Meanwhile, OpenVision’s transparent architecture and reproducible training pipeline allow security teams to assess and monitor models for potential vulnerabilities—unlike black-box APIs where internal behavior is inaccessible.

When deployed on-premise, these models avoid the risks of data leakage during inference, which is critical in regulated industries handling sensitive visual data such as IDs, medical forms, or financial records.

Across all these roles, OpenVision helps reduce vendor lock-in and brings the benefits of modern multimodal AI into workflows that demand control, customization, and operational transparency. It gives enterprise teams the technical foundation to build competitive, AI-enhanced applications—on their own terms.

Open for business

The OpenVision model zoo is available in both PyTorch and JAX implementations, and the team has also released utilities for integration with popular vision-language frameworks.

As of this release, models can be downloaded from Hugging Face, and training recipes are publicly posted for full reproducibility.

By providing a transparent, efficient, and scalable alternative to proprietary encoders, OpenVision offers researchers and developers a flexible foundation for advancing vision-language applications. Its release marks a significant step forward in the push for open multimodal infrastructure—especially for those aiming to build performant systems without access to closed data or compute-heavy training pipelines.

For full documentation, benchmarks, and downloads, visit the OpenVision project page or GitHub repository.

