Deployable at Launch: Intel Core Ultra Platform Completes On-Device Adaptation of Baidu ERNIE 4.5 Models

智驾网 2025-06-30 23:39


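OpenVINO is an open-source toolkit developed by Intel to optimize and accelerate deep learning inference. It supports cross-platform deployment and makes full use of Intel hardware. OpenVINO improves the performance of a wide range of state-of-the-art models on Intel AI products and solutions, and is used in AI PCs, edge AI, and other AI scenarios.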

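Today (June 30), Baidu officially released the ERNIE 4.5 family of open-source models. This is the first time Baidu has open-sourced its ERNIE large models.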

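Intel OpenVINO and Baidu PaddlePaddle have worked closely together for years. For this ERNIE release, Intel used OpenVINO to adapt the ERNIE on-device model and deploy it on the Intel Core Ultra platform on the day of release (Day 0).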

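Since 2021, Baidu PaddlePaddle and Intel OpenVINO have collaborated in depth, adapting their tools to each other to give developers a more efficient and convenient AI development toolchain.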

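Many of the models adapted through this collaboration, such as PaddleOCR, PaddleSeg, and PaddleDetection, are widely used in finance, healthcare, intelligent manufacturing, and other fields. Developers can run inference on and deploy PaddlePaddle models directly with OpenVINO™, or first convert them to the IR format with OpenVINO™'s Model Optimizer and then deploy them.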

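At the same time as the ERNIE 4.5 series was open-sourced, Intel announced that OpenVINO had been adapted to the 0.3B-parameter dense model, and that the model had been deployed on the Intel Core Ultra platform with what Intel describes as excellent inference performance.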

Intel said it will continue to work closely with Baidu to adapt more models in the ERNIE series.

Intel also published a Get Started guide for deploying the open-source ERNIE 4.5 models on device:

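Step 1: Environment setup

The commands below set up the Python environment for the deployment task. The activation command is the Windows form; on Linux or macOS, use source py_venv/bin/activate instead.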

python -m venv py_venv

py_venv\Scripts\activate.bat

pip install --pre -U openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly

pip install nncf

pip install git+https://github.com/openvino-dev-samples/optimum-intel.git@ernie
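Before moving on, it can be useful to confirm that the installation succeeded. The following is a minimal sketch, not part of Intel's guide: the helper name missing_modules is hypothetical, and the list of import names is an assumption about which modules the commands above provide (transformers and openvino are expected to arrive as dependencies).

```python
import importlib.util

# Import names assumed to be provided by the packages installed above.
REQUIRED_MODULES = ("openvino", "openvino_genai", "nncf", "optimum.intel", "transformers")


def missing_modules(names=REQUIRED_MODULES):
    """Return the entries of `names` that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names when the parent package itself is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    print("missing:", ", ".join(absent) if absent else "none")
```

If anything is reported as missing, rerunning the corresponding pip command inside the activated virtual environment is the first thing to try.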

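Step 2: Model download and conversion

Before deployment, the original PyTorch model must be converted to OpenVINO™'s IR static-graph format and compressed, for a lighter deployment and the best performance. Optimum's command-line tool, optimum-cli, handles format conversion and weight quantization in a single command: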

optimum-cli export openvino --model baidu/ERNIE-4.5-0.3B-PT --task text-generation-with-past --weight-format fp16 --trust-remote-code ERNIE-4.5-0.3B-PT-OV
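When the conversion is part of a larger script, the same command can be assembled and launched from Python. This is a minimal sketch that assumes optimum-cli is on the PATH of the active environment; build_export_cmd is a hypothetical helper, and it only reproduces the flags that appear in the command above.

```python
import shlex
import subprocess


def build_export_cmd(model, output_dir, weight_format="fp16", trust_remote_code=True):
    """Assemble the optimum-cli export command as an argument list."""
    cmd = [
        "optimum-cli", "export", "openvino",
        "--model", model,
        "--task", "text-generation-with-past",
        "--weight-format", weight_format,
    ]
    if trust_remote_code:
        cmd.append("--trust-remote-code")
    cmd.append(output_dir)  # the positional output directory goes last
    return cmd


cmd = build_export_cmd("baidu/ERNIE-4.5-0.3B-PT", "ERNIE-4.5-0.3B-PT-OV")
print(shlex.join(cmd))

# Uncomment to actually run the conversion (downloads the model on first use):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the model id is replaced by a local path that contains spaces.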

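Developers can adjust the quantization parameters according to the quality of the model's output. They include:

· --model: the model's id on Hugging Face. You can also download the original model in advance and replace the model id with its local path. For developers in China, ModelScope is the recommended channel for downloading the original model; for how to load it, see the official ModelScope guide: https://www.modelscope.cn/docs/models/download

· --weight-format: the weight precision; one of fp32, fp16, int8, int4, int4_sym_g128, int4_asym_g128, int4_sym_g64, int4_asym_g64

· --group-size: the number of channels in a weight that share quantization parameters

· --ratio: the proportion of int4 versus int8 weights. The default is 1.0; 0.6 means that 60% of the weights are stored as int4 and 40% as int8

· --sym: whether to enable symmetric quantization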
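To make the group size and symmetric options more concrete, here is a toy illustration of group-wise symmetric 4-bit quantization in plain Python. It is a simplified sketch for intuition only, not the algorithm NNCF or optimum-cli actually runs: the function names and sample weights are made up, and each "group" here is simply a run of consecutive values that share one scale.

```python
def quantize_sym(group, bits=4):
    """Symmetric quantization of one group: a single scale, zero point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in group)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed 4-bit range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
    return q, scale


def quantize_groupwise(weights, group_size):
    """Split `weights` into consecutive groups; each group gets its own scale."""
    return [quantize_sym(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def dequantize(groups):
    """Rebuild approximate float weights from (ints, scale) pairs."""
    return [v * scale for q, scale in groups for v in q]


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Four small weights followed by four large ones (illustrative values).
weights = [0.01, -0.02, 0.03, 0.015, 0.9, -0.8, 0.7, 0.85]
for group_size in (8, 4):
    groups = quantize_groupwise(weights, group_size)
    err = mean_abs_error(weights, dequantize(groups))
    print(f"group_size={group_size}: {len(groups)} scale(s), mean abs error={err:.4f}")
```

With a single group, the large weights set the scale and the small ones collapse to zero. Splitting into two groups gives the small weights their own scale and lowers the average error, at the cost of storing one more scale. Asymmetric quantization additionally stores a zero point per group, which is the trade-off that --sym switches off.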

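Step 3: Model deployment

For the text-generation models in the ERNIE-4.5 series, Optimum-Intel can handle deployment and acceleration. Optimum-Intel calls the OpenVINO™ runtime as its backend to optimize performance on Intel CPUs and GPUs. Because it is compatible with the Transformers library, the official example can be migrated to Optimum-Intel directly.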
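Since Optimum-Intel can target both Intel CPUs and GPUs, it may help to check which devices OpenVINO sees on the machine. The sketch below is an illustration rather than part of Intel's guide: pick_device is a hypothetical helper and the GPU-first preference order is an assumption. The device list comes from openvino.Core().available_devices, with a CPU-only fallback so the sketch still runs where OpenVINO is not installed.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device family present in `available`.

    `available` is a list such as ["CPU", "GPU.0", "GPU.1", "NPU"].
    """
    for family in preferred:
        for name in available:
            if name == family or name.startswith(family + "."):
                return family
    raise RuntimeError(f"none of {preferred} found in {available}")


try:
    import openvino as ov
    devices = ov.Core().available_devices
except ImportError:
    devices = ["CPU"]  # fallback so the sketch still runs without OpenVINO

device = pick_device(devices)
print("selected device:", device)

# Assumption: a loaded Optimum-Intel model can be moved with its to() method
# before the first generate call, e.g. model.to(device).
```

The default device is the CPU, so this step matters only when the integrated GPU of the Core Ultra platform should be used.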

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_path = "ERNIE-4.5-0.3B-PT-OV"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1024
)
# keep only the newly generated tokens by dropping the prompt tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("generate_text:", generate_text)

Sample output:

generate_text: "Large Language Models (LLMs) are AI-powered tools that use natural language processing (NLP) techniques to generate human-like text, answer questions, and perform reasoning tasks. They leverage massive datasets, advanced algorithms, and computational power to process, analyze, and understand human language, enabling conversational AI that can understand, interpret, and respond to a wide range of inputs. Their applications range from customer support to academic research, from language translation to creative content generation."
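The article does not quote performance figures, so a rough throughput measurement on your own machine may be worthwhile. The following sketch times a single generation call. measure_throughput and fake_generate are hypothetical names, and the stand-in generator exists only so the example runs without a model.

```python
import time


def measure_throughput(generate_fn, prompt_len):
    """Time one generation call and return (new_tokens, seconds, tokens_per_second).

    `generate_fn` takes no arguments and returns the full token id sequence
    (prompt ids followed by newly generated ids) as a list.
    """
    start = time.perf_counter()
    ids = generate_fn()
    elapsed = time.perf_counter() - start
    new_tokens = len(ids) - prompt_len
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, elapsed, rate


# Stand-in for a real model call: 4 prompt ids followed by 16 new ids.
def fake_generate():
    time.sleep(0.01)
    return list(range(20))


n, secs, rate = measure_throughput(fake_generate, prompt_len=4)
print(f"{n} new tokens in {secs:.3f}s ({rate:.1f} tokens/s)")
```

With the deployment script, generate_fn could wrap the model.generate call and return generated_ids[0].tolist(), with prompt_len set to len(model_inputs.input_ids[0]). The first call usually includes model compilation and warm-up, so timing a second call gives a more representative number.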
