Serving a coder model with vLLM on SCNet for Auto-coder: a first try with QwQ-32B

张小明 2026/1/12 11:44:18
Lately g4f has not been working well, so I set up vLLM on SCNet to run a coder model and keep Auto-coder earning its keep. This first try uses the qwq-32b model. Conclusion up front: this 32B model is not up to the job; it does not come across as very smart.

## Starting the vLLM service

First create an SCNet AI server: log in to the SCNet site (https://www.scnet.cn/), choose a DCU asynchronous server, select a single card, and pick the qwq32b_vllm image, so the vLLM environment comes ready-made with nothing left to debug.

After the server starts, enter the container and first try what the image's bundled Jupyter notebook provides. In the notebook, start the vLLM service with:

```shell
python app.py  # serves on port 7860
```

The app.py being launched:

```python
import gradio as gr
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize the model
tokenizer = AutoTokenizer.from_pretrained("/root/public_data/model/admin/qwq-32b-gptq-int8")
llm = LLM(model="/root/public_data/model/admin/qwq-32b-gptq-int8",
          tensor_parallel_size=1,
          gpu_memory_utilization=0.9,
          max_model_len=32768)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8,
                                 repetition_penalty=1.05, max_tokens=512)

# Inference function
def generate_response(prompt):
    # Sample prompt: How many "r"s are in the word "strawberry"?
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    # Generate outputs
    outputs = llm.generate([text], sampling_params)
    # Extract the generated text
    response = outputs[0].outputs[0].text
    return response

# Build the Gradio interface
def create_interface():
    with gr.Blocks() as demo:
        gr.Markdown("# Qwen/QwQ-32B Q&A")
        with gr.Row():
            input_text = gr.Textbox(label="Your question", placeholder="Type a question...", lines=3)
            output_text = gr.Textbox(label="Model answer", lines=5, interactive=False)
        submit_button = gr.Button("Submit")
        submit_button.click(fn=generate_response, inputs=input_text, outputs=output_text)
    return demo

# Launch the Gradio app
if __name__ == "__main__":
    demo = create_interface()
    demo.launch(server_name="0.0.0.0", share=True, debug=True)
```

As you can see, the model is loaded straight from the public data directory, so there is nothing to download. The model was read in about 5 minutes and the service was up in about 8:

```
INFO 12-11 08:20:28 model_runner.py:1041] Starting to load model /root/public_data/model/admin/qwq-32b-gptq-int8...
INFO 12-11 08:20:28 selector.py:121] Using ROCmFlashAttention backend.
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:34<03:58, 34.04s/it]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [01:21<04:12, 42.13s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [02:10<03:46, 45.34s/it]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [02:57<03:02, 45.61s/it]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [03:41<02:15, 45.10s/it]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [04:31<01:33, 46.72s/it]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [05:00<00:41, 41.16s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [05:04<00:00, 29.21s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [05:04<00:00, 38.05s/it]
INFO 12-11 08:25:34 model_runner.py:1052] Loading model weights took 32.8657 GB
INFO 12-11 08:26:58 gpu_executor.py:122] # GPU blocks: 4291, # CPU blocks: 1024
INFO 12-11 08:27:16 model_runner.py:1356] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set enforce_eager=True or use --enforce-eager in the CLI.
INFO 12-11 08:27:16 model_runner.py:1360] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 12-11 08:28:18 model_runner.py:1483] Graph capturing finished in 62 secs.
* Running on local URL:  http://0.0.0.0:7860
* Running on public URL: https://ad18c32dd20881d8aa.gradio.live

This share link expires in 72 hours.
```
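Since the weights live in a shared read-only directory, a bad path or missing shard only surfaces minutes into a launch. A small pre-flight sketch that catches the obvious cases first; `check_model_dir` is my own illustrative helper, not part of vLLM, and the path is simply the one used above:

```python
from pathlib import Path

def check_model_dir(model_dir: str) -> list:
    """Return a list of obvious problems with a local HF-style model
    directory before handing it to vLLM (empty list = looks fine)."""
    d = Path(model_dir)
    if not d.is_dir():
        return ["not a directory"]
    problems = []
    if not (d / "config.json").is_file():
        problems.append("missing config.json")
    if not list(d.glob("*.safetensors")):
        problems.append("no .safetensors shards found")
    return problems

# e.g. check_model_dir("/root/public_data/model/admin/qwq-32b-gptq-int8")
```

A few seconds of checking beats a five-minute load that dies on a typo in the path.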
(The log's final note, that running gradio deploy from the working directory gives free permanent hosting on Hugging Face Spaces at https://huggingface.co/spaces, can be ignored here.)

The benefit of using Gradio is that the service is reachable from the public internet right away through the share link in the log, https://ad18c32dd20881d8aa.gradio.live. I opened that page from an outside browser and asked: "Help me think this through: I want to use a single 64 GB DCU to run a large-model API service, mainly for AI automated programming. Which model should I launch with vLLM?" The answer was not good, so I will not paste it: all deliberation, no conclusion. Perhaps the token budget (max_tokens=512 above) is too short.

## Starting the service from the command line with vLLM

Not ready to give up, I started the service from the command line so it could be called through the API, using vllm serve directly:

```shell
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95 --max_model_len 105152
```

After startup, map port 8000 out of the container. Here it maps to:

https://c-1998910428559491073.ksai.scnet.cn:58043/v1/models

which returns:

```json
{"object": "list", "data": [{"id": "/root/public_data/model/admin/qwq-32b-gptq-int8", "object": "model", "created": 1765416800, "owned_by": "vllm", "root": "/root/public_data/model/admin/qwq-32b-gptq-int8", "parent": null, "max_model_len": 105152, "permission": [{"id": "modelperm-17616f8047064f4dac923291dd0ce429", "object": "model_permission", "created": 1765416800, "allow_create_engine": false, "allow_sampling": true, "allow_logprobs": true, "allow_search_indices": false, "allow_view": true, "allow_fine_tuning": false, "organization": "*", "group": null, "is_blocking": false}]}]}
```

So the model name is /root/public_data/model/admin/qwq-32b-gptq-int8, the base_url is https://c-1998910428559491073.ksai.scnet.cn:58043/v1/, and the API key can be anything, e.g. hello. At this point it can be tested with CherryStudio; the test passed, confirming that API calls work.

## Calling it from Auto-coder

Start Auto-coder:

```shell
auto-coder.chat
```

Configure the model:

```
/models /add_model name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
/conf model:qwq-32b-gptq-int8
```

Note that sometimes the /add_provider form is needed instead:

```
/models /add_provider name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
```

Adding it succeeds:

```
coding@auto-coder.chat:~$ /models /add_model name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
Successfully added custom model: qwq-32b-gptq-int8
coding@auto-coder.chat:~$ /conf model:qwq-32b-gptq-int8
Configuration updated: model qwq-32b-gptq-int8
```

But no luck: the model is still too dim for this. I asked it (translated from Chinese): "Build me a browser translation extension for Chrome and Edge. It must support selected-word translation and full-page translation. Translation should be done by calling an AI model through the OpenAI API, with several common models configurable plus custom OpenAI-compatible models."

The Agentic Edit session started normally:

```
Starting Agentic Edit: autocoderwork
Conversation ID: 4cbaf28c-bdce-410e-9f08-d6619efef059
conversation tokens: 19124 (conversation round: 1)
```

(There was also a harmless local warning: wsl: Failed to start the systemd user session for skywalk. See journalctl for more details.)

But instead of working on the extension, the model's first output was an unrelated role-play fragment:

```
Student: I need help I want to know about the following Please write a story about a girl named Alice who went to the market to buy apples and oranges. She went to the market with her mother to buy apples and oranges. When she arrived at the market, she saw that the apples were expensive and the oranges were cheap. She bought some apples and oranges. She went home and her mother cooked them. She was happy.
```
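For reference, the request that CherryStudio or Auto-coder sends to this server can also be assembled by hand with just the standard library, which is handy for isolating whether a failure is the client or the model. A minimal sketch; the base URL, model path, and throwaway hello key are the values from this setup, and `build_chat_request` is only an illustrative helper:

```python
import json
import urllib.request

BASE_URL = "https://c-1998910428559491073.ksai.scnet.cn:58043/v1"
MODEL = "/root/public_data/model/admin/qwq-32b-gptq-int8"

def build_chat_request(prompt: str, api_key: str = "hello") -> urllib.request.Request:
    """Assemble an OpenAI-style /chat/completions request for the vLLM server."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # vLLM accepts any bearer token unless started with --api-key
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

# To actually call the server (network required):
# with urllib.request.urlopen(build_chat_request("hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this bare request works but a client does not, the problem is the client configuration, not the endpoint.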
Back in Auto-coder, the model then just emitted /think over and over:

```
/think /think /think /think /think /think /think /think /think /think /think
```

I switched to another machine and tried again: still no good. It turned into a broken record, looping this fragment until I killed it with ^C, at which point the Agentic Edit finished:

```python
def main():
    """This function is used to get the main function of this module"""
    return self

def __init__(self):
    pass
```

So the qwq-32b-gptq-int8 model does not meet Auto-Coder's requirements. Put differently, its intelligence falls short, and it also does not support function calling, which disqualifies it on its own.

## Next target

Next time I want to run Qwen/Qwen3-Coder-30B-A3B-Instruct. First find it in SCNet's model plaza, then clone it to the console, which places it at:

/public/home/ac7sc1ejvp/SothisAI/model/Aihub/Qwen3-Coder-30B-A3B-Instruct/main/Qwen3-Coder-30B-A3B-Instruct

Launch it with vLLM:

```shell
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/Qwen3-Coder-30B-A3B-Instruct/main/Qwen3-Coder-30B-A3B-Instruct
```

As for how well it works, stay tuned for the next installment.

## Debugging: vllm serve fails to start

```shell
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8
```

```
Loading safetensors checkpoint shards: 100% Completed | 8/8 [04:47<00:00, 35.90s/it]
INFO 12-11 08:53:55 model_runner.py:1052] Loading model weights took 32.8657 GB
INFO 12-11 08:54:03 gpu_executor.py:122] # GPU blocks: 5753, # CPU blocks: 1024
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args
    return cls(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 339, in __init__
    self._initialize_kv_caches()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 487, in _initialize_kv_caches
    self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 125, in initialize_cache
    self.driver_worker.initialize_cache(num_gpu_blocks, num_cpu_blocks)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 258, in initialize_cache
    raise_if_cache_size_invalid(num_gpu_blocks,
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 493, in raise_if_cache_size_invalid
    raise ValueError(
ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (92048). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```

After the ProcessGroupNCCL teardown messages, the parent uvloop process then exits through build_async_engine_client with:

```
RuntimeError: Engine process failed to start
```

The lines that matter are the ValueError from raise_if_cache_size_invalid: the model's max seq len (131072) is larger than what the KV cache can hold (92048), and the suggested fix is to raise gpu_memory_utilization or lower max_model_len. So first raise gpu_memory_utilization to 0.95:

```shell
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95
```

This gets a bit further:

```
ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (105152). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```

I could try 0.98 next; failing that, the fallback is lowering max_model_len to what the cache reports it can hold, 105152 (or 101866):

```shell
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95 --max_model_len 105152
```

That worked.
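The numbers in those errors line up with vLLM's paged KV cache: capacity in tokens equals the reported "# GPU blocks" count times the block size, which defaults to 16 tokens per block (the --block-size option). A quick arithmetic check against the figures in the logs above; `kv_cache_tokens` is just my name for the multiplication:

```python
BLOCK_SIZE = 16  # vLLM's default --block-size, in tokens per KV-cache block

def kv_cache_tokens(num_gpu_blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Maximum number of tokens the paged KV cache can hold."""
    return num_gpu_blocks * block_size

# "# GPU blocks: 5753" at the default gpu_memory_utilization of 0.9
# gives exactly the 92048-token limit from the first error:
print(kv_cache_tokens(5753))   # 92048
# and the 105152-token capacity at 0.95 corresponds to 6572 blocks:
print(105152 // BLOCK_SIZE)    # 6572
```

So rather than guessing, one can read the "# GPU blocks" line from the startup log and set --max_model_len to at most blocks times 16, which is exactly what --max_model_len 105152 amounts to here.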