Draw Things on Mac: Feature and Performance Log for AI Image and Video Models

    • #131796

      追光
      Participant

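      With the explosion of AI imaging technology, from FLUX to Z Image and from Try-on to Depth Anything V2, new models keep appearing. To help creators configure Draw Things, ComfyUI, and similar tools more efficiently for production work, I am systematically recording the measured performance of the cutting-edge models I have used or tested.

      This log is not just a "manual"; it is also a hands-on "benchmark sheet". It will cover:

      1. Feature analysis: how far each model can be pushed in vertical applications such as virtual try-on (Try-on), depth estimation (Depth Anything), and realism enhancement (Real LoRA).

      2. Performance tests: across different hardware environments, the balance point between generation speed and quality for main models (such as Z Image Turbo) at different step counts (Steps) and guidance values (CFG).

      3. Best parameter sets: a summary of how the various acceleration LoRAs work together with multimodal workflows, with ready-to-use prompt templates and suggested parameter combinations.

      I hope this accumulated data gives film post-production artists, digital painters, and AIGC enthusiasts in the community an objective and accurate reference, and helps move AI image creation toward an engineering practice.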

      Draw Things: a detailed look at the native, local AI image and video model app for M-series chips

    • #131800

      追光
      Participant

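      SeedVR model

      SeedVR2 is a top-tier video and image restoration model from the Seed team and an industry benchmark for image restoration in 2026. It is a general-purpose restoration model based on the Diffusion Transformer (DiT) architecture, focused on the degradations that video and images suffer in the real world (blur, noise, low resolution, scratches on old film, and so on). Compared with the first-generation SeedVR, it uses adversarial distillation to cut inference time substantially.

      Left: 3B; right: 7B

      Key advances of the 7B version (compared with 3B)


      Core features and highlights
      One-step inference: traditional diffusion restoration models usually need 20-50 steps to produce a result, while SeedVR2 completes a high-definition restoration in just 1 step. In a compute-heavy scenario like video processing, this makes it tens of times faster.

      Major quality improvement: it can quickly lift low-resolution legacy footage (480p or lower) to 4K/8K-class output while accurately filling in fine detail such as skin pores and fabric texture, rather than simply smoothing or blurring.

      Arbitrary resolution support: it is resolution-agnostic and can restore images with non-standard aspect ratios.

      Temporal consistency: for video restoration, it keeps detail, color, and lighting continuous from frame to frame, eliminating flicker.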


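      3. Performance records

      Test image: generated with Z Image Turbo at Steps: 2 so that it is deliberately unclear.

      Environment: Mac with an M1 Pro chip, Draw Things.

      SeedVR 3B 8bits: with 3B parameters its VRAM requirement is very low; it runs smoothly with 8 GB of VRAM, which makes it a lifesaver for laptop users, and it is strikingly efficient. The restored image is sharp and clear and the effect is impressive, but it commonly shifts eye color toward blue, and it preserves detail and consistency noticeably less well than 7B. Its advantage is speed.

      3B: Step 1: 15 s; Step 2: 37.23 s

      SeedVR 7B 8bits: with 7B parameters it is slower, but it restores detail far better than 3B while preserving the detail of the original image very well. The processing is less aggressive, yet the result is rich in detail and highly consistent with the source image, which suits users who need high consistency.

      7B: Step 1: 30 s; Step 2: 54.48 s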
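To put the 3B and 7B timings above side by side, here is a minimal sketch that computes how much longer 7B takes at each step setting. The numbers are the ones reported above; the dictionary layout and the `slowdown` helper are illustrative names, not part of Draw Things.

```python
# Timings reported above (seconds, Mac M1 Pro, Draw Things, 8-bit weights).
timings = {
    "SeedVR 3B": {1: 15.0, 2: 37.23},
    "SeedVR 7B": {1: 30.0, 2: 54.48},
}


def slowdown(steps: int) -> float:
    """Return how many times longer 7B takes than 3B at the given step setting."""
    return timings["SeedVR 7B"][steps] / timings["SeedVR 3B"][steps]


for steps in (1, 2):
    print(f"Step {steps}: 7B takes {slowdown(steps):.2f}x as long as 3B")
```

On these figures the gap narrows from 2.00x at Step 1 to about 1.46x at Step 2.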


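      Depth of semantic understanding: when restoring faces, the 7B version can pick out very subtle expression lines and iris detail; when restoring natural landscapes, it can distinguish the leaf textures of different plants instead of smearing everything into a uniform green.

      Lighting reconstruction: the 7B version has stronger global illumination prediction, so besides improving image quality it can also correct insufficient dynamic range caused by the limits of the capture equipment (an HDR-like effect).

      Extreme restoration (Extreme Inpainting): for footage with large missing or damaged areas, the 7B version's completion logic better matches human visual common sense and can generate highly convincing realistic detail.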

    • #131813

      追光
      Participant

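      Qwen Image 2512

      Qwen Image 2512 is a new-generation image generation and multimodal vision model from Alibaba's Tongyi Qianwen (Qwen) team. This version focuses on high-precision visual creation and deep semantic understanding and supports text-to-image. Built on large-scale image-text alignment data and a hybrid diffusion-Transformer architecture, the model makes notable advances in detail reproduction, lighting, compositional logic, and multi-subject interaction; it follows complex prompts accurately and produces high-quality images that respect physical plausibility and aesthetic standards.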

      A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.
      Steps: 2, Sampler: DPM++ 2M Trailing, Guidance Scale: 1.0, Seed: 3882193997, Size: 768×768, Model: qwen_image_2512_i8x.ckpt, Strength: 1.0, Seed Mode: Scale Alike, LoRA Model: wuli_2512_2steps_lora_q8p.ckpt, LoRA Weight: 1.0
      {"c":"A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.","lora":[{"model":"wuli_2512_2steps_lora_q8p.ckpt","weight":1}],"mask_blur":1.5,"model":"qwen_image_2512_i8x.ckpt","profile":{"duration":35.161870083335089,"timings":[{"durations":[4.4902754166687373],"name":"text_encoded"},{"durations":[0.002849958331353264],"name":"controls_generated"},{"durations":[4.2678395416696731,14.533455374999903,11.108817874999659],"name":"sampling"},{"durations":[0.75542641666470445],"name":"image_decoded"}]},"sampler":"DPM++ 2M Trailing","scale":1,"seed":3882193997,"seed_mode":"Scale Alike","size":"768×768","steps":2,"strength":1,"uc":"","v2":{"aestheticScore":6,"batchCount":1,"batchSize":1,"causalInference":0,"causalInferencePad":0,"cfgZeroInitSteps":0,"cfgZeroStar":false,"clipSkip":1,"clipWeight":1,"compressionArtifacts":"disabled","compressionArtifactsQuality":43.100000000000001,"controls":[],"cropLeft":0,"cropTop":0,"decodingTileHeight":640,"decodingTileOverlap":128,"decodingTileWidth":640,"diffusionTileHeight":1024,"diffusionTileOverlap":128,"diffusionTileWidth":1024,"fps":5,"guidanceEmbed":3.5,"guidanceScale":1,"guidingFrameNoise":0.02,"height":768,"hiresFix":false,"hiresFixHeight":640,"hiresFixStrength":0.69999999999999996,"hiresFixWidth":640,"id":0,"imageGuidanceScale":1.5,"imagePriorSteps":5,"loras":[{"file":"wuli_2512_2steps_lora_q8p.ckpt","mode":"all","weight":1}],"maskBlur":1.5,"maskBlurOutset":0,"model":"qwen_image_2512_i8x.ckpt","motionScale":127,"negativeAestheticScore":2.5,"negativeOriginalImageHeight":512,"negativeOriginalImageWidth":512,"negativePromptForImagePrior":true,"numFrames":14,"originalImageHeight":768,"originalImageWidth":576,"preserveOriginalAfterInpaint":true,"refinerStart":0.84999999999999998,"resolutionDependentShift":false,"sampler":15,"seed":3882193997,"seedMode":2,"separateClipL":false,"separateOpenClipG":false,"separateT5":false,"sharpness":0,"shift":1,"speedUpWithGuidanceEmbed":true,"stage2Guidance":1,"stage2Shift":1,"stage2Steps":10,"startFrameGuidance":1,"steps":2,"stochasticSamplingGamma":0.29999999999999999,"strength":1,"t5TextEncoder":true,"targetImageHeight":768,"targetImageWidth":576,"teaCache":false,"teaCacheEnd":-1,"teaCacheMaxSkipSteps":3,"teaCacheStart":5,"teaCacheThreshold":0.20000000000000001,"tiledDecoding":false,"tiledDiffusion":false,"upscalerScaleFactor":0,"width":768,"zeroNegativePrompt":false}}
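The "profile" object in the log above breaks the run into stages. The sketch below sums each stage to show where the time goes. It assumes the durations are in seconds, which matches the roughly 31 to 35 second runs reported in this post but is not documented here; the variable names are illustrative.

```python
import json

# The "profile" object copied from the generation log above.
profile_json = """
{"duration": 35.161870083335089,
 "timings": [
   {"durations": [4.4902754166687373], "name": "text_encoded"},
   {"durations": [0.002849958331353264], "name": "controls_generated"},
   {"durations": [4.2678395416696731, 14.533455374999903, 11.108817874999659],
    "name": "sampling"},
   {"durations": [0.75542641666470445], "name": "image_decoded"}
 ]}
"""

profile = json.loads(profile_json)

# Total time per stage (assumed to be seconds).
stage_totals = {t["name"]: sum(t["durations"]) for t in profile["timings"]}

for name, seconds in stage_totals.items():
    share = seconds / profile["duration"]
    print(f"{name:20s} {seconds:7.2f} s  ({share:.0%})")

# The stages should add up to roughly the reported total duration.
accounted = sum(stage_totals.values())
print(f"accounted for {accounted:.2f} of {profile['duration']:.2f} s")
```

Sampling dominates the run, so the step count is the setting that matters most for speed in this log.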

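      It is widely applicable to content production such as e-commerce design, advertising creative, game assets, and educational courseware. Compared with its predecessor, version 2512 is upgraded across inference speed, VRAM optimization, and multilingual support, and it is closely integrated with the Qwen large language models for a seamless "instruction understanding, image generation, iterative refinement" workflow.

      Personal test results: with the Wuli 2-step acceleration LoRA, two steps are enough to generate a high-quality image. Generation is about twice as fast as Z Image Turbo and about 1.5 times as fast as Flux2-klein-9B; on an M1 Pro, a high-quality 768*768 image takes only about 31 seconds.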
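The speed comparison can be checked against the per-image times reported in this thread (about 31 s here, about 61 s for Z Image Turbo, and 49.74 s for Flux2-klein-9B in the later posts). This is a small sketch of that arithmetic; the dictionary keys are just labels.

```python
# Wall-clock times reported in this thread for one 768x768 image on an M1 Pro.
seconds_per_image = {
    "Qwen Image 2512 + Wuli 2-step LoRA": 31.0,  # "about 31 seconds"
    "Z Image Turbo (default config)": 61.0,      # "about 61 seconds"
    "Flux2-klein-9B": 49.74,
}

baseline = seconds_per_image["Qwen Image 2512 + Wuli 2-step LoRA"]

# How many times longer each model takes than the Qwen + LoRA setup.
speedup = {name: secs / baseline for name, secs in seconds_per_image.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x the Qwen + LoRA time")
```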

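      Quality comparison: besides being fast, its image quality is very close to Google's banana and slightly behind Z Image, but it performs better on people from Asian regions.

      My personal parameter configuration, refined through repeated tuning; it can be copied and pasted directly: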

      
      {"seedMode":2,"controls":[],"resolutionDependentShift":false,"preserveOriginalAfterInpaint":true,"sharpness":0,"height":768,"loras":[{"mode":"all","file":"wuli_2512_2steps_lora_q8p.ckpt","weight":1}],"upscaler":"","steps":2,"cfgZeroInitSteps":0,"faceRestoration":"","model":"qwen_image_2512_i8x.ckpt","hiresFix":false,"causalInferencePad":0,"cfgZeroStar":false,"tiledDecoding":false,"shift":1,"width":768,"strength":1,"maskBlur":1.5,"seed":4107639530,"batchCount":1,"batchSize":1,"maskBlurOutset":0,"refinerModel":"","tiledDiffusion":false,"sampler":15,"guidanceScale":1}
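If you want to vary the preset above without hand-editing the JSON, a minimal sketch like the following may help. It only parses the text and re-serializes it with a few keys overridden. The `variant` helper and the example seed value are hypothetical; how Draw Things interprets each key is not covered here.

```python
import json

# The preset shared above, reflowed across lines (JSON ignores the whitespace).
preset_text = """
{"seedMode":2,"controls":[],"resolutionDependentShift":false,
 "preserveOriginalAfterInpaint":true,"sharpness":0,"height":768,
 "loras":[{"mode":"all","file":"wuli_2512_2steps_lora_q8p.ckpt","weight":1}],
 "upscaler":"","steps":2,"cfgZeroInitSteps":0,"faceRestoration":"",
 "model":"qwen_image_2512_i8x.ckpt","hiresFix":false,"causalInferencePad":0,
 "cfgZeroStar":false,"tiledDecoding":false,"shift":1,"width":768,"strength":1,
 "maskBlur":1.5,"seed":4107639530,"batchCount":1,"batchSize":1,
 "maskBlurOutset":0,"refinerModel":"","tiledDiffusion":false,"sampler":15,
 "guidanceScale":1}
"""

preset = json.loads(preset_text)


def variant(base: dict, **overrides) -> str:
    """Return a compact JSON string with the given existing keys overridden."""
    unknown = set(overrides) - set(base)
    if unknown:
        # Refuse misspelled keys instead of silently adding them.
        raise KeyError(f"not in the preset: {sorted(unknown)}")
    return json.dumps({**base, **overrides}, separators=(",", ":"))


# Example: same preset with a different (arbitrary) seed.
new_text = variant(preset, seed=12345)
print(new_text)
```

Rejecting unknown keys is a deliberate choice: a typo such as `sede` would otherwise produce a preset that looks valid but leaves the seed unchanged.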
      
    • #131815

      追光
      Participant

      Z Image Turbo

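      A new-generation AI image generation model aimed at efficient visual creation. As the name suggests, "Turbo" stands for its push on inference speed and responsiveness. The model uses a lightweight diffusion architecture and dynamic step-size distillation to cut generation time sharply while keeping high-fidelity image quality, delivering a real-time "what you write is what you see" creative experience.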

      A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.
      Steps: 4, Sampler: UniPC Trailing, Guidance Scale: 1.0, Seed: 3629999393, Size: 768×768, Model: z_image_turbo_1.0_i8x.ckpt, Strength: 1.0, Seed Mode: Scale Alike, Shift: 3.0
      {"c":"A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.","mask_blur":1.5,"model":"z_image_turbo_1.0_i8x.ckpt","profile":{"duration":35.73231183333337,"timings":[{"durations":[2.6575174999998126],"name":"text_encoded"},{"durations":[0.0025598749998607673],"name":"controls_generated"},{"durations":[0.58710766666627023,7.8125014166653273,7.6204510416682751,7.8706594583345577,7.7509831249990384],"name":"sampling"},{"durations":[1.426829166666721],"name":"image_decoded"}]},"sampler":"UniPC Trailing","scale":1,"seed":3629999393,"seed_mode":"Scale Alike","shift":3,"size":"768×768","steps":4,"strength":1,"uc":"","v2":{"aestheticScore":6,"batchCount":1,"batchSize":1,"causalInference":0,"causalInferencePad":0,"cfgZeroInitSteps":0,"cfgZeroStar":false,"clipSkip":1,"clipWeight":1,"compressionArtifacts":"disabled","compressionArtifactsQuality":43.100000000000001,"controls":[],"cropLeft":0,"cropTop":0,"decodingTileHeight":640,"decodingTileOverlap":128,"decodingTileWidth":640,"diffusionTileHeight":1024,"diffusionTileOverlap":128,"diffusionTileWidth":1024,"fps":5,"guidanceEmbed":3.5,"guidanceScale":1,"guidingFrameNoise":0.02,"height":768,"hiresFix":false,"hiresFixHeight":448,"hiresFixStrength":0.69999999999999996,"hiresFixWidth":448,"id":0,"imageGuidanceScale":1.5,"imagePriorSteps":5,"loras":[],"maskBlur":1.5,"maskBlurOutset":0,"model":"z_image_turbo_1.0_i8x.ckpt","motionScale":127,"negativeAestheticScore":2.5,"negativeOriginalImageHeight":0,"negativeOriginalImageWidth":0,"negativePromptForImagePrior":true,"numFrames":14,"originalImageHeight":0,"originalImageWidth":0,"preserveOriginalAfterInpaint":true,"refinerStart":0.84999999999999998,"resolutionDependentShift":false,"sampler":17,"seed":3629999393,"seedMode":2,"separateClipL":false,"separateOpenClipG":false,"separateT5":false,"sharpness":0,"shift":3,"speedUpWithGuidanceEmbed":true,"stage2Guidance":1,"stage2Shift":1,"stage2Steps":10,"startFrameGuidance":1,"steps":4,"stochasticSamplingGamma":0.29999999999999999,"strength":1,"t5TextEncoder":true,"targetImageHeight":0,"targetImageWidth":0,"teaCache":false,"teaCacheEnd":-1,"teaCacheMaxSkipSteps":3,"teaCacheStart":5,"teaCacheThreshold":0.20000000000000001,"tiledDecoding":false,"tiledDiffusion":false,"upscalerScaleFactor":0,"width":768,"zeroNegativePrompt":false}}

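      In terms of core capability, Z Image Turbo strengthens complex semantic parsing and spatial reasoning, and supports precise multi-subject control, physically based lighting simulation, and lossless super-resolution output. The built-in intelligent editing engine handles inpainting, style transfer, and adaptive composition adjustment, and offers layered masks and prompt weighting for the fine-grained needs of professional designers. Trained with large-scale multimodal data alignment, the model is markedly better at reproducing long-tail scenes, abstract concepts, and cross-cultural visual elements.

      Personal test: using the default Z Image Turbo configuration in Draw Things on an M1 Pro, a high-quality 768*768 image takes about 61 seconds. Output quality is very stable and reliable, and it is one of the main models in my creative work.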

    • #131823

      追光
      Participant

      Flux2-klein-9B

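      FLUX.2 [klein] 9B is a flagship lightweight text-to-image model from Black Forest Labs, built for real-time image generation and editing. It is based on a 9-billion-parameter rectified flow Transformer, integrates an 8B Qwen3 text encoder for accurate semantic understanding, and uses step distillation to compress inference to just 4 steps, reaching sub-second end-to-end generation.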

      A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.
      Steps: 4, Sampler: DDIM Trailing, Guidance Scale: 1.0, Seed: 2189345100, Size: 768×768, Model: flux_2_klein_9b_i8x.ckpt, Strength: 1.0, Seed Mode: Scale Alike, Shift: 3.0, CLIP Skip: 2
      {"c":"A cinematic portrait of a [20-year-old Asian woman with short, black robotic-styled hair, looking directly with an intense gaze], wearing a [high-tech, armored techwear jacket with integrated blue LED lights and strap details, over a black mesh turtleneck]. She is standing in a [rain-slicked, neon-lit futuristic Tokyo street at night, reflection of holographic ads in a puddle], shot on 35mm lens, realistic, volumetric lighting.","clip_skip":2,"mask_blur":2.5,"model":"flux_2_klein_9b_i8x.ckpt","profile":{"duration":49.741630291664478,"timings":[{"durations":[4.147645166664006],"name":"text_encoded"},{"durations":[0.004927791665977566],"name":"controls_generated"},{"durations":[0.31301587500274763,11.253990499997599,10.579278375000285,11.137198791668197,10.936794375000318],"name":"sampling"},{"durations":[1.3637274166649149],"name":"image_decoded"}]},"sampler":"DDIM Trailing","scale":1,"seed":2189345100,"seed_mode":"Scale Alike","shift":3,"size":"768×768","steps":4,"strength":1,"uc":"","v2":{"aestheticScore":6,"batchCount":1,"batchSize":1,"causalInference":0,"causalInferencePad":0,"cfgZeroInitSteps":0,"cfgZeroStar":false,"clipSkip":2,"clipWeight":1,"compressionArtifacts":"disabled","compressionArtifactsQuality":43.100000000000001,"controls":[],"cropLeft":0,"cropTop":0,"decodingTileHeight":640,"decodingTileOverlap":128,"decodingTileWidth":640,"diffusionTileHeight":1024,"diffusionTileOverlap":128,"diffusionTileWidth":1024,"fps":5,"guidanceEmbed":3.5,"guidanceScale":1,"guidingFrameNoise":0.02,"height":768,"hiresFix":false,"hiresFixHeight":640,"hiresFixStrength":0.69999999999999996,"hiresFixWidth":640,"id":0,"imageGuidanceScale":1.5,"imagePriorSteps":5,"loras":[],"maskBlur":2.5,"maskBlurOutset":0,"model":"flux_2_klein_9b_i8x.ckpt","motionScale":127,"negativeAestheticScore":2.5,"negativeOriginalImageHeight":512,"negativeOriginalImageWidth":512,"negativePromptForImagePrior":true,"numFrames":14,"originalImageHeight":768,"originalImageWidth":768,"preserveOriginalAfterInpaint":true,"refinerStart":0.84999999999999998,"resolutionDependentShift":false,"sampler":16,"seed":2189345100,"seedMode":2,"separateClipL":false,"separateOpenClipG":false,"separateT5":false,"sharpness":0,"shift":3,"speedUpWithGuidanceEmbed":true,"stage2Guidance":1,"stage2Shift":1,"stage2Steps":10,"startFrameGuidance":1,"steps":4,"stochasticSamplingGamma":0.29999999999999999,"strength":1,"t5TextEncoder":true,"targetImageHeight":768,"targetImageWidth":768,"teaCache":false,"teaCacheEnd":-1,"teaCacheMaxSkipSteps":3,"teaCacheStart":5,"teaCacheThreshold":0.29999999999999999,"tiledDecoding":false,"tiledDiffusion":false,"upscalerScaleFactor":0,"width":768,"zeroNegativePrompt":false}}

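      In terms of core capability, FLUX.2 [klein] 9B unifies three tasks: text-to-image, single-image editing, and multi-reference fusion. It supports complex prompt parsing, control of spatial relationships among multiple subjects, and high-fidelity detail. Compared with its predecessor and with models of similar size, it does better on lighting, text rendering accuracy, and prompt adherence, while remaining friendly to consumer hardware (about 29 GB of VRAM; runs on an RTX 4090).
      Use cases include rapid prototyping of e-commerce assets, game asset iteration, social media content, and interactive visual tool development. The model is available both as non-commercial open weights and through a commercial API, is integrated with mainstream frameworks such as Diffusers and ComfyUI, and includes content safety filtering and C2PA digital watermarking to support compliant commercial use.

      As a representative of the "speed vs. quality" Pareto frontier, FLUX.2 [klein] 9B redefines the efficiency standard for real-time visual generation with a compact architecture.

      Personal test in Draw Things on a Mac M1 Pro: generating a 768*768 image took 49.74 seconds.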
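The three generation logs in this thread each carry a "sampling" list with one more entry than the step count. As a rough per-step comparison, the sketch below averages the last `steps` entries for each model. Treating the first entry as setup time before the first step is an assumption based only on the list lengths, and the durations are assumed to be seconds.

```python
from statistics import mean

# (steps, "sampling" durations) copied from the three logs in this thread.
sampling_logs = {
    "Qwen Image 2512 + Wuli LoRA": (
        2, [4.2678395416696731, 14.533455374999903, 11.108817874999659]),
    "Z Image Turbo": (
        4, [0.58710766666627023, 7.8125014166653273, 7.6204510416682751,
            7.8706594583345577, 7.7509831249990384]),
    "Flux2-klein-9B": (
        4, [0.31301587500274763, 11.253990499997599, 10.579278375000285,
            11.137198791668197, 10.936794375000318]),
}


def per_step_seconds(steps: int, durations: list) -> float:
    """Average the last `steps` entries, skipping the assumed setup entry."""
    if len(durations) != steps + 1:
        raise ValueError("expected one setup entry plus one entry per step")
    return mean(durations[-steps:])


per_step = {name: per_step_seconds(*log) for name, log in sampling_logs.items()}

for name, secs in per_step.items():
    print(f"{name}: {secs:.2f} s per step")
```

Read this way, the 2-step Qwen setup is the slowest per step but needs the fewest steps, which is consistent with its short total sampling time in the log.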

      Configuration parameters

      {"upscaler":"","faceRestoration":"","shift":3,"batchCount":1,"seed":2189345100,"maskBlur":2.5,"sharpness":0,"preserveOriginalAfterInpaint":true,"sampler":16,"strength":1,"height":768,"loras":[],"refinerModel":"","steps":4,"resolutionDependentShift":false,"width":768,"tiledDecoding":false,"maskBlurOutset":0,"tiledDiffusion":false,"hiresFix":false,"seedMode":2,"controls":[],"batchSize":1,"cfgZeroInitSteps":0,"cfgZeroStar":false,"guidanceScale":1,"model":"flux_2_klein_9b_i8x.ckpt","causalInferencePad":0}
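To see at a glance how this preset differs from the Qwen Image 2512 preset shared earlier in the thread, here is a small sketch that diffs a subset of keys. Only some keys are included for brevity. The sampler names are the ones that appear next to these ids in the generation logs in this thread; the mapping is an observation from those logs, not a documented Draw Things table.

```python
# Sampler ids paired with the names shown in this thread's generation logs.
SAMPLER_NAMES = {15: "DPM++ 2M Trailing", 16: "DDIM Trailing", 17: "UniPC Trailing"}

# A subset of keys from the two presets shared in this thread.
qwen = {
    "model": "qwen_image_2512_i8x.ckpt", "steps": 2, "sampler": 15, "shift": 1,
    "guidanceScale": 1, "maskBlur": 1.5, "width": 768, "height": 768, "seedMode": 2,
    "loras": [{"mode": "all", "file": "wuli_2512_2steps_lora_q8p.ckpt", "weight": 1}],
}
flux = {
    "model": "flux_2_klein_9b_i8x.ckpt", "steps": 4, "sampler": 16, "shift": 3,
    "guidanceScale": 1, "maskBlur": 2.5, "width": 768, "height": 768, "seedMode": 2,
    "loras": [],
}


def diff(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for shared keys whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


changes = diff(qwen, flux)
for key, (left, right) in changes.items():
    if key == "sampler":
        # Show the observed sampler names instead of bare ids.
        left, right = SAMPLER_NAMES[left], SAMPLER_NAMES[right]
    print(f"{key}: {left!r} -> {right!r}")
```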
    • #131828

      追光
      Participant

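      Qwen Image Edit 2511

      Qwen Image Edit 2511 is a professional-grade intelligent image editing model focused on three core capabilities: precise understanding, controllable modification, and efficient iteration. Built on a multimodally aligned hybrid diffusion-Transformer architecture, it supports inpainting, object replacement, style transfer, background removal, and relighting driven by natural-language instructions, and it can identify the spatial relationships and semantic hierarchy in the user's intent to achieve pixel-level editing precision.

      [AI Image Processing] What can Qwen Image Edit do? Six core capabilities and practical applications

      [AI Creation] Qwen-Image-Edit multi-angle control LoRA official prompt library (96 sets)

      On the technical side, version 2511 introduces dynamic mask generation and attention guidance, so complex objects can be located and modified losslessly without manually drawing a selection. It also supports multi-reference fusion and rollback through the edit history, keeping the creative process flexible and controllable. Fine-tuned on a very large set of image-text pairs and editing instructions, it improves markedly on long-tail cases such as face consistency, text rendering, and material reproduction, and it includes a physical-constraint module to avoid results that defy common sense.

      Creating high-fidelity full-body photos from a "face": Qwen Image Edit + F2P LoRA workflow

      On the application side, Qwen Image Edit 2511 offers a REST API, an SDK, and plugins for mainstream design software, supports private enterprise deployment and content safety compliance checks, and is widely used for e-commerce retouching, advertising creative, film post-production, and self-media content production. Compared with general text-to-image models, this version is positioned around editing: it greatly lowers the barrier to professional retouching, letting creators make complex visual adjustments with simple instructions for a "what you imagine is what you get" workflow.

      [AI Creation] Qwen Image Edit | AnyPose workflow: replicating poses without OpenPose

      Personal impression: among current open-source image editing models this one is irreplaceable, and together with Flux2-klein-9B it dominates the field. It performs very well on character consistency, spatial relationships, and hands and feet, and it can carry out complex image edits from instructions alone. Flux2-klein-9B, in turn, excels at product editing; combining the two flexibly gives a broader range of capability and higher quality.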
