
Function call score

📊 Function Call total score

| Model (Single-turn) | total |
| --- | --- |
| Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ | 0.6 |
| Qwen2.5-72B-Instruct-AWQ | 0.868 |
| DeepSeek-R1-Distill-Qwen-14B | 0.868 |
| DeepSeek-R1-Distill-Llama-8B | - |
| DeepSeek-R1-Distill-Qwen-7B | - |
| deepseek-llm-7b-chat | - |

| Model (Multi-turn) | total |
| --- | --- |
| Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ | 0.58 |
| Qwen2.5-72B-Instruct-AWQ | 0.88 |
| DeepSeek-R1-Distill-Qwen-14B | - |
| DeepSeek-R1-Distill-Llama-8B | - |
| DeepSeek-R1-Distill-Qwen-7B | - |
| deepseek-llm-7b-chat | - |

🎯 Detailed evaluation results by model

Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ

| Single-turn | pass count | pass rate |
| --- | --- | --- |
| exact | 67/100 | 0.67 |
| 4_random | 62/100 | 0.62 |
| 4_close | 56/100 | 0.56 |
| 8_random | 59/100 | 0.59 |
| 8_close | 56/100 | 0.56 |
| total | 300/500 | 0.6 |

| Multi-turn | pass count | pass rate |
| --- | --- | --- |
| call | 47/70 | 0.67 |
| completion | 56/71 | 0.79 |
| slot | 4/36 | 0.11 |
| relevance | 9/23 | 0.39 |
| total | 116/200 | 0.58 |
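
The page does not say how the `total` row is derived. The reported figures are consistent with pooling pass counts across categories (sum of passes divided by sum of cases) rather than averaging the per-category pass rates. The sketch below checks this against the multi-turn numbers above. The function names `pooled_pass_rate` and `mean_of_rates` are illustrative and do not come from the benchmark itself.

```python
# (passed, attempted) per multi-turn category, copied from the table above.
multi_turn = {
    "call": (47, 70),
    "completion": (56, 71),
    "slot": (4, 36),
    "relevance": (9, 23),
}


def pooled_pass_rate(counts):
    """Sum passes and cases across categories, then divide once."""
    passed = sum(p for p, _ in counts.values())
    attempted = sum(n for _, n in counts.values())
    return passed, attempted, passed / attempted


def mean_of_rates(counts):
    """Unweighted average of the per-category pass rates."""
    return sum(p / n for p, n in counts.values()) / len(counts)


passed, attempted, pooled = pooled_pass_rate(multi_turn)
print(f"pooled: {passed}/{attempted} = {pooled:.2f}")             # pooled: 116/200 = 0.58
print(f"mean of category rates: {mean_of_rates(multi_turn):.2f}")  # mean of category rates: 0.49
```

Because the multi-turn categories have different sizes (70, 71, 36, 23 cases), the pooled rate weights `call` and `completion` more heavily than `slot` and `relevance`. In the single-turn tables every category has 100 cases, so the two calculations give the same value there.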


Qwen2.5-72B-Instruct-AWQ

| Single-turn | pass count | pass rate |
| --- | --- | --- |
| exact | 86/100 | 0.86 |
| 4_random | 86/100 | 0.86 |
| 4_close | 86/100 | 0.86 |
| 8_random | 89/100 | 0.89 |
| 8_close | 87/100 | 0.87 |
| total | 434/500 | 0.868 |

| Multi-turn | pass count | pass rate |
| --- | --- | --- |
| call | 64/70 | 0.91 |
| completion | 64/71 | 0.90 |
| slot | 30/36 | 0.83 |
| relevance | 18/23 | 0.78 |
| total | 176/200 | 0.88 |

DeepSeek-R1-Distill-Qwen-14B

| Single-turn | pass count | pass rate |
| --- | --- | --- |
| exact | 92/100 | 0.92 |
| 4_random | 86/100 | 0.86 |
| 4_close | 87/100 | 0.87 |
| 8_random | 91/100 | 0.91 |
| 8_close | 78/100 | 0.78 |
| total | 434/500 | 0.868 |