
Function call score
📊 Function Call total score
Model (Single-turn) | total |
---|---|
Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ | 0.6 |
Qwen2.5-72B-Instruct-AWQ | 0.868 |
DeepSeek-R1-Distill-Qwen-14B | 0.868 |
DeepSeek-R1-Distill-Llama-8B | - |
DeepSeek-R1-Distill-Qwen-7B | - |
deepseek-llm-7b-chat | - |
Model (Multi-turn) | total |
---|---|
Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ | 0.58 |
Qwen2.5-72B-Instruct-AWQ | 0.88 |
DeepSeek-R1-Distill-Qwen-14B | - |
DeepSeek-R1-Distill-Llama-8B | - |
DeepSeek-R1-Distill-Qwen-7B | - |
deepseek-llm-7b-chat | - |
🎯 Detailed evaluation results by model
Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-AWQ
Single-turn | pass count | pass rate |
---|---|---|
exact | 67/100 | 0.67 |
4_random | 62/100 | 0.62 |
4_close | 56/100 | 0.56 |
8_random | 59/100 | 0.59 |
8_close | 56/100 | 0.56 |
total | 300/500 | 0.6 |
Multi-turn | pass count | pass rate |
---|---|---|
call | 47/70 | 0.67 |
completion | 56/71 | 0.79 |
slot | 4/36 | 0.11 |
relevance | 9/23 | 0.39 |
total | 116/200 | 0.58 |
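The `total` row appears to be a pooled pass rate: the summed pass counts divided by the summed case counts, rather than a mean of the per-category rates. The minimal sketch below reproduces the multi-turn figures for this model under that assumption. The names `multi_turn`, `pass_rate`, and `pooled_total` are illustrative and are not part of the benchmark's own tooling.

```python
# (passed, attempted) per multi-turn category, copied from the table above.
multi_turn = {
    "call": (47, 70),
    "completion": (56, 71),
    "slot": (4, 36),
    "relevance": (9, 23),
}


def pass_rate(passed: int, attempted: int) -> float:
    """Fraction of test cases that passed in one category."""
    return passed / attempted


def pooled_total(results: dict) -> tuple:
    """Sum counts across categories, then divide once (assumed scoring rule)."""
    passed = sum(p for p, _ in results.values())
    attempted = sum(n for _, n in results.values())
    return passed, attempted, passed / attempted


# Per-category rows, rounded to two decimals as in the table.
for name, (p, n) in multi_turn.items():
    print(f"{name} | {p}/{n} | {pass_rate(p, n):.2f} |")

passed, attempted, rate = pooled_total(multi_turn)
print(f"total | {passed}/{attempted} | {rate:.2f} |")
```

Because the categories differ in size (70, 71, 36, and 23 cases), pooling weights each test case equally, so the low `slot` rate pulls the total down less than it would in an unweighted mean of the four rates.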
Qwen2.5-72B-Instruct-AWQ
Single-turn | pass count | pass rate |
---|---|---|
exact | 86/100 | 0.86 |
4_random | 86/100 | 0.86 |
4_close | 86/100 | 0.86 |
8_random | 89/100 | 0.89 |
8_close | 87/100 | 0.87 |
total | 434/500 | 0.868 |
Multi-turn | pass count | pass rate |
---|---|---|
call | 64/70 | 0.91 |
completion | 64/71 | 0.90 |
slot | 30/36 | 0.83 |
relevance | 18/23 | 0.78 |
total | 176/200 | 0.88 |
DeepSeek-R1-Distill-Qwen-14B
Single-turn | pass count | pass rate |
---|---|---|
exact | 92/100 | 0.92 |
4_random | 86/100 | 0.86 |
4_close | 87/100 | 0.87 |
8_random | 91/100 | 0.91 |
8_close | 78/100 | 0.78 |
total | 434/500 | 0.868 |