Model classification results

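Results are grouped by dataset. Each entry lists the dataset file name, the model and difficulty level, and the accuracy as a percentage and as a correct/total count.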

Last update: 2026-04-12

Results by dataset

dataset-email-bot-vs-nonbot-en-qwen3-4b-v1.jsonl

Qwen3-4B first-step
Accuracy: 100% (12/12)

dataset-email-bot-vs-nonbot-pl-bielik-v1.jsonl

Bielik 1.5B first-step
Accuracy: 66.67% (8/12)

dataset-email-bot-vs-nonbot-pl-bielik-v1.jsonl

Bielik 11B first-step
Accuracy: 100% (12/12)

dataset-email-bot-vs-nonbot-gpt-4o-mini-v1.jsonl

Bielik 4.5B first-step
Accuracy: 100% (12/12)
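
The percentages reported for each model follow directly from the correct/total counts. As a minimal sketch (the function name accuracy_percent and the results dictionary are illustrative and not part of the original evaluation code), the arithmetic can be reproduced like this:

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)


# Correct/total counts copied from the results listed in this section.
results = {
    "Qwen3-4B first-step": (12, 12),
    "Bielik 1.5B first-step": (8, 12),
    "Bielik 11B first-step": (12, 12),
    "Bielik 4.5B first-step": (12, 12),
}

for model, (correct, total) in results.items():
    pct = accuracy_percent(correct, total)
    print(f"{model}: {pct:.2f}% ({correct}/{total})")
```

With only 12 examples per dataset, a single misclassified email shifts the accuracy by about 8.33 percentage points, so small differences between models should be read with caution.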

Example dataset entries

From: [email protected]
Label: bot
Subject: Order confirmation #A-49312

Thank you for your purchase. Your order has been confirmed and is now being processed. Estimated delivery time: 2-3 business days.

From: [email protected]
Label: non-bot
Subject: Can we move tomorrow's meeting?

Hi, can we move our meeting from 10:00 to 13:00? I have a conflict in the morning, but after lunch I am fully available. Let me know what works for you.
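
Since the datasets are stored as .jsonl files, entries like the two above could plausibly be kept one JSON object per line. The sketch below is only an illustration: the field names subject, body, and label are assumptions, as are the helpers load_records and score; the actual schema of the dataset files is not shown here.

```python
import json

# Hypothetical JSONL content; the field names are assumed, not taken
# from the real dataset files.
SAMPLE_JSONL = """\
{"subject": "Order confirmation #A-49312", "body": "Thank you for your purchase.", "label": "bot"}
{"subject": "Can we move tomorrow's meeting?", "body": "Hi, can we move our meeting from 10:00 to 13:00?", "label": "non-bot"}
"""


def load_records(text: str) -> list[dict]:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def score(records: list[dict], predictions: list[str]) -> tuple[int, int]:
    """Return (correct, total) by comparing predictions to gold labels."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    correct = sum(
        1 for rec, pred in zip(records, predictions) if rec["label"] == pred
    )
    return correct, len(records)


records = load_records(SAMPLE_JSONL)
# Placeholder predictions standing in for model output.
correct, total = score(records, ["bot", "non-bot"])
print(f"Accuracy: {correct}/{total}")
```

The (correct, total) pair returned by score has the same form as the "Accuracy: 12/12" figures in the results.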