Contact
Back to Projects
🧬

TODAK LLM Dataset

LIVE
Complexity:

TODAK LLM Dataset provides custom training data for Sofia AI, including company knowledge, HR policies, bilingual conversations, and real interaction logs.

85
Active Users
100%
Uptime
5K
API Calls
12/22/2025
Last Updated

Tech Stack

JSONLPythonHugging FaceNLPData Processing

Key Features

  • 25+ specialized JSONL datasets
  • Bilingual (English/Malay) conversations
  • Real Sofia conversation logs
  • Company knowledge and HR policies

Challenges Solved

  • Curating high-quality training data
  • Ensuring bilingual consistency
  • Balancing personality with accuracy

Outcomes

  • 50,000+ training examples created
  • 40% improvement in Sofia responses
  • Published on Hugging Face
Project Timeline
Started: June 2024
Category: research