Theta One AI Presents Code-Switching Speech Recognition Paper at EACL 2026
Theta One AI presented a research paper on Korean-English code-switching speech recognition at EACL 2026 (European Chapter of the Association for Computational Linguistics), one of the largest NLP conferences in Europe.
The paper, "HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition," proposes the world's first evaluation framework for systematically assessing how accurately AI models recognize speech that mixes Korean and English.
Mixing Korean and English in everyday speech has become the norm in industries such as IT, healthcare, and education. Yet even state-of-the-art multilingual ASR models like Whisper show error rates up to 14 times higher on code-switched speech than on monolingual speech. Until now, no objective benchmark existed for measuring and comparing performance in this area, which makes HiKE an important milestone for the research community.
HiKE (Hierarchical Korean-English Code-Switching Benchmark) evaluates ASR models' code-switching capabilities through a three-level hierarchical labeling scheme covering word, phrase, and sentence-level code-switching, along with a natural speech dataset collected across eight topics where code-switching frequently occurs. Using this framework, the researchers empirically demonstrated the code-switching performance limitations of major multilingual ASR models and showed that fine-tuning with synthetic data can effectively improve performance.
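To make the idea of level-wise evaluation concrete, the following is a minimal sketch of how recognition errors could be reported separately for word-, phrase-, and sentence-level code-switching. It is not the HiKE implementation: the whitespace-tokenized word error rate, the function names, and the three sample utterances are all assumptions made for illustration, and the metric used in the paper may differ.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate_by_level(samples):
    """Aggregate a word error rate per code-switching level.

    samples: iterable of (level, reference, hypothesis) tuples, where
    level is one of "word", "phrase", "sentence" (labels from the article;
    the tuple layout itself is a hypothetical format).
    """
    errors = defaultdict(int)
    ref_tokens = defaultdict(int)
    for level, reference, hypothesis in samples:
        ref = reference.split()  # assumption: whitespace tokenization
        hyp = hypothesis.split()
        errors[level] += edit_distance(ref, hyp)
        ref_tokens[level] += len(ref)
    return {
        level: errors[level] / ref_tokens[level]
        for level in errors
        if ref_tokens[level] > 0
    }


# Made-up utterances, one per level; not taken from the HiKE dataset.
samples = [
    ("word", "오늘 meeting 몇 시야", "오늘 미팅 몇 시야"),
    ("phrase", "이거 as soon as possible 처리해 주세요",
               "이거 as soon as possible 처리해 주세요"),
    ("sentence", "I will check it 내일 다시 연락드릴게요",
                 "I will check 내일 다시 연락드릴게요"),
]
rates = error_rate_by_level(samples)
for level, rate in rates.items():
    print(f"{level}: {rate:.3f}")
```

Reporting one rate per level, rather than a single pooled score, is what lets a benchmark of this kind show where a model breaks down, for example on single embedded English words versus full sentence switches.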

At EACL 2026, the HiKE paper received strong recognition from NLP researchers worldwide. Two aspects drew particular interest: the hierarchical evaluation framework, which captures code-switching at the word, phrase, and sentence levels, and the decision to label loanwords in Korean contexts separately so that they are not counted as genuine code-switching. Loanword handling has long been overlooked in code-switching ASR research, and HiKE's systematic treatment of it was recognized as a meaningful contribution that enables more accurate performance evaluation.
The numbers reflect this interest. In the six months since its release on Hugging Face and GitHub, the HiKE dataset has been downloaded more than 700 times by researchers around the world. That a dataset in a domain as specialized as Korean-English code-switching has attracted this level of attention points to strong demand for code-switching ASR research and suggests that HiKE is filling a critical gap in the field.
Following its multimodal AI research presentation at ACL 2025, Theta One AI continues to build its presence on the international stage with the HiKE paper at EACL 2026. Theta One is a startup that uses AI to solve challenges in education and communication, developing and operating AI-powered English learning services such as Langflix. Building on the code-switching ASR expertise gained through the HiKE research, the company also offers a speech recognition API specialized for code-switching (CS-ASR) through the Theta One AI platform, alongside education-focused voice AI solutions including child speech recognition and English pronunciation assessment.
Paper: HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition
Authors: Gio Paik, Yongbeom Kim, Soungmin Lee, Sangmin Ahn, Chanwoo Kim
Venue: Findings of EACL 2026
Dataset: Hugging Face | GitHub
