Publications
Direct Links: Google Scholar | ACL Anthology | DBLP
Workshop, Conference, and Journal
2024
David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, Henok Biadglign Ademtew, Hernán Maina, Holy Lovenia, Israel Abebe Azime, Jan Christian Blaise Cruz, Jay Gala, Jiahui Geng, Jesus-German Ortiz-Barajas, Jinheon Baek, Jocelyn Dunstan, Laura Alonso Alemany, Kumaranage Ravindu Yasas Nagasinghe, Luciana Benotti, Luis Fernando D'Haro, Marcelo Viridiano, Marcos Estecha-Garitagoitia, Maria Camila Buitrago Cabrera, Mario Rodríguez-Cantelar, Mélanie Jouitteau, Mihail Mihaylov, Mohamed Fazli Mohamed Imam, Muhammad Farid Adilazuarda, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Naome Etori, Olivier Niyomugisha, Paula Mónica Silva, Pranjal Chitale, Raj Dabre, Rendi Chevi, Ruochen Zhang, Ryandito Diandaru, Samuel Cahyawijaya, Santiago Góngora, Soyeong Jeong, Sukannya Purkayastha, Tatsuki Kuribayashi, Thanmay Jayakumar, Tiago Timponi Torrent, Toqeer Ehsan, Vladimir Araujo, Yova Kementchedjhieva, Zara Burzo, Zheng Wei Lim, Zheng Xin Yong, Oana Ignat, Joan Nwatu, Rada Mihalcea, Thamar Solorio, Alham Fikri Aji. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada. [paper]
Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Jann Railey Montalan, Ryan Ignatius Hadiwijaya, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze GAO, Patrick Amadeus Irawan, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Tai Ngee Chia, Ayu Purwarianti, Sebastian Ruder, William Chandra Tjhi, Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng Xin Yong, Samuel Cahyawijaya. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. In Proceedings of The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, USA. [paper]
Fajri Koto. Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia. Preprint (2024). [paper]
Fajri Koto, Rahmad Mahendra, Nurul Aisyah, and Timothy Baldwin. IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces. Transactions of the Association for Computational Linguistics (TACL 2024). [paper] [data]
Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, and Timothy Baldwin. Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), Malta. [paper] [code]
Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, and Timothy Baldwin. ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand. [paper] [code]
Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, and Timothy Baldwin. CMMLU: Measuring Massive Multitask Language Understanding in Chinese. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Fajri Koto*, Rifki Afina Putri*, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji*, Ayu Purwarianti, and Pascale Fung. Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand. [paper]
Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, and Iryna Gurevych. Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico. [paper] [code]
Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric P. Xing. LLM360: Towards Fully Transparent Open-Source LLMs. In Proceedings of the First Conference on Language Modeling (COLM 2024), Philadelphia, USA. [paper] [code]
2023
Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU. In Proceedings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Fajri Koto*, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, and Pascale Fung. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing (AACL 2023), Bali, Indonesia. [paper] (Best Resource Paper Award)
Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, and Eric Xing. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models. Technical Report. [paper] [model] [website]
Haonan Li*, Fajri Koto*, Minghao Wu, Alham Fikri Aji, and Timothy Baldwin. Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation. Preprint. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Alham Fikri Aji*, Genta Indra Winata*, Bryan Wilie*, Fajri Koto*, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Muhammad Satrio Wicaksono, Ivan Halim Parmonangan, Ika Alfina, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh Dhole, Arie Suryani, Rifki Afina Putri, Dan Su, Keith David Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius Hadiwijaya, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Inastra Damapuspita, Haryo Akbarianto Wibowo, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Pascale Fung, Herry Sujaini, Sakriani Sakti, and Ayu Purwarianti. NusaCrowd: Open Source Initiative for Indonesian NLP Resources. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada. [paper]
Genta Indra Winata*, Alham Fikri Aji*, Samuel Cahyawijaya*, Rahmad Mahendra*, Fajri Koto*, Ade Romadhony*, Kemal Kurniawan*, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder. NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia. [paper] [code] (Outstanding Paper Award)
2022
Fajri Koto. From Discourse and Keyphrases, to Language Modeling in Automatic Summarization. Ph.D. Thesis, The University of Melbourne, 2022. [thesis]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. FFCI: A Framework for Interpretable Automatic Evaluation of Summarization. Journal of Artificial Intelligence Research (JAIR 2022). [paper] [code]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Andrew Shen, Fajri Koto, Jey Han Lau, and Timothy Baldwin. Easy-First Bottom-Up Discourse Parsing via Sequence Labelling. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse (CODI at COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Alham Fikri Aji*, Genta Indra Winata*, Fajri Koto*, Samuel Cahyawijaya*, Ade Romadhony*, Rahmad Mahendra*, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, and Sebastian Ruder. One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ads Text for Product Descriptions? In Proceedings of the 5th Workshop on e-Commerce and NLP (ECNLP at ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian. In Proceedings of the Commonsense Representation and Reasoning Workshop 2022 (CSRR at ACL 2022), Dublin, Ireland. [paper] [data] (Best Paper Award)
Biaoyan Fang*, and Fajri Koto*. Context-Aware Sentence Classification in Evidence-Based Medicine. In Proceedings of the Australasian Language Technology Association Workshop 2022 (ALTA 2022), Adelaide, Australia. [paper] (1st place in the shared task)
2021
Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Dominican Republic (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of the Association for Computational Linguistics: ACL 2021, Bangkok (virtual). [paper] [data]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), Mexico (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Top-down Discourse Parsing via Sequence Labelling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Greece (virtual). [paper] [code]
Fajri Koto*, and Biaoyan Fang*. Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature. In Proceedings of the Australasian Language Technology Association Workshop 2021 (ALTA 2021), Australia (virtual). [paper] (2nd place in the shared task)
2020
Fajri Koto, Afshin Rahimi, Jey Han Lau, and Timothy Baldwin. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Spain (virtual). [paper] [code] [website]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Liputan6: A Large-scale Indonesian Dataset for Text Summarization. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL 2020), China (virtual). [paper] [code]
Fajri Koto, and Ikhwan Koto. Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation. In Proceedings of the 34th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2020), Vietnam (virtual). [paper] [code]
2019
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Improved Document Modelling with a Neural Discourse Parser. In Proceedings of the 17th Australasian Language Technology Workshop (ALTA 2019), Sydney, Australia. [paper] [code]
2017
Fajri Koto, and Gemala Y. Rahmaningtyas. InSet Lexicon: Evaluation of a Word List for Indonesian Sentiment Analysis in Microblogs. In Proceedings of the 21st International Conference on Asian Language Processing. IEEE. (IALP 2017), Singapore. [paper] [data]
2016
Fajri Koto. A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Automatic Detection of Memorable Spoken Quotes. In The 2016 Spring Meeting of the Acoustical Society of Japan (ASJ 2016), Yokohama, Japan. [paper]
Fajri Koto, and Omar Abdillah. Automatic Advisor for Detecting Summarizable Chat Conversations in Online Instant Messages. In Proceedings of the 12th International Conference on Computing and Information Technology. Springer. (IC2IT 2016), Thailand. [paper]
2015
Fajri Koto, and Mirna Adriani. HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis. In Proceedings of the 6th Forum for Information Retrieval. ACM. (FIRE 2015), Gandhinagar, India. [paper]
Fajri Koto, and Mirna Adriani. A Comparative Study on Twitter Sentiment Analysis: Which Features are Good? In Proceedings of the 20th International Conference on Applications of Natural Language To Information Systems. Springer. (NLDB 2015), Passau, Germany. [paper]
Fajri Koto, and Mirna Adriani. The Use of POS Sequence for Analyzing Sentence Patterns in Twitter Sentiment Analysis. In Proceedings of the 8th International Symposium on Mining and Web (joint with the 29th AINA Conference). IEEE. (MAW-WAINA 2015), Gwangju, Korea. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. A Study On Natural Expressive Speech: Automatic Memorable Spoken Quote Detection. In Proceedings of the 6th International Workshop on Spoken Dialog Systems. Springer. (IWSDS 2015), Busan, Korea. [paper]
2014
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. The Use of Semantic and Acoustic Features for Open-Domain TED Talk Summarization. In Proceedings of the 6th Asia Pacific Signal and Information Processing Association. IEEE. (APSIPA 2014), Siem Reap, Cambodia. [paper]
Fajri Koto. SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An Enhancement Strategy to Handle Imbalance in Data Level. In Proceedings of the 6th International Conference on Advanced Computer Science and Information Systems. IEEE. (ICACSIS 2014), Jakarta, Indonesia. [paper] [code]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Memorable Spoken Quote Corpora of TED Public Speaking. In Proceedings of the 17th Oriental COCOSDA Conference. IEEE. (OCOCOSDA 2014), Phuket, Thailand. [paper]
Patents
Patent United States US 2020/0082699 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri, "Personal safety device and operating method therefor". Issued March 12, 2020. [Patent]
Patent WO/2018/124584 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri, "Personal safety device and operating method therefor". Issued May 7, 2018. [Patent]
Patent United States US 2017/0177797 A1 - Agus Kurniawan, Fajri, Omar Abdillah, "Apparatus and method for sharing personal electronic-data of health". Issued June 22, 2017. [Patent]
Patent United States US 2016/0147387 A1 - Yanuar Rahman, Omar Abdillah, Fajri, "Method And Apparatus For Displaying Summarized Data". Issued November 20, 2015. [Patent]
Books
Agus Kurniawan, Fajri Koto, Gilang Kusuma Jati, "Panduan Dasar Pemrograman Tizen". Published by Samsung Research Indonesia. Jakarta, 2016. [Book]
Contact
Fajri Koto (Assistant Professor)
NLP Department, MBZUAI, Masdar City, Abu Dhabi, UAE
Email: fajri.koto@mbzuai.ac.ae