Publications

Direct Links: Google Scholar | ACL Anthology | DBLP

Workshop, Conference, and Journal

2025

Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto. What Do Indonesians Really Need from Language Technology? A Nationwide Survey. In Proceedings of The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), Suzhou, China
Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto. IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages. In Proceedings of The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), Suzhou, China
Saeed Almheiri, Rania Elbadry, Mena Attia, Chenxi Wang, Preslav Nakov, Timothy Baldwin, Fajri Koto. Data-Efficient Fine-Grained Cross-Cultural Transfer of Commonsense Reasoning in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China
Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly, Yuxia Wang, Zhuohan Xie, Rahul Pal, Daniil Orel, Parvez Mullah, Diana Turmakhan, Maiya Goloburda, Mohammed Kamran, Samujjwal Ghosh, Bokang Jia, Jonibek Mansurov, Mukhammed Togmanov, Debopriyo Banerjee, Nurkhan Laiyk, Akhmed Sakip, Xudong Han, Ekaterina Kochmar, Alham Fikri Aji, Aaryamonvikram Singh, Alok Anil Jadhav, Satheesh Katipomu, Samta Kamboj, Monojit Choudhury, Gurpreet Gosal, Gokulakrishnan Ramakrishnan, Biswajit Mishra, Sarath Chandran, Avraham Sheinin, Natalia Vassilieva, Neha Sengupta, Preslav Nakov. Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting. In Proceedings of the Second Conference on Language Modeling (COLM 2025), Montreal, Canada.
Abdelrahman Sadallah, Junior Cedric Tonga, Khalid Almubarak, Saeed Almheiri, Farah Atif, Chatrine Qwaider, Karima Kadaoui, Sara Shatnawi, Yaser Alesh, Fajri Koto. Commonsense Reasoning in Arab Culture. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria. [paper]
Amir Hossein Yari, Fajri Koto. Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria. [paper]
Nurkhan Laiyk, Daniil Orel, Rituraj Joshi, Maiya Goloburda, Yuxia Wang, Preslav Nakov, Fajri Koto. Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria. [paper]
Mukhammed Togmanov, Nurdaulet Mukhituly, Diana Turmakhan, Jonibek Mansurov, Maiya Goloburda, Akhmed Sakip, Zhuohan Xie, Yuxia Wang, Bekassyl Syzdykov, Nurkhan Laiyk, Alham Fikri Aji, Ekaterina Kochmar, Preslav Nakov, Fajri Koto. KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria. [paper]
Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel, Vicky Feliren, Bahrul Ilmi Nasution, Manuel Antonio Rufino, Genta Indra Winata, Rian Adam Rajagede, Carlos Rafael Catalan, Mohamed Fazli Mohamed Imam, Priyaranjan Pattnayak, Salsabila Zahirah Pranida, Kevin Pratama, Yeshil Bangera, Adisai Na-Thalang, Patricia Nicole Monderin, Yueqi Song, christian simon, Lynnette Hui Xian Ng, Richardy Lobo Sapan, Taki Hasan Rafi, Bin Wang, Supryadi, Kanyakorn Veerakanjana, Piyalitt Ittichaiwong, Matthew Theodore Roque, Karissa Vincentio, Takdanai Kreangphet, Phakphum Artkaew, Kadek Hendrawan Palgunadi, Yanzhi Yu, Rochana Prih Hastuti, William Nixon, Mithil Bangera, Adrian Xuan Wei Lim, Aye Hninn Khine, Hanif Muhammad Zhafran, Teddy Ferdinan, Audra Aurora Izzani, Ayushman Singh, Evan, Jauza Akbar Krito, Michael Anugraha, Fenal Ashokbhai Ilasariya, Haochen Li, John Amadeo Daniswara, Filbert Aurelian Tjiaranata, Eryawan Presma Yulianrifat, Can Udomcharoenchaikit, Fadil Risdian Ansori, Mahardika Krisna Ihsani, Giang Nguyen, Anab Maulana Barik, Dan John Velasco, Rifo Ahmad Genadi, Saptarshi Saha, Chengwei Wei, Isaiah Edri W. Flores, Kenneth Chen Ko Han, Anjela Gail D. Santos, Wan Shen Lim, Kaung Si Phyo, Tim Santos, Meisyarah Dwiastuti, Jiayun Luo, Jan Christian Blaise Cruz, Ming Shan Hee, Ikhlasul Akmal Hanif, M.Alif Al Hakim, Muhammad Rizky Sya'ban, Kun Kerdthaisong, Lester James Validad Miranda, Fajri Koto, Tirana Noor Fatyanosa, Alham Fikri Aji, Jostin Jerico Rosal, Jun Kevin, Robert Wijaya, Onno P. Kampman, Ruochen Zhang, Börje F. Karlsson, Peerat Limkonchotiwat. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria. [paper]
Maiya Goloburda, Nurkhan Laiyk, Diana Turmakhan, Yuxia Wang, Mukhammed Togmanov, Jonibek Mansurov, Askhat Sametov, Nurdaulet Mukhituly, Minghan Wang, Daniil Orel, Zain Muhammad Mujahid, Fajri Koto, Timothy Baldwin, Preslav Nakov. Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria. [paper]
Fajri Koto. Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Industry Track, USA. [paper]
Haonan Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, Zhenxuan Zhang, Yilin Geng, Shom Lin, Renxi Wang, Artem Shelmanov, Xiangyu Qi, Yuxia Wang, Donghai Hong, Youliang Yuan, Meng Chen, Haoqin Tu, Fajri Koto, Cong Zeng, Tatsuki Kuribayashi, Rishabh Bhardwaj, Bingchen Zhao, Yawen Duan, Yi Liu, Emad A. Alghamdi, Yaodong Yang, Yinpeng Dong, Soujanya Poria, Pengfei Liu, Zhengzhong Liu, Hector Xuguang Ren, Eduard Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, Timothy Baldwin. Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Demo Track, USA. [paper]
Angelika Romanou, Negar Foroutan, Anna Sotnikova, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Zeming Chen, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, Perttu Isotalo, Maral Jabbarishiviari, Börje F. Karlsson, Eldar Khalilov, Christopher Klamm, Fajri Koto, Dominik Krzemiński, Gabriel Adriano de Melo, Syrielle Montariol, Yiyang Nan, Joel Niklaus, Jekaterina Novikova, Johan Samir Obando Ceron, Debjit Paul, Esther Ploeger, Jebish Purbey, Swati Rajwal, Selvan Sunitha Ravi, Sara Rydell, Roshan Santhosh, Drishti Sharma, Marjana Prifti Skenduli, Arshia Soltani Moakhar, Bardia soltani moakhar, Ayush Kumar Tarun, Azmine Toushik Wasi, Thenuka Ovin Weerasinghe, Serhan Yilmaz, Mike Zhang, Imanol Schlag, Marzieh Fadaee, Sara Hooker, Antoine Bosselut. INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge. In Proceedings of The Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore. [paper]

2024

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, Henok Biadglign Ademtew, Hernán Maina, Holy Lovenia, Israel Abebe Azime, Jan Christian Blaise Cruz, Jay Gala, Jiahui Geng, Jesus-German Ortiz-Barajas, Jinheon Baek, Jocelyn Dunstan, Laura Alonso Alemany, Kumaranage Ravindu Yasas Nagasinghe, Luciana Benotti, Luis Fernando D'Haro, Marcelo Viridiano, Marcos Estecha-Garitagoitia, Maria Camila Buitrago Cabrera, Mario Rodríguez-Cantelar, Mélanie Jouitteau, Mihail Mihaylov, Mohamed Fazli Mohamed Imam, Muhammad Farid Adilazuarda, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Naome Etori, Olivier Niyomugisha, Paula Mónica Silva, Pranjal Chitale, Raj Dabre, Rendi Chevi, Ruochen Zhang, Ryandito Diandaru, Samuel Cahyawijaya, Santiago Góngora, Soyeong Jeong, Sukannya Purkayastha, Tatsuki Kuribayashi, Thanmay Jayakumar, Tiago Timponi Torrent, Toqeer Ehsan, Vladimir Araujo, Yova Kementchedjhieva, Zara Burzo, Zheng Wei Lim, Zheng Xin Yong, Oana Ignat, Joan Nwatu, Rada Mihalcea, Thamar Solorio, Alham Fikri Aji. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada. [paper]
Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Jann Railey Montalan, Ryan Ignatius Hadiwijaya, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze GAO, Patrick Amadeus Irawan, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Tai Ngee Chia, Ayu Purwarianti, Sebastian Ruder, William Chandra Tjhi, Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng Xin Yong, Samuel Cahyawijaya. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. In Proceedings of The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, USA. [paper]
Fajri Koto, Rahmad Mahendra, Nurul Aisyah, and Timothy Baldwin. IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces. Transactions of the Association for Computational Linguistics (TACL 2024). [paper] [data]
Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, and Timothy Baldwin. Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), Malta. [paper] [code]
Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, and Timothy Baldwin. ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand. [paper] [code]
Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, and Timothy Baldwin. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Fajri Koto*, Rifki Afina Putri*, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji*, Ayu Purwarianti, and Pascale Fung. Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand. [paper]
Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, and Iryna Gurevych. Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico. [paper] [code]
Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric P. Xing. LLM360: Towards Fully Transparent Open-Source LLMs. In Proceedings of the First Conference on Language Modeling (COLM 2024), Philadelphia, USA. [paper] [code]

2023

Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU. In Proceedings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Fajri Koto*, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, hanung linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti and Pascale Fung. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing (AACL 2023), Bali, Indonesia. [paper] (Best Resource Paper Award)
Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, and Eric Xing. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models. Technical Report. [paper] [model] [website]
Haonan Li*, Fajri Koto*, Minghao Wu, Alham Fikri Aji, and Timothy Baldwin. Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation. Preprint. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Alham Fikri Aji*, Genta Indra Winata*, Bryan Wilie*, Fajri Koto*, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Muhammad Satrio Wicaksono, Ivan Halim Parmonangan, Ika Alfina, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh Dhole, Arie Suryani, Rifki Afina Putri, Dan Su, Keith David Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius Hadiwijaya, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Inastra Damapuspita, Haryo Akbarianto Wibowo, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Pascale Fung, Herry Sujaini, Sakriani Sakti, and Ayu Purwarianti. NusaCrowd: Open Source Initiative for Indonesian NLP Resources. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada. [paper]
Genta Indra Winata*, Alham Fikri Aji*, Samuel Cahyawijaya*, Rahmad Mahendra*, Fajri Koto*, Ade Romadhony*, Kemal Kurniawan*, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder. NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia. [paper] [code] (Outstanding Paper Award)

2022

Fajri Koto. From Discourse and Keyphrases, to Language Modeling in Automatic Summarization. Ph.D. Thesis, The University of Melbourne, 2022. [thesis]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. FFCI: A Framework for Interpretable Automatic Evaluation of Summarization. Journal of Artificial Intelligence Research (JAIR 2022). [paper] [code]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Andrew Shen, Fajri Koto, Jey Han Lau, and Timothy Baldwin. Easy-First Bottom-Up Discourse Parsing via Sequence Labelling. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse (CODI at COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Alham Fikri Aji*, Genta Indra Winata*, Fajri Koto*, Samuel Cahyawijaya*, Ade Romadhony*, Rahmad Mahendra*, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, and Sebastian Ruder. One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ads Text for Product Descriptions? In Proceedings of the 5th Workshop on e-Commerce and NLP (ECNLP at ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian. In Proceedings of Commonsense Representation and Reasoning Workshop 2022 (CSRR at ACL 2022), Dublin, Ireland. [paper] [data] (Best Paper Award)
Biaoyan Fang*, and Fajri Koto*. Context-Aware Sentence Classification in Evidence-Based Medicine. In Proceedings of the Australasian Language Technology Association Workshop 2022 (ALTA 2022), Adelaide, Australia. [paper] (1st place in the shared task)

2021

Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Dominican Republic (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of the Association for Computational Linguistics: ACL 2021, Bangkok (virtual). [paper] [data]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), Mexico (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Top-down Discourse Parsing via Sequence Labelling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Ukraine (virtual). [paper] [code]
Fajri Koto*, and Biaoyan Fang*. Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature. In Proceedings of the Australasian Language Technology Association Workshop 2021 (ALTA 2021), Australia (virtual). [paper] (2nd place in the shared task)

2020

Fajri Koto, Afshin Rahimi, Jey Han Lau, and Timothy Baldwin. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Spain (virtual). [paper] [code] [website]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Liputan6: A Large-scale Indonesian Dataset for Text Summarization. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL 2020), China (virtual). [paper] [code]
Fajri Koto, and Ikhwan Koto. Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation. In Proceedings of the 34th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2020), Vietnam (virtual). [paper] [code]

2019

Fajri Koto, Jey Han Lau, and Timothy Baldwin. Improved Document Modelling with a Neural Discourse Parser. In Proceedings of the 17th Australasian Language Technology Workshop (ALTA 2019), Sydney, Australia. [paper] [code]

2017

Fajri Koto, and Gemala Y. Rahmaningtyas. InSet Lexicon: Evaluation of a Word List for Indonesian Sentiment Analysis in Microblogs. In Proceedings of the 21st International Conference on Asian Language Processing. IEEE. (IALP 2017), Singapore. [paper] [data]

2016

Fajri Koto. A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Automatic Detection of Memorable Spoken Quotes. In The 2016 Spring Meeting of the Acoustical Society of Japan (ASJ 2016), Yokohama, Japan. [paper]
Fajri Koto, and Omar Abdillah. Automatic Advisor for Detecting Summarizable Chat Conversations in Online Instant Messages. In Proceedings of the 12th International Conference on Computing and Information Technology. Springer. (IC2IT 2016), Thailand. [paper]

2015

Fajri Koto, and Mirna Adriani. HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis. In Proceedings of the 6th Forum for Information Retrieval. ACM. (FIRE 2015), Gandhinagar, India. [paper]
Fajri Koto, and Mirna Adriani. A Comparative Study on Twitter Sentiment Analysis: Which Features are Good? In Proceedings of the 20th International Conference on Applications of Natural Language To Information Systems. Springer. (NLDB 2015), Passau, Germany. [paper]
Fajri Koto, and Mirna Adriani. The Use of POS Sequence for Analyzing Sentence Patterns in Twitter Sentiment Analysis. In Proceedings of the 8th International Symposium on Mining and Web (joint with the 29th AINA Conference). IEEE. (MAW-WAINA 2015), Gwangju, Korea. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. A Study On Natural Expressive Speech: Automatic Memorable Spoken Quote Detection. In Proceedings of the 6th International Workshop on Spoken Dialog Systems. Springer. (IWSDS 2015), Busan, Korea. [paper]

2014

Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. The Use of Semantic and Acoustic Features for Open-Domain TED Talk Summarization. In Proceedings of the 6th Asia Pacific Signal and Information Processing Association. IEEE. (APSIPA 2014), Siem Reap, Cambodia. [paper]
Fajri Koto. SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An Enhancement Strategy to Handle Imbalance in Data Level. In Proceedings of the 6th International Conference on Advanced Computer Science and Information Systems. IEEE. (ICACSIS 2014), Jakarta, Indonesia. [paper] [code]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Memorable Spoken Quote Corpora of TED Public Speaking. In Proceedings of the 17th Oriental COCOSDA Conference. IEEE. (OCOCOSDA 2014), Phuket, Thailand. [paper]

Patents

Patent United States US 2020/0082699 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri "Personal safety device and operating method therefor" Issued March 12, 2020 [Patent]
Patent WO/2018/124584 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri "Personal safety device and operating method therefor" Issued May 7, 2018 [Patent]
Patent United States US 2017/0177797 A1 - Agus Kurniawan, Fajri, Omar Abdillah "Apparatus and method for sharing personal electronic-data of health" Issued June 22, 2017 [Patent]
Patent United States US 2016/0147387 A1 - Yanuar Rahman, Omar Abdillah, Fajri "Method And Apparatus For Displaying Summarized Data" Issued November 20, 2015 [Patent]

Books

Agus Kurniawan, Fajri Koto, Gilang Kusuma Jati, "Panduan Dasar Pemrograman Tizen". Published by Samsung Research Indonesia. Jakarta, 2016. [Book]

Contact

Fajri Koto (Assistant Professor)
NLP Department, MBZUAI, Masdar City, Abu Dhabi, UAE
Email: fajri.koto@mbzuai.ac.ae
