Transformer-Based Semantic Retrieval for Cultural Heritage Question Answering

Tri Lathif Mardi Suryanto; Aji Prasetya Wibawa; Hariyono Hariyono; Andrew Nafalski

doi:10.12928/biste.v8i3.15775

Authors

Tri Lathif Mardi Suryanto, Universitas Pembangunan Nasional Veteran Jawa Timur
Aji Prasetya Wibawa, Universitas Negeri Malang
Hariyono Hariyono, Universitas Negeri Malang
Andrew Nafalski, University of South Australia

Keywords:

Cultural Heritage QA, Transformer-Based Retrieval, Domain-Specific Chatbot, Semantic Similarity, Epistemic Fidelity

Abstract

Cultural heritage knowledge presents significant challenges for Question Answering (QA) systems because of its interpretive, context-dependent, and symbolically rich nature. Although Transformer-based models achieve strong performance in semantic representation, they remain prone to hallucination and contextual misalignment, particularly in culturally sensitive domains. This study proposes a Transformer-based cultural knowledge retrieval framework for domain-specific chatbots that combines a bi-encoder (MiniLM and MPNet) for efficient semantic retrieval with a cross-encoder (BERT-base) for fine-grained reranking. A curated dataset of 4,016 question–answer pairs in Indonesia is developed from cultural heritage sources and validated for contextual consistency. The proposed approach is evaluated with both quantitative and qualitative measures, including accuracy, F1-score, Exact Match (EM), and semantic-based metrics such as F1-BLEU, F1-EDIT, and F1-ANS. Experimental results show that although all models achieve high classification performance (accuracy up to 0.99), the BERT + MPNet configuration significantly outperforms the other configurations on answer quality metrics, indicating superior semantic fidelity. However, qualitative analysis reveals persistent hallucination and contextual misalignment, which highlights the limitations of relying solely on statistical evaluation. These findings demonstrate that high numerical performance does not guarantee meaningful understanding in cultural domains. This study therefore emphasizes the need for hybrid evaluation frameworks and context-aware mechanisms to ensure epistemic fidelity. The proposed approach contributes to the development of more reliable and culturally grounded QA systems.
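
The retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two scoring functions are simple lexical stand-ins (bag-of-words cosine in place of a MiniLM or MPNet bi-encoder, token-set overlap in place of a BERT-base cross-encoder), all function names are hypothetical, and the three question–answer pairs are illustrative examples rather than entries from the 4,016-pair dataset.

```python
import math
import re
from collections import Counter

# Illustrative QA pairs (assumption: NOT taken from the paper's dataset).
CORPUS = [
    ("What is batik?", "Batik is a wax-resist dyeing technique for cloth."),
    ("Where is Borobudur temple located?", "Borobudur is in Central Java."),
    ("What is wayang kulit?", "Wayang kulit is a shadow puppet theatre tradition."),
]

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())

def bi_encoder_score(query, candidate):
    """Stand-in for a bi-encoder: each text is encoded on its own
    (here as a bag-of-words vector) and compared by cosine similarity."""
    a, b = Counter(tokens(query)), Counter(tokens(candidate))
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cross_encoder_score(query, candidate):
    """Stand-in for a cross-encoder: the pair is scored jointly
    (here as Jaccard overlap of the two token sets)."""
    a, b = set(tokens(query)), set(tokens(candidate))
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query, corpus, top_k=2):
    """Stage 1: rank all stored questions and keep the top_k indices."""
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: bi_encoder_score(query, corpus[i][0]),
        reverse=True,
    )
    return ranked[:top_k]

def answer(query, corpus, top_k=2):
    """Stage 2: rerank the retrieved candidates and return the best answer."""
    candidates = retrieve(query, corpus, top_k)
    best = max(candidates, key=lambda i: cross_encoder_score(query, corpus[i][0]))
    return corpus[best][1]

def exact_match(prediction, gold):
    """Exact Match after lowercasing and punctuation removal."""
    return tokens(prediction) == tokens(gold)

if __name__ == "__main__":
    predicted = answer("where is borobudur located", CORPUS)
    print(predicted)  # Borobudur is in Central Java.
    print(exact_match(predicted, "borobudur is in central java"))  # True
```

In a real system the two stand-in scorers would be replaced by trained models. The two-stage structure is the point of the sketch: a cheap scorer narrows the candidate set, and a more expensive pairwise scorer picks the final answer.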
