Lightweight Vision Transformer Framework for Real-Time Human–Object Interaction Recognition
Author(s): Michael Turner¹, Olivia Reed², Ethan Walker³
Affiliation: ¹,²,³Department of Computer Engineering, Westbridge Institute of Technology, Wellington, New Zealand
Page No.: 28–33
Volume, Issue & Publishing Year: Volume 2, Issue 11, Nov-2025
Journal: International Journal of Advanced Engineering Application (IJAEA)
ISSN NO: 3048-6807
DOI: https://doi.org/10.5281/zenodo.17753367
Abstract:
Human–Object Interaction (HOI) recognition is a fundamental task in intelligent computing systems, enabling machines to understand how humans engage with surrounding objects in real-time environments. Traditional deep learning approaches to HOI rely heavily on convolutional architectures, which often struggle to model long-range dependencies and are computationally expensive for edge deployment. This paper proposes a Lightweight Vision Transformer Framework (LVTF) designed specifically for efficient and accurate real-time HOI recognition. The framework employs a patch-based visual encoder combined with optimized multi-head attention mechanisms to capture global contextual relationships between humans and objects. A lightweight decoder further refines these representations to generate interaction labels with minimal latency. Experimental evaluations on benchmark HOI datasets demonstrate that LVTF achieves competitive accuracy while reducing computational complexity by nearly 40% compared with conventional transformer- and CNN-based models. The reduced model footprint and low inference delay make the proposed approach well suited for real-time intelligent applications, including smart surveillance, assistive robotics, and human–computer interaction systems.
Keywords: Vision transformer, human–object interaction, real-time recognition, lightweight architecture, attention mechanism, intelligent systems.
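As a rough illustration of the encoder–attention–decoder pipeline described in the abstract, the following PyTorch sketch combines a patch-based embedding, a compact multi-head attention encoder, and a small decoder head that outputs interaction logits. This is a minimal sketch under assumed settings: the patch size, embedding width, depth, head count, and the number of interaction classes (`num_interactions=60`) are illustrative placeholders, not the configuration reported for LVTF.

```python
# Minimal sketch of a lightweight ViT-style HOI classifier (assumed hyperparameters).
import torch
import torch.nn as nn


class LightweightHOITransformer(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=192,
                 depth=4, heads=3, num_interactions=60):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch-based visual encoder: non-overlapping patches -> linear embeddings.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # Compact multi-head attention encoder modelling global human-object context.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 2,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Lightweight decoder head mapping the pooled representation to interaction labels.
        self.decoder = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(),
            nn.Linear(dim, num_interactions))

    def forward(self, images):
        # images: (B, 3, H, W) -> patch tokens (B, N, dim)
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        # Interaction logits taken from the class token.
        return self.decoder(tokens[:, 0])


if __name__ == "__main__":
    model = LightweightHOITransformer()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 60])
```

The small embedding width and shallow depth keep the parameter count and attention cost low, which is the general route such lightweight designs take toward real-time inference on edge hardware.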
