TY - GEN
T1 - GptVoiceTasker
T2 - 37th Annual ACM Symposium on User Interface Software and Technology, UIST 2024
AU - Vu, Minh Duc
AU - Wang, Han
AU - Chen, Jieshan
AU - Li, Zhuang
AU - Zhao, Shengdong
AU - Xing, Zhenchang
AU - Chen, Chunyang
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/10/13
Y1 - 2024/10/13
N2 - Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in real-world usability, characterized by inefficiency and difficulty in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GptVoiceTasker, a virtual assistant poised to enhance user experiences and task efficiency on mobile devices. GptVoiceTasker excels at intelligently deciphering user commands and executing relevant device interactions to streamline task completion. For unprecedented tasks, GptVoiceTasker utilises contextual information and on-screen content to continuously explore and execute the tasks. In addition, the system continually learns from historical user commands to automate subsequent task invocations, further enhancing execution efficiency. In our experiments, GptVoiceTasker achieved 84.5% accuracy in parsing human commands into executable actions and 85.7% accuracy in automating multi-step tasks. In our user study, GptVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We made GptVoiceTasker open-source, inviting further research into LLM utilisation for diverse tasks through prompt engineering and leveraging usage data to improve efficiency.
AB - Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in real-world usability, characterized by inefficiency and difficulty in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GptVoiceTasker, a virtual assistant poised to enhance user experiences and task efficiency on mobile devices. GptVoiceTasker excels at intelligently deciphering user commands and executing relevant device interactions to streamline task completion. For unprecedented tasks, GptVoiceTasker utilises contextual information and on-screen content to continuously explore and execute the tasks. In addition, the system continually learns from historical user commands to automate subsequent task invocations, further enhancing execution efficiency. In our experiments, GptVoiceTasker achieved 84.5% accuracy in parsing human commands into executable actions and 85.7% accuracy in automating multi-step tasks. In our user study, GptVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We made GptVoiceTasker open-source, inviting further research into LLM utilisation for diverse tasks through prompt engineering and leveraging usage data to improve efficiency.
UR - http://www.scopus.com/inward/record.url?scp=85215098692&partnerID=8YFLogxK
U2 - 10.1145/3654777.3676356
DO - 10.1145/3654777.3676356
M3 - Conference contribution
AN - SCOPUS:85215098692
T3 - UIST 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
BT - UIST 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
PB - Association for Computing Machinery, Inc
Y2 - 13 October 2024 through 16 October 2024
ER -