Papers from Prof. Jinkyu Lee's lab accepted for publication at ACM/IEEE DAC 2024 and IEEE RTAS 2024
2024-07-01
Title: Papers from Prof. Jinkyu Lee's Lab (Real-Time Computing Lab, RTCL@SKKU) presented at ACM/IEEE DAC 2024 and IEEE RTAS 2024

Papers written at the Real-Time Computing Lab (advisor: Prof. Jinkyu Lee) were presented at ACM/IEEE DAC 2024 (the 61st Design Automation Conference) and IEEE RTAS 2024 (the 30th IEEE Real-Time and Embedded Technology and Applications Symposium). ACM/IEEE DAC is the No. 1 international conference in design automation (rated highest grade by the Korean Institute of Information Scientists and Engineers, BK21+ IF3); this year it was held in San Francisco, USA, on June 23-27, 2024. IEEE RTAS is a Top-2 international conference in real-time systems (rated highest grade by the Korean Institute of Information Scientists and Engineers, BK21+ IF2); this year it was held in Hong Kong on May 13-16, 2024, with a total of 29 papers presented.

The ACM/IEEE DAC 2024 paper addresses timing guarantees for running AI workloads on small IoT devices such as MCUs. Master's student Seokmin Kang (강석민, first author), Ph.D. student Seongtae Lee (이성태, co-first author), and undergraduate student Hyunwoo Koo (구현우) of the Real-Time Computing Lab participated under the supervision of Prof. Jinkyu Lee, in joint research with Prof. Hoon Sung Chwa (좌훈승) of DGIST.

The IEEE RTAS 2024 paper addresses timing guarantees for running AI workloads in memory-constrained environments; the work was led by Prof. Hoon Sung Chwa's team at DGIST, with Prof. Jinkyu Lee participating.

ACM/IEEE DAC 2024 website: https://www.dac.com/
IEEE RTAS 2024 website: https://2024.rtas.org/
Real-Time Computing Lab website: https://rtclskku.github.io/website/

- Paper Title: RT-MDM: Real-Time Scheduling Framework for Multi-DNN on MCU Using External Memory
- Abstract: As the application scope of DNNs executed on microcontroller units (MCUs) extends to time-critical systems, it becomes important to ensure timing guarantees for increasing demand of DNN inferences. To this end, this paper proposes RT-MDM, the first Real-Time scheduling framework for Multiple DNN tasks executed on an MCU using external memory. Identifying execution-order dependencies among segmented DNN models and memory requirements for parallel execution subject to the dependencies, we propose (i) a segment-group-based memory management policy that achieves isolated memory usage within a segment group and sharded memory usage across different segment groups, and (ii) an intra-task scheduler specialized for the proposed policy. Implementing RT-MDM on an actual system and optimizing its parameters for DNN segmentation and segment-group mapping, we demonstrate the effectiveness of RT-MDM in accommodating more DNN tasks while providing their timing guarantees.
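The RT-MDM abstract above names two mechanisms: a segment-group-based memory management policy and an intra-task scheduler that respects execution-order dependencies among DNN segments. The following Python sketch is only a rough illustration of how such pieces could fit together; every class name, memory budget, and the scheduling rule are assumptions made for this post and are not taken from the paper or its implementation.

```python
# Hypothetical sketch (not RT-MDM's actual design): segments of DNN tasks are
# mapped to segment groups, each group allocates from its own memory budget,
# and a simple intra-task scheduler runs segments only when their
# execution-order dependencies are satisfied and their group has room.
from collections import deque
from dataclasses import dataclass

@dataclass
class Segment:
    name: str          # e.g. "dnn0.seg1" (illustrative naming)
    group: int         # segment group this segment is mapped to
    mem_kib: int       # memory the segment needs while it executes
    deps: tuple = ()   # names of segments that must finish first

@dataclass
class SegmentGroupPool:
    """Per-group budget: segments in different groups never compete for the
    same budget, a crude stand-in for the isolation described in the abstract."""
    budget_kib: int
    in_use_kib: int = 0

    def try_alloc(self, kib: int) -> bool:
        if self.in_use_kib + kib > self.budget_kib:
            return False
        self.in_use_kib += kib
        return True

    def release(self, kib: int) -> None:
        self.in_use_kib -= kib

def run_segments(segments, pools):
    """Run every segment whose dependencies are done and whose group pool
    has room; a real scheduler would also enforce priorities and deadlines."""
    done, order = set(), []
    pending = deque(segments)
    while pending:
        progressed = False
        for _ in range(len(pending)):
            seg = pending.popleft()
            if all(d in done for d in seg.deps) and pools[seg.group].try_alloc(seg.mem_kib):
                order.append(seg.name)       # "execute" the segment
                done.add(seg.name)
                pools[seg.group].release(seg.mem_kib)
                progressed = True
            else:
                pending.append(seg)          # retry in a later pass
        if not progressed:
            raise RuntimeError("unsatisfiable dependency or memory budget")
    return order

if __name__ == "__main__":
    segments = [
        Segment("dnn0.seg0", group=0, mem_kib=96),
        Segment("dnn0.seg1", group=1, mem_kib=64, deps=("dnn0.seg0",)),
        Segment("dnn1.seg0", group=0, mem_kib=96),
        Segment("dnn1.seg1", group=1, mem_kib=64, deps=("dnn1.seg0",)),
    ]
    pools = {0: SegmentGroupPool(budget_kib=128), 1: SegmentGroupPool(budget_kib=128)}
    print(run_segments(segments, pools))   # prints a dependency-respecting order
```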
- Paper Title: RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference
- Abstract: The increasing complexity and memory demands of Deep Neural Networks (DNNs) for real-time systems pose new significant challenges, one of which is the GPU memory capacity bottleneck, where the limited physical memory inside GPUs impedes the deployment of sophisticated DNN models. This paper presents, to the best of our knowledge, the first study of addressing the GPU memory bottleneck issues, while simultaneously ensuring the timely inference of multiple DNN tasks. We propose RT-Swap, a real-time memory management framework, that enables transparent and efficient swap scheduling of memory objects, employing the relatively larger CPU memory to extend the available GPU memory capacity, without compromising timing guarantees. We have implemented RT-Swap on top of representative machine-learning frameworks, demonstrating its effectiveness in making significantly more DNN task sets schedulable at least 72% over existing approaches even when the task sets demand up to 96.2% more memory than the GPU’s physical capacity.

Jinkyu Lee | jinkyu.lee@skku.edu | RTCL@SKKU | https://rtclskku.github.io/website/
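For readers curious how the swap scheduling described in the RT-Swap abstract might look in code, here is a deliberately simplified Python sketch: it keeps only a task's working set inside a fixed GPU-memory budget, stages everything else in CPU memory, and folds an estimated swap delay into a per-task deadline check. The data structures, the per-MiB transfer cost, and the planning rule are all invented for illustration and do not reflect the paper's actual design or implementation.

```python
# Hypothetical sketch (not RT-Swap's actual design): before each DNN task's
# inference, swap out memory objects the task does not need and swap in the
# ones it does, then check whether the swap delay plus execution time still
# fits the task's deadline.
from dataclasses import dataclass

@dataclass
class MemObject:
    name: str
    size_mib: int
    on_gpu: bool = False   # False means the object currently lives in CPU memory

@dataclass
class InferenceTask:
    name: str
    objects: list          # MemObjects this task's inference touches
    deadline_ms: float
    exec_ms: float         # inference time once everything is GPU-resident

PCIE_MS_PER_MIB = 0.08     # assumed transfer cost per MiB (illustrative number)

def plan_swaps(task, all_objects, gpu_capacity_mib):
    """Evict GPU-resident objects the task does not use until its working
    set fits, mark its objects resident, and return the swap delay in ms."""
    need_in = [o for o in task.objects if not o.on_gpu]
    resident = [o for o in all_objects if o.on_gpu]
    free = gpu_capacity_mib - sum(o.size_mib for o in resident)

    swapped_out = []
    for victim in [o for o in resident if o not in task.objects]:
        if free >= sum(o.size_mib for o in need_in):
            break
        swapped_out.append(victim)
        free += victim.size_mib
    if free < sum(o.size_mib for o in need_in):
        raise MemoryError("working set larger than GPU capacity")

    for o in swapped_out:
        o.on_gpu = False     # moved to CPU memory
    for o in need_in:
        o.on_gpu = True      # moved to GPU memory
    moved_mib = sum(o.size_mib for o in swapped_out) + sum(o.size_mib for o in need_in)
    return moved_mib * PCIE_MS_PER_MIB

if __name__ == "__main__":
    w0 = [MemObject("dnn0.weights", 600), MemObject("dnn0.activations", 200)]
    w1 = [MemObject("dnn1.weights", 700), MemObject("dnn1.activations", 300)]
    tasks = [
        InferenceTask("dnn0", w0, deadline_ms=100.0, exec_ms=12.0),
        InferenceTask("dnn1", w1, deadline_ms=150.0, exec_ms=20.0),
    ]
    all_objects = w0 + w1    # 1800 MiB demanded vs. a 1000 MiB "GPU"
    for t in tasks:
        swap_ms = plan_swaps(t, all_objects, gpu_capacity_mib=1000)
        ok = swap_ms + t.exec_ms <= t.deadline_ms
        print(f"{t.name}: swap {swap_ms:.1f} ms + exec {t.exec_ms:.1f} ms -> "
              f"{'meets' if ok else 'misses'} its {t.deadline_ms:.0f} ms deadline")
```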