Cake Talent Search

Ying-Hung Lo
Manager @Wistron NeWeb Corporation
2021 ~ Present
Software Manager / Software Supervisor
Within one month
...different car manufacturers. Automotive OEM / AM product development experience. Automotive instrument cluster / NAD / dashcam product development experience. Deep Learning: mainly focused on running deep learning algorithms on Embedded Linux; beyond simply porting applications, I have also studied many papers and implemented related applications. Participated in middleware development with a Silicon Valley AI chip company, using a compiler approach...
C Programming
Java
Embedded Linux
Employed
Ready to interview
Taiwan
Full-time / Interested in working remotely
More than 15 years
National Tainan University
Wireless Sensor Network
Past
Data Engineer @國泰金融控股股份有限公司(Cathay Financial Holdings)
2022 ~ 2023
MLOps | AI/ML Engineer
Within one month
Project Management
HTML5/JavaScript/CSS3/AJAX
Functional Testing
Unemployed
Ready to interview
Taiwan
Full-time / Interested in working remotely
6-10 years
Conestoga College
Applied Artificial Intelligence & Machine Learning
LiCheng Lin
Bigdata Engineer @BiMAP Inc.
2024 ~ Present
Software Engineer
Within one month
LiCheng Lin. I mainly develop back-end services and also have experience in Android application development. As a management major, I can integrate business logic into projects well, and I am willing to communicate with customers and team members. I hold a JLPT Level 1 certification and have worked overseas for 2 years, during which I hosted several customer training lectures in English and Japanese. Software Engineer, Taipei, Taiwan. [email protected] Skills: Language and Framework: Golang, Java, Spring Boot, Spring JPA, MyBatis. Database: PostgreSQL, Elasticsearch, InfluxDB, SQLite, H2, MongoDB. Other: Ubuntu, GCP, VirtualBox, Docker
Java
Data Structures
Algorithm
Employed
Open to opportunities
Taiwan
Full-time / Interested in working remotely
4-6 years
National Taipei University of Technology
Industrial Engineering and Management
徐健綸
Deputy Software Manager @浪live
2021 ~ Present
Audio/Video Streaming Engineer
Within one month
Main products: 「浪Live」 and 「Popo Note」. Led research, development, and maintenance of live streaming SDKs for Android and iOS platforms, significantly enhancing user experience and stability. Engineered low-latency streaming solutions with optimized buffer management, reducing broadcast delay from 3 seconds to under 1 second. Implemented adaptive bitrate streaming algorithms that dynamically adjust video quality based on network conditions, enhancing the viewing experience in unstable environments. Developed and integrated a MediaPipe-based cross-platform SDK supporting both Android and iOS, implementing interactive features including filters, beautification, and background blur. Architected a comprehensive QoS metrics monitoring system using Grafana for real-time streaming...
C++
Objective-C
C
Employed
Open to opportunities
Taiwan
Full-time / Interested in working remotely
10-15 years
元智大學
Computer Science and Information Engineering
Backend engineer @WOO X
2024 ~ Present
Software Engineer
Within one month
MySQL
MongoDB
Cassandra
Employed
Full-time / Interested in working remotely
6-10 years
淡江大學
Computer Science and Information Engineering
Kobe Yu
Ph.D. Student @NTHU
Software engineer
Within three months
Meng-Shiun Yu Email: [email protected] Mengshiun Yu is a Ph.D. candidate in the Computer Science Department at National Tsing Hua University, Taiwan, expected to graduate in June 2025. He is also a visiting scholar in the Machine Learning Department at Carnegie Mellon University (CMU), Pennsylvania, USA. As a member of the Programming Language Research Lab, he is advised by Prof. Jenq-Kuen Lee. His research interests include software and hardware co-design as well as compiler optimization for machine learning and computer vision algorithms. Research Interests: Compiler Design and Optimization, Software...
C/C++
Python
Deep Learning
Taiwan
6-10 years
國立清華大學
Computer Science
Senior Firmware Engineer @神盾股份有限公司
2015 ~ Present
Senior Software/Firmware Engineer
Within six months
C/C++
MCU
ADC
Employed
Taiwan
Full-time / Interested in working remotely
10-15 years
China University of Science and Technology
Electrical Engineering
Lewis Chang
Backend Engineering Manager @Appier 沛星互動科技
2022 ~ Present
Software Engineer
Within six months
...junior developers for both onboarding and enhancing code quality. - Introduced a typing system with TypeScript for better maintainability. - Migrated the frontend framework from AngularJS to Angular. Software Engineer • AI4quant, May ~ Sep. Used an NLP model and image processing to help an e-commerce client eliminate duplicate products. - Assessed NLP models for time-series data. - Studied state-of-the-art DL models such as EfficientNet and Transformer-XL. - Built a mobile app that receives data from various wearable devices via Bluetooth to monitor the user's health condition with DL models. AI Algorithm Development Engineer • Sky...
Python
Backend Development
Frontend Development
Employed
Not open to opportunities
Full-time
4-6 years
National Central University
Department of Optics and Photonics
黃資恩
Engineer @鴻霖
2022 ~ Present
Software Engineer
Within one year
Designed a framework that reduced the learning cost for other colleagues when developing new features; it was later adopted by other products as well. For the SOA mapping part, I designed an algorithm that cut service mapping time from several seconds down to a few milliseconds. I also actively introduced a CI/CD pipeline to the team, and during this period strengthened many...
HTML5
Ruby
CSS3
Employed
Taiwan
Full-time / Interested in working remotely
6-10 years
國立中正大學
Computer Science and Information Engineering
Johnny Hsieh
Founder @MorphusAI
2023 ~ Present
Nope
Within two months
domain names. Currently, I manage ArgsData and MorphusAI, innovative companies focused on delivering cutting-edge data solutions and advancing digital human technology through advanced deep learning methodologies. Professional Skills: Artificial Intelligence Facial Recognition and Facial AI Technologies: Real-Time Facial Expression Tracking and Analysis: Utilizing advanced machine learning algorithms for immediate recognition and analysis of facial expressions, enhancing user interaction experiences. Emotion Recognition Systems: Developing AI systems capable of understanding and responding to user emotions, used to enhance customer service and user interactions. Voice Technology (TTS): Natural Voice Generation: Employing Text-to-Speech (TTS) technology to
Solidity
blockchain development
Docker
Employed
Not open to opportunities
Full-time / Not interested in working remotely
4-6 years

Within three months
Ph.D. Student @ NTHU
Taichung City, Taiwan
Taiwan
Professional Background
Current status
Job Search Progress
Professions
Other
Fields of Employment
Software
Work experience
6-10 years
Management
Skills
C/C++
Python
Deep Learning
Languages
Job search preferences
Positions
Software engineer
Job types
Locations
Remote
Freelance
Educations
School
國立清華大學
Major
Computer Science


Meng-Shiun Yu

 

Mengshiun Yu is a Ph.D. candidate in the Computer Science Department at National Tsing Hua University, Taiwan, expected to graduate in June 2025. He is also a visiting scholar in the Machine Learning Department at Carnegie Mellon University (CMU), Pennsylvania, USA. As a member of the Programming Language Research Lab, he is advised by Prof. Jenq-Kuen Lee. His research interests include software and hardware co-design as well as compiler optimization for machine learning and computer vision algorithms.

Research Interests


  • Compiler Design and Optimization
  • Software and Hardware Co-design
  • Machine Learning Compiler and Runtime

Education

  • Ph.D., Department of Computer Science, National Tsing Hua University                                2017/9 ~ 2025/6 (expected)

    • Advisor: Prof.  Jenq-Kuen Lee
    • Thesis: Integrating Custom Instructions and VLIW Architectures to Accelerate Computer Vision Workloads on RISC-V Using TVM

  • Visiting Scholar, Carnegie Mellon University                                                                                                   2024/3 ~ 2025/2

    • Advisor: Prof. Tianqi Chen
    • Research work:
      • Added support for new language models (Phi-3 and Phi-3 Vision) in MLC LLM
      • Optimized the performance of vision language models on the CUDA platform
      • Added MLC LLM support for the CPU backend
    • Supported by the Taiwan NSTC Young Talent Award for Ph.D. students

  • M.A., Department of Electrical Engineering, National Chung Cheng University                                  2006/9 ~ 2008/10

    • Advisor: Huei-Yung Lin
    • Thesis: A Vision Surveillance System for Mobile Robot Using Omni-directional and PTZ Cameras

  • B.A., Department of Electrical Engineering, National Chin Yi University of Technology                    2004/9 ~ 2006/6

Work Experience



National Tsing Hua University, Research Assistant, 2023/2 ~ Present

  • Assisted in drafting research proposals for funding.
  • Documented research and published academic papers.
  • Led a research team with topics including:
    • Investigating the out-of-core problem and solutions through graph partitioning techniques.
    • Developing MLC LLM for CPU backends.
    • Applying LLMs to compiler optimization for embedded devices.
    • Designing FP8 instructions based on the RISC-V ISA.
    • Exploring VLIW ISA design using machine learning methods.



Kneron, NPU Compiler Engineer, 2019/2 ~ 2023/2

  • Participated in the development of 6 NPU SoCs.
  • Developed a compiler to compress weights for the NPU and to generate metadata for firmware control flow.
  • Proposed a new NPU executable format (NEF) for the runtime.
  • Developed binary utilities to merge/extract multiple models to/from a single NEF file.
  • Developed a just-in-time compiler for the image accelerator, used in FreeRTOS and Embedded Linux environments.
  • Performed pre- and post-silicon verification (NPU and image accelerator).
  • Debugged mismatches between CSim and RTL, and performance issues, together with the firmware, CSim, and HW teams.
  • Recruited and led interns to complete specific research work.

Fongfu, Software Engineering Manager, 2016/9 ~ 2018/6

  • Built up a software team (~5 engineers) to deploy services on a mobile app and Asus Zenbo.
  • Evaluated and planned the system architecture (mobile app, web backend).
  • Developed a face recognition algorithm.

SmartAll, Senior Software Engineer, 2016/2 ~ 2016/8

  • Developed an app to control a smart camera.
  • Ported FFmpeg and a face detection algorithm to mobile phones.
  • Developed a peer-to-peer communication SDK.

MSI, Software Engineer, 2013/11 ~ 2016/2

  • Developed an app to control a smart camera.

CEO (Create Electronic Optical Co., Ltd), Software Engineer, 2010/1 ~ 2013/3

  • Developed lane departure and forward collision warning systems using computer vision algorithms.
  • Developed a media player for the driving recorder.

Publications

  • Journal Papers

    1. Optimizing Computer Vision Algorithms with TVM on VLIW Architecture Based on RVV, Meng-Shiun Yu, Hao-Chun Chang, Chong-Teng Wang, Yu-Wei Tien, Tai-Liang Chen, Jenq-Kuen Lee, Journal of Supercomputing, Volume 81, article number 172, 2025.
    2. Case Study: Optimization Methods with TVM Hybrid-OP on RISC-V Packed SIMD, Meng-Shiun Yu, Chuan-Yue Yuan, Tai-Liang Chen, and Jenq-Kuen Lee, IEEE Access, vol. 12, pp. 64193-64211, 2024.
    3. Support NNEF Execution Model for NNAPI, Yuan-Ming Chang, Chia-Yu Sung, Yu-Chien Sheu, Meng-Shiun Yu, Min-Yih Hsu, and Jenq-Kuen Lee, Journal of Supercomputing, 77, 10065-10096, Springer, 2021.
    4. NNBlocks: A Blockly Framework for AI Computing, Tai-Liang Chen, Yi-Ru Chen, Meng-Shiun Yu, and Jenq-Kuen Lee, Journal of Supercomputing, 77, 8622-8652, Springer, 2021

  • Conference Papers

    1. Enhancing RISC-V ISA to Support Sub-FP8 Quantization for Machine Learning Models, Meng-Shiun Yu, Jhih-Kuan Lin, Fu-Jian Shen, Jenq-Kuen Lee, RISC-V Summit, Santa Clara, USA, Nov. 2024. (Poster)
    2. Developing an LLVM Backend for VLIW RISC-V Vector Extension Architectures, Hao-Chun Chang, Jhih-Kuan Lin, Meng-Shiun Yu, Tai-Liang Chen, and Jenq-Kuen Lee, Euro LLVM, Vienna, Austria, April 2024 (Poster)
    3. Verification of the RISC-V Vector Extension for the Gem5 Simulator, Chong-Teng Wang, Meng-Shiun Yu and Jenq-Kuen Lee, CTHPC, May. 2024.
    4. An AI Accelerator Simulator for 2D Mesh Architecture, Yu-Wen Shao, Chao-Lin Lee, Meng-Shiun Yu and Jenq-Kuen Lee, CTHPC, May. 2023.
    5. The Support of MLIR HLS Adaptor for LLVM IR, Geng-Ming Liang, Chuan-Yue Yuan, Meng-Shiun Yu, Tai-Liang Chen, Kuan-Hsun Chen, and Jenq-Kuen Lee, ICPP EMS 2022, Bordeaux, France, Aug. 29 - Sep. 1, 2022. (Virtual).
    6. C++OpenCL4TVM: Support C++OpenCL Kernel for TVM NN Operators, Po-Yao Chang, Tai-Liang Chen, Yu-Tse Huang, Meng-Shiun Yu, Jenq-Kuen Lee, IWOCL 2022 (poster paper), 2022. (also in ACM proceeding article No.: 27, Pages 1 - 2)
    7. Optimization with TVM Hybrid OP on RISC-V with P Extension, Chuan-Yue Yuan, Meng-Shiun Yu, Chao-Lin Lee, Chun-Chieh Yang, and Jenq-Kuen Lee, TVMcon 2021, Seattle, Dec. 15-17, 2021
    8. Accelerating NNEF framework on OpenCL devices using clDNN, Meng-Shiun Yu, Tai-Liang Chen and Jenq-Kuen Lee, IWOCL 2020, Germany, May 2020 (poster paper). (also in ACM proceeding article No.: 20, Pages 1 - 2)
    9. A Robot Vision System for Visual Surveillance, M. S. Yu, H. Wu, H. Y. Lin, International Journal of Innovative Computing, Information and Control, Vol. 10, No. 4, pp. 1267-1274, Aug. 2014.
    10. Lane departure and front collision warning using a single camera, H. Y. Lin, L. Q. Chen, M. S. Yu, International Symposium on Intelligent Signal Processing and Communication System, pp. 64-69, Nov. 2012. 
    11. A visual surveillance system for mobile robot using omnidirectional and PTZ cameras, MS Yu, H Wu, HY Lin, Proceedings of SICE Annual Conference, pp. 37-42. Aug. 2010.

Projects

  • Google Research Project with NTHU
    • DSP Applications and Performance Explorations with RISC-V Extensions. (2023 ~ 2024)
    • Compiler Directives and Optimization Methods for Sharding AI Models. (2024 ~ 2025)
    • Using LLMs to optimize LLVM compiler IRs and AI compiler optimizations. (2024 ~ 2025)
  • Open-Source Contributions (TVM and MLC LLM)
    • Optimized computer vision algorithms and deep learning models on RISC-V hardware under the TVM framework, including meta-scheduling and adding support for custom instructions based on RISC-V.
    • Developed a new code generator and runtime for TVM on Android, enabling neural network inference on custom hardware accelerators using Android Neural Networks API (NNAPI). (Co-authored-by Ming-Long Huang and Ming-Zhang Huang)
    RFC: https://discuss.tvm.apache.org/t/rfc-109-add-a-new-backend-nnapi-for-byoc/17717
    • Added MLC LLM support and optimization for a vision language model (Phi-3 Vision).
    PRs:  https://github.com/search?q=repo%3Amlc-ai%2Fmlc-llm+mengshyu&type=pullrequests
  • Academic Projects (National Science and Technology Commission of Taiwan)
    • Designing sub-FP8 instructions based on the RISC-V ISA (see the quantization sketch after this project list).
    Model quantization has become an essential optimization strategy for improving performance in machine learning. It reduces data types from 32-bit and 16-bit to lower-bit formats, such as 8-bit, 6-bit, and 4-bit, in both integer and floating-point data types. However, the current RISC-V instruction set lacks support for floating-point formats below 8 bits, including FP8, 6-bit, 4-bit, and other sub-byte precision formats. This research proposes extending the RISC-V ISA to support sub-FP8 operations. We design custom instructions that enable RISC-V CPUs to execute sub-FP8 computations directly, enhancing the performance and energy efficiency of machine learning workloads, and we integrate the design with AI compiler frameworks such as TVM and MLC LLM.
    • Hardware-aware Graph Partitioning for Mobile Inference Acceleration (see the placement sketch after this project list)
    Modern smartphones integrate various specialized hardware units, such as CPUs, GPUs, DSPs, and accelerators, each tailored to specific computational tasks. Many mobile systems provide native neural network APIs to leverage these hardware units efficiently. However, combining compiler-aided optimizations with NN API backends can further enhance performance. This paper introduces a generic approach for hardware-aware graph partitioning to accelerate mobile inference. We profile the operators used in the model on the supported hardware and then use the profiling data to create a cost model. Based on this cost model, we apply a graph partitioning strategy to maximize computation graph performance on mobile devices.
    • Applying LLMs to compiler optimization for embedded devices (see the pass-selection sketch after this project list).
    Large Language Models (LLMs) have been successfully applied in various fields, such as natural language processing, programming language development, and unstructured data organization and analysis. However, their application in the field of compiler optimization is still limited. This research aims to leverage LLMs to enhance and automate compiler optimization for embedded devices, focusing on key performance metrics such as code size, execution time, hardware utilization, and memory usage. Our goal is to use LLMs to identify the optimal combinations of LLVM compiler optimizations for embedded devices. LLVM optimizations can be categorized into target-independent IR optimizations and target-dependent instruction-level optimizations. LLVM opt provides a series of optimization passes that can be executed repeatedly or combined with other optimizations. This research will address the challenge of finding the best optimization strategies for different performance metrics among the numerous possible combinations. By leveraging open LLMs, we aim to identify the most effective optimization strategies tailored to different hardware and optimization combinations, ultimately enhancing the performance of embedded devices.
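
To make the sub-FP8 project concrete, here is a minimal Python sketch (an illustration, not the project's implementation): it enumerates every representable value of a hypothetical 6-bit E3M2 format and rounds FP32 weights to the nearest one, which is roughly what software emulation of such a format looks like before dedicated RISC-V instructions exist. The format parameters (3 exponent bits, 2 mantissa bits, bias 3) are illustrative assumptions only.

    # Minimal sketch of sub-FP8 (here: 6-bit E3M2) quantization by rounding
    # to the nearest representable value. Format parameters are assumptions.
    import numpy as np

    def e3m2_values(exp_bits=3, man_bits=2, bias=3):
        """Enumerate all values of a tiny sign/exponent/mantissa format.
        The top exponent is treated as a normal value for illustration."""
        vals = set()
        for sign in (1.0, -1.0):
            for e in range(2 ** exp_bits):
                for m in range(2 ** man_bits):
                    if e == 0:   # subnormal: m / 2^man_bits * 2^(1 - bias)
                        v = sign * (m / 2 ** man_bits) * 2.0 ** (1 - bias)
                    else:        # normal: (1 + m / 2^man_bits) * 2^(e - bias)
                        v = sign * (1 + m / 2 ** man_bits) * 2.0 ** (e - bias)
                    vals.add(v)
        return np.array(sorted(vals))

    def quantize_sub_fp8(weights, grid):
        """Round each FP32 weight to the nearest representable grid value."""
        idx = np.abs(weights[..., None] - grid).argmin(axis=-1)
        return grid[idx]

    if __name__ == "__main__":
        grid = e3m2_values()
        w = np.random.randn(4, 4).astype(np.float32)
        wq = quantize_sub_fp8(w, grid)
        print("max abs quantization error:", np.abs(w - wq).max())

A custom RISC-V instruction for this format would replace the table lookup and arithmetic above with a single hardware operation; the emulation only illustrates the numerics.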
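The hardware-aware partitioning project can be illustrated with a small placement sketch: given per-operator latencies profiled on each backend and a fixed cost for crossing between backends, dynamic programming over a linear operator chain picks the placement with minimum total latency. The operator names, latency numbers, and transfer cost below are hypothetical, and real models are general graphs rather than chains.

    # Minimal sketch of cost-model-driven operator placement over a chain.
    # All numbers are made up for illustration.
    PROFILE = {              # profiled latency (ms) per operator per backend
        "conv1":   {"cpu": 4.0, "npu": 0.8},
        "relu":    {"cpu": 0.2, "npu": 0.1},
        "argsort": {"cpu": 0.5, "npu": 5.0},   # op that the NPU handles poorly
        "conv2":   {"cpu": 3.5, "npu": 0.7},
    }
    TRANSFER_MS = 0.6        # assumed cost of moving tensors between backends

    def partition(ops, profile, transfer):
        """Dynamic programming: best placement of a linear op chain."""
        backends = list(next(iter(profile.values())))
        # best[b] = (cost, assignment) for the prefix ending on backend b
        best = {b: (profile[ops[0]][b], [b]) for b in backends}
        for op in ops[1:]:
            new_best = {}
            for b in backends:
                cands = []
                for prev_b, (cost, assign) in best.items():
                    hop = 0.0 if prev_b == b else transfer
                    cands.append((cost + hop + profile[op][b], assign + [b]))
                new_best[b] = min(cands, key=lambda c: c[0])
            best = new_best
        return min(best.values(), key=lambda c: c[0])

    if __name__ == "__main__":
        cost, placement = partition(list(PROFILE), PROFILE, TRANSFER_MS)
        print(f"total latency ~{cost:.1f} ms, placement: {placement}")

The same idea generalizes to arbitrary graphs by replacing the chain DP with a graph partitioning pass over the profiled cost model.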
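For the LLM-driven compiler-optimization project, the core search loop can be sketched as follows: candidate `opt` pass pipelines (here a fixed list standing in for pipelines an LLM would propose) are applied to an LLVM IR module and scored by a crude proxy metric, the size of the emitted bitcode. The module path, the candidate pipelines, and the scoring metric are assumptions for illustration, not the project's actual system.

    # Minimal sketch of scoring LLVM `opt` pass pipelines by output size.
    import os
    import subprocess
    import tempfile

    CANDIDATE_PIPELINES = [        # stand-ins for LLM-proposed pipelines
        "default<O2>",
        "default<Os>",
        "function(instcombine,simplifycfg),globaldce",
    ]

    def bitcode_size(input_ll: str, pipeline: str) -> int:
        """Run `opt` with the given pass pipeline; return output size in bytes."""
        with tempfile.NamedTemporaryFile(suffix=".bc", delete=False) as out:
            out_path = out.name
        try:
            subprocess.run(
                ["opt", f"-passes={pipeline}", input_ll, "-o", out_path],
                check=True,
            )
            return os.path.getsize(out_path)
        finally:
            os.unlink(out_path)

    if __name__ == "__main__":
        module = "kernel.ll"       # hypothetical input IR module
        scores = {p: bitcode_size(module, p) for p in CANDIDATE_PIPELINES}
        best = min(scores, key=scores.get)
        print("best pipeline:", best, "->", scores[best], "bytes")

In the full system the fixed candidate list would be replaced by pipelines generated by an LLM and the proxy metric by measured code size, execution time, or memory usage on the target device.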
