

Guohui Wang (王国晖)

I am working on augmented reality projects at Bytedance Inc. Previously, I was an engineering manager at Snapchat working on computer vision and augmented reality. Before joining Snapchat, I was a senior GPGPU engineer on the graphics team at Qualcomm. I am interested in augmented reality, GPGPU, mobile computing, computer vision, computer architecture, and signal processing.
I received my Ph.D. degree in electrical and computer engineering from Rice University, Houston, Texas. Before coming to the U.S., I received an M.S. degree in CS from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and a B.S. degree in EE from Peking University, Beijing, China.
Software
- OpenCL-Z Android: OpenCL information utility for Android.
- ezSIFT: an easy-to-use standalone open-source SIFT library.
- Software implementation of the scale-invariant feature transform (SIFT) algorithm;
- Written in C/C++;
- Doesn't require other 3rd-party packages;
- Download source code: https://github.com/robertwgh/ezSIFT.
- Tool: Android native program launcher.
- A tool allowing developers to launch native programs on un-rooted devices without connecting the devices to an ADB shell via USB cable.
- Download Android Native Program Launcher from https://github.com/robertwgh/AndroidNativeLauncher.
- cuLDPC: CUDA implementation of LDPC decoding algorithm
- This is a highly efficient CUDA implementation of the layered LDPC decoding algorithm. Our SASP2011, Asilomar2011, and GlobalSIP2013 papers are based on this code. (Please note that the code used in the papers might not be exactly the same, but it should not be hard to modify this code to reproduce the results in the papers. We believe there is still a lot of room to improve performance, given the new GPU programming techniques and hardware features introduced over the past few years.)
- Download source code: https://github.com/robertwgh/cuLDPC.
Technical Notes
- Tutorial - part 1: Using OpenCV Nonfree Module (SIFT, SURF) in Android NDK Projects
- Demonstrates how to build the OpenCV Nonfree feature module for Android and use it in an Android NDK project.
- Webpage: http://web.guohuiwang.com/technical-notes/sift_surf_opencv_android.
- Tutorial - part 2: Use OpenCV Nonfree module (SIFT, SURF) in Android Applications via JNI
- Demonstrates how to use the OpenCV Nonfree feature module in an Android application with JNI.
- Webpage: http://web.guohuiwang.com/technical-notes/opencv_nonfree_android_jni_demo.
- Mastering Android NDK Build System - Part 1: Techniques with ndk-build
- Advanced techniques for using the Android NDK build system.
- Webpage: http://web.guohuiwang.com/technical-notes/androidndk1.
- Mastering Android NDK Build System - Part 2: Standalone toolchain
- Tutorial and examples of using the NDK standalone toolchain.
- Webpage: http://web.guohuiwang.com/technical-notes/androidndk2.
Publications
[Book chapter]
-
Embedded Systems Networking: Applications, case studies, and technologies, Elsevier. In preparation.
-
High-Level Design Tools for Complex DSP Applications
Yang Sun, Guohui Wang, Bei Yin, Joseph R. Cavallaro, and Tai Ly
DSP for Embedded and Real-Time Systems: Expert Guide, Elsevier, 2012. ISBN-13: 9780123865359. Link at Amazon.com.
[Talks]
-
Massively Parallel Signal Processing for Wireless Communication Systems
Michael Wu and Guohui Wang
GPU Technology Conference (GTC) 2013. March 18-21, 2013, San Jose, California. [Link to slides and video]
[Journal Papers]
-
Parallel Interleaver Design for a High Throughput HSPA+/LTE Multi-Standard Turbo Decoder
Guohui Wang, Hao Shen, Yang Sun, Joseph R. Cavallaro, Aida Vosoughi, and Yuanbin Guo
IEEE Transactions on Circuits and Systems I - Regular Papers (TCAS-I), 2014. (Invited)
Preprint version available at arXiv: http://arxiv.org/abs/1403.3759
IEEE Xplore version: download link
-
Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing
Guohui Wang, Yingen Xiong, Jay Yun, and Joseph R. Cavallaro
Journal of Signal Processing Systems (JSPS), 2014. (Invited)
Preprint version available at arXiv: http://arxiv.org/abs/1403.4238
Final version (Springer website): download link
-
Large-Scale MIMO Detection for 3GPP LTE: Algorithm and FPGA Implementation
Michael Wu, Bei Yin, Guohui Wang, Chris Dick, Joseph R. Cavallaro, and Christoph Studer
IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2014.
Preprint version available at arXiv: http://arxiv.org/abs/1403.5711
IEEE Xplore version: download link
-
GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems
Michael Wu, Bei Yin, Guohui Wang, Christoph Studer, and Joseph R. Cavallaro
Journal of Signal Processing Systems (JSPS), 2014. (Invited)
-
Implementation of a High Throughput 3GPP Turbo Decoder on GPU
Michael Wu, Yang Sun, Guohui Wang, and Joseph R. Cavallaro
Journal of Signal Processing Systems (JSPS), 2011.
-
A Novel Design Of the High Speed Buffer and Video/audio Synchronization in High Resolution Digital Cinema System
Guohui Wang, Zhenhua Zhu, Ke Zhang, Zhensong Wang
High Technology Letters (In Chinese), Vol.9, 2008.
-
Parallel VLSI Architecture for 3GPP LTE/LTE-Advanced Turbo Decoder
Yang Sun, Guohui Wang, and Joseph R. Cavallaro
Submitted to IEEE Transactions on Signal Processing. (Under review)
[Conference Papers]
-
Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning
Kekai Sheng, Weiming Dong, Menglei Chai, Guohui Wang, Peng Zhou, Feiyue Huang, Bao-Gang Hu, Chongyang Ma, and Rongrong Ji
AAAI, 2020.
-
Practical Urban Localization for Mobile AR
Tiantu Xu, Guohui Wang, and Felix Xiaozhu Lin
HotMobile 2020.
-
Gbit/s Non-Binary LDPC Decoders: Throughput vs Energy Tradeoffs using High-Level Specifications
Oscar Ferraz, Srinivasan Subramaniyan, Guohui Wang, Joseph Cavallaro, Gabriel Falcao and Madhura Purnaprajna
The 28th IEEE International Symposium On Field-Programmable Custom Computing Machines (FCCM), May 2020.
-
OpenCL-Based Mobile GPGPU Benchmarking: Methods and Challenges
Rotem Aviv and Guohui Wang
International Workshop on OpenCL (IWOCL), April 2016.
-
On the Performance of LDPC and Turbo Decoder Architectures with Unreliable Memories
Joao Andrade, Aida Vosoughi, Guohui Wang, Georgios Karakonstantis, Andreas Burg, Gabriel Falcao, Vitor Silva, and Joseph R. Cavallaro
48th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2014.
-
A High Performance GPU-based Software-defined Basestation
Kaipeng Li, Michael Wu, Guohui Wang, and Joseph R. Cavallaro
48th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2014.
-
Efficient Architecture Mapping of FFT/IFFT for Cognitive Radio Networks
Guohui Wang, Bei Yin, Inkeun Cho, Joseph R. Cavallaro, Shuvra Bhattacharyya, and Jorma Takala
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014.
-
A 3.8 Gb/s Large-scale MIMO Detector for 3GPP LTE-Advanced
Bei Yin, Michael Wu, Guohui Wang, Chris Dick, Joseph R. Cavallaro, and Christoph Studer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014.
-
High Throughput Low Latency LDPC Decoding on GPU for SDR Systems
Guohui Wang, Michael Wu, Bei Yin, and Joseph R. Cavallaro
1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), December 2013.
Reference code
-
Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone
Guohui Wang, Blaine Rister, and Joseph R. Cavallaro
1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), December 2013.
-
HSPA+/LTE-A Turbo Decoder on GPU and Multicore CPU
Michael Wu, Guohui Wang, Bei Yin, Christoph Studer, and Joseph R. Cavallaro
47th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2013.
-
Highly Scalable On-the-Fly Interleaved Address Generation for UMTS/HSPA+ Parallel Turbo Decoder
Aida Vosoughi, Guohui Wang, Hao Shen, Joseph R. Cavallaro, and Yuanbin Guo
24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), June 2013.
-
Accelerating Computer Vision Algorithms Using OpenCL Framework on the Mobile GPU - A Case Study
Guohui Wang, Yingen Xiong, Jay Yun, and Joseph R. Cavallaro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.
-
A Fast and Efficient SIFT Detector using the Mobile GPU
Blaine Rister, Guohui Wang, Michael Wu and Joseph R. Cavallaro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.
-
Parallel Interleaver Architecture with New Scheduling Scheme for High Throughput Configurable Turbo Decoder (Finalist, best student paper award)
Guohui Wang, Aida Vosoughi, Hao Shen, Joseph R. Cavallaro, and Yuanbin Guo
IEEE International Symposium on Circuits and Systems (ISCAS), May 2013.
-
Parallel Nonbinary LDPC Decoding on GPU
Guohui Wang, Hao Shen, Bei Yin, Michael Wu, Yang Sun, and Joseph R. Cavallaro
46th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2012.
-
Low Complexity Opportunistic Decoder for Network Coding
Bei Yin, Michael Wu, Guohui Wang, and Joseph R. Cavallaro
46th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2012.
-
GPGPU Accelerated Scalable Parallel Decoding of LDPC Codes
Guohui Wang, Michael Wu, Yang Sun, and Joseph R. Cavallaro
45th IEEE Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), November 2011.
-
High-throughput Contention-Free concurrent interleaver architecture for multi-standard turbo decoder
Guohui Wang, Yang Sun, Joseph R. Cavallaro and Yuanbin Guo
22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), September 2011.
-
A Massively Parallel Implementation of QC-LDPC Decoder on GPU
Guohui Wang, Michael Wu, Yang Sun, and Joseph R. Cavallaro
9th IEEE Symposium on Application Specific Processors (SASP), June 2011.
-
Multi-Layer Parallel Decoding Algorithm and VLSI Architecture for Quasi-Cyclic LDPC Codes
Yang Sun, Guohui Wang, and Joseph R. Cavallaro
IEEE International Symposium on Circuits and Systems (ISCAS), May 2011.
-
FPGA Prototyping of A High Data Rate LTE Uplink Baseband Receiver
Guohui Wang, Bei Yin, Kiarash Amiri, Yang Sun, Michael Wu, and Joseph R. Cavallaro
43rd IEEE Asilomar Conference on Signals, Systems and Computers (ASILOMAR), November 2009.
[Posters]
-
Parallel Interleaver Design for High Throughput Configurable Turbo Decoder
2nd place winner best graduate student poster.
Annual Rice University ECE Affiliates Day Conference, April 2013.
-
Parallel Interleaver Design for High Throughput Configurable Turbo Decoder
IEEE Texas Workshop on Integrated System Exploration (TexasWISE), March 2013.
-
Low Energy Fast SIFT Detector on Heterogeneous Mobile Processors
IEEE Texas Workshop on Integrated System Exploration (TexasWISE), March 2013.
[Patents]
-
6 US patent applications on computer vision and deep learning technologies.
U.S. Patent Application. Filed by Snap Inc., 2017-2018.
-
System and Method for a Turbo Decoder with Parallel Interleaver
U.S. Patent Application. Filed by Huawei, October 2012.
-
System and Method for Turbo Code Interleaved Address Generation
U.S. Patent Application. Filed by Huawei, October 2012.
-
System and Method for Contention-Free Memory Access
U.S. Patent US8621160 B2. Filed by Huawei, December 2011. Granted, December 2013.
(Also published as CN103262425A, WO2012079543A1)
-
The Method, System and Device to Implement Video/audio Synchronization
China Patent ZL200710120585.0. Filed in August 2007. Granted, September 2012.
-
A Fast and High Performance Zooming Method for Multimedia Video
China Patent ZL200710178188.9. Filed, 2007. Granted, October 2011.
-
A copyright protection method and system for audio and video contents in digital cinema
China Patent ZL200810114749.3. Filed in 2008. Granted, March 2010.
-
A method of Watermark Generation and Detection for digital cinema Copyright Protection
China Patent ZL200810103472.4. Filed in 2008. Granted, September 2010.
[Thesis]
- Ph.D. thesis: "Design Space Exploration of Parallel Algorithms and Architectures for Wireless Communication and Mobile Computing Systems", Rice University, Houston, Texas, May 2014.
- Master's thesis: "VLSI Architecture for High Definition Digital Cinema Playback System" (Abstract), Chinese Academy of Sciences, Beijing, China, June 2008. Related project: Research and Implementation of DCI-Compliant 2K Digital Cinema Server (Jan. 2006 - Jan. 2008, at ICT, CAS, Beijing, China).
Academic Activities
- Teaching assistant:
- ELEC 220 Fundamentals of Computer Engineering (TA and lab instructor): Fall 2009, 2010, 2011, 2012.
- ELEC 303 Random Signals: Fall 2009
- ELEC 522 Advanced VLSI Design: Fall 2010
- Technical notes: Catapult C Synthesis Work Flow Tutorial (Tools: Catapult C, Xilinx ISE, System Generator, ModelSim)
- Paper reviewer:
- Journal: JSSC, TSP, TCAS-I, TCAS-II, TPDS, TCSVT, TMM, TAES, TSC, CL, CAL, TECS, JSPS, CPE, JCST, JFCS, CSSP, SIVP, PARCO, JWCN, ADHOC, WINE, WIRE, JSCS, SRE, IJWI
- Conference: ISCAS, ICASSP, ICC, GLOBECOM, GlobalSIP, EUSIPCO, GLSVLSI, ISITA, ASAP, SIPS, VTC, IWCMC, ISIEA
Useful Links
- Friends: Hui Wang, Bei Yin, Xiaozhu Lin, Yang Sun, Kia Amiri, Yang Zhao, Michael Wu
[Last Updated: 03/2020]