Gooly – The Serious Computer Vision Blog

Author: Gooly

Understanding Stable Diffusion and ControlNet for a Bar Conversation

(By Li Yang Ku) In my last post I talked about how generative diffusion models (such as DALLE, Imagen, and Stable Diffusion) work. I also mentioned that I would talk about specific models and tools like Stable Diffusion and ControlNet. I admit that this second post took a bit longer than I expected, mostly due to my laziness…

December 18, 2023
Generative Diffusion Models: Explain to me like I am 35

(By Li Yang Ku) These are interesting times to be in the field of Computer Vision. In the past I judged the quality of a Computer Vision publication based on its accuracy on benchmarks and the number of citations. Now I also consider how popular it is on Reddit and YouTube. With all the Computer Vision…

May 21, 2023
Vicarious Publications

(By Li Yang Ku) I worked at Vicarious, a robotics AI startup, from mid-2018 until it was acquired by Alphabet in 2022. Vicarious was a startup founded before the deep learning boom, and it had been approaching AI through a more neuroscience-based graphical model path. Nowadays it is definitely rare for AI startups…

January 22, 2023
Consciousness and Intelligence

(By Li Yang Ku) In the past I’ve always avoided making comments about consciousness. My view was that, because consciousness is internal to ourselves, it is extremely difficult if not impossible to evaluate scientifically. Also, why talk about consciousness when we can’t even understand intelligence? However, some recent readings have changed my view…

July 4, 2022
Visual Loop Machine

(By Li Yang Ku) August 2023 update: an install file for Mac is now available on my personal site. Visual Loop Machine is my newest side project since the Rap Machine, which completes rap sentences. It is a tool that plays visual loops generated by StyleGAN2 along with music in real time. One of the reasons I…

April 30, 2022
The Quest to Finding “The” Object Representation for Robot Manipulation

(By Li Yang Ku) For many researchers in the field of Computer Vision, coming up with “the” object representation is a lifetime goal. An object representation is the result of mapping an image to a feature space such that an agent can recognize or interact with the objects in it. The field has come a long way from…

February 6, 2022
BARS 2021 Paper Picks

(By Li Yang Ku) I was at the Bay Area Robotics Symposium (BARS) at Stanford in person last week. It’s nice to see real people even though there is a mask mandate (which could be a good thing, since the audience won’t be biased by the speaker’s looks). Faculty talks can be found in the…

November 3, 2021
Transformer for Vision

(By Li Yang Ku) In my previous post I talked about a web app I made that can generate rap lyrics using the transformer network. The transformer is currently the most popular approach for natural language tasks (I am counting OpenAI’s GPT-3 as a transformer extension). In this post I am going to talk about…

October 9, 2021
Task and Motion Planning

(By Li Yang Ku) In this post I’ll briefly go through the problem of Task and Motion Planning (TAMP) and talk about some recent work that tries to tackle it. One of the main motivations for solving the TAMP problem is to allow robots to solve household tasks like the robot Rosey in the cartoon…

June 1, 2021