Life is a game, take it seriously

10 Facts About Human Vision a Computer Vision Scientist Might Not Know

In Computer Vision, Neural Science, Visual Illusion on June 7, 2020 at 3:14 pm

by Li Yang Ku (Gooly)

The one thing that all computer vision scientists can probably agree on is that, as of today, human vision is a lot better than computer vision algorithms (at least within the range of visible light) at understanding the world around us. However, most computer vision scientists don’t usually look into our own vision system for inspiration, since it is not part of the computer science curriculum. This post is about a few interesting facts about our vision system that I consider less commonly known among computer vision folks. (Plus a few more commonly known facts to make it add up to ten.)

1) You transmit more signal when you don’t see:

You might think that the photoreceptors in our eyes are like light sensors that emit signals when photons hit the sensor. Well, it’s actually the opposite: the photoreceptors in our eyes depolarize and release more neurotransmitter when there is no light.

Visual Signals

2) Stars are smaller than they look:

I would argue that you can’t really see stars. When you look at the starry night sky, you are seeing your eye’s “pixel” (the smallest dot in your visual field), because stars are too far away and are smaller than your eye’s resolution. Angular diameter is used to measure how large a circle appears in a view; for example, the star with the largest angular diameter when viewed from Earth is R Doradus, which has an angular diameter of 0.06 arc seconds (or 1.66667e-5 degrees), while our eyes can at best tell apart points about 28 arc seconds apart. Because the light emitted by a star is very strong, even though its light only hits a small portion of a single photoreceptor, it is enough to make that one neuron respond.

Stars are smaller than they look

(Note that because of the Earth’s atmosphere and the imperfections of the human eye, starlight is already blurred by the time it hits your photoreceptors and can cover more than 28 arc seconds if the star is bright; in that case brighter stars may appear larger than others.) (relevant link) (relevant link 2)
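
To put those numbers side by side, here is a quick back-of-the-envelope check in Python, using the figures quoted above:

```python
# Compare the angular diameter of R Doradus with the eye's resolution limit.
# Both numbers are the rough figures quoted in the text above.
ARCSEC_PER_DEGREE = 3600.0

r_doradus_arcsec = 0.06   # largest angular diameter of any star seen from Earth
eye_limit_arcsec = 28.0   # approximate smallest angle the human eye can resolve

print(f"R Doradus: {r_doradus_arcsec / ARCSEC_PER_DEGREE:.2e} degrees")
print(f"Eye limit: {eye_limit_arcsec / ARCSEC_PER_DEGREE:.2e} degrees")
print(f"The star spans ~{eye_limit_arcsec / r_doradus_arcsec:.0f}x less than "
      "the smallest angle the eye can resolve")
```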

3) Visual illusions help survival:

Visual illusions aren’t just your vision system malfunctioning or some leftover trait from our ancestors; they are actually crucial for our survival. What we see when we open our eyes isn’t the raw information we get from our photoreceptors, it’s heavily post-processed information. Visual illusions are merely the results of this post-processing when the input is not something humans normally encounter in nature. For example, if you look at the Kanizsa triangle shown below, you tend to see an upside-down triangle even though there are no contours of one. This visual illusion is easy to notice in this image, but the same functionality is actually at work every moment you see. This is the reason you can easily identify different objects overlapping in your visual field. You might think you separate objects by color or brightness, but if you actually take a digital picture and look at the pixel values, it is not always obvious where the contour is. If it were that easy, segmentation would already be a solved computer vision problem. (See my previous post on other visual illusions)

Kanizsa's Triangle
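
To get a feel for why contours are not simply sitting in the pixel values, here is a naive sketch that thresholds image gradients on a sample image from scikit-image (the sample image and the threshold are arbitrary stand-ins). The result is typically a fragmented, noisy edge map rather than the clean object boundaries our visual system hands us for free.

```python
# A naive attempt at finding object contours directly from pixel values:
# threshold the Sobel gradient magnitude of a sample image. The output is
# usually noisy and fragmented, which is the point: the contours our vision
# "fills in" effortlessly are not just sitting in the raw pixel values.
from skimage import color, data, filters

image = color.rgb2gray(data.astronaut())   # any natural image works here
gradient = filters.sobel(image)            # per-pixel gradient magnitude

edges = gradient > 0.1                     # arbitrary threshold
print(f"{edges.mean():.1%} of pixels pass the threshold, "
      "mostly fragments rather than clean object boundaries")
```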

4) Some people can see more colors:

When I was a kid, one of my dreams was to discover a new kind of color. When I grew older I realized this was impossible, since we can visualize all the colors in the visible light spectrum and there is no new color left to discover. But I was actually wrong: color isn’t measurable externally, because it is an internal representation in our brain. So my childhood dream shouldn’t have been to “discover” a new kind of color but to “sense” a new kind of color instead. The remaining question is whether it is possible to sense a new kind of color.

People often disagree about colors; that’s because we all see colors a little bit differently. We typically have three different kinds of color sensors in our eyes that we call cones. These cones respond to light of different wavelengths, and we associate these wavelengths with the colors we call red, green, and blue. If a light’s wavelength lies in between two cone types’ response ranges, both will fire and we see a different color. Your cones’ response ranges are slightly different from mine, therefore our representations of color are also slightly different.

Some people can see more colors

Studies show that a percentage of humans (one study says 15% of women) have a fourth type of cone that responds to light with wavelengths between green and red. This means that colors are actually sensed very differently by these people. People with four cone types may not realize they are sensing differently, because color is an internal representation that cannot be compared. They may be seeing a new color normal people can’t see and getting responses like “Oh, that’s just a different shade of green”, while in fact they are having a totally different experience.

(Note that the screens that fuse red, green, and blue light to simulate other colors are designed only for people with the red, green, and blue cones. People with four cone types would probably find the colors on a display to look different from the real object.) (relevant link)

5) Cones are not evenly distributed

You might expect the color photoreceptors (cones) in your eyes to be evenly distributed on your retina, but that’s not true. You can find large areas in your eye with mostly one type of cone (link). Would this be a problem? It shouldn’t be, once your brain post-processes the signal and fills in all the missing color. To demonstrate your brain’s color-filling ability, the following image is actually a grayscale image with colored grid lines. You will notice your brain fills in the missing color if you look at it from a distance.

Your brain fills in colors

Image Source: https://www.patreon.com/posts/color-grid-28734535
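
If you want to try making an image like this yourself, here is a rough sketch with NumPy and Pillow; the input file name, grid spacing, and line width below are all placeholders to tweak.

```python
# Build a "color grid" style image: keep the photo grayscale and overlay a
# sparse grid of thin lines that carry the original colors. The file name,
# grid spacing, and line width are arbitrary placeholders.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.jpg").convert("RGB")).astype(float)
gray = img.mean(axis=2, keepdims=True)          # naive grayscale version
out = np.repeat(gray, 3, axis=2)                # start fully desaturated

spacing, width = 12, 1                          # grid geometry, tweak freely
for offset in range(width):
    out[offset::spacing, :, :] = img[offset::spacing, :, :]   # colored rows
    out[:, offset::spacing, :] = img[:, offset::spacing, :]   # colored columns

Image.fromarray(out.astype(np.uint8)).save("color_grid_illusion.jpg")
```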

6) The photoreceptors are located close to the last layer in your eye

If I were to design a digital camera, I would probably put the light sensors facing toward the lens and connect the wires on the other side so they wouldn’t block the incoming light. This is, however, not how your eyes are designed. When light goes through your eye’s lens, it first has to pass the ganglion cells and their axons that transmit all the visual information to your brain, then another four layers containing different neurons, before hitting the photoreceptors that respond to light. Luckily the five layers the light has to pass through are mostly transparent, but this still seems like a suboptimal design.

To understand why our eyes have this kind of structure, we might have to look at the earliest eyes that appeared on Earth. The following sequence of images shows the evolution of the eye: the first version is just some photoreceptors on the skin. A cavity gradually formed because it creates a pinhole camera effect that gives more information about the outer world, which really helps if you are trying to catch prey or avoid becoming prey. After millions of years of evolution, the cavity closed and a lens formed to provide the ability to focus. Since in these early designs the photoreceptors lay flat on the skin, it might make sense that they were not located at the outermost layer, so that they wouldn’t get damaged easily. (It could also just be due to how they were wired originally, but it is very likely a design shaped by evolution.)

The evolution of eye

Image source: https://www.pnas.org/content/104/suppl_1/8567.figures-only

7) Car dashboard colors are not designed to match style

Your car’s dashboard may have a colored backlight at night. It may look cool, but the color choice is supposed to keep you safe, not to match your style. However, different car brands use different colors because designers can’t agree on which color is safer.

Why car dashboard lights have different colors

There are two types of photoreceptors in our eyes: the cones that detect colors, which we described earlier, and the rods that don’t provide color information but are sensitive to brightness changes. When it’s dark we are mostly using rods, which is why we normally don’t see much color at night. Although the rods don’t provide any color information, they are most sensitive to wavelengths close to blue and green light. Therefore, one argument is that a dim blue or green dashboard light takes advantage of the rods’ sensitivity, so your dashboard is more visible at night.

The other camp, however, suggests using bright red dash lights. The argument is that instead of having the rods do all the work, why not let the cones detect the dashboard light? Since rods are not sensitive to red, a bright red light wouldn’t affect the rods’ night vision. Both arguments sound reasonable; I guess the takeaway is that if you prefer a dim light, use green or blue, but if you prefer a brighter dashboard, use red.

8) You cannot see what you did not learn to see

Seeing the world around you happens so naturally that it is hard to imagine a person with a normal biological vision system not seeing something right in front of them. However, this can happen. If you never experienced vertical lines while learning to see, you might not be able to see vertical lines when later exposed to a normal world. This is demonstrated in a series of experiments I talked about in my previous post; the short summary is that vision is not something you are born with but something you need to experience in order to acquire.

cat experiment

9) The world becomes less colorful if you stop moving

Photoreceptors in your eyes gradually decrease their response to light even if the light level doesn’t change. So if you stop moving (including your eyeballs) in a static world for long enough, the world you see won’t be as colorful. However, since it usually requires a huge effort not to blink and not to saccade, this isn’t normally a problem.

The reason for this mechanism is adaptability to different environments; it is similar to the white balance and auto-brightness adjustments on a camera. If you are in a bright room, it’s probably better to be less sensitive to brightness. The side effect of this mechanism is that you see opposite colors if you look at a patch of color for too long. This side effect is actually used to help make Disney’s grass look greener.

Disneyland uses pink walkways to make grass look green

(More details: when a photoreceptor absorbs photons, its messenger molecule cGMP is broken down, which causes sodium channels to close and the photoreceptor to hyperpolarize; but if the channels stay closed too long, the calcium concentration drops, which leads to the channels reopening again.)

10) Vision regions in the brain can be repurposed for other senses

The current consensus in the neuroscience community is that our neocortex, which handles most of our visual processing and many other intelligent behaviors, mostly has the same structure across the brain. Studies show that areas normally dedicated to vision are repurposed for tactile or auditory senses in blind people. Because of this, with modern technology it is possible to allow blind people to see again through tactile senses. Brainport is a technology that uses an electrode array placed on the user’s tongue to let blind people see through a camera connected to this electrode array. The resolution is only 20×20, but the company mentioned that users can’t tell much difference when given a higher resolution.

Helping the blind to see

Another approach to make the blind see again is to use implants on the brain surface that generate electrical stimulation. One example is the Intracortical Visual Prosthesis Project; if done right, this approach should be able to provide visual information at a higher resolution.

These are 10 facts about human vision, but probably not the 10 most interesting ones. See my post about visual pathways and subscribe to my blog for more interesting discoveries about human vision.

Training a Rap Machine

In AI, brain, deep learning, Machine Learning, Serious Stuffs on January 9, 2020 at 7:15 pm

by Li Yang Ku (Gooly)

(link to the rap machine if you prefer to try it out first)

In my previous post, I gave a short tutorial on how to use the Google AI Platform for small garage projects. In this post, I am going to follow up and talk about how I built (or, more accurately, attempted to build) my holiday project: a machine that completes your rap lyrics using the “Transformer” neural network.

The Transformer is a neural network model introduced by Google Brain, mostly for language-related tasks. What is interesting about this architecture is that instead of taking in one word at a time, it takes in the whole input sentence at once and learns the relationships between the words. This allows Transformers to learn useful relationships such as what a pronoun refers to in a sentence. In the original paper “Attention Is All You Need”, this ability to understand relations between words is referred to as attention, since the model can focus more on certain pairs of words. I will not go into the details of the Transformer since quite a few people have already explained it at great length in their blogs (such as this blog and this blog). My rationale was that the Transformer’s ability to learn relationships between words in rap sentences should allow the network to learn which words rhyme well together or have the right flow.
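
As a rough illustration of what “attention” computes (a simplified sketch, not the full multi-head Transformer from the paper), here is scaled dot-product attention in NumPy; in a real model the queries, keys, and values come from learned projections of the word embeddings.

```python
# Scaled dot-product attention, the core operation inside the Transformer.
# Each word's query is compared against every word's key, and the resulting
# weights decide how much of each word's value contributes to the output.
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # queries, keys, values: (sequence_length, model_dim) arrays
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)           # pairwise word affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sentence
    return weights @ values                            # weighted mix of the values

# Toy example: 5 "words" with 8-dimensional embeddings attending to themselves.
x = np.random.randn(5, 8)
print(scaled_dot_product_attention(x, x, x).shape)     # (5, 8)
```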

I collected rap lyrics from 14 artists, around 180,000 lines of rap in total. These lyrics were further filtered down to around 50,000 lines that I considered to “rhyme” with another line. The first line is the input and the second line is the output. These sentence pairs were then split into training and evaluation sets (a 90:10 split). The Transformer architecture I used is mostly based on this Google Cloud Platform tutorial on generating poetry. After a little bit of hyperparameter tuning, I ended up with a Transformer with 2 hidden layers, 4 heads (the number of attentions), and a hidden layer size of 512. This model was then trained for 75,000 iterations. Experimenting with different models through trial and error cost me a bit more than a hundred bucks (note that you are given $300 to start).
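
The data preparation boiled down to something like the sketch below (simplified; `rhymes` is a placeholder for whatever heuristic you use to decide that two consecutive lines belong together as an input/output pair).

```python
# Turn consecutive lyric lines into (input, target) pairs, keep only the pairs
# that the rhyme heuristic accepts, then split them 90:10 into train/eval.
# `rhymes` is a placeholder for whatever rhyme check you prefer.
import random

def build_pairs(lines, rhymes):
    pairs = []
    for first, second in zip(lines, lines[1:]):
        if rhymes(first, second):
            pairs.append((first, second))   # first line is input, second is output
    return pairs

def train_eval_split(pairs, eval_fraction=0.1, seed=0):
    pairs = pairs[:]                        # don't shuffle the caller's list
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - eval_fraction))
    return pairs[:cut], pairs[cut:]
```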

The final result was, well, entertaining but with limited success. Occasionally it will rhyme if your sentence ends with a common word, but most of the time it’s just a weirdo spitting out random sentences with a semi-rapper attitude. The following are a few example results:

Input: This thing I made raps like a machine

Output: It’s all fair in love with me

Input: It generates raps that run deep

Output: That’s what I mean

Input: No rap is free, this thing cost me a fee

Output: I got a whole lotta guap on me

Input: Consider this as a new year gift that raps

Output: Chillin’ on some wildin’ out of that

Input: You might say my input rap is bad

Output: I was gone but now I m back

Input: If you don’t like your output rap

Output: Where the fuck am I going back

You can try out the rap machine here yourself. Thank you all for reading this blog and wish you all an entertaining 2020!

Tool Tutorial: Google AI Platform for Hobbyist

In AI, App, deep learning, Machine Learning, Serious Stuffs on October 27, 2019 at 10:44 pm

by Li Yang Ku (Gooly)

In this post I am going to talk about the Google AI Platform (previously called Google ML Engine) and how to use it if deep learning is just your after-work hobby. I will provide links to other tutorials and details at the end so that you can try it out, but the purpose of this post is to give you the big picture of how it works without having to read through all the marketing phrases targeting company decision makers.

The Google AI Platform is part of Google Cloud and provides computing power for training and deploying deep networks. So what’s the difference between this platform and other cloud computing services such as AWS (Amazon Web Services)? The Google AI Platform is specialized for deep learning and is supposed to simplify the process. If you are using TensorFlow (also developed by Google) with a pretty standard neural network architecture, it should be a breeze to train and deploy your model for online applications. There is no need to set up servers; all you need is a few lines of gcloud commands and your model will be trained and deployed in the cloud. (You also get a $300 first-year credit for signing up on Google Cloud Platform, which is quite a lot for home projects.) Note that the Google AI Platform is not the only shop in town; take a look at Microsoft’s Azure AI if you’d like to shop around.

So how does it work? First of all, there are four ways to communicate with the Google AI Platform. You can do it 1) locally, where you have all the code on your computer and communicate through commands directly; 2) on Google Colab, another Google project that is basically a Jupyter notebook in the cloud which you can share with others; 3) on an AI Platform notebook, which is similar to Colab but has more direct access to the platform and more powerful machines; or 4) on any other cloud server or Jupyter-notebook-like web service such as FloydHub. The main difference between using Colab and an AI Platform notebook is pricing. Colab is free (even with GPU access) but has limitations such as a 12-hour cap on run time, and it shuts down after 90 minutes of idle time. It provides you with about 12 GB of RAM and 50 GB of disk space (although the disk is half full when started due to preinstalled packages). After disconnecting, you can still reconnect to whatever you wrote in the notebook, but you will lose whatever was in RAM and on disk. For a home project, Colab is probably sufficient; the disk space is not a limitation since we can store training data in Google Cloud Storage. (Note that it is also possible to mount Google Drive in Colab so that you don’t need to start from scratch every time.) On the other hand, an AI Platform notebook can be pricey if you want to keep it running ($0.137/hour and $99.89/month for a non-GPU machine).
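
For example, mounting Google Drive inside a Colab notebook only takes two lines (Colab will ask you to authorize access):

```python
# Mount Google Drive in Colab so data and checkpoints survive across sessions.
from google.colab import drive

drive.mount('/content/drive')
```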

Before we move on, we also have to understand the difference between computation and storage on the Google AI Platform. Unlike personal computers, where disk space and computation are tightly integrated, the two are separated in the cloud: some machines are responsible for computation and others for storage. Here, the Google AI Platform is responsible for the computation while Google Cloud Storage takes care of the stored data and code. Therefore, before we start using the platform we first need to create a storage space called a bucket. This can easily be done with a one-line command once you have created a Google Cloud account.
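
The command-line route is a single `gsutil mb` call; if you would rather do it from Python, a sketch using the google-cloud-storage client could look like this (the project id and bucket name are placeholders, and bucket names must be globally unique):

```python
# Create the Cloud Storage bucket that will hold training code, data, and
# exported models. The project id and bucket name below are placeholders.
from google.cloud import storage

client = storage.Client(project="my-project-id")
bucket = client.create_bucket("my-unique-bucket-name", location="us-central1")
print(f"Created bucket gs://{bucket.name}")
```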

If you are using Colab, you will also need to get the code for training your neural network onto your Colab virtual machine. One common workflow is to keep your code in a version control service such as GitHub and clone the files to Colab every time you start. It makes more sense to use Colab if you are collaborating with others or want to share how you train your model; otherwise, doing everything locally might be simpler.

So the whole training process looks like this:

  1. Create a Google Cloud Project.
  2. Create a bucket that the Google AI Platform can read from and write to.
  3. With a single command, upload your code to the bucket and request the AI Platform to perform the training (see the sketch after this list).
  4. You can also perform hyperparameter tuning in this step if needed.
  5. If you want the trained model locally, you can simply download it from the bucket through a user interface or command.
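
Step 3 is normally a single `gcloud ai-platform jobs submit training ...` command. If you prefer to stay in Python, a rough equivalent through the Google API client library might look like the sketch below; every name here, from the project id to the trainer package and runtime versions, is a placeholder you would replace with your own.

```python
# Submit a training job to the AI Platform through the Google API client library.
# All names (project, bucket, trainer package/module) are placeholders, and the
# runtime/Python versions should match whatever your training code targets.
from googleapiclient import discovery

project_id = "my-project-id"
job_spec = {
    "jobId": "rap_machine_training_001",
    "trainingInput": {
        "scaleTier": "BASIC",                                 # cheapest machine tier
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],  # your packaged code
        "pythonModule": "trainer.task",                        # entry point module
        "region": "us-central1",
        "runtimeVersion": "1.15",
        "pythonVersion": "3.7",
        "jobDir": "gs://my-bucket/output",                     # where results are written
    },
}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(parent=f"projects/{project_id}", body=job_spec).execute()
```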

A trained model is not very useful if it is never used. The Google AI Platform provides an easy way to deploy your model as a service in the cloud. Before continuing, we should clarify some Google terminology. On the Google AI Platform, a “model” means an interface that solves a certain task, and a trained model is called a “version” of this “model” (reference). In the following, quotation marks will be put around Google-specific terminology to avoid confusion.

The deployment and prediction process is then the following:

  1. Create a “model” on the AI Platform.
  2. Create a “version” of the “model” by providing the trained model stored in the bucket.
  3. Make predictions through one of the following approaches:
    • gcloud commands
    • Python interface
    • Java interface
    • REST API
      (the first three methods are just easier ways to generate a REST request; a sketch of the Python interface follows below)
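
As a concrete example of the Python interface, a minimal sketch is below; the project, “model”, and “version” names are placeholders, and the instance format depends on how your model was exported.

```python
# Query a deployed "version" of a "model" through the Google API client library.
# Project, model, and version names are placeholders; the instances must match
# the input signature your exported model expects.
from googleapiclient import discovery

name = "projects/my-project-id/models/rap_machine/versions/v1"
request_body = {"instances": ["This thing I made raps like a machine"]}

ml = discovery.build("ml", "v1")
response = ml.projects().predict(name=name, body=request_body).execute()
print(response.get("predictions", response))    # predictions, or error details
```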

And that’s all you need to grant your homemade web application access to scalable deep learning prediction capabilities. You can run the whole process I described above through this official tutorial in Colab, and more details about the tutorial can be found here. I will be posting follow-up posts on building specific applications on the Google AI Platform, so stay tuned if you are interested.

References: