Welcome to My Blog

How to train a Conv2D network that at least converges… and does something.

If you have not seen it, XKCD’s comic is a good intuitive summary of machine learning training: https://xkcd.com/1838/

So I have been playing a bit more with neural networks lately, specifically ConvNets. I read many, many memes and non-technical blog posts, but it was finally time to get my hands dirty, do something with all these cool toys, and check out the raging hype.

The TensorFlow API was a bit painful, like drinking from a firehose; Keras was much nicer to beginners.

Some very interesting patterns emerged as I played with various settings to build a network that detects the presence of an artificial orientation marker in 500 by 500 images.

Here is a quick reflection and thought dump of all the steps that made my training slightly more successful, eventually reaching 97% accuracy on out-of-sample detection (with about 6000 images with markers and 3000 images without markers, all augmented). Nothing seriously cool, but this hands-on experience definitely taught me a lot more about what I am doing…

  • GPU memory is the most relevant limitation; training time usually is not, for such a simple network (data annotation can be another important limitation, more on this later). Most of my networks (no more than 4 Conv2D, 2 Dense) converged within 1 h of training on a 4 GB GTX 970. The most successful network, with minimal stride and more depth, took about 1.5 h to reach 95% accuracy and 3 h to reach 97%. At the beginning, I was waiting HOURS upon HOURS overnight, hoping a network with a flat loss curve like this would improve. Let me save you some time: don’t bother with any amount of waiting if you are not seeing any improvement 15 minutes in. This MIGHT change if you are trying to produce a production-level 99.9999999%-accuracy network, but you can always swap the algorithm back in after everything else is mostly fixed. [screenshot: flat, non-improving loss curve] Similarly, you want to see this in the first few epochs (in my case, that is about 5 minutes into the training). Note how the loss immediately decreases substantially in the first few epochs, unlike in the previous image, and accuracy goes up. [screenshot: healthy, decreasing loss curve]
  • If you can afford it in terms of GPU memory, stride is best kept low (1×1); use more Conv2D layers instead, as their combination across layers can make up for the lack of receptive area. In the end, training 5 layers vs 3 layers with everything else kept the same resulted in the necessary boost from 80% accuracy to 95+%.
  • Stride should be large enough for the network structure and object. I had no clue what I was doing, so in hindsight my batch was probably bigger than it should have been. Either way, what ended up happening was that the stride was too low (1×1 for three layers) and not enough unique data from the images was being sampled. Simply INCREASING the stride to 2×2 across all three layers, keeping everything else the same, drastically improved my performance from 55% to 75%. This MIGHT be unique to my situation, as the marker I am trying to detect was sized differently in the augmented input: 30, 50, 100, 200, 300 px in a 500 px image. Obviously, when the receptive field is too small, it is going to be hard to recognize images. You can probably afford to increase the stride a bit in the first Conv2D layer facing the image input. I found the BEST EVER illustration of stride, padding, etc. here: https://github.com/vdumoulin/conv_arithmetic
  • Try Adadelta and Adagrad as well as Adam. In my case, Adadelta did best, well illustrated by Sebastian Ruder’s blog post here: http://ruder.io/optimizing-gradient-descent/index.html#challenges. I had no clue what it was about, to be honest, but the images looked pretty enough to convince me to try. Also VERY well illustrated here: http://www.robertsdionne.com/bouncingball/. A good, speedy convergence should look like this. I believe this run was left on overnight, so it is obviously trained way longer than necessary; around 2 h in, the peak in val_acc is mostly good enough. However, do notice the rather sharp convergence, which I BELIEVE is attributable to Adadelta, but I have not fully tested this across everything. [screenshot: sharp convergence curve]
  • In the beginning, when you are experimenting with architecture, you would rather OVERFIT than underfit. Overfitting is a sign that your network is probably at least LOOKING at the right thing in its receptive field: https://fomoro.com/tools/receptive-fields/. I had this problem early on, when the network was absolutely not doing anything at all (see above). In hindsight, overfitting is a luxury and can usually be fixed easily with augmented data, dropout, etc., if your data source is abundant. This type of loss pattern is a clear illustration of overfitting: a dip, then a never-ending rise… [screenshot: validation loss dipping, then rising]
  • Pick the right question to ask the neural network. For this project, the question is very straightforward: given a 500 by 500 image, can you tell me if my object is IN it or not? Since this is an artificial marker, I can generate millions of images with the augmentation approaches. We were also trying to ask a network to regress the orientation w p r, and that was an insanely hard question to tackle first…
  • LeakyReLU seems… to have improved the performance. I am not 100% certain about this, but I used it early and it seems to have no major downsides. I am using an alpha of around 0.1. Definitely stay away from the other few besides ReLU unless you have a very clear reason to use them.
  • Keras’s flow_from_directory and the ImageDataGenerator class are a GODSEND. I wish I had known about them earlier, before I wrapped the imgaug Python package extensively to do my own data augmentation. Literally just point it at a directory and fire and forget, as long as you have data in the folder. It even does image resizing, which made my job much easier since I standardized my input images to 500 by 500.
  • This one is quite intuitive in hindsight but caught me off guard for way too long. Basically, in conjunction with the earlier point about receptive fields: IF you change your input size (e.g., in an input-size vs. batch-size trade-off), it will clearly change your neural network’s performance. The same network architecture will perform differently on images of 500 by 500 vs 250 by 250 vs 50 by 50… My general intuition is that a larger receptive field in relation to the input is better. This can be achieved with either bigger strides or deeper networks; ideally, the latter.
  • Accuracy can be deceiving. This is another huge lesson brutally taught to me by my data: class imbalance can hugely bias accuracy, so make sure your classes are well balanced.
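The receptive-field arithmetic behind the stride points above can be sketched with the standard conv-arithmetic recurrence (this little helper and its layer specs are my own illustration, not from any library, but it shows why bumping stride to 2×2 helped so much):

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    layers: list of (kernel_size, stride) tuples, input side first.
    """
    r, j = 1, 1  # current receptive field, and cumulative stride ("jump")
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k-1) input jumps
        j *= s            # strides compound across layers
    return r

# Three 3x3 layers at stride 1: each layer only adds 2 px of context.
print(receptive_field([(3, 1)] * 3))  # → 7

# Same three layers at stride 2: the jump compounds, so the field
# covers far more of a 500 px image.
print(receptive_field([(3, 2)] * 3))  # → 15
```

So for markers spanning 30–300 px, stride-1-only stacks of a few layers simply never see enough of the image at once; extra depth (or stride) is what buys coverage.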
Posted in Uncategorized

Massive Rant about Google Drive

Holy XXXX… You know they say that a backup is not a backup until you test it. Today I had to rely on Google Drive to recover accidentally deleted files, and that went badly…

So, I accidentally deleted some files online. No biggie, checked the trash… not there… WTF?

Then I checked the history… and it’s gone. That issue aside (luckily, I didn’t lose much… but I will never ever expect to recover anything from Google Drive again), their instruction is to use SEARCH to find these hidden elf files roaming somewhere in the ether. This is when I ALSO realized Google Drive search does not support regular expressions and has some epic, quirky bugs.

Here are screenshots of the files, with timestamped file names:

2018-09-29 8.07.58 PM: [screenshot]

2018-09-29 8.08.07 PM: [screenshot]

2018-09-29 8.08.13 PM: [screenshot]

So as you can see… this makes zero sense. If you take away RE power from the user, at least do a freaking competent, foolproof job of searching (btw, this is Google… just so you know; very ironic in one of their core products, tbh…).

Like, I am not even sure how this type of bug exists. I am not talking about special symbols here: a search for a basic alphanumeric string (at the BEGINNING of the file names)… does not work. What.The.Hell.

Yeah, you get this… when you press enter, it still fails to find the file, yet I assure you those files clearly exist, because they are the new files I just uploaded…

[screenshot: search returning no results, 2018-09-29 8.16.38 PM]


Afterthoughts:

I think… most likely something went wrong during the indexing process, or the asynchronicity of the indexing, that caused this odd bug. But still, why part of the file name and not all of it? I am going to check again later to see if this bug still persists. Maybe it only exists for recent files like this one, uploaded around 19:45 and not indexed properly within 20 minutes. HOWEVER, this still doesn’t fully explain how search-as-you-type manages to pick up PART of the file name but not all of it… Very odd.

Bottom line, this is the second time I got burned by Google Drive. I would store cat photos there, but for work-related photos, I gotta get my shxt together and do some serious hourly backups…

LASTLY: just to say, I have tried a few recovery attempts in Dropbox over the years and they went better than this… and I am paying for both… =_=…

Posted in Uncategorized

Using Python to establish a connection through Proxy/transport/intermediate server/(something in the middle) to your FinalDestination server

A quick and yet absolutely, disgustingly insecure way to establish a password-authenticated connection to a server through a proxy (aka jumphost/intermediate server). Hopefully someone will find this useful. Blindly trusting servers may invoke armageddon… Source code inspired by https://stackoverflow.com/questions/21609443/paramiko-proxycommand-fails-to-setup-socket

More on how to use the client object you get to do things like SFTP transfers: https://stackoverflow.com/questions/3635131/paramikos-sshclient-with-sftp

import paramiko

def getSSHClient(proxy_ip, proxy_login, proxy_pw):
    """Instantiate, set up and return a straightforward SSH client to the proxy.
    :param proxy_ip: address of the proxy/jumphost
    :param proxy_login: user name on the proxy
    :param proxy_pw: password on the proxy
    """
    client = paramiko.SSHClient()
    # Blindly trust unknown host keys -- convenient, and the insecure part.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(proxy_ip, 22, username=proxy_login, password=proxy_pw)
    return client

def getProxySSHClient(proxy_ip, proxy_login, proxy_pw,
                      destination_ip, destination_login, destination_pw):
    """Establish an SSH client to the destination through the proxy.
    :param proxy_ip: address of the proxy/jumphost
    :param proxy_login: user name on the proxy
    :param proxy_pw: password on the proxy
    :param destination_ip: address of the final destination server
    :param destination_login: user name on the destination
    :param destination_pw: password on the destination
    """
    proxy = getSSHClient(proxy_ip, proxy_login, proxy_pw)
    transport = proxy.get_transport()
    dest_addr = (destination_ip, 22)  # where the proxy should forward to
    local_addr = ('', 10022)          # nominal source address for the channel
    proxy_transport = transport.open_channel('direct-tcpip', dest_addr, local_addr)

    # The destination connection rides on the channel opened through the proxy.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(destination_ip, 22, username=destination_login,
                   password=destination_pw, sock=proxy_transport)
    return client


Posted in Uncategorized

Building MincToolkit for CentOS7 FUN

You probably want to build OpenBLAS yourself.

Also make sure to sudo yum install hdf5, gsl, itk, netcdf, pcre, zlib, openblas-devel (I still ended up having to build my own), etc.… That seems to have helped get around the aforementioned empty-string-hash issue.

CCMake3 and CMake3 also seem to have helped.

I recall having to install ccache as well.


Overall, it is MUCH easier to build in Ubuntu. Sigh.

Posted in Uncategorized

More Compilation Fun with d41d8cd98f00b204e9800998ecf8427e

So, the magical string of: d41d8cd98f00b204e9800998ecf8427e

is the MD5 hash that CMake (or any other program) typically yields when the module is not found and it ran an MD5 sum on an empty string, apparently…

If you ever get a DOWNLOAD HASH mismatch showing the actual hash as d41d8cd98f00b204e9800998ecf8427e, then in English, the program is complaining that the resource was not found, and an MD5 hash check revealed that the hash of the empty string it got differs from whatever you were expecting… Now that makes a bit more sense… but is nowhere close to a solution…
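You can verify this magic string yourself in a couple of lines of Python; it really is just the MD5 of zero bytes:

```python
import hashlib

# MD5 of the empty byte string -- the hash you see in build logs when a
# download silently produced zero bytes.
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)  # → d41d8cd98f00b204e9800998ecf8427e
```

Handy to keep in mind: the moment you see that hash, stop debugging the checksum and start debugging the download.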

Posted in Uncategorized

DrawEM MIRTK compilation fun

Fun, as summarized in Dwarf Fortress.

So. I have been compiling the DrawEM module of MIRTK on both CentOS and Ubuntu 18. Fun indeed. I will write about compilation fun in general another day, as it is a source of much fun elsewhere too.

So, if you ever see build errors like:

[ 75%] Building CXX object Packages/DrawEM/src/CMakeFiles/LibDrawEM.dir/BiasCorrection.cc.o
In file included from /home/dyt811/MIRTK/Packages/DrawEM/src/BiasCorrection.cc:21:0:
/home/WindStalker/MIRTK/Packages/DrawEM/include/mirtk/BiasCorrection.h:29:34: fatal error: mirtk/Transformation.h: No such file or directory
#include "mirtk/Transformation.h"
compilation terminated.
Linking CXX executable ../../../lib/tools/calculate-gradients
Linking CXX executable ../../../lib/tools/measure-volume
Linking CXX executable ../../../lib/tools/change-label
make[2]: *** [Packages/DrawEM/src/CMakeFiles/LibDrawEM.dir/BiasCorrection.cc.o] Error 1
make[1]: *** [Packages/DrawEM/src/CMakeFiles/LibDrawEM.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 75%] Built target change-label
[ 75%] Built target calculate-gradients
[ 75%] Built target measure-volume
Linking CXX executable ../../../lib/tools/padding
Linking CXX executable ../../../lib/tools/calculate-filtering
[ 75%] Built target padding
[ 75%] Built target calculate-filtering
Linking CXX executable ../../lib/tools/aggregate-images
[ 75%] Built target aggregate-images
/home/WindStalker/MIRTK/Applications/src/average-images.cc: In function ‘int main(int, char**)’:
/home/WindStalker/MIRTK/Applications/src/average-images.cc:876:5: error: ‘imdof_name’ was not declared in this scope
imdof_name .insert(imdof_name.end(), imdofs .begin(), imdofs .end());
/home/WindStalker/MIRTK/Applications/src/average-images.cc:877:5: error: ‘imdof_invert’ was not declared in this scope
imdof_invert.insert(imdof_invert.end(), invdofs.begin(), invdofs.end());
make[2]: *** [Applications/src/CMakeFiles/average-images.dir/average-images.cc.o] Error 1
make[1]: *** [Applications/src/CMakeFiles/average-images.dir/all] Error 2
make: *** [all] Error 2

This is because DrawEM, during compilation, expects the source directory to be named “mirtk” instead of “MIRTK”, which is what git clone generates from the main GitHub repo… So renaming it to “mirtk” and regenerating the CMake cache will fix it… Yup. It is a lot less scary than it looked…

How did I find out? I ran into this same issue while compiling on both CentOS and Ubuntu…

Posted in Uncategorized

SmartThings, Google Home and WeMO mingle words

So this is not the start of a joke, but a super annoying problem that took me too long to resolve.

I tell Google Assistant, “Turn off the bedroom”, and Google Assistant says it has no clue what I am talking about… EVEN though I set the bedroom up in both SmartThings and Google Home. A few trials and errors later, I realized that you can bypass that by saying:

  1. “Turn off everything in the bedroom”
  2. “Turn off bedroomS”
  3. rename things away from “Bedroom light”, “Bedroom power”, etc.

The root cause is that Google is currently not able to tell when the singular “Bedroom” refers to the room and not to part of a device’s name. Hence why it says it cannot tell which device you want to modify.

Another thing I ran into is that WeMo has its own device names when linked to Google Home. WeMo is also linked to SmartThings, which is in turn linked to Google Home, which causes device multiplication and confusion, plus the same issue above. To resolve this, I made sure WeMo uses some bizarre name, like the A0F from the manufacturer’s unique ID. This ensures the device is referred to by different names in SmartThings and WeMo, and Google Home uses the name I assigned in SmartThings despite also directly loading WeMo’s nonsensical name that no one will ever call it by.

In the end, I removed all rooms from SmartThings and only used Google Home to define rooms. Most automation is handled by SmartThings routines, but still, there are many bugs and I barely have time to figure them out. Also, Google keeps mishearing “Turn on” as “Turn off” and vice versa. Very, very annoying. Oh, and also: SmartThings phone presence on iPhone never seemed to work for my wife, at all. Damn it.

Overall, I would say home automation is still quirky as fffff. I do like the fact that, thanks to WeMo’s mini plugs and the like, I do not have to 1) hire an electrician, 2) run a neutral from the nearest wall socket, or 3) drill holes in the ceiling and connect to the switch, JUST so that I can install GE Z-Wave-compatible plugs. That… was ffff annoying. Still need electricians for two-way switches, though.

I shudder at the thought of replacing a lock with a smart lock.

Posted in Uncategorized

Swarm, Driving, Traffic Signals and Crowd Sourced Data Processing

Funny thing I noticed recently.

I stopped paying as much attention to my surroundings when I drive as when I first started.

Not sure if you have noticed a similar trend. When first learning to drive, newbie drivers like me tend to be super stressed out because of the CRAZY number of things to pay attention to: blind spots, car on the left, car on the right, upcoming intersections, is a left turn forbidden here, what is that elderly pedestrian thinking, planning to cross on red? Etc., etc., etc.… Many, many things. Experienced drivers check much fewer things: left turn: blind spot, biker, oncoming traffic, done. Outta here.

In particular, I find the red-light scenario interesting. Most people have probably had the unfortunate experience of accidentally running a red light a few times in their lives. From the few incidents I have seen, it is extremely rare for people to run a red light when there are cars stopped at the intersection, especially in the opposite or same direction, even if those cars are not in the same lane. On the other hand, people usually run a red light when there is no car stopped at it. This observation got me thinking: maybe it is not that we are paying attention to the light, but more to the cars. Put another way, perhaps we are not really following the signs and rules as diligently as we did as beginners, but instead relying on other drivers’ reactions to the surroundings to gauge how we should behave.

Another example: beginner drivers like me tend to stress over the speed they are driving at… constantly struggling between not exceeding the speed limit too much while also keeping about the same pace as the surrounding traffic. That is especially stressful during late-night driving, when certain roads and certain crowds tend to drive far, far above the speed limit, but as a group. E.g., STM midnight buses.

Speaking from personal experience, I think driving has become a balanced experience: while in the stream, I tend to follow how folks around me are driving, paying attention to the very specific contextual details that apply to me while relying on the other motorists to notice issues. Oh, they slowed down; I probably should slow down too, even if I am not in the same lane. Herd instinct, perhaps? In the areas specific to my destination, such as local streets, I have to be much more diligent.

Maybe the cars of the future will be far more aware of each other’s presence and require relatively little onboard processing power, relying instead on the swarm to process the large amount of information required for the overall traversal goals. It would certainly be an interesting time.

Posted in Uncategorized

PyCharm, Terminal and Environmental Variables

Banged my head on this for way too long before I realized: when you add a new path to your environment variables, you probably want to restart PyCharm to ensure its terminal reloads the new path info and can resolve whatever new resources were added to the path. =_=+ Embarrassingly long.
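A quick way to check which PATH a given terminal or interpreter actually inherited (just a diagnostic sketch; if your freshly added directory is missing from this output, the process was started before the change and needs a restart):

```python
import os

# Print every PATH entry the *current process* inherited, one per line.
# Run this in PyCharm's terminal/console and compare against a fresh
# system shell to see whether the IDE picked up your change.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    print(entry)
```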

Posted in Uncategorized


So yeah, another thing I noticed is that MANTIS’s skull-stripping algorithm cannot have a space anywhere in the path, or else… it will crash. So.Typical.Windows.Bug…
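Since these crashes surface deep inside the tool, I now fail fast before calling anything like this. A tiny hypothetical guard (my own helper, not part of MANTIS):

```python
from pathlib import Path

def assert_no_spaces(path):
    """Raise early if any component of `path` contains a space, since some
    tools (like the skull-stripping step here) choke on such paths."""
    p = Path(path)
    if any(" " in part for part in p.parts):
        raise ValueError(f"Path contains spaces: {path!r}")
    return p
```

Cheap insurance: a clear ValueError up front beats a cryptic crash halfway through processing.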

Posted in Uncategorized