Communauto and Auto-mobile/Flex in Montreal: A Cost Deep Dive

So, a few friends of mine have been asking about this service, which we have used for several years, so I figured I would write a detailed blog post to showcase why, how, and when I use it and why it is quite convenient.

There are two main ways of using Communauto. The first (what I would call the classic Communauto way) is a round-trip rental, very similar to the traditional car rental model: you pick up the car, use it, and drop it off back where it belongs. This is very affordable for short-term rentals, but for longer periods it can easily end up costing more than renting a cheap car for a day (especially during winter promotions, when certain companies offer $10-per-day rentals).

The second way is known as Auto-mobile, recently rebranded as Flex. This is a much more innovative concept: free-floating cars are parked (and sometimes even ticketed, lol) around a variety of residential neighbourhoods. You check for available vehicles near your current location, book one, use it, and drop it off in any compatible neighbourhood; it does not have to be where you picked it up, as long as it is within the green compatible zones where the municipality has reached an agreement with Communauto.

Everything mentioned so far is publicly available information and fairly easy to understand and digest (despite some really awkwardly complicated pricing structures). See: https://www.communauto.com/en/rates.html

All of this is fine and dandy, but what I am about to tell you is a really interesting offering that people are probably not fully aware of: the place where Auto-mobile Unlimited and Value Plus plan pricing “convolve”. See below for a graphical summary of the “convolved pricing”.

Update 2: updated, corrected video:

There is a hugely interesting fact that people seldom notice or care about, listed on this page: http://www.communauto.com/en/promos/unlimited.html#communauto

Specifically, the fine print:

Preferred rates with Auto-mobile vehicles

All subscriptions to Value Plans give access to discounted Auto-mobile vehicles : 35¢/min instead of 40¢ (other conditions remain unchanged: $12/hour or $50/day, 100km included, taxes not included), as well as the parity of the price of your package when it is more advantageous for you.

What does the parity of the price of your package mean?

This means that, at any time, when you use an Auto-mobile vehicle, the rate applied is the most advantageous between:

  • Discounted Auto-mobile Rate: 35¢/minute instead of 40¢ (other conditions remain unchanged)
  • Or, the basic price of your Value, Value Plus or Value Extra package with a minimum of 4 hours charged.

Regardless of the duration of your trip or the number of kilometers traveled, the best price is always granted automatically. You do not have to worry about anything!

This fine print is actually quite annoying to visualize, but basically you can envision the Auto-mobile (Flex) cost as a step-like function that ramps up to $12 each hour and then plateaus, and the Communauto plan cost as a finer step function that increases with both kilometres and time. Take the MINIMUM of the two and you get the visualization above.
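To make that minimum concrete, here is a minimal Python sketch of how I would model the two curves; this is a sketch of my reading of the fine print, not Communauto's actual billing engine. The Flex numbers (35¢/min, $12/h, $50/day, minimum 4 hours for parity) come straight from the quote above, while PLAN_HOURLY and PLAN_PER_KM are placeholders for your own plan's hourly and per-km rates, and the 100 km allowance / extra-km charges are ignored.

import math

# Flex (Auto-mobile) discounted rates from the fine print quoted above.
FLEX_PER_MIN = 0.35      # $ per minute
FLEX_HOUR_CAP = 12.00    # never more than $12 per hour
FLEX_DAY_CAP = 50.00     # never more than $50 per day

# Placeholder plan rates -- NOT the real Value Plus numbers, check your rate sheet.
PLAN_HOURLY = 2.50       # assumed $ per hour for the parity calculation
PLAN_PER_KM = 0.40       # assumed $ per km for the parity calculation
PARITY_MIN_HOURS = 4     # parity always bills at least 4 hours


def flex_cost(minutes):
    """Per-minute Flex price with the $12/hour and $50/day plateaus."""
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    partial_day = min(hours * FLEX_HOUR_CAP + min(mins * FLEX_PER_MIN, FLEX_HOUR_CAP),
                      FLEX_DAY_CAP)
    return days * FLEX_DAY_CAP + partial_day


def parity_cost(minutes, km):
    """Plan-rate price with the minimum 4 hours always charged (hours rounded up)."""
    hours = max(math.ceil(minutes / 60), PARITY_MIN_HOURS)
    return hours * PLAN_HOURLY + km * PLAN_PER_KM


def trip_cost(minutes, km):
    """The billed amount is whichever of the two is cheaper.
    The 100 km allowance and extra-km charges are ignored for simplicity."""
    return min(flex_cost(minutes), parity_cost(minutes, km))

Sweeping trip_cost over a grid of durations and distances gives roughly the shape of the surface in the image above.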

If there is one thing you should take away from that image, it is that regardless of how you use an Auto-mobile vehicle, within the 3 h and 100 km limits your cost will not exceed $35. My typical trip comes out to around $15 or so when I use a lot of time but not much distance, or for trips under 1 h (e.g. airport runs, i.e. the FLAT part of the fee diagram above).

More to come as I add the actual recommendation part instead of just data rambling.

Update 1: 

I forgot to add that the function is actually a bit more complicated than that: the above only captures the Communauto cost versus the Auto-mobile cost, but does not reflect the minimum-4-hours rule highlighted in red.

To better illustrate and explain this, it is best to show you a real bill from my travel records.

Okay, notice the two lines above. The trip on 04-22 was short: a Flex vehicle for just over an hour in total, travelling about 23 km, and it cost $13. On 04-29, again a Flex vehicle, but for a much longer period of time while travelling a similar distance (most likely shopping), yet it cost just barely a few more bucks. This is a clear illustration that Flex may NOT stay advantageous once you reach the roughly $15 threshold, as that triggers the local rate, converting the Flex cost to the Communauto plan cost, and the rate stalls. In the example above, three hours of extra usage cost about $3 more. Guess what I am going to do next time I have already gone over the free period?

Another thing you might notice, above the first red line: the 04-20 19:53 trip, where I used the car for a bit under 2 h. I did not use 4 h, yet I got billed as such. This is not a bug but a feature: the cost triggered Communauto's “parity of the price”, which forces a 4-hour charge, per the red text in the quoted section above. Now, given that I had already done most of the travelling, having the car sit around for two additional hours would have cost the same. In other words, if I had returned the Flex car at 20:38 instead of 17:52 with the same km on the meter, I would have been charged the same amount. You may wonder how that helps the end user. It helps tremendously when shopping or going somewhere WITH the Auto-mobile, parking, then coming back, as long as the duration is long enough to trigger the parity of the price.

Major TL;DR for now (I will try to explain this in another post some other time):

  1. You probably want the Value Plus plan + the Unlimited Auto-mobile option to take advantage of all the synergy if you ever travel beyond the free 30 minutes. Better yet, get your spouse on it too, as the monthly Flex membership savings alone are probably around $30 × 2 each month. This comes out to roughly $30 (main member) + $3 (co-member) for two Communauto memberships + $30 × 2 (Unlimited Auto-mobile/Flex), so a total of about $93 before tax for TWO people to have access to the Flex vehicles. That is reasonable considering a monthly bus pass for one person alone is $85+ with the STM.
  2. You want to either travel for less than 30 minutes every time, or
  3. If for some reason you go over by enough time to reach the OTHER side of the cost pyramid (e.g. above 90 minutes, or 1 h / $12 beyond the free time), you trigger the parity of the price. From that point onward, usage is a LOT cheaper, so you might as well use the car for the full THREE/FOUR hours, since you are getting dinged for that time regardless of whether you use it (a minimum of 4 hours is billed no matter what, as that is a REQUIREMENT to trigger the parity of the price).

Update 2: Source code of the visualization process added:
https://github.com/dyt811/FlexCostCalculation

 

Combining PNGs into a Movie is harder than it looks…

So I was trying to combine a lot of PNGs (5k images, 20 KB each, 1k by 1k, not that much tbh) into an MPEG/movie, and I thought this would take a few hours max.

Oh boy was I wrong on quite a few scales.

First, I tried to use ffmpeg to render these matplotlib images in memory and build the movie at the same time. After 40 h that rendered a SINGLE video out of those images, I decided there had to be a better way. I chalk this up to Python not being optimized or, more likely, me not being smart enough to know the right way to do it.

Plan B: render those 5k images to disk, then merge afterwards. I TOTALLY thought this could not be THAT hard; before we had gigabytes of memory, this must have been how people did things in the ancient times, like drawing flip-book animations.

So yeah, I looked up ffmpeg: for a series of images that are not consecutively named, it is annoying to do. Then I switched to magick convert. It took a few tries. I maxed out my memory (12 GB) several times before realizing I probably needed a disk cache, and got some pretty fancy-looking errors like this:

(exception processing is suspended) @ warning/exception.c/ThrowException/1049. 
convert: unable to extend cache 'input_2018-09-03T120000.png': 
No space left on device @ error/cache.c/OpenPixelCache/3883.

In the end, it took me a few wasted attempts (each 10+ h) of staring at a blank screen before more Google-fu finally led me to something along the lines of this:

magick convert 
-monitor 
-define registry:temporary-path=E:/Temp 
-limit memory 5GiB 
-combine *.png movie.mpeg

I tried too many times to just use the last line to combine, without thinking about the space/time complexity (thanks to my lack of a comp-sci background). The 2nd line gives progress updates, the 3rd line FORCES a more reasonable temporary location, and the 4th line prevents RAM overflow. I really hope this is the end of this rendering fiasco. I am not a smart man… The 5th line should NOT include -combine, just the input/output. A one-liner is good enough; I split it here for easier readability.

But I am glad to see the open source community has thought these kinds of problems through and through; start reading here: https://www.imagemagick.org/script/architecture.php#tera-pixel

Maybe I will link the rendered video later, if it ever finishes. BTW, it takes about 30 minutes just to write 5k files of 1k × 1k to disk. Sigh. How did people used to do all this?

Update 1: One liner:

magick convert -monitor -define registry:temporary-path=E:/Temp -limit memory 5GiB *.png movie.mpeg

turned out to be the best. Obviously, update your temporary path; this assumes you want to turn every PNG in the current working directory into the movie. I never expected a simple conversion like this to take me a few days of on-and-off Google-fu to finalize. Hopefully whoever finds this can avoid the hassle.
Conversion times between the attempts differed by a few orders of magnitude.
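Since the original annoyance was that ffmpeg dislikes non-consecutively named frames, here is a rough Python workaround I could have used instead: copy the PNGs into a consecutive frame_%05d.png sequence and let ffmpeg stream them one at a time, which sidesteps the giant pixel cache entirely. The E:/frames paths and the 24 fps are made-up placeholders.

import shutil
import subprocess
from pathlib import Path

SRC = Path("E:/frames")        # assumed folder holding the oddly named PNGs
DST = Path("E:/frames_seq")    # sequentially renamed copies go here
DST.mkdir(exist_ok=True)

# Copy the PNGs into a consecutive frame_00000.png, frame_00001.png, ... sequence
# (sorted by name, which works when the names embed a timestamp like mine do).
for i, png in enumerate(sorted(SRC.glob("*.png"))):
    shutil.copy(png, DST / f"frame_{i:05d}.png")

# ffmpeg reads the frames sequentially, so no huge in-memory cache is needed.
subprocess.run(
    ["ffmpeg", "-framerate", "24", "-i", str(DST / "frame_%05d.png"), "movie.mp4"],
    check=True,
)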

 

 

The Mysterious Case of "OSError: [Errno 7] Argument list too long" while calling subprocess from Python

Ran into this kinky little bastard a few days ago. It kept me scratching my head: I was calling a simple command from Python via subprocess, in the form COMMAND ARG1 ARG2 (with ZERO path information, so super short), yet it failed spectacularly with that error.

I spent ages digging into how such a short command could provoke this kind of problem. It turns out it has nothing to do with the subprocess call or the way I invoked it.

I actually had a BUG in the preceding lines, where I attempted to add the path of COMMAND to $PATH; this was called way too many times, resulting in an insanely long $PATH variable, mostly composed of PATH/TO/COMMAND repeatedly appended THOUSANDS of times within the current Python session.

In short, I appended the command's path to $PATH thousands of times, and THAT was the actual root cause of the [Errno 7] Argument list too long error.
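The fix itself is trivial once you see it: guard the $PATH update so the directory only gets appended once. A minimal sketch, with /path/to/command standing in for the real location:

import os

command_dir = "/path/to/command"   # placeholder for the real install location

# The bug: this line ran inside a loop, so $PATH grew by one copy per call until
# the environment block blew past the OS limit and the exec failed with Errno 7.
# os.environ["PATH"] += os.pathsep + command_dir

# The fix: only append when the directory is not already on $PATH.
if command_dir not in os.environ["PATH"].split(os.pathsep):
    os.environ["PATH"] += os.pathsep + command_dir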

So, check how you are adding the path of the command you are calling, and make sure that is not the cause of your ERRNO 7.

Case closed, I hope.

 

What I wish I had learned before my first real data science competition…

So recently I took on an interesting data science challenge that taught me a great many lessons I am still trying hard to digest.

Over the past month or so, I was working on this: https://www.mindsumo.com/contests/weather-model-forecast

In short: a 256 × 256 × 15-dimension × 6000-observation data set. Not big by computer vision standards, but by far the biggest I have dealt with.

I just handed in my submission (incomplete, mind you) at noon, feeling utterly defeated. Worse than when I started.

I am almost certain I will rank very close to last among the fewer than a dozen entries… not because I lack talent or confidence, but because my model was utter shit: I failed to truly understand what I was training or how I was handling the data, which led to this hilarious, utter fiasco.

Hopefully, it will inspire you to avoid my disaster.

I will try to summarize the things I would have done better next time:

1) This was the training set:

[image: 2019-02-21_T001658&R&R&R.png]

Not looking too bad right?

2) This was the validation set:

[image: 2019-02-20_T072316&R&R&R.png]

Hmm… I was very naive and thought: wow, look at that, the model must be SUPER well trained, since 14 minutes in it already started overfitting and could not generalize well… and yadda yadda yadda… Well, cool: 14 minutes on a 62-million-parameter, shitty 4-layer CNN must be doing SOMETHING right… oh, if only I knew how wrong I was.

To explain a bit more, the “training” data we fed in were (about 500+) time-point-specific 256 × 256 × 19 measures (which also spatially encoded day-of-the-year and time-of-day information). We trained on 50% of the dates, covering roughly January and July over a couple of years (~2), while testing on an unseen year's January/July (~300 time points). In short, very high-dimensional data.

Symptoms and signs I ignored:

  • Validation never really converged. What I “thought” was convergence was merely testing on similar data.
  • Mean absolute error was always at least 5+, meaning EVERY SINGLE PIXEL's temperature estimate was, on average, about 5 degrees too high or too low. God bless the extreme temperature differences… or mean absolute percentage differences in the 10^4 range… YUP. That is not an exaggeration.
  • A few spot checks of predictions on the first dataset (e.g. January) showed something like this: [image: Comparison_validation_0000_0123.npy_2018-01-18T090000.png]
    • Not too bad, eh? The score in square brackets is SSIM. Lower = better.
    • Seeing pictures like this, I shrugged off those 6+ degrees of difference, thinking: meh, maybe that is just how it is. We may be fundamentally missing some information needed to reconstruct the high-resolution truth. Big deal.
  • THEN, at 10 AM on deadline day, it hit me. Hard. In the face, like a brick, when I tried to predict summer (July) temperatures. Hmm… a score of 100? But… they look the same… then: [image: Comparison_validation_0123_0247.npy_2018-07-16T120000.png]
  • A few data points later… hmm… have I not seen that prediction before? [image: Comparison_validation_0123_0247.npy_2018-07-16T210000.png]
  • … it turned out that, for the ENTIRE freaking month of July, the model was trolling me with a FREAKING static image as its prediction output… Ladies and gentlemen, this is why you need to, should, and must visualize your neural network's output: it will troll you hard.
  • HOWEVER, I was being an idiot too. Think about it: 14 minutes of training, and I thought the CNN would have learned EVERYTHING needed for a 62-million-parameter backyard crappy architecture to predict 300+ previously unseen 256 × 256 × 19 inputs, while trained over 19 steps of batch size 64. That is, by a large margin, unrealistic, and by most people who can do basic math, preposterously, naively, stupidly overestimating the computing capability of a GPU. I do not have a DGX-2. A meagre GTX 970 has no WAY to churn through that much data. But hey, I am no mathematician, I lack common sense, and I was sleep deprived. In short, I mathed that one up badly.
  • In reality, the relatively flat (and rising) mean absolute error was actually an indicator of UNDERFITTING by a HUGE, MASSIVE MARGIN. Think about it: the high-dimensional 256 × 256 × 19 input from a particular hour of a particular day has VERY LITTLE bearing on, or information about, the weather at another day/season/hour. E.g. telling you it is −40 at the winter solstice probably will not help you predict the summer high in the same region, no matter how much of that information you are given. With a 100+ year history of such patterns you might infer it, but DEFINITELY not with 1 year of data mostly drawn from other time points as the training set.
  • Taking the training and validation graphs together, it should be clear that the training loss kept decreasing because the model was getting better on the training data, while validation still sucked because we were validating on a very different temporal environment, which requires far more observations to model. In short, a more recurrent model might be more suitable, but even now I am still not sure how best to tackle that problem.

Another huge, idiotic mistake I made: the source data came binned as 100 continuous time points of 256 × 256 × 19 per input file. I kept them as-is and loaded them together, instead of breaking them apart into 100+ smaller files so that a generator class could LOAD them ON THE FLY. The irony is that I actually BUILT this exact approach before when dealing with IMAGING data, precisely so that training would traverse the ENTIRE dataset at least once before I trusted the model, instead of picking the one from 15 minutes into training just because its mean absolute error was lowest… HOW NAIVE.
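For reference, the load-on-the-fly approach I should have reused is only a few lines with a Keras Sequence. This is a rough sketch, assuming the big files have already been split into matching x_/y_ .npy chunks; file names and shapes here are placeholders, not my actual pipeline.

import numpy as np
from tensorflow import keras

class ChunkSequence(keras.utils.Sequence):
    """Feeds the network one pre-split .npy chunk per batch instead of the whole set."""

    def __init__(self, x_paths, y_paths):
        # x_paths / y_paths: matching lists of chunk files, e.g. x_0001.npy / y_0001.npy
        super().__init__()
        self.x_paths = x_paths
        self.y_paths = y_paths

    def __len__(self):
        return len(self.x_paths)

    def __getitem__(self, idx):
        x = np.load(self.x_paths[idx])   # e.g. shape (batch, 256, 256, 19)
        y = np.load(self.y_paths[idx])   # matching targets
        return x, y

# model.fit(ChunkSequence(x_files, y_files), epochs=...) then touches every chunk
# once per epoch -- the "see the whole dataset before trusting the loss" rule.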

In short, I done goofed big time.

If you are still reading, I am impressed. Here are some practical tips that will hopefully help you too.

  • Have a ModelCheckpoint callback that monitors training loss (or whatever you are optimizing) and saves the model every time it improves, instead of only saving at the end (which could be interrupted). A minimal sketch follows this list.
  • Have a second ModelCheckpoint callback that monitors validation loss (or whatever you are truly validating) and only saves when it hits a true minimum.
  • Timestamp your logs and name them descriptively.
  • Timestamp your models and name them descriptively too.
  • Look at your data. Look at your validation data. Look at your validation via RANDOM SAMPLING. I only looked at January, and it looked legit (most likely by chance, because the first training data loaded were around January). Look at your data early. Look at your saliency maps. Look at your output against sanity checks. Look at your supervisory input. Look at the data more. Stare at it, admire its beauty. Be one with the data; live and breathe it.
  • For large input files, break them down into smaller files and index them, so they can be loaded by your customized generator class.
  • Compile the model with all the cheap metrics: mae, mse, mape, cosine. It costs nothing and gives you more information.
  • Do transfer learning; do not be like me and try to rebuild a simple few-layer CNN from scratch. Keras takes only a few lines to retrain an existing model, even with just a few hundred images.
  • Make sure you run at least enough epochs THAT you have covered every input at least once. This may not be necessary in most situations, but in my case, with different/unique time points, it should have been MANDATORY. Yes, I was not too bright.
  • If you wish to witness the dumpster fire yourself, you can find it here: https://gitlab.com/dyt811/weathertrainer
  • On GitLab you can upload 700+ MB models. Not on GitHub: they cap you at 100 MB.
  • Always assume you are in an abusive relationship with your neural network: it is actively trying to deceive you, like the current world leader, and may be lying to you blatantly, but you are too lazy to fact-check the spew of conscious lies, and over time those small stabs at your reality make you question why you were even asking in the first place. No: if you feel even slightly that something is off, shit is about to go down.
  • Practice solving real-world problems more.
  • Neural networks evolve and adapt, but evolution is not omnipotent: no amount of data will let a model adapt to extremes or unseen cases (unless you are into sophisticated RL). Creatures cannot adapt to hot lava, and bacteria cannot adapt to alcohol.
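As promised above, here is a minimal sketch of the checkpoint/timestamp/metrics boilerplate from the first few tips. The tiny Dense model and the random data are stand-ins purely so the snippet runs on its own; swap in your real model and the generator from the previous sketch.

from datetime import datetime

import numpy as np
from tensorflow import keras

stamp = datetime.now().strftime("%Y-%m-%dT%H%M%S")

# Stand-in model and data so this snippet actually runs; use your real ones.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(10,))])
x, y = np.random.rand(256, 10), np.random.rand(256, 1)

# Cheap extra metrics ('cosine_similarity' in TF2; older Keras called it 'cosine').
model.compile(optimizer="adam", loss="mae",
              metrics=["mse", "mape", "cosine_similarity"])

callbacks = [
    # Save every time the training loss improves, so a crash never loses everything.
    keras.callbacks.ModelCheckpoint(f"model_{stamp}_best_train.h5",
                                    monitor="loss", save_best_only=True),
    # Separately keep only the true validation-loss minimum.
    keras.callbacks.ModelCheckpoint(f"model_{stamp}_best_val.h5",
                                    monitor="val_loss", save_best_only=True),
    # Timestamped per-epoch log you can inspect later.
    keras.callbacks.CSVLogger(f"training_{stamp}.log"),
]

model.fit(x, y, validation_split=0.2, epochs=5, callbacks=callbacks)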

 

“Rear camera error detected, recording stopped. Please remove the rear camera and reconnect” (检测到后拉摄像头异常,录像已停止。请移除后拉摄像头重新连接)

When connecting the Xiaomi 70mai rear camera to the Xiaomi smart rear-view mirror, the mirror sometimes displays: “检测到后拉摄像头异常,录像已停止。请移除后拉摄像头重新连接” (“Rear camera error detected, recording has stopped. Please remove the rear camera and reconnect”). I was stumped for half a day before discovering that, if the rear camera shows this error, the fix is to push the video cable (the one that does not connect to the reverse light) in firmly, all the way. Push hard; once it is fully seated you will hear a click, and the message disappears. The reason is simple…

Literally got stuck on this issue for far too long. That damn prompt message made me think something was seriously wrong. In the end, just press the connector HARDER into the AV-in of the MiJia DVR… a damn misleading user error message.

 

How to train a Conv2D network that at least converges… and does something.

If you have not seen XKCD’s comic on an intuitive summary of machine learning training, this is a good summary: https://xkcd.com/1838/

So I have been playing a bit more with neural networks lately, specifically ConvNets. I had read many, many memes and non-technical blog posts, but it was finally time to get my hands dirty, do something with all these cool toys, and check out the raging hype.

The TensorFlow API was a bit painful, like drinking from a firehose; Keras was much nicer to beginners.

Some very interesting patterns emerged as I played with various settings to build a network that detects the presence of an artificial orientation marker in 500 by 500 images.

Here is a quick reflection and thought dump of all the steps that made my training slightly more successful, eventually reaching 97% accuracy for out-of-sample detection (with about 6000 images with markers and 3000 images without, all augmented). Nothing seriously cool, but this hands-on experience definitely taught me a lot more about what I am doing…

  • GPU memory is the most relevant limitation; training time usually is not, for such a simple network (data annotation can be another important limitation, more on this later). Most of my networks (no more than 4 Conv2D, 2 Dense) converged within about 1 h of training on a 4 GB GTX 970. The most successful network, with minimal stride and more depth, took about 1.5 h to reach 95% accuracy and 3 h to reach 97%. At the beginning, I was waiting HOURS upon HOURS overnight hoping a network that looked like this would improve: [image: 2018-10-09_T233915&R&R&R.png] Let me save you some time: don't bother with any amount of waiting if you are not seeing any improvement 15 minutes in. This MIGHT change if you are trying to produce a production-level 99.9999999%-accuracy network, but you can always swap out the algorithm after everything else is mostly fixed. Similarly, you want to see something like the following in the first few epochs (in my case, about 5 minutes into training). Note how the loss immediately decreases substantially in the first few epochs, unlike in the previous image; accuracy goes up accordingly. [image: slack-imgs.com]
  • If you can afford it in terms of GPU memory, the stride is best kept low (1×1); use more Conv2D layers instead, as their combination across layers can make up for the lack of receptive area. In the end, training a 5-layer network versus a 3-layer one with everything else kept the same gave the necessary boost from 80% accuracy to 95+%.
  • The stride should still be large enough for the network structure and the object. I had no clue what I was doing, so my batch was, in hindsight, probably bigger than it should have been. Either way, what ended up happening was that the stride was too low (1×1 for three layers) and not enough unique data from the images was being sampled. By simply INCREASING the stride to 2×2 across all three layers and keeping everything else the same, performance jumped from 55% to 75%. This MIGHT be unique to my situation, as the marker I am trying to detect appears at different sizes in the augmented input (30, 50, 100, 200, 300 px in a 500 px image). Obviously, when the receptive field is too small, it is going to be hard to recognize the object. You can probably afford to increase the stride a bit in the first Conv2D layer facing the image input. I found the BEST EVER illustration of stride, padding, etc. here: https://github.com/vdumoulin/conv_arithmetic
  • Try Adadelta and Adagrad as well as Adam. In my case Adadelta did best, which is well illustrated in Sebastian Ruder's blog post here: http://ruder.io/optimizing-gradient-descent/index.html#challenges. I had no clue what it was about, to be honest, but the images looked pretty enough to convince me to try. Also VERY well illustrated here: http://www.robertsdionne.com/bouncingball/. A good, speedy convergence should look like this: [image: slack-imgs.com.png] I believe this run was left on overnight, so it trained obviously way longer than necessary; the peak in val_acc around 2 h in is mostly good enough. Do notice the rather sharp convergence, which I BELIEVE is thanks to Adadelta, but I have not fully tested that across everything.
  • In the beginning, when you are experimenting with the architecture, you would rather OVERFIT than underfit, because overfitting is a sign that your network is at least probably LOOKING at the right thing within its receptive field: https://fomoro.com/tools/receptive-fields/. I had this problem early on, when the network was absolutely not doing anything at all (see above). In hindsight, overfitting is a luxury and can usually be fixed easily with augmented data, dropout, etc., if your data source is abundant. Loss patterns like this are a clear illustration of overfitting: a dip, then a never-ending rise… [image: 2018-10-10_T102137&R&R&R.png]
  • Pick the right question to ask the neural network. For this project the question was very straightforward: given a 500 by 500 image, tell me whether my object is IN it or not. Since this is an artificial marker, I can generate millions of images with augmentation approaches. We also tried asking a network to regress the orientation (w, p, r), and that was an insanely hard question to tackle first……
  • LeakyReLU seems… to have improved the performance. I am not 100% certain of this, but I used it early and it seems to have no major downsides; I am using an alpha of around 0.1. Definitely stay away from the other exotic activations besides ReLU unless you have a very clear reason to use them.
  • Keras's flow_from_directory and ImageDataGenerator class are a GODSEND. I wish I had known about them earlier, before I wrapped the ImageAug Python package extensively to do my own data augmentation. Literally just point it at a directory and fire and forget, as long as you have data in the folders. It even does image resizing, which makes my job much easier since I standardized my input images to 500 by 500. (A minimal sketch combining several of these points follows this list.)
  • This one is quite intuitive in hindsight, but it caught me off guard for way too long. Basically, in conjunction with the earlier point about receptive fields: IF you change your input size (e.g. trading input size off against batch size), it will clearly change your neural network's performance. The same network architecture will perform differently on images of 500 by 500 vs 250 by 250 vs 50 by 50. My general intuition is that a larger receptive field relative to the input is better; this can be achieved with either bigger strides or deeper networks, ideally the latter.
  • Accuracy can be deceiving. This is another huge lesson brutally taught to me by my data: class imbalance can hugely bias accuracy, so make sure your classes are well balanced.
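Here is the sketch promised in the list above: roughly the kind of marker-presence classifier and flow_from_directory setup discussed, with small strides, LeakyReLU, and Adadelta. The data/ folder layout, filter counts, and epoch count are illustrative placeholders, not my exact network.

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Binary "is the marker in this 500x500 image or not" classifier.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), strides=(1, 1), input_shape=(500, 500, 3)),
    layers.LeakyReLU(alpha=0.1),
    layers.MaxPooling2D(),
    layers.Conv2D(32, (3, 3), strides=(1, 1)),
    layers.LeakyReLU(alpha=0.1),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), strides=(1, 1)),
    layers.LeakyReLU(alpha=0.1),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adadelta", loss="binary_crossentropy", metrics=["accuracy"])

# flow_from_directory expects one sub-folder per class, e.g. data/marker and
# data/no_marker, and resizes everything to target_size on the fly.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory("data/", target_size=(500, 500),
                                        class_mode="binary", subset="training")
val_gen = datagen.flow_from_directory("data/", target_size=(500, 500),
                                      class_mode="binary", subset="validation")

model.fit(train_gen, validation_data=val_gen, epochs=20)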

 

Massive Rant about Google Drive

Holy XXXX… You know how they say a backup is not a backup until you test it? Today I had to rely on Google Drive to recover accidentally deleted files, and that went badly….

So, I accidentally deleted some files online. No biggie, I checked the trash… not there… WTF?

Then I checked the history… and it's gone. That issue aside (luckily I did not lose much… but I will never again expect to recover anything from Google Drive), their instruction is to use SEARCH to find these hidden elf files roaming somewhere in the ether. This is when I ALSO realized that Google Drive search does not support regular expressions and has some epic, quirky bugs.

Here are screenshots, with timestamped file names:


[screenshot: 2018-09-29 8.07.58 PM.png]

[screenshot: 2018-09-29 8.08.07 PM.png]

[screenshot: 2018-09-29 8.08.13 PM.png]

So, as you can see… this makes zero sense. If you take away regex power from users, at least do a freaking competent, foolproof job at basic searches (btw, this is Google… just so you know; very ironic, tbh, in one of their core products…).

Like, I am not even sure how these types of bugs exist. I am not talking about special symbols here; a search for a basic alphanumeric string (at the BEGINNING of the file name)… does not work. What.The.Hell.

Yeah, you get this… and when you press Enter, it still fails to find the file, yet I assure you those files clearly exist, because they are the new files I just uploaded…

[screenshot: 2018-09-29 8.16.38 PM.png]

 

Afterthoughts:

I think… most likely something went wrong during the indexing process, or the asynchronous indexing, and that caused this odd bug. But still, why part of the file name and not all of it? I am going to check again later to see whether this bug persists. Maybe it only exists for recent files like this one, uploaded around 19:45, which did not get indexed properly within 20 minutes. HOWEVER, this still does not fully explain why search-as-you-type manages to pick up PART of a file name but not all of it… Very odd.

Bottom line: this is the second time I have been burned by Google Drive. I would still store cat photos there, but for work-related files I have got to get my shxt together and do some serious hourly backups….

LASTLY: just to say, I have tried a few recovery attempts in Dropbox over the years, and they went better than this… and I am paying for both….=_=…


 

Using Python to establish a connection through Proxy/transport/intermediate server/(something in the middle) to your FinalDestination server

A quick and yet absolutely, disgustingly insecure way to establish a password-authenticated connection to a server through a proxy (aka jump host/intermediate server/proxy). Hopefully someone will find this useful. Blindly trusting servers (the AutoAddPolicy below) may invoke armageddon… Source code inspired by https://stackoverflow.com/questions/21609443/paramiko-proxycommand-fails-to-setup-socket

More on how to use the client object you get to do things like transport: https://stackoverflow.com/questions/3635131/paramikos-sshclient-with-sftp

import paramiko
def getSSHClient(proxy_ip, proxy_login, proxy_pw):
    """
    Instantiate, set up and return a straightforward SSH client connected to the proxy.
    :param proxy_ip: host name or IP address of the proxy/jump host
    :param proxy_login: user name on the proxy
    :param proxy_pw: password on the proxy
    :return: a connected paramiko.SSHClient
    """
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(proxy_ip, 22, username=proxy_login, password=proxy_pw)
    return client

def getProxySSHClient(proxy_ip, proxy_login, proxy_pw, destination_ip, destination_login, destination_pw):
    """
    Establish an SSH client to the destination, tunnelled through the proxy.
    :param proxy_ip: host name or IP address of the proxy/jump host
    :param proxy_login: user name on the proxy
    :param proxy_pw: password on the proxy
    :param destination_ip: host name or IP address of the final destination
    :param destination_login: user name on the destination
    :param destination_pw: password on the destination
    :return: a paramiko.SSHClient connected to the destination through the proxy
    """
    proxy = getSSHClient(proxy_ip, proxy_login, proxy_pw)
    transport = proxy.get_transport()
    dest_addr = (destination_ip, 22)
    local_addr = ('127.0.0.1', 10022)
    proxy_transport = transport.open_channel('direct-tcpip', dest_addr, local_addr)

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(destination_ip, 22, username=destination_login, password=destination_pw, sock=proxy_transport)
    return client
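A quick usage example; host names, credentials and file paths are obviously placeholders.

# Example usage -- run a command on the destination box through the proxy.
client = getProxySSHClient("proxy.example.com", "proxy_user", "proxy_password",
                           "destination.example.com", "dest_user", "dest_password")
stdin, stdout, stderr = client.exec_command("hostname")
print(stdout.read().decode())

# File transfer over the same tunnelled connection (see the second link above):
sftp = client.open_sftp()
sftp.get("/remote/path/file.txt", "local_copy.txt")
sftp.close()
client.close()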

 


 

Building MincToolkit for CentOS7 FUN

You probably want to build OpenBLAS.

Also make sure to sudo yum install hdf5, gsl, itk, netcdf, pcre, zlib, openblas-devel (I still ended up having to build my own), etc.… That seems to have helped get around the empty-string hash issue described below.

ccmake3 and cmake3 also seem to have helped.

I recall having to install ccache as well.

 

Overall, it is MUCH easier to build it in Ubuntu, sigh.


 

More Compilation Fun with d41d8cd98f00b204e9800998ecf8427e

So, the magical string of: d41d8cd98f00b204e9800998ecf8427e

Is the MD5 hash that CMake (or any other program) typically yields when a module is not found and an MD5 sum was run on an empty string, apparently…

If you ever get a DOWNLOAD HASH mismatch showing that the actual hash is d41d8cd98f00b204e9800998ecf8427e, then in plain English the program is complaining that the resource was not found (the download is empty), and the MD5 check revealed that the hash of an empty string differs from whatever you were expecting… Now that makes a bit more sense… but it is nowhere close to a solution…
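You can confirm that this is simply the MD5 of zero bytes with one line of Python:

import hashlib

# MD5 of zero bytes -- the tell-tale "the download is actually empty" hash.
print(hashlib.md5(b"").hexdigest())   # prints d41d8cd98f00b204e9800998ecf8427e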