Sunday, January 5, 2020

In the first two years of the history of Unix, no documentation existed

I decided to write this post as I was working through some research on TensorFlow 1.0/2.0 notebooks using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) for AI story generation and style cloning.  I became a bit frustrated because the API changed quite a bit last year, and some of the core documentation and system help was incorrect.
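
To give a flavour of the kind of breakage, here is a hedged sketch (the layer sizes and vocabulary are just illustrative, not the exact code that bit me): story-generation code written against the 1.x RNN-cell API has to move over to the Keras layers in 2.x.

    import tensorflow as tf

    # TensorFlow 1.x style, now reachable only through the compat shim:
    #   cell = tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=128)

    # TensorFlow 2.x style for a word-level story generator:
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=5000, output_dim=64),
        tf.keras.layers.LSTM(128),                 # replaces the old rnn_cell graph API
        tf.keras.layers.Dense(5000, activation="softmax"),
    ])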

Project developers, open source or not, hopefully learn from their mistakes and improve their code.  It's exciting to start building something amazing, then to see it take off and grow more legs, and minds.  Like the spider (or graph) that is the World Wide Web, a codebase can go in many directions.

Clone it, use it, break it, fork it, build it, test it, hack and change it
Test it, doc it, run it, use it, build it, upgrade then docs outdated.



With open source projects, new developers enter the team and old committers take on different roles.  This software maturity cycle, team churn, and stream of feature and bug improvements inevitably result in stale documentation, examples, and tutorials.  This is especially prevalent with cloud and fast-moving technology, where by the time the user and technical docs are complete, it's time to move on to the *Next Big Thing*.

In the end, the behaviour of the code is the truth, and the code is the documentation of that truth.  Like Schrödinger's cat, the code is dead and alive at the same time; it's only at the point of observation that it becomes physically manifested.

Whether it is Markdown or PDF, a wiki or man pages, current documentation is important.  Even, and especially, in an Agile world.

The following 8 points are tricks for digging through upgrade changes in an open source API when they cause stale code examples or dependencies to break.

1. RTFM

There is usually a guide for upgrading from one major version to another, and the guilt of the developers who broke an API for the greater good of the project should easily surface the changes in the developer guidelines and release notes.

I once spent a few days pulling the words "RTFM" out of client code and fixing documentation that might read as "questionably unprofessional" to the client on the receiving end of the source code.  It's important to make your documentation as professional and readable as possible for a wide audience of (possibly) non-developers.

2. "man" up, get help() and command discovery.

A man page, or manual page, is the built-in help on Linux, and man is an alias for Get-Help in PowerShell.  When a dependency or module is loaded, you can walk through the API like the chapters of a good book by using the help system.  In Python, you can use the help() function to review the documentation, or just search for the API in the source code in your favourite editor or on GitHub.
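
For example, a quick way to poke around an unfamiliar module from the Python REPL (the TensorFlow names here are just an illustration, assuming a 2.x install):

    import tensorflow as tf

    print(tf.__version__)            # confirm which API version you are actually reading about
    print(dir(tf.keras.layers))      # discover what the module exposes
    help(tf.keras.layers.LSTM)       # full signature, arguments, and docstring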

3. Generate the documentation.

In the first two years of the history of Unix, no documentation existed.  "One literally had to work beside the originators to learn it."

You don't necessarily have to depend on the developer to build external documentation.  Create an up-to-date API document using documentation generation tools.  For Python, this could be tools like pydoc, Sphinx, pdoc or Doxygen.

https://medium.com/@peterkong/comparison-of-python-documentation-generators-660203ca3804

Use man, help() and other tools to review the implementation and hopefully some decently commented code.
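
As a rough sketch, the standard-library pydoc module can generate browsable HTML straight from a live module's docstrings, with no build setup; the target module name here is just an example and works for any importable package.

    import pydoc

    # Writes json.html to the current directory from the installed module's docstrings.
    pydoc.writedoc("json")

    # Or serve browsable docs for everything on your path with:  python -m pydoc -b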

4. Browse through the GitHub issues or JIRA tickets for technical guides.

Much of the documentation of an open-source project lives in the tribal-knowledge repository that is GitHub or JIRA.

5. Read the Request for Comments (RFC) documents and any community discussions.  

On larger projects it is often essential for teams to collaborate on new features and APIs to establish standards and best practices.

The TensorFlow RFCs
https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md

6. Ask for help.

The developers, committers, contributors and user community of large projects are likely more than willing to help with your questions, provided you read the manual first.

7. Watch for projects without a lot of activity.

Stale code on GitHub that is more than 5 years old probably means the project has been forked down another path.  It may no longer be maintained.

8. Watch for projects with too much activity.

Monster projects may have bigger documentation challenges than smaller projects.  I see this a lot with projects in the Apache and Cloud Native ecosystems.  Projects like this become seriously complex, and stale documentation can snowball into a real problem.  FAQs and cheat sheets are important for large projects.

TL;DR

In the end, you'll probably find some poor soul on Stack Overflow or GitHub with the same questions you have about why this or that example no longer works.  Figure out the answer, then contribute some documentation to the project if you can.

In the case of TensorFlow, use your favourite RSS reader (Feedly?) to watch the blog.  https://blog.tensorflow.org/2019/02/upgrading-your-code-to-tensorflow-2-0.html






Monday, July 17, 2017

Virtual Reality and the Trough of Disillusionment

For Christmas last year, I picked up a $10 Insignia Cardboard VR for "my kids'" stocking.  I wasn't expecting much out of a piece of cardboard with two magnifying-glass lenses, a magnetic button and a piece of elastic.


I was blown away.



After downloading a few apps for my Sony Xperia phone and almost melting the phone itself, I was amazed at how the seemingly simple visual magic trick of splitting a screen into a screen per eye immerses you in an alternate reality.  Everyone I showed it to was similarly impressed.  I tried it with a new set of Bluetooth headphones and was suddenly standing on stage at a binaural immersive audio concert, watching Paul McCartney play Live and Let Die on the piano a couple of feet from me.

So why isn't everyone talking about it and wearing these things outside?

Well, Cardboard VR looks pretty silly.  It's, um, cardboard.  My kids liked it, yet they don't talk or ask about it the way they do about the iPad, Xbox, or even Pokémon.  It heats up my phone and chews up valuable memory space.  It's blurry.  You have to hold it to your face.  It doesn't have a lot of easily discoverable content.  You have to start an app from your device before you put it on to get it going.  There is no keyboard or mouse.  Talking to yourself (since you can't really see a device) isn't like talking to Alexa or Siri or OK Google or Xbox.

Perhaps my kids would use it more if we had a more professional headset, or a "kid-friendly" one that didn't require a phone.  I don't think so, though.  In any case, it's probably worse than an iPad in terms of the health and mental changes it would introduce to children, so I don't think it would be a good idea to let my kids play with it anyway.

"With appropriate programming such a display could literally be the Wonderland into which Alice walked." -- Ivan Sutherland

Running a dual split-screen display on a mobile device will burn your battery like there's no tomorrow.  At up to 90 fps for a good VR experience, it's no wonder you need a $1000 gaming rig to get the most out of VR devices like the Oculus Rift and SteamVR.
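
A quick back-of-the-envelope sketch of why, assuming Oculus Rift CV1-class panels (1080x1200 per eye at 90 Hz):

    # Rough pixel throughput: two eyes at 1080x1200, refreshed 90 times a second,
    # versus an ordinary 1080p desktop display at 60 Hz.
    rift_pixels_per_sec = 2 * 1080 * 1200 * 90        # ~233 million pixels/second
    desktop_pixels_per_sec = 1920 * 1080 * 60         # ~124 million pixels/second
    print(rift_pixels_per_sec / desktop_pixels_per_sec)  # roughly 1.9x, before lens-distortion overdraw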

Searching GitHub for VR returns 17,000 repos.  GoogleVR is the third-best match returned.  Facebook 360 is up there.  WebVR and WebGL are going crazy.  There's the Jahshaka VR Content Creation Suite.  360 Video is the Next Big Thing™.


So why is Virtual Reality stuck in the Trough of Disillusionment?

The Gartner Hype Cycle shows that technologies flow in cycles over time.  At the Peak of Inflated Expectations, technology and startups are "changing the world" and the press is clamoring for articles and content on the technology.  Not everyone has the tech, but everyone wants it, and they don't really know why.  There are successes and many more failures.  Early adopters make or break the product or technology based on their loyalty, interest, and evangelism.

Then comes a quick dip into the Trough of Disillusionment, where interest wanes as promises are not delivered.  Investors didn't get the quick return they were looking for.  The maintenance, support, and legal aspects of the technology start creeping up.  Everyone wants a piece of the pie, and the pie is getting a bit hard and crusty.

In the news recently, a $6 billion case against Facebook and Oculus was lost, with a judgement of $500 million to ZeniMax.  Another case was announced a couple of days ago.  The pigs are feeding at the trough; not to equate anyone to pigs, but the saying seems to fit.

I'm a big fan of the two Johns (Carmack and Romero) and their impact on the gaming world with id Software.  They spawned a billion-dollar industry by starting with the idea that Super Mario Bros. 3 should be doable on the PC, outside a protected cartridge on a Nintendo console.  By creating Dangerous Dave in Copyright Infringement, John Carmack brought the concept to reality while mocking the concept of copyright itself.  Shareware was in business.

John's .plan files are an interesting historical trip through the mind of a game developer and 3D pioneer.

"Well, I have learned enough about it. I'm not going to finish the port. I have better things to do with my time." - John Carmack

His OpenGL position from January 1996 tells a story about the state of the art in 3D and the heated battle between competing 3D standards among the hardware and software vendors and developers of the time.

John Carmack, May 14, 1997 .plan file.
"I am still going to press the OpenGL issue, which is going to be crucial for future generations of games." - John Carmack

In 1996, 3dfx Voodoo cards were awesome, and I still have one in my basement shop's tech graveyard.  They surpassed console and arcade hardware on the PC.  Mine was unstable as hell, overheated my CPU, was incompatible with some games and crashed my PC all the time.  The bang for the buck and wow factor overcame all that.

Matrox, a company based in Quebec, had its high-end, prestige Millennium cards and released the Mystique to compete with the Voodoo's price point.  Nicknamed the "Matrox Mystake", it didn't hit the mark.  Nvidia and ATI, a Markham, Canada company, took over the market.  Nvidia bought 3dfx.  Matrox is still around.  The video card industry had just started its exponential rise to meet the demands of gaming and video software.  Technology was diverging, converging, and commoditizing.  The climb to the Peak of Inflated Expectations was underway.

"Many things that are a single line of GL code require half a page of D3D code to allocate a structure, set a size, fill something in, call a COM routine, then extract the result." - John Carmack

In 2011, Carmack suggested that Direct3D had surpassed OpenGL, though he still wouldn't use it.

In 2015, Microsoft brought DirectX 12 to Windows 10.  Lead developer Max McMullen stated that the main goal of Direct3D 12 was to achieve "console-level efficiency on phone, tablet and PC".
Vulkan is a "closer-to-metal" API for hardware-accelerated graphics and operates at a lower level than OpenGL.  At the time, Valve suggested it made no sense to use Direct3D 12 and to stick with Vulkan, though Vulkan couldn't yet be used commercially(?), which left Direct3D 12 as really the only commercial option.

Also in 2015, Microsoft announced GPU Capabilities in Azure.

2016.  OpenGL vs. Direct3D: Who's The Winner of Graphics API

2017.  OpenCL, OpenGL, OpenVX, Vulkan, WebGL.  DirectX 12.  However, gamers are no longer the only consumers of leading-edge graphics technology.  AI, deep learning, and GPU compute are now key use cases for the technology.  Clusters of machines running high-end GPUs are no longer used to display graphics at all.  Display drivers have become virtual compute drivers.  Microsoft has released the Azure N-Series NVIDIA GPU Virtual Machines.

So back to VR.  The key mainstream platform for graphics and VR in 2017 is mobile.  The key mobile device is Android.  And Android runs OpenGL.  Macs run OpenGL.  Does that mean OpenGL wins?

As Facebook says, It's Complicated.  Valve developed a wrapper to translate Direct3D to OpenGL. Unity will produce both OpenGL and DirectX. WebGL and WebVR are slowly becoming platform-agnostic mainstream technologies.  DirectX is still Windows, and OpenGL is still everything else.

The Slope of Enlightenment will come to VR, and it will be when software, or virtually hosted hardware, replaces local hardware.  When DirectX supports accelerated 3D over a network, similar to VirtualGL or Desktop Cloud Visualization.  When the requirement for immersive virtual reality is that you look at an object and its reality is projected onto it, rather than putting on a device that projects it to you.

Push VR will make VR, and more realistically AR, a commercial success and commodity "necessity".





Tuesday, February 14, 2017

ADAM - Production Design for the Real-time Short Film

Georgi Simeonov and the team at Unity published a series of blog posts on a demo called ADAM.

The film is set in a future where human society is transformed by harsh biological realities and civilization has shrunk to a few scattered, encapsulated communities clinging to the memory of greatness.
Adam, as our main character, was the starting point of our visual design process. He was designed to provide a glimpse into the complex backstory of the world, by revealing himself as a human prisoner whose consciousness has been trapped in a cheap mechanical body.

My computer can't even handle playing back the video without a bit of stutter.  It's like a message will pop up any second telling me "this isn't something I can even fathom playing back for you in a timely fashion."  I can imagine what a $600 graphics card could do with this technology.

The amount of thought and depth that went into this short demo is staggering.  What I really find interesting is the concept art and reference sheets.  Google Goggles and machine vision seem like great tools for building these otherworldly characters.  Take a picture of someone and they will classify it, recognize color and text, identify brands and barcodes, and search for related images.  We could tweak the Cloud Vision or Microsoft Cognitive Services Computer Vision API technology to generate these reference sheets automatically and provide additional insight and ideas to the real-time art director.
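
A hedged sketch of what that could look like with the Google Cloud Vision Python client; the file name and the idea of dumping labels and dominant colours into a reference sheet are my own assumptions.

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("concept_reference_photo.jpg", "rb") as f:
        image = vision.Image(content=f.read())

    # Labels: what the model thinks is in the shot (character, armour, machinery, ...)
    for label in client.label_detection(image=image).label_annotations:
        print(label.description, round(label.score, 2))

    # Dominant colours: raw material for a palette on the reference sheet
    props = client.image_properties(image=image).image_properties_annotation
    for c in props.dominant_colors.colors[:5]:
        print(int(c.color.red), int(c.color.green), int(c.color.blue), round(c.pixel_fraction, 3))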

What if a tiny device sitting in the middle of your living room could model the room with 360-degree depth video, add ceilings and floors, render light sources, color match, visually classify the contents, and determine what would look just a little bit out of place yet still match the color, lighting scheme, and design aesthetics of the space?

The tough problem is no longer modeling a room in realtime - I can do this with a Kinect and Skanect in about 5 minutes.

Well, there are still a few kinks to work out...

The tough problems are making this technology portable, non-intrusive, and insensitive to light sources like the Sun; using laser technology to capture accurate depth and distance across hundreds of yards of indoor or outdoor terrain; and finding a way to bring the experience to the individual, rather than bringing the individual to the experience.

What if you could take an experience like watching a hockey game, render it in real time in VR, add binaural audio, and throw in a few of your closest friends from around the world?

There's still some room to grow with the APIs though. I'm pretty sure there's plenty more Joy, Sorrow, Anger, Surprise and Headwear at this hockey game.
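
For instance, with the same Cloud Vision client, face detection already returns exactly those attributes per face; a hedged sketch, with a hypothetical crowd-photo file name:

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("hockey_crowd.jpg", "rb") as f:
        image = vision.Image(content=f.read())

    # Each detected face carries likelihoods for joy, sorrow, anger, surprise and headwear.
    for face in client.face_detection(image=image).face_annotations:
        print(face.joy_likelihood, face.sorrow_likelihood, face.anger_likelihood,
              face.surprise_likelihood, face.headwear_likelihood)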




https://blogs.unity3d.com/2016/07/07/adam-production-design-for-the-real-time-short-film/
https://unity3d.com/pages/adam

Assets here
https://blogs.unity3d.com/2016/11/01/adam-demo-executable-and-assets-released/

Tuesday, January 3, 2017

Deep Learning, 8-bit Dimensionality and Toys of the World

Google announced they are open sourcing the Embedding Projector, a tool for visualizing high-dimensional data and a hosted flavour of their deep learning tool TensorFlow.  Last year they announced TensorFlow itself would be open sourced.  Here's an overview of what TensorFlow is and how it may have been disappointing last year.



Partying like it was 1999, I was working with Microsoft Site Server, analyzing web logs in multiple dimensions for the largest telecom company in Canada.  Standard dimensions for a web log would include host, referrer, URI, date, time, status code, client IP, browser, and the mess that was cookies and session IDs.  One tool that sticks out for me was Microsoft Site Analyzer.  It was the coolest tool I had seen (out of Microsoft, at least) for visually understanding path analysis and links within a web site.

A huuuuge list of links to Neural network resources, published in 1999.

The interactivity of the tool and its bouncing, spiderweb physics really made it fun to use.  Though it didn't scale so well once we got up to 50,000 pages of content and 10 GB of web logs a day... and I think I was the only one actually using it.  Most of the e-marketers were just concerned with how many visits there were to their respective campaign pages, and less concerned with which page links to where or how people got to a page on the site.



Microsoft Data Analyzer was another tool for visualizing multiple dimensions; it didn't really take off the way the ProClaritys and Cognos BI tools of the world did.


A game-changing demonstration of interactivity, multi-dimensional data visualization, and time-series animation was the Gapminder bubble chart demo, which had its origins in Hans Rosling's TED Talks.  Hans conveyed an exciting story from the world's population and life-expectancy data like a horse race, showing the staggering growth of India and China across the centuries and the effects of war and famine on the world's countries.

Dollar Street, the latest tool from Gapminder, uses 30,000 photos from 240 families in 46 countries (excluding Canada), along with the stories of those families, their annual income, and the dimensionality of their lives, including what toys they have and how they brush their teeth.

Toys on different incomes

It is a great example of the power of data combined with multiple dimensions of visual imagery, personal stories, and heart-breaking reality.  The story, context and imagery behind the data points is much more important than just the numbers themselves or plots on a graph.

What does it mean to think in 4 dimensions?

For a more lighthearted example of a different way of visualizing 3-D data, here's Wolfenstein 1-D, Wolfenstein in a single pixel line.  Not as good as Snake... or this Deep Reinforcement Model which beats Snake.

In the '90s, John Carmack of id, Wolfenstein, and now Oculus Rift fame put together a side-scroller called Dangerous Dave, to clone Nintendo's Super Mario Bros. for the PC.  At the time it was amazing.

Now we are in 2017, and we have Nintendo releasing Super Mario Run, a mobile side scroller that does the side scrolling for you.  And it's amazing?  Nostalgia sells, just ask Pokemon Go players.

People tend to avoid eye contact when speaking, as their thoughts and verbal responses are more easily processed when not distracted by the visualization process.  This may explain the overwhelming feeling you may get when viewing something the brain cannot easily process, such as a multi-dimensional Virtual Reality video that shouldn't exist in your current reality space.  

It may also explain why it's still fun to ignore having to move a character in multiple dimensions and just tap a screen like a slot machine while hurtling uncontrollably to the right and jumping on Koopa Troopas and Goombas.

And why the 8bitworkshop Atari 2600 REPL is amazing.
Deep Learning with Atari

Monday, December 26, 2016

The Virtual Sword of Damocles

One "holy shit" arsword moment that stands out for me when seeing a technology hack in action over the last few years was the wii-mote being used as a multi-touch mouse for an augmented reality whiteboard.  Johnny Chung Lee posted a video demonstrating the technology and it really grabbed my attention.  Microsoft ended up hiring him in 2008 for their XBox Kinect and Applied Sciences team.  He was there until 2011 when he joined Google and is now working on Project Tango and 3D vision.

My company has hosted hackathon competitions for the last few years.  Back in early 2007, I immediately saw this type of mixed-space technology as a game-changer.  Not necessarily using a Wiimote to control a PC or draw objects in space on a wall; rather, the concept of multiple touch points inside a virtual or physical space, and of controlling that space with something more natural than a mouse or keyboard.

I told myself that this is what I want to learn more about and this is a space that will change the world.

My doppelganger thought the same thing in 1991 while a graduate student in the University of Maryland's Human-Computer Interaction Lab.



Most people use 1-10 fingers and a mouse to interact with a computer and can type anywhere from 1-75 words per minute.  The Wiimote added a "z-axis" of depth, a gyroscope, motion detection, and additional buttons to what had mainly been a flat interface experience.  There were even rumors of a microphone built into the Wiimote for voice calls and speech recognition.

Nintendo really dropped the ball on that one.  They could have changed the home market quite simply by offering SIP phone numbers on a Wii and the ability for the Wii to wake up on a call.  Mario Google Hangouts?  Zelda Snapchat?  Pokémon Phone?


Speaking is generally much faster than typing and conveys many different nonverbal signals.  The Micro Machines guy could sing Michael Jackson's Bad in 20 seconds.  However, it still feels a bit odd and clumsy to me to say "Xbox, play Lego Batman" or to speak to a device while by myself or in a crowded room.

While doing analytics for a cable company, I remember a story one of the engineers told me about how to get better service from a voice-recognition IVR calling system.  If the system detected a swear word, it would immediately route you to a CSR in customer retention.  I'm not sure if that was true or not, but it conveys the idea that swearing is a strong signal of positive or negative feelings.  Google even detects this and filters the seven dirty words.  I get tired of listening to reality TV shows; prime-time cable television is now little more than a bunch of beep censors.

In 2008 I was offered the Microsoft Kinect among some other Christmas present options from my company.  I chose the Kinect even though I didn't own an Xbox, and set to work hacking it on Windows 7.  I built some cool apps with C#, Python, OpenCV and OpenNI, speech recognition, and physics libraries.  The body-tracking features, skeleton detection, depth camera, voice recognition, and immersive surround microphones of the Microsoft Kinect made it stand out as one of the most advanced consumer tech products of its time.

The only quirk at the time was that you had to hold your hands up in the air like you were under arrest for the Kinect to recognize you.



How many thoughts per second can you capture as output to a device? 
 
I was one of three people who purchased the now-defunct OCZ Neural Impulse Actuator and attempted to get it to translate my thoughts to the screen.  The developers were kind enough to send me a couple of additional headbands, which still didn't fix the grounding issue I was having.

The OCZ had sensors that detected facial movements, eye tracking, and even blinking.  You blink about 15 times a minute, and the act of blinking can be transmitted faster than a keystroke.  The latency of a blink is 300-400 milliseconds, and a blink is faster than a mouse click.  It was a bit tiring playing Unreal Tournament with the OCZ and using your eyes as the fire button.

One of the reasons I think the OCZ NIA failed as a product, other than OCZ's financial shenanigans and the grounding issues, was the fact that many people feel very uncomfortable about something that can "read their thoughts".  The creepiness factor is high with these kinds of devices.

In 2009 I bought a home projector and screen, which saw a lot of movie time and also some fun hacks with flight sims, the wii-mote, my new Xbox and the Kinect.

In 2010, I won a first-gen iPad and hacked it to work with Synergy, using my laptop to control my desktop PC, iPad and Android phone.  Being able to seamlessly control multiple devices, and move between them with the same interfaces is a really amazing productivity tool.

Fast-forward to 2016.  VR is everywhere.  Virtual reality, augmented reality, simulated reality, and various flavours of the tech have really come into their own since then.  Over the last few years, Microsoft Kinect, Unity, Surface, and now HoloLens and Cortana Analytics have changed the way people build and interact with apps.

The realization that hackable culture is now a commodity hit me when I walked into Best Buy just before Christmas this year.  In the toy section they had the Makey Makey I purchased a few years ago as a Kickstarter.  There was a brain-sensing headband for sale for mindful meditation.  A lady saw me looking at smart watches and asked if they had one her boyfriend could use to call people.  The Samsung Gear 360 was on sale, to capture the entire virtual reality experience of looking around and moving through a space.

Christmas Day, 2016.  I bought a cheap $10 Google Cardboard clone as a stocking stuffer, and everyone I showed it to was blown away.  It was funny because my 4-year-old kept trying to touch the objects in the play space with his hands, like with the Kinect; yet Cardboard is really just paper with a magnetic button, a couple of cheap lenses, and some very complex software.

In 2017, I hope to see the technology evolve to the point where it will be possible to have a seamless transition from PC to Mobile to Virtual Reality / Augmented Reality device.



Why can't you point at something on your screen and drag it out into the real world, or point to something in the real world and immediately have a complete copy, including sound and at-scale dimensions, on a screen?  Magic trick or not, it is happening.  Synergy of the real world.

My kids are addicted to the game Skylanders.  As Portal Masters, they can place their plastic figures with encrypted NFC tags on a USB Portal connected to the Xbox and watch them appear on the screen.  If a Skylander villain is trapped, the tinny-sounding speaker on the Portal kicks in to make it sound as though the character is actually being sucked out of your television into a piece of plastic in your living room.

Games like Skylanders, Lego Dimensions, and Disney Infinity are good examples of merchandising that turns a single-dimensional play experience into a tangible reality.

This year the grandparents will be getting plastic clones of their grandchildren, and everyone will become a Skylander.
http://3dvrcentral.com/2016/12/19/sky-kids-coming-to-your-home-1000-winners-augmented-reality-3d-fun-turning-into-toys-pt-4/

This blog will track my links, ideas, and innovation within AR, VR, Unity, Machine Learning and Natural User Interfaces, starting from the genius and simplicity of Cardboard and The Sword of Damocles.

Some topics to get started on.

Microsoft HoloLens.  HTC Vive.  SteamVR.  Leap Motion.  OSVR and Hacker Dev Kit HDK2.  PlayCanvas.  Sketchfab.  Visor.io.

Google Daydream.