Saturday, June 17, 2017

CI/CD and Second Order Test Concerns

Cisco has some reasonably mature media products (phone and video) built using the microservices approach with Continuous Integration/Continuous Delivery and plenty of automated testing. As our products matured, the nature of the challenges we faced changed: we ran into second order test effects. The first order effect of the tests is to test our production source code, catching bugs and increasing the production code's quality. The second order effect is the increasing overhead of designing, building, operating, modifying, cleaning up and eliminating automated tests. As the total number of tests increases, both the performance and the reliability of the tests become critical to your ability to turn the CI/CD crank on each new change. To make life interesting, we have a world of great techniques we use to improve our production code and apply almost none of them to our test code.
Cisco's agile process uses a fairly rigidly defined "definition of done" with a long list of requirements. It's a bit of a pain, but it did indeed yield code that had appropriate unit, sanity, regression, integration, feature, system, load, performance and soak tests. Code was always fairly modular due to a hard cyclomatic complexity requirement, and we used all the latest bug scanning tools and so forth. Coverage was kept high, and we got large benefits from the careful and frequent testing.
This allowed us to deliver changes and features much more quickly at first. We each built our handful of microservices and their little universes of tests, then added tests for the microservices we depended on. Every time new features are added, multiple new automated tests of various sorts are needed. As time passes and you grow features in an agile manner, you end up with dependencies on more and more microservices, and you only have to get burned a couple of times to realize you need to add tests that verify the features of the other microservices you rely on do indeed work. This leads to fuzzier lines of responsibility, reinvented test approaches without best practices, and hard-to-maintain tests. Communication across teams helps but is time consuming.
Every time a customer issue is fixed a regression test is added. Tests accumulate, and when a large organization is applying thousands of developers to building new interdependent microservices, the tests multiply at an amazing rate.
Like anything, writing good tests takes time to learn and master. Since the production code is the actual shipping item, much less time is spent revisiting tests, cleaning them up, making them modular and less complex. Get the code looking good, get the test working (it does not have to look good) and check it all in. This also means you're slower to master the test coding process - it's lower priority than the features, since features get your team those critical velocity points.
Given the requirements, maximizing velocity requires skimping on the tests and mostly leaving them in a moderately functional state, not the desirable well-tested and cleaned-up state that increases quality and maintainability. Production code coverage is checked; test code coverage itself is never looked at. Production code is measured for cyclomatic complexity and rejected if it isn't fairly simple, but that is not done with test code. No automated bug checkers for test code!
Over time you get some sweet microservices providing awesomely scalable, performant and reliable features in a manner that simply can't be done in an old school behemoth solution. The pattern works extremely well, but it also accumulates a huge amount of technical debt: the test code turns into a world of hurt. This is the most painful second order test concern of CI/CD systems that I've seen. Focusing on production code over test code gets increasingly expensive over time, especially as you scale up the number of contributors.
Just as we are mastering the architecture and the approach and delivering new features and bug fixes at a rapid pace, as our "velocity" starts peaking (boy, did Cisco go on about velocity), the tests accumulate huge amounts of poorly designed, monolithic, non-modular, error- and breakage-prone code.
Our CI/CD systems refuse to integrate if we fail the tests. The first wave of pain came when scale started increasing massively and performance (as expected) dropped a bit. All was comfortably within expectations, but a few tests would break due to poor design and timing dependencies. Occasional code submissions failed to go through because a test that had nothing to do with your code failed; having never seen that test code, you have no idea why. Rather than check carefully, you immediately rerun the test. If it passes this time - and given that it has nothing to do with your code - the temptation to ignore it is pretty much overwhelming: try best 2 of 3; if it passes, it's in! While this is an insidious practice, the nature of timing dependencies in tests is that they are intermittent. If a test fails too frequently then the team responsible for it will notice and fix it; if it always fails the team responsible will be found and told to fix it so that code can be promoted. This is the sort of situation that gets you to switch off tests so you can promote a change. If you find yourself switching off tests then you're probably not spending enough time maintaining your test code.
Now there are thousands of tests, and on top of the random test failures, the tests themselves start taking longer and longer, pretty quickly to an unacceptable degree. Time now has to be spent going back and sorting out tests to run very frequently vs. occasionally vs. rarely to get appropriate performance out of the different phases of the test suites without losing the coverage and quality benefits. Cisco's product managers weren't about to assign user stories to us for things they didn't care about and didn't feel responsible for, so the problem would fester until enough engineers on enough different teams were complaining about it that it finally percolated a few levels up and some VP had to step in and re-purpose efforts, assembling a team across the groups with the offending test suites to spend a week or two cleaning up. After the slow downward creep in velocity caused by the problem, velocities drop even further as teams change focus and lose members temporarily, and executives are unhappy.
Pretty soon the occasional build failure is a reliable build failure, sometimes with 2 or 3 random cases failing. Once again, no scrum team is in a position to address all of the issues; we haven't noticed our own intermittently failing tests blocking us (or if we do, we fix that one), we just get stuck by everyone else's blocking us. Note that this is an evil network effect: the bigger you are, the worse the problem any given level of unreliable tests will cause you, and it goes up faster than linearly, I'm pretty sure. At companies as large as Cisco this becomes a large concern.
Once again it waits for a VP to crack the whip, and teams get raided, and velocity again drops, and executives are again annoyed, and Cisco kicks off another round of layoffs. Not that the problems caused the layoffs, mind you, they were just a regular feature of Cisco life, but I digress.
The main simple rule of thumb I learned at Cisco doing CI/CD is that done right in a large and mature microservices cloud, you spend quite a bit more time coding and maintaining all the different test cases for all the different types of testing than you do coding the actual production code to be tested.

Wednesday, May 24, 2017

PhantomJS, CasperJS and Arrays of Functions

Scraping

I've been doing some scraping - writing apps that fetch HTML content using HTTP GET and the occasional POST.

I've found two reasonably nice solutions for making scrapers easily:

  1. Scrapy - a Python framework, optional Splash server available if full browser implementation (especially Javascript) is needed.
  2. CasperJS - a Javascript framework built on PhantomJS, a headless browser.


One recent accomplishment has to do with downloading files from a web site, but I'm under non-disclosure and can't talk about that. Dang.

That work was done in CasperJS, which has an interesting approach to defining and executing scrapers and spiders.

CasperJS

CasperJS handles PhantomJS "under the covers" and provides nice wrappers around important features like injecting Javascript code into the browser and waiting on DOM elements, not to mention inputting keystrokes and mouse clicks.

Functions Inside Functions

Using CasperJS, you create a Casper object then start it with an initial URL, which is requested using HTTP(S) via the PhantomJS headless browser.

Instead of directly coding the spider or scraper, you define a series of CasperJS steps using casper.then() (or inside of a casper object use this.then()). Each definition is a function:
casper.then(function doSomething() {
    this.wait(250);
});

These functions are added to an array of function definitions and are not immediately run. When you are done defining them, you call casper.run() and the functions will be invoked in order (maybe - see the bypass() function).

Functions frequently add new functions to the list, so you can be executing step 3 of 4, and when that step completes you are now executing step 4 of 7.

You can add logic that skips forward or backward through the array of functions, allowing loops and optional steps.

Most everything is asynchronous, which can bite you. If you code this.wait(500) and then another this.wait(500) in the same step, they both run asynchronously after the last active bit of that step completes, and they finish at the same time. They do not add additional delay to each other if they are in the same .then().

The approach of adding functions everywhere for everything can lead to an accumulation of anonymous functions. That is actually a bad idea, since the debug/log mechanisms available will report the function names being processed - if they exist. It's best to give a unique name to each and every function:
this.then(function check4errors() {
    var errorsFound = false;
    if (verbose) {
        this.echo('Check for errors');
    }
    // ... the error checks that set errorsFound go here ...
});


Be careful, though. There are also tight requirements around the casper.waitFor()/this.waitFor() and casper.waitUntil()/this.waitUntil() methods provided by CasperJS. The successful case has to be named function then() and the timeout case has to be named onTimeout() or things simply do not work. Here's an example of correct coding within a CasperJS object, so "this" is used rather than casper:
this.waitFor(function check() {
    return this.evaluate(function rptcheck() {
        // querySelectorAll() returns a NodeList, so grab the element itself
        // before poking at its onclick handler
        var genRptBtn = document.querySelector('#_idJsp1\\:data\\:0\\:genRpt');
        return ((document.querySelectorAll('#reports\\:genRpt').length > 0) &&
                (genRptBtn !== null) && (typeof genRptBtn.onclick === 'function'));
    });
}, function then() {
    if (verbose) {
        this.echo('Found report generation button.');
        this.capture('ButtonFound.png');
    }
    this.wait(100);
}, function onTimeout() {
    this.echo('Timed out waiting for report generation button.');
    this.capture('NoButtonFound.png');
}, 20000);

Pluses and Minuses

CasperJS/PhantomJS are different from most NodeJS apps, so integrating them (other than via command line execution) is complicated. Writing NodeJS wrappers that launch CasperJS scrapers from the command line is straightforward, though. This is how I solved the challenges that I can't talk about due to non-disclosure.

Inability to easily mix NodeJS and CasperJS code is a minus, but not horrendous. The ease of injecting JS into the browser is a plus, and the consistent JS language for both our code and the injected code has benefits too. Linting tools work well with the CasperJS code, enforcing correct coding in the CasperJS and browser injected code at the same time.

Scrapy Splash

Scrapy is nice, and the Scrapy/Splash combination looks like the best bet for truly large scale approaches. The tools allow you to run an extremely stateful back end process on a stateless server, with the ScrapySplash adaptation layer handling the tricky state management bits for you. I got this working, but only to a limited degree. It looks like the best solution for a large scale professional approach, but it does have a higher barrier to entry with the separate back-end server and the adaptation layer on top of everything else.
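
For flavor, here's roughly what a minimal Scrapy spider using Splash rendering looks like - just a sketch, with a made-up spider name and URL rather than anything I actually scraped, and it assumes a Splash server is running and pointed at by SPLASH_URL in settings.py:

import scrapy
from scrapy_splash import SplashRequest  # pip install scrapy-splash

class ExampleSpider(scrapy.Spider):
    # Hypothetical spider: the name and start URL are placeholders
    name = 'example'

    def start_requests(self):
        # Splash renders the page (including Javascript) before parse() runs
        yield SplashRequest('https://example.com/', self.parse,
                            args={'wait': 1.0})

    def parse(self, response):
        # response now holds the rendered DOM, not just the raw HTML
        yield {'title': response.css('title::text').extract_first()}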

I ended up not using this approach, so for now my opinions on Scrapy and ScrapySplash are not well informed. If I ever get a chance to use it for real I'll revisit this article.

Friday, May 19, 2017

Really Tiny Embedded Code

I did a couple of generations of control systems based on 8051 derivatives. These had a bit of ROM - 8K, usually - but only 128 bytes of RAM. The stack used that 128 bytes, and so did your variables. Pretty darn tight fit for interrupt driven code, which needs to use some stack.

I did the If VI Were IX guitar robots on the 8051. It handled MIDI input over the serial port in interrupts, and it had a timer-driven interrupt that we used both to update our servo loop and to derive the pulse chain used to control position on the pluckers - the shaft with a plastic pick or nib that plucks the string when rotated properly.

We put 2 on each shaft - as I said, a pick and a nib - and added support for a MIDI mode switch to select between the two. Based on the requested velocity in MIDI note-on commands received from the serial port, we would set the PWM parameters higher (mostly on) for high velocities and lower (mostly off) for low velocities. To make the PWM average out and not be too jerky and noisy, we needed to update it as fast as we possibly could. Bear in mind that our 8051 used a 4 MHz clock, and many instructions took more than 4 cycles, so we got less than 1 million instructions per second. Not much power for handling real time updates and asynchronous serial input while playing a guitar in real time.
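
To make the idea concrete, here's a rough sketch of the duty cycle logic in Python (the real thing was 8051 code driven by the timer interrupt; the tick count and names here are made up for illustration):

PWM_PERIOD_TICKS = 32  # assumed ticks per PWM period, purely illustrative

def duty_from_velocity(velocity):
    """Map a MIDI velocity (0-127) to the number of 'on' ticks per period."""
    return (velocity * PWM_PERIOD_TICKS) // 127

def pwm_tick(state):
    """What the timer interrupt decided each tick: count through the period
    and report whether the output should be driven during this tick."""
    state['phase'] = (state['phase'] + 1) % PWM_PERIOD_TICKS
    return state['phase'] < state['on_ticks']  # True means drive the plucker

# A loud note is mostly on, a quiet one mostly off
state = {'phase': 0, 'on_ticks': duty_from_velocity(100)}
print(sum(pwm_tick(state) for _ in range(PWM_PERIOD_TICKS)), 'of',
      PWM_PERIOD_TICKS, 'ticks on')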

(Old man ranting at lazy kids mode)
Micro-controller chips today usually have hardware PWM circuits, so we can just load a duty cycle number and provide a really fast MHz+ digital clock and we get a great PWM. Luxury! The 8051 I was using had no PWM hardware, so we implemented it in software using interrupts. Messier, less smooth, lots more code and a few variables in a system that had little room for either. We couldn't even get 1M instructions/sec.

Micro-controllers today also have more RAM - either built in or external, they just don't make them with 128 bytes much any more. Luxury! (You're supposed to hear the Monty Python bit about having to walk to school uphill both ways, not like the lazy kids today; it doesn't come across well in text.) Clocks on modern micro-controllers often run at hundreds of megahertz, and some reach into the gigahertz - hundreds to a thousand times faster than that 8051 - and they are 32 bits wide too, so each instruction handles four times as much data as the old 8051s could.

So we had all of our local variables - assorted modes (damper tied to notes, or off to allow hammer-on and hammer-off, or on for a damped sound, etc.), state (plucking has several states, and we need requested and expected positions in order to include the error component in the feedback loop), limit details, channel details, note range details, and more. We also had to have enough left over in the 128 bytes to allow for the registers to be stored during an interrupt (MIDI IO), with enough room for an additional stack frame for an overlapping timer interrupt (servo and position updating).

We managed to squeeze it all in and it works fine. It helps that registers are only 8 bits and there aren't many of them, and the program counter (pushed onto the stack as the return address) is small too - not all that much needs to be pushed on the stack. The upside of little room is that you simply can't have code bloat and variables must be as simple as possible. The result is small enough that you can realistically squeeze all of the bugs out.

The If VI Were IX installation has never crashed due to software fault, and has outlived every single moving part in the system - MIDI sources had to be replaced with a solid state source, pluckers and strings replaced, yet the 8051 micro-controllers are still fine a decade later.

If I was doing this over again from scratch today, I'd probably base it on a Raspberry Pi system with gigabytes of memory and a flash hard drive with tens of gigabytes more. Luxury!

In My Day We Had to Grind Square Roots Out A Bit At A Time - If We Were Smart

I'm old for my industry, getting into my late fifties. I also started very young, with my first paying tech gig in 1976. I have programmed computers by throwing switches and liked using punched paper tapes because that was a step up.

My first paying gig was a square root. I met Chuck through Paul, who lived down the block from him.
Chuck was, like me, a bit of a math prodigy. He was working on a Z8000 based motion control system, and was having problems with the square root calculations needed to figure out distances and how fast to go on each axis. You know, square each individual component or axis, add them, and take the square root. The result is the total distance of the motion as a vector on those axes. Given the desired speed you can now work out how fast each individual axis should go to achieve the correct overall speed.
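
In Python rather than Z8000 assembly (and with a function name of my own invention), that calculation looks something like this:

import math

def axis_feed_rates(deltas, speed):
    """Per-axis speeds for a straight-line move at the requested total speed.

    deltas -- distance to travel on each axis, e.g. [dx, dy, dz]
    speed  -- desired speed along the combined motion vector
    """
    distance = math.sqrt(sum(d * d for d in deltas))
    if distance == 0:
        return [0.0 for _ in deltas]
    return [speed * d / distance for d in deltas]

print(axis_feed_rates([3.0, 4.0], 10.0))  # [6.0, 8.0]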

They were using the method of Newton, and it was painfully slow. Too slow. To do motion control in real time, you have to "get ahead" a bit. You start calculating when to take steps on each axis, but without doing it yet. Instead you build a queue and store up a fair number of steps before starting the first one. This allows you to have slower portions (setting up for the next motion) as long as they are fairly short and the longer bit is faster, using the queue to provide the data we are too slow to calculate. How slow you are in the worst case directly determines how big that queue needs to be. Memory wasn't what it is now; if we were lucky we had 64K bytes for everything. Nowadays that's not even big enough for a stack. That meant the queues were quite a bit smaller still.

The method of Newton: given an integer, determine its square root by starting with an estimate (int/2, for example) and repeatedly:
    Divide the integer by the latest estimate
    Average the estimate and the division result: new estimate = (latest estimate + division result)/2
    Repeat with the new estimate

This will converge on the correct result. Slowly. For example, square root of 2 has these estimates:
1
1.5
1.333
1.41666
1.411764...

The fifth value listed is still off from the actual square root of 2 by a couple of parts in a thousand.
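
Here's the method in Python - a sketch only, since the real code was fixed-point Z8000 assembly - and note that every pass through the loop costs a divide:

def newton_sqrt(n, passes=5):
    """Newton's method for the square root of n, as described above."""
    estimate = n / 2.0 if n > 1 else 1.0
    for _ in range(passes):
        estimate = (estimate + n / estimate) / 2.0  # one divide per pass
    return estimate

print(newton_sqrt(2))  # creeps toward 1.41421356...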

Each round requires a divide, which is the slowest instruction on the Z8000. Really slow for the 64 bit divided by 32 bit numbers we were dealing with. The biggest queue they could make would empty after 2 or 3 fairly short motions, used up by the divides in the square root calculations. The solution didn't work.

Chuck mentioned the difficulties to me and I got curious. Six years prior in grade school they had taught us how to take square roots manually and I sat down and worked that out, then figured out how to do it in binary. It turned out to be much simpler in binary.

To generate each digit (bit) you select the largest digit that, when multiplied by the square root result so far, is less than the input value. There's a doubling step in there too.

Doubling is just a bit shift, the fastest thing the Z8000 does. In binary the "largest digit" is always 1, which makes largest * sofar the same as sofar, so the whole multiply needed in base 10 drops out in binary, completely eliminated. It all reduces to a simple compare: if sofar is too big then this bit is not set and sofar = sofar * 2 (again just a shift, adding a new 0 bit in the least significant position), else the bit is set and sofar = sofar * 2 (another bit shift) + 1.

So we do 2 shifts & a compare for each 0 bit in the result and 2 shifts plus a compare and an increment for each 1 bit in the result. This is already faster than a single divide. Since the short motions are the most challenging, having less time to build up the queue with the simple linear timing generation algorithm, I optimized the algorithm for small distances. If the top 32 bits are 0 then we only need to do a 32 into 16 bit square root, taking half the time. For the 32/16 bit case, if the top 16 bits are 0 it turns into a square root of a 16 bit number, twice as fast again. Optimized to the byte, the shortest motions end up needing at most 8 cycles through our extremely fast 2 shift plus maybe 1 increment loop. This was screamingly fast, and immediately made the real time system work. The queues were more than long enough even for short motions. We were able to reduce the size of the queues, freeing up memory for the part program that the machine would cut and for code to be added when we added features.
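
For the curious, here is one common formulation of the bit-at-a-time integer square root, written in Python as a sketch of the technique rather than the original Z8000 code (which was also specialized for 16, 32 and 64 bit inputs):

def isqrt_bitwise(n):
    """Integer square root built from shifts, adds and compares only.

    One result bit falls out per pass through the loop - no divides,
    which is what made this so much faster than Newton's method here.
    """
    if n < 0:
        raise ValueError('need a non-negative integer')
    root = 0
    bit = 1 << (max(n.bit_length() - 1, 0) & ~1)  # highest power of 4 <= n
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit  # this result bit is a 1
        else:
            root >>= 1                # this result bit is a 0
        bit >>= 2
    return root

print(isqrt_bitwise(3 * 3 + 4 * 4))  # 5, the length of a (3, 4) move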

They paid me $1,000 for solving their problem. I was working a near minimum wage job at the Neptune Theater for $1.45 an hour or something like that. This was more money than I made in 3 months.

This experience inclined me to go into the field I'm still in, software engineering and related technical specialties.

That gives me 41 years as a software engineer as of 2017, there can't be all that many around with more. The industry in the seventies was tiny by modern standards, and most of the engineers at that time were electrical engineers or other technical sorts who had switched over to meet the demand for the new skills, so they were already a decade into their careers for the most part. Those folk are in their seventies now and the few who did not leave the industry over the decades have largely retired now.

If I can make it to my retirement in the industry in a bit over another decade, I'll probably be one of the most experienced software engineers on the planet. I wonder how many folk who started before the mid seventies are still in the industry? Where's the leader-board when you need it?

Non-disclosure

I worked recently on a project where I got to build a new set of services from scratch and deploy them to the cloud. Since the services are intended to be used with smart phone apps, we ended up hosting on Google's cloud, GCE, which has quite a few useful tools for supporting smartphone apps, like Firebase and nicely integrated credentials and authorization management.

Unfortunately, since this is a commercial product that is likely to face competition, or at least inspire competition once it is released, I'm under non-disclosure. I'm not allowed to say much about what the services I designed, built and deployed actually do.

Once the product is actually released I'll be able to blog about it, but for now any blogs on the topic (like this one) can't talk about much and therefore end up being pretty short.

Guitar Robot Revisited

One of the cooler projects I've ever done was for Trimpin's "If VI Were IX" robot guitar installation at the EMP, which is now called the Museum of Pop Culture, or MoPOP.

Photo is by Thomas Upton, who was nice enough to license it under a Creative Commons license. A higher resolution version from Mr. Upton is available on Flickr.

The giant whirlwind-looking collection of instruments in MoPOP shown in the picture above is actually a guitar robot. You can see various guitar-ish looking items with rows of devices lined up on either side of the guitar fret board. These are the guitar robot elements. A little above the center, slightly to the left, the purple and blue-ish guitars are both Trimpin's custom-built guitar robot elements.

Each element has a single string, a "plucker," and "frettles" which are the devices lined up along the fretboard. These are solenoids and when you apply voltage to their coils they pull the solenoid closed, which pushes a pad down on the guitar string, one device per fret.

Each string has a controller - well, two for now, one to control frettles and one to control the plucker. A MIDI input is routed to all of the controllers, and groups of 6 pluckers and their associated frettles are assigned to a single MIDI channel as a logical guitar. The mapping from MIDI notes to guitar frets is odd since we skip octaves between strings so that each fret on each string is a unique note. This means that the MIDI played on the robot guitar has to be processed to put notes into artificial octaves (incorrect musically) so that each note can only be played on one specific string, making the control easier to implement.
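
As a purely illustrative sketch (the real note map is Trimpin's, and the numbers here are invented), the idea is that each string owns its own block of note numbers, so a pre-processed MIDI note picks exactly one string and fret:

FRETS_PER_STRING = 12   # assumed for the sketch
BASE_NOTE = 36          # assumed lowest mapped MIDI note, also made up

def note_to_string_and_fret(note):
    """Map a pre-processed MIDI note to a unique (string, fret) pair."""
    offset = note - BASE_NOTE
    if offset < 0 or offset >= 6 * FRETS_PER_STRING:
        return None          # outside the robot's range
    string, fret = divmod(offset, FRETS_PER_STRING)
    return string, fret      # fret 0 would be the open string

print(note_to_string_and_fret(50))  # (1, 2) with these made-up numbers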

Trimpin made a second generation version for his own use that was simplified and improved. The plucker controller now also controls the frettles for the string it plucks. The quality of the sound degrades as you move up the frets - by the 7th it's tinnier, by the 12th fret it sounds lousy - so in the second version Trimpin only provides 5 frets per string, and adds more strings to handle multiple notes simultaneously.

We also added a damper - a soft pad that rests on the strings, damping them so that a plucked note ends very quickly. When a note goes on, we lift the damper for that string, and when we get the note off command we drop the damper back onto the string. If a note-on is followed by a different note-on for the same string, we switch frettles and keep the damper up.

We modified our response to MIDI note-on commands so that low velocity notes (below 12% of full velocity) do not actually use the plucker; they just keep the damper up and deploy the correct frettle. This allows for hammer-on and hammer-off techniques - playing notes by hitting frets with the left hand, rather than using the right hand to pluck.
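
Roughly, sketched in Python rather than the 8051 code, and with hypothetical stand-ins for the real hardware routines:

def lift_damper(s):       print('damper up on string', s)
def drop_damper(s):       print('damper down on string', s)
def press_frettle(s, f):  print('frettle', f, 'pressed on string', s)
def pluck(s, v):          print('pluck string', s, 'at velocity', v)

HAMMER_THRESHOLD = int(0.12 * 127)   # "below 12% of full velocity"

def handle_note_on(string, fret, velocity):
    """Sketch of the note-on handling described above."""
    lift_damper(string)               # let the string ring
    press_frettle(string, fret)       # select the pitch
    if velocity >= HAMMER_THRESHOLD:
        pluck(string, velocity)       # normal plucked note
    # below the threshold: hammer-on/off style, the frettle alone sounds the note

def handle_note_off(string):
    drop_damper(string)               # end the note quickly

handle_note_on(2, 3, 100)   # plucked note
handle_note_on(2, 5, 10)    # hammer-on: frettle moves, no pluck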

We also added a mode where the damper lifting is switched off, so the strings stay damped the whole time. This allows for a different sounding result - notes are brief and a little muffled.

Using MIDI commands, the new generation control has a wider variety of options and sounds available. Trimpin has used the second generation guitar robot control in live performances from time to time, but I don't think there's a permanent installation using them yet.

We're hoping that the EMP will want to replace the current first generation controls with a 2nd generation control (or by then probably a 3rd generation since we'll most likely add a few more tweaks). We've looked into the devices needed - our original microcontrollers are no longer available new, and using surplus used devices is dicey at best, so we'll be changing microcontrollers and tool sets if we get to do this work.

I hope we get to do it; I enjoy working on robots, and a robotic guitar is one of the more interesting robots I've ever worked on.

Sunday, February 19, 2017

Mini Data

Heard of "big data?" Billions of records, operations so complex and huge that fleets of computers work on them?
Yeah, I'm not doing that. I found an interesting source of useful data to mine, and it's got a bit over 100K records. Not billions, no multiple sources to cross tabulate, just a linear "fetch A, use it as a key for B" process. Not quite trivial, but pretty direct. I'm afraid I couldn't even call it medium sized data; it's small enough that I'm just stuffing the results into a mySQL database I brought up for that purpose on my laptop. The horror! Not a pool of DB machines, not even a separate DB, just another process grinding away on the laptop. The overhead is so small that I don't particularly notice.

I'll christen this mini-data: not quite as small scale and non-uniform as micro-data, but not big at all. Micro-data is the bottom up stuff, the Internet-of-Everything background noise when every minor device is chattering away in some odd dialect out at the edge of the network mesh, in our kitchens and thermostats and cars and factories. Unlike micro-data, mini-data is still fairly regular and uniform. Mini-data is from a single or small number of sources, so it's much easier to work with.

Much easier to work with is a relative term, of course. I got the import running, then a bit past 1,000 records it spewed chunks. Unicode characters I hadn't prepared to handle properly. Doh! Fixed that (brutally ugly bit of a fix, but hey! it's a one time process!). Somewhere over 2,000: crunch. Oddly formatted data led to a missing array element and another run ended by an exception. Did I mention it's a one off, so it's under-engineered? Like missing error checking and handling, so all errors are usually catastrophic the first time.

I guess this is what you call "testing the quality in," and by the time I get my data gathered and organized from top to bottom I will have squeezed out all of the bugs that apply to my current data set. The approach is quite fragile and not re-usable, but that appears to be the appropriate approach, for now. Error handling will inevitably improve as I go from a fatal failure every thousand records to every 10,000, then 100,000, and finally none for the whole data set. At that point I've achieved my goal and won't run the process again. Until the inevitable source data update, which for this source often includes major formatting changes, so the code is possibly not reusable when I next need to repeat this operation.

Ask me what I think of the trade-off in a month or two, and again after the source data updates. Meanwhile, I'm enjoying my work in mini-data. Hopefully it will be as useful as we think it can be!

Saturday, February 18, 2017

Creaky or Cranky Code

Sometimes you build a code Taj Mahal, or at least try to. A thing of rare beauty, of soaring architecture and fine attention to detail, the combined efforts of many over time.

This is not about that.

Sometimes you're a bit off the beaten path - you're exploring new technologies by doing simple tasks, and that is not going to look pretty. Building huts of sticks comes first. It shows you can at least do something with the available materials, but the first few cuts are often a bit unstable and poorly designed.

We have an embarrassment of riches available for free - well, as long as you have a computer and can pay for internet access, anyway. The barrier is much lower than it once was, and keeps getting lower.

I've been playing with AngularJS and needed to get some REST data as a client. AngularJS does that well, so I made a simple app and added a service to supply the REST data via promises. I copied large portions of the solution from assorted "How Do I...?" posts on the internet with good answers. In no time it was working, and I added some controls and UI embellishments easily. That's also something AngularJS does well.

AngularJS does have its limitations. For local file access and more complex tasks without obvious solutions to crib, it can be a slog.

Luckily, up my other sleeve I've got Python to cover the cases not easily handled via an MVC interface in the browser, plus access to backends. AngularJS isn't good at everything! Python, it seems, is good at (or at least capable of) most everything.

Python is great for cases like this recent one. I needed to grab some data from a REST API with a data rate cap, with assorted further processing needed before shipping the data sliced various ways to an SQL database. Using Python allows me to solve all the issues directly in one script.

Python will handle pulling data from a REST API easily enough, and it's one of the more pleasant languages when it comes to getting things written to disk. It understands XML easily enough, and interfaces to SQL are simple enough. A script or two to create a database and tables and an import or two in the main script and we're off.

I was able to mine logged streaming data for input keys used to request records from a different but related REST API. Handling the REST client details and getting all of the SQL tables updated properly for each data point turns out to be easy to do in Python - I got it running in an afternoon.
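
The happy-path version really is that short. Something in this spirit, with the endpoint, table and key names invented for the example (the real ones are under wraps), and requests and pymysql standing in as assumed libraries:

import time
import requests   # assumed HTTP client
import pymysql    # assumed MySQL driver

API_URL = 'https://api.example.com/records/{}'   # placeholder endpoint
REQUEST_INTERVAL = 1.0                           # seconds, to respect the rate cap

def fetch_and_store(keys, connection):
    """Fetch one record per key and file it into the database."""
    with connection.cursor() as cursor:
        for key in keys:
            response = requests.get(API_URL.format(key), timeout=30)
            response.raise_for_status()
            record = response.json()
            cursor.execute(
                'INSERT INTO records (record_key, name, detail) VALUES (%s, %s, %s)',
                (key, record.get('name'), record.get('detail')))
            time.sleep(REQUEST_INTERVAL)         # crude rate limiting
    connection.commit()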

Then the cranky part came in. Rate limiting made test runs slower, and my local mySQL implementation seemed to like to confuse the mySQL Workbench tool periodically, making tests even more tedious as I frequently exited and restarted utilities.

Just when I think I have it working, past 1,000 records into a 50K record run, things looking good, bang! An exception. The first of many. Characters in JSON results that can't be translated to the local code page (exception!), remote requests failing in various interesting ways, most triggering exceptions because it hadn't occurred to me that they might happen, so I certainly didn't prepare for them.

Your good idea that was oh so close to completion and looked like a nice reasonable tight bit of code - well, something happened when we weren't looking. It's a creaky thing, prone to falling over at the first hint of trouble. There are cures for that, but they take time and attention and may have their own issues.

Your code accumulates try/except blocks all over, parameters are checked, text describing important features is added to the comments, logging of results and branches appears, days and weeks go by and it mostly does the same thing, just more correctly. Code bloats up until you have to lose yet more time restructuring, then chasing down and correcting the inevitable bugs that introduces.

Even though the code does not look the same or as nice, with some careful refactoring it won't be too bad. Get some good unit test coverage and automated tests going, now hook up the CI/CD. Uh, where to hook up the CI/CD, well that's another topic.

Simple ideas don't stay that way if they get worked on, I suppose. The process of using AngularJS to get the first level of details recorded, then Python scripts to do the heavier API, data analysis and SQL output gets more complex over time, but I'm also gaining increasing value.

A bit more re-architecting and getting past a few one-time startup issues (i.e. initial data load with throttling) and I'll have a much more useful set of information, broken out nicely in tables, where it can be searched and sorted and used to drive other processes.

The next layer of intellectual property is to master how to manipulate that data to generate code, in this case for smartphone apps, that carries useful or important information.

I remember being told once that only an idiot, somebody who did not know what they were doing, used run-time code generation. He was wrong then, and still is. Plenty of smart people have joined the idiots doing run-time code generation, and it continues to be an appropriate way to solve various sorts of interesting problems.

This is simpler: back end analysis and code generation, not real time. The code thus generated in Java will be compiled into an Android phone app and apply the data to the user's benefit. This is data we gleaned from our complicated dance across the internet and an assortment of REST APIs, tricks and simple approaches used to aggregate and increase the usefulness of the results.
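
As a toy illustration of the kind of thing I mean - not the actual generator, and the class and data here are invented - a Python script can spit out a Java constant table from rows pulled out of the database:

def generate_java_lookup(class_name, entries):
    """Emit a small Java class holding a static lookup table built from data."""
    lines = [
        'import java.util.HashMap;',
        'import java.util.Map;',
        '',
        'public final class %s {' % class_name,
        '    public static final Map<String, Integer> TABLE = new HashMap<>();',
        '    static {',
    ]
    for key, value in entries:
        lines.append('        TABLE.put("%s", %d);' % (key, value))
    lines += ['    }', '}']
    return '\n'.join(lines)

# Pretend these rows came out of the mySQL tables described earlier
print(generate_java_lookup('GeneratedLookup', [('alpha', 1), ('beta', 2)]))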

I enjoy the complex mental work, figuring out where something that can be leveraged into something useful is available without too much effort (and for free is always nicest). I can see how the IP will fit together before I've written a line of it. It will change a bit as I continue - next time I'll lead with Python, probably, it's just a bit more suited to ad-hoc on the fly API ingestion via code and reverse engineering - but the phases are all there, just the flavor of the mouthwash used in this step might change, or a different brush there...

Now I'm down in the guts of the architecture, across the first chasm and running the creaky/cranky engine that will cross the next one, getting me useful data properly organized in mySQL.

The final step is to flesh out a specific task that could use this data, write the code and design the data object (a map of something, basically) then write code to generate the specific implementation needed based on the data and the use case we're trying to solve. This would be easier to describe if it were public so I could explain exactly what is going on. The details are trade secrets for now. Maybe later, we'll see.

I plan on doing this final step multiple times, working out specific implementation forms then generating code for them based on insights from the data. The data can source an awful lot of different useful features if you can figure out how to extract the needed details and turn it into code. As I said, I enjoy this sort of work and look forward to the challenge.

Wednesday, February 15, 2017

What Was That?

My first 3 blog posts here on Gunkholing Bottomfeeders were kinda not really blog posts.

The first post is from 2012. I was teaching myself Android programming and I had quite a few videos of performances at the Vera Project, where I volunteered frequently.

I decided to build an app that would play my videos of bands at the Vera Project on your phone.

I figured out how to make that work with an awful UI - buttons all over the place, each for a different video, and you paged through the buttons. I had lots of videos, already in the hundreds back in 2012, so it was interesting. At least to me. I also added some buttons to get to the Vera Project Donation page, and info and links to tickets for upcoming shows.

UI By Engineer (me), Ouch!

I got it working, coded and hosted the back end on Google's App Engine, which had the most usable free tier at the time. I got an Android Developer account for $25, the only part of this that was not free. Well, not free in terms of money, Android dev tools, the SDK and Eclipse are all free. The time I put in recording videos and posting them and writing the app wasn't free, but it was fun and I learned all kinds of useful things.

Useful things like how to program and publish an Android app; Vera Video has been available since 2012 (closing in on five years as I write this) and has been installed a bit over 5,700 times total.

To make the list of upcoming shows work I had to provide a back end with a master list and have the application check in once a day to get the latest list. I also added a bit that allowed me to add more Vera Project videos to the phone apps on the fly. I "versioned" the lists, and the phone app would report its version when asking for the latest list of shows. If the version was out of date, code on the backend would add a list of all the new videos and their details to the response along with the 5 upcoming shows, and the ugly UI would have even more pages of buttons that all show different Vera Project live music videos. I didn't maintain these for very long; manual updates to App Engine and Data Store are tedious and boring.

When I went to publish the application I found that I had to post a privacy statement on the internet to link to from the Google Play Store. Finally I have reached the point of the story; that took a bit. The first post from 2012 is the privacy statement I linked to from the Play Store.

VeraVideo is still on the Google Play Store, so you can check it out if you feel like it - just search for VeraVideo. I'm not sure if it even works anymore, but it's tiny and it shouldn't do any harm. Let me know in the comments if you tried it and whether it works.

The second post is less interesting technically and a bit bloody. I store photos of shows in Flickr and I wanted to blog about a particular photo that was mildly bloody.
EMP_BoB_Wk2--41
This was from a live performance and shows don't get bloody much anyway, even at punk or heavy metal shows. I noticed that Flickr allowed me to "post the photo to a blog" and I wanted to try that. I didn't want some funky one-off posting messing up my music blog, so I pointed it at the inactive (except for the privacy statement two years earlier) blog and tried it. The second blog post is the result. I ended up blogging on my music blog the way I always do, didn't find the flickr feature that useful. Good photo, anyway.

In the third post I introduce my vision and what I expect to blog about. It's more of an explanation and introduction to the blog than a blog post, to some degree.

That brings us to this, the fourth blog post and the first one that is really and completely a blog post. Hopefully this is a trend and from here on out there will be plenty of actual blog posts coming up. In any event, thanks for hanging in there and reading the whole blog post!

Tuesday, February 14, 2017

Vision

In most of my corporate jobs my employer makes a point of having a formal "vision" statement. Why not?

My vision for this blog is to write about whatever takes my fancy on subjects that are at least tangentially related to tech.

Like all old farts, I've got stories. I've been working in tech since the seventies so I've loaded programs via switches (punched card readers? Luxury!) and worked on systems where assembly language was the highest level (and only) language available.

I don't think I'll be all that consistent in the kinds of posts I make. Sometimes I'll tell old stories, sometimes I'll talk about what I'm working on, sometimes I'll talk about what I wish I was working on. Sometimes I'll talk about interesting tech notes or applications I've come across.

At least that's the plan.

Enough vision, hopefully I'll get enough inspiration and time to fulfill my "sometimes" predictions.