Friday, May 19, 2017

In My Day We Had to Grind Square Roots Out A Bit At A Time - If We Were Smart

I'm old for my industry, getting into my late fifties. I also started very young, with my first paying tech gig in 1976. I have programmed computers by throwing front-panel switches, and I liked punched paper tape because it was a step up.

My first paying gig was a square root. I met Chuck through Paul, who lived down the block from him.
Chuck was, like me, a bit of a math prodigy. He was working on a Z8000-based motion control system and having problems with the square root calculations needed to figure out distances and how fast to go on each axis. You know: square each individual component or axis, add them up, and take the square root. The result is the total length of the motion as a vector on those axes. Given the desired speed along that path, you can then work out how fast each individual axis should go.
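In modern terms the arithmetic looks something like this little C sketch (the floating point and the function names are mine, for illustration; the real thing was integer math):

    #include <math.h>

    /* Illustrative sketch only: given a move's length on each of three
     * axes and the desired speed along the path, compute the path length
     * and the speed each axis must run at. Assumes a nonzero move. */
    double plan_axis_speeds(const double delta[3], double path_speed,
                            double axis_speed[3])
    {
        double dist = sqrt(delta[0] * delta[0] +
                           delta[1] * delta[1] +
                           delta[2] * delta[2]);
        for (int i = 0; i < 3; i++)
            axis_speed[i] = path_speed * delta[i] / dist; /* proportional share */
        return dist;
    }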

They were using the method of Newton, and it was painfully slow. Too slow. To do motion control in real time, you have to "get ahead" a bit. You start calculating when to take steps on each axis, but you don't take them yet. Instead you build a queue and store up a fair number of steps before issuing the first one. That lets you survive slower stretches (setting up for the next motion) as long as they are fairly short and the rest runs faster, with the queue supplying the steps we are momentarily too slow to calculate. How slow you are in the worst case directly determines how big that queue needs to be. Memory wasn't what it is now; if we were lucky we had 64K bytes for everything. Nowadays that's not even big enough for a stack. That meant the queues had to be quite a bit smaller still.
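Here's a bare-bones sketch of the queue idea in C; QUEUE_SIZE, the step record, and all the names are invented for illustration (the real thing was Z8000 code and certainly didn't look like this):

    #include <stdint.h>

    /* Hypothetical step queue: the planner computes step timing ahead of
     * the machine, and a timer interrupt drains the queue at the real-time
     * rate. Worst-case planner slowness dictates how big QUEUE_SIZE must be. */
    #define QUEUE_SIZE 128

    typedef struct { uint8_t axis_bits; uint16_t delay; } step_event;

    static step_event queue[QUEUE_SIZE];
    static unsigned head, tail;            /* consumer and producer indices */

    static int queue_put(step_event e)     /* producer; 0 means full */
    {
        unsigned next = (tail + 1) % QUEUE_SIZE;
        if (next == head) return 0;        /* full: we are far enough ahead */
        queue[tail] = e;
        tail = next;
        return 1;
    }

    static int queue_get(step_event *e)    /* consumer; 0 means we fell behind */
    {
        if (head == tail) return 0;
        *e = queue[head];
        head = (head + 1) % QUEUE_SIZE;
        return 1;
    }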

The method of Newton: given an integer, determine its square root by starting with an estimate (int/2, for example) and repeatedly:
    Divide the integer by the latest estimate.
    Average the estimate and the division result: new estimate = (latest estimate + division result) / 2.
    Repeat with the new estimate.

This will converge on the correct result. Slowly, if your starting estimate is poor. For example, the square root of 2, starting from an estimate of 1:
    Round 1: 2 / 1 = 2, new estimate (1 + 2) / 2 = 1.5
    Round 2: 2 / 1.5 = 1.333..., new estimate (1.5 + 1.333...) / 2 = 1.41666...
    Round 3: 2 / 1.41666... = 1.411764..., new estimate 1.414215...

Three rounds gets within a couple parts per million here, but only because the starting estimate was good. Start from int/2 on a big integer and the early rounds do little more than halve the estimate, so a long input can burn dozens of rounds before the fast convergence kicks in.
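As a C sketch, the integer version of that loop is short; it stops once the estimates quit shrinking, which leaves the floor of the square root:

    #include <stdint.h>

    /* Newton's method for an integer square root: one divide per round. */
    uint32_t newton_isqrt(uint64_t n)
    {
        if (n < 2) return (uint32_t)n;
        uint64_t est = n / 2;                    /* crude starting estimate */
        for (;;) {
            uint64_t next = (est + n / est) / 2; /* divide, then average */
            if (next >= est) break;              /* stopped improving */
            est = next;
        }
        return (uint32_t)est;
    }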

Each round requires a divide, which is the slowest instruction on the Z8000, and really slow for the 64-bit by 32-bit divides we were dealing with. The biggest queue they could make would empty after two or three fairly short motions, eaten up by the divides in the square root calculations. That solution didn't work.

Chuck mentioned the difficulties to me and I got curious. Six years earlier, in grade school, they had taught us how to take square roots by hand. I sat down and worked that method out again, then figured out how to do it in binary. It turned out to be much simpler in binary.

To generate each digit of the result (each bit, in binary) you pick the largest digit d such that d times (d appended to double the result so far) still fits under the running remainder; subtract that product, bring down the next two digits of the input, and repeat. That doubling is the step to remember.
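A quick worked example: the square root of 729 by hand. Pair the digits as 7 29. The first result digit is the largest d with d * d <= 7, which is 2, leaving a remainder of 7 - 4 = 3. Bring down the next pair to get 329, double the result so far (2 becomes 4), and find the largest d where 4-with-d-appended times d still fits: 47 * 7 = 329 exactly. Remainder zero; the answer is 27.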

Doubling is just a bit shift, the fastest thing the Z8000 does. And in binary the "largest digit" is always 0 or 1, so the multiply the base 10 version needs drops out completely: multiplying by 1 just gives the candidate back, and multiplying by 0 gives nothing. The whole step reduces to a simple compare: if so far is too big, this bit of the result is not set and so far = so far * 2 (just a shift, pushing a new 0 into the least significant bit); otherwise the bit is set and so far = so far * 2 + 1 (another shift, plus an increment).
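Here's the textbook shift-and-compare version as a modern C sketch, a reconstruction of the idea rather than the original assembly:

    #include <stdint.h>

    /* Binary digit-by-digit square root: each pass consumes two bits of
     * the input and produces one bit of the result, using only shifts,
     * a compare, and an add or subtract. No multiplies, no divides. */
    uint32_t bit_isqrt(uint64_t x)
    {
        uint64_t result = 0;
        uint64_t bit = 1ULL << 62;       /* highest power of 4 in 64 bits */

        while (bit > x) bit >>= 2;       /* skip the all-zero top end */

        while (bit != 0) {
            if (x >= result + bit) {     /* does this result bit fit? */
                x -= result + bit;
                result = (result >> 1) + bit;  /* set the bit */
            } else {
                result >>= 1;            /* leave the bit clear */
            }
            bit >>= 2;
        }
        return (uint32_t)result;
    }

(Sanity check: bit_isqrt(152399025) comes back 12345.)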

So we do two shifts and a compare for each 0 bit of the result, and two shifts, a compare, and an increment for each 1 bit. That is already faster than a single divide.

Since the short motions are the most challenging, having less time to build up the queue with the simple linear timing generation algorithm, I optimized the algorithm for small distances. If the top 32 bits of the input are 0, we only need a 32-bit-to-16-bit square root, taking half the time. In the 32/16-bit case, if the top 16 bits are 0 it turns into the square root of a 16-bit number, twice as fast again. Optimized down to the byte, the shortest motions end up needing at most 8 cycles through our extremely fast two-shifts-plus-maybe-an-increment loop. This was screamingly fast, and it immediately made the real-time system work. The queues were more than long enough even for short motions. In fact, we were able to shrink the queues, freeing up memory for the part program the machine would cut and for code as we added features.
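The width dispatch can be sketched the same way; the cutoffs below are hypothetical, just the idea from the paragraph above:

    #include <stdint.h>

    /* Hypothetical width dispatch: start 'bit' no higher than the input
     * needs, so a one-byte distance never pays for a 64-bit square root.
     * Use the result in place of bit_isqrt's skip loop above; a slightly
     * high start is harmless there, just a few wasted shifts. */
    static uint64_t starting_bit(uint64_t x)
    {
        if (x >> 32) return 1ULL << 62;  /* full 64-bit input */
        if (x >> 16) return 1ULL << 30;  /* fits in 32 bits */
        if (x >> 8)  return 1ULL << 14;  /* fits in 16 bits */
        return 1ULL << 6;                /* fits in a byte */
    }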

They paid me $1,000 for solving their problem. I was working a near minimum wage job at the Neptune Theater for $1.45 an hour or something like that. This was more money than I made in 3 months.

This experience inclined me to go into the field I'm still in, software engineering and related technical specialties.

That gives me 41 years as a software engineer as of 2017; there can't be all that many around with more. The industry in the seventies was tiny by modern standards, and most of the engineers at that time were electrical engineers or other technical sorts who had switched over to meet the demand for the new skills, so for the most part they were already a decade into their careers. Those folk are in their seventies now, and the few who did not leave the industry over the decades have largely retired.

If I can make it to retirement in this industry, a bit over another decade from now, I'll probably be one of the most experienced software engineers on the planet. I wonder how many folk who started before the mid-seventies are still in the industry? Where's the leaderboard when you need it?
