[personal profile] ralphmelton
Here's my opportunity to reveal trade secrets:

On Thursday or Friday of last week, Steve discovered a nasty Heisenbug. The release version of our code was sometimes giving different results from run to run, with no changes to input data, execution parameters, or executable. (The debug version showed no discrepancies.)

Any computer jock will understand that this gave us the heebie-jeebies. This is the worst sort of bug to track down.

Steve, though, is a big QA stud. Over the weekend, he managed to trim down a massive input set to a single document that fairly regularly produced varying output. We owe a lot to him; without his narrowing down the problem like that, I would probably have just thrown up my hands in defeat.

Steve had also found a discrepancy in the tracer output that corresponded to the discrepancy in the results. (Steve had gotten very familiar with this bug, so familiar that he had named the bug "Werner".)

That was where things stood when I came into work today. I looked at the discrepant tracer output. It was just an indicator of problems elsewhere, not direct evidence of the problem. But it gave me a basis for putting trace statements elsewhere, and those trace statements led us nearer to the heart of the discrepancy.

With that new evidence, we were better able to understand the discrepancy. We saw that one of the results was clearly wrong--it was identifying an email address of the form 'ralph@livejournal.com' as 'ralph@livejournal.co'. (The other result was just somewhat wrong.) This twigged a memory--I had seen that misbehavior in the course of debugging. We fired up the debugger and confirmed that we were able to replicate that misbehavior.

We were still uncertain about why it would be nondeterministic, but we decided to pursue this bug. We asked Mike to look at it with us, which was just the right thing to do, because the bug was in a portion of the code within his domain of expertise: an off-by-one error in one of the most deeply hackish portions of the code. In fact, it also explained a screwy bug we had observed in an entirely different product.
A workaround for another bug also worked around this one, so we decided to take no action on this bug before the imminent release.
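
For readers who want to see how an off-by-one can turn '.com' into '.co', here is a minimal sketch in C. It is not the code from this story; the function name `copy_span` and the inclusive-index convention are assumptions made for illustration. The point is that when `first` and `last` are both inclusive, the length is `last - first + 1`, and forgetting the `+ 1` silently drops the final character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src[first..last] (both ends inclusive) into dst.
 * dst is always null-terminated. Returns the number of characters copied. */
size_t copy_span(char *dst, size_t dst_size,
                 const char *src, size_t first, size_t last)
{
    assert(dst != NULL && src != NULL);
    assert(dst_size > 0 && first <= last);

    /* Inclusive range, so the length needs the +1.
     * Writing "last - first" here is the classic off-by-one:
     * "ralph@livejournal.com" would come out as "ralph@livejournal.co". */
    size_t n = last - first + 1;

    /* Leave room for the terminator. */
    if (n > dst_size - 1) {
        n = dst_size - 1;
    }

    memcpy(dst, src + first, n);
    dst[n] = '\0';
    return n;
}
```

Calling `copy_span(out, sizeof out, "ralph@livejournal.com", 0, 20)` copies all 21 characters, including the final 'm'.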

But we were still nervous about the nondeterminism. (I quipped, "As computer jocks, we are used to things being wrong, but we hate being surprised." Steve's girlfriend Raven said, "Thank you! That explains a lot about Steve.")
We devised some traces that demonstrated that the nondeterminism was within a single module. Steve decided to pursue that with a carpetbombing of trace statements.
Steve would narrow down the discrepancy a bit further, we would take a look, and we'd suggest further traces.
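
The narrowing loop above depends on traces that can be diffed between two runs. A minimal sketch of one such trace in C follows; it is an illustration, not the tracer from this story. The names `fnv1a32` and `trace_buffer` are made up for the example, and the hash is the standard 32-bit FNV-1a. Each trace prints a label, a length, and a hash of exactly that many bytes, so the first label whose hash differs between runs marks where the runs diverged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash over exactly len bytes. */
uint32_t fnv1a32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical trace helper: one line per call on stderr.
 * Capture stderr from two runs and diff the captures. */
void trace_buffer(const char *label, const void *data, size_t len)
{
    assert(label != NULL);
    fprintf(stderr, "TRACE %s len=%zu hash=%08x\n",
            label, len, (unsigned)fnv1a32(data, len));
}
```

Sprinkling calls like `trace_buffer("after-tokenize", buf, buf_len)` through a module gives a sequence of lines that should be identical across runs of deterministic code.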

The second time I went in to look at the narrowed discrepancy, I spotted the problem. The code was foolishly assuming null-terminated strings, when the actual strings were not null-terminated. And it so happened that different types of garbage after the last real letter were giving different results. Aha! This is just the sort of thing that causes nondeterministic behavior.
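
To make the failure mode concrete, here is a minimal sketch in C, not the actual code. The function names and the set of characters treated as part of a token are assumptions for illustration. The unbounded version stops only when it meets a byte that is not a token character, so its answer depends on whatever happens to follow the real text. The bounded version takes an explicit length and never looks past it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical rule: which bytes may appear inside an address token. */
static int is_token_char(unsigned char c)
{
    return isalnum(c) || c == '@' || c == '.' || c == '-' || c == '_';
}

/* Risky pattern: relies on a terminator (or other stop byte) being present.
 * If the buffer is not null-terminated, this reads the garbage after it. */
size_t token_length_unbounded(const char *s)
{
    size_t n = 0;
    while (is_token_char((unsigned char)s[n])) {
        n++;
    }
    return n;
}

/* Safer pattern: the caller says how many bytes are real. */
size_t token_length_bounded(const char *s, size_t len)
{
    assert(s != NULL);
    size_t n = 0;
    while (n < len && is_token_char((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

Given a buffer holding the 21 real bytes of "ralph@livejournal.com" followed by leftover letters, the bounded scan with `len` of 21 stops at 21, while the unbounded scan keeps going into the leftovers. When the leftovers vary from run to run, so does the result.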

Steve worked on a quick hack that tested whether this was the problem, and I worked on rewriting the code to eliminate that problem. It even turned out that [livejournal.com profile] jpbl had written a unit test for that code that caught some of the errors in my rewrite. Jennifer rocks.

When I left, Steve was testing my rewritten code. All in all, it took us about a man-week of labor to identify these bugs and fix the nondeterminism. For a subtle Heisenbug like this, I think that's excellent speed.

BR32-M Unmatched subject error.

Date: 2002-04-08 11:33 pm (UTC)
From: (Anonymous)
*** Warning *** Ungrammatical sentence. *** Error *** Unable to parse sentence. Error Information: Unable to correctly parse sentence -- It even turned out that had written a unit test for that code that caught some of the errors in my rewrite. -- due to missing subject. GramaticoLexical Ensemble aNalyzer report: /begin report/ suggest Jennifer as subject of sentence. /end G.L.E.N. report/

Re: BR32-M Unmatched subject error.

Date: 2002-04-09 06:35 am (UTC)
From: [identity profile] ralphmelton.livejournal.com
I had misused the <lj user> tag. It's better now.

BR23-5 Formatting Error in posting software

Date: 2002-04-08 11:36 pm (UTC)
From: (Anonymous)
*** Warning *** Useless button warning. Clicking "Don't auto-format" appears to do squat. When it showed my previous comment, it proceeded to format anyway. Glen

Date: 2002-04-09 05:19 am (UTC)
From: [personal profile] cellio
Aiiieeee! You'll never be free of the dark secrets of the lexor. Bwahahahaha... :-)

Seriously, congrats. Good job!

Date: 2002-04-09 06:11 am (UTC)
From: [identity profile] jpbl.livejournal.com
Thanks for the compliment. Which test was this exactly? And are you/Steve planning on extending it to catch the errors that it didn't? :) I'd be happy to lend a hand if you need it. Sounds like fun!

Date: 2002-04-09 06:42 am (UTC)
From: [identity profile] ralphmelton.livejournal.com
But revealing which test it was might be confidential company information!

Yeah, whatever. It was testurlscanner. And it actually had tests to make sure that it did the wrong thing with non-terminated strings. The problem was that we hadn't realized that it was demanding the wrong thing.

Date: 2002-04-09 01:06 pm (UTC)
From: [identity profile] indigodove.livejournal.com
I quipped, "As computer jocks, we are used to things being wrong, but we hate being surprised." Steve's girlfriend Raven said, "Thank you! That explains a lot about Steve."

Yeah, your explaining that to me was helpful. :)

Date: 2002-04-10 09:49 am (UTC)
From: (Anonymous)
It applies to me too. I guess it's genetic.

Laurabelle

Date: 2002-04-09 01:26 pm (UTC)
From: [identity profile] mg4h.livejournal.com
In some ways, it's comforting to know that software monkeys work the same way in different companies. Gootmu's told me stories much like this one, and how they went about tracking down the problem. Occasionally I was able to help, but usually it was after the fact.

Date: 2002-04-09 02:24 pm (UTC)
From: [personal profile] cellio
Werner is a pretty uncommon name. Should the one we both know be concerned that you're naming bugs after him?

Date: 2002-04-09 02:27 pm (UTC)
From: [identity profile] ralphmelton.livejournal.com
I think Steve named the bug after Werner Heisenberg.

Date: 2002-04-09 02:30 pm (UTC)
From: [personal profile] cellio
Duh! Ok, that makes sense. :-)
