Archive for June, 2006

RDP - Win32k Network I/O Blocking Problem

Wednesday, June 28th, 2006

After nagging for days, I’ve finally gotten Ken to agree to let me post his analysis of a really interesting kernel locking issue. He’s one of the best there is with a debugger, and is working without symbols the whole way through here, debugging third-party components from two different companies.

Worth a read, if you’re into this sort of thing.

RDP - Win32k Network I/O Blocking Problem

MBNA is stupidly negligent with customer data

Wednesday, June 28th, 2006

I’m not in a great mood today, so it’s probably not the day to post this. Or maybe it’s exactly the day to post this.

I used to like MBNA, as a credit card company - they have a nice setup for generating one-time credit card numbers (that *almost* runs in Safari), rates are low, and they integrate with my bank’s bill pay system.

Then I get an e-mail to my work account that says this in bold letters: Your credit line is $XYZ!, except the XYZ was filled in with the RIGHT NUMBER! I really thought it was spam. In fact, my Postini filter flagged it as spam, despite the fact that MBNA’s e-mails usually get through. But no: it also has the last four digits of my account number in the e-mail. ARGH! They, of all people, ought to know that the last four digits are the worst numbers to give out.

This is a new low. That got e-mailed to my employer (which, admittedly, happens to be a company I founded, but that’s just luck). Someone in our IT department might easily have been spot-checking mails for AUP, for example. And this is useful information: I actually believe that this e-mail came from MBNA because it knows my balance and account number.

Now Anthony in Engineering can phish me much more effectively! Or worse, he could use this information himself - if you’ve ever filled out a request for a free credit report, you know one of the ways they verify your identity is with questions like “You have an account at MBNA ending in ABCD. What is your credit line?” I’m baffled that a financial institution would be this clueless.

So, I’m leaving MBNA, as soon as I get time to do a little looking. Recommendations would be appreciated. Even if this mail is a fake, it has some real numbers in it, so my account has been compromised somehow. But boy, it sure looks legit, the opinion of my spam filter notwithstanding.

Locking and performance

Tuesday, June 27th, 2006

Doron Holan has a post on his blog about choosing appropriate locking mechanisms for kernel drivers. He points out an amazingly good paper on locking on WHDC.

One thing he didn’t go into (although he may have meant to; his post ended in a comma) is the performance question. Of primary importance, of course, is program correctness, which means choosing the correct lock for the job at hand. But with that given, let’s talk about perf a bit.[1]

Consider what would happen if you simply used a spin lock at the beginning and end of every single function in your driver. Leaving out the IRQL issues you would have (holding a spin lock puts you at DISPATCH_LEVEL, and lots of DDIs can only be called at < DISPATCH_LEVEL), you would have left yourself with a totally single-threaded driver. You would basically cause any other processor that had need of executing code in your driver to be parked while the current processor does its thing. A much better idea is to protect data structures, rather than code paths, and to hold your locks for as short a period of time as correctness will permit.
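To make the "protect data, not code paths" idea concrete, here is a minimal user-mode sketch. It uses std::mutex as a stand-in for a kernel lock, and the PacketStats class and run_workers helper are made up for illustration; none of this is driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical statistics block. The mutex belongs to the data it guards.
class PacketStats {
public:
    void record(std::size_t bytes) {
        // Any expensive, private work (parsing, checksumming) would happen
        // here, before the lock is taken.
        std::lock_guard<std::mutex> guard(lock_);
        ++packets_;
        bytes_ += bytes;
    }  // Lock dropped here, after two cheap updates.

    std::size_t packets() const {
        std::lock_guard<std::mutex> guard(lock_);
        return packets_;
    }

    std::size_t bytes() const {
        std::lock_guard<std::mutex> guard(lock_);
        return bytes_;
    }

private:
    mutable std::mutex lock_;  // guards packets_ and bytes_ only
    std::size_t packets_ = 0;
    std::size_t bytes_ = 0;
};

// Starts `threads` workers that each record `per_thread` packets of `size` bytes.
inline void run_workers(PacketStats& stats, int threads, int per_thread,
                        std::size_t size) {
    assert(threads > 0 && per_thread >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&stats, per_thread, size] {
            for (int i = 0; i < per_thread; ++i) stats.record(size);
        });
    }
    for (auto& worker : pool) worker.join();
}
```

Each worker only serializes on the two counter updates; everything else it does can run in parallel with the other workers.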

Beyond simply optimizing your locks for concurrency, there is the issue of how expensive the lock itself is. The name “fast mutex” should give you a clue - different locks have different perf implications. The slowest locks are the dispatcher objects - semaphores, kernel mutexes, and (although they’re not strictly locks) kernel events. These are certainly appropriate for some uses, but they’re not the fastest locks in the world; they all contend for the single system-wide dispatcher lock, which is one of the hottest locks in the system.

Spin locks are an interesting performance case. On the one hand, they’re extremely fast in the un-contended case, and virtually optimal in the contended case - you won’t wait many more CPU ticks than the bare minimum to acquire a lock once it is freed. On the flip side, they can cause memory bus congestion if more than one CPU is spinning on the same lock. Furthermore, acquisition of a spin lock (uniproc or multiproc) implies going off-core to hit the interrupt controller. If there are many CPUs spinning on a lock, the eventual winner of the lock is essentially unpredictable - there is a small but nonzero probability of lock starvation, purely by chance of the ordering of xchg operations on the bus.
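For reference, a plain test-and-set spin lock looks roughly like the sketch below. This is a user-mode illustration built on std::atomic, not the kernel's KSPIN_LOCK: it does not raise IRQL, and the SpinLock class name is my own. The inner read-only loop is a common refinement (assumed here, not something the kernel is claimed to do) that keeps waiters from issuing an atomic exchange on every iteration.

```cpp
#include <atomic>
#include <cassert>

// User-mode sketch of a test-and-set spin lock (hypothetical class).
class SpinLock {
public:
    void lock() {
        // exchange() is the atomic read-modify-write (xchg on x86).
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load until the lock looks free, so waiters
            // mostly read their cached copy instead of repeating the xchg.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }

    // Single attempt; returns true if the lock was acquired.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

Note that nothing here decides which waiter wins when the lock is released: whoever gets its exchange in first takes it, which is exactly the fairness problem described above.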

In-stack queued spin locks are a solution to a couple of the problems associated with standard spin locks. Because each waiter supplies its own wait location to spin on, each CPU can spin on a cache line rather than locking the memory bus with each xchg. And, because they are queued, each lock holder releases the next waiter in line in the process of dropping its lock. This enforces fairness in lock wait time - first come, first served.
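The queueing idea can be sketched in user mode as an MCS-style lock, shown below. This is an illustration of the general technique under my own names (QueueNode, QueuedSpinLock), not a description of how the kernel implements it; the per-waiter node plays the role that the caller-supplied KLOCK_QUEUE_HANDLE plays in the kernel API.

```cpp
#include <atomic>
#include <cassert>

// One node per waiter, normally living on the waiter's stack.
struct QueueNode {
    std::atomic<QueueNode*> next{nullptr};
    std::atomic<bool> waiting{false};
};

// MCS-style queued spin lock (hypothetical user-mode sketch).
class QueuedSpinLock {
public:
    void lock(QueueNode& me) {
        me.next.store(nullptr, std::memory_order_relaxed);
        me.waiting.store(true, std::memory_order_relaxed);
        // Join the tail of the queue with a single atomic exchange.
        QueueNode* prev = tail_.exchange(&me, std::memory_order_acq_rel);
        if (prev == nullptr) return;  // queue was empty: we own the lock
        prev->next.store(&me, std::memory_order_release);
        // Spin on our own node only, not on the shared lock word.
        while (me.waiting.load(std::memory_order_acquire)) {
        }
    }

    void unlock(QueueNode& me) {
        assert(is_locked());
        QueueNode* succ = me.next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            // No visible successor: try to mark the lock free.
            QueueNode* expected = &me;
            if (tail_.compare_exchange_strong(expected, nullptr,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
                return;
            }
            // A waiter has joined the tail but not linked itself yet.
            while ((succ = me.next.load(std::memory_order_acquire)) == nullptr) {
            }
        }
        // Hand the lock directly to the next waiter in line.
        succ->waiting.store(false, std::memory_order_release);
    }

    bool is_locked() const {
        return tail_.load(std::memory_order_acquire) != nullptr;
    }

private:
    std::atomic<QueueNode*> tail_{nullptr};
};
```

Because each releaser wakes exactly the node queued behind it, waiters acquire the lock in arrival order, and each one spins on memory that only it and its predecessor touch.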

If you don’t need to raise to DISPATCH_LEVEL, though, you are probably better off in terms of overall system performance by choosing something else still. Some lower-IRQL lock options include fast mutexes, executive resources, and guarded mutexes. Fast mutexes imply hitting the interrupt controller to raise to APC_LEVEL, unless you put yourself in a critical or guarded region first (which is a very fast operation). Guarded mutexes merely disable all APCs on the thread by entering a guarded region; this saves them from hitting the PIC. Both are extremely fast in uncontended cases, assuming you don’t hit the PIC.

Executive resources are kind of a middle ground; they’re not quite as fast to acquire as fast mutexes or guarded mutexes, but they allow for recursion and reader/writer semantics, which can make a world of (positive) difference in driver performance situations where multi-reader/single-writer semantics make sense.

The final truth, though, is that performance tuning is hard work. Beyond some very basic architecture tuning, you’re usually better off optimizing under the direction of a profiler.

[1] A couple of good quotes come to mind: “Premature optimization is the root of all evil” (Tony Hoare), but also “Andy [Grove] giveth, and Bill [Gates] taketh away.”

Coding on a desert island

Sunday, June 25th, 2006

If you had to go write a bunch of code on a desert island, what music would you take?

Personally, if I had to pick one composer, it’d have to be J.S. Bach. His music is perfect, in exactly the way I want my code to be perfect. There is something deeply mathematically satisfying about Bach’s music, in addition to just being aesthetically beautiful. I spend a lot of time plunking around on Bach pieces on my mandolin - currently working on the sixth Cello Suite (transposed up an octave and a half for mando).

I’ll admit to having a mandolin addiction, but regardless, my absolute favorite way to hear Bach is on a mandolin. Chris Thile and Mike Marshall both have recordings of Bach out there, including a fantastic rendition of Variation #1 from the Goldberg Variations (originally written for harpsichord). I was lucky enough to sit through an extended 1.5 hour demo with Chris where he played through several movements of the Sonatas and Partitas for Solo Violin on mando, including an absolutely stunning rendition of the E Major prelude. I can kind of get through that prelude, but hearing it done really well is an experience all its own.

Mike Marshall released that particular prelude on his Gator Strut album, and in spite of the fact that he put a little more of his own personality into the piece than I would typically like, it’s still a great recording. Bela Fleck also plays the EMaj prelude on banjo, which makes for an interesting sound.

Although there are others in the running - Mozart for raw aesthetic beauty and flamboyance, Beethoven for emotional intensity and his fantastic sense of melody - Bach would be my desert island coding companion.

PCH mystery solved

Saturday, June 24th, 2006

Compiler bug!

Ken Johnson was hanging around on our internal chat channel and heard me complaining, so he went and read my last post. First he complained about the #pragma hdrstop usage, so I changed things slightly (the zip file is now slightly newer than the stuff posted). That was a red herring, though.

So, being one of the best reverse engineers out there, and having lots of free time due to his ostensible vacation this week, he decided to dig in a little deeper. Then, 20 minutes later, this shows up on the channel (edited for clarity):

ken: yeah, fixed it
ken: it's a bug :p
ken: start it under a debugger and
       `bp msvcr80!_wsopen_s "ed esp+10 0x40;kv;g"' and it will compile
marsh: Is that changing the open flags to share?
ken: yes, it is
ken:  it opens the pch for writing twice, and the first time it
        specified the wrong share flag
ken: it uses _wfopen_s which always uses exclusive mode
       when it internally calls _wsopen_s

So, it looks like a regression after all.

Friday night header reorganization

Friday, June 23rd, 2006

One of my pet peeves is stupid header organization. Like most large-scale software projects, we have hundreds (thousands?) of headers in our source tree. Those headers take a long time to compile. There are lots of ways to optimize compilation time (check out the absolutely necessary The C++ Programming Language for a good discussion of the issues). One way is using precompiled headers.

To try to improve our compile time, I decided I’d play with a more efficient PCH scheme, wherein the system headers and standard library headers are precompiled into a globally shared PCH file and the cross-project shared headers within our own codebase are compiled into a second level of PCH. The rationale is that the latter set of headers, which includes headers for several internal libraries, etc., changes somewhat more often than the system headers. Using a two-level PCH scheme, you get out of having to recompile the system headers if you change an internal header.

There is an obtuse but intriguing tutorial on the issue in the VC docs, but the problem is that it doesn’t work. Another person reported the issue on USENET a few months ago, with no answers. I’m now suspicious that I’ve found a compiler bug.

Anyhow, my USENET post from this evening has more details if you’re curious. Meanwhile, the experimentation continues.

UPDATE: This really looks like a compiler bug. It works fine on cl 13.x.y but breaks on cl 14.x.y. Here are the files I’m using (or download a ZIP: pch.zip):

--------
Makefile
--------
all:
    cl -nologo -Ic:\winddk\5384\inc\api \
        -Ic:\winddk\5384\inc\crt \
        -Ycsystem-precomp.h -Fpsystem-precomp.pch \
        -c system-precomp.cpp
    cl -nologo -Ic:\winddk\5384\inc\api \
        -Ic:\winddk\5384\inc\crt \
        -Yusystem-precomp.h -Fpsystem-precomp.pch \
        -c -Yc precomp.cpp
    cl -nologo -Ic:\winddk\5384\inc\api \
        -Ic:\winddk\5384\inc\crt \
        -Yuprecomp.h -Fpprecomp.pch \
        -c junk.cpp

clean:
    del *.pch *.obj

----------------
system-precomp.h
----------------
#define UNICODE
#include <windows.h>

------------------
system-precomp.cpp
------------------
#include "system-precomp.h"

---------
precomp.h
---------
#include "junk.h"

-----------
precomp.cpp
-----------
#pragma once
#include "system-precomp.h"
#include "junk.h"
#pragma hdrstop("precomp.pch")

------
junk.h
------
#pragma once

class A
{
        int x;
};

--------
junk.cpp
--------
#include "precomp.h"

int main()
{
        MessageBox(0, L"Hello", L"Hello", 0);
}

With cl 13, from the 3790.1830 DDK, it works. With cl 14, from the beta 2 WDK, it fails:

C:\dev\sandbox\pch>cl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version
13.10.4035 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\dev\sandbox\pch>make
        cl -nologo -Ic:\winddk\5384\inc\api -Ic:\winddk\5384\inc\crt
-Ycsystem-precomp.h -Fpsystem-precomp.pch -c system-precomp.cpp

system-precomp.cpp
        cl -nologo -Ic:\winddk\5384\inc\api -Ic:\winddk\5384\inc\crt
-Yusystem-precomp.h -Fpsystem-precomp.pch -c -Yc precomp.cpp
precomp.cpp
        cl -nologo -Ic:\winddk\5384\inc\api -Ic:\winddk\5384\inc\crt
 -Yuprecomp.h -Fpprecomp.pch -c junk.cpp
junk.cpp

C:\dev\sandbox\pch>

---

C:\dev\sandbox\pch>make clean
        del *.pch *.obj

C:\dev\sandbox\pch>cl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version
14.00.50727.93 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\dev\sandbox\pch>make
        cl -nologo -Ic:\winddk\5384\inc\api -Ic:\winddk\5384\inc\crt
-Ycsystem-precomp.h -Fpsystem-precomp.pch -c system-precomp.cpp

system-precomp.cpp
        cl -nologo -Ic:\winddk\5384\inc\api -Ic:\winddk\5384\inc\crt
-Yusystem-precomp.h -Fpsystem-precomp.pch -c -Yc precomp.cpp
precomp.cpp
precomp.cpp(4) : fatal error C1083: Cannot open precompiled header file:
'precomp.pch': The process cannot access the file because it is being
used by another process.

NMAKE :  U1077: 'c:\WinDDK\5384\bin\x86\x86\cl.EXE' :
return code '0x2'
Stop.

C:\dev\sandbox\pch>

Why drivers have to be secure

Friday, June 23rd, 2006

Here’s a very practical reason to run your drivers through SDV and PREfast: people are using wi-fi drivers as attack vectors.

What does it cost to test drivers using PREfast, SDV, and the kind of input fuzzing described in the article? What does it cost to have a user’s system breached via your driver?

Continuous integration…

Thursday, June 22nd, 2006

Wow, that last post regarding CVS touched off quite a discussion. I think that singlehandedly doubled the total number of comments I’ve gotten on this blog to date.

We had some interesting internal discussions as a result of this stuff, too. And, in the meantime, I’ve found dozens more little patches that CVS had apparently been missing for an unknown period of time. As the title of this post suggests, I’ve been continuously integrating for days now, and I’m way past tired of it.

Wayne points out in the comments that it has to be pilot error. Fine, maybe it is, but if the plane is un-flyable to begin with, is it the pilot’s fault for messing up? Hmm, philosophical…

It looks like SVK, Subversion, or Perforce will be the successor to our CVS installation. I’m tempted to give SVK a try, but I only want to do this one time - we have literally thousands of files in our tree(s), so this is going to be painful any way we go.

Anyway, thanks for all of the feedback. It has been great.

The hell of CVS

Tuesday, June 20th, 2006

I’m a big CVS fan. Well, I was. Then, yesterday, I merged a very large set of changes associated with upgrading some parts of our build system from HEAD to our mainline stable branch.

The first problem I had was the common one: I touched a lot of files across a long time, so the traditional tag-to-tag merge wouldn’t have worked. So I closed my eyes and held on tight and just let CVS re-merge everything. I ran cvs update to see what fun awaited me, and got a healthy but not overwhelming number of conflicts. A good many of them were the standard re-merge conflicts for which CVS is so (in)famous. OK, fine, resolved, committed.

Then I went through and built it. Bang, build error. I fixed a few obvious merge bugs, and then I got to a file that was missing a change from HEAD. So: delete file, update to stable, merge from head. NO CHANGES.

I updated to HEAD to be sure I wasn’t losing my mind. The build fix was there. Updated to stable again. Gone. Merge from head: nothing. I finally manually cvs diff’d the head revision (by number) with the stable revision (by number), and voila, out comes a diff! I grumble some more, output the diff to a patch, run the patch utility, and commit the change. Build works again.

I’ve had it. It’s one thing to commit a Type I error by double-merging and reporting a conflict that isn’t really there - at least I know for sure what’s going on. But to fail to detect a change and falsely report that there is nothing to merge - that is unacceptable and scares the hell out of me. I’ve been using CVS professionally for over a decade; at this point, I give myself credit for understanding the basics. If I’m doing something wrong, it’s non-obvious.

So, I’m switching. This is ridiculous. I’m looking at (in probability order) SVK, Perforce, Subversion, and maybe darcs, arch, and (longshot) git. With a codebase our size, we need intelligent merging in a bad way, and perf is a concern. Anyone have any thoughts on any of these? Is SVK mature enough for serious commercial use? Or should I just bite the bullet and go to Perforce?

Some good debugger articles

Monday, June 19th, 2006

I just ran across Oleg Starodumov’s DebugInfo.com debugging site. Some fantastic articles. Worth a read.

Via Matt Pietrek