Saturday, July 12, 2008

Linux Kernel Development Stats from Greg Kroah-Hartman

Linux kernel hacker Greg Kroah-Hartman's June 5, 2008 talk at Google, titled "The Linux Kernel", was chock-full of details about kernel development. I noted down some of the things he said. Please note that all stats mentioned by GKH are relative to the date of the talk.
  • Code changes per day in 2007-2008 so far
    • 4,300 lines added
    • 1,800 lines removed
    • 1,500 lines modified
    These changes don't include moving files around. It works out to about 3.69 changes per hour, 24x7. It's not just the drivers that are changing at this rate; it's the entire kernel.
  • The kernel itself is about 9.2 million lines and has been growing by about 10% every year since 2.6.0 (when GKH started tracking it). Drivers make up about 55% of the code, with architecture-specific code second in terms of LOC. The core of the kernel is about 5%.
  • The kernel supports more processors and devices than any other OS in history.
  • One of the consequences of the scorching pace of kernel development is that code is better off inside the kernel than maintained separately. GKH pointed out that it is difficult to keep pace with Linux kernel development and gave the example of Xen: the Xen folks apparently never played nice with the kernel developers and are now trying to get their patches into the kernel. From his tone it sounded as though it was a tough uphill climb for the Xen developers, whereas KVM is already in the kernel.
  • The kernel development hierarchy (it has one!) has developers feeding patches to maintainers, who in turn feed them to subsystem maintainers (PCI, security, USB, etc.). Subsystem maintainers maintain their own trees. Stephen Rothwell (IBM, Australia) pulls changes into the linux-next tree every night and does daily builds, while Andrew Morton pulls changes into his tree once a week or so. When Linus announces that the merge window is open, all the subsystem maintainers send him their changes. There is also the stable release tree, which is not maintained indefinitely.
  • He likened the patch submission process to a lossy network routing algorithm: it can handle people dropping out. There is no other way to develop at such high speed. If you own a file or subsystem, you have to accept that other people are going to change it. Maintainers can always revert changes they don't like.
  • There is no good way to test the kernel except to run it. The hundreds of permutations of devices and interactions make comprehensive testing impossible, so the only way out is for developers to test the rc releases.
  • There are no separate stable and unstable releases anymore. For the last four years, they have been replaced by a new release every two and three-quarter months.
  • There have been 2,399 independent contributors to the Linux kernel in the last year. 50% of them submitted only one patch, and half of the rest (another 25%) submitted two. The top of the curve is getting flatter: the top 30 developers do only about 30% of the work. The number of individual contributors is going up.
  • Top developers by number of changes (over the last one and a half years)
    1. Adrian Bunk 754
    2. Al Viro 698
    3. Thomas Gleixner 656
    4. David S. Miller 655
    5. Bart Zolnierkiewicz 637
    6. Paul Mundt 610
    7. Ralf Baechle 604
    8. Ingo Molnar 596
    9. Patrick McHardy 554
    10. Tejun Heo 530
  • Top contributors by sign-offs (shows who the gatekeepers are). Note that Linus doesn't add his sign-off to patches he pulls from subsystem maintainers' trees.
    1. Andrew Morton 9086
    2. Linus Torvalds 8960
    3. David S. Miller 4926
    4. Jeff Garzik 2960
    5. Ingo Molnar 2489
    6. Greg Kroah-Hartman 2098
    7. Thomas Gleixner 1098
    8. Mauro Carvalho Chehab 1822
    9. Paul Mackerras 1675
    10. John Linville 1461
  • Who's funding Linux kernel development?
    1. Amateurs 18.5%
    2. Red Hat 11.6%
    3. IBM 7.5%
    4. Novell 6.6%
    5. Unknown individuals 5.5%
    6. Intel 4.1%
    7. Oracle 2.2%
    8. Consultants 2.2%
    9. Academia 1.5%
    10. Renesas Technology 1.5%
    Google is at number 13 with 1.4%. Without the contributions of Andrew Morton, who works for Google, it would be in fortieth place.
  • Canonical had about 6 changes in the past 5 years; they are in 300th position. GKH was very emphatic that 'Canonical does not give back to the community'.
  • 75% of Linux kernel work is paid for.
  • Linus jokes that the kernel is not intelligent design; it is evolution. GKH added that "We react to stimuli that's happening in the world" and "We don't over plan things".
  • As GKH put it, 'We broke all software development rules and we are continuing to break it'.
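
As a rough consistency check, the daily figures can be tied to the growth figure. Modified lines don't change the kernel's size, so net growth is lines added minus lines removed. The sketch below assumes the 2007-2008 daily averages held for a full 365-day year and uses the 9.2 million line total as the base; the function name is just for illustration.

```python
# Daily averages quoted in the talk (2007-2008).
LINES_ADDED_PER_DAY = 4_300
LINES_REMOVED_PER_DAY = 1_800
LINES_MODIFIED_PER_DAY = 1_500  # churn only: modifications don't change total size
KERNEL_SIZE_LINES = 9_200_000   # approximate size as of June 2008


def annual_growth_rate(added_per_day, removed_per_day, size, days=365):
    """Net yearly growth as a fraction of `size`, assuming constant daily rates."""
    net_per_day = added_per_day - removed_per_day
    return net_per_day * days / size


if __name__ == "__main__":
    churn = LINES_ADDED_PER_DAY + LINES_REMOVED_PER_DAY + LINES_MODIFIED_PER_DAY
    net = LINES_ADDED_PER_DAY - LINES_REMOVED_PER_DAY
    rate = annual_growth_rate(LINES_ADDED_PER_DAY, LINES_REMOVED_PER_DAY, KERNEL_SIZE_LINES)
    print(f"lines touched per day: {churn:,}")   # 7,600
    print(f"net lines added per day: {net:,}")   # 2,500
    print(f"implied annual growth: {rate:.1%}")  # 9.9%
```

That works out to 2,500 net lines per day and about 9.9% growth a year, in line with the "about 10%" figure. Using the mid-2008 size as the base slightly understates the rate, since the kernel was smaller at the start of the year.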

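The sign-off ranking relies on the kernel's Signed-off-by: convention: the author and each maintainer who passes a patch along add a trailer line to the commit message. GKH didn't describe his tooling, so the following is only a minimal sketch of how such a count could be made from a local kernel clone. The function names and the repository path are hypothetical, and the sketch counts every trailer, including the author's own sign-off, which a strict gatekeeper ranking would probably want to skip.

```python
import re
import subprocess
from collections import Counter

# Matches "Signed-off-by: Name <email>" at the start of a line and captures the name.
SIGNOFF_RE = re.compile(r"^[ \t]*Signed-off-by:[ \t]*(.+?)[ \t]*<", re.MULTILINE)


def count_signoffs(log_text):
    """Count Signed-off-by trailers per name in raw commit-message text."""
    return Counter(m.group(1) for m in SIGNOFF_RE.finditer(log_text))


def signoffs_from_repo(repo_path, since="1 year ago"):
    """Run `git log` in repo_path and count sign-offs in the full commit messages."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%B"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_signoffs(result.stdout)
```

For example, `signoffs_from_repo("linux").most_common(10)` would list the ten most frequent signers in a clone at ./linux.
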
14 comments:

Anonymous said...

I like those simple stats :-)


The only thing I don't like is that professional coders seem to be overrepresented. Of course, they are probably better than hobby coders, but it also shows that Linux as a whole is less an amateur project and much more a professional one.

Anonymous said...

Amateurs are funding Linux development at a higher fraction than even RedHat? Is this possible? Are they doing this indirectly by supporting free systems like Gentoo or is there a way to fund its development more directly?

Anonymous said...

Nice stats :)- There are some errors in the blog post:

# 1,800 lines removed
# 1,500 lines removed

two times removed?

And there's "Linxu" instead of "Linux".

Anonymous said...

"GKH was very emphatic that 'Canonical does not give back to the community'."

The Linux kernel isn't the only community Canonical connects to.

Heck, I've never had a patch accepted into the kernel, but that doesn't mean I've never filed a bug report/suggested an init script revision/pointed out a documentation deficiency.

GKH needs to get over himself.

Rams said...

robin:
Thanks a ton. Fixed both. thanks again.

anon2:
yes, amateurs are the largest percentage. Greg specifically mentioned that they had asked these guys whom they worked for. It turns out that lots of folks really do it out of interest.

gus3:
Can you explain what's wrong with what Greg said?

Anonymous said...

Another typo, there's "GHK" instead of "GKH".

Scot McSweeney-Roberts said...

Canonical had about 6 changes in the past 5 years

I'm surprised it's that many. Ubuntu has some standard patches, but I don't think there's that much that's Ubuntu specific. And don't forget that Ubuntu is mostly a "humanized" Debian, so Ubuntu would be taking advantage of Debian kernel work and you would think Ubuntu's changes would go upstream via Debian.

In my mind that makes "Canonical does not give back to the community" an unfair statement - they give back to the communities where they're doing active development work. Unfortunately for the kernel folk, the kernel isn't one of those areas.

Rams said...

anon3:
Thanks a lot. Fixed.

Anonymous said...

I just checked from Linus's tree (current HEAD commit 4d3702b6):

$ git log --author=@canonical.com --pretty=oneline | wc -l
8

Yes, just authors with canonical.com as email domain. It's unreliable, I know, don't flame me. The history of this repository goes back to 2005-04-16 when Linus made the first commit to the git repository (2.6.12-rc2).

Anonymous said...

"The Linux kernel isn't the only community Canonical connects to."

Ok. Show me the stats of the patches for the "community" Canonical "connects" to. GNOME doesn't get much, Linux kernel - no, KDE - nothing major. What project exactly?

"I'm surprised it's that many. Ubuntu has some standard patches,"

What standard patches? Their kernel is full of crappy non-standard patches that they never bother to send upstream or to Debian (because they would never accept those crappy patches anyway)


http://kernelslacker.livejournal.com/127218.html

Anonymous said...

Maybe we want to count commits coming from @ubuntu.com and @canonical.com together? Anyway, here it is:

$ git shortlog -E -ns --author='@(canonical|ubuntu)\.com'

65 Ben Collins
11 Fabio Massimo Di Nitto
6 Daniel T Chen
4 Fabio M. Di Nitto
4 Tim Gardner
2 Kees Cook
2 Stefan Bader
1 Amit Kucheria
1 Colin Ian King
1 Scott James Remnant

(The complete history of Linux kernel since 2.6.12-rc2.)

Anonymous said...

Perhaps Canonical did not contribute back to the community via new code, but increasing the profile of Linux in the general community at large must count for something!

I don't believe that Canonical operates in the kernel domain. It would be like saying the SVN guys did not contribute back to the community.

Anonymous said...

The talk was great!
And your short summary of the key points too, thanks!

Anonymous said...

nice post