Friday, July 29, 2005

Miguel de Icaza on J2EE domination

Miguel explains in an interview why J2EE has such a stranglehold in the application server market:

There is a segmentation of the application server space, which is the high-end market, the mid-market, and the low market. The low market is anything that costs less than $200,000 to develop and deploy. It includes every technology that you’ve heard of. Then you’ve got the high market, which is any project where the cost of deployment is over two million dollars, and in that market J2EE is firmly entrenched. There is no other technology considered today in that application space. In part that’s because people need to have multiple vendors providing the same solutions, so they like the fact that there’s BEA and there’s IBM and there’s Sun. There’s different J2EE providers. There’s also different hardware providers. So that market is very hard for Microsoft to penetrate.

"That leaves the mid-market, which today is about 50% Java, with the other 50% made up of ASP.NET, and a couple of other proprietary frameworks."

Sunday, July 24, 2005

Adam Bosworth on a new data model for the web

Following the excellent presentation by Michael Tiemann at the MySQL Users Conference organized by O'Reilly Media, IT Conversations has posted another great presentation - this time by Adam Bosworth at the same conference. The talk is titled "Database requirements in the age of scalable services". In it, Adam Bosworth explains his vision of making data access on the web simple and standards-based, and why he thinks RSS 2.0/Atom are the beginning of the unfolding of that vision.

The presentation starts with a joke about 'Microsoft Project' - Bosworth says he is amused to hear about people using Microsoft Project and how it is Microsoft's secret weapon to stop everyone else from competing. Here are some of the points I noted down while listening to the podcast:

  • Everything looked irrelevant to AB after he got excited about the web.
  • How did the web happen? Tim Berners-Lee hit a perfect storm of productivity: using HTML and HTTP, any 'P' (Perl, Python, PHP) programmer could generate content. Simple was huge and everybody could play with HTML and HTTP. HTML is sloppy - everything is rendered without a complaint. It takes a licking and keeps on ticking. You don't have to be the high priest of syntax. But that's not the case with XHTML - and that's not a good user experience. Web pages could be read/edited on all operating systems.
  • Databases are not good at partitioning - partitioning of data is very important to scalability. But the web does a good job of this, by distributing data. The web is also good at caching - at Google they observed 120,000 hits per second on a certain blog and the only reason the infrastructure didn't melt down was the amount of caching done on the web by proxies, front ends and Google's own front end. Statelessness - the coarse-grained interaction - is another reason for the scalability of the web. Clients talk to servers in terms of chunks of data: go to data when you are ready, not
  • Google combines a lot of simple-minded techniques with brute force to deliver. Google has lots of Ph.D.s and everyone is a General Patton driving tanks. Take the spell check that Google does when you type in an incorrect (or sometimes correct) search string - it's based on the very simple technique of tracking failed searches and what users type in after a failed search. This kind of brute force enables Google to go through petabytes of data in seconds.
  • Once you start to search, this whole business of putting things in folders begins to diminish in importance. It's very hard with folders to remember where you put what. Folders are not efficient, searches are.
  • The Vision is to take the database and do the same thing for the web (as was done for content). Can we take all the info on the web and make it easily findable? Now you get content, not information.
  • Need something that scales massively and linearly. Originally thought that we would do it using XML. We created a tower of Babel (with XML) - websites need to support only one grammar with HTML. A working group took four years to come up with a spec for an XML Query standard. It's better to spend six months and learn the rest from customers. The query standard was not simple like the web - the schemas were very complicated. AB also found the WS specs to be very complicated. Why did this happen? The companies that came up with these standards were big and were trying to protect themselves. They were people at companies like IBM and MS. Frankly, they were trying to make it deliberately hard.
  • AB apparently is a technical advisor to MySQL and had some advice for them as well: Basically MySQL is trying to be Oracle by adding support for procedures, triggers, and views. All of this is about centralizing processing logic in the database. To be blunt, centralizing processing logic in the database is a bad idea - it doesn't scale. Advice to MySQL - don't do something because you want to be Oracle. Because Oracle isn't big enough and can't deliver on billions of queries.
  • We need an Open model for data. What's not open today is how you talk to a database - the actual wire format. There is nothing like HTML/HTTP for data. This is a very 20th century way of thinking. Open up wire formats to serve any kind of information - this will bring enormous changes to computing centered around data. Need open standards for different types of items with one single grammar. It will have to be sloppy. Open up and democratize the way data is served.
  • Big believer in stupidity - virtues of dumbness.
  • We are actually starting to realize this vision - RSS 2.0/Atom are going to be for data what HTML was for content. They are going to be the Lingua Franca of consuming data. Surprisingly simple and sloppy. These guys got the web and that's why it is catching on like wildfire. Atom was formed by a consortium of bloggers and the two formats are isomorphic.
  • Data queries have to be such that they don't need data spread across machines - if a query uses data from four machines, it isn't going to work very well. Queries need to run at an item level - it's not technically as complex as SQL.
  • AB made it emphatically clear that he was not talking about the semantic web and called RDF an empirical failure. RSS 1.0 had an RDF grammar, RSS 2.0 doesn't have an RDF-based grammar. Ordinary programmers do not understand how to model something as arcs, nodes, and graphs.
Update: Looks like the O'Reilly network and have also covered Bosworth's remarkable speech.
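
The spell-check point in the notes above (track failed searches and what users type next) is simple enough to sketch. This is only an illustration of the idea as Bosworth described it, not Google's actual implementation: the log format (session id, query, number of results) and the names learn_corrections and suggest are my own assumptions.

```python
from collections import Counter, defaultdict


def learn_corrections(search_log):
    """Build a table of 'failed query -> what users typed next'.

    search_log is assumed (hypothetically) to be an iterable of
    (session_id, query, result_count) tuples in chronological order.
    """
    corrections = defaultdict(Counter)
    last_failed = {}  # session_id -> most recent query that returned nothing
    for session_id, query, result_count in search_log:
        query = query.strip().lower()
        if result_count == 0:
            # Remember the failure; the next successful query in this
            # session is treated as the user's own correction.
            last_failed[session_id] = query
        else:
            failed = last_failed.pop(session_id, None)
            if failed is not None and failed != query:
                corrections[failed][query] += 1
    return corrections


def suggest(corrections, query):
    """Return the most common follow-up for a query, or None if unseen."""
    counts = corrections.get(query.strip().lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample log, purely for illustration.
log = [
    ("s1", "databse", 0),
    ("s1", "database", 120),
    ("s2", "databse", 0),
    ("s2", "database", 98),
    ("s3", "databse", 0),
    ("s3", "data base", 40),
    ("s4", "mysql", 500),
]
corrections = learn_corrections(log)
print(suggest(corrections, "databse"))
```

There is no dictionary and no linguistics here, just counting. The quality of the suggestions comes entirely from the volume of logged searches, which is the "simple technique plus brute force" combination the talk was about.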

Saturday, July 23, 2005

Michael Tiemann on Open Source

The peerless features a talk by Michael Tiemann of RedHat, formerly of Cygnus and the guy who first wrote the GNU C++ compiler. Some interesting points from the talk:
  • In 1995, when MySQL started (this was a talk at the 10th anniversary MySQL conference), Tiemann had already turned over the reins of the company to a president at Cygnus; the company was then making $6 million in revenue and employed more than sixty people, disproving the notion that you cannot do business with open source. Larry McVoy was kicked out of Sun in the same year for advocating open source as a strategy.
  • McVoy brought to Tiemann's notice a small company in North Carolina called Red Hat - Tiemann's advice to acquire a stake in that company was ignored and five years later, Red Hat bought Cygnus.
  • It took him two years to figure out how to do business with open source.
  • Tiemann addressed the question of why OSS had succeeded and drew an analogy with the observation made by Alexis de Tocqueville in the 1830s, that the sum of all individual undertakings in America exceeded the efforts of the government.
  • Disruptive technology always comes from unexpected quarters.
  • Cost cutting is not always the most interesting question - there is always more upside to increasing revenue than to cutting costs. There are limits to cost cutting, while there are none to increasing revenues.
  • The real value of architecture is when something comes along and it can be accommodated quickly. Explained the value of strategy with an example from a Wall St. firm - a strategy that was 10-20% different resulted in a ten-fold increase in growth.
  • According to research done at CMU, the Apache project has ten to fifteen people who do eighty-five percent of the work. But the total number of people who contribute is closer to 400 (388?). It is these extra non-core contributions that add a lot of polish and quality to the software. Tiemann asked a rhetorical question about going to a venture capitalist to ask for money to recruit all 400 developers. (I thought this was a particularly effective illustration of the role of a large pool of contributors - more about this aspect in a future post, if time permits.)
  • Quoted Eric von Hippel, faculty at MIT and the author of Democratizing Innovation, on the role of user participation in design. EVH explains in his book how Jack Welch managed to make GE a leader in plastics by bringing in the concept of user design toolkits. Previously GE's business model made it possible to get into only selected market segments, in fact only the largest customers. Jack Welch said that GE doesn't need to be the exclusive designer of plastics - why can't customers design their own plastics? This idea of moving the locus of design from the supplier to the customer ultimately resulted in 85% of the designs coming from the customers. MT then asked "What rational person would let go of this opportunity?"
  • Drew some analogies between disruptive open source technologies and themes from the book "Guns, Germs, and Steel" by Jared Diamond. (The analogies about the conquest of the Native American population by the Spanish seemed out of place to me.)
  • He recalled reading Stallman's code for the first time - it was the first time, he said, that he saw good code. The architecture and implementation unfolded in one view.
  • He remarked that even the newly elected board of the OSI did not have a clue about the reasons for the success of open source. That's the territory that needs to be protected and grown. He urged the audience and sophisticated users to think about it.
  • Over 70% of all projects on are covered by the GPL or BSD style license.
  • Quoted designer Bruce Mau about the fact that design is not very visible unless there is a failure, and used Microsoft's out-of-control spending on security to conclude that the shared model was a design failure. Microsoft's spending on security in Jul 2002 was $100 million and in Mar 2005 $2 billion. At this rate they should end up spending the rest of their bank balance on security. MT contrasted this with the improvements in open source software. The Fuzz report noted a 20% failure rate in Unix utilities in 1985. In 1990, by which time the GNU tools had appeared, the failure rate was down to 25% of that of Unix tools. In the same period the Unix utilities/tools had not improved substantially. Later the failure rate of GNU tools went down to 2% and by 1996 the GNU tools were 100% clean.
  • Software patents are the lead banner for massive stasis - it really means that innovation will stop for the next twenty years.
  • Referred to Bruce Mau's work which rejects the notion of a client-designer relationship. The notion of an open source license may be obsolete, when it tries to break the above law.
  • How to engage the plurality of potential contributions in OSS.
A recommended addition to your iPod/whatever. A very thought-provoking lecture from a person who has written a compiler, founded a company, been a CTO - a man of many hats.

Update: Looks like Michael Tiemann gave a similar talk (pdf) at the Red Hat Summit.

Tuesday, July 05, 2005

Dave Winer on working groups

Dave Winer, in a podcast explaining the evolution of RSS, makes an interesting remark about working groups:

What happens when you put together a working group is that, you get people who like to be on working groups, not people who like to write software. And they have very different interests. They like to fly places, they like to have meetings, they like to discuss things, and discuss them again and again and again. They really like the discussions. They don't live to make the software. They want to make the discussions.