Monday, December 17, 2007

Folksonomies and the Rise of Tagging

Tagging—individuals using keywords of their own choosing to classify objects online, including photos, bookmarks, products, and blog posts—is a common feature of Web 2.0-style sites. These folksonomies (a neologism derived from “folks” and “taxonomy”) provide a number of benefits:
Hierarchies are by definition top-down and typically centrally defined, a model poorly suited to the collaborative, decentralized, highly networked world of Web 2.0.
Rich media, such as audio and video, benefit from this explicit metadata because other means of extracting meaning and enabling search are still a challenge.
Tags facilitate second-order effects by virtue of the aggregate, collective
intelligence that can be mined from the data set (enough to overcome differences
in individual tag choices for identical items).
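
To make that aggregation concrete, here is a minimal sketch (in TypeScript, with invented names and sample data) of how a site might mine consensus tags for an item: count how often each tag is applied, normalize obvious variants, and surface the most frequent ones.

    type Tagging = { itemId: string; userId: string; tag: string };

    // Count how often each tag is applied to one item, normalizing
    // obvious variants, and return the most frequent tags.
    function topTags(taggings: Tagging[], itemId: string, limit = 5): [string, number][] {
      const counts = new Map<string, number>();
      for (const t of taggings) {
        if (t.itemId !== itemId) continue;
        const tag = t.tag.trim().toLowerCase();   // "Sunset" and "sunset" merge
        counts.set(tag, (counts.get(tag) ?? 0) + 1);
      }
      return Array.from(counts.entries())
                  .sort((a, b) => b[1] - a[1])
                  .slice(0, limit);
    }

    // Three users tag the same photo; "sunset" emerges as the consensus.
    const sample: Tagging[] = [
      { itemId: "photo42", userId: "u1", tag: "Sunset" },
      { itemId: "photo42", userId: "u2", tag: "sunset" },
      { itemId: "photo42", userId: "u3", tag: "beach" },
    ];
    console.log(topTags(sample, "photo42"));   // [["sunset", 2], ["beach", 1]]

Even this crude frequency count illustrates the point: individual tag choices vary, but the aggregate converges on a usable classification.
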
Issues & Debates
Walled gardens 2.0. In Web 1.0, America Online (AOL) exemplified the
walled garden: users were welcome to create content and community as long
as it occurred within the walls of AOL. Arguably, there are now Web 2.0
walled gardens, such as MySpace and LinkedIn, which have vibrant, but—in
many ways—closed communities.
Privacy and liability for individuals. People are revealing ever more details about themselves online, including likes, dislikes, opinions, personal history, relationships, purchasing history, work history, dating history, and so on. This information is therefore increasingly being mined by others, including employers performing background checks, singles investigating upcoming dates, government agencies monitoring social networking sites [36], and tax assessors using homeowners’ online comments about remodeling upgrades to increase property taxes [37].
Privacy and liability for providers. Failing to be sufficiently sensitive to privacy issues can result in legal and public relations disasters. Facebook suffered some high-profile PR fallout when it underestimated the privacy implications of new features deployed in September 2006. Within days, it was forced to retract statements it had made and change the service’s behavior [38]. Or consider the $1 million fine issued to Xanga by the U.S. Federal Trade Commission for violating the Children’s Online Privacy Protection Act (COPPA): Xanga allowed children who identified themselves as under 13 years old to sign up for accounts, even though its stated policy forbade providing accounts to that age group [39].
Quality, not just quantity, matters. Not all users are created equal, nor are their contributions. The most successful Web 2.0 companies have instituted
mechanisms to encourage and reward their most valuable members.

The Eight Core Patterns

Harnessing Collective Intelligence
Create an architecture of participation that uses network effects and algorithms to
produce software that gets better the more people use it.
Data Is the Next “Intel Inside”
Use unique, hard-to-recreate data sources to become the “Intel Inside” for this era
in which data has become as important as function.
Innovation in Assembly
Build platforms to foster innovation in assembly, where remixing of data and services
creates new opportunities and markets.
Rich User Experiences
Go beyond traditional web-page metaphors to deliver rich user experiences combining
the best of desktop and online software.
Software Above the Level of a Single Device
Create software that spans Internet-connected devices and builds on the growing
pervasiveness of online experience.
Perpetual Beta
Move away from old models of software development and adoption in favor of
online, continuously updated, software as a service (SaaS) models.
Leveraging the Long Tail
Capture niche markets profitably through the low-cost economics and broad reach
enabled by the Internet.
Lightweight Models and Cost-Effective Scalability
Use lightweight business- and software-development models to build products and
businesses quickly and cost-effectively.
Although each pattern is unique, they are by no means independent. In fact, they are
quite interdependent.
A set of common Web 2.0 attributes supports these patterns:
Massively connected. Network effects move us from the one-to-many
publishing and communication models of the past into a true web of many-to-many connections. In this era, the edges become as important as the core,
and old modes of communication, publishing, distribution, and aggregation
become disrupted.
Decentralized. Connectedness also disrupts traditional control and power
structures, leading to much greater decentralization. Bottom-up now competes
with top-down in everything from global information flow to marketing
to new product design. Adoption occurs via pull not push. Systems often
grow from the edges in, not from the core out.
User focused. The user is at the center of Web 2.0. Network effects give
users unprecedented power for participation, conversation, collaboration,
and, ultimately, impact. Consumers have become publishers with greater
control, experiences are tailored on the fly for each user, rich interfaces optimize
user interactions, users actively shape product direction, and consumers
reward companies that treat them well with loyalty and valuable word-of-mouth marketing.
Open. In Web 2.0, openness begins with the foundation of the Internet’s
open technology standards and rapidly grows into an open ecosystem of
loosely coupled applications built on open data, open APIs, and reusable
components. And open means more than technology—it means greater
transparency in corporate communications, shared intellectual property, and
greater visibility into how products are developed.
Lightweight. A “less is more, keep it simple” philosophy permeates Web 2.0:
software is designed and built by small teams using agile methods; technology
solutions build on simple data formats and protocols; software becomes
simple to deploy with light footprint services built on open source software;
business focuses on keeping investment and costs low; and marketing uses
simple consumer-to-consumer viral techniques.
Emergent. Rather than relying on fully predefined application structures,
Web 2.0 structures and behaviors are allowed to emerge over time. A flexible,
adaptive strategy permits appropriate solutions to evolve in response to real-world usage; success comes from cooperation, not control.

Ingredients of Web 2.0 Success

Web 2.0 is a set of social, economic, and technology trends that collectively
form the basis for the next generation of the Internet—a more mature,
distinct medium characterized by user participation, openness, and
network effects.
Web 2.0 did not materialize overnight. It represents the evolution and maturation of the Internet during the past decade. The Internet, like other new media, began by mimicking those that came before, but only time and experience revealed its unique strengths. In many ways, Web 2.0 is a rediscovery, or fulfillment, of what the Web was intended to be [26].
The impact of Web 2.0 is now accelerating as the network grows and becomes more
ingrained into the daily lives of individuals and organizations. Yet Web 2.0 as a label
has become part of the vernacular because it concisely conveys just enough meaning—
in particular, that we are entering a distinct new era—to become the basis for
this worldwide dialog. The definition of Web 2.0 is a starting point because, in the
end, it is the underlying patterns that are much more important than a definition.
Understanding these patterns is the key to success in Web 2.0.

How Consumers Are Leading the Way to Enterprise 2.0

In earlier eras, computing innovation was driven first by military and enterprise investment and only later moved into the consumer space. However, we are
now seeing consumers leading the way by virtue of their high-performance computers,
broadband connections, comfort with the medium, and ready access to powerful
online applications. This shift will reach enterprise IT from at least two distinct directions:
Consumers’ experience with Web 2.0-class software is setting the bar of what
software can and should be. Consumers are bringing that knowledge, as well as
those expectations, into their roles as corporate employees.
Enterprise software vendors are learning how to effectively incorporate Web 2.0
principles into their product and service offerings.
Web 2.0’s inevitable arrival within the enterprise is likely to follow the pattern set by
earlier disruptions, such as personal computers or instant messaging, and infiltrate
organizations in a decentralized, bottom-up fashion, only to become pervasive and
essential.
Impact: Web 2.0 is leading to Enterprise 2.0—CIOs and IT executives will succeed only if they stay ahead of the curve by understanding the workplace benefits and challenges of Web 2.0. The differences in needs and culture “behind the firewall” mean
adapting external models to the appropriate internal ones. Enterprises can learn from
consumer Web 2.0 lessons, such as massive scaling, capturing network effects, and creating
rich user experiences.
Although each of these trends has impact and meaning unto itself, the truly significant
consequence comes from the fact that they are all occurring simultaneously. The
most successful Web 2.0 products and companies are capitalizing on:
New business models facilitated by changes in infrastructure costs, the reach
of the Long Tail, viral network-driven marketing, and new advertising-based
revenue opportunities.
New social models in which user-generated content can be as valuable as traditional media, where social networks form and grow with tremendous speed, where truly global audiences can be reached more easily, and where rich media, from photos to videos, are part of everyday life online.
New technology models in which software becomes a service; the Internet becomes the development platform where online services and data are mixed and matched; syndication of content becomes the glue across the network; and high-speed, ubiquitous access is the norm.
These models are bound together in an era where network effects rapidly drive viral
growth, where data rather than function is the core value of applications, and customers
now think of applications as services they use, not software they install.

Executive Summary

Web 2.0 is a set of economic, social, and technology trends that collectively
form the basis for the next generation of the Internet—a more mature,
distinctive medium characterized by user participation, openness, and
network effects.
Web 2.0 is here today, yet its vast disruptive impact is just beginning. More than just
the latest technology buzzword, it’s a transformative force that’s propelling companies
across all industries toward a new way of doing business. Those who act on the Web
2.0 opportunity stand to gain an early-mover advantage in their markets.
O’Reilly Media has identified eight core patterns that are keys to understanding and
navigating the Web 2.0 era. This report details the problems each pattern solves or
opportunities it creates, and provides a thorough analysis of market trends, proven
best practices, case studies of industry leaders, and tools for hands-on self-assessment.
To compete and thrive in today’s Web 2.0 world, technology decision-makers—including executives, product strategists, entrepreneurs, and thought leaders—need to act now, before the market settles into a new equilibrium. This report shows you how.
What’s causing this change? Consider the following raw demographic and technological
drivers:
One billion people around the globe now have access to the Internet
Mobile devices outnumber desktop computers by a factor of two
Nearly 50 percent of all U.S. Internet access is now via always-on broadband
connections
Combine these drivers with the fundamental laws of social networks and the lessons of the Web’s first decade, and the results are striking:
In the first quarter of 2006, MySpace.com signed up 280,000 new users each day and ranked second in Internet traffic
By the second quarter of 2006, 50 million blogs had been created, and new ones were being added at a rate of two per second
In 2005, eBay conducted 8 billion API-based web services transactions
These trends manifest themselves under a variety of guises, names, and technologies:
social computing, user-generated content, software as a service, podcasting, blogs,
and the read–write web. Taken together, they are Web 2.0, the next-generation, user-driven, intelligent web. This report is a guide to understanding the principles of Web
2.0 today, providing you with the information and tools you need to implement Web
2.0 concepts in your own products and organization.

Web 2.0 Design Patterns

In his book, A Pattern Language, Christopher Alexander prescribes a format for the concise description of the solution to architectural problems. He writes: "Each pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice."

The Long Tail
Small sites make up the bulk of the internet's content; narrow niches make up the bulk of the internet's possible applications. Therefore: Leverage customer self-service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head.
Data is the Next Intel Inside
Applications are increasingly data-driven. Therefore: For competitive advantage, seek to own a unique, hard-to-recreate source of data.
Users Add Value
The key to competitive advantage in internet applications is the extent to which users add their own data to that which you provide. Therefore: Don't restrict your "architecture of participation" to software development. Involve your users both implicitly and explicitly in adding value to your application.
Network Effects by Default
Only a small percentage of users will go to the trouble of adding value to your application. Therefore: Set inclusive defaults for aggregating user data as a side-effect of their use of the application (see the sketch after this list).
Some Rights Reserved
Intellectual property protection limits re-use and prevents experimentation. Therefore: When benefits come from collective adoption, not private restriction, make sure that barriers to adoption are low. Follow existing standards, and use licenses with as few restrictions as possible. Design for "hackability" and "remixability."
The Perpetual Beta
When devices and programs are connected to the internet, applications are no longer software artifacts, they are ongoing services. Therefore: Don't package up new features into monolithic releases, but instead add them on a regular basis as part of the normal user experience. Engage your users as real-time testers, and instrument the service so that you know how people use the new features.
Cooperate, Don't Control
Web 2.0 applications are built of a network of cooperating data services. Therefore: Offer web services interfaces and content syndication, and re-use the data services of others. Support lightweight programming models that allow for loosely-coupled systems.
Software Above the Level of a Single Device
The PC is no longer the only access device for internet applications, and applications that are limited to a single device are less valuable than those that are connected. Therefore: Design your application from the get-go to integrate services across handheld devices, PCs, and internet servers.
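
As a concrete illustration of the "Network Effects by Default" pattern above, here is a hypothetical sketch (TypeScript; all names invented) in which the share setting of a bookmarking service defaults to public, so the aggregate data set grows as a side effect of ordinary use:

    type Bookmark = { url: string; tags: string[]; isPublic: boolean };

    // The inclusive default: unless the user opts out, a saved bookmark
    // joins the public pool that powers search, recommendations, and tag pages.
    function saveBookmark(url: string, tags: string[], isPublic = true): Bookmark {
      return { url, tags, isPublic };
    }

    const b = saveBookmark("https://example.com", ["web2.0"]);
    // b.isPublic === true: collective value accrues as a side effect of normal use.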

Core Competencies of Web 2.0 Companies

In exploring the seven principles above, we've highlighted some of the principal features of Web 2.0. Each of the examples we've explored demonstrates one or more of those key principles, but may miss others. Let's close, therefore, by summarizing what we believe to be the core competencies of Web 2.0 companies:

Services, not packaged software, with cost-effective scalability
Control over unique, hard-to-recreate data sources that get richer as more people use them
Trusting users as co-developers
Harnessing collective intelligence
Leveraging the long tail through customer self-service
Software above the level of a single device
Lightweight user interfaces, development models, AND business models
The next time a company claims that it's "Web 2.0," test their features against the list above. The more points they score, the more they are worthy of the name. Remember, though, that excellence in one area may be more telling than some small steps in all seven.

Rich User Experiences

As early as Pei Wei's Viola browser in 1992, the web was being used to deliver "applets" and other kinds of active content within the web browser. Java's introduction in 1995 was framed around the delivery of such applets. JavaScript and then DHTML were introduced as lightweight ways to provide client side programmability and richer user experiences. Several years ago, Macromedia coined the term "Rich Internet Applications" (which has also been picked up by open source Flash competitor Laszlo Systems) to highlight the capabilities of Flash to deliver not just multimedia content but also GUI-style application experiences.

However, the potential of the web to deliver full scale applications didn't hit the mainstream till Google introduced Gmail, quickly followed by Google Maps, web based applications with rich user interfaces and PC-equivalent interactivity. The collection of technologies used by Google was christened AJAX, in a seminal essay by Jesse James Garrett of web design firm Adaptive Path. He wrote:

"Ajax isn't a technology. It's really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

standards-based presentation using XHTML and CSS;
dynamic display and interaction using the Document Object Model;
data interchange and manipulation using XML and XSLT;
asynchronous data retrieval using XMLHttpRequest;
and JavaScript binding everything together."
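
To ground the quote, here is a minimal browser-side sketch of the pattern Garrett describes, using XMLHttpRequest for asynchronous retrieval and the DOM for in-place display; the URL and element id are placeholders:

    // Placeholder URL and element id; the shape is the standard XHR pattern.
    function loadInbox(url: string, targetId: string): void {
      const xhr = new XMLHttpRequest();
      xhr.open("GET", url, true);                  // true = asynchronous retrieval
      xhr.onreadystatechange = () => {
        if (xhr.readyState !== 4 || xhr.status !== 200) return;
        const target = document.getElementById(targetId);
        if (target) target.textContent = xhr.responseText;  // dynamic display via the DOM
      };
      xhr.send();
    }

    loadInbox("/inbox.xml", "inbox");            // the page stays put; only the inbox updates

The page itself never reloads; only the fragment the user is looking at changes, which is what gives Ajax applications their PC-like feel.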

AJAX is also a key component of Web 2.0 applications such as Flickr, now part of Yahoo!, 37signals' applications basecamp and backpack, as well as other Google applications such as Gmail and Orkut. We're entering an unprecedented period of user interface innovation, as web developers are finally able to build web applications as rich as local PC-based applications.

Interestingly, many of the capabilities now being explored have been around for many years. In the late '90s, both Microsoft and Netscape had a vision of the kind of capabilities that are now finally being realized, but their battle over the standards to be used made cross-browser applications difficult. It was only when Microsoft definitively won the browser wars, and there was a single de-facto browser standard to write to, that this kind of application became possible. And while Firefox has reintroduced competition to the browser market, at least so far we haven't seen the destructive competition over web standards that held back progress in the '90s.

We expect to see many new web applications over the next few years, both truly novel applications, and rich web reimplementations of PC applications. Every platform change to date has also created opportunities for a leadership change in the dominant applications of the previous platform.

Gmail has already provided some interesting innovations in email, combining the strengths of the web (accessible from anywhere, deep database competencies, searchability) with user interfaces that approach PC interfaces in usability. Meanwhile, other mail clients on the PC platform are nibbling away at the problem from the other end, adding IM and presence capabilities. How far are we from an integrated communications client combining the best of email, IM, and the cell phone, using VoIP to add voice capabilities to the rich capabilities of web applications? The race is on.

It's easy to see how Web 2.0 will also remake the address book. A Web 2.0-style address book would treat the local address book on the PC or phone merely as a cache of the contacts you've explicitly asked the system to remember. Meanwhile, a web-based synchronization agent, Gmail-style, would remember every message sent or received, every email address and every phone number used, and build social networking heuristics to decide which ones to offer up as alternatives when an answer wasn't found in the local cache. Lacking an answer there, the system would query the broader social network.
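
That tiered lookup is easy to sketch (TypeScript; the store interface is hypothetical and implies no real product's API): consult the local cache first, then the web-based history, then the wider social network.

    // Hypothetical store interface; a tier could wrap the phone's address
    // book, a Gmail-style message history, or a broader social network.
    interface ContactStore {
      find(name: string): Promise<string | null>;   // an address, or null if unknown
    }

    // Try each tier in order: local cache, server-side history, social network.
    async function lookupContact(name: string, tiers: ContactStore[]): Promise<string | null> {
      for (const tier of tiers) {
        const hit = await tier.find(name);
        if (hit !== null) return hit;
      }
      return null;    // unknown everywhere in the chain
    }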

A Web 2.0 word processor would support wiki-style collaborative editing, not just standalone documents. But it would also support the rich formatting we've come to expect in PC-based word processors. Writely is a good example of such an application, although it hasn't yet gained wide traction.

Nor will the Web 2.0 revolution be limited to PC applications. Salesforce.com demonstrates how the web can be used to deliver software as a service, in enterprise scale applications such as CRM.

The competitive opportunity for new entrants is to fully embrace the potential of Web 2.0. Companies that succeed will create applications that learn from their users, using an architecture of participation to build a commanding advantage not just in the software interface, but in the richness of the shared data.

Software Above the Level of a Single Device

One other feature of Web 2.0 that deserves mention is the fact that it's no longer limited to the PC platform. In his parting advice to Microsoft, long time Microsoft developer Dave Stutz pointed out that "Useful software written above the level of the single device will command high margins for a long time to come."

Of course, any web application can be seen as software above the level of a single device. After all, even the simplest web application involves at least two computers: the one hosting the web server and the one hosting the browser. And as we've discussed, the development of the web as platform extends this idea to synthetic applications composed of services provided by multiple computers.

But as with many areas of Web 2.0, where the "2.0-ness" is not something new, but rather a fuller realization of the true potential of the web platform, this phrase gives us a key insight into how to design applications and services for the new platform.

To date, iTunes is the best exemplar of this principle. This application seamlessly reaches from the handheld device to a massive web back-end, with the PC acting as a local cache and control station. There have been many previous attempts to bring web content to portable devices, but the iPod/iTunes combination is one of the first such applications designed from the ground up to span multiple devices. TiVo is another good example.

iTunes and TiVo also demonstrate many of the other core principles of Web 2.0. They are not web applications per se, but they leverage the power of the web platform, making it a seamless, almost invisible part of their infrastructure. Data management is most clearly the heart of their offering. They are services, not packaged applications (although in the case of iTunes, it can be used as a packaged application, managing only the user's local data.) What's more, both TiVo and iTunes show some budding use of collective intelligence, although in each case, their experiments are at war with the IP lobby's. There's only a limited architecture of participation in iTunes, though the recent addition of podcasting changes that equation substantially.

This is one of the areas of Web 2.0 where we expect to see some of the greatest change, as more and more devices are connected to the new platform. What applications become possible when our phones and our cars are not consuming data but reporting it? Real time traffic monitoring, flash mobs, and citizen journalism are only a few of the early warning signs of the capabilities of the new platform.

Lightweight Programming Models

Once the idea of web services became au courant, large companies jumped into the fray with a complex web services stack designed to create highly reliable programming environments for distributed applications.

But much as the web succeeded precisely because it overthrew much of hypertext theory, substituting a simple pragmatism for ideal design, RSS has become perhaps the single most widely deployed web service because of its simplicity, while the complex corporate web services stacks have yet to achieve wide deployment.

Similarly, Amazon.com's web services are provided in two forms: one adhering to the formalisms of the SOAP (Simple Object Access Protocol) web services stack, the other simply providing XML data over HTTP, in a lightweight approach sometimes referred to as REST (Representational State Transfer). While high value B2B connections (like those between Amazon and retail partners like ToysRUs) use the SOAP stack, Amazon reports that 95% of the usage is of the lightweight REST service.
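
The lightweight alternative is easy to show in miniature: a plain HTTP GET returning XML, parsed on the client. The endpoint and element names below are invented for illustration and are not Amazon's actual API.

    // Invented endpoint; illustrates the REST style, not a real service.
    async function fetchProductTitle(isbn: string): Promise<string | null> {
      const res = await fetch(`https://api.example.com/products/${isbn}`);  // XML over HTTP
      const xml = new DOMParser().parseFromString(await res.text(), "application/xml");
      return xml.querySelector("title")?.textContent ?? null;   // read one field, ignore the rest
    }

No envelope, no formal contract: a URL in, a document out, which is much of why the lightweight form accounts for the overwhelming share of usage.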

This same quest for simplicity can be seen in other "organic" web services. Google's recent release of Google Maps is a case in point. Google Maps' simple AJAX (Javascript and XML) interface was quickly decrypted by hackers, who then proceeded to remix the data into new services.

Mapping-related web services had been available for some time from GIS vendors such as ESRI as well as from MapQuest and Microsoft MapPoint. But Google Maps set the world on fire because of its simplicity. While experimenting with any of the formal vendor-supported web services required a formal contract between the parties, the way Google Maps was implemented left the data for the taking, and hackers soon found ways to creatively re-use that data.

There are several significant lessons here:

Support lightweight programming models that allow for loosely coupled systems. The complexity of the corporate-sponsored web services stack is designed to enable tight coupling. While this is necessary in many cases, many of the most interesting applications can indeed remain loosely coupled, and even fragile. The Web 2.0 mindset is very different from the traditional IT mindset!
Think syndication, not coordination. Simple web services, like RSS and REST-based web services, are about syndicating data outwards, not controlling what happens when it gets to the other end of the connection (see the sketch after this list). This idea is fundamental to the internet itself, a reflection of what is known as the end-to-end principle.
Design for "hackability" and remixability. Systems like the original web, RSS, and AJAX all have this in common: the barriers to re-use are extremely low. Much of the useful software is actually open source, but even when it isn't, there is little in the way of intellectual property protection. The web browser's "View Source" option made it possible for any user to copy any other user's web page; RSS was designed to empower the user to view the content he or she wants, when it's wanted, not at the behest of the information provider; the most successful web services are those that have been easiest to take in new directions unimagined by their creators. The phrase "some rights reserved," which was popularized by the Creative Commons to contrast with the more typical "all rights reserved," is a useful guidepost.

Data is the Next Intel Inside

Every significant internet application to date has been backed by a specialized database: Google's web crawl, Yahoo!'s directory (and web crawl), Amazon's database of products, eBay's database of products and sellers, MapQuest's map databases, Napster's distributed song database. As Hal Varian remarked in a personal conversation last year, "SQL is the new HTML." Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as "infoware" rather than merely software.

This fact leads to a key question: Who owns the data?

In the internet era, one can already see a number of cases where control over the database has led to market control and outsized financial returns. The monopoly on domain name registry initially granted by government fiat to Network Solutions (later purchased by Verisign) was one of the first great moneymakers of the internet. While we've argued that business advantage via controlling software APIs is much more difficult in the age of the internet, control of key data sources is not, especially if those data sources are expensive to create or amenable to increasing returns via network effects.

Look at the copyright notices at the base of every map served by MapQuest, maps.yahoo.com, maps.msn.com, or maps.google.com, and you'll see the line "Maps copyright NavTeq, TeleAtlas," or with the new satellite imagery services, "Images copyright Digital Globe." These companies made substantial investments in their databases (NavTeq alone reportedly invested $750 million to build their database of street addresses and directions. Digital Globe spent $500 million to launch their own satellite to improve on government-supplied imagery.) NavTeq has gone so far as to imitate Intel's familiar Intel Inside logo: Cars with navigation systems bear the imprint, "NavTeq Onboard." Data is indeed the Intel Inside of these applications, a sole source component in systems whose software infrastructure is largely open source or otherwise commodified.

The now hotly contested web mapping arena demonstrates how a failure to understand the importance of owning an application's core data will eventually undercut its competitive position. MapQuest pioneered the web mapping category in 1995, yet when Yahoo!, and then Microsoft, and most recently Google, decided to enter the market, they were easily able to offer a competing application simply by licensing the same data.

Contrast, however, the position of Amazon.com. Like competitors such as Barnesandnoble.com, its original database came from ISBN registry provider R.R. Bowker. But unlike MapQuest, Amazon relentlessly enhanced the data, adding publisher-supplied data such as cover images, table of contents, index, and sample material. Even more importantly, they harnessed their users to annotate the data, such that after ten years, Amazon, not Bowker, is the primary source for bibliographic data on books, a reference source for scholars and librarians as well as consumers. Amazon also introduced their own proprietary identifier, the ASIN, which corresponds to the ISBN where one is present, and creates an equivalent namespace for products without one. Effectively, Amazon "embraced and extended" their data suppliers.
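
The namespace move is simple to sketch (hypothetical code; Amazon's real ASIN scheme is its own): use the ISBN as the identifier when one exists, and mint an equivalent proprietary identifier when it doesn't.

    // Hypothetical minting scheme, for illustration only.
    let nextSerial = 0;
    function assignProductId(isbn?: string): string {
      if (isbn) return isbn;                       // books: the ID is the ISBN
      const serial = (nextSerial++).toString(36).toUpperCase().padStart(9, "0");
      return `X${serial}`;                         // non-book products get an equivalent ID
    }

The payoff is a single namespace covering every product, with or without an ISBN, that competitors cannot obtain simply by licensing the same registry data.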

Imagine if MapQuest had done the same thing, harnessing their users to annotate maps and directions, adding layers of value. It would have been much more difficult for competitors to enter the market just by licensing the base data.

The recent introduction of Google Maps provides a living laboratory for the competition between application vendors and their data suppliers. Google's lightweight programming model has led to the creation of numerous value-added services in the form of mashups that link Google Maps with other internet-accessible data sources. Paul Rademacher's housingmaps.com, which combines Google Maps with Craigslist apartment rental and home purchase data to create an interactive housing search tool, is the pre-eminent example of such a mashup.

At present, these mashups are mostly innovative experiments, done by hackers. But entrepreneurial activity follows close behind. And already, one can see that for at least one class of developer, Google has taken the role of data source away from Navteq and inserted themselves as a favored intermediary. We expect to see battles between data suppliers and application vendors in the next few years, as both realize just how important certain classes of data will become as building blocks for Web 2.0 applications.

The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data. In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service.

For example, in the area of identity, PayPal, Amazon's 1-click, and the millions of users of communications systems, may all be legitimate contenders to build a network-wide identity database. (In this regard, Google's recent attempt to use cell phone numbers as an identifier for Gmail accounts may be a step towards embracing and extending the phone system.) Meanwhile, startups like Sxip are exploring the potential of federated identity, in quest of a kind of "distributed 1-click" that will provide a seamless Web 2.0 identity subsystem. In the area of calendaring, EVDB is an attempt to build the world's largest shared calendar via a wiki-style architecture of participation. While the jury's still out on the success of any particular startup or approach, it's clear that standards and solutions in these areas, effectively turning certain classes of data into reliable subsystems of the "internet operating system", will enable the next generation of applications.

A further point must be noted with regard to data, and that is user concerns about privacy and their rights to their own data. In many of the early web applications, copyright is only loosely enforced. For example, Amazon lays claim to any reviews submitted to the site, but in the absence of enforcement, people may repost the same review elsewhere. However, as companies begin to realize that control over data may be their chief source of competitive advantage, we may see heightened attempts at control.

Much as the rise of proprietary software led to the Free Software movement, we expect the rise of proprietary databases to result in a Free Data movement within the next decade. One can see early signs of this countervailing trend in open data projects such as Wikipedia, the Creative Commons, and in software projects like Greasemonkey, which allow users to take control of how data is displayed on their computer.

DoubleClick vs. Overture and AdSense

Like Google, DoubleClick is a true child of the internet era. It harnesses software as a service, has a core competency in data management, and, as noted above, was a pioneer in web services long before web services even had a name. However, DoubleClick was ultimately limited by its business model. It bought into the '90s notion that the web was about publishing, not participation; that advertisers, not consumers, ought to call the shots; that size mattered, and that the internet was increasingly being dominated by the top websites as measured by MediaMetrix and other web ad scoring companies.

As a result, DoubleClick proudly cites on its website "over 2000 successful implementations" of its software. Yahoo! Search Marketing (formerly Overture) and Google AdSense, by contrast, already serve hundreds of thousands of advertisers apiece.

Overture and Google's success came from an understanding of what Chris Anderson refers to as "the long tail," the collective power of the small sites that make up the bulk of the web's content. DoubleClick's offerings require a formal sales contract, limiting their market to the few thousand largest websites. Overture and Google figured out how to enable ad placement on virtually any web page. What's more, they eschewed publisher/ad-agency friendly advertising formats such as banner ads and popups in favor of minimally intrusive, context-sensitive, consumer-friendly text advertising.

The Web 2.0 lesson: leverage customer self-service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head.
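
A toy sketch of what "algorithmic data management" means in this setting (invented names; real ad systems are vastly more sophisticated): score each text ad against the words on the page and serve the best contextual match, with no salesperson in the loop.

    type Ad = { text: string; keywords: string[] };

    // Pick the ad whose keywords best overlap the page's own words.
    function pickAd(pageText: string, ads: Ad[]): Ad | null {
      const words = new Set(pageText.toLowerCase().split(/\W+/));
      let best: Ad | null = null;
      let bestScore = 0;
      for (const ad of ads) {
        const score = ad.keywords.filter(k => words.has(k.toLowerCase())).length;
        if (score > bestScore) { bestScore = score; best = ad; }
      }
      return best;    // null: show nothing rather than an irrelevant ad
    }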

The Web As Platform

Like many important concepts, Web 2.0 doesn't have a hard boundary, but rather, a gravitational core. You can visualize Web 2.0 as a set of principles and practices that tie together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core.

For example, at the first Web 2.0 conference, in October 2004, John Battelle and I listed a preliminary set of principles in our opening talk. The first of those principles was "The web as platform." Yet that was also a rallying cry of Web 1.0 darling Netscape, which went down in flames after a heated battle with Microsoft. What's more, two of our initial Web 1.0 exemplars, DoubleClick and Akamai, were both pioneers in treating the web as a platform. People don't often think of it as "web services", but in fact, ad serving was the first widely deployed web service, and the first widely deployed "mashup" (to use another term that has gained currency of late). Every banner ad is served as a seamless cooperation between two websites, delivering an integrated page to a reader on yet another computer. Akamai also treats the network as the platform, and at a deeper level of the stack, building a transparent caching and content delivery network that eases bandwidth congestion.

Nonetheless, these pioneers provided useful contrasts because later entrants have taken their solution to the same problem even further, understanding something deeper about the nature of the new platform. Both DoubleClick and Akamai were Web 2.0 pioneers, yet we can also see how it's possible to realize more of the possibilities by embracing additional Web 2.0 design patterns.

Let's drill down for a moment into each of these three cases, teasing out some of the essential elements of difference.

Netscape vs. Google
If Netscape was the standard bearer for Web 1.0, Google is most certainly the standard bearer for Web 2.0, if only because their respective IPOs were defining events for each era. So let's start with a comparison of these two companies and their positioning.

Netscape framed "the web as platform" in terms of the old software paradigm: their flagship product was the web browser, a desktop application, and their strategy was to use their dominance in the browser market to establish a market for high-priced server products. Control over standards for displaying content and applications in the browser would, in theory, give Netscape the kind of market power enjoyed by Microsoft in the PC market. Much like the "horseless carriage" framed the automobile as an extension of the familiar, Netscape promoted a "webtop" to replace the desktop, and planned to populate that webtop with information updates and applets pushed to the webtop by information providers who would purchase Netscape servers.

In the end, both web browsers and web servers turned out to be commodities, and value moved "up the stack" to services delivered over the web platform.

Google, by contrast, began its life as a native web application, never sold or packaged, but delivered as a service, with customers paying, directly or indirectly, for the use of that service. None of the trappings of the old software industry are present. No scheduled software releases, just continuous improvement. No licensing or sale, just usage. No porting to different platforms so that customers can run the software on their own equipment, just a massively scalable collection of commodity PCs running open source operating systems plus homegrown applications and utilities that no one outside the company ever gets to see.

At bottom, Google requires a competency that Netscape never needed: database management. Google isn't just a collection of software tools, it's a specialized database. Without the data, the tools are useless; without the software, the data is unmanageable. Software licensing and control over APIs--the lever of power in the previous era--is irrelevant because the software never need be distributed but only performed, and also because without the ability to collect and manage the data, the software is of little use. In fact, the value of the software is proportional to the scale and dynamism of the data it helps to manage.

Google's service is not a server--though it is delivered by a massive collection of internet servers--nor a browser--though it is experienced by the user within the browser. Nor does its flagship search service even host the content that it enables users to find. Much like a phone call, which happens not just on the phones at either end of the call, but on the network in between, Google happens in the space between browser and search engine and destination content server, as an enabler or middleman between the user and his or her online experience.

While both Netscape and Google could be described as software companies, it's clear that Netscape belonged to the same software world as Lotus, Microsoft, Oracle, SAP, and other companies that got their start in the 1980's software revolution, while Google's fellows are other internet applications like eBay, Amazon, Napster, and yes, DoubleClick and Akamai.

What Is Web 2.0

The bursting of the dot-com bubble in the fall of 2001 marked a turning point for the web. Many people concluded that the web was overhyped, when in fact bubbles and consequent shakeouts appear to be a common feature of all technological revolutions. Shakeouts typically mark the point at which an ascendant technology is ready to take its place at center stage. The pretenders are given the bum's rush, the real success stories show their strength, and there begins to be an understanding of what separates one from the other.

The concept of "Web 2.0" began with a conference brainstorming session between O'Reilly and MediaLive International. Dale Dougherty, web pioneer and O'Reilly VP, noted that far from having "crashed", the web was more important than ever, with exciting new applications and sites popping up with surprising regularity. What's more, the companies that had survived the collapse seemed to have some things in common. Could it be that the dot-com collapse marked some kind of turning point for the web, such that a call to action such as "Web 2.0" might make sense? We agreed that it did, and so the Web 2.0 Conference was born.

In the year and a half since, the term "Web 2.0" has clearly taken hold, with more than 9.5 million citations in Google. But there's still a huge amount of disagreement about just what Web 2.0 means, with some people decrying it as a meaningless marketing buzzword, and others accepting it as the new conventional wisdom.

This article is an attempt to clarify just what we mean by Web 2.0.

In our initial brainstorming, we formulated our sense of Web 2.0 by example:

Web 1.0 --> Web 2.0
DoubleClick --> Google AdSense
Ofoto --> Flickr
Akamai --> BitTorrent
mp3.com --> Napster
Britannica Online --> Wikipedia
personal websites --> blogging
evite --> upcoming.org and EVDB
domain name speculation --> search engine optimization
page views --> cost per click
screen scraping --> web services
publishing --> participation
content management systems --> wikis
directories (taxonomy) --> tagging ("folksonomy")
stickiness --> syndication

The list went on and on. But what was it that made us identify one application or approach as "Web 1.0" and another as "Web 2.0"? (The question is particularly urgent because the Web 2.0 meme has become so widespread that companies are now pasting it on as a marketing buzzword, with no real understanding of just what it means. The question is particularly difficult because many of those buzzword-addicted startups are definitely not Web 2.0, while some of the applications we identified as Web 2.0, like Napster and BitTorrent, are not even properly web applications!) We began trying to tease out the principles that are demonstrated in one way or another by the success stories of web 1.0 and by the most interesting of the new applications.