
learning.now: at the crossroads of Internet culture & education with host Andy Carvin


The Semantic Web and the Online Educational Experience

Yesterday, the inventor of the World Wide Web announced the formation of a new research center to study how the Web really works. (Yes, it’s such an amorphous and complicated space that even he doesn’t understand it.) And the work that will take place at this center may eventually change how we all use the Web in the classroom.

Sir Tim Berners-Lee, the physicist who developed the web 17 years ago as a way of sharing knowledge with his colleagues, announced that the Massachusetts Institute of Technology is partnering with the University of Southampton to form the Web Science Research Initiative (WSRI). According to the announcement, WSRI will develop “a research agenda for understanding the scientific, technical and social challenges underlying the growth of the Web.”

Says Sir Tim:

As the Web celebrates its first decade of widespread use, we still know surprisingly little about how it evolved, and we have only scratched the surface of what could be realized with deeper scientific investigation into its design, operation and impact on society. The Web Science Research Initiative will allow researchers to take the web seriously as an object of scientific inquiry, with the goal of helping to foster the web’s growth and fulfill its great potential as a powerful tool for humanity.

The decentralized nature of the Web makes it very difficult for any of us, including researchers, to understand how it all fits together. There’s no single repository of all Web content, nor is there one search engine that makes it possible for us to find everything we might wish to find. Even the biggest search engines like Google only manage to capture a fraction of everything that’s actually out there.

Berners-Lee has spent the last several years trying to improve our ability to search the Internet and find the information we need through an ongoing initiative called the Semantic Web. Essentially, the Semantic Web seeks to add more meaning to the knowledge we put online, so that knowledge can be better understood by machines: search engines, online social networks, software, etc. If machines can understand it better, then we can put it all to better use.

For example, let’s say I post a photo of one of my cats online. I know it’s my cat, and you may know it’s my cat if I include some text explaining that fact. As far as computers are concerned, though, it’s just a photo. There’s no easy way for a computer or website to understand that the photo is of a cat - and specifically, my cat.

This may not seem very important in the grand scheme of things, but it affects the way Web 2.0 tools like online social networks work. It’s hard for technology to understand the relationships that exist between people, places, events and objects. So when you conduct a search, that level of knowledge isn’t really taken into consideration, because most online content lacks the metadata to explain those complex relationships.

The Semantic Web seeks to change all of that. Let’s take my cats as an example again. I might post a picture of one of my cats, label it as a cat and tag it as a cat, but that’s the end of the story. With the Semantic Web, I would somehow include a hyperlink to a page containing technical information that would explain to a computer or a piece of software what a cat actually is. Every attribute I might include about the picture - that the cat is orange, he’s male, a tabby, that he’s owned by Andy, who’s married to Susanne, etc. - would all be linked to URLs defining those attributes. Of course, none of us would want to spend our time hunting down these universally accepted definitions - or coming up with these definitions, for that matter - which is why creating the Semantic Web is so complicated. All of this will have to be made nearly effortless for people to buy into it, but once that happens, we’ll never look at online knowledge the same way again. (If you want to dive into this more, here are some notes I wrote after seeing Sir Tim speak at MIT a couple of years ago.)
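To make that concrete, here’s a rough sketch in Python of how those linked attributes could look to a machine - every fact becomes a (subject, predicate, object) statement, with each term a URL. All of the URLs below are invented for illustration; they’re not real Semantic Web vocabularies:

```python
# A tiny, hypothetical triple store: every fact is a
# (subject, predicate, object) statement, and each term is a URL
# a machine could follow to learn what it means.
# All of these URLs are invented for illustration.
PHOTO = "http://example.org/photos/cat1.jpg"

triples = [
    (PHOTO, "http://example.org/terms/depicts", "http://example.org/animals/cat"),
    (PHOTO, "http://example.org/terms/color", "http://example.org/colors/orange"),
    (PHOTO, "http://example.org/terms/ownedBy", "http://example.org/people/andy"),
    ("http://example.org/people/andy",
     "http://example.org/terms/marriedTo",
     "http://example.org/people/susanne"),
]

def objects_of(subject, predicate):
    """Return everything the store says about (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# A program that has never seen this photo can still answer:
# "what does it depict, and who owns it?"
print(objects_of(PHOTO, "http://example.org/terms/depicts"))
print(objects_of(PHOTO, "http://example.org/terms/ownedBy"))
```

The point isn’t the Python - it’s that once the facts are stated this way, any software can follow the links and reason about my cat without ever being told what a cat is by a human.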

In fact, the Semantic Web is already beginning to prove its abilities through the Creative Commons initiative, which I’ve blogged about previously. Creative Commons is a way for online content producers to define a copyright license for their work. For example, my personal blog has a noncommercial, attribution, share-alike license, which means you can use my content as long as it’s for noncommercial use, you attribute me as the source, and you pass along these same copyright privileges to anyone who uses your version of my content. Prior to the Semantic Web, a copyright license would simply be a bit of legal jargon slapped onto someone’s website. But Creative Commons takes it one step further by using the Semantic Web. When you create a CC license, you embed some HTML code into your website that contains a bunch of URLs. Each one of those URLs defines an attribute of your license. So there’s a URL pointing to the definition of noncommercial use, another for attribution, and another for share-alike. That way, everyone who uses a Creative Commons license is pointing their website to a universally accepted definition of what those things mean to a computer. And as those definitions evolve, no one has to update their websites - only the definitions at the other end of those URLs are altered. Meanwhile, search engines can now let you search for photos, videos, audio and text that have been assigned specific Creative Commons licenses, because the search engines understand the meaning of those licenses and can recognize them on individual websites.
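As a sketch of how a machine might pick up on that embedded license data, here’s a short Python snippet that finds rel="license" links - one convention Creative Commons markup uses - on a page. The sample page is invented, though the license URL follows the real creativecommons.org pattern:

```python
from html.parser import HTMLParser

class LicenseFinder(HTMLParser):
    """Collects the href of any <a rel="license"> link on a page."""
    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "license":
            self.licenses.append(attrs.get("href"))

# An illustrative page footer using the rel="license" convention.
page = '''
<p>Some blog content here.</p>
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/">
  Creative Commons Attribution-NonCommercial-ShareAlike
</a>
'''

finder = LicenseFinder()
finder.feed(page)
print(finder.licenses)
```

A search engine doing something like this at Web scale is how "show me only share-alike photos" becomes a possible query.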

All of this may seem really technical and boring, but it’s going to change the Web in profound ways. How might this apply to education? One example that clearly comes to mind is embedding metadata in content that connects it to specific education standards. For example, let’s say you find a website about the history of India and Pakistan that contains stories of families forced to flee their homes and cross the border when the countries split apart. These stories might fit nicely in a lesson plan you’re doing for your geography class on human migration and refugees, as part of the National Geography Standards. The Semantic Web would allow you to tag this page as being connected to the specific standard on understanding human migration, embedding a URL into that website that links to a machine-friendly definition of that standard. From that point onward, any other teacher searching the Internet would be able to make the connection between that specific website and that specific standard. And the same principle could be used to link online content with lesson plans and the teachers who use them. Suddenly, what started as a search for an educational website leads you to a social network of educators using that site in their classroom to meet a specific standard.
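Here’s a hypothetical sketch of what that could enable: once pages carry standard URLs as machine-readable tags, a teacher’s search tool can filter by standard rather than by keyword. All of the URLs below are made up for illustration:

```python
# Hypothetical: web pages tagged with URLs identifying the education
# standards they support. All URLs are invented for illustration.
MIGRATION_STD = "http://example.org/standards/geography/human-migration"

tagged_pages = {
    "http://example.org/india-pakistan-partition": [MIGRATION_STD],
    "http://example.org/volcano-lesson": [
        "http://example.org/standards/geography/physical-systems"],
    "http://example.org/refugee-stories": [MIGRATION_STD],
}

def pages_for_standard(standard):
    """Find every page tagged as supporting a given standard."""
    return sorted(url for url, stds in tagged_pages.items() if standard in stds)

print(pages_for_standard(MIGRATION_STD))
```

Swap the toy dictionary for the open Web and you get the scenario described above: every teacher’s tagging work becomes instantly searchable by every other teacher.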

Of course, you may think we won’t want to spend our days tagging websites to connect them to standards. But wait a second - millions of people are tagging websites every single day. The very idea of Web 2.0 is built upon people volunteering their time to create content and add richness to it, making connections between ideas. Even if only a small fraction of us spend our time charting out these connections, that still adds up to huge numbers of people. (This is often called the One Percent Rule, with only one percent of members in a given online community actively creating new knowledge - but in large communities, one percent is often enough.) Right now it may be as simple as tagging Flickr photos or websites with del.icio.us, but as the Semantic Web develops, we’ll be able to layer even richer amounts of data on top of online content. We won’t all have to do it, but enough of us probably will - and the Web will be a better educational experience because of it. -andy

Filed under: Cool Tools, Research


Excellent post on an important topic!

One thing I would add is that Semantic Web technologies also have incredible potential in a school’s private intranet and administrative applications as well. For example, right now, if you add a new online literacy program & assessment in your school, your student information system or data warehouse doesn’t understand the meaning of the data being output by the literacy assessment.

Semantic Web “tags” are sophisticated enough to allow the literacy assessment to describe the meaning of the data it generates to other applications, e.g., “This is an assessment; this is a formative assessment; this is a literacy assessment; this is an assessment of phonemic awareness; a ‘10’ on this scale is equivalent to ‘6’ on the state scale, etc., etc.”

Technically, we know how to do this stuff, but it will take a decade or more to get people together to actually write the taxonomies and standards to describe all this stuff in the detail necessary. It isn’t exactly fun, easy, or glamorous work.

Fascinating indeed.

I wonder how long it will take before it’s ready for prime time? Along with people willing to tag, I see other hurdles. The quality of the tags would be important. Poor tags, and intentional mistagging by people in search of the almighty dollar, would be problematic.

The ability to conduct good searches would still be critical. Much of this would have to be driven by search engine evolution, specifically the development of plain English (and other language) searches. Then, of course, there would have to be a way to deal with the information overload that a search might generate.

On a note that may be related: I recently began exploring the Web 2.0 versions of social networking. In one beta system, you can add a plugin that allows all of your chats to be searched by the Google Deskbar. The site is a complete avatar community supported by real commerce and products created by the users.

Since tagging is part of the process on the site and it would be easy to generate tags from chat, I can’t help wondering about the implications and how this ties into the Semantic Web.

It’s going to be interesting to watch things play out.


We’re developing a TLD for use by residents, businesses, and institutions in New York City. (See Campaign for .nyc TLD.) It’s our view that New York City has been globalized by the .com web, i.e., the web has enhanced the ability to communicate globally far more than it has local communication. As a consequence, there is no organized, community-friendly space on the Internet for New York City. The .nyc TLD will promote civic awareness, community pride, and self-improvement.

We look forward to engaging local schools and students in creating New York City’s space on the net and expect the semantic web, with its ability to link relationships between people, places, events, and objects to be a foundation for .nyc’s development.

Hi Andy,

Your post is impressively clear: I translated parts of it into Italian here. Since I was a student, I have “tagged” the books I read and shared the “tags” in photocopy with others who did the same. Others didn’t. And the proportion of sharers and hoarders, psychologically, is probably a constant. Yet in my limited experience of semantic web tagging, its advantages are so great for the tagger him/herself (thinking of del.icio.us, for instance) that once you understand how tags work, you’d need to be a gravely pathological miser to refuse to partake in them for the sake of knowledge hoarding…



As Andy explains, the Semantic Web uses URIs (not tags) for terms. It puts out hard data which can be reused by others, because the URIs have relatively well-defined meanings and are shared within particular communities.

Most of the data on the semantic web is and, I think, will be for a long time, the semantic web form of big databases. These can be put on the web easily without any human tagging necessary.
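As a rough sketch of that idea (the table, column names and URLs below are all invented), each row of a conventional database becomes a batch of triples, with no human tagging involved:

```python
# Hypothetical sketch: exporting rows of an ordinary database table
# as (subject, predicate, object) triples, with no human tagging.
# Table contents and URLs are invented for illustration.
rows = [
    {"id": "s101", "name": "Ada", "grade": 7},
    {"id": "s102", "name": "Grace", "grade": 8},
]

BASE = "http://example.org/school/students/"
VOCAB = "http://example.org/school/terms/"

def row_to_triples(row):
    """Turn one database row into triples: the row's id names the
    subject, and each remaining column becomes one statement."""
    subject = BASE + row["id"]
    return [(subject, VOCAB + column, value)
            for column, value in row.items() if column != "id"]

triples = [t for row in rows for t in row_to_triples(row)]
print(len(triples))  # two non-id columns per row -> 4 triples
```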

Also, ‘semantic wikis’ and other sites which capture relationships between things directly from users are an interesting source. They capture, for example, the point on the map where a photo was taken, or the people (or cats) in a photo - or why not the readings of chemical levels in streams taken by kids on a science trip to the woods.

Those wishing to try the Semantic Web can make themselves a FOAF file using FOAF-a-matic, put it on the web, and link to their friends and/or colleagues.
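For the curious, here’s a sketch of what such a file contains. The rdf: and foaf: namespace URIs below are the real ones FOAF uses, but the people are placeholders - and in practice FOAF-a-matic writes this document for you:

```python
import xml.etree.ElementTree as ET

# The genuine RDF and FOAF namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)

# Build a minimal FOAF description: one person who knows one other.
# The names here are placeholders, not anyone's real FOAF data.
root = ET.Element(f"{{{RDF}}}RDF")
person = ET.SubElement(root, f"{{{FOAF}}}Person")
ET.SubElement(person, f"{{{FOAF}}}name").text = "Jane Example"
knows = ET.SubElement(person, f"{{{FOAF}}}knows")
friend = ET.SubElement(knows, f"{{{FOAF}}}Person")
ET.SubElement(friend, f"{{{FOAF}}}name").text = "Joe Example"

foaf_xml = ET.tostring(root, encoding="unicode")
print(foaf_xml)
```

Post the resulting file on your site, and any FOAF-aware crawler can walk the "knows" links from person to person.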

Thank you for the correction, Sir: I linked to it in a generic beginning of an erratum in the Italian blog post I wrote about Andy Carvin’s text. But I called on a co-author who has the necessary understanding to complete it, lest I create more confusion by attempting to do so myself.
And thank you for the FOAF and FOAF-a-matic references: when FOAF-a-matic-like, but more encompassing, generators become available, mightn’t they attract more than the 1% of people mentioned by Andy?

I’d just like to second Claude’s thanks for the clarifications and links - who better than TBL himself to join the conversation. :-)

Meanwhile, Stephen Downes makes an important point on his blog in response to my piece:

Andy Carvin takes the occasion of the launching of the Web Science Research Initiative (WSRI) to discuss how the Semantic Web will change the way the web works. It’s not a bad discussion, but I think it misses a very important difference between Web 2.0, properly so-called, and the Semantic Web, properly so-called. And that is this: the latter depends largely on formal specifications involving a lot of overhead, such as schemas, ontologies, web services, and the like. But Web 2.0 was developed using very simple and often informal protocols, such as RSS, FOAF and REST. There is room, of course, for the two approaches to co-exist and even communicate. Still, the semantic web is an enterprise-heavy approach, while Web 2.0 is the populist approach, and there is an ongoing tension between the two. Ironically, learning object metadata (LOM) takes the worst from each approach: it lacks the simplicity of Web 2.0, but it lacks the semantics formalism of the Semantic Web.
Here’s the response I wrote to him:
I struggled over how I should address this, if at all. I originally wrote a paragraph talking about the populist nature of web 2.0 and whether the SW would be too strict a structure to allow for similar bottom-up approaches, but I concluded it was so convoluted it might freak out my readership, which leans to the newbie side. So your point is both well-taken and appreciated. We need more bloggers in the edtech world who really get this stuff and can translate it appropriately. :-)

Anyway, that’s what happens when you tackle a complex subject - people a heck of a lot smarter than you come out of the woodwork. :-) Thanks again to TBL and Stephen for offering their expertise….

There’s another thing about “tagging” content that I believe should also be mentioned, relating it to learning: not only will teachers be able to find what they are looking for more easily, but it will probably also make the learner’s work on the web easier. Instead of just typing keywords into Google, from Andy’s description it seems that looking up a subject will take the person to a complete networked experience, maybe similar to the one you get when you browse Amazon?

Hi Andy,

Your post is a most succinct and informative summary of one aspect of the semantic web.

There’s another aspect that, long term, may be even more significant.

It’s the proposal by Sir Tim and the W3C that the world wide web may one day become a huge, uniform database. Roughly speaking, the idea is to express all databases in the form of subject-predicate-object triples in a “Resource Description Framework”.
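A toy illustration of that triple idea: every fact is one (subject, predicate, object) statement, and a query is the same shape with blanks as wildcards. The facts below are invented for illustration:

```python
# Hypothetical facts expressed as subject-predicate-object triples.
facts = [
    ("andy", "wrote", "post-17"),
    ("post-17", "topic", "semantic-web"),
    ("stephen", "wrote", "reply-3"),
    ("reply-3", "topic", "semantic-web"),
]

def match(pattern):
    """Return every fact matching the pattern; None is a wildcard."""
    s, p, o = pattern
    return [f for f in facts
            if (s is None or f[0] == s)
            and (p is None or f[1] == p)
            and (o is None or f[2] == o)]

# "What has the topic semantic-web?"
print(match((None, "topic", "semantic-web")))
```

Because every database reduced to this one uniform shape can answer the same kind of pattern query, separate databases can, in principle, be joined into one.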

As you may know, there has been lots of debate about whether this will really happen.

My two cents’ worth is that this is more likely to happen if we broaden our view of “Semantics” to include more than just tagging with metadata. Call that tagging Semantics1. Then, Semantics2 is concerned with spelling out what deductions we should be able to make from any collection of facts and rules. And Semantics3 concerns the meaning of English and other languages in the real world of business and science [1].
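A toy illustration of what Semantics2 could mean in practice - deducing a new fact from stated facts plus a hand-written rule (the vocabulary is invented):

```python
# Sketch of "Semantics2": deriving new facts from stated facts
# plus a rule. Facts and vocabulary are invented for illustration.
facts = {("fluffy", "isA", "cat"), ("rex", "isA", "dog")}

def deduce(facts):
    """Apply one hand-written rule: everything that is a cat
    is also a mammal."""
    derived = set(facts)
    for subject, predicate, obj in facts:
        if predicate == "isA" and obj == "cat":
            derived.add((subject, "isA", "mammal"))
    return derived

# The system never stated that fluffy is a mammal; it deduced it.
print(("fluffy", "isA", "mammal") in deduce(facts))
```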

This may sound like blue sky stuff, so I have placed online a live system to show that it can be grounded in reality [2]. The system tries to unify the three kinds of Semantics. It works as a kind of Wiki for executable English content. As befits a Wiki, shared use is free.

Anyone can use a browser to read, edit and run some examples that are provided. Folks are cordially invited to write and run their own examples too.

[1] www.semantic-conference.com/program/sessions/S2.html

[2] Internet Business Logic,
online at www.reengineeringllc.com

What will keep the Semantic Web from being cluttered by sales offerings, especially for pornography, off-beat medicines, loans, and stocks?

Good question. I think Stephen Downes’ answer suggests this may be less likely because the Semantic Web will be based on standards and definitions determined at a very high level, which might make it possible to steer clear of spam. Then again, spammers have managed to find a loophole in every other safeguard known to mankind, so only time will tell how vulnerable the SW will be.
