The Found Corporation
Featured content from Semantic Focus, Semantic Web Blog and Community
Displaying 10 most recent entries.
Investor Opportunities and Pitfalls for the Semantic Web 22 May 2008
ReadWriteWeb just posted an interesting article about investor opportunities and pitfalls in the Semantic Web space. The questions were asked to a panel of industry insiders at the SemTech 2008 conference. Panelists include Amanda Reed (Palomar Ventures), Eghosa Omoigui (Intel), and Stephen Hall (Vulcan Capital). This information can be very useful if you're looking to start a business within the Semantic Web industry.
QDOS Allows Users to Search the FOAF Social Graph 19 May 2008
QDOS, a measurer of digital presence, has built an interface that lets you search for a FOAF profile. You can search for an individual by their email address or by the URL of their blog or homepage. The goal is to index and make visible the entire FOAF social graph. If I'm not mistaken, QDOS is also helping to extend the social graph by republishing data that users provide through its primary service (very cool indeed).
Planeta Web Semántica: Spanish Semantic Web News Aggregator 17 May 2008
I got an email from Dolors Reig about her Semantic Web planet-type site, Planeta Web Semántica, an aggregator of Semantic Web news in Spanish. The site indexes feeds in both Spanish and English to make up for the shortage of Spanish-language Semantic Web activity in the blogosphere. I doubt that shortage will last long, as Semantic Web concepts continue to gain traction with people around the world. The site sports a clean layout, and I like that you're given the ability to comment on each news item. This is an excellent resource for those whose primary language is Spanish.
Building Semantics is Different from Building the Web 17 May 2008
When constructing the Semantic Web, we are actually building two different things simultaneously. One is the Web itself, which includes the communication protocols, the Web data presentation formats, and so on. In particular, we have invented new technologies such as RDF, OWL, SPARQL, and other W3C-recommended Semantic Web standards. The other is the semantics that represent the meanings of Web data. Building semantics is, however, different from building the Web.
Building the Web is a professional activity. Ordinary users have neither the knowledge nor the interest to design efficient network transmission protocols or data presentation formats. In the end, these Web-construction issues can only be solved by a few well-trained professionals. As long as the eventual result (i.e. the constructed Web) works well, ordinary users do not care how it has been implemented technically.
Building semantics is, however, a different story. "Semantics" is a subjective term, in contrast to "the Web," which is an objective one. For instance, given the same name, Tony Blair, George W. Bush will assign it semantics such as ally and friend, while Osama bin Laden will assign it semantics such as enemy. So is Tony Blair a friend or an enemy? It very much depends on who answers, or who searches for the answer. For this reason, building semantics cannot be left in the hands of a few professionals. It must instead engage the participation of all Web users.
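To make the point concrete, here is a minimal Python sketch of what "user-specified semantics" could look like as a data structure. The store and the function names (`assign`, `meaning_of`) are hypothetical and only illustrate the idea that the same name carries different labels depending on who assigned them.

```python
from collections import defaultdict

# Hypothetical store: each user keeps their own labels for a subject.
labels = defaultdict(dict)  # user -> {subject: set of labels}

def assign(user, subject, *new_labels):
    """Record the labels that one user attaches to a subject."""
    labels[user].setdefault(subject, set()).update(new_labels)

def meaning_of(subject, user):
    """Return the labels for a subject from one user's point of view."""
    return sorted(labels[user].get(subject, set()))

assign("George W. Bush", "Tony Blair", "ally", "friend")
assign("Osama bin Laden", "Tony Blair", "enemy")

print(meaning_of("Tony Blair", "George W. Bush"))   # ['ally', 'friend']
print(meaning_of("Tony Blair", "Osama bin Laden"))  # ['enemy']
```

The answer to "what does this name mean?" is only defined once you also say whose semantics you are asking about.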
In a recent blog post, Nova Spivack emphasized that only companies that have adopted Semantic Web technologies such as RDF and OWL in their infrastructure should be called "Semantic Web companies." Though this argument makes sense, it is not precise enough from my point of view.
As we just discussed, adopting technologies such as RDF and OWL helps build a web that can be enhanced by explicit semantic specifications. These technologies are not themselves semantics. No single company can stand in for billions of Web users and specify semantics for them, since assigning semantics is a subjective matter. Only Web users can specify semantics, by themselves and for themselves. So what Nova's argument actually describes is the companies dedicated to building a web, as opposed to building semantics. The companies dedicated to building semantics are the ones that focus on giving users facilities for declaring their own semantics.
Twine, of course, seems to match both categories: it uses Semantic Web technologies and encourages user-specified semantics. Hence we can say that Radar Networks is a Semantic Web company. By contrast, Digg is not yet a Semantic Web company, even though it has tried to store data in RDF, because it hardly encourages user-specified semantics.
ISWC 2008 Deadlines Approaching 5 May 2008
Deadlines are fast approaching for those submitting papers, Doctoral Consortium applications and tutorial proposals for ISWC 2008! More information can be found here.
Upcoming deadlines:
- Research papers: 9/16 May
- Semantic Web in Use papers: 16 May
- Tutorial proposals: 16 May
- Doctoral Consortium applications: 16 May
- Posters & Demo proposals: 25 July
- Workshops papers (13 workshops): Varies
- Semantic Web & Billion Triples challenge: 1 October
Win a Full Conference Pass for LinkedData Planet 2008 4 May 2008
The Semantic Web Company in Vienna, Austria is giving away a full conference pass worth $1,095 for the LinkedData Planet Conference! LinkedData Planet 2008 will be taking place on June 17-18, 2008 in New York with confirmed keynote speakers Sir Tim Berners-Lee, Kingsley Idehen and Ian Davis.
Want to enter the competition? Write a brief description of your vision of the impact that linking Open Data will have on business, politics and culture, as well as the pros and cons involved. More details can be found here.
They're looking for ideas in the following categories:
- Mashups
- Ontologies and schemas
- Policies for the practice of linking Open Data
- Search applications
- Scenarios for lifestyles
The prize is certainly worth the effort! The full conference pass gives you access to:
- All Conference Sessions for June 17-18
- Conference Breakfasts, Lunches and Networking Events
- Conference Bag and take-home specialty items
- Access to online conference proceedings
My First Experiences with Twine 12 Mar 2008
Today I finally logged in to Twine for the first time. Yesterday I was reading about some shortcomings of the system, so I was keen on trying it out myself to get my own impression.
It's true that the system isn't as easy to understand as del.icio.us or other bookmarking tools. It takes a while until you get used to all those additional ways you can navigate through the system. Remember: "Twine looks at content and parses it automatically for the names of people, places, organizations and other subject tags. Users are then able to navigate between related content, view recommended content and connect with recommended people with related interests."
I can't agree with the "shortcoming" mentioned by Marshall Kirkpatrick, that "... it's hard to keep track of all the levels and types of information available." This comes down to a general problem that arises whenever semantic technologies are meant to enhance the user experience. Either you stay with "simple" user interfaces like Google or del.icio.us, or you spend five minutes or so learning a new piece of software that will save you time in the future and help you find related information automatically.
On the other hand, I was very surprised that the automatic recommendations Twine makes on how to annotate or describe a new resource are really unsatisfying. Users will only spend time tagging their bookmarks if the machine comes up with some intelligent suggestions. And it's true, as Marshall says, that "most of the web is made up of ugly, non-standard pages."
So hopefully Twine will add that feature before it opens up to the public (isn't there a plan to integrate OpenCalais or something similar?). Otherwise there will be no "first mainstream semantic web application" but only another prototype of yet another semweb app.
Semantic Web Search Engine Roundup 27 Feb 2008
Unlike traditional search engines, which crawl the Web gathering Web pages, Semantic Web search engines index RDF data stored on the Web and provide an interface to search through the crawled data. Below is a list of Semantic Web search engines that are currently under development.
- Semantic Web Search Engine (SWSE)
- SWSE is a search engine for the RDF Web on the Web, and provides the equivalent services a search engine currently provides for the HTML Web. The system explores and indexes the Semantic Web and provides an easy-to-use interface through which users can find the information they are looking for. Because of the inherent semantics of RDF and other Semantic Web languages, the search and information retrieval capabilities of SWSE are potentially much more powerful than those of current search engines. SWSE indexes RDF data from many sources, including OWL, RDF and RSS files. RSS2 is converted to RDF and they will be adding GRDDL sources soon. Developed by DERI Ireland.
- Sindice
- Sindice is a lookup index for Semantic Web documents built on data intensive cluster computing techniques. Sindice indexes the Semantic Web and can tell you which sources mention a resource URI, IFP, or keyword, but it does not answer triple queries. Sindice currently indexes over 20 million RDF documents. Developed by DERI Ireland.
- Watson
- Allows you to search through ontologies and semantic documents using keywords. At the moment, you can enter a set of keywords (e.g. "cat dog old_lady"), and obtain a list of URIs of semantic documents in which the keywords appear as identifiers or in literals of classes, properties, and individuals. You can also use wildcards in the keywords (e.g., "ca? dog*"). Developed by KMi, UK.
- Yahoo! Microsearch
- Microsearch is Yahoo!'s stab at Semantic Web search and provides a richer search experience by combining traditional search results with metadata extracted from Web pages. Indexes RDF, RDFa and Microformats crawled from the Web. Microsearch will soon be adding support for GRDDL.
- Falcons
- Falcons is a keyword-based search engine for the Semantic Web, equipped with browsing capability. Falcons provides keyword-based search for URIs identifying objects and concepts (classes and properties) on the Semantic Web. Falcons also provides a summarization for each entity (object, class, property) for rapid understanding. Falcons currently indexes 7 million RDF documents and allows you to search through 34,566,728 objects. Developed by IWS China.
- Swoogle
- Searches through over 10,000 ontologies. 2.3 million RDF documents indexed, currently including those written in RDF/XML, N-Triples, N3(RDF) and some documents that embed RDF/XML fragments. Currently, it allows you to search through ontologies, instance data, and terms (i.e., URIs that have been defined as classes and properties). Not only that, it provides metadata for Semantic Web documents and supports browsing the Semantic Web. Swoogle also archives different versions of Semantic Web documents. Developed by the Ebiquity Group of UMBC.
- Semantic Web Search
- Powered by RDF Gateway, Intellidimension's proprietary platform for Semantic Web applications and agents. Developed by Intellidimension Inc.
- Zitgist Search
- The Zitgist Query Service simplifies the Semantic Data Web Query construction process with an end-user friendly interface. The user need not conceive of all relevant characteristics - appropriate options are presented based on the current shape of the query. Search results are displayed through an interface that enables further discovery of additional related data, information, and knowledge. Users describe characteristics of their search target, instead of relying entirely on content keywords.
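The basic idea shared by several of these engines, an index from terms or identifiers to the documents that mention them, queried by keyword, can be sketched in a few lines of Python. This is a toy illustration and not how any of the engines above is actually implemented: the corpus is made up, the function names are hypothetical, and combining keywords with AND is an assumption. The wildcard syntax follows the Watson example ("ca? dog*").

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Toy corpus (made-up data): document URI -> identifiers and literals found in it.
documents = {
    "http://example.org/pets.rdf": ["cat", "dog", "old_lady"],
    "http://example.org/vehicles.rdf": ["car", "cab", "dogcart"],
    "http://example.org/people.rdf": ["old_lady", "old_man"],
}

def build_index(docs):
    """Map each term to the set of document URIs that mention it."""
    index = defaultdict(set)
    for uri, terms in docs.items():
        for term in terms:
            index[term].add(uri)
    return index

def search(index, query):
    """Return URIs of documents that match every keyword pattern in the query."""
    results = None
    for pattern in query.split():
        hits = set()
        for term, uris in index.items():
            if fnmatchcase(term, pattern):  # supports ? and * wildcards
                hits |= uris
        results = hits if results is None else results & hits
    return sorted(results or [])

index = build_index(documents)
print(search(index, "cat dog old_lady"))
print(search(index, "ca? dog*"))
```

The first query matches only the pets document; the second also matches the vehicles document, because "ca?" covers "car" and "cab" and "dog*" covers "dogcart."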
The Calais Initiative Looks Back on Its First Month 26 Feb 2008
The Calais Initiative is almost one month old, and they've already received a large and welcoming response from the development community (1,113 early adopters)! When they weren't busy doing interviews or answering hundreds of emails and forum posts, they were coming up with ways to help spread the technology. They will soon be releasing a WordPress plugin, followed by plugins for Drupal, Plone and other content management systems. They also point out that Calais is not only good for named entity extraction, but can extract other facts from documents. An example they give is "what technologies are associated with what company in a document?" Good luck, Calais team!
True Knowledge: The Natural Language Question Answering Wikipedia for Facts 26 Feb 2008
True Knowledge is a natural language search engine and question answering site, but to leave it at that would not do the site justice. What makes it stand out from similar sounding services like Powerset and Freebase? True Knowledge tackles natural language search and question answering (much like Powerset and Hakia), and it also maintains a knowledge base of facts about the world (similar to DBpedia and Freebase). However, what makes True Knowledge stand out is that they've combined these features and encourage their userbase to contribute facts and add new knowledge.
A brief overview of True Knowledge
True Knowledge has combined their technologies to create something that doesn't easily fall into any one category. In fact, you can categorize it as all of the following:
- Question-Answering site
- You can ask questions about any subject and get a direct response. Unlike human-powered Q&A sites, you don't need to wait for someone to respond. The computer answers your question using knowledge stored in a form it can comprehend, and isn't just regurgitating text that it doesn't understand. For this reason it can answer questions it hasn't seen before and can combine knowledge through a process of inference and cross-referencing stored information to produce a reasoned answer.
- Natural language search engine
- True Knowledge also returns search results like a standard search engine, but not without first passing the query through their natural language technology. Your query may be a standard question; even if it isn't, they may be able to work out what you are looking for and give you the answer directly. Because of the way facts are assessed, you can enjoy a high degree of confidence that any information they retrieve will be accurate (unlike information on any single Web page). You aren't limited to properly constructed questions; you can also use the typical two- and three-word "keywordese" queries that many search engine users are accustomed to. Where what is typed is just the name of an entity, their technology can produce a small information screen giving core information about the entity (as well as search engine results).
- Wikipedia for facts
- The knowledge in their system comes from two main sources: information they import themselves from various sources (such as the CIA Factbook) and facts added by their userbase. A big part of their technology is enabling users to add knowledge without having to have any technical understanding of the underlying computer processes. Unlike Wikipedia, where the knowledge in each entry is buried in natural language, True Knowledge stores each piece of knowledge as a discrete fact that can be reasoned on. Once a fact has been established with enough evidence it can't be easily changed. Furthermore, facts that contradict this knowledge are also automatically prevented, which helps the system deal with vandalism.
- "Universal database"
- With a typical database-driven application the developers sit down and create a schema. They then write code which manipulates and processes the data in that schema and when the application is finished this code is run by users. The knowledge that such a system can process is extremely narrow and remains so because nothing that happens after launch expands the scope of the application. Users may add data to the tables but the schema remains fixed. True Knowledge is like a database application except that everything in it is amenable to expansion by users. The scope of the knowledge that it can store expands every time a user adds a new class, relation or attribute; and knowledge about every conceivable entity can be put into the system and be used to answer questions.
In short, they've created a platform for representing the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers.
Information about their architecture
At the heart of the True Knowledge system is the Knowledge Base - a huge database of facts on any topic represented in a form that can be processed by computer. Facts are also inferred by the Knowledge Generator, either using Knowledge Base facts, other generated facts or external feeds of knowledge.
Users can ask questions through a browser interface and those questions are translated via Natural Language Translation into queries expressed in the True Knowledge query language. Their technology has the ability to disambiguate ambiguous questions, including removing interpretations of questions that are unlikely. Questions can also be abbreviated to two or three ("keywordese") words and still be understood - similar to typical keyword search terms.
Their question answering system uses the Knowledge Base and generated facts to answer queries. The API provides an alternative interface to the question answering system from remote computers.
System Assessment further processes existing facts in order to maintain the semantic consistency of knowledge. For example, facts can be marked as untrue if they are contradicted by other facts. The browser interface provides a means for users to assess the validity of facts (User Assessment), enabling them to endorse or contradict particular facts. A user's reputation and track record are used to automatically weight this information. In combination with System Assessment, this prevents the back-and-forth battles that are common on wikis.
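True Knowledge doesn't say how the weighting works, but the general idea of reputation-weighted assessment can be sketched as follows. Everything here is an assumption made for illustration: the function name, the scoring rule (endorsements add the user's reputation, contradictions subtract it) and the threshold.

```python
def weighted_verdict(assessments, threshold=0.0):
    """Combine user assessments of a fact, weighted by reputation.

    assessments: list of (reputation, endorses) pairs, reputation >= 0.
    Assumed rule: endorsements add the reputation, contradictions subtract it.
    """
    score = sum(rep if endorses else -rep for rep, endorses in assessments)
    if score > threshold:
        return "endorsed"
    if score < -threshold:
        return "contradicted"
    return "undecided"

# One trusted user endorses the fact; two low-reputation users contradict it.
votes = [(0.9, True), (0.2, False), (0.1, False)]
print(weighted_verdict(votes))  # endorsed
```

Under a rule like this, a couple of throwaway accounts can't outvote a contributor with a long track record, which is the property that keeps edit wars short.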
The Knowledge Base grows through Knowledge Addition, either from users via the browser interface, or imported in volume from external sources.
A key design decision is that all components are extendable by users. In addition to users adding facts, they can also extend the questions that can be translated into whole new areas and even provide new inference rules (and even executable code for steps that involve calculation) for the Knowledge Generator.
True Knowledge API
No service such as this would be complete without an API! They say their API can execute any query you supply it with; however, they are in the process of releasing a series of API services. These simple services encapsulate areas of knowledge which are well served by their current Knowledge Base. All these services can be accessed via the same query interface using a single account. Click on the names of the services below to test each one!
- IP Geolocation
- Converts an IP address to a probable geographical location of an internet user (e.g. the user of a website). This geographic knowledge can then be used in subsequent queries to retrieve further relevant facts about the location from the Knowledge Base: including the user's likely language, preferred currency, local time etc.
- Local Time
- Identifies a place either from an IP address obtained automatically or from a supplied string denoting the place, and obtains a local time either now or at some past or future time. Possible applications include an online or phone conferencing system that wants to inform participants about the date and time of a meeting in their local time zone.
- Name-to-Gender
- Takes a personal name (first name or full name) and returns the gender inferred by the system for that name. The system applies certain heuristics to a string representing a person's name in an attempt to judge the gender of the person. If the gender can be determined with reasonable probability, then it will be returned. This service would be useful to, for example, a social networking site wishing to use gender-specific language about a user whose name, but not gender, was known.
- Email-to-Name
- Takes an email address and returns the forename inferred from its local-part (if a name can safely be inferred). Businesses with access to users' email addresses but not names could use this to address emails more personally. This service can be combined with the Name-to-Gender service to infer a person's gender from his/her email address.
- Trading Day
- Takes a point in time and a geographical location and returns 'no' if it is a weekend day or a public holiday in the location and 'yes' otherwise.
- Location-to-Language
- Returns a language which can be read by a significant number of people at a location. True Knowledge has complete coverage at the national level and partial coverage for smaller areas. This can be used in combination with the IP Geolocation service to decide which language(s) are appropriate when displaying websites to international users, for example.
- Telephone Number-to-Location
- Returns the geographical location of the specified landline telephone number.
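To give a feel for what one of these services computes, here is a local Python sketch of the Trading Day rule as described above ('no' on weekends and public holidays, 'yes' otherwise). It does not call the True Knowledge API; the function name, the location "Exampleland" and its holiday table are all made up for illustration, whereas the real service would draw this knowledge from the Knowledge Base.

```python
from datetime import date

# Hypothetical holiday table keyed by location (made-up data).
PUBLIC_HOLIDAYS = {
    "Exampleland": {date(2008, 2, 27)},
}

def is_trading_day(day, location):
    """Return 'no' for weekend days and public holidays, 'yes' otherwise."""
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return "no"
    if day in PUBLIC_HOLIDAYS.get(location, set()):
        return "no"
    return "yes"

print(is_trading_day(date(2008, 2, 26), "Exampleland"))  # a Tuesday
print(is_trading_day(date(2008, 2, 27), "Exampleland"))  # the made-up holiday
print(is_trading_day(date(2008, 3, 1), "Exampleland"))   # a Saturday
```

The interesting part of the real service is not this logic but the data behind it: knowing the public holidays for every location is exactly the kind of knowledge a shared fact base is good at.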
Don't worry, the road doesn't end there. True Knowledge says they are currently working on even more services to add to this list.
Adding knowledge to True Knowledge
Time for some hands-on stuff!
What do True Knowledge and Jurassic Park have in common? Nothing, as far as I'm aware. However, I am going to show you step-by-step how I taught True Knowledge something it didn't know. To be more specific, I'm going to show you how to add new knowledge from start to finish and then how to expand on it. Because True Knowledge seems to update itself in real time, I was able to see the fruits of my labor right away. Not having to wait for an index to be rebuilt made the task of adding knowledge feel more worthwhile.
After playing with a few test queries I tried to find something it didn't know anything about. I asked "who is the author of jurassic park?", which returned the response "I don't know" and a more detailed explanation:
It sounds like "jurassic park" may be a thing that is published that I don't currently know about. If you want, you can add the thing that is published called "jurassic park" to the Knowledge Base.
Incidentally, the search results that appear alongside the answer are pretty relevant. The first result contains the answer to my question. By chance, its title is exactly my answer.
Clicking the link took me to a screen that asked me to enter the most common name for "a thing that is published." I entered "Jurassic Park." They do ask that you don't enter information about fictional things (e.g., unicorns). I had to think for a moment if Jurassic Park is considered a fictional thing in this context. I came to the conclusion that Jurassic Park is not fictional in the sense that it is both a literary work and the title of several movies so I clicked Submit.
After a quick look at the confirmation page I was ready to proceed. I should note that there are several confirmation pages along the way. If you're comfortable enough with the process you can disable each confirmation page individually by checking the box that says "Don't show me this confirmation page again."
Next I was presented with a possible Wikipedia match and a helpful extract from the page. I was satisfied that the Wikipedia entry presented to me was indeed talking about the very same Jurassic Park so I clicked continue.
The next screen asked me if I knew anything that Jurassic Park is that is more specific than a "thing that is published." It was trying to figure out the name of the class of things Jurassic Park belonged to. I clicked yes, entered "movie" and clicked submit.
True Knowledge is already aware of what a movie is and asks me specifically if what I meant was "movie (connected cinematic narrative)." Satisfied that I had my match I clicked submit and continued on.
This is where I thought things got interesting. The next screen asked me to be more specific about what kind of movie Jurassic Park is and gave me the following options to choose from:
- Made for TV movie
- Made for video movie
- Big screen movie
Since we all know Jurassic Park was a major motion picture I chose "big screen movie" and clicked select. Alternatively if I didn't want to choose any of those refinements (e.g., if they didn't apply) I could simply click Yes and proceed with Jurassic Park labeled as a "movie."
The next screen asked me to enter a phrase that could be used instead of Jurassic Park in all circumstances. Basically they were asking for a short but descriptive phrase that makes it absolutely clear what Jurassic Park is. They give a few examples such as "France, the Republic of France" and "Star Wars, the 1977 adventure action sci-fi movie Star Wars." Going off the Star Wars example I entered "Jurassic Park, the 1993 movie about dinosaurs" and clicked submit.
I was then asked to confirm that the phrase I entered was an unambiguous way of saying Jurassic Park, which would be recognized by anyone wanting to say something about that big screen movie. After confirming a few points about the ambiguity of my phrase I clicked Yes.
I was then asked to enter a few alternate names. I entered "JP" (the US promotional title) and "Jurassic Park 1" (a common way of referring to the original movie after the sequels were released).
Next I had to enter a unique, human-readable ID. The page informed me that [jurassic park] was available and auto-populated that value for me. I certainly couldn't think of a better ID, so I clicked submit.
After submitting the ID I was presented with a list of facts that the system had gathered from the information I entered. Reading through the list of facts you can see how each step along the way input the information into True Knowledge. I am listed as the source for each fact because I have not specified any other sources. Luckily I am able to do that at the bottom of the page. As I want this information to be trustworthy, I included a trustworthy source: The IMDB entry for Jurassic Park.
I entered the URL for the entry on IMDB and clicked add new source. This took me to a mini-process of adding a document stored in a remote system (i.e., a Web page). I clicked OK to start the process.
The next screen asked me to verify that the contents below were what I was expecting. Everything checked out so I clicked confirm.
Now that I have a new source available to me (the IMDB page) I changed the source where appropriate. Once I had the sources set I clicked add these facts to finish up the process of adding new knowledge.
All done! Clicking on OK will take you to a page with your new entry.
The page has a few links for adding more information that would be relevant to the entry.
I wasn't done yet, since I still couldn't answer the question "who is the author of jurassic park?" Of course, now I have a whole new problem: I told the system that Jurassic Park was a movie, not a literary work. We'll see how the system handles this. On the add knowledge page I selected "add a new fact."
On the add a fact page I was given three textboxes to enter a (subject, predicate, object) tuple about anything. Since I want to enter the author information for Jurassic Park I entered "Michael Crichton" -> "is the author of" -> "Jurassic Park" and clicked submit.
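If you think of it in code, the three textboxes amount to a three-field record. Here is a minimal Python sketch; the `Fact` type and its field names are my own and say nothing about how True Knowledge stores facts internally.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """One discrete fact: the left part, the relation, and the right part."""
    subject: str
    predicate: str
    obj: str

fact = Fact("Michael Crichton", "is the author of", "Jurassic Park")
print(" -> ".join(fact))  # Michael Crichton -> is the author of -> Jurassic Park
```

Because each fact is a small, self-contained record rather than a sentence buried in an article, it can be checked, sourced and reasoned over on its own.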
The next screen actually informs me that the system is already aware of Michael Crichton, the American author born in 1942. Since we're both talking about the same person I clicked submit.
On the fact confirmation page that followed I was given the option to go ahead and add the fact as-is or to change the left or right part of the fact (the subject or object). Although the proper course of action would have probably been to create a new entry in True Knowledge for the literary work Jurassic Park, I wanted to see if the property "author" could be applied to an instance of class "movie." I also wanted to determine whether or not something can belong to multiple classes ("book" and "movie"). I chose to add Michael Crichton as the author of Jurassic Park (the movie), and clicked Yes.
When it came time to list sources I told it that I was not the source, and I listed the Wikipedia entry for Jurassic Park and went through the two-step process of adding a Web page.
Now True Knowledge knows about Jurassic Park (the 1993 movie about dinosaurs) and Michael Crichton, the author of Jurassic Park (the literary work). It should be noted that True Knowledge is under the impression that Michael Crichton is actually the author of the movie Jurassic Park.
I tried my original question and this time I got a direct answer, including how it came to that conclusion. So you can apply an author to a movie. It feels weird to me that you can do that, because I don't feel you can be the "author" of a movie (rather, the movie's script and screenplay).
Back on the add a fact page I tell True Knowledge that "Jurassic Park" -> "is a" -> "book."
This time around I'm given three options of what a book might be. I chose the last option, "book (a written work intended to be published as a set of pages bound together on one side)" because I felt it was the best definition of what a book is.
After confirming the fact and adding my source (Wikipedia again) I am informed that "Jurassic Park is a book" contradicts previously inserted facts. In this case, it is apparent that a movie cannot also be a book.
In the end the fact did not get added because it contradicts an existing fact in the system. Today was just my first day, so I'm sure I'll get better at this.
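The rejection I ran into can be pictured as a check against classes that are declared mutually exclusive. The sketch below is my own guess at the general idea in Python, not True Knowledge's actual mechanism: the `KnowledgeBase` class, the `add_is_a` method and the `DISJOINT` table are hypothetical.

```python
# Assumed: the knowledge base treats these pairs of classes as mutually exclusive.
DISJOINT = {frozenset({"movie", "book"})}

class KnowledgeBase:
    def __init__(self):
        self.classes = {}  # entity -> set of classes it belongs to

    def add_is_a(self, entity, cls):
        """Add 'entity is a cls' unless it contradicts an existing fact."""
        existing = self.classes.setdefault(entity, set())
        for other in existing:
            if frozenset({other, cls}) in DISJOINT:
                return False  # contradiction: the new fact is rejected
        existing.add(cls)
        return True

kb = KnowledgeBase()
print(kb.add_is_a("Jurassic Park", "movie"))  # True
print(kb.add_is_a("Jurassic Park", "book"))   # False
```

In a setup like this, the way around the rejection is the one mentioned earlier: create a separate entry for the literary work instead of trying to make one entry belong to both classes.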
My first impression of True Knowledge
I found my first experience with True Knowledge very satisfying! The user interface is simple and it's hard to get lost trying to do something new. They are still in beta, and as such they still have some polish to apply before the general public is let in, but the product is solid and I can't wait until more users are let in the gates.
I'm interested to see how it will prevail over similar services. Components of True Knowledge compete with many semantic services (Freebase, Hakia, Powerset, DBpedia, etc) and even non-services like Cyc. I am of the opinion that True Knowledge has the winning combination of each approach.