There are a million APIs out there, and I couldn’t be happier. It’s easy now to translate street addresses to lat/long coordinates. It’s easy to grab local results and overlay them on a map. It’s easy to use Yahoo or Google to get all types of search results (local, images, etc.), and sites like Amazon to get prices and products.
But I think one of the coolest and most underrated APIs is the Term Extractor API from Yahoo!:
In other words, you point it at a piece of content — a news article, blog post, movie review or whatever — and it returns a list of terms, or keywords (or “tags” for those of you keeping score at home).
What do you do next with a list of keywords from a piece of content? Well, lots of things. Jeremy Keith wrote yesterday about a few ideas (that seem up for grabs, if you’re in a hacking mood!).
What if you treated each returned term as a tag? You could then pass those tags to any number of tag-based services, like Flickr, Del.icio.us, or Technorati.
So, instead of the simple “here’s my Technorati profile” or “here are my Flickr pics” on a blog, you could have links that were specific to each individual blog post. If I sent the text of this post to the term extractor, it would return a list of terms like “api”, “yahoo”, etc. By passing those terms as tags to a service like Technorati or Del.icio.us, readers could be pointed to other blog posts and articles that are (probably) related.
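Here is a minimal sketch of that idea in Python: it turns a list of extracted terms into per-service tag links. The URL patterns and the function name `tag_links` are assumptions for illustration, not anything the services or the Term Extractor define, and each service has its own rules for multi-word tags, so check them before relying on this.

```python
from urllib.parse import quote

# Assumed tag-page URL patterns (hypothetical; verify against each service).
TAG_URL_PATTERNS = {
    "Technorati": "http://technorati.com/tag/{tag}",
    "Del.icio.us": "http://del.icio.us/tag/{tag}",
    "Flickr": "http://www.flickr.com/photos/tags/{tag}",
}


def tag_links(terms):
    """Map each extracted term to one tag-page URL per service."""
    links = {}
    for term in terms:
        # Lowercase the term and percent-encode spaces and punctuation.
        tag = quote(term.strip().lower())
        links[term] = {
            service: pattern.format(tag=tag)
            for service, pattern in TAG_URL_PATTERNS.items()
        }
    return links


if __name__ == "__main__":
    # Hypothetical extractor output for this post.
    for term, urls in tag_links(["api", "yahoo"]).items():
        print(term, urls["Technorati"])
```

A blog template could call something like this once per post and print the links under the post body.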
As Jeremy suggests, it gets interesting when you let the output of this web service be the input to another. A few months ago I was lucky enough to lend a small bit of help to the team that brought you the Yahoo! Events Browser mashup. One challenge of that product was getting images associated with each event. If you’ve ever worked with unstructured data (and event listings are super unstructured), then you know that it doesn’t provide many high-quality hooks for understanding its content. The team tried doing image searches on venue or artist name, but the results weren’t very relevant or interesting, even when the parsed venue or artist was accurate. So, being the put-lots-of-pieces-together types they are, they decided to use the Term Extractor to discover more accurate, meaningful, and specific query terms to search for images with. Here’s how they summed it up:
To display appropriate images for events, local event output was sent into the Term Extraction API, then the term vector was given to the Image Search API. The results are often incredibly accurate.
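The chaining the team describes can be sketched roughly like this in Python. Everything here is a stand-in: `extract_terms` and `search_images` are hypothetical callables you would wire up to the real Term Extraction and Image Search services, and joining the top few terms into one query is my assumption, not necessarily how the Events Browser builds its query.

```python
from typing import Callable, List


def images_for_event(
    event_text: str,
    extract_terms: Callable[[str], List[str]],
    search_images: Callable[[str], List[str]],
    max_terms: int = 3,
) -> List[str]:
    """Chain two services: extracted terms become the image-search query."""
    # Step 1: send the raw event listing to the term extractor.
    terms = extract_terms(event_text)[:max_terms]
    if not terms:
        # Nothing useful was extracted, so skip the image search entirely.
        return []
    # Step 2: use the extracted terms (not the raw text) as the query.
    query = " ".join(terms)
    return search_images(query)
```

Passing the two services in as functions keeps the pipeline testable without network access, and makes it easy to swap in a different extractor or search backend.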
I’ve only seen a handful of implementations of the Term Extractor API so far. If you’ve got a cool one to point me to, or a cool idea for a future implementation, please leave ‘em in the comments below.

Mike P. January 17th, 2006 - 5:59 pm
I did some work with the Yahoo! term extractor to use it for tags, and it can be a bit “noisy”, to the point that some data checking had to be done in order to ensure that the quality of tagging was high.
Tagyu seemed to give me better results, however ymmv.
My stuff/work/thoughts here.
Simon Wheatley January 18th, 2006 - 1:31 am
I’m looking into an events listing for a UK charity, but Yahoo! doesn’t currently stretch this far (I think). All this stuff seems US-centric at the moment. Is there anything UK-based yet?
Matt Biddulph January 18th, 2006 - 5:18 am
I used the term extractor on wikipedia articles as a way of enriching the linking in some data I was processing: writeup
natek January 18th, 2006 - 5:19 am
Hey Simon,
It’s true that many Y! products get launched in the States first. Partly that’s because that’s where the bulk of the developers are, and partly because content acquisition (getting the data from a source) is often regional in scope, and the data is generally unstructured, which means a lot of post-processing has to be done.
For events, I recommend you check out Upcoming.org (a recent Yahoo acquisition). It’s an open, so-called “Web 2.0” type site, and lets you list events from anywhere about anything. Once you add a few buddies to the service and join a few groups, it starts getting pretty awesome. Here’s their “metro” page for London:
http://upcoming.org/metro/uk/london/london/
Adrian Holovaty January 18th, 2006 - 9:16 am
We’ve gotten some cool uses of the term-extractor API for Post Remix (the Washington Post’s mashup site). For instance, Ripped from the Headlines and Amazon Light. I think NewsCloud may be using it, but I’m not sure.
Frank Wiles January 20th, 2006 - 1:22 pm
Yeah NewsCloud does use the Yahoo term-extractor API to look for keywords it hasn’t seen before. After it knows about a keyword however, it just looks for the keywords in the content itself, because Yahoo doesn’t give you a frequency count for each keyword.
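(A rough Python sketch of the second step Frank describes, counting already-known keywords directly in the content. This is not NewsCloud’s code; the function name and the whole-word, case-insensitive matching rule are assumptions for illustration.)

```python
import re
from collections import Counter


def keyword_counts(text, known_keywords):
    """Count whole-word, case-insensitive occurrences of each known keyword."""
    counts = Counter()
    for keyword in known_keywords:
        # Lookarounds stop "api" from matching inside "rapid" or "APIs".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[keyword] = len(hits)
    return counts
```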
Raghu February 13th, 2006 - 5:14 pm
I use the term extractor API on a social predicting website and it works quite nicely.
I pass the prediction text onto the T/E API and use the results from that to call the Yahoo Image and News APIs. It works nicely most of the time but is not without its quirks.
Take a look http://www.twocrowds.com
Abhijit Nadgouda @ iface » Term Extractor For Tags February 14th, 2006 - 7:25 am
[...] Today I came across a post about Yahoo! Term Extractor API by Nate Koechley. This can result into something that will not only benefit the readers but also the bloggers. In addition to ensuring that no terms are missed, it can fully automate discovery of related posts/articles on tag-based services like Technorati. And coming from Yahoo! it is very much usable in PHP, and so compatible with Wordpress!? [...]
Ritwik Banerjee May 16th, 2006 - 4:28 am
Nice post … especially close to me because I was working on a similar thing (just two of us) when the Yahoo! extractor came into being ……
Yahoo API Term Extractor | Useful Web Stuff September 30th, 2006 - 8:58 am
[...] yahoo api term extractor article Term extract documentation from Yahoo [...]
Francesco Sclano November 16th, 2006 - 10:02 am
Hi everybody!
TermExtractor, my master’s thesis, is online at http://lcl2.di.uniroma1.it.
TermExtractor is a software package for terminology extraction. It helps a web community extract and validate relevant domain terms in their domain of interest by submitting an archive of domain-related documents in any format.
TermExtractor extracts terminology that is consensually referred to in a specific application domain. The software takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-nouns, etc.).
Document parsing assigns greater importance to terms with text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used. Domain Consensus is used to select only the terms which are consensually referred to throughout the corpus documents. Domain Relevance is used to select only the terms which are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, extracted terms are further filtered using Lexical Cohesion, which measures the degree of association of all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd and also zip archives.
I’d like you to participate in the TermExtractor evaluation task. The results of your evaluation will be put in a paper (I enclose a draft). Please contact me if you want to participate (this is very important for me!).
MANY THANKS!!!
–
Francesco Sclano
home page: http://lcl2.di.uniroma1.it/~sclano
msn: francesco_sclano@yahoo.it
skype: francesco978
Term Extractor For Tags on iface thoughts December 9th, 2006 - 12:40 pm
[...] Today I came across a post about Yahoo! Term Extractor API by Nate Koechley. This can result into something that will not only benefit the readers but also the bloggers. In addition to ensuring that no terms are missed, it can fully automate discovery of related posts/articles on tag-based services like Technorati. And coming from Yahoo! it is very much usable in PHP, and so compatible with Wordpress!? [...]
World News October 5th, 2007 - 3:03 am
We used Yahoo term extractor for our World News website. It works like a charm.
Marcoullier.com » The semanticization of people March 19th, 2008 - 9:15 pm
[...] data gets far more interesting when attached to people. Why settle for the results of Yahoo Term Extractor, when we can attach highly structured data from sources like Flixster, iLike and [...]
Automatically Generate Keywords | Vunky November 11th, 2008 - 7:22 am
[...] reading Nate Koechley article on Yahoo’s term extractor API i was inspired to connect it to acts_as_taggable_on_steroids. [...]
Vincent November 12th, 2008 - 6:57 am
Hello nate,
Thanks for writing this article. I came across it through stumbleupon. It inspired me to combine Yahoo’s term extractor with ruby on rails tagging plugin. Not so underrated anymore ;)