Columbia Newsblaster is a system to
automatically track the day's news. There are no human editors
involved -- everything you see on the main page is generated
automatically, drawing on the sources listed on the left side of the
screen.
Every night, the system crawls a series of Web sites, downloads
articles, groups them together into "clusters" about the same topic,
and summarizes each cluster. The end result is a Web page that gives
you a sense of what the major stories of the day are, so you don't
have to visit the pages of dozens of publications.
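The grouping step described above can be illustrated with a small sketch. This is not Newsblaster's actual clustering code: the word-overlap (Jaccard) measure, the greedy single-pass strategy, the threshold value, and the names tokenize, jaccard, and cluster_articles are all assumptions chosen only to show the general idea of putting articles about the same topic together.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of words of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_articles(articles, threshold=0.2):
    """Greedy single-pass clustering (illustrative assumption, not the real system).

    Each article joins the first cluster whose pooled vocabulary is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"words": set, "articles": list}
    for article in articles:
        words = tokenize(article)
        for cluster in clusters:
            if jaccard(words, cluster["words"]) >= threshold:
                cluster["articles"].append(article)
                cluster["words"] |= words
                break
        else:
            # No existing cluster was similar enough.
            clusters.append({"words": set(words), "articles": [article]})
    return [c["articles"] for c in clusters]
```

Given two headlines about the same budget vote and one about a hurricane, this sketch would place the first two in one cluster and the third in its own. A production system would need far more robust similarity measures than raw word overlap.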
Newsblaster is an academic project from the Natural Language Processing
group at Columbia University's Department of Computer Science. It is
designed to demonstrate the group's
technologies for multidocument summarization, clustering, and text
categorization, among others. It is funded under DARPA TIDES and KDD
and has been operational online since September 2001.
Current and future enhancements include international perspectives,
multilingual capability, and tracking events across days.
Back to Newsblaster
Newsblaster FAQ
- Does Newsblaster threaten to replace reporters?
Absolutely not! Newsblaster collects, clusters, categorizes, and
summarizes news, but it does not write news. It will always need
human journalists for its raw content.
- Can I obtain more technical information about Newsblaster?
Please check out this page for a list of papers concerning Newsblaster and its components.
- What additional publications discuss Newsblaster?
For a list of articles from the press which discuss Newsblaster, please
check out this page.
- How is Newsblaster different from Google News?
Google News does not do multidocument summarization; it simply uses
the articles' leading sentences. In addition, Newsblaster produces
multiple summaries for an event, each reflecting the media from a
particular country. Future expansions such as tracking events across
days are also in the works.
- Why can't I search Newsblaster?
You will soon be able to search Newsblaster's summaries, as well as
the blasted articles themselves.
- How come Newsblaster sometimes doesn't update every day?
We sometimes deliberately cancel a day's run when we don't want the
Web page to change. This happens when we are preparing to present the
system at a demo or site visit. Network problems and code bugs can
also come up.
- Can I license the code for Newsblaster or make it run on my own data?
We are currently discussing plans by which we may be able to either
license Newsblaster code or run it ourselves on other people's data. It
is not yet clear when we will be able to do this. If you are interested,
please contact
blaster@cs.columbia.edu.
- Is the Newsblaster code free or open?
Sorry, there are no plans to make Newsblaster open source.
- How does Newsblaster make its summaries?
Newsblaster uses two different summarizers. One carefully selects
sentences from among the articles and rearranges them to produce a
coherent summary. The other looks for common information conveyed
across all the articles and then reformulates new sentences expressing
that information. After a summary is generated, it is then revised
for greater fluency.
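To give a feel for the first kind of summarizer (selecting sentences from the articles), here is a minimal sketch. It is a generic word-frequency approach and not the method Newsblaster uses: the scoring rule, the naive sentence splitter, and the names split_sentences, content_words, and summarize are assumptions made for illustration. It does not attempt the reformulation or revision steps described above.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Words of four or more letters, lowercased (a crude stop-word filter)."""
    return re.findall(r"[a-z]{4,}", sentence.lower())

def summarize(articles, max_sentences=2):
    """Pick the sentences whose words are most frequent across the cluster."""
    sentences = [s for article in articles for s in split_sentences(article)]
    sentences = list(dict.fromkeys(sentences))  # drop exact duplicates, keep order
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original reading order
    return " ".join(sentences[i] for i in chosen)
```

Sentences that repeat information found across many articles score highest here, which loosely mirrors the idea of favouring content the sources agree on.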
- What platforms and languages are used to run Newsblaster?
Newsblaster currently exists as a collection of programs, scripts and
tools which run on both the SUN/Solaris and Linux operating systems.
We have written components in Java, Perl, C, and shell scripting
languages. A typical run, including crawling the Web, downloading
documents, clustering, categorizing, and summarizing, currently takes
4-12 hours, depending on which summarizers are used.
- Can Newsblaster be updated more often than once a day?
Yes, sometimes. When the overnight run finishes early enough, Newsblaster performs an additional "incremental" run in the afternoon. That's when you will see stories marked as "NEW."
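One simple way such marking could work is to compare the stories from the afternoon run against those from the overnight run. The sketch below is purely hypothetical: the story identifiers, the tuple layout, and the function name mark_new are assumptions, and the real system may decide what counts as "NEW" quite differently.

```python
def mark_new(previous_ids, current_stories):
    """Label stories that were absent from the previous run.

    previous_ids: iterable of story identifiers from the overnight run.
    current_stories: list of (story_id, title) pairs from the incremental run.
    Returns a list of (label, title) pairs, where label is "NEW" or "".
    """
    seen = set(previous_ids)
    return [("NEW" if story_id not in seen else "", title)
            for story_id, title in current_stories]
```

For example, calling mark_new(["a1"], [("a1", "Budget vote"), ("b7", "Hurricane")]) labels only the second story as "NEW".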
- Who's writing and maintaining Newsblaster? How long did
development of Newsblaster take?
Work on Newsblaster started in the Fall of 2001, and is still ongoing.
Many of the components that are used within Newsblaster had been
developed under previous projects dating back to 1996. Development of
Newsblaster is more active than ever; the current members are listed
under "The Newsblaster Team" below.
- Why are some news sites used and not others? Can you add my site?
We tried to choose common and popular news sites from the Web, and we have
occasionally added new sites as requested by users. If you have a site
you would like to recommend to us, please contact
blaster@cs.columbia.edu.
- Can I talk to anyone involved with Newsblaster directly?
Feel free to contact blaster@cs.columbia.edu, which is monitored by team members, with any suggestions or comments.
- What other similar projects exist?
Aside from Google News, the only similar project we are aware of is
NewsInEssence, developed at the University of Michigan. It's available at
www.newsinessence.com.
- I noticed errors that Newsblaster made!
It's very difficult to write software that accurately deals with
natural language. Summaries will sometimes contain bad English and
other mistakes. Clusters occasionally end up with unrelated articles.
The Newsblaster Team
As of May 2003, the Newsblaster team consists of:
Principal Investigators
Kathleen McKeown
Judith Klavans
Vasileios Hatzivassiloglou
Professors
Luis Gravano
Postdocs
John Chen
Students
Regina Barzilay (graduated)
Wisam Dakka
David Evans
Ani Nenkova
Carl Sable (graduated)
Barry Schiffman
Programmers
David Elson
Sergey Sigelman
Michael Tanenblatt