Text Summarizer
I've been working for some time on an automatic text-summarization tool. It uses Stanford's Named Entity Recognizer for entity extraction, and I'm developing my own methods to improve on its results. The summarizer makes multiple passes over the data: combining words into contractions, resolving pronouns to the entities they refer to ("she" refers to [some name], etc.), attributing quotes, identifying possession relationships, and so on.
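To give a feel for what one of these passes might look like, here is a minimal sketch of a contraction-combining pass. It is not the tool's actual code: it assumes the input is a list of tokens in which contractions were split Penn Treebank-style ("did" + "n't"), and the function name and suffix list are made up for illustration.

```python
# Hypothetical suffix list (an assumption, not taken from the project).
# Penn Treebank-style tokenizers split "didn't" into "did" + "n't".
CONTRACTION_SUFFIXES = {"n't", "'s", "'re", "'ve", "'ll", "'d", "'m"}

def merge_contractions(tokens):
    """Glue contraction suffixes back onto the token before them."""
    merged = []
    for token in tokens:
        if merged and token.lower() in CONTRACTION_SUFFIXES:
            # Attach the suffix to the previous token instead of keeping it separate.
            merged[-1] += token
        else:
            merged.append(token)
    return merged

tokens = ["She", "did", "n't", "say", "it", "'s", "done"]
print(merge_contractions(tokens))
```

Note that this sketch treats every "'s" the same way, so it does not distinguish a contraction ("it's") from a possessive ("Anna's"); telling those apart would be the job of a later pass such as the possession one.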
The goal is to create an abstraction-based summarizer rather than an extractive one, but I built a quick extractive version for demonstration purposes using what I have so far. It identifies the most important entity or entities by the number of references to them in the text, and removes every sentence that doesn't mention them. An example is included below, in which the same news article is summarized using the top one, two, and three entities. You can see that a single-entity summary is usually enough, because each additional entity adds only a few more sentences. A link to the repository is included below.
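The extractive idea can be sketched in a few lines of Python. This is a simplified illustration, not the code in the repository: it assumes the entity names are already known (standing in for the NER output), counts only literal name matches rather than resolved pronouns, and uses a naive regex sentence splitter. All function names here are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def count_mentions(sentence, entity):
    # Count whole-word, literal occurrences of the entity name.
    return len(re.findall(r"\b" + re.escape(entity) + r"\b", sentence))

def summarize(text, entities, top_n=1):
    """Keep only the sentences that mention one of the top_n most-referenced entities."""
    sentences = split_sentences(text)
    counts = Counter()
    for entity in entities:
        counts[entity] = sum(count_mentions(s, entity) for s in sentences)
    # Rank entities by reference count and ignore any that never appear.
    top = [e for e, c in counts.most_common(top_n) if c > 0]
    kept = [s for s in sentences if any(count_mentions(s, e) for e in top)]
    return top, kept

text = ("Alice met Bob on Monday. Alice gave a speech. "
        "The weather was mild. Bob left early. Alice stayed.")
for n in (1, 2):
    top, kept = summarize(text, ["Alice", "Bob"], top_n=n)
    print(n, top, len(kept))
```

In this toy text, moving from one entity to two adds a single sentence, which is the same pattern the real example shows on a full article.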