Philip and I had an idea during HackPrinceton: increase the context available when reading news articles. This post explains our approach. We won the HackPrinceton Bloomberg Prize, and we've open-sourced the whole code base!
We set out to build a reading experience that incorporates the context around a news article. Given an article from a popular newspaper, what stories and pictures complement its content? We suggest a feed of related content, including sentiment data for the locations of events the article covers and tweets from previous readers, to provide a more informed and efficient reading experience.
We used online k-means to cluster articles correlated by their entities, or keywords. Reconciling Twitter's and Bloomberg's sentiment schemas was tough, but doing so gave us a powerful way of expressing social and geographical sentiment about an article's content. By ranking images from Imgur and Reddit by upvotes and by the entities (names, organizations, etc.) provided by the New York Times API, we crowd-source news photography (this works best for world news).
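To make the clustering step concrete, here's a minimal sketch of how online k-means over entity vectors can work, assuming each article has already been reduced to a list of entity strings. The feature sizes, cluster count, and helper names are illustrative, not our exact pipeline:

```python
# Sketch: cluster incoming articles online by their entity vectors.
# Assumes each article is a list of entity strings, e.g. ["Obama", "Senate"].
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer

vectorizer = HashingVectorizer(n_features=2**12, norm="l2")
clusterer = MiniBatchKMeans(n_clusters=20, random_state=0)

def cluster_batch(articles):
    """Fold a new batch of articles into the clustering; returns labels.

    Note: the first batch should contain at least n_clusters articles so
    the centroids can be initialized.
    """
    docs = [" ".join(entities) for entities in articles]
    X = vectorizer.transform(docs)
    clusterer.partial_fit(X)      # online update of the centroids
    return clusterer.predict(X)   # cluster label per article
```

Because `partial_fit` updates centroids incrementally, new articles can be clustered as they arrive without refitting over the whole corpus.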
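The image ranking can be sketched the same way: score each Imgur or Reddit post by its upvotes, weighted by how many of the article's NYT entities appear in the post title. The field names (`title`, `ups`) and the log-damping are assumptions for illustration, not the exact scoring we shipped:

```python
# Sketch: rank crowd-sourced images by entity relevance and upvotes.
import math

def rank_images(posts, entities):
    """posts: dicts with 'title' and 'ups'; entities: NYT entity strings."""
    def score(post):
        title = post["title"].lower()
        matches = sum(1 for e in entities if e.lower() in title)
        # Log-damp upvotes so a viral but off-topic image can't dominate.
        return matches * math.log1p(post["ups"])
    return sorted(posts, key=score, reverse=True)
```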
Pulling all these cogs together to build our first complete hackathon project was a great ride. We ended up with a proof of concept and an API for abstracting relationships between articles, which we'll leave open for anyone to use, and we're looking forward to showing it off.