The Distant Reader

The Distant Reader is a tool enabling the scientist to read a large volume of text -- to use & understand a corpus.

Given a variety of inputs, The Distant Reader creates a corpus of materials, applies various natural language processing & text mining techniques to the result, and presents the scientist with both the raw results and interfaces for interacting with them. Through this process the reader ought to be able to consume the given literature quickly & easily.

For example, a scientist could submit a list of URLs pointing to scholarly journal articles. The Distant Reader would then harvest the articles, convert them into plain text, count & tabulate the words, extract named entities, and provide tools for searching the corpus in any number of ways (semantically, grammatically, or through relevance ranking). The scientist could then download the analysis or use a number of Web-based tools to interpret the results. These interfaces would track trends over space as well as time, and they would let the scientist ask questions of the text such as: what is being discussed, what happens in the text, how is it described, who is mentioned, what places are mentioned, etc.
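To make "count & tabulate the words" and "relevance ranking" a bit more concrete, here is a minimal Python sketch. It assumes the articles have already been harvested and converted to plain text, and that the corpus is held in memory as a dictionary mapping document names to strings. The function names `tokenize`, `tabulate`, and `rank` are hypothetical, and the scoring is a plain TF-IDF variant chosen only for illustration; it is not necessarily what The Distant Reader itself does.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tabulate(corpus: dict[str, str]) -> Counter:
    """Count every word across all documents in the corpus."""
    totals = Counter()
    for text in corpus.values():
        totals.update(tokenize(text))
    return totals


def rank(corpus: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Rank documents against a query using a simple TF-IDF score (illustrative only)."""
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n = len(docs)
    query_terms = tokenize(query)

    # Inverse document frequency for each query term that occurs somewhere;
    # the +1.0 keeps terms found in every document from scoring zero.
    idf = {}
    for term in set(query_terms):
        df = sum(1 for counts in docs.values() if term in counts)
        if df:
            idf[term] = math.log(n / df) + 1.0

    # Score = sum over query terms of (term frequency / document length) * idf.
    scores = []
    for name, counts in docs.items():
        length = sum(counts.values()) or 1
        score = sum(counts[t] / length * idf[t] for t in query_terms if t in idf)
        scores.append((name, score))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


if __name__ == "__main__":
    corpus = {
        "article-1": "Whales migrate across the ocean. Whales sing.",
        "article-2": "The ocean covers most of the planet.",
        "article-3": "Birds migrate in the spring.",
    }
    print(tabulate(corpus).most_common(3))
    print(rank(corpus, "whales ocean"))
```

A real system would at least remove stop words such as "the" before tabulating, and would precompute document frequencies in an index rather than rescanning the corpus for every query.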

The Distant Reader supports any scientific discipline, as long as the input is textual, though I suppose the whole thing is, at heart, computational linguistics.

