Project description:
a Python generator that produces speculative indexes to non-existent texts through the spatial representation of textual data with NLP
Combining text processing, generation, and NLP techniques in Python to uncover structural and semantic features of a text, the project explores the genre of the index as an infrastructure for containing information and as a mechanism that maps cross-references (much like a hyperlink), creating associative relations between things. Rather than aiming for correct or logical groupings of items in the generated output (which it rarely produced), I was more interested in having the spatial layout of the text on the page determined by another layer of meaning-making: here, a nearest-neighbor algorithm that finds, for each item (subentry), the items closest (most similar) to it in the data, so that proximity on the page encodes a relatedness that is often nonsensical.
My corpus comprised the indexes of three books (Michel Foucault, The History of Sexuality, Volume 1; José Muñoz, Cruising Utopia; and Sara Ahmed, Queer Phenomenology), as well as some phrases from my notebook.
Method:
1. pre-processing the corpus. format the .txt files of the index corpus to make their structures uniform, then reassemble them
parse the text into subentries (which can be viewed as textual nodes joined by referential links)
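The parsing step above can be sketched roughly as follows. The sample lines and the "heading, subentry, subentry" layout are assumptions for illustration, not the corpus's actual format:

```python
# Hypothetical raw index lines after making the .txt files uniform:
# each line is "heading, sub one, sub two, ..."
raw = """power, deployment of, mechanisms of
bodies, orientation of, proximity of
utopia, concrete, queer"""

# Split each line into a heading and its subentries (the textual nodes),
# and keep each subentry paired with its heading
subentries = []
for line in raw.splitlines():
    heading, *subs = [part.strip() for part in line.split(",")]
    for sub in subs:
        subentries.append(f"{heading}, {sub}")

print(subentries)
# → ['power, deployment of', 'power, mechanisms of', 'bodies, orientation of',
#    'bodies, proximity of', 'utopia, concrete', 'utopia, queer']
```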
2. processing the corpus. create a vector for each subentry in the corpus
use t-SNE (t-distributed stochastic neighbor embedding) to reduce the dimensionality of each vector
save the x-value of each vector in the t-SNE visualization as an array
remap the x-values to a more workable range and append each remapped value to its corresponding subentry in a vector dictionary (associative array)
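Step 2 might look like the sketch below. The source doesn't say how the subentry vectors were created, so TF-IDF stands in as one plausible choice; the subentries and the 0-80 column range are also illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

subentries = ["desire and power", "utopian longing",
              "orientation of bodies", "the repressive hypothesis"]

# Create a vector for each subentry (TF-IDF is an assumed embedding,
# not necessarily the project's actual one)
vectors = TfidfVectorizer().fit_transform(subentries).toarray()

# Reduce each vector to 2 dimensions with t-SNE;
# perplexity must be smaller than the number of samples
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)

# Keep only the x-values and remap them to a workable range,
# e.g. 0-80 text columns on the page
xs = coords[:, 0]
lo, hi = xs.min(), xs.max()
span = (hi - lo) or 1.0  # guard against all points collapsing to one x
positions = {s: round((x - lo) / span * 80) for s, x in zip(subentries, xs)}
```

`positions` is the vector dictionary the method describes: each subentry keyed to its remapped x-value.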
3. generating. a tracery grammar generates indexes by combining subentries under a heading
the position of each subentry on the page is determined by the x-position of its vector in 2-D space, as determined by t-SNE
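The generation step can be sketched as below. A tiny hand-rolled expansion stands in for the tracery library so the example is self-contained, and the headings, subentries, and x-positions are invented for illustration; indentation encodes each subentry's t-SNE-derived column:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical subentries with their remapped x-positions (0-40 columns)
positions = {"desire, 12": 5, "utopian longing, 33": 28, "orientation, 7": 14}

# A minimal tracery-style grammar: each symbol expands to one of its options
rules = {"heading": ["BODIES", "FUTURITY"], "subentry": list(positions)}

def flatten(symbol):
    """Expand a grammar symbol by picking one of its rules at random."""
    return random.choice(rules[symbol])

# Generate one index entry: a heading followed by subentries,
# each indented to the column given by its x-position
lines = [flatten("heading")]
for _ in range(2):
    s = flatten("subentry")
    lines.append(" " * positions[s] + s)
print("\n".join(lines))
```

In the actual project a tracery grammar plays the role of `rules`/`flatten` here; the layout idea is the same: the grammar decides *what* appears under a heading, while the t-SNE x-values decide *where* on the page it sits.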