Module II

Text as Data

What happens when text becomes data? This module explores computational text analysis and distant reading — and the particular challenges of working with Arabic and other regional languages in digital environments. What is gained and what is lost when we treat language as a dataset?

2 class sessions

Key Questions

  • What happens when text becomes data?
  • What does 'distant reading' reveal that close reading cannot?
  • What are the politics of digitizing Arabic and other regional languages?

Class 3 — Feb 11

What's text got to do with it?: Text as Data I

ATTEND NYCDH week event (Feb 8-11). This session explores how text becomes data — from distant reading and topic modeling to sentiment analysis. What new questions become possible when you can read thousands of texts at once? What gets lost?
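
As a rough illustration of how text becomes data, the sketch below turns a string into word counts, the kind of frequency table that distant-reading tools build on. It is not part of the session materials. The function names `strip_marks` and `word_counts` are hypothetical, and removing combining marks is only one assumed normalization choice. For Arabic it merges vocalized and unvocalized spellings of a word, which makes counting easier but throws away the vowel marks.

```python
import re
import unicodedata
from collections import Counter


def strip_marks(text: str) -> str:
    """Remove combining marks (Unicode category Mn), e.g. Arabic short-vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def word_counts(text: str) -> Counter:
    """Lowercase, strip marks, split into word tokens, and count them."""
    normalized = strip_marks(text.lower())
    tokens = re.findall(r"\w+", normalized)  # \w matches letters in any script
    return Counter(tokens)


if __name__ == "__main__":
    # Assumed sample: a vocalized and an unvocalized spelling of the same word,
    # followed by a few English words.
    sample = "كِتَاب كتاب the book and the reader"
    for word, count in word_counts(sample).most_common(3):
        print(word, count)
```

In Python, `\w` does not match combining marks, so without the stripping step a vocalized word would be split into fragments. That small decision is an instance of the module's question about what is gained and what is lost when language is treated as a dataset.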

Breakdown in Class

KITAB — Knowledge, Information Technology, and the Arabic Book

In-Class Tools

  • Voyant
  • Sentiment Viz (Twitter)
  • Tapor
  • Google Books N-Gram
  • Fairouz & Um Kulthoum lyrics analysis

From the Session

  • Voyant Guide
  • NgramReader+ Lite for Classical Arabic Corpus
  • jsLDA topic modeling (David Mimno)
  • Sentiment Analysis in Arabic
  • The Open Islamicate Texts Initiative
  • LINKED JAZZ
  • AtlasTI
  • TAPOR
  • Digital Ottoman Studies

Class 4 — Feb 18

No Class — Monday Schedule

No class — NYU follows the Monday schedule on this date.

Class 5 — Feb 25

What's text got to do with it?: Text as Data II

Report back from the NYCDH Week event or workshop (write-up due April 1). Continuing the exploration of text as data, this session digs into datasets, OCR, and the politics of mass digitization. What does it mean when Google digitizes millions of books? Who decides what gets scanned? We also discuss "dataset" as a concept.

Breakdown in Class

Afghanistan Digital Library

In-Class Tools

  • Voyant with Said's Orientalism
  • Netlytic analysis of SNL YouTube clip

From the Session

  • Omar: DroneWars & GPT-3
  • Computer Vision Explorer
  • Examples of Textual Analysis projects