Conversation

Jupyter notebook source on GitHub: conversation.ipynb

Get the conversation tree from a question

Each #d status with a reply_count > 0 is the start of a diagnostic conversation tree.

reply_count

  • Not available in the standard API
  • Available from TweetScraper
  • dataset/replycount.py

The standard (free of charge) Twitter API does not provide a way to retrieve all replies to a specific status. The following method works around this limitation:

  1. Use TweetScraper.
  2. Search for all replies to the user who posted the question status, after a given date and time.
  3. These replies need to be filtered on "in_reply_to_status_id", but this field is not present in the JSON object obtained with TweetScraper.
  4. Get the full Twitter object for each reply with the standard API.
  5. Store those objects in the database to limit API throttling and speed up later lookups.
  6. Filter all collected replies with status["in_reply_to_status_id"] == status_id.
  7. If true, add the reply to the corpus database.
  8. Repeat the process recursively for each reply with a non-zero reply_count.
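
Steps 6 to 8 can be sketched in Python as follows. This is only an illustration under assumptions, not the notebook's actual code: `collect_tree`, `pool` and `reply_counts` are hypothetical names, the statuses are plain dicts carrying the "id" and "in_reply_to_status_id" fields, and the reply counts are assumed to come from TweetScraper as a separate mapping. For simplicity it searches a single in-memory pool of already fetched statuses, whereas the real process would run a new search for each reply's author.

```python
from typing import Dict, List

Status = Dict[str, object]


def collect_tree(status_id: int, pool: List[Status],
                 reply_counts: Dict[int, int]) -> List[Status]:
    """Return every status in `pool` that belongs to the tree rooted at `status_id`.

    `pool` holds full status objects (hypothetical: already fetched and cached).
    `reply_counts` maps a status id to its reply_count (hypothetical: from the scraper).
    """
    corpus: List[Status] = []
    for status in pool:
        # Step 6: keep only direct replies to the current status.
        if status.get("in_reply_to_status_id") != status_id:
            continue
        # Step 7: the reply belongs to the corpus.
        corpus.append(status)
        # Step 8: recurse only when the reply itself has replies.
        if reply_counts.get(status["id"], 0) > 0:
            corpus.extend(collect_tree(status["id"], pool, reply_counts))
    return corpus


# Made-up example data: status 1 is the question.
pool = [
    {"id": 2, "in_reply_to_status_id": 1},
    {"id": 3, "in_reply_to_status_id": 2},
    {"id": 4, "in_reply_to_status_id": None},  # unrelated status
    {"id": 5, "in_reply_to_status_id": 1},
]
reply_counts = {1: 2, 2: 1, 3: 0, 5: 0}

tree = collect_tree(1, pool, reply_counts)
print([s["id"] for s in tree])
```

The result is ordered depth-first: each reply is immediately followed by its own sub-conversation.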

The original tweet is the first doc(s)toctoc tweet, posted on 2012-06-06: https://twitter.com/DrKoibo/status/210290960695959553

The request is "to:DrKoibo since:2012-06-06".

# using pipenv
pipenv run scrapy crawl TweetScraper -a query="to:DrKoibo since:2012-06-06"

This returns 8111 statuses (as of 2018-03-29).
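
Because the same search is repeated for every reply that has replies of its own, the query string can be built from the author's screen name and the posting date. A minimal sketch, assuming a hypothetical helper named `build_query` that is not part of TweetScraper:

```python
from datetime import date


def build_query(screen_name: str, since: date) -> str:
    """Build a "to:USER since:YYYY-MM-DD" search query (hypothetical helper)."""
    return f"to:{screen_name} since:{since.isoformat()}"


# Reproduces the request used for the original tweet.
print(build_query("DrKoibo", date(2012, 6, 6)))
```

The resulting string is what is passed to the crawl command through the `query` argument.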

Database structure

  • PostgreSQL