Difference between revisions of "Twitter Analysis DB Details"

From OpenCircuits

Revision as of 14:25, 18 May 2020

The main page for the project is Twitter Analysis DB - OpenCircuits

General

Look at the GUI (Link TBD) and just try it out. This page is both operational (how to use the app) and theoretical (how it works), although the ultimate guide to the app is the code itself. Module, class, and method names are fairly stable but subject to change.

Performance

  • I currently run the DB on a RAM disk; in any case, put it on your fastest drive.
  • I have not tuned the DB for performance. Got suggestions (that are more than just guesses)? I suspect more indexing would help and will try it in time.
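
As a rough illustration of what "more indexing" could look like, here is a minimal sketch. It assumes a SQLite database, which this page does not confirm, and the table, column, and index names (`tweets`, `who`, `tweet_text`, `idx_tweets_who`) are made up for the example rather than taken from the app's schema. It asks the database for its query plan before and after adding an index, which is a way to check a suggestion rather than guess.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# An in-memory database stands in for the real one (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, who TEXT, tweet_text TEXT)"
)
conn.executemany(
    "INSERT INTO tweets (who, tweet_text) VALUES (?, ?)",
    [("djt", "example one"), ("other", "example two")],
)

sql = "SELECT tweet_text FROM tweets WHERE who = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, ("djt",))

# Add an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_tweets_who ON tweets (who)")
plan_after = query_plan(conn, sql, ("djt",))

print("before:", plan_before)
print("after: ", plan_after)
```

If the second plan names the index, the query is using it. Whether that makes a measurable difference on the real data would still need timing.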

Debugging

Building a Database

I am working on providing DB building facilities from the GUI. Because building is sensitive to the input sources, it only works with the type of input sources I have used. Not everything is in the GUI as of this writing, but this will probably change.

Parameters

First, to enable the GUI features, adjust the parameter file so that:

  • self.show_db_def = True

Then point to the input files (these are in the GitHub repo):

  • self.tweet_input_file_name = r"./input/all_tweets_may_16_for_2020.txt" # where tweets are
  • self.word_input_file_name = r"./input/english-word-frequency/unigram_freq.csv" # word frequency data from Kaggle

Then some processing options:

  • self.who_tweets = "djt" # id for who tweets, not much used yet
  • self.use_spacy = True # processing words to lemmas
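
Put together, the settings above might sit in the parameter file roughly as sketched here. The class name `Parameters` and the helper `missing_input_files` are hypothetical, introduced only for illustration; the real parameter module may be organized differently. The helper is a small sanity check you could run before trying to build the database.

```python
from pathlib import Path


class Parameters:
    """Hypothetical stand-in for the app's parameter file."""

    def __init__(self):
        # Enable the DB definition / building features in the GUI.
        self.show_db_def = True

        # Input files (paths as given above, relative to the repo root).
        self.tweet_input_file_name = r"./input/all_tweets_may_16_for_2020.txt"
        self.word_input_file_name = (
            r"./input/english-word-frequency/unigram_freq.csv"
        )

        # Processing options.
        self.who_tweets = "djt"   # id for who tweets, not much used yet
        self.use_spacy = True     # process words to lemmas


def missing_input_files(parameters):
    """Return the configured input file names that do not exist on disk."""
    names = [
        parameters.tweet_input_file_name,
        parameters.word_input_file_name,
    ]
    return [name for name in names if not Path(name).is_file()]


parameters = Parameters()
for name in missing_input_files(parameters):
    print("input file not found:", name)
```

Run from the repo root, this prints nothing when both input files are where the parameters say they are.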