Difference between revisions of "Twitter Analysis DB Details"
Russ hensel (talk | contribs) |
Revision as of 14:41, 18 May 2020
The main page for the project is Twitter Analysis DB - OpenCircuits
General
Look at the GUI ( link TBD ) and just try it out. This page of documentation is both operational, that is, about using the app, and theoretical, since the ultimate guide to the app is the code itself. Module, class, and method names are fairly stable, but subject to change.
Performance
- I currently run the db on a RAM disk; in any case, put it on your fastest drive.
- I have not tuned the db for performance. Got suggestions ( that are more than just guesses )? More indexing would probably help; I will try it in time.
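One way to check whether an index helps, rather than guessing, is to compare the query plan before and after adding it. The following is a minimal sketch, assuming the db is SQLite ( suggested by the .db file name, but not confirmed here ); the column names word and tweet_id, the index name, and the helper query_plan are made up for illustration and may not match the real concord table.

```python
import sqlite3


def query_plan(conn, sql, args=()):
    """Return the detail text of SQLite's plan for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# In-memory stand-in for the real db; columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concord (word TEXT, tweet_id TEXT)")
conn.executemany(
    "INSERT INTO concord VALUES (?, ?)",
    [("great", "1"), ("wall", "2"), ("great", "3")],
)

query = "SELECT tweet_id FROM concord WHERE word = ?"

plan_before = query_plan(conn, query, ("great",))
conn.execute("CREATE INDEX IF NOT EXISTS idx_concord_word ON concord (word)")
plan_after = query_plan(conn, query, ("great",))

print("before:", plan_before)  # expect a full table scan
print("after: ", plan_after)   # expect a search using the new index
```

If the plan still shows a scan after the index is added, the index is not being used for that query and is only costing load time.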
Debugging
Building a Database
I am working on providing DB building facilities from the GUI. Since this is sensitive to the input sources, it only works with the type of input sources I have used. Not everything is in the GUI as of this writing; this will probably change.
Parameters
First, to enable the GUI features, you need to adjust the parameter file so that:
- self.show_db_def = True
Then you also need to point to the input files ( these are in the GitHub repo ):
- self.tweet_input_file_name = r"./input/all_tweets_may_16_for_2020.txt" # where tweets are
- self.word_input_file_name = r"./input/english-word-frequency/unigram_freq.csv" # word frequency data from Kaggle
Then some processing options:
- self.who_tweets = "djt" # id for who tweets, not much used yet
- self.use_spacy = True # processing words to lemmas
Name a database. I would start with a file name that does not exist, as defining and loading the db will erase any current content. If you are only defining some of the tables, or adding to a table, start with an existing db instead.
- self.database_name = "tweet_big_words.db" # database with lots of words loaded.
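Put together, the settings above might look like the sketch below. The attribute names and values come from this page ( taking the database attribute to be database_name ); the class name LoadParameters and the helper db_already_exists are hypothetical and are not the app's actual parameter module. The helper is just a quick way to see whether the named db file is already there before you define and load into it.

```python
from pathlib import Path


class LoadParameters:
    """Hypothetical stand-in for the app's parameter file."""

    def __init__(self):
        # enable the db definition / load buttons in the GUI
        self.show_db_def = True

        # input files ( in the repo )
        self.tweet_input_file_name = r"./input/all_tweets_may_16_for_2020.txt"
        self.word_input_file_name = (
            r"./input/english-word-frequency/unigram_freq.csv"
        )

        # processing options
        self.who_tweets = "djt"   # id for who tweets, not much used yet
        self.use_spacy = True     # processing words to lemmas

        # database to define and load
        self.database_name = "tweet_big_words.db"


def db_already_exists(parameters):
    """True if the named db file exists, so a define would erase content."""
    return Path(parameters.database_name).exists()


parameters = LoadParameters()
if db_already_exists(parameters):
    print(parameters.database_name, "exists: defining tables will erase content")
else:
    print(parameters.database_name, "does not exist: safe to define and load")
```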
Run the GUI
- <Show Load Parameters> will show you the values of some of the parameters used to load the db. If you do not like what you get, edit the parameter file ( there is a button for that too ).
- <Define Tweets Concord> will clear the tweets and concord tables, and initialize the columns in them.
- <Load Tweets File> will load the tweet input file and populate both the tweets and concord tables.
- <Define Words> will clear the words table, and initialize the columns in it.
- <Load Words> will load the words input file and populate the words table.
- Loading the words table does not populate the words.words_rank column; there is another utility to do that, which still needs to be added to the GUI.
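The define, load, and rank steps for the words table can be sketched roughly as follows. This is not the app's code: it assumes a SQLite db and a word frequency file with a word,count header row, and the function names and the word and word_count columns are invented for illustration. Only the table name words and the column words_rank come from this page.

```python
import csv
import io
import sqlite3


def define_words(conn):
    """Clear the words table and set up its columns ( hypothetical schema )."""
    conn.execute("DROP TABLE IF EXISTS words")
    conn.execute(
        "CREATE TABLE words ("
        "word TEXT PRIMARY KEY, word_count INTEGER, words_rank INTEGER)"
    )


def load_words(conn, csv_file):
    """Load word,count rows; words_rank is left empty at this stage."""
    reader = csv.DictReader(csv_file)
    rows = [(row["word"], int(row["count"])) for row in reader]
    conn.executemany(
        "INSERT INTO words (word, word_count) VALUES (?, ?)", rows
    )
    return len(rows)


def rank_words(conn):
    """Fill words_rank: 1 for the most frequent word, 2 for the next, ..."""
    ordered = conn.execute(
        "SELECT word FROM words ORDER BY word_count DESC, word"
    ).fetchall()
    conn.executemany(
        "UPDATE words SET words_rank = ? WHERE word = ?",
        [(rank, word) for rank, (word,) in enumerate(ordered, start=1)],
    )


# Tiny in-memory stand-in for the real input file and db.
sample = io.StringIO("word,count\nthe,300\nof,200\nand,100\n")
conn = sqlite3.connect(":memory:")
define_words(conn)
loaded = load_words(conn, sample)
rank_words(conn)
ranks = dict(conn.execute("SELECT word, words_rank FROM words"))
print(loaded, ranks)
```

Keeping the ranking as a separate pass mirrors the description above, where the load leaves words_rank empty and a later utility fills it in.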