Add visualization support

Jacob Manning 2018-07-01 12:04:10 -04:00
parent 74ed3146a6
commit a3ac722159
2 changed files with 67 additions and 7 deletions

@ -1,10 +1,11 @@
# hangouts-parser
This repository parses conversation data from Google Hangouts and gives
diagnostics on the number of messages in conversations. I'm working on adding
support for visualizations based on the parsed data for more interesting
graphical views of the data.
diagnostics on the number of messages in conversations. Two scripts are
currently supported: `parser.py` and `visualize.py`. The parsing script parses
raw JSON data from Google Takeout and creates pickled summary files for each
conversation. The visualization script creates a histogram of messages over
time using the pickled conversation summaries.
## Usage
1. Clone this repository
@ -14,13 +15,13 @@ graphical views of the data.
+ Download the data in zip format and move the `Hangouts.json` file into the `raw` folder in this repository
3. Install dependencies via `pip`
+ `pip install -r requirements.txt`
+ No dependencies are required for the `parser.py` script, but `visualize.py` will require the dependencies
+ No dependencies are required for the `parser.py` script, but `visualize.py` requires the dependencies
4. Run the parser
+ **Note:** if you did not place your hangouts data as `raw/Hangouts.json` you can specify the path to the `.json` file as an argument to the `parser.py` script via the `-f` flag
```bash
python parser.py
```
5. **Coming soon** Run the visualization
5. Run the visualization
+ The `<conversation_id>` can be found in the output of the `parser.py`
script
```bash
@ -34,4 +35,4 @@ This code is freely available under the GNU Public License (GPL).
> All of the data processing in these scripts happens locally on your computer. The data you provide to the script is **NOT** uploaded to an external server. Feel free to examine the code if you are concerned.
### Acknowledgements
> This repository was inspired by [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics). Chatistics can parse Facebook Messenger and Telegram data, but not Hangouts group messages. I originally intended to contribute to that repository and hadd Hangouts group message support, but my design drifted far from the existing design in that repository so I created a new project. Shoutout to MasterScrat for great work and thanks for the inspiration!
> This repository was inspired by [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics). Chatistics can parse Facebook Messenger and Telegram data, but not Hangouts group messages. I originally intended to contribute to that repository and add Hangouts group message support, but my design drifted far from the existing design in that repository so I created a new project. Shoutout to MasterScrat for great work and thanks for the inspiration!
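
The README change above says that `parser.py` writes a pickled summary file per conversation and that `visualize.py` reads it back. The summary layout is not shown in this commit, so the following is only a minimal sketch, assuming a dict with the two keys that `visualize.py` accesses (`conversation_name` and `messages`) and assuming each message is a `(timestamp, type, participant)` triple. The helper names `save_summary` and `load_summary` and the sample values are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical summary layout, inferred from the keys visualize.py reads.
# Each message is assumed to be a (timestamp, type, participant) triple.
summary = {
    'conversation_name': 'Example chat',
    'messages': [
        ('1530460000.0', 'text', 'Alice'),  # placeholder values
        ('1530463600.0', 'text', 'Bob'),
    ],
}


def save_summary(data, path):
    """Write a conversation summary to a pickle file (hypothetical helper)."""
    with open(path, 'wb') as f:
        pickle.dump(data, f)


def load_summary(path):
    """Read a summary back, rejecting paths that do not end in .pkl."""
    if not path.endswith('.pkl'):
        raise ValueError('File path must be a pickle file')
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round trip through a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')
save_summary(summary, path)
loaded = load_summary(path)
print(loaded['conversation_name'], len(loaded['messages']))
```

Because `pickle.load` can execute arbitrary code while unpickling, a loader like this should only be pointed at summary files generated locally.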

visualize.py Normal file

@ -0,0 +1,59 @@
#!/usr/bin/env python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import ggplot
import pickle
import argparse
import datetime
import pandas as pd
import utils
utils.set_log_level(1)
from utils import LOG_ERROR, LOG_DEBUG, LOG_INFO, LOG_WARN
def main(file_path):
    # Validate raw data path
    if not os.path.exists(file_path):
        LOG_ERROR('Could not find file: {}'.format(file_path))
        return
    # Validate raw data file type
    if not file_path.endswith('.pkl'):
        LOG_ERROR('File path must be a pickle file')
        return
    with open(file_path, 'rb') as f:
        LOG_INFO('Parsing pickle file: {}'.format(file_path))
        conversation = pickle.load(f)
    LOG_INFO('Found conversation: {}'.format(conversation['conversation_name']))
    df = pd.DataFrame(conversation['messages'])
    df.columns = ['Timestamp', 'Type', 'Participant']
    # df['Datetime'] = pd.to_datetime(df['Timestamp'])
    df['Datetime'] = df['Timestamp'].apply(lambda x:
        datetime.datetime.fromtimestamp(float(x)).toordinal())
    histogram = ggplot.ggplot(df, ggplot.aes(x='Datetime', fill='Participant')) \
        + ggplot.geom_histogram(alpha=0.6, binwidth=2) \
        + ggplot.scale_x_date(labels='%b %Y') \
        + ggplot.ggtitle(conversation['conversation_name']) \
        + ggplot.ylab('Number of messages') \
        + ggplot.xlab('Date')
    print(histogram)
if __name__ == "__main__":
    LOG_INFO('Started script')
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file_path', required=True,
                        type=str, help='Path to parsed data file')
    args = parser.parse_args()
    main(args.file_path)
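
To make the histogram step easier to reason about without `ggplot` or `pandas` installed, the counting it performs can be approximated with the standard library. This is a rough sketch, not the script's actual behaviour: it assumes timestamps are epoch seconds stored as strings, uses UTC instead of the local time zone that `datetime.datetime.fromtimestamp` defaults to, and aligns bins to multiples of `binwidth`, which may differ from where `ggplot` places its bin edges. The function name `bin_messages` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def bin_messages(messages, binwidth=2):
    """Count messages per (bin start ordinal, participant).

    Each message is assumed to be a (timestamp, type, participant) triple.
    """
    counts = Counter()
    for timestamp, _type, participant in messages:
        # Same conversion as visualize.py, but pinned to UTC for repeatability.
        day = datetime.fromtimestamp(float(timestamp), tz=timezone.utc).toordinal()
        # Assumption: bins start at ordinals divisible by binwidth.
        bin_start = day - (day % binwidth)
        counts[(bin_start, participant)] += 1
    return counts


# Placeholder messages spread over the first days of 1970 (UTC).
messages = [
    ('0', 'text', 'Alice'),
    ('3600', 'text', 'Bob'),
    ('86400', 'text', 'Alice'),
    ('172800', 'text', 'Alice'),
]

counts = bin_messages(messages)
for (bin_start, participant), n in sorted(counts.items()):
    print(datetime.fromordinal(bin_start).date(), participant, n)
```

Grouping by participant inside each bin mirrors the `fill='Participant'` aesthetic, where every bar is split by who sent the messages.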