mirror of
https://github.com/jacobmanning/hangouts-parser.git
synced 2025-01-22 09:22:04 -05:00
Add visualization support
parent 74ed3146a6
commit a3ac722159
2 changed files with 67 additions and 7 deletions
15
README.md
@@ -1,10 +1,11 @@
 # hangouts-parser
 
 This repository parses conversation data from Google Hangouts and gives
-diagnostics on the number of messages in conversations. I'm working on adding
-support for visualizations based on the parsed data for more interesting
-graphical views of the data.
-
+diagnostics on the number of messages in conversations. Two scripts are
+currently supported: `parser.py` and `visualize.py`. The parsing script parses
+raw JSON data from Google Takeout and creates pickled summary files for each
+conversation. The visualization script creates a histogram of messages over
+time using the pickled conversation summaries.
 
 ## Usage
 1. Clone this repository
@@ -14,13 +15,13 @@ graphical views of the data.
     + Download the data in zip format and move the `Hangouts.json` file into the `raw` folder in this repository
 3. Install dependencies via `pip`
     + `pip install -r requirements.txt`
-    + No dependencies are required for the `parser.py` script, but `visualize.py` will require the dependencies
+    + No dependencies are required for the `parser.py` script, but `visualize.py` requires the dependencies
 4. Run the parser
     + **Note:** if you did not place your hangouts data as `raw/Hangouts.json` you can specify the path to the `.json` file as an argument to the `parser.py` script via the `-f` flag
 ```bash
 python parser.py
 ```
-5. **Coming soon** Run the visualization
+5. Run the visualization
     + The `<conversation_id>` can be found in the output of the `parser.py`
 script
 ```bash
@@ -34,4 +35,4 @@ This code is freely available under the GNU Public License (GPL).
 > All of the data processing in these scripts happens locally on your computer. The data you provide to the script is **NOT** uploaded to an external server. Feel free to examine the code if you are concerned.
 
 ### Acknowledgements
-> This repository was inspired by [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics). Chatistics can parse Facebook Messenger and Telegram data, but not Hangouts group messages. I originally intended to contribute to that repository and hadd Hangouts group message support, but my design drifted far from the existing design in that repository so I created a new project. Shoutout to MasterScrat for great work and thanks for the inspiration!
+> This repository was inspired by [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics). Chatistics can parse Facebook Messenger and Telegram data, but not Hangouts group messages. I originally intended to contribute to that repository and add Hangouts group message support, but my design drifted far from the existing design in that repository so I created a new project. Shoutout to MasterScrat for great work and thanks for the inspiration!
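For readers following the README change: `visualize.py` reads a pickled dictionary with a `conversation_name` key and a `messages` list whose entries have three fields (timestamp, type, participant). The following is a minimal sketch of writing and reading such a summary, not the actual `parser.py` implementation. The `conversation_id` key, the file naming scheme, the message type value, and the helper names `save_summary` and `load_summary` are assumptions made for illustration.

```python
import os
import pickle


def save_summary(summary, directory):
    """Pickle one conversation summary and return the path that was written."""
    # Hypothetical naming scheme: one .pkl file per conversation ID.
    file_name = '{}.pkl'.format(summary['conversation_id'])
    path = os.path.join(directory, file_name)
    with open(path, 'wb') as f:
        pickle.dump(summary, f)
    return path


def load_summary(path):
    """Load a summary back. Only unpickle files you created yourself."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Assumed layout, inferred from the keys and columns used in visualize.py.
example_summary = {
    'conversation_id': 'abc123',          # hypothetical key and value
    'conversation_name': 'Example chat',
    'messages': [
        # (timestamp in epoch seconds, message type, participant)
        (1500000000.0, 'text', 'Alice'),
        (1500003600.0, 'text', 'Bob'),
    ],
}
```

Calling `save_summary(example_summary, some_directory)` and passing the returned path to `load_summary` should give back an equal dictionary, which is the round trip the two scripts rely on.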
59
visualize.py
Normal file
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import ggplot
+import pickle
+import argparse
+import datetime
+
+import pandas as pd
+
+import utils
+utils.set_log_level(1)
+
+from utils import LOG_ERROR, LOG_DEBUG, LOG_INFO, LOG_WARN
+
+def main(file_path):
+    # Validate raw data path
+    if not os.path.exists(file_path):
+        LOG_ERROR('Could not find file: {}'.format(file_path))
+        return
+
+    # Validate raw data file type
+    if not file_path.endswith('.pkl'):
+        LOG_ERROR('File path must be a pickle file')
+        return
+
+    with open(file_path, 'rb') as f:
+        LOG_INFO('Parsing pickle file: {}'.format(file_path))
+        conversation = pickle.load(f)
+
+    LOG_INFO('Found conversation: {}'.format(conversation['conversation_name']))
+
+    df = pd.DataFrame(conversation['messages'])
+    df.columns = ['Timestamp', 'Type', 'Participant']
+    # df['Datetime'] = pd.to_datetime(df['Timestamp'])
+    df['Datetime'] = df['Timestamp'].apply(lambda x:
+            datetime.datetime.fromtimestamp(float(x)).toordinal())
+
+    histogram = ggplot.ggplot(df, ggplot.aes(x='Datetime', fill='Participant')) \
+        + ggplot.geom_histogram(alpha=0.6, binwidth=2) \
+        + ggplot.scale_x_date(labels='%b %Y') \
+        + ggplot.ggtitle(conversation['conversation_name']) \
+        + ggplot.ylab('Number of messages') \
+        + ggplot.xlab('Date')
+
+    print(histogram)
+
+if __name__ == "__main__":
+    LOG_INFO('Started script')
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--file_path', required=True,
+                        type=str, help='Path to parsed data file')
+    args = parser.parse_args()
+
+    main(args.file_path)
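The histogram in `visualize.py` buckets messages by calendar day ordinal with a bin width of 2, split by participant. As a rough, dependency-free sketch of that counting step (not the ggplot code in this commit), the hypothetical helper `bin_messages` below tallies messages per participant per bin. It assumes timestamps are epoch seconds, uses UTC rather than local time so results are reproducible, and aligns bins to multiples of `binwidth`, which may differ from how ggplot places its bins.

```python
from collections import Counter
from datetime import datetime, timezone


def bin_messages(messages, binwidth=2):
    """Count messages per (bin start ordinal, participant).

    `messages` is an iterable of (timestamp, type, participant) entries,
    matching the three columns that visualize.py assigns to its DataFrame.
    """
    counts = Counter()
    for timestamp, _message_type, participant in messages:
        # Same conversion idea as visualize.py, but pinned to UTC (assumption).
        moment = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
        day = moment.toordinal()
        # Assumed alignment: bins start at multiples of binwidth.
        bin_start = day - (day % binwidth)
        counts[(bin_start, participant)] += 1
    return counts
```

A call such as `bin_messages(conversation['messages'])` returns a `Counter` keyed by `(bin_start, participant)`, which can be a quick way to sanity-check the plotted bar heights without rendering the plot.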