What data on myself I collect and why? | beepb00p
How I am using 50+ sources of my personal data
This is the list of personal data sources I use or am planning to use, with rough guides on how to get your hands on that data if you want it as well.
It's still incomplete and I'm going to update it regularly.
My goal is to collect almost all of my digital trace, automate data collection as much as possible and make it run in the background, so one can set up the pipelines once and hopefully never think about them again.
This is kind of a follow-up to my previous post on the sad state of personal data, and part of my own way of getting around this sad state.
If the long list terrifies you, you can jump straight to the "Data consumers" section to find out how I use the data.
In addition, check out my infrastructure map, which might explain it better!
Table of Contents
1. Why do you collect X? How do you use your data?
backup
lifelogging
quantified self
2. What do I collect/want to collect?
Amazon
Arbtt (desktop time tracker)
Bitbucket (repositories)
Bluemaestro (environment sensor)
Blood
Browser history (Firefox/Chrome)
Emfit QS (sleep tracker)
Endomondo
Facebook Messenger
Feedbin
Feedly
Fitbit
Foursquare/Swarm
Github (repositories)
Github (events)
Gmail
Goodreads
Google takeout
STRT Hackernews
HSBC bank
Hypothesis
Instapaper
Jawbone
Kindle
Kobo reader
Last.fm
Monzo bank
Nomie
Nutrition
Photos
PDF annotations
Pinboard
Plaintext notes
Remember the Milk
Rescuetime
Shell history
Sleep
Sms/calls
Spotify
Stackexchange
Taplog
Telegram
VK.com
Weight
TODO Whatsapp
23andme
3. Data consumers
Instant search
orger
promnesia
dashboard
timeline
HPI python package
¶1 Why do you collect X? How do you use your data?
All things considered, I think it's a fair question!
Why bother with all this infrastructure and hoard the data if you never use it?
In the next section, I will elaborate on each specific data source, but to start with I'll list the rationales that all of them share:
¶backup
It may feel unnecessary, but shit happens. What if your device dies, your account gets suspended for some reason, or the company goes bust?
¶lifelogging
Most digital data comes with timestamps, so without any manual effort it automatically becomes material for your timeline.
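Because every source carries its own timestamps, building a timeline is mostly a matter of merging already-sorted streams. A minimal Python sketch, assuming each source yields its events in chronological order; the `Event` type and the `timeline` and `utc` helpers are hypothetical, not part of any tool mentioned in this post:

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    dt: datetime   # when it happened (timezone-aware, so sources compare correctly)
    source: str    # which data source produced it, e.g. "browser" or "bank"
    summary: str


def timeline(*sources: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge per-source event streams into one chronological stream.

    Each source must already be sorted by time; heapq.merge then pulls one
    event at a time instead of loading and re-sorting everything.
    """
    return heapq.merge(*sources, key=lambda e: e.dt)


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Example usage with two tiny made-up sources.
browser = [Event(utc(2020, 1, 1, 9), "browser", "visited example.com"),
           Event(utc(2020, 1, 1, 18), "browser", "visited wikipedia.org")]
bank = [Event(utc(2020, 1, 1, 12), "bank", "coffee")]

for event in timeline(browser, bank):
    print(event.dt.isoformat(), event.source, event.summary)
```

Merging lazily matters once there are dozens of sources with years of history each: nothing has to be held in memory beyond one pending event per source.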
I want to remember more, be able to review my past and bring back and reflect on memories. Practicing lifelogging helps with that.
It feels very wrong that things can be forgotten and lost forever.
It's understandable from the neuroscience point of view, i.e. the brain has limited capacity and it would be too distracting to remember everything all the time.
That said, I want to have a choice whether to forget or remember events, and I'd like to be able to potentially access forgotten ones.
¶quantified self
Most collected digital data is somewhat quantitative and can be used to analyze your body or mind.
¶2 What do I collect/want to collect?
As I mentioned, most of the collected data serves as a means of backup/lifelogging/quantified self, so I won't repeat these reasons in the 'Why' sections.
All my data collection pipelines are automatic unless mentioned otherwise.
Some scripts are still private, so if you want to know more about one, let me know and I'll prioritize sharing it.
¶Amazon
How: jbms/finance-dl
Why:
was planning to correlate the orders with Monzo/HSBC transactions, but haven't got around to it yet
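One way such a correlation could work is to pair each order with a bank transaction of the same amount charged shortly after the order date. A hypothetical sketch: the `Order`/`Transaction` shapes, the `match_orders` helper and the 5-day window are assumptions for illustration, not the format finance-dl or the banks actually export:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Order:             # hypothetical shape of one exported Amazon order
    day: date
    total: Decimal


@dataclass(frozen=True)
class Transaction:       # hypothetical shape of one bank transaction
    day: date
    amount: Decimal      # amount charged, as a positive number
    description: str


def match_orders(orders: List[Order], transactions: List[Transaction],
                 max_lag_days: int = 5) -> List[Tuple[Order, Optional[Transaction]]]:
    """Pair each order with the earliest unused transaction of the same amount
    charged between the order date and max_lag_days later (None if there is none)."""
    unused = sorted(transactions, key=lambda t: t.day)
    pairs: List[Tuple[Order, Optional[Transaction]]] = []
    for order in sorted(orders, key=lambda o: o.day):
        latest = order.day + timedelta(days=max_lag_days)
        match = next((t for t in unused
                      if t.amount == order.total and order.day <= t.day <= latest), None)
        if match is not None:
            unused.remove(match)  # each transaction can explain at most one order
        pairs.append((order, match))
    return pairs


# Example with made-up data: the second charge is too late to match.
orders = [Order(date(2020, 3, 1), Decimal("12.99")),
          Order(date(2020, 3, 2), Decimal("5.00"))]
transactions = [Transaction(date(2020, 3, 3), Decimal("12.99"), "AMAZON MKTPLACE"),
                Transaction(date(2020, 3, 20), Decimal("5.00"), "AMAZON")]
for order, tx in match_orders(orders, transactions):
    print(order, "->", tx.description if tx else None)
```

Exact-amount matching is deliberately naive: Amazon may split one order into several charges (e.g. per shipment), and such orders would come back unmatched here.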
¶Arbtt (desktop time tracker)
How: arbtt-capture
Why:
haven't used it yet, but it could be a rich source of lifelogging context
¶Bitbucket (repositories)
How: samkuehn/bitbucket-backup
Why:
proved especially useful considering Atlassian is going to wipe Mercurial repositories from Bitbucket
I've got lots of private Mercurial repositories with university homework and other early projects, and it's sad to think of people who will lose theirs during this wipe.
¶Bluemaestro (environment sensor)
How: the sensor syncs with the phone app via Bluetooth; /data/data/com.bluemaestro.tempo_utility/databases/ is copied regularly to grab the data.
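Since the app's database only lives on the phone (and reading /data/data normally requires root), a simple robust approach is to copy the whole directory into a fresh timestamped snapshot each time. A minimal sketch, assuming the directory is already reachable from wherever the script runs; `snapshot_dir` and the backup layout are hypothetical:

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_dir(src: Path, backup_root: Path,
                 now: Optional[datetime] = None) -> Path:
    """Copy the whole src directory into backup_root/<UTC timestamp>/.

    copytree refuses to write into an existing directory, so an earlier
    snapshot is never overwritten; missing parent directories are created.
    """
    now = now or datetime.now(timezone.utc)
    dest = backup_root / now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    shutil.copytree(src, dest)
    return dest


# Example usage (both paths are hypothetical):
# snapshot_dir(Path("/data/data/com.bluemaestro.tempo_utility/databases"),
#              Path("/sdcard/backups/bluemaestro"))
```

Keeping every snapshot rather than overwriting a single copy means a half-written or corrupted copy can never clobber a good one; deduplicating identical snapshots would be an easy next step.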
Why:
temperature during sleep, for the dashboard
lifelogging: capturing weather conditions
E.g. I can potentially see temperature/humidity readings along with my photos from hiking or skiing.
¶Blood
How: via Thriva; data imported manually into an org-mode table (I don't get tested often, so automated scraping wasn't worth it).
Also tracked glucose and ketones (with a FreeStyle Libre) for a few days out of curiosity; didn't bother automating that either.
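Manually maintained org-mode tables are easy to read back programmatically. A minimal sketch that parses a simple table (header row first, optional `|---+---|` rules, no `|` inside cells) into dicts; `parse_org_table` is a hypothetical helper and the sample numbers are made up:

```python
from typing import Dict, List


def parse_org_table(text: str) -> List[Dict[str, str]]:
    """Parse a simple org-mode table into one dict per row, keyed by the header.

    Assumes the first table row is the header, skips |---+---| rule lines
    and does not handle escaped '|' inside cells.
    """
    rows: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue  # not a table line, or a horizontal rule
        inner = line[1:-1] if line.endswith("|") else line[1:]
        rows.append([cell.strip() for cell in inner.split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Example with made-up numbers.
TABLE = """
| date       | marker   | value | unit |
|------------+----------+-------+------|
| 2019-05-01 | ferritin |    80 | ug/L |
| 2019-11-01 | ferritin |    95 | ug/L |
"""
for row in parse_org_table(TABLE):
    print(row["date"], row["marker"], float(row["value"]), row["unit"])
```

Values come back as strings; converting them to numbers or dates is left to the caller.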
Why:
contributes to the dashboard, could be a good way of establishing your baselines
¶Browser history (Firefox/Chrome)
How: custom scripts running on my computers and phone that copy the underlying SQLite databases directly.
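For Firefox, history lives in `places.sqlite` in the profile directory. A minimal sketch of the copy-then-query approach, assuming the profile path is known; the live file is usually locked while the browser runs, so the script copies it (plus any `-wal` file holding recent writes) and queries the copy. `firefox_visits` is a hypothetical helper:

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple


def firefox_visits(places: Path) -> List[Tuple[datetime, str]]:
    """Return (visit time, url) pairs from Firefox's places.sqlite, oldest first.

    Works on a temporary copy: the live file is usually locked while Firefox
    runs. The -wal file, if present, is copied too because it may hold recent
    visits that have not been written back to the main file yet.
    """
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / places.name
        shutil.copy(places, copy)
        wal = places.with_name(places.name + "-wal")
        if wal.exists():
            shutil.copy(wal, copy.with_name(copy.name + "-wal"))
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute(
                "SELECT v.visit_date, p.url "
                "FROM moz_historyvisits AS v JOIN moz_places AS p ON p.id = v.place_id "
                "ORDER BY v.visit_date"
            ).fetchall()
        finally:
            conn.close()
    # visit_date is stored as microseconds since the Unix epoch
    return [(datetime.fromtimestamp(us / 1_000_000, tz=timezone.utc), url)
            for us, url in rows]


# Example usage (the profile directory name is hypothetical and differs per machine):
# for when, url in firefox_visits(Path.home() / ".mozilla/firefox/xxxxxxxx.default/places.sqlite"):
#     print(when.isoformat(), url)
```

Chrome's `History` file is also an SQLite database with a similar layout (`urls` and `visits` tables), but its timestamps count microseconds since 1601-01-01 rather than since the Unix epoch, so the conversion differs.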
Why:
better browsing history
¶Emfit QS (sleep tracker)
Emfit QS is...