What data on myself I collect and why? (2020)

What data on myself I collect and why? | beepb00p


How I am using 50+ sources of my personal data

This is the list of personal data sources I use or am planning to use, with rough guides on how to get your hands on that data if you want it as well. It's still incomplete and I'm going to update it regularly.

My goal is to collect almost all of my digital trace, automate data collection to the maximum extent possible and make it work in the background, so one can set up pipelines once and hopefully never think about it again.

This is kind of a follow-up on my previous post on the sad state of personal data, and part of my personal way of getting around this sad state.

If you're terrified by the long list, you can jump straight to the "Data consumers" section to find out how I use it. In addition, check out my infrastructure map, which might explain it better!

Table of Contents

1. Why do you collect X? How do you use your data?

backup

lifelogging

quantified self

2. What do I collect/want to collect?

Amazon

Arbtt (desktop time tracker)

Bitbucket (repositories)

Bluemaestro (environment sensor)

Blood

Browser history (Firefox/Chrome)

Emfit QS (sleep tracker)

Endomondo

Facebook

Facebook Messenger

Feedbin

Feedly

Fitbit

Foursquare/Swarm

Github (repositories)

Github (events)

Gmail

Goodreads

Google takeout

Hackernews

HSBC bank

Hypothesis

Instapaper

Jawbone

Kindle

Kobo reader

Last.fm

Monzo bank

Nomie

Nutrition

Photos

PDF annotations

Pinboard

Plaintext notes

Pocket

Remember the Milk

Rescuetime

Shell history

Sleep

Sms/calls

Spotify

Stackexchange

Taplog

Twitter

VK.com

Weight

Whatsapp

23andme

3. Data consumers

Instant search

orger

promnesia

dashboard

timeline

HPI python package

4. --

¶1 Why do you collect X? How do you use your data?

All things considered, I think it's a fair question! Why bother with all this infrastructure and hoard the data if you never use it?

In the next section, I will elaborate on each specific data source, but to start with I'll list the rationales that all of them share:

¶backup

It may feel unnecessary, but shit happens. What if your device dies, account gets suspended for some reason or the company goes bust?

¶lifelogging

Most digital data comes with timestamps, so it automatically, without any manual effort, becomes data for your timeline.

I want to remember more, be able to review my past and bring back and reflect on memories. Practicing lifelogging helps with that.

It feels very wrong that things can be forgotten and lost forever. It's understandable from the neuroscience point of view, i.e. the brain has limited capacity and it would be too distracting to remember everything all the time. That said, I want to have a choice whether to forget or remember events, and I'd like to be able to potentially access forgotten ones.

¶quantified self

Most collected digital data is somewhat quantitative and can be used to analyze your body or mind.

¶2 What do I collect/want to collect?

As I mentioned, most of the collected data serve as a means of backup/lifelogging/quantified self, so I won't mention them again in the 'Why' sections.

All my data collection pipelines are automatic unless mentioned otherwise.

Some scripts are still private so if you want to know more, let me know so I can prioritize sharing them.

¶Amazon

How: jbms/finance-dl

Why:

was planning to correlate Amazon orders with Monzo/HSBC transactions, but haven't got to it yet
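A minimal sketch of what such a correlation could look like, assuming orders and bank transactions have already been exported into simple records. The `Order` and `Transaction` shapes, the `match_orders` helper and the three-day window are all hypothetical, made up for illustration; they are not what finance-dl produces.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    """Hypothetical shape of an exported Amazon order."""
    day: date
    total: Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical shape of an exported bank transaction."""
    day: date
    amount: Decimal
    description: str


def match_orders(orders, transactions, window_days=3):
    """Pair each order with the earliest unused transaction of the same
    amount dated on the order day or up to `window_days` after it."""
    window = timedelta(days=window_days)
    unused = sorted(transactions, key=lambda t: t.day)
    pairs = []
    for order in sorted(orders, key=lambda o: o.day):
        for i, tx in enumerate(unused):
            delay = tx.day - order.day
            if tx.amount == order.total and timedelta(0) <= delay <= window:
                pairs.append((order, tx))
                del unused[i]  # a transaction can only be matched once
                break
    return pairs
```

Matching on exact amount plus a short date window is deliberately naive: orders that are split into several shipments or charged in another currency would need fuzzier rules.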

¶Arbtt (desktop time tracker)

How: arbtt-capture

Why:

haven't used it yet, but it could be a rich source of lifelogging context

¶Bitbucket (repositories)

How: samkuehn/bitbucket-backup

Why:

proved especially useful considering Atlassian is going to wipe mercurial repositories

I've got lots of private mercurial repositories with university homework and other early projects, and it's sad to think of people who will lose theirs during this wipe.

¶Bluemaestro (environment sensor)

How: sensor syncs with phone app via Bluetooth, /data/data/com.bluemaestro.tempo_utility/databases/ is regularly copied to grab the data.

Why:

temperature data during sleep, for the dashboard

lifelogging: capturing information about weather conditions

E.g. I can potentially see temperature/humidity readings along with my photos from hiking or skiing.
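As a rough illustration of pairing readings with photos, here is a sketch that looks up the sensor reading closest to a photo's timestamp. It assumes the readings have already been extracted into a list of `(timestamp, temperature, humidity)` tuples sorted by time; that layout, the `nearest_reading` name and the 30-minute cutoff are assumptions for the example, not the app's actual database schema.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_reading(readings, when, max_gap=timedelta(minutes=30)):
    """Return the reading closest in time to `when`, or None if the
    closest one is further away than `max_gap`.

    `readings` is a list of (timestamp, temperature, humidity) tuples
    and must be sorted by timestamp.
    """
    if not readings:
        return None
    times = [r[0] for r in readings]
    i = bisect_left(times, when)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = readings[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda r: abs(r[0] - when))
    return best if abs(best[0] - when) <= max_gap else None
```

The cutoff matters: without it, a photo taken while the sensor was switched off would silently get a reading from hours earlier.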

¶Blood

How: via thriva, data imported manually into an org-mode table (I don't do the tests frequently, so automated scraping wasn't worth it)

I also tracked glucose and ketones (with freestyle libre) for a few days out of curiosity, and didn't bother automating that either.

Why:

contributes to the dashboard, could be a good way of establishing your baselines
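Once the results are in an org-mode table, getting them into a script is straightforward. The sketch below is one possible way to do it, assuming a plain table whose first row is the header; the `parse_org_table` name is made up for this example.

```python
def parse_org_table(text):
    """Parse an org-mode table into a list of dicts keyed by the header row.

    Lines that are not table rows, and horizontal rules such as |---+---|,
    are skipped. All values are returned as strings.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue
        # Drop the outer pipes, then split the remaining cells.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        rows.append([cell.strip() for cell in inner.split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

Values come back as strings on purpose, since lab results often mix numbers with units or markers like "<5", and converting them is better left to whatever consumes the table.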

¶Browser history (Firefox/Chrome)

How: custom scripts, copying the underlying sqlite databases directly, running on my computers and phone.

Why:

better browsing history
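To make the copying approach concrete, here is a minimal sketch for Firefox, not the actual scripts mentioned above. It copies the history database under a timestamped name and then reads visits from the copy. The `moz_places` and `moz_historyvisits` tables are Firefox's own; the function names and the backup naming scheme are assumptions for the example. A plain file copy may miss the most recent visits if they are still sitting in the write-ahead log of a running browser.

```python
import shutil
import sqlite3
from contextlib import closing
from datetime import datetime, timezone
from pathlib import Path


def backup_history(db: Path, backup_dir: Path) -> Path:
    """Copy the history database into `backup_dir` under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{db.stem}-{stamp}{db.suffix}"
    shutil.copy2(db, target)
    return target


def firefox_visits(db: Path):
    """Yield (utc_datetime, url, title) for each visit in a copied places.sqlite."""
    query = """
        SELECT v.visit_date, p.url, p.title
        FROM moz_historyvisits AS v
        JOIN moz_places AS p ON p.id = v.place_id
        ORDER BY v.visit_date
    """
    with closing(sqlite3.connect(db)) as conn:
        for visit_date, url, title in conn.execute(query):
            # Firefox stores visit_date as microseconds since the Unix epoch.
            when = datetime.fromtimestamp(visit_date / 1_000_000, tz=timezone.utc)
            yield when, url, title
```

Querying the copy rather than the live file keeps the script from interfering with the browser. Chrome needs a different query, since its tables and timestamp epoch differ.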

¶Emfit QS (sleep tracker)

Emfit QS is...
