class: center, middle, inverse, title-slide

# Using R and a Raspberry Pi to automate social media data collection
### Frie Preu
### RLadies Tunis, 2021-02-20

---

# About me

- political scientist turned data scientist turned IT consultant / software developer... something else?
- useR since 2013/2015
- CorrelAid volunteer since 2015, full-time since 2020

---

# About CorrelAid

- German(/European) Data4Good network with over 1500 volunteers
- data4good projects with external partners
- education: e.g. meetups, tidytuesday, workshops, annual conference, internal projects, ...
- dialogue with society
- excellent opportunity to try out things

---

# About this project

2017: new website withπ ➡️ collect social media time series: facebook, twitter, mailchimp subscribers

![](index_files/figure-html/unnamed-chunk-1-1.png)

---

# Requirements for automated data collection

- 🤖 somewhere to run our code on
- 🕐 automatically execute code at regular intervals
- 💾 store data for later, easy access
- 📬 notify us if something is wrong

---

# 🤖: A Raspberry Pi

.pull-left[
<a title="Gareth Halfacree from Bradford, UK / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0)" href="https://commons.wikimedia.org/wiki/File:Raspberry_Pi_3_B%2B_(39906369025).png"><img width="512" alt="Raspberry Pi 3 B+ (39906369025)" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/Raspberry_Pi_3_B%2B_%2839906369025%29.png/512px-Raspberry_Pi_3_B%2B_%2839906369025%29.png"></a>
]

.pull-right[
- tiny and affordable computer, originally used for teaching
- large open-source community, many different projects
- 💸: ~10-110 Euro (with accessories)
- Specs: 512MB - 8GB RAM, own OS (Raspbian)
]

---

# 🕐: Cron jobs

> Cron is one of the most useful utility that you can find in any Unix-like operating system. It is used to schedule commands at a specific time. These scheduled commands or tasks are known as "Cron Jobs".
([Source](https://ostechnix.com/a-beginners-guide-to-cron-jobs))

![https://ostechnix.com/a-beginners-guide-to-cron-jobs](https://ostechnix.com/wp-content/uploads/2018/05/cron-job-format-1.png)

--

```r
50 23 * * * /usr/lib/R/bin/Rscript '/home/frie/correlaid-utils/correlaid-analytics/run.R'
```

.footnote[Note: Slide adapted from Alex Kapps presentation, see [here](https://docs.correlaid.org/correlcollection/open-online-data-meetup#how-to-store-thousands-of-shared-bike-locations-every-4-minutes-into-a-database). Image source: https://ostechnix.com/wp-content/uploads/2018/05/cron-job-format-1.png.]

---

# Project timeline & versions

.pull-left[
[mid 2017 - Oct. 2017](https://github.com/friep/correlaid-utils/tree/9f2506f90773e34f409be46f164bbbc16e8c7b9d)
<br>
<br>
[early 2018 - mid 2018 (?)](https://github.com/friep/correlaid-utils/tree/1ed5a5b4416beab950bcc1313ae6bc2f8fab1b22)
<br>
<br>
mid 2018 - late 2020

[late 2020](https://github.com/friep/correlaid-utils)
]

.pull-right[
Raspberry Pi + R + mlab, cf. [talk at OODM](https://youtu.be/tFRNBHqg_ZQ?t=2290)

AWS Lambda, Serverless & Python, cf. [talk at OODM](https://youtu.be/tFRNBHqg_ZQ?t=2413)

❌

Raspberry Pi + R + GitHub + GitHub Actions
]

---

class: center, middle, inverse

# R and Raspberry Pi - 2017 version

---

# 2017 version: diagram

<img src="img/r_v1.png" width="1003" />

---

# 2017 version: summary

- 🤖 Raspberry Pi
- 🕐 Cron
- 💾 mlab
- 📬 ❌

--

### Problems

- one big, messy R script
- authentication details in text files checked into (private) GitHub (⚠️)
- code quality ...

---

class: center, middle, inverse

# 2018: Python + AWS Lambda + Serverless

---

# December 2017

Frie π§π» [https://www.codecentric.de](https://www.codecentric.de)

---

# 2018 version: diagram

<img src="img/correlaid-analytics_v2.png" width="983" />

---

# 2018 version: What is AWS Lambda?

> AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services.
It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. [...]

> The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications [...]

([Wikipedia](https://en.wikipedia.org/wiki/AWS_Lambda))

--

- *event-driven*: it only runs in response to an **event**
  - the event can be a cron job 🕐

--

- *serverless*: underlying servers are **automatically** started + stopped by AWS (-> RIP fripi)

--

- *smaller, on-demand applications*: those are called **functions**

--

- payment per execution -> free / very cheap!

---

# 2018: AWS Lambda + Python

```bash
correlaid-analytics
├── daily.py
├── deploy-analytics.sh
├── every_monday.py
├── package-lock.json
├── requirements.txt
├── serverless.yml
└── setup.sh
```

---

# 2018 version: serverless

The [serverless](https://serverless.com) framework lets you define Lambda *functions* in a yml file (`serverless.yml`) and makes deployment to AWS very easy.

```yml
functions:
  daily_correlaid_analytics:
    handler: daily.get_correlaid_data
    events:
      - schedule:
          rate: cron(56 22 * * ? *)
```

Deployment with:

```bash
serverless deploy -v
```

---

# 2018 summary

- 🤖 AWS Lambda (runs on AWS)
- 🕐 Cron
- 💾 hosted MySQL
- 📬 AWS Lambda alerts

---

class: center, middle, inverse

# R and Raspberry Pi - 2020 version

---

<img src="img/r_v2.png" width="1004" />

---

# 2020 version: Cron job

```r
## cronR job
## id:   daily_analytics
## tags:
## desc: Get daily CorrelAid Analytics
50 23 * * * cd '/home/frie/correlaid-utils/correlaid-analytics' && /usr/lib/R/bin/Rscript '/home/frie/correlaid-utils/correlaid-analytics/run.R' > '/home/frie/correlaid-utils/correlaid-analytics/run.log' 2>&1
```

set up with the very helpful {[cronR](https://github.com/bnosac/cronR)} 📦

--

### run.R

```r
library(here)
print("==============================")
print(Sys.time())
source(here::here("correlaid-analytics/01_get_daily_analytics.R"))
source(here::here("correlaid-analytics/02_git.R"))
```

---

# 2020 version: files

```r
correlaid-analytics/
├── 01_get_daily_analytics.R
├── 02_git.R
├── cron.R
├── data
│   └── all_daily.csv
├── run.log
└── run.R
```

[01_get_daily_analytics.R](https://github.com/friep/correlaid-utils/blob/main/correlaid-analytics/01_get_daily_analytics.R)

---

# 2020 version: smcounts 📦

```r
library(smcounts)
smcounts::collect_data
```

```
## function (slack = TRUE, facebook = TRUE, twitter = TRUE, mailchimp = TRUE) 
## {
##     df <- tibble::tibble(date = c(), platform = c(), n = c())
##     if (slack) {
##         slack_df <- ca_slack()
##         df <- rbind(df, slack_df)
##     }
##     if (facebook) {
##         facebook_df <- ca_facebook()
##         df <- rbind(df, facebook_df)
##     }
##     if (twitter) {
##         twitter_df <- ca_twitter()
##         df <- rbind(df, twitter_df)
##     }
##     if (mailchimp) {
##         mailchimp_df <- ca_newsletter()
##         df <- rbind(df, mailchimp_df)
##     }
##     return(df)
## }
## <bytecode: 0x7ff213720b60>
## <environment: namespace:smcounts>
```

---

# smcounts 📦

- abstracts data collection functionality --> can be reused in other contexts
- define dependencies via DESCRIPTION file
- easy installation from
[GitHub](https://github.com/friep/smcounts) (https://github.com/friep/smcounts)
- uses environment variables (standard way to store API keys etc.)

---

# 2020 version 💾: Git

### 02_git.R

```r
# gert (https://docs.ropensci.org/gert/index.html)
library(gert)
gert::git_pull()
print(gert::git_status())
gert::git_add("correlaid-analytics/data/all_daily.csv")
gert::git_commit(message = "🤖 CRON - update daily data", author = git_signature("raspi3", "raspi3@pr130.dev"))
gert::git_push()
```

```r
ca_counts <- readr::read_csv("https://raw.githubusercontent.com/friep/correlaid-utils/main/correlaid-analytics/data/all_daily.csv")
```

```
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   date = col_date(format = ""),
##   platform = col_character(),
##   n = col_double()
## )
```

---

# 2020 version: 📬 GitHub Action

- CI/CD tool (continuous integration, continuous deployment) to define *workflows* in yml files
- typical use case: run checks on R packages (e.g. [ggplot2](https://github.com/tidyverse/ggplot2/actions?query=workflow%3AR-CMD-check)), build websites
- different kinds of triggers: push, pull request, cron job (🕐)

--

### correlaid-utils workflow

- runs every morning to check whether a commit has been made to `all_daily.csv` in the last 24 hours
  - if yes: ✅
  - if no: ❌ -> workflow fails and GitHub sends email
- [yml file](https://github.com/friep/correlaid-utils/blob/main/.github/workflows/notify_on_failure.yml)
- [workflow runs](https://github.com/friep/correlaid-utils/actions)

-> works for the case when the Raspberry Pi is offline. Does not catch [other cases](https://github.com/friep/correlaid-utils/issues/11)!
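---

# 2020 version: freshness check (sketch)

The workflow above boils down to one comparison: is the newest update to `all_daily.csv` less than 24 hours old? A minimal sketch of that logic in R, assuming the last commit time is already known. `is_fresh()` and both timestamps are hypothetical and are not taken from the actual yml file.

```r
# Hypothetical helper (not part of correlaid-utils): TRUE if `last_update`
# lies at most `max_age_hours` before `now`.
is_fresh <- function(last_update, now = Sys.time(), max_age_hours = 24) {
  age_hours <- as.numeric(difftime(now, last_update, units = "hours"))
  age_hours <= max_age_hours
}

# Made-up timestamps for illustration: the cron job committed at 23:50 UTC,
# the check runs the next morning at 07:00 UTC.
last_commit <- as.POSIXct("2021-02-19 23:50:00", tz = "UTC")
check_time  <- as.POSIXct("2021-02-20 07:00:00", tz = "UTC")

if (!is_fresh(last_commit, now = check_time)) {
  # An uncaught error makes Rscript exit with a non-zero status.
  stop("all_daily.csv was not updated in the last 24 hours")
}
message("data is fresh")
```

Run via `Rscript` in a scheduled job, the uncaught `stop()` would produce a non-zero exit status, which is what marks a run as failed.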
---

# 2021 version: Bonus π

- knit small website with interactive graphs
- https://friep.github.io/correlaid-utils
- does not work on Raspberry Pi 3 because of old R version (3.5) -> instead use GitHub Actions for this

---

# 2020 version: summary

- 🤖 Raspberry Pi
- 🕐 Cron
- 💾 GitHub
- 📬 GitHub Actions

--

### 2020 version vs. 2017 version

- ✅ better decoupling through smcounts package
- ✅ more stability
- 🤔 git as storage option & GitHub Actions
- π better error handling
- π tests!!

---

class: center, inverse, middle

# Alternatives & Summary

---

# Alternatives

#### Server 🤖

- Virtual machines on AWS, Azure, Google Cloud
- specialized services from AWS, Azure, ...
- GitHub Actions or other CI/CD services (?!)

--

#### Storage 💾

- a proper database
  - local
  - in the cloud (e.g. [AWS RDS free tier](https://aws.amazon.com/rds/free/?nc1=h_ls), [elephantsql](https://www.elephantsql.com))
- file storage for csv file (e.g. free AWS S3)

--

#### Notifications 📬

- make built-in cron emailing functionality work on Raspberry Pi
- monitoring services on AWS etc. (e.g. AWS SNS)

---

# Summary

- Things you can learn: git, cron jobs, ssh, scp, basics of networking, command line, bash scripting, writing code that works not only on your machine...
- Buy a Raspberry Pi, if...
  - ... you want to get more experience with virtual machines / "the cloud" etc. but you feel like you need something in between
  - ... you have a use case (and 2-3 other use cases once you "graduate" to the cloud)!

--

- Don't buy one if...
  - ... you'll have to work with cloud services soon anyway
  - ... you don't have the time / nerves to work without RStudio / non-interactively
  - ... you have project ideas that require complex architectures / more computing resources / new packages

---

# Thanks for coming!
### Links

- [Slides](https://talks.pr130.dev/2020-11-24_rladies_bucharest_raspberrypi/index.html)
- [correlaid-utils Repository](https://github.com/friep/correlaid-utils) with a (hopefully) helpful README
- [smcounts](https://github.com/friep/smcounts) R Package
- [talk at CorrelAid Open Online Data Meetup](https://youtu.be/tFRNBHqg_ZQ?t=1966)

### Follow me / Reach out

- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> [frie.p@correlaid.org](mailto:frie.p@correlaid.org)
- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [ameisen_strasse](https://twitter.com/ameisen_strasse)
- <svg viewBox="0 0 496 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> [https://pr130.dev](https://pr130.dev)
- <svg viewBox="0 0 496 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> [correlaid.org](https://correlaid.org)