class: center, middle, inverse, title-slide

# Automating data collection for CorrelAid
## A (personal) evolution from Raspberry Pi to AWS
### Frie Preu
### CorrelAid OODM 4 - 2020-09-23

---
class: center, middle

<iframe src="https://giphy.com/embed/7TZvWKVkm0xXi" width="480" height="260" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/back-to-the-future-delorean-regreso-al-futuro-7TZvWKVkm0xXi">via GIPHY</a></p>

--

**Let's go back to 2017!**

---

# 2017 Frie

.pull-left[
<img src="img/frie-may2017.jpg" width="3456" />
]

.pull-right[
- skills
  - ~3 years' worth of R skills
  - ~1.5 years of experience with Linux
  - ~ _some_ Python skills
  - **no** experience with Cloud (VM who? what is a server?)
- Raspberry Pi to practice Linux
- about to start writing MA thesis, injured in summer 🤦
]

---

# 2017 CorrelAid infrastructure

- no money ❌
- no Manitu servers / databases ❌
- no Azure ❌
- Website on PHP (?) ✅
- Mailchimp Newsletter ✅

---

# Enter *fripi*

<a title="Gareth Halfacree from Bradford, UK / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0)" href="https://commons.wikimedia.org/wiki/File:Raspberry_Pi_3_B%2B_(39906369025).png"><img width="512" alt="Raspberry Pi 3 B+ (39906369025)" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/Raspberry_Pi_3_B%2B_%2839906369025%29.png/512px-Raspberry_Pi_3_B%2B_%2839906369025%29.png"></a>

- 💸: ~60 Euro (with accessories)
- Specs: 1 GB RAM, Raspbian

---
class: inverse, center, middle

# Automatically collecting CorrelAid data

**Goal**: Track + display growth of our network + relevant social media channels

---

![](index_files/figure-html/unnamed-chunk-2-1.png)

---

## Version 1: Raspberry Pi + R (2017)

<img src="img/correlaid-analytics.png" width="983" />

---

## Version 1: Raspberry Pi + R (2017)

```
correlaid-analytics/
├── code
│   ├── 01_get_daily_analytics.R
│   ├── 02_upload_weekly.R
│   ├── get_network_stats.R
│   ├── requirements.R
│   └── utils.R
├── data
│   ├── all_daily.json
│   └── all_weekly.json
├── run_daily.sh
└── run_weekly.sh
```

Check out this version of the code [here](https://github.com/friep/correlaid-utils/tree/9f2506f90773e34f409be46f164bbbc16e8c7b9d).

---

# December 2017 Frie

🧑‍💻 [https://www.codecentric.de](https://www.codecentric.de)

---

## Version 2: AWS Lambda + Python (2018)

```bash
correlaid-analytics
├── daily.py
├── deploy-analytics.sh
├── every_monday.py
├── package-lock.json
├── requirements.txt
├── serverless.yml
└── setup.sh
```

This is the current version at [https://github.com/friep/correlaid-utils](https://github.com/friep/correlaid-utils).

---

## Version 2: AWS Lambda + Python (2018)

> AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. [...]
>
> The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications [...] ([Wikipedia](https://en.wikipedia.org/wiki/AWS_Lambda))

--

- *event-driven*: it only runs in response to an **event**
  - the event can be a cronjob 👀

--

- *serverless*: underlying servers are **automatically** started + stopped by AWS (-> RIP fripi)

--

- *smaller, on-demand applications*: those are called **functions**

--

- payment per execution -> free / very cheap!

---

## Version 2: AWS Lambda + Python (2018)

The code lives in two Python files. Each contains a lot of functions but only one main executing function:

- `get_correlaid_data` in `daily.py`
- `upload_to_ftp` in `every_monday.py`

---

## Version 2: AWS Lambda + Python (2018)

The [serverless](https://serverless.com) framework lets you define Lambda *functions* in a YAML file (`serverless.yml`) and makes deployment to AWS very easy.

```yml
functions:
  daily_correlaid_analytics:
    handler: daily.get_correlaid_data
    events:
      - schedule:
          rate: cron(56 22 * * ? *)
  weekly_upload:
    handler: every_monday.upload_to_ftp
    events:
      - schedule:
          rate: cron(05 00 ? * TUE *)
```

---

## Version 2: AWS Lambda + Python (2018)

Deployment: `deploy-analytics.sh`

```bash
#source setup-aws-profile.sh
source setup.sh
serverless deploy -v
```

--

This command will:

1. look at `serverless.yml` and package up the defined functions together with their dependencies (e.g. Python packages)
2. connect to AWS and create the resources needed (mostly AWS Lambda functions)

---

## Version 2: AWS Lambda + Python

<img src="img/correlaid-analytics_v2.png" width="983" />

---

## Takeaways

* Raspberry Pis as mini-servers are cool!
* Cloud is cool as well!
* You can learn a lot in a year (given the right circumstances / opportunities)

Code at [https://github.com/friep/correlaid-utils](https://github.com/friep/correlaid-utils)
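
---

## Appendix: what a handler might look like

The `handler: daily.get_correlaid_data` entry in `serverless.yml` points at a plain Python function that AWS calls with an `event` and a `context` argument. Below is a minimal sketch of that shape, not the code from the repository: `fetch_follower_counts`, `build_daily_record`, the channel names and the returned dictionary are all assumptions made for illustration.

```python
import json
from datetime import date, datetime, timezone


def fetch_follower_counts() -> dict:
    """Hypothetical stand-in for the real API calls; the values are placeholders."""
    return {"twitter": 0, "newsletter": 0}


def build_daily_record(counts: dict, day: date) -> dict:
    """Combine one day's counts into a JSON-serialisable record."""
    return {"date": day.isoformat(), **counts}


def get_correlaid_data(event, context):
    """Lambda entry point: AWS passes the trigger event and a context object."""
    today = datetime.now(timezone.utc).date()
    record = build_daily_record(fetch_follower_counts(), today)
    # A real function would persist the record somewhere; this sketch only returns it.
    return {"statusCode": 200, "body": json.dumps(record)}
```

Keeping the data handling in small functions and the handler itself thin means most of the logic can be run and tested locally, without deploying anything.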