Automating data collection for CorrelAid

A (personal) evolution from Raspberry Pi to AWS

Frie Preu

CorrelAid OODM 4 - 2020-09-23

1 / 17

via GIPHY

Let's go back to 2017!
2 / 17

2017 Frie

  • skills
    • ~3 years worth of R skills
    • ~1.5 years experience with Linux
    • ~ some Python skills
    • no experience with the cloud (VM who? what is a server?)
  • Raspberry Pi to practice Linux
  • about to start writing MA thesis, injured in summer 🤦
3 / 17

2017 CorrelAid infrastructure

  • no money ❌
  • no Manitu servers / databases ❌
  • no Azure ❌
  • Website on PHP (?) ✅
  • Mailchimp Newsletter ✅
4 / 17

Enter fripi

Raspberry Pi 3 B+

  • 💸: ~60 Euro (with accessories)
  • Specs: 1GB RAM, Raspbian
5 / 17

Automatically collecting CorrelAid data

Goal: Track + display growth of our network + relevant social media channels

6 / 17

7 / 17

Version 1: Raspberry Pi + R (2017)

8 / 17

Version 1: Raspberry Pi + R (2017)

correlaid-analytics/
├── code
│ ├── 01_get_daily_analytics.R
│ ├── 02_upload_weekly.R
│ ├── get_network_stats.R
│ ├── requirements.R
│ └── utils.R
├── data
│ ├── all_daily.json
│ └── all_weekly.json
├── run_daily.sh
└── run_weekly.sh

Check out this version of the code here.

9 / 17
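The daily run appends one snapshot per day to data/all_daily.json. Here is a rough Python illustration of that append pattern. Version 1 itself was written in R. The sketch assumes a hypothetical file layout (a JSON list with one object per day), and the metric names are made up, so it should not be read as the actual code.

```python
import json
from datetime import date
from pathlib import Path


def append_daily_snapshot(path: Path, counts: dict, day: date) -> list:
    """Append one day's counts to a JSON history file and return the full history.

    Assumed (hypothetical) file layout: a JSON array of objects like
    {"date": "YYYY-MM-DD", "<metric>": <count>, ...}.
    """
    history = json.loads(path.read_text()) if path.exists() else []
    day_str = day.isoformat()
    # If the job already ran for this day (e.g. a manual re-run), keep the existing entry.
    if any(entry.get("date") == day_str for entry in history):
        return history
    history.append({"date": day_str, **counts})
    path.write_text(json.dumps(history, indent=2))
    return history
```

A cron job would call this once per run with that day's counts. The duplicate-date check prevents an accidental second run on the same day from adding a second row.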

December 2017 Frie

🧑💻

https://www.codecentric.de

10 / 17

Version 2: AWS Lambda + Python (2018)

correlaid-analytics
├── daily.py
├── deploy-analytics.sh
├── every_monday.py
├── package-lock.json
├── requirements.txt
├── serverless.yml
└── setup.sh

This is the current version at https://github.com/friep/correlaid-utils.

11 / 17

Version 2: AWS Lambda + Python (2018)

AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. [...] The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications [...] (Wikipedia)

  • event-driven: it only runs in response to an event; the event can be a cron job 👀

  • serverless: underlying servers are automatically started + stopped by AWS (-> RIP fripi)

  • smaller, on-demand applications: those are called functions
  • payment per execution -> free / very cheap!
12 / 17
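In Python, a Lambda "function" is an ordinary function that receives the triggering event and a context object. AWS calls it whenever the configured event fires. Below is a minimal sketch of a handler for a scheduled (cron) trigger. The names handler and collect_stats are hypothetical, and collect_stats stands in for the real data collection.

```python
import json
from datetime import datetime, timezone


def collect_stats() -> dict:
    """Hypothetical stand-in for the real API calls (newsletter, social media, ...)."""
    return {"members": 0}


def handler(event, context):
    """Entry point: AWS Lambda calls this with the triggering event and a context object.

    For a scheduled (cron) trigger the event only describes the schedule firing,
    so this sketch ignores its contents and just collects and timestamps the data.
    """
    stats = collect_stats()
    stats["collected_at"] = datetime.now(timezone.utc).isoformat()
    # Anything printed by a Lambda function ends up in CloudWatch Logs.
    print(json.dumps(stats))
    return stats
```

You can try it locally with `handler({"source": "aws.events", "detail-type": "Scheduled Event"}, None)`, which mimics the shape of a scheduled event. No server has to be running in between invocations.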

Version 2: AWS Lambda + Python (2018)

The code is written in two Python files.

Lots of functions, but one main function per file that gets executed:

  • get_correlaid_data in daily.py
  • upload_to_ftp in every_monday.py
13 / 17
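Judging by its name, upload_to_ftp pushes data to an FTP server. Here is a minimal sketch of such an upload using Python's standard-library ftplib. It assumes the payload is JSON. Host, credentials, the remote file name and the function names are all placeholders, and the real upload_to_ftp may work differently.

```python
import io
import json
from ftplib import FTP


def upload_json(ftp, remote_name: str, payload) -> None:
    """Serialise `payload` as JSON and store it as `remote_name` via an open FTP session."""
    buffer = io.BytesIO(json.dumps(payload).encode("utf-8"))
    # storbinary sends the STOR command and streams the file-like object to the server.
    ftp.storbinary(f"STOR {remote_name}", buffer)


def upload_weekly(host: str, user: str, password: str, payload,
                  remote_name: str = "all_weekly.json") -> None:
    """Hypothetical weekly upload: connect, log in, upload, and close the connection."""
    with FTP(host) as ftp:  # FTP(host) connects immediately; `with` sends QUIT on exit
        ftp.login(user=user, passwd=password)
        upload_json(ftp, remote_name, payload)
```

Splitting the upload from the connection handling keeps upload_json testable with a fake FTP object, without a real server.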

Version 2: AWS Lambda + Python (2018)

The Serverless Framework lets you define Lambda functions in a YAML file (serverless.yml) and makes deploying them to AWS very easy.

functions:
  daily_correlaid_analytics:
    handler: daily.get_correlaid_data
    events:
      - schedule:
          rate: cron(56 22 * * ? *)
  weekly_upload:
    handler: every_monday.upload_to_ftp
    events:
      - schedule:
          rate: cron(05 00 ? * TUE *)
14 / 17
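The rate values use AWS's six-field cron syntax: minutes, hours, day-of-month, month, day-of-week, year. Schedules are evaluated in UTC, and one of the two day fields has to be `?`. The small Python sketch below splits such an expression into named fields. It shows that the two schedules above mean "daily at 22:56 UTC" and "Tuesdays at 00:05 UTC". The helper name parse_aws_cron is made up for this example.

```python
import re

# Field order of AWS cron expressions (EventBridge / CloudWatch Events schedules).
AWS_CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_aws_cron(expr: str) -> dict:
    """Split an AWS schedule expression like 'cron(56 22 * * ? *)' into named fields.

    Only the overall shape is checked (six fields, '?' in a day field);
    individual values such as 'TUE' or '05' are not validated.
    """
    match = re.fullmatch(r"cron\((.+)\)", expr.strip())
    if match is None:
        raise ValueError(f"not a cron(...) expression: {expr!r}")
    parts = match.group(1).split()
    if len(parts) != len(AWS_CRON_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    fields = dict(zip(AWS_CRON_FIELDS, parts))
    # AWS does not allow day-of-month and day-of-week together: one of them must be '?'.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("one of day-of-month / day-of-week must be '?'")
    return fields
```

For example, `parse_aws_cron("cron(05 00 ? * TUE *)")["day_of_week"]` gives `"TUE"`. A classic five-field crontab line such as `cron(0 12 * * *)` is rejected because the year field is missing.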

Version 2: AWS Lambda + Python (2018)

Deployment: deploy-analytics.sh

#source setup-aws-profile.sh
source setup.sh
serverless deploy -v

This command will:

  1. look at serverless.yml and package up the defined functions together with the dependencies (e.g. Python packages)
  2. connect to AWS and create the resources needed (mostly AWS Lambda functions)
15 / 17
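Step 1, packaging, essentially means bundling the handler files and their installed dependencies into a zip archive that Lambda can load. Here is a simplified standard-library Python sketch of that idea. It assumes the dependencies were already installed into a local directory, e.g. with `pip install -r requirements.txt -t <dir>`. This is not how the Serverless Framework is implemented internally, and build_lambda_zip is a hypothetical name.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(handler_files, deps_dir: Path, out_zip: Path) -> list:
    """Bundle handler modules plus installed dependencies into one zip; return its entries.

    Lambda imports the handler module (e.g. `daily` for `daily.get_correlaid_data`)
    from the archive root, so both the .py files and the contents of `deps_dir`
    are written without a parent folder.
    """
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in handler_files:
            zf.write(f, arcname=Path(f).name)
        if deps_dir.exists():
            for f in sorted(deps_dir.rglob("*")):
                if f.is_file():
                    zf.write(f, arcname=f.relative_to(deps_dir).as_posix())
    with zipfile.ZipFile(out_zip) as zf:
        return zf.namelist()
```

Step 2 would then upload this archive and create or update the Lambda functions and their schedules. The `serverless deploy` command handles that part.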

Version 2: AWS Lambda + Python (2018)

16 / 17

Takeaways

  • Raspberry Pis as mini-servers are cool!
  • Cloud is cool as well!
  • You can learn a lot in a year (given the right circumstances / opportunities)

Code at https://github.com/friep/correlaid-utils

17 / 17
