How to create a simple ETL Job locally with PySpark, PostgreSQL and Docker

Source: itnext.io

Introduction

In this article, I’m going to demonstrate how Apache Spark can be used to write powerful ETL jobs in Python. If you’re already familiar with Python and work with data day to day, PySpark will help you process and analyse (big) data in a more scalable way. The data that I’ll use is scraped from eBay Kleinanzeigen, the German branch of eBay where people can advertise their properties. In our case, we