Spark for Data Engineers

Data analysts, data scientists, business intelligence analysts and many other roles require data on demand. Fighting with data silos, many scattered databases, Excel files, CSV files, JSON files, APIs and potentially different flavours of cloud storage can be tedious, nerve-wracking and time-consuming.

An automated process that follows a set of steps and procedures, takes subsets of data, columns from databases and binary files, and merges them together to serve business needs is, and will remain, a core job for many organizations and teams.
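To make the idea concrete, here is a minimal sketch of such a process in plain Python (standard library only, not Spark). The sample data, the `orders` table and the function names `load_orders` and `build_report` are all hypothetical and only illustrate the pattern of reading a few columns from a CSV file, a JSON payload and a database table, then merging them on a shared key.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources standing in for a CSV export, a JSON API
# response and a database table; none of these names come from the workshop.
CSV_TEXT = "customer_id,name\n1,Ana\n2,Bor\n"
JSON_TEXT = '[{"customer_id": 1, "country": "SI"}, {"customer_id": 2, "country": "UK"}]'


def load_orders(conn):
    """Create and fill a small in-memory orders table (illustrative data)."""
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.5), (2, 20.0)],
    )


def build_report():
    """Merge the three sources into one row per customer."""
    # Step 1: read only the columns we need from each source.
    names = {
        int(row["customer_id"]): row["name"]
        for row in csv.DictReader(io.StringIO(CSV_TEXT))
    }
    countries = {row["customer_id"]: row["country"] for row in json.loads(JSON_TEXT)}

    conn = sqlite3.connect(":memory:")
    try:
        load_orders(conn)
        # Each result row is a (customer_id, total) pair, so dict() works directly.
        totals = dict(
            conn.execute(
                "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
            )
        )
    finally:
        conn.close()

    # Step 2: join on customer_id and emit a single merged record per customer.
    return [
        {
            "customer_id": cid,
            "name": names[cid],
            "country": countries.get(cid),
            "total": totals.get(cid, 0.0),
        }
        for cid in sorted(names)
    ]


if __name__ == "__main__":
    for row in build_report():
        print(row)
```

The same read, select and join steps are what a data pipeline framework automates at scale; this sketch only shows their shape on toy data.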

Apache Spark™ is designed for building faster and more reliable data pipelines. It covers both low-level and structured APIs, and brings tools and packages for streaming data, machine learning, data engineering, building pipelines and extending the Spark ecosystem.

Spark is an absolute winner for these tasks and a great choice for adoption.

A data engineer should have the breadth and capability to cover:

- System architecture
- Programming
- Database design and configuration
- Interface and sensor configuration

In addition, as important as familiarity with the technical tools is, the concepts of data architecture and pipeline design are even more important. The tools are worthless without a solid conceptual understanding of:

- Data models
- Relational and non-relational database design
- Information flow
- Query execution and optimisation
- Comparative analysis of data stores
- Logical operations

Apache Spark has the technology built in to cover these topics, and the capacity to assemble functional systems that achieve a concrete goal.

Workshop Title: "Spark for Data Engineers"

Target Audience: Data engineers, BI engineers, Cloud data engineers

Broader Audience: Analysts, BI analysts, Big Data analysts, DevOps data engineers, Machine Learning engineers, Statisticians, Data Scientists, Database Administrators, Data Orchestrators, Data Architects

Prerequisite knowledge for attendees:
Data engineering tasks:
- analyzing and organizing raw data (with T-SQL or Python or R or Scala)
- building data transformations and pipelines (with T-SQL or Python or R or Scala)

Technical prerequisites for attendees:
- A working laptop with the ability to install Apache Spark and other tools
- Access to the internet
- Credentials and credit (free credit) for accessing the Azure portal

Agenda for the day (9AM – 5PM; start and end times can vary and will be finalised with the organiser)

1. Module 1 (9.00 – 10.00): Getting to know Apache Spark, installation and setting up the environment
2. Coffee break 15'
3. Module 2 (10.00 – 11.15): Creating Datasets, organising raw data and working with structured APIs
4. Coffee break 15'
5. Module 3 (11.30 – 13.00): Designing and building pipelines, moving data and building data models with Spark
6. Lunch (13.00 – 14.00)
7. Module 4 (14.00 – 15.00): Data and process orchestration, deployment and Spark Applications
8. Coffee break 15'
9. Module 5 (15.15 – 16.15): Data Streaming with Spark
10. Module 6 (16.15 – 17.00): Ecosystem, tooling and community

All modules have hands-on material that will be given to attendees at the beginning of the training.

Starts: 09:00 8th Mar 2022
Ends: 17:00 8th Mar 2022


Short Description
- Spark is becoming a go-to tool for many data engineers. Understanding the architecture, data transformation, relational data, pipelines and orchestration will help you deliver better on-prem and cloud solutions.


Tomaž Kaštrun


Azure, On Premises, Data Integrator, Managing Big Data, Deployment, Managing, Spark, Intermediate

Attend the SQLBits London conference in-person or virtually on March 8-12, 2022 at ExCel London, UK.