Starting a career in data science has been a long-term goal for me. It surely did not happen overnight, and that is exactly why I wanted it to be big.
I am Bartosz Smok, a data engineer. This year I am celebrating my second workiversary with Kainos after taking part in the 3rd edition of the Big Data Academy – there is no better time to look back on my journey and tell you all about it.
Let’s start with a bit of background, but first, the big question:
It all started almost three years ago, and it went something like this:
Bartosz: I want to start my career in data, which tech companies are the best in Tricity (Poland)?
Dev friend: I know some. Have you heard of Kainos?
So there it was. I did my research, and I immediately knew that it would be great to work there someday.
At first, I did not succeed
Shortly after, I heard about the Big Data Academy. It felt like my way into the data world. I prepared my resume, and I submitted my application. A couple of days later, I found out I was not selected. Frankly, that was okay: I was not quite there yet, and I was not ready for my dream role. But you know what they say – great things take time.
Meanwhile, I started a new role as a Junior Java Developer for a small company in Gdańsk. I did not lose my focus: I knew exactly where I wanted to be, and that is why I spent all my time after the 9-5 job studying – data, data, data.
And so I tried again and again
During this time, I learned software development fundamentals and extended my knowledge of data. There were not many meetups and events I did not attend (yes, back in the day, these were actually a thing).
And then one day – I got my second chance. After months, Kainos opened the applications for the 3rd Big Data Academy – I was so ready. General interest in data only grew during this time, so it was clear that the competition was greater than ever. Nevertheless, I did it – I got accepted, and I was only days away from starting my amazing (data) journey.
The Big Data Academy
The first two weeks of the academy consisted strictly of hands-on workshops. We learned cloud, Hadoop, HDFS, Sqoop, Hive, Impala, and many more. It was all about fundamentals, tools and best practices.
At the end of the two weeks, it was time for us to consolidate everything we had learned so far. We developed a short proof-of-concept project: a data streaming pipeline. Its privacy and efficiency were our top priorities. Technology-wise, we used AWS, Scala, Bash for scripting and SQL for data management. For data processing, we used Hadoop and Spark, and finally, Kafka for streaming the data.
As with every team project, there were ups and downs. The worst part? Issues when anonymising sensitive data. The best part? Eventually anonymising sensitive data.
A couple of weeks later, at the end of the academy, we had some time for self-study to prepare for a cloud certification exam. And there it was: four months and many, many cereal bowls later, I was ready for my first project. But was I really?
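For readers wondering what masking sensitive fields can look like in practice, here is a minimal sketch in Python. It is not the academy project's code (that pipeline used Scala and Spark); the field names, the key and both helper functions are hypothetical. Strictly speaking, keyed hashing is pseudonymisation rather than full anonymisation, because the same input always maps to the same digest.

```python
import hashlib
import hmac

# Hypothetical secret key; in a real pipeline it would come from a secrets manager.
SECRET_KEY = b"replace-me"

# Hypothetical set of fields treated as sensitive.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise_record(record: dict) -> dict:
    """Replace sensitive string fields with digests; leave other fields untouched."""
    return {
        field: pseudonymise(value)
        if field in SENSITIVE_FIELDS and isinstance(value, str)
        else value
        for field, value in record.items()
    }
```

Because the digest is deterministic for a given key, records can still be joined on the masked field, which is often the reason to pick this approach over random tokens.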
After the Big Data Academy
I joined a small data team in Birmingham shortly after the academy. We intended to design and build a reporting service for the UK government. After ingesting data from multiple sources, transforming and loading it into a data warehouse, we aimed to create insightful reports to be used by the end-user.
I was so excited – it was not only my first project, it was so so much more – new technologies, new city and new people (an adventure at its best).
But remember what I said – every project has its ups and downs. The downside of this one was that I was the only Data Engineer in the team, and I could have used some mentoring and support. It took teamwork and ambition: I worked closely with the Data Analysts and Architects to fulfil all the requirements and complete tasks to the best of my ability. After a couple of months, a Junior Data Engineer joined me. Thanks to that, we could split the work and team up on more challenging tasks if needed. Truth be told, if not for my teammates, who were there to support me, I would not have succeeded.
Now let’s talk a bit about my work on this project
One of my most significant achievements relates to the extraction of data from a twisted JSON object. In a standard setting, this would not be an issue, but this was no standard setting.
Due to architectural restrictions, SQL was all I had at hand. The whole process of unwrapping seven layers of nested hierarchy deserves its own blog post. I am not going into details, but one thing is clear: it required a lot of research. I still remember the moment I presented my work to the whole team – a solution that was so much better than I thought it would be. I managed both to speed up the process and to clean up the codebase. So rewarding!
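To give a feel for what "unwrapping" a nested hierarchy means, here is a small illustration in Python. It is not the SQL solution described above, only a sketch of the general idea; the `flatten` function and the dotted-path naming are assumptions made for this example.

```python
import json


def flatten(node, prefix=""):
    """Turn nested dicts and lists into a flat {dotted.path: value} mapping.

    Empty dicts and lists produce no entries in the result.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Leaf value: record it under the path built so far.
        return {prefix: node}

    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


doc = json.loads('{"a": {"b": [{"c": 1}, {"c": 2}]}, "d": "x"}')
print(flatten(doc))
```

Each leaf ends up as one key-value pair, which maps naturally onto columns or rows in a warehouse table.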
On another note, I frequently focused on improving the performance of the ETL data pipelines. One of my first contributions was to create a function to clean up the missing values, reduce the code complexity and make it all more readable. And let me tell you: we used this function A LOT. Over time, we managed to improve the execution time of the pipeline by 60%!
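A reusable helper for missing values might look something like the sketch below. It is written in Python for illustration and is not the function from the project; the list of markers and the names `clean_missing` and `clean_row` are hypothetical.

```python
# Hypothetical strings that should be treated as "missing".
MISSING_MARKERS = {"", "n/a", "na", "null", "none", "-"}


def clean_missing(value, default=None):
    """Return default for missing values; trim surrounding whitespace on strings."""
    if value is None:
        return default
    if isinstance(value, str):
        stripped = value.strip()
        if stripped.lower() in MISSING_MARKERS:
            return default
        return stripped
    # Non-string values (numbers, booleans, ...) pass through unchanged.
    return value


def clean_row(row, defaults=None):
    """Apply clean_missing to every column, with optional per-column defaults."""
    defaults = defaults or {}
    return {col: clean_missing(val, defaults.get(col)) for col, val in row.items()}
```

Keeping the rules in one place means every pipeline step handles missing values the same way, instead of repeating slightly different checks in each query or script.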
My main takeaway? Think big. Don’t be afraid: there are always opportunities to grow, and you have great people around you to help.
Learning never stops
Since then, I have been on several other projects, learning new technologies and programming languages. AWS and Python are some that come to mind. Overall, making sure I am always ready for the next challenge is crucial to me. Who knows what comes next?
If you want to learn more about the academy or if you would like to ask why Kainos is among the best companies, find me on LinkedIn – I am always happy to help 🙂