How to write cron expressions and why data engineers need to know them.

Cron is a command-line utility and job scheduler on Unix-like operating systems. It is used to specify run times for recurring jobs and can account for minutes, hours, days of the month, months, and days of the week.

For the uninitiated, cron jobs can look like a foreign…
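To make the five fields concrete, here is a minimal Python sketch that expands a standard five-field cron expression and checks whether a given minute matches it. It assumes classic Vixie-style syntax (`*`, single values, comma lists, ranges, and `/` steps) and does not handle month or weekday names, macros such as `@daily`, or extensions that some schedulers add. `expand_field`, `parse_cron`, and `matches` are hypothetical helpers, not part of any cron tool.

```python
from datetime import datetime

# Hypothetical helpers; assume Vixie-style syntax without names or @macros.
# Standard five-field layout: (name, lowest allowed value, highest allowed value).
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),  # both 0 and 7 mean Sunday
]


def expand_field(expr: str, low: int, high: int) -> set[int]:
    """Expand one field such as '*', '5', '1-5', '*/15' or '0,30' into the values it allows."""
    values: set[int] = set()
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
            if step < 1:
                raise ValueError(f"invalid step in {expr!r}")
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < low or end > high or start > end:
            raise ValueError(f"value out of range in {expr!r}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expression: str) -> dict[str, set[int]]:
    """Split a five-field cron expression into named fields and expand each one."""
    parts = expression.split()
    if len(parts) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    schedule = {name: expand_field(text, low, high)
                for (name, low, high), text in zip(FIELDS, parts)}
    if 7 in schedule["day_of_week"]:  # fold the alternate Sunday (7) into 0
        schedule["day_of_week"].discard(7)
        schedule["day_of_week"].add(0)
    return schedule


def matches(expression: str, when: datetime) -> bool:
    """Return True if the schedule would fire at the minute given by `when`."""
    schedule = parse_cron(expression)
    parts = expression.split()
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in schedule["day_of_month"]
    dow_ok = cron_weekday in schedule["day_of_week"]
    # Classic cron rule: if both day fields are restricted (don't start with *),
    # the job runs when either one matches.
    if not parts[2].startswith("*") and not parts[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (when.minute in schedule["minute"] and when.hour in schedule["hour"]
            and when.month in schedule["month"] and day_ok)
```

For example, `30 2 * * 1-5` fires at 02:30 on weekdays, so `matches("30 2 * * 1-5", datetime(2024, 1, 1, 2, 30))` (a Monday) returns `True`, while the same time on Saturday, January 6, 2024 returns `False`.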

A guide on how to convert bytes to everything from bits to petabytes using Python and SQL.

Understanding resource allocation, including memory usage, is a crucial part of every data engineer’s work. Analyzing usage across memory units allows data engineers and business intelligence leadership to make decisions about everything from cloud infrastructure pricing to pipeline configuration. Since databases like BigQuery often return memory usage information as bytes…
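As a rough sketch of the arithmetic, assuming the byte count has already been pulled into Python (for example, from a query result), converting up the unit ladder is repeated division by 1,024 or 1,000, and converting down to bits is multiplication by 8. Whether a "gigabyte" means 1024³ or 1000³ bytes depends on the system reporting it, so the base is a parameter here. The function names are illustrative and are not taken from the original guide.

```python
# Hypothetical helpers for illustration; unit labels follow the common "KB, MB, ..." style.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def bytes_to_bits(num_bytes: int) -> int:
    """One byte is eight bits."""
    return num_bytes * 8


def convert_bytes(num_bytes: float, unit: str, base: int = 1024) -> float:
    """Convert a byte count to `unit`.

    base=1024 gives binary multiples (strictly KiB, MiB, ...); base=1000 gives SI multiples.
    """
    if base not in (1000, 1024):
        raise ValueError("base must be 1000 or 1024")
    exponent = UNITS.index(unit.upper())  # raises ValueError for an unknown unit
    return num_bytes / base ** exponent


def humanize(num_bytes: float, base: int = 1024) -> str:
    """Format a byte count using the largest unit that keeps the value at or above 1."""
    for exponent in range(len(UNITS) - 1, 0, -1):
        value = num_bytes / base ** exponent
        if value >= 1:
            return f"{value:.2f} {UNITS[exponent]}"
    return f"{num_bytes:.2f} B"
```

With the default binary base, `convert_bytes(1_073_741_824, "GB")` returns `1.0` and `humanize(1536)` returns `"1.50 KB"`; passing `base=1000` switches to SI units, which is how many cloud providers quote storage.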

Both SQL and Python offer powerful functions to help data engineers clean data and eliminate dreaded ‘dupes’ in datasets.

One of the most important processes a data engineer can master is deduplicating values to provide clean data for data consumers. Since raw data can vary in format and cleanliness, it is vital that data engineers take steps to automate the cleaning of data in ETL pipelines to…
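A minimal Python sketch of the idea, assuming records arrive as dictionaries and that two rows count as duplicates when their key fields match after trimming whitespace and lowercasing. `normalize` and `dedupe` are hypothetical helpers; in SQL the comparable patterns are `SELECT DISTINCT` for exact duplicates or `ROW_NUMBER() OVER (PARTITION BY ...)` to keep one row per key.

```python
from typing import Any, Iterable


def normalize(value: Any) -> str:
    """Trim whitespace and lowercase so ' ALICE@example.com' matches 'alice@example.com'."""
    return str(value).strip().lower()


def dedupe(records: Iterable[dict], key_fields: list[str]) -> list[dict]:
    """Keep the first record seen for each normalized key, preserving input order."""
    seen: set[tuple[str, ...]] = set()
    unique: list[dict] = []
    for record in records:
        key = tuple(normalize(record[field]) for field in key_fields)
        if key not in seen:  # later rows with the same key are treated as dupes
            seen.add(key)
            unique.append(record)
    return unique


rows = [
    {"email": "alice@example.com", "name": "Alice"},
    {"email": " ALICE@example.com", "name": "Alice B."},
    {"email": "bob@example.com", "name": "Bob"},
]
clean = dedupe(rows, ["email"])
print([row["name"] for row in clean])  # ['Alice', 'Bob']
```

Keeping the first occurrence is one design choice; a pipeline that should keep the most recent row could sort the records by a timestamp (newest first) before calling `dedupe`.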

Concisely summarizing your role can help avoid ambiguity at both the dinner table and the negotiation table.

I Work in Data…

Sometimes I get the ‘what do you do for work?’ question at the most inconvenient moments. Yesterday it was while I was reclined in a dentist’s chair.

Data-driven roles belong to a discipline that has found ways to answer complex business questions, yet there is still one inquiry data scientists, data…

Despite holding an M.S. in data science, I have found the data engineering path more organizationally impactful, professionally fulfilling and personally lucrative than data science.

In the spring of 2021, I graduated with a degree in data science; after a two-month job search, I landed my current role as a data engineer in the media industry. Initially, I intended to pursue data analyst and data science roles. …

Still in preview, the feature is promising but has limitations when it comes to data engineering production use cases.

Big(Query) News for Data Engineering Teams

Last week, Google quietly revealed a preview of a native JSON data type in BigQuery. For data engineering teams working with messy and unstructured columns, this is a big deal. In exploring BigQuery’s compatibility (or lack thereof) with JSON, I stumbled upon Stack Overflow posts from 5–6 years ago

