Expert Interview Series: Sam McFarland of astronomer.io on Data Engineering
Sam McFarland is Head of Customer for Astronomer, a data engineering platform that collects, processes and unifies your enterprise data, so you can get straight to analytics, data science and—more importantly—insights.
We recently asked Sam for his insights on data engineering. Here's what he shared:
Tell us about the mission behind astronomer.io. How are you hoping to impact how brands manage data?
Data science, big data and enterprise SaaS are on the rise. This makes it increasingly difficult to get the data you need to implement new ideas. But struggling companies aren't alone; only 4 percent of businesses know how to make the most of their data (Bain and Co.). That means those who get ahead now will have a clear competitive advantage.
For companies to fully leverage the data they have access to, many disjointed data sets must be cleaned, enriched, processed and transformed so that they can be analyzed. All this prep, called data engineering, takes a lot of time. That's where Astronomer comes in. We're a data engineering platform that collects, processes and unifies enterprise data, so organizations can get straight to analytics, data science and, more importantly, insights.
What are some of the main challenges you are hoping to fix with your platform? What pain points are you hoping to solve?
The co-founders of Astronomer first worked together on an analytics tool called USERcycle. They realized that almost nobody had access to the data they needed, and those who did had to clean, transform or otherwise process it before it could be connected into one actionable body of information. The real pain point of analytics is getting the right data to the right place. By automating that, organizations are free to experiment with new technologies and implement the kind of data initiatives that will drive a competitive advantage. So we pivoted to Astronomer. By focusing on our expertise (data engineering), we let organizations focus on theirs.
What parts of data management are the most time-consuming for businesses? What can they do to make these processes more efficient?
By 2019, organizations that provide agile, curated internal and external datasets will realize twice the business benefits of those that don't (Gartner, 2016). Providing those datasets involves:
- Cleaning data (getting rid of inaccurate data)
- Transforming data (getting it into the right structure so that various data sets can "talk" to each other)
- Enriching data (refining it, which can mean a variety of things, such as correcting typos, often with the help of algorithms)
- Accessing hard-to-reach data sets that involve cutting through red tape or don't have a readily available API
By leveraging cutting-edge technology, we’ve created a platform that can automate all of these processes.
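To make the cleaning, transforming and enriching steps above more concrete, here is a minimal Python sketch of a pipeline that unifies records from two sources. It is an illustration under stated assumptions, not Astronomer's implementation: the source names (`crm`, `web`), the field mappings and the reference list of cities are all hypothetical, and typo correction uses the standard library's `difflib.get_close_matches` as a simple stand-in for the algorithms mentioned above.

```python
import difflib

# Hypothetical field mappings: each source names the same fields differently.
FIELD_MAPS = {
    "crm": {"EmailAddr": "email", "Town": "city"},
    "web": {"email": "email", "city": "city"},
}

# Hypothetical reference list used to correct typos during enrichment.
KNOWN_CITIES = ["Cincinnati", "Covington", "Louisville"]


def transform(records, source):
    """Transforming: rename source-specific fields to one shared schema."""
    mapping = FIELD_MAPS[source]
    return [{new: r.get(old) for old, new in mapping.items()} for r in records]


def clean(records):
    """Cleaning: drop records with a missing or malformed email address."""
    return [r for r in records if "@" in (r.get("email") or "")]


def enrich(records):
    """Enriching: replace a misspelled city with its closest known spelling."""
    enriched = []
    for r in records:
        city = r.get("city") or ""
        matches = difflib.get_close_matches(city, KNOWN_CITIES, n=1, cutoff=0.8)
        enriched.append({**r, "city": matches[0] if matches else r.get("city")})
    return enriched


def pipeline(sources):
    """Unify all sources into one cleaned, enriched list of records."""
    unified = []
    for source, records in sources.items():
        unified.extend(transform(records, source))
    return enrich(clean(unified))


if __name__ == "__main__":
    crm = [
        {"EmailAddr": "ann@example.com", "Town": "Cincinatti"},  # typo in city
        {"EmailAddr": "not-an-email", "Town": "Covington"},      # invalid email
    ]
    web = [{"email": "bo@example.com", "city": "Louisville"}]
    print(pipeline({"crm": crm, "web": web}))
    # [{'email': 'ann@example.com', 'city': 'Cincinnati'}, {'email': 'bo@example.com', 'city': 'Louisville'}]
```

In this sketch the data is transformed into a shared schema first, so the cleaning and enrichment rules only have to be written once rather than once per source; the order is a design choice, not a fixed rule.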
What are the most common mistakes or oversights you observe your clients making with regard to data management?
It's less that they make mistakes and more that they don't know where to begin. Technology is advancing rapidly, and companies that implement predictive analytics, real-time analytics, machine learning and even artificial intelligence lead the pack. The longer others wait to begin these advanced analytics, the further they fall behind. The volume of data generated every day is astounding, and, according to IDC, it is expected to increase by more than 1,000 percent from 2014 to 2020. Companies that don't yet have a data engineering or data science team in-house simply don't have access to the specialized skills required to fully understand the data they have, let alone leverage all the data available (and applicable) to them.
What best practices do you suggest to your clients for wrangling their data?
The most important thing to do is “begin with the end in mind” (Stephen Covey).
- Determine the exact business questions you need to answer. These may apply to the entire organization, but more likely they belong to a single team within it.
- Then determine which datasets will be needed to answer those questions and how you can access them, including which analytics tools might be best for your team.
- Choose how you want to analyze the data: Run queries in a data warehouse? Connect a custom dashboard to an analytical data mart? Incorporate new data sets into current predictive models? The skills of your team will drive this choice.
Then assemble the experts and tools needed to execute the next steps toward your goal.
- Engineer the data infrastructure to get all the data you need into the location where you need it, in the right format. (This is what Astronomer automates, so we recommend using our data engineering platform for this step.)
- Identify who will dig into the madness and extract insights. Ideally, this would be a team involving a developer, data scientist and subject matter expert.
Finally, execute with great care and continued curiosity. By this, we mean continue to monitor your data and regularly upgrade your technology to keep it cutting edge. Additionally, push yourself to keep experimenting with new technology. Already running predictive analytics? Do it in real time. Finding success in machine learning? Begin leveraging artificial intelligence. Or at a minimum, regularly consider how new data sets could improve your business.
What tools are essential for better data analysis? What types of tools are less effective?
This really depends on the company and role. In marketing, for example, about 3,800 tools exist to help marketers do their jobs more effectively: run email campaigns and digital ads, complete transactions, manage customer relationships, and the list goes on. An enterprise insurance organization, on the other hand, may have an actuary who has already constructed intricate predictive models. The fact of the matter is, companies need to be doing analytics, and they need to choose the tools that work best for them. What we do is make it easy not just to use a tool, but also to test out a new one or switch to a better one. That way, they never have to feel stuck where they are technologically. They can constantly explore a new direction.
What brands do you work with that have been especially innovative with their data management and analysis? What types of insight have they gotten? What can we learn from them?
AdmitHub
AdmitHub launched the first-ever university chatbot that supplies on-demand answers to applicant and student questions to enhance recruiting engagement and spark enrollment growth. The bot can answer myriad questions, from "When will I get my scholarship package?" to "When is my financial aid application due?" to "Can my dog live in the dorm with me?" Answering these questions at all, let alone in real time, costs university staff countless hours and leaves students waiting. AdmitHub's challenge was to connect disparate data sets, including complex legacy systems, at multiple universities, so students get accurate and automatic access to the information they need via AdmitHub's app.
The CEO of AdmitHub says, “With the combined agility of their platform and the diligence of their team, Astronomer exceeded our wildest expectations. Our partnership helped our customers achieve some of the best applicant numbers in years.”
With Astronomer, all university data flows securely through AdmitHub's app in real time, which means they can focus entirely on easing the transition to college for students and streamlining the admissions process for university staff. Other companies facing a similar challenge in delivering personalized customer service can consider implementing a chatbot to both delight customers and streamline internal operations.
Cincinnati / Northern Kentucky International Airport (CVG)
CVG transformed its customer experience by automating the real-time collection of multiple data streams into one central location. Many airports struggle to maintain massive existing infrastructure while balancing the need to invest in the changing demands of today's consumer, especially with so many variables factoring into airport service. CVG's challenge was to connect key information, like passenger volumes, flight data, weather, staffing data and more, into one centralized location where it could run the predictive analytics needed to serve passengers, airlines and vendors.
The VP of Customer Experience at CVG says, "Astronomer's single visual reporting source provides an immediate assessment on timely airport operations that impact customers. This allows us to adjust to conditions and influence outcomes to deliver improved performance."
With Astronomer, these disparate data sources flow automatically into a single actionable body of information, which means CVG can focus on enriching their data store and proactively making business decisions that will improve the travel experience for everyone. Other airports can consider unifying similar types of data sets in order to significantly improve the traveling experience.
Everything But The House (EBTH)
EBTH harnesses the consumer intelligence necessary not only to become the household name in online estate sales, but also to create a new generation of avid estate sale shoppers. Everything is sold in an estate sale format, from a late-17th-century Bodhisattva statue to a vintage Art Deco-style automatic wall telephone from the 1920s. They help families get the value they deserve out of the items they've collected over time by sorting, cataloging, photographing and describing every single item in order to sell it to the highest bidder. Their main challenge is to market to a disparate audience and to turn impressions into sales through accurate marketing analytics.
The CTO of EBTH says, "An auction platform is different than commerce and makes web analytics much more difficult, so the deeper analysis we can do with Astronomer allows us to gain a much better understanding of our business and customers."
With Astronomer, every single user journey across EBTH's website is mapped, revealing the types of customers who explore and purchase (or don't purchase) certain items. That intelligence is analyzed in real time and used to optimize future marketing strategies. Other e-commerce platforms can consider similar solutions to improve not just their marketing but their entire customer experience.
What trends or innovations are you following in the world of data management and data science today? Why do they interest you?
We believe that artificial intelligence will one day be a part of everyday life. As data ecosystems are created within organizations, those ecosystems can be connected, creating an entirely interconnected world. By creating a platform (and eventually, bots) to automate the data engineering previously done by humans, our vision is to be an integral part of tomorrow's reality.
In this way, we see the need for machines and humans to live in harmony. As humans leverage the power of sophisticated machines, we become capable of much more connectivity, communication and knowledge than ever before. We call this our "machines + humans" philosophy, and it interests us because it's going to shape our future.