Starting out in Big Data: How to learn your way into one of tech’s hottest industries
This feature first appeared in the Spring 2019 issue of Certification Magazine.
What’s the best way to break into the Big Data field for aspiring IT professionals, or for tech types who are established and possibly settled in for the long haul? Is there a reliable means of injecting Big Data knowledge to jump-start a stalled tech career, or of crossing over to the Big Data field?
What if you have no direct knowledge of Big Data, or maybe know just enough to be dangerous? Is that a gap you can overcome with the proper guidance or toolset? What skills or tools would you need, and which items would be best to focus on, when you have zero knowledge versus a little knowledge?
What kind of skill set lends itself to one’s taking a flying leap into the Big Data realm? Can you make a smaller jump by having a little bit of knowledge? What about certifications? Which ones are likely to be most helpful, and would taking them limit your ability to learn outside the box?
Are there good outside-the-box options? Perhaps flexible online training from Udemy, or a university-sponsored massive open online course (MOOC), or a battery of college classes would be a better solution for you — maybe for everyone. Let’s explore all of these questions and try to determine what’s needed to really dig into Big Data.
What is Big Data?
If you are not at least a little bit familiar with Big Data, well, chances are you have no information technology background at all — or maybe you’ve been living under a rock. That said, a brief overview of Big Data is still in order. Big Data is a term that describes the large volume of data – both structured and unstructured – that inundates a business or organization on a day-to-day basis.
The actual amount of data can vary. I have consulted for, and worked with, companies that thought a few hundred gigabytes of data was “big,” and I have worked at companies that required, by default, petabytes of information regarding a given topic. A petabyte, if we could print it as a single line of ones and zeros in a standard font size, would stretch to the Moon and back many thousands of times over.
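The Moon comparison is easy to check with back-of-the-envelope arithmetic, and the numbers show just how large a petabyte is. The sketch below rests on assumptions that are not in the article: a decimal petabyte (10^15 bytes of 8 bits each), a printed width of 2.54 mm per digit (roughly one 12-point monospaced character), and a mean Earth-Moon distance of about 384,400 km. Changing the character width shifts the answer, but any reasonable font size still yields thousands of round trips.

```python
# Back-of-the-envelope check: how long is one petabyte printed as 1s and 0s?
# Assumptions (not from the article): decimal petabyte, 2.54 mm per printed
# digit (about one 12-point monospaced character), mean Earth-Moon distance.
BITS_PER_PETABYTE = 10**15 * 8   # 1 PB = 10^15 bytes, 8 bits per byte
CHAR_WIDTH_M = 2.54e-3           # metres per printed 0 or 1 (assumed)
EARTH_MOON_M = 384_400e3         # mean Earth-Moon distance in metres


def line_length_m(bits: int, char_width_m: float = CHAR_WIDTH_M) -> float:
    """Length in metres of a single line containing `bits` printed digits."""
    return bits * char_width_m


def moon_round_trips(length_m: float) -> float:
    """Number of Earth-Moon-Earth round trips a line of this length covers."""
    return length_m / (2 * EARTH_MOON_M)


if __name__ == "__main__":
    length = line_length_m(BITS_PER_PETABYTE)
    print(f"Line length: {length / 1000:,.0f} km")
    print(f"Moon round trips: {moon_round_trips(length):,.0f}")
```

With these assumptions the line runs to roughly 20 billion kilometres, on the order of 26,000 round trips to the Moon.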
One widely repeated estimate holds that every word ever spoken by every human who has ever lived would fit into roughly 5 exabytes of storage, and the total amount of data the world now stores is far larger than that. You could even include the guttural grunts and clicks of our earliest ancestors. (Somehow, even without the benefit of Big Data storage technology, my wife remembers every word I have spoken through our entire relationship.)
Data volume, however, is not the most important aspect of Big Data. What matters is what organizations do with the data. Big Data also refers to our ability to analyze massive data sets and draw out insights that lead to better decision-making and business strategy. Setting aside how the data is collected and stored, it’s that analytical power that makes Big Data truly revolutionary.
A lot of organizations mine Big Data for facts and insights. First imagine knowing the entire history of your sales orders; now imagine having some insight into “when” a customer will order again, based on “who” they are or “where” they come from. This is a very rudimentary way of describing the insights Big Data can yield, but it should give you a sense of the many and varied ways in which companies use that data.
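As a toy illustration of the “when will this customer order again?” idea, the sketch below predicts each customer’s next order as their last order date plus their average gap between orders. Customers with only one order fall back to the average gap of other customers in the same region, a crude nod to “where they come from.” The order history, names, and regions are made up, and a real system would use far richer models and far more data; this only shows how raw order history can become a forward-looking estimate.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical order history: (customer, region, order date).
ORDERS = [
    ("alice", "west", date(2019, 1, 5)),
    ("alice", "west", date(2019, 2, 4)),
    ("alice", "west", date(2019, 3, 6)),
    ("bob", "east", date(2019, 1, 10)),
    ("bob", "east", date(2019, 3, 11)),
    ("carol", "west", date(2019, 3, 20)),
]


def predict_next_orders(orders):
    """Predict each customer's next order date from the order history.

    Uses the customer's own average gap between orders; customers with a
    single order fall back to the average gap of others in their region.
    """
    dates_by_customer = defaultdict(list)
    region_of = {}
    for customer, region, day in orders:
        dates_by_customer[customer].append(day)
        region_of[customer] = region

    # Average gap in days for every customer with at least two orders.
    customer_gap = {}
    for customer, days in dates_by_customer.items():
        days.sort()
        if len(days) >= 2:
            gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
            customer_gap[customer] = mean(gaps)

    # Group those averages by region for the fallback.
    region_gaps = defaultdict(list)
    for customer, gap in customer_gap.items():
        region_gaps[region_of[customer]].append(gap)

    predictions = {}
    for customer, days in dates_by_customer.items():
        gap = customer_gap.get(customer)
        peers = region_gaps.get(region_of[customer], [])
        if gap is None and peers:
            gap = mean(peers)
        if gap is not None:
            predictions[customer] = days[-1] + timedelta(days=round(gap))
    return predictions


for customer, next_date in predict_next_orders(ORDERS).items():
    print(customer, next_date.isoformat())
```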
Complications arise when considering storage costs, when determining what data is relevant, and when assuring that vast caches of data remain secure. If you store every known thing about a person, what becomes of the privacy we are all entitled to? Who is paying for that storage? Are there insights you can glean from storing every detail about every person that many (or most) people would object to? Moral and ethical concerns abound as Big Data gets bigger and bigger.
Take a class or consult with a colleague
As a technologist, you have so many ways to learn. The current technology environment makes it possible for almost anyone to learn Big Data or to advance their career in Big Data. You could even change careers while staying at your current company.
The first thing to do is decide what you want and write down your goal. A written goal is both a reminder of what you set out to accomplish and a benchmark that tells you when you’ve succeeded. Let’s imagine that you have at least a limited understanding of database technology. You know how to review or manipulate data, and you understand what a table is, or a primary key, or BLOB (binary large object) data.
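For readers who want a quick refresher on those three terms, here is a minimal sketch using Python’s built-in `sqlite3` module and an in-memory database. The `customers` table and its columns are hypothetical. The primary key uniquely identifies each row, so a second row with the same key is rejected, while the BLOB column stores raw bytes that the database does not interpret.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        name        TEXT NOT NULL,
        photo       BLOB                  -- raw bytes, e.g. an image file
    )
    """
)

# The bytes below are the first eight bytes of a PNG header, for illustration.
conn.execute(
    "INSERT INTO customers (customer_id, name, photo) VALUES (?, ?, ?)",
    (1, "Alice", b"\x89PNG\r\n\x1a\n"),
)

# A second row with the same primary key violates the uniqueness constraint.
try:
    conn.execute(
        "INSERT INTO customers (customer_id, name) VALUES (?, ?)", (1, "Bob")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# BLOB values come back as Python bytes, unchanged.
row = conn.execute(
    "SELECT name, photo FROM customers WHERE customer_id = ?", (1,)
).fetchone()
print(row, duplicate_rejected)
```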
The best way forward for a person in these circumstances would be to get a leg up by taking a MOOC. There are already numerous options for MOOCs — many of which can be taken free of charge — with more coming online every day. (My favorite such provider is Coursera.) Anyone can take a class this way, including some that are designed and managed by Ivy League universities.
With most MOOC providers, you simply sign up, pay a small fee (or not), and take the class either at your leisure (this tends to be more true of paid courses) or following a prescribed schedule. A quick review of the online learning sites I frequent turned up more than 200 courses that address all aspects of Big Data. That is a massive offering, and the market continues to rapidly expand.
For those who have zero knowledge of Big Data, a data fundamentals class is the best place to start. This is where, starting at ground zero, you learn that data is “contained” somewhere, and that it is organized and accessed by various means and methods. For example, data that fits a predefined schema, such as the rows and columns of a table, is called “structured,” whereas data with no predefined model, such as free text, images, or audio, is called “unstructured.” Formats such as JSON and XML fall somewhere in between and are often called “semi-structured.” Knowing the difference could save your job … or at least save your company a lot of money.
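To make the distinction concrete, the sketch below holds the same made-up piece of customer feedback in three shapes: a structured record with fixed, typed fields; a semi-structured JSON document (a middle ground whose fields can be nested or optional); and unstructured free text that must be parsed or analyzed before it can be queried. The keyword check at the end is a deliberately crude stand-in for real text analytics.

```python
import json
from dataclasses import dataclass


# Structured: every record has the same predefined fields and types.
@dataclass
class FeedbackRow:
    customer_id: int
    rating: int  # 1 to 5


structured = FeedbackRow(customer_id=42, rating=2)

# Semi-structured: self-describing JSON; fields may be nested or optional.
semi_structured = json.loads(
    '{"customer": {"id": 42}, "rating": 2, "tags": ["shipping"]}'
)

# Unstructured: free text with no schema at all.
unstructured = "My order arrived late and the box was damaged."

# Getting at the same facts takes progressively more work.
rating_from_row = structured.rating             # direct field access
rating_from_json = semi_structured.get("rating")  # key may or may not exist
mentions_damage = "damaged" in unstructured.lower()  # crude keyword search

print(rating_from_row, rating_from_json, mentions_damage)
```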
For those who are looking to change departments or job functions at the company where they already work, I recommend finding a mentor. Look for someone who holds a managerial position in the department you want to join and hit them up for their view of what it will take to transfer over.
If they offer lots of details and are eager to share, then there’s a good chance that department will support your move. If they are hesitant to discuss your questions, or suggest that it’s difficult (or impossible) to break into their field later in life, then that department is not likely to support your efforts. If that happens, then maybe it’s best to look for a different way to enter the Big Data field.
You could also go the traditional route and get a college degree at the bachelor’s or master’s level. You could “test drive” this option by taking a community college class on database design. You are never too old to learn something new. Don’t be discouraged if you don’t pick up, say, machine learning on your first pass or, for that matter, even in your first year. Learn at whatever pace feels comfortable.
Get a certification
Big Data professionals who would like to get to the next level of their profession should consider picking up a certification or three. Certifications are one of the easiest and most visible means of demonstrating to those in your industry that you know what you are doing. It’s a great way to prove your worth to an employer — and maybe to yourself.
What follows is my list of the five most impactful certifications you can get to advance your career in Big Data. These credentials can help show employers that your skills and knowledge are the kind that are in high demand:
SAS Certified Big Data Professional — This is a general-purpose Big Data certification. You will be expected to demonstrate your ability to use the tools and technology designed to handle Big Data. The SAS Certified Big Data Professional credential is a solid stepping stone to higher ground.
Amazon Web Services (AWS) Certified Big Data – Specialty — AWS is Amazon’s cloud computing platform. When I think of what current technology can do, I get chills remembering my server-building days, which I look back on fondly. With AWS tools, I could spin up 40 servers at a time and just as many databases.
According to the AWS certification website, the AWS Certified Big Data – Specialty exam validates technical skills and experience in designing and implementing AWS services to derive value from data. This exam is for those who need to perform complex Big Data analyses. This credential is a must-have and, having taken the exam, I believe it to be well within the grasp of any data wonk.
Certified Analytics Professional (CAP) — This certification is not widely known, but CAP offers independent verification that you have superior knowledge of the entire analytics process. It takes in everything from business problem framing and analytics problem framing to methodology selection and model building, deployment, and lifecycle management.
CAP offers a strong assurance to prospective employers that you have mastery of the analytics process. If you want to dive deep into analytics and build up a solid understanding of how to organize data and turn it into actionable outputs, then CAP is a great place to turn.
Cloudera Certified Associate (CCA) Spark and Hadoop Developer — This recommendation is specific to Cloudera’s platform and will help you learn to manage unstructured data. The test is unusual in the Big Data certification realm in that it is hands-on, built around real-world scenarios.
Each CCA exam question requires you to grapple with a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. To speed up development on Spark questions, a template may be provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. (The template is likely to be written in either Scala or Python, but not necessarily both.)
You are not required to use the template and may solve the scenario using a language you prefer. Be aware, however, that coding every problem from scratch may take more time than is allocated for the exam.
IBM Certified Data Engineer – Big Data — This one is the cream of the Big Data certification crop; it’s the top of the food chain. Within the IBM ecosystem, a Big Data Engineer “works directly with a Data Architect and hands-on developers to convert the architect’s Big Data vision and blueprint into a Big Data reality. A Data Engineer possesses a deep level of technical knowledge and experience across a wide array of products and technologies.”
If you want to break through here, then you’ll need to hone your Big Data expertise in a number of areas, including mastering the “five Vs” of Big Data: variety (the range of data types and formats), velocity (the speed at which data is generated), volume (the scale of the data), veracity (the reliability of the data), and value (what the data can tell us). Hitting the mark with this cert will help you stand out above the rest.
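One way to internalize the five Vs is to notice that four of them can be turned into rough numbers for a batch of incoming records. The sketch below is a hypothetical profiling helper, not part of any certification syllabus: volume is counted in records and bytes, velocity as records per second across the batch’s time span, variety as the number of distinct field layouts, and veracity as the share of records passing a simple validity rule (the rule itself is an assumption). Value depends on the business question being asked and does not reduce to a formula.

```python
import json
from datetime import datetime

# Hypothetical batch of incoming event records (JSON lines).
RAW = [
    '{"ts": "2019-03-01T12:00:00", "user": 1, "amount": 9.99}',
    '{"ts": "2019-03-01T12:00:02", "user": 2, "amount": -5}',
    '{"ts": "2019-03-01T12:00:04", "user": 3, "page": "/home"}',
    '{"ts": "2019-03-01T12:00:10", "user": 4, "amount": 20.0}',
]


def is_valid(record):
    """Assumed veracity rule: purchase amounts, when present, must be >= 0."""
    return "amount" not in record or record["amount"] >= 0


def profile(raw_lines):
    """Summarize volume, velocity, variety and veracity for one batch."""
    records = [json.loads(line) for line in raw_lines]
    times = sorted(datetime.fromisoformat(r["ts"]) for r in records)
    span_s = (times[-1] - times[0]).total_seconds()
    return {
        "volume_records": len(records),
        "volume_bytes": sum(len(line.encode("utf-8")) for line in raw_lines),
        "velocity_per_s": len(records) / span_s if span_s else float("inf"),
        # Each distinct set of field names counts as one layout.
        "variety_layouts": len({frozenset(r) for r in records}),
        "veracity_share": sum(is_valid(r) for r in records) / len(records),
    }


print(profile(RAW))
```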
No matter what learning option you pursue, or which certification you choose, you can rest assured that the software and platforms will change, but the fundamentals of data will not. A deep, thorough understanding of data, how to pull it, and where to pull it from, will serve you well in your career. Don’t be timid when it comes to Big Data: jump in with both feet. As always, I wish you the best of luck and happy certifying.