Internship Spotlight: Juggling data at petabyte scale every day

Tell me about your internship! What are your responsibilities? What projects are you working on over the summer?

I am working at Amazon Web Services (AWS) this summer. The company develops industry-leading software services for enterprises. One of the services from the AWS portfolio that I work on is Amazon Redshift, a petabyte-scale data warehouse. Redshift creates clusters for users, runs big data operations on them, and helps enterprises get insights from the tons of data they generate. To keep customer data safe, Redshift generates a lot of backups, also known as snapshots. My role is to improve performance and cut the cost of maintaining these backups.

What is the most important thing you have learned thus far?

I learned big data frameworks like Hadoop and Spark. I have wanted to learn them ever since I first heard about them as an undergraduate. Not only do I get to understand them, but I also get to use them to process hundreds of petabytes of data almost every day. Very few companies handle data at this scale. For comparison, 1 petabyte = 1,000 terabytes = 1 million gigabytes (GB). We can’t process that much data on a single computer, so I use big data frameworks to process it on clusters of hundreds of computers.
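To make that scale concrete, here is a minimal sketch of the arithmetic. The conversion uses the decimal units quoted above; the 1 GB per second read rate and the function names are purely illustrative assumptions, not figures from AWS or Redshift.

```python
# Decimal (SI) storage units, matching 1 PB = 1,000 TB = 1 million GB.
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * TB_PER_PB * GB_PER_TB


def hours_to_scan(petabytes, gb_per_second_per_machine, machines=1):
    """Estimate hours to read the data once, assuming perfect parallelism.

    gb_per_second_per_machine is a hypothetical throughput; real clusters
    lose time to coordination, skew, and network limits.
    """
    total_gb = petabytes_to_gigabytes(petabytes)
    seconds = total_gb / (gb_per_second_per_machine * machines)
    return seconds / 3600


if __name__ == "__main__":
    one_machine = hours_to_scan(1, gb_per_second_per_machine=1.0)
    hundred_machines = hours_to_scan(1, gb_per_second_per_machine=1.0, machines=100)
    print(f"1 PB on 1 machine:    {one_machine:.1f} hours")
    print(f"1 PB on 100 machines: {hundred_machines:.1f} hours")
```

Under these assumed numbers, a single machine would need more than eleven days just to read one petabyte once, while a hundred machines working in parallel would need under three hours. That gap is why this kind of work is spread across a cluster.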

A nontechnical lesson I learned is that communication is key. Communicate what you are doing and be as transparent as possible, so everyone can understand your work. Don’t try to fight every battle on your own. Attempting things on your own is good, but you also need to be able to say what you are struggling with and what you need help with. Accept where you fall short, then learn and show improvement. Opening the door for communication with your mentors and managers also helps set the right expectations on deliverables, which is key in the tech sector. In short, seek feedback early and often.

How did your school work prepare you for this role?

I learned the foundations of computer science during my bachelor’s, and my master’s program allowed me to build on them and brought me up to speed with current technologies. I took a course on cloud computing, and the theoretical foundations for most of the applications I use in my work today were laid during that course. It was offered as a special topics course, and not many people were taking it. But I was passionate about building systems at scale, and that paid off: the course gave me an edge when I started working on my internship project.

Beyond that, I worked as a graduate assistant last semester, and during that time I learned a lot about Python and working with deep learning frameworks like PyTorch. I'm not using them right now, but that work got me into the habit of quickly writing code to build a proof of concept (PoC), plotting the results, collecting some metrics, and iterating based on feedback. It gave me the ‘fail fast’ mentality needed to build innovative products quickly. Even though I don’t directly use those concepts, the experience honed my technical skills, which in turn helped me throughout my internship.

How did you become interested in computer science and your specific areas of interest?

I have been interested in technology since I was a kid, when I would play with radio communication systems and electric motors. That is why I chose computer science as my major during my bachelor's; I was always excited when new technology came out, like when Android was first introduced. Most of the time, the OS on my Android phone is not the one that comes out of the box from the manufacturer. I have a habit of replacing it with a custom one tailored to my needs. That’s my passion for tech, and it made computer science a natural choice for me.

In the industry, things are getting more interdisciplinary, and there are a lot of applications for artificial intelligence. But just learning AI or machine learning isn't enough. The biggest problem for the industry right now is how to scale AI/ML. People ask ChatGPT thousands of questions, and behind ChatGPT, the language models run on GPUs. But how can we handle millions of ChatGPT requests? Building these systems is a computer science problem. So as computer science students, we shouldn’t restrict ourselves to the AI/ML domain; we also need sufficient depth in building vanilla large-scale software systems. When I was a research assistant, I learned AI/ML, and I did coursework in cloud and software systems. I think both are necessary and have been valuable to me in the industry. My area of interest is a combination of architecting large-scale systems and AI/ML.

What are your future career goals? How has this position impacted your goals?

I am an engineer at heart, so my career goal is to return to the industry after the master’s program. I want to use technology to engineer solutions to the problems the industry faces today. For instance, limitations in battery technology hold back innovation in electric vehicles. Similarly, in the tech industry right now, a lot of people want to do AI/ML, but current systems are limited in how well they scale to current and future demand. Only a few companies can build software systems for the future, and the place where I work (AWS) is one of them.

The thing that gets my heart racing is engineering solutions to these tough problems that we have in the industry. This is what I feel a true engineer does. The word ‘Engineer’ is so ubiquitously used that we sometimes forget our duty as engineers.

What advice would you give to someone pursuing a similar internship in the future?

Apply early and apply a lot; that's the main advice I’d give. For internships, yes, your technical skills matter, but timing matters even more. Honestly, I don’t think I did the best job when applying to internships, because I paid more attention to my coursework and pushed off my internship applications multiple times. I just got lucky with the timing. Make sure to apply to a lot of companies. To apply early, you need to have your resume ready, so prepare that first. As you work on your resume, be sure to hone your programming skills using websites like LeetCode. If you want to do a summer internship in 2025, the right time to start applying is August or September 2024, so you’ll have to be ready with a resume and coding preparation even before that. I predominantly applied in February this year for this summer, which is almost eight months late, so the lesson is to start much earlier than that.
