Caitlin Lewis Smallwood ’88 is vice president of science and algorithms at the world’s leading “internet television network.” From her office in Los Gatos, Calif., she oversees the numbers that help Netflix learn what you like to watch, predict what you might want to watch next, and decide what to purchase and produce in the future. She can also see cranes, dump trucks and construction workers building Netflix’s brand-new corporate headquarters on the south end of Silicon Valley. She’s a leader in a growing field within an influential company during a historic moment: data science is helping Netflix take off.
“It’s incredibly inspiring to me to be involved in this company when we’re at a point where we’ve launched mostly globally,” Smallwood says. “It’s an amazing opportunity to really help cultures learn about one another in an innocuous, non-threatening fashion.”
There is a dizzying number of conference rooms at Netflix headquarters, and they’re not even finished building it yet. Each one is a little different and named after a famous movie (usually one that is available to stream via the service). Many feature a glass wall, frosted with the image of a notable scene or actor.
And then there are the Emmys.
The lobby of Smallwood’s building is full of movie- and TV-based touches: art books about film on reclaimed wood tables, a constant stream of Netflix products on a giant screen (in this case, “The Crown”), and two softly lit columns displaying Netflix’s Emmy awards. Emmys aren’t common in Silicon Valley, but they’re testaments to the company’s smarts and strategy.
Then there’s the data.
When you log in to Netflix and begin browsing, you see rows of categories. While you browse, Netflix logs what you watch, how you found it, how long you watched it, and the device you watched it on, among other things. As Netflix learns more about your viewing habits, it gets better at predicting what you like. This works no matter where you are in the world.
“One thing we’ve learned that holds true — so far, anyhow — is that when you try to get an understanding of people’s tastes, the kinds of ‘clusters of taste’ that people have are pretty similar around the world,” says Smallwood. “The size of the audience for these different kinds of tastes can differ quite a bit from region to region, but the actual kernels of what those tastes are, are not dramatically different.”
On Netflix, tastes are displayed as rows of categorized content. Usually, the rows are the typical fare from the old Blockbuster shelves: drama, comedy, action, sci-fi, romance and so on. But as you move further into Netflix’s database of over 50,000 row titles, things get really specific — “Strong Female Lead,” “Raunchy Cult Late-Night Comedies,” “Quirky Romances,” “Supernatural Horror Movies” and so on. There are a number of websites dedicated to chronicling the most obscure categories delivered to Netflix subscribers all over the world — like “Gritty British Prison Movies.”
“Although our internal job is harder,” she says, “the output to our customers is actually a little bit better because you can discover nuanced pockets of taste because of other regions that then help you serve members in a different region even more effectively. That part’s exciting, too.”
It may seem automatic to the average user, but Netflix relies on the hard work of its employees in Silicon Valley and Hollywood to prepare content and deliver it effectively.
Smallwood is in charge of more than 50 engineers, data scientists and mathematicians who are working to distill the viewing habits of over 86 million Netflix subscribers and make the product better. Algorithms orchestrate the viewer experience, and viewers have responded, to the tune of 125 million hours watched per day. That’s like watching the roughly two-hour “Star Trek IV: The Voyage Home” more than 60 million times daily, or that one episode of “Murder, She Wrote” nearly 160 million times. It’s a colossal amount of data, and it’s up to Smallwood to make sense of it.
Technically, an algorithm is just a set of rules that a computer can follow to solve a problem. For early data scientists like Smallwood in the ’90s, those problems were confined largely to logistics and transportation: packing shipping containers or coordinating airline networks. The problems that she’s solving today at Netflix, she says, “weren’t really in the world” back then.
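To make “a set of rules that a computer can follow” concrete, here is a minimal sketch of first-fit decreasing, a common textbook heuristic for packing items into containers. It is an illustration only and is not drawn from Smallwood’s own work; the function name, the cargo weights and the capacity are all hypothetical.

```python
def first_fit_decreasing(weights, capacity):
    """Pack items into containers using three simple rules.

    Rule 1: sort the items from heaviest to lightest.
    Rule 2: put each item in the first container that still has room.
    Rule 3: if no container has room, open a new one.
    """
    containers = []  # each container is a list of item weights
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item of weight {w} exceeds capacity {capacity}")
        for c in containers:
            if sum(c) + w <= capacity:
                c.append(w)
                break
        else:
            # No existing container could take the item.
            containers.append([w])
    return containers


# Hypothetical cargo weights (in tons) and container capacity.
packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(packed)
```

The heuristic does not guarantee the fewest possible containers, but it shows how a short list of rules can stand in for a human packing decision.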
“The biggest thing that hit me over the head is just the volume of data — it’s on a completely different scale than anything I had experienced before,” she says, “because there are these large companies where the product itself is interacting with people. Lots of people just generate so much data — that volume was really the biggest thing.”
As the Internet expanded out of universities and into offices, living rooms, laptops and pants pockets, the amount of data generated by people exploded. Suddenly, extra data was attached to everything: digital photos were linked to the locations where they were taken, and phones kept track of where you parked your car. Netflix, for its part, is mostly just interested in the shows and movies people watch and how they watch them, but there are lots of data points attached to that, too.
“In data science, there are a couple things that happen when you have a large volume of data,” she says. “One is that you really can touch so many lives in a way that you hope is a positive thing. Even if I just make some task for you quicker, for me that’s very satisfying.”
For a Netflix subscriber, that task is, primarily, “how do I find something to watch?” The answer is different for a couple staying in for a movie night than for a parent trying to calm a chaotic toddler, but the service learns from everyone who uses it. The data that is generated is processed by Smallwood’s team and their arsenal of exotic statistical and machine learning techniques. At a search giant like Google, tech people talk about The One Algorithm in hushed tones; at Netflix, they test 500 different algorithms a year. There’s no one “silver bullet,” says Smallwood.
These algorithms underpin the whole operation, especially the recommender system. The business goal, she explains, is to attract and retain the people who pay monthly to use Netflix, but it’s also to grow the number of hours subscribers spend with the service. The easier it is for us to find the shows we want and discover movies we don’t even know we want, the more successful her team is. So they ask questions. Which episodes did you start and not finish? What else was in the row you chose your last movie from? How much binge-watching have you been doing?
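A question like “Which episodes did you start and not finish?” can be answered from a simple viewing log. The sketch below is a hypothetical illustration, not Netflix’s actual method: the `View` record, its field names, the 90 percent completion threshold and the sample log are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class View:
    title: str
    minutes_watched: float
    runtime_minutes: float


def unfinished(views, threshold=0.9):
    """Return titles whose best viewing attempt covered less than
    `threshold` of the runtime (assumed cutoff for "finished")."""
    best = {}
    for v in views:
        fraction = v.minutes_watched / v.runtime_minutes
        best[v.title] = max(best.get(v.title, 0.0), fraction)
    return sorted(t for t, f in best.items() if f < threshold)


# Hypothetical log for one member.
log = [
    View("Episode 1", 47, 47),
    View("Episode 2", 12, 47),
    View("Episode 2", 20, 47),  # a second, still incomplete attempt
    View("Episode 3", 45, 47),  # skipped the credits, counts as finished
]
abandoned = unfinished(log)
print(abandoned)
```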
“The number of things we could track, measure, study, analyze and everything else is crazy — it’s just impractical to do it all,” she says. “So it’s very important to get crystal clear on the core thing we have to learn with this experiment. Let’s focus on that and add one or two other things, not 20 other things. Part of it is a discipline.”
Settling on the right question, Smallwood says, is in some ways more important than the conclusion that is reached by the end of the experiment. Then it’s time to look at the numbers. Netflix first screens algorithms against offline data, meaning historical records rather than the behavior of users currently on the service. Only the most promising candidates are tested live.
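One way to picture that two-stage process is to score each candidate algorithm against historical data and promote only the best scorer to a live test. The sketch below is a rough illustration under stated assumptions: the two toy algorithms, the precision-at-k metric and the sample data are hypothetical, and the article does not say which metrics Netflix actually uses.

```python
from collections import Counter


def by_popularity(history):
    """Toy candidate: rank titles by how often they were watched."""
    return [title for title, _ in Counter(history).most_common()]


def alphabetical(history):
    """Toy candidate: rank titles alphabetically (a weak baseline)."""
    return sorted(set(history))


def precision_at_k(recommended, actually_watched, k):
    """Fraction of the top-k recommendations that were later watched."""
    hits = sum(1 for title in recommended[:k] if title in actually_watched)
    return hits / k


def shortlist(candidates, history, held_out, k=2, keep=1):
    """Score every candidate offline and keep the `keep` best for live testing."""
    scores = {
        name: precision_at_k(algo(history), held_out, k)
        for name, algo in candidates.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Hypothetical historical views, and what was watched afterward.
history = ["A", "B", "B", "C", "C", "C", "D"]
held_out = {"B", "C"}
candidates = {"popularity": by_popularity, "alphabetical": alphabetical}

promoted, scores = shortlist(candidates, history, held_out)
print(promoted, scores)
```

Screening offline first means that weak candidates never reach real viewers, which keeps the number of live experiments manageable.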
“Since you have data from that many people, you really can identify patterns and clusters and see this massive variety in human behavior,” she says. “Then you tailor things to that behavior that otherwise might not be known to you. Even as humans, we can’t necessarily articulate why we’re behaving the way that we do. What’s great about the data is that it’s pure — it’s what actually happened.
“You find surprising things that nobody else would have found otherwise. It’s almost like being a detective.”
Smallwood takes special care to note that Netflix is extremely cautious with the data it analyzes: there is physical separation of information in some cases, and widespread anonymization so data points can’t be attached to specific users. She can’t delve into the system and find out who precisely is watching every episode of “Voltron: Legendary Defender” at 3 a.m.
She also can’t reveal all the exact methods and results they find in their tests, but with original series like “Stranger Things,” “House of Cards” and (Smallwood’s current favorite) “The Crown,” they’re doing something right. Especially in those cases, data is not 100 percent of the decision: human expertise is critical to making sure Netflix’s catalog is fresh, deep and successful. “Sometimes it’s automating things, but other times it’s just providing an additional data point,” she says.
“We want to really help make it easier for you to find the things you’re going to watch. Luckily, we have so many other members watching stuff, we can see what kinds of patterns emerge and where your tastes tend to line up with other people’s tastes. That helps us really to identify things to suggest to you.”
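The idea of seeing where your tastes line up with other people’s can be sketched as a simple neighbor-based recommender. This is a minimal illustration, not the system Smallwood’s team runs: the Jaccard overlap measure, the function names, the member names and the viewing histories are all assumptions.

```python
def jaccard(a, b):
    """Overlap between two sets of titles, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(member, watched, top_n=3):
    """Suggest titles the member has not seen, weighted by how closely
    each other member's history overlaps with theirs."""
    mine = watched[member]
    scores = {}
    for other, theirs in watched.items():
        if other == member:
            continue
        similarity = jaccard(mine, theirs)
        for title in theirs - mine:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties broken alphabetically; drop zero-score titles.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_n]


# Hypothetical viewing histories.
watched = {
    "ana": {"The Crown", "House of Cards"},
    "ben": {"The Crown", "House of Cards", "Stranger Things"},
    "cho": {"Voltron: Legendary Defender", "Stranger Things"},
}
picks = suggest("ana", watched)
print(picks)
```

Here “ana” shares two titles with “ben” and none with “cho,” so only the title that “ben” also watched is suggested.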
Algorithm by algorithm, Smallwood and her team are building bridges between human behavior and machine learning to provide the best possible experience.
Early in her career, Smallwood worked on preventing skill degradation in the Air Force — applying data to airmen’s job training. But it was her later work on modeling U.S. Postal Service data networks that opened the door to more. These networks were much, much smaller in the 1990s, she says, but that didn’t make the problem simple.
“That project actually was the thing that made me fall in love with data science in a deeper way because it was this complicated networking kind of problem,” she says. “You had your voice lines and your data lines and they all have to be configured and designed with a pattern that spanned the U.S. and covered all the demand. It was fascinating to study.”
When Smallwood talks about the complex data sets and networking challenges she engages with, it’s easy to tell how excited she gets. This kind of math wasn’t equations scrawled on a chalkboard — it was nodes and linkages and networks. And there was an inescapable human component — the data was saying something about people. She had found a reason to delve even further into math.
“That was the first time I really thought that there was an area I could actually specialize in,” Smallwood says. “It resonated with me and my interests, both technically and in terms of the applications I could imagine.”
So she got a master’s in operations research at Stanford. That was where Beth Lewis — as she had been known — became Caitlin. Her grad-school roommates unilaterally decided that she looked more like a Caitlin than a Beth, and after about a year, she found herself introducing herself that way.
“By the end of that year, nobody called me Beth anymore,” she says with a smile. But she was still the same woman: focusing on data and finding the truths within. However, that focus was hard-fought.
Until she got to high school, Smallwood had attended a different school every year of her life. Her family bounced around places like Colorado and New Mexico before eventually landing in Virginia. All that moving may have contributed to a certain Renaissance-woman quality.
“I’m one of those people who suffers from being interested in too many things,” she says. “I always have been.”
For her, William & Mary was the right place to start exploring her options. She landed in Spotswood Hall and eventually pledged Chi Omega. She was a resident assistant in Yates during her sophomore year, and fondly remembers the Green Leafe, the Cheese Shop, and jumping around on the trampoline of then-President Paul Verkuil ’61. She also laughs when she remembers leaping over the Governor’s Palace wall with friends late one night.
“Williamsburg is such a beautiful city,” she remembers. “One of my girlfriends and I used to just go running together [on DoG Street] probably five days a week. I miss it.”
Coursework was rewarding, as well. “I remember some of the religion and philosophy classes were actually held in the Wren Building. That was just phenomenal,” she says. “It felt really, truly like you had gone back in time, because there’s the professor at the old-fashioned podium, there’s the pews… it’s just such a funky unique classroom setting. The quality of the instruction at William & Mary was just awesome.”
But during the school week, Beth Lewis — the future Caitlin Smallwood — could not seem to stop changing her major.
Even as she enjoyed her courses, something still didn’t feel right. First she tried an accounting major, then biology, then philosophy, and then came a crisis.
She called her mom, who said: “Well, you’ve always loved math, and, just in the background, without paying attention to it, you’ve actually got all the qualities and taken all the courses as if you were majoring in math.” Mom was right.
“She made the point that sometimes we think that something isn’t our passion because it comes easily to us,” says Smallwood. “I thought about that comment and I realized: I actually do enjoy math. Then I really started getting into it and focusing on it.” But what had taken her so long?
“Math has so many flavors,” she says. “Sometimes you might take a course that’s in a particular genre that you don’t really love that much, and it might lead you to draw an incorrect conclusion about the whole field. What I really learned that I love is patterns.”
Soon enough, Smallwood was on her way to graduating as a mathematics major — and a philosophy minor, of course. On some level, she knew then how important it was to take cutting-edge methods and lots and lots of numbers, and then link them with human experience. It was all bound to make sense from the very beginning — it’s all about patterns.
“When I was a little kid,” she says, “I used to cut paper and kept cutting it into smaller and smaller pieces to see if I could get to infinity.”
Infinity feels like the horizon in Silicon Valley: technology grows and moves that quickly. But despite her deep background in the burgeoning big-data field, Caitlin Smallwood still finds time to unplug. She spends time inline skating in the California sun, and recently found herself at a Queens of the Stone Age concert. Earlier in 2016, she brought her twins, both high school seniors, to William & Mary for a visit, showing them an important data point in their mom’s story.
“I have to give a lot of credit to William & Mary for starting me off on my path,” she says. “I feel so fortunate to have landed in a career at a point in time — in history — where data science is really evolving so much.”