Netflix has amassed roughly 60 petabytes of data in the Amazon Web Services cloud, which serves as the key platform for its information infrastructure.
"We want more data for more rich analysis so we can improve the experience for our users," says Kurt Brown, Netflix's director of data platform.
Amazon's Simple Storage Service (S3) serves as Netflix's "central source of truth," Brown says. Once data is uploaded to S3, the systems Netflix relies on for its internal analysis, such as Teradata, Redshift, Druid, and others, source their data from it.
Other key tools Netflix uses include Metacat, which allows its users to analyze metadata with several systems, such as Hive, Teradata, or Redshift. Netflix's Big Data Portal serves as a single interface for access to its Redshift, Teradata, and other data warehouse platforms. Netflix also uses Apache Spark for large-scale data processing and Jupyter Notebook for document sharing.