Theory, Hands-On Labs, and 200 Practice Exam Questions – All Hands-On Labs in 1-Click Copy-Paste Style, All Material in Downloadable PDF
Enrollments: 3,234 students
Rating: 4.6/5 (34 ratings)
Course Language: English
Course Description:
Designing data processing systems
Selecting the appropriate storage technologies. Considerations include:
● Mapping storage systems to business requirements
● Data modeling
● Trade-offs involving latency, throughput, transactions
● Distributed systems
● Schema design (see the sketch after this list)
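As a concrete taste of schema design on Google Cloud, here is a minimal sketch that creates a BigQuery table with an explicit schema using the google-cloud-bigquery Python client. The project, dataset, table, and field names are placeholder assumptions, not part of the exam guide.

```python
# Minimal sketch: defining a BigQuery table schema in Python.
# Assumes google-cloud-bigquery is installed and credentials are configured;
# the project/dataset/table names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

table_id = "my-project.analytics.page_views"  # hypothetical table

schema = [
    bigquery.SchemaField("user_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_time", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("url", "STRING", mode="NULLABLE"),
]

table = bigquery.Table(table_id, schema=schema)
# Partitioning by event time is a common latency/cost trade-off lever.
table.time_partitioning = bigquery.TimePartitioning(field="event_time")
client.create_table(table)
```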
Designing data pipelines. Considerations include:
● Data publishing and visualization (e.g., BigQuery)
● Batch and streaming data (e.g., Dataflow, Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Pub/Sub, Apache Kafka; see the sketch after this list)
● Online (interactive) vs. batch predictions
● Job automation and orchestration (e.g., Cloud Composer)
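For instance, a streaming pipeline in the Apache Beam Python SDK that reads from Pub/Sub, windows the events, and writes to BigQuery can be this short. Topic, table, and schema names here are assumptions for illustration only.

```python
# Minimal sketch of a streaming Beam pipeline (Pub/Sub -> BigQuery).
# The topic, table, and schema below are hypothetical.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Parse" >> beam.Map(json.loads)
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))  # 60-second windows
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="user_id:STRING,event_time:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```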
Designing a data processing solution. Considerations include:
● Choice of infrastructure
● System availability and fault tolerance
● Use of distributed systems
● Capacity planning
● Hybrid cloud and edge computing
● Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)
● Event processing guarantees (e.g., at-least-once, in-order, exactly-once delivery; see the sketch after this list)
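To make the delivery-guarantee trade-offs concrete, the following sketch turns Pub/Sub's at-least-once delivery into effectively exactly-once processing by deduplicating on message IDs. The subscription name is a placeholder, and the in-memory set stands in for a durable deduplication store a real system would use.

```python
# Minimal sketch: idempotent consumption on top of at-least-once delivery.
# Subscription path is hypothetical; the set stands in for a durable
# deduplication store (e.g., a database keyed by message_id).
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "events-sub")

seen_ids = set()  # illustrative only; not durable across restarts

def process(data):
    print(data)  # hypothetical business logic

def callback(message):
    if message.message_id in seen_ids:
        message.ack()  # duplicate redelivery: safe to drop
        return
    seen_ids.add(message.message_id)
    process(message.data)
    message.ack()  # ack only after successful processing

future = subscriber.subscribe(subscription, callback=callback)
future.result()  # block; Ctrl+C to stop
```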
Migrating data warehousing and data processing. Considerations include:
● Awareness of current state and how to migrate a design to a future state
● Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)
● Validating a migration (see the sketch after this list)
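One simple way to validate a warehouse migration is to reconcile row counts between the source and the migrated BigQuery tables. This sketch assumes a hypothetical source_counts mapping exported from the on-premises system; table and project names are placeholders.

```python
# Minimal sketch: post-migration row-count reconciliation against BigQuery.
# source_counts would come from the on-premises warehouse; all names and
# numbers here are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
source_counts = {"analytics.orders": 1_204_332, "analytics.users": 88_410}

for table, expected in source_counts.items():
    query = f"SELECT COUNT(*) AS n FROM `my-project.{table}`"
    n = next(iter(client.query(query).result())).n
    status = "OK" if n == expected else "MISMATCH"
    print(f"{table}: source={expected} migrated={n} -> {status}")
```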
Building and operationalizing data processing systems
Building and operationalizing storage systems. Considerations include:
● Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Datastore, Memorystore)
● Storage costs and performance
● Life cycle management of data (see the sketch after this list)
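Life cycle management is one of the easier topics to practice hands-on. This sketch adds lifecycle rules to a Cloud Storage bucket with the google-cloud-storage client; the bucket name and age thresholds are assumptions.

```python
# Minimal sketch: Cloud Storage lifecycle rules for cost management.
# Bucket name and thresholds are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-data-lake")  # hypothetical bucket

# Move objects to colder storage after 90 days, delete after 365.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()  # push the updated lifecycle configuration
```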
Building and operationalizing pipelines. Considerations include:
● Data cleansing (see the sketch after this list)
● Batch and streaming
● Transformation
● Data acquisition and import
● Integrating with new data sources
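As a cleansing example in the same Beam/Python style as above, this sketch drops malformed CSV rows and normalizes fields in a batch pipeline. The file paths and the three-column record layout are assumed for illustration.

```python
# Minimal sketch: batch cleansing in Apache Beam (filter + normalize).
# Input/output paths and the 3-column layout are hypothetical.
import apache_beam as beam

def clean(line):
    parts = line.split(",")
    if len(parts) != 3 or not parts[0]:
        return []  # drop malformed rows
    user_id, email, country = parts
    return [f"{user_id},{email.strip().lower()},{country.strip().upper()}"]

with beam.Pipeline() as p:
    (
        p
        | beam.io.ReadFromText("gs://my-bucket/raw/users.csv")
        | beam.FlatMap(clean)  # FlatMap lets us emit 0 or 1 cleaned rows
        | beam.io.WriteToText("gs://my-bucket/clean/users")
    )
```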
Building and operationalizing processing infrastructure. Considerations include:
● Provisioning resources
● Monitoring pipelines (see the sketch after this list)
● Adjusting pipelines
● Testing and quality control
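For pipeline monitoring, one option is to query metrics programmatically with the Cloud Monitoring client. This sketch reads recent samples of Dataflow's standard job/system_lag metric; the project ID is a placeholder assumption.

```python
# Minimal sketch: reading a Dataflow metric from Cloud Monitoring.
# Project ID is hypothetical.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project = "projects/my-project"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 600},  # last 10 minutes
    }
)

results = client.list_time_series(
    request={
        "name": project,
        "filter": 'metric.type = "dataflow.googleapis.com/job/system_lag"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(dict(series.resource.labels), point.value)
```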
Operationalizing machine learning models
Leveraging pre-built ML models as a service. Considerations include:
● ML APIs (e.g., Vision API, Speech API; see the sketch after this list)
● Customizing ML APIs (e.g., AutoML Vision, AutoML Natural Language)
● Conversational experiences (e.g., Dialogflow)
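Calling a pre-built ML API takes only a few lines. This sketch labels an image with the Vision API; the Cloud Storage URI is a placeholder.

```python
# Minimal sketch: label detection with the pre-built Vision API.
# Image URI is hypothetical.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/cat.jpg"))

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```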
Deploying an ML pipeline. Considerations include:
● Ingesting appropriate data
● Retraining of machine learning models (AI Platform Prediction and Training, BigQuery ML, Kubeflow, Spark ML; see the sketch after this list)
● Continuous evaluation
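Of those options, BigQuery ML is the quickest to try hands-on: retraining is just re-running CREATE OR REPLACE MODEL. The sketch below submits training and evaluation statements through the Python client; the dataset, table, and column names are assumptions.

```python
# Minimal sketch: (re)training and evaluating a BigQuery ML model.
# Dataset/table/column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

train_sql = """
CREATE OR REPLACE MODEL `my-project.analytics.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `my-project.analytics.customers`
"""
client.query(train_sql).result()  # blocks until training finishes

eval_sql = "SELECT * FROM ML.EVALUATE(MODEL `my-project.analytics.churn_model`)"
for row in client.query(eval_sql).result():
    print(dict(row))
```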
Choosing the appropriate training and serving infrastructure. Considerations include:
● Distributed vs. single machine (see the sketch after this list)
● Use of edge compute
● Hardware accelerators (e.g., GPU, TPU)
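In TensorFlow, the distributed-vs.-single-machine choice often reduces to picking a tf.distribute strategy. This sketch mirrors training across whatever GPUs are visible, falling back to a single device; the tiny model and random data are placeholders.

```python
# Minimal sketch: single-machine multi-GPU training via tf.distribute.
# The tiny model and random data are placeholders.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs, or CPU
print("Replicas:", strategy.num_replicas_in_sync)

with strategy.scope():  # variables created here are mirrored per replica
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)]
    )
    model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((256, 4))
y = tf.random.normal((256, 1))
model.fit(x, y, epochs=1, batch_size=32)
```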
Measuring, monitoring, and troubleshooting machine learning models. Considerations include:
● Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics; see the sketch after this list)
● Impact of dependencies of machine learning models
● Common sources of error (e.g., assumptions about data)
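The evaluation-metrics terminology is easiest to internalize by computing the metrics once yourself. This scikit-learn sketch uses made-up labels and predictions purely for illustration.

```python
# Minimal sketch: common classification evaluation metrics.
# The label/prediction arrays are made up for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))
```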
Ensuring solution quality
Designing for security and compliance. Considerations include:
● Identity and access management (e.g., Cloud IAM)
● Data security (encryption, key management)
● Ensuring privacy (e.g., Data Loss Prevention API; see the sketch after this list)
● Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))
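Privacy tooling is very testable hands-on. This sketch inspects a string for sensitive infoTypes with the DLP API; the project ID and sample text are placeholders.

```python
# Minimal sketch: scanning text for PII with the Cloud DLP API.
# Project ID and sample text are hypothetical.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"

response = dlp.inspect_content(
    request={
        "parent": parent,
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]
        },
        "item": {"value": "Contact jane@example.com or 555-0100."},
    }
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```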
Ensuring scalability and efficiency. Considerations include:
● Building and running test suites
● Pipeline monitoring (e.g., Cloud Monitoring)
● Assessing, troubleshooting, and improving data representations and data processing infrastructure
● Resizing and autoscaling resources (see the sketch after this list)
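On Dataflow, autoscaling is configured through pipeline options rather than code changes. This sketch shows the relevant flags on a Beam Python pipeline; the project, region, bucket, and worker bounds are assumptions.

```python
# Minimal sketch: enabling Dataflow autoscaling via Beam pipeline options.
# Project, region, bucket, and worker bounds are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    [
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        "--max_num_workers=20",   # upper bound for scale-out
        "--num_workers=2",        # initial worker count
        "--temp_location=gs://my-bucket/tmp",
    ]
)

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```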
Ensuring reliability and fidelity. Considerations include:
● Performing data preparation and quality control (e.g., Dataprep)
● Verification and monitoring
● Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis; see the sketch after this list)
● Choosing between ACID, idempotent, and eventually consistent requirements
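Rerunning failed jobs is a recurring exam theme. This generic sketch retries a hypothetical job-submission function with exponential backoff, which is only safe because the job is assumed to be idempotent.

```python
# Minimal sketch: rerunning a failed job with exponential backoff.
# run_job is a hypothetical, idempotent job-submission callable.
import time

def run_with_retries(run_job, max_attempts=5, base_delay=2.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except Exception as exc:  # real code would catch specific errors
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)

# Usage: run_with_retries(lambda: submit_job(...))  # submit_job is hypothetical
```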
Ensuring flexibility and portability. Considerations include:
● Mapping to current and future business requirements
● Designing for data and application portability (e.g., multicloud, data residency requirements)
● Data staging, cataloging, and discovery (see the sketch after this list)
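For cataloging and discovery, Data Catalog can be queried programmatically. This sketch searches one project's entries for a keyword; the project ID and search term are assumptions.

```python
# Minimal sketch: searching metadata with Data Catalog.
# Project ID and search term are hypothetical.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
scope = datacatalog_v1.SearchCatalogRequest.Scope(
    include_project_ids=["my-project"]
)

results = client.search_catalog(request={"scope": scope, "query": "orders"})
for entry in results:
    print(entry.relative_resource_name)
```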