|
Claude finished @BenWu's task in 1m 11s
Code Review
Overall this is a clean and straightforward PoC. The approach (fire-and-forget publish to GCP Pub/Sub within the existing event emitter) is reasonable, and the gRPC retry handling is a good callout. A few issues to address, some before shipping to prod:
Issues
1.
{
  name: 'GCP_EVENT_PROJECT',
  value: 'pocket-snowplow-shim-dev', // never changes for prod!
},
Unlike
2. Unguarded
const gcp_credentials = config.gcp.serviceAccountKey
  ? JSON.parse(config.gcp.serviceAccountKey) // throws if malformed JSON
  : undefined;
If
3. Config defaults silently fall through to dev values in prod
  projectId: process.env.GCP_EVENT_PROJECT || 'pocket-snowplow-shim-dev',
  topicName: process.env.GCP_EVENT_TOPIC || 'snowplow-events-dev',
If either env var is accidentally unset in prod, events will silently publish to the dev GCP project/topic with no error. Since these are set explicitly via the infra config, the fallback values aren't needed in practice, but if you want to keep them, consider naming them to make the dev intent clear (or at minimum document that they're dev-only defaults).
Minor
4.
5.
6. PubSub/topic initialized at module level: This works fine, but it does make unit testing harder (you can't easily mock the client per test). Not a blocker for a PoC, but worth noting if you add tests later. Moving initialization into the class (or injecting the
What looks good
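Returning to issues 2 and 3 above, one possible shape for the config loading is sketched here. This is a minimal sketch under assumptions, not the PR's code: `loadGcpConfig`, `requireEnv`, `parseServiceAccountKey`, and the `GCP_SERVICE_ACCOUNT_KEY` variable name are hypothetical, and only `GCP_EVENT_PROJECT` and `GCP_EVENT_TOPIC` come from the snippets above. It fails fast when a variable is missing instead of falling back to dev values, and it wraps `JSON.parse` so a malformed key produces a clear error.

```typescript
// Hypothetical config loader. Names other than GCP_EVENT_PROJECT and
// GCP_EVENT_TOPIC are assumptions made for illustration.
type Env = Record<string, string | undefined>;

interface GcpConfig {
  projectId: string;
  topicName: string;
  credentials?: Record<string, unknown>;
}

// Throw instead of silently falling back to a dev default.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse the service account key, replacing the raw SyntaxError with a
// message that does not echo any part of the key.
function parseServiceAccountKey(
  raw: string | undefined,
): Record<string, unknown> | undefined {
  if (!raw) return undefined;
  try {
    return JSON.parse(raw) as Record<string, unknown>;
  } catch {
    throw new Error('GCP service account key is not valid JSON');
  }
}

function loadGcpConfig(env: Env): GcpConfig {
  return {
    projectId: requireEnv(env, 'GCP_EVENT_PROJECT'),
    topicName: requireEnv(env, 'GCP_EVENT_TOPIC'),
    credentials: parseServiceAccountKey(env.GCP_SERVICE_ACCOUNT_KEY),
  };
}
```

Calling `loadGcpConfig(process.env)` once at startup would surface a missing or malformed value when the service boots rather than on the first published event.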
|
Plan Result (corpus-scheduler-lambda-cdk-production)
Change Result
# aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F will be updated in-place
~ resource "aws_lambda_function" "corpus-scheduler-sqs-lambda_F2ECDF9F" {
id = "CorpusSchedulerLambda-Prod-SQS-Function"
~ qualified_arn = "arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:290" -> (known after apply)
~ qualified_invoke_arn = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:290/invocations" -> (known after apply)
tags = {
"app_code" = "content"
"component_code" = "content-corpusschedulerlambda"
"env_code" = "prod"
"environment" = "Prod"
"service" = "CorpusSchedulerLambda"
}
~ version = "290" -> (known after apply)
# (20 unchanged attributes hidden)
~ environment {
~ variables = {
~ "GIT_SHA" = (sensitive value)
# (7 unchanged elements hidden)
}
}
# (4 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
|
Plan Result (prospect-translation-lambda-cdk-production)
Change Result
# aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA will be updated in-place
~ resource "aws_lambda_function" "translation-lambda_translation-sqs-lambda_B9BDF6BA" {
id = "ProspectAPI-Prod-Sqs-Translation-Function"
tags = {
"app_code" = "content"
"component_code" = "content-prospectapi"
"env_code" = "prod"
"environment" = "Prod"
"service" = "ProspectAPI-Sqs-Translation"
}
# (22 unchanged attributes hidden)
~ environment {
~ variables = {
~ "GIT_SHA" = (sensitive value)
# (5 unchanged elements hidden)
}
}
# (4 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
|
Plan Result (section-manager-lambda-cdk-production)
Change Result
# aws_lambda_function.section-manager-sqs-lambda_D7365DAE will be updated in-place
~ resource "aws_lambda_function" "section-manager-sqs-lambda_D7365DAE" {
id = "SectionManagerLambda-Prod-SQS-Function"
~ qualified_arn = "arn:aws:lambda:us-east-1:996905175585:function:SectionManagerLambda-Prod-SQS-Function:86" -> (known after apply)
~ qualified_invoke_arn = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:SectionManagerLambda-Prod-SQS-Function:86/invocations" -> (known after apply)
tags = {
"app_code" = "content"
"component_code" = "content-sectionmanagerlambda"
"env_code" = "prod"
"environment" = "Prod"
"service" = "SectionManagerLambda"
}
~ version = "86" -> (known after apply)
# (20 unchanged attributes hidden)
~ environment {
~ variables = {
- "GIT_SHA" = "829a2457a67a8e774bc9e284cd4b7a490c879f84" -> null
# (5 unchanged elements hidden)
}
}
# (4 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
|
Plan Result (curated-corpus-api-cdk-production)
|
Plan Result (prospect-api-cdk-production)
Change Result
# data.aws_iam_policy_document.application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_iam_policy_document" "application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD" {
+ id = (known after apply)
+ json = (known after apply)
+ minified_json = (known after apply)
+ version = "2012-10-17"
+ statement {
+ actions = [
+ "dynamodb:BatchGet*",
+ "dynamodb:DescribeTable",
+ "dynamodb:Get*",
+ "dynamodb:Query",
+ "dynamodb:Scan",
+ "dynamodb:UpdateItem",
]
+ effect = "Allow"
+ resources = [
+ "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
+ "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
]
}
+ statement {
+ actions = [
+ "s3:*",
]
+ effect = "Allow"
+ resources = [
+ "arn:aws:s3:::pocket-prospectapi-prod-images",
+ "arn:aws:s3:::pocket-prospectapi-prod-images/*",
]
}
+ statement {
+ actions = [
+ "logs:CreateLogGroup",
+ "logs:CreateLogStream",
+ "logs:DescribeLogGroups",
+ "logs:DescribeLogStreams",
+ "logs:PutLogEvents",
]
+ effect = "Allow"
+ resources = [
+ "*",
]
}
}
# aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E will be updated in-place
~ resource "aws_dynamodb_table" "dynamodb_prospects_dynamodb_table_9854E41E" {
id = "PROAPI-Prod-Prospects"
name = "PROAPI-Prod-Prospects"
tags = {
"app_code" = "content"
"component_code" = "content-prospectapi"
"env_code" = "prod"
"environment" = "Prod"
"service" = "ProspectAPI"
}
# (9 unchanged attributes hidden)
- global_secondary_index {
- hash_key = "scheduledSurfaceGuid" -> null
- name = "scheduledSurfaceGuid-prospectType" -> null
- non_key_attributes = [] -> null
- projection_type = "ALL" -> null
- range_key = "prospectType" -> null
- read_capacity = 0 -> null
- write_capacity = 0 -> null
}
+ global_secondary_index {
+ hash_key = "scheduledSurfaceGuid"
+ name = "scheduledSurfaceGuid-prospectType"
+ non_key_attributes = []
+ projection_type = "ALL"
+ range_key = "prospectType"
+ read_capacity = 5
+ write_capacity = 5
}
# (5 unchanged blocks hidden)
}
# aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6 will be updated in-place
~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6" {
id = "arn:aws:iam::996905175585:policy/ProspectAPI-Prod-TaskRolePolicy"
name = "ProspectAPI-Prod-TaskRolePolicy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "dynamodb:UpdateItem",
- "dynamodb:Scan",
- "dynamodb:Query",
- "dynamodb:Get*",
- "dynamodb:DescribeTable",
- "dynamodb:BatchGet*",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
- "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
]
},
- {
- Action = "s3:*"
- Effect = "Allow"
- Resource = [
- "arn:aws:s3:::pocket-prospectapi-prod-images/*",
- "arn:aws:s3:::pocket-prospectapi-prod-images",
]
},
- {
- Action = [
- "logs:PutLogEvents",
- "logs:DescribeLogStreams",
- "logs:DescribeLogGroups",
- "logs:CreateLogStream",
- "logs:CreateLogGroup",
]
- Effect = "Allow"
- Resource = "*"
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
tags = {
"app_code" = "content"
"component_code" = "content-prospectapi"
"env_code" = "prod"
"environment" = "Prod"
"service" = "ProspectAPI"
}
# (5 unchanged attributes hidden)
}
Plan: 0 to add, 2 to change, 0 to destroy.
|
mmiermans
left a comment
I'm curious to understand the different approaches we could take. Let's meet live to discuss.
"@aws-sdk/client-s3": "3.529.1",
"@aws-sdk/lib-storage": "3.529.1",
"@devoxa/prisma-relay-cursor-connection": "3.1.0",
"@google-cloud/pubsub": "^5.3.0",
What are the pros/cons of using pub/sub directly versus using @mozilla/glean with a mozlog configuration? I created an experimental PR using Glean in this repo a year ago.
|
No longer needed. I'll add details in https://mozilla-hub.atlassian.net/browse/DENG-10803
|
Goal
This adds a call in the curated_corpus_api event emitter that sends events to a Pub/Sub topic in GCP for further transformation and loading into BigQuery (the GCP side is set up in https://github.com/mozilla-services/data-sandbox-terraform/pull/121/changes). The code so far is a simple proof of concept to validate this approach. It works locally when an event is triggered via GraphQL.
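To make the approach concrete, here is a rough sketch of a fire-and-forget publish with the publisher injected rather than created at module level. It is illustrative only and not the code in this PR: `PubSubEventForwarder`, `MessagePublisher`, and `ErrorLogger` are hypothetical names, and the interface is a local assumption modeled on `Topic.publishMessage` from `@google-cloud/pubsub`, declared here so the sketch runs without the library.

```typescript
// Minimal publisher shape. An assumption modeled on Topic.publishMessage
// from @google-cloud/pubsub, declared locally to keep the sketch self-contained.
interface MessagePublisher {
  publishMessage(message: { json: unknown }): Promise<string>;
}

type ErrorLogger = (message: string, error: unknown) => void;

// Hypothetical forwarder: the publisher is injected so tests can pass a fake.
class PubSubEventForwarder {
  private readonly publisher: MessagePublisher;
  private readonly logError: ErrorLogger;

  constructor(publisher: MessagePublisher, logError: ErrorLogger) {
    this.publisher = publisher;
    this.logError = logError;
  }

  // Fire-and-forget: start the publish, log any failure, and never reject.
  // The promise is returned only so tests can wait for completion; callers
  // in the emitter would not await it.
  forward(event: unknown): Promise<void> {
    return Promise.resolve()
      .then(() => this.publisher.publishMessage({ json: event }))
      .then(() => undefined)
      .catch((error: unknown) => {
        this.logError('Failed to publish event to Pub/Sub', error);
      });
  }
}
```

In the service, the injected publisher would presumably be the topic object obtained from the Pub/Sub client, while a unit test can pass a plain object with a stubbed `publishMessage`.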
I'd love feedback/perspectives on:
Implementation Decisions
Deployment steps
References
JIRA ticket: