chore: add bulk seed script for Safety Sinmungo (안전신문고) data by devupkim · Pull Request #39 · gdsc-ssu/ansim-server

devupkim · 2026-03-19T20:23:05Z

Work done

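Adds a dedicated seed script for inserting the 1,021,298 records of Safety Sinmungo (안전신문고) public data into the DB.
The existing seed-markers.ts is kept solely for the 10 test records around Soongsil University, and the large real dataset is handled by the new script.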

Changes

Add new src/scripts/seed-safety-mungo.ts (two-pass insertion script for ~1M records)
Add a seed:safety-mungo script to package.json

Testing

Tested locally (all 1,021,298 records inserted successfully)
Also inserted all 1,021,298 records into the remote DB (joojae.synology.me:6432)
Confirmed that existing features still work

Notes for reviewers

Key technical points:

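fs.createReadStream + chunk.split('\n') streaming parse: avoids the string-length limit hit when reading the whole file as a string with readFileSync (~536M characters in V8), so 1GB+ files can be processed
synchronize: false is required: with synchronize: true, TypeORM fails to DROP the leftover _old type while renaming an enum, which causes an error
HAZARD_TYPE_MAP: required to convert the Korean categories in the source JSON to the DB enum (HazardType), avoiding enum constraint violations
orIgnore(): conflicts are ignored on repeated runs, which keeps the script idempotent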
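The HAZARD_TYPE_MAP step is not part of the diff shown in this PR, so the following is only a minimal sketch of what such a mapping can look like. The enum string values and the Korean category labels are assumptions for illustration, not the repository's actual values. Unmapped or missing categories fall back to OTHER so that every row satisfies the enum constraint.

```typescript
// Hypothetical stand-in for HazardType (src/common/enums/hazard.enum.ts).
// Member names appear in the PR diff; the string values here are assumed.
enum HazardType {
  ROAD_DAMAGE = 'ROAD_DAMAGE',
  CONSTRUCTION = 'CONSTRUCTION',
  FLOOD = 'FLOOD',
  TRAFFIC = 'TRAFFIC',
  LANDSLIDE = 'LANDSLIDE',
  OTHER = 'OTHER',
}

// Hypothetical Korean category labels from the source JSON.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const HAZARD_TYPE_MAP: ReadonlyMap<string, HazardType> = new Map([
  ['도로파손', HazardType.ROAD_DAMAGE], // road damage
  ['공사', HazardType.CONSTRUCTION], // construction
  ['침수', HazardType.FLOOD], // flooding
  ['교통', HazardType.TRAFFIC], // traffic
  ['산사태', HazardType.LANDSLIDE], // landslide
]);

// Converts a raw category to a valid enum value; unknown or empty input maps to OTHER.
function toHazardType(category: string | null | undefined): HazardType {
  if (!category) return HazardType.OTHER;
  return HAZARD_TYPE_MAP.get(category.trim()) ?? HazardType.OTHER;
}
```

Falling back to OTHER trades precision for a seed run that never aborts on an unexpected label; logging the unmapped labels would make it easy to extend the map later.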

How to run the script:

pnpm seed:safety-mungo

Replaces the 10 hard-coded mock records around Soongsil University with real Safety Sinmungo public data. Changes:
- Processes the large JSON files with 2-pass streaming (Pass 1: SafetyMungoReport, Pass 2: Marker)
- Handles CR/LF with fs.createReadStream + chunk.split('\n') + replace(/\r$/)
- Batch inserts with BATCH_SIZE=100, using orIgnore() for idempotency
- src/scripts/data/ is excluded locally via .git/info/exclude (not shared with the team)
- Inserted 2,146,430 SafetyMungoReport rows + 2,146,430 Marker rows in total
closes #33
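The streaming parse described in this commit message can be sketched as a line-oriented parser. Like the script, this sketch assumes the JSON file is an array with exactly one object per line (`[`, `{...},`, ..., `]`); pretty-printed JSON spanning multiple lines would need a real streaming JSON parser. `parseJsonLines` is a hypothetical name, and it takes a plain `Iterable<string>` of chunks so it can be tested without `fs`; in the script the same loop would run as `for await` over `fs.createReadStream(path, { encoding: 'utf8' })`.

```typescript
// Parses a JSON array written one object per line ("[", "{...},", ..., "]")
// from arbitrary text chunks, yielding one parsed object per record line.
function* parseJsonLines<T>(chunks: Iterable<string>): Generator<T> {
  // Turns one complete line into a value, or undefined for brackets/blank lines.
  const parseLine = (line: string): T | undefined => {
    // trim() also removes the trailing "\r" left by CRLF line endings.
    const trimmed = line.trim();
    if (!trimmed || trimmed === '[' || trimmed === ']') return undefined;
    const cleaned = trimmed.endsWith(',') ? trimmed.slice(0, -1) : trimmed;
    return JSON.parse(cleaned) as T;
  };

  let partial = ''; // incomplete last line carried over to the next chunk
  for (const chunk of chunks) {
    const lines = (partial + chunk).split('\n');
    partial = lines.pop() ?? '';
    for (const line of lines) {
      const value = parseLine(line);
      if (value !== undefined) yield value;
    }
  }
  // The file may not end with a newline, so flush whatever is left.
  const last = parseLine(partial);
  if (last !== undefined) yield last;
}
```

Carrying the trailing fragment (`partial`) into the next chunk is what keeps the split correct when a chunk boundary falls in the middle of a record.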

- Add new `src/scripts/seed-safety-mungo.ts`
  - 2-pass insertion of safety_mungo_reports.json + marker.json (1,021,298 records each)
  - fs.createReadStream + chunk.split('\n') streaming parse (handles 1GB+ files)
  - synchronize: false to avoid the TypeORM error caused by a leftover enum _old type
  - HAZARD_TYPE_MAP: converts Korean categories to the HazardType enum (satisfies the DB enum constraint)
  - orIgnore() for idempotent re-runs
- Add the `seed:safety-mungo` script to `package.json`
closes #33
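On PostgreSQL, TypeORM's `orIgnore()` produces `INSERT ... ON CONFLICT DO NOTHING`, so a re-run skips only rows that collide with an existing primary key or unique constraint. The in-memory model below (a hypothetical `insertOrIgnore` over a `Map`) illustrates the consequence: SafetyMungoReport rows keyed by `externalReportId` are not duplicated on a second run, but rows whose key is generated fresh on every insert would be. Whether the Marker table has a unique constraint on `safetyMungoReportId` is not shown in this PR, so that part is an assumption worth checking.

```typescript
// In-memory model of INSERT ... ON CONFLICT DO NOTHING: rows whose key already
// exists are skipped; returns how many rows were actually inserted.
function insertOrIgnore<T>(
  table: Map<string, T>,
  rows: readonly T[],
  keyOf: (row: T) => string,
): number {
  let inserted = 0;
  for (const row of rows) {
    const key = keyOf(row);
    if (table.has(key)) continue; // conflict: silently ignored
    table.set(key, row);
    inserted += 1;
  }
  return inserted;
}

// Reports keyed by a stable natural key (externalReportId): re-running is a no-op.
const reports = new Map<string, { id: string }>();
const reportRows = [{ id: 'R1' }, { id: 'R2' }];
const firstReportRun = insertOrIgnore(reports, reportRows, (r) => r.id);
const secondReportRun = insertOrIgnore(reports, reportRows, (r) => r.id);

// Rows keyed by a surrogate id generated per insert never conflict,
// so running the same batch twice stores it twice.
let nextMarkerId = 0;
const markers = new Map<string, { safetyMungoReportId: string }>();
const markerRows = [{ safetyMungoReportId: 'R1' }, { safetyMungoReportId: 'R2' }];
insertOrIgnore(markers, markerRows, () => String(++nextMarkerId));
insertOrIgnore(markers, markerRows, () => String(++nextMarkerId));

console.log(firstReportRun, secondReportRun, reports.size, markers.size); // 2 0 2 4
```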

gemini-code-assist · 2026-03-19T20:23:35Z

Summary of Changes


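This PR introduces a dedicated script for efficiently seeding the large Safety Sinmungo public dataset into the database. It is kept separate from the existing small test-data seed script and processes over one million records with streaming parsing and batch insertion for performance and stability. However, although the PR description says the existing seed-markers.ts is kept for test data only, the actual changes replace the contents of seed-markers.ts entirely with the new bulk seed logic, and the same logic is added again as seed-safety-mungo.ts, which leaves the role of the existing test seed script unclear.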

Highlights

Bulk data seed script: a dedicated seed script, seed-safety-mungo.ts, efficiently inserts over one million Safety Sinmungo public records into the database.
Streaming JSON parsing: JSON files larger than 1GB are parsed in a streaming fashion with fs.createReadStream, working around Node.js's string-length limit.
Batch insertion and idempotency: data is inserted in batches with TypeORM's createQueryBuilder().insert().values().orIgnore().execute(), and conflicts are ignored on repeated runs, making the script idempotent.
HazardType mapping: mapping logic (HAZARD_TYPE_MAP and the toHazardType function) converts the Korean categories in the source JSON to HazardType enum values, keeping the data consistent.
TypeORM synchronize: false: the seed script sets synchronize: false so that TypeORM does not auto-sync the schema when it runs, keeping schema changes the responsibility of migrations.


gemini-code-assist

Code Review

This is a good change that adds a script for seeding the large Safety Sinmungo dataset. The approach of using streams and batch processing to handle large files efficiently is impressive.

That said, the review found a few things to improve.

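src/scripts/seed-markers.ts appears to have been changed unintentionally. Contrary to the PR description, its test-data seeding was removed and replaced with the bulk-processing logic. This needs to be fixed.
The newly added src/scripts/seed-safety-mungo.ts contains duplicated code that could be extracted into a function to improve readability and maintainability.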

See the review comments on each file for details.

gemini-code-assist · 2026-03-19T20:25:25Z

 import 'dotenv/config';
+import * as fs from 'fs';
+import * as path from 'path';
 import { DataSource } from 'typeorm';
 import { databaseConfig } from '../database.config';
-import { User } from '../user/entities/user.entity';
-import { HazardLevel, HazardType } from '../common/enums/hazard.enum';
-import { Report } from '../report/entities/report.entity';
+import { HazardLevel } from '../report/entities/report.entity';
 import { Marker, MarkerSource } from '../marker/entities/marker.entity';
+import { SafetyMungoReport } from '../safety-mungo-report/entities/safety-mungo-report.entity';
+
+const BATCH_SIZE = 100;

 const dataSource = new DataSource({
  ...databaseConfig,
  entities: ['src/**/*.entity.ts'],
 });

-interface SeedMarker {
-  hazardType: HazardType;
-  hazardLevel: HazardLevel;
-  description: string;
+interface RawReport {
+  externalReportId: string;
+  externalId: string | null;
+  spotName: string | null;
+  category: string | null;
+  description: string | null;
+  origin: string | null;
+  occurenceDate: string | null;
+  syncedAt: string | null;
+}
+
+interface RawMarker {
+  externalReportId: string;
  latitude: number;
  longitude: number;
+  hazardType: string;
+  hazardLevel: string | null;
 }

-// 숭실대 중심(37.4964, 126.9572) 주변 좌표
-const SEED_DATA: SeedMarker[] = [
-  {
-    hazardType: HazardType.ROAD_DAMAGE,
-    hazardLevel: HazardLevel.HIGH,
-    description: '도로 포트홀이 크게 발생하여 차량 통행에 위험',
-    latitude: 37.4971,
-    longitude: 126.9568,
-  },
-  {
-    hazardType: HazardType.OTHER,
-    hazardLevel: HazardLevel.MEDIUM,
-    description: '가로등 불빛이 꺼져 야간 보행이 위험',
-    latitude: 37.4955,
-    longitude: 126.9583,
-  },
-  {
-    hazardType: HazardType.ROAD_DAMAGE,
-    hazardLevel: HazardLevel.LOW,
-    description: '보도블록이 들떠 있어 보행 시 걸림 위험',
-    latitude: 37.4978,
-    longitude: 126.9555,
-  },
-  {
-    hazardType: HazardType.CONSTRUCTION,
-    hazardLevel: HazardLevel.HIGH,
-    description: '건물 철거 공사 중 안전 펜스 미설치',
-    latitude: 37.4948,
-    longitude: 126.959,
-  },
-  {
-    hazardType: HazardType.FLOOD,
-    hazardLevel: HazardLevel.HIGH,
-    description: '배수로 막혀 폭우 시 침수 반복 발생',
-    latitude: 37.496,
-    longitude: 126.9545,
-  },
-  {
-    hazardType: HazardType.TRAFFIC,
-    hazardLevel: HazardLevel.MEDIUM,
-    description: '교차로 신호등이 간헐적으로 작동하지 않음',
-    latitude: 37.4983,
-    longitude: 126.9578,
-  },
-  {
-    hazardType: HazardType.TRAFFIC,
-    hazardLevel: HazardLevel.MEDIUM,
-    description: '사고로 인한 가드레일 파손 방치',
-    latitude: 37.4942,
-    longitude: 126.9562,
-  },
-  {
-    hazardType: HazardType.LANDSLIDE,
-    hazardLevel: HazardLevel.HIGH,
-    description: '절개지 암석이 불안정하여 낙석 위험',
-    latitude: 37.4968,
-    longitude: 126.9598,
-  },
-  {
-    hazardType: HazardType.ROAD_DAMAGE,
-    hazardLevel: HazardLevel.LOW,
-    description: '낙엽이 쌓여 비올 때 차량 미끄러짐 주의',
-    latitude: 37.499,
-    longitude: 126.956,
-  },
-  {
-    hazardType: HazardType.TRAFFIC,
-    hazardLevel: HazardLevel.LOW,
-    description: '소방차 진입로에 불법 주정차 빈번',
-    latitude: 37.4952,
-    longitude: 126.9575,
-  },
-];
+async function* streamJsonLines<T>(filePath: string): AsyncGenerator<T> {
+  const stream = fs.createReadStream(filePath, { encoding: 'utf8' });
+  let partial = '';
+  for await (const chunk of stream) {
+    const lines = (partial + (chunk as string)).split('\n');
+    partial = lines.pop()!;
+    for (const line of lines) {
+      const trimmed = line.replace(/\r$/, '').trim();
+      if (!trimmed || trimmed === '[' || trimmed === ']') continue;
+      const cleaned = trimmed.endsWith(',') ? trimmed.slice(0, -1) : trimmed;
+      yield JSON.parse(cleaned) as T;
+    }
+  }
+  if (partial.trim() && partial.trim() !== ']') {
+    const trimmed = partial.replace(/\r$/, '').trim();
+    const cleaned = trimmed.endsWith(',') ? trimmed.slice(0, -1) : trimmed;
+    yield JSON.parse(cleaned) as T;
+  }
+}

 async function seed() {
  await dataSource.initialize();
  console.log('DB 연결 완료');

-  await dataSource.transaction(async (manager) => {
-    // 시드 전용 유저 (upsert)
-    const userRepo = manager.getRepository(User);
-    let user = await userRepo.findOneBy({ googleId: 'seed-user-google-id' });
-    if (!user) {
-      user = await userRepo.save(
-        userRepo.create({
-          googleId: 'seed-user-google-id',
-          email: 'seed@example.com',
-          name: 'Seed User',
-        }),
-      );
-    }
+  try {
+    const smrRepo = dataSource.getRepository(SafetyMungoReport);
+    const markerRepo = dataSource.getRepository(Marker);

-    const reportRepo = manager.getRepository(Report);
-    const markerRepo = manager.getRepository(Marker);
+    // Pass 1: safety_mungo_reports.json → SafetyMungoReport 배치 삽입
+    console.log('\n[1/2] SafetyMungoReport 삽입 중...');
+    let smrCount = 0;
+    let smrBatch: SafetyMungoReport[] = [];

-    for (const item of SEED_DATA) {
-      const report = await reportRepo.save(
-        reportRepo.create({
-          userId: user.id,
-          hazardType: item.hazardType,
-          hazardLevel: item.hazardLevel,
-          description: item.description,
+    for await (const r of streamJsonLines<RawReport>(
+      path.resolve(__dirname, 'data/safety_mungo_reports.json'),
+    )) {
+      smrBatch.push(
+        smrRepo.create({
+          id: r.externalReportId,
+          externalId: r.externalId ?? null,
+          spotName: r.spotName ?? null,
+          category: r.category ?? null,
+          description: r.description ?? null,
+          origin: r.origin ?? null,
+          occurrenceDate: r.occurenceDate ?? null,
+          syncedAt: r.syncedAt ? new Date(r.syncedAt) : new Date(),
        }),
      );

-      await markerRepo.save(
+      if (smrBatch.length >= BATCH_SIZE) {
+        await smrRepo
+          .createQueryBuilder()
+          .insert()
+          .into(SafetyMungoReport)
+          .values(smrBatch)
+          .orIgnore()
+          .execute();
+        smrCount += smrBatch.length;
+        smrBatch = [];
+        if (smrCount % 1000 === 0) {
+          console.log(`  진행: ${smrCount.toLocaleString()}건`);
+        }
+      }
+    }
+    if (smrBatch.length > 0) {
+      await smrRepo
+        .createQueryBuilder()
+        .insert()
+        .into(SafetyMungoReport)
+        .values(smrBatch)
+        .orIgnore()
+        .execute();
+      smrCount += smrBatch.length;
+    }
+    console.log(`  완료: ${smrCount.toLocaleString()}건 처리`);
+
+    // Pass 2: marker.json → Marker 배치 삽입
+    console.log('\n[2/2] Marker 삽입 중...');
+    let markerCount = 0;
+    let markerBatch: Marker[] = [];
+
+    for await (const m of streamJsonLines<RawMarker>(
+      path.resolve(__dirname, 'data/marker.json'),
+    )) {
+      markerBatch.push(
        markerRepo.create({
-          reportId: report.id,
-          source: MarkerSource.REPORT,
-          latitude: item.latitude,
-          longitude: item.longitude,
+          safetyMungoReportId: m.externalReportId,
+          source: MarkerSource.SAFETY_MUNGO,
+          latitude: m.latitude,
+          longitude: m.longitude,
          location: {
            type: 'Point',
-            coordinates: [item.longitude, item.latitude],
+            coordinates: [m.longitude, m.latitude],
          },
-          hazardType: item.hazardType,
-          hazardLevel: item.hazardLevel,
+          hazardType: m.hazardType,
+          hazardLevel: m.hazardLevel as HazardLevel | null,
        }),
      );

-      console.log(`  생성: [${item.hazardLevel}] ${item.hazardType}`);
+      if (markerBatch.length >= BATCH_SIZE) {
+        await markerRepo
+          .createQueryBuilder()
+          .insert()
+          .into(Marker)
+          .values(markerBatch)
+          .orIgnore()
+          .execute();
+        markerCount += markerBatch.length;
+        markerBatch = [];
+        if (markerCount % 1000 === 0) {
+          console.log(`  진행: ${markerCount.toLocaleString()}건`);
+        }
+      }
    }
+    if (markerBatch.length > 0) {
+      await markerRepo
+        .createQueryBuilder()
+        .insert()
+        .into(Marker)
+        .values(markerBatch)
+        .orIgnore()
+        .execute();
+      markerCount += markerBatch.length;
+    }
+    console.log(`  완료: ${markerCount.toLocaleString()}건 처리`);

-    console.log(`\n시드 완료: ${SEED_DATA.length}건의 Marker 생성`);
-  });
-
-  await dataSource.destroy();
+    console.log('\n시드 완료');
+  } catch (err) {
+    console.error('시드 실패:', err);
+    process.exit(1);
+  } finally {
+    await dataSource.destroy();
+  }
 }

 void seed();


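According to the PR description, seed-markers.ts should remain the script that seeds 10 test records around Soongsil University. The current changes, however, replace the file entirely with a bulk seeding script that is nearly identical to the newly added seed-safety-mungo.ts.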

This looks unintended. The changes to this file should be reverted to keep the existing test-data seeding.

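Note also that the changed code has a bug: it assigns the hazardType string from marker.json as-is, without converting it to the HazardType enum (line 130), which will cause an error when inserting into the database. This is further evidence that the change is a mistake.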

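seed-markers.ts was unintentionally replaced with the bulk streaming version while resolving rebase conflicts.
I restored it to the 10 test records around Soongsil University and, at the same time, updated the Korean string values to match the HazardType enum as changed after PR #35.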

gemini-code-assist · 2026-03-19T20:25:25Z

+      if (smrBatch.length >= BATCH_SIZE) {
+        await smrRepo
+          .createQueryBuilder()
+          .insert()
+          .into(SafetyMungoReport)
+          .values(smrBatch)
+          .orIgnore()
+          .execute();
+        smrCount += smrBatch.length;
+        smrBatch = [];
+        if (smrCount % 1000 === 0) {
+          console.log(`  진행: ${smrCount.toLocaleString()}건`);
+        }
+      }
+    }
+    // 배치 크기 미만으로 남은 마지막 레코드 처리
+    if (smrBatch.length > 0) {
+      await smrRepo
+        .createQueryBuilder()
+        .insert()
+        .into(SafetyMungoReport)
+        .values(smrBatch)
+        .orIgnore()
+        .execute();
+      smrCount += smrBatch.length;
+    }
+    console.log(`  완료: ${smrCount.toLocaleString()}건 처리`);


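The logic that batch-inserts SafetyMungoReport (lines 129-155) and the logic that batch-inserts Marker (lines 184-209) are nearly identical. To reduce duplication and improve maintainability, consider extracting this logic into a reusable helper function.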

For example, you could write a generic function like this:

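import { ObjectLiteral, Repository } from 'typeorm';
import { QueryDeepPartialEntity } from 'typeorm/query-builder/QueryPartialEntity';

async function executeBatchInsert<T extends ObjectLiteral>(
  repository: Repository<T>,
  batch: T[],
): Promise<void> {
  // Nothing to flush (e.g. the source file was empty or divisible by the batch size)
  if (batch.length === 0) {
    return;
  }
  await repository
    .createQueryBuilder()
    .insert()
    .into(repository.target)
    .values(batch as QueryDeepPartialEntity<T>[])
    .orIgnore()
    .execute();
}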

With this helper, the batch-handling part of both for await loops in seed can be simplified as follows:

// Usage example
if (smrBatch.length >= BATCH_SIZE) {
  await executeBatchInsert(smrRepo, smrBatch);
  smrCount += smrBatch.length;
  smrBatch = [];
  // ...
}

// After the loop
await executeBatchInsert(smrRepo, smrBatch);
smrCount += smrBatch.length;

Refactoring this way improves readability and makes the helper reusable for similar batch jobs in the future.
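The author chose to keep the explicit loops, which is reasonable for a one-off script. If similar batch jobs do come up later, one hypothetical alternative to the reviewer's helper is to move the buffering, the remainder flush, and the counting into a single function as well, so neither loop has to repeat them. `insertInBatches` and its parameters are illustrative names, not part of the repository; `flush` is where the actual TypeORM insert would go.

```typescript
// Groups items from an async source into arrays of at most `batchSize`,
// awaiting `flush` for each full batch and once more for the remainder.
// Returns the total number of items handed to `flush`.
async function insertInBatches<T>(
  source: AsyncIterable<T>,
  batchSize: number,
  flush: (batch: T[]) => Promise<void>,
  onProgress?: (processed: number) => void,
): Promise<number> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  let batch: T[] = [];
  let processed = 0;
  for await (const item of source) {
    batch.push(item);
    if (batch.length >= batchSize) {
      await flush(batch);
      processed += batch.length;
      batch = []; // start a fresh array so the flushed one is never mutated
      onProgress?.(processed);
    }
  }
  // Records left over when the total is not a multiple of batchSize
  if (batch.length > 0) {
    await flush(batch);
    processed += batch.length;
    onProgress?.(processed);
  }
  return processed;
}
```

In the seed script, `source` would be the `streamJsonLines` generator (mapped to entities) and `flush` would run the `createQueryBuilder().insert()...orIgnore().execute()` chain for one batch.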


Since this is a one-off script that is run manually and is unlikely to be reused, I decided to keep the current explicit code rather than introduce an abstraction.

Using the HazardLevel re-exported from report.entity triggers the ESLint @typescript-eslint/no-unsafe-assignment error. Changed to import it directly from common/enums/hazard.enum.

The hazardType column became an enum type after PR #35. RawMarker.hazardType (string) is now cast to the HazardType enum to fix the TypeScript type error.
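A cast such as `m.hazardType as HazardType` only silences the compiler: an unexpected string still reaches the database, fails the enum check there, and aborts that batch's INSERT (`orIgnore()` skips unique-key conflicts, not invalid values). A minimal sketch of a runtime check instead, assuming HazardType is a string enum; the stand-in values and the helper names `isHazardType` and `parseHazardType` are hypothetical. The type guard narrows the string only when it is an actual enum value, so a bad row can be reported with its `externalReportId` before insertion.

```typescript
// Hypothetical stand-in for HazardType; real values live in src/common/enums/hazard.enum.ts.
enum HazardType {
  ROAD_DAMAGE = 'ROAD_DAMAGE',
  CONSTRUCTION = 'CONSTRUCTION',
  FLOOD = 'FLOOD',
  TRAFFIC = 'TRAFFIC',
  LANDSLIDE = 'LANDSLIDE',
  OTHER = 'OTHER',
}

// For a string enum, Object.values returns exactly the member values (no reverse mapping).
const HAZARD_TYPE_VALUES: ReadonlySet<string> = new Set<string>(Object.values(HazardType));

// Type guard: narrows a raw string to HazardType only if it is a real enum value.
function isHazardType(value: string): value is HazardType {
  return HAZARD_TYPE_VALUES.has(value);
}

// Fails fast with the offending record instead of letting the DB reject the whole batch.
function parseHazardType(value: string, externalReportId: string): HazardType {
  if (!isHazardType(value)) {
    throw new Error(`Unknown hazardType "${value}" in marker for report ${externalReportId}`);
  }
  return value;
}
```

Throwing is the strictest option; a seed script could equally log the record and skip it, or route it through a fallback such as OTHER.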

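seed-markers.ts had been unintentionally replaced with the bulk streaming script, so it is restored to its original purpose (10 test markers around Soongsil University). Because the hazardType field became an enum in PR #35, the Korean strings are replaced with HazardType enum values. The HazardLevel/HazardType import path is also corrected to common/enums/hazard.enum.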

devupkim added 2 commits March 20, 2026 01:04

devupkim requested review from joojae02 and mangoooooo1 as code owners March 19, 2026 20:23

gemini-code-assist bot reviewed Mar 19, 2026


devupkim added 3 commits March 20, 2026 05:25

fix: correct HazardLevel import path in seed-markers.ts

2dcfbf4


fix: add HazardType import and type cast in seed-markers.ts

7fb0044


joojae02 approved these changes Mar 19, 2026


devupkim merged commit 7d630bc into main Mar 20, 2026
1 check passed

devupkim deleted the chore/33-add-safety-mungo-seed branch March 20, 2026 16:24

devupkim merged 5 commits into main from chore/33-add-safety-mungo-seed

2 participants
