Express backend API for Instagram scraping via Apify, post classification, and insight generation.
View frontend here
This repo is configured for Docker-based deployment.
- Node.js 18+
- Environment variable OPENROUTER_API_KEY
- Environment variable APIFY_TOKEN
Optional environment variables:
- PORT (default: 3000)
- OPENROUTER_MODEL (default: google/gemma-4-26b-a4b-it)
- DEFAULT_ACCOUNTS (comma-separated handles, default: plaeto.schools)
- DEFAULT_MAX_POSTS (default: 2)
- APIFY_INSTAGRAM_ACTOR (default: apify/instagram-post-scraper)
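As a rough illustration of how these variables and their defaults could be resolved at startup, here is a minimal sketch. The helper name loadConfig and the returned property names are hypothetical and not taken from this repo; only the variable names and default values come from the lists above.

```javascript
// Sketch only: loadConfig and the returned property names are hypothetical.
// Variable names and defaults mirror the lists documented above.
function loadConfig(env = process.env) {
  const required = ["OPENROUTER_API_KEY", "APIFY_TOKEN"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  return {
    openRouterApiKey: env.OPENROUTER_API_KEY,
    apifyToken: env.APIFY_TOKEN,
    port: Number(env.PORT ?? 3000),
    openRouterModel: env.OPENROUTER_MODEL ?? "google/gemma-4-26b-a4b-it",
    // Comma-separated handles; blanks and surrounding whitespace are dropped.
    defaultAccounts: (env.DEFAULT_ACCOUNTS ?? "plaeto.schools")
      .split(",")
      .map((handle) => handle.trim())
      .filter(Boolean),
    defaultMaxPosts: Number(env.DEFAULT_MAX_POSTS ?? 2),
    apifyInstagramActor: env.APIFY_INSTAGRAM_ACTOR ?? "apify/instagram-post-scraper",
  };
}
```

Failing fast on a missing key is a design choice of this sketch; the actual service may handle missing variables differently.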
npm install
npm start
The process binds to PORT, which your platform should set automatically.
Use the included Dockerfile as the runtime source.
docker build -t eidos-backend .
docker run -p 3000:3000 --env-file .env eidos-backend
Important:
- Runtime command must be npm start (or node index.js).
- Do not use node test_analyze.js as the service start command; it is only a one-off client test script.
Service info and route list.
Basic liveness response.
Returns supported intent and format categories.
Classifies a single caption with optional image context.
Request body:
{
"caption": "A sample Instagram caption",
"imageUrl": "https://example.com/image.jpg",
"categories": {
"intent": ["Promotional", "Educational"],
"format": ["Trend", "Tutorial"]
}
}
Notes:
- categories is optional. If not provided, the default categories are used.
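The fallback described in the note could look roughly like the sketch below. The function name resolveCategories and the contents of DEFAULT_CATEGORIES are placeholders (the real defaults are whatever the categories route returns), and falling back per key rather than for the whole object is an assumption of this sketch.

```javascript
// Placeholder defaults for illustration; the service's real default lists may differ.
const DEFAULT_CATEGORIES = {
  intent: ["Promotional", "Educational"],
  format: ["Trend", "Tutorial"],
};

// Hypothetical helper: use caller-supplied lists when present, defaults otherwise.
function resolveCategories(categories) {
  const pick = (key) => {
    const list = categories?.[key];
    // Assumption: an absent or empty list falls back to the default for that key.
    return Array.isArray(list) && list.length > 0
      ? [...list]
      : [...DEFAULT_CATEGORIES[key]];
  };
  return { intent: pick("intent"), format: pick("format") };
}
```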
Response body:
{
"classification": {
"intent": "Promotional",
"format": "Trend"
},
"rawResponse": "{\n \"intent\": \"Promotional\",\n \"format\": \"Trend\"\n}"
}
The service uses the Apify apify/instagram-post-scraper actor to fetch Instagram posts.
For each account, the following request is sent to the Apify actor:
{
"dataDetailLevel": "basicData",
"resultsLimit": 5,
"skipPinnedPosts": false,
"username": ["plaeto.schools"]
}
- dataDetailLevel: Set to basicData for standard post details
- resultsLimit: Number of posts to retrieve (passed from the maxPosts parameter)
- skipPinnedPosts: Whether to skip pinned posts
- username: Array of Instagram handles to scrape
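A small sketch of how the per-account actor input might be assembled from an account handle and the maxPosts value. The function name buildActorInput is hypothetical; the field names and values are the ones shown in the request above.

```javascript
// Hypothetical helper: builds the actor input for a single account.
function buildActorInput(account, maxPosts) {
  return {
    dataDetailLevel: "basicData",
    resultsLimit: maxPosts,   // passed through from the maxPosts parameter
    skipPinnedPosts: false,
    username: [account],      // the actor expects an array of handles
  };
}

console.log(JSON.stringify(buildActorInput("plaeto.schools", 5), null, 2));
```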
The actor returns an array of post objects with the following structure:
[
{
"inputUrl": "https://www.instagram.com/p/DLNsnpUTdVS/",
"id": "3660778310592222546",
"type": "Image",
"shortCode": "DLNsnpUTdVS",
"caption": "Your phone isn't rotting your brain...",
"hashtags": [],
"mentions": [],
"url": "https://www.instagram.com/p/DLNsnpUTdVS/",
"commentsCount": 230,
"firstComment": "Amen.",
"latestComments": [...],
"dimensionsHeight": 1350,
"dimensionsWidth": 1080,
"displayUrl": "https://scontent-dfw5-3.cdninstagram.com/v/t51.2885-15/...",
"images": [],
"alt": "Photo by National Geographic...",
"likesCount": 73473,
"timestamp": "2025-06-22T19:00:10.000Z",
"childPosts": [],
"ownerFullName": "National Geographic",
"ownerUsername": "natgeo",
"ownerId": "787132",
"isCommentsDisabled": false
}
]
Key fields extracted and normalized:
- url / inputUrl → link: Post URL
- displayUrl / images[0] → img: Cover image
- type / productType → type: Normalized to post or reel
- likesCount → likes: Like count
- commentsCount → comments: Comment count
- caption → caption: Post caption text
- timestamp → date: ISO 8601 date
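The field mapping above can be sketched as a single normalization function. The name normalizePost is hypothetical, and the rule used to decide between post and reel (treating "video", "clips", or "reel" values as reels) is an assumption; the README only states that the type is normalized to post or reel.

```javascript
// Assumption: these type/productType values indicate a reel; everything else is a post.
const REEL_KINDS = ["video", "clips", "reel"];

// Hypothetical helper: maps one raw actor item to the normalized shape.
function normalizePost(raw) {
  const kinds = [raw.type, raw.productType]
    .filter((value) => typeof value === "string")
    .map((value) => value.toLowerCase());
  const isReel = kinds.some((kind) => REEL_KINDS.includes(kind));

  // Guard against missing or unparseable timestamps instead of throwing.
  const parsed = raw.timestamp ? new Date(raw.timestamp) : null;
  const date = parsed && !Number.isNaN(parsed.getTime()) ? parsed.toISOString() : null;

  return {
    link: raw.url ?? raw.inputUrl ?? null,
    img: raw.displayUrl ?? raw.images?.[0] ?? null,
    type: isReel ? "reel" : "post",
    likes: Number(raw.likesCount ?? 0),
    comments: Number(raw.commentsCount ?? 0),
    caption: raw.caption ?? "",
    date,
  };
}
```

Defaulting missing counts to 0 and a missing caption to an empty string is a choice made for this sketch, not documented behavior.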
Runs end-to-end scrape + classify + analytics.
Request body:
{
"accounts": ["plaeto.schools", "another.brand"],
"maxPosts": 3,
"includeAiOverview": true,
"generateExcel": true,
"categories": {
"intent": ["Promotional", "Educational"],
"format": ["Trend", "Tutorial"]
}
}
Notes:
- accounts is optional; falls back to DEFAULT_ACCOUNTS.
- maxPosts must be between 1 and 25.
- If maxPosts is higher than the number of available posts for an account, the service returns all available posts without failing.
- categories is optional; falls back to default categories if not provided.
- One analysis run is allowed at a time.
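The request rules in these notes could be enforced along the following lines. This is a sketch under assumptions: the names validateAnalyzeRequest, tryStartRun, and finishRun are hypothetical, and how the real service responds to an invalid maxPosts or to a concurrent request is not documented here.

```javascript
// Hypothetical helper: applies the documented fallbacks and the 1..25 bound.
function validateAnalyzeRequest(body, defaults) {
  const accounts =
    Array.isArray(body.accounts) && body.accounts.length > 0
      ? body.accounts
      : defaults.accounts; // falls back to DEFAULT_ACCOUNTS
  const maxPosts = body.maxPosts ?? defaults.maxPosts;

  if (!Number.isInteger(maxPosts) || maxPosts < 1 || maxPosts > 25) {
    return { ok: false, error: "maxPosts must be between 1 and 25" };
  }
  return { ok: true, accounts, maxPosts };
}

// In-memory guard so only one analysis run is active at a time (single process only).
let runInProgress = false;

function tryStartRun() {
  if (runInProgress) return false;
  runInProgress = true;
  return true;
}

function finishRun() {
  runInProgress = false;
}
```

A handler would call tryStartRun() before starting work and finishRun() in a finally block, so a failed run does not leave the guard stuck.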
POST /api/analyze supports Server-Sent Events (SSE) progress streaming.
Enable streaming in either of two ways:
- Add "stream": true in the request JSON body.
- Or send the header Accept: text/event-stream.
When streaming is enabled, the response is SSE (not a single JSON response). The API sends progress events during execution, then a final event with the full analysis output.
Each progress update is sent as:
event: progress
data: { ... }
Progress payload examples:
- While extracting posts via Apify:
{
"stage": "extracting_posts",
"message": "Extracting posts...",
"account": "plaeto.schools"
}
- While analyzing individual posts:
{
"stage": "analyzing_post",
"message": "plaeto.schools | post 1 | https://www.instagram.com/p/ABC123/",
"account": "plaeto.schools",
"postNumber": 1,
"link": "https://www.instagram.com/p/ABC123/"
}
- While generating analytics from collected posts:
{
"stage": "analyzing_data",
"message": "analysing data"
}
At completion, the API streams:
event: final
data: { ...full analyze payload... }
event: done
data: { "message": "analysis complete" }
The final event contains the same structure as the non-streaming JSON response (fields like runId, createdAt, accounts, maxPosts, rawData, analysis, aiOverview, excelPath, errors).
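On the client side, the event stream described above can be decoded with a small parser. The sketch below assumes each event's data field carries JSON, as in the examples, and that the full response text (or a buffer ending on a blank line) is already available; the function name parseSseEvents is hypothetical.

```javascript
// Hypothetical helper: splits SSE text into { event, data } objects.
// Assumes each data payload is JSON, as in the documented examples.
function parseSseEvents(text) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event: line is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // A single leading space after the colon is not part of the value.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length === 0) continue; // skip empty or comment-only blocks
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```

A real streaming client would buffer incoming chunks and only parse up to the last complete blank-line boundary, since a chunk may end in the middle of an event.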
curl -N -X POST http://localhost:8080/api/analyze \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"accounts": ["plaeto.schools"],
"maxPosts": 2,
"includeAiOverview": false,
"generateExcel": false,
"stream": true
}'
Response body:
{
"runId": "1713600000000",
"createdAt": "2026-04-20T12:00:00.000Z",
"accounts": [
"plaeto.schools",
"another.brand"
],
"maxPosts": 3,
"rawData": {
"plaeto.schools": [
{
"link": "https://www.instagram.com/p/...",
"img": "https://...",
"type": "post",
"likes": 1500,
"comments": 45,
"caption": "Example caption...",
"date": "2026-04-18T10:00:00.000Z",
"intent": "Educational",
"format": "Tutorial"
}
],
"another.brand": []
},
"analysis": {
"global_insights": {
"intent_insights": {
"Educational": {
"global_relative_performance_average": {
"likes": "10.50%",
"comments": "5.00%"
},
"global_relative_performance_median": {
"likes": "8.00%",
"comments": "2.50%"
},
"account_relative_win_rate": {
"likes": "50.00%",
"comments": "25.00%"
}
}
},
"format_insights": {
"Tutorial": {
"global_relative_performance_average": {
"likes": "15.00%",
"comments": "N/A"
},
"global_relative_performance_median": {
"likes": "12.00%",
"comments": "N/A"
},
"account_relative_win_rate": {
"likes": "100.00%",
"comments": "0.00%"
}
}
}
},
"additional_insights": {
"topPerformer": {
"account": "plaeto.schools",
"frequency": "2 days"
},
"reelsPerformanceOverPosts": "15.20%",
"timeOfDayEngagement": {
"10:00 to 12:00": {
"avgLikes": 1500,
"avgComments": 45
}
}
},
"account_analysis": {
"plaeto.schools": {
"averageLikesComments": {
"avgLikes": 1500,
"avgComments": 45
},
"totalPosts": 3,
"intentDistribution": {
"Educational": {
"no_of_posts": 1,
"category_total_likes": 1500,
"category_total_comments": 45,
"category_avg_likes": 1500,
"category_avg_comments": 45,
"relative_performance": {
"likes": "0.00%",
"comments": "0.00%"
}
}
},
"formatDistribution": {
"Tutorial": {
"no_of_posts": 1,
"category_total_likes": 1500,
"category_total_comments": 45,
"category_avg_likes": 1500,
"category_avg_comments": 45,
"relative_performance": {
"likes": "0.00%",
"comments": "0.00%"
}
}
},
"averageTimeBetweenPostsReadable": "2 days"
}
}
},
"aiOverview": null,
"excelPath": ".../outputs/global_insights_1713600000000.xlsx",
"errors": []
}
Returns the latest completed analysis payload.
Downloads the latest generated Excel file (if generateExcel was true).
- Build method: Dockerfile
- Runtime command (inside container): npm start
- Container port: 8080
- Keep .env private.
- Do not commit API keys such as OPENROUTER_API_KEY or APIFY_TOKEN.