This repository was archived by the owner on Mar 1, 2021. It is now read-only.

insta_posts with user_id=0 in elasticsearch #224

@Urhengulas

Description

At the moment, 583652 insta_posts documents indexed in our Elasticsearch have a user_id of 0.

You can see this in aggregations.user.buckets in the response to the following request in Kibana:

GET /insta_posts/_search
{"aggregations":{"user":{"terms":{"field":"user_id"}}}}

Some statistics from our Postgres database:

instascraper=> SELECT COUNT(*) FROM posts WHERE user_id is NULL;
 count  
--------
 724473
(1 row)

instascraper=> SELECT max(id) FROM posts WHERE user_id is NULL;
   max    
----------
 49919470
(1 row)

instascraper=> SELECT min(id) FROM posts WHERE user_id is NULL;
 min  
------
 9641
(1 row)

We suspect that some messages in kafka/postgres.public.posts have no user_id, so the indexer uses the zero value for that field, which is 0.
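
The suspected mechanism can be reproduced in isolation. The following is a minimal sketch, assuming the indexer is written in Go and decodes messages with encoding/json. The message shape and the type names postValue and postPointer are hypothetical, and the real Kafka messages may be nested differently. It shows that a null user_id decoded into a plain int64 silently becomes 0, while a pointer field keeps the missing value distinguishable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// postValue uses a plain integer: a null or absent user_id leaves the
// field at its zero value, 0 (hypothetical struct, not the real indexer's).
type postValue struct {
	ID     int64 `json:"id"`
	UserID int64 `json:"user_id"`
}

// postPointer uses a pointer: a null or absent user_id stays nil,
// so the indexer could skip the field or the document.
type postPointer struct {
	ID     int64  `json:"id"`
	UserID *int64 `json:"user_id"`
}

func decodeValue(msg string) (postValue, error) {
	var p postValue
	err := json.Unmarshal([]byte(msg), &p)
	return p, err
}

func decodePointer(msg string) (postPointer, error) {
	var p postPointer
	err := json.Unmarshal([]byte(msg), &p)
	return p, err
}

func main() {
	// Hypothetical message for a post whose user_id is NULL in Postgres.
	msg := `{"id": 9641, "user_id": null}`

	v, err := decodeValue(msg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("plain int64 user_id:", v.UserID)

	p, err := decodePointer(msg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("pointer user_id is nil:", p.UserID == nil)
}
```

If this is indeed the cause, switching the field to a pointer (or another nullable representation) would let the indexer omit user_id instead of writing 0.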
