Description
At the moment, 583652 documents in our Elasticsearch index insta_posts have a user_id of 0.
You can see this in aggregations.user.buckets in the response to the following request in Kibana:
GET /insta_posts/_search
{"aggregations":{"user":{"terms":{"field":"user_id"}}}}
Some statistics from our Postgres database:
instascraper=> SELECT COUNT(*) FROM posts WHERE user_id is NULL;
count
--------
724473
(1 row)
instascraper=> SELECT max(id) FROM posts WHERE user_id is NULL;
max
----------
49919470
(1 row)
instascraper=> SELECT min(id) FROM posts WHERE user_id is NULL;
min
------
9641
(1 row)
Our guess is that some messages in kafka/postgres.public.posts have no user_id, so the indexer falls back to the zero value for that field, which is 0.
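
To illustrate the suspected mechanism, here is a minimal sketch. It assumes the indexer is written in Go and decodes JSON messages into a struct with a plain integer field; the type and function names (Post, PostNullable, decodePlain, decodeNullable) and the sample message are hypothetical and not taken from the indexer's code. With a plain int64 field, a null or missing user_id silently becomes 0; with a pointer field it stays nil and can be detected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Post is a hypothetical stand-in for the indexer's message type.
// A plain int64 cannot represent "no value", so null decodes to 0.
type Post struct {
	ID     int64 `json:"id"`
	UserID int64 `json:"user_id"`
}

// PostNullable uses a pointer so a null or missing user_id stays nil.
type PostNullable struct {
	ID     int64  `json:"id"`
	UserID *int64 `json:"user_id"`
}

// decodePlain decodes a message into the struct with the plain int64 field.
func decodePlain(msg []byte) (Post, error) {
	var p Post
	err := json.Unmarshal(msg, &p)
	return p, err
}

// decodeNullable decodes a message into the struct with the pointer field.
func decodeNullable(msg []byte) (PostNullable, error) {
	var p PostNullable
	err := json.Unmarshal(msg, &p)
	return p, err
}

func main() {
	// Hypothetical message for a row whose user_id is NULL in Postgres.
	msg := []byte(`{"id": 9641, "user_id": null}`)

	plain, err := decodePlain(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("plain user_id:", plain.UserID)

	nullable, err := decodeNullable(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("nullable user_id is nil:", nullable.UserID == nil)
}
```

If this is indeed the cause, one option would be for the indexer to omit user_id from the Elasticsearch document when the pointer is nil, instead of indexing 0.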