This is the indexer application for the CMR. It is responsible for indexing modified data into Elasticsearch.
curl -i -XPOST -H "Content-Type: application/json" http://localhost:3004 -d '{"concept-id": "C1234-PROV1", "revision-id": "1"}'
curl -i -XDELETE -H "Content-Type: application/json" http://localhost:3004/C1234-PROV1/2
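As a minimal sketch, the two requests above could also be built from Python using only the standard library. The helper names `index_request` and `unindex_request` are illustrative and not part of the indexer, and the base URL mirrors the localhost curl examples. Nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3004"  # assumption: local indexer, as in the curl examples


def index_request(concept_id, revision_id):
    """Build (but do not send) the POST that indexes one concept revision."""
    body = json.dumps({"concept-id": concept_id, "revision-id": str(revision_id)})
    return urllib.request.Request(
        BASE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unindex_request(concept_id, revision_id):
    """Build (but do not send) the DELETE that removes a concept from the index."""
    return urllib.request.Request(
        f"{BASE_URL}/{concept_id}/{revision_id}",
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )
```

Keeping request construction separate from sending makes it easy to inspect the method, URL, and body before anything touches a live index.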
This will un-index all concepts within the given provider.
curl -i -XDELETE http://localhost:3004/provider/PROV1?token=XXXX
These tasks require an admin user token that has read or update permission on the INGEST_MANAGEMENT_ACL.
WARNING - this endpoint drops all data from the index.
Every CMR application has a reset function to reset it back to its initial state. This will reset the indexes back to their initial state and also clear the cache.
curl -i -XPOST http://localhost:3004/reset?token=XXXX
curl -i -XPOST http://localhost:3004/caches/clear-cache?token=XXXX
Endpoints are provided for querying the contents of the various caches used by the application. The following curl will return the list of caches:
curl -i http://localhost:3004/caches
The following curl will return the keys for a specific cache:
curl -i http://localhost:3004/caches/cache-name
This curl will return the value for a specific key in the named cache:
curl -i http://localhost:3004/caches/cache-name/cache-key
This will report the current health of the application. It checks all resources and services used by the application and reports their health in the response body in JSON format. For resources, the report includes an "ok?" status and a "problem" field if the resource is not OK. For services, the report includes an overall "ok?" status for the service and health reports for each of its dependencies. It returns HTTP status code 200 when the application is healthy, which means all its interfacing resources and services are healthy, or HTTP status code 503 when one of the resources or services is not healthy.
curl -i -XGET "http://localhost:3004/health"
Example healthy response body:
{
"elastic_search" : {
"ok?" : true
},
"echo" : {
"ok?" : true
},
"metadata-db" : {
"ok?" : true,
"dependencies" : {
"oracle" : {
"ok?" : true
},
"echo" : {
"ok?" : true
}
}
},
"message-queue": {
"ok?": true
}
}
Example unhealthy response body:
{
"elastic_search" : {
"ok?" : true
},
"echo" : {
"ok?" : true
},
"metadata-db" : {
"ok?" : false,
"problem" : {
"oracle" : {
"ok?" : false,
"problem" : "db-spec cmr.common.memory_db.connection.MemoryStore@aead584 is missing a required parameter"
},
"echo" : {
"ok?" : true
}
}
},
"message-queue": {
"ok?": true
}
}
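A client that consumes the health report may want to know which component is actually failing. The following is a rough sketch that assumes the shapes shown in the examples above: a failing service nests its per-dependency reports under "problem", while a failing resource carries a plain message there. The function name `failing_components` is hypothetical, and the sample document is an abridged version of the unhealthy example.

```python
import json


def failing_components(report, prefix=""):
    """Return the paths of the components whose "ok?" flag is false."""
    failures = []
    for name, status in report.items():
        path = f"{prefix}{name}"
        if status.get("ok?", False):
            continue
        nested = status.get("problem")
        if isinstance(nested, dict):
            # Service: descend into the per-dependency reports. If none of
            # them is failing, report the service itself.
            inner = failing_components(nested, prefix=f"{path}/")
            failures.extend(inner or [path])
        else:
            # Resource: "problem" is a plain message (or missing).
            failures.append(path)
    return failures


# Abridged copy of the unhealthy example, with a shortened problem message.
unhealthy = json.loads("""
{
  "elastic_search": {"ok?": true},
  "metadata-db": {
    "ok?": false,
    "problem": {
      "oracle": {"ok?": false, "problem": "missing a required parameter"},
      "echo": {"ok?": true}
    }
  }
}
""")
print(failing_components(unhealthy))  # ['metadata-db/oracle']
```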
By default, the existing Elasticsearch indexes are compared with what is configured in the index-set, and the update is applied only when the two differ. Users can override the default by passing the query parameter "force=true", which always updates the Elasticsearch indexes with the current configuration.
curl -XPOST http://localhost:3004/update-indexes?token=XXXX
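The update decision can be pictured as a small predicate. This is only an illustration of the documented behavior, not the indexer's actual code: `needs_update` and its dictionary arguments are hypothetical stand-ins for the existing Elasticsearch state and the index-set configuration.

```python
def needs_update(existing, configured, force=False):
    """Return True when the indexes should be updated.

    existing/configured: hypothetical dicts standing in for the current
    Elasticsearch index definitions and the index-set configuration.
    """
    if force:
        # force=true skips the comparison and always applies the configuration.
        return True
    return existing != configured


current = {"number_of_shards": 1}
print(needs_update(current, {"number_of_shards": 1}))              # False
print(needs_update(current, {"number_of_shards": 5}))              # True
print(needs_update(current, {"number_of_shards": 1}, force=True))  # True
```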
curl -XPOST -H "Content-Type: application/json" http://localhost:3004/reindex-provider-collections?token=XXXX -d '["PROV1","PROV2"]'
curl -XPOST http://localhost:3004/reindex-tags?token=XXXX
curl -i -H "Accept: application/json" -H "Content-type: application/json" -XPOST "http://localhost:3004/index-sets" -d "{\"index-set\":{\"name\":\"cmr-base-index-set\",\"create-reason\":\"include message about reasons for creating this index set\",\"granule\":{\"index-names\":[\"G2-PROV1\",\"G4-Prov3\",\"g5_prov5\"],\"mapping\":{\"granule\":{\"_all\":{\"enabled\":false},\"properties\":{\"collection-concept-id\":{\"store\":\"yes\",\"index_options\":\"docs\",\"norms\":\"false\",\"type\":\"string\",\"index\":\"not_analyzed\"},\"concept-id\":{\"store\":\"yes\",\"index_options\":\"docs\",\"norms\":\"false\",\"type\":\"string\",\"index\":\"not_analyzed\"}},\"dynamic\":\"strict\",\"_source\":{\"enabled\":false},\"_id\":{\"path\":\"concept-id\"}}},\"settings\":{\"index\":{\"number_of_replicas\":0,\"refresh_interval\":\"10s\",\"number_of_shards\":1}}},\"collection\":{\"index-names\":[\"C4-collections\",\"c6_Collections\"],\"mapping\":{\"collection\":{\"_all\":{\"enabled\":false},\"properties\":{\"entry-title\":{\"store\":\"yes\",\"index_options\":\"docs\",\"omit_norms\":\"true\",\"type\":\"string\",\"index\":\"not_analyzed\"},\"concept-id\":{\"store\":\"yes\",\"index_options\":\"docs\",\"omit_norms\":\"true\",\"type\":\"string\",\"index\":\"not_analyzed\"}},\"dynamic\":\"strict\",\"_source\":{\"enabled\":false},\"_id\":{\"path\":\"concept-id\"}}},\"settings\":{\"index\":{\"number_of_replicas\":0,\"refresh_interval\":\"20s\",\"number_of_shards\":1}}},\"id\":3}}"
curl -XGET "http://localhost:3004/index-sets/3"
curl -XGET "http://localhost:3004/index-sets"
curl -XDELETE "http://localhost:3004/index-sets/3"
There are multiple granule indexes for performance. Larger collections are split out into their own indexes. Smaller collections are grouped in a small_collections index. Once a collection gets to a certain size, we can manually 'rebalance' that collection by moving the collection's granule docs into a separate granule index. This process consists of the following steps:
- Mark the collection as rebalancing
- Finalize the rebalance process
- Update the rebalance status as COMPLETE
IMPORTANT: Rebalance only ONE collection at a time. Do not attempt to rebalance multiple collections at the same time.
Collection is added to the list of collections being rebalanced.
Required params:
- target = string
  - Options: separate-index and small-collections
    - If target=separate-index, a new granule index is created in addition to updating the index-set.
curl -XPOST http://localhost:3004/index-sets/3/rebalancing-collections/C5-PROV1/start?target=separate-index
Finalizing a rebalancing collection removes the collection from the list of collections that are being rebalanced and updates the index-set appropriately based on what the target destination was set to on the call to start.
curl -XPOST http://localhost:3004/index-sets/3/rebalancing-collections/C5-PROV1/finalize
Make changes to the collection's rebalancing status. This will update a mapping of collection id to rebalancing status in the index-set.
Required params:
- status = string
  - Options: COMPLETE
curl -XPOST http://localhost:3004/index-sets/3/rebalancing-collections/C5-PROV1/update-status?status=COMPLETE
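The three rebalancing calls can be laid out in order with a short sketch. It only assembles the URLs from the curl examples; the helper name `rebalance_steps` and the base URL are assumptions, and any work that happens between the calls is outside this sketch.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3004"  # assumption: local indexer, as in the curl examples
TARGETS = ("separate-index", "small-collections")


def rebalance_steps(index_set_id, concept_id, target):
    """Return the ordered (method, url) calls for rebalancing ONE collection."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")
    root = f"{BASE_URL}/index-sets/{index_set_id}/rebalancing-collections/{concept_id}"
    return [
        ("POST", f"{root}/start?{urlencode({'target': target})}"),
        ("POST", f"{root}/finalize"),
        ("POST", f"{root}/update-status?{urlencode({'status': 'COMPLETE'})}"),
    ]


for method, url in rebalance_steps(3, "C5-PROV1", "separate-index"):
    print(method, url)
```

Returning the steps for a single collection keeps the one-collection-at-a-time rule visible in the function signature.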
To reshard an index so that it has a different number of shards in the Elasticsearch clusters, perform the following steps using the APIs below:
- Start a reshard process
- Check the status
- Finalize the reshard process
IMPORTANT: Only reshard one index at a time. Make sure you start, check the status of, and finalize a reshard process COMPLETELY before resharding the next index.
Required params:
- num_shards = int (number of shards you want the index to have at the end)
- elastic_name = string (elastic cluster name you want to reshard in)
  - Options: gran-elastic or elastic
curl -XPOST "http://localhost:3004/index-sets/1/reshard/1_small_collections/start?num_shards=50&elastic_name=gran-elastic"
Required params:
- elastic_name = string (elastic cluster name you want to reshard in)
  - Options: gran-elastic or elastic
- task_id = string (elastic task id associated with the reindexing task that moves the original index content to the new index; it is given in the output of the /reshard//start API)
Expected return status options:
- IN_PROGRESS, COMPLETE, FAILED
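Since the status can stay IN_PROGRESS for a while, a caller would typically poll until it reaches a terminal value. The sketch below shows one way to do that. `fetch_status` is a hypothetical callable that wraps the status call with elastic_name and task_id already bound, and the polling interval and retry limit are placeholder values.

```python
import time

TERMINAL = {"COMPLETE", "FAILED"}


def wait_for_reshard(fetch_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the reshard reports COMPLETE or FAILED.

    fetch_status: hypothetical zero-argument callable returning one of
    "IN_PROGRESS", "COMPLETE" or "FAILED".
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status != "IN_PROGRESS":
            raise ValueError(f"unexpected reshard status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("reshard still IN_PROGRESS after max_polls checks")


# Usage with a canned sequence instead of a live cluster.
replies = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETE"])
print(wait_for_reshard(lambda: next(replies), sleep=lambda s: None))  # COMPLETE
```

Only a COMPLETE result should lead to the finalize call; a FAILED result is returned to the caller so it can decide how to recover.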
Once the status of the reshard returns 'COMPLETE', you can move on to finalizing the reshard process. Finalizing an index resharding moves the ES alias to point to the new resharded index and cleans up the index-set.
Required params:
- elastic_name = string (elastic cluster name you want to reshard in)
  - Options: gran-elastic or elastic
IMPORTANT: Only finalize if the reshard status is 'COMPLETE'
curl -XPOST http://localhost:3004/index-sets/1/reshard/1_small_collections/finalize?elastic_name=gran-elastic
Roll back the resharding of a specified index to its original state before the reshard was attempted.
You MUST provide the elastic_name parameter to tell CMR which cluster contains the index that is being resharded.
Required params:
- elastic_name = string (elastic cluster name you want to reshard in)
  - Options: gran-elastic or elastic
Rollback is allowed only if the reshard has not been finalized; otherwise it is not allowed.
curl -XPOST http://localhost:3004/index-sets/1/reshard/1_small_collections/rollback?elastic_name=gran-elastic
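The start, finalize, and rollback URLs share one shape, so a small builder can validate the parameters before any call is made. This is a sketch under assumptions: the function name `reshard_url` is hypothetical, the base URL mirrors the localhost examples, and the allowed cluster names are the two options listed above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3004"  # assumption: local indexer, as in the curl examples
CLUSTERS = ("gran-elastic", "elastic")
ACTIONS = ("start", "finalize", "rollback")


def reshard_url(index_set_id, index_name, action, elastic_name, num_shards=None):
    """Build the URL for one reshard call after checking its parameters."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    if elastic_name not in CLUSTERS:
        raise ValueError(f"elastic_name must be one of {CLUSTERS}, got {elastic_name!r}")
    params = {}
    if action == "start":
        # num_shards is only required when starting a reshard.
        if not isinstance(num_shards, int) or num_shards < 1:
            raise ValueError("num_shards must be a positive integer for start")
        params["num_shards"] = num_shards
    params["elastic_name"] = elastic_name
    path = f"/index-sets/{index_set_id}/reshard/{index_name}/{action}"
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(reshard_url(1, "1_small_collections", "start", "gran-elastic", num_shards=50))
print(reshard_url(1, "1_small_collections", "rollback", "gran-elastic"))
```

When a URL with more than one query parameter is pasted into a shell, it needs to be quoted, because an unquoted `&` ends the command and runs it in the background.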
curl -i -H "Accept: application/json" -H "Content-type: application/json" -XPOST "http://localhost:3004/reset"
curl http://localhost:9210/index_sets/_aliases?pretty=1
By default, version conflicts returned from Elasticsearch are ignored. Users can override the default by passing the query parameter "ignore_conflict=false" in the request.
The ingest application will publish messages for the indexer application to consume. The messages will be to index or delete concepts from elasticsearch. Messaging is handled using the message-queue-lib which uses RabbitMQ.
If an error occurs in the indexer, either because Elasticsearch is unavailable or because an unexpected error occurs during indexing, the indexer will catch that error. The message will be placed on a Wait Queue as described in the message-queue-lib README and retried with an exponential backoff after a set period of time. After the message has been successfully queued on the wait queue, the indexer will acknowledge the message.
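To illustrate what an exponential backoff schedule looks like, the sketch below computes the wait before each retry. The base delay, multiplier, and retry count are placeholder values chosen for the example. The real intervals are configured in message-queue-lib and are not stated here.

```python
def backoff_delays(base_seconds=5, factor=2, max_retries=5):
    """Return the wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


print(backoff_delays())  # [5, 10, 20, 40, 80]
```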
An uncaught error, such as the indexer dying or running out of memory, will be handled through non-acknowledgment of the message. RabbitMQ will consider the message as not having been processed and requeue it.
The indexer has a background job that monitors the RabbitMQ message queue size and logs it. If the message queue size exceeds the configured size (CMR_INDEXER_WARN_QUEUE_SIZE), the job will log extra information that Splunk can detect. The CMR team will add a Splunk alert that looks for the log message indicating the queue size has exceeded the threshold and emails CMR Operations.
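The threshold check could look roughly like the following. Only the environment variable name comes from this document. The function name, the fallback threshold of 1000, and the wording of the log line are all hypothetical.

```python
import os


def queue_size_log_line(queue_name, size, environ=None, default_threshold=1000):
    """Return the log line for a queue size, flagged when over the threshold."""
    environ = os.environ if environ is None else environ
    # default_threshold is a placeholder used only when the variable is unset.
    threshold = int(environ.get("CMR_INDEXER_WARN_QUEUE_SIZE", default_threshold))
    line = f"Message queue [{queue_name}] size is {size}"
    if size > threshold:
        # Fixed marker text (hypothetical) that a log alert could search for.
        line = f"WARN {line}; exceeds warning threshold {threshold}"
    return line


env = {"CMR_INDEXER_WARN_QUEUE_SIZE": "10"}
print(queue_size_log_line("index-queue", 3, environ=env))
print(queue_size_log_line("index-queue", 42, environ=env))
```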
- Get all index-sets response
[{:id 3,
:name "cmr-base-index-set",
:concepts
{:collection
{:c6_Collections "3_c6_collections",
:C4-collections "3_c4_collections"},
:granule
{:g5_prov5 "3_g5_prov5",
:G4-Prov3 "3_g4_prov3",
:G2-PROV1 "3_g2_prov1"}}}
{:id 55,
:name "cmr-base-index-set",
:concepts
{:collection
{:c6_Collections "55_c6_collections",
:C4-collections "55_c4_collections"},
:granule
{:g5_prov5 "55_g5_prov5",
:G4-Prov3 "55_g4_prov3",
:G2-PROV1 "55_g2_prov1"}}}]
- Get an index-set by id response
{:index-set
{:concepts
{:collection
{:c6_Collections "3_c6_collections",
:C4-collections "3_c4_collections"},
:granule
{:g5_prov5 "3_g5_prov5",
:G4-Prov3 "3_g4_prov3",
:G2-PROV1 "3_g2_prov1"}},
:name "cmr-base-index-set",
:create-reason
"include message about reasons for creating this index set",
:granule
{:index-names ["G2-PROV1" "G4-Prov3" "g5_prov5"],
:mapping
{:granule
{:_all {:enabled false},
:properties
{:collection-concept-id
{:store "yes",
:index_options "docs",
:norms false,
:type "string",
:index "not_analyzed"},
:concept-id
{:store "yes",
:index_options "docs",
:norms false,
:type "string",
:index "not_analyzed"}},
:dynamic "strict",
:_source {:enabled false},
:_id {:path "concept-id"}}},
:settings
{:index
{:number_of_replicas 0,
:refresh_interval "10s",
:number_of_shards 1}}},
:collection
{:index-names ["C4-collections" "c6_Collections"],
:mapping
{:collection
{:_all {:enabled false},
:properties
{:entry-title
{:store "yes",
:index_options "docs",
:norms false,
:type "string",
:index "not_analyzed"},
:concept-id
{:store "yes",
:index_options "docs",
:norms false,
:type "string",
:index "not_analyzed"}},
:dynamic "strict",
:_source {:enabled false},
:_id {:path "concept-id"}}},
:settings
{:index
{:number_of_replicas 0,
:refresh_interval "20s",
:number_of_shards 1}}},
:id 3}}
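The sample responses above suggest how each configured index name maps to a concrete index name: the index-set id is prefixed, the name is lowercased, and hyphens become underscores. The sketch below reproduces that mapping. It is inferred from the examples only and may not cover every rule the application applies. The function name `physical_index_name` is hypothetical.

```python
def physical_index_name(index_set_id, logical_name):
    """Derive the index name seen in the sample responses.

    Inferred from the examples only: prefix the index-set id, lowercase the
    name and replace hyphens with underscores.
    """
    return f"{index_set_id}_{logical_name.lower().replace('-', '_')}"


for name in ["G2-PROV1", "G4-Prov3", "g5_prov5", "C4-collections", "c6_Collections"]:
    print(name, "->", physical_index_name(3, name))
```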
Copyright © 2014-2021 NASA