Skip to content

[launchpad] Add support for launchpad#851

Merged
valeriocos merged 1 commit intochaoss:masterfrom
imnitishng:launchpad_elk
May 5, 2020
Merged

[launchpad] Add support for launchpad#851
valeriocos merged 1 commit intochaoss:masterfrom
imnitishng:launchpad_elk

Conversation

@imnitishng
Copy link
Copy Markdown
Contributor

This commit adds support for Launchpad perceval backend in ELK.
Raw and enriched classes have been added.

Signed-off-by: Nitish Gupta imnitish.ng@gmail.com

@imnitishng
Copy link
Copy Markdown
Contributor Author

Hi @valeriocos, here is the first iteration of launchpad backend for ELK.
Please have a look, will be adding schemas and tests soon.
Would like to hear from you what sort of enrichment or studies we can plan for this backend.
Also I observed, getting data from launchpad was a quite slow compared to other backend, so getting data from a huge source like Ubuntu, might be quite sluggish.

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng ! The start looks promising!

Would like to hear from you what sort of enrichment or studies we can plan for this backend.

In this PR, I would suggest to limit the scope of the enriched data to be produced. The studies and further enriched attributes can be added later on. It would be also convenient to list some metrics to be visualized (as done here: #831 (comment))

You can take a look at the following enrichers to see which is the typical information produced.

Also I observed, getting data from launchpad was a quite slow compared to other backend, so getting data from a huge source like Ubuntu, might be quite sluggish.

Thank you for the info. To speed up the testing of this backend, we can focus on small repos. Once we have a mature implementation, we can test it on something like Ubuntu. WDYT?

@imnitishng
Copy link
Copy Markdown
Contributor Author

Sure, I'll get back to you with the metrics.

@imnitishng
Copy link
Copy Markdown
Contributor Author

imnitishng commented Apr 19, 2020

Hi @valeriocos
I have updated the PR, reduced the data stored in enriched indexes, added some new attributes. Please have a look and suggest any improvements.

Here are some metrics I have in mind -

For a general overview

  • Count of bugs over time
  • Bugs by status over time (triaged, new, confirmed, fix released)
  • Bug Submitters over time
  • Most bugs submitted by users (table)
  • Bugs per Organizations
  • Bugs by organizations over time
  • Bugs with the most activity (table)

For timing study or in a different menu (like GitHub issues timing)

  • Status widget with number of bugs, submitters, assignees, median time open
  • Median time open, 80% time open
  • Bugs and submitters over time
  • Bugs by severity
  • Bugs count by their status
  • Time taken to fix (fix commit) a bug or time worked on (time duration from when the bug was marked in progress to fix committed)

A geolocation study can be an option giving us the visualization for

  • Assignee or owner by time zone

@valeriocos
Copy link
Copy Markdown
Member

Thank you @imnitishng for advancing on this pull request. Could you share some launchpad repos to have a common base to test the backend?

#851 (comment)

The metrics look good! I would skip the last metric (Assignee or owner by time zone) using the geolocation study, to avoid adding too much complexity to this PR. We can include studies in a different pull request.

Thanks!

@imnitishng
Copy link
Copy Markdown
Contributor Author

Yes sure.
For now I have been testing on the launchpad repo itself.
We can have the Terminator, MySQL repos as an option too.

I would skip the last metric (Assignee or owner by time zone) using the geolocation study, to avoid adding too much complexity to this PR.

Sure. Waiting on your suggestions for this iteration.

@imnitishng
Copy link
Copy Markdown
Contributor Author

Hi @valeriocos, I will start the work on panels and tests, would love to have a review so we can think upon adding or modifying some attributes.

@valeriocos
Copy link
Copy Markdown
Member

Thank you @imnitishng ! As soon as the work on the Gitter dashboard is completed, I'll review this PR. The idea behind this decision is to concentrate the effort in one thing, finalize it and move then to another task.

@imnitishng
Copy link
Copy Markdown
Contributor Author

Oh, okay sure. Thanks.

@valeriocos
Copy link
Copy Markdown
Member

Sorry @imnitishng , I'm late with this PR. I'll have a look at it at most tomorrow

Copy link
Copy Markdown
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @imnitishng sorry for the late review. Overall the code looks good and easy to ready, I left some comments.

I have tested the PR with:

setup.cfg
[launchpad]
raw_index = launchpad_raw_test
enriched_index = launchpad_enrich_test

projects.json
"launchpad": [
            "https://launchpad.net/launchpad"
        ]

Please check also the comments below:

  • Note that the Perceval backend (ref here) allows also to fetch data from a package in a distribution. This case doesn't seem to be covered by the current implementation

  • Rebase your PR, it isn't aligned with master

  • Start adding tests when you have time, the review process will be easier.

  • Update the PR description with the repos you are using (or plan to) to test the PR.

  • Update the documentation in ELK and mordred

Thanks!

@imnitishng imnitishng force-pushed the launchpad_elk branch 3 times, most recently from 8c98976 to 878ad25 Compare April 27, 2020 03:09
@imnitishng imnitishng changed the base branch from master to GeorgLink-patch-1 April 27, 2020 03:22
@imnitishng imnitishng changed the base branch from GeorgLink-patch-1 to master April 27, 2020 03:22
@imnitishng
Copy link
Copy Markdown
Contributor Author

@valeriocos Should I open a new PR, cannot get rid of this problem after I did a rebase. Please help.

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng , yes open a new PR, if you cannot advance. However, it is convenient to use this PR to get familiar with the rebase.

Can you list the steps you did when rebasing this PR? Thanks!

@imnitishng
Copy link
Copy Markdown
Contributor Author

imnitishng commented Apr 27, 2020

Thank you @valeriocos
Here is what I did,

git fetch upstream
git rebase upstream/master

Could not rebase
Realised I had not committed after adding tests

git commit -m "added tests and schemas" --signoff 
git rebase upstream/master
git rebase -i master

Squashed commits

git push --force-with-lease origin launchpad_elk 

I think it happened because I fetched before committing?

@imnitishng
Copy link
Copy Markdown
Contributor Author

I get the problem, trying to fix it.

@valeriocos
Copy link
Copy Markdown
Member

I guess that the problem could have been with the last instruction: git rebase -i master.
Once you execute git rebase upstream/master, an editor should be prompted to fix the conflicts. Once done, you don't need to rebase with master.

@imnitishng
Copy link
Copy Markdown
Contributor Author

imnitishng commented Apr 27, 2020

Well okay. I also fixed this PR.
@valeriocos, sorry for all the hassle 😅 but what do we do now?
Continue with this or the new PR?
Tests here will fail, will update the PR you choose to work on.

We can close #863 if we use this PR.

@coveralls
Copy link
Copy Markdown

coveralls commented Apr 27, 2020

Pull Request Test Coverage Report for Build 2181

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 40 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 81.704%

Files with Coverage Reduction New Missed Lines %
/home/travis/build/chaoss/grimoirelab-elk/grimoire_elk/utils.py 40 64.07%
Totals Coverage Status
Change from base Build 2179: 0.3%
Covered Lines: 8096
Relevant Lines: 9909

💛 - Coveralls

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng , no worries ! I'm fine with any of the two PRs. Please close one and I'll review the other one.

Thanks!

@imnitishng
Copy link
Copy Markdown
Contributor Author

Let's go with this one. Thanks.

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng , can you rebase your PR to include the changes of #864? Thanks!

@imnitishng
Copy link
Copy Markdown
Contributor Author

Done 😄

Copy link
Copy Markdown
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @imnitishng sorry for the late reply! The PR is in good shape, but some work is still needed.

@imnitishng
Copy link
Copy Markdown
Contributor Author

Hi @valeriocos, thank you for the review.
I have some queries, will update the PR after those discussions.

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng , thank you for the quick reply and comments!

I have answered your queries, let me know if I missed something.

@imnitishng
Copy link
Copy Markdown
Contributor Author

Hi @valeriocos I have updated the PR, please have a look.

Copy link
Copy Markdown
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @imnitishng for working on it. We are close to finalize the PR, please check the comments when you have time.

Please make sure that the schema is consistent with the final version of the enricher.

Thanks!

rich_bugtask["time_open_days"] = get_time_diff_days(data['date_created'], datetime_utcnow().replace(tzinfo=None))
else:
rich_bugtask["time_open_days"] = get_time_diff_days(data['date_closed'], data['date_created'])
rich_bugtask["created_to_assigned"] = get_time_diff_days(data['date_created'], data['date_assigned'])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency with the other time metrics, this attribute should start with time

else:
rich_bugtask["time_open_days"] = get_time_diff_days(data['date_closed'], data['date_created'])
rich_bugtask["created_to_assigned"] = get_time_diff_days(data['date_created'], data['date_assigned'])
rich_bugtask['assigned_to_closed'] = get_time_diff_days(data['date_assigned'], data['date_closed'])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

# Extract info related to bug
rich_bugtask.update(self.__extract_bug_info(data['bug_data']))

if len(data['activity_data']) > 1:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the recent change on Perceval (chaoss/grimoirelab-perceval@317e7f8), issues on package distribution cannot be fetched

Collection for launchpad: starting...
  2020-05-03 11:25:45,534 [launchpad] Incremental from: None for https://launchpad.net/python-defaults
  2020-05-03 11:25:45,546 Fetching issues of 'python-defaults' distribution from 1970-01-01 00:00:00+00:00
  2020-05-03 11:25:45,811 Error feeding raw from launchpad (https://launchpad.net/python-defaults): 404 Client Error: Not Found for url: https://api.launchpad.net/1.0/python-defaults?order_by=date_last_updated&omit_duplicates=false&status=New&status=Incomplete&status=Opinion&status=Invalid&status=Won%27t+Fix&status=Expired&status=Confirmed&status=Triaged&status=In+Progress&status=Fix+Committed&status=Fix+Released&status=Incomplete+%28with+response%29&status=Incomplete+%28without+response%29&ws.op=searchTasks&modified_since=1970-01-01T00%3A00%3A00%2B00%3A00
Traceback (most recent call last):
  File "/home/slimbook/Escritorio/sources/ELK/grimoire_elk/elk.py", line 166, in feed_backend
    ocean_backend.feed(**params)
  File "/home/slimbook/Escritorio/sources/ELK/grimoire_elk/raw/elastic.py", line 228, in feed
    self.feed_items(items)
  File "/home/slimbook/Escritorio/sources/ELK/grimoire_elk/raw/elastic.py", line 244, in feed_items
    for item in items:
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backend.py", line 226, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backends/core/launchpad.py", line 141, in fetch_items
    for issue in self._fetch_issues(from_date):
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backends/core/launchpad.py", line 221, in _fetch_issues
    for raw_issues in issues_groups:
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backends/core/launchpad.py", line 485, in __fetch_items
    raise e
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backends/core/launchpad.py", line 477, in __fetch_items
    raw_content = self.__send_request(url_next, payload)
  File "/home/slimbook/Escritorio/sources/perceval/perceval/backends/core/launchpad.py", line 445, in __send_request
    r = self.fetch(url, payload=params)
  File "/home/slimbook/Escritorio/sources/perceval/perceval/client.py", line 143, in fetch
    response = self._fetch_from_remote(url, payload, headers, method, stream, auth)
  File "/home/slimbook/Escritorio/sources/perceval/perceval/client.py", line 186, in _fetch_from_remote
    raise e
  File "/home/slimbook/Escritorio/sources/perceval/perceval/client.py", line 181, in _fetch_from_remote
    response.raise_for_status()
  File "/home/slimbook/Escritorio/sources/venv/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.launchpad.net/1.0/python-defaults?order_by=date_last_updated&omit_duplicates=false&status=New&status=Incomplete&status=Opinion&status=Invalid&status=Won%27t+Fix&status=Expired&status=Confirmed&status=Triaged&status=In+Progress&status=Fix+Committed&status=Fix+Released&status=Incomplete+%28with+response%29&status=Incomplete+%28without+response%29&ws.op=searchTasks&modified_since=1970-01-01T00%3A00%3A00%2B00%3A00
  2020-05-03 11:25:45,815 [launchpad] Done collection for https://launchpad.net/python-defaults

Please add a test case for the method get_perceval_params_from_url, thanks

time_open_days,float,true,"Time difference from bug created to bug closed for closed bugs, time created to now for open bugs(in days). """
time_to_close_days,float,true,"Time difference from bug created to bug closed (in days). "
time_to_confirm,float,true,"Time difference from bug created to bug confirmed (in days). "
time_to_first_activity,float,true,"Time difference from bug created to first activity in bug (in days). "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name hasn't been updated

reopened,boolean,true,"True if the bug was reopened. "
security_related,boolean,true,"True if the bug is security related. "
status,keyword,true,"Current status of the bug. "
tags,list,true,"Tags the bug was marked with. "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag is missing (this field comes from Perceval)

for eitem in res.json()['hits']['hits']:
self.assertEqual(eitem['_source']['project'], "Main")

def test_perceval_params(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should enhanced to cover URLs including package info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included it in the previous iteration.
Had to change it because of the new perceval implementation.
Will add it after packages discussion is finalized.

@imnitishng
Copy link
Copy Markdown
Contributor Author

imnitishng commented May 3, 2020

With the recent change on Perceval (chaoss/grimoirelab-perceval@317e7f8), issues on package distribution cannot be fetched


Collection for launchpad: starting...

    response.raise_for_status()
  File "/home/slimbook/Escritorio/sources/venv/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.launchpad.net/1.0/python-defaults?order_by=date_last_updated&omit_duplicates=false&status=New&status=Incomplete&status=Opinion&status=Invalid&status=Won%27t+Fix&status=Expired&status=Confirmed&status=Triaged&status=In+Progress&status=Fix+Committed&status=Fix+Released&status=Incomplete+%28with+response%29&status=Incomplete+%28without+response%29&ws.op=searchTasks&modified_since=1970-01-01T00%3A00%3A00%2B00%3A00
  2020-05-03 11:25:45,815 [launchpad] Done collection for https://launchpad.net/python-defaults

Hi @valeriocos, regarding this issue.
The implementation does work correctly for distribution packages but the caveat is we need to specify an extra parameter in the setup.cfg file.

[launchpad]
raw_index = launchpad_raw
enriched_index = launchpad_enriched
from-date = 2020-01-15
package = python-defaults

This will be a problem since now we need to change our projects file to add the URL to which this package belongs.

        "launchpad": [
            "https://launchpad.net/ubuntu"
        ],

Also, now we cannot run micro-mordred to fetch and enrich data from both packages and distributions in a single run which completely defies the purpose of projects.json file, since for every new package we will have to change the cfg file. (or remove the package parameter so we can fetch from distribution)
As far as I think we will have to change the perceval backend again, I have this option in mind

  • Remove the package parameter completely as everything was working fine with the last version of backend. And we did support packages as well as distributions.
    So our projects.json file will be something like this
        "launchpad": [
            "https://launchpad.net/launchpad",
            "https://launchpad.net/ubuntu/+source/python-defaults",
        ],

able to fetch data from both distros and packages without any concern.

This will be used for retrieving packages

perceval launchpad ubuntu/+source/python-defaults --from-date '2016-01-01'

This for distribution

perceval launchpad ubuntu --from-date '2016-01-01'

We can add a detailed note in README so the users know usages for both the cases.
Do let me know if you have some other way in mind.

I did think of including an optional boolean parameter --package in perceval if we choose to retrieve data from packages, but that would again cause problems if we specify it as true in setup.cfg and specify the URL of a distribution in projects.json.
The main goal is to run mordred for both packages and distributions in a single run without changing config files everytime.

@valeriocos
Copy link
Copy Markdown
Member

Hi @imnitishng , thank you for having a look at this!

This will be a problem since now we need to change our projects file to add the URL to which this package belongs.

You are right! We cannot pass this param via the setup.cfg, since the latter is not suitable to handle params for the single repositories.

ELK provides the method get_perceval_params_from_url to process the URLs in the projects.json and shape them in a format suitable to execute Perceval. A possible implementation for the method for Launchpad could be the one below:

    @classmethod
    def get_perceval_params_from_url(cls, url):
        params = []
        splits = url.split('/')
        if '+source' in url:
            params.append(splits[-3])
            params.append('--package')
            params.append(splits[-1])
        else:
            params.append(splits[-1])

        return params

Similar solutions are applied to other backends to pass optional params:

WDYT?

This commit adds support for launchpad perceval backend.
Added raw and enriched indexes.
Added tests and schemas.

Signed-off-by: Nitish Gupta <imnitish.ng@gmail.com>
@imnitishng
Copy link
Copy Markdown
Contributor Author

@valeriocos, thank you. So stupid of me, I did not think it this way.
I have updated the PR, please have a look. Let me know if some more work is needed.

Copy link
Copy Markdown
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome @imnitishng ! No worries!

LGTM, thanks! good job!

@valeriocos valeriocos merged commit 493bd04 into chaoss:master May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants