Hyperparameters estimation for LDA #193

alex2304 · 2017-12-04T11:09:42Z

Hello, we would like to contribute several new features to MeTa: implementations of three methods for estimating the constant mu used by the current implementation of Dirichlet prior smoothing.

The ranker based on Dirichlet prior smoothing implemented in MeTa uses a smoothing parameter mu. Currently, the only options are to pass your own value for the parameter or to use the default, mu = 2000. However, it is possible to find an optimal value of the parameter for a particular set of documents (see H. Wallach, 2008, p. 18), which provides the most effective smoothing. In our contribution, we implemented three methods for estimating this optimal value of mu from the given parameters of the document set.
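For readers unfamiliar with the role of mu, here is a minimal Python sketch of the standard Dirichlet prior smoothing formula, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). It is an illustration only, not MeTa's C++ code; the function and argument names are hypothetical.

```python
def dirichlet_smoothed_prob(term_count, doc_length, collection_prob, mu=2000.0):
    """Smoothed probability of a term in a document.

    term_count      -- c(w, d), occurrences of the term in the document
    doc_length      -- |d|, total number of tokens in the document
    collection_prob -- p(w | C), the term's probability in the whole collection
    mu              -- smoothing constant (2000 is the default mentioned above)
    """
    return (term_count + mu * collection_prob) / (doc_length + mu)


# A small mu trusts the document counts; a large mu pulls the estimate
# towards the collection probability.
for mu in (0.0, 200.0, 2000.0, 20000.0):
    print(mu, dirichlet_smoothed_prob(5, 100, 0.01, mu))
```

With mu = 0 the estimate is the raw relative frequency, and as mu grows it approaches p(w|C), which is why a value of mu fitted to the collection can matter.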

The implemented methods are described in (H. Wallach, 2008, pages 26-30). They are modifications of the fixed-point iteration method intended to improve its performance.

Given the project architecture, we implemented each new method as a separate ranker (see the picture of the class hierarchy). We also added the ability to select these new rankers by specifying the following in the .toml config file:

[ranker]
method = "dirichlet-digamma-rec"

Full list of available methods:

dirichlet-digamma-rec - fixed-point iteration (Minka, 2003) using the digamma recurrence relation
digamma-log-approx - fixed-point iteration (Minka, 2003) using a logarithmic approximation of digamma differences
digamma-mackay-peto - fixed-point iteration (MacKay and Peto, 1995) with efficient computation of an inner parameter
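To make the first two variants concrete, here is a minimal Python sketch of the fixed-point update for a Dirichlet-multinomial model, with digamma differences computed either exactly through the recurrence Psi(z + n) - Psi(z) = sum_{i=0}^{n-1} 1/(z + i) or through a logarithmic approximation. It is not the C++ code from this PR. It assumes that a full parameter vector alpha is estimated from per-document term counts and that mu is taken as the sum of its components; all function names are hypothetical, and documents are assumed to be non-empty.

```python
import math


def digamma_diff(n, z):
    """Exact Psi(z + n) - Psi(z) for integer n >= 0, via the recurrence."""
    return sum(1.0 / (z + i) for i in range(n))


def digamma_diff_log_approx(n, z):
    """Logarithmic approximation of Psi(z + n) - Psi(z); exact for n = 0 and n = 1."""
    if n == 0:
        return 0.0
    return 1.0 / z + math.log((z + n - 0.5) / (z + 0.5))


def fixed_point_step(alpha, docs, diff=digamma_diff):
    """One fixed-point update of alpha given per-document term counts."""
    alpha_sum = sum(alpha)
    # The denominator depends only on document lengths and on sum(alpha).
    denom = sum(diff(sum(doc), alpha_sum) for doc in docs)
    return [
        a * sum(diff(doc[k], a) for doc in docs) / denom
        for k, a in enumerate(alpha)
    ]


def estimate_alpha(docs, alpha_init=1.0, max_iter=1000, tol=1e-9, diff=digamma_diff):
    """Iterate the update until the largest component change drops below tol."""
    alpha = [alpha_init] * len(docs[0])
    for _ in range(max_iter):
        new_alpha = fixed_point_step(alpha, docs, diff)
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha


# Toy corpus: each row holds the term counts of one document (3-term vocabulary).
docs = [[8, 1, 1], [1, 8, 1], [1, 1, 8], [5, 3, 2], [2, 5, 3], [3, 2, 5]]
alpha = estimate_alpha(docs)
print("estimated mu:", sum(alpha))
```

Passing `diff=digamma_diff_log_approx` switches to the approximate variant, which replaces each inner sum with a constant-time expression at the cost of some accuracy.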

We also verified that the methods work as expected, i.e. that the estimated mu is indeed optimal. To do this, we generated synthetic data from a Dirichlet distribution with predefined parameters and compared the estimates with those predefined values, as was done in H. Wallach, 2008. As in Wallach's work, we used three metrics to evaluate the methods' performance:

Execution time
Kullback-Leibler Divergence between "true" and computed distributions
Relative error of mu
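The last two metrics can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and which pair of distributions is compared (for example, the normalized true and estimated parameter vectors) is an assumption, since the description above does not spell it out.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i = 0 contribute nothing; q_i must be positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relative_error(true_value, estimate):
    """Absolute difference between estimate and truth, relative to the truth."""
    return abs(estimate - true_value) / abs(true_value)


true_m = [0.5, 0.25, 0.25]      # hypothetical "true" distribution
estimated_m = [0.4, 0.3, 0.3]   # hypothetical computed distribution
print(kl_divergence(true_m, estimated_m))
print(relative_error(2000.0, 1900.0))
```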

The parameters of the synthetic data we used and the results of the method comparison are presented here.
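As a rough idea of how such synthetic data can be produced, the following Python sketch draws a per-document term distribution from a Dirichlet with predefined parameters and then samples term counts from it. The parameter values, function names, and corpus sizes here are hypothetical and are not the ones used in our experiments.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_documents(alpha, num_docs, doc_length, seed=0):
    """Return a list of per-document term-count vectors."""
    rng = random.Random(seed)
    vocab = list(range(len(alpha)))
    docs = []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)  # this document's term distribution
        counts = [0] * len(alpha)
        for term in rng.choices(vocab, weights=theta, k=doc_length):
            counts[term] += 1
        docs.append(counts)
    return docs


# Predefined "true" parameters; the true mu is their sum (4.0 in this example).
true_alpha = [2.0, 1.0, 1.0]
docs = generate_documents(true_alpha, num_docs=5, doc_length=20, seed=1)
print(docs)
```

An estimator can then be run on the generated counts and its output compared with the sum of the predefined parameters.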

alex2304 and others added 30 commits November 18, 2017 17:52

+ added optimization.h and .gitignore updated

1aa7032

[opt] dirichlet_optimizer class, digamma function

5e3e23c

[opt] minka_fpi method draft

b8dbc7d

[opt] optimization.h errors fixed, test without MeTa

3dc03a8

[opt] debug output

eeb9168

Adding optimization.cpp

766754f

Merge branch 'develop' of https://github.com/alex2304/meta into develop

98c3e7d

[opt] classes for methods in dirichlet_prior

e9c99df

Merge branch 'develop' of https://github.com/alex2304/meta into develop

b11e704

Deletion of previous stuff

54d7272

Test for dirichlet optimizations

6585189

Private/public methods

c0a357c

[opt] test indexes

76d32ae

Interface for methods

4ccda58

Refactoring of optimization interface

248c151

[opt] tmp for merge

61ece78

Tests for all functions at same time

f979264

[opt] + term_ids()

ba00c86

[opt] merged dirichlet_prior

1f13f95

[opt] + first method without testing

ed475b5

[opt] *first method builds

4528ec6

[opt] * method works

312a485

[opt] *first method debugged

b60cc54

[opt] method refactored

0a0851c

[opt] + method2

d726f70

Adding constructors and register for new ranker classes

4a6a240

Merge branch 'develop' of https://github.com/alex2304/meta into develop

f55e0de

Add rankers to factory

bc948ce

[opt] + benchmark

78d6d5c

Merge branch 'develop' of https://github.com/alex2304/meta into develop

25d89d1

MakKolts and others added 6 commits November 30, 2017 21:37

Minor fix foor output

5bc6ee6

[opt] + dirichlet_opt files

4f8fa1d

[opt] + dirichlet_prior_opt

c8ddfbf

[opt] + MacKay and Peto method

f7b634a

[opt] + comments and docs

d4b0a8d

[opt] - test files

001fac6
