Commit aec31c4

to #76994268, fix bug of BST component (default dropout rate) (#546)
* to #76994268, fix bug of BST component (default dropout rate)
* upgrade zero inflated lognormal loss, support export structure path
* add document
1 parent f09c58f commit aec31c4

File tree

20 files changed: +593 −262 lines changed


.github/workflows/code_style.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -15,14 +15,15 @@ jobs:
       with:
         ref: ${{ github.event.pull_request.head.sha }}
         submodules: recursive
+
     - name: RunCiTest
       id: run_ci_test
       env:
        TEST_DEVICES: ""
        PULL_REQUEST_NUM: ${{ github.event.pull_request.number }}
       run: |
        source ~/.bashrc
-       source activate tf25_py3
+       conda activate tf25_py3
        pre-commit run -a
        if [ $? -eq 0 ]
        then
```
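The change above moves the workflow from `source activate`, the legacy pre-4.4 conda syntax, to the supported `conda activate`, which requires an initialized shell (here provided by `source ~/.bashrc`). A hedged sketch of an equivalent step using a login shell instead — the step name and env name are illustrative, not from this repo:

```yaml
# Sketch: a login shell (bash -l) sources the profile, so
# `conda activate` is defined without an explicit `source ~/.bashrc`.
- name: RunLint
  shell: bash -l {0}
  run: |
    conda activate tf25_py3
    pre-commit run -a
```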

.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -34,3 +34,5 @@ pai_jobs/easy_rec*.tar.gz
 
 .DS_Store
 .python-version
+easy_rec/python/test/odps_input_v3_test.py
+easy_rec/python/test.py
```

docs/source/export.md

Lines changed: 6 additions & 3 deletions
````diff
@@ -8,7 +8,7 @@ export_config {
 ```
 
 - batch_size: batch_size of the exported model; default -1, i.e. any batch_size is accepted
-- exporter_type: export type, best | final | latest | none; default final
+- exporter_type: export type, best | final | latest | none; default final
   - best: export the best model
   - final: export after training ends
   - latest: export the latest checkpoint
@@ -88,8 +88,8 @@ pai -name easy_rec_ext -project algo_public
 - -Dconfig: same as for training
 - -Dcmd: export, i.e. model export
 - -Dexport_dir: export directory
-- -Dcheckpoint_path: use the specified checkpoint_path
-- -Darn: rolearn. Note: replace this arn with the customer's own; it can be found in the DataWorks settings.
+- -Dcheckpoint_path: optional; export from the specified checkpoint_path
+- -Darn: rolearn. Note: replace this arn with the customer's own; it can be found in the DataWorks settings.
 - -DossHost: ossHost address
 - -Dbuckets: the bucket holding the config and the bucket for saving the model; separate multiple buckets with commas
   - On the internal PAI version, arn and ossHost need not be specified; put them into -Dbuckets
@@ -98,6 +98,9 @@ pai -name easy_rec_ext -project algo_public
 - --export_done_file: name of the export-done flag file; when export finishes, a file with this name is created in the export directory
 - --clear_export: delete the old export directory
 - --place_embedding_on_cpu: place embedding-related ops on CPU, which helps inference speed in GPU serving environments
+- --asset_files: paths of asset files to export; multiple paths may be set, comma-separated;
+  - to export into a subdirectory of the assets directory, use the `${target_path}:${source_path}` format (supported since version 0.8.7)
+  - e.g. '--asset_files custom_fg_lib/fg.json:oss://${bucket}/path/to/fg.json'
 - After export, the model can be deployed to PAI-EAS with the [EasyRecProcessor](./predict/在线预测.md)
 
 ### Two-tower retrieval model
````
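The `${target_path}:${source_path}` form introduced above can be told apart from a bare `oss://` path by checking whether the text after the first colon starts with `//`. A minimal Python sketch of parsing such a spec — an illustration of the documented format, not EasyRec's actual parser:

```python
import posixpath


def parse_asset_spec(spec):
  """Split an --asset_files entry into (target_in_assets, source_path).

  "target:source" exports source under assets/target; a bare path is
  exported under its own basename. Illustrative only.
  """
  head, sep, tail = spec.partition(':')
  if sep and not tail.startswith('//'):
    return head, tail  # explicit target:source form
  return posixpath.basename(spec), spec


print(parse_asset_spec('custom_fg_lib/fg.json:oss://my-bucket/path/to/fg.json'))
# → ('custom_fg_lib/fg.json', 'oss://my-bucket/path/to/fg.json')
```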

docs/source/feature/feature.rst

Lines changed: 4 additions & 0 deletions
```diff
@@ -130,6 +130,8 @@ RawFeature: continuous-value features
    -DossHost=oss-cn-beijing-internal.aliyuncs.com
    -Dwith_evaluator=1;
 
+When placed in a `DEEP`-type `feature_group`, configure it as follows:
+
 .. code:: protobuf
 
    feature_config:{
@@ -151,6 +153,8 @@ RawFeature: continuous-value features
      }
    }
 
+Note: when placed in a `WIDE`-type `feature_group`, `embedding_dim: 1` must be configured.
+
 The usage of the binning component is described in `机器学习组件 <https://help.aliyun.com/document_detail/54352.html>`_
 Binning information can also be imported manually, as follows:
```
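As a sketch of the note added above, a RawFeature placed in a `WIDE`-type group would carry the mandatory `embedding_dim: 1`; the feature name here is illustrative, not from the source:

```protobuf
feature_config: {
  features: {
    input_names: "price"    # illustrative feature name
    feature_type: RawFeature
    embedding_dim: 1        # required when used in a WIDE group
  }
}
```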

docs/source/feature/fg.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -182,7 +182,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
 - processor: easyrec processor; the latest version is easyrec-3.0, [historical versions](../predict/processor.md#release).
 - model_config: EAS deployment config, mainly controlling how item features are loaded into memory. Supported data sources: redis and holo
   - period: item feature reload period, in minutes
-  - url: holo url, in the format postgresql://<AccessKeyID>:<AccessKeySecret>@<domain>:<port>/<database>
+  - url: holo url, in the format postgresql://<AccessKeyID>:<AccessKeySecret>@\<domain>:<port>/<database>
 - fg_mode: two supported modes, tf and normal; tf mode runs fg as TF ops and performs better
 - tables: item features are stored in hologres tables; splitting across multiple tables is supported
   - key: required, name of the itemId column;
@@ -191,7 +191,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
   - timekey: optional, used for incremental item updates; supported formats: timestamp and int
   - static: optional, marks a static feature that needs no periodic refresh
 - Multiple item tables are supported; if several tables have duplicate columns, later tables override earlier ones
-  - "tables": [{"key":"table1", ...},{"key":"table2", ...}]
+  - "tables": \[{"key":"table1", ...},{"key":"table2", ...}\]
 - If several tables have duplicate columns, later tables override earlier ones
 - Each column of a hologres table stores one item feature, for example:
 <table class="docutils" border=1>
@@ -202,7 +202,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
 </table>
 - remote_type: item feature data source; currently supported: `hologres`, `none`
   - hologres: reads and writes data via a SQL interface; suited to storing and querying massive data
-  - none: no item feature cache; item features are passed in with the request, and tables should be set to []
+  - none: no item feature cache; item features are passed in with the request, and tables should be set to \[\]
 - storage: mount the OSS model directory to the specified directory inside docker
   - mount_path: mount path inside docker; keep it the same as in the example
 - When storage is configured, model_path is not needed
```
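The holo url documented above is a standard URL, so its pieces can be recovered with a stock parser. A small Python illustration — all credential and host values below are made up:

```python
from urllib.parse import urlparse

# Decompose a hologres connection url of the documented form:
# postgresql://<AccessKeyID>:<AccessKeySecret>@<domain>:<port>/<database>
url = 'postgresql://MY_AK_ID:MY_AK_SECRET@hgpostcn-cn-xxxx.hologres.aliyuncs.com:80/my_db'
parts = urlparse(url)
print(parts.username, parts.hostname, parts.port, parts.path.lstrip('/'))
```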

docs/source/models/loss.md

Lines changed: 58 additions & 2 deletions
````diff
@@ -53,7 +53,7 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 The configuration below uses `F1_REWEIGHTED_LOSS` and `PAIR_WISE_LOSS` at the same time; the total loss is the weighted sum of the two.
 
-```
+```protobuf
 losses {
   loss_type: F1_REWEIGHTED_LOSS
   weight: 1.0
@@ -71,7 +71,7 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 
 A loss function that tunes the relative weight of recall vs. precision in binary classification; configured as follows:
 
-```
+```protobuf
 {
   loss_type: F1_REWEIGHTED_LOSS
   f1_reweight_loss {
@@ -134,6 +134,10 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
   - session_name: the field used for list grouping, e.g. user_id
   - Reference paper: [Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model](https://arxiv.org/pdf/2208.06164.pdf)
   - Usage example: [dbmtl_with_jrc_loss.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/dbmtl_on_taobao_with_multi_loss.config)
+  - A few caveats:
+    1. Do not combine JRC_Loss with a plain binary-classification loss: it already contains one internally; it is best to try it on its own first
+    1. JRC_Loss relies on same-session sample pairs within a mini-batch, so samples must not be globally shuffled; group samples by session_id and shuffle samples of the same group together (this takes some SQL skill; if grouping is infeasible, at least keep samples of the same session adjacent, i.e. group by session_id)
+    1. During training, make `batch_size` as large as memory allows (e.g. 8192) before tuning other parameters (if needed)
 
 - Parameters of LISTWISE_RANK_LOSS
 
@@ -142,6 +146,33 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
   - label_is_logits: bool, whether the label is the teacher model's output logits; default false
   - scale_logits: bool, whether the model's logits need linear scaling; default false
 
+- Parameters of ZILN_LOSS
+
+  - mu_regularization: regularization coefficient of the mu parameter; default 0.01
+  - sigma_regularization: regularization coefficient of the sigma parameter; default 0.01
+  - max_sigma: maximum value of sigma; default 5.0 (sigma > 5 already multiplies the mean by a factor of exp(0.5\*25) ≈ 2.7e5, which is quite aggressive)
+  - max_log_clip_value: maximum of log(prediction); default 20.0 (so the maximum prediction defaults to exp(20))
+  - return_log_pred_value: whether to return log(prediction); default false
+  - classification_weight: weight of the classification task; default 1.0
+  - regression_weight: weight of the regression task; default 1.0; the more zero labels there are, the smaller the classification weight and the larger the regression weight should be
+  - Example configuration:
+  ```protobuf
+  losses {
+    loss_type: ZILN_LOSS
+    weight: 1.0
+    loss_name: "LTV"
+    ziln_loss {
+      mu_regularization: 0.01
+      sigma_regularization: 0.01
+      max_log_clip_value: 20.0
+      max_sigma: 5.0
+      return_log_pred_value: false
+      classification_weight: 1.0
+      regression_weight: 1.0
+    }
+  }
+  ```
+
 A complete example of a ranking model using multiple loss functions at once:
 [cmbf_with_multi_loss.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/cmbf_with_multi_loss.config)
 
@@ -183,6 +214,31 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 - loss_weight_strategy: Random
   - sets the loss-function weights to normalized random numbers
 
+### Setting loss weights per sample (Masked Loss)
+
+In multi-task learning it is often necessary to set loss weights according to sample attributes.
+
+#### Setting loss weights by sample attribute
+
+In a task's tower, configure `task_space_indicator_name` and `task_space_indicator_value`:
+
+- task_space_indicator_name is a feature name
+- task_space_indicator_value is a feature value
+- in_task_space_weight: loss weight for in-task-space samples; default 1.0
+- out_task_space_weight: loss weight for out-of-task-space samples; default 1.0
+
+If a sample's feature value equals the configured `task_space_indicator_value`, the loss weight is multiplied by `in_task_space_weight`;
+otherwise it is multiplied by `out_task_space_weight`.
+
+Setting `out_task_space_weight` to 0.0 yields a `Masked Loss`.
+
+#### Setting loss weights by sample label
+
+In a task's tower, configure the `task_space_indicator_label` field with the name of a label;
+if that label's value is greater than 0, the loss weight is multiplied by `in_task_space_weight`; otherwise it is multiplied by `out_task_space_weight`.
+
+Setting `out_task_space_weight` to 0.0 yields a `Masked Loss`.
+
 ### Reference papers:
 
 - 《Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics》
````
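The ZILN options documented above map onto the loss roughly as follows. This NumPy sketch follows the zero-inflated lognormal formulation (a binary "is the label positive" term plus a lognormal negative log-likelihood on positive labels); it is an illustration of the idea, not EasyRec's implementation:

```python
import numpy as np


def softplus(x):
  return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)


def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))


def ziln_loss(labels, logits, max_sigma=5.0, mu_reg=0.01, sigma_reg=0.01,
              class_weight=1.0, reg_weight=1.0):
  """Zero-inflated lognormal loss sketch.

  labels: [batch] non-negative targets (e.g. LTV).
  logits: [batch, 3] -> (classification logit, mu, raw sigma).
  """
  positive = (labels > 0).astype(np.float64)
  p_logit, mu, raw_sigma = logits[:, 0], logits[:, 1], logits[:, 2]
  sigma = np.minimum(softplus(raw_sigma), max_sigma)  # max_sigma clipping

  # classification term: probability that the label is positive
  p = sigmoid(p_logit)
  class_loss = -(positive * np.log(p) + (1.0 - positive) * np.log(1.0 - p))

  # regression term: lognormal NLL, applied to positive samples only
  safe = np.maximum(labels, 1e-7)
  log_pdf = (-np.log(safe * sigma * np.sqrt(2.0 * np.pi)) -
             (np.log(safe) - mu)**2 / (2.0 * sigma**2))
  reg_loss = -positive * log_pdf

  per_sample = class_weight * class_loss + reg_weight * reg_loss
  per_sample += mu_reg * mu**2 + sigma_reg * sigma**2  # L2 regularizers
  return per_sample.mean()
```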

docs/source/train.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -60,7 +60,7 @@
 - Use SyncReplicasOptimizer for distributed training (synchronous mode)
 - Can be set to true only when train_distribute is NoStrategy; in all other cases it should be false
 - Asynchronous PS training should also set it to false
-- Note: when set to true, the total number of training steps is min(total_sample_num \* num_epochs / batch_size, num_steps) / num_workers
+- Note: when set to true, the total number of training steps is min(total_sample_num * num_epochs / batch_size, num_steps) / num_workers
 
 - train_distribute: Strategy is off by default (NoStrategy); strategy determines how distributed execution runs, in two modes: PS-Worker mode and All-Reduce mode
```
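As a quick numeric illustration of the step formula above (all numbers below are hypothetical, using integer division):

```python
# min(total_sample_num * num_epochs / batch_size, num_steps) / num_workers
total_sample_num = 1_000_000  # hypothetical dataset size
num_epochs = 2
batch_size = 1024
num_steps = 10_000
num_workers = 4

epoch_steps = total_sample_num * num_epochs // batch_size  # 1953
total_steps = min(epoch_steps, num_steps) // num_workers
print(total_steps)  # → 488
```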

easy_rec/python/builders/loss_builder.py

Lines changed: 39 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -7,17 +7,19 @@
77

88
from easy_rec.python.loss.focal_loss import sigmoid_focal_loss_with_logits
99
from easy_rec.python.loss.jrc_loss import jrc_loss
10-
from easy_rec.python.loss.listwise_loss import listwise_distill_loss
11-
from easy_rec.python.loss.listwise_loss import listwise_rank_loss
12-
from easy_rec.python.loss.pairwise_loss import pairwise_focal_loss
13-
from easy_rec.python.loss.pairwise_loss import pairwise_hinge_loss
14-
from easy_rec.python.loss.pairwise_loss import pairwise_logistic_loss
15-
from easy_rec.python.loss.pairwise_loss import pairwise_loss
1610
from easy_rec.python.protos.loss_pb2 import LossType
1711

18-
from easy_rec.python.loss.zero_inflated_lognormal import zero_inflated_lognormal_loss # NOQA
19-
20-
from easy_rec.python.loss.f1_reweight_loss import f1_reweight_sigmoid_cross_entropy # NOQA
12+
from easy_rec.python.loss.f1_reweight_loss import ( # NOQA
13+
f1_reweight_sigmoid_cross_entropy,)
14+
from easy_rec.python.loss.listwise_loss import ( # NOQA
15+
listwise_distill_loss, listwise_rank_loss,
16+
)
17+
from easy_rec.python.loss.pairwise_loss import ( # NOQA
18+
pairwise_focal_loss, pairwise_hinge_loss, pairwise_logistic_loss,
19+
pairwise_loss,
20+
)
21+
from easy_rec.python.loss.zero_inflated_lognormal import ( # NOQA
22+
zero_inflated_lognormal_loss,)
2123

2224
if tf.__version__ >= '2.0':
2325
tf = tf.compat.v1
@@ -36,8 +38,10 @@ def build(loss_type,
3638
return tf.losses.sigmoid_cross_entropy(
3739
label, logits=pred, weights=loss_weight, **kwargs)
3840
else:
39-
assert label.dtype in [tf.int32, tf.int64], \
40-
'label.dtype must in [tf.int32, tf.int64] when use sparse_softmax_cross_entropy.'
41+
assert label.dtype in [
42+
tf.int32,
43+
tf.int64,
44+
], 'label.dtype must in [tf.int32, tf.int64] when use sparse_softmax_cross_entropy.'
4145
return tf.losses.sparse_softmax_cross_entropy(
4246
labels=label, logits=pred, weights=loss_weight, **kwargs)
4347
elif loss_type == LossType.CROSS_ENTROPY_LOSS:
@@ -50,7 +54,23 @@ def build(loss_type,
5054
return tf.losses.mean_squared_error(
5155
labels=label, predictions=pred, weights=loss_weight, **kwargs)
5256
elif loss_type == LossType.ZILN_LOSS:
53-
loss = zero_inflated_lognormal_loss(label, pred)
57+
if loss_param is None:
58+
loss = zero_inflated_lognormal_loss(label, pred)
59+
else:
60+
mu_reg = loss_param.mu_regularization
61+
sigma_reg = loss_param.sigma_regularization
62+
max_sigma = loss_param.max_sigma
63+
class_weight = loss_param.classification_weight
64+
reg_weight = loss_param.regression_weight
65+
loss = zero_inflated_lognormal_loss(
66+
label,
67+
pred,
68+
max_sigma=max_sigma,
69+
mu_reg=mu_reg,
70+
sigma_reg=sigma_reg,
71+
class_weight=class_weight,
72+
reg_weight=reg_weight,
73+
)
5474
if np.isscalar(loss_weight) and loss_weight != 1.0:
5575
return loss * loss_weight
5676
return loss
@@ -219,9 +239,9 @@ def build_kd_loss(kds, prediction_dict, label_dict, feature_dict):
219239
"""
220240
loss_dict = {}
221241
for kd in kds:
222-
assert kd.pred_name in prediction_dict, \
223-
'invalid predict_name: %s available ones: %s' % (
224-
kd.pred_name, ','.join(prediction_dict.keys()))
242+
assert kd.pred_name in prediction_dict, 'invalid predict_name: %s available ones: %s' % (
243+
kd.pred_name,
244+
','.join(prediction_dict.keys()))
225245

226246
loss_name = kd.loss_name
227247
if not loss_name:
@@ -232,8 +252,10 @@ def build_kd_loss(kds, prediction_dict, label_dict, feature_dict):
232252
if kd.HasField('task_space_indicator_name') and kd.HasField(
233253
'task_space_indicator_value'):
234254
in_task_space = tf.to_float(
235-
tf.equal(feature_dict[kd.task_space_indicator_name],
236-
kd.task_space_indicator_value))
255+
tf.equal(
256+
feature_dict[kd.task_space_indicator_name],
257+
kd.task_space_indicator_value,
258+
))
237259
loss_weight = loss_weight * (
238260
kd.in_task_space_weight * in_task_space + kd.out_task_space_weight *
239261
(1 - in_task_space))
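The weighting at the end of the diff above combines the in/out task-space weights into one per-sample multiplier. A minimal NumPy sketch of that arithmetic (the function name is illustrative, not from the source):

```python
import numpy as np


def masked_loss_weight(in_task_space, base_weight=1.0,
                       in_task_space_weight=1.0, out_task_space_weight=0.0):
  """Per-sample weight as combined in build_kd_loss above.

  in_task_space: 0/1 array marking samples inside the task space.
  With out_task_space_weight=0.0 this yields a "Masked Loss": samples
  outside the task space contribute nothing to the gradient.
  """
  return base_weight * (in_task_space_weight * in_task_space +
                        out_task_space_weight * (1.0 - in_task_space))


print(masked_loss_weight(np.array([1.0, 0.0, 1.0])))  # → [1. 0. 1.]
```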

easy_rec/python/compat/optimizers.py

Lines changed: 13 additions & 9 deletions
```diff
@@ -464,16 +464,20 @@ def _get_grad_norm(grads_and_vars, embedding_parallel=False):
       sparse_norms.append(gen_nn_ops.l2_loss(grad.values))
     else:
       dense_norms.append(gen_nn_ops.l2_loss(grad))
-  reduced_norms = hvd.grouped_allreduce(
-      part_norms, op=hvd.Sum, compression=hvd.compression.NoneCompressor)
-  sparse_norms = sparse_norms + reduced_norms
-  all_norms = reduced_norms + dense_norms
-  sparse_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(sparse_norms) * 2.0))
-  dense_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(dense_norms) * 2.0))
+  if hvd is not None and part_norms:
+    reduced_norms = hvd.grouped_allreduce(
+        part_norms, op=hvd.Sum, compression=hvd.compression.NoneCompressor)
+    sparse_norms = sparse_norms + reduced_norms
+  all_norms = sparse_norms + dense_norms
+  sparse_norm = (
+      math_ops.sqrt(math_ops.reduce_sum(array_ops.stack(sparse_norms) * 2.0))
+      if sparse_norms else tf.constant(0.0))
+  dense_norm = (
+      math_ops.sqrt(math_ops.reduce_sum(array_ops.stack(dense_norms) * 2.0))
+      if dense_norms else tf.constant(0.0))
   grad_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(all_norms)) * 2.0)
+      math_ops.reduce_sum(array_ops.stack(all_norms)) *
+      2.0) if all_norms else tf.constant(0.0)
   return sparse_norm, dense_norm, grad_norm
```
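The guards added in `_get_grad_norm` above fall back to 0.0 when a norm list is empty (and skip the allreduce when Horovod is absent). Since `l2_loss(g)` is `sum(g**2) / 2`, the global norm is `sqrt(sum(partials) * 2)`. A NumPy sketch of the guarded arithmetic, without Horovod or TensorFlow:

```python
import numpy as np


def l2_loss(x):
  # mirrors tf.nn.l2_loss: sum(x**2) / 2
  return float(np.sum(np.square(x)) / 2.0)


def grad_norms(sparse_grads, dense_grads):
  """Guarded norm computation in the spirit of _get_grad_norm.

  Empty lists fall back to 0.0, which is the guard the patch adds;
  without it, stacking an empty list would fail.
  """
  sparse = [l2_loss(g) for g in sparse_grads]
  dense = [l2_loss(g) for g in dense_grads]
  sparse_norm = np.sqrt(sum(sparse) * 2.0) if sparse else 0.0
  dense_norm = np.sqrt(sum(dense) * 2.0) if dense else 0.0
  all_parts = sparse + dense
  grad_norm = np.sqrt(sum(all_parts) * 2.0) if all_parts else 0.0
  return sparse_norm, dense_norm, grad_norm
```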
easy_rec/python/layers/cmbf.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -329,6 +329,7 @@ def __call__(self, is_training, *args, **kwargs):
     if not is_training:
       self._model_config.hidden_dropout_prob = 0.0
       self._model_config.attention_probs_dropout_prob = 0.0
+      self._model_config.text_seq_emb_dropout_prob = 0.0
 
     # shape: [batch_size, image_num/image_dim, hidden_size]
     img_attention_fea = self.image_self_attention_tower()
```
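The one-line fix above zeroes the remaining dropout rate at inference: every dropout must become the identity when not training, otherwise serving outputs are stochastic. A self-contained NumPy sketch of inverted dropout illustrating the convention (not EasyRec's layer):

```python
import numpy as np


def dropout(x, rate, is_training, seed=0):
  """Inverted dropout; the identity when not training.

  At inference (is_training=False) or rate 0, the input passes through
  unchanged; during training, kept units are scaled by 1 / (1 - rate).
  """
  if not is_training or rate == 0.0:
    return x
  keep = np.random.default_rng(seed).random(x.shape) >= rate
  return x * keep / (1.0 - rate)
```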
