Commit aec31c4

to #76994268, fix bug of BST component (default dropout rate) (#546)
* to #76994268, fix bug of BST component (default dropout rate)
* upgrade zero inflated lognormal loss, support export structure path
* add document
1 parent f09c58f commit aec31c4

File tree

20 files changed: +593 −262 lines changed


.github/workflows/code_style.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -15,14 +15,15 @@ jobs:
       with:
         ref: ${{ github.event.pull_request.head.sha }}
         submodules: recursive
+
     - name: RunCiTest
       id: run_ci_test
       env:
        TEST_DEVICES: ""
        PULL_REQUEST_NUM: ${{ github.event.pull_request.number }}
       run: |
        source ~/.bashrc
-       source activate tf25_py3
+       conda activate tf25_py3
        pre-commit run -a
        if [ $? -eq 0 ]
        then
```
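The change above moves the workflow from `source activate`, the legacy pre-4.4 conda syntax, to the supported `conda activate`, which requires an initialized shell (here provided by `source ~/.bashrc`). A hedged sketch of an equivalent step using a login shell instead — the step name and env name are illustrative, not from this repo:

```yaml
# Sketch: a login shell (bash -l) sources the profile, so
# `conda activate` is defined without an explicit `source ~/.bashrc`.
- name: RunLint
  shell: bash -l {0}
  run: |
    conda activate tf25_py3
    pre-commit run -a
```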

.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -34,3 +34,5 @@ pai_jobs/easy_rec*.tar.gz
 
 .DS_Store
 .python-version
+easy_rec/python/test/odps_input_v3_test.py
+easy_rec/python/test.py
```

docs/source/export.md

Lines changed: 6 additions & 3 deletions
````diff
@@ -8,7 +8,7 @@ export_config {
 ```
 
 - batch_size: batch_size of the exported model; default -1, i.e. any batch_size is accepted
-- exporter_type: export type, best | final | latest | none; default final
+- exporter_type: export type, best | final | latest | none; default final
   - best: export the best model
   - final: export after training ends
   - latest: export the latest checkpoint
@@ -88,8 +88,8 @@ pai -name easy_rec_ext -project algo_public
 - -Dconfig: same as for training
 - -Dcmd: export, i.e. model export
 - -Dexport_dir: export directory
-- -Dcheckpoint_path: use the specified checkpoint_path
-- -Darn: rolearn. Note: replace this arn with the customer's own; it can be found in the DataWorks settings.
+- -Dcheckpoint_path: optional; export from the specified checkpoint_path
+- -Darn: rolearn. Note: replace this arn with the customer's own; it can be found in the DataWorks settings.
 - -DossHost: ossHost address
 - -Dbuckets: the bucket holding the config and the bucket for saving the model; separate multiple buckets with commas
   - On the internal PAI version, arn and ossHost need not be specified; put them into -Dbuckets
@@ -98,6 +98,9 @@ pai -name easy_rec_ext -project algo_public
 - --export_done_file: name of the export-done flag file; when export finishes, a file with this name is created in the export directory
 - --clear_export: delete the old export directory
 - --place_embedding_on_cpu: place embedding-related ops on CPU, which helps inference speed in GPU serving environments
+- --asset_files: paths of asset files to export; multiple paths may be set, comma-separated;
+  - to export into a subdirectory of the assets directory, use the `${target_path}:${source_path}` format (supported since version 0.8.7)
+  - e.g. '--asset_files custom_fg_lib/fg.json:oss://${bucket}/path/to/fg.json'
 - After export, the model can be deployed to PAI-EAS with the [EasyRecProcessor](./predict/在线预测.md)
 
 ### Two-tower retrieval model
````
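The `${target_path}:${source_path}` form introduced above can be told apart from a bare `oss://` path by checking whether the text after the first colon starts with `//`. A minimal Python sketch of parsing such a spec — an illustration of the documented format, not EasyRec's actual parser:

```python
import posixpath


def parse_asset_spec(spec):
  """Split an --asset_files entry into (target_in_assets, source_path).

  "target:source" exports source under assets/target; a bare path is
  exported under its own basename. Illustrative only.
  """
  head, sep, tail = spec.partition(':')
  if sep and not tail.startswith('//'):
    return head, tail  # explicit target:source form
  return posixpath.basename(spec), spec


print(parse_asset_spec('custom_fg_lib/fg.json:oss://my-bucket/path/to/fg.json'))
# → ('custom_fg_lib/fg.json', 'oss://my-bucket/path/to/fg.json')
```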

docs/source/feature/feature.rst

Lines changed: 4 additions & 0 deletions
```diff
@@ -130,6 +130,8 @@ RawFeature: continuous-value features
    -DossHost=oss-cn-beijing-internal.aliyuncs.com
    -Dwith_evaluator=1;
 
+When placed in a `DEEP`-type `feature_group`, configure it as follows:
+
 .. code:: protobuf
 
    feature_config:{
@@ -151,6 +153,8 @@ RawFeature: continuous-value features
      }
    }
 
+Note: when placed in a `WIDE`-type `feature_group`, `embedding_dim: 1` must be configured.
+
 The usage of the binning component is described in `机器学习组件 <https://help.aliyun.com/document_detail/54352.html>`_
 Binning information can also be imported manually, as follows:
```
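As a sketch of the note added above, a RawFeature placed in a `WIDE`-type group would carry the mandatory `embedding_dim: 1`; the feature name here is illustrative, not from the source:

```protobuf
feature_config: {
  features: {
    input_names: "price"    # illustrative feature name
    feature_type: RawFeature
    embedding_dim: 1        # required when used in a WIDE group
  }
}
```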

docs/source/feature/fg.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -182,7 +182,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
 - processor: easyrec processor; the latest version is easyrec-3.0, [historical versions](../predict/processor.md#release).
 - model_config: EAS deployment config, mainly controlling how item features are loaded into memory. Supported data sources: redis and holo
   - period: item feature reload period, in minutes
-  - url: holo url, in the format postgresql://<AccessKeyID>:<AccessKeySecret>@<domain>:<port>/<database>
+  - url: holo url, in the format postgresql://<AccessKeyID>:<AccessKeySecret>@\<domain>:<port>/<database>
 - fg_mode: two supported modes, tf and normal; tf mode runs fg as TF ops and performs better
 - tables: item features are stored in hologres tables; splitting across multiple tables is supported
   - key: required, name of the itemId column;
@@ -191,7 +191,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
   - timekey: optional, used for incremental item updates; supported formats: timestamp and int
   - static: optional, marks a static feature that needs no periodic refresh
 - Multiple item tables are supported; if several tables have duplicate columns, later tables override earlier ones
-  - "tables": [{"key":"table1", ...},{"key":"table2", ...}]
+  - "tables": \[{"key":"table1", ...},{"key":"table2", ...}\]
 - If several tables have duplicate columns, later tables override earlier ones
 - Each column of a hologres table stores one item feature, for example:
 <table class="docutils" border=1>
@@ -202,7 +202,7 @@ eascmd -i <AccessKeyID> -k <AccessKeySecret> -e <EndPoint> update ali_rec_rn
 </table>
 - remote_type: item feature data source; currently supported: `hologres`, `none`
   - hologres: reads and writes data via a SQL interface; suited to storing and querying massive data
-  - none: no item feature cache; item features are passed in with the request, and tables should be set to []
+  - none: no item feature cache; item features are passed in with the request, and tables should be set to \[\]
 - storage: mount the OSS model directory to the specified directory inside docker
   - mount_path: mount path inside docker; keep it the same as in the example
 - When storage is configured, model_path is not needed
```
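The holo url documented above is a standard URL, so its pieces can be recovered with a stock parser. A small Python illustration — all credential and host values below are made up:

```python
from urllib.parse import urlparse

# Decompose a hologres connection url of the documented form:
# postgresql://<AccessKeyID>:<AccessKeySecret>@<domain>:<port>/<database>
url = 'postgresql://MY_AK_ID:MY_AK_SECRET@hgpostcn-cn-xxxx.hologres.aliyuncs.com:80/my_db'
parts = urlparse(url)
print(parts.username, parts.hostname, parts.port, parts.path.lstrip('/'))
```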

docs/source/models/loss.md

Lines changed: 58 additions & 2 deletions
````diff
@@ -53,7 +53,7 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 The configuration below uses `F1_REWEIGHTED_LOSS` and `PAIR_WISE_LOSS` at the same time; the total loss is the weighted sum of the two.
 
-```
+```protobuf
 losses {
   loss_type: F1_REWEIGHTED_LOSS
   weight: 1.0
@@ -71,7 +71,7 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 
 A loss function that tunes the relative weight of recall vs. precision in binary classification; configured as follows:
 
-```
+```protobuf
 {
   loss_type: F1_REWEIGHTED_LOSS
   f1_reweight_loss {
@@ -134,6 +134,10 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
   - session_name: the field used for list grouping, e.g. user_id
   - Reference paper: [Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model](https://arxiv.org/pdf/2208.06164.pdf)
   - Usage example: [dbmtl_with_jrc_loss.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/dbmtl_on_taobao_with_multi_loss.config)
+  - A few caveats:
+    1. Do not combine JRC_Loss with a plain binary-classification loss: it already contains one internally; it is best to try it on its own first
+    1. JRC_Loss relies on same-session sample pairs within a mini-batch, so samples must not be globally shuffled; group samples by session_id and shuffle samples of the same group together (this takes some SQL skill; if grouping is infeasible, at least keep samples of the same session adjacent, i.e. group by session_id)
+    1. During training, make `batch_size` as large as memory allows (e.g. 8192) before tuning other parameters (if needed)
 
 - Parameters of LISTWISE_RANK_LOSS
 
@@ -142,6 +146,33 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
   - label_is_logits: bool, whether the label is the teacher model's output logits; default false
   - scale_logits: bool, whether the model's logits need linear scaling; default false
 
+- Parameters of ZILN_LOSS
+
+  - mu_regularization: regularization coefficient of the mu parameter; default 0.01
+  - sigma_regularization: regularization coefficient of the sigma parameter; default 0.01
+  - max_sigma: maximum value of sigma; default 5.0 (sigma > 5 already multiplies the mean by a factor of exp(0.5\*25) ≈ 2.7e5, which is quite aggressive)
+  - max_log_clip_value: maximum of log(prediction); default 20.0 (so the maximum prediction defaults to exp(20))
+  - return_log_pred_value: whether to return log(prediction); default false
+  - classification_weight: weight of the classification task; default 1.0
+  - regression_weight: weight of the regression task; default 1.0; the more zero labels there are, the smaller the classification weight and the larger the regression weight should be
+  - Example configuration:
+  ```protobuf
+  losses {
+    loss_type: ZILN_LOSS
+    weight: 1.0
+    loss_name: "LTV"
+    ziln_loss {
+      mu_regularization: 0.01
+      sigma_regularization: 0.01
+      max_log_clip_value: 20.0
+      max_sigma: 5.0
+      return_log_pred_value: false
+      classification_weight: 1.0
+      regression_weight: 1.0
+    }
+  }
+  ```
+
 A complete example of a ranking model using multiple loss functions at once:
 [cmbf_with_multi_loss.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/cmbf_with_multi_loss.config)
 
@@ -183,6 +214,31 @@ EasyRec supports two ways to configure loss functions: 1) a single loss; 2
 - loss_weight_strategy: Random
   - sets the loss-function weights to normalized random numbers
 
+### Setting loss weights per sample (Masked Loss)
+
+In multi-task learning it is often necessary to set loss weights according to sample attributes.
+
+#### Setting loss weights by sample attribute
+
+In a task's tower, configure `task_space_indicator_name` and `task_space_indicator_value`:
+
+- task_space_indicator_name is a feature name
+- task_space_indicator_value is a feature value
+- in_task_space_weight: loss weight for in-task-space samples; default 1.0
+- out_task_space_weight: loss weight for out-of-task-space samples; default 1.0
+
+If a sample's feature value equals the configured `task_space_indicator_value`, the loss weight is multiplied by `in_task_space_weight`;
+otherwise it is multiplied by `out_task_space_weight`.
+
+Setting `out_task_space_weight` to 0.0 yields a `Masked Loss`.
+
+#### Setting loss weights by sample label
+
+In a task's tower, configure the `task_space_indicator_label` field with the name of a label;
+if that label's value is greater than 0, the loss weight is multiplied by `in_task_space_weight`; otherwise it is multiplied by `out_task_space_weight`.
+
+Setting `out_task_space_weight` to 0.0 yields a `Masked Loss`.
+
 ### Reference papers:
 
 - 《Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics》
````
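The ZILN options documented above map onto the loss roughly as follows. This NumPy sketch follows the zero-inflated lognormal formulation (a binary "is the label positive" term plus a lognormal negative log-likelihood on positive labels); it is an illustration of the idea, not EasyRec's implementation:

```python
import numpy as np


def softplus(x):
  return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)


def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))


def ziln_loss(labels, logits, max_sigma=5.0, mu_reg=0.01, sigma_reg=0.01,
              class_weight=1.0, reg_weight=1.0):
  """Zero-inflated lognormal loss sketch.

  labels: [batch] non-negative targets (e.g. LTV).
  logits: [batch, 3] -> (classification logit, mu, raw sigma).
  """
  positive = (labels > 0).astype(np.float64)
  p_logit, mu, raw_sigma = logits[:, 0], logits[:, 1], logits[:, 2]
  sigma = np.minimum(softplus(raw_sigma), max_sigma)  # max_sigma clipping

  # classification term: probability that the label is positive
  p = sigmoid(p_logit)
  class_loss = -(positive * np.log(p) + (1.0 - positive) * np.log(1.0 - p))

  # regression term: lognormal NLL, applied to positive samples only
  safe = np.maximum(labels, 1e-7)
  log_pdf = (-np.log(safe * sigma * np.sqrt(2.0 * np.pi)) -
             (np.log(safe) - mu)**2 / (2.0 * sigma**2))
  reg_loss = -positive * log_pdf

  per_sample = class_weight * class_loss + reg_weight * reg_loss
  per_sample += mu_reg * mu**2 + sigma_reg * sigma**2  # L2 regularizers
  return per_sample.mean()
```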

docs/source/train.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -60,7 +60,7 @@
 - Use SyncReplicasOptimizer for distributed training (synchronous mode)
 - Can be set to true only when train_distribute is NoStrategy; in all other cases it should be false
 - Asynchronous PS training should also set it to false
-- Note: when set to true, the total number of training steps is min(total_sample_num \* num_epochs / batch_size, num_steps) / num_workers
+- Note: when set to true, the total number of training steps is min(total_sample_num * num_epochs / batch_size, num_steps) / num_workers
 
 - train_distribute: Strategy is off by default (NoStrategy); strategy determines how distributed execution runs, in two modes: PS-Worker mode and All-Reduce mode
```
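As a quick numeric illustration of the step formula above (all numbers below are hypothetical, using integer division):

```python
# min(total_sample_num * num_epochs / batch_size, num_steps) / num_workers
total_sample_num = 1_000_000  # hypothetical dataset size
num_epochs = 2
batch_size = 1024
num_steps = 10_000
num_workers = 4

epoch_steps = total_sample_num * num_epochs // batch_size  # 1953
total_steps = min(epoch_steps, num_steps) // num_workers
print(total_steps)  # → 488
```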

easy_rec/python/builders/loss_builder.py

Lines changed: 39 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -7,17 +7,19 @@
77

88
from easy_rec.python.loss.focal_loss import sigmoid_focal_loss_with_logits
99
from easy_rec.python.loss.jrc_loss import jrc_loss
10-
from easy_rec.python.loss.listwise_loss import listwise_distill_loss
11-
from easy_rec.python.loss.listwise_loss import listwise_rank_loss
12-
from easy_rec.python.loss.pairwise_loss import pairwise_focal_loss
13-
from easy_rec.python.loss.pairwise_loss import pairwise_hinge_loss
14-
from easy_rec.python.loss.pairwise_loss import pairwise_logistic_loss
15-
from easy_rec.python.loss.pairwise_loss import pairwise_loss
1610
from easy_rec.python.protos.loss_pb2 import LossType
1711

18-
from easy_rec.python.loss.zero_inflated_lognormal import zero_inflated_lognormal_loss # NOQA
19-
20-
from easy_rec.python.loss.f1_reweight_loss import f1_reweight_sigmoid_cross_entropy # NOQA
12+
from easy_rec.python.loss.f1_reweight_loss import ( # NOQA
13+
f1_reweight_sigmoid_cross_entropy,)
14+
from easy_rec.python.loss.listwise_loss import ( # NOQA
15+
listwise_distill_loss, listwise_rank_loss,
16+
)
17+
from easy_rec.python.loss.pairwise_loss import ( # NOQA
18+
pairwise_focal_loss, pairwise_hinge_loss, pairwise_logistic_loss,
19+
pairwise_loss,
20+
)
21+
from easy_rec.python.loss.zero_inflated_lognormal import ( # NOQA
22+
zero_inflated_lognormal_loss,)
2123

2224
if tf.__version__ >= '2.0':
2325
tf = tf.compat.v1
@@ -36,8 +38,10 @@ def build(loss_type,
3638
return tf.losses.sigmoid_cross_entropy(
3739
label, logits=pred, weights=loss_weight, **kwargs)
3840
else:
39-
assert label.dtype in [tf.int32, tf.int64], \
40-
'label.dtype must in [tf.int32, tf.int64] when use sparse_softmax_cross_entropy.'
41+
assert label.dtype in [
42+
tf.int32,
43+
tf.int64,
44+
], 'label.dtype must in [tf.int32, tf.int64] when use sparse_softmax_cross_entropy.'
4145
return tf.losses.sparse_softmax_cross_entropy(
4246
labels=label, logits=pred, weights=loss_weight, **kwargs)
4347
elif loss_type == LossType.CROSS_ENTROPY_LOSS:
@@ -50,7 +54,23 @@ def build(loss_type,
5054
return tf.losses.mean_squared_error(
5155
labels=label, predictions=pred, weights=loss_weight, **kwargs)
5256
elif loss_type == LossType.ZILN_LOSS:
53-
loss = zero_inflated_lognormal_loss(label, pred)
57+
if loss_param is None:
58+
loss = zero_inflated_lognormal_loss(label, pred)
59+
else:
60+
mu_reg = loss_param.mu_regularization
61+
sigma_reg = loss_param.sigma_regularization
62+
max_sigma = loss_param.max_sigma
63+
class_weight = loss_param.classification_weight
64+
reg_weight = loss_param.regression_weight
65+
loss = zero_inflated_lognormal_loss(
66+
label,
67+
pred,
68+
max_sigma=max_sigma,
69+
mu_reg=mu_reg,
70+
sigma_reg=sigma_reg,
71+
class_weight=class_weight,
72+
reg_weight=reg_weight,
73+
)
5474
if np.isscalar(loss_weight) and loss_weight != 1.0:
5575
return loss * loss_weight
5676
return loss
@@ -219,9 +239,9 @@ def build_kd_loss(kds, prediction_dict, label_dict, feature_dict):
219239
"""
220240
loss_dict = {}
221241
for kd in kds:
222-
assert kd.pred_name in prediction_dict, \
223-
'invalid predict_name: %s available ones: %s' % (
224-
kd.pred_name, ','.join(prediction_dict.keys()))
242+
assert kd.pred_name in prediction_dict, 'invalid predict_name: %s available ones: %s' % (
243+
kd.pred_name,
244+
','.join(prediction_dict.keys()))
225245

226246
loss_name = kd.loss_name
227247
if not loss_name:
@@ -232,8 +252,10 @@ def build_kd_loss(kds, prediction_dict, label_dict, feature_dict):
232252
if kd.HasField('task_space_indicator_name') and kd.HasField(
233253
'task_space_indicator_value'):
234254
in_task_space = tf.to_float(
235-
tf.equal(feature_dict[kd.task_space_indicator_name],
236-
kd.task_space_indicator_value))
255+
tf.equal(
256+
feature_dict[kd.task_space_indicator_name],
257+
kd.task_space_indicator_value,
258+
))
237259
loss_weight = loss_weight * (
238260
kd.in_task_space_weight * in_task_space + kd.out_task_space_weight *
239261
(1 - in_task_space))
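The weighting at the end of the diff above combines the in/out task-space weights into one per-sample multiplier. A minimal NumPy sketch of that arithmetic (the function name is illustrative, not from the source):

```python
import numpy as np


def masked_loss_weight(in_task_space, base_weight=1.0,
                       in_task_space_weight=1.0, out_task_space_weight=0.0):
  """Per-sample weight as combined in build_kd_loss above.

  in_task_space: 0/1 array marking samples inside the task space.
  With out_task_space_weight=0.0 this yields a "Masked Loss": samples
  outside the task space contribute nothing to the gradient.
  """
  return base_weight * (in_task_space_weight * in_task_space +
                        out_task_space_weight * (1.0 - in_task_space))


print(masked_loss_weight(np.array([1.0, 0.0, 1.0])))  # → [1. 0. 1.]
```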

easy_rec/python/compat/optimizers.py

Lines changed: 13 additions & 9 deletions
```diff
@@ -464,16 +464,20 @@ def _get_grad_norm(grads_and_vars, embedding_parallel=False):
       sparse_norms.append(gen_nn_ops.l2_loss(grad.values))
     else:
       dense_norms.append(gen_nn_ops.l2_loss(grad))
-  reduced_norms = hvd.grouped_allreduce(
-      part_norms, op=hvd.Sum, compression=hvd.compression.NoneCompressor)
-  sparse_norms = sparse_norms + reduced_norms
-  all_norms = reduced_norms + dense_norms
-  sparse_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(sparse_norms) * 2.0))
-  dense_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(dense_norms) * 2.0))
+  if hvd is not None and part_norms:
+    reduced_norms = hvd.grouped_allreduce(
+        part_norms, op=hvd.Sum, compression=hvd.compression.NoneCompressor)
+    sparse_norms = sparse_norms + reduced_norms
+  all_norms = sparse_norms + dense_norms
+  sparse_norm = (
+      math_ops.sqrt(math_ops.reduce_sum(array_ops.stack(sparse_norms) * 2.0))
+      if sparse_norms else tf.constant(0.0))
+  dense_norm = (
+      math_ops.sqrt(math_ops.reduce_sum(array_ops.stack(dense_norms) * 2.0))
+      if dense_norms else tf.constant(0.0))
   grad_norm = math_ops.sqrt(
-      math_ops.reduce_sum(array_ops.stack(all_norms)) * 2.0)
+      math_ops.reduce_sum(array_ops.stack(all_norms)) *
+      2.0) if all_norms else tf.constant(0.0)
   return sparse_norm, dense_norm, grad_norm
```
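The guards added in `_get_grad_norm` above fall back to 0.0 when a norm list is empty (and skip the allreduce when Horovod is absent). Since `l2_loss(g)` is `sum(g**2) / 2`, the global norm is `sqrt(sum(partials) * 2)`. A NumPy sketch of the guarded arithmetic, without Horovod or TensorFlow:

```python
import numpy as np


def l2_loss(x):
  # mirrors tf.nn.l2_loss: sum(x**2) / 2
  return float(np.sum(np.square(x)) / 2.0)


def grad_norms(sparse_grads, dense_grads):
  """Guarded norm computation in the spirit of _get_grad_norm.

  Empty lists fall back to 0.0, which is the guard the patch adds;
  without it, stacking an empty list would fail.
  """
  sparse = [l2_loss(g) for g in sparse_grads]
  dense = [l2_loss(g) for g in dense_grads]
  sparse_norm = np.sqrt(sum(sparse) * 2.0) if sparse else 0.0
  dense_norm = np.sqrt(sum(dense) * 2.0) if dense else 0.0
  all_parts = sparse + dense
  grad_norm = np.sqrt(sum(all_parts) * 2.0) if all_parts else 0.0
  return sparse_norm, dense_norm, grad_norm
```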
easy_rec/python/layers/cmbf.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -329,6 +329,7 @@ def __call__(self, is_training, *args, **kwargs):
     if not is_training:
       self._model_config.hidden_dropout_prob = 0.0
       self._model_config.attention_probs_dropout_prob = 0.0
+      self._model_config.text_seq_emb_dropout_prob = 0.0
 
     # shape: [batch_size, image_num/image_dim, hidden_size]
     img_attention_fea = self.image_self_attention_tower()
```
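The one-line fix above zeroes the remaining dropout rate at inference: every dropout must become the identity when not training, otherwise serving outputs are stochastic. A self-contained NumPy sketch of inverted dropout illustrating the convention (not EasyRec's layer):

```python
import numpy as np


def dropout(x, rate, is_training, seed=0):
  """Inverted dropout; the identity when not training.

  At inference (is_training=False) or rate 0, the input passes through
  unchanged; during training, kept units are scaled by 1 / (1 - rate).
  """
  if not is_training or rate == 0.0:
    return x
  keep = np.random.default_rng(seed).random(x.shape) >= rate
  return x * keep / (1.0 - rate)
```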
