跳转至

Rossler System

AI Studio快速体验

# linux
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 -P ./datasets/
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 -P ./datasets/
# windows
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 --create-dirs -o ./datasets/rossler_training.hdf5
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 --create-dirs -o ./datasets/rossler_valid.hdf5
python train_enn.py
python train_transformer.py
# linux
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 -P ./datasets/
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 -P ./datasets/
# windows
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 --create-dirs -o ./datasets/rossler_training.hdf5
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 --create-dirs -o ./datasets/rossler_valid.hdf5
python train_enn.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/rossler/rossler_pretrained.pdparams
python train_transformer.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/rossler/rossler_transformer_pretrained.pdparams EMBEDDING_MODEL_PATH=https://paddle-org.bj.bcebos.com/paddlescience/models/rossler/rossler_pretrained.pdparams
python train_transformer.py mode=export EMBEDDING_MODEL_PATH=https://paddle-org.bj.bcebos.com/paddlescience/models/rossler/rossler_pretrained.pdparams
# linux
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 -P ./datasets/
wget -nc https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 -P ./datasets/
# windows
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 --create-dirs -o ./datasets/rossler_training.hdf5
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 --create-dirs -o ./datasets/rossler_valid.hdf5
python train_transformer.py mode=infer
模型 MSE
rossler_transformer_pretrained.pdparams 0.022

1. 背景简介

Rossler System,最早由德国科学家 Rossler 提出,也是常见的混沌系统。该系统在混沌理论的研究中具有重要地位,为混沌现象提供了一种数学描述和理解方法。同时由于该系统对数值扰动极为敏感,因此也是是评估机器学习(深度学习)模型准确性的良好基准。

2. 问题定义

Rossler 系统的状态方程:

\[ \begin{cases} \dfrac{\partial x}{\partial t} = -\omega y - z, & \\ \dfrac{\partial y}{\partial t} = \omega x + \alpha y, & \\ \dfrac{\partial z}{\partial t} = \beta + z(x - \gamma) \end{cases} \]

当参数取以下值时,系统表现出经典的混沌特性:

\[\omega = 1.0, \alpha = 0.165, \beta = 0.2, \gamma = 10\]

在这个案例中,要求给定初始时刻点的坐标,预测未来一段时间内点的运动轨迹。

3. 问题求解

接下来开始讲解如何基于 PaddleScience 代码,用深度学习的方法求解该问题。本案例基于论文 Transformers for Modeling Physical Systems 方法进行求解,关于该方法的理论部分请参考此文档原论文。接下来首先会对使用的数据集进行介绍,然后对该方法两个训练步骤(Embedding 模型训练、Transformer 模型训练)的监督约束构建、模型构建等进行阐述,而其余细节请参考 API文档

3.1 数据集介绍

数据集采用了 Transformer-Physx 中提供的数据。该数据集使用龙格-库塔(Runge-Kutta)传统数值求解方法得到,数据集的划分如下:

数据集 时间序列的数量 时间步的数量 下载地址
训练集 256 1025 rossler_training.hdf5
验证集 32 1025 rossler_valid.hdf5

数据集官网为:https://zenodo.org/record/5148524#.ZDe77-xByrc

3.2 Embedding 模型

首先展示代码中定义的各个参数变量,每个参数的具体含义会在下面使用到时进行解释。

examples/rossler/conf/enn.yaml
# general settings
mode: train # running mode: train/eval
seed: 6
output_dir: ${hydra:run.dir}
TRAIN_BLOCK_SIZE: 16
VALID_BLOCK_SIZE: 32
TRAIN_FILE_PATH: ./datasets/rossler_training.hdf5
VALID_FILE_PATH: ./datasets/rossler_valid.hdf5

# model settings
MODEL:
  input_keys: ["states"]
  output_keys: ["pred_states", "recover_states"]

3.2.1 约束构建

本案例基于数据驱动的方法求解问题,因此需要使用 PaddleScience 内置的 SupervisedConstraint 构建监督约束。在定义约束之前,需要首先指定监督约束中用于数据加载的各个参数,代码如下:

examples/rossler/train_enn.py
train_dataloader_cfg = {
    "dataset": {
        "name": "RosslerDataset",
        "file_path": cfg.TRAIN_FILE_PATH,
        "input_keys": cfg.MODEL.input_keys,
        "label_keys": cfg.MODEL.output_keys,
        "block_size": cfg.TRAIN_BLOCK_SIZE,
        "stride": 16,
        "weight_dict": {
            key: value for key, value in zip(cfg.MODEL.output_keys, weights)
        },
    },
    "sampler": {
        "name": "BatchSampler",
        "drop_last": True,
        "shuffle": True,
    },
    "batch_size": cfg.TRAIN.batch_size,
    "num_workers": 4,
}

其中,"dataset" 字段定义了使用的 Dataset 类名为 RosslerDataset,另外还指定了该类初始化时参数的取值:

  1. file_path:代表训练数据集的文件路径,指定为变量 train_file_path 的值;
  2. input_keys:代表模型输入数据的变量名称,此处填入变量 input_keys
  3. label_keys:代表真实标签的变量名称,此处填入变量 output_keys
  4. block_size:代表使用多长的时间步进行训练,指定为变量 train_block_size 的值;
  5. stride:代表连续的两个训练样本之间的时间步间隔,指定为16;
  6. weight_dict:代表模型输出各个变量与真实标签损失函数的权重,此处使用 output_keysweights 生成。

"sampler" 字段定义了使用的 Sampler 类名为 BatchSampler,另外还指定了该类初始化时参数 drop_lastshuffle 均为 True

train_dataloader_cfg 还定义了 batch_sizenum_workers 的值。

定义监督约束的代码如下:

examples/rossler/train_enn.py
sup_constraint = ppsci.constraint.SupervisedConstraint(
    train_dataloader_cfg,
    ppsci.loss.MSELossWithL2Decay(
        regularization_dict={regularization_key: 1e-1 * (cfg.TRAIN_BLOCK_SIZE - 1)}
    ),
    {
        key: lambda out, k=key: out[k]
        for key in cfg.MODEL.output_keys + (regularization_key,)
    },
    name="Sup",
)

SupervisedConstraint 的第一个参数是数据的加载方式,这里使用上文中定义的 train_dataloader_cfg

第二个参数是损失函数的定义,这里使用带有 L2Decay 的 MSELoss,类名为 MSELossWithL2Decayregularization_dict 设置了正则化的变量名称和对应的权重;

第三个参数表示在训练时如何计算需要被约束的中间变量,此处我们约束的变量就是网络的输出;

第四个参数是约束条件的名字,方便后续对其索引。此处命名为 "Sup"。

3.2.2 模型构建

在该案例中,Embedding 模型的输入输出都是物理空间中点的位置坐标 \((x, y, z)\) ,使用了全连接层实现 Embedding 模型,如下图所示。

rossler_embedding

Embedding 网络模型

用 PaddleScience 代码表示如下:

examples/rossler/train_enn.py
# manually init model
data_mean, data_std = get_mean_std(sup_constraint.data_loader.dataset.data)
model = ppsci.arch.RosslerEmbedding(
    cfg.MODEL.input_keys,
    cfg.MODEL.output_keys + (regularization_key,),
    data_mean,
    data_std,
)

其中,RosslerEmbedding 的前两个参数在前文中已有描述,这里不再赘述,网络模型的第三、四个参数是训练数据集的均值和方差,用于归一化输入数据。计算均值、方差的的代码表示如下:

examples/rossler/train_enn.py
def get_mean_std(data: np.ndarray):
    mean = np.asarray(
        [np.mean(data[:, :, 0]), np.mean(data[:, :, 1]), np.min(data[:, :, 2])]
    ).reshape(1, 3)
    std = np.asarray(
        [
            np.std(data[:, :, 0]),
            np.std(data[:, :, 1]),
            np.max(data[:, :, 2]) - np.min(data[:, :, 2]),
        ]
    ).reshape(1, 3)
    return mean, std

3.2.3 学习率与优化器构建

本案例中使用的学习率方法为 ExponentialDecay ,学习率大小设置为0.001。优化器使用 Adam,梯度裁剪使用了 Paddle 内置的 ClipGradByGlobalNorm 方法。用 PaddleScience 代码表示如下

examples/rossler/train_enn.py
# init optimizer and lr scheduler
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=0.1)
lr_scheduler = ppsci.optimizer.lr_scheduler.ExponentialDecay(
    iters_per_epoch=ITERS_PER_EPOCH,
    decay_steps=ITERS_PER_EPOCH,
    **cfg.TRAIN.lr_scheduler,
)()
optimizer = ppsci.optimizer.Adam(
    lr_scheduler, grad_clip=clip, **cfg.TRAIN.optimizer
)(model)

3.2.4 评估器构建

本案例训练过程中会按照一定的训练轮数间隔,使用验证集评估当前模型的训练情况,需要使用 SupervisedValidator 构建评估器。代码如下:

examples/rossler/train_enn.py
eval_dataloader_cfg = {
    "dataset": {
        "name": "RosslerDataset",
        "file_path": cfg.VALID_FILE_PATH,
        "input_keys": cfg.MODEL.input_keys,
        "label_keys": cfg.MODEL.output_keys,
        "block_size": cfg.VALID_BLOCK_SIZE,
        "stride": 32,
        "weight_dict": {
            key: value for key, value in zip(cfg.MODEL.output_keys, weights)
        },
    },
    "sampler": {
        "name": "BatchSampler",
        "drop_last": False,
        "shuffle": False,
    },
    "batch_size": cfg.EVAL.batch_size,
    "num_workers": 4,
}

mse_validator = ppsci.validate.SupervisedValidator(
    eval_dataloader_cfg,
    ppsci.loss.MSELoss(),
    metric={"MSE": ppsci.metric.MSE()},
    name="MSE_Validator",
)
validator = {mse_validator.name: mse_validator}

SupervisedValidator 评估器与 SupervisedConstraint 比较相似,不同的是评估器需要设置评价指标 metric,在这里使用 ppsci.metric.MSE

3.2.5 模型训练与评估

完成上述设置之后,只需要将上述实例化的对象按顺序传递给 ppsci.solver.Solver,然后启动训练、评估。

examples/rossler/train_enn.py
solver = ppsci.solver.Solver(
    model,
    constraint,
    cfg.output_dir,
    optimizer,
    lr_scheduler,
    cfg.TRAIN.epochs,
    ITERS_PER_EPOCH,
    eval_during_train=True,
    validator=validator,
)
# train model
solver.train()
# evaluate after finished training
solver.eval()

3.3 Transformer 模型

上文介绍了如何构建 Embedding 模型的训练、评估,在本节中将介绍如何使用训练好的 Embedding 模型训练 Transformer 模型。因为训练 Transformer 模型的步骤与训练 Embedding 模型的步骤基本相似,因此本节在两者的重复部分的各个参数不再详细介绍。首先将代码中定义的各个参数变量展示如下,每个参数的具体含义会在下面使用到时进行解释。

examples/rossler/conf/transformer.yaml
# general settings
mode: train # running mode: train/eval
seed: 42
output_dir: ${hydra:run.dir}
log_freq: 20
TRAIN_BLOCK_SIZE: 32
VALID_BLOCK_SIZE: 256
TRAIN_FILE_PATH: ./datasets/rossler_training.hdf5
VALID_FILE_PATH: ./datasets/rossler_valid.hdf5

# set working condition
EMBEDDING_MODEL_PATH: ./outputs_rossler_enn/checkpoints/latest

3.3.1 约束构建

Transformer 模型同样基于数据驱动的方法求解问题,因此需要使用 PaddleScience 内置的 SupervisedConstraint 构建监督约束。在定义约束之前,需要首先指定监督约束中用于数据加载的各个参数,代码如下:

examples/rossler/train_transformer.py
# manually build constraint(s)
train_dataloader_cfg = {
    "dataset": {
        "name": "RosslerDataset",
        "file_path": cfg.TRAIN_FILE_PATH,
        "input_keys": cfg.MODEL.input_keys,
        "label_keys": cfg.MODEL.output_keys,
        "block_size": cfg.TRAIN_BLOCK_SIZE,
        "stride": 16,
        "embedding_model": embedding_model,
    },
    "sampler": {
        "name": "BatchSampler",
        "drop_last": True,
        "shuffle": True,
    },
    "batch_size": cfg.TRAIN.batch_size,
    "num_workers": 4,
}

数据加载的各个参数与 Embedding 模型中的基本一致,不再赘述。需要说明的是由于 Transformer 模型训练的输入数据是 Embedding 模型 Encoder 模块的输出数据,因此我们将训练好的 Embedding 模型作为 RosslerDataset 的一个参数,在初始化时首先将训练数据映射到编码空间。

定义监督约束的代码如下:

examples/rossler/train_transformer.py
sup_constraint = ppsci.constraint.SupervisedConstraint(
    train_dataloader_cfg,
    ppsci.loss.MSELoss(),
    name="Sup",
)
constraint = {sup_constraint.name: sup_constraint}

3.3.2 模型构建

在该案例中,Transformer 模型的输入输出都是编码空间中的向量,使用的 Transformer 结构如下:

rossler_transformer

Transformer 网络模型

用 PaddleScience 代码表示如下:

examples/rossler/train_transformer.py
model = ppsci.arch.PhysformerGPT2(**cfg.MODEL)

PhysformerGPT2 除了需要填入 input_keysoutput_keys 外,还需要设置 Transformer 模型的层数 num_layers、上下文的大小 num_ctx、输入的 Embedding 向量的长度 embed_size、多头注意力机制的参数 num_heads,在这里填入的数值为4、64、32、4。

3.3.3 学习率与优化器构建

本案例中使用的学习率方法为 CosineWarmRestarts,学习率大小设置为0.001。优化器使用 Adam,梯度裁剪使用了 Paddle 内置的 ClipGradByGlobalNorm 方法。用 PaddleScience 代码表示如下:

examples/rossler/train_transformer.py
# init optimizer and lr scheduler
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=0.1)
lr_scheduler = ppsci.optimizer.lr_scheduler.CosineWarmRestarts(
    iters_per_epoch=ITERS_PER_EPOCH, **cfg.TRAIN.lr_scheduler
)()
optimizer = ppsci.optimizer.Adam(
    lr_scheduler, grad_clip=clip, **cfg.TRAIN.optimizer
)(model)

3.3.4 评估器构建

训练过程中会按照一定的训练轮数间隔,使用验证集评估当前模型的训练情况,需要使用 SupervisedValidator 构建评估器。用 PaddleScience 代码表示如下:

examples/rossler/train_transformer.py
eval_dataloader_cfg = {
    "dataset": {
        "name": "RosslerDataset",
        "file_path": cfg.VALID_FILE_PATH,
        "input_keys": cfg.MODEL.input_keys,
        "label_keys": cfg.MODEL.output_keys,
        "block_size": cfg.VALID_BLOCK_SIZE,
        "stride": 1024,
        "embedding_model": embedding_model,
    },
    "sampler": {
        "name": "BatchSampler",
        "drop_last": False,
        "shuffle": False,
    },
    "batch_size": cfg.EVAL.batch_size,
    "num_workers": 4,
}

mse_validator = ppsci.validate.SupervisedValidator(
    eval_dataloader_cfg,
    ppsci.loss.MSELoss(),
    metric={"MSE": ppsci.metric.MSE()},
    name="MSE_Validator",
)
validator = {mse_validator.name: mse_validator}

3.3.5 可视化器构建

本案例中可以通过构建可视化器在模型评估时将评估结果可视化出来,由于 Transformer 模型的输出数据是预测的编码空间的数据无法直接进行可视化,因此需要额外将输出数据使用 Embedding 网络的 Decoder 模块变换到物理状态空间。

在本文中首先定义了对 Transformer 模型输出数据变换到物理状态空间的代码:

examples/rossler/train_transformer.py
def build_embedding_model(embedding_model_path: str) -> ppsci.arch.RosslerEmbedding:
    input_keys = ("states",)
    output_keys = ("pred_states", "recover_states")
    regularization_key = "k_matrix"
    model = ppsci.arch.RosslerEmbedding(input_keys, output_keys + (regularization_key,))
    save_load.load_pretrain(model, embedding_model_path)
    return model


class OutputTransform(object):
    def __init__(self, model: base.Arch):
        self.model = model
        self.model.eval()

    def __call__(self, x: Dict[str, paddle.Tensor]):
        pred_embeds = x["pred_embeds"]
        pred_states = self.model.decoder(pred_embeds)

        return pred_states
examples/rossler/train_transformer.py
# manually build constraint(s)

可以看到,程序首先载入了训练好的 Embedding 模型,然后在 OutputTransform__call__ 函数内实现了编码向量到物理状态空间的变换。

在定义好了以上代码之后,就可以实现可视化器代码的构建了:

examples/rossler/train_transformer.py
# set visualizer(optional)
states = mse_validator.data_loader.dataset.data
embedding_data = mse_validator.data_loader.dataset.embedding_data
vis_data = {
    "embeds": embedding_data[: cfg.VIS_DATA_NUMS, :-1, :],
    "states": states[: cfg.VIS_DATA_NUMS, 1:, :],
}

visualizer = {
    "visualize_states": ppsci.visualize.VisualizerScatter3D(
        vis_data,
        {
            "pred_states": lambda d: output_transform(d),
            "states": lambda d: d["states"],
        },
        num_timestamps=1,
        prefix="result_states",
    )
}

首先使用上文中的 mse_validator 中的数据集进行可视化,另外还引入了 vis_data_nums 变量用于控制需要可视化样本的数量。最后通过 VisualizerScatter3D 构建可视化器。

3.3.6 模型训练、评估与可视化

完成上述设置之后,只需要将上述实例化的对象按顺序传递给 ppsci.solver.Solver,然后启动训练、评估。

examples/rossler/train_transformer.py
solver = ppsci.solver.Solver(
    model,
    constraint,
    cfg.output_dir,
    optimizer,
    lr_scheduler,
    cfg.TRAIN.epochs,
    ITERS_PER_EPOCH,
    eval_during_train=cfg.TRAIN.eval_during_train,
    eval_freq=cfg.TRAIN.eval_freq,
    validator=validator,
    visualizer=visualizer,
)
# train model
solver.train()
# evaluate after finished training
solver.eval()
# visualize prediction after finished training
solver.visualize()

4. 完整代码

rossler/train_enn.py
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Two-stage training
# 1. Train a embedding model by running train_enn.py.
# 2. Load pretrained embedding model and freeze it, then train a transformer model by running train_transformer.py.

# This file is for step1: training a embedding model.
# This file is based on PaddleScience/ppsci API.
from os import path as osp

import hydra
import numpy as np
import paddle
from omegaconf import DictConfig

import ppsci
from ppsci.utils import logger


def get_mean_std(data: np.ndarray):
    mean = np.asarray(
        [np.mean(data[:, :, 0]), np.mean(data[:, :, 1]), np.min(data[:, :, 2])]
    ).reshape(1, 3)
    std = np.asarray(
        [
            np.std(data[:, :, 0]),
            np.std(data[:, :, 1]),
            np.max(data[:, :, 2]) - np.min(data[:, :, 2]),
        ]
    ).reshape(1, 3)
    return mean, std


def train(cfg: DictConfig):
    # set random seed for reproducibility
    ppsci.utils.misc.set_random_seed(cfg.seed)
    # initialize logger
    logger.init_logger("ppsci", osp.join(cfg.output_dir, f"{cfg.mode}.log"), "info")

    weights = (1.0 * (cfg.TRAIN_BLOCK_SIZE - 1), 1.0e3 * cfg.TRAIN_BLOCK_SIZE)
    regularization_key = "k_matrix"
    # manually build constraint(s)
    train_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.TRAIN_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.TRAIN_BLOCK_SIZE,
            "stride": 16,
            "weight_dict": {
                key: value for key, value in zip(cfg.MODEL.output_keys, weights)
            },
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": True,
            "shuffle": True,
        },
        "batch_size": cfg.TRAIN.batch_size,
        "num_workers": 4,
    }

    sup_constraint = ppsci.constraint.SupervisedConstraint(
        train_dataloader_cfg,
        ppsci.loss.MSELossWithL2Decay(
            regularization_dict={regularization_key: 1e-1 * (cfg.TRAIN_BLOCK_SIZE - 1)}
        ),
        {
            key: lambda out, k=key: out[k]
            for key in cfg.MODEL.output_keys + (regularization_key,)
        },
        name="Sup",
    )
    constraint = {sup_constraint.name: sup_constraint}

    # set iters_per_epoch by dataloader length
    ITERS_PER_EPOCH = len(sup_constraint.data_loader)

    # manually init model
    data_mean, data_std = get_mean_std(sup_constraint.data_loader.dataset.data)
    model = ppsci.arch.RosslerEmbedding(
        cfg.MODEL.input_keys,
        cfg.MODEL.output_keys + (regularization_key,),
        data_mean,
        data_std,
    )

    # init optimizer and lr scheduler
    clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=0.1)
    lr_scheduler = ppsci.optimizer.lr_scheduler.ExponentialDecay(
        iters_per_epoch=ITERS_PER_EPOCH,
        decay_steps=ITERS_PER_EPOCH,
        **cfg.TRAIN.lr_scheduler,
    )()
    optimizer = ppsci.optimizer.Adam(
        lr_scheduler, grad_clip=clip, **cfg.TRAIN.optimizer
    )(model)

    # manually build validator
    weights = (1.0 * (cfg.VALID_BLOCK_SIZE - 1), 1.0e4 * cfg.VALID_BLOCK_SIZE)
    eval_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.VALID_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.VALID_BLOCK_SIZE,
            "stride": 32,
            "weight_dict": {
                key: value for key, value in zip(cfg.MODEL.output_keys, weights)
            },
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": False,
            "shuffle": False,
        },
        "batch_size": cfg.EVAL.batch_size,
        "num_workers": 4,
    }

    mse_validator = ppsci.validate.SupervisedValidator(
        eval_dataloader_cfg,
        ppsci.loss.MSELoss(),
        metric={"MSE": ppsci.metric.MSE()},
        name="MSE_Validator",
    )
    validator = {mse_validator.name: mse_validator}

    solver = ppsci.solver.Solver(
        model,
        constraint,
        cfg.output_dir,
        optimizer,
        lr_scheduler,
        cfg.TRAIN.epochs,
        ITERS_PER_EPOCH,
        eval_during_train=True,
        validator=validator,
    )
    # train model
    solver.train()
    # evaluate after finished training
    solver.eval()


def evaluate(cfg: DictConfig):
    # set random seed for reproducibility
    ppsci.utils.misc.set_random_seed(cfg.seed)
    # initialize logger
    logger.init_logger("ppsci", osp.join(cfg.output_dir, f"{cfg.mode}.log"), "info")

    weights = (1.0 * (cfg.TRAIN_BLOCK_SIZE - 1), 1.0e3 * cfg.TRAIN_BLOCK_SIZE)
    regularization_key = "k_matrix"
    # manually build constraint(s)
    train_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.TRAIN_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.TRAIN_BLOCK_SIZE,
            "stride": 16,
            "weight_dict": {
                key: value for key, value in zip(cfg.MODEL.output_keys, weights)
            },
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": True,
            "shuffle": True,
        },
        "batch_size": cfg.TRAIN.batch_size,
        "num_workers": 4,
    }

    sup_constraint = ppsci.constraint.SupervisedConstraint(
        train_dataloader_cfg,
        ppsci.loss.MSELossWithL2Decay(
            regularization_dict={regularization_key: 1e-1 * (cfg.TRAIN_BLOCK_SIZE - 1)}
        ),
        {
            key: lambda out, k=key: out[k]
            for key in cfg.MODEL.output_keys + (regularization_key,)
        },
        name="Sup",
    )

    # manually init model
    data_mean, data_std = get_mean_std(sup_constraint.data_loader.dataset.data)
    model = ppsci.arch.RosslerEmbedding(
        cfg.MODEL.input_keys,
        cfg.MODEL.output_keys + (regularization_key,),
        data_mean,
        data_std,
    )

    # manually build validator
    weights = (1.0 * (cfg.VALID_BLOCK_SIZE - 1), 1.0e4 * cfg.VALID_BLOCK_SIZE)
    eval_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.VALID_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.VALID_BLOCK_SIZE,
            "stride": 32,
            "weight_dict": {
                key: value for key, value in zip(cfg.MODEL.output_keys, weights)
            },
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": False,
            "shuffle": False,
        },
        "batch_size": cfg.EVAL.batch_size,
        "num_workers": 4,
    }

    mse_validator = ppsci.validate.SupervisedValidator(
        eval_dataloader_cfg,
        ppsci.loss.MSELoss(),
        metric={"MSE": ppsci.metric.MSE()},
        name="MSE_Validator",
    )
    validator = {mse_validator.name: mse_validator}
    solver = ppsci.solver.Solver(
        model,
        output_dir=cfg.output_dir,
        validator=validator,
        pretrained_model_path=cfg.EVAL.pretrained_model_path,
    )
    solver.eval()


@hydra.main(version_base=None, config_path="./conf", config_name="enn.yaml")
def main(cfg: DictConfig):
    if cfg.mode == "train":
        train(cfg)
    elif cfg.mode == "eval":
        evaluate(cfg)
    else:
        raise ValueError(f"cfg.mode should in ['train', 'eval'], but got '{cfg.mode}'")


if __name__ == "__main__":
    main()
rossler/train_transformer.py
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Two-stage training
# 1. Train a embedding model by running train_enn.py.
# 2. Load pretrained embedding model and freeze it, then train a transformer model by running train_transformer.py.

# This file is for step2: training a transformer model, based on frozen pretrained embedding model.
# This file is based on PaddleScience/ppsci API.
from os import path as osp
from typing import Dict

import hydra
import paddle
from omegaconf import DictConfig

import ppsci
from ppsci.arch import base
from ppsci.utils import logger
from ppsci.utils import save_load


def build_embedding_model(embedding_model_path: str) -> ppsci.arch.RosslerEmbedding:
    input_keys = ("states",)
    output_keys = ("pred_states", "recover_states")
    regularization_key = "k_matrix"
    model = ppsci.arch.RosslerEmbedding(input_keys, output_keys + (regularization_key,))
    save_load.load_pretrain(model, embedding_model_path)
    return model


class OutputTransform(object):
    def __init__(self, model: base.Arch):
        self.model = model
        self.model.eval()

    def __call__(self, x: Dict[str, paddle.Tensor]):
        pred_embeds = x["pred_embeds"]
        pred_states = self.model.decoder(pred_embeds)

        return pred_states


def train(cfg: DictConfig):
    # set random seed for reproducibility
    ppsci.utils.misc.set_random_seed(cfg.seed)
    # initialize logger
    logger.init_logger("ppsci", osp.join(cfg.output_dir, f"{cfg.mode}.log"), "info")

    embedding_model = build_embedding_model(cfg.EMBEDDING_MODEL_PATH)
    output_transform = OutputTransform(embedding_model)

    # manually build constraint(s)
    train_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.TRAIN_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.TRAIN_BLOCK_SIZE,
            "stride": 16,
            "embedding_model": embedding_model,
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": True,
            "shuffle": True,
        },
        "batch_size": cfg.TRAIN.batch_size,
        "num_workers": 4,
    }

    sup_constraint = ppsci.constraint.SupervisedConstraint(
        train_dataloader_cfg,
        ppsci.loss.MSELoss(),
        name="Sup",
    )
    constraint = {sup_constraint.name: sup_constraint}

    # set iters_per_epoch by dataloader length
    ITERS_PER_EPOCH = len(constraint["Sup"].data_loader)

    # manually init model
    model = ppsci.arch.PhysformerGPT2(**cfg.MODEL)

    # init optimizer and lr scheduler
    clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=0.1)
    lr_scheduler = ppsci.optimizer.lr_scheduler.CosineWarmRestarts(
        iters_per_epoch=ITERS_PER_EPOCH, **cfg.TRAIN.lr_scheduler
    )()
    optimizer = ppsci.optimizer.Adam(
        lr_scheduler, grad_clip=clip, **cfg.TRAIN.optimizer
    )(model)

    # manually build validator
    eval_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.VALID_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.VALID_BLOCK_SIZE,
            "stride": 1024,
            "embedding_model": embedding_model,
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": False,
            "shuffle": False,
        },
        "batch_size": cfg.EVAL.batch_size,
        "num_workers": 4,
    }

    mse_validator = ppsci.validate.SupervisedValidator(
        eval_dataloader_cfg,
        ppsci.loss.MSELoss(),
        metric={"MSE": ppsci.metric.MSE()},
        name="MSE_Validator",
    )
    validator = {mse_validator.name: mse_validator}

    # set visualizer(optional)
    states = mse_validator.data_loader.dataset.data
    embedding_data = mse_validator.data_loader.dataset.embedding_data
    vis_data = {
        "embeds": embedding_data[: cfg.VIS_DATA_NUMS, :-1, :],
        "states": states[: cfg.VIS_DATA_NUMS, 1:, :],
    }

    visualizer = {
        "visualize_states": ppsci.visualize.VisualizerScatter3D(
            vis_data,
            {
                "pred_states": lambda d: output_transform(d),
                "states": lambda d: d["states"],
            },
            num_timestamps=1,
            prefix="result_states",
        )
    }

    solver = ppsci.solver.Solver(
        model,
        constraint,
        cfg.output_dir,
        optimizer,
        lr_scheduler,
        cfg.TRAIN.epochs,
        ITERS_PER_EPOCH,
        eval_during_train=cfg.TRAIN.eval_during_train,
        eval_freq=cfg.TRAIN.eval_freq,
        validator=validator,
        visualizer=visualizer,
    )
    # train model
    solver.train()
    # evaluate after finished training
    solver.eval()
    # visualize prediction after finished training
    solver.visualize()


def evaluate(cfg: DictConfig):
    # set random seed for reproducibility
    ppsci.utils.misc.set_random_seed(cfg.seed)
    # initialize logger
    logger.init_logger("ppsci", osp.join(cfg.output_dir, f"{cfg.mode}.log"), "info")

    embedding_model = build_embedding_model(cfg.EMBEDDING_MODEL_PATH)
    output_transform = OutputTransform(embedding_model)

    # manually init model
    model = ppsci.arch.PhysformerGPT2(**cfg.MODEL)

    # manually build validator
    eval_dataloader_cfg = {
        "dataset": {
            "name": "RosslerDataset",
            "file_path": cfg.VALID_FILE_PATH,
            "input_keys": cfg.MODEL.input_keys,
            "label_keys": cfg.MODEL.output_keys,
            "block_size": cfg.VALID_BLOCK_SIZE,
            "stride": 1024,
            "embedding_model": embedding_model,
        },
        "sampler": {
            "name": "BatchSampler",
            "drop_last": False,
            "shuffle": False,
        },
        "batch_size": cfg.EVAL.batch_size,
        "num_workers": 4,
    }

    mse_validator = ppsci.validate.SupervisedValidator(
        eval_dataloader_cfg,
        ppsci.loss.MSELoss(),
        metric={"MSE": ppsci.metric.MSE()},
        name="MSE_Validator",
    )
    validator = {mse_validator.name: mse_validator}

    # set visualizer(optional)
    states = mse_validator.data_loader.dataset.data
    embedding_data = mse_validator.data_loader.dataset.embedding_data
    vis_datas = {
        "embeds": embedding_data[: cfg.VIS_DATA_NUMS, :-1, :],
        "states": states[: cfg.VIS_DATA_NUMS, 1:, :],
    }

    visualizer = {
        "visulzie_states": ppsci.visualize.VisualizerScatter3D(
            vis_datas,
            {
                "pred_states": lambda d: output_transform(d),
                "states": lambda d: d["states"],
            },
            num_timestamps=1,
            prefix="result_states",
        )
    }

    solver = ppsci.solver.Solver(
        model,
        output_dir=cfg.output_dir,
        validator=validator,
        visualizer=visualizer,
        pretrained_model_path=cfg.EVAL.pretrained_model_path,
    )
    solver.eval()
    # visualize prediction for pretrained model(optional)
    solver.visualize()


def export(cfg: DictConfig):
    # set model
    embedding_model = build_embedding_model(cfg.EMBEDDING_MODEL_PATH)
    model_cfg = {
        **cfg.MODEL,
        "embedding_model": embedding_model,
        "input_keys": ["states"],
        "output_keys": ["pred_states"],
    }
    model = ppsci.arch.PhysformerGPT2(**model_cfg)

    # initialize solver
    solver = ppsci.solver.Solver(
        model,
        pretrained_model_path=cfg.INFER.pretrained_model_path,
    )
    # export model
    from paddle.static import InputSpec

    input_spec = [
        {
            key: InputSpec([None, 255, 3], "float32", name=key)
            for key in model.input_keys
        },
    ]

    solver.export(input_spec, cfg.INFER.export_path)


def inference(cfg: DictConfig):
    from deploy.python_infer import pinn_predictor

    predictor = pinn_predictor.PINNPredictor(cfg)

    dataset_cfg = {
        "name": "RosslerDataset",
        "file_path": cfg.VALID_FILE_PATH,
        "input_keys": cfg.MODEL.input_keys,
        "label_keys": cfg.MODEL.output_keys,
        "block_size": cfg.VALID_BLOCK_SIZE,
        "stride": 1024,
    }

    dataset = ppsci.data.dataset.build_dataset(dataset_cfg)

    input_dict = {
        "states": dataset.data[: cfg.VIS_DATA_NUMS, :-1, :],
    }

    output_dict = predictor.predict(input_dict, cfg.INFER.batch_size)

    # mapping data to cfg.INFER.output_keys
    output_keys = ["pred_states"]
    output_dict = {
        store_key: output_dict[infer_key]
        for store_key, infer_key in zip(output_keys, output_dict.keys())
    }

    input_dict = {
        "states": dataset.data[: cfg.VIS_DATA_NUMS, 1:, :],
    }

    data_dict = {**input_dict, **output_dict}
    for i in range(cfg.VIS_DATA_NUMS):
        ppsci.visualize.save_plot_from_3d_dict(
            f"./rossler_transformer_pred_{i}",
            {key: value[i] for key, value in data_dict.items()},
            ("states", "pred_states"),
        )


@hydra.main(version_base=None, config_path="./conf", config_name="transformer.yaml")
def main(cfg: DictConfig):
    if cfg.mode == "train":
        train(cfg)
    elif cfg.mode == "eval":
        evaluate(cfg)
    elif cfg.mode == "export":
        export(cfg)
    elif cfg.mode == "infer":
        inference(cfg)
    else:
        raise ValueError(
            f"cfg.mode should in ['train', 'eval', 'export', 'infer'], but got '{cfg.mode}'"
        )


if __name__ == "__main__":
    main()

5. 结果展示

下图中展示了两个不同初始条件下的模型预测结果和传统数值微分的预测结果。

result_states0

模型预测结果("pred_states")与传统数值微分结果("states")

result_states1

模型预测结果("pred_states")与传统数值微分结果("states")