
Train Your Own Small Model


wuxiongwei


ChatGPT has already reached GPT-4o, so why train a small model yourself?

  • If you can train a small model yourself, it shows you have genuinely mastered the Transformer.
  • Some scenarios simply call for a small model. Small can be beautiful, just as microcontrollers still have a market of their own.
  • Training the small model is not the goal; the goal is to walk through the basic workflow of large-model training, end to end.

To read or write the code that trains a small model, you first need some background: the Transformer architecture, machine learning, linear algebra, and related topics.
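If you want a feel for the linear-algebra core before reading the full listing, the sketch below computes scaled dot-product attention on toy tensors. It is a minimal illustration, not part of the original article; the shapes and random inputs are made up purely for demonstration:

import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Toy example: batch of 1, 3 tokens, 4 dimensions per token
q = torch.randn(1, 3, 4)
k = torch.randn(1, 3, 4)
v = torch.randn(1, 3, 4)

# Scaled dot-product attention: softmax(Q @ K^T / sqrt(d_k)) @ V
scores = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))  # (1, 3, 3) similarity of every token to every token
weights = F.softmax(scores, dim=-1)                         # each row sums to 1
out = weights @ v                                           # (1, 3, 4) weighted mixture of the value vectors
print(out.shape)  # torch.Size([1, 3, 4])

This is exactly the operation the Attention class in the listing below performs, plus a causal mask and dropout.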

If it doesn't all make sense, that's fine; just run it and see the results.

On to the code.

!pip install numpy requests torch tiktoken matplotlib pandas
import os
import requests
import math
import tiktoken
import torch
import torch.nn as nn
from torch.nn import functional as F

# Hyperparameters
batch_size = 4  # How many batches per training step
context_length = 16  # Length of the token chunk in each batch
d_model = 64  # The size of our model token embeddings
num_blocks = 8  # Number of transformer blocks
num_heads = 4  # Number of heads in multi-head attention
learning_rate = 1e-3  # 0.001
dropout = 0.1  # Dropout rate
max_iters = 5000  # Total number of training iterations <- change this to a smaller number for testing
eval_interval = 20  # How often to evaluate
eval_iters = 20  # Number of iterations to average for evaluation
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # Use GPU if it's available
TORCH_SEED = 1337
torch.manual_seed(TORCH_SEED)

# Load training data
if not os.path.exists('data/sales_textbook.txt'):
    os.makedirs('data', exist_ok=True)  # make sure the data directory exists before writing into it
    url = 'https://huggingface.co/datasets/goendalf666/sales-textbook_for_convincing_and_selling/raw/main/sales_textbook.txt'
    with open('data/sales_textbook.txt', 'w') as f:
        f.write(requests.get(url).text)

with open('data/sales_textbook.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Use tiktoken (the cl100k_base encoding, the same one used by GPT-3.5/GPT-4) to tokenize the source text
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(text)
max_token_value = max(tokenized_text) + 1  # one above the largest token id that occurs in the text
tokenized_text = torch.tensor(tokenized_text, dtype=torch.long, device=device)  # put the tokenized text into a tensor

# Split train and validation
split_idx = int(len(tokenized_text) * 0.9)
train_data = tokenized_text[:split_idx]
val_data = tokenized_text[split_idx:]


# Define the feed-forward network
class FeedForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_model = d_model
        self.dropout = dropout
        self.ffn = nn.Sequential(
            nn.Linear(in_features=self.d_model, out_features=self.d_model * 4),
            nn.ReLU(),
            nn.Linear(in_features=self.d_model * 4, out_features=self.d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.ffn(x)


# Define scaled dot-product attention (a single head)
class Attention(nn.Module):
    def __init__(self, head_size: int):
        super().__init__()
        self.d_model = d_model
        self.head_size = head_size
        self.context_length = context_length
        self.dropout = dropout

        self.key_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.query_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.value_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.register_buffer('tril', torch.tril(
            torch.ones((self.context_length, self.context_length))))  # Lower triangular mask
        self.dropout_layer = nn.Dropout(self.dropout)

    def forward(self, x):
        B, T, C = x.shape  # Batch size, Time steps (current context_length), Channels (dimensions)
        assert T <= self.context_length
        assert C == self.d_model
        q = self.query_layer(x)
        k = self.key_layer(x)
        v = self.value_layer(x)

        # Scaled dot-product attention: Q @ K^T / sqrt(d_k)
        weights = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        # Apply the causal mask so each position can only attend to earlier positions
        weights = weights.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        weights = F.softmax(input=weights, dim=-1)
        weights = self.dropout_layer(weights)

        # Apply dot-product attention: weights @ V
        out = weights @ v
        return out


class MultiHeadAttention(nn.Module):
    def __init__(self, head_size: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_size = head_size
        self.d_model = d_model
        self.context_length = context_length
        self.dropout = dropout

        self.heads = nn.ModuleList([Attention(head_size=self.head_size) for _ in range(self.num_heads)])
        self.projection_layer = nn.Linear(in_features=self.d_model, out_features=self.d_model)
        self.dropout_layer = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        out = self.projection_layer(out)
        out = self.dropout_layer(out)
        return out


class TransformerBlock(nn.Module):

    def __init__(self, num_heads: int):
        super().__init__()
        self.d_model = d_model
        self.context_length = context_length
        self.head_size = d_model // num_heads  # d_model must be divisible by num_heads
        self.num_heads = num_heads
        self.dropout = dropout

        self.multi_head_attention_layer = MultiHeadAttention(head_size=self.head_size)
        self.feed_forward_layer = FeedForward()
        self.layer_norm_1 = nn.LayerNorm(normalized_shape=self.d_model)
        self.layer_norm_2 = nn.LayerNorm(normalized_shape=self.d_model)

    def forward(self, x):
        # Note: the order of operations differs from the original Transformer paper (pre-norm rather than post-norm)
        # The order here is: LayerNorm -> Multi-head attention -> LayerNorm -> Feed forward
        x = x + self.multi_head_attention_layer(self.layer_norm_1(x))  # Residual connection
        x = x + self.feed_forward_layer(self.layer_norm_2(x))  # Residual connection
        return x


class TransformerLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_model = d_model
        self.context_length = context_length
        self.num_heads = num_heads
        self.num_blocks = num_blocks
        self.dropout = dropout
        self.max_token_value = max_token_value
        # Set up the token embedding look-up table
        self.token_embedding_lookup_table = nn.Embedding(num_embeddings=self.max_token_value + 1, embedding_dim=self.d_model)

        # Run all the transformer blocks
        # Different from the original paper, here we add a final layer norm after all the blocks
        self.transformer_blocks = nn.Sequential(*(
                [TransformerBlock(num_heads=self.num_heads) for _ in range(self.num_blocks)] +
                [nn.LayerNorm(self.d_model)]
        ))
        self.language_model_out_linear_layer = nn.Linear(in_features=self.d_model, out_features=self.max_token_value)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # Set up the position embedding look-up table,
        # following the same approach as the original Transformer paper (sine and cosine functions)
        position_encoding_lookup_table = torch.zeros(self.context_length, self.d_model)
        position = torch.arange(0, self.context_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, self.d_model, 2).float() * (-math.log(10000.0) / self.d_model))
        position_encoding_lookup_table[:, 0::2] = torch.sin(position * div_term)
        position_encoding_lookup_table[:, 1::2] = torch.cos(position * div_term)
        # Slice position_encoding_lookup_table from (context_length, d_model) down to (T, d_model)
        position_embedding = position_encoding_lookup_table[:T, :].to(device)
        x = self.token_embedding_lookup_table(idx) + position_embedding
        x = self.transformer_blocks(x)
        # The "logits" are the output values of our model before applying softmax
        logits = self.language_model_out_linear_layer(x)

        if targets is not None:
            B, T, C = logits.shape
            logits_reshaped = logits.view(B * T, C)
            targets_reshaped = targets.view(B * T)
            loss = F.cross_entropy(input=logits_reshaped, target=targets_reshaped)
        else:
            loss = None
        return logits, loss

    def generate(self, idx, max_new_tokens):
        # idx is a (B, T) array of indices in the current context
        for _ in range(max_new_tokens):
            # Crop idx to the max size of our positional embeddings table
            idx_crop = idx[:, -self.context_length:]
            # Get predictions
            logits, loss = self(idx_crop)
            # Get the last time step from logits, where the dimensions of the logits are (B, T, C)
            logits_last_timestep = logits[:, -1, :]
            # Apply softmax to get probabilities
            probs = F.softmax(input=logits_last_timestep, dim=-1)
            # Sample from the probability distribution
            idx_next = torch.multinomial(input=probs, num_samples=1)
            # Append the sampled index idx_next to idx
            idx = torch.cat((idx, idx_next), dim=1)
        return idx


# Initialize the model
model = TransformerLanguageModel()
model = model.to(device)


# Get a batch of inputs x and targets y (x shifted one token to the right)
def get_batch(split: str):
    data = train_data if split == 'train' else val_data
    idxs = torch.randint(low=0, high=len(data) - context_length, size=(batch_size,))
    x = torch.stack([data[idx:idx + context_length] for idx in idxs]).to(device)
    y = torch.stack([data[idx + 1:idx + context_length + 1] for idx in idxs]).to(device)
    return x, y


# Estimate the loss, averaged over eval_iters random batches
@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'valid']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            x_batch, y_batch = get_batch(split)
            logits, loss = model(x_batch, y_batch)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out


# Use the AdamW optimizer
optimizer = torch.optim.AdamW(params=model.parameters(), lr=learning_rate)
tracked_losses = list()
for step in range(max_iters):
    if step % eval_interval == 0 or step == max_iters - 1:
        losses = estimate_loss()
        tracked_losses.append(losses)
        print('Step:', step, 'Training Loss:', round(losses['train'].item(), 3), 'Validation Loss:',
              round(losses['valid'].item(), 3))

    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# Save the model state dictionary
torch.save(model.state_dict(), 'model-ckpt.pt')

# Generate
model.eval()
start = 'The salesperson'
start_ids = encoding.encode(start)
x = (torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...])
y = model.generate(x, max_new_tokens=100)
print('---------------')
print(encoding.decode(y[0].tolist()))
print('---------------')
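Before launching the full 5,000-iteration run, it can help to sanity-check the model's size. A minimal sketch (not in the original article), assuming the listing above has executed so that model is in scope:

# Count the trainable parameters as a rough sanity check of model size
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'{num_params / 1e6:.2f}M trainable parameters')

Most of the parameters sit in the token embedding and the output projection, since both scale with the token-id range of cl100k_base (on the order of 100k). Running the script prints the estimated losses every 20 steps and, at the end, a generated sample: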
Step: 0 Training Loss: 11.663 Validation Loss: 11.716
Step: 20 Training Loss: 10.297 Validation Loss: 10.478
Step: 40 Training Loss: 8.867 Validation Loss: 9.022
Step: 60 Training Loss: 7.346 Validation Loss: 7.613
Step: 80 Training Loss: 6.878 Validation Loss: 7.297
Step: 100 Training Loss: 6.659 Validation Loss: 7.208
Step: 120 Training Loss: 6.544 Validation Loss: 7.104
Step: 140 Training Loss: 6.325 Validation Loss: 7.199
Step: 160 Training Loss: 6.34 Validation Loss: 6.684
Step: 180 Training Loss: 6.154 Validation Loss: 6.89
Step: 200 Training Loss: 6.202 Validation Loss: 6.673
Step: 220 Training Loss: 6.045 Validation Loss: 6.761
Step: 240 Training Loss: 5.871 Validation Loss: 6.497
Step: 260 Training Loss: 5.957 Validation Loss: 6.347
Step: 280 Training Loss: 5.679 Validation Loss: 6.389
Step: 300 Training Loss: 5.816 Validation Loss: 6.603
Step: 320 Training Loss: 5.415 Validation Loss: 6.496
Step: 340 Training Loss: 5.32 Validation Loss: 6.1
Step: 360 Training Loss: 5.206 Validation Loss: 6.222
Step: 380 Training Loss: 5.403 Validation Loss: 6.451
Step: 400 Training Loss: 5.317 Validation Loss: 5.937
Step: 420 Training Loss: 5.159 Validation Loss: 6.033
Step: 440 Training Loss: 5.153 Validation Loss: 6.333
Step: 460 Training Loss: 5.232 Validation Loss: 6.001
Step: 480 Training Loss: 5.126 Validation Loss: 6.067
Step: 500 Training Loss: 5.123 Validation Loss: 6.044
Step: 520 Training Loss: 4.966 Validation Loss: 5.676
Step: 540 Training Loss: 4.774 Validation Loss: 6.023
Step: 560 Training Loss: 4.792 Validation Loss: 6.079
Step: 580 Training Loss: 4.743 Validation Loss: 5.722
Step: 600 Training Loss: 4.818 Validation Loss: 5.686
Step: 620 Training Loss: 4.675 Validation Loss: 5.741
Step: 640 Training Loss: 4.805 Validation Loss: 6.014
Step: 660 Training Loss: 4.81 Validation Loss: 5.758
Step: 680 Training Loss: 4.727 Validation Loss: 5.723
Step: 700 Training Loss: 4.737 Validation Loss: 5.792
Step: 720 Training Loss: 4.609 Validation Loss: 5.761
Step: 740 Training Loss: 5.018 Validation Loss: 5.705
Step: 760 Training Loss: 4.906 Validation Loss: 5.721
Step: 780 Training Loss: 4.791 Validation Loss: 5.779
Step: 800 Training Loss: 4.467 Validation Loss: 5.881
Step: 820 Training Loss: 4.443 Validation Loss: 5.502
Step: 840 Training Loss: 4.567 Validation Loss: 5.832
Step: 860 Training Loss: 4.577 Validation Loss: 5.956
Step: 880 Training Loss: 4.55 Validation Loss: 5.583
Step: 900 Training Loss: 4.478 Validation Loss: 5.465
Step: 920 Training Loss: 4.237 Validation Loss: 5.674
Step: 940 Training Loss: 4.462 Validation Loss: 5.427
Step: 960 Training Loss: 4.323 Validation Loss: 5.632
Step: 980 Training Loss: 4.323 Validation Loss: 5.711
Step: 1000 Training Loss: 4.304 Validation Loss: 5.374
Step: 1020 Training Loss: 4.295 Validation Loss: 5.597
Step: 1040 Training Loss: 4.312 Validation Loss: 5.54
Step: 1060 Training Loss: 4.351 Validation Loss: 5.456
Step: 1080 Training Loss: 4.128 Validation Loss: 5.524
Step: 1100 Training Loss: 4.285 Validation Loss: 5.44
Step: 1120 Training Loss: 4.359 Validation Loss: 5.447
Step: 1140 Training Loss: 4.276 Validation Loss: 5.527
Step: 1160 Training Loss: 4.179 Validation Loss: 5.415
Step: 1180 Training Loss: 4.057 Validation Loss: 5.42
Step: 1200 Training Loss: 4.238 Validation Loss: 5.296
Step: 1220 Training Loss: 3.979 Validation Loss: 5.535
Step: 1240 Training Loss: 4.145 Validation Loss: 5.417
Step: 1260 Training Loss: 4.093 Validation Loss: 5.34
Step: 1280 Training Loss: 4.173 Validation Loss: 5.361
Step: 1300 Training Loss: 3.876 Validation Loss: 5.449
Step: 1320 Training Loss: 3.941 Validation Loss: 5.343
Step: 1340 Training Loss: 4.172 Validation Loss: 5.335
Step: 1360 Training Loss: 3.757 Validation Loss: 5.173
Step: 1380 Training Loss: 4.106 Validation Loss: 5.207
Step: 1400 Training Loss: 3.975 Validation Loss: 5.349
Step: 1420 Training Loss: 4.11 Validation Loss: 5.224
Step: 1440 Training Loss: 3.915 Validation Loss: 5.341
Step: 1460 Training Loss: 4.05 Validation Loss: 5.302
Step: 1480 Training Loss: 3.927 Validation Loss: 5.487
Step: 1500 Training Loss: 3.952 Validation Loss: 5.191
Step: 1520 Training Loss: 4.182 Validation Loss: 5.066
Step: 1540 Training Loss: 3.851 Validation Loss: 5.205
Step: 1560 Training Loss: 4.062 Validation Loss: 5.039
Step: 1580 Training Loss: 3.848 Validation Loss: 4.952
Step: 1600 Training Loss: 3.94 Validation Loss: 5.343
Step: 1620 Training Loss: 3.78 Validation Loss: 5.243
Step: 1640 Training Loss: 3.814 Validation Loss: 5.364
Step: 1660 Training Loss: 3.979 Validation Loss: 5.25
Step: 1680 Training Loss: 3.717 Validation Loss: 5.067
Step: 1700 Training Loss: 3.681 Validation Loss: 5.574
Step: 1720 Training Loss: 3.753 Validation Loss: 5.119
Step: 1740 Training Loss: 3.584 Validation Loss: 5.335
Step: 1760 Training Loss: 3.819 Validation Loss: 4.949
Step: 1780 Training Loss: 3.823 Validation Loss: 4.921
Step: 1800 Training Loss: 3.795 Validation Loss: 5.031
Step: 1820 Training Loss: 3.54 Validation Loss: 5.292
Step: 1840 Training Loss: 4.003 Validation Loss: 4.95
Step: 1860 Training Loss: 3.759 Validation Loss: 4.86
Step: 1880 Training Loss: 3.871 Validation Loss: 5.262
Step: 1900 Training Loss: 3.791 Validation Loss: 4.975
Step: 1920 Training Loss: 3.768 Validation Loss: 5.329
Step: 1940 Training Loss: 3.689 Validation Loss: 5.011
Step: 1960 Training Loss: 3.52 Validation Loss: 4.926
Step: 1980 Training Loss: 3.648 Validation Loss: 5.128
Step: 2000 Training Loss: 3.696 Validation Loss: 5.011
Step: 2020 Training Loss: 3.756 Validation Loss: 5.086
Step: 2040 Training Loss: 3.835 Validation Loss: 4.961
Step: 2060 Training Loss: 3.626 Validation Loss: 5.27
Step: 2080 Training Loss: 3.751 Validation Loss: 5.27
Step: 2100 Training Loss: 3.856 Validation Loss: 4.967
Step: 2120 Training Loss: 3.76 Validation Loss: 4.968
Step: 2140 Training Loss: 3.678 Validation Loss: 4.971
Step: 2160 Training Loss: 3.759 Validation Loss: 4.821
Step: 2180 Training Loss: 3.504 Validation Loss: 5.243
Step: 2200 Training Loss: 3.85 Validation Loss: 5.345
Step: 2220 Training Loss: 3.74 Validation Loss: 5.287
Step: 2240 Training Loss: 3.66 Validation Loss: 5.219
Step: 2260 Training Loss: 3.684 Validation Loss: 5.101
Step: 2280 Training Loss: 3.523 Validation Loss: 4.998
Step: 2300 Training Loss: 3.628 Validation Loss: 5.237
Step: 2320 Training Loss: 3.545 Validation Loss: 5.442
Step: 2340 Training Loss: 3.428 Validation Loss: 5.192
Step: 2360 Training Loss: 3.658 Validation Loss: 5.11
Step: 2380 Training Loss: 3.592 Validation Loss: 5.14
Step: 2400 Training Loss: 3.573 Validation Loss: 5.069
Step: 2420 Training Loss: 3.414 Validation Loss: 4.745
Step: 2440 Training Loss: 3.459 Validation Loss: 5.28
Step: 2460 Training Loss: 3.678 Validation Loss: 5.044
Step: 2480 Training Loss: 3.409 Validation Loss: 4.935
Step: 2500 Training Loss: 3.484 Validation Loss: 5.054
Step: 2520 Training Loss: 3.659 Validation Loss: 5.335
Step: 2540 Training Loss: 3.423 Validation Loss: 5.333
Step: 2560 Training Loss: 3.57 Validation Loss: 5.237
Step: 2580 Training Loss: 3.57 Validation Loss: 4.961
Step: 2600 Training Loss: 3.67 Validation Loss: 5.023
Step: 2620 Training Loss: 3.451 Validation Loss: 4.958
Step: 2640 Training Loss: 3.542 Validation Loss: 5.144
Step: 2660 Training Loss: 3.474 Validation Loss: 5.076
Step: 2680 Training Loss: 3.482 Validation Loss: 4.937
Step: 2700 Training Loss: 3.428 Validation Loss: 5.087
Step: 2720 Training Loss: 3.377 Validation Loss: 5.171
Step: 2740 Training Loss: 3.404 Validation Loss: 4.779
Step: 2760 Training Loss: 3.2 Validation Loss: 5.077
Step: 2780 Training Loss: 3.28 Validation Loss: 5.184
Step: 2800 Training Loss: 3.138 Validation Loss: 5.165
Step: 2820 Training Loss: 3.374 Validation Loss: 5.091
Step: 2840 Training Loss: 3.29 Validation Loss: 5.2
Step: 2860 Training Loss: 3.375 Validation Loss: 5.022
Step: 2880 Training Loss: 3.45 Validation Loss: 4.919
Step: 2900 Training Loss: 3.465 Validation Loss: 5.134
Step: 2920 Training Loss: 3.457 Validation Loss: 5.227
Step: 2940 Training Loss: 3.322 Validation Loss: 4.94
Step: 2960 Training Loss: 3.203 Validation Loss: 5.068
Step: 2980 Training Loss: 3.372 Validation Loss: 4.924
Step: 3000 Training Loss: 3.512 Validation Loss: 5.071
Step: 3020 Training Loss: 3.469 Validation Loss: 4.782
Step: 3040 Training Loss: 3.343 Validation Loss: 5.275
Step: 3060 Training Loss: 3.201 Validation Loss: 4.854
Step: 3080 Training Loss: 3.313 Validation Loss: 5.037
Step: 3100 Training Loss: 3.41 Validation Loss: 4.707
Step: 3120 Training Loss: 3.201 Validation Loss: 5.013
Step: 3140 Training Loss: 3.344 Validation Loss: 4.895
Step: 3160 Training Loss: 3.307 Validation Loss: 4.915
Step: 3180 Training Loss: 3.186 Validation Loss: 4.955
Step: 3200 Training Loss: 3.262 Validation Loss: 5.005
Step: 3220 Training Loss: 3.331 Validation Loss: 4.845
Step: 3240 Training Loss: 3.301 Validation Loss: 5.017
Step: 3260 Training Loss: 3.529 Validation Loss: 4.58
Step: 3280 Training Loss: 3.269 Validation Loss: 4.887
Step: 3300 Training Loss: 3.1 Validation Loss: 5.046
Step: 3320 Training Loss: 3.239 Validation Loss: 4.825
Step: 3340 Training Loss: 3.341 Validation Loss: 5.413
Step: 3360 Training Loss: 3.288 Validation Loss: 4.929
Step: 3380 Training Loss: 3.315 Validation Loss: 5.259
Step: 3400 Training Loss: 3.19 Validation Loss: 4.979
Step: 3420 Training Loss: 3.237 Validation Loss: 5.082
Step: 3440 Training Loss: 3.168 Validation Loss: 5.336
Step: 3460 Training Loss: 3.305 Validation Loss: 5.259
Step: 3480 Training Loss: 3.142 Validation Loss: 4.798
Step: 3500 Training Loss: 3.179 Validation Loss: 5.061
Step: 3520 Training Loss: 3.238 Validation Loss: 5.056
Step: 3540 Training Loss: 3.171 Validation Loss: 4.955
Step: 3560 Training Loss: 3.141 Validation Loss: 4.828
Step: 3580 Training Loss: 3.154 Validation Loss: 4.858
Step: 3600 Training Loss: 3.245 Validation Loss: 5.185
Step: 3620 Training Loss: 3.076 Validation Loss: 4.518
Step: 3640 Training Loss: 3.208 Validation Loss: 4.755
Step: 3660 Training Loss: 3.343 Validation Loss: 4.94
Step: 3680 Training Loss: 3.109 Validation Loss: 4.749
Step: 3700 Training Loss: 3.137 Validation Loss: 4.929
Step: 3720 Training Loss: 3.105 Validation Loss: 4.806
Step: 3740 Training Loss: 3.053 Validation Loss: 4.917
Step: 3760 Training Loss: 3.379 Validation Loss: 4.991
Step: 3780 Training Loss: 3.278 Validation Loss: 5.268
Step: 3800 Training Loss: 3.11 Validation Loss: 5.2
Step: 3820 Training Loss: 3.049 Validation Loss: 5.134
Step: 3840 Training Loss: 3.182 Validation Loss: 4.849
Step: 3860 Training Loss: 2.989 Validation Loss: 5.004
Step: 3880 Training Loss: 3.27 Validation Loss: 4.796
Step: 3900 Training Loss: 3.007 Validation Loss: 4.805
Step: 3920 Training Loss: 3.151 Validation Loss: 4.856
Step: 3940 Training Loss: 3.125 Validation Loss: 4.832
Step: 3960 Training Loss: 3.058 Validation Loss: 4.629
Step: 3980 Training Loss: 3.031 Validation Loss: 4.963
Step: 4000 Training Loss: 3.118 Validation Loss: 4.976
Step: 4020 Training Loss: 3.152 Validation Loss: 4.949
Step: 4040 Training Loss: 3.049 Validation Loss: 5.054
Step: 4060 Training Loss: 3.065 Validation Loss: 5.069
Step: 4080 Training Loss: 3.193 Validation Loss: 5.184
Step: 4100 Training Loss: 2.92 Validation Loss: 5.0
Step: 4120 Training Loss: 3.167 Validation Loss: 4.822
Step: 4140 Training Loss: 3.117 Validation Loss: 4.895
Step: 4160 Training Loss: 3.153 Validation Loss: 5.004
Step: 4180 Training Loss: 3.213 Validation Loss: 4.874
Step: 4200 Training Loss: 2.952 Validation Loss: 4.93
Step: 4220 Training Loss: 3.089 Validation Loss: 5.009
Step: 4240 Training Loss: 2.934 Validation Loss: 5.001
Step: 4260 Training Loss: 3.035 Validation Loss: 5.085
Step: 4280 Training Loss: 2.786 Validation Loss: 4.974
Step: 4300 Training Loss: 3.009 Validation Loss: 4.948
Step: 4320 Training Loss: 2.893 Validation Loss: 5.033
Step: 4340 Training Loss: 2.859 Validation Loss: 4.889
Step: 4360 Training Loss: 3.022 Validation Loss: 4.746
Step: 4380 Training Loss: 2.983 Validation Loss: 5.146
Step: 4400 Training Loss: 3.125 Validation Loss: 4.891
Step: 4420 Training Loss: 3.003 Validation Loss: 5.253
Step: 4440 Training Loss: 2.952 Validation Loss: 5.039
Step: 4460 Training Loss: 3.043 Validation Loss: 4.736
Step: 4480 Training Loss: 2.811 Validation Loss: 5.291
Step: 4500 Training Loss: 2.927 Validation Loss: 4.883
Step: 4520 Training Loss: 2.983 Validation Loss: 4.685
Step: 4540 Training Loss: 3.092 Validation Loss: 4.898
Step: 4560 Training Loss: 3.034 Validation Loss: 4.876
Step: 4580 Training Loss: 3.036 Validation Loss: 5.188
Step: 4600 Training Loss: 2.715 Validation Loss: 4.858
Step: 4620 Training Loss: 3.009 Validation Loss: 5.125
Step: 4640 Training Loss: 2.923 Validation Loss: 4.92
Step: 4660 Training Loss: 2.869 Validation Loss: 4.923
Step: 4680 Training Loss: 2.809 Validation Loss: 5.075
Step: 4700 Training Loss: 3.002 Validation Loss: 5.103
Step: 4720 Training Loss: 2.921 Validation Loss: 5.054
Step: 4740 Training Loss: 2.81 Validation Loss: 5.074
Step: 4760 Training Loss: 2.951 Validation Loss: 5.228
Step: 4780 Training Loss: 2.919 Validation Loss: 4.913
Step: 4800 Training Loss: 2.953 Validation Loss: 5.215
Step: 4820 Training Loss: 3.022 Validation Loss: 4.832
Step: 4840 Training Loss: 2.766 Validation Loss: 5.119
Step: 4860 Training Loss: 2.898 Validation Loss: 5.103
Step: 4880 Training Loss: 2.977 Validation Loss: 4.885
Step: 4900 Training Loss: 3.036 Validation Loss: 5.128
Step: 4920 Training Loss: 2.913 Validation Loss: 4.799
Step: 4940 Training Loss: 2.966 Validation Loss: 4.863
Step: 4960 Training Loss: 2.723 Validation Loss: 4.828
Step: 4980 Training Loss: 2.752 Validation Loss: 4.666
Step: 4999 Training Loss: 2.828 Validation Loss: 5.13
---------------
The salesperson, the customer, the salesperson can effectively gather information, and ultimately increasing the likelihood of the sale.
1. Be mindful of reinforcing of-ended questions are identified persuasive and manipulative use can significantly impact, demonstrating genuine interest requires a deeper level that we have successfully suit their approach to share their responses. Some customers, while some the root cause for them, showcasing patterns, sales professionals can meet their concerns and increase their requirements.
When faced with your product or service is not just about attacks but crafted situations
---------------
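The numbers are easier to read as a curve. The pip cell installs matplotlib and pandas, but the listing never uses them; a natural use is plotting the losses collected in tracked_losses. A minimal sketch (my addition, not from the original article), assuming the training loop above has finished in the same session:

import pandas as pd
import matplotlib.pyplot as plt

# tracked_losses is a list of dicts {'train': tensor, 'valid': tensor} from estimate_loss()
df = pd.DataFrame([{k: v.item() for k, v in d.items()} for d in tracked_losses])
df.plot(title='Training vs. validation loss', xlabel='evaluation step', ylabel='cross-entropy loss')
plt.show()

The widening gap between the two curves (roughly 2.8 training vs. 5.1 validation by the end) is the usual sign of overfitting on such a small corpus. For this exercise that is fine: the goal is the workflow, not the model.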

View or run it online: https://colab.research.google.com/drive/1hvgnvZhTNqJbfaHIN_29h0p9lqSx1IXU?usp=sharing
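Since the script saves its weights to model-ckpt.pt, a later session can reload them instead of retraining. A minimal sketch, assuming the class definitions and the encoding, device, and max_token_value globals from the listing are in scope; the prompt string is just a made-up example:

# Rebuild the model and load the trained weights from disk
model = TransformerLanguageModel()
model.load_state_dict(torch.load('model-ckpt.pt', map_location=device))
model = model.to(device)
model.eval()

# Generate from a new prompt ('The customer' is a hypothetical example)
prompt = 'The customer'
idx = torch.tensor(encoding.encode(prompt), dtype=torch.long, device=device)[None, ...]
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=50)
print(encoding.decode(out[0].tolist()))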

Originality notice: this article is my original work, first published on AI ONES (https://wuxiongwei.com). If you repost it, please keep this link. Thank you.