Smart POS Terminal Hardware Standardization

News 3 | 2023-09-11 09:40 | Contributor: POS機之家 (POS Machine Home)



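About standardization

Standardization (also called z-score normalization) rescales the data so that each feature has the properties of a standard normal distribution, that is, μ = 0 and σ = 1, where μ is the mean and σ is the standard deviation. Note that this changes only the location and scale of the data, not the shape of its distribution. The standardized value (z-score) of a sample x is computed as:

$$z = \frac{x - \mu}{\sigma}$$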
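As a small illustration of the z-score formula, the sketch below standardizes a deliberately skewed toy sample using only the standard library. The sample values and the helper name `standardize` are made up for this example. It shows that the result has mean 0 and standard deviation 1 while the skew of the data is still there.

```python
import statistics


def standardize(values):
    """Return z-scores using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# A right-skewed toy sample (hypothetical values, not taken from the Wine data).
sample = [1, 1, 2, 2, 3, 4, 20]
z = standardize(sample)

print('mean:', round(statistics.fmean(z), 6))
print('std: ', round(statistics.pstdev(z), 6))
# The largest z-score is still much further from 0 than the smallest one,
# so the data is centered and rescaled but not made "normal".
print('still skewed:', max(z) > abs(min(z)))
```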

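Standardizing features so that they are centered at 0 with a standard deviation of 1 matters when we compare measurements that have different units, and it is also a general requirement for many machine learning algorithms. Intuitively, gradient descent is a prominent example. Because the feature values x_j appear in the weight update, some weights may be updated faster than others:

$$\Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_i \left(t^{(i)} - o^{(i)}\right) x_j^{(i)}$$

so that w_j := w_j + Δw_j, where η is the learning rate, J is the cost function, t is the target class label, and o is the output of the model.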
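To make the effect of feature scale on the update concrete, here is a minimal sketch of a single update step for a linear unit, assuming the update has the form Δw_j = η Σ_i (t_i − o_i) x_ij. The data, the learning rate, and the helper `output` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Two features on very different scales (hypothetical toy data).
X = [[0.5, 500.0], [1.0, 1500.0], [1.5, 1000.0]]
t = [1.0, 0.0, 1.0]   # target values
w = [0.0, 0.0]        # initial weights
eta = 0.01            # learning rate


def output(x, w):
    """Linear output o = sum_j w_j * x_j."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


# delta_w_j = eta * sum_i (t_i - o_i) * x_ij
delta_w = [
    eta * sum((t_i - output(x_i, w)) * x_i[j] for x_i, t_i in zip(X, t))
    for j in range(len(w))
]

print('update for the small-scale feature:', delta_w[0])
print('update for the large-scale feature:', delta_w[1])
```

With identical errors, the step for the second weight is several hundred times larger simply because its feature is measured in larger numbers.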

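Other intuitive examples are the KNN algorithm and some clustering algorithms, which rely on distance measures and therefore also need this kind of standardization.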

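In fact, the only family of algorithms I can think of that does not need feature scaling is tree-based methods. Consider the general CART decision tree algorithm. Without going into the details of information entropy, we can think of each decision as "is feature x_i >= some_val?". Intuitively, it really does not matter on which scale the feature is measured (different orders of magnitude, different units or domains).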
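The threshold argument can be checked with a tiny decision stump. This is a simplified sketch, not the full CART algorithm: it searches a single feature for the split "x >= threshold" with the lowest weighted Gini impurity. The data and the helper names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, mask) of the split 'x >= threshold' with lowest weighted Gini."""
    best = None
    for thr in sorted(set(values)):
        mask = [v >= thr for v in values]
        right = [l for l, m in zip(labels, mask) if m]
        left = [l for l, m in zip(labels, mask) if not m]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, thr, mask)
    return best[1], best[2]


x = [1.2, 3.4, 2.2, 5.1, 4.8, 0.7]
y = [0, 1, 0, 1, 1, 0]
x_scaled = [1000 * v + 5 for v in x]   # same feature on a different scale

thr_raw, mask_raw = best_split(x, y)
thr_scaled, mask_scaled = best_split(x_scaled, y)

print('thresholds:', thr_raw, thr_scaled)
print('same partition of the samples:', mask_raw == mask_scaled)
```

The threshold value changes with the scale, but the samples end up on the same side of the split, which is all the tree cares about.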

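So in which algorithms is feature standardization important? For example:

- KNN with a Euclidean distance measure, if you want all features to contribute to the result;
- k-means;
- logistic regression, support vector machines, perceptrons, neural networks and so on, if you use gradient descent (or ascent) as the optimizer, because standardization lets the weights converge faster;
- linear discriminant analysis, PCA, and kernel methods.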
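The KNN case can be illustrated with a few points and plain Euclidean distance. In this sketch the six points are hypothetical (alcohol in percent, a second feature in the hundreds), and `nearest` and `standardize_columns` are helper names invented for the example. The nearest neighbour of the first point changes once both features are standardized.

```python
import math
import statistics

# (small-scale feature, large-scale feature), hypothetical toy points
points = [
    (12.0, 500.0),   # query point
    (14.8, 520.0),   # close on the large-scale feature only
    (12.1, 600.0),   # close on the small-scale feature
    (11.0, 1500.0),
    (13.5, 1100.0),
    (14.0, 300.0),
]


def nearest(points, query_idx):
    """Index of the point closest to points[query_idx] (Euclidean distance)."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    return min(others, key=lambda i: math.dist(q, points[i]))


def standardize_columns(points):
    """Z-score every column using the population standard deviation."""
    cols = list(zip(*points))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


nearest_raw = nearest(points, 0)
nearest_std = nearest(standardize_columns(points), 0)

print('nearest neighbour on the raw scale:   ', nearest_raw)
print('nearest neighbour after standardizing:', nearest_std)
```

On the raw scale the distance is decided almost entirely by the large-scale feature; after standardization both features contribute.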

In addition, we have to consider whether our data should be "standardized" or "normalized" (here: scaled to the range [0, 1]). Some algorithms assume that the data is centered at 0, so the choice between the two needs some thought. For example, if we initialize the weights of a small multi-layer perceptron with tanh activation units to 0, or to small random values centered around zero, we want the model weights to be updated equally well. As a rule of thumb I would say: when in doubt, standardize the data. It should not hurt the data or the results.

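Standardization

Standardization rescales the data so that it falls into a small, specific range. Standardized values can be positive or negative, but their absolute values are usually not large. The usual method is z-score normalization.

Normalization

Normalization was introduced mainly to make data processing more convenient: mapping the data to the range 0 to 1 makes processing easier and faster, and the idea originally belongs to digital signal processing. The usual method is min-max scaling.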

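About min-max scaling

Another approach is min-max scaling. Here the data is scaled to a fixed range, usually 0 to 1. The cost of having this bounded range, in contrast to standardization, is that we end up with smaller standard deviations, which can suppress the effect of outliers.

Min-max scaling is done with the following formula:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$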
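A minimal standard-library sketch of min-max scaling follows. The toy list and the helper name `min_max_scale` are assumptions made for the example. It shows the bounded range, the fact that the result is not centered at 0, and the smaller standard deviation compared with the value of 1 that standardization would give.

```python
import statistics


def min_max_scale(values):
    """Scale values linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


x = [1, 4, 5, 6, 6, 2, 3]   # hypothetical toy sample
scaled = min_max_scale(x)

print('range:', min(scaled), 'to', max(scaled))
print('mean: ', round(statistics.fmean(scaled), 4))    # not centered at 0
print('std:  ', round(statistics.pstdev(scaled), 4))   # smaller than 1
```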

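Z-score standardization or min-max scaling?

There is no clear-cut answer to which method to choose; it mainly depends on the application.

For example, in cluster analysis, standardization can be especially important in order to compare similarities between features based on a particular distance measure. Another prominent example is principal component analysis, where we usually prefer standardization for scaling the data, because we are interested in the directions that maximize the variance.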

Standardizing and normalizing with scikit-learn

Of course, we could use NumPy to compute the z-scores and standardize the data with the formula given earlier. However, it is more convenient to use the preprocessing module of scikit-learn, the open-source machine learning library for Python.

For the following discussion, we use the "Wine" dataset from the UCI Machine Learning Repository in the code.

import pandas as pd
import numpy as np

# Load the class label and the first two features of the Wine dataset
df = pd.read_csv(
    'https://raw.githubusercontent.com/rasbt/pattern_classification/master/data/wine_data.csv',
    header=None,
    usecols=[0, 1, 2]
)

df.columns = ['Class label', 'Alcohol', 'Malic acid']
df.head()

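As we can see in the table above, the features Alcohol (percent by volume) and Malic acid (g/l) are measured on different scales, so scaling the data is necessary before comparing or combining these features.

Standardization and min-max scaling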

from sklearn import preprocessing

# Fit each scaler on the two features, then transform them
std_scale = preprocessing.StandardScaler().fit(df[['Alcohol', 'Malic acid']])
df_std = std_scale.transform(df[['Alcohol', 'Malic acid']])

minmax_scale = preprocessing.MinMaxScaler().fit(df[['Alcohol', 'Malic acid']])
df_minmax = minmax_scale.transform(df[['Alcohol', 'Malic acid']])

print('Mean after standardization:\nAlcohol={:.2f}, Malic acid={:.2f}'
      .format(df_std[:,0].mean(), df_std[:,1].mean()))
print('\nStandard deviation after standardization:\nAlcohol={:.2f}, Malic acid={:.2f}'
      .format(df_std[:,0].std(), df_std[:,1].std()))

Mean after standardization:

Alcohol=0.00, Malic acid=0.00

Standard deviation after standardization:

Alcohol=1.00, Malic acid=1.00

print('Min-value after min-max scaling:\nAlcohol={:.2f}, Malic acid={:.2f}'
      .format(df_minmax[:,0].min(), df_minmax[:,1].min()))
print('\nMax-value after min-max scaling:\nAlcohol={:.2f}, Malic acid={:.2f}'
      .format(df_minmax[:,0].max(), df_minmax[:,1].max()))

Min-value after min-max scaling:

Alcohol=0.00, Malic acid=0.00

Max-value after min-max scaling:

Alcohol=1.00, Malic acid=1.00

Plotting

# %matplotlib inline  (only needed inside a Jupyter notebook)
from matplotlib import pyplot as plt

def plot():
    plt.figure(figsize=(8,6))

    # The same two features on three different scales
    plt.scatter(df['Alcohol'], df['Malic acid'],
                color='green', label='input scale', alpha=0.5)
    plt.scatter(df_std[:,0], df_std[:,1],
                color='red', label='Standardized', alpha=0.3)
    plt.scatter(df_minmax[:,0], df_minmax[:,1],
                color='blue', label='min-max scaled [min=0, max=1]', alpha=0.3)

    plt.title('Alcohol and Malic Acid content of the wine dataset')
    plt.xlabel('Alcohol')
    plt.ylabel('Malic Acid')
    plt.legend(loc='upper left')
    plt.grid()
    plt.tight_layout()

plot()
plt.show()

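The plot above shows the wine data points on all three scales: the input scale, where the alcohol content is given as measured (green), the standardized features (red), and the normalized features (blue). In the following plot, we zoom in on the three different axis scales.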

fig, ax = plt.subplots(3, figsize=(6,14))

for a, d, l in zip(range(len(ax)),
                   (df[['Alcohol', 'Malic acid']].values, df_std, df_minmax),
                   ('Input scale', 'Standardized', 'min-max scaled [min=0, max=1]')):
    # One scatter series per wine class
    for i, c in zip(range(1,4), ('red', 'blue', 'green')):
        ax[a].scatter(d[df['Class label'].values == i, 0],
                      d[df['Class label'].values == i, 1],
                      alpha=0.5, color=c, label='Class %s' % i)
    ax[a].set_title(l)
    ax[a].set_xlabel('Alcohol')
    ax[a].set_ylabel('Malic Acid')
    ax[a].legend(loc='upper left')
    ax[a].grid()

plt.tight_layout()
plt.show()

Doing it by hand

Of course, we can also write the standardization and min-max scaling equations by hand. In real projects, however, using scikit-learn is still the recommended way. For example:

std_scale = preprocessing.StandardScaler().fit(X_train)
X_train = std_scale.transform(X_train)
X_test = std_scale.transform(X_test)
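The point of the fit/transform pattern above is that the mean and standard deviation are estimated on the training data only and then reused for the test data. The NumPy sketch below mirrors that idea by hand. The random data and the 70/30 split are assumptions made for illustration, and the code is a rough equivalent of the pattern rather than scikit-learn's actual implementation.

```python
import numpy as np

# Hypothetical data: 100 samples with two features on different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[13.0, 2.3], scale=[0.8, 1.1], size=(100, 2))
X_train, X_test = X[:70], X[70:]

# "fit": estimate the parameters on the training set only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# "transform": apply the same parameters to both sets
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma

print('train mean:', X_train_std.mean(axis=0).round(3))
print('test mean: ', X_test_std.mean(axis=0).round(3))
```

The test set is close to, but not exactly, zero mean and unit variance. That is expected, because no information from the test set went into the scaling parameters.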

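Next, we implement these computations in pure Python and then use NumPy to speed them up. Recall the equations we have been using:

Standardization:

$$z = \frac{x - \mu}{\sigma}, \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

Min-max scaling:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Pure Python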

# Standardization
x = [1,4,5,6,6,2,3]
mean = sum(x)/len(x)
std_dev = (1/len(x) * sum([(x_i - mean)**2 for x_i in x]))**0.5
z_scores = [(x_i - mean)/std_dev for x_i in x]

# Min-Max scaling
minmax = [(x_i - min(x)) / (max(x) - min(x)) for x_i in x]

NumPy

import numpy as np

# Standardization
x_np = np.asarray(x)
z_scores_np = (x_np - x_np.mean()) / x_np.std()

# Min-Max scaling
np_minmax = (x_np - x_np.min()) / (x_np.max() - x_np.min())

Visualization

To check that our code works as intended, we plot the results.

from matplotlib import pyplot as plt

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10,5))

# All points are drawn on a single horizontal line
y_pos = [0 for i in range(len(x))]

ax1.scatter(z_scores, y_pos, color='g')
ax1.set_title('Python standardization', color='g')

ax2.scatter(minmax, y_pos, color='g')
ax2.set_title('Python Min-Max scaling', color='g')

ax3.scatter(z_scores_np, y_pos, color='b')
ax3.set_title('Python NumPy standardization', color='b')

ax4.scatter(np_minmax, y_pos, color='b')
ax4.set_title('Python NumPy Min-Max scaling', color='b')

plt.tight_layout()

for ax in (ax1, ax2, ax3, ax4):
    ax.get_yaxis().set_visible(False)
    ax.grid()

plt.show()

In practice: the effect of standardization on PCA in a classification task

Earlier in this article, we mentioned that standardizing the data is crucial for PCA, because PCA analyzes the variances of the different features. Now let us see how standardization affects PCA and the subsequent classification on the whole Wine dataset.
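Why variance makes PCA sensitive to scale can be sketched with two unrelated synthetic features, one of which is simply measured in much larger numbers. The sketch computes the first principal component from the covariance matrix with NumPy. The data and the helper name `first_component` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
small = rng.normal(0.0, 1.0, n)     # feature on a small scale
large = rng.normal(0.0, 100.0, n)   # unrelated feature on a large scale
X = np.column_stack([small, large])


def first_component(X):
    """Unit vector along the direction of maximum variance."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]


pc_raw = first_component(X)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
pc_std = first_component(X_std)

print('first component, raw data:         ', np.abs(pc_raw).round(3))
print('first component, standardized data:', np.abs(pc_std).round(3))
```

On the raw data the first component points almost exactly along the large-scale feature. After standardization both features carry comparable weight.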

We will go through the following steps:

- reading the dataset;
- splitting the dataset into a training set and a test set;
- standardizing the features;
- reducing the dimensionality with PCA;
- training a naive Bayes classifier;
- evaluating the classifier with and without standardization.

Reading the dataset

import pandas as pd

# Load the full Wine dataset (class label plus all features)
df = pd.read_csv(
    'https://raw.githubusercontent.com/rasbt/pattern_classification/master/data/wine_data.csv',
    header=None,
)

Splitting the dataset into a training set and a test set

In this step, we randomly split the data into a training set and a test set, where the training set contains 70% of the samples and the test set contains 30%.

# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split

X_wine = df.values[:,1:]
y_wine = df.values[:,0]

X_train, X_test, y_train, y_test = train_test_split(
    X_wine, y_wine, test_size=0.30, random_state=12345)

Standardizing the features

from sklearn import preprocessing

# Fit the scaler on the training set only, then apply it to both sets
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)

Dimensionality reduction with PCA

Now we run PCA on the standardized and on the non-standardized data to transform the dataset into a two-dimensional feature subspace. In a real application, we would also include a cross-validation step, for example to spot overfitting. We skip it here, because we are not trying to design a perfect classifier; we only want to compare the effect of standardization on the classification result.

from sklearn.decomposition import PCA

# on non-standardized data
pca = PCA(n_components=2).fit(X_train)
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)

# on standardized data
pca_std = PCA(n_components=2).fit(X_train_std)
X_train_std = pca_std.transform(X_train_std)
X_test_std = pca_std.transform(X_test_std)

Let us take a quick look at the new features:

from matplotlib import pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10,4))

# Left panel: PCA on the non-standardized training data
for l, c, m in zip(range(1,4), ('blue', 'red', 'green'), ('^', 's', 'o')):
    ax1.scatter(X_train[y_train==l, 0], X_train[y_train==l, 1],
                color=c, label='class %s' % l, alpha=0.5, marker=m)

# Right panel: PCA on the standardized training data
for l, c, m in zip(range(1,4), ('blue', 'red', 'green'), ('^', 's', 'o')):
    ax2.scatter(X_train_std[y_train==l, 0], X_train_std[y_train==l, 1],
                color=c, label='class %s' % l, alpha=0.5, marker=m)

ax1.set_title('Transformed NON-standardized training dataset after PCA')
ax2.set_title('Transformed standardized training dataset after PCA')

for ax in (ax1, ax2):
    ax.set_xlabel('1st principal component')
    ax.set_ylabel('2nd principal component')
    ax.legend(loc='upper right')
    ax.grid()

plt.tight_layout()
plt.show()

Training a naive Bayes classifier

Next, we use a naive Bayes classifier for the classification task. That is, we assume that the features are independent of each other given the class. Overall, it is a simple classifier that is nevertheless quite robust.

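Bayes' rule:

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{p(x)}$$

where:

- ω_j: the class label;
- P(ω_j | x): the posterior probability;
- p(x | ω_j): the class-conditional probability (likelihood);
- P(ω_j): the prior probability of the class.

The decision rule is:

$$\text{decide } \omega_j \text{ if } P(\omega_j \mid x) > P(\omega_k \mid x) \text{ for all } k \neq j$$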
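As a worked example of Bayes' rule with the naive independence assumption, the sketch below computes posteriors for two classes with Gaussian likelihoods and picks the class with the larger posterior. The class parameters, the priors, and the names `gaussian_pdf` and `posteriors` are hypothetical; this is an illustration of the rule, not scikit-learn's GaussianNB implementation.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical per-class parameters for two features.
classes = {
    'w1': {'prior': 0.5, 'means': (0.0, 0.0), 'stds': (1.0, 1.0)},
    'w2': {'prior': 0.5, 'means': (2.0, 2.0), 'stds': (1.0, 1.0)},
}


def posteriors(x):
    """P(w | x) for every class, assuming independent features within a class."""
    joint = {
        label: p['prior'] * math.prod(
            gaussian_pdf(x_j, m, s) for x_j, m, s in zip(x, p['means'], p['stds'])
        )
        for label, p in classes.items()
    }
    evidence = sum(joint.values())   # p(x)
    return {label: value / evidence for label, value in joint.items()}


post = posteriors((0.5, 0.2))
decision = max(post, key=post.get)   # class with the largest posterior

print(post)
print('decision:', decision)
```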

I do not want to go into much detail about Bayes' theorem in this article. If you are interested, there is plenty of material available online.

from sklearn.naive_bayes import GaussianNB

# on non-standardized data
gnb = GaussianNB()
fit = gnb.fit(X_train, y_train)

# on standardized data
gnb_std = GaussianNB()
fit_std = gnb_std.fit(X_train_std, y_train)

Evaluating the classifier with and without standardization

from sklearn import metrics

# Accuracy of the classifier trained on non-standardized data
pred_train = gnb.predict(X_train)
print('\nPrediction accuracy for the training dataset')
print('{:.2%}'.format(metrics.accuracy_score(y_train, pred_train)))

pred_test = gnb.predict(X_test)
print('\nPrediction accuracy for the test dataset')
print('{:.2%}\n'.format(metrics.accuracy_score(y_test, pred_test)))

Prediction accuracy for the training dataset

81.45%

Prediction accuracy for the test dataset

64.81%

# Accuracy of the classifier trained on standardized data
pred_train_std = gnb_std.predict(X_train_std)
print('\nPrediction accuracy for the training dataset')
print('{:.2%}'.format(metrics.accuracy_score(y_train, pred_train_std)))

pred_test_std = gnb_std.predict(X_test_std)
print('\nPrediction accuracy for the test dataset')
print('{:.2%}\n'.format(metrics.accuracy_score(y_test, pred_test_std)))

Prediction accuracy for the training dataset

96.77%

Prediction accuracy for the test dataset

98.15%

As we can see, standardizing the data before PCA improved the accuracy of the model considerably: on the test set it rose from 64.81% to 98.15%.
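The PCA section mentioned that a real application would add cross-validation. One possible way to do that, sketched here under a few assumptions, is to wrap the steps in a scikit-learn pipeline so that scaling and PCA are refit inside every fold. The sketch loads the data with `load_wine`, scikit-learn's bundled copy of the Wine dataset, which is assumed to match the CSV file used in this article. The 5-fold setup is an arbitrary choice, and the scores will differ from the single train/test split above.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled copy of the Wine data (assumed equivalent to the CSV used above)
X, y = load_wine(return_X_y=True)

# Same model with and without the standardization step
raw_pipe = make_pipeline(PCA(n_components=2), GaussianNB())
std_pipe = make_pipeline(StandardScaler(), PCA(n_components=2), GaussianNB())

# 5-fold cross-validated accuracy; every step is refit on each training fold
scores_raw = cross_val_score(raw_pipe, X, y, cv=5)
scores_std = cross_val_score(std_pipe, X, y, cv=5)

print('without standardization: {:.2%}'.format(scores_raw.mean()))
print('with standardization:    {:.2%}'.format(scores_std.mean()))
```

Putting the scaler inside the pipeline keeps the test folds out of the estimated means and standard deviations, which is the same reason the scaler was fitted on X_train only.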

Source: sebastianraschka
