The Entropy Weight Method and Its Python Implementation

Information Entropy

Information entropy is a measure of uncertainty and reflects the amount of information a random event carries. The amount of information is tied to the event's probability: the higher the probability, the lower the uncertainty, and the less information the event contains. In other words, the information content of a random event decreases as its probability of occurrence increases. Information entropy is computed as:

$$ H(X) = - \displaystyle\sum^n_{i = 1} p_i \log(p_i) $$

where $x_i$ denotes a value taken by the random variable $X$, and $p_i$ is the probability of the event $X = x_i$.
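
For intuition, here is a quick worked example (using the natural logarithm): a fair coin is maximally uncertain, while a heavily biased coin is nearly predictable and therefore carries less information:

$$ H_{\text{fair}} = -(0.5 \ln 0.5 + 0.5 \ln 0.5) = \ln 2 \approx 0.693 $$

$$ H_{\text{biased}} = -(0.9 \ln 0.9 + 0.1 \ln 0.1) \approx 0.325 $$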

The Entropy Weight Method

Because of this property, information entropy can be used to measure the dispersion of an indicator: the more dispersed an indicator's values, the greater its influence on the composite evaluation, and the larger the weight it should receive. The entropy weight method is thus an objective weighting method driven purely by the dispersion of the data itself; it combines multiple indicators into a composite score for each sample, enabling comparison across samples.

Method

Suppose there are $n$ samples and $m$ indicators (dimensions). Denote the value of sample $i$ on indicator $j$ as:

$$ x_{ij}, i = 1,...,n, j = 1,...,m $$

To eliminate the effect of differing scales, the indicators must first be normalized. Based on their meaning, indicators fall into positive indicators (larger is better) and negative indicators (smaller is better), which are normalized respectively as follows:

$$ x^{'}_{ij} = \frac{x_{ij} - \min_i(x_{ij})}{\max_i(x_{ij}) - \min_i(x_{ij})} $$

$$ x^{'}_{ij} = \frac{\max_i(x_{ij}) - x_{ij}}{\max_i(x_{ij}) - \min_i(x_{ij})} $$
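
As a minimal NumPy sketch of these two normalizations (assuming `X` holds samples in rows and indicators in columns, and that no column is constant; the helper names `min_max_positive` and `min_max_negative` are illustrative):

import numpy as np

def min_max_positive(X):
    # column-wise min-max for positive indicators (larger is better)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def min_max_negative(X):
    # column-wise min-max for negative indicators (smaller is better)
    return (X.max(axis=0) - X) / (X.max(axis=0) - X.min(axis=0))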

Next, compute the entropy of each indicator (dimension):

$$ E_j = -k \displaystyle\sum^n_{i = 1} p_{ij}\ln(p_{ij}) $$

where:

$$ p_{ij} = \frac{x^{'}_{ij}}{\displaystyle\sum^n_{i = 1}x^{'}_{ij}}, i=1,...,n, j=1,...,m $$

$$ k = \frac{1}{\ln(n)} > 0 $$

The factor $k$ normalizes the entropy so that $0 \le E_j \le 1$: when all $p_{ij} = \frac{1}{n}$, the sum equals $-\ln(n)$ and $E_j$ attains its maximum of 1.

Compute the redundancy (divergence) of each indicator:

$$ d_j = 1 - E_j $$

Compute the weight of each indicator:

$$ w_j = \frac{d_j}{\displaystyle\sum_j d_j} $$
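
For example, with two indicators whose entropies are $E_1 = 0.9$ and $E_2 = 0.6$, the redundancies are $d_1 = 0.1$ and $d_2 = 0.4$, giving weights $w_1 = 0.2$ and $w_2 = 0.8$: the more dispersed indicator (lower entropy) receives the larger weight.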

Compute the composite score of each sample:

$$ s_i = \displaystyle\sum_j w_j x^{'}_{ij} $$

Python Implementation

#!/usr/bin/env python
# coding=utf-8

import numpy as np

def positiveIndex(data):
    # Min-max normalize every column, treating all indicators as positive
    # (larger is better). Assumes no column is constant (max != min).
    n, m = data.shape
    x = np.ones([n, m])
    for j in range(m):
        max_xj, min_xj = max(data.T[j]), min(data.T[j])
        for i in range(n):
            x[i, j] = (data[i, j] - min_xj) / (max_xj - min_xj)

    return x

def negativeIndex(data):
    # Min-max normalize every column, treating all indicators as negative
    # (smaller is better). Assumes no column is constant (max != min).
    n, m = data.shape
    x = np.ones([n, m])

    for j in range(m):
        max_xj, min_xj = max(data.T[j]), min(data.T[j])
        for i in range(n):
            x[i, j] = (max_xj - data[i, j]) / (max_xj - min_xj)

    return x

def ln(x):
    # Natural log with the convention 0 * ln(0) = 0, so that zero
    # probabilities contribute nothing to the entropy sum.
    if x <= 0:
        return 0
    else:
        return np.log(x)

def calcP(data):
    # Convert each normalized column into a probability distribution:
    # p_ij = x'_ij / sum_i(x'_ij)
    n, m = data.shape
    p = np.ones([n, m])

    for j in range(m):
        col_sum = np.sum(data.T[j])
        for i in range(n):
            p[i, j] = data[i, j] / col_sum

    return p

def calcEntropy(data):
    # E_j = -k * sum_i(p_ij * ln(p_ij)), with k = 1 / ln(n)
    data = calcP(data)

    print("Calculated P:")
    print(data)

    n, m = data.shape
    k = 1.0 / ln(n)
    E = np.ones(m)

    for j in range(m):
        total = 0
        for i in range(n):
            total += data[i, j] * ln(data[i, j])

        E[j] = -k * total

    return E

def calcWeight(data):
    # w_j = d_j / sum_j(d_j), where the redundancy is d_j = 1 - E_j
    total = 0
    weight = np.ones(len(data))

    for i in data:
        total += (1 - i)

    for i in range(len(data)):
        weight[i] = (1 - data[i]) / total

    return weight

def calcScore(weight, data):
    # s_i = sum_j(w_j * x'_ij), scaled by 100 to give scores on a 0-100 range
    n, m = data.shape
    s = np.ones(n)

    for i in range(n):
        s[i] = 0
        for j in range(m):
            s[i] += weight[j] * data[i, j] * 100

    return s

if __name__ == "__main__":
    data = np.loadtxt("./data/test.txt") # Read Data

    pIndex = positiveIndex(data) # Get Positive Index
    print("Positive Index:")
    print(pIndex)

    nIndex = negativeIndex(data) # Get Negative Index
    print("Negative Index:")
    print(nIndex)

    pEntropy = calcEntropy(pIndex) # Get Positive Index Entropy
    nEntropy = calcEntropy(nIndex) # Get Negative Index Entropy

    pWeight = calcWeight(pEntropy) # Get Positive Index Weight
    print("Positive Index Weight:")
    print(pWeight)

    nWeight = calcWeight(nEntropy) # Get Negative Index Weight
    print("Negative Index Weight:")
    print(nWeight)

    pScore = calcScore(pWeight, pIndex) # Get Positive Index Score
    print("Positive Index Score:")
    print(pScore)

    nScore = calcScore(nWeight, nIndex) # Get Negative Index Score
    print("Negative Index Score")
    print(nScore)
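
For comparison, the whole pipeline above can be collapsed into a few vectorized NumPy lines. This is a minimal sketch under the same assumptions as the loop version (all indicators treated as positive, no constant columns); `entropy_weight_score` is an illustrative name, not part of the original script:

import numpy as np

def entropy_weight_score(X):
    n = X.shape[0]
    # column-wise min-max normalization (positive indicators)
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # column-wise proportions p_ij
    P = Xn / Xn.sum(axis=0)
    # entropy per indicator, using the 0 * ln(0) = 0 convention
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    E = -plogp.sum(axis=0) / np.log(n)
    d = 1.0 - E          # redundancy
    w = d / d.sum()      # weights
    return Xn @ w * 100  # composite scores on a 0-100 scale

Applied to the same input matrix, this should reproduce the pScore values printed by the script above.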

Copyright Notice:

The theory section above is adapted from 综合评价之熵权法.