grace_t.framework.tools package

Submodules

grace_t.framework.tools.calc_metrics module

grace_t.framework.tools.calc_metrics.cal_pearson(x, y)
Args:
  • x: a list
  • y: a list
Returns:
Pearson correlation coefficient between x and y
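
As a rough illustration of what cal_pearson is documented to return, the sketch below computes the Pearson correlation coefficient in plain Python 3. The name pearson_sketch and the choice to return 0.0 for constant input are assumptions made for this example; the package's own implementation may differ.

```python
import math


def pearson_sketch(x, y):
    """Pearson correlation of two equal-length lists (illustrative only)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Assumption: the correlation is undefined for constant input, so return 0.0.
    return cov / denom if denom else 0.0


print(pearson_sketch([1, 2, 3, 4], [2, 4, 6, 8]))
```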
grace_t.framework.tools.calc_metrics.generate_demo_data(class_num=2, starts_from=0)

Given class_num, generate demo data in np.array format.

grace_t.framework.tools.calc_metrics.get_accuracy(labels_true, labels_pred, normalize_type=True)

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in labels_true.

Args:
  • labels_true
  • labels_pred
  • normalize_type:
    If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.
Returns:
accuracy
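
The exact-match rule can be sketched in a few lines of Python 3. This is not the package's code: subset_accuracy_sketch is a hypothetical name, and the flag is assumed to follow the scikit-learn accuracy_score convention, where False yields a count and True yields a fraction.

```python
def subset_accuracy_sketch(labels_true, labels_pred, normalize=True):
    """Exact-match accuracy for single-label or multilabel samples (illustrative)."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    def as_key(sample):
        # A multilabel sample is a row of indicators: compare the whole row.
        if isinstance(sample, (list, tuple)):
            return tuple(sample)
        return (sample,)

    correct = sum(
        1 for t, p in zip(labels_true, labels_pred) if as_key(t) == as_key(p)
    )
    # Assumption: normalize=False returns the count, True returns the fraction.
    return correct / len(labels_true) if normalize else correct


y_true = [[0, 1], [1, 1]]
y_pred = [[1, 1], [1, 1]]
print(subset_accuracy_sketch(y_true, y_pred))
print(subset_accuracy_sketch(y_true, y_pred, normalize=False))
```

Only the second sample matches on every label, so one of the two samples counts as correct.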
grace_t.framework.tools.calc_metrics.get_auc(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)
Args:
  • labels_true
  • labels_pred_prob
  • pos_label: Label considered as positive and others are considered negative.
  • class_num: if class_num == 2 and labels start from 0, roc_auc_score gives the same result as roc_curve followed by auc.
Returns:
auc
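
For intuition about the value returned, ROC AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python 3. The name auc_sketch is hypothetical and the pairwise approach is chosen for clarity; it is not how the package computes the value.

```python
def auc_sketch(labels_true, scores, pos_label=1):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for label, s in zip(labels_true, scores) if label == pos_label]
    neg = [s for label, s in zip(labels_true, scores) if label != pos_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


print(auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```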
grace_t.framework.tools.calc_metrics.get_confusion_matrix(labels_true, labels_pred, class_num, starts_from)
Args:
  • labels_true
  • labels_pred
  • class_num
  • starts_from
Returns:
(confusion_matrix_res, (tn, fp, fn, tp))
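
A minimal sketch of how such a return value could be assembled, assuming rows index the true label and columns the predicted label (the scikit-learn convention), and that starts_from is the smallest label value. Both are assumptions, as is the name confusion_matrix_sketch.

```python
def confusion_matrix_sketch(labels_true, labels_pred, class_num, starts_from=0):
    """Count (true, predicted) pairs into a class_num x class_num matrix."""
    matrix = [[0] * class_num for _ in range(class_num)]
    for t, p in zip(labels_true, labels_pred):
        # Assumption: labels are consecutive integers beginning at starts_from.
        matrix[t - starts_from][p - starts_from] += 1
    return matrix


m = confusion_matrix_sketch([0, 1, 1, 0, 1], [0, 1, 0, 1, 1], class_num=2)
# For the binary case the four cells unpack in row-major order.
(tn, fp), (fn, tp) = m
print(m, (tn, fp, fn, tp))
```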
grace_t.framework.tools.calc_metrics.get_f1(labels_true, labels_pred, average_type=None)
Args:
  • labels_true
  • labels_pred
  • average_type:
    • micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
    • macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    • weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    • samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
    • None: the scores for each class are returned.
Returns:
f1
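
The difference between the averaging modes is easiest to see on a small multiclass example. The sketch below is a plain Python 3 illustration of the None, macro, micro and weighted modes (the samples mode is omitted). f1_sketch is a hypothetical name and the code is not taken from the package.

```python
def f1_sketch(labels_true, labels_pred, average_type=None):
    """Illustrative F1 score with several averaging modes."""
    if not labels_true or len(labels_true) != len(labels_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(labels_true) | set(labels_pred))
    pairs = list(zip(labels_true, labels_pred))
    counts = []  # one (tp, fp, fn) triple per class
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts.append((tp, fp, fn))

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    per_class = [f1(*c) for c in counts]
    if average_type is None:
        return per_class
    if average_type == "macro":
        # Unweighted mean of the per-class scores.
        return sum(per_class) / len(per_class)
    if average_type == "micro":
        # Pool the counts over all classes, then compute a single score.
        tp, fp, fn = (sum(col) for col in zip(*counts))
        return f1(tp, fp, fn)
    if average_type == "weighted":
        # Mean of per-class scores weighted by support (true instances).
        support = [tp + fn for tp, fp, fn in counts]
        return sum(f * s for f, s in zip(per_class, support)) / sum(support)
    raise ValueError("unsupported average_type: %r" % (average_type,))


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
for mode in (None, "macro", "micro", "weighted"):
    print(mode, f1_sketch(y_true, y_pred, mode))
```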
grace_t.framework.tools.calc_metrics.get_pr_curve(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)
Args:
  • labels_true
  • labels_pred_prob
  • pos_label: Label considered as positive and others are considered negative.
  • class_num: if class_num == 2 and labels start from 0, roc_auc_score gives the same result as roc_curve followed by auc.
Returns:
  • precision: array, shape = [n_thresholds + 1]
    Precision values such that element i is the precision of predictions with score >= thresholds[i] and the last element is 1.
  • recall: array, shape = [n_thresholds + 1]
    Decreasing recall values such that element i is the recall of predictions with score >= thresholds[i] and the last element is 0.
  • thresholds: array, shape = [n_thresholds <= len(np.unique(labels_pred_prob))]
    Increasing thresholds on the decision function used to compute precision and recall.

grace_t.framework.tools.calc_metrics.get_precision(labels_true, labels_pred, average_type)
tp / (tp + fp)
Args:
  • labels_true
  • labels_pred
  • average_type:
    • micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
    • macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    • weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    • samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
    • None: the scores for each class are returned.
Returns:
precision
grace_t.framework.tools.calc_metrics.get_recall(labels_true, labels_pred, average_type=None)
tp / (tp + fn)
Args:
  • labels_true
  • labels_pred
  • average_type:
    • micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
    • macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    • weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    • samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
    • None: the scores for each class are returned.
Returns:
recall
grace_t.framework.tools.calc_metrics.multiply(a, b)
Args:
  • a: a list, like [a1, a2]
  • b: a list, like [b1, b2]
Returns:
a1*b1 + a2*b2 (the dot product of a and b)

grace_t.framework.tools.log module

class grace_t.framework.tools.log.SingleLevelFilter(passlevel, reject)

Bases: logging.Filter

Filters log records by a specific log level.

filter(record)

The actual filter function.
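
The documentation does not spell out how passlevel and reject interact. One plausible reading, sketched below under that assumption, is that reject=False keeps only records at exactly passlevel, while reject=True drops exactly those records and keeps the rest. SingleLevelFilterSketch is a hypothetical stand-in, not the package's class.

```python
import logging


class SingleLevelFilterSketch(logging.Filter):
    """Pass or reject records whose level equals passlevel (assumed semantics)."""

    def __init__(self, passlevel, reject):
        super().__init__()
        self.passlevel = passlevel
        self.reject = reject

    def filter(self, record):
        # Assumption: reject=True inverts the match on the single level.
        if self.reject:
            return record.levelno != self.passlevel
        return record.levelno == self.passlevel


# Usage: attach to a handler so that it only emits INFO records.
handler = logging.StreamHandler()
handler.addFilter(SingleLevelFilterSketch(logging.INFO, reject=False))
```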

grace_t.framework.tools.log.init_log(file, level=20, when='D', backup=7, format='%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s', datefmt='%m-%d %H:%M:%S')

init_log - initialize log module

Args:
  • file: Log file path prefix (referred to as log_path below).

    Log data will go to two files: log_path.log and log_path.log.wf. Any nonexistent parent directories will be created automatically.

  • level: messages at or above this level will be logged

    DEBUG < INFO < WARNING < ERROR < CRITICAL; the default value is logging.INFO (20)

  • when: how to split the log file by time interval
    • ‘S’ : Seconds
    • ‘M’ : Minutes
    • ‘H’ : Hours
    • ‘D’ : Days
    • ‘W’ : Week day

    default value: ‘D’

  • format: format of the log

    default format: %(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s

    example output: INFO: 12-09 18:02:42: log.py:40 * 139814749787872 HELLO WORLD

  • backup: how many backup files to keep

    default value: 7

Raises:
  • OSError: failed to create log directories
  • IOError: failed to open log file
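
The behaviour described above could be approximated with the standard library as follows. This is a sketch under stated assumptions, not the package's implementation: init_log_sketch and its logger_name parameter are hypothetical, and routing only WARNING and above to the .log.wf file is an assumption, since the documentation does not say what the second file contains.

```python
import logging
import logging.handlers
import os

DEFAULT_FORMAT = (
    "%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s"
)


def init_log_sketch(log_path, level=logging.INFO, when="D", backup=7,
                    fmt=DEFAULT_FORMAT, datefmt="%m-%d %H:%M:%S",
                    logger_name="init_log_sketch"):
    """Configure a logger that writes to log_path.log and log_path.log.wf."""
    formatter = logging.Formatter(fmt, datefmt)
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)

    # Create missing parent directories (may raise OSError).
    log_dir = os.path.dirname(log_path)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)

    # Main log: everything at or above `level`, rotated by time interval.
    handler = logging.handlers.TimedRotatingFileHandler(
        log_path + ".log", when=when, backupCount=backup)
    handler.setLevel(level)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # Assumption: the .wf file collects WARNING and above only.
    wf_handler = logging.handlers.TimedRotatingFileHandler(
        log_path + ".log.wf", when=when, backupCount=backup)
    wf_handler.setLevel(logging.WARNING)
    wf_handler.setFormatter(formatter)
    logger.addHandler(wf_handler)
    return logger
```

A call such as init_log_sketch("./log/my_app") would then create ./log/my_app.log and ./log/my_app.log.wf on first use.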

grace_t.framework.tools.mmr module

grace_t.framework.tools.rank_metrics module

grace_t.framework.tools.rank_metrics.average_precision(r)

Score is average precision (area under PR curve)

Relevance is binary (nonzero is relevant).

>>> r = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
>>> delta_r = 1. / sum(r)
>>> sum([sum(r[:x + 1]) / (x + 1.) * delta_r for x, y in enumerate(r) if y])
0.7833333333333333
>>> average_precision(r)
0.78333333333333333
Args:
  • r: Relevance scores (list or numpy) in rank order (first element is the first item)
Returns:
Average precision
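
The doctest above computes the score as a one-liner. The same calculation, unrolled into a loop for readability, might look like the Python 3 sketch below. average_precision_sketch is a hypothetical name, and returning 0.0 when nothing is relevant is an assumption that is consistent with the mean_average_precision example for [0].

```python
def average_precision_sketch(r):
    """Mean of precision@rank taken at each relevant position."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(r, start=1):
        if rel:  # nonzero means relevant
            hits += 1
            precisions.append(hits / rank)
    # Assumption: no relevant items yields a score of 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


print(average_precision_sketch([1, 1, 0, 1, 0, 1, 0, 0, 0, 1]))
```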
grace_t.framework.tools.rank_metrics.dcg_at_k(r, k, method=0)

Score is discounted cumulative gain (DCG).

Relevance scores are positive real values; binary relevance, as in the other metrics in this module, also works.

Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> dcg_at_k(r, 1)
3.0
>>> dcg_at_k(r, 1, method=1)
3.0
>>> dcg_at_k(r, 2)
5.0
>>> dcg_at_k(r, 2, method=1)
4.2618595071429155
>>> dcg_at_k(r, 10)
9.6051177391888114
>>> dcg_at_k(r, 11)
9.6051177391888114
Args:
  • r: Relevance scores (list or numpy) in rank order (first element is the first item)
  • k: Number of results to consider
  • method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
Returns:
Discounted cumulative gain
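
The two weight sequences listed for method correspond to logarithmic discounts: 1/log2(position) with the first position fixed at 1.0 for method 0, and 1/log2(position + 1) for method 1. The sketch below makes the weights explicit and computes DCG as a weighted sum. The function names are hypothetical and the code is an illustration, not the package's implementation.

```python
import math


def dcg_weights_sketch(n, method=0):
    """Discount weights for positions 1..n under the two documented methods."""
    if method == 0:
        return [1.0 if pos == 1 else 1.0 / math.log2(pos) for pos in range(1, n + 1)]
    if method == 1:
        return [1.0 / math.log2(pos + 1) for pos in range(1, n + 1)]
    raise ValueError("method must be 0 or 1")


def dcg_at_k_sketch(r, k, method=0):
    """Weighted sum of the first k relevance scores."""
    top = list(r)[:k]
    weights = dcg_weights_sketch(len(top), method)
    return sum(rel * w for rel, w in zip(top, weights))


print([round(w, 4) for w in dcg_weights_sketch(5, method=0)])
print([round(w, 4) for w in dcg_weights_sketch(4, method=1)])
print(dcg_at_k_sketch([3, 2, 3, 0, 0, 1, 2, 2, 3, 0], 2, method=1))
```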
grace_t.framework.tools.rank_metrics.mean_average_precision(rs)

Score is mean average precision

Relevance is binary (nonzero is relevant).

>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1]]
>>> mean_average_precision(rs)
0.78333333333333333
>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1], [0]]
>>> mean_average_precision(rs)
0.39166666666666666
Args:
  • rs: Iterator of relevance scores (list or numpy) in rank order (first element is the first item)
Returns:
Mean average precision
grace_t.framework.tools.rank_metrics.mean_reciprocal_rank(rs)

Score is reciprocal of the rank of the first relevant item

First element is ‘rank 1’. Relevance is binary (nonzero is relevant).

Example from http://en.wikipedia.org/wiki/Mean_reciprocal_rank

>>> rs = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.61111111111111105
>>> rs = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0]])
>>> mean_reciprocal_rank(rs)
0.5
>>> rs = [[0, 0, 0, 1], [1, 0, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.75
Args:
  • rs: Iterator of relevance scores (list or numpy) in rank order (first element is the first item)
Returns:
Mean reciprocal rank
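
A plain Python 3 sketch of the calculation, consistent with the doctest values above: each query contributes 1/rank of its first relevant item, or 0 if it has none, and the contributions are averaged. mean_reciprocal_rank_sketch is a hypothetical name.

```python
def mean_reciprocal_rank_sketch(rs):
    """Average of 1/rank of the first relevant item over all queries."""
    total = 0.0
    count = 0
    for r in rs:
        count += 1
        for rank, rel in enumerate(r, start=1):
            if rel:  # first nonzero entry is the first relevant item
                total += 1.0 / rank
                break
        # A query with no relevant item contributes 0.
    return total / count if count else 0.0


print(mean_reciprocal_rank_sketch([[0, 0, 0, 1], [1, 0, 0], [1, 0, 0]]))
```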
grace_t.framework.tools.rank_metrics.ndcg_at_k(r, k, method=0)

Score is normalized discounted cumulative gain (NDCG).

Relevance scores are positive real values; binary relevance, as in the other metrics in this module, also works.

Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> ndcg_at_k(r, 1)
1.0
>>> r = [2, 1, 2, 0]
>>> ndcg_at_k(r, 4)
0.9203032077642922
>>> ndcg_at_k(r, 4, method=1)
0.96519546960144276
>>> ndcg_at_k([0], 1)
0.0
>>> ndcg_at_k([1], 2)
1.0
Args:
  • r: Relevance scores (list or numpy) in rank order (first element is the first item)
  • k: Number of results to consider
  • method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
Returns:
Normalized discounted cumulative gain
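
Normalization divides the DCG of the given ranking by the DCG of the ideal ranking, that is, the same scores sorted in descending order. The sketch below reproduces the doctest values under that reading. The helper _dcg_sketch and the name ndcg_at_k_sketch are hypothetical, and returning 0.0 when the ideal DCG is zero is an assumption that matches the ndcg_at_k([0], 1) example.

```python
import math


def _dcg_sketch(scores, method=0):
    """DCG of an already truncated list of relevance scores."""
    if method not in (0, 1):
        raise ValueError("method must be 0 or 1")
    total = 0.0
    for pos, rel in enumerate(scores, start=1):
        if method == 0:
            weight = 1.0 if pos == 1 else 1.0 / math.log2(pos)
        else:
            weight = 1.0 / math.log2(pos + 1)
        total += rel * weight
    return total


def ndcg_at_k_sketch(r, k, method=0):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    scores = list(r)
    ideal = _dcg_sketch(sorted(scores, reverse=True)[:k], method)
    if not ideal:
        return 0.0  # assumption: no gain available means a score of 0.0
    return _dcg_sketch(scores[:k], method) / ideal


print(ndcg_at_k_sketch([2, 1, 2, 0], 4))
```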
grace_t.framework.tools.rank_metrics.precision_at_k(r, k)

Score is precision @ k

Relevance is binary (nonzero is relevant).

>>> r = [0, 0, 1]
>>> precision_at_k(r, 1)
0.0
>>> precision_at_k(r, 2)
0.0
>>> precision_at_k(r, 3)
0.33333333333333331
>>> precision_at_k(r, 4)
Traceback (most recent call last):
    File "<stdin>", line 1, in ?
ValueError: Relevance score length < k
Args:
  • r: Relevance scores (list or numpy) in rank order (first element is the first item)
Returns:
Precision @ k
Raises:
  • ValueError: len(r) must be >= k
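
A minimal Python 3 sketch of the same contract, including the length check. precision_at_k_sketch is a hypothetical name, and the extra guard for k < 1 is an assumption added for the example.

```python
def precision_at_k_sketch(r, k):
    """Fraction of the first k items that are relevant (nonzero)."""
    if k < 1:
        raise ValueError("k must be >= 1")  # assumption: not stated in the docs
    top = list(r)[:k]
    if len(top) != k:
        raise ValueError("Relevance score length < k")
    return sum(1 for rel in top if rel) / k


print(precision_at_k_sketch([0, 0, 1], 3))
```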
grace_t.framework.tools.rank_metrics.r_precision(r)

Score is precision after all relevant documents have been retrieved

Relevance is binary (nonzero is relevant).

>>> r = [0, 0, 1]
>>> r_precision(r)
0.33333333333333331
>>> r = [0, 1, 0]
>>> r_precision(r)
0.5
>>> r = [1, 0, 0]
>>> r_precision(r)
1.0
Args:
  • r: Relevance scores (list or numpy) in rank order (first element is the first item)
Returns:
R Precision
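
Judging from the doctest values, the score is the precision over the prefix of the ranking that ends at the last relevant item. The Python 3 sketch below implements that reading. r_precision_sketch is a hypothetical name, and returning 0.0 when nothing is relevant is an assumption.

```python
def r_precision_sketch(r):
    """Precision over the prefix ending at the last relevant item."""
    flags = [1 if rel else 0 for rel in r]
    if 1 not in flags:
        return 0.0  # assumption: no relevant items yields 0.0
    # Index of the last relevant item, found by searching the reversed list.
    last = len(flags) - 1 - flags[::-1].index(1)
    prefix = flags[: last + 1]
    return sum(prefix) / len(prefix)


print(r_precision_sketch([0, 1, 0]))
```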

Module contents