grace_t.framework.tools package¶

Submodules¶

grace_t.framework.tools.calc_metrics module¶

grace_t.framework.tools.calc_metrics.cal_pearson(x, y)[source]¶

Args:

x: a list
y: a list

Returns:

Pearson correlation coefficient between x and y
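As an illustration of the quantity returned, the following is a minimal pure-Python sketch of the Pearson correlation coefficient. The name pearson_sketch is hypothetical, and this is not necessarily how cal_pearson is implemented.

```python
import math


def pearson_sketch(x, y):
    """Pearson correlation coefficient of two equal-length lists (illustrative)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Perfectly linearly related lists give 1.0 (increasing) or -1.0 (decreasing).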

grace_t.framework.tools.calc_metrics.generate_demo_data(class_num=2, starts_from=0)[source]¶

Given class_num, generate demo data in np.array format.

grace_t.framework.tools.calc_metrics.get_accuracy(labels_true, labels_pred, normalize_type=True)[source]¶

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in labels_true.

Args:

labels_true
labels_pred
normalize_type: If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.

Returns:

accuracy
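The effect of the normalization flag can be sketched in a few lines of plain Python. The function name accuracy_sketch and the parameter name normalize are hypothetical, and the sketch assumes each label (or label set) can be compared with ==.

```python
def accuracy_sketch(labels_true, labels_pred, normalize=True):
    """Exact-match accuracy: a sample counts only if its labels match exactly."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labels_true and labels_pred must have the same length")
    if not labels_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    # normalize=True returns a fraction, normalize=False returns a raw count.
    return correct / len(labels_true) if normalize else correct
```

For example, with labels_true = [0, 1, 2, 3] and labels_pred = [0, 2, 1, 3], two of four samples match, so the sketch returns 0.5, or 2 when normalize is False.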

grace_t.framework.tools.calc_metrics.get_auc(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)[source]¶

Args:

labels_true
labels_pred_prob
pos_label: Label considered as positive; all other labels are considered negative.
class_num: If class_num == 2 and labels start from 0, roc_auc_score gives the same result as computing roc_curve and then auc.

Returns:

auc
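To make the returned value concrete, here is a small sketch of what ROC AUC measures in the binary case: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The name auc_sketch is hypothetical, scores is assumed to be a flat list of positive-class scores, and get_auc itself may compute the value differently.

```python
def auc_sketch(labels_true, scores, pos_label=1):
    """Binary ROC AUC by pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(labels_true, scores) if t == pos_label]
    neg = [s for t, s in zip(labels_true, scores) if t != pos_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

With labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive/negative pairs are ordered correctly, so the result is 0.75.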

grace_t.framework.tools.calc_metrics.get_confusion_matrix(labels_true, labels_pred, class_num, starts_from)[source]¶

Args:

labels_true
labels_pred
class_num
starts_from

Returns:

(confusion_matrix_res, (tn, fp, fn, tp))
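The shape of the return value can be illustrated with a plain-Python sketch. It assumes rows index the true label and columns the predicted label (the scikit-learn convention), and that starts_from is the smallest label value; both are assumptions about get_confusion_matrix, and confusion_matrix_sketch is a hypothetical name.

```python
def confusion_matrix_sketch(labels_true, labels_pred, class_num, starts_from=0):
    """Count (true, predicted) pairs into a class_num x class_num matrix."""
    matrix = [[0] * class_num for _ in range(class_num)]
    for t, p in zip(labels_true, labels_pred):
        # Shift labels so that the smallest label maps to index 0.
        matrix[t - starts_from][p - starts_from] += 1
    return matrix


labels_true = [0, 1, 0, 1, 1]
labels_pred = [0, 1, 1, 1, 0]
matrix = confusion_matrix_sketch(labels_true, labels_pred, class_num=2)
# In the binary case the four cells unpack as tn, fp, fn, tp.
(tn, fp), (fn, tp) = matrix
```

Here matrix is [[1, 1], [1, 2]], giving tn = 1, fp = 1, fn = 1 and tp = 2.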

grace_t.framework.tools.calc_metrics.get_f1(labels_true, labels_pred, average_type=None)[source]¶

Args:

labels_true
labels_pred
average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.

Returns:

f1

grace_t.framework.tools.calc_metrics.get_pr_curve(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)[source]¶

Args:

labels_true
labels_pred_prob
pos_label: Label considered as positive; all other labels are considered negative.
class_num: If class_num == 2 and labels start from 0, roc_auc_score gives the same result as computing roc_curve and then auc.

Returns:

precision: array, shape = [n_thresholds + 1]. Precision values such that element i is the precision of predictions with score >= thresholds[i] and the last element is 1.
recall: array, shape = [n_thresholds + 1]. Decreasing recall values such that element i is the recall of predictions with score >= thresholds[i] and the last element is 0.
thresholds: array, shape = [n_thresholds <= len(np.unique(probas_pred))]. Increasing thresholds on the decision function used to compute precision and recall.

grace_t.framework.tools.calc_metrics.get_precision(labels_true, labels_pred, average_type)[source]¶

tp / (tp + fp)

Args:

labels_true
labels_pred
average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.

Returns:

precision

grace_t.framework.tools.calc_metrics.get_recall(labels_true, labels_pred, average_type=None)[source]¶

tp / (tp + fn)

Args:

labels_true
labels_pred
average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.

Returns:

recall
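The formulas tp / (tp + fp) and tp / (tp + fn), together with the None, micro and macro averaging modes, can be sketched as follows. This is an illustrative pure-Python version under the assumption that labels are hashable and sortable values; precision_recall_sketch is a hypothetical name, the weighted and samples modes are not covered, and a zero denominator is mapped to 0.0 by choice.

```python
def precision_recall_sketch(labels_true, labels_pred, average_type=None):
    """Return (precision, recall) for average_type None, 'micro' or 'macro'."""
    labels = sorted(set(labels_true) | set(labels_pred))
    pairs = list(zip(labels_true, labels_pred))
    counts = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts.append((tp, fp, fn))

    def ratio(num, den):
        # Sketch convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    if average_type == "micro":
        # Pool the counts over all labels, then apply the formulas once.
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        return ratio(tp, tp + fp), ratio(tp, tp + fn)

    precisions = [ratio(tp, tp + fp) for tp, fp, fn in counts]
    recalls = [ratio(tp, tp + fn) for tp, fp, fn in counts]
    if average_type is None:
        return precisions, recalls  # one score per label
    if average_type == "macro":
        # Unweighted mean of the per-label scores.
        return sum(precisions) / len(labels), sum(recalls) / len(labels)
    raise ValueError("this sketch only covers None, 'micro' and 'macro'")
```

With labels_true = [0, 1, 2, 0, 1, 2] and labels_pred = [0, 2, 1, 0, 0, 1], only label 0 is ever predicted correctly, so macro precision is 2/9 while micro precision is 1/3.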

grace_t.framework.tools.calc_metrics.multiply(a, b)[source]¶

Args:

a: a list, like [a1,a2]
b: a list, like [b1,b2]

Returns:

a1*b1 + a2*b2 (the dot product of a and b)

grace_t.framework.tools.log module¶

class grace_t.framework.tools.log.SingleLevelFilter(passlevel, reject)[source]¶

Bases: logging.Filter

Filter log records by a specific log level.

filter(record)[source]¶

The actual filter function.
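A filter of this kind is commonly written as below. This is a sketch of the usual pattern, not the verified source of SingleLevelFilter: the class name SingleLevelFilterSketch is hypothetical, and the meaning given to reject (drop records at passlevel instead of keeping only them) is an assumption.

```python
import logging


class SingleLevelFilterSketch(logging.Filter):
    """Keep, or reject, records whose level equals passlevel (illustrative)."""

    def __init__(self, passlevel, reject):
        super().__init__()
        self.passlevel = passlevel
        self.reject = reject

    def filter(self, record):
        if self.reject:
            # Assumed meaning: let everything through except passlevel.
            return record.levelno != self.passlevel
        # Otherwise let through only records at exactly passlevel.
        return record.levelno == self.passlevel
```

Such a filter would typically be attached to a handler with handler.addFilter(...), for example to send only INFO records to one stream.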

grace_t.framework.tools.log.init_log(file, level=20, when='D', backup=7, format='%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s', datefmt='%m-%d %H:%M:%S')[source]¶

init_log - initialize log module

Args:

log_path: Log file path prefix. Log data will go to two files: log_path.log and log_path.log.wf. Any non-existent parent directories will be created automatically.
level: Messages above this level will be displayed. DEBUG < INFO < WARNING < ERROR < CRITICAL. The default value is logging.INFO.
when: How to split the log file by time interval.
- ‘S’ : Seconds
- ‘M’ : Minutes
- ‘H’ : Hours
- ‘D’ : Days
- ‘W’ : Week day
Default value: ‘D’
format: Format of the log.
Default format: %(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s
Example output: INFO: 12-09 18:02:42: log.py:40 * 139814749787872 HELLO WORLD
backup: How many backup files to keep. Default value: 7

Raises:

OSError: fail to create log directories
IOError: fail to open log file
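The behaviour described above (two time-rotated files, automatic directory creation) can be approximated with the standard library alone. The sketch below is not the source of init_log: the name init_log_sketch, the use of a named logger instead of the root logger, and the choice to send WARNING and above to the .log.wf file are all assumptions made for illustration.

```python
import logging
import logging.handlers
import os

DEFAULT_FORMAT = ("%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d"
                  " * %(thread)d %(message)s")


def init_log_sketch(log_path, level=logging.INFO, when="D", backup=7,
                    fmt=DEFAULT_FORMAT, datefmt="%m-%d %H:%M:%S"):
    """Create a logger writing to log_path.log and log_path.log.wf."""
    log_dir = os.path.dirname(log_path)
    if log_dir:
        # Create missing parent directories.
        os.makedirs(log_dir, exist_ok=True)

    formatter = logging.Formatter(fmt, datefmt)
    logger = logging.getLogger(log_path)  # assumption: one logger per prefix
    logger.setLevel(level)

    # Main file: everything at or above `level`, rotated every `when`.
    handler = logging.handlers.TimedRotatingFileHandler(
        log_path + ".log", when=when, backupCount=backup)
    handler.setLevel(level)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # Second file: assumed to hold WARNING and above only.
    wf_handler = logging.handlers.TimedRotatingFileHandler(
        log_path + ".log.wf", when=when, backupCount=backup)
    wf_handler.setLevel(logging.WARNING)
    wf_handler.setFormatter(formatter)
    logger.addHandler(wf_handler)
    return logger
```

Calling init_log_sketch("./log/demo") and then logging an INFO and a WARNING message would write both to ./log/demo.log and only the warning to ./log/demo.log.wf.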

grace_t.framework.tools.mmr module¶

grace_t.framework.tools.rank_metrics module¶

grace_t.framework.tools.rank_metrics.average_precision(r)[source]¶

Score is average precision (area under PR curve)

Relevance is binary (nonzero is relevant).

>>> r = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
>>> delta_r = 1. / sum(r)
>>> sum([sum(r[:x + 1]) / (x + 1.) * delta_r for x, y in enumerate(r) if y])
0.7833333333333333
>>> average_precision(r)
0.78333333333333333

Args:

r: Relevance scores (list or numpy) in rank order: (first element is the first item)

Returns:

Average precision
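The computation in the doctest above can also be written as an explicit loop, which some readers may find easier to follow. This is a pure-Python sketch with the hypothetical name average_precision_sketch; returning 0.0 when nothing is relevant is an assumption chosen to agree with the mean_average_precision example on this page.

```python
def average_precision_sketch(r):
    """Mean of precision@rank taken at each relevant position (illustrative)."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(r, start=1):
        if rel:  # nonzero means relevant
            hits += 1
            precisions.append(hits / rank)
    # No relevant item at all: report 0.0 instead of dividing by zero.
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For r = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1] the precisions at the relevant ranks are 1/1, 2/2, 3/4, 4/6 and 5/10, whose mean is approximately 0.7833.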

grace_t.framework.tools.rank_metrics.dcg_at_k(r, k, method=0)[source]¶

Score is discounted cumulative gain (dcg)

Relevance is positive real values. Can use binary as the previous methods.

Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> dcg_at_k(r, 1)
3.0
>>> dcg_at_k(r, 1, method=1)
3.0
>>> dcg_at_k(r, 2)
5.0
>>> dcg_at_k(r, 2, method=1)
4.2618595071429155
>>> dcg_at_k(r, 10)
9.6051177391888114
>>> dcg_at_k(r, 11)
9.6051177391888114

Args:

r: Relevance scores (list or numpy) in rank order: (first element is the first item)

k: Number of results to consider

method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]

Returns:

Discounted cumulative gain
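The two weighting schemes correspond to 1/log2(i) from the second position onward (method 0, with the first item undiscounted) and 1/log2(i + 1) at every position i (method 1). A pure-Python sketch, under the hypothetical name dcg_at_k_sketch, makes this explicit:

```python
import math


def dcg_at_k_sketch(r, k, method=0):
    """Discounted cumulative gain over the first k relevance scores."""
    top = [float(x) for x in r[:k]]
    if not top:
        return 0.0
    if method == 0:
        # Weights 1, 1/log2(2), 1/log2(3), ...: first two items count fully.
        return top[0] + sum(rel / math.log2(i)
                            for i, rel in enumerate(top[1:], start=2))
    if method == 1:
        # Weights 1/log2(2), 1/log2(3), ...: discount starts at position 2.
        return sum(rel / math.log2(i + 1)
                   for i, rel in enumerate(top, start=1))
    raise ValueError("method must be 0 or 1")
```

For r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0] and k = 2, method 0 gives 3 + 2 = 5.0, while method 1 gives 3 + 2/log2(3), approximately 4.2619.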

grace_t.framework.tools.rank_metrics.mean_average_precision(rs)[source]¶

Score is mean average precision

Relevance is binary (nonzero is relevant).

>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1]]
>>> mean_average_precision(rs)
0.78333333333333333
>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1], [0]]
>>> mean_average_precision(rs)
0.39166666666666666

Args:

rs: Iterator of relevance scores (list or numpy) in rank order: (first element is the first item)

Returns:

Mean average precision
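Mean average precision is the plain mean of the per-query average precision values. The sketch below is self-contained pure Python with hypothetical names; it assumes a query with no relevant item contributes 0.0, which matches the second doctest above.

```python
def _average_precision(r):
    """Average precision of one ranked list (0.0 if nothing is relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(r, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision_sketch(rs):
    """Mean of the per-query average precision values (illustrative)."""
    scores = [_average_precision(r) for r in rs]
    if not scores:
        raise ValueError("rs must contain at least one ranked list")
    return sum(scores) / len(scores)
```

Adding the query [0] to the single-query example halves the score, because that query contributes an average precision of 0.0.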

grace_t.framework.tools.rank_metrics.mean_reciprocal_rank(rs)[source]¶

Score is reciprocal of the rank of the first relevant item

First element is ‘rank 1’. Relevance is binary (nonzero is relevant).

Example from http://en.wikipedia.org/wiki/Mean_reciprocal_rank

>>> rs = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.61111111111111105
>>> rs = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0]])
>>> mean_reciprocal_rank(rs)
0.5
>>> rs = [[0, 0, 0, 1], [1, 0, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.75

Args:

rs: Iterator of relevance scores (list or numpy) in rank order: (first element is the first item)

Returns:

Mean reciprocal rank
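A pure-Python sketch of the same computation follows. The name mean_reciprocal_rank_sketch is hypothetical, and scoring a query with no relevant item as 0.0 is an assumption that agrees with the second doctest above.

```python
def mean_reciprocal_rank_sketch(rs):
    """Mean over queries of 1 / (rank of the first relevant item)."""
    reciprocal_ranks = []
    for r in rs:
        rr = 0.0  # stays 0.0 when the query has no relevant item
        for rank, rel in enumerate(r, start=1):
            if rel:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    if not reciprocal_ranks:
        raise ValueError("rs must contain at least one ranked list")
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```

For rs = [[0, 0, 1], [0, 1, 0], [1, 0, 0]] the reciprocal ranks are 1/3, 1/2 and 1, whose mean is approximately 0.6111.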

grace_t.framework.tools.rank_metrics.ndcg_at_k(r, k, method=0)[source]¶

Score is normalized discounted cumulative gain (ndcg)

Relevance is positive real values. Can use binary as the previous methods.

Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> ndcg_at_k(r, 1)
1.0
>>> r = [2, 1, 2, 0]
>>> ndcg_at_k(r, 4)
0.9203032077642922
>>> ndcg_at_k(r, 4, method=1)
0.96519546960144276
>>> ndcg_at_k([0], 1)
0.0
>>> ndcg_at_k([1], 2)
1.0

Args:

r: Relevance scores (list or numpy) in rank order: (first element is the first item)

k: Number of results to consider

method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]

Returns:

Normalized discounted cumulative gain
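Normalization divides the DCG of the given ranking by the DCG of the ideal ranking, that is, the same scores sorted in descending order. The sketch below is self-contained pure Python with hypothetical names; returning 0.0 when the ideal DCG is zero is an assumption consistent with the ndcg_at_k([0], 1) doctest.

```python
import math


def _dcg(r, k, method=0):
    """DCG helper using the two weighting schemes described above."""
    top = [float(x) for x in r[:k]]
    if not top:
        return 0.0
    if method == 0:
        return top[0] + sum(rel / math.log2(i)
                            for i, rel in enumerate(top[1:], start=2))
    if method == 1:
        return sum(rel / math.log2(i + 1)
                   for i, rel in enumerate(top, start=1))
    raise ValueError("method must be 0 or 1")


def ndcg_at_k_sketch(r, k, method=0):
    """DCG of r divided by the DCG of r sorted in descending order."""
    ideal = _dcg(sorted(r, reverse=True), k, method)
    if not ideal:
        return 0.0  # no relevant item: avoid dividing by zero
    return _dcg(r, k, method) / ideal
```

For r = [2, 1, 2, 0] the ideal ordering is [2, 2, 1, 0], so the score is below 1.0 because the second and third items are swapped relative to the ideal.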

grace_t.framework.tools.rank_metrics.precision_at_k(r, k)[source]¶

Score is precision @ k

Relevance is binary (nonzero is relevant).

>>> r = [0, 0, 1]
>>> precision_at_k(r, 1)
0.0
>>> precision_at_k(r, 2)
0.0
>>> precision_at_k(r, 3)
0.33333333333333331
>>> precision_at_k(r, 4)
Traceback (most recent call last):
    File "<stdin>", line 1, in ?
ValueError: Relevance score length < k

Args:

r: Relevance scores (list or numpy) in rank order: (first element is the first item)

Returns:

Precision @ k

Raises:

ValueError: len(r) must be >= k
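The length check and the score itself can be sketched in pure Python as follows. The name precision_at_k_sketch is hypothetical, and the extra check that k is at least 1 is an addition of this sketch.

```python
def precision_at_k_sketch(r, k):
    """Fraction of the first k results that are relevant (nonzero)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    top = list(r[:k])
    if len(top) != k:
        # Fewer than k results were supplied.
        raise ValueError("Relevance score length < k")
    return sum(1 for rel in top if rel) / k
```

For r = [0, 0, 1], the first relevant item appears at rank 3, so the score is 0.0 for k = 1 and k = 2 and 1/3 for k = 3.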

grace_t.framework.tools.rank_metrics.r_precision(r)[source]¶

Score is precision after all relevant documents have been retrieved

Relevance is binary (nonzero is relevant).

>>> r = [0, 0, 1]
>>> r_precision(r)
0.33333333333333331
>>> r = [0, 1, 0]
>>> r_precision(r)
0.5
>>> r = [1, 0, 0]
>>> r_precision(r)
1.0

Args:

r: Relevance scores (list or numpy) in rank order: (first element is the first item)

Returns:

R Precision
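Read against the doctests above, the score is the precision of the prefix that ends at the last relevant item. A pure-Python sketch under that reading (the name r_precision_sketch is hypothetical, and returning 0.0 when nothing is relevant is an assumption):

```python
def r_precision_sketch(r):
    """Precision over the prefix ending at the last relevant item."""
    relevant = [bool(x) for x in r]
    if not any(relevant):
        return 0.0
    # Index of the last relevant item; the prefix includes it.
    last = max(i for i, rel in enumerate(relevant) if rel)
    prefix = relevant[:last + 1]
    return sum(prefix) / len(prefix)
```

For r = [0, 1, 0] the prefix is [0, 1], so one of two retrieved items is relevant and the score is 0.5.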
