grace_t.framework.tools package
Submodules
grace_t.framework.tools.calc_metrics module
-
grace_t.framework.tools.calc_metrics.cal_pearson(x, y)[source]
- Args:
- x: a list
- y: a list
- Returns:
- Pearson correlation coefficient between x and y
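As an illustration of the quantity returned, here is a minimal pure-Python sketch of the Pearson correlation coefficient. The name `pearson_sketch` is hypothetical and this is not the implementation of `cal_pearson`, whose source is not shown here.

```python
import math


def pearson_sketch(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_sketch([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, increasing
r_down = pearson_sketch([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly linear, decreasing
```

A value near 1 indicates a strong increasing linear relationship, a value near -1 a strong decreasing one.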
-
grace_t.framework.tools.calc_metrics.generate_demo_data(class_num=2, starts_from=0)[source]
Given class_num, generate demo data as a NumPy array.
-
grace_t.framework.tools.calc_metrics.get_accuracy(labels_true, labels_pred, normalize_type=True)[source]
In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in labels_true.
- Args:
- labels_true
- labels_pred
- normalize_type: if False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.
- Returns:
- accuracy
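Subset accuracy can be sketched in a few lines of plain Python. The function name `subset_accuracy_sketch` and the indicator-row input layout are assumptions made for illustration; `get_accuracy` itself may accept other formats.

```python
def subset_accuracy_sketch(labels_true, labels_pred, normalize=True):
    """Count samples whose predicted label row matches the true row exactly."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("inputs must have the same number of samples")
    matches = sum(1 for t, p in zip(labels_true, labels_pred) if list(t) == list(p))
    # normalize=True gives a fraction, normalize=False gives a raw count.
    return matches / len(labels_true) if normalize else matches


# Three samples, three possible labels each (1 = label present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # the second sample has one extra label

fraction = subset_accuracy_sketch(y_true, y_pred)
count = subset_accuracy_sketch(y_true, y_pred, normalize=False)
```

The second sample counts as wrong even though two of its three labels are right, which is what makes subset accuracy a strict metric.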
-
grace_t.framework.tools.calc_metrics.get_auc(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)[source]
- Args:
- labels_true
- labels_pred_prob
- pos_label: Label considered positive; all other labels are considered negative.
- class_num: if class_num == 2 and labels start from 0, roc_auc_score gives the same result as roc_curve followed by auc.
- Returns:
- auc
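For intuition about what the AUC value means, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. This pairwise formulation is equivalent to the area under the ROC curve; the name `auc_by_pairs` is hypothetical and this is not how `get_auc` is implemented.

```python
def auc_by_pairs(labels_true, scores, pos_label=1):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for label, s in zip(labels_true, scores) if label == pos_label]
    neg = [s for label, s in zip(labels_true, scores) if label != pos_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


auc_value = auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The double loop is quadratic in the number of samples, so it is only suitable for small inputs.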
-
grace_t.framework.tools.calc_metrics.get_confusion_matrix(labels_true, labels_pred, class_num, starts_from)[source]
- Args:
- labels_true
- labels_pred
- class_num
- starts_from
- Returns:
- (confusion_matrix_res, (tn, fp, fn, tp))
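The shape of the return value can be illustrated with a small sketch. It assumes rows index the true class and columns the predicted class (the scikit-learn convention), and that the (tn, fp, fn, tp) tuple is only meaningful for two classes; both are assumptions, and `confusion_matrix_sketch` is a hypothetical name.

```python
def confusion_matrix_sketch(labels_true, labels_pred, class_num, starts_from=0):
    """Build a class_num x class_num matrix: rows = true, columns = predicted."""
    matrix = [[0] * class_num for _ in range(class_num)]
    for t, p in zip(labels_true, labels_pred):
        # Shift labels so that the smallest label maps to index 0.
        matrix[t - starts_from][p - starts_from] += 1
    return matrix


cm = confusion_matrix_sketch([0, 1, 0, 1, 1], [0, 1, 1, 1, 0], class_num=2)
# For the binary case, unpack the four cells row by row.
tn, fp = cm[0]
fn, tp = cm[1]
```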
-
grace_t.framework.tools.calc_metrics.get_f1(labels_true, labels_pred, average_type=None)[source]
- Args:
- labels_true
- labels_pred
- average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.
- Returns:
- f1
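The averaging modes listed above differ only in how per-class counts are combined. The following sketch covers None, micro, macro, and weighted for single-label data; the samples mode is omitted because it applies to multilabel input. The helper names are hypothetical and this is not the code behind `get_f1`.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn), defined as 0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_sketch(labels_true, labels_pred, average=None):
    classes = sorted(set(labels_true) | set(labels_pred))
    pairs = list(zip(labels_true, labels_pred))
    counts = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts.append((tp, fp, fn))
    per_class = [f1_from_counts(*c) for c in counts]
    if average is None:
        return per_class                      # one score per class
    if average == "macro":
        return sum(per_class) / len(per_class)  # unweighted mean over classes
    if average == "micro":
        # Pool the counts over all classes, then compute a single F1.
        return f1_from_counts(*(sum(col) for col in zip(*counts)))
    if average == "weighted":
        support = [tp + fn for tp, _, fn in counts]  # true instances per class
        return sum(f * s for f, s in zip(per_class, support)) / sum(support)
    raise ValueError("unsupported average: %r" % (average,))


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
per_class = f1_sketch(y_true, y_pred)
macro = f1_sketch(y_true, y_pred, "macro")
micro = f1_sketch(y_true, y_pred, "micro")
weighted = f1_sketch(y_true, y_pred, "weighted")
```

Precision and recall follow the same pattern with tp / (tp + fp) and tp / (tp + fn) in place of the F1 formula.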
-
grace_t.framework.tools.calc_metrics.get_pr_curve(labels_true, labels_pred_prob, pos_label, class_num, starts_from=0)[source]
- Args:
- labels_true
- labels_pred_prob
- pos_label: Label considered positive; all other labels are considered negative.
- class_num: if class_num == 2 and labels start from 0, roc_auc_score gives the same result as roc_curve followed by auc.
- Returns:
- precision: array, shape = [n_thresholds + 1]. Precision values such that element i is the precision of predictions with score >= thresholds[i]; the last element is 1.
- recall: array, shape = [n_thresholds + 1]. Decreasing recall values such that element i is the recall of predictions with score >= thresholds[i]; the last element is 0.
- thresholds: array, shape = [n_thresholds <= len(np.unique(probas_pred))]. Increasing thresholds on the decision function used to compute precision and recall.
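The relationship between the three returned arrays can be sketched as follows. This simplified version uses every distinct score as a threshold, so it may return more points than a library implementation that stops once full recall is reached; `pr_curve_sketch` is a hypothetical name.

```python
def pr_curve_sketch(labels_true, scores, pos_label=1):
    """Precision and recall at each distinct score, plus the final (1, 0) point."""
    n_pos = sum(1 for label in labels_true if label == pos_label)
    if n_pos == 0:
        raise ValueError("recall is undefined without positive samples")
    thresholds = sorted(set(scores))  # increasing thresholds
    precision, recall = [], []
    for th in thresholds:
        # Labels of all samples predicted positive at this threshold.
        selected = [label for label, s in zip(labels_true, scores) if s >= th]
        tp = sum(1 for label in selected if label == pos_label)
        precision.append(tp / len(selected))
        recall.append(tp / n_pos)
    # Conventional end point: precision 1 and recall 0.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds


precision, recall, thresholds = pr_curve_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```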
-
grace_t.framework.tools.calc_metrics.get_precision(labels_true, labels_pred, average_type)[source]
- tp / (tp + fp)
- Args:
- labels_true
- labels_pred
- average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.
- Returns:
- precision
-
grace_t.framework.tools.calc_metrics.get_recall(labels_true, labels_pred, average_type=None)[source]
- tp / (tp + fn)
- Args:
- labels_true
- labels_pred
- average_type:
- micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- weighted: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
- None: the scores for each class are returned.
- Returns:
- recall
grace_t.framework.tools.log module
-
class grace_t.framework.tools.log.SingleLevelFilter(passlevel, reject)[source]
Filter for a specific log level.
-
grace_t.framework.tools.log.init_log(file, level=20, when='D', backup=7, format='%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s', datefmt='%m-%d %H:%M:%S')[source]
init_log - initialize the log module
- Args:
- file: Log file path prefix.
Log data goes to two files: <file>.log and <file>.log.wf. Any nonexistent parent directories are created automatically.
- level: messages at or above this level are logged.
DEBUG < INFO < WARNING < ERROR < CRITICAL; the default value is logging.INFO (20).
- when: time interval used to rotate the log file
- ‘S’ : Seconds
- ‘M’ : Minutes
- ‘H’ : Hours
- ‘D’ : Days
- ‘W’ : Week day
default value: ‘D’
- format: format of the log
default format: %(levelname)s: %(asctime)s: %(filename)s:%(lineno)d * %(thread)d %(message)s
example output: INFO: 12-09 18:02:42: log.py:40 * 139814749787872 HELLO WORLD
- backup: how many backup files to keep
default value: 7
- Raises:
- OSError: fail to create log directories
- IOError: fail to open log file
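A setup with the behavior described above could be sketched with the standard library as shown below. This is an illustration only, not the source of `init_log`: the function name `init_log_sketch`, the use of `TimedRotatingFileHandler`, and the choice to send WARNING and above to the `.log.wf` file are all assumptions.

```python
import logging
import logging.handlers
import os
import tempfile

DEFAULT_FORMAT = ("%(levelname)s: %(asctime)s: %(filename)s:%(lineno)d"
                  " * %(thread)d %(message)s")


def init_log_sketch(path_prefix, level=logging.INFO, when="D", backup=7,
                    fmt=DEFAULT_FORMAT, datefmt="%m-%d %H:%M:%S",
                    logger_name="init_log_sketch"):
    """Attach two time-rotated file handlers to a named logger."""
    directory = os.path.dirname(path_prefix)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create missing parent directories
    formatter = logging.Formatter(fmt, datefmt)
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    # Assumption: <prefix>.log gets everything at or above `level`,
    # while <prefix>.log.wf only gets WARNING and above.
    for suffix, handler_level in ((".log", level), (".log.wf", logging.WARNING)):
        handler = logging.handlers.TimedRotatingFileHandler(
            path_prefix + suffix, when=when, backupCount=backup)
        handler.setLevel(handler_level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Usage: write into a temporary directory so the example leaves no stray files.
prefix = os.path.join(tempfile.mkdtemp(), "logs", "app")
log = init_log_sketch(prefix)
log.info("service started")
log.warning("disk almost full")
for handler in log.handlers:
    handler.flush()
```

Calling such a function twice with the same logger name would attach duplicate handlers, so it is meant to run once at program start.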
grace_t.framework.tools.mmr module
grace_t.framework.tools.rank_metrics module
-
grace_t.framework.tools.rank_metrics.average_precision(r)[source]
Score is average precision (area under PR curve)
Relevance is binary (nonzero is relevant).
>>> r = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
>>> delta_r = 1. / sum(r)
>>> sum([sum(r[:x + 1]) / (x + 1.) * delta_r for x, y in enumerate(r) if y])
0.7833333333333333
>>> average_precision(r)
0.78333333333333333
- Args:
- r: Relevance scores (list or numpy) in rank order (first element is the first item)
- Returns:
- Average precision
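Average precision is the mean of the precision values measured at each rank where a relevant item appears. A minimal sketch under that definition, using the hypothetical name `average_precision_sketch`:

```python
def average_precision_sketch(r):
    """Mean of precision@rank over the ranks that hold a relevant item."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(r, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    # Assumption: a list with no relevant item scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


ap = average_precision_sketch([1, 1, 0, 1, 0, 1, 0, 0, 0, 1])
```

For the list above the relevant ranks are 1, 2, 4, 6, and 10, giving precisions 1, 1, 3/4, 4/6, and 5/10.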
-
grace_t.framework.tools.rank_metrics.dcg_at_k(r, k, method=0)[source]
Score is discounted cumulative gain (dcg)
Relevance scores are positive real values. Binary relevance also works, as in the previous methods.
Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> dcg_at_k(r, 1)
3.0
>>> dcg_at_k(r, 1, method=1)
3.0
>>> dcg_at_k(r, 2)
5.0
>>> dcg_at_k(r, 2, method=1)
4.2618595071429155
>>> dcg_at_k(r, 10)
9.6051177391888114
>>> dcg_at_k(r, 11)
9.6051177391888114
- Args:
- r: Relevance scores (list or numpy) in rank order (first element is the first item)
- k: Number of results to consider
- method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
- Returns:
- Discounted cumulative gain
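The two weight sequences correspond to two common discount formulas: method 0 leaves the first item undiscounted and divides the item at rank i >= 2 by log2(i), while method 1 divides the item at rank i by log2(i + 1). The sketch below reproduces the documented weights under that reading; `dcg_at_k_sketch` is a hypothetical name, not the library function.

```python
import math


def dcg_at_k_sketch(r, k, method=0):
    """Discounted cumulative gain over the first k relevance scores."""
    top = [float(x) for x in r[:k]]
    if not top:
        return 0.0
    if method == 0:
        # Weights 1, 1, 1/log2(3), 1/log2(4), ...
        return top[0] + sum(rel / math.log2(rank)
                            for rank, rel in enumerate(top[1:], start=2))
    if method == 1:
        # Weights 1, 1/log2(3), 1/log2(4), ...
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(top, start=1))
    raise ValueError("method must be 0 or 1")


r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
dcg_2 = dcg_at_k_sketch(r, 2)
dcg_2_m1 = dcg_at_k_sketch(r, 2, method=1)
dcg_10 = dcg_at_k_sketch(r, 10)
```

Because slicing stops at the end of the list, asking for k larger than len(r) gives the same result as k = len(r).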
-
grace_t.framework.tools.rank_metrics.mean_average_precision(rs)[source]
Score is mean average precision
Relevance is binary (nonzero is relevant).
>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1]]
>>> mean_average_precision(rs)
0.78333333333333333
>>> rs = [[1, 1, 0, 1, 0, 1, 0, 0, 0, 1], [0]]
>>> mean_average_precision(rs)
0.39166666666666666
- Args:
- rs: Iterator of relevance scores (list or numpy) in rank order (first element is the first item)
- Returns:
- Mean average precision
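Mean average precision averages the per-query average precision over all queries, so a query with no relevant result pulls the mean down. A compact sketch with hypothetical helper names:

```python
def _ap(r):
    """Average precision of one ranked list (0 when nothing is relevant)."""
    precisions = [sum(1 for x in r[:i] if x) / i
                  for i in range(1, len(r) + 1) if r[i - 1]]
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision_sketch(rs):
    scores = [_ap(r) for r in rs]
    return sum(scores) / len(scores)


map_one = mean_average_precision_sketch([[1, 1, 0, 1, 0, 1, 0, 0, 0, 1]])
map_two = mean_average_precision_sketch([[1, 1, 0, 1, 0, 1, 0, 0, 0, 1], [0]])
```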
-
grace_t.framework.tools.rank_metrics.mean_reciprocal_rank(rs)[source]
Score is reciprocal of the rank of the first relevant item
First element is ‘rank 1’. Relevance is binary (nonzero is relevant).
Example from http://en.wikipedia.org/wiki/Mean_reciprocal_rank
>>> rs = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.61111111111111105
>>> rs = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0]])
>>> mean_reciprocal_rank(rs)
0.5
>>> rs = [[0, 0, 0, 1], [1, 0, 0], [1, 0, 0]]
>>> mean_reciprocal_rank(rs)
0.75
- Args:
- rs: Iterator of relevance scores (list or numpy) in rank order (first element is the first item)
- Returns:
- Mean reciprocal rank
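The computation can be sketched directly from the description: take 1 / rank of the first relevant item in each list, use 0 when a list has no relevant item, and average. The name `mean_reciprocal_rank_sketch` is hypothetical.

```python
def mean_reciprocal_rank_sketch(rs):
    """Average of 1/rank of the first relevant item in each ranked list."""
    reciprocal_ranks = []
    for r in rs:
        # Rank (1-based) of the first nonzero entry, or None if there is none.
        first = next((rank for rank, rel in enumerate(r, start=1) if rel), None)
        reciprocal_ranks.append(1.0 / first if first else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


mrr_a = mean_reciprocal_rank_sketch([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
mrr_b = mean_reciprocal_rank_sketch([[0, 0, 0], [0, 1, 0], [1, 0, 0]])
mrr_c = mean_reciprocal_rank_sketch([[0, 0, 0, 1], [1, 0, 0], [1, 0, 0]])
```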
-
grace_t.framework.tools.rank_metrics.ndcg_at_k(r, k, method=0)[source]
Score is normalized discounted cumulative gain (ndcg)
Relevance scores are positive real values. Binary relevance also works, as in the previous methods.
Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
>>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
>>> ndcg_at_k(r, 1)
1.0
>>> r = [2, 1, 2, 0]
>>> ndcg_at_k(r, 4)
0.9203032077642922
>>> ndcg_at_k(r, 4, method=1)
0.96519546960144276
>>> ndcg_at_k([0], 1)
0.0
>>> ndcg_at_k([1], 2)
1.0
- Args:
- r: Relevance scores (list or numpy) in rank order (first element is the first item)
- k: Number of results to consider
- method: If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …]. If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
- Returns:
- Normalized discounted cumulative gain
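Normalization divides the DCG of the given ranking by the DCG of the ideal ranking, meaning the same scores sorted in decreasing order. The sketch below assumes the two discount formulas implied by the documented weights and returns 0 when the ideal DCG is 0; the function names are hypothetical.

```python
import math


def _dcg(r, k, method):
    top = [float(x) for x in r[:k]]
    if not top:
        return 0.0
    if method == 0:
        # First item undiscounted, item at rank i >= 2 divided by log2(i).
        return top[0] + sum(rel / math.log2(rank)
                            for rank, rel in enumerate(top[1:], start=2))
    # method 1: item at rank i divided by log2(i + 1).
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(top, start=1))


def ndcg_at_k_sketch(r, k, method=0):
    """DCG of the ranking divided by the DCG of the best possible ordering."""
    ideal = _dcg(sorted(r, reverse=True), k, method)
    return _dcg(r, k, method) / ideal if ideal else 0.0


ndcg_m0 = ndcg_at_k_sketch([2, 1, 2, 0], 4)
ndcg_m1 = ndcg_at_k_sketch([2, 1, 2, 0], 4, method=1)
```

A score of 1.0 means the ranking is already in ideal order for the first k positions.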
-
grace_t.framework.tools.rank_metrics.precision_at_k(r, k)[source]
Score is precision @ k
Relevance is binary (nonzero is relevant).
>>> r = [0, 0, 1]
>>> precision_at_k(r, 1)
0.0
>>> precision_at_k(r, 2)
0.0
>>> precision_at_k(r, 3)
0.33333333333333331
>>> precision_at_k(r, 4)
Traceback (most recent call last):
    File "<stdin>", line 1, in ?
ValueError: Relevance score length < k
- Args:
- r: Relevance scores (list or numpy) in rank order (first element is the first item)
- Returns:
- Precision @ k
- Raises:
- ValueError: len(r) must be >= k
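Precision @ k is the fraction of the first k results that are relevant. A minimal sketch that also mirrors the documented ValueError, using the hypothetical name `precision_at_k_sketch`:

```python
def precision_at_k_sketch(r, k):
    """Fraction of relevant items among the first k results."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(r) < k:
        raise ValueError("Relevance score length < k")
    return sum(1 for rel in r[:k] if rel) / k


p_at_2 = precision_at_k_sketch([0, 0, 1], 2)
p_at_3 = precision_at_k_sketch([0, 0, 1], 3)

# Asking for more results than the list holds is rejected.
try:
    precision_at_k_sketch([0, 0, 1], 4)
    raised = False
except ValueError:
    raised = True
```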
-
grace_t.framework.tools.rank_metrics.r_precision(r)[source]
Score is precision after all relevant documents have been retrieved
Relevance is binary (nonzero is relevant).
>>> r = [0, 0, 1]
>>> r_precision(r)
0.33333333333333331
>>> r = [0, 1, 0]
>>> r_precision(r)
0.5
>>> r = [1, 0, 0]
>>> r_precision(r)
1.0
- Args:
- r: Relevance scores (list or numpy) in rank order (first element is the first item)
- Returns:
- R Precision
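Going by the examples above, the score is the precision measured at the rank of the last relevant item, that is, the number of relevant items divided by that rank. A sketch under that reading, with the hypothetical name `r_precision_sketch`:

```python
def r_precision_sketch(r):
    """Precision at the rank where the last relevant item is retrieved."""
    relevant_ranks = [rank for rank, rel in enumerate(r, start=1) if rel]
    if not relevant_ranks:
        return 0.0  # assumption: no relevant item scores 0
    # All relevant items lie within the first relevant_ranks[-1] positions.
    return len(relevant_ranks) / relevant_ranks[-1]


rp_last = r_precision_sketch([0, 0, 1])
rp_mid = r_precision_sketch([0, 1, 0])
rp_first = r_precision_sketch([1, 0, 0])
```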