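Using R to solve grouping problems at work and in everyday life
Data requirements and format
The function expects a data frame that has both row names and column names. Each row is one individual to be grouped (identified by its row name), and each column is one numeric score (identified by its column name).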
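As a rough illustration of that shape, here is a minimal sketch with made-up values. The object name `example_input` and its row and column names are assumptions chosen for the example, not part of the original data.

```r
# Hypothetical input: one row per individual, one numeric column per scored item.
example_input <- data.frame(
  item1 = c(12, 15, 18),
  item2 = c(20, 25, 21)
)
rownames(example_input) <- c("person_a", "person_b", "person_c")

# Every column should be numeric, because the function standardizes each column.
stopifnot(all(sapply(example_input, is.numeric)))
str(example_input)
```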
Function
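group_data <- function(data, by, group) {
  # data:  data frame of numeric columns, one row per individual
  # by:    quantile probabilities in descending order, e.g. c(0.9, 0.7)
  # group: labels from best to worst; needs length(by) + 1 entries
  library(tidyverse)  # load the needed package (provides the %>% pipe)
  data[, 1:length(data)] %>%
    scale() %>%               # standardize every column to z-scores
    apply(1, mean) %>%        # average the z-scores across each row
    data.frame(data) -> df2   # composite score first, then the original columns
  colnames(df2)[1] = "scores" # name the composite score column
  cutoffs = quantile(df2$scores, probs = by)  # score cutoffs, one per entry of by
  df2$group = NA              # start with every row unassigned
  df2$group[df2$scores >= cutoffs[1]] = group[1]  # at or above the highest cutoff
  df2$group[df2$scores < cutoffs[length(by)]] = group[length(by) + 1]  # below the lowest cutoff
  i = 2
  while (i <= length(by)) {   # groups between consecutive cutoffs
    df2$group[df2$scores < cutoffs[i - 1] & df2$scores >= cutoffs[i]] = group[i]
    i = i + 1
  }
  print(df2)
}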
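Note: this function depends on the "tidyverse" package. If it is not installed yet, install it (for example with install.packages("tidyverse")) before running the function above.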
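The scoring step of the function can be looked at on its own. The sketch below uses a hypothetical three-row data frame (`toy` and its values are assumptions for illustration) to show how standardizing each column and then averaging across a row produces one composite score per individual.

```r
# Hypothetical toy data: three people scored on two items with different scales.
toy <- data.frame(a = c(1, 2, 3), b = c(10, 30, 20))
rownames(toy) <- c("p1", "p2", "p3")

z <- scale(toy)        # each column now has mean 0 and standard deviation 1
scores <- rowMeans(z)  # same values as apply(z, 1, mean)

print(z)
print(scores)          # p1 = -1, p2 = 0.5, p3 = 0.5
```

Standardizing first keeps a column with a large numeric range (here `b`) from dominating the average.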
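Case 1: Performance review
Goal: assess the company's employees by their composite score across all lines of business.
Requirement: split the employees into three groups. Those scoring at or above the 90th percentile are Excellent (优秀);
Requirement: those scoring from the 70th percentile up to, but not including, the 90th percentile are Good (良好);
Requirement: those scoring below the 70th percentile are Poor (不良).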
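To make the percentile rules concrete, here is a small sketch of how such cutoffs can be derived and applied. The ten scores and the English labels are hypothetical stand-ins, and the nested `ifelse()` is just one simple way to express the same three rules, not the function's own code.

```r
# Hypothetical composite scores for ten people (not the data from this case).
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Percentile cutoffs, highest first, in the same order as the `by` argument.
cutoffs <- quantile(scores, probs = c(0.9, 0.7))
print(cutoffs)  # roughly 9.1 for 90% and 7.3 for 70%

# Same rules as the requirements: >= 90th, [70th, 90th), < 70th percentile.
label <- ifelse(scores >= cutoffs[[1]], "excellent",
         ifelse(scores >= cutoffs[[2]], "good", "poor"))
print(table(label))
```

With these scores, one person lands in the top group, two in the middle, and seven at the bottom, which shows that "90th percentile" selects roughly the top tenth rather than the top 90%.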
The data:
df <- data.frame(
业务1 = c(12,15,18,20,10,14),
业务2 = c(20,25,21,28,29,21),
业务3 = c(40,60,70,90,100,20),
业务4 = c(100,200,300,90,400,230)
)
rownames(df) <- c("小吴","小刘","王吴",
"小天","小明","赵四")
group = c("优秀","良好","不良")
by = c(0.9,0.7)# cutoffs at the 90th and 70th percentiles, highest first
group_data(data = df,by = by, group = group)
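The call prints the original data frame with two added columns: scores (the composite score, placed first) and group (the assigned label, placed last).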
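Case 2: Student ranking
Goal: rank the students in a class by their composite score across all subjects.
Requirement: split the students into four groups. Those scoring at or above the 90th percentile get an A;
Requirement: those scoring from the 80th percentile up to, but not including, the 90th percentile get a B;
Requirement: those scoring from the 70th percentile up to, but not including, the 80th percentile get a C;
Requirement: those scoring below the 70th percentile get a D.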
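Three cutoffs produce four groups, so the label vector must have exactly one more entry than the cutoff vector, and the cutoffs must be listed from highest to lowest. The helper below is a hypothetical addition (`check_group_args` is not part of the original function) that sketches how these assumptions could be checked before calling `group_data`.

```r
# Hypothetical helper (not part of the original function): checks that
# `by` and `group` are consistent with each other.
check_group_args <- function(by, group) {
  stopifnot(
    is.numeric(by),
    all(by >= 0 & by <= 1),                  # valid probabilities for quantile()
    !is.unsorted(rev(by), strictly = TRUE),  # strictly descending, highest first
    length(group) == length(by) + 1          # one more label than cutoffs
  )
  invisible(TRUE)
}

# Passes silently for the arguments used in this case.
check_group_args(by = c(0.9, 0.8, 0.7), group = c("A", "B", "C", "D"))
```

If the cutoffs were passed in ascending order, or a label were missing, the helper would stop with an error instead of letting some rows end up without a group.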
The data:
df2 <- data.frame(
物理 = sample(0:50,9,replace = FALSE),
生物 = sample(0:50,9,replace = FALSE),
政治 = sample(0:100,9,replace = FALSE),
英语 = sample(0:100,9,replace = FALSE),
语文 = sample(0:150,9,replace = FALSE),
数学 = sample(0:150,9,replace = FALSE)
)
rownames(df2) <- c("小吴","小刘","王吴",
"小天","小明","赵四",
"王五","刘三","李一")
by2 = c(0.9,0.8,0.7)# cutoffs at the 90th, 80th and 70th percentiles, highest first
group2 = c("A","B","C","D")
group_data(data = df2,by = by2, group = group2)
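The call prints each student with a scores column and a group column. Because the marks are drawn with sample(), the values differ from run to run unless a random seed is set first.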
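If repeatable output is wanted, one option is to fix the random seed before generating the marks. The sketch below assumes an arbitrary seed of 42 (the original post sets no seed) and shows that re-seeding reproduces the same draw.

```r
# Assumption: any fixed integer works as a seed; 42 is arbitrary.
set.seed(42)
first_draw <- sample(0:50, 9, replace = FALSE)

set.seed(42)
second_draw <- sample(0:50, 9, replace = FALSE)

# Re-seeding before each draw reproduces exactly the same marks.
print(identical(first_draw, second_draw))  # TRUE
```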