Is there a way to reformat the structure of the following text file?

I'd like some help changing the layout of a text file. The file looks like this:


TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7
GO:0000226
GO:0006139
GO:0006259
TRINITY_GG_17866_c5_g1_i1
GO:0003674
GO:0005488


What I would like to end up with is something like this:


TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7 GO:0000226,GO:0006139,GO:0006259
TRINITY_GG_17866_c5_g1_i1 GO:0003674,GO:0005488


So far I haven't been able to come up with any way to do this. I would greatly appreciate any suggestions.

Best wishes,
Ferenc.

小姐请别说爱


You could do:


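dat <- readLines("yourfile.txt")
grp <- cumsum(grepl("^TRINITY", dat))  # a new group starts at every TRINITY line
out <- tapply(dat, grp, function(x) trimws(paste(x[1], paste(x[-1], collapse = ","))))  # "ID GO1,GO2,..."
cat(out, sep = "\n", file = "newfile.txt")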
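The whole trick rests on the grouping vector. A minimal sketch, assuming the sample lines from the question are held in memory rather than read from a file, shows how `cumsum(grepl(...))` labels each GO line with the id of the TRINITY header above it:

```r
# Sample lines from the question, held in memory instead of read from a file
dat <- c("TRINITY_GG_17866_c6_g1_i1", "TRINITY_GG_17866_c3_g1_i1",
         "TRINITY_GG_17866_c1_g1_i7", "GO:0000226", "GO:0006139", "GO:0006259",
         "TRINITY_GG_17866_c5_g1_i1", "GO:0003674", "GO:0005488")

# TRUE on every header line, FALSE on every GO line
is_header <- grepl("^TRINITY", dat)

# The running count of headers only increases at a header line,
# so every GO line inherits the id of the header above it
grp <- cumsum(is_header)
print(grp)
# [1] 1 2 3 3 3 3 4 4 4
```

Any function applied per group (via `tapply`, `split`, or `group_by`) then sees one header plus its GO terms.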

诸葛浮云


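We can create a grouping column based on where the substring "TRINITY" appears in the column, summarise each group, and then remove the grouping column: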


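library(dplyr)
df1 %>%
  # Start a new group at every row that begins with "TRINITY"
  group_by(grp = cumsum(startsWith(v1, "TRINITY"))) %>%
  # First row is the ID; any remaining rows are GO terms joined with commas ("" if none)
  summarise(value1 = v1[1], value2 = paste(v1[-1], collapse = ",")) %>%
  select(-grp)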
# A tibble: 4 x 2
# value1 value2
# <chr> <chr>
#1 TRINITY_GG_17866_c6_g1_i1 ""
#2 TRINITY_GG_17866_c3_g1_i1 ""
#3 TRINITY_GG_17866_c1_g1_i7 "GO:0000226,GO:0006139,GO:0006259"
#4 TRINITY_GG_17866_c5_g1_i1 "GO:0003674,GO:0005488"


Data


df1 <- structure(list(v1 = c("TRINITY_GG_17866_c6_g1_i1", "TRINITY_GG_17866_c3_g1_i1",
"TRINITY_GG_17866_c1_g1_i7", "GO:0000226", "GO:0006139", "GO:0006259",
"TRINITY_GG_17866_c5_g1_i1", "GO:0003674", "GO:0005488")), class = "data.frame", row.names = c(NA,
-9L))
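For readers who would rather avoid dplyr, the same grouping idea can be sketched in base R with `split()` and `vapply()`. This is a swapped-in alternative, not the answer's code; the helper name `collapse_go` and its `header` argument are hypothetical, and it assumes every record starts with a line matching the header pattern:

```r
# Hypothetical helper: collapse TRINITY/GO lines into "ID GO1,GO2,..." records
collapse_go <- function(lines, header = "^TRINITY") {
  # A new group starts at every line matching the header pattern
  groups <- split(lines, cumsum(grepl(header, lines)))
  # First element of each group is the ID; the rest are GO terms
  out <- vapply(groups, function(x) {
    if (length(x) > 1) paste(x[1], paste(x[-1], collapse = ",")) else x[1]
  }, character(1))
  unname(out)
}

dat <- c("TRINITY_GG_17866_c6_g1_i1", "TRINITY_GG_17866_c3_g1_i1",
         "TRINITY_GG_17866_c1_g1_i7", "GO:0000226", "GO:0006139", "GO:0006259",
         "TRINITY_GG_17866_c5_g1_i1", "GO:0003674", "GO:0005488")
res <- collapse_go(dat)
writeLines(res)
```

To work on a file, pass `readLines("yourfile.txt")` as `lines` and write the result back with `writeLines(res, "newfile.txt")`.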



快网


I like the tidyverse solution, but honestly I went for some quick-and-dirty R in this case:


df <- read.table("stack_overflow.csv", stringsAsFactors = FALSE) # Read it in
result <- list() # Initialize the result
for (row in as.vector(df$V1)) {
  if (startsWith(row, "TRINITY")) { # If it starts with "TRINITY", start a new row in result
    result[[length(result) + 1]] <- row
  } else { # Otherwise, append whatever is there to the current row
    if (grepl(" ", result[[length(result)]])) { # If it already has a space (a GO term is already appended), add a comma
      result[[length(result)]] <- paste0(result[[length(result)]], ",", row)
    } else { # Otherwise just add a space
      result[[length(result)]] <- paste0(result[[length(result)]], " ", row)
    }
  }
}
result <- unlist(result) # Convert the list to a character vector
print(result) # Print it so you can check it
write.table(result, file = "formatted_file", row.names = FALSE, quote = FALSE, col.names = FALSE) # Write the table
