Process the following text with Shell:
    The months of learning in Old Boy education are the few months that I think the time efficient is the most.I had also studied at other training institutions before, but I was hard to understand what the tutor said and hard to follow. It was just too much to learn with no outline.

Requirements:

1. Sort by word frequency, in descending order.

2. Sort by letter frequency, in descending order.

Old Boy's solution:

1. Sorting by word frequency in descending order, in practice

Method 1: the traditional approach

Step 1: remove the special characters
    [root@oldboy ~]# sed 's#[,\.]##g' <oldboy.log
    The months of learning in Old Boy education are the few months that I think the time efficient is the mostI had also studied at other training institutions before but I was hard to understand what the tutor said and hard to follow It was just too much to learn with no outline

Step 2: replace spaces with newlines so the words are listed one per line, count the duplicates, then print the final result
    [root@oldboy ~]# sed 's#[,\.]##g' <oldboy.log|tr " " "\n"|sort|uniq -c|sort -rn|head -5
          4 the
          3 to
          2 was
          2 months
          2 I
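The pipeline above can be made a little more tolerant of the input. The following is only a sketch, not part of the original answer: `word_freq` is a hypothetical helper name, and folding case (so that "The" and "the" count as one word) is an assumption the original requirements do not state. It treats every run of non-letters as a separator, so text such as "most.I" is split into two words.

```shell
# Hypothetical helper, not from the original post.
word_freq() {
    tr '[:upper:]' '[:lower:]' |              # fold case (an assumption, see above)
        tr -cs '[:alpha:]' '\n' |             # every run of non-letters becomes one newline
        grep -v '^$' |                        # drop the empty line left by leading punctuation
        LC_ALL=C sort |
        uniq -c |
        awk '{print $1, $2}' |                # normalise the padding added by uniq -c
        LC_ALL=C sort -k1,1nr -k2,2           # count descending, then word ascending
}

printf 'The cat saw the dog.The dog ran, the cat hid.\n' | word_freq | head -3
# 4 the
# 2 cat
# 2 dog
```

Sorting on two explicit keys keeps the order of tied words the same on every machine, whereas the tie order of a plain `sort -rn` depends on the locale.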

Method 2: awk array, same idea as method 1
    [root@oldboy ~]# tr " ," "\n" <oldboy.log|awk '{S[$1]++}END{for(k in S) print S[k],k}'|sort -rn|head -5
    4 the
    3 to
    2 was
    2 months
    2 I
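The awk array idea can be wrapped so that it counts any tokens that arrive one per line, words or letters alike. This is a sketch only: `count_tokens` is a hypothetical name, and skipping blank lines and sorting ties alphabetically are choices made here, not something the original answer does.

```shell
# Hypothetical helper, not from the original post: count identical input lines.
count_tokens() {
    awk 'NF { seen[$0]++ }                          # skip blank lines, count the rest
         END { for (t in seen) print seen[t], t }' |
        LC_ALL=C sort -k1,1nr -k2,2                 # count descending, token ascending
}

printf 'to be or not to be\n' | tr ' ' '\n' | count_tokens
# 2 be
# 2 to
# 1 not
# 1 or
```

The explicit sort is needed because awk does not define the order in which `for (t in seen)` visits the array.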

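Method 3: process each line horizontally with an awk array, looping over its fields, instead of first listing the words one per line. The loop has to run to i<=NF or the last field of each line is dropped; empty fields (such as the one after the final period) are skipped. Because the separator includes the period, "most.I" is split correctly and I is counted 3 times here, whereas methods 1 and 2 see "mostI" / "most.I" as a single token and report 2.
    [root@oldboy ~]# awk -F "[ ,.]+" '{for(i=1;i<=NF;i++)if($i!="")S[$i]++}END{for(k in S) print S[k],k}' oldboy.log |sort -rn|head -5
    4 the
    3 to
    3 I
    2 was
    2 months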

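2. Sorting by letter frequency in descending order

Method 1: remove the spaces and special characters, then use grep -o to list the characters one per line and process them
    [root@oldboy ~]# sed 's#[,. ]##g' oldboy.log|grep -o "."|sort|uniq -c|sort -rn|head -5
         33 t
         20 o
         19 e
         18 n
         17 i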
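A variant of this pipeline can select the letters directly instead of deleting a fixed list of unwanted characters. The sketch below is not from the original answer: `letter_freq` is a hypothetical name, case folding is an added assumption, and `grep -o` is relied on as in the command above (it is widely available but not required by POSIX).

```shell
# Hypothetical helper, not from the original post.
letter_freq() {
    tr '[:upper:]' '[:lower:]' |          # fold case (an assumption)
        grep -o '[[:alpha:]]' |           # one letter per line; digits and punctuation are ignored
        LC_ALL=C sort |
        uniq -c |
        awk '{print $1, $2}' |            # normalise the padding added by uniq -c
        LC_ALL=C sort -k1,1nr -k2,2       # count descending, then letter ascending
}

printf 'Hello, World.\n' | letter_freq | head -3
# 3 l
# 2 o
# 1 d
```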

Method 2: awk array
    [root@oldboy ~]# sed 's#[,. ]##g' oldboy.log|grep -o "."|awk '{S[$1]++}END{for(k in S) print S[k],k}'|sort -rn|head -5
    33 t
    20 o
    19 e
    18 n
    17 i

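Method 3: again process the line horizontally with an awk array, instead of listing the characters one per line first. The loop must run to i<=NF; with i<NF the last character of the line (the final e of "outline") is never counted and e comes out as 18 instead of 19.
    [root@oldboy ~]# sed 's#[,. ]##g' oldboy.log|awk -F "" '{for(i=1;i<=NF;i++)S[$i]++}END{for(k in S) print S[k],k}'|sort -rn|head -5
    33 t
    20 o
    19 e
    18 n
    17 i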
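Splitting a line into single characters with an empty field separator works in gawk, but POSIX leaves that behaviour unspecified. As a sketch under that caveat, the same count can be written with `substr`, which every awk provides. The name `letter_freq_awk` is hypothetical, and restricting the count to ASCII letters is an assumption made here.

```shell
# Hypothetical helper, not from the original post.
letter_freq_awk() {
    awk '{
            for (i = 1; i <= length($0); i++) {   # <= so the last character is included
                c = substr($0, i, 1)
                if (c ~ /[A-Za-z]/) count[c]++    # count ASCII letters only
            }
         }
         END { for (c in count) print count[c], c }' |
        LC_ALL=C sort -k1,1nr -k2,2               # count descending, then letter ascending
}

printf 'banana band\n' | letter_freq_awk
# 4 a
# 3 n
# 2 b
# 1 d
```

No sed step is needed in front of it, because spaces and punctuation are filtered inside the loop.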