Deleting Lines with a Duplicate Field from a Large Data File on Linux

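A data-collection program I wrote recently produced a file of more than 10 million lines, each made up of 4 fields. The requirement was to delete the lines whose second field duplicates that of another line. I looked around for a Linux tool to do this but found nothing suitable: stream tools such as sed and gawk process input one line at a time, and I didn't see a straightforward way to use them to spot rows with a duplicate field. It looked like I would have to write a Python program myself, until it occurred to me to let MySQL do the work. So, a little sleight of hand:

1. Import the data into a table with mysqlimport --local dbname data.txt. mysqlimport loads each file into the table named after the file without its extension (here, data), so that table must already exist in dbname, with columns matching the file's fields (tab-separated by default).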
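2. Run the following SQL, where tablename is the imported table and uniqfield is the column that must be unique (the second field). It numbers every row, keeps, for each uniqfield value, only the row with the smallest rowid, and copies those rows into a new table that replaces the original:

    use dbname;
    alter table tablename add rowid int not null auto_increment, add unique key (rowid);
    create table t select min(rowid) as rowid from tablename group by uniqfield;
    create table t2 select tablename.* from tablename, t where tablename.rowid = t.rowid;
    drop table tablename;
    drop table t;
    rename table t2 to tablename;
    alter table tablename drop column rowid;

MySQL rejects an auto_increment column that is not indexed, which is why rowid is added together with a unique key. The index also lets the join look up each kept rowid directly instead of scanning the 10-million-row table repeatedly.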
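
The statements in step 2 can be tried out without a MySQL server. Below is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL: an INTEGER PRIMARY KEY column plays the role of the auto_increment rowid, and the MIN/GROUP BY, join, drop and rename steps follow the same pattern. The function dedupe_on_field2, the table name records and the column names f1 to f4 are hypothetical.

```python
import sqlite3


def dedupe_on_field2(rows):
    """Keep, for each distinct second field, the row that was inserted first.

    Sketch only: SQLite stands in for MySQL; table/column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # id INTEGER PRIMARY KEY is auto-assigned 1, 2, 3, ... in insertion order,
    # playing the role of the auto_increment rowid column.
    cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, f1, f2, f3, f4)")
    cur.executemany("INSERT INTO records (f1, f2, f3, f4) VALUES (?, ?, ?, ?)", rows)
    # Smallest id for each value of the field that must be unique (f2).
    cur.execute("CREATE TABLE t AS SELECT MIN(id) AS id FROM records GROUP BY f2")
    # Copy only the surviving rows into a new table.
    cur.execute(
        "CREATE TABLE t2 AS SELECT records.* FROM records JOIN t ON records.id = t.id"
    )
    # Swap tables, as drop table / rename table do in the MySQL version.
    cur.execute("DROP TABLE records")
    cur.execute("DROP TABLE t")
    cur.execute("ALTER TABLE t2 RENAME TO records")
    result = cur.execute("SELECT f1, f2, f3, f4 FROM records ORDER BY id").fetchall()
    conn.close()
    return result
```

With rows whose second fields are x, y, x, z, y, only the first x row, the first y row and the z row survive.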

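For comparison, the Python program mentioned at the start can be a single streaming pass that remembers which second-field values it has already seen and writes only the first line for each. A minimal sketch, assuming tab-separated UTF-8 text; dedupe_lines, dedupe_file and the file names are hypothetical. Memory use grows with the number of distinct keys rather than with the number of lines.

```python
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str], field: int = 1, sep: str = "\t") -> Iterator[str]:
    """Yield each line whose key column (0-based index `field`) is new.

    field=1 selects the second field, as in the article.
    """
    seen = set()  # key values encountered so far
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) <= field:
            # Assumption: lines without a key column are passed through as-is.
            yield line
            continue
        key = parts[field]
        if key not in seen:
            seen.add(key)
            yield line


def dedupe_file(src_path: str, dst_path: str, field: int = 1) -> None:
    """Stream src_path into dst_path, keeping the first line per key.

    Hypothetical helper; assumes UTF-8 text with tab-separated fields.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(dedupe_lines(src, field))
```

Usage: dedupe_file("data.txt", "deduped.txt"). The same first-occurrence filter is also the common awk idiom awk -F'\t' '!seen[$2]++' data.txt > deduped.txt, which prints a line only the first time its second field appears.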

        </div>
    </div>
</div>
<div class="copyright">
    <div id="footbar">
        Copyright © 2024 Shijiazhuang Zhenqiang Technology Co., Ltd. <a href="https://beian.miit.gov.cn" target="_blank">冀ICP备08103738号-5</a> <a href="/storage/sitemap.xml">Sitemap</a>
    </div>
</div>
</body>
</html>