
Big Data Processing Techniques: Python

[Date: 2014-04-21] Source: 爱程序网

Processing big data with Python, offered as a reference for anyone who needs it.

Big data processing techniques

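Big data competitions have been popular lately. I have not been learning Python for long and wanted to try writing something for one. The script only implements the data processing step, and it mainly relies on dict, list, and file handling.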
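As a minimal sketch of how dict, list, and file handling fit together for this kind of task, the snippet below collects values per key while iterating over a file-like object line by line. The helper name `group_values` and the sample lines are hypothetical, and `io.StringIO` stands in for a real file.

```python
import io

def group_values(lines):
    """Collect the second field of each line under its first field."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, value = fields[0], fields[1]
        # setdefault creates the list the first time a key is seen.
        groups.setdefault(key, []).append(value)
    return groups

# Hypothetical sample data; io.StringIO behaves like an opened text file.
sample = io.StringIO("a 1\nb 2\na 3\n")
print(group_values(sample))  # {'a': ['1', '3'], 'b': ['2']}
```

Iterating over the file object directly reads one line at a time, so memory use depends on the size of the groups rather than on the size of the file.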

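One more point: I also implemented this in MATLAB, where a run takes almost two minutes, while the Python version finishes in seconds. That shows how strong Python is at text processing.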

Data format in the file:

clientid shopingid num date

1111000 3873 2 4月5日
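A record in this layout could be parsed along the following lines. This is a sketch under assumptions: the function name `parse_record` is hypothetical, and the date is taken to always have the form `<month>月<day>日`, as in the sample row.

```python
import re

# Matches dates such as "4月5日" (month first, then day).
DATE_RE = re.compile(r'^(\d{1,2})月(\d{1,2})日$')

def parse_record(line):
    """Split one line into (clientid, shopingid, num, month, day)."""
    clientid, shopingid, num, date = line.split()
    match = DATE_RE.match(date)
    if match is None:
        raise ValueError('unexpected date format: %r' % date)
    month, day = int(match.group(1)), int(match.group(2))
    return clientid, shopingid, num, month, day

print(parse_record('1111000 3873 2 4月5日'))
# ('1111000', '3873', '2', 4, 5)
```

Converting the month and day to integers up front makes later comparisons, such as a month threshold, independent of string formatting.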

# -*- coding: utf-8 -*-
# Each input line holds: clientid shopingid num date, for example
# "1111000 3873 2 4月5日". Lines are assumed to be grouped by clientid.

def score(nums):
    """Return the vote for one shopingid from the list of its num values."""
    total = {}
    for i in nums:
        total[i] = total.get(i, 0) + 1
    if '1' in total:
        return 0  # a single '1' record cancels the vote
    vote = 0
    for var, count in total.items():
        if var == '0':
            vote += count
        elif var == '2':
            vote += count * 2
        else:
            vote += count * 3
    return vote

def write_client(output, clientid, shopinginfo):
    """Write clientid and the shopingids whose vote reaches 3, if any."""
    shopinglink = [k for k, v in shopinginfo.items() if score(v) >= 3]
    if shopinglink:
        output.write(clientid + '\t' + ','.join(shopinglink) + '\n')

shopinginfo = {}    # shopingid -> list of num values for the current client
tmpclientid = None  # client whose records are being collected

# The encoding is an assumption; change it to match the input file.
with open('f:/s.txt', 'r', encoding='utf-8') as data_file, \
        open('f:/a.txt', 'a', encoding='utf-8') as output:
    for lineinfo in data_file:
        lineinfo = lineinfo.split()
        if len(lineinfo) < 4:
            continue  # skip blank or incomplete lines
        clientid, shopingid, num, date = lineinfo[:4]
        month = int(date.split('月')[0])
        if clientid != tmpclientid:
            # A new client starts: flush the previous one and reset.
            if tmpclientid is not None:
                write_client(output, tmpclientid, shopinginfo)
            shopinginfo = {}
            tmpclientid = clientid
        if month >= 6:
            # Only records from June onward count toward the vote.
            shopinginfo.setdefault(shopingid, []).append(num)
    # Flush the last client once the input is exhausted.
    if tmpclientid is not None:
        write_client(output, tmpclientid, shopinginfo)
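
The voting rule in the script is easier to check in isolation. The sketch below restates it with `collections.Counter`, under the reading that any `1` record cancels an item's vote, each `0` adds 1, each `2` adds 2, and any other value adds 3. The article does not say what the `num` codes mean, and the names `vote`, `pick_items`, `WEIGHTS`, and `THRESHOLD` are hypothetical, as are the sample items.

```python
from collections import Counter

WEIGHTS = {'0': 1, '2': 2}  # any other value (except '1') weighs 3
THRESHOLD = 3               # an item is kept when its vote reaches this

def vote(nums):
    """Return the weighted vote for one item's list of num values."""
    counts = Counter(nums)
    if '1' in counts:
        return 0  # a '1' record cancels the whole vote
    return sum(WEIGHTS.get(value, 3) * n for value, n in counts.items())

def pick_items(shopinginfo):
    """Return the item ids whose vote reaches THRESHOLD, in insertion order."""
    return [item for item, nums in shopinginfo.items()
            if vote(nums) >= THRESHOLD]

# Hypothetical per-client data: item id -> list of num values.
example = {
    '3873': ['0', '2'],  # 1 + 2 = 3, kept
    '4001': ['0', '0'],  # 1 + 1 = 2, dropped
    '5120': ['3', '1'],  # contains '1', vote is 0, dropped
    '6002': ['3'],       # 3, kept
}
print(pick_items(example))  # ['3873', '6002']
```

Checking for `1` before summing makes the result independent of the order in which the counts are visited.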
